Python BeautifulSoup Module

Scraping the Douban Music Chart with Beautiful Soup in Python: A Walkthrough

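Preface: to get good at web scraping, you need solid fundamentals. Two earlier posts covered scraping pages with XPath and with requests. This post introduces Beautiful Soup and works through an example of scraping a page with it. What is Beautiful Soup? Beautiful...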

Python3 Crawler, BeautifulSoup Module (5): 'gbk' codec can't encode character '\xa0' in position 2966: illegal multibyte sequence

On Windows 7, the crawler fails with the following error: 'gbk' codec can't encode character '\xa0'. The Unicode string in question (myUnWebItems) has to be printed, but the local console is cmd on Windows, whose default code page is CP936, that is, GBK...
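One possible workaround, sketched here rather than taken from the original post, is to clean the text before printing it to a GBK console. The helper name `safe_for_gbk` is hypothetical. It replaces non-breaking spaces with ordinary spaces and substitutes "?" for anything else GBK cannot represent.

```python
def safe_for_gbk(text: str) -> str:
    """Return a version of text that a GBK (CP936) console can print."""
    # Non-breaking spaces (U+00A0) are common in scraped HTML (&nbsp;)
    # and are not representable in GBK, so swap them for plain spaces.
    text = text.replace("\xa0", " ")
    # Any other character GBK cannot encode is replaced with "?".
    return text.encode("gbk", errors="replace").decode("gbk")


# Illustrative input: a non-breaking space plus an emoji outside GBK.
sample = "price:\xa0100 \U0001F600"
print(safe_for_gbk(sample))
```

This trades fidelity for robustness: characters outside GBK are lost in the printed output, so the original string should still be used when writing to a file or database.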

Python3 Crawler, BeautifulSoup Module (4): converting a bs4 Tag to a string; error when inserting data

The insert that raised the error: cur.execute("insert into p_links(title,href,content) values ( %s , %s , %s ) " % (titleContents,full_url,cont_p))
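A likely cause of this kind of failure is building the SQL text with Python's `%` operator, which leaves the values unquoted. The sketch below shows the parameterized alternative, assuming `cur` is a DB-API cursor such as one obtained from pymysql. The helper name `insert_link` and the constant `INSERT_SQL` are hypothetical; the table and column names come from the statement above.

```python
INSERT_SQL = "insert into p_links(title, href, content) values (%s, %s, %s)"


def insert_link(cur, title, href, content):
    """Insert one row, letting the database driver quote the values.

    `cur` is assumed to be a DB-API cursor (for example from pymysql).
    """
    # Convert bs4 Tag / NavigableString objects to plain str first,
    # so the driver receives ordinary strings.
    params = (str(title), str(href), str(content))
    # Pass the values as the second argument instead of formatting
    # them into the SQL text with the % operator.
    cur.execute(INSERT_SQL, params)
    return params
```

With a real connection, this would typically be followed by a `commit()` on the connection object.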

The MySQLdb module does not yet support Python 3.x, so connecting to MySQL from Python 3.x requires installing the pymysql module

On Windows, installing the MySQLdb module for Python 3.6 kept failing, so I installed the pymysql module instead.

Python3 BeautifulSoup Module (3): using bs4's contents[0] to select a tags whose child nodes contain no span tag and which have no class attribute themselves

A tag's .contents attribute returns the tag's child nodes as a list:
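A minimal sketch of that idea follows. The HTML snippet is made up for illustration, and the filtering logic is one reading of the post title (skip links that carry a class attribute or that have a span child), not necessarily the author's exact code.

```python
from bs4 import BeautifulSoup, Tag

# Made-up sample markup for illustration only.
html = """
<ul>
  <li><a href="/one.html">Plain link</a></li>
  <li><a href="/two.html"><span>Hot</span>Wrapped link</a></li>
  <li><a class="nav" href="/three.html">Nav link</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

plain_links = []
for a in soup.find_all("a"):
    # Skip links that carry a class attribute.
    if a.has_attr("class"):
        continue
    # .contents is a list of direct children (Tag or NavigableString).
    has_span = any(
        isinstance(child, Tag) and child.name == "span"
        for child in a.contents
    )
    if has_span:
        continue
    plain_links.append(a)

for a in plain_links:
    # contents[0] is the first child, here the link text itself.
    print(a["href"], a.contents[0])
```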

Python3 BeautifulSoup Module (2): getting a tags (hyperlinks) that have no class attribute and whose href follows a fixed pattern

soup = BeautifulSoup(cent, "html.parser"), then slink = soup.find_all("a", href=re.compile(r"/php/(.+?)/(\d+)\.html")) to output the a tags (hyperlinks) that have no class attribute
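The two conditions can be combined as sketched below. The contents of `cent` are made up for illustration, and the class check via `has_attr` is an assumption about how the filtering might be done, since the excerpt does not show that step.

```python
import re

from bs4 import BeautifulSoup

# Made-up page content for illustration only.
cent = """
<a href="/php/func/101.html">array_map</a>
<a class="menu" href="/php/func/102.html">menu entry</a>
<a href="/php/about.html">about</a>
<a href="/php/func/103xhtml">not a real page</a>
"""
soup = BeautifulSoup(cent, "html.parser")

# The dot before "html" is escaped so it matches a literal ".".
pattern = re.compile(r"/php/(.+?)/(\d+)\.html")

# find_all applies the compiled pattern to each href with search();
# the list comprehension then drops links that have a class attribute.
slink = [
    a for a in soup.find_all("a", href=pattern)
    if not a.has_attr("class")
]

for a in slink:
    match = pattern.search(a["href"])
    print(match.group(1), match.group(2), a.get_text(strip=True))
```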

Python 3.x beautifulsoup4 Module (1): installing beautifulsoup4 with pip on Windows

Make sure pip is installed on your machine. On my machine it lives at E:\python\Scripts\pip3.6.exe. Then run: e:\python\Scripts>pip install beautifulsoup4
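After installing, a quick way to confirm that the package is importable is sketched below. The helper name `is_installed` is hypothetical. Note that the package is installed as beautifulsoup4 but imported as bs4.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("bs4"):
    print("bs4 is available")
else:
    print("bs4 is missing; run: pip install beautifulsoup4")
```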