
Using selenium and PhantomJS in a Python web scraper

1. selenium: a third-party library that drives a browser to perform automated operations.

2. Environment setup

2.1 Install:

pip install selenium

2.2 Get the browser driver

Download links:

http://chromedriver.storage.googleapis.com/index.html

http://npm.taobao.org/mirrors/chromedriver/

Mapping between chromedriver versions and the Chrome versions they support:

chromedriver version  Supported Chrome versions
v2.46 v71-73
v2.45 v70-72
v2.44 v69-71
v2.43 v69-71
v2.42 v68-70
v2.41 v67-69
v2.40 v66-68
v2.39 v66-68
v2.38 v65-67
v2.37 v64-66
v2.36 v63-65
v2.35 v62-64
v2.34 v61-63
v2.33 v60-62
v2.32 v59-61
v2.31 v58-60
v2.30 v58-60
v2.29 v56-58
v2.28 v55-57
v2.27 v54-56
v2.26 v53-55
v2.25 v53-55
v2.24 v52-54
v2.23 v51-53
v2.22 v49-52
v2.21 v46-50
v2.20 v43-48
v2.19 v43-47
v2.18 v43-46
v2.17 v42-43
v2.16 v42-45
v2.15 v40-43
v2.14 v39-42
v2.13 v38-41
v2.12 v36-40
v2.11 v36-40
v2.10 v33-36
v2.9 v31-34
v2.8 v30-33
v2.7 v30-33
v2.6 v29-32
v2.5 v29-32
v2.4 v29-32
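
The table lookup above can also be done in code. The following is a minimal sketch, assuming only a short excerpt of the table is copied in; `DRIVER_TABLE`, `drivers_for_chrome` and `newest_driver_for_chrome` are hypothetical names chosen for illustration, not part of selenium or chromedriver.

```python
# Excerpt of the mapping table above: (chromedriver version, min Chrome, max Chrome).
# Only a few rows are copied here; extend the list with the remaining rows as needed.
DRIVER_TABLE = [
    ("2.46", 71, 73),
    ("2.45", 70, 72),
    ("2.44", 69, 71),
    ("2.43", 69, 71),
    ("2.42", 68, 70),
    ("2.41", 67, 69),
    ("2.40", 66, 68),
]


def drivers_for_chrome(chrome_major):
    """Return every chromedriver version in DRIVER_TABLE that supports chrome_major."""
    return [
        driver
        for driver, low, high in DRIVER_TABLE
        if low <= chrome_major <= high
    ]


def newest_driver_for_chrome(chrome_major):
    """Return the first (newest) matching driver, or None if the excerpt has no match."""
    matches = drivers_for_chrome(chrome_major)
    return matches[0] if matches else None


print(drivers_for_chrome(70))        # all rows whose range contains 70
print(newest_driver_for_chrome(70))  # the newest of those rows
```

Because the rows are kept in the same newest-first order as the table, the first match is also the newest driver.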

All chromedriver releases can be downloaded from the link below:

http://chromedriver.storage.googleapis.com/index.html 

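One improvement: drivers are now published per Chrome version, so you can look up the driver directly by your browser version (matching the major version is enough) instead of working through the mapping table. Give it a try.

Some readers report that they cannot download from that site; the taobao mirror works as well: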

http://npm.taobao.org/mirrors/chromedriver/

8 ways to locate elements:

# Use the methods below to find the target element, then operate on it
find_element_by_id find a node by id
find_elements_by_name find by name
find_elements_by_xpath find by xpath
find_elements_by_tag_name find by tag name
find_elements_by_class_name find by class name
# Locate by id:
dr.find_element_by_id("kw")

# ͨ¹ýname¶¨Î»:
dr.find_element_by_name("wd")

# ͨ¹ýclass name¶¨Î»:
dr.find_element_by_class_name("s_ipt")

# ͨ¹ýtag name¶¨Î»:
dr.find_element_by_tag_name("input")

# ͨ¹ýxpath¶¨Î»£¬xpath¶¨Î»ÓÐNÖÖд·¨£¬ÕâÀïÁм¸¸ö³£ÓÃд·¨:
dr.find_element_by_xpath("//*[@id='kw']")
dr.find_element_by_xpath("//*[@name='wd']")
dr.find_element_by_xpath("//input[@class='s_ipt']")
dr.find_element_by_xpath("/html/body/form/span/input")
dr.find_element_by_xpath("//span[@class='soutu-btn']/input")
dr.find_element_by_xpath("//form[@id='form']/span/input")
dr.find_element_by_xpath("//input[@id='kw' and @name='wd']")

# ͨ¹ýcss¶¨Î»£¬css¶¨Î»ÓÐNÖÖд·¨£¬ÕâÀïÁм¸¸ö³£ÓÃд·¨:
dr.find_element_by_css_selector("#kw")
dr.find_element_by_css_selector("[name=wd]")
dr.find_element_by_css_selector(".s_ipt")
dr.find_element_by_css_selector("html > body > form > span > input")
dr.find_element_by_css_selector("span.soutu-btn> input#kw")
dr.find_element_by_css_selector("form#form > span > input")

Next, suppose the page contains a group of text links.

<a class="mnav" href="http://news.baidu.com" rel="external nofollow" name="tj_trnews">新闻</a>
<a class="mnav" href="http://www.hao123.com" rel="external nofollow" name="tj_trhao123">hao123</a>
# Locate by link text:
dr.find_element_by_link_text("新闻")
dr.find_element_by_link_text("hao123")

# ͨ¹ýpartial link text¶¨Î»:
dr.find_element_by_partial_link_text("ÐÂ")
dr.find_element_by_partial_link_text("hao")
dr.find_element_by_partial_link_text("123")
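
A note for readers on newer releases: the `find_element_by_*` helpers used above were deprecated in Selenium 4 and removed in 4.3, where lookups go through `find_element(by, value)` instead. The sketch below shows how the eight old names map onto locator strategies. `LEGACY_TO_STRATEGY` and `find_legacy` are hypothetical names for illustration; the strategy strings are the values of the constants in `selenium.webdriver.common.by.By`.

```python
# Locator strategy strings; these are the values of the constants in
# selenium.webdriver.common.by.By (By.ID == "id", By.CSS_SELECTOR == "css selector", ...).
LEGACY_TO_STRATEGY = {
    "find_element_by_id": "id",
    "find_element_by_name": "name",
    "find_element_by_class_name": "class name",
    "find_element_by_tag_name": "tag name",
    "find_element_by_xpath": "xpath",
    "find_element_by_css_selector": "css selector",
    "find_element_by_link_text": "link text",
    "find_element_by_partial_link_text": "partial link text",
}


def find_legacy(driver, legacy_name, value):
    """Hypothetical helper: run an old-style lookup through driver.find_element()."""
    strategy = LEGACY_TO_STRATEGY[legacy_name]
    return driver.find_element(strategy, value)
```

With a real driver, `find_legacy(dr, "find_element_by_id", "kw")` does the same thing as `dr.find_element(By.ID, "kw")`. In everyday code you would normally call `find_element` with the `By` constants directly rather than go through a mapping like this.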

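Common methods of the webdriver module in the Selenium library

Some methods for controlling the browser and its elements

Method  Description

  • set_window_size() Set the size of the browser window
  • back() Navigate the browser back
  • forward() Navigate the browser forward
  • refresh() Refresh the current page
  • clear() Clear the text of an input
  • send_keys(value) Simulate keyboard input
  • click() Click the element
  • submit() Submit a form
  • get_attribute(name) Get the value of an element attribute
  • is_displayed() Return whether the element is visible to the user
  • size Return the size of the element
  • text Get the text of the element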
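
In the list above, `set_window_size()`, `back()`, `forward()` and `refresh()` are called on the driver object, while the remaining entries are called on an element returned by one of the locator methods. As a small sketch of the read-only ones, the hypothetical helper `describe_element` below gathers them into a dict; it works with any object that offers `text`, `size`, `is_displayed()` and `get_attribute()`.

```python
def describe_element(element, attribute="href"):
    """Collect a few read-only facts about an element into a plain dict."""
    return {
        "text": element.text,                # visible text of the element
        "size": element.size,                # dict with 'width' and 'height'
        "visible": element.is_displayed(),   # True if the user can see it
        attribute: element.get_attribute(attribute),
    }
```

For example, `describe_element(dr.find_element_by_link_text("hao123"))` would report the text, size, visibility and `href` of that link.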

---------------------

Examples:

1. Automated search in the Chrome browser

from selenium import webdriver
from time import sleep
# Create the browser object; the argument is the path to the browser driver
bro = webdriver.Chrome("./chromedriver.exe")
url = "https://www.baidu.com"
# Send the request
bro.get(url)
# Search Baidu for a given term
text = bro.find_element_by_id('kw')
# Type the keyword
text.send_keys('python')
# Click the search button
button = bro.find_element_by_id('su')
button.click()
# Close the browser
bro.quit()
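
The script above quits right after the click, so the result page may not have loaded yet. If you want to read the results first, you need to wait for them. Below is a minimal polling sketch in plain Python; `wait_until` is a hypothetical helper, and Selenium ships its own `WebDriverWait` class in `selenium.webdriver.support.ui` for the same purpose.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)  # pause between checks so we do not spin the CPU
```

One possible use, placed between `button.click()` and `bro.quit()`, is `wait_until(lambda: "python" in bro.title)`. This assumes the result page puts the search term in its title, which is not confirmed by the article.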

2. PhantomJS is a headless browser; its automation flow is the same as the Chrome flow above.

from selenium import webdriver
from time import sleep
bro = webdriver.PhantomJS('E:/BaiduNetdiskDownload/爬虫课件/5. 动态数据加载爬取/phantomjs-2.1.1-windows/bin/phantomjs.exe')
url = "https://www.baidu.com"
# Send the request
bro.get(url)
bro.save_screenshot('./1.png')
# Search Baidu for a given term
text = bro.find_element_by_id('kw')
# Type the keyword
text.send_keys('python')
bro.save_screenshot('./2.png')
# Click the search button
button = bro.find_element_by_id('su')
button.click()
sleep(3)
bro.save_screenshot('./3.png')
# Close the browser
bro.quit()
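
Selenium 4 no longer includes `webdriver.PhantomJS`, so on newer releases the same headless workflow is usually run with Chrome in headless mode. The following is a sketch under stated assumptions: a chromedriver matching your Chrome is available (on PATH, or fetched automatically by recent Selenium releases). `HEADLESS_FLAGS` and `make_headless_chrome` are hypothetical names, and the window size is an arbitrary choice.

```python
# Command-line switches passed to Chrome; the window size is an arbitrary choice.
HEADLESS_FLAGS = ["--headless", "--window-size=1280,800"]


def make_headless_chrome():
    """Start Chrome without a visible window (assumes a matching chromedriver is available)."""
    # Imported here so the module can be loaded on machines without selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for flag in HEADLESS_FLAGS:
        options.add_argument(flag)
    return webdriver.Chrome(options=options)
```

Replacing the `webdriver.PhantomJS(...)` line with `bro = make_headless_chrome()` should leave the rest of the script, including the `save_screenshot` calls, unchanged.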

3. Douban movie rankings: loading more data by scrolling

from selenium import webdriver
from time import sleep
url = 'https://movie.douban.com/typerank?type_name=%E5%96%9C%E5%89%A7&type=24&interval_id=100:90&action='
bro = webdriver.PhantomJS('E:/BaiduNetdiskDownload/爬虫课件/5. 动态数据加载爬取/phantomjs-2.1.1-windows/bin/phantomjs.exe')
bro.get(url)
sleep(1)
bro.save_screenshot("./1.png")
js = 'window.scrollTo(0,document.body.scrollHeight)'
# Execute the JS code to scroll to the bottom of the page
bro.execute_script(js)
sleep(1)
bro.save_screenshot('./2.png')
# Get the page source
page_source = bro.page_source
print(page_source)
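
The script above scrolls once, which loads one extra batch of entries. To load everything, a common approach is to keep scrolling until the page height stops growing. Here is a minimal sketch of that loop; `scroll_to_bottom` is a hypothetical helper, and the `pause` and `max_rounds` defaults are guesses that may need tuning for a real page.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll down until document.body.scrollHeight stops growing; return rounds used."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for round_no in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause)  # give the page time to load the next batch
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return round_no  # nothing new was loaded, so we are at the bottom
        last_height = new_height
    return max_rounds
```

Calling `scroll_to_bottom(bro)` before reading `bro.page_source` would then return the source with all loaded entries, as long as the page finishes loading each batch within `pause` seconds.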

ÒÔÉϾÍÊDZ¾ÎĵÄÈ«²¿ÄÚÈÝ£¬Ï£Íû¶Ô´ó¼ÒµÄѧϰÓÐËù°ïÖú£¬Ò²Ï£Íû´ó¼Ò¶à¶àÖ§³Ö½Å±¾Ö®¼Ò¡£
