1. selenium: a third-party library that lets you drive a browser to perform automated operations.
2. Environment setup
2.1 Installation:
pip install selenium
2.2 Get the browser driver
Download links:
http://chromedriver.storage.googleapis.com/index.html
http://npm.taobao.org/mirrors/chromedriver/
Mapping between browser versions and driver versions:
chromedriver version | Supported Chrome versions |
---|---|
v2.46 | v71-73 |
v2.45 | v70-72 |
v2.44 | v69-71 |
v2.43 | v69-71 |
v2.42 | v68-70 |
v2.41 | v67-69 |
v2.40 | v66-68 |
v2.39 | v66-68 |
v2.38 | v65-67 |
v2.37 | v64-66 |
v2.36 | v63-65 |
v2.35 | v62-64 |
v2.34 | v61-63 |
v2.33 | v60-62 |
v2.32 | v59-61 |
v2.31 | v58-60 |
v2.30 | v58-60 |
v2.29 | v56-58 |
v2.28 | v55-57 |
v2.27 | v54-56 |
v2.26 | v53-55 |
v2.25 | v53-55 |
v2.24 | v52-54 |
v2.23 | v51-53 |
v2.22 | v49-52 |
v2.21 | v46-50 |
v2.20 | v43-48 |
v2.19 | v43-47 |
v2.18 | v43-46 |
v2.17 | v42-43 |
v2.16 | v42-45 |
v2.15 | v40-43 |
v2.14 | v39-42 |
v2.13 | v38-41 |
v2.12 | v36-40 |
v2.11 | v36-40 |
v2.10 | v33-36 |
v2.9 | v31-34 |
v2.8 | v30-33 |
v2.7 | v30-33 |
v2.6 | v29-32 |
v2.5 | v29-32 |
v2.4 | v29-32 |
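
As a rough illustration of how the table above is used, the sketch below looks up which chromedriver releases cover a given Chrome version. Only a few rows are copied in, and the names `DRIVER_TABLE`, `chrome_major`, and `drivers_for` are hypothetical helpers for this example, not part of selenium or chromedriver. The version string in the usage line is made up.

```python
# A few rows copied from the table above:
# (chromedriver version, lowest supported Chrome, highest supported Chrome).
DRIVER_TABLE = [
    ("v2.46", 71, 73),
    ("v2.45", 70, 72),
    ("v2.44", 69, 71),
    ("v2.43", 69, 71),
    ("v2.42", 68, 70),
]


def chrome_major(version):
    """Return the major version number from a string such as '70.0.3538.110'."""
    return int(version.split(".")[0])


def drivers_for(version, table=DRIVER_TABLE):
    """List the chromedriver releases whose supported range covers this Chrome version."""
    major = chrome_major(version)
    return [name for name, low, high in table if low <= major <= high]


# Hypothetical Chrome version string, used only for illustration.
print(drivers_for("70.0.3538.110"))
```

Only the major version takes part in the comparison, which matches the advice that matching the major version is enough.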
All chromedriver releases can be downloaded from the link below:
http://chromedriver.storage.googleapis.com/index.html
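One improvement is that drivers are now published per Chrome version, so you can look up the driver directly by your browser version (matching the major version is enough) and no longer need to work through the mapping by hand. Give it a try.
Some readers report that the download fails; the taobao mirror also works: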
http://npm.taobao.org/mirrors/chromedriver/
Eight ways to locate elements:

Use the methods below to find the target element, then operate on it:
- find_element_by_id: find a node by id
- find_elements_by_name: find by name
- find_elements_by_xpath: find by XPath
- find_elements_by_tag_name: find by tag name
- find_elements_by_class_name: find by class name

    # Locate by id:
    dr.find_element_by_id("kw")

    # Locate by name:
    dr.find_element_by_name("wd")

    # Locate by class name:
    dr.find_element_by_class_name("s_ipt")

    # Locate by tag name:
    dr.find_element_by_tag_name("input")

    # Locate by XPath. There are many ways to write an XPath; a few common ones:
    dr.find_element_by_xpath("//*[@id='kw']")
    dr.find_element_by_xpath("//*[@name='wd']")
    dr.find_element_by_xpath("//input[@class='s_ipt']")
    dr.find_element_by_xpath("/html/body/form/span/input")
    dr.find_element_by_xpath("//span[@class='soutu-btn']/input")
    dr.find_element_by_xpath("//form[@id='form']/span/input")
    dr.find_element_by_xpath("//input[@id='kw' and @name='wd']")

    # Locate by CSS selector. There are many ways to write one; a few common ones:
    dr.find_element_by_css_selector("#kw")
    dr.find_element_by_css_selector("[name=wd]")
    dr.find_element_by_css_selector(".s_ipt")
    dr.find_element_by_css_selector("html > body > form > span > input")
    dr.find_element_by_css_selector("span.soutu-btn> input#kw")
    dr.find_element_by_css_selector("form#form > span > input")

Next, suppose the page contains a group of text links:

    <a class="mnav" href="http://news.baidu.com" rel="external nofollow" name="tj_trnews">新闻</a>
    <a class="mnav" href="http://www.hao123.com" rel="external nofollow" name="tj_trhao123">hao123</a>

These can be located by their full or partial link text:

    # Locate by link text:
    dr.find_element_by_link_text("新闻")
    dr.find_element_by_link_text("hao123")

    # Locate by partial link text:
    dr.find_element_by_partial_link_text("新")
    dr.find_element_by_partial_link_text("hao")
    dr.find_element_by_partial_link_text("123")

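
The `find_element_by_*` helpers shown above belong to older Selenium releases; they were deprecated in Selenium 4 and removed in later 4.x versions in favour of `find_element(strategy, value)`. The sketch below maps each of the eight legacy names to its strategy string. The strings are the values of selenium's `By` constants (for example `By.ID == "id"`), while `LEGACY_TO_STRATEGY` and `find_legacy` are hypothetical names introduced only for this example. The `driver` argument is assumed to be any object exposing `find_element`.

```python
# Strategy strings; each equals the corresponding selenium By constant,
# e.g. By.ID == "id" and By.CSS_SELECTOR == "css selector".
LEGACY_TO_STRATEGY = {
    "find_element_by_id": "id",
    "find_element_by_name": "name",
    "find_element_by_class_name": "class name",
    "find_element_by_tag_name": "tag name",
    "find_element_by_xpath": "xpath",
    "find_element_by_css_selector": "css selector",
    "find_element_by_link_text": "link text",
    "find_element_by_partial_link_text": "partial link text",
}


def find_legacy(driver, legacy_name, value):
    """Run the lookup a legacy helper would have done, via find_element()."""
    strategy = LEGACY_TO_STRATEGY[legacy_name]
    return driver.find_element(strategy, value)
```

With a real driver, `find_legacy(dr, "find_element_by_id", "kw")` would then be equivalent to `dr.find_element(By.ID, "kw")`.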
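Using common methods of the webdriver module in the Selenium library
Some methods for controlling the browser
Method  Description
- set_window_size() Set the size of the browser window
- back() Navigate the browser back
- forward() Navigate the browser forward
- refresh() Refresh the current page
- clear() Clear the text of an element
- send_keys(value) Simulate keyboard input
- click() Click the element
- submit() Submit a form
- get_attribute(name) Get the value of an element attribute
- is_displayed() Return whether the element is visible to the user
- size Return the size of the element
- text Get the text of the element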
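
The list above mixes methods called on the browser object (`set_window_size`, `back`, `forward`, `refresh`) with methods and properties of a located element (`clear`, `send_keys`, `click`, `get_attribute`, `is_displayed`, `size`, `text`). A minimal sketch of both groups follows. The function names `exercise_navigation` and `describe`, the window size, and the URLs passed in are assumptions for illustration; `driver` and `element` are assumed to be a selenium WebDriver and WebElement.

```python
def exercise_navigation(driver, first_url, second_url):
    """Walk through the browser-level methods from the list above."""
    driver.set_window_size(1280, 800)  # window size chosen arbitrarily
    driver.get(first_url)
    driver.get(second_url)
    driver.back()      # returns to first_url
    driver.forward()   # moves to second_url again
    driver.refresh()   # reloads the current page


def describe(element):
    """Collect the read-only information an element exposes."""
    return {
        "text": element.text,
        "size": element.size,
        "visible": element.is_displayed(),
        "name": element.get_attribute("name"),
    }
```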
---------------------
Examples:
1. Automated search in the Chrome browser

    from selenium import webdriver

    # Create the browser object; the argument is the path to the browser driver
    bro = webdriver.Chrome("./chromedriver.exe")
    url = "https://www.baidu.com"
    # Send the request
    bro.get(url)
    # Locate Baidu's search box for the query term
    text = bro.find_element_by_id('kw')
    # Type the keyword
    text.send_keys('python')
    # Click the search button
    button = bro.find_element_by_id('su')
    button.click()
    # Close the browser
    bro.quit()

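
If any step of the search flow raises, `quit()` is never reached and the browser process is left running. One possible way to guard against that, sketched under the assumption that `driver` is a selenium WebDriver, is a small context manager. `managed` and `search` are hypothetical helper names, and the element ids `kw` and `su` are the ones used for Baidu in this article.

```python
from contextlib import contextmanager


@contextmanager
def managed(driver):
    """Yield the driver and always call quit(), even if the body raises."""
    try:
        yield driver
    finally:
        driver.quit()


def search(driver, keyword, url="https://www.baidu.com"):
    """Open the page, type the keyword into the search box, and click search."""
    driver.get(url)
    box = driver.find_element("id", "kw")    # "id" is the value of By.ID
    box.send_keys(keyword)
    driver.find_element("id", "su").click()
    return driver.title
```

Usage would look like `with managed(webdriver.Chrome()) as bro: search(bro, "python")`, with the driver construction adapted to the installed Selenium version.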
2. PhantomJS, a headless browser. Its automation flow is the same as the Chrome flow above.

    from selenium import webdriver
    from time import sleep

    bro = webdriver.PhantomJS('E:/BaiduNetdiskDownload/爬虫课件/5. 动态数据加载 爬取/phantomjs-2.1.1-windows/bin/phantomjs.exe')
    url = "https://www.baidu.com"
    # Send the request
    bro.get(url)
    bro.save_screenshot('./1.png')
    # Locate Baidu's search box for the query term
    text = bro.find_element_by_id('kw')
    # Type the keyword
    text.send_keys('python')
    bro.save_screenshot('./2.png')
    # Click the search button
    button = bro.find_element_by_id('su')
    button.click()
    sleep(3)
    bro.save_screenshot('./3.png')
    # Close the browser
    bro.quit()

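
Newer Selenium releases have dropped PhantomJS support, so the same headless flow is often run with Chrome in headless mode instead. The sketch below is one possible setup, not the method this article uses. It assumes a chromedriver matching the installed Chrome can be found by Selenium (for example on `PATH`); `headless_chrome_arguments` and `make_headless_chrome` are hypothetical helper names, and the window size is arbitrary.

```python
def headless_chrome_arguments(width=1280, height=800):
    """Command-line switches that start Chrome without a visible window."""
    return ["--headless", f"--window-size={width},{height}"]


def make_headless_chrome():
    """Create a headless Chrome driver (assumes chromedriver can be located)."""
    # Imported here so that this module loads even where selenium is not installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in headless_chrome_arguments():
        options.add_argument(argument)
    return webdriver.Chrome(options=options)
```

The returned driver supports the same `get`, `save_screenshot`, and `quit` calls used in the PhantomJS example.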
3. Loading data by scrolling on the Douban movie ranking page

    from selenium import webdriver
    from time import sleep

    url = 'https://movie.douban.com/typerank?type_name=%E5%96%9C%E5%89%A7&type=24&interval_id=100:90&action='
    bro = webdriver.PhantomJS('E:/BaiduNetdiskDownload/爬虫课件/5. 动态数据加载 爬取/phantomjs-2.1.1-windows/bin/phantomjs.exe')
    bro.get(url)
    sleep(1)
    bro.save_screenshot("./1.png")
    js = 'window.scrollTo(0,document.body.scrollHeight)'
    # Run the JavaScript to scroll to the bottom of the page
    bro.execute_script(js)
    sleep(1)
    bro.save_screenshot('./2.png')
    # Get the page data
    page_source = bro.page_source
    print(page_source)

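
The example above scrolls once. On a page that keeps loading more items as you scroll, one common approach is to repeat the scroll until the page height stops growing. The sketch below illustrates that idea under the assumption that `driver` is a selenium WebDriver; `scroll_to_bottom`, `pause`, and `max_rounds` are hypothetical names and defaults chosen for this example.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=10):
    """Scroll down repeatedly until the page height stops changing.

    Returns the number of scroll rounds performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause)  # give the page time to load the next batch
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new was loaded
        last_height = new_height
    return max_rounds
```

After it returns, `driver.page_source` contains whatever the page had loaded by the time scrolling stopped.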
ÒÔÉϾÍÊDZ¾ÎĵÄÈ«²¿ÄÚÈÝ£¬Ï£Íû¶Ô´ó¼ÒµÄѧϰÓÐËù°ïÖú£¬Ò²Ï£Íû´ó¼Ò¶à¶àÖ§³Ö½Å±¾Ö®¼Ò¡£