[Python Web Scraping] A Detailed Guide to Using selenium

2 years ago | Author: J-Py | Category: Toy博客


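Introduction to selenium

selenium is a tool for testing web applications. Its tests run directly in the browser, just as a real person would operate it: opening the browser, typing an account name and password to log in, and so on. selenium currently supports most browsers, including IE, Mozilla Firefox, Safari, Google Chrome, Opera, and Edge. It is a very successful open-source tool with bindings for many languages, among them C#, Java, PHP, and Python, and it has earned a place in web scraping as well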


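Installing selenium

Press Win+R, type cmd, and run the following command to install the selenium library for Python

pip install selenium

Once the installation finishes, run pip show selenium to check that it succeeded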
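The same check can be done from Python itself. This is a small sketch using only the standard library (Python 3.8 or later); the helper name installed_version is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string when selenium is installed, otherwise None.
print(installed_version("selenium"))
```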

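Installing the browser driver

Each browser needs its own driver. Browsers are updated quickly, and sometimes the driver release lags behind the browser release. When that happens you can either downgrade the browser or rely on a newer selenium: from version 4.6 on, the bundled Selenium Manager can fetch a matching driver by itself, so a manually installed webdriver is not always needed (this is from personal experience; corrections are welcome)

Firefox webdriver: Firefox
Alibaba mirror: https://npm.taobao.org/mirrors/geckodriver/

Google Chrome: Chrome
Alibaba mirror: https://npm.taobao.org/mirrors/chromedriver/

Opera: Opera
Alibaba mirror: https://npm.taobao.org/mirrors/operadriver/

PhantomJS: PhantomJS
Edge: Edge
IE: IE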

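Checking the version number and downloading

I mostly use Edge, and Google's webdriver tends to be updated slowly, so this article uses Edge for the demonstrations. That said, Firefox is the better choice for testing: it is the officially recommended browser and tends to produce fewer errors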


On Edge's About page, a line such as "Version 117.0.2045.60 (Official build) (64-bit)" contains the browser version number, here 117.0.2045.60. Then look up the matching webdriver on the Edge webdriver site
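To make the version check concrete, here is a small sketch that compares version strings. It assumes, as a rule of thumb rather than anything the driver site guarantees, that a driver is suitable when its major version equals the browser's. The driver version numbers below are made up for illustration.

```python
def major_version(version_string):
    """Return the leading (major) component of a dotted version string as an int."""
    return int(version_string.split(".")[0])


def driver_matches(browser_version, driver_version):
    """True when browser and driver share the same major version (assumed rule of thumb)."""
    return major_version(browser_version) == major_version(driver_version)


print(driver_matches("117.0.2045.60", "117.0.2045.47"))  # True
print(driver_matches("117.0.2045.60", "116.0.1938.81"))  # False
```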


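Pick the build that matches your system (mine is x64) and start the download. When it finishes, unzip the archive to get an executable named msedgedriver.exe, and save it wherever you like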


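Configuring the environment variable

Search for "View advanced system settings", or go to [This PC] > [right-click, Properties] > [Advanced system settings] > [Environment Variables], then add the folder that contains msedgedriver.exe to the Path variable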


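Verifying the environment variable

Run the code below to check that a browser starts successfully. The calls for the other browsers are similar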

from selenium import webdriver

#edge
driver=webdriver.Edge()
#chrome
driver=webdriver.Chrome()
#firefox
driver=webdriver.Firefox()
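Launching a browser is the real test, but a quicker check is to ask whether the driver executable can be found on PATH at all. This is a sketch using only the standard library. The name msedgedriver is the Edge driver from the steps above; swap in chromedriver or geckodriver for the other browsers. The helper name find_driver is made up.

```python
import shutil


def find_driver(name="msedgedriver"):
    """Return the full path of a driver executable found on PATH, or None."""
    return shutil.which(name)


path = find_driver()
print(path if path else "msedgedriver is not on PATH")
```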

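Locating elements

Note: before getting into locators, be aware that old and new selenium versions differ. The find_element_by_* methods were removed in selenium 4.3, where find_element(By..., ...) is used instead. This article gives code for both

Open any web page and press F12 to open the developer tools. What you see there is the page's code, and our job is to locate the elements we need in it, for example the a tag of the login box. But first we need to know how to open the page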


Opening a given page

from selenium import webdriver

#edge
driver=webdriver.Edge()
driver.get('https://www.pixiv.net/')

When you run the code above, the browser appears for a moment and then closes right away. To keep it open so we can look at it, add an option that stops selenium from closing the browser automatically

from selenium import webdriver
from selenium.webdriver.edge.options import Options

#configure options
options=Options()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
driver.get('https://www.pixiv.net/')

Here is an alternative way to write it. Both have the same effect, and the second is more concise

from selenium import webdriver

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
driver.get('https://www.pixiv.net/')

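Printing the page source

Element lookup works against the page source that this driver has fetched. Looking for anything that is not in it raises an error

#add the following line after driver.get('https://www.pixiv.net/')
print(driver.page_source)

Output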

<html lang="zh" class=" page-cool-index" xmlns:wb="http://open.weibo.com/wb" data-theme="default"><head>

<meta charset="utf-8">
    <meta name="viewport" content="width=1160">


<meta name="format-detection" content="telephone=no">
<meta property="og:site_name" content="pixiv">
<meta property="fb:app_id" content="140810032656374">
<meta property="wb:webmaster" content="4fd391fccdb49500">
                        <meta property="twitter:card" content="summary_large_image">
                                <meta property="twitter:site" content="@pixiv">
                                <meta property="twitter:title" content="插畫交流網站 [pixiv]">
                                <meta property="twitter:description" content="pixiv是提供插畫等作品的投稿、閱覽服務的「插畫交流網站」。這里有各種各樣不同風格的投稿作品，我們還會舉辦官方、用戶企畫的各種比賽。">
                                <meta property="twitter:image" 
...... (only part of the output is shown)

Locating by ID

In HTML an id attribute is meant to be unique, like a license plate or an ID card number. Take the tag below: the id of this form is nav-searchform

<form id="nav-searchform" class="" style="border-radius:8px 8px 8px 8px;"></form>

Now we locate this form by its id and print the element if it is found. Demo code follows

Older versions

from selenium import webdriver

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
driver.get('https://www.bilibili.com/')
#locate the element
element=driver.find_element_by_id('nav-searchform')
print(element)

Newer versions

from selenium import webdriver
from selenium.webdriver.common.by import By

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
driver.get('https://www.bilibili.com/')
#locate the element
element=driver.find_element(By.ID,'nav-searchform')
print(element)

Output

<selenium.webdriver.remote.webelement.WebElement (session="b141db7f98f14935fcf0b02a16ec9697", element="73445B08D791A4DDB8C1767A1DBFBF95_element_38")>

Locating by NAME

A name attribute does not have to be unique, so there are two functions here, introduced in turn below. Take this HTML fragment

<meta name="viewport" content="width=1160">

Now we locate this tag by name and print the element if it is found. Demo code follows

Getting a single tag

Older versions

from selenium import webdriver

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
driver.get('https://www.pixiv.net/')
#locate the element
element=driver.find_element_by_name('viewport')
print(element)

Newer versions

from selenium import webdriver
from selenium.webdriver.common.by import By

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
driver.get('https://www.pixiv.net/')
#locate the element
element=driver.find_element(By.NAME,'viewport')
print(element)

Output

<selenium.webdriver.remote.webelement.WebElement (session="25b0d0b64d3509caceeeb4d230439803", element="4788C4EB780A9EB1FC46683E4EE01C58_element_19")>

As mentioned above, name is not unique, so selenium also provides functions that return every match: just change element to elements. The demo code prints what the returned data looks like

Getting multiple tags

Older versions

from selenium import webdriver

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
driver.get('https://www.pixiv.net/')
#locate the elements
print(driver.find_elements_by_name('viewport'))

Newer versions

from selenium import webdriver
from selenium.webdriver.common.by import By

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
driver.get('https://www.pixiv.net/')
#locate the elements
print(driver.find_elements(By.NAME,'viewport'))

Output

[<selenium.webdriver.remote.webelement.WebElement (session="261f740729f8b59c23a38ed2f5f4641d", element="E79CC23F1DED57B9C919314A269D19B4_element_19")>]

The result is a list. To operate on every element, just iterate over it

Locating by CLASS

In HTML the class attribute is not unique either, so, as with NAME, there is also a way to get multiple tags. It is not repeated here; only the single-tag code is shown. Take this HTML fragment

<a href="/login.php?ref=wwwtop_accounts_index" class="signup-form__submit--login">登錄</a>

Now we locate this tag by class and print the element if it is found. Demo code follows

Older versions

from selenium import webdriver

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
driver.get('https://www.pixiv.net/')
#locate the element
element=driver.find_element_by_class_name('signup-form__submit--login')
print(element)

Newer versions

from selenium import webdriver
from selenium.webdriver.common.by import By

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
driver.get('https://www.pixiv.net/')
#locate the element
element=driver.find_element(By.CLASS_NAME,'signup-form__submit--login')
print(element)

Output

<selenium.webdriver.remote.webelement.WebElement (session="d082718db12b989c1ddb140d2c6a3993", element="D076E06740ACFCF12941AB330A3E4FCB_element_16")>

Locating by TAG

In HTML a tag is an element name such as input or div, and we can locate by tag as well. A tag obviously occurs more than once, so just like name and class there is a multiple-match variant. Since it works the same way, only the single-match version is shown. Given the following code

<meta charset="utf-8">
<meta name="viewport" content="width=1160">
<meta name="format-detection" content="telephone=no">
<meta property="og:site_name" content="pixiv">

Now we locate a tag by its tag name and print the element if it is found. Demo code follows

Older versions

from selenium import webdriver

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
driver.get('https://www.pixiv.net/')
#locate the element
element=driver.find_element_by_tag_name('meta')
print(element)

Newer versions

from selenium import webdriver
from selenium.webdriver.common.by import By

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
driver.get('https://www.pixiv.net/')
#locate the element
element=driver.find_element(By.TAG_NAME,'meta')
print(element)

Output

<selenium.webdriver.remote.webelement.WebElement (session="98f22eab1bc456c95a8caafea956d8c6", element="A08DF5AC2486B8A4A58AF09E9A149177_element_19")>

Locating by XPATH

XPath is a common way to address nodes in HTML and XML documents. Some common XPath syntax is introduced first. Because an XPath expression can match several tags, there is also a multiple-match method, which is not shown again here

#locate with an absolute path (the nesting hierarchy)
xpath="/html/body/div[2]/div[2]/div[1]/div[1]/div/div/form"
#locate by an element attribute
xpath="//*[@class='username']"    #note how single and double quotes are combined here
#locate by hierarchy plus an element attribute
xpath="//div[@class='center-search__bar']/form/div/input"
#locate with a logical expression inside the xpath
xpath="//*[@name='viewport' and @content='width=1160']"

Take the HTML fragment below. We will locate it with xpath and print the element if it is found

<input class="nav-search-input" type="text" autocomplete="off" accesskey="s" maxlength="100" x-webkit-speech="" x-webkit-grammar="builtin:translate" value="" placeholder="李佳琦offer3" title="李佳琦offer3">

Demo code follows

Older versions

from selenium import webdriver

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
driver.get('https://www.bilibili.com/')
#locate the element
element=driver.find_element_by_xpath('//*[@id="nav-searchform"]/div[1]/input')
print(element)

Newer versions

from selenium import webdriver
from selenium.webdriver.common.by import By

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
driver.get('https://www.bilibili.com/')
#locate the element
element=driver.find_element(By.XPATH,'//*[@id="nav-searchform"]/div[1]/input')
print(element)

Output

<selenium.webdriver.remote.webelement.WebElement (session="02861773f803549fb01e7c8e57b904e2", element="DB7C3EEAA07DFC4162BAA688C9963C04_element_2")>
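XPath expressions can be tried out without a browser. The sketch below uses the standard library's xml.etree.ElementTree, which implements only a subset of XPath (paths are written relative to the root with a leading dot, and there is no "and" operator), so it is a way to get a feel for the syntax rather than a substitute for selenium. The markup is a cut-down, hypothetical stand-in for the real page.

```python
import xml.etree.ElementTree as ET

# A cut-down, well-formed stand-in for the page fragment (hypothetical markup).
SNIPPET = """
<html><body>
  <div class="center-search__bar">
    <form id="nav-searchform">
      <div><input class="nav-search-input" type="text"/></div>
    </form>
  </div>
</body></html>
"""

root = ET.fromstring(SNIPPET)

# Attribute-based: any element whose class is nav-search-input.
by_attribute = root.findall(".//*[@class='nav-search-input']")
# Hierarchy plus attribute: start at the div, then walk form/div/input.
by_hierarchy = root.findall(".//div[@class='center-search__bar']/form/div/input")
# Same shape as the selenium example: anchor on the id, then take the first div.
by_id = root.findall(".//*[@id='nav-searchform']/div[1]/input")

# All three expressions find the same single <input> element.
print(len(by_attribute), len(by_hierarchy), len(by_id))
print(by_attribute[0] is by_hierarchy[0] is by_id[0])
```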

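Locating by CSS

CSS selectors are very flexible and generally fast, but you need some familiarity with CSS syntax to use them well. Some common syntax is listed below. As with name, a selector may match more than one element, which is not covered again here

Selector	Example	Description
*	*	Selects everything
#id	#i_cecream	Selects the tag with id="i_cecream"
.class	.bili-feed4	Selects tags with class="bili-feed4"
element	form	Selects all <form> tags
element1>element2	form>input	Selects all <input> tags whose parent is a <form>
element1+element2	form+input	Selects the <input> that immediately follows a <form> at the same level
[attribute=value]	[type="text/javascript"]	Selects all tags with type="text/javascript"

Take the HTML fragment below. We will locate it with css and print the element if it is found

<a href="//cm.bilibili.com" data-target-url="**" data-v-7f4a51a0=""></a>

Older versions

from selenium import webdriver

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
driver.get('https://www.bilibili.com/')
#locate the element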
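#assumed selector, rebuilt from the href in the fragment above (the original was cut off)
element=driver.find_element_by_css_selector('[href="//cm.bilibili.com"]')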
print(element)

Newer versions

from selenium import webdriver
from selenium.webdriver.common.by import By

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
driver.get('https://www.bilibili.com/')
#locate the element
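#assumed selector, rebuilt from the href in the fragment above (the original was cut off)
element=driver.find_element(By.CSS_SELECTOR,'[href="//cm.bilibili.com"]')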
print(element)

Output

<selenium.webdriver.remote.webelement.WebElement (session="1a8502b5fd94e89fb3672d5fb2ed0a0b", element="E808FDEC6AEA4FA3127234D28D4406C2_element_179")>

Locating by LINK

LINK locating finds an a tag by its full visible link text, which does not work well for dynamic tags. Like name, link text is not necessarily unique. Take this HTML fragment

<a href="/login.php?ref=wwwtop_accounts_index" class="signup-form__submit--login">登錄</a>

Next we locate the tag through the text "登錄". Demo code follows

Older versions

from selenium import webdriver

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
driver.get('https://www.pixiv.net/')
#locate the element
element=driver.find_element_by_link_text('登錄')
print(element)

Newer versions

from selenium import webdriver
from selenium.webdriver.common.by import By

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
driver.get('https://www.pixiv.net/')
#locate the element
element=driver.find_element(By.LINK_TEXT,'登錄')
print(element)

Output

<selenium.webdriver.remote.webelement.WebElement (session="8c18f357881cc23da103887ff7edb56a", element="E6F3BF9A611DD85EB20D78401F4609A4_element_19")>

Locating by PARTIAL_LINK

For long link text, copying the whole string into the code hurts readability, so PARTIAL_LINK locates by part of the text. Like name and the other strategies, the match is not necessarily unique. Take this HTML fragment

<a href="https://policies.google.com/terms"> Terms of Service</a>

Next we locate the tag through the partial text "Terms". Demo code follows

Older versions

from selenium import webdriver

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
driver.get('https://www.pixiv.net/')
#locate the element
element=driver.find_element_by_partial_link_text('Terms')
print(element)

Newer versions

from selenium import webdriver
from selenium.webdriver.common.by import By

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
driver.get('https://www.pixiv.net/')
#locate the element
element=driver.find_element(By.PARTIAL_LINK_TEXT,'Terms')
print(element)

Output

<selenium.webdriver.remote.webelement.WebElement (session="21869cd58281999dc690c87f145183a9", element="63373D4A1F4660B9672E1B408575CA74_element_19")>
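The eight strategies above map one-to-one from the older method names to the newer locator pairs. The sketch below spells that mapping out in plain Python. The strings are the values behind the By constants in selenium 4 (By.ID is "id", By.CLASS_NAME is "class name", and so on), while the helper to_locator is made up for illustration and is not part of selenium.

```python
# String values behind selenium.webdriver.common.by.By in selenium 4.
STRATEGIES = {
    "id": "id",
    "name": "name",
    "class_name": "class name",
    "tag_name": "tag name",
    "xpath": "xpath",
    "css_selector": "css selector",
    "link_text": "link text",
    "partial_link_text": "partial link text",
}


def to_locator(legacy_method, value):
    """Translate e.g. ('find_element_by_id', 'x') into a (strategy, value) tuple."""
    for prefix in ("find_elements_by_", "find_element_by_"):
        if legacy_method.startswith(prefix):
            return (STRATEGIES[legacy_method[len(prefix):]], value)
    raise ValueError(f"not a legacy locator method: {legacy_method}")


print(to_locator("find_element_by_id", "nav-searchform"))
print(to_locator("find_elements_by_name", "viewport"))
```

With a live driver the tuple can be unpacked straight into the newer call, for example driver.find_element(*to_locator("find_element_by_id", "nav-searchform")).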

Page control

In practice we often need to control the browser page itself, for example refreshing the page or switching tabs, just as we would by hand. selenium can do all of this

Resizing the browser window

selenium offers three common ways to adjust the browser window: custom size, maximized, and minimized

Custom size

Call set_window_size(width, height) to set the window size. The code is as follows

from selenium import webdriver

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
driver.get('https://www.pixiv.net/')
driver.set_window_size(1000,1000)	#this also works if placed before driver.get()

Minimizing the window

Call minimize_window() to minimize the browser

from selenium import webdriver

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
driver.get('https://www.pixiv.net/')
driver.minimize_window()	#this also works if placed before driver.get()

Maximizing the window

Call maximize_window() to maximize the browser

from selenium import webdriver

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
driver.get('https://www.pixiv.net/')
driver.maximize_window()	#this also works if placed before driver.get()

Going back and forward

selenium provides back() and forward() to move backward and forward through the page history

from selenium import webdriver
import time

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
#open the pixiv page
driver.get('https://www.pixiv.net/')
time.sleep(2.5)
#open the bilibili page
driver.get('https://www.bilibili.com/')
time.sleep(2.5)
#go back to the pixiv page
driver.back()
time.sleep(1.5)
#go forward to the bilibili page
driver.forward()
time.sleep(1.5)

The code above calls driver.get() twice, so both pages load in the same tab. To open the second page in a new tab instead, use JavaScript

#open the pixiv page
driver.get('https://www.pixiv.net/')
time.sleep(2.5)
#open the bilibili page
driver.get('https://www.bilibili.com/')
time.sleep(2.5)

#replace with the following

#open the pixiv page
driver.get('https://www.pixiv.net/')
time.sleep(2.5)
#open the bilibili page in a new tab
js="window.open('https://www.bilibili.com/')"
driver.execute_script(js)
time.sleep(2.5)

Refreshing the page

In some situations a page only shows its content, or its latest data, after a refresh. That is what refresh() is for

from selenium import webdriver
import time

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
#open the pixiv page
driver.get('https://www.pixiv.net/')
driver.refresh()

Switching windows

When we visit a site, it may pop up a login window, as a certain site does. Elements in such a window cannot be located directly; we first have to switch to the window. Each window is identified by a handle. After we click the login button a new handle appears, and driver.window_handles returns all of them. Handles are collected in the order the windows were opened and come back as a list, so we can switch by index. We usually want the newest one, which is the handle at index -1. Below is a demo on that site

from selenium.webdriver.common.by import By
from selenium import webdriver
import time

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
#open bilibili
driver.get('https://www.bilibili.com/')
time.sleep(1.5)
#click the login button
login_element=driver.find_element(By.CSS_SELECTOR,'#i_cecream > div.bili-feed4 > div.bili-header.large-header > div.bili-header__bar > ul.right-entry > li:nth-child(1) > li > div.right-entry__outside.go-login-btn > div > span')
login_element.click()
#get the handles and switch to the newest one
windows=driver.window_handles
driver.switch_to.window(windows[-1])
#click the register button
register_element=driver.find_element(By.CLASS_NAME,'btn_other')
register_element.click()
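Taking index -1 works when the newest window really is last in the list. A slightly more defensive variant, sketched here in plain Python, is to compare the handle list before and after the click and switch to whatever is new. The helper newly_opened and the handle strings are made up for illustration; real handles are opaque strings returned by driver.window_handles.

```python
def newly_opened(before, after):
    """Return the handles present in `after` but not in `before`, keeping their order."""
    known = set(before)
    return [handle for handle in after if handle not in known]


# Stand-ins for driver.window_handles read before and after a click.
before = ["handle-A"]
after = ["handle-A", "handle-B"]
print(newly_opened(before, after))
```

With a live driver this would be followed by driver.switch_to.window(newly_opened(before, after)[0]), after checking that the list is not empty.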

Mouse control

selenium's mouse control simulates what a person does, clicking elements and tags to get the effect or data we want. The usual actions are left click, right click, double click, drag and drop, and hover

Left click

A left click covers ordinary clicks such as pressing a login button. In selenium, calling click() on an element performs a simple left click. The example clicks the login button of a site

from selenium.webdriver.common.by import By
from selenium import webdriver

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
#open pixiv
driver.get('https://www.pixiv.net/')
#click the login button
login_element=driver.find_element(By.CLASS_NAME,'signup-form__submit--login')
login_element.click()

Right click

A right click differs enough from a left click that selenium handles it through a separate class, ActionChains. The example right-clicks the login button of a site

from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium import webdriver

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
#open pixiv
driver.get('https://www.pixiv.net/')
#right-click the login button
login_element=driver.find_element(By.CLASS_NAME,'signup-form__submit--login')
ActionChains(driver).context_click(login_element).perform()

Double click

A double click also differs from a single click, so ActionChains is used again. The example double-clicks the small eye icon next to the password field of a site to see what happens

from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium import webdriver

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
#open the bilibili login page
driver.get('https://passport.bilibili.com/login')
#double-click the eye button
eye_element=driver.find_element(By.CLASS_NAME,'eye-btn')
ActionChains(driver).double_click(eye_element).perform()

Drag and drop

This drags one element and releases it on another, as in slider logins or drag-based design tools. The example is from a real site: we drag a heading onto a canvas

from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium import webdriver
import time

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
#open the site's design page and click the text option
driver.get('https://www.canva.cn/design/play?locale=zh-CN')
time.sleep(2)
text_element=driver.find_element(By.CLASS_NAME,'sW2TIg')
text_element.click()
driver.implicitly_wait(20)
#locate the heading element
title_element=driver.find_element(By.CLASS_NAME,'mqHySA')
#locate the canvas element
draw_element=driver.find_element(By.CLASS_NAME,'xvJQZA')
#drag the heading onto the canvas
ActionChains(driver).drag_and_drop(title_element, draw_element).perform()

Hover

Sometimes content only appears while the mouse hovers over an element, and selenium provides for this too, again through ActionChains. The example uses a technology site

from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium import webdriver
import time

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
#open the apple site
driver.get('https://www.apple.com')
#locate the Store tab
store_element=driver.find_element(By.XPATH,'//*[@id="globalnav-list"]/li[2]/div/div/div[1]')
time.sleep(1)
#move the mouse over Store and hover; Edge shows an external popup, so an infinite loop keeps the mouse hovering over Store
while True:
    ActionChains(driver).move_to_element(store_element).perform()

Keyboard control

The keyboard is mostly used for typing, and selenium supports that. It also supports shortcuts such as copy and paste, which require the Keys class

Typing text

To type, locate the tag and send it the text with send_keys(). The code below types into a login field

from selenium.webdriver.common.by import By
from selenium import webdriver

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
#open the bilibili login page
driver.get('https://passport.bilibili.com/login')
#locate the account field
username_element=driver.find_element(By.XPATH,'//*[@id="app"]/div[2]/div[2]/div[3]/div[2]/div[1]/div[1]/input')
#type the text
username_element.send_keys('123456789')

Other operations

Other operations go through Keys. Common Keys values are listed below

Code	Equivalent action
Keys.ENTER	Enter key
Keys.BACK_SPACE	Press Backspace once
Keys.CONTROL	Hold Ctrl
Keys.F1	F1 key (F2 and so on work the same way)
Keys.SPACE	Space key
Keys.TAB	Tab key
Keys.ESCAPE	Esc key
Keys.ALT	Alt key
Keys.SHIFT	Shift key
Keys.ARROW_DOWN	Down arrow
Keys.ARROW_LEFT	Left arrow
Keys.ARROW_RIGHT	Right arrow
Keys.ARROW_UP	Up arrow

A few of them are demonstrated below

from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium import webdriver
import time

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
#open the bilibili login page
driver.get('https://passport.bilibili.com/login')
#locate the account field
username_element=driver.find_element(By.CSS_SELECTOR,'#app > div.login_wp > div.login__main > div.main__right > div.login-pwd > div.tab__form > div:nth-child(1) > input[type=text]')
#type the text
username_element.send_keys('123456789')
time.sleep(1.5)
#delete one character
username_element.send_keys(Keys.BACKSPACE)
time.sleep(1.5)
#select all
username_element.send_keys(Keys.CONTROL,'a')
time.sleep(1.5)
#press Tab
username_element.send_keys(Keys.TAB)
time.sleep(1.5)

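Waiting

Many pages load content asynchronously and cannot finish loading all at once, so we need to wait. selenium itself provides two kinds of wait, explicit and implicit. A third kind, the forced wait, comes from outside selenium: time.sleep() simply pauses the script

Explicit wait

We set a timeout and check at regular intervals whether an element has appeared. If it has, execution moves to the next statement; if the timeout passes without it appearing, an error is raised. This uses a class called WebDriverWait. The basic form is

WebDriverWait(driver,timeout,poll_frequency=0.5,ignored_exceptions=None).until(expected_conditions.title_is(data),message='')
#driver is the driver object, the same driver you call driver.get() on
#timeout is the time limit, in seconds
#poll_frequency is the interval between checks; the default is one check every 0.5 seconds
#ignored_exceptions lists exceptions to ignore: if one of them is raised while until or until_not is polling, the wait continues instead of aborting; by default only NoSuchElementException is ignored
#until() can be swapped for until_not(). until() takes the condition to test, calls it at each interval, and returns once the condition holds (for example, the element appears); until_not() is the opposite and continues once the element disappears or the condition no longer holds

The conditions passed to WebDriverWait come from expected_conditions, and message sets a custom error message. Common conditions are listed below. First, the parameters used in the table are defined as

locator=(By.ID,"1")
element = driver.find_element(By.ID,'1')
content='1'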

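Method	Description
title_is(content)	Checks whether the page title equals content
title_contains(content)	Checks whether the page title contains content
presence_of_element_located(locator)	Checks whether the element has been added to the DOM tree; it is not necessarily visible
visibility_of_element_located(locator)	Checks whether the element is visible, meaning it is not hidden and its width and height are both non-zero
visibility_of(element)	Same as above, but takes an element
text_to_be_present_in_element(locator ,content)	Checks whether the element's text contains content
text_to_be_present_in_element_value(locator ,content)	Checks whether the element's value attribute contains content
frame_to_be_available_and_switch_to_it(locator)	Checks whether the frame can be switched into; if so, switches into it, otherwise returns False
invisibility_of_element_located(locator)	Checks whether the element is absent from the DOM tree or not visible
element_to_be_clickable(locator)	Checks whether the element is visible and can be clicked
staleness_of(element)	Waits until the element is removed from the DOM tree
element_to_be_selected(element)	Checks whether the element is selected
element_selection_state_to_be(element, True)	Checks whether the element's selection state matches the expectation; takes an element and True/False
element_located_selection_state_to_be(locator, True)	Same as the previous one, but takes a locator
alert_is_present()	Checks whether an alert is present on the page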

Here is an example: we wait for an element that does not exist so that the code raises an exception

from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium import webdriver

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
#open the bilibili login page
driver.get('https://passport.bilibili.com/login')
#wait for the element
element = WebDriverWait(driver, 8, 0.5).until(
            EC.presence_of_element_located((By.ID, '1')),
                                           message='訪問超時')

Output

selenium.common.exceptions.TimeoutException: Message: 訪問超時
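To see what an explicit wait does under the hood, here is a simplified polling loop in plain Python. It is a sketch of the idea only, not selenium's actual implementation: it leaves out ignored_exceptions, and the names wait_until and WaitTimeout are made up for illustration.

```python
import time


class WaitTimeout(Exception):
    """Raised by wait_until when the condition never became truthy."""


def wait_until(condition, timeout, poll_frequency=0.5, message=""):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(message)
        time.sleep(poll_frequency)


# A stand-in condition that only succeeds on the third check.
attempts = []


def element_ready():
    attempts.append(1)
    return "element" if len(attempts) >= 3 else None


print(wait_until(element_ready, timeout=2, poll_frequency=0.1))
```

The wait returns as soon as the condition holds, which is why it usually finishes well before the timeout.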

Implicit wait

An implicit wait also sets a time limit. With it in place, the driver.find_element() calls that follow keep retrying when an element is not found, until the element turns up or the time limit passes, at which point NoSuchElementException is raised. The syntax is driver.implicitly_wait(seconds). Example

from selenium.webdriver.common.by import By
from selenium import webdriver
import time

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
#open the bilibili login page
driver.get('https://passport.bilibili.com/login')
#set the implicit wait time
driver.implicitly_wait(5)
#record the start time for timing
start=time.time()
#look for the element
try:
    element=driver.find_element(By.ID,'1')
except Exception as E:
    print('error')
finally:
    #print the elapsed time
    print(time.time()-start)

Output

error
5.047070503234863

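Forced wait

A forced wait uses time.sleep(), in seconds, and needs little explanation. Note that time.sleep() suspends the whole script. Compared with an implicit wait: if both are set to 5 seconds and the element loads within 2 seconds, the implicit wait moves on to the next statement immediately, while the forced wait still sits out the full 5 seconds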

Switching frames

Pages may nest other documents using iframe and frame. selenium cannot directly reach elements inside a nested frame, so we have to switch into the frame first. Demo code follows

from selenium.webdriver.common.by import By
from selenium import webdriver
import time

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
#open a video site
driver.get('http://yhdm72.com/acg/68717/1.html')
driver.implicitly_wait(10)
#switch to the iframe
driver.switch_to.frame('playiframe')    #a string is matched against id and name by default
#this site nests two iframes with the same id, so switch twice
driver.switch_to.frame('playiframe')

#alternative: locate the iframe with xpath
#iframe_element=driver.find_element(By.XPATH,'/html/body/div[3]/div[2]/iframe')
#driver.switch_to.frame(iframe_element)

#find the video tag
video_element=driver.find_element(By.TAG_NAME,'video')
print(video_element.get_attribute('src'))

Output. The src turns out to be a blob: URL, not a directly downloadable file address

blob:http://ss2.quelingfei.com:9900/8322436d-6f35-4f90-905f-531cde8352e5
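After working inside a frame, lookups stay scoped to that frame until we switch back. One way to make that hard to forget is a small context manager, sketched below. It only relies on switch_to.frame() and switch_to.default_content(), both of which exist on a selenium driver; the name in_frame is made up for illustration.

```python
from contextlib import contextmanager


@contextmanager
def in_frame(driver, frame_reference):
    """Switch into a frame for the duration of a with-block, then return to the top document."""
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        # Always return to the top-level document, even if the block raised.
        driver.switch_to.default_content()
```

It would be used as "with in_frame(driver, 'playiframe'): ..." around the find_element calls. Note that default_content() goes all the way back to the top document, so for nested frames like the ones above the blocks would need to be nested as well.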

Handling JS dialogs

When a JS dialog pops up, first get it through switch_to.alert, then operate on it with the methods below

Method	Description
text	Get the text of the dialog
accept	Confirm the dialog
dismiss	Cancel the dialog
send_keys	Send text to the dialog

Below is a real example on a site

from selenium import webdriver
import time

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
#open the site
driver.get('https://incsky16.github.io/2022/06/18/142test/')
#switch to the dialog
alert=driver.switch_to.alert
time.sleep(2)
#click cancel
alert.dismiss()
time.sleep(2)
#switch to the next dialog
new_alert=driver.switch_to.alert
#get the text and click confirm
print(new_alert.text)
time.sleep(2)
new_alert.accept()

Output

密碼錯誤，將返回主頁！
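The read-then-close pattern above can be folded into one helper. This is a sketch that uses only the attributes from the table (text, accept, dismiss); the function name resolve_alert is made up for illustration.

```python
def resolve_alert(driver, accept=True):
    """Read the text of the currently open dialog, then accept or dismiss it."""
    alert = driver.switch_to.alert
    text = alert.text
    if accept:
        alert.accept()
    else:
        alert.dismiss()
    return text
```

With a live driver, print(resolve_alert(driver, accept=False)) would print the dialog text and then cancel the dialog.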

File handling

Uploading and downloading files are the common cases, and selenium handles both

Uploading a file

Uploads are usually implemented with an input tag, so writing the file path into it is enough. Example

from selenium.webdriver.common.by import By
from selenium import webdriver

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
#open the site
driver.get('https://deershare.com/send')
#find the input tag
video_element=driver.find_element(By.XPATH,'/html/body/div[1]/div[1]/div[2]/div[1]/div[3]/input')
#send the file path
video_element.send_keys(r'C:\Users\14040\Desktop\新建文本文檔.txt')
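A mistyped or relative path is an easy way for an upload to fail, so it can help to validate the path before handing it to send_keys(). The sketch below uses only the standard library; the helper name upload_path is made up for illustration.

```python
from pathlib import Path


def upload_path(path):
    """Return an absolute path string for an existing file, or raise FileNotFoundError."""
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(f"no such file: {resolved}")
    return str(resolved)
```

The result would then be passed on as element.send_keys(upload_path('some_file.txt')), where the file name is a placeholder.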

Downloading a file

Downloading is just as simple: find the tag and click it. Example code

from selenium.webdriver.common.by import By
from selenium import webdriver

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
#open the site
driver.get('https://www.lifeofpix.com/')
driver.implicitly_wait(5)
#find the download tag
download_element=driver.find_element(By.CLASS_NAME,'download')
download_element.click()

Working with cookies

Cookies are a key part of the session between browser and server. webdriver can add, delete, and read cookies. Usage and descriptions follow

Method	Description
get_cookies()	Returns all cookies of the session as a list of dictionaries
get_cookie(name)	Returns the cookie whose name is name
add_cookie(cookie_dict)	Adds a custom cookie to the session
delete_cookie(name)	Deletes the cookie whose name is name
delete_all_cookies()	Deletes all cookies in the session

Demo code follows; only some of the methods are shown

from selenium.webdriver.common.by import By
from selenium import webdriver

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
#open the site
driver.get('https://www.apple.com/')
#print all current cookies
print(driver.get_cookies())

Output

[{'domain': '.apple.com', 
  'expiry': 1697020222, 
  'httpOnly': False, 
  'name': 's_vi', 
  'path': '/', 
  'sameSite': 'None',
  'secure': True,
  'value': '[CS]v1|3293391B4A21700D-4000122643359156[CE]'},
 {'domain': '.apple.com', 
  'httpOnly': False,
  'name': 'mk_epub', 
  'path': '/', 
  'sameSite': 'Lax',
  'secure': True,
  'value': '%7B%22btuid%22%3A%221q1ybut%22%2C%22prop57%22%3A%22www.us.homepage%22%7D'},
 {'domain': '.apple.com',
  'httpOnly': False, 
  'name': 's_cc',
  'path': '/',
  'sameSite': 'Lax',
  'secure': True,
  'value': 'true'},
 {'domain': '.apple.com', 
  'expiry': 1697020221, 
  'httpOnly': False, 
  'name': 's_fid', 
  'path': '/', 
  'sameSite': 'Lax',
  'secure': True, 
  'value': '63B91D4F3A5FB748-21AF8E60CCC2B67D'},
 {'domain': '.apple.com',
  'httpOnly': False, 
  'name': 'geo', 
  'path': '/', 
  'sameSite': 'Lax',
  'secure': False,
  'value': 'HK'}]
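Since get_cookies() returns plain dictionaries like the ones above, they can be stored as JSON and loaded again in a later run. This is a sketch with the standard library only; the helper names and the file name are made up for illustration.

```python
import json


def save_cookies(cookies, path):
    """Write a list of cookie dicts (as returned by driver.get_cookies()) to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(cookies, handle, ensure_ascii=False, indent=2)


def load_cookies(path):
    """Read the list of cookie dicts back from the JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

With a live driver this would be save_cookies(driver.get_cookies(), 'cookies.json') in one run, and in a later run, after opening a page on the same domain, a loop calling driver.add_cookie(cookie) for each cookie in load_cookies('cookies.json').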

Using JavaScript

Some things need JavaScript, for example lazy-loaded content that only loads once we scroll to the bottom. selenium provides execute_script() to run JavaScript. Two common approaches follow

Scrolling by page coordinates

Let's scroll the bilibili home page using page coordinates. Code example

from selenium import webdriver

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
#open the site
driver.get('https://www.bilibili.com/')
#scroll the page
js='window.scrollTo(0,1000)'
driver.execute_script(js)

Scrolling to an element to trigger lazy loading

Again using the bilibili home page, this keeps triggering lazy loading. Code example

from selenium.webdriver.common.by import By
from selenium import webdriver
import time

#configure options
options=webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
#edge: pass the options to the Edge driver
driver=webdriver.Edge(options=options)
#open the site
driver.get('https://www.bilibili.com/')
driver.implicitly_wait(2)
#bilibili's page is a special case: scroll down once first
js = "window.scrollTo(0,1000);"
driver.execute_script(js)
time.sleep(2)
#infinite loop: find the last card and scroll to it
while True:
    time.sleep(1)
    target=driver.find_elements(By.CLASS_NAME,'bili-video-card')[-1]
    driver.execute_script("arguments[0].scrollIntoView();", target)
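The loop above never ends on its own. A bounded variant, sketched here in plain Python, stops once scrolling no longer produces new items or after a maximum number of rounds. The two callables are stand-ins: with selenium, count_items could return len(driver.find_elements(By.CLASS_NAME,'bili-video-card')) and scroll_once could run the scrollIntoView script followed by a short sleep. The name scroll_until_stable is made up for illustration.

```python
def scroll_until_stable(count_items, scroll_once, max_rounds=20):
    """Scroll until the item count stops growing or max_rounds is reached; return the final count."""
    previous = count_items()
    for _ in range(max_rounds):
        scroll_once()
        current = count_items()
        if current == previous:
            break
        previous = current
    return previous


# A fake feed that starts with 5 cards and loads 5 more per scroll, up to 20.
feed = {"cards": 5}


def fake_scroll():
    feed["cards"] = min(feed["cards"] + 5, 20)


print(scroll_until_stable(lambda: feed["cards"], fake_scroll))
```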

其他常用操作

東西太多不好逐個去講解，這里的常見操作相對簡單，所以就不逐一展示，我把怕們放在一個長代碼中來展示并講解，下面是代碼內(nèi)容及解釋

from selenium.webdriver.common.by import By
from selenium import webdriver
import time

# Set up the options
options = webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
# Pass the options to Edge
driver = webdriver.Edge(options=options)
# Open the site
driver.get('https://www.bilibili.com/')

# Save a screenshot of the current page to a file
driver.get_screenshot_as_file('your_path')
# Get the URL of the current page
driver.current_url
# Get the HTML source of the current page
driver.page_source
# Get the title of the current page
driver.title
# Get the browser name (e.g. chrome)
driver.name
# Take a screenshot of the page and return it as binary data
driver.get_screenshot_as_png()
# Get the browser window size
driver.get_window_size()
# Get the browser window size and position
driver.get_window_rect()
# Get the browser window position (top-left corner)
driver.get_window_position()
# Set the browser window size
driver.set_window_size(width=100, height=100)
# Set the browser window position (top-left corner)
driver.set_window_position(x=100, y=100)
# Set the browser window size and position
driver.set_window_rect(x=100, y=100, width=100, height=100)

# Close the current window only
driver.close()
# Close all windows and quit the driver; call this last, the session is unusable afterwards
driver.quit()
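
Most of the properties listed above only return a value, so on their own lines they do nothing visible. As a rough illustration of how they might be used together, the sketch below gathers a few of them into a dictionary. describe_page is a hypothetical helper name, and driver is assumed to be a live WebDriver that has not been closed yet.

```python
def describe_page(driver):
    """Collect a few read-only properties of the current page into a dict.

    describe_page is a hypothetical helper; driver is assumed to be a Selenium WebDriver.
    """
    size = driver.get_window_size()  # dict with 'width' and 'height' keys
    return {
        "url": driver.current_url,
        "title": driver.title,
        "browser": driver.name,
        "width": size["width"],
        "height": size["height"],
        "html_length": len(driver.page_source),
    }


# Usage (assumes an open driver):
# for key, value in describe_page(driver).items():
#     print(key, value)
```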

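Anti-scraping-related configuration

Selenium looks very powerful, but some sites can still detect it easily, so we need to configure the options to get around their anti-scraping measures.

Header settings

A browser window popping up every time Selenium runs can be annoying and clutters the desktop, so Selenium offers a headless mode that hides the window. Headless mode has one big drawback, though: sites can very easily tell that the visitor is a crawler. It is still worth covering here.

headless

The following code demonstrates headless mode, where no browser window is shown: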

from selenium import webdriver

# Set up the options
options = webdriver.EdgeOptions()
options.add_argument('--headless')
# Pass the options to Edge
driver = webdriver.Edge(options=options)
# Open the site
driver.get('https://bot.sannysoft.com/')
# Save a screenshot of the page
driver.save_screenshot('page.png')

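[Image: screenshot of the bot.sannysoft.com test page in headless mode]

The result is a wall of red. Most ordinary sites do not care much about blocking crawlers, but whenever a site does want to block them, this approach will not get through, so it is best to avoid headless requests where possible. As an alternative, we can set a random User-Agent header, which is where the fake-useragent library comes in.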

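fake-useragent+selenium

First install the fake-useragent library: press Win+R, type cmd and press Enter to open a command prompt, then run the following command

pip install fake-useragent

Because fake-useragent can take a very long time to fetch its data from the remote address, we load the data from a local file instead

Download link

After downloading, put the file in the same directory as the Python file you run, or put it elsewhere and pass its absolute path yourself. The following shows fake-useragent combined with Selenium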

from fake_useragent import UserAgent
from selenium import webdriver

# Set up the options
options = webdriver.EdgeOptions()
# Note: depending on the fake-useragent version, the parameter may be named path
# instead of cache_path, or may not be needed at all
ua = UserAgent(cache_path='fake_useragent.json')
user_agent = ua.random
options.add_experimental_option('detach', True)

# You can also call options.add_argument("--headless") here and combine it with the code below

options.add_argument(f'user-agent={user_agent}')
# Pass the options to Edge
driver = webdriver.Edge(options=options)
# Open the site
driver.get('https://bot.sannysoft.com/')
# Save a screenshot of the page
driver.save_screenshot('page.png')

[Image: screenshot of the bot.sannysoft.com test page with a random User-Agent]

As the screenshot shows, only one check now still reveals that we are a crawler
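
If the local fake-useragent data file is not available, a small hand-maintained list of User-Agent strings can serve the same purpose. The sketch below only assembles the strings that would be passed to options.add_argument(); build_browser_arguments and SAMPLE_USER_AGENTS are hypothetical names, and the User-Agent values are illustrative assumptions rather than a recommended set.

```python
import random

# Illustrative User-Agent strings (assumed values; replace them with ones you trust)
SAMPLE_USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36 Edg/117.0.2045.60",
]


def build_browser_arguments(user_agents, headless=False, rng=random):
    """Return the command-line arguments to pass to options.add_argument().

    One User-Agent is picked at random from user_agents; '--headless' is
    appended only when headless is True.
    """
    if not user_agents:
        raise ValueError("user_agents must not be empty")
    arguments = [f"user-agent={rng.choice(user_agents)}"]
    if headless:
        arguments.append("--headless")
    return arguments


# Usage (assumes options = webdriver.EdgeOptions() as in the example above):
# for argument in build_browser_arguments(SAMPLE_USER_AGENTS):
#     options.add_argument(argument)
```

Keeping the argument list separate from the driver makes it easy to print and check what will be sent before a browser is started.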

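Hiding the fingerprint

Above we were one check away from getting past most sites. Here we finish the job and make every row in the table show green. Others have already paved the way for hiding the fingerprint: it requires a file named stealth.min.js. A download resource and the source code address are given here. Put the downloaded file in the same directory as the Python file you run, or put it elsewhere and pass its absolute path yourself

CSDN download resource
Source code address

The following code demonstrates its use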

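from fake_useragent import UserAgent
from selenium import webdriver

# Set up the options
options = webdriver.EdgeOptions()
# Note: depending on the fake-useragent version, the parameter may be named path
# instead of cache_path, or may not be needed at all
ua = UserAgent(cache_path='fake_useragent.json')
user_agent = ua.random
options.add_experimental_option('detach', True)
# You can also call options.add_argument("--headless") here and combine it with the code below
options.add_argument(f'user-agent={user_agent}')

# Pass the options to Edge, then hide the fingerprint
driver = webdriver.Edge(options=options)
# Read the script as UTF-8 so the platform's default encoding does not matter
with open('stealth.min.js', encoding='utf-8') as f:
    js = f.read()
# Inject the script into every new document before the page's own scripts run
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
  "source": js
})
# Open the site
driver.get('https://bot.sannysoft.com/')
# Save a screenshot of the page
driver.save_screenshot('page.png')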

[Image: screenshot of the bot.sannysoft.com test page after injecting stealth.min.js]
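
Besides looking at the screenshot, a few of the values that detection pages commonly inspect can be read back directly with execute_script(). The sketch below is only an approximation of such a check: read_automation_hints and looks_automated are hypothetical helpers, and the two signals they use (navigator.webdriver and a "HeadlessChrome" token in the User-Agent) are assumptions about what a site might look at, not a complete list.

```python
def read_automation_hints(driver):
    """Return a few browser properties that bot-detection pages commonly inspect."""
    script = (
        "return {"
        "webdriver: navigator.webdriver, "
        "userAgent: navigator.userAgent, "
        "languages: navigator.languages, "
        "pluginCount: navigator.plugins.length"
        "};"
    )
    # Selenium converts the returned JavaScript object into a Python dict
    return driver.execute_script(script)


def looks_automated(hints):
    """Very rough heuristic: True if either assumed signal is present."""
    user_agent = hints.get("userAgent") or ""
    return bool(hints.get("webdriver")) or "HeadlessChrome" in user_agent


# Usage (assumes the driver from the example above, after driver.get(...)):
# hints = read_automation_hints(driver)
# print(hints, looks_automated(hints))
```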

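Recording the page's Network activity

Because Selenium keeps changing, the way to capture network activity may differ between older and newer versions. The two are covered separately here, and you can try them to see which one works for you. Besides the approaches built on Selenium itself, there is also a Java-based tool published on GitHub that can record the traffic, which is covered further below

Built into Selenium

This part covers the ways of recording network activity with Selenium itself, split into older and newer versions

Older versions

There are several ways to do this. The first uses Selenium's driver.get_log() method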

from selenium import webdriver
import json

# Set up the capabilities
caps = {
    # Edge reports its browser name as 'MicrosoftEdge'
    'browserName': 'MicrosoftEdge',
    'loggingPrefs': {
        'browser': 'ALL',
        'driver': 'ALL',
        'performance': 'ALL',
    },
    # Edge's vendor-prefixed options key is 'ms:edgeOptions' ('goog:' is Chrome's prefix)
    'ms:edgeOptions': {
        'perfLoggingPrefs': {
            'enableNetwork': True,
        },
        'w3c': False,
    },
}
driver = webdriver.Edge(desired_capabilities=caps)
driver.get('https://www.bilibili.com/')
# Get the log entries
networks = driver.get_log('performance')
for network in networks:
    # Convert the JSON string into a dictionary
    dic_info = json.loads(network["message"])
    # The request information is under the keys ["message"]["params"]
    info = dic_info["message"]["params"]
    # Print the request information
    print(info)

I have not run the older-version code myself, so it may contain mistakes; try it and see
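
Each performance log entry carries a JSON string whose inner "message" holds a DevTools event with a "method" name and "params". The sketch below shows one way to pull the status and URL out of the Network.responseReceived events. extract_responses is a hypothetical helper, and sample_log is hand-written data in the same shape with illustrative values, not real captured output.

```python
import json


def extract_responses(performance_log):
    """Return (status, url) pairs from Network.responseReceived events."""
    responses = []
    for entry in performance_log:
        # entry["message"] is a JSON string; the event itself sits under "message"
        message = json.loads(entry["message"])["message"]
        if message.get("method") != "Network.responseReceived":
            continue
        response = message["params"]["response"]
        responses.append((response["status"], response["url"]))
    return responses


# Hand-written sample shaped like driver.get_log('performance') entries (illustrative values)
sample_log = [
    {"message": json.dumps({"message": {
        "method": "Network.requestWillBeSent",
        "params": {"request": {"url": "https://www.bilibili.com/"}},
    }})},
    {"message": json.dumps({"message": {
        "method": "Network.responseReceived",
        "params": {"response": {"status": 200, "url": "https://www.bilibili.com/"}},
    }})},
]
print(extract_responses(sample_log))  # [(200, 'https://www.bilibili.com/')]
```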

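Newer versions

In newer Selenium versions the code above no longer works as written (the desired_capabilities argument has been removed), so here we use the third-party selenium-wire library instead. First install selenium-wire: press Win+R, type cmd to open a command prompt, and run the following command

pip install selenium-wire

The code and its output are shown below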

from selenium.webdriver.edge.options import Options
from seleniumwire import webdriver

# Set up the options
options = Options()
options.add_experimental_option('detach', True)
# Pass the options to Edge
driver = webdriver.Edge(options=options)
# Open the site
driver.get('https://www.bilibili.com/')
# Print the captured network requests (only the first one, because of the break)
for request in driver.requests:
    print(request.url)
    print(request.headers)
    print(request.method)
    print(request.response)
    print(request.date)
    break

Output

#request.url
https://edge.microsoft.com/serviceexperimentation/v3/?osname=win&channel=stable&scpfull=0&scpguard=0&scpver=0&osver=10.0.19045&devicefamily=desktop&installdate=1674739257&clientversion=117.0.2045.60&experimentationmode=2

#request.headers
pragma: no-cache
cache-control: no-cache
sec-mesh-client-edge-version: 117.0.2045.60
sec-mesh-client-edge-channel: stable
sec-mesh-client-os: Windows
sec-mesh-client-os-version: 10.0.19045
sec-mesh-client-arch: x86_64
sec-mesh-client-webview: 0
x-client-data: eyIxIjoiMCIsIjEwIjoiIiwiMiI6IjAiLCIzIjoiMCIsIjQiOiI1NDUwNTU1ODEyNjE5MzA2NzIyIiwiNSI6IiIsIjYiOiJzdGFibGUiLCI3IjoiMSIsIjkiOiJkZXNrdG9wIn0=
sec-fetch-site: none
sec-fetch-mode: no-cors
sec-fetch-dest: empty
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36 Edg/117.0.2045.60
accept-encoding: gzip, deflate, br
accept-language: zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6

#request.method
GET

#request.response
200 

#request.date
2023-10-11 20:39:49.448413
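
The first captured request is often browser housekeeping, as in the output above, rather than the page itself. A small filter makes the list easier to read. In this sketch summarize_requests is a hypothetical helper; it relies only on the url, method and response attributes used above plus response.status_code, and it allows for requests that have no response yet.

```python
def summarize_requests(requests, url_contains=""):
    """Return (method, status, url) tuples for requests whose URL contains url_contains."""
    rows = []
    for request in requests:
        if url_contains not in request.url:
            continue
        response = request.response  # None when no response was captured
        status = response.status_code if response is not None else None
        rows.append((request.method, status, request.url))
    return rows


# Usage (assumes the selenium-wire driver from the example above):
# for method, status, url in summarize_requests(driver.requests, url_contains="bilibili"):
#     print(method, status, url)
```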

Capturing through JavaScript

Network information can also be read with JavaScript. Example code:

from selenium import webdriver

# Open the browser and visit the target page
driver = webdriver.Edge()
driver.get('https://www.bilibili.com/')
# Run JavaScript to get the page's network (performance) entries
networks = driver.execute_script('return window.performance.getEntries();')
# Print the entries
for network in networks:
    print(network)

The output looks like the following; extract whatever fields you need

{'activationStart': 0, 'connectEnd': 23.80000001192093, 'connectStart': 3, 'criticalCHRestart': 0, 'decodedBodySize': 106758, 'deliveryType': '', 'domComplete': 2404.5999999940395, 'domContentLoadedEventEnd': 1988.300000011921, 'domContentLoadedEventStart': 1986.7000000178814, 'domInteractive': 1301, 'domainLookupEnd': 1.9000000059604645, 'domainLookupStart': 1.9000000059604645, 'duration': 2418.5999999940395, 'encodedBodySize': 26255, 'entryType': 'navigation', 'fetchStart': 1.9000000059604645, 'firstInterimResponseStart': 0, 'initiatorType': 'navigation', 'loadEventEnd': 2418.5999999940395, 'loadEventStart': 2404.800000011921, 'name': 'https://www.bilibili.com/', 'nextHopProtocol': 'h2', 'redirectCount': 0, 'redirectEnd': 0, 'redirectStart': 0, 'renderBlockingStatus': 'non-blocking', 'requestStart': 23.900000005960464, 'responseEnd': 1086.9000000059605, 'responseStart': 1077.7000000178814, 'responseStatus': 200, 'secureConnectionStart': 3.5999999940395355, 'serverTiming': [], 'startTime': 0, 'toJSON': {}, 'transferSize': 26555, 'type': 'navigate', 'unloadEventEnd': 0, 'unloadEventStart': 0, 'workerStart': 0}

{'duration': 0, 'entryType': 'visibility-state', 'name': 'visible', 'startTime': 0, 'toJSON': {}}

{'connectEnd': 0, 'connectStart': 0, 'decodedBodySize': 0, 'deliveryType': '', 'domainLookupEnd': 0, 'domainLookupStart': 0, 'duration': 97.40000000596046, 'encodedBodySize': 0, 'entryType': 'resource', 'fetchStart': 1081.300000011921, 'firstInterimResponseStart': 0, 'initiatorType': 'script', 'name': 'https://s1.hdslb.com/bfs/static/laputa-home/client/assets/svgfont.af80f0d3.js', 'nextHopProtocol': '', 'redirectEnd': 0, 'redirectStart': 0, 'renderBlockingStatus': 'non-blocking', 'requestStart': 0, 'responseEnd': 1178.7000000178814, 'responseStart': 0, 'responseStatus': 0, 'secureConnectionStart': 0, 'serverTiming': [], 'startTime': 1081.300000011921, 'toJSON': {}, 'transferSize': 0, 'workerStart': 0}
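
As one example of extracting data from these entries, the sketch below keeps only the entries whose entryType is 'resource' and lists the slowest ones by duration. slowest_resources is a hypothetical helper that works on the list of dictionaries returned by the call above.

```python
def slowest_resources(entries, limit=5):
    """Return (name, duration in ms) for the slowest 'resource' entries, slowest first."""
    resources = [entry for entry in entries if entry.get("entryType") == "resource"]
    resources.sort(key=lambda entry: entry.get("duration", 0), reverse=True)
    return [
        (entry["name"], round(entry.get("duration", 0), 1))
        for entry in resources[:limit]
    ]


# Usage (assumes networks from the example above):
# for name, duration in slowest_resources(networks):
#     print(duration, name)
```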

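External implementation

The external approach requires the browsermob-proxy program and its Python library, plus a Java environment, so it is more work to set up. First install the library: press Win+R, type cmd, and run the following command

pip install browsermob-proxy

Then download the official browsermob-proxy release. Below are my uploaded copy and the official address

CSDN address
Official site address

Next, set up the Java environment. browsermob-proxy requires a JDK of version 1.8 or later; JDK setup is easy to look up online, so it is not covered here. The following code captures network traffic through browsermob-proxy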

from selenium.webdriver.edge.options import Options
from selenium.webdriver.common.by import By
from browsermobproxy import Server
from selenium import webdriver
import time

# Start the proxy server
server = Server(r'.\browsermob-proxy\bin\browsermob-proxy.bat')
server.start()
proxy = server.create_proxy()
# Set up the options
options = Options()
options.add_experimental_option('detach', True)
options.add_argument('--proxy-server={0}'.format(proxy.proxy))
# Pass the options to Edge
driver = webdriver.Edge(options=options)
# Start capturing network traffic
proxy.new_har("example")
# Open the site
driver.get('https://www.bilibili.com/')
# I got a "connection is not secure" page here, so this is the workaround I used
driver.switch_to.window(driver.window_handles[-1])
driver.find_element(By.XPATH, './html').send_keys('thisisunsafe')
time.sleep(2)
# Read the captured traffic (HAR)
result = proxy.har
for entry in result['log']['entries']:
    print(entry)
# Close the browser and stop the proxy server
driver.quit()
server.stop()

Output

{'pageref': 'example', 'startedDateTime': '2023-10-11T21:19:40.492+08:00', 'request': {'method': 'CONNECT', 'url': 'https://www.bilibili.com', 'httpVersion': 'HTTP/1.1', 'cookies': [], 'headers': [], 'queryString': [], 'headersSize': 0, 'bodySize': 0, 'comment': ''}, 'response': {'status': 0, 'statusText': '', 'httpVersion': 'unknown', 'cookies': [], 'headers': [], 'content': {'size': 0, 'mimeType': '', 'comment': ''}, 'redirectURL': '', 'headersSize': -1, 'bodySize': -1, 'comment': '', '_error': 'Unable to connect to host'}, 'cache': {}, 'timings': {'comment': '', 'ssl': -1, 'receive': 0, 'blocked': 0, 'connect': 161, 'send': 0, 'dns': 0, 'wait': 0}, 'serverIPAddress': '183.131.147.27', 'comment': '', 'time': 161}
......
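
The raw HAR entries are long, so it can help to reduce each one to a handful of fields. The sketch below does that for the dictionary returned by proxy.har. summarize_har is a hypothetical helper, and it reads only keys that appear in the output above (request, response, time and the optional _error).

```python
def summarize_har(har):
    """Reduce each HAR entry to method, URL, status, total time and error (if any)."""
    rows = []
    for entry in har["log"]["entries"]:
        request = entry["request"]
        response = entry["response"]
        rows.append({
            "method": request["method"],
            "url": request["url"],
            "status": response["status"],
            "time_ms": entry["time"],
            "error": response.get("_error"),  # present only when the request failed
        })
    return rows


# Usage (assumes result = proxy.har from the example above):
# for row in summarize_har(result):
#     print(row)
```

For the entry shown above this would report the CONNECT request with status 0 and the error 'Unable to connect to host', which makes failed requests easy to spot.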

References and notes

Selenium usage explained [from beginner to practice] [Python crawler] [40,000 characters] (CSDN blog)

[Capturing API traffic during Selenium automation] Installing and configuring browsermobproxy (CSDN blog)

Python + Selenium + Browsermob-Proxy crawler: getting the browser's Network requests and responses (CSDN blog)

Getting the browser's Network packets with Selenium (0xActive's blog, CSDN)

Some of the code may stop locating elements correctly if the target sites change, so you may need to adjust it yourself
This article was last updated on October 14, 2023
