1. Introduction
Following on from the previous note: Scrapy crawls ordinary static pages with no anti-scraping measures without any trouble. In most cases, though, the data we want lives in pages that are loaded dynamically (content rendered on the front end after an AJAX request returns, JavaScript function calls, and so on). For those cases we can use Selenium to drive a real browser, simulating human interaction, and automate the collection of dynamically loaded data. (Source: http://www.zghlxwxcb.cn/news/detail-765925.html)
2. Environment Setup
- The basic Scrapy dependencies (covered in the previous notes)
- Selenium dependencies:
  - pip install selenium==4.0.0a6.post2
  - pip install certifi
  - pip install urllib3==1.25.11
- Install the Firefox browser and a matching driver
  - I'm on the latest Firefox, version 121.0
  - The driver version is 0.3.0 (see the resource link above)
  - Put the driver executable in your Python environment's Scripts folder (so it's on PATH)
3. Implementation
- Settings (settings.py)
SPIDER_MIDDLEWARES = {
'stock_spider.middlewares.StockSpiderSpiderMiddleware': 543,
}
DOWNLOADER_MIDDLEWARES = {
'stock_spider.middlewares.StockSpiderDownloaderMiddleware': 543,
}
ITEM_PIPELINES = {
'stock_spider.pipelines.StockSpiderPipeline': 300,
}
- Middleware (middlewares.py)
from selenium import webdriver
from selenium.webdriver.firefox.options import Options as firefox_options
spider.driver = webdriver.Firefox(options=firefox_options())  # launch the Firefox browser (typically done when the spider opens)
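The driver-creation line above is usually paired with a matching cleanup when the spider closes, or orphaned Firefox processes pile up. A minimal sketch of that lifecycle, with the browser factory injected so the sketch runs without a real Firefox (`SeleniumLifecycleMiddleware` and `driver_factory` are names invented here, and the Scrapy signal wiring via `from_crawler` is omitted for brevity):

```python
class SeleniumLifecycleMiddleware:
    """Sketch: create one browser per spider and tear it down on close."""

    def __init__(self, driver_factory):
        # e.g. driver_factory=lambda: webdriver.Firefox(options=firefox_options())
        self.driver_factory = driver_factory

    def spider_opened(self, spider):
        # Attach the driver to the spider so process_request /
        # process_response can reach it.
        spider.driver = self.driver_factory()

    def spider_closed(self, spider):
        # Always quit the driver to release the browser process.
        spider.driver.quit()
```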
- process_request
def process_request(self, request, spider):
    # Called for each request that goes through the downloader middleware.
    # Must either:
    # - return None: continue processing this request
    # - return a Response object
    # - return a Request object
    # - or raise IgnoreRequest: process_exception() methods of
    #   installed downloader middleware will be called
    spider.driver.get(request.url)  # load the requested page in the real browser
                                    # (the original demo hard-coded "http://www.baidu.com")
    return None
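After driver.get() returns, dynamically injected content may still be loading, so some form of explicit waiting is usually needed before grabbing page_source. Selenium provides WebDriverWait with expected_conditions for exactly this; the sketch below is a generic polling helper in plain Python that illustrates the same idea without needing a browser (`wait_for` is a name invented here, not a Selenium API):

```python
import time

def wait_for(predicate, timeout=10.0, poll=0.25):
    """Poll predicate() until it returns a truthy value, or raise on timeout.

    A crude stand-in for selenium's WebDriverWait; in real code prefer
    WebDriverWait(driver, timeout).until(expected_conditions...).
    """
    deadline = time.monotonic() + timeout
    while True:
        value = predicate()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1fs" % timeout)
        time.sleep(poll)
```

With a real driver this would be called as, say, `wait_for(lambda: driver.find_elements(By.CSS_SELECTOR, ".result"))` before reading page_source.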
- process_response
from scrapy.http import HtmlResponse
def process_response(self, request, response, spider):
    # Called with the response returned from the downloader.
    # Must either:
    # - return a Response object
    # - return a Request object
    # - or raise IgnoreRequest
    response_body = spider.driver.page_source  # full HTML after JS rendering (a str)
    # Wrap the rendered source in an HtmlResponse; the str body is encoded as UTF-8.
    return HtmlResponse(url=request.url, body=response_body, encoding='utf-8', request=request)
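Routing every request through the browser is slow, so a common refinement is to render only requests that opt in via a meta flag. A sketch of that idea, where the "use_selenium" meta key is a convention invented for this example (not a Scrapy built-in), and response_cls is injected so the logic can be exercised without Scrapy installed (in the real middleware it would be scrapy.http.HtmlResponse):

```python
class SelectiveSeleniumMiddleware:
    """Sketch: only requests flagged with meta['use_selenium'] are rendered."""

    def __init__(self, response_cls):
        # Stand-in for scrapy.http.HtmlResponse, injected for testability.
        self.response_cls = response_cls

    def process_request(self, request, spider):
        if not request.meta.get("use_selenium"):
            return None  # plain requests keep the normal download path
        spider.driver.get(request.url)  # render flagged requests in the browser
        return None

    def process_response(self, request, response, spider):
        if not request.meta.get("use_selenium"):
            return response  # untouched requests pass through unchanged
        # Replace the downloaded response with the browser-rendered HTML.
        return self.response_cls(
            url=request.url,
            body=spider.driver.page_source,
            encoding="utf-8",
            request=request,
        )
```

In a spider you would then opt in per request, e.g. `yield scrapy.Request(url, meta={"use_selenium": True})`.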
Once the spider starts, you'll see it launch the browser driver; from there you can script whatever human-like interactions you need.