Today we scrape a novel.
The code is as follows:
from selenium import webdriver
from selenium.webdriver.common.by import By
import time


def check():
    option = webdriver.ChromeOptions()
    option.add_argument('--ignore-certificate-errors')
    driver = webdriver.Chrome(options=option)
    # URL of the first chapter to scrape
    url = "https://www.fd80.com/305/305890/2099286.html"
    try:
        for i in range(267, 445):
            print("Scraping chapter " + str(i))
            driver.get(url)
            time.sleep(1)  # give the page a moment to load
            url = get_text(driver)
            if not url:
                # No next-chapter link, so there is nothing left to load
                break
    finally:
        # Always close the browser, even if a chapter fails
        driver.quit()
    print("Scraping finished")


def get_text(driver):
    element = driver.find_element(By.XPATH, '//*[@id="novelcontent"]/div')
    title = driver.find_element(By.XPATH, '//*[@id="chaptertitle"]')
    nexthtml = driver.find_element(By.XPATH, '//*[@id="next_url"]')
    # Get the link to the next chapter
    next_url = nexthtml.get_attribute('href')
    # Append the chapter title and text to the output file
    with open('無敵六皇子.txt', 'a', encoding='utf-8') as f:
        f.write(title.text + '\n')
        f.write(element.text + '\n\n')
    return next_url


if __name__ == '__main__':
    check()
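The scraper above never builds chapter URLs itself: each page hands back the link to the next one, and the loop simply follows it. As a rough sketch of that pattern separated from the browser, the helper below takes any `fetch` callable that returns `(title, body, next_url)`. The names `follow_chapters`, `fetch`, and `fake_site` are hypothetical and only for illustration; the stop conditions (missing link, already-visited link) are an assumption about what a more defensive version might want, not something the original code does.

```python
from typing import Callable, Iterator, Optional, Tuple

# (title, body, next_url); next_url is None when there is no further chapter
Chapter = Tuple[str, str, Optional[str]]


def follow_chapters(fetch: Callable[[str], Chapter],
                    start_url: str,
                    max_chapters: int) -> Iterator[Tuple[str, str]]:
    """Yield (title, body) pairs by following next-chapter links."""
    seen = set()
    url: Optional[str] = start_url
    for _ in range(max_chapters):
        # Stop on a missing link or on a link we have already visited
        if not url or url in seen:
            break
        seen.add(url)
        title, body, url = fetch(url)
        yield title, body


# Usage with a fake in-memory "site" (made-up data, no browser needed)
fake_site = {
    "a": ("Chapter 1", "text one", "b"),
    "b": ("Chapter 2", "text two", "a"),  # links back to the first page
}
chapters = list(follow_chapters(fake_site.__getitem__, "a", 10))
print(chapters)
```

In the real scraper, `fetch` would be a small wrapper that calls `driver.get(url)` and then reads the title, text, and next link from the page, much like `get_text` does.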
Next, write a web page to display the text content (this part of the code was provided by classmate Chen and is not convenient to show here).
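Since that page code is not shown, here is one possible way to turn scraped chapters into a simple static HTML page using only the standard library. This is not the code mentioned above, just a minimal sketch; the function name `render_page` and the page layout are assumptions made for illustration.

```python
import html


def render_page(title: str, chapters: list) -> str:
    """Build a minimal standalone HTML page from (heading, body) pairs."""
    parts = []
    for heading, body in chapters:
        parts.append(f"<h2>{html.escape(heading)}</h2>")
        # One <p> per non-empty line of chapter text
        for line in body.splitlines():
            if line.strip():
                parts.append(f"<p>{html.escape(line.strip())}</p>")
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n"
        "<body>\n" + "\n".join(parts) + "\n</body></html>\n"
    )


# Example with made-up text; escaping keeps characters like "<" safe
page = render_page("Novel", [("Chapter 1", "First line.\n\nA < B")])
print(page)
```

Writing the returned string to a file with `encoding='utf-8'` and opening it in a browser should be enough to read the chapters.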
I recently started an official account (公众号); please give it a follow.