1. Required Libraries
import requests
import pandas as pd
2. Analysis
Inspecting the site shows that the hot-ranking data is served directly by an API, so there is no need to parse HTML tags; we simply request the hot-rank endpoint:
url = "https://xxxt/xxxx/web/blog/hot-rank?page=0&pageSize=25&type="  # hot-rank endpoint of this site
Requesting it bare returns data that cannot be parsed, so add request headers:
headers = {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Sec-Ch-Ua": "\"Chromium\";v=\"116\", \"Not)A;Brand\";v=\"24\", \"Google Chrome\";v=\"116\"",
    "Sec-Ch-Ua-Mobile": "?1",
    "Sec-Ch-Ua-Platform": "\"Android\"",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-site",
    "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Mobile Safari/537.36"
}
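Before assembling the full script, it can help to hit the endpoint once and confirm that the response is a JSON object with a top-level data list, which is the shape the parsing code below assumes. A minimal check, using the url and headers defined above:
# Minimal sanity check of the endpoint (assumes the url and headers above)
r = requests.get(url, headers=headers, timeout=10)
print(r.status_code)                 # expect 200
payload = r.json()                   # raises if the body is not valid JSON
print(payload.get("data", [])[:1])   # peek at the first record to confirm the field names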
The complete request code:
# Send the HTTP request
r = requests.get(url, headers=headers)
# Parse the JSON response
data = r.json()
# Extract the fields we need from each record
articles = []
for item in data["data"]:
    title = item["articleTitle"]
    link = item["articleDetailUrl"]
    rank = item["hotRankScore"]
    likes = item["favorCount"]
    comments = item["commentCount"]
    views = item["viewCount"]
    author = item["nickName"]
    time = item["period"]
    # The Chinese keys below become the Excel column headers:
    # title, link, hot-rank score, likes, comments, views, author, period
    articles.append({
        "標(biāo)題": title,
        "鏈接": link,
        "熱度分": rank,
        "點(diǎn)贊數(shù)": likes,
        "評(píng)論數(shù)": comments,
        "查看數(shù)": views,
        "作者": author,
        "時(shí)間": time
    })
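The url above carries page and pageSize query parameters, so if more than the first 25 entries are needed, one option is to loop over pages until the API stops returning records. This is only a sketch: fetch_all_pages is a hypothetical helper, and the stop condition assumes an exhausted page comes back with an empty data list.
# Hypothetical helper: page through the hot-rank endpoint via its page/pageSize parameters
def fetch_all_pages(headers, page_size=25, max_pages=10):
    base_url = "https://xxxt/xxxx/web/blog/hot-rank"   # same placeholder host as above
    records = []
    for page in range(max_pages):
        resp = requests.get(
            base_url,
            params={"page": page, "pageSize": page_size, "type": ""},
            headers=headers,
        )
        batch = resp.json().get("data") or []
        if not batch:          # assumption: an empty page means no more entries
            break
        records.extend(batch)
    return records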
3. Export to Excel
# Build a DataFrame from the extracted records
df = pd.DataFrame(articles)
# Save the DataFrame as an Excel file
df.to_excel("csdn_top.xlsx", index=False)
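Note that to_excel relies on an external engine such as openpyxl to write .xlsx files; if that is not installed, a CSV export is a simple fallback, with utf-8-sig so Excel renders the Chinese column headers correctly:
# CSV fallback; the utf-8-sig BOM lets Excel detect the encoding and display the Chinese headers
df.to_csv("csdn_top.csv", index=False, encoding="utf-8-sig")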
4. Results
Running the script produces csdn_top.xlsx containing the current hot-ranking entries.
That wraps up this installment of the Python crawler practice series (10): fetching the site's hot ranking list.