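When scraping web pages, a common question is how to return the scraped data and save it to a file. The following Python example shows how to extract data from a specific website and save the results as an Excel file. The code uses a pandas DataFrame so that the data is easy to work with afterwards.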
import time

import pandas as pd
from bs4 import BeautifulSoup as soup
from selenium import webdriver


def checkproduct(url):
    """Open one listing page and return [titles, prices, urls, images]."""
    driver = webdriver.Chrome()
    driver.get(url)
    # Scroll down so lazily loaded products render, then wait for them.
    driver.execute_script("window.scrollTo(0, 3000);")
    time.sleep(10)
    page_html = driver.page_source
    driver.quit()  # the HTML is captured, so the browser is no longer needed

    data = soup(page_html, 'html.parser')

    list_title = []
    list_url = []
    list_price = []
    list_image = []

    # Product cards: the text is the title, the nested link is the URL.
    allproduct = data.find_all('div', {'class': 'c16H9d'})
    for product in allproduct:  # not named "pd", which would shadow pandas
        list_title.append(product.text)
        list_url.append('https:' + product.a['href'])

    # Prices: strip the currency symbol (฿ on lazada.co.th) and thousands separators.
    allprice = data.find_all('span', {'class': 'c13VH6'})
    for pc in allprice:
        pc_price = pc.text.replace('฿', '').replace(',', '')
        list_price.append(float(pc_price))

    # Product images: keep the image source URL.
    allimages = data.find_all('img', {'class': 'c1ZEkM'})
    for productimages in allimages:
        list_image.append(productimages['src'])

    return [list_title, list_price, list_url, list_image]


base_url = "https://www.lazada.co.th/shop-smart-tv?pages="
n = 3  # number of pages to fetch
rows = []
for i in range(1, n + 1):
    url = base_url + f"{i}"
    print(url)
    results = checkproduct(url)
    # Each result is four lists; transpose so each product becomes one row.
    rows.append(pd.DataFrame(results).T)

df = pd.concat(rows).reset_index(drop=True)
df.columns = ['Product', 'Price', 'URL', 'Images']
df.to_excel("Lazada_Product.xlsx")  # needs an Excel engine such as openpyxl
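The price-cleaning step inside checkproduct calls float() directly on the cleaned text, which raises an error as soon as a price element contains anything unexpected. A minimal sketch of a more tolerant approach follows, using only the standard library. The helper name parse_price is hypothetical and is not part of the original listing, and the sample strings are made up for illustration.

```python
import re


def parse_price(text):
    """Hypothetical helper: return the numeric part of a price string, or None."""
    # Drop thousands separators, then take the first run of digits
    # with an optional decimal part. Currency symbols are simply skipped.
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None


# Made-up sample values, not scraped data.
samples = ["฿12,990.00", "฿1,299", "N/A"]
prices = [parse_price(s) for s in samples]
```

A helper like this could be called in place of the replace-and-float lines in the price loop. Whether to skip, log, or keep a None for unparseable prices depends on how the resulting spreadsheet will be used.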
Code walkthrough
Import libraries: BeautifulSoup parses the HTML, Selenium drives the browser, and pandas handles data processing and saving.
Define the function: checkproduct opens the page, extracts the product information, and returns it as a list of four lists (titles, prices, URLs, and image links).
Collect the data: the main loop builds each page URL and calls checkproduct to fetch its data. Each result is converted to a DataFrame and appended to a list.
Merge and save: finally, pandas concatenates all the DataFrames and saves the result as an Excel file.
Source: http://www.zghlxwxcb.cn/article/782.html
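The collect-and-merge steps can be tried without launching a browser. The sketch below feeds made-up page results, in the same [titles, prices, urls, images] shape that checkproduct returns, through the same transpose-and-concat logic. The function name combine_pages and all sample values are assumptions for illustration, not part of the original code.

```python
import pandas as pd

COLUMNS = ["Product", "Price", "URL", "Images"]


def combine_pages(pages):
    """Hypothetical helper: merge per-page result lists into one DataFrame."""
    # Each page is four parallel lists; transposing turns them into
    # one row per product with four columns.
    frames = [pd.DataFrame(page).T for page in pages]
    df = pd.concat(frames).reset_index(drop=True)
    df.columns = COLUMNS
    # Building a frame from mixed text and numbers typically leaves the
    # columns as generic objects, so convert prices back to floats.
    df["Price"] = df["Price"].astype(float)
    return df


# Made-up results for two pages with two products each.
sample_pages = [
    [["TV A", "TV B"], [9990.0, 12990.0],
     ["https://example.com/a", "https://example.com/b"],
     ["https://example.com/a.jpg", "https://example.com/b.jpg"]],
    [["TV C", "TV D"], [5490.0, 7990.0],
     ["https://example.com/c", "https://example.com/d"],
     ["https://example.com/c.jpg", "https://example.com/d.jpg"]],
]
df = combine_pages(sample_pages)
```

The resulting frame can then be written out with df.to_excel, as in the main listing. Passing index=False to to_excel leaves the numeric row index out of the spreadsheet if it is not wanted.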
With this approach you can scrape web page data effectively and use pandas for simple processing and saving, which makes the data easier to manage.