Preface:
This article is a short walkthrough of scraping web-page data with a crawler, using a cat-trading listing site as the example and extracting the listing data from its pages.
1: Environment Setup
Python version: 3.7.3
IDE: PyCharm
Required libraries: requests, parsel
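If requests and parsel are not already present in the interpreter PyCharm is using, they can be installed with pip (for both libraries the package name matches the import name):

```shell
pip install requests parsel
```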
2: The Target Pages
We need to extract the following fields (kept in Chinese because they are used verbatim as keys in the code below):
'地區(qū)' (area), '店名' (shop name), '標題' (title), '價格' (price), '瀏覽次數(shù)' (view count), '賣家承諾' (seller promise), '在售只數(shù)' (number for sale),
'年齡' (age), '品種' (breed), '預防' (vaccination), '聯(lián)系人' (contact person), '聯(lián)系方式' (contact info), '異地運費' (shipping fee), '是否純種' (purebred or not),
'貓咪性別' (cat sex), '驅(qū)蟲情況' (deworming status), '能否視頻' (video available), '詳情頁' (detail-page URL)
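These field names map one-to-one onto the keys of the dict built for each listing, so they can serve directly as a CSV header via `csv.DictWriter`. A minimal sketch of that pattern (abbreviated to three fields, with made-up row values, writing to an in-memory buffer for illustration):

```python
import csv
import io

# Abbreviated header; the real script uses all 18 field names
fieldnames = ['地區(qū)', '店名', '標題']

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fieldnames)
writer.writeheader()                                    # emits: 地區(qū),店名,標題
writer.writerow({'地區(qū)': '北京', '店名': 'demo', '標題': 'test'})
print(buf.getvalue())
```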
3: Implementation
# _*_ coding : utf-8 _*_
# @Time : 2023/9/3 23:03
# @Author : HYT
# @File : 貓
# @Project : 爬蟲教程
import requests
import parsel
import csv

url = 'http://www.maomijiaoyi.com/index.php?/list_0_78_0_0_0_0.html'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36'
}
response = requests.get(url=url, headers=headers)
selector = parsel.Selector(response.text)

# Detail-page links and listing areas from the index page
href = selector.css('div.content:nth-child(1) a::attr(href)').getall()
areas = selector.css('div.content:nth-child(1) a .area span.color_333::text').getall()
areas = [i.strip() for i in areas]
zip_data = zip(href, areas)

# Open the CSV output once, so the imported csv module is actually used
# (the file name is arbitrary; utf-8-sig keeps Excel happy with Chinese text)
f = open('cats.csv', mode='w', encoding='utf-8-sig', newline='')
fieldnames = ['地區(qū)', '店名', '標題', '價格', '瀏覽次數(shù)', '賣家承諾', '在售只數(shù)',
              '年齡', '品種', '預防', '聯(lián)系人', '聯(lián)系方式', '異地運費', '是否純種',
              '貓咪性別', '驅(qū)蟲情況', '能否視頻', '詳情頁']
csv_writer = csv.DictWriter(f, fieldnames=fieldnames)
csv_writer.writeheader()

for index in zip_data:
    # e.g. http://www.maomijiaoyi.com/index.php?/chanpinxiangqing_546549.html
    index_url = 'http://www.maomijiaoyi.com' + index[0]
    response_1 = requests.get(url=index_url, headers=headers)
    selector_1 = parsel.Selector(response_1.text)
    area = index[1]  # area
    shop = selector_1.css('.dinming::text').get().strip()  # shop name
    title = selector_1.css('.detail_text .title::text').get().strip()  # title
    price = selector_1.css('span.red.size_24::text').get()  # price
    views = selector_1.css('.info1 span:nth-child(4)::text').get()  # view count
    promise = selector_1.css('.info1 div:nth-child(2) span::text').get().replace('賣家承諾: ', '')  # seller promise
    sale = selector_1.css('.info2 div:nth-child(1) div.red::text').get()  # number for sale
    age = selector_1.css('.info2 div:nth-child(2) div.red::text').get()  # age
    breed = selector_1.css('.info2 div:nth-child(3) div.red::text').get()  # breed
    safe = selector_1.css('.info2 div:nth-child(4) div.red::text').get()  # vaccination
    people = selector_1.css('div.detail_text .user_info div:nth-child(1) .c333::text').get()  # contact person
    phone = selector_1.css('div.detail_text .user_info div:nth-child(2) .c333::text').get()  # contact info
    fare = selector_1.css('div.detail_text .user_info div:nth-child(3) .c333::text').get().strip()  # shipping fee
    purebred = selector_1.css(
        '.xinxi_neirong div:nth-child(1) .item_neirong div:nth-child(1) .c333::text').get().strip()  # purebred or not
    sex = selector_1.css(
        '.xinxi_neirong div:nth-child(1) .item_neirong div:nth-child(4) .c333::text').get().strip()  # cat sex
    worming = selector_1.css(
        '.xinxi_neirong div:nth-child(2) .item_neirong div:nth-child(2) .c333::text').get().strip()  # deworming status
    video = selector_1.css(
        '.xinxi_neirong div:nth-child(2) .item_neirong div:nth-child(4) .c333::text').get().strip()  # video available
    dit = {
        '地區(qū)': area,
        '店名': shop,
        '標題': title,
        '價格': price,
        '瀏覽次數(shù)': views,
        '賣家承諾': promise,
        '在售只數(shù)': sale,
        '年齡': age,
        '品種': breed,
        '預防': safe,
        '聯(lián)系人': people,
        '聯(lián)系方式': phone,
        '異地運費': fare,
        '是否純種': purebred,
        '貓咪性別': sex,
        '驅(qū)蟲情況': worming,
        '能否視頻': video,
        '詳情頁': index_url,
    }
    csv_writer.writerow(dit)  # persist the record
    print(area, shop, title, price, views, promise, sale, age, breed,
          safe, people, phone, fare, purebred, sex, worming, video, index_url, sep=' | ')

f.close()
4: Results
When the script runs, each listing's fields are printed to the console on one line, separated by ' | ', in the order given in section 2.