
Multithreaded scraping of Lianjia housing listings with Python: saving the results to a table for data visualization and analysis

2 years ago · Author: 輕松學Python · Category: Toy blog


Use Python to scrape second-hand housing listings, save them to a CSV table, and analyze the data.

Software environment

Python 3.8

PyCharm

Code walkthrough

Modules

# HTTP request module --> third-party, install with: pip install requests
import requests
# HTML parsing module --> third-party, install with: pip install parsel
import parsel
# csv module (standard library)
import csv

Create the output file

f = open('data.csv', mode='w', encoding='utf-8', newline='')
csv_writer = csv.DictWriter(f, fieldnames=[
    '標(biāo)題',
    '小區(qū)',
    '區(qū)域',
    '售價(jià)',
    '單價(jià)',
    '戶型',
    '面積',
    '朝向',
    '裝修',
    '樓層',
    '年份',
    '建筑類型',
    '詳情頁',
])
csv_writer.writeheader()
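
As a minimal sketch of how csv.DictWriter behaves, the snippet below writes to an in-memory buffer instead of data.csv. The shortened column list, the helper name rows_to_csv, and the sample row are hypothetical stand-ins, not part of the original script.

```python
import csv
import io

# Hypothetical, shortened column list (the article's file uses 13 columns).
FIELDNAMES = ['title', 'community', 'total_price']


def rows_to_csv(rows):
    """Serialize a list of dicts to CSV text, header row first."""
    buffer = io.StringIO(newline='')
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    for row in rows:
        # Every key must appear in FIELDNAMES; by default DictWriter
        # raises ValueError when a row contains an unknown key.
        writer.writerow(row)
    return buffer.getvalue()


sample = [{'title': 'Sunny flat', 'community': 'Garden Court', 'total_price': '120'}]
csv_text = rows_to_csv(sample)
print(csv_text)
```

Because of that default, the dictionary keys written later have to match the fieldnames list exactly, character for character.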

Send a request: have the script mimic a browser when it requests the URL.

Browser-like request headers

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36'
}

Target URL

url = 'https://cs.lianjia.com/ershoufang/'


Send the request

response = requests.get(url=url, headers=headers)
# <Response [200]> is the response object; status code 200 means the request succeeded
print(response)
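
The call above sets no timeout and only prints the response object. As a hedged sketch of the same step with explicit error handling, here is a standard-library version: urllib is swapped in for requests so the snippet has no third-party dependency. The helper names build_request and fetch_html, the shortened User-Agent string, and the 10-second timeout are assumptions.

```python
import urllib.request

# Assumed values: shortened User-Agent string for illustration only.
HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}


def build_request(url):
    """Create a GET request object that carries a browser-like User-Agent."""
    return urllib.request.Request(url, headers=HEADERS, method='GET')


def fetch_html(url, timeout=10):
    """Return the page text; HTTP errors and timeouts raise exceptions."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
        charset = resp.headers.get_content_charset() or 'utf-8'
        return resp.read().decode(charset, errors='replace')


# Building the request does not touch the network; fetch_html() would.
request = build_request('https://cs.lianjia.com/ershoufang/')
print(request.get_method(), request.full_url, request.get_header('User-agent'))
```

With requests itself, the comparable safeguards would be the timeout argument of requests.get and response.raise_for_status().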

Get the data: read the page source, i.e. the response data returned by the server.

Parse the data and extract the content we want.

Parsing methods:

re: extracts data by matching directly on the string
css: extracts data by tag attributes
xpath: extracts data by tag nodes
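
To make the difference between these approaches concrete, here is a small sketch that pulls the same titles out of a hypothetical, well-formed fragment with re and with the limited XPath subset in the standard library's ElementTree. The fragment and its URLs are invented for illustration. Real pages are rarely well-formed XML, which is one reason a lenient parser such as parsel is used in this article.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment that imitates the listing markup.
SNIPPET = (
    '<ul class="sellListContent">'
    '<li class="clear"><div class="title"><a href="/house/1.html">Flat A</a></div></li>'
    '<li class="clear"><div class="title"><a href="/house/2.html">Flat B</a></div></li>'
    '</ul>'
)

# re: pattern-match directly on the raw string.
titles_by_regex = re.findall(r'<a href="[^"]*">([^<]+)</a>', SNIPPET)

# XPath: walk the tag tree and filter nodes by tag name and attribute.
root = ET.fromstring(SNIPPET)
anchors = root.findall(".//div[@class='title']/a")
titles_by_xpath = [a.text for a in anchors]
links_by_xpath = [a.get('href') for a in anchors]

print(titles_by_regex, titles_by_xpath, links_by_xpath)
```

The CSS form of the same query is the `.title a::text` selector that appears in the extraction loop of this article.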

This article uses css, which extracts data by tag attributes.

Convert the fetched HTML string into a parseable object

selector = parsel.Selector(response.text)

Select all the li tags that hold listing information

lis = selector.css('.sellListContent li.clear')

Iterate over them with a for loop

for li in lis:
    # Extract the details of each listing: title / price / location / layout ...
    # '.title a' selects the <a> tag inside the element whose class is "title"
    title = li.css('.title a::text').get()  # title
    info_list = li.css('.positionInfo a::text').getall()
    area = info_list[0]  # community name
    area_1 = info_list[1]  # district
    totalPrice = li.css('.totalPrice span::text').get()  # total price
    unitPrice = li.css('.unitPrice span::text').get().replace('元/平', '').replace(',', '')  # unit price
    houseInfo = li.css('.houseInfo::text').get().split(' | ')  # pipe-separated details
    houseType = houseInfo[0]  # layout
    houseArea = houseInfo[1].replace('平米', '')  # floor area
    houseFace = houseInfo[2]  # orientation
    fitment = houseInfo[3]  # decoration
    fool = houseInfo[4]  # floor

    # The construction year is optional; it is read only when there are 7 fields
    if len(houseInfo) == 7 and '年' in houseInfo[5]:
        year = houseInfo[5].replace('年建', '')
    else:
        year = ''
    house = houseInfo[-1]  # building type
    href = li.css('.title a::attr(href)').get()  # detail page URL
    dit = {
        '標(biāo)題': title,
        '小區(qū)': area,
        '區(qū)域': area_1,
        '售價(jià)': totalPrice,
        '單價(jià)': unitPrice,
        '戶型': houseType,
        '面積': houseArea,
        '朝向': houseFace,
        '裝修': fitment,
        '樓層': fool,
        '年份': year,
        '建筑類型': house,
        '詳情頁': href,
    }
    csv_writer.writerow(dit)
    print(dit)
    # print(title, area, area_1, totalPrice, unitPrice, houseType, houseArea, houseFace, fitment, fool, year, house, href)
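
The loop above indexes straight into the split houseInfo string, so a listing with fewer fields than expected raises IndexError. As a sketch only, the parsing could be isolated in a helper that validates the field count first. The function name parse_house_info, the English keys, and the sample strings are hypothetical; the field order is assumed to match the one used in the loop.

```python
import re


def parse_house_info(text):
    """Split a ' | '-separated houseInfo string into named fields.

    Assumes the field order used in the article; the year is optional.
    """
    parts = [p.strip() for p in text.split('|')]
    if len(parts) < 6:
        raise ValueError(f'unexpected houseInfo format: {text!r}')
    year = ''
    # Any field between the floor and the building type may hold the year.
    for part in parts[5:-1]:
        match = re.fullmatch(r'(\d{4})年建', part)
        if match:
            year = match.group(1)
    return {
        'layout': parts[0],
        'area_m2': float(parts[1].replace('平米', '')),
        'orientation': parts[2],
        'decoration': parts[3],
        'floor': parts[4],
        'year': year,
        'building_type': parts[-1],
    }


with_year = parse_house_info('3室2廳 | 120.5平米 | 南 | 精裝 | 中樓層(共18層) | 2015年建 | 板樓')
without_year = parse_house_info('2室1廳 | 89平米 | 南 北 | 簡裝 | 低樓層(共6層) | 板樓')
print(with_year)
print(without_year)
```

Keeping the parsing in one function also makes it possible to test against saved sample strings without sending any requests.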

Multithreading

Import modules

import requests
import parsel
import re
import csv
# Thread pool module
import concurrent.futures
import time

Request function

def get_response(html_url):
    """
    Send a GET request with a browser-like User-Agent.
    :param html_url: URL of the page to request
    :return: the response object
    """
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36'
    }
    response = requests.get(url=html_url, headers=headers)
    return response

Data extraction function

def get_content(html_url):
    """
    Fetch one listing page and extract every listing on it.
    :param html_url: URL of the listing page
    :return: list of dicts, one per listing
    """
    response = get_response(html_url)
    # `link` is the global variable assigned under `if __name__ == '__main__':`
    html_data = get_response(link).text
    selector = parsel.Selector(response.text)
    select = parsel.Selector(html_data)
    lis = selector.css('.sellListContent li')
    content_list = []
    for li in lis:

        title = li.css('.title a::text').get()  # title
        area = '-'.join(li.css('.positionInfo a::text').getall())  # community and district
        Price = li.css('.totalPrice span::text').get()  # total price
        Price_1 = li.css('.unitPrice span::text').get().replace('元/平', '')  # unit price
        houseInfo = li.css('.houseInfo::text').get()  # pipe-separated details
        HouseType = houseInfo.split(' | ')[0]  # layout
        HouseArea = houseInfo.split(' | ')[1].replace('平米', '')  # floor area
        direction = houseInfo.split(' | ')[2].replace(' ', '')  # orientation
        renovation = houseInfo.split(' | ')[3]  # decoration
        floor_info = houseInfo.split(' | ')[4]
        floor = floor_info[:3]  # floor level
        # Raw string so that \d is not treated as a string escape
        floor_num = re.findall(r'(\d+)層', floor_info)[0]  # number of floors
        BuildingType = houseInfo.split(' | ')[-1]  # building type
        string = select.css('.comments div:nth-child(7) .comment_text::text').get()
        href = li.css('.title a::attr(href)').get()  # detail page URL
        # With only 6 fields the construction year is missing
        if len(houseInfo.split(' | ')) == 6:
            date = 'None'
        else:
            date = houseInfo.split(' | ')[5].replace('年建', '')  # construction year
        print(string)
        dit = {
            '標(biāo)題': title,
            '內(nèi)容': string,
            '小區(qū)': area,
            '總價(jià)': Price,
            '單價(jià)': Price_1,
            '戶型': HouseType,
            '面積': HouseArea,
            '朝向': direction,
            '裝修': renovation,
            '樓層': floor,
            '層數(shù)': floor_num,
            '建筑日期': date,
            '建筑類型': BuildingType,
            '詳情頁': href,
        }
        content_list.append(dit)
    return content_list

Main function

def main(page):
    """
    Scrape one results page and write its rows to the CSV file.
    :param page: page number
    :return: None
    """
    print(f'=============== Scraping page {page} ===============')
    # The host name is missing from this URL in the source
    url = f'https:///ershoufang/yuelu/p{page}/'
    content_list = get_content(html_url=url)
    for content in content_list:
        csv_writer.writerow(content)


if __name__ == '__main__':
    time_1 = time.time()
    link = 'http://******/article/149'
    # Create the output file
    f = open('data多線程.csv', mode='a', encoding='utf-8', newline='')
    csv_writer = csv.DictWriter(f, fieldnames=[
        '標(biāo)題',
        '內(nèi)容',
        '小區(qū)',
        '總價(jià)',
        '單價(jià)',
        '戶型',
        '面積',
        '朝向',
        '裝修',
        '樓層',
        '層數(shù)',
        '建筑日期',
        '建筑類型',
        '詳情頁',
    ])
    csv_writer.writeheader()

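    # Thread pool executor; max_workers is the maximum number of threads
    exe = concurrent.futures.ThreadPoolExecutor(max_workers=10)
    for page in range(1, 11):
        exe.submit(main, page)
    # shutdown() waits for all submitted pages to finish
    exe.shutdown()
    # Close the file so that every buffered row is written to disk
    f.close()
    time_2 = time.time()
    use_time = int(time_2 - time_1)
    # Total elapsed time: 9
    print('Total elapsed time:', use_time)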
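
Two things are easy to miss in the thread pool code: an exception raised inside a submitted task stays hidden unless the future's result is checked, and all threads share one writer. The sketch below shows one possible way to handle both, using a lock around the shared sink and as_completed to surface failures. The functions fake_scrape and scrape_and_store, the simulated failure on page 3, and the in-memory list standing in for the CSV writer are all hypothetical.

```python
import concurrent.futures
import threading

write_lock = threading.Lock()
collected_rows = []  # stands in for the shared csv writer


def fake_scrape(page):
    """Hypothetical stand-in for get_content(): returns two rows per page."""
    if page == 3:
        raise RuntimeError('simulated network failure')
    return [{'page': page, 'item': i} for i in range(2)]


def scrape_and_store(page):
    rows = fake_scrape(page)
    # Only one thread at a time may touch the shared sink.
    with write_lock:
        collected_rows.extend(rows)
    return len(rows)


failed_pages = []
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    futures = {pool.submit(scrape_and_store, page): page for page in range(1, 6)}
    for future in concurrent.futures.as_completed(futures):
        try:
            future.result()  # re-raises any exception from the worker thread
        except RuntimeError:
            failed_pages.append(futures[future])

print(len(collected_rows), failed_pages)
```

Pages recorded in failed_pages could then be retried or logged instead of disappearing silently.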

Article source: http://www.zghlxwxcb.cn/news/detail-445477.html
