
Bilibili crawler course project (sophomore year): scraping data with the selenium module and building charts with the pyecharts module (bilibili data visualization)

Posted 2 years ago | Author: netexsy | Category: Toy Blog


Contents

I. Preparation before scraping

II. Scraping targets

III. Scraping process (the key part)

IV. Generating the charts

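I. Preparation before scraping

1. Install the selenium module and its driver

Installing the selenium module (using PyCharm as an example)

Method 1: Open PyCharm and go to File ---> Settings ---> Python Interpreter ---> choose a suitable environment (you can create a new one or use the base environment; a new one is recommended) ---> click the plus sign to open the package search page.

Type "selenium" and select version 3.141.0. Use this version or one close to it rather than the latest release: newer releases dropped some of the older commands this post relies on (for example, the find_elements_by_xpath method was removed in Selenium 4.3), so the code below will not run on them unchanged.

Method 2: Open a terminal, activate the environment you want (or stay in the base environment), and run "pip install selenium==3.141.0". This installs the same module.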

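2. Install Chrome and chromedriver (I only used Chrome; for Firefox and others, see other tutorials)

Chrome itself is easy to find and install, but pay attention to the version. A version below 114 is the safest choice, because at the time of writing chromedriver builds for versions above 114 are scarce and awkward to use. The chromedriver version must correspond to the Chrome version, otherwise the script raises an error at startup.

I used Chrome 109.

Download link: Chrome v109.0.5414.120 (3DM software site)

After installing, most people run into one problem: Chrome updates itself automatically, to version 116 or later. We have to block the automatic update by hand.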

How to stop Chrome from updating itself:

After installing Chrome, do not open it yet. The installer normally creates a desktop shortcut. Right-click the shortcut and choose "Open file location" to reach the program directory.

Then follow these steps:

Go up to the Google directory

Select the Update folder

Right-click >>> Properties

Security >>> SYSTEM >>> Edit

Tick "Deny" for every permission

Then choose >>> Advanced

The most important step:

First click "Disable inheritance"

Then delete every entry whose type is "Allow"

Finally, check the result: click the Update folder. If Windows reports that you have no permission to access it, the change has very likely worked.

Open Chrome and go to Settings >>> Help >>> About Google Chrome. If the update check reports an error, the automatic update is blocked.

Next we install chromedriver, the program that lets our crawler code drive the browser and test pages live.

Download sources:

CNPM Binaries Mirror

ChromeDriver - WebDriver for Chrome - Downloads

Download older versions of Google Chrome for Windows, Linux and Mac

Note: the chromedriver you download must correspond to the version of your Chrome browser.

My Chrome is 109.0.5414.120.

The chromedriver has to correspond to it. The version does not need to be identical; the closest available one is fine.
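
Picking the closest version can be done by eye, but as a minimal sketch the rule can also be written down. The function names and the list of available versions below are made up for illustration, and the sketch assumes that "closest" means the same major version with the nearest build and patch numbers.

```python
def version_tuple(version):
    """Turn '109.0.5414.120' into (109, 0, 5414, 120) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def pick_driver_version(chrome_version, available):
    """Return the available driver version closest to chrome_version.

    Only versions with the same major number are considered (an assumption);
    None is returned when there is no such version.
    """
    target = version_tuple(chrome_version)
    same_major = [v for v in available if version_tuple(v)[0] == target[0]]
    if not same_major:
        return None
    # Smallest component-wise distance wins, compared from left to right.
    return min(
        same_major,
        key=lambda v: tuple(abs(a - b) for a, b in zip(version_tuple(v), target)),
    )


# Hypothetical versions offered by a download mirror.
available = ["108.0.5359.71", "109.0.5414.25", "109.0.5414.74", "110.0.5481.30"]
print(pick_driver_version("109.0.5414.120", available))
```

With the sample list, the sketch selects 109.0.5414.74 for Chrome 109.0.5414.120.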

After downloading, put chromedriver in the same directory as your Python installation.

At this point things should mostly work. If they do not, the likely cause is a missing environment variable. If the Python installation directory is already on PATH there should be no problem; if it was only partly added, the system cannot find chromedriver.

Open "View advanced system settings", click "Environment Variables", edit Path under "System variables", and add the folder that contains the driver.
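
Whether the driver is visible on PATH can be checked from Python before starting selenium. This is a small sketch using only the standard library; the helper name find_driver is made up for illustration.

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of the driver if it is on PATH, otherwise None."""
    return shutil.which(name)


path = find_driver()
if path is None:
    print("chromedriver is not on PATH; add its folder to the Path variable")
else:
    print("chromedriver found at", path)
```

If the check prints a path, webdriver.Chrome() should be able to locate the driver without any extra arguments.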

Once that is done, open PyCharm and run this snippet:

from selenium import webdriver  

if __name__ == '__main__':

    url = "https://www.bilibili.com/"
    driver = webdriver.Chrome()
    driver.get(url)

If a Chrome window pops up and shows the bilibili home page, the installation is complete!

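II. Scraping targets

What data do we want to scrape, and what charts should we build from it? This should be settled first; a clear target saves a lot of effort.

When our group decided what to scrape, we came up with the following five targets:

1. Data for the top-100 videos on bilibili's popular ranking

Content to scrape: for each of the current top-100 videos, the title, uploader (UP主), views, danmaku count, likes, coins, favorites, and shares.

Analysis: compare the differences among views, danmaku count, likes, coins, favorites, and shares.

2. bilibili hot-song ranking data

Content to scrape: the ranking for each song genre and the MV ranking.

Analysis: total the play counts of each genre and find out which genre bilibili users like most.

3. Video tags in bilibili's food section

Content to scrape: video titles and the tags attached to each video.

Analysis: count how often each tag appears and find the hottest tags at the moment.

4. Comments on a single video

Content to scrape: pick a video with fresh content and scrape its comments.

Analysis: work out the sentiment each comment conveys, count the sentiment labels, and judge how well the video was received.

5. Metrics of a single video over one week

Content to scrape: the video's views, danmaku count, likes, coins, favorites, and shares over one week.

Analysis: follow how each metric changes during the week and infer how popular the video is.

With the targets set, we can get to work!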

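III. Scraping process

1. Scraping data for the top-100 videos on bilibili's popular ranking

The top-100 ranking changes in real time, so your ranking will differ from mine. The scraping process is the same; only the data differs.

[Figure: the ranking page at the time of writing]

Now for the scraping itself.

First, how I approached it:

Step 1: Collect the URLs of the 100 videos on the ranking page and write them to a file, top100_url.csv.

Step 2: Loop over the 100 URLs in top100_url.csv, open each video page, scrape the details of that video, and write them to a file, top100_details.csv.

This gets all the video information. (You could also open each video page as soon as its URL is read, which makes step 1 unnecessary. I did not think of that at the time and insisted on going one step at a time, so this post shows the slightly more laborious approach.)

1. Here is the code that collects the links of all top-100 videos.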

import csv                              # writes the CSV file
import os                               # creates the output folder
from selenium import webdriver          # webdriver drives the browser

if __name__ == '__main__':              # entry point

    url = 'https://www.bilibili.com/v/popular/rank/all'    # top-100 ranking page
    driver = webdriver.Chrome()                            # start Chrome through chromedriver
    driver.get(url)                                        # open the page

    os.makedirs("data", exist_ok=True)                     # open() does not create folders, so make sure data/ exists
    csv_file = "data/top100_url.csv"                       # output file inside the data folder

    # 'a' appends to the file; use 'w' instead to overwrite it
    with open(csv_file, 'a', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['b站实时排行榜前一百视频url', 'up主昵称'])   # header row

        i = 1
        while i < 101:                                     # loop over the 100 videos
            # XPath of the i-th video's link; find_elements returns a list
            all_datas = driver.find_elements_by_xpath(f'//*[@id="app"]/div/div[2]/div[2]/ul/li[{i}]/div/div[2]/a')
            # XPath of the i-th video's uploader name
            all_up_name = driver.find_elements_by_xpath(f'//*[@id="app"]/div/div[2]/div[2]/ul/li[{i}]/div/div[2]/div/a/span')
            href_values = [element.get_attribute("href") for element in all_datas]    # extract the URL
            up_name = all_up_name[0].text                                             # extract the uploader name from all_up_name
            writer.writerow([href_values[0], up_name])     # write the URL and uploader name
            print(f'Video {i} done')                       # show progress
            i += 1

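Note: while the script runs, the page has to keep loading (scroll it as needed). Elements in a part of the page that has not loaded cannot be found, and the script stops there. In that case you will most likely have to start over. If you understand the code, a small change lets you resume from where it stopped or scrape a range again.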
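
Resuming could look something like the following sketch. It is not part of the original project: the helper name next_rank is made up, and it assumes the CSV has exactly one header row followed by one row per video already scraped.

```python
import csv
import os


def next_rank(csv_path):
    """Return the 1-based rank to scrape next, based on rows already saved."""
    if not os.path.exists(csv_path):
        return 1
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = [row for row in csv.reader(f) if row]
    # The first row is the header, so the number of data rows is len(rows) - 1.
    return max(len(rows) - 1, 0) + 1


# Prints 1 when nothing has been saved yet.
print(next_rank("data/top100_url.csv"))
```

In the scraping loop, `i = next_rank(csv_file)` would replace `i = 1`, and the header row would be written only when that value is 1.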

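The scraped data looks roughly like this (only the first part is shown; there should be 100 rows in total). The first column is the URL and the second is the uploader's name. Only the first column is needed to visit the video pages; the second is carried over as the uploader column in the next step.

OK! Step one is done.

2. Here is the code that loops over the URLs collected above and scrapes the details of each video.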

# Imports

import csv
from selenium import webdriver
import pandas as pd


def wan_to_number(text):
    # Large counts are shown as a number followed by "万" (10,000).
    # Convert "12.3万" to "123000.0" so every value is purely numeric.
    if text.endswith('万'):
        return str(float(text[:-1]) * 10000)
    return text


if __name__ == '__main__':

    # Read the URLs saved in the previous step
    all_urls = pd.read_csv('./data/top100_url.csv')               # load the CSV with pandas
    all_video_urls = all_urls['b站实时排行榜前一百视频url']         # URL column, selected by header (a pandas Series)
    all_video_up = all_urls['up主昵称']                            # uploader names, same approach

    driver = webdriver.Chrome()                                   # start chromedriver
    csv_file = "data/top100_details.csv"                          # output file for the details of every video

    with open(csv_file, 'a', newline='', encoding='utf-8') as f:  # open the file and write row by row
        writer = csv.writer(f)
        # Columns: title, uploader, views, danmaku, likes, coins, favorites, shares
        writer.writerow(['视频标题', 'up主', '观看量', '弹幕数', '点赞数', '投币数', '收藏数', '转发数'])

        for i, url in enumerate(all_video_urls):                  # visit every URL
            driver.get(url)

            data_title = driver.find_elements_by_xpath('//*[@id="viewbox_report"]/h1')
            title = data_title[0].text                            # video title

            up = all_video_up[i]                                  # uploader

            # data_watch_dm holds the view count and the danmaku count
            data_watch_dm = driver.find_elements_by_xpath('//*[@id="viewbox_report"]/div/div/span')
            watch = wan_to_number(data_watch_dm[0].text)          # views
            dm = wan_to_number(data_watch_dm[1].text)             # danmaku count

            # data_dz_tb_sc_fx holds likes, coins, favorites and shares
            data_dz_tb_sc_fx = driver.find_elements_by_xpath('//*[@id="arc_toolbar_report"]/div[1]/div')
            video_like_info = wan_to_number(data_dz_tb_sc_fx[0].text)    # likes
            video_coin_info = wan_to_number(data_dz_tb_sc_fx[1].text)    # coins
            video_fav_info = wan_to_number(data_dz_tb_sc_fx[2].text)     # favorites
            video_share_info = wan_to_number(data_dz_tb_sc_fx[3].text)   # shares

            row = [title, up, watch, dm, video_like_info, video_coin_info,
                   video_fav_info, video_share_info]              # pack the values into a list
            writer.writerow(row)                                  # write the row
            print(f'Video {i + 1} scraped successfully')          # show progress

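At this point all the video details have been scraped. Check the resulting dataset; part of mine is shown below.

[Figure: part of the scraped top-100 dataset]

3. Below is the complete code, which can be copied and used directly. It scrapes the links and the detailed data of the top-100 videos on bilibili's popular ranking and produces two csv files: top100_url.csv, which stores the links of the 100 videos, and top100_details.csv, which stores the details of the 100 videos (1. title, 2. uploader, 3. views, 4. danmaku count, 5. likes, 6. coins, 7. favorites, 8. shares). The data is coarse: any count the page shows in units of 万 is only as precise as the rounded figure displayed. For a course project that is good enough.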

import csv
import os
from selenium import webdriver
import pandas as pd


def wan_to_number(text):
    # "12.3万" -> "123000.0" (万 = 10,000); other values are returned unchanged
    if text.endswith('万'):
        return str(float(text[:-1]) * 10000)
    return text


if __name__ == '__main__':

    # Step 1: collect the URL and uploader name of each of the 100 videos
    url = 'https://www.bilibili.com/v/popular/rank/all'
    driver = webdriver.Chrome()
    driver.get(url)

    os.makedirs("data", exist_ok=True)      # open() does not create folders
    csv_file = "data/top100_url.csv"

    with open(csv_file, 'a', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['b站实时排行榜前一百视频url', 'up主昵称'])

        i = 1
        while i < 101:
            all_datas = driver.find_elements_by_xpath(f'//*[@id="app"]/div/div[2]/div[2]/ul/li[{i}]/div/div[2]/a')
            all_up_name = driver.find_elements_by_xpath(f'//*[@id="app"]/div/div[2]/div[2]/ul/li[{i}]/div/div[2]/div/a/span')
            href_values = [element.get_attribute("href") for element in all_datas]
            up_name = all_up_name[0].text
            writer.writerow([href_values[0], up_name])
            print(f'Video {i} done')
            i += 1

    # Step 2: read the URLs saved in step 1 and scrape the details of every video
    all_urls = pd.read_csv('./data/top100_url.csv')
    all_video_urls = all_urls['b站实时排行榜前一百视频url']
    all_video_up = all_urls['up主昵称']

    driver = webdriver.Chrome()
    csv_file = "data/top100_details.csv"

    with open(csv_file, 'a', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['视频标题', 'up主', '观看量', '弹幕数', '点赞数', '投币数', '收藏数', '转发数'])

        for i, url in enumerate(all_video_urls):
            driver.get(url)

            data_title = driver.find_elements_by_xpath('//*[@id="viewbox_report"]/h1')
            title = data_title[0].text                                   # video title

            up = all_video_up[i]                                         # uploader

            data_watch_dm = driver.find_elements_by_xpath('//*[@id="viewbox_report"]/div/div/span')
            watch = wan_to_number(data_watch_dm[0].text)                 # views
            dm = wan_to_number(data_watch_dm[1].text)                    # danmaku count

            data_dz_tb_sc_fx = driver.find_elements_by_xpath('//*[@id="arc_toolbar_report"]/div[1]/div')
            video_like_info = wan_to_number(data_dz_tb_sc_fx[0].text)    # likes
            video_coin_info = wan_to_number(data_dz_tb_sc_fx[1].text)    # coins
            video_fav_info = wan_to_number(data_dz_tb_sc_fx[2].text)     # favorites
            video_share_info = wan_to_number(data_dz_tb_sc_fx[3].text)   # shares

            row = [title, up, watch, dm, video_like_info, video_coin_info,
                   video_fav_info, video_share_info]
            writer.writerow(row)
            print(f'Video {i + 1} scraped successfully')

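OK! All the data for the first target has now been scraped. Having read this far, you should have a fair idea of how the crawler works (read the comments above carefully).

If something is unclear, feel free to ask; I reply to private messages when I see them.

From here on I will just give the code.

2. Scraping the bilibili hot-song ranking data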

import csv
import os
from selenium import webdriver

if __name__ == '__main__':

    url = "https://www.bilibili.com/v/musicplus/video"
    driver = webdriver.Chrome()
    driver.get(url)

    os.makedirs("data_analysis", exist_ok=True)     # open() does not create folders
    csv_file = "data_analysis/music_hank.csv"
    i = 50
    while i < 120:
        # i // 5 runs from 10 to 23 and repeats each genre index five times
        # (the post scrapes five pages per genre)
        data_type_elements = driver.find_elements_by_xpath(f'//*[@id="main"]/div/div[2]/ul[2]/li[{i // 5}]')
        data_type = data_type_elements[0].text       # genre name
        i += 1
        print(i)
        with open(csv_file, 'a', newline='', encoding='utf-8') as f:
            writer = csv.writer(f)
            writer.writerow([data_type])

            j = 1
            while True:
                # play count of the j-th video on the current page
                data_bf_element = driver.find_elements_by_xpath(f'//*[@id="main"]/div/div[3]/div[{j}]/div/a/div[1]/div[1]/span[1]')
                if not data_bf_element:
                    break                             # no more videos on this page
                data_bf = data_bf_element[0].text
                if data_bf.endswith('万'):            # "12.3万" -> "123000.0"
                    data_bf = str(float(data_bf[:-1]) * 10000)
                writer.writerow([data_bf])
                print(j)
                j += 1

Note: do not simply click Run for this script. Use the debugger instead, because the script runs faster than the page can load.
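
Stepping through with the debugger works, but the waiting can also be done in code. The helper below is a generic sketch rather than the author's method: wait_for is a made-up name, and it simply repeats a lookup until the result is non-empty or a timeout passes. Selenium also ships its own WebDriverWait class for this purpose.

```python
import time


def wait_for(fetch, timeout=10.0, interval=0.5):
    """Call fetch() until it returns a non-empty result or the timeout passes.

    The last result is returned even if it is empty, so the caller decides
    how to handle a page that never loaded.
    """
    deadline = time.monotonic() + timeout
    result = fetch()
    while not result and time.monotonic() < deadline:
        time.sleep(interval)
        result = fetch()
    return result


# Stand-in for a page that needs three lookups before the element appears.
attempts = {"count": 0}


def fake_lookup():
    attempts["count"] += 1
    return ["element"] if attempts["count"] >= 3 else []


print(wait_for(fake_lookup, timeout=2.0, interval=0.01))
```

With selenium 3.141.0 the lookup would be something like `wait_for(lambda: driver.find_elements_by_xpath(xpath))`, where xpath is one of the expressions used in the scripts.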

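This script scrapes the play counts of every video on the first 5 pages of each genre listed under Music ----> Most popular ----> "All genres".

[Figure: the genre list in the music section]

After scraping and tidying up, the dataset looks roughly like this (every value is a play count; partial view).

[Figure: part of the music dataset]

That completes the second target.

3. Scraping video tags in bilibili's food section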

import csv
from selenium import webdriver
import pandas as pd

if __name__ == '__main__':

    # Stage 1: collect the name and link of each column in the food section
    url = 'https://www.bilibili.com/v/food'
    driver = webdriver.Chrome()
    driver.get(url)

    csv_file = "data/food_part_url.csv"
    with open(csv_file, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['栏目', '链接'])

        i = 3
        while i < 8:
            all_part_name = (driver.find_elements_by_xpath(f'//*[@id="i_cecream"]/div/main/div/div[{i}]/div/div[1]/div[1]/a/span'))[0].text
            all_part_url = driver.find_elements_by_xpath(f'//*[@id="i_cecream"]/div/main/div/div[{i}]/div/div[1]/div[2]/a')
            href_values = [element.get_attribute("href") for element in all_part_url]  # column link
            writer.writerow([all_part_name, href_values[0]])
            i += 1

    # Stage 2: open each column and collect the title and link of its first 50 videos
    df = pd.read_csv("data/food_part_url.csv")
    all_urls = df['链接']
    name = df['栏目']
    driver = webdriver.Chrome()
    csv_file = "data/food_part_video_url.csv"

    with open(csv_file, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['栏目', '视频标题', '视频链接'])

        j = 0
        for url in all_urls:
            driver.get(url)
            i = 1
            while i < 51:
                video_name = (driver.find_elements_by_xpath(f'//*[@id="i_cecream"]/div/main/div/div[3]/div[2]/div[{i}]/div[2]/div/div/h3'))[0].text
                video_element = driver.find_elements_by_xpath(f'//*[@id="i_cecream"]/div/main/div/div[3]/div[2]/div[{i}]/div[2]/div/div/h3/a')
                href_values = [element.get_attribute("href") for element in video_element]  # video link
                video_url = href_values[0]
                writer.writerow([name[j], video_name, video_url])
                i += 1
            j += 1

    # Stage 3: open each video and write one row per tag
    df = pd.read_csv("data/food_part_video_url.csv")
    all_urls = df['视频链接']
    driver = webdriver.Chrome()
    csv_file = 'data/food_video_label.csv'

    with open(csv_file, 'a', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['序号', '标签'])

        xh = 1
        for url in all_urls:
            driver.get(url)
            # the tag block's text has one tag per line
            label_str = (driver.find_elements_by_xpath('//*[@id="v_tag"]/div'))[0].text.split('\n')
            label_len = len(label_str)
            i = 1                                   # starts at 1, so the first line of the block is skipped
            while i < label_len:
                label = label_str[i]
                writer.writerow([xh, label])
                i += 1
                xh += 1

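This script scrapes the tags of the videos under each column of the food section.

It cannot simply be run either; it has to be stepped through in the debugger. If you get stuck, message me, or experiment on your own.

The scraped dataset (partial):

[Figure: part of the food tag dataset]

4. Scraping the comments of a single video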

from selenium import webdriver
import csv


if __name__ == '__main__':

    url = 'https://www.bilibili.com/video/BV1Dh4y1B7hL/?vd_source=aa7ea87c008d6da6708ad822cc3ba7e0'
    driver = webdriver.Chrome()
    driver.get(url)
    # total number of comments shown above the comment list
    count_comment = driver.find_elements_by_xpath('//*[@id="comment"]/div/div/div/div[1]/div/ul/li[1]/span[2]')
    num = int(count_comment[0].text)

    csv_file = "data/comment.csv"
    with open(csv_file, 'a', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['序号', '评论者', '评论内容'])    # number, commenter, comment text

        i = 1
        while i < num:
            # text and author of the i-th comment
            comment_data = driver.find_elements_by_xpath(f'//*[@id="comment"]/div/div/div/div[2]/div[2]/div[{i}]/div[2]/div[2]/div[3]/span/span')
            commenter_data = driver.find_elements_by_xpath(f'//*[@id="comment"]/div/div/div/div[2]/div[2]/div[{i}]/div[2]/div[2]/div[2]/div')

            comment = comment_data[0].text
            commenter = commenter_data[0].text
            xh = str(i)
            row = [xh, commenter, comment]
            writer.writerow(row)
            print(f'Comment {i} scraped successfully')
            print(commenter)
            i += 1

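This script scrapes the comments of one video (pick any you like); just replace the url.

While it runs you have to keep loading more comments by scrolling. Otherwise the script stops at the first comment that has not been loaded yet.

The scraped dataset (partial):

[Figure: part of the comment dataset]

5. Scraping the metrics of a single video over one week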

from selenium import webdriver
import csv
import datetime
import os


def wan_to_number(text):
    # "12.3万" -> "123000.0" (万 = 10,000); other values are returned unchanged
    if text.endswith('万'):
        return str(float(text[:-1]) * 10000)
    return text


if __name__ == '__main__':

    url = "https://www.bilibili.com/video/BV1vw411r7yL/?spm_id_from=333.337.search-card.all.click&vd_source=5bfdd9c5aae2db8e974ef5d8db543de8"
    driver = webdriver.Chrome()
    driver.get(url)

    os.makedirs("data_analysis", exist_ok=True)      # open() does not create folders
    csv_file = "data_analysis/jl_change.csv"
    write_header = not os.path.exists(csv_file)      # the script runs once a day, so write the header only the first time
    with open(csv_file, 'a', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(['视频标题', '观看量', '弹幕数', '点赞数', '投币数', '收藏数', '转发数', '时间'])

        all_datas_part0 = driver.find_elements_by_xpath('//*[@id="viewbox_report"]/h1')
        data_title = all_datas_part0[0].text                                # video title

        all_datas_part2 = driver.find_elements_by_xpath('//*[@id="viewbox_report"]/div/div/span')
        data_watch = wan_to_number(all_datas_part2[0].text)                 # views
        data_dm = wan_to_number(all_datas_part2[1].text)                    # danmaku count

        all_datas_part3 = driver.find_elements_by_xpath('//*[@id="arc_toolbar_report"]/div[1]/div')
        data_video_like_info = wan_to_number(all_datas_part3[0].text)       # likes
        data_video_coin_info = wan_to_number(all_datas_part3[1].text)       # coins
        data_video_fav_info = wan_to_number(all_datas_part3[2].text)        # favorites
        data_video_share_info = wan_to_number(all_datas_part3[3].text)      # shares

        data_time = datetime.datetime.now().strftime("%Y-%m-%d")            # date of this run

        # the date goes in the last column so that it matches the header
        row = [data_title, data_watch, data_dm, data_video_like_info, data_video_coin_info,
               data_video_fav_info, data_video_share_info, data_time]
        writer.writerow(row)

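This is much like the first script: it scrapes all the metrics of one specified video. Just put in its url.

Because this data is used for a chart later, the script has to be run once a day for a week. After the week there are seven rows of data.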
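
Once the seven daily readings exist, the change from one day to the next is a simple difference. The sketch below uses made-up play counts and a made-up helper name; it only illustrates the arithmetic behind "how each metric changes during the week".

```python
def daily_changes(values):
    """Return the increase between each pair of consecutive daily readings."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Hypothetical view counts recorded once a day for a week.
views = [120000, 185000, 230000, 262000, 281000, 293000, 300000]
print(daily_changes(views))
```

Seven readings give six differences. Shrinking differences, as in the sample numbers, indicate that growth is slowing down.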

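That essentially completes the scraping work.

IV. Generating the charts

With the data scraped, the last step is to turn it into charts.

The most commonly used charting modules are pyecharts and matplotlib. I used pyecharts here. The charts are not elaborate; they are just basic renderings of the data.

A recap of our targets:

1. Data for the top-100 videos on bilibili's popular ranking

Content to scrape: for each of the current top-100 videos, the title, uploader, views, danmaku count, likes, coins, favorites, and shares.

Analysis: compare the differences among views, danmaku count, likes, coins, favorites, and shares.

2. bilibili hot-song ranking data

Content to scrape: the ranking for each song genre and the MV ranking.

Analysis: total the play counts of each genre and find out which genre bilibili users like most.

3. Video tags in bilibili's food section

Content to scrape: video titles and the tags attached to each video.

Analysis: count how often each tag appears and find the hottest tags at the moment.

4. Comments on a single video

Content to scrape: pick a video with fresh content and scrape its comments.

Analysis: work out the sentiment each comment conveys, count the sentiment labels, and judge how well the video was received.

5. Metrics of a single video over one week

Content to scrape: the video's views, danmaku count, likes, coins, favorites, and shares over one week.

Analysis: follow how each metric changes during the week and infer how popular the video is.

OK! Let's start making charts!

1. Data for the top-100 videos on bilibili's popular ranking (bar chart)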

import pandas as pd
from pyecharts import options as opts
from pyecharts.charts import Bar
from pyecharts.globals import ThemeType
from pyecharts.render import make_snapshot
from snapshot_selenium import snapshot

if __name__ == '__main__':
    df = pd.read_csv("data/top100_details.csv")           # load the scraped data into df

    # Turn each column into a plain Python list
    Title = df['视频标题'].tolist()                        # video titles
    Watch = df['观看量'].tolist()                          # views
    Dm = df['弹幕数'].tolist()                             # danmaku counts
    Dz = df['点赞数'].tolist()                             # likes
    Tb = df['投币数'].tolist()                             # coins
    Sc = df['收藏数'].tolist()                             # favorites
    Zf = df['转发数'].tolist()                             # shares

    # Create the Bar chart and set its theme, width and height
    bar1 = Bar(init_opts=opts.InitOpts(theme=ThemeType.VINTAGE, width="4500px", height="1200px"))

    bar1.add_xaxis(Title)           # the x axis lists the video titles

    # Set the chart title and rotate the x-axis labels
    bar1.set_global_opts(
        title_opts=opts.TitleOpts(title="b站热门榜top100数据统计柱状图", pos_left="50%", pos_top="5%"),
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=45)))

    # Add the y-axis series (views are left out, as in the original)
    # bar1.add_yaxis('播放量', Watch)
    bar1.add_yaxis('弹幕数', Dm)
    bar1.add_yaxis('点赞数', Dz)
    bar1.add_yaxis('投币数', Tb)
    bar1.add_yaxis('收藏数', Sc)
    bar1.add_yaxis('转发数', Zf)

    # Write the html file
    bar1.render('b站热门榜top100数据统计柱状图.html')

    # Optional: save a png snapshot. This needs the snapshot_selenium package; taking a screenshot by hand works too.
    make_snapshot(snapshot, "b站热门榜top100数据统计柱状图.html", "./picture/b站热门榜top100数据统计柱状图.png")

The script takes roughly 7 to 8 seconds to run; most of that time is spent in make_snapshot().

It produces an html file and a png image.

[Figure: bar chart of the top-100 statistics]

From here on I will just give the code.

2. bilibili hot-song ranking data (funnel chart)

import pandas as pd
from pyecharts import options as opts
from pyecharts.charts import Funnel
from pyecharts.render import make_snapshot
from snapshot_selenium import snapshot
from pyecharts.globals import ThemeType

if __name__ == '__main__':

    # use encoding='utf-8' instead if the file comes straight from the scraper, which writes UTF-8
    df = pd.read_csv('data_analysis/music_hank_new.csv', encoding='gbk')
    type_sums = df.sum()                                  # total play count of each column (one column per genre)
    print(type_sums)
    df_type_sum = list(zip(type_sums.index.to_list(), type_sums.to_list()))
    sort_type_sum = sorted(df_type_sum, key=lambda x: x[1])    # sort genres by total plays
    funnel = Funnel(init_opts=opts.InitOpts(theme=ThemeType.VINTAGE))
    funnel.add("", sort_type_sum,
               gap=0.9,
               label_opts=opts.LabelOpts(formatter="{b} : {d}%"),   # genre name and its percentage
               )
    funnel.set_global_opts(
        title_opts=opts.TitleOpts(title="热歌榜各曲风音乐播放排行榜漏斗图", pos_left="center"),
        legend_opts=opts.LegendOpts(pos_left='70%', pos_bottom='40%'),  # move the legend to the right
    )

    funnel.render('热歌榜各曲风音乐播放排行榜漏斗图.html')
    make_snapshot(snapshot, "热歌榜各曲风音乐播放排行榜漏斗图.html", "./picture/热歌榜各曲风音乐播放排行榜漏斗图.png")

[Figure: funnel chart of play counts by genre]
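
The aggregation behind the funnel chart is a column sum followed by a sort. Here is the same idea without pandas, as a sketch with invented numbers; genre_totals is a made-up name, and the real script reads the columns from music_hank_new.csv.

```python
def genre_totals(columns):
    """Sum the play counts of each genre and sort the genres by total, ascending."""
    totals = [(genre, sum(values)) for genre, values in columns.items()]
    return sorted(totals, key=lambda pair: pair[1])


# Hypothetical play counts per genre.
plays_by_genre = {
    "流行": [120000, 80000, 45000],
    "摇滚": [30000, 22000],
    "古风": [90000, 60000, 10000],
}
print(genre_totals(plays_by_genre))
```

The result is a list of (genre, total) pairs, which is the shape Funnel.add expects for its data argument.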

3. Video tags in bilibili's food section (word cloud)

import pyecharts.options as opts
from pyecharts.charts import WordCloud
import pandas as pd
from pyecharts.globals import ThemeType
from pyecharts.render import make_snapshot
from snapshot_selenium import snapshot


if __name__ == '__main__':

    df = pd.read_csv("data_analysis/food_video_label.csv")
    # count how often each tag appears, most frequent first
    df_label = df.groupby('标签').size().sort_values(ascending=False)
    # (tag, count) pairs for the word cloud
    datas = list(zip(df_label.index.to_list(), df_label.to_list()))
    cloud = WordCloud(init_opts=opts.InitOpts(theme=ThemeType.VINTAGE))
    cloud.add('', datas, shape='circle')
    cloud.set_global_opts(
        title_opts=opts.TitleOpts(title="b站美食热点标签统计分析云图", pos_left="37%", pos_top="3%")
    )
    cloud.render("b站美食热点标签统计分析云图.html")
    make_snapshot(snapshot, "b站美食热点标签统计分析云图.html", "./picture/b站美食热点标签统计分析云图.png")

[Figure: word cloud of food-section tags]
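
The groupby('标签').size() step is a frequency count. As a sketch, the same count can be written with the standard library; the tags below are invented examples and tag_frequencies is a made-up name.

```python
from collections import Counter


def tag_frequencies(tags):
    """Count the tags and return (tag, count) pairs, most frequent first."""
    return Counter(tags).most_common()


# Hypothetical tags, one entry per row of the tag CSV.
tags = ["家常菜", "探店", "家常菜", "甜品", "探店", "家常菜"]
print(tag_frequencies(tags))
```

The larger a tag's count, the larger it is drawn in the word cloud.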

4. Comments on a single video (pie chart)

import pandas as pd
from pyecharts import options as opts
from pyecharts.charts import Pie
from pyecharts.globals import ThemeType
from pyecharts.render import make_snapshot
from snapshot_selenium import snapshot

if __name__ == '__main__':

    # use encoding='utf-8' instead if the file was saved as UTF-8
    df = pd.read_csv('./data_analysis/comments_finish.csv', encoding='gbk')

    # count the comments for each sentiment label, most frequent first
    df_mood = df.groupby('感情').size().sort_values(ascending=False)
    datas = list(zip(df_mood.index.to_list(), df_mood.to_list()))
    title = "有关'AI越来越“变态”了，10大AI神器闻所未闻！'的相关评论的情感分析饼状图"
    pie = Pie(init_opts=opts.InitOpts(theme=ThemeType.VINTAGE))
    pie.add("", datas)
    pie.set_global_opts(
        title_opts=opts.TitleOpts(title=title),
        legend_opts=opts.LegendOpts(pos_left="right")      # keyword positions such as "right" belong to pos_left
    )
    # label text: sentiment name, count, percentage
    pie.set_series_opts(label_opts=opts.LabelOpts(formatter="{b}: {c}: {d}%"))
    pie.render('AI_视频情感态度分析统计饼状图.html')

    make_snapshot(snapshot, "AI_视频情感态度分析统计饼状图.html", "./picture/AI_视频情感态度分析统计饼状图.png")

[Figure: pie chart of comment sentiment]
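
The pie chart script reads a 感情 (sentiment) column from comments_finish.csv, but the post does not show how that column was filled in. One simple possibility, given purely as an assumption and not as the author's method, is keyword matching. The keyword lists, the label names, and the function name below are all invented for illustration.

```python
# Hypothetical keyword lists; a real project would need a far larger lexicon.
POSITIVE = ("好", "棒", "喜欢", "厉害")
NEGATIVE = ("差", "烂", "失望", "无聊")


def label_sentiment(comment):
    """Label a comment by comparing how many positive and negative keywords it contains."""
    positive_hits = sum(comment.count(word) for word in POSITIVE)
    negative_hits = sum(comment.count(word) for word in NEGATIVE)
    if positive_hits > negative_hits:
        return "积极"      # positive
    if negative_hits > positive_hits:
        return "消极"      # negative
    return "中性"          # neutral


for text in ["这个工具太厉害了，喜欢", "内容很无聊，有点失望", "路过"]:
    print(text, "->", label_sentiment(text))
```

Keyword matching is crude (it ignores negation and sarcasm), so labels produced this way would still need a manual check before charting.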

5. Metrics of a single video over one week (line chart)

import pandas as pd
from pyecharts import options as opts
from pyecharts.charts import Line
from pyecharts.globals import ThemeType
from pyecharts.render import make_snapshot
from snapshot_selenium import snapshot

if __name__ == '__main__':
    # use encoding='utf-8' instead if the file comes straight from the scraper, which writes UTF-8
    df = pd.read_csv("data_analysis/jl_change.csv", encoding='gbk')

    # Turn each column into a plain Python list
    Watch = df['观看量'].tolist()        # views
    Dm = df['弹幕数'].tolist()           # danmaku counts
    Dz = df['点赞数'].tolist()           # likes
    Tb = df['投币数'].tolist()           # coins
    Sc = df['收藏数'].tolist()           # favorites
    Zf = df['转发数'].tolist()           # shares
    Sj = df['时间'].tolist()             # dates

    line = Line(init_opts=opts.InitOpts(theme=ThemeType.VINTAGE))
    line.add_xaxis(Sj)                  # the x axis shows the dates
    # line.add_yaxis('播放量', Watch)
    line.add_yaxis('弹幕数', Dm)
    line.add_yaxis('点赞数', Dz)
    line.add_yaxis('投币数', Tb)
    line.add_yaxis('收藏数', Sc)
    line.add_yaxis('转发数', Zf)

    line.set_global_opts(
        title_opts=opts.TitleOpts(title='星穹铁道镜流角色pv剑出无回各指数变化趋势折线图', pos_left="25%", pos_top="6%"),
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=45), name="时间"),
        yaxis_opts=opts.AxisOpts(name="参数")
    )

    line.render('星穹铁道镜流角色pv剑出无回各指数变化趋势折线图.html')
    make_snapshot(snapshot, '星穹铁道镜流角色pv剑出无回各指数变化趋势折线图.html', 'picture/星穹铁道镜流角色pv剑出无回各指数变化趋势折线图.png')

The rendered chart looks like this:

[Image: line chart of the metric trends for the Honkai: Star Rail Jingliu PV]
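One practical note on the snapshot step: all of the scripts write their PNG files into a `picture` folder. As far as I can tell, `make_snapshot` does not create a missing folder on its own, so the folder has to exist before the script runs. Below is a minimal sketch of one way to guarantee that. The helper name `ensure_parent_dir` and the file name `line_chart.png` are hypothetical, and the demo runs in a temporary directory so that nothing is written to the project folder.

```python
from pathlib import Path
import tempfile


def ensure_parent_dir(output_path):
    """Create the parent folder of output_path if it is missing; return the path as str."""
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    return str(path)


# Demo in a temporary directory (hypothetical file name)
with tempfile.TemporaryDirectory() as tmp:
    target = ensure_parent_dir(Path(tmp) / "picture" / "line_chart.png")
    created = Path(target).parent.is_dir()
    print(target, created)
```

In the chart scripts, the output argument could then be wrapped, for example `make_snapshot(snapshot, "chart.html", ensure_parent_dir("./picture/chart.png"))`, where `chart.html` and `chart.png` stand in for the real file names.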

At this point all of the data has been visualized, producing 5 easy-to-read charts, and that wraps up the whole course project!

If you need the related files or have any questions, @ me.

