Open-Source Graduation Project: A Big-Data Movie Data Analysis and Visualization System


0 Introduction

Today I'd like to walk you through a big-data graduation project:

A Big-Data-Based Movie Data Analysis and Visualization System

Project demo (video):

Graduation Project: Big-Data Movie Review Sentiment Analysis

Get the project:

https://gitee.com/assistant-a/project-sharing

1 Background

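Studying movie data from Chinese users helps reveal the patterns behind the development of China's film market, explain how it got here, and anticipate where it is heading. At present, datasets of Chinese users' movie data are scarce online. There is no independent source comparable to MovieLens or Kaggle that collects such data over the long term, so researchers must either gather it themselves or download public movie datasets from abroad, which lack local characteristics.
This project crawls movie information from Douban and builds a database from it. Visual analysis on top of that database draws out the information behind the data, examining movies across several dimensions (release date, audience distribution, genre share, and market conditions by country) and mining patterns for individual films through review word clouds and text sentiment analysis.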

2 Results

How the review sentiment score changes over time:

[Figure: review sentiment score over time]
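
A trend like the one in this figure can be produced by scoring each review and averaging the scores per month. The sketch below is only an illustration under stated assumptions: the function name `monthly_sentiment`, the `(date, text)` input shape, and the toy scorer are hypothetical and not taken from the project. Since the project's Flask code imports SnowNLP, a scorer such as `lambda text: SnowNLP(text).sentiments` (a value between 0 and 1) could presumably be plugged in instead.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple


def monthly_sentiment(reviews: Iterable[Tuple[str, str]],
                      score: Callable[[str], float]) -> Dict[str, float]:
    """Average sentiment per month from (date 'YYYY-MM-DD', text) pairs."""
    totals = defaultdict(lambda: [0.0, 0])  # month -> [sum of scores, count]
    for date, text in reviews:
        month = date[:7]  # 'YYYY-MM'
        totals[month][0] += score(text)
        totals[month][1] += 1
    # Sort by month so the result can be fed straight into a line chart
    return {m: round(s / n, 3) for m, (s, n) in sorted(totals.items())}


def toy_score(text: str) -> float:
    """Stand-in scorer (an assumption): 1.0 if the text contains 'good'."""
    return 1.0 if "good" in text else 0.0


if __name__ == "__main__":
    sample = [
        ("2020-01-03", "good film"),
        ("2020-01-20", "boring"),
        ("2020-02-01", "really good"),
    ]
    print(monthly_sentiment(sample, toy_score))
```

Passing the scorer in as a parameter keeps the aggregation independent of any particular sentiment library.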

The list of hot reviews:
[Figure: hot review list]

3 The Crawler and Its Implementation

Overview
A web crawler is a program or script that automatically fetches information from the World Wide Web according to a set of rules. The crawler visits a site; if the page is accessible, it downloads the content, and its parsing module extracts the links on the page, which become later crawl targets. The whole process runs automatically, with no user involvement. If a page cannot be accessed, the crawler moves on to the next URL according to its preset policy. A crawler can issue its requests asynchronously and return the scraped page data as it arrives. Before a run, the user can add proxies and disguise the request headers to improve the chances of retrieving pages.
Crawler flowchart:
[Figure: crawler flowchart]
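
The fetch, parse, and enqueue cycle described above can be sketched as a small breadth-first loop. This is a minimal sketch and not the project's crawler: the names `crawl`, `fetch`, and `extract_links` are hypothetical, and the fetcher is passed in as a parameter so the example runs against an in-memory "site" rather than real HTTP.

```python
from collections import deque
from typing import Callable, Iterable, List, Optional


def crawl(seeds: Iterable[str],
          fetch: Callable[[str], Optional[str]],
          extract_links: Callable[[str], List[str]],
          max_pages: int = 100) -> List[str]:
    """Breadth-first crawl; returns the URLs fetched successfully, in order."""
    frontier = deque(seeds)   # URLs waiting to be visited
    seen = set(frontier)      # URLs already queued, to avoid revisiting
    fetched = []
    while frontier and len(fetched) < max_pages:
        url = frontier.popleft()
        page = fetch(url)
        if page is None:
            continue  # inaccessible: move on to the next URL
        fetched.append(url)
        for link in extract_links(page):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return fetched


if __name__ == "__main__":
    # A tiny in-memory "site": each page body is a space-separated link list.
    site = {"a": "b c", "b": "c d", "c": "", "d": "a"}
    print(crawl(["a"], site.get, str.split))
```

With a real site, `fetch` would wrap an HTTP request (with headers, proxies, and a delay between requests) and `extract_links` would parse the HTML.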
Selected code:

import re
import json
import time

import requests
from openpyxl import Workbook
from requests import RequestException

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36"
}


def get_detail_page(url, retries=3):
    """Download a page and return its text, or None if it cannot be fetched."""
    # A bounded loop replaces the original unbounded recursive retry.
    for _ in range(retries):
        try:
            response = requests.get(url=url, headers=HEADERS, timeout=10)
            response.encoding = 'utf-8'
            if response.status_code == 200:
                return response.text
            return None
        except RequestException:
            print('Failed to fetch page, retrying')
            time.sleep(3)
    return None


def parse_index_page(url):
    """Return the detail-page URLs listed on one page of search results."""
    html = get_detail_page(url)
    if not html:
        return []
    # The original stripped a 12-character wrapper by hand (html[12:-1]),
    # which corresponds to a response of the form {"subjects": [...]}.
    data = json.loads(html).get('subjects', [])
    return [item['url'] for item in data]


def first_match(pattern, html, default=''):
    """Return the first regex capture found in html, or default if none."""
    found = re.findall(pattern, html)
    return found[0] if found else default


def parse_detail_page(url):
    """Extract one movie's fields, in spreadsheet column order."""
    html = get_detail_page(url)
    if not html:
        return None
    return [
        # Title
        first_match(r'<span property="v:itemreviewed">(.*?)</span>', html),
        # Rating
        first_match(r'rating_num" property="v:average">(.*?)</strong>', html),
        # Director (first one listed)
        first_match(r'rel="v:directedBy">(.*?)</a>', html),
        # Lead actor (first one listed)
        first_match(r'rel="v:starring">(.*?)</a>', html),
        # Year
        first_match(r'<span class="year">\((.*?)\)</span>', html),
        # Genre (first one listed)
        first_match(r'property="v:genre">(.*?)</span>', html).split(' /')[0],
        # Runtime; as in the original, fall back to '1' when it is missing
        first_match(r'property="v:runtime" content="(.*?)"', html, default='1'),
        # Language (first one listed); the label is matched as it appears on the page
        first_match(r'pl">语言:</span>(.*?)<br/>', html).split(' /')[0],
        # Number of ratings
        first_match(r'property="v:votes">(.*?)</span>', html),
        # Country/region of production (first one listed)
        first_match(r' class="pl">制片国家/地区:</span>(.*?)<br/>', html).split(' /')[0],
    ]


BASE_URL = 'https://movie.douban.com/j/search_subjects?type=movie&tag=%E5%86%B7%E9%97%A8%E4%BD%B3%E7%89%87&sort=rank&page_limit=20&page_start='

# Create the workbook and write the header row
wb = Workbook()
ws = wb.active
ws.title = "New"
ws.append(['name', 'score', 'director', 'actor', 'year',
           'type', 'time', 'language', 'comment', 'area'])
wb.save('豆瓣電影.xlsx')

for i in range(20, 50):
    print(i)
    urls = parse_index_page(BASE_URL + str(i * 20))
    print(urls)
    for url in urls:
        time.sleep(0.5)  # pause between requests
        row = parse_detail_page(url)
        if row is None:
            continue  # skip pages that could not be fetched
        print(row)
        ws.append(row)
        # Save after every row so progress survives an interruption
        wb.save('豆瓣電影.xlsx')

4 The Flask Framework

Overview
Flask is a lightweight web application framework built on Werkzeug and Jinja2. Compared with heavier frameworks it is flexible, small, and easy to pick up, and it fits well with MVC-style development. Flask is also highly customizable: developers add only the features they actually need, so an application can gain rich functionality through extensions while the core stays simple. Its large ecosystem of extensions lets you tailor a site to your own requirements.

Flask project structure:
[Figure: Flask project structure]
Selected code:

import sqlite3

from flask import Flask, render_template, jsonify
import requests
from bs4 import BeautifulSoup
from snownlp import SnowNLP
import jieba
import numpy as np

app = Flask(__name__)
app.config.from_object('config')

# Chinese stop words, one per line
with open('./stopwords.txt', encoding='utf8') as f:
    STOPWORDS = {line.strip() for line in f}

headers = {
    'accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    'accept-language': "en-US,en;q=0.9,zh-CN;q=0.8,zh-TW;q=0.7,zh;q=0.6",
    'cookie': 'll="108296"; bid=ieDyF9S_Pvo; __utma=30149280.1219785301.1576592769.1576592769.1576592769.1; __utmc=30149280; __utmz=30149280.1576592769.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); _vwo_uuid_v2=DF618B52A6E9245858190AA370A98D7E4|0b4d39fcf413bf2c3e364ddad81e6a76; ct=y; dbcl2="40219042:K/CjqllYI3Y"; ck=FsDX; push_noty_num=0; push_doumail_num=0; douban-fav-remind=1; ap_v=0,6.0',
    'host': "search.douban.com",
    'referer': "https://movie.douban.com/",
    'sec-fetch-mode': "navigate",
    'sec-fetch-site': "same-site",
    'sec-fetch-user': "?1",
    'upgrade-insecure-requests': "1",
    'user-agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36 Edg/79.0.309.56"
}

login_name = None


# --------------------- html render ---------------------
@app.route('/')
def index():
    return render_template('index.html')


@app.route('/search')
def search():
    return render_template('search.html')


@app.route('/search/<movie_name>')
def search2(movie_name):
    return render_template('search.html')


@app.route('/hot_movie')
def hot_movie():
    return render_template('hot_movie.html')


@app.route('/movie_category')
def movie_category():
    return render_template('movie_category.html')


# ------------------ ajax restful api -------------------
@app.route('/check_login')
def check_login():
    """Check whether a user is logged in."""
    return jsonify({'username': login_name, 'login': login_name is not None})


@app.route('/register/<name>/<pasw>')
def register(name, pasw):
    # Note: credentials travel in the URL path and are stored in plain text,
    # which is acceptable only for a local demo.
    conn = sqlite3.connect('user_info.db')
    cursor = conn.cursor()

    # Create the user table if it does not exist yet
    cursor.execute("CREATE TABLE IF NOT EXISTS user(name CHAR(256), pasw CHAR(256))")

    cursor.execute("INSERT INTO user (name, pasw) VALUES (?,?);", (name, pasw))
    conn.commit()
    conn.close()
    return jsonify({'info': 'User registered successfully!', 'status': 'ok'})


@app.route('/login/<name>/<pasw>')
def login(name, pasw):
    global login_name
    conn = sqlite3.connect('user_info.db')
    cursor = conn.cursor()

    # Create the user table if it does not exist yet
    cursor.execute("CREATE TABLE IF NOT EXISTS user(name CHAR(256), pasw CHAR(256))")
    conn.commit()

    # Parameterized query: user input is never formatted into the SQL string
    cursor.execute("SELECT * FROM user WHERE name=? AND pasw=?", (name, pasw))
    results = cursor.fetchall()
    conn.close()

    if len(results) > 0:
        # Record the login only after the credentials have been verified
        login_name = name
        return jsonify({'info': name + ' logged in successfully!', 'status': 'ok'})
    else:
        return jsonify({'info': 'User does not exist!', 'status': 'error'})
5 Ajax

Ajax is a browser-side technique that is independent of the web server software.

Ajax uses JavaScript to send requests to the server and handle the responses without blocking the user; its core object is XMLHttpRequest. Through this object, JavaScript can exchange data with the web server without reloading the page, so part of the page can be refreshed without a full reload.

The front end serializes the required parameters into a JSON string and sends them to the server in a GET or POST request. The back end receives the data, uses it as query conditions, and returns the result set to the front end as a JSON string. The front end then checks the returned data and updates the page accordingly.

$.ajax({
    url: 'http://127.0.0.1:5000/updatePass',
    type: "POST",
    data: JSON.stringify(data.field),
    contentType: "application/json; charset=utf-8",
    dataType: "json",
    success: function(res) {
        if (res.code == 200) {
            layer.msg(res.msg, {icon: 1});
        } else {
            layer.msg(res.msg, {icon: 2});
        }
    }
})
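
The server side of this exchange is not shown in the article. The sketch below is a guess at its general shape, written as a framework-independent function: it parses the JSON body and replies with the `code` and `msg` fields that the JavaScript above reads. The function name `handle_update_pass`, the field names `username`, `old_password`, and `new_password`, and the in-memory `users` dict are all hypothetical; the article does not show what `data.field` contains.

```python
import json
from typing import Dict


def handle_update_pass(raw_body: str, users: Dict[str, str]) -> str:
    """Parse a JSON request body and return a JSON reply with code and msg."""

    def reply(code: int, msg: str) -> str:
        return json.dumps({"code": code, "msg": msg})

    try:
        fields = json.loads(raw_body)
    except json.JSONDecodeError:
        return reply(400, "Request body is not valid JSON")
    if not isinstance(fields, dict):
        return reply(400, "Expected a JSON object")

    name = fields.get("username")
    if (not isinstance(name, str) or name not in users
            or users[name] != fields.get("old_password")):
        return reply(401, "Wrong username or password")

    new_password = fields.get("new_password")
    if not new_password:
        return reply(400, "New password must not be empty")

    users[name] = new_password  # demo only: a real app would store a hash
    return reply(200, "Password updated")


if __name__ == "__main__":
    users = {"alice": "old"}
    body = json.dumps({"username": "alice", "old_password": "old",
                       "new_password": "new"})
    print(handle_update_pass(body, users))
```

In a Flask app, a route registered for POST requests would pass the request body to a function like this and return its result with a JSON content type.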

6 ECharts

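ECharts (Enterprise Charts) is an open-source data visualization library originally developed by Baidu, built on the lightweight Canvas library ZRender. It is compatible with nearly all mainstream browsers, so it works on both desktop and mobile clients. ECharts helps developers turn their data into customized charts, and it supports line (area) charts, bar (column) charts, scatter (bubble) charts, candlestick (K-line) charts, pie (doughnut) charts, and more. It runs in any web project once the JS library is included in the page.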
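
Because an ECharts chart is configured with a plain JSON-like option object, the back end can build that object and send it to the page. The sketch below shows one possible way to turn a list of genres into a pie-chart option; the function name `genre_pie_option` and the sample data are hypothetical and not taken from the project.

```python
import json
from collections import Counter
from typing import Iterable


def genre_pie_option(genres: Iterable[str], title: str = "Movies by genre") -> dict:
    """Build an ECharts pie-chart option from a list of genre labels."""
    counts = Counter(genres)
    return {
        "title": {"text": title},
        "tooltip": {"trigger": "item"},
        "series": [{
            "type": "pie",
            "radius": "60%",
            # One slice per genre, largest first
            "data": [{"name": g, "value": n} for g, n in counts.most_common()],
        }],
    }


if __name__ == "__main__":
    option = genre_pie_option(["Drama", "Comedy", "Drama", "Action"])
    # The JSON string is what an Ajax endpoint would return to the page
    print(json.dumps(option, ensure_ascii=False, indent=2))
```

On the page, the parsed object would be passed to `echarts.init(container).setOption(option)`.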

7 Wrapping Up

Project sharing:

https://gitee.com/assistant-a/project-sharing
