What this tutorial covers: using Python to scrape stock quote data from East Money (東方財富) on a schedule, on non-holiday days only, and store it in a database.
The East Money quote center is at:
http://quote.eastmoney.com/center/gridlist.html#hs_a_board
Clicking "next page" on this site shows that the page content changes while the URL stays the same, which means the site uses Ajax to pull data from the server dynamically. The advantage of this approach is that part of the data can be updated without reloading the whole page, which reduces network load and speeds up page loading.
Inspecting the network requests with F12 (the browser developer tools) makes it easy to see that the data on the page is requested from an address like this:
http://38.push2.eastmoney.com/api/qt/clist/get?cb=jQuery112409036039385296142_1658838397275&pn=3&pz=20&po=1&np=1&ut=bd1d9ddb04089700cf9c27f6f7426281&fltt=2&invt=2&wbp2u=|0|0|0|web&fid=f3&fs=m:0+t:6,m:0+t:80,m:1+t:2,m:1+t:23,m:0+t:81+s:2048&fields=f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f12,f13,f14,f15,f16,f17,f18,f20,f21,f23,f24,f25,f22,f11,f62,f128,f136,f115,f152&_=1658838404848
Clicking through several pages and watching how this address changes shows that the pn parameter is the page number, so changing the number after &pn= fetches the data for a different page:
import requests
import json
import os
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36"
}
json_url = "http://48.push2.eastmoney.com/api/qt/clist/get?cb=jQuery112402508937289440778_1658838703304&pn=1&pz=20&po=1&np=1&ut=bd1d9ddb04089700cf9c27f6f7426281&fltt=2&invt=2&wbp2u=|0|0|0|web&fid=f3&fs=m:0+t:6,m:0+t:80,m:1+t:2,m:1+t:23,m:0+t:81+s:2048&fields=f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f12,f13,f14,f15,f16,f17,f18,f20,f21,f23,f24,f25,f22,f11,f62,f128,f136,f115,f152&_=1658838703305"
res = requests.get(json_url, headers=headers)
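The page switch described above can also be done by rewriting the query string instead of editing the URL by hand. The following is a minimal sketch using only the standard library. The helper name `with_page` and the shortened example URL are illustrative assumptions, not part of the original code, and it is assumed that the server accepts the re-encoded query unchanged.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url, page):
    """Return `url` with its `pn` (page number) query parameter set to `page`."""
    parts = urlsplit(url)
    pairs = [
        (key, str(page) if key == "pn" else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    # Keep ':', ',' and '|' unescaped so the rebuilt query resembles the original.
    return urlunsplit(parts._replace(query=urlencode(pairs, safe=":,|")))


# Shortened, hypothetical URL used only to show the rewrite.
example = "http://example.com/api/qt/clist/get?pn=1&pz=20&fs=m:0+t:6,m:0+t:80"
print(with_page(example, 3))
```

Only the `pn` value changes; every other parameter is passed through in its original order.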
Looking at the response shows that it is not plain JSON: the JSON body is wrapped in a call to the jQuery callback named in the cb parameter (JSONP). The wrapper has to be stripped before the text can be parsed as JSON:
result = res.text.split("jQuery112402508937289440778_1658838703304")[1].split("(")[1].split(");")[0]
result_json = json.loads(result)
print(result_json)
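The chained `split` calls above depend on the exact callback name and would cut the payload short if any value in it contained a `(` character. A slightly more tolerant sketch takes everything between the first `(` and the last `)`. The function name `parse_jsonp` and the sample response string are made up for illustration.

```python
import json


def parse_jsonp(text):
    """Strip a JSONP wrapper such as callback({...}); and parse the JSON inside."""
    start = text.index("(") + 1   # first character after the opening parenthesis
    end = text.rindex(")")        # position of the final closing parenthesis
    return json.loads(text[start:end])


# Hypothetical response body; the real one carries many more fields.
sample = 'jQuery123_456({"data": {"total": 1, "diff": [{"f14": "A(B)"}]}});'
parsed = parse_jsonp(sample)
print(parsed["data"]["total"])
```

Because the slice does not mention the callback name, the same helper keeps working when the `cb` value in the URL changes.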
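The data is much tidier now. All of the stock records sit under data.diff, so all that remains is to write a function that parses them. The meaning of each returned field is listed in the comment at the top of the storage function.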
First, prepare a storage function:
def save_data(data):
    # Fields: stock code, stock name, latest price, change (%), change amount, volume (lots),
    # turnover amount, amplitude, turnover rate, P/E ratio, volume ratio, high, low, open,
    # previous close, P/B ratio
    # The API returns "-" when a value is missing; it is mapped to None here.
    for i in data:
        Code = i["f12"]
        Name = i["f14"]
        Close = i['f2'] if i["f2"] != "-" else None
        ChangePercent = i["f3"] if i["f3"] != "-" else None
        Change = i['f4'] if i["f4"] != "-" else None
        Volume = i['f5'] if i["f5"] != "-" else None
        Amount = i['f6'] if i["f6"] != "-" else None
        Amplitude = i['f7'] if i["f7"] != "-" else None
        TurnoverRate = i['f8'] if i["f8"] != "-" else None
        PERation = i['f9'] if i["f9"] != "-" else None
        VolumeRate = i['f10'] if i["f10"] != "-" else None
        Hign = i['f15'] if i["f15"] != "-" else None
        Low = i['f16'] if i["f16"] != "-" else None
        Open = i['f17'] if i["f17"] != "-" else None
        PreviousClose = i['f18'] if i["f18"] != "-" else None
        PB = i['f23'] if i["f23"] != "-" else None
Then pass in the JSON data prepared earlier:
stock_data = result_json['data']['diff']
save_data(stock_data)
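The per-field conditionals in the storage function can also be expressed as a lookup table. The sketch below maps the field keys used above to column-style names and turns the "-" placeholder into None. The names `FIELD_NAMES` and `parse_row` and the sample values are illustrative assumptions. Unlike the original, it also applies the "-" check to the code and name fields and yields None for a missing key instead of raising KeyError.

```python
# Field key returned by the API -> descriptive name (taken from the comment above).
FIELD_NAMES = {
    "f12": "code", "f14": "name", "f2": "close", "f3": "change_percent",
    "f4": "change", "f5": "volume", "f6": "amount", "f7": "amplitude",
    "f8": "turnover_rate", "f9": "peration", "f10": "volume_rate",
    "f15": "hign", "f16": "low", "f17": "open", "f18": "previous_close",
    "f23": "pb",
}


def parse_row(item):
    """Convert one entry of data.diff into a dict, mapping "-" to None."""
    row = {}
    for field, name in FIELD_NAMES.items():
        value = item.get(field)
        row[name] = None if value == "-" else value
    return row


# Made-up record, only to show the shape of the result.
sample_row = parse_row({"f12": "000001", "f14": "Example", "f2": 10.5, "f3": "-"})
print(sample_row["code"], sample_row["close"], sample_row["change_percent"])
```

Keeping the mapping in one place makes it easier to add or rename a column without touching the loop.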
That gives us the stock data for the first page. The last step is to loop over all of the pages.
The code is as follows:
import math

# %s is replaced by the page number (the pn parameter).
PAGE_URL = "http://72.push2.eastmoney.com/api/qt/clist/get?cb=jQuery112406903204148811937_1678420818118&pn=%s&pz=20&po=1&np=1&ut=bd1d9ddb04089700cf9c27f6f7426281&fltt=2&invt=2&wbp2u=|0|0|0|web&fid=f3&fs=m:0+t:6,m:0+t:80,m:1+t:2,m:1+t:23,m:0+t:81+s:2048&fields=f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f12,f13,f14,f15,f16,f17,f18,f20,f21,f23,f24,f25,f22,f11,f62,f128,f136,f115,f152&_=1678420818127"
# Callback name from the cb parameter; it wraps the JSON body in the response.
CALLBACK = "jQuery112406903204148811937_1678420818118"

def fetch_page(page):
    # Request one page and strip the JSONP wrapper around the JSON body.
    res = requests.get(PAGE_URL % page, headers=headers)
    result = res.text.split(CALLBACK)[1].split("(")[1].split(");")[0]
    return json.loads(result)

def craw_data():
    stock_data = []
    # The first page reports the total number of stocks; with 20 per page (pz=20)
    # that gives the number of pages to fetch.
    total_value = fetch_page(1)['data']['total']
    maxn = math.ceil(total_value / 20)
    for i in range(1, maxn + 1):
        stock_data.extend(fetch_page(i)['data']['diff'])
    return stock_data
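The page count used by the loop comes from rounding the total number of records up to a whole number of pages. A small worked sketch of that arithmetic follows. The helper names `page_count` and `page_numbers` are illustrative, and the default of 20 mirrors the pz=20 parameter in the URL.

```python
import math


def page_count(total, page_size=20):
    """Number of pages needed to hold `total` records at `page_size` per page."""
    return math.ceil(total / page_size)


def page_numbers(total, page_size=20):
    """1-based page numbers to request, matching the pn parameter."""
    return range(1, page_count(total, page_size) + 1)


print(page_count(40), page_count(41), list(page_numbers(41)))
```

With 41 records, two full pages hold 40 of them and a third page holds the remaining one, which is why the result is rounded up rather than truncated.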
Finally, here is the complete Python code for this part. It scrapes the data at 15:05 on every non-holiday weekday and stores it in a MySQL database:
import math
import requests
import json
import db
import time
import holidays
import datetime
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36"
}
def save_data(data):
    # Fields: stock code, stock name, latest price, change (%), change amount, volume (lots),
    # turnover amount, amplitude, turnover rate, P/E ratio, volume ratio, high, low, open,
    # previous close, P/B ratio
    # The API returns "-" when a value is missing; it is stored as NULL.
    for i in data:
        Code = i["f12"]
        Name = i["f14"]
        Close = i['f2'] if i["f2"] != "-" else None
        ChangePercent = i["f3"] if i["f3"] != "-" else None
        Change = i['f4'] if i["f4"] != "-" else None
        Volume = i['f5'] if i["f5"] != "-" else None
        Amount = i['f6'] if i["f6"] != "-" else None
        Amplitude = i['f7'] if i["f7"] != "-" else None
        TurnoverRate = i['f8'] if i["f8"] != "-" else None
        PERation = i['f9'] if i["f9"] != "-" else None
        VolumeRate = i['f10'] if i["f10"] != "-" else None
        Hign = i['f15'] if i["f15"] != "-" else None
        Low = i['f16'] if i["f16"] != "-" else None
        Open = i['f17'] if i["f17"] != "-" else None
        PreviousClose = i['f18'] if i["f18"] != "-" else None
        PB = i['f23'] if i["f23"] != "-" else None
        insert_sql = """
            insert t_stock_code_price(code, name, close, change_percent, `change`, volume, amount, amplitude, turnover_rate, peration, volume_rate, hign, low, open, previous_close, pb, create_time)
            values (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
        """
        # The module (not the class) is imported, so the call is datetime.datetime.now().
        # '%Y-%m-%d' is used because '%F' is not supported on every platform.
        val = (Code, Name, Close, ChangePercent, Change, Volume, Amount, Amplitude,
               TurnoverRate, PERation, VolumeRate, Hign, Low, Open, PreviousClose, PB,
               datetime.datetime.now().strftime('%Y-%m-%d'))
        db.insert_or_update_data(insert_sql, val)
# %s is replaced by the page number (the pn parameter).
PAGE_URL = "http://72.push2.eastmoney.com/api/qt/clist/get?cb=jQuery112406903204148811937_1678420818118&pn=%s&pz=20&po=1&np=1&ut=bd1d9ddb04089700cf9c27f6f7426281&fltt=2&invt=2&wbp2u=|0|0|0|web&fid=f3&fs=m:0+t:6,m:0+t:80,m:1+t:2,m:1+t:23,m:0+t:81+s:2048&fields=f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f12,f13,f14,f15,f16,f17,f18,f20,f21,f23,f24,f25,f22,f11,f62,f128,f136,f115,f152&_=1678420818127"
# Callback name from the cb parameter; it wraps the JSON body in the response.
CALLBACK = "jQuery112406903204148811937_1678420818118"

def fetch_page(page):
    # Request one page and strip the JSONP wrapper around the JSON body.
    res = requests.get(PAGE_URL % page, headers=headers)
    result = res.text.split(CALLBACK)[1].split("(")[1].split(");")[0]
    return json.loads(result)

def craw_data():
    stock_data = []
    # The first page reports the total number of stocks; with 20 per page (pz=20)
    # that gives the number of pages to fetch.
    total_value = fetch_page(1)['data']['total']
    maxn = math.ceil(total_value / 20)
    for i in range(1, maxn + 1):
        stock_data.extend(fetch_page(i)['data']['diff'])
    return stock_data
def craw():
    # print("Running the scrape task...")
    stock_data = craw_data()
    save_data(stock_data)

def is_business_day(date):
    # country_holidays replaces the deprecated CountryHoliday helper in recent
    # releases of the holidays package.
    cn_holidays = holidays.country_holidays('CN')
    return date.weekday() < 5 and date not in cn_holidays

def schedule_task():
    while True:
        now = datetime.datetime.now()
        # Use the same 15:05 target here and below; with 15:10 here the task
        # woke at 15:05, failed the check, and never ran.
        target_time = now.replace(hour=15, minute=5, second=0, microsecond=0)
        if is_business_day(now.date()):
            if now < target_time:
                # Today's run is still ahead: wait for it, then check again.
                time.sleep((target_time - now).total_seconds())
                continue
            craw()
        # Find the date of the next business day.
        next_day = now + datetime.timedelta(days=1)
        while not is_business_day(next_day.date()):
            next_day += datetime.timedelta(days=1)
        # Target time on the next business day.
        next_target_time = next_day.replace(hour=15, minute=5, second=0, microsecond=0)
        # Time to wait until the next run; never negative.
        sleep_time = (next_target_time - datetime.datetime.now()).total_seconds()
        # Sleep until the next run is due.
        time.sleep(max(sleep_time, 0))

if __name__ == "__main__":
    schedule_task()
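The scheduling rule (run at 15:05, skip weekends and holidays) is easier to check when the "when is the next run?" calculation is separated from the sleeping. The following is a minimal sketch of that calculation only. The names `next_run` and `weekdays_only` are illustrative, and the example uses a weekday-only predicate as a stand-in for the holiday-aware `is_business_day` in the program.

```python
import datetime


def next_run(now, is_business_day, hour=15, minute=5):
    """Return the first business-day datetime at hour:minute strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed; start looking from tomorrow.
        candidate += datetime.timedelta(days=1)
    while not is_business_day(candidate.date()):
        candidate += datetime.timedelta(days=1)
    return candidate


def weekdays_only(date):
    """Stand-in predicate: Monday to Friday, ignoring public holidays."""
    return date.weekday() < 5


# 10 March 2023 was a Friday.
before_close = next_run(datetime.datetime(2023, 3, 10, 14, 0), weekdays_only)
after_close = next_run(datetime.datetime(2023, 3, 10, 16, 0), weekdays_only)
print(before_close, after_close)
```

A loop built on this would sleep until `next_run(...)`, run the scrape, and repeat, so a script started before 15:05 still runs on the same day.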
The db.py module and its MySQL configuration are as follows:
import pymysql

def get_conn():
    # Connection settings for the local MySQL instance.
    return pymysql.connect(
        host='localhost',
        user='root',
        password='1234',
        database='test',
        port=3306
    )

def query_data(sql):
    conn = get_conn()
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()

def insert_or_update_data(sql, val):
    conn = get_conn()
    try:
        cursor = conn.cursor()
        # val supplies the values for the %s placeholders in sql.
        cursor.execute(sql, val)
        conn.commit()
    finally:
        conn.close()
The MySQL table structure is as follows:
SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;
-- ----------------------------
-- Table structure for t_stock_code_price
-- ----------------------------
DROP TABLE IF EXISTS `t_stock_code_price`;
CREATE TABLE `t_stock_code_price` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `code` varchar(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL COMMENT 'Stock code',
  `name` varchar(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL COMMENT 'Stock name',
  `close` double DEFAULT NULL COMMENT 'Latest price',
  `change_percent` double DEFAULT NULL COMMENT 'Change (%)',
  `change` double DEFAULT NULL COMMENT 'Change amount',
  `volume` double DEFAULT NULL COMMENT 'Volume (lots)',
  `amount` double DEFAULT NULL COMMENT 'Turnover amount',
  `amplitude` double DEFAULT NULL COMMENT 'Amplitude',
  `turnover_rate` double DEFAULT NULL COMMENT 'Turnover rate',
  `peration` double DEFAULT NULL COMMENT 'P/E ratio',
  `volume_rate` double DEFAULT NULL COMMENT 'Volume ratio',
  `hign` double DEFAULT NULL COMMENT 'High',
  `low` double DEFAULT NULL COMMENT 'Low',
  `open` double DEFAULT NULL COMMENT 'Open',
  `previous_close` double DEFAULT NULL COMMENT 'Previous close',
  `pb` double DEFAULT NULL COMMENT 'P/B ratio',
  `create_time` varchar(64) NOT NULL COMMENT 'Write time',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
SET FOREIGN_KEY_CHECKS = 1;
That covers everything.