Python 3 web scraping notes, part 3: the attributes of response = requests.get(url)

1 Attributes of requests.get(url), i.e. attributes of the response

2 Testing

2.1 response.text

1.2 response.content.decode()

1.2.1 response.content.decode() or response.content.decode("utf-8")

1.2.2 response.content.decode("GBK") raises an error

1.2.3 Notes on encodings

1.3 response.url

1.4 response.status_code

Aside: basic commands for opening the page source and inspector

1.5 Response headers: response.headers

1.6 response.request.headers

Aside: cookies

1.7 print(response.cookies)

1.8 print(response.request._cookies)

3 requests.get(url, ...) with parameters

3.1 requests.get(url, headers=headers)

3.1.1 Common mistakes when writing headers

3.2 Output after passing headers

3.3 Other ways to write it

3.4 Passing cookies, or headers that contain cookies (experiment unsuccessful?)

3.5 response = requests.get(url, timeout=3)

3.6 response = requests.get(url, proxies=proxies) (untested)

1 Attributes of requests.get(url), i.e. attributes of the response

These are the commonly used features of the requests module that I have come across.
The result is usually assigned as response = requests.get(url).

Attributes of requests.get(url):

print(response.text)
print(response.content.decode())          # note this one!
print(response.url)                       # print the URL of the response
print(response.status_code)               # print the status code of the response
print(response.request.headers)           # print the request headers behind the response object
print(response.headers)                   # print the response headers
print(response.request._cookies)          # print the cookies sent with the request
print(response.cookies)                   # print the cookies carried in the response

2 Testing

#E:\work\FangCloudV2\personal_space\2學習\python3\py3_test1.txt

import requests

url='https://baidu.com'
response=requests.get(url)
#print(response.text)
print(" ")
print(response.content.decode())
print(" ")
print(response.url)
print(" ")
print(response.status_code)
print(" ")
print(response.request.headers)
print(" ")
print(response.headers)
print(" ")
print(response.request._cookies)
print(" ")
print(response.cookies)

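2.1 response.text

That is, requests.get(url).text.

response.text is the body decoded to text, using an encoding that the requests module infers from the HTTP headers.
Its type is str.

Printing response.text for a request to baidu.com shows that part of the returned text is garbled:
the English is fine, and the garbled parts are Chinese characters that were decoded with the wrong encoding.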
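The garbling can be reproduced without any network access. The sketch below assumes the page bytes are UTF-8 and that the inferred encoding is ISO-8859-1, which is the fallback requests uses for text responses whose Content-Type header names no charset. The sample string is made up for illustration and stands in for the real page body.

```python
# Sample body bytes, standing in for response.content (assumed to be UTF-8).
raw = "百度一下，你就知道".encode("utf-8")

# Decoding with the wrong encoding does not fail, but every byte turns into
# one Latin-1 character, so the Chinese text comes out garbled.
garbled = raw.decode("ISO-8859-1")

# Decoding with the encoding the page was actually written in restores the text.
correct = raw.decode("utf-8")

print(len(raw), len(garbled), len(correct))
print(garbled == correct)
```

With a real response, the equivalent fix is to set response.encoding = "utf-8" before reading response.text; response.apparent_encoding gives an encoding guessed from the body itself.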


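1.2 response.content.decode()

That is, requests.get(url).content.
response.content returns the raw body with no decoding applied, so it has to be decoded.
decode() defaults to "utf-8".
The type of response.content is bytes.

1.2.1 response.content.decode() or response.content.decode("utf-8")

print(response.content.decode())          # note this one!
Choose the appropriate encoding for decode().
Here decode("utf-8"), or the default (which is also utf-8), displays the Chinese characters correctly instead of garbling them.
Choosing "GBK" raises an error here. The right encoding differs from site to site, so take care.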

#E:\work\FangCloudV2\personal_space\2學習\python3\py3_test1.txt

import requests

url='https://baidu.com'
response=requests.get(url)
#print(response.text)
print(" ")
print(response.content.decode())

1.2.2 response.content.decode("GBK") raises an error
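The failure can be reproduced offline. This is a minimal sketch, assuming the body is UTF-8; the sample string and the helper name try_decode are made up for illustration. Decoding UTF-8 bytes as GBK either raises UnicodeDecodeError or, when the bytes happen to form valid GBK pairs, silently produces the wrong characters.

```python
def try_decode(raw: bytes, encoding: str):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# Three Chinese characters, 9 bytes in UTF-8 (an odd number of bytes).
raw = "你好吗".encode("utf-8")

print(try_decode(raw, "utf-8") is not None)   # decoding with the right encoding works
print(try_decode(raw, "gbk") is None)         # strict GBK decoding fails on these bytes

# errors="replace" never raises; undecodable bytes become U+FFFD instead.
lossy = raw.decode("gbk", errors="replace")
print("\ufffd" in lossy)
```

Using errors="replace" hides the exception but not the problem: the text is still wrong, so it is better to find the real encoding of the page.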


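1.2.3 Notes on encodings

The notes below are not fully organized yet.

Encoding scheme: maps a computer's binary data one-to-one onto written characters.
Coded character set: the lookup table (dictionary) that maps each binary number to a character; different sets cover different subsets of characters.

ANSI encoding (also called MBCS)
	The system default encoding. Different operating systems and locales use different character sets for it:
	GBK for Simplified Chinese, ASCII for English, Big5 for Traditional Chinese.
	A single ANSI code page cannot hold text in more than one language.

Unicode encoding (Unicode character set)
	Stores all of the world's scripts in one character set.
	UTF-8 (with or without BOM); UTF-8 is compatible with ASCII
	UTF-16
	UTF-32

GBxxx encodings (Chinese characters, GBxxx character sets)
	GB2312-80
	GBK: double-byte encoding (1 byte is 8 bits, 2 bytes are 16 bits, so at most 2^16 = 65536 code values); code range 0x8140~0xFEFE; contains 21003 Chinese characters and 883 symbols
	GB18030
	Bytes below 0x80 mean the same as in ASCII, but the second byte of a GBK character can fall inside the ASCII range, which can clash with ASCII when bytes are read one at a time.

ASCII encoding (American, ASCII character set)
	Standard ASCII: 7 bits, 2^7 = 128 characters
	Extended ASCII: 8 bits, 2^8 = 256 characters

UCS-2, UCS-4 (UCS, the Universal Character Set, defined by ISO)
	UCS-2 is a double-byte encoding; UCS-4 uses four bytes.

BIG5 encoding (BIG5 character set)
	Traditional Chinese characters; I feel this one can be ignored.

Source character set encoding

Execution character set encoding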
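To make the byte widths in these notes concrete, the short sketch below encodes one sample character with several of the encodings mentioned. The sample character is an arbitrary choice for illustration; only Python's built-in codecs are used.

```python
ch = "中"

utf8 = ch.encode("utf-8")        # 3 bytes for this character
gbk = ch.encode("gbk")           # 2 bytes (double-byte encoding)
utf16 = ch.encode("utf-16-le")   # 2 bytes, little-endian, no BOM
utf32 = ch.encode("utf-32-le")   # 4 bytes, little-endian, no BOM

print(len(utf8), len(gbk), len(utf16), len(utf32))

# UTF-8 is ASCII-compatible: pure ASCII text encodes to the same bytes.
print("abc".encode("utf-8") == "abc".encode("ascii"))

# "utf-8-sig" is UTF-8 with a BOM: it prepends the bytes EF BB BF.
print("abc".encode("utf-8-sig"))
```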


1.3 response.url

That is, requests.get(url).url: the URL of the response.


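1.4 response.status_code

That is, requests.get(url).status_code.
The status code returned by the server.

200	Success
302	Redirect; the new URL is given in the Location response header
303	After a POST, the client is redirected to the new URL and fetches it with GET
307	Temporary redirect; the client repeats the request to the new URL with the same method
403	Forbidden; the server understood the request but refuses to process it (no permission)
404	Page not found
500	Internal server error
503	The server could not respond because of maintenance or overload, and the response may carry a Retry-After header. It can also happen because a crawler hits the URL too frequently and the server stops serving the crawler's requests, ending in a 503 status code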
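As an illustration of how these codes group into classes, here is a small sketch using only the standard library's http.HTTPStatus. The helper name describe is made up for this example, and it only handles codes that HTTPStatus knows about.

```python
from http import HTTPStatus


def describe(status_code: int) -> str:
    """Return a label such as '404 Not Found (client error)' for a known status code."""
    if 200 <= status_code < 300:
        kind = "success"
    elif 300 <= status_code < 400:
        kind = "redirect"
    elif 400 <= status_code < 500:
        kind = "client error"
    elif 500 <= status_code < 600:
        kind = "server error"
    else:
        kind = "other"
    # HTTPStatus(...) raises ValueError for codes it does not define.
    phrase = HTTPStatus(status_code).phrase
    return f"{status_code} {phrase} ({kind})"


for code in (200, 302, 403, 404, 500, 503):
    print(describe(code))
```

With requests itself, response.raise_for_status() raises requests.exceptions.HTTPError for 4xx and 5xx codes, which is often simpler than checking status_code by hand.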

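Aside: basic commands for opening the page source and inspector

Right-click on a web page and choose "View page source" to see the page's HTML.
Right-click on a web page and choose "Inspect" (F12 also works) to open the browser's developer console.

Inspect
Right-clicking an empty area and choosing "Inspect" opens the developer tools panel.
Selecting an element such as an image and choosing "Inspect" jumps to the tag of that element in the markup.

Inspect
Pressing F12
shows different content.
The contents of each tab could be described in detail later.
To be added.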

1.5 Response headers: response.headers

That is, requests.get(url).headers.
The response headers describe the website and the page,
for example dates, content, and connection details.
Compare the headers returned to the Python crawler's requests.get(url)
with the ones shown when the page is opened in a PC browser: they are not the same.

1.6 response.request.headers

That is, requests.get(url).request.headers.
These are the request headers behind the response object, i.e. they describe the client that accessed the page (possibly a PC, a phone, or Python).
The difference is obvious:
the crawler's request shows 'User-Agent': 'python-requests/2.30.0',
while the Inspect panel of a PC browser shows
Accept: text/html,application/xhtml+xml,application/xml

The following value from the browser's request headers can be used as the content of the headers parameter:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36
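The default request headers can be inspected without sending anything. The sketch below uses requests.utils.default_headers(), which returns the headers that requests adds when no headers= argument is given; the exact version string depends on the installed requests release, so it is compared against requests.__version__ rather than hard-coded.

```python
import requests

# Headers that requests sends by default when no headers= argument is given.
defaults = requests.utils.default_headers()

for name, value in defaults.items():
    print(f"{name}: {value}")

# The default User-Agent identifies the client as python-requests/<version>,
# which is how a server can tell a script apart from a browser.
print(defaults["User-Agent"] == f"python-requests/{requests.__version__}")
```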

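Aside: cookies

From Baidu: a cookie (sometimes written in the plural, cookies) is a "small text file": data that some websites store on the user's local terminal (usually encrypted) in order to identify the user and track the session, kept temporarily or permanently by the user's client computer.
Websites often use the Cookie field in the request headers to keep the user's visit state.
In other words, it acts like a small cache kept by the client.
It is stored on the client, not on the website's server!
It may store a user name, password, registration information, and so on.
It may also be just a temporary unique ID that lets the site recognize you and continue within the session without identifying you again.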

1.7 print(response.cookies)

That is, requests.get(url).cookies.
The cookies set by the website in its response.

1.8 print(response.request._cookies)

That is, requests.get(url).request._cookies.

The cookies sent by the client with the request.

3 requests.get(url, ...) with parameters

3.1 requests.get(url, headers=headers)

Note the correct way to write headers.
headers is a dict, written as {"": "", "": ""}, i.e. it must consist of key: value pairs.
headers={"user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36"}
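To check what a headers dict does without contacting the server, one option is to prepare the request and inspect it. This is a sketch, not the only way: requests.Request(...).prepare() builds the request object without sending it, and the URL is just an example. Note that a request prepared this way carries only the headers given here; sending through requests.get additionally merges in the defaults.

```python
import requests

url = "https://www.baidu.com"
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36"
    )
}

# Build the request without sending it, to inspect the headers it would carry.
prepared = requests.Request("GET", url, headers=headers).prepare()

# Header names are case-insensitive, so "user-agent" finds the same entry.
print(prepared.headers["user-agent"])
```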

3.1.1 Common mistakes when writing headers

Common mistakes when writing headers:
headers="user-agentMozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36"
headers={"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36"}
headers={"user-agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36"}

These fail because headers is not written correctly: the first is a plain string, and the other two are sets rather than dicts.

headers is a dict, written as {"": "", "": ""}.
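The difference between the forms is easy to see by checking their types. In the sketch below the User-Agent string is shortened, and is_valid_headers is a hypothetical helper written for this example, not part of requests.

```python
UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"   # shortened for the example

candidates = {
    "plain string": "user-agent" + UA,
    "set with one value": {UA},
    "set with 'key:value' inside one string": {"user-agent:" + UA},
    "dict of key/value pairs": {"user-agent": UA},
}


def is_valid_headers(obj) -> bool:
    """Hypothetical check: headers must be a dict mapping str names to str values."""
    return isinstance(obj, dict) and all(
        isinstance(k, str) and isinstance(v, str) for k, v in obj.items()
    )


for label, value in candidates.items():
    print(f"{label}: {type(value).__name__}, valid={is_valid_headers(value)}")
```

Braces alone do not make a dict: without a colon separating key and value outside the quotes, Python builds a set.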


3.2 Output after passing headers

Here headers is copied directly from the PC browser's Inspect panel: it is the last entry under the request headers of the page request.
The purpose is to pose as a PC browser when accessing the page.

#print(response.text)
#print(response.content.decode())

print(response.headers)

print(response.request.headers)

3.3 Other ways to write it

The part after the ? in a URL is the search keyword, i.e. the query string / request parameters.
It can also be passed as a dict with params=kw.

Style 1

url = 'https://www.baidu.com/s?wd=python'

Style 2

url = 'https://www.baidu.com/s?'
kw = {'wd': 'python'}
response = requests.get(url, headers=headers, params=kw)
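That the two styles produce the same URL can be checked offline. The sketch below prepares the request without sending it, then reads back the final URL that requests builds from params.

```python
import requests

kw = {"wd": "python"}

# Prepare the request without sending it, to see the final URL with the query string.
prepared = requests.Request("GET", "https://www.baidu.com/s", params=kw).prepare()
print(prepared.url)

# Compare with the hand-written URL from style 1.
print(prepared.url == "https://www.baidu.com/s?wd=python")
```

The dict form has the advantage that requests takes care of URL-encoding the values, which matters for keywords containing spaces or Chinese characters.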

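3.4 Passing cookies, or headers that contain cookies (experiment unsuccessful?)

There are 3 ways to pass cookies:

requests.get(url, headers=headers)      # where the headers dict contains a Cookie entry

requests.get(url, headers=headers, cookies=cookies)

resp = requests.get(url, headers=headers, cookies=cookies_dict)

A cookie usually has an expiry time; once it expires it has to be obtained again.
cookies = {"cookie name": "cookie value"}
Converting a cookie string into the dict that the cookies parameter needs (splitting only on the first "=" so that values containing "=" are kept whole):
cookies_dict = {cookie.split('=', 1)[0]: cookie.split('=', 1)[1] for cookie in cookies_str.split('; ')}
The response object obtained with requests has a cookies attribute. Its value is a CookieJar-type object containing the cookies that the server set locally. How can it be converted into a cookies dict?
response.cookies returns that CookieJar-type object:
cookies_dict = requests.utils.dict_from_cookiejar(response.cookies)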
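Both conversions can be tried without a real response. In this sketch the cookie string is made up for illustration, and requests.utils.cookiejar_from_dict is used only to obtain a CookieJar to convert back.

```python
import requests

# A made-up cookie string in the "name=value; name=value" form copied from a browser.
cookies_str = "sessionid=abc123; token=x=y=z; theme=dark"

# split("=", 1) splits only on the first "=", so values containing "=" survive.
cookies_dict = dict(pair.split("=", 1) for pair in cookies_str.split("; "))
print(cookies_dict)

# dict -> CookieJar -> dict round trip using the helpers in requests.utils.
jar = requests.utils.cookiejar_from_dict(cookies_dict)
print(requests.utils.dict_from_cookiejar(jar) == cookies_dict)
```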

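The cookies experiment was unsuccessful?

import requests

url="https://www.jianshu.com"

# cookie1 is the cookie string copied from the browser's request headers (value not shown here)
cookie1 = "..."
# the HTTP request header is named "Cookie"; a header named "cookies" is not treated as cookies by the server
headers={"user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36","Cookie":cookie1}


response=requests.get(url,headers=headers)

print(response.text)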
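One way to debug a cookie experiment like this is to look at the request that would be sent, without sending it. The sketch below assumes a made-up cookie name and value; it shows that a dict passed via cookies= and a raw string placed under the "Cookie" header name end up as the same Cookie request header.

```python
import requests

url = "https://www.jianshu.com"
cookies = {"sessionid": "abc123"}   # made-up cookie for illustration

# Way 1: pass a dict via cookies=; requests builds the Cookie header itself.
via_param = requests.Request("GET", url, cookies=cookies).prepare()
print(via_param.headers.get("Cookie"))

# Way 2: put the raw cookie string into the headers dict under the name "Cookie".
via_header = requests.Request(
    "GET", url, headers={"Cookie": "sessionid=abc123"}
).prepare()
print(via_header.headers.get("Cookie"))
```

If the site still does not treat the request as logged in, the cookie may have expired or the site may check other headers as well; that cannot be verified from this sketch.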

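3.5 response = requests.get(url, timeout=3)

timeout sets the timeout.
timeout=3 means: after the request is sent, the server must respond within 3 seconds, otherwise an exception is raised. The limit applies to establishing the connection and to each wait for data, not to the total download time.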
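A minimal sketch of handling that exception follows. The function name fetch_text and the injectable get parameter are assumptions made for this example, so that the timeout path can be exercised without a network; slow_get is a stand-in that simulates a server which never answers in time.

```python
import requests


def fetch_text(url, timeout=3, get=requests.get):
    """Return the page text, or None if the request times out.

    `get` is injectable only so this sketch can run without a network.
    """
    try:
        response = get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        # Covers both connect timeouts and read timeouts.
        return None
    return response.text


def slow_get(url, timeout):
    # Stand-in for requests.get that simulates a server answering too slowly.
    raise requests.exceptions.Timeout(f"no response within {timeout}s")


print(fetch_text("https://www.baidu.com", get=slow_get))
```

In real use the call is simply fetch_text(url), which goes through requests.get with timeout=3.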

3.6 response = requests.get(url, proxies=proxies) (untested)

proxies is used to set a proxy.
A forward proxy forwards requests on behalf of the client, e.g. a VPN.
A reverse proxy forwards requests on behalf of the server, e.g. an nginx reverse proxy.

Proxies fall into 3 kinds:
transparent proxy
anonymous proxy
high-anonymity proxy

                       transparent / anonymous / high-anonymity
REMOTE_ADDR          = proxy IP    / proxy IP  / proxy IP
HTTP_VIA             = proxy IP    / proxy IP  / not determined
HTTP_X_FORWARDED_FOR = your IP     / proxy IP  / not determined

Proxy protocols

HTTP proxy: the target URL is http
HTTPS proxy
SOCKS tunnel proxy: simply passes data packets through and does not care which protocol they carry

proxies is a dict; if it has several entries, one proxy is selected according to the target URL.

proxies = {
    "http": "http://12.34.56.79:9527",
    "https": "https://12.34.56.79:9527",
}
response = requests.get(url, proxies=proxies)
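The idea of picking a proxy by target URL can be sketched with the standard library alone. This is a simplified illustration, not the actual lookup inside requests (which also accepts more specific keys such as scheme plus hostname); pick_proxy is a hypothetical helper and the example.com URLs are placeholders.

```python
from urllib.parse import urlsplit

proxies = {
    "http": "http://12.34.56.79:9527",
    "https": "https://12.34.56.79:9527",
}


def pick_proxy(url, proxies):
    """Hypothetical helper: return the proxy registered for the URL's scheme, if any."""
    return proxies.get(urlsplit(url).scheme)


print(pick_proxy("http://example.com/page", proxies))
print(pick_proxy("https://example.com/page", proxies))
print(pick_proxy("ftp://example.com/file", proxies))   # no entry for this scheme
```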