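While learning to write Python crawlers, I found plenty of examples online for scraping images from pexels, but none of the ones I downloaded ran successfully; each had some problem or other.
As a beginner, I still got a lot of ideas from those examples, and I refined the code step by step with the help of a search engine and ChatGPT.
It finally ran successfully, so I am recording it here.
Environment: Win10, Python 3.10, Google Chrome 111.0.5563.148 (Official Build)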
import html
import os
import re
import urllib.error
import urllib.parse
import urllib.request

import requests

path = r"C:\Users\xiaochao\pexels"
# Adjust the page range to suit your needs.
url_lists = ['https://www.pexels.com/search/book/?page={}'.format(i) for i in range(1, 21)]
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36",
    "Referer": "https://www.pexels.com/",
    "Accept-Language": "en-US,en;q=0.9",
}

# Capture each "Download" link up to the literal "?cs=" query string.
# The "?" is escaped; unescaped, it would only make the preceding "/" optional.
pattern = re.compile(r'"Download" href="(.*?)\?cs=', re.S)

# Create the output folder once, before any downloads start.
os.makedirs(path, exist_ok=True)

for url in url_lists:
    print(url)
    req = urllib.request.Request(url, headers=headers)
    try:
        resp = urllib.request.urlopen(req)
    except urllib.error.HTTPError as e:
        print("HTTPError occurred: {}".format(e))
        continue

    html_content = resp.read().decode()
    matches = pattern.findall(html_content)
    print(matches)

    for match in matches:
        match_cleaned = match.split('?')[0]  # Drop any query string left on the image URL.
        match_cleaned = html.unescape(match_cleaned)  # Decode HTML entities such as &amp;.
        match_cleaned = urllib.parse.unquote(match_cleaned)  # Decode percent-escapes such as %20.
        match_cleaned = urllib.parse.urljoin(url, match_cleaned)  # Turn a relative URL into an absolute one.
        print(match_cleaned)  # Print the cleaned image URL.

        # Name the file after the last segment of the URL.
        filename = match_cleaned.split("/")[-1]
        with open(os.path.join(path, filename), "wb") as f:
            # Send the same headers as for the search pages.
            f.write(requests.get(match_cleaned, headers=headers).content)
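The URL clean-up steps in the script can be tried without touching the network. Below is a minimal sketch of that idea, not a definitive implementation: the helper names `extract_image_urls` and `filename_from_url` and the `sample_html` fragment are made up for illustration, and the real pexels markup may differ. The pattern here escapes the question mark (`\?cs=`) so that it is matched literally.

```python
import html
import re
import urllib.parse

# Capture the href of a "Download" link up to the literal "?cs=" query string.
DOWNLOAD_PATTERN = re.compile(r'"Download" href="(.*?)\?cs=', re.S)


def extract_image_urls(page_html, page_url):
    """Return absolute, decoded image URLs found in page_html (hypothetical helper)."""
    urls = []
    for raw in DOWNLOAD_PATTERN.findall(page_html):
        cleaned = html.unescape(raw)                       # decode HTML entities
        cleaned = urllib.parse.unquote(cleaned)            # decode percent-escapes
        cleaned = urllib.parse.urljoin(page_url, cleaned)  # relative -> absolute
        urls.append(cleaned)
    return urls


def filename_from_url(image_url):
    """Use the last path segment of the URL as the file name (hypothetical helper)."""
    return urllib.parse.urlsplit(image_url).path.split("/")[-1]


# Made-up fragment that only imitates the shape of a download link.
sample_html = (
    '<a title="Download" href="https://images.pexels.com/photos/123/'
    'pexels-photo-123.jpeg?cs=srgb&amp;dl=pexels-example-123.jpg&amp;fm=jpg">'
)
sample_page = "https://www.pexels.com/search/book/?page=1"

image_urls = extract_image_urls(sample_html, sample_page)
print(image_urls)
print([filename_from_url(u) for u in image_urls])
```

Keeping the extraction in a small function like this makes it easier to check the regular expression against a saved copy of the page before starting a long download run.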