Getting Started
I'm writing this on an M2 MacBook Air. Apple hardware is not a great fit as a deep-learning training machine, but as a typing machine the MacBook is hard to fault, so debugging PyTorch code on the MacBook before sending it to a cluster for training, or simply running code directly on the Mac, both work nicely. This article records the steps, with running and debugging YOLOv5 directly on the Mac as the goal.
0. Getting a Proxy
This is the step where everyone gets to use their own tricks ??
Alternatively, install directly from mirrors inside China; see the guide "Homebrew國內(nèi)如何自動(dòng)安裝(國內(nèi)地址)(Mac & Linux)".
Everything below assumes a working proxy, which keeps the process essentially painless.
1. Configuring the Proxy
In short, once the proxy is on, the browser will use it automatically, but zsh and git still need to be pointed at it as well; otherwise installing Homebrew gets painful.
Configuring zsh to use the proxy
# First, open (or create) ~/.zshrc
vim ~/.zshrc
Add two helper functions with your local proxy's address and port, e.g. 127.0.0.1:8080:
function onproxy() {
    export no_proxy="localhost,127.0.0.1,localaddress,.localdomain.com"
    export http_proxy="http://127.0.0.1:8080"
    export https_proxy=$http_proxy
    export all_proxy=socks5://127.0.0.1:8080
    echo -e "\033[32mproxy on!\033[0m"
}

function offproxy() {
    unset http_proxy
    unset https_proxy
    unset all_proxy
    echo -e "\033[31mproxy off!\033[0m"
}
Then source the file:
source ~/.zshrc
Run onproxy to switch the proxy on (remember to run it again after restarting the terminal, or call it from .zshrc):
onproxy
Test whether it worked:
curl -vv https://www.google.com
If the response contains something like 200 OK, the proxy is configured correctly. Note that ping does not use HTTP or SOCKS proxies, so pinging google.com will still fail even with the proxy configured.
Configuring git to use the proxy
Point git at the proxy as well:
git config --global http.proxy http://127.0.0.1:8080
Done!
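Some remotes are cloned over HTTPS and some tools read a separate https.proxy key, so it can be worth setting both, and knowing how to undo them later. A sketch, assuming the same 127.0.0.1:8080 local proxy as above:

```shell
# Route git's HTTP and HTTPS traffic through the local proxy
git config --global http.proxy http://127.0.0.1:8080
git config --global https.proxy http://127.0.0.1:8080

# Check what is currently set
git config --global --get http.proxy

# Undo later with:
#   git config --global --unset http.proxy
#   git config --global --unset https.proxy
```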
2. Installing Homebrew
Installing Homebrew is straightforward:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
When the script finishes, it prints instructions for adding Homebrew to your PATH; run the commands it suggests:
(echo; echo 'eval "$(/opt/homebrew/bin/brew shellenv)"') >> /Users/xxx/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"
From here on you can happily brew install things, just like apt install on Ubuntu!
3. Installing miniforge
With Homebrew in place, installing miniforge is a one-liner:
brew install miniforge
Done!
4. Creating a conda Environment
First, initialize conda for zsh:
conda init zsh
Close and reopen the terminal, and you will see that the (base) environment is active.
Next, create a conda environment for PyTorch:
conda create -n torch python=3.8
Once it is created, run
conda activate torch
and you have a dedicated conda virtual environment to work in.
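A quick way to confirm that the interpreter you are running really belongs to the new environment (the exact paths below are illustrative and will differ on your machine):

```python
# Print where the current Python interpreter lives; inside the activated
# "torch" env it should resolve to something like
# /opt/homebrew/Caskroom/miniforge/base/envs/torch/bin/python
import sys

print(sys.executable)
print(sys.version_info[:2])  # the env above was created with python=3.8
```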
5. Installing PyTorch
Install it exactly as the official site shows; the default macOS build already includes MPS (Metal Performance Shaders) acceleration:
# MPS acceleration is available on MacOS 12.3+
pip3 install torch torchvision torchaudio
Done!
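A small sanity check helps here. One pitfall worth flagging: torch.backends.mps.is_available is a function, so the parentheses matter; the bare attribute is a function object and is always truthy. A minimal sketch that also degrades gracefully when torch is not installed:

```python
# Pick "mps" when the Metal backend is both built and usable, else "cpu".
import importlib.util

def pick_device() -> str:
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # torch not installed yet; nothing to probe
    import torch
    # The () matter: is_available / is_built are functions, and a bare
    # function object would always evaluate as True.
    if torch.backends.mps.is_built() and torch.backends.mps.is_available():
        return "mps"
    return "cpu"

print(pick_device())  # expect "mps" on an M1/M2 Mac with a recent torch build
```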
6. Running YOLOv5
First clone the source and install the dependencies:
git clone https://github.com/ultralytics/yolov5.git
cd yolov5
pip install -r requirements.txt
Wait for the installation to finish. Now we can try detecting objects in an image.
Create an imgs folder inside the yolov5 directory and drop a test image named 1.jpg into it.
Then run:
python detect.py --weights yolov5s.pt --source 'imgs/1.jpg'
The yolov5s.pt weights are downloaded automatically during the run, and the detection results are saved to yolov5/runs/detect/exp.
Open the saved 1.jpg to check the result.
Detection succeeded: YOLOv5 runs on the MacBook, and the basic environment is in place!
7. Testing Apple Silicon MPS GPU Acceleration
Testing YOLOv5 with MPS
Since PyTorch happily runs on the CPU, we don't yet know whether Apple's MPS GPU acceleration actually kicks in, so let's test it.
Open VS Code, create a file testyolov5mps.py, select the torch conda environment we just created as the interpreter, and enter:
import torch

print(torch.backends.mps.is_available())  # the () matter: the bare attribute is always truthy
print(torch.backends.mps.is_built())
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print(device)
Run it:
(torch) xx@xxx-MacBook-Air testyolov5 % python testyolov5mps.py
True
True
mps
(torch) xx@xxx-MacBook-Air testyolov5 %
Both checks print True and the chosen device is mps, so MPS is available. Next, let's run YOLOv5's official hub example as a quick end-to-end test:
import torch

print(torch.backends.mps.is_available())
print(torch.backends.mps.is_built())
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print(device)

# Model
model = torch.hub.load("ultralytics/yolov5", "yolov5s")  # or yolov5n - yolov5x6, custom
model.to(device)

# Images
img = "https://ultralytics.com/images/zidane.jpg"  # or file, Path, PIL, OpenCV, numpy, list

# Inference
results = model(img)

# Results
results.print()  # or .show(), .save(), .crop(), .pandas(), etc.
This time the run fails with the error below:
(torch) xx@xxx-MacBook-Air testyolov5 % python testyolov5mps.py
True
True
mps
Using cache found in /Users/xx/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 ?? 2023-2-27 Python-3.8.16 torch-1.13.1 CPU
Fusing layers...
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
Adding AutoShape...
Traceback (most recent call last):
  File "testyolov5mps.py", line 23, in <module>
    results.print()  # or .show(), .save(), .crop(), .pandas(), etc.
  File "/Users/xx/.cache/torch/hub/ultralytics_yolov5_master/models/common.py", line 825, in print
    LOGGER.info(self.__str__())
  File "/Users/xx/.cache/torch/hub/ultralytics_yolov5_master/models/common.py", line 831, in __str__
    return self._run(pprint=True)  # print results
  File "/Users/xx/.cache/torch/hub/ultralytics_yolov5_master/models/common.py", line 745, in _run
    for c in pred[:, -1].unique():
  File "/opt/homebrew/Caskroom/miniforge/base/envs/torch/lib/python3.8/site-packages/torch/_tensor.py", line 806, in unique
    return torch.unique(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/torch/lib/python3.8/site-packages/torch/_jit_internal.py", line 485, in fn
    return if_false(*args, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/torch/lib/python3.8/site-packages/torch/_jit_internal.py", line 485, in fn
    return if_false(*args, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/torch/lib/python3.8/site-packages/torch/functional.py", line 877, in _return_output
    output, _, _ = _unique_impl(input, sorted, return_inverse, return_counts, dim)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/torch/lib/python3.8/site-packages/torch/functional.py", line 791, in _unique_impl
    output, inverse_indices, counts = torch._unique2(
NotImplementedError: The operator 'aten::_unique2' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
(torch) xx@xxx-MacBook-Air testyolov5 %
The model uses the aten::_unique2 operator, which has no MPS implementation yet, so the only option is the CPU-fallback mode suggested in the error message:
PYTORCH_ENABLE_MPS_FALLBACK=1 python testyolov5mps.py
This time it runs:
(torch) xx@xxx-MacBook-Air testyolov5 % PYTORCH_ENABLE_MPS_FALLBACK=1 python testyolov5mps.py
True
True
mps
Using cache found in /Users/xx/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 ?? 2023-2-27 Python-3.8.16 torch-1.13.1 CPU
Fusing layers...
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
Adding AutoShape...
image 1/1: 900x1200 8 persons, 2 sports balls
Speed: 58.2ms pre-process, 228.4ms inference, 24.2ms NMS per image at shape (1, 3, 480, 640)
Inference took 228.4 ms.
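Instead of prefixing every command with the variable, the fallback flag can be set inside the script itself; per PyTorch's own error message it must be in the environment before the MPS backend initializes, so put it above the torch import. A sketch:

```python
import os

# Must be set before `import torch` so the MPS backend sees it at init time
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

# import torch  # only import torch after the variable is in place
print(os.environ["PYTORCH_ENABLE_MPS_FALLBACK"])
```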
If we instead comment out model.to(device) and run in pure CPU mode:
(torch) xx@xxx-MacBook-Air testyolov5 % python testyolov5mps.py
True
True
mps
Using cache found in /Users/xx/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 ?? 2023-2-27 Python-3.8.16 torch-1.13.1 CPU
Fusing layers...
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
Adding AutoShape...
image 1/1: 900x1200 8 persons, 2 sports balls
Speed: 24.1ms pre-process, 82.8ms inference, 0.7ms NMS per image at shape (1, 3, 480, 640)
Inference took only 82.8 ms on the CPU. We won't draw firm conclusions yet, but one likely factor is that with PYTORCH_ENABLE_MPS_FALLBACK=1 the unsupported op forces data to shuttle between GPU and CPU, and on a single image that overhead can dominate; either way, Mac GPU acceleration is clearly not fully mature yet.
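One caveat before reading too much into single-image timings: a single call mostly measures per-call overhead (and, in fallback mode, CPU/GPU round-trips), not sustained throughput. A device-agnostic timing pattern, plain Python with no torch dependency, looks like:

```python
import time

def time_fn(fn, *args, warmup=2, iters=5):
    """Average wall-clock time of fn(*args) over `iters` calls after `warmup` calls.

    Warmup matters on GPU backends (MPS/CUDA), where the first calls can
    include kernel compilation; averaging several iterations smooths jitter.
    """
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - start) / iters

# Trivial stand-in workload; swap in a call like model(img) for a real benchmark
avg = time_fn(lambda n: sum(range(n)), 100_000)
print(f"avg time per call: {avg:.6f}s")
```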
Testing ResNet-50 with MPS
Since YOLOv5 is complex enough to contain operators without MPS support, let's try the simpler ResNet-50:
import time
import warnings

import torch

warnings.filterwarnings('ignore')

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
# device = torch.device("cpu")  # uncomment to benchmark the CPU path
print(device)

# NVIDIA's hub entry provides both the model and preprocessing helpers
resnet50 = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_resnet50', pretrained=True)
utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_convnets_processing_utils')
resnet50.eval().to(device)

# Repeat the same test image 1024 times to form one large batch
uris = ["/Users/xx/workspace/yolov5/imgs/1.jpg" for _ in range(1024)]
batch = torch.cat(
    [utils.prepare_input_from_uri(uri) for uri in uris]
).to(device)

with torch.no_grad():
    start = time.time()
    output = resnet50(batch)
    end = time.time()
    # output = torch.nn.functional.softmax(output, dim=1)

# Note: MPS (like CUDA) executes asynchronously, so strict benchmarking
# would synchronize the device before reading the clock; the rough numbers
# below are still indicative.
print("using {} total time:{}s".format(device, end - start))
We run the same script with CPU and with MPS over the 1024 images and measure the inference time. Results:
(torch) xx@xxx-MacBook-Air testyolov5 % python testresnetmps.py
mps
using mps total time:8.64146113395691s
(torch) xx@xxx-MacBook-Air testyolov5 % python testresnetmps.py
cpu
using cpu total time:85.01676988601685s
Inferring 1024 images took 8.6 s with MPS versus 85 s on the CPU, roughly a 10x speedup, so the acceleration clearly works.
Comparing against a 1080 Ti and a 3700X
Running the same code in pure CPU mode on an AMD 3700X:
cpu
using cpu total time:41.04833006858826s
And with CUDA on a 1080 Ti (its memory cannot hold all 1024 images, so the batch is split in two):
cuda
batch0infer
batch1infer
using cuda total time:2.7815260887145996s
The 3700X took 41 s; CUDA on the 1080 Ti needed only 2.78 s.
Summary
| device | data | model | time (s) |
| --- | --- | --- | --- |
| Apple M2 (8+10) (CPU) | 1024 images, 1 batch | resnet50 cls | 85.02 |
| Apple M2 (8+10) (MPS) | 1024 images, 1 batch | resnet50 cls | 8.64 |
| AMD 3700X + 1080 Ti (CPU) | 1024 images, 1 batch | resnet50 cls | 41.05 |
| AMD 3700X + 1080 Ti (CUDA) | 1024 images, 2 batches | resnet50 cls | 2.78 |
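The ratios behind the table can be computed directly from the measured times:

```python
# Speedups derived from the timings in the summary table above
timings = {
    "M2 CPU": 85.02,
    "M2 MPS": 8.64,
    "3700X CPU": 41.05,
    "1080Ti CUDA": 2.78,
}

mps_speedup = timings["M2 CPU"] / timings["M2 MPS"]           # MPS vs M2's own CPU
cuda_speedup = timings["3700X CPU"] / timings["1080Ti CUDA"]  # CUDA vs desktop CPU
gap = timings["M2 MPS"] / timings["1080Ti CUDA"]              # 1080 Ti lead over MPS

print(f"MPS vs M2 CPU:     {mps_speedup:.1f}x")   # ~9.8x
print(f"CUDA vs 3700X CPU: {cuda_speedup:.1f}x")  # ~14.8x
print(f"1080 Ti vs M2 MPS: {gap:.1f}x")           # ~3.1x
```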
The benchmark is far from thorough, but it gives a rough picture: a roughly 5 W chip holding its own against a roughly 350 W desktop is actually not bad at all.