The first task in face recognition is locating the faces in the frame, which is the job of face detection. Face detection is broadly a special case of object detection, but it has characteristics of its own: head poses vary widely, expressions change, and all kinds of occlusion can occur. Early classical methods such as Haar cascades and HOG essentially combine a hand-crafted feature descriptor with a sliding window and a classifier. After AlexNet appeared in 2012, deep learning gradually took over this field. Advances in algorithms and hardware have given deep-learning-based face recognition not only a large jump in accuracy but also real-time speed, which is what finally made face technology practical.
Face detection has largely followed the evolution of general object detection, though it also has some methods of its own. The main approaches fall into the following categories.
Overview of Face Detection Algorithms
Since this series does not focus on algorithmic details, some methods are only mentioned in passing; read the original papers if you want the specifics.
Cascade-CNN Based Models
These methods raise accuracy step by step by cascading several networks; the most representative is MTCNN. MTCNN cascades three networks, PNet, RNet, and ONet, filtering candidates stage by stage to improve the overall detection accuracy. The approach suits CPUs well and was widely used on embedded devices at the time, but training is relatively cumbersome because three networks have to be trained.
R-CNN Based Models
This line of work derives from the R-CNN family in object detection: R-CNN, Fast R-CNN, and Faster R-CNN.
These methods offer high accuracy but are relatively slow.
Single Shot Detection Models
SSD is a representative algorithm in object detection. Compared with the R-CNN family it is a one-stage method and therefore faster. SSH is the representative face detector built on this idea.
Feature Pyramid Network Based Models
These methods fuse features across a pyramid to handle faces at very different scales; RetinaFace is a representative example.
The YOLO Family
The YOLO family has been very successful in object detection, so it is naturally applied to face detection as well, for example tiny-yolo-face, yolov5-face, and yolov8-face; essentially every generation has had a face-detection variant.
Choosing an Open-Source Model
為了能夠達(dá)到實(shí)時(shí),同時(shí)也要有較好的效果,我們將目光鎖定在yolo系列上,yolo在精度和速度的平衡上做的比較好,也比較易用。目前最新的是yolov8, 經(jīng)過(guò)搜索,也已經(jīng)有人將其用在人臉檢測(cè)上了:derronqi/yolov8-face: yolov8 face detection with landmark (github.com),
Choosing an Inference Framework
For simplicity we choose onnxruntime. It supports both CPU and GPU, which basically covers our development needs.
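A quick sanity check before committing to a device is to list the execution providers the installed onnxruntime build actually exposes (the output depends on whether the CPU-only or the CUDA wheel is installed):

import onnxruntime

# The CPU-only build comes from `pip install onnxruntime`,
# the CUDA-enabled build from `pip install onnxruntime-gpu`.
print(onnxruntime.get_available_providers())
# e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider'], or just ['CPUExecutionProvider']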
Using yolov8-face
為了減少重復(fù)工作,我們可以定義一個(gè)模型的基類(lèi), 對(duì)模型載入、推理的操作進(jìn)行封裝,這樣就不需要每個(gè)模型都實(shí)現(xiàn)一遍了:
from easydict import EasyDict as edict
import onnxruntime
import threading


class BaseModel:
    def __init__(self, model_path, device='cpu', **kwargs) -> None:
        self.model = self.load_model(model_path, device)
        self.input_layer = self.model.get_inputs()[0].name
        self.output_layers = [output.name for output in self.model.get_outputs()]
        self.lock = threading.Lock()

    def load_model(self, model_path: str, device: str = 'cpu'):
        available_providers = onnxruntime.get_available_providers()
        if device == "gpu" and "CUDAExecutionProvider" not in available_providers:
            print("CUDAExecutionProvider is not available, use CPUExecutionProvider instead")
            device = "cpu"
        if device == 'cpu':
            self.model = onnxruntime.InferenceSession(model_path, providers=['CPUExecutionProvider'])
        else:
            self.model = onnxruntime.InferenceSession(model_path, providers=['CUDAExecutionProvider'])
        return self.model

    def inference(self, input):
        with self.lock:
            outputs = self.model.run(self.output_layers, {self.input_layer: input})
        return outputs

    def preprocess(self, **kwargs):
        pass

    def postprocess(self, **kwargs):
        pass

    def run(self, **kwargs):
        pass
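Two small design points worth noting: the input and output tensor names are resolved once at load time, so inference() can be called with a bare numpy array, and inference() takes a lock so that a single model instance can be shared between threads (for example a capture thread and a processing thread) without interleaved run() calls.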
Inheriting from BaseModel, we implement the model-specific pre- and post-processing:
import math

import cv2
import numpy as np
from scipy.special import softmax  # softmax over the last axis, used to decode the DFL box distribution


class Yolov8Face(BaseModel):
    def __init__(self, model_path, device='cpu', **kwargs) -> None:
        super().__init__(model_path, device, **kwargs)
        self.conf_threshold = kwargs.get('conf_threshold', 0.5)
        self.iou_threshold = kwargs.get('iou_threshold', 0.4)
        self.input_size = kwargs.get('input_size', 640)
        self.input_width, self.input_height = self.input_size, self.input_size
        self.reg_max = 16
        self.project = np.arange(self.reg_max)
        self.strides = [8, 16, 32]
        self.feats_hw = [(math.ceil(self.input_height / self.strides[i]), math.ceil(self.input_width / self.strides[i]))
                         for i in range(len(self.strides))]
        self.anchors = self.make_anchors(self.feats_hw)

    def make_anchors(self, feats_hw, grid_cell_offset=0.5):
        """Generate anchor points (grid-cell centers) for each stride."""
        anchor_points = {}
        for i, stride in enumerate(self.strides):
            h, w = feats_hw[i]
            x = np.arange(0, w) + grid_cell_offset  # shift x
            y = np.arange(0, h) + grid_cell_offset  # shift y
            sx, sy = np.meshgrid(x, y)
            anchor_points[stride] = np.stack((sx, sy), axis=-1).reshape(-1, 2)
        return anchor_points

    def preprocess(self, image, **kwargs):
        return resize_image(image, keep_ratio=True, dst_width=self.input_width, dst_height=self.input_height)

    def distance2bbox(self, points, distance, max_shape=None):
        x1 = points[:, 0] - distance[:, 0]
        y1 = points[:, 1] - distance[:, 1]
        x2 = points[:, 0] + distance[:, 2]
        y2 = points[:, 1] + distance[:, 3]
        if max_shape is not None:
            x1 = np.clip(x1, 0, max_shape[1])
            y1 = np.clip(y1, 0, max_shape[0])
            x2 = np.clip(x2, 0, max_shape[1])
            y2 = np.clip(y2, 0, max_shape[0])
        return np.stack([x1, y1, x2, y2], axis=-1)

    def postprocess(self, preds, scale_h, scale_w, top, left, **kwargs):
        bboxes, scores, landmarks = [], [], []
        for i, pred in enumerate(preds):
            stride = int(self.input_height / pred.shape[2])
            pred = pred.transpose((0, 2, 3, 1))
            box = pred[..., :self.reg_max * 4]
            cls = 1 / (1 + np.exp(-pred[..., self.reg_max * 4:-15])).reshape((-1, 1))
            kpts = pred[..., -15:].reshape((-1, 15))  # x1,y1,score1, ..., x5,y5,score5
            tmp = box.reshape(-1, 4, self.reg_max)
            bbox_pred = softmax(tmp, axis=-1)
            bbox_pred = np.dot(bbox_pred, self.project).reshape((-1, 4))
            bbox = self.distance2bbox(self.anchors[stride], bbox_pred,
                                      max_shape=(self.input_height, self.input_width)) * stride
            kpts[:, 0::3] = (kpts[:, 0::3] * 2.0 + (self.anchors[stride][:, 0].reshape((-1, 1)) - 0.5)) * stride
            kpts[:, 1::3] = (kpts[:, 1::3] * 2.0 + (self.anchors[stride][:, 1].reshape((-1, 1)) - 0.5)) * stride
            kpts[:, 2::3] = 1 / (1 + np.exp(-kpts[:, 2::3]))
            bbox -= np.array([[left, top, left, top]])            # undo the letterbox padding (broadcast)
            bbox *= np.array([[scale_w, scale_h, scale_w, scale_h]])  # map back to the original image scale
            kpts -= np.tile(np.array([left, top, 0]), 5).reshape((1, 15))
            kpts *= np.tile(np.array([scale_w, scale_h, 1]), 5).reshape((1, 15))
            bboxes.append(bbox)
            scores.append(cls)
            landmarks.append(kpts)
        bboxes = np.concatenate(bboxes, axis=0)
        scores = np.concatenate(scores, axis=0)
        landmarks = np.concatenate(landmarks, axis=0)
        bboxes_wh = bboxes.copy()
        bboxes_wh[:, 2:4] = bboxes[:, 2:4] - bboxes[:, 0:2]  # xyxy -> xywh for NMSBoxes
        classIds = np.argmax(scores, axis=1)
        confidences = np.max(scores, axis=1)  # max class confidence
        mask = confidences > self.conf_threshold  # keep only detections above the confidence threshold
        bboxes_wh = bboxes_wh[mask]
        confidences = confidences[mask]
        classIds = classIds[mask]
        landmarks = landmarks[mask]
        if len(bboxes_wh) == 0:
            return np.empty((0, 6)), np.empty((0, 5, 2))
        indices = cv2.dnn.NMSBoxes(bboxes_wh.tolist(), confidences.tolist(), self.conf_threshold,
                                   self.iou_threshold).flatten()
        if len(indices) > 0:
            mlvl_bboxes = bboxes_wh[indices]
            confidences = confidences[indices]
            classIds = classIds[indices]
            # convert boxes back to x1,y1,x2,y2
            mlvl_bboxes[:, 2:4] = mlvl_bboxes[:, 2:4] + mlvl_bboxes[:, 0:2]
            # concat box, confidence, classId
            mlvl_bboxes = np.concatenate((mlvl_bboxes, confidences.reshape(-1, 1), classIds.reshape(-1, 1)), axis=1)
            landmarks = landmarks[indices]
            return mlvl_bboxes, landmarks.reshape(-1, 5, 3)[..., :2]
        else:
            return np.empty((0, 6)), np.empty((0, 5, 2))

    def run(self, image, **kwargs):
        img, newh, neww, top, left = self.preprocess(image)
        scale_h, scale_w = image.shape[0] / newh, image.shape[1] / neww
        # convert to RGB and normalize to [0, 1]
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = img.astype(np.float32)
        img = img / 255.0
        img = np.transpose(img, (2, 0, 1))   # HWC -> CHW
        img = np.expand_dims(img, axis=0)    # add batch dimension
        output = self.inference(img)
        bboxes, landmarks = self.postprocess(output, scale_h, scale_w, top, left)
        # clamp boxes to the image bounds
        bboxes[:, 0] = np.clip(bboxes[:, 0], 0, image.shape[1])
        bboxes[:, 1] = np.clip(bboxes[:, 1], 0, image.shape[0])
        return bboxes, landmarks
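Note that preprocess() calls a resize_image helper that is not shown in this post. Based on how its return values are used in run() and postprocess(), it is assumed to do a letterbox-style resize: keep the aspect ratio, pad to the square input size, and report the new size plus the padding offsets. A minimal sketch under that assumption (the helper in the original repository may differ):

import cv2
import numpy as np

def resize_image(image, keep_ratio=True, dst_width=640, dst_height=640):
    """Assumed letterbox resize: scale into dst_width x dst_height, pad the borders,
    and return (padded image, new height, new width, top offset, left offset)."""
    if keep_ratio and image.shape[0] != image.shape[1]:
        hw_scale = image.shape[0] / image.shape[1]
        if hw_scale > 1:  # tall image: fill the height, pad left/right
            newh, neww = dst_height, int(dst_width / hw_scale)
            img = cv2.resize(image, (neww, newh), interpolation=cv2.INTER_AREA)
            left = int((dst_width - neww) * 0.5)
            img = cv2.copyMakeBorder(img, 0, 0, left, dst_width - neww - left,
                                     cv2.BORDER_CONSTANT, value=(0, 0, 0))
            top = 0
        else:             # wide image: fill the width, pad top/bottom
            newh, neww = int(dst_height * hw_scale), dst_width
            img = cv2.resize(image, (neww, newh), interpolation=cv2.INTER_AREA)
            top = int((dst_height - newh) * 0.5)
            img = cv2.copyMakeBorder(img, top, dst_height - newh - top, 0, 0,
                                     cv2.BORDER_CONSTANT, value=(0, 0, 0))
            left = 0
    else:
        img = cv2.resize(image, (dst_width, dst_height), interpolation=cv2.INTER_AREA)
        newh, neww, top, left = dst_height, dst_width, 0, 0
    return img, newh, neww, top, left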
Testing
On an Intel Core i5-10210U, yolov8-lite-t takes about 50 ms per frame, which is basically fast enough for real-time use.
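For reference, the timing was measured with a small driver script along these lines (the weight file name and test image are placeholders; the yolov8-lite-t ONNX weights come from the yolov8-face repository linked earlier):

import time
import cv2

detector = Yolov8Face('yolov8-lite-t.onnx', device='cpu')  # assumed local path to the exported weights
image = cv2.imread('test.jpg')

start = time.time()
bboxes, landmarks = detector.run(image)
print(f'{len(bboxes)} face(s) in {(time.time() - start) * 1000:.1f} ms')

# Each bbox row is (x1, y1, x2, y2, confidence, class_id); each face has 5 (x, y) landmarks.
for box, pts in zip(bboxes, landmarks):
    x1, y1, x2, y2 = box[:4].astype(int)
    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
    for x, y in pts.astype(int):
        cv2.circle(image, (x, y), 2, (0, 0, 255), -1)
cv2.imwrite('result.jpg', image)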
References:
Zou, Zhengxia, et al. "Object Detection in 20 Years: A Survey." Proceedings of the IEEE, 2023.
Minaee, Shervin, et al. "Going Deeper Into Face Detection: A Survey." arXiv preprint arXiv:2103.14983, 2021.