Object Detection with RetinaNet Using PyTorch and Torchvision

Republished by Plato


Introduction

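Object detection is a large field in computer vision, and one of the more important applications of computer vision "in the wild". On one end, it can be used to build autonomous systems that navigate agents through environments, be it robots performing tasks or self-driving cars, but this requires intersection with other fields. However, anomaly detection (such as defective products on a production line), locating objects within images, facial detection and various other applications of object detection can be done without intersecting other fields.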

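Object detection isn't as standardized as image classification, mainly because most new developments are done by individual researchers, maintainers and developers rather than by large libraries and frameworks. It's difficult to package the necessary utility scripts into a framework like TensorFlow or PyTorch while keeping to the API guidelines that have guided its development so far.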

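This makes object detection somewhat more complex, typically more verbose (but not always), and less approachable than image classification. One of the major benefits of working within an ecosystem is that it spares you from searching for useful information on good practices, tools and approaches to use. With object detection, most people have to research the landscape of the field much more before they get a good grip on it.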

Object Detection with PyTorch/TorchVision's RetinaNet

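torchvision is PyTorch's computer vision project. It aims to make the development of PyTorch-based CV models easier by providing transformation and augmentation scripts, a model zoo with pre-trained weights, datasets, and utilities that are useful to practitioners.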

While still in beta and very much experimental, torchvision offers a relatively simple object detection API with a few models to choose from:

Faster R-CNN
RetinaNet
FCOS (Fully Convolutional One-Stage object detector)
SSD (VGG16 backbone... yikes)
SSDLite (MobileNetV3 backbone)

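While the API isn't as polished or simple as some third-party APIs, it's a very decent starting point for those who'd still prefer the safety of an ecosystem they're familiar with. Before going forward, make sure you have PyTorch and Torchvision installed: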

$ pip install torch torchvision
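The weights= argument and the *_v2 detection builders used below were introduced in torchvision 0.13, so an older install fails at import time. As a quick sanity check, here is a minimal sketch of a hypothetical helper, supports_weights_api() (the name is ours, not part of torchvision), that parses a version string such as torchvision.__version__ and compares it against 0.13.

```python
import re


def supports_weights_api(version: str) -> bool:
    """Return True if a torchvision version string is at least 0.13.

    Hypothetical helper: 0.13 is the release that introduced the `weights=`
    argument and the *_v2 detection builders used in this guide.
    """
    # Only the leading "major.minor" is parsed, so suffixes like "+cu118" are ignored
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (0, 13)
```

After import torchvision, calling supports_weights_api(torchvision.__version__) should return True before you continue.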

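Let's import some utility functions, such as read_image(), draw_bounding_boxes() and to_pil_image(), to make it easier to read, draw on and output images, followed by importing RetinaNet and its pre-trained weights (MS COCO):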

from torchvision.io.image import read_image
from torchvision.utils import draw_bounding_boxes
from torchvision.transforms.functional import to_pil_image
from torchvision.models.detection import retinanet_resnet50_fpn_v2, RetinaNet_ResNet50_FPN_V2_Weights

import matplotlib.pyplot as plt

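RetinaNet uses a ResNet50 backbone with a Feature Pyramid Network (FPN) on top of it. While the name of the class is verbose, it spells out the architecture. Let's fetch an image using the requests library and save it as a file on our local drive: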

import requests

response = requests.get('https://i.ytimg.com/vi/q71MCWAEfL8/maxresdefault.jpg')
response.raise_for_status()  # fail early if the download didn't succeed
with open("obj_det.jpeg", "wb") as f:
    f.write(response.content)

img = read_image("obj_det.jpeg")

With an image in place, we can instantiate our model and weights:

weights = RetinaNet_ResNet50_FPN_V2_Weights.DEFAULT
model = retinanet_resnet50_fpn_v2(weights=weights, score_thresh=0.35)

model.eval()

preprocess = weights.transforms()

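The score_thresh argument defines the threshold above which a detection is kept as an object of a given class. Intuitively, it's a confidence threshold: if the model is less than 35% confident that an object belongs to a class, we don't classify it as that class.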
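Because score_thresh is fixed when the model is built, it can be handy to try stricter thresholds on an existing prediction without rebuilding the model. Below is a minimal sketch, assuming a prediction dictionary with parallel "boxes", "scores" and "labels" entries (the keys torchvision's detection models return). filter_by_score() is a hypothetical helper, and it can only raise the threshold, since detections below 0.35 were already discarded by the model.

```python
def filter_by_score(prediction, threshold):
    """Keep only detections whose score is above `threshold`.

    `prediction` maps "boxes", "scores" and "labels" to parallel sequences,
    one entry per detection. Returns a new dict of plain lists.
    """
    keep = [i for i, score in enumerate(prediction["scores"]) if score > threshold]
    return {
        key: [prediction[key][i] for i in keep]
        for key in ("boxes", "scores", "labels")
    }


# Toy prediction with three detections (made-up numbers for illustration)
toy = {
    "boxes": [[10, 10, 50, 80], [60, 20, 90, 70], [5, 5, 15, 15]],
    "scores": [0.92, 0.48, 0.37],
    "labels": [1, 3, 1],
}
print(filter_by_score(toy, 0.5))
# {'boxes': [[10, 10, 50, 80]], 'scores': [0.92], 'labels': [1]}
```

On real tensors the idiomatic equivalent is a boolean mask: keep = prediction["scores"] > 0.5, then prediction["boxes"][keep] and prediction["labels"][keep], which keeps everything as tensors ready for draw_bounding_boxes().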

Let's preprocess the image using the transforms that come with our weights, create a batch and run inference:

import torch

batch = [preprocess(img)]
with torch.no_grad():  # inference only, so skip gradient tracking
    prediction = model(batch)[0]

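That's it, our prediction dictionary holds the inferred object classes and locations! In this form, though, the results aren't very useful to us. We'll want to map the predicted label indices to the category names stored in the weights' metadata and draw the bounding boxes, which can be done with draw_bounding_boxes():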

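labels = [weights.meta["categories"][i] for i in prediction["labels"]]

# `font` is a TrueType font name or path that PIL must be able to find;
# change it if 'Arial' isn't installed (common on Linux)
box = draw_bounding_boxes(img, boxes=prediction["boxes"],
                          labels=labels,
                          colors="cyan",
                          width=2,
                          font_size=30,
                          font='Arial')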

im = to_pil_image(box.detach())

fig, ax = plt.subplots(figsize=(16, 12))
ax.imshow(im)
plt.show()

Which results in:

RetinaNet actually detected the person peeking from behind the car! That's a pretty difficult classification.
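The boxes above are labelled with class names only. A small variation, sketched below under the assumption that you want the confidence shown next to each name, builds captions in the form "name score" from the same prediction. make_captions() is a hypothetical helper; int() and float() let it accept either plain Python numbers or the 0-dimensional tensors you get when iterating over prediction["labels"] and prediction["scores"].

```python
def make_captions(categories, labels, scores, decimals=2):
    """Pair each predicted class name with its confidence score.

    `categories` is the index-to-name list (e.g. weights.meta["categories"]),
    while `labels` and `scores` are parallel sequences, one entry per detection.
    """
    return [
        f"{categories[int(label)]} {float(score):.{decimals}f}"
        for label, score in zip(labels, scores)
    ]


# Toy categories and detections for illustration only
print(make_captions(["__background__", "person", "bicycle", "car"], [1, 3], [0.871, 0.4]))
# ['person 0.87', 'car 0.40']
```

The result of make_captions(weights.meta["categories"], prediction["labels"], prediction["scores"]) can be passed to draw_bounding_boxes() as labels= in place of the plain list of names.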


You can switch RetinaNet out for FCOS (Fully Convolutional One-Stage object detector) by replacing retinanet_resnet50_fpn_v2 with fcos_resnet50_fpn and using the FCOS_ResNet50_FPN_Weights weights:

from torchvision.io.image import read_image
from torchvision.utils import draw_bounding_boxes
from torchvision.transforms.functional import to_pil_image
from torchvision.models.detection import fcos_resnet50_fpn, FCOS_ResNet50_FPN_Weights

import matplotlib.pyplot as plt
import requests
import torch

response = requests.get('https://i.ytimg.com/vi/q71MCWAEfL8/maxresdefault.jpg')
response.raise_for_status()  # fail early if the download didn't succeed
with open("obj_det.jpeg", "wb") as f:
    f.write(response.content)

img = read_image("obj_det.jpeg")
weights = FCOS_ResNet50_FPN_Weights.DEFAULT
model = fcos_resnet50_fpn(weights=weights, score_thresh=0.35)
model.eval()

preprocess = weights.transforms()
batch = [preprocess(img)]
with torch.no_grad():  # inference only, so skip gradient tracking
    prediction = model(batch)[0]

labels = [weights.meta["categories"][i] for i in prediction["labels"]]

box = draw_bounding_boxes(img, boxes=prediction["boxes"],
                          labels=labels,
                          colors="cyan",
                          width=2, 
                          font_size=30,
                          font='Arial')

im = to_pil_image(box.detach())

fig, ax = plt.subplots(figsize=(16, 12))
ax.imshow(im)
plt.show()
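The FCOS swap follows a general pattern: each detector in torchvision.models.detection comes as a builder function plus a matching weights enum, so only those two names change. Below is a minimal sketch of a hypothetical load_detector() helper covering the models listed at the start of this guide. The builder and enum names are the ones exposed by torchvision 0.13 and later (Faster R-CNN here means the ResNet50 FPN v2 variant), and torchvision is only imported when the function is called. Keyword arguments are passed straight through, and not every builder names them the same way: RetinaNet, FCOS and SSD accept score_thresh, while Faster R-CNN uses box_score_thresh.

```python
import importlib

# Builder function and weights enum for each detector, as exposed by
# torchvision.models.detection (torchvision >= 0.13).
DETECTORS = {
    "Faster R-CNN": ("fasterrcnn_resnet50_fpn_v2", "FasterRCNN_ResNet50_FPN_V2_Weights"),
    "RetinaNet": ("retinanet_resnet50_fpn_v2", "RetinaNet_ResNet50_FPN_V2_Weights"),
    "FCOS": ("fcos_resnet50_fpn", "FCOS_ResNet50_FPN_Weights"),
    "SSD": ("ssd300_vgg16", "SSD300_VGG16_Weights"),
    "SSDLite": ("ssdlite320_mobilenet_v3_large", "SSDLite320_MobileNet_V3_Large_Weights"),
}


def load_detector(name, **kwargs):
    """Build a pre-trained detector in eval mode and return (model, weights).

    Hypothetical helper: `name` is a key of DETECTORS, and extra keyword
    arguments (e.g. score_thresh) are forwarded to the builder unchanged.
    """
    builder_name, weights_name = DETECTORS[name]  # KeyError for unknown names
    # Import lazily so the table above can be used without torchvision installed
    detection = importlib.import_module("torchvision.models.detection")
    weights = getattr(detection, weights_name).DEFAULT
    model = getattr(detection, builder_name)(weights=weights, **kwargs)
    model.eval()
    return model, weights
```

With this in place, model, weights = load_detector("FCOS", score_thresh=0.35) reproduces the model above, and the rest of the pipeline (weights.transforms(), weights.meta["categories"], draw_bounding_boxes()) stays the same.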