
Object Detection Inference in Python with YOLOv5 and PyTorch

Introduction

Object detection is a large field in computer vision, and one of the more important applications of computer vision “in the wild”. On one end, it can be used to build autonomous systems that navigate agents through environments – be it robots performing tasks or self-driving cars – though this requires intersection with other fields. However, anomaly detection (such as detecting defective products on a line), locating objects within images, facial detection and various other applications of object detection can be done without intersecting other fields.

Object detection isn’t as standardized as image classification, mainly because most of the new developments are typically done by individual researchers, maintainers and developers, rather than large libraries and frameworks. It’s difficult to package the necessary utility scripts in a framework like TensorFlow or PyTorch while maintaining the API guidelines that have guided their development so far.

This makes object detection somewhat more complex, typically more verbose (but not always), and less approachable than image classification. One of the major benefits of being in an established ecosystem is that you don’t have to search for good practices, tools and approaches yourself. With object detection, most people have to do far more research on the landscape of the field to get a good grip.

Fortunately for the masses – Ultralytics has developed a simple, very powerful and beautiful object detection API around their YOLOv5 implementation.

In this short guide, we’ll be performing Object Detection in Python, with YOLOv5 built by Ultralytics in PyTorch, using a set of pre-trained weights trained on MS COCO.

YOLOv5

YOLO (You Only Look Once) is a methodology, as well as a family of models, built for object detection. Since its inception in 2015, YOLOv1, YOLOv2 (YOLO9000) and YOLOv3 have been proposed by the same author(s), and the deep learning community has continued with open-source advancements in the years since.

Ultralytics’ YOLOv5 is the first large-scale implementation of YOLO in PyTorch, which made it more accessible than ever before, but the main reason YOLOv5 has gained such a foothold is the beautifully simple and powerful API built around it. The project abstracts away unnecessary details while allowing customizability, supports practically all usable export formats, and employs amazing practices that make the entire project both efficient and as optimal as it can be. Truly, it’s an example of the beauty of open-source software, and of how it powers the world we live in.

The project provides pre-trained weights on MS COCO, a staple dataset on objects in context, which can be used to both benchmark and build general object detection systems – but most importantly, can be used to transfer general knowledge of objects in context to custom datasets.

Object Detection with YOLOv5

Before moving forward, make sure you have torch and torchvision installed:

! python -m pip install torch torchvision

YOLOv5’s got detailed, no-nonsense documentation and a beautifully simple API, as shown on the repo itself, and in the following example:

import torch
import matplotlib.pyplot as plt

# Load a small COCO-pretrained YOLOv5 model from PyTorch Hub
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Run inference on an image (a URL, local path, PIL image or NumPy array all work)
img = 'https://i.ytimg.com/vi/q71MCWAEfL8/maxresdefault.jpg'
results = model(img)

# Draw the detections onto the image and display it
fig, ax = plt.subplots(figsize=(16, 12))
ax.imshow(results.render()[0])
plt.show()

The second argument of the hub.load() method specifies the weights we’d like to use. By choosing any variant from yolov5n to yolov5l6, we’re loading in MS COCO pre-trained weights. For custom models:

model = torch.hub.load('ultralytics/yolov5', 'custom', path='path_to_weights.pt')
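
Similarly, to pick a different COCO-pretrained variant and tweak the inference-time thresholds the hub model exposes, something like the following sketch works (the threshold values here are arbitrary examples):

import torch

# Load a mid-sized COCO-pretrained variant instead of yolov5s
model = torch.hub.load('ultralytics/yolov5', 'yolov5m')

# Optional inference settings exposed on the hub model
model.conf = 0.4   # confidence threshold for NMS
model.iou = 0.45   # IoU threshold for NMS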

In any case – once you pass the input through the model, the returned object includes helpful methods to interpret the results. We’ve chosen to render() them, which returns a NumPy array that we can chuck into an imshow() call. This results in a nicely formatted image with bounding boxes and class labels drawn in.
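
If you’d rather work with the raw detections than a rendered image, the results object can also be exported as a pandas DataFrame. A minimal sketch (the 0.5 confidence cutoff is just an illustrative choice):

# One DataFrame per input image, with columns:
# xmin, ymin, xmax, ymax, confidence, class, name
df = results.pandas().xyxy[0]
print(df.head())

# For instance, keep only reasonably confident detections
confident = df[df['confidence'] > 0.5]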

Saving Results as Files

You can save the results of the inference as a file, using the results.save() method:

results.save(save_dir='results')

This will create a new directory if it isn’t already present, and save the same image we’ve just plotted as a file.

Cropping Out Objects

You can also decide to crop out the detected objects as individual files. In our case, for every label detected, a number of images can be extracted. This is easily achieved via the results.crop() method, which creates a runs/detect/ directory, with expN/crops (where N increases for each run), in which a directory with cropped images is made for each label:

results.crop()
Saved 1 image to runs/detect/exp2
Saved results to runs/detect/exp2

[{'box': [tensor(295.09409),
   tensor(277.03699),
   tensor(514.16113),
   tensor(494.83691)],
  'conf': tensor(0.25112),
  'cls': tensor(0.),
  'label': 'person 0.25',
  'im': array([[[167, 186, 165],
          [174, 184, 167],
          [173, 184, 164],
          ...
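
If you’d rather filter the crops programmatically than browse directories, crop() also accepts save=False and returns the list of detection dictionaries shown above. A minimal sketch (the 'person' filter is just an illustrative choice):

# Extract crops in memory instead of writing them to disk
crops = results.crop(save=False)

# Keep only the crops whose label starts with 'person'
person_crops = [c['im'] for c in crops if c['label'].startswith('person')]
print(f'Extracted {len(person_crops)} person crops')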

You can also verify the output file structure with:

! ls runs/detect/exp2/crops

Object Counting

By default, when you perform detection or print the results object – you’ll get the number of images that inference was performed on for that results object (YOLOv5 works with batches of images as well), its resolution and the count of each label detected:

print(results)

This results in:

image 1/1: 720x1280 14 persons, 1 car, 3 buss, 6 traffic lights, 1 backpack, 1 umbrella, 1 handbag
Speed: 35.0ms pre-process, 256.2ms inference, 0.7ms NMS per image at shape (1, 3, 384, 640)
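
The printed summary is just text, though. If you want these counts programmatically, one option is to aggregate the pandas export by class name; a minimal sketch:

# Count detections per class name for the first image in the batch
counts = results.pandas().xyxy[0]['name'].value_counts()
print(counts)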

Inference with Scripts

Alternatively, you can run the detection script, detect.py, by cloning the YOLOv5 repository:

$ git clone https://github.com/ultralytics/yolov5 
$ cd yolov5
$ pip install -r requirements.txt

And then running:

$ python detect.py --source img.jpg

Alternatively, you can provide a URL, a video file, a path to a directory with multiple files, a glob in a path to only match certain files, a YouTube link or any other HTTP stream. The results are saved into the runs/detect directory.
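
For reference, a few of these source types look like this (the file paths and the YouTube URL are placeholders):

$ python detect.py --source 0                       # webcam
$ python detect.py --source video.mp4               # video file
$ python detect.py --source path/to/images/         # directory
$ python detect.py --source 'path/*.jpg'            # glob pattern
$ python detect.py --source 'https://youtu.be/...'  # YouTube link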

Going Further – Practical Deep Learning for Computer Vision

Does your inquisitive nature make you want to go further? We recommend checking out our Course: “Practical Deep Learning for Computer Vision with Python”.


Another Computer Vision Course?

We won’t be doing classification of MNIST digits or MNIST fashion. They served their part a long time ago. Too many learning resources are focusing on basic datasets and basic architectures before letting advanced black-box architectures shoulder the burden of performance.

We want to focus on demystification, practicality, understanding, intuition and real projects. Want to learn how you can make a difference? We’ll take you on a ride from the way our brains process images to writing a research-grade deep learning classifier for breast cancer to deep learning networks that “hallucinate”, teaching you the principles and theory through practical work, and equipping you with the know-how and tools to become an expert at applying deep learning to solve computer vision problems.

What’s inside?

  • The first principles of vision and how computers can be taught to “see”
  • Different tasks and applications of computer vision
  • The tools of the trade that will make your work easier
  • Finding, creating and utilizing datasets for computer vision
  • The theory and application of Convolutional Neural Networks
  • Handling domain shift, co-occurrence, and other biases in datasets
  • Transfer Learning and utilizing others’ training time and computational resources for your benefit
  • Building and training a state-of-the-art breast cancer classifier
  • How to apply a healthy dose of skepticism to mainstream ideas and understand the implications of widely adopted techniques
  • Visualizing a ConvNet’s “concept space” using t-SNE and PCA
  • Case studies of how companies use computer vision techniques to achieve better results
  • Proper model evaluation, latent space visualization and identifying the model’s attention
  • Performing domain research, processing your own datasets and establishing model tests
  • Cutting-edge architectures, the progression of ideas, what makes them unique and how to implement them
  • KerasCV – a WIP library for creating state of the art pipelines and models
  • How to parse and read papers and implement them yourself
  • Selecting models depending on your application
  • Creating an end-to-end machine learning pipeline
  • Landscape and intuition on object detection with Faster R-CNNs, RetinaNets, SSDs and YOLO
  • Instance and semantic segmentation
  • Real-Time Object Recognition with YOLOv5
  • Training YOLOv5 Object Detectors
  • Working with Transformers using KerasNLP (industry-strength WIP library)
  • Integrating Transformers with ConvNets to generate captions of images
  • DeepDream

Conclusion

In this short guide, we’ve taken a look at how you can perform object detection with YOLOv5 built using PyTorch.
