Motivation and context

YOLO is a popular object detection algorithm that I want to experiment with and learn to use for both research and industrial applications. Computer vision is part of my domain and background, and not having tried out YOLO feels so wrong to me. Oddly, this reddit post spurred me to begin learning about it: "Help with YOLOv8: Incorrect Labels Displayed in Detection Boxes"

The key difference between YOLO and other object detectors is that it ==frames object detection as a regression problem rather than a classification problem==. Is there an advantage to this? I guess regression networks are simpler than classification networks, and if YOLO uses the former, that makes it lighter and faster than other models

  • YOLO performs a single pass over the image and predicts the bounding boxes and their associated probabilities (confidence %), making it quite efficient and fast
  • A single NN is used, which enables users to optimize it end-to-end. No more going down the NN rabbithole thanks to the YOLO model's simplicity
  • YOLO does a great job with generalization and does not throw false positives often
  • YOLO is compared to DPM (Deformable Parts Model) and R-CNN

  • R-CNN uses region proposals to generate bounding boxes on an image. Then, a classifier is run on each proposed region. After classification, post-processing is employed to eliminate duplicates (this is probably where NMS (Non-Max Suppression) is used?) and refine the results. It's a slow and complex pipeline, and one has to train the individual components separately. YOLO unifies everything into a single pipeline, making it easy to train
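Since NMS keeps coming up as the duplicate-elimination step, here is a minimal numpy sketch of greedy non-max suppression (my own toy illustration, not the paper's exact procedure; the `(x1, y1, x2, y2)` box format and the 0.5 overlap threshold are assumptions):

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all in (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    order = np.argsort(scores)[::-1]   # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        if rest.size == 0:
            break
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps < iou_thresh]  # suppress heavy overlaps
    return keep
```

Two heavily overlapping boxes collapse to the higher-scoring one, while a distant box survives untouched.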

  • [?] What is mAP? First guess: it is a mean parameter used to compare the performance of a DL model. mAP is the mean Average Precision: the average precision (area under the precision-recall curve) is computed per class, then averaged over all classes
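To make that definition concrete, a toy sketch of (non-interpolated) average precision and mAP, assuming detections have already been matched to ground truths; the IoU-threshold matching step itself is omitted, and the function names are my own:

```python
import numpy as np

def average_precision(tp_flags, n_gt):
    """AP for one class.

    tp_flags: detections sorted by descending confidence; 1 means it matched
    a ground-truth box, 0 means false positive. n_gt: number of ground truths.
    """
    tp_flags = np.asarray(tp_flags, dtype=float)
    tp = np.cumsum(tp_flags)                 # running true positives
    fp = np.cumsum(1 - tp_flags)             # running false positives
    precision = tp / (tp + fp)
    # Non-interpolated AP: average the precision at each recall step
    return float(np.sum(precision * tp_flags) / n_gt)

def mean_average_precision(per_class_tp, per_class_gt):
    """mAP: the mean of per-class APs."""
    return float(np.mean([average_precision(t, g)
                          for t, g in zip(per_class_tp, per_class_gt)]))
```

So a detector is rewarded for ranking its true positives ahead of its false positives, class by class.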

YOLO sees the entire image at once rather than focusing on localized parts of it. This allows it to use the surrounding context of objects in the image to reduce background errors. However, it lags behind other detectors in accuracy and localization

  • The image is divided into an S x S grid (S = number of rows and columns). This is the residual blocks process
    • Each cell in the grid predicts its bounding boxes and their associated confidence scores. After the single pass, duplicate and low-confidence boxes are filtered out (via NMS) to reveal the final BBs. The confidence target for a BB is P(Object) * IOU (Intersection Over Union) between the predicted box and the ground truth, making YOLO a supervised learning algorithm
    • A cell can only identify a single class, no matter how many boxes it predicts. Each BB consists of 5 predictions
      • (x, y) - The coordinates of the center of the BB w.r.t the bounds of the cell
      • (w, h) - The width and height of the BB w.r.t the whole image
      • c - The confidence, i.e. P(Object) * IOU (which reduces to the IOU when an object is present)
    • In the end, each grid cell produces a class-specific confidence score for each box (P(Class) * IOU)
    • The predictions are encoded as an S * S * (B * 5 + C) tensor
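The S * S * (B * 5 + C) encoding can be made concrete with a small sketch. Using the paper's PASCAL VOC values (S = 7, B = 2, C = 20), this decodes one grid cell's vector into its boxes and class-specific scores; the exact layout (box predictions first, class probabilities last) is a common implementation convention, assumed here for illustration:

```python
import numpy as np

S, B, C = 7, 2, 20                     # paper's PASCAL VOC setup
preds = np.random.rand(S, S, B * 5 + C)  # stand-in for the network's output

def decode_cell(cell, B, C):
    """Split one grid cell's vector into B boxes and class-specific scores."""
    boxes = cell[:B * 5].reshape(B, 5)   # each row: x, y, w, h, confidence
    class_probs = cell[B * 5:]           # P(Class_i | Object), length C
    conf = boxes[:, 4]                   # P(Object) * IOU, one per box
    # class-specific confidence per box: P(Class_i) * IOU
    class_scores = conf[:, None] * class_probs[None, :]   # shape (B, C)
    return boxes[:, :4], class_scores

xywh, scores = decode_cell(preds[3, 3], B, C)   # decode the center cell
```

Running this over all S * S cells and then applying NMS over the class scores is the whole inference path — one forward pass, pure regression.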

YOLO uses a CNN to extract features and fully connected layers to perform the predictions. Its architecture is inspired by GoogLeNet

  • [?] What is an inception module? GoogLeNet uses these but YOLO does not

YOLO (the original one) has a few limitations though

  • As it imposes strong spatial constraints on the bounding box predictions (each cell predicts only B boxes and one class), it struggles to detect small objects that appear in groups, such as flocks of birds
  • Incorrect localizations, which are its main source of error

Note to-do

  • Compile the pros and cons into a neat table
  • Write in my own words. Generate personal knowledge and useful thoughts
  • Codify the algorithm into an intuitive flowchart for visual understanding
  • If required, split the note into separate notes of different but connected topics to encourage atomicity

External links