How YOLOv2 works in detail

wsh · Nov 24, 2018

Introduction

The other day, I trained YOLOv2 on my custom objects to see how fast it runs. It was very impressive: it ran at over 40 fps even on my modest desktop computer (i5-7500 @ 3.4 GHz, GTX 1060 6GB).

According to the YOLOv2 paper, it is both more accurate and faster than the previous version (YOLO). This is because YOLOv2 uses techniques that YOLO didn't, such as batch normalization and anchor boxes.

Batch normalization (BN) normalizes the outputs of hidden layers, which makes learning much faster. Anchor boxes are assumptions about the shapes of the bounding boxes. Since the shapes of the objects we're trying to detect do not vary that much, we don't have to consider boxes that don't look like any object we want to detect. Say we want to detect humans: the anchor boxes would usually be vertical rectangles, and it's unlikely they would be squares or horizontal rectangles, so we don't need to search over such boxes. This makes prediction much faster.
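To make the anchor idea concrete: YOLOv2 does not predict box sizes from scratch. It predicts offsets relative to an anchor prior, as described in the paper. Here is a minimal NumPy sketch of that decoding step; the anchor values below are placeholders for illustration (in practice darkflow reads the real priors from the model's .cfg file):

```python
import numpy as np

# Hypothetical anchor priors (width, height) in grid-cell units.
# Real values come from the network's .cfg file.
anchors = [(1.0, 2.5), (3.4, 4.4), (6.6, 11.4), (9.4, 5.1), (16.6, 10.5)]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Decode one raw prediction (tx, ty, tw, th) into a box,
    following the YOLOv2 formulation: the center is offset from
    the cell's top-left corner (cx, cy), and the size rescales
    the anchor prior (pw, ph)."""
    bx = cx + sigmoid(tx)   # center x, in grid-cell units
    by = cy + sigmoid(ty)   # center y
    bw = pw * np.exp(tw)    # width = prior width * exp(prediction)
    bh = ph * np.exp(th)    # height = prior height * exp(prediction)
    return bx, by, bw, bh

# Example: a prediction in cell (7, 7) using a tall, human-like prior.
pw, ph = anchors[0]
print(decode_box(0.2, -0.1, 0.0, 0.0, cx=7, cy=7, pw=pw, ph=ph))
```

Because the network only nudges a sensibly shaped prior instead of inventing a box shape from nothing, most of its predictions already resemble plausible objects.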

This post heavily depends on darkflow (GitHub), an object detection API based on YOLO. I used this API to train a YOLOv2 network on my custom objects (training data). If you are new to YOLO, I recommend trying to train your own network with this API.
This API's prediction runs very fast, at least on my computer, because some time-consuming parts of the program are written in C or Cython. For example, drawing rectangles on an image is done in C.
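For reference, running prediction from Python with darkflow looks roughly like this; the file paths and threshold below are assumptions, so point them at your own config, weights, and image:

```python
import cv2
from darkflow.net.build import TFNet

# Placeholder paths -- replace with your own cfg/weights files.
options = {"model": "cfg/yolov2.cfg",
           "load": "bin/yolov2.weights",
           "threshold": 0.3}

tfnet = TFNet(options)

img = cv2.imread("sample_img/sample_dog.jpg")
# Returns a list of dicts with the label, confidence,
# and the box's top-left / bottom-right corners.
result = tfnet.return_predict(img)
print(result)
```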
