Train your own Object Detection easily

Jonykoren
4 min read · Aug 31, 2020

Hello, and welcome to this simple implementation tutorial on how to train your own object detection model on a custom dataset, using YOLOv3 with darknet 53 as a backbone.

You don’t have to be very familiar with Tensorflow 2, but basic understanding of computer vision tasks is a must to get started :)

Background

Prior detection systems repurpose classifiers or localizers to perform detection. They apply the model to an image at multiple locations and scales. High scoring regions of the image are considered detections.
YOLOv3 applies a single neural network to the full image.

The network divides the image into regions and predicts bounding boxes and probabilities for each region. These bounding boxes are weighted by the predicted probabilities. It looks at the whole image at test time so its predictions are informed by global context in the image.
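To make the "regions" idea concrete, here is a quick back-of-the-envelope count of YOLOv3's candidate boxes (a sketch assuming the standard 416x416 input, three detection scales, and three anchors per scale):

```python
# YOLOv3 predicts boxes at three feature-map scales. With a 416x416
# input, strides of 32, 16 and 8 give grids of 13x13, 26x26 and 52x52.
input_size = 416
strides = [32, 16, 8]
anchors_per_scale = 3  # three anchor boxes per grid cell

total_boxes = 0
for stride in strides:
    grid = input_size // stride            # cells along one side
    total_boxes += grid * grid * anchors_per_scale

print(total_boxes)  # 10647 candidate boxes per image
```

At test time, most of these boxes are discarded by a confidence threshold and non-maximum suppression, leaving only the final detections.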

It also makes predictions with a single network evaluation, unlike systems such as R-CNN, which require thousands of evaluations for a single image. This makes it extremely fast: more than 1000x faster than R-CNN and 100x faster than Fast R-CNN.

Network Architecture

YOLOv3 uses multi-label classification. Because the output labels are non-exclusive, the per-class scores do not have to sum to one.

Instead of using the softmax function, YOLOv3 uses independent logistic classifiers to calculate the likelihood that the input belongs to each specific label.

YOLOv3 uses binary cross-entropy loss for each label instead of mean squared error when calculating the classification loss. Avoiding the softmax function also reduces the computational complexity.
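As a small illustration of the idea (a sketch of the loss formulation, not the repository's actual training code): each class gets its own logistic output, and binary cross-entropy is summed over the independent classes.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def multilabel_bce(logits, targets):
    """Binary cross-entropy summed over independent class outputs."""
    loss = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)  # independent logistic classifier per class
        loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return loss

# A box can be both 'Man' and 'Shirt' at once: the targets are not
# mutually exclusive, so the per-class scores need not sum to one.
logits = [2.0, 1.5, -3.0]   # e.g. scores for Man, Shirt, Dog
targets = [1.0, 1.0, 0.0]   # multi-label ground truth
print(round(multilabel_bce(logits, targets), 4))  # 0.3769
```

With softmax, raising one class score would necessarily lower the others; independent sigmoids avoid that coupling, which is what makes multi-label outputs possible.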

Let’s get started

First of all, clone this repository, which lets you easily train your own YOLOv3 object detector with just a few minor adjustments:

git clone https://github.com/jonykoren/Object_Detection_YOLOv3.git

Now, install the requirements, such as TensorFlow 2, Keras, etc.:

cd Object_Detection_YOLOv3
pip3 install -r requirements.txt

Download Dataset

I chose to download a subset of the ‘Open Images Dataset V6 + Extensions’ by selecting the specific classes I want to train my model on.

The classes parameter determines which classes you want to download, and the limit parameter determines how many images you want to download.

You can leave these command lines as is, but if you want to change the classes, you can explore the available classes at the Open Images Dataset.

Download train set:

python download_dataset.py downloader --classes 'Man' 'Woman' 'Jeans' 'Shirt' 'Coffee' 'Dog' 'Cat' 'Hat' 'Shorts' 'Balloon' --type_csv train --limit 5000

Download test set:

python download_dataset.py downloader --classes 'Man' 'Woman' 'Jeans' 'Shirt' 'Coffee' 'Dog' 'Cat' 'Hat' 'Shorts' 'Balloon' --type_csv test --limit 1000

Running these two commands (for the train and test sets) downloads the images and the .csv annotation files, as well as one .txt label file per image inside the labels folder.
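For orientation, each downloaded label file holds one box per line; a minimal reader could look like this (a sketch: the exact `ClassName xmin ymin xmax ymax` layout is an assumption about the downloader's output, so check one of your own label files first):

```python
def read_label_file(path):
    """Parse a label file: one box per line, assumed format
    '<class name> <xmin> <ymin> <xmax> <ymax>' (verify on your data)."""
    boxes = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 5:
                continue
            # class names may contain spaces; coordinates are the last 4 fields
            name = " ".join(parts[:-4])
            xmin, ymin, xmax, ymax = map(float, parts[-4:])
            boxes.append((name, xmin, ymin, xmax, ymax))
    return boxes

# Tiny self-contained demo with a hypothetical label file:
with open("example_label.txt", "w") as f:
    f.write("Man 10 20 110 220\n")
print(read_label_file("example_label.txt"))  # [('Man', 10.0, 20.0, 110.0, 220.0)]
```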

Structure

Object_Detection_YOLOv3
│ ...
│ requirements.txt
│ download_dataset.py
│ Generate_xml_files.py
│ Prepare_data.py
│ README.md
│ train.py
│ evaluate_mAP.py
│ detect.py
│ ...
└─── data
    └─── csv_folder
    │    └─── class-descriptions-boxable.csv
    │    └─── test-annotations-bbox.csv
    │    └─── train-annotations-bbox.csv
    └─── Dataset
         └─── train
         │    └─── Man, Woman, Shorts, Shirt, Balloon, ...
         └─── test
              └─── Man, Woman, Shorts, Shirt, Balloon, ...

Dataset Preparation

Once the dataset is downloaded, we need to generate XML-format files from these .txt annotations:

python Generate_xml_files.py

Then, the following script generates ‘data_names.txt’, which contains the class names, and ‘data_train.txt’ and ‘data_test.txt’, which map each image path to its annotations:

python Prepare_data.py
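For orientation, in repositories of this family each line of ‘data_train.txt’ typically pairs an image path with its boxes; the exact layout below is an assumption, so compare it against your generated file:

```python
def parse_annotation_line(line):
    """Split one annotation line into an image path and a list of boxes.
    Assumed layout (verify against your own data_train.txt):
    <image_path> <xmin,ymin,xmax,ymax,class_id> <xmin,...> ..."""
    parts = line.strip().split()
    image_path, box_fields = parts[0], parts[1:]
    boxes = [tuple(int(v) for v in field.split(",")) for field in box_fields]
    return image_path, boxes

# Hypothetical example line:
path, boxes = parse_annotation_line(
    "data/Dataset/train/Man/0001.jpg 48,240,195,371,0 8,12,352,498,1"
)
print(path)   # data/Dataset/train/Man/0001.jpg
print(boxes)  # [(48, 240, 195, 371, 0), (8, 12, 352, 498, 1)]
```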

Finally, adjust your custom configuration in config/configs.py if you want to change paths, hyperparameters (such as the learning rate or the IoU loss threshold), etc.
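The settings you edit there tend to look like the following (an illustrative sketch: the variable names here are assumptions about the kind of options you will find, so match them against the actual names in config/configs.py):

```python
# Illustrative configuration excerpt -- variable names are examples only;
# use the real names from config/configs.py when editing.
TRAIN_CLASSES        = "data_names.txt"   # class-name file generated above
TRAIN_ANNOT_PATH     = "data_train.txt"   # training annotations
TEST_ANNOT_PATH      = "data_test.txt"    # test annotations
TRAIN_BATCH_SIZE     = 4
TRAIN_LR_INIT        = 1e-4               # initial learning rate
TRAIN_LR_END         = 1e-6               # final learning rate after decay
YOLO_IOU_LOSS_THRESH = 0.5                # IoU threshold used in the loss
```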

Training

The training uses transfer learning from the Darknet network. Download the pre-trained YOLOv3 weights by running the following commands:

mkdir config_data
cd config_data
wget -P model_data https://pjreddie.com/media/files/yolov3.weights

Now, create directories for logs and for weights:

cd ..
mkdir logs
mkdir weights

Finally, you are good to go to train your custom model:

python train.py

Tensorboard

In order to monitor the training process, run:

tensorboard --logdir=logs

You can access it at http://127.0.0.1:6006/

Detect model

As you can see, training does not take very long; it’s actually pretty fast. Let’s test the trained model!

Before detecting, you have to edit detect.py to set the input image or video, as well as the output directory.
In addition, you have to adjust the last lines of the script to your preference: image, video, or real-time detection.

For an image:

detect_image(yolo, image_path, img_det, input_size=YOLO_INPUT_SIZE, show=True, CLASSES=TRAIN_CLASSES, rectangle_colors=(255,0,0))

For a video:

detect_video(yolo, video_path, vidy, input_size=YOLO_INPUT_SIZE, show=False, CLASSES=TRAIN_CLASSES, rectangle_colors=(255,0,0))

For real-time:

detect_realtime(yolo, vidy, input_size=YOLO_INPUT_SIZE, show=True, CLASSES=TRAIN_CLASSES, rectangle_colors=(255, 0, 0))

Run your trained custom model on an image, a video, or in real time by commenting or uncommenting the relevant call at the end of the following script:

python detect.py

Summary

Although it may seem difficult, this repository demonstrates how to easily train a state-of-the-art model on your own dataset and achieve fast, high-quality results. One of its advantages is that it does not require massive code editing: with only a few configuration adjustments, you can do it even today.
I hope you enjoyed this tutorial and hope to see you again soon :)
