Note: Tensorflow object detection is an accurate machine learning API capable of localizing and identifying multiple objects in a single image. You can use the API for multiple use cases like object detection , person recognition, text detection, etc..
Today, we will see together how tensorflow can recognize people. In this post I'll outline the steps I took to get from a collection of Celebrities images (crawled from the internet)
- Data Set Download
- Image Annotation
- Xml to CSV
- TF-Record Creation
- Label Map preparation
- Pipeline Configuration
- Training
- Exporting Graph
You can crawl celebrity pictures from google images if you don't have a ready Data set. Try to order the data set as bellow:
CelebrityDB/
    TomHanks/
        img001.jpg
        tomhanks.jpg
        ...
    WillSmith/
        willsmith1.jpg
        will-smith-pic.jpg
        ...
    ... 
you can annotate the images using an annotation tool like labelImg. But it will take a lot of time. That's why i created a script to generate xml files(exactly like PASCAL VOC). I used opencv to detect faces but, you can change it with any other tool( i recommend dlib or a neural network face detection model which are much more accurate than opencv).
The Xml files should look like :
<annotation verified="yes">
    <folder>celebrityDB</folder><filename>1d9k49.jpeg</filename>
    <path>/media/emna/datapartition/tutos/celebrityDB/dewayneJohnson/1d9k49.jpeg</path>
    <source>
        <database>Emna Amor</database>
    </source>
    <size>
        <width>104</width>
        <height>142</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>dewayneJohnson</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>2</xmin>
            <ymin>34</ymin>
            <xmax>133</xmax>
            <ymax>99</ymax>
        </bndbox>
    </object>
</annotation>After annotating the pictures we have, We need to generate a csv file containing all pictures details / classes.
The csv File should look like this:
| filename | width | height | class | xmin | ymin | xmax | ymax | 
|---|---|---|---|---|---|---|---|
| 1jth1461.jpeg | 76 | 105 | kimKardashian | 10 | 30 | 89 | 59 | 
| wenn33850496.jpg | 470 | 654 | AndySerkis | 52 | 142 | 494 | 352 | 
Then we need to split the data into train and test using this python notebook .
To train our model, we need to convert the data to Tensorflow file format called Tfrecords. Most of the batch operations aren’t done directly from images, rather they are converted into a single tfrecord file (images which are numpy arrays and labels which are a list of strings).
 “… TFRECORD is an approach that convert whatever data you have into a supported format. This approach makes it easier to mix and match data sets and network architectures. The recommended format for TensorFlow is a TFRecords file containing tf.train.Example protocol buffers (which contain Features as a field).“
Use this python script to generate te tf records files (train.record and test.record)
Usage:
  # Create train data:
  python generate_tfrecords.py --csv_input=train.csv  --output_path=data/train.record
  # Create test data:
  python generate_tfrecords.py --csv_input=test.csv  --output_path=data/test.record
Use the same order you appended the labels in the generate_tfrecords python script
item {
  id: 1
  name: 'AndySerkis'
}
item {
  id: 2
  name: 'dewayneJohnson'
}
item {
  id: 3
  name: 'drake'
}
item {
  id: 4
  name: 'jayZ'
}
item {
  id: 5
  name: 'justinBieber'
}
item {
  id: 6
  name: 'kimKardashian'
}
item {
  id: 7
  name: 'kimKardashian'
}
item {
  id: 8
  name: 'tomHanks'
}
item {
  id: 9
  name: 'willSmith'
}
We will use ssd_mobilenet_v1_coco to train our face recognition model.
Do not forget to edit the ssd_mobilenet_v1_coco.config file with the number of classes( 9 in my case) , ssd_mobilenet_v1_coco model.ckpt under ssd_mobilenet_v1_coco_2018_01_28, the train record path , test record path and the label.pbtxt path.
# From the tensorflow/models/research/ directory
PIPELINE_CONFIG_PATH={path to pipeline config file}/data/ssd_mobilenet_v1_coco.config
MODEL_DIR={path to model directory}/ssd_mobilenet_v1_coco_2018_01_28
NUM_TRAIN_STEPS=50000
SAMPLE_1_OF_N_EVAL_EXAMPLES=1
python object_detection/model_main.py \
    --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
    --model_dir=${MODEL_DIR} \
    --num_train_steps=${NUM_TRAIN_STEPS} \
    --sample_1_of_n_eval_examples=$SAMPLE_1_OF_N_EVAL_EXAMPLES \
    --alsologtostderr
    
- to keep training infinetly, remove NUM_TRAIN_STEPS=50000
# From tensorflow/models/research/
INPUT_TYPE=image_tensor
PIPELINE_CONFIG_PATH={path to pipeline config file}/data/ssd_mobilenet_v1_coco.config
TRAINED_CKPT_PREFIX={path to model directory}/ssd_mobilenet_v1_coco_2018_01_28/model.ckpt-num
EXPORT_DIR={path to folder that will be used for export}
python object_detection/export_inference_graph.py \
    --input_type=${INPUT_TYPE} \
    --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
    --trained_checkpoint_prefix=${TRAINED_CKPT_PREFIX} \
    --output_directory=${EXPORT_DIR}
You can use the object detection notebook to make prediction (change the model download section with your model path)
