Landmark recognition using Inception and TensorFlow on Kaggle’s Landmark Retrieval dataset
Some time ago, Kaggle launched a very interesting pair of image recognition challenges for landmarks. The retrieval challenge comes with a large dataset of landmark images: given a query image, your program should find images of the same landmark in the dataset. For example, when an image of Al-Hamra Palace in Spain is given as the query image, the program should find and return other images of Al-Hamra from the dataset. Further details about the challenge can be found on Kaggle’s website.
Our motivation was to test whether we could use Google’s Inception model for this task and how much accuracy we could get from it. Inception, whose first version is also referred to as GoogLeNet, is a deep convolutional neural network (CNN) architecture originally designed for the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014). It has been battle-tested, winning the classification task of that challenge.
We are using Python 3.6 and TensorFlow. Once Python is installed, make sure it is accessible from the command prompt or terminal. If it is not, we will need to add it to the system path.
We will also need to install TensorFlow. The installation guide for TensorFlow can be found here.
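To confirm the setup before going further, a small check like the one below can help. It is only a sketch: the function name check_environment is our own, and it merely tests that the interpreter is recent enough and that the named packages can be found, not that they work correctly.

```python
import importlib.util
import sys


def check_environment(required=("tensorflow",), min_python=(3, 6)):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d or newer is required" % min_python)
    for name in required:
        # find_spec returns None when a top-level package cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append("missing package: " + name)
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("environment looks OK" if not issues else "\n".join(issues))
```

Later in the walkthrough the same function can be called with required=("tensorflow", "tensorflow_hub", "tqdm") to cover the other dependencies.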
Preparing the Training Set
Kaggle has provided an index.csv file containing images of around 1600 different landmarks, which can be found here. For simplicity, we are going to hand-pick around 20 landmarks from this list of 1600. The same idea can be extended to the complete dataset provided by Kaggle, but 20 landmarks are enough for understanding purposes. Our simplified version of the training dataset can be found here. It is a CSV file containing URLs of images of the different landmarks. First, let’s download the images using this CSV file. We have provided a Python script that helps download this large set of images; it can be found here. To run the script, we first need to install a dependency via pip.
> pip install tqdm
After installing tqdm, we need to run the script with the following arguments.
> python download.py train.csv ./training_images
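For readers who want to see roughly what such a download step involves, here is a minimal sketch. It is not the download.py script linked above. The column names id, url and landmark_id are an assumption based on the layout of Kaggle's landmark CSV files, the function names are our own, and the tqdm progress bar is left out to keep the example short. Each image is placed in a sub-folder named after its landmark, which is the layout the retraining step expects.

```python
import csv
import io
import os
import urllib.request


def plan_downloads(csv_text, out_dir):
    """Map each CSV row to (url, destination path), one sub-folder per landmark."""
    plan = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Column names are assumed; adjust them to the real CSV header.
        folder = os.path.join(out_dir, row["landmark_id"])
        plan.append((row["url"], os.path.join(folder, row["id"] + ".jpg")))
    return plan


def download_all(plan):
    """Fetch every planned file, skipping ones that already exist or fail."""
    for url, path in plan:
        if os.path.exists(path):
            continue
        os.makedirs(os.path.dirname(path), exist_ok=True)
        try:
            urllib.request.urlretrieve(url, path)
        except OSError as err:  # URLError and HTTPError are OSError subclasses
            print("skipped", url, err)
```

A typical call would read the CSV file into a string, pass it to plan_downloads together with ./training_images, and hand the result to download_all. Many URLs in such datasets are dead, which is why failures are reported and skipped rather than aborting the run.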
Training Inception Model
First we need to install TensorFlow Hub. TensorFlow Hub is a library for reusable parts of machine learning models. We will retrain our Inception model on top of an existing pre-trained module from TensorFlow Hub, so only the final classification layer has to be trained on our landmark images.
> pip install tensorflow-hub
Now we need to get the script for retraining our Inception model. Google has provided a script in the TensorFlow examples, which can be downloaded from here. After downloading it, run the script with the following arguments.
> python ./retrain.py --image_dir ./training_images/ --output_graph ./retrained_graph.pb --intermediate_output_graphs_dir ./intermediate_graph/ --output_labels ./retrained_labels.txt --how_many_training_steps 500 --summaries_dir ./summaries/ --bottleneck_dir ./bottleneck_data/ --final_tensor_name "final_result" --saved_model_dir ./model/
This command will take approximately 30 minutes to run, depending on system configuration. We are running it with a low value of how_many_training_steps for quick training. For production purposes, increase it to 4000 or above. Once this command finishes, our Inception model is ready for some testing.
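If retraining fails or the accuracy looks poor, one quick thing to check is the layout of ./training_images. The retraining script takes its labels from the sub-folder names inside --image_dir, and, as far as we recall, it warns when a folder holds fewer than 20 images. The sketch below counts JPEG files per sub-folder; the helper names and the exact extension list are our own assumptions, not part of retrain.py.

```python
import os

IMAGE_EXTENSIONS = (".jpg", ".jpeg")


def count_images_per_label(image_dir):
    """Return {sub-folder name: number of JPEG files directly inside it}."""
    counts = {}
    for entry in sorted(os.listdir(image_dir)):
        folder = os.path.join(image_dir, entry)
        if not os.path.isdir(folder):
            continue  # loose files at the top level carry no label
        counts[entry] = sum(
            1 for name in os.listdir(folder)
            if name.lower().endswith(IMAGE_EXTENSIONS)
        )
    return counts


def small_labels(counts, minimum=20):
    """List labels with fewer images than the chosen minimum."""
    return [label for label, n in counts.items() if n < minimum]
```

Calling small_labels(count_images_per_label("./training_images")) lists the landmarks that may need more pictures before training.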
Testing the Trained Inception Model
Our testing images can be found here. These pictures are slightly different from the ones we trained our model on, and we want to test how closely our Inception model can recognize the resemblance. First, let’s download a script from the TensorFlow examples that feeds a test image to our trained Inception model and predicts the result. This script can be found here. Create a folder called “testing_images”, put all the testing images there, and run the following command.
> python ./label_image.py --graph ./retrained_graph.pb --labels ./retrained_labels.txt --input_layer "Placeholder" --output_layer "final_result" --image ./testing_images/1.jpg
It reports the result as probabilities, showing how likely the given image is to match each of the dataset categories. Below is an example of the result.
Al-Hamra Spain 0.73449653
Atlantis Hotel 0.053302728
Peace Palace 0.03108669
Flinders Street Station 0.020717677
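The listing above is simply the categories sorted by score, highest first. As an illustration only (label_image.py does its own ranking internally, and the helper name top_k is ours), the same ordering can be reproduced from a list of labels and scores like this, using the values from the example output:

```python
def top_k(labels, scores, k=5):
    """Return the k (label, score) pairs with the highest scores, best first."""
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Scores copied from the example output, deliberately out of order.
labels = ["Peace Palace", "Al-Hamra Spain", "Flinders Street Station", "Atlantis Hotel"]
scores = [0.03108669, 0.73449653, 0.020717677, 0.053302728]

for label, score in top_k(labels, scores, k=2):
    print(label, score)
```

Here the top entry is Al-Hamra Spain, followed by Atlantis Hotel. In practice we usually care about the first entry and how far its score is ahead of the second.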
When tested on a dataset containing pictures of 40-50 landmarks, with every landmark having sufficient images to train on, Inception gives very high accuracy. You can use it straight off the shelf just by retraining it on a well-curated dataset of landmark images.