InfoScout’s data analytics and business intelligence platform is built on top of receipts. A lot of receipts...
Receipts today come in many different shapes and sizes, and vary from retailer to retailer. Most differences are fairly simple, such as length, or possibly having a coupon or two, which can be accounted for pretty easily. The real trouble begins when you look at the differences in product descriptions. One store may use a short description to describe its products, another may print a UPC, while yet another may print an internal SKU number. This makes transcribing a receipt from an image a difficult task.
Good Receipt vs. Bad Receipt
At InfoScout we deal with hundreds of thousands of receipts a day. These receipts are captured as images on hundreds of different types of mobile devices by hundreds of thousands of panelists across the country. Taking those images and transforming them into usable data takes a pipeline of technology that ultimately relies on correctly identifying and extracting the product-related text on the receipt. If the store ("banner," as we call it internally here at InfoScout) is known before transcription begins, we can use our prior knowledge of the format of that retailer's receipts to drastically increase the accuracy and speed of our automatic transcription, along with delivering faster responses and interactivity for our app users.
This brings us to our technology topic of the day: on-device banner detection via machine learning with TensorFlow.
The Nuts and Bolts
Let's take a look at a few highlights of how this went from idea to implementation. The implementation takes advantage of TensorFlow and the Inception model; getting it working with receipts took some retraining of the Inception model.
Since the original application was written for a hackathon, this was limited to training on two banners, plus a miscellaneous category which acts as a catch-all for everything else. The catch-all category allows the model to see that there are other receipts in the small world it knows about. This helps prevent the model from overfitting. These models tend to be the know-it-all in the room, making predictions even if they are not 100% sure the answer is correct.
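That know-it-all behavior falls straight out of the softmax layer, which turns raw class scores into probabilities that always sum to 1, so some class always "wins" no matter how unfamiliar the input is. A minimal sketch (the logits here are made-up numbers, not real model outputs):

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    # Subtract the max for numerical stability before exponentiating.
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three classes (e.g. walmart / target / other).
# Even for a receipt the model has never seen, the output is a full
# probability distribution and one class comes out on top -- which is
# why a catch-all class gives "none of the above" somewhere to land.
probs = softmax([1.2, 0.9, 1.1])
print(probs)        # three probabilities that sum to 1 (up to float error)
```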
For training all we really need is receipt images. Simple enough, we’ve got hundreds of millions of those. But then as always, the devil is in the details.
How many do we need?
Does quality matter?
What banners should this focus on?
First off, as with most (if not all) machine learning, bad data in means bad data out, so focusing on high-quality images seemed like the way to go.
How many good images should these models be trained on?
We used a little bit of "science" here and chose an arbitrary number of 750 images per banner, figuring that if more data was needed, retraining would cost nothing but time. With the images in hand, training can be started with a handy TensorFlow script (outlined at the end of this section).
Since there was not a ton of time to curate each and every image (something that could be done), we ended up with a couple of oddities.
The banner Target, for example, was trained on a few blurry images, whereas the other buckets of data were not. The model would then assume every blurry image belonged to the banner Target.
The results could be a mixed bag if the camera didn’t pick up the banner on the receipt, but did pick up the body of the receipt.
This shows the importance of data quality in these models, or possibly focusing on just one part of the receipt when making predictions.
One last note: not all classes have to be trained on the same amount of data. If you have a class with only 50 images, but they are good images, that may suffice. Again, there is no one-size-fits-all approach here.
python ~/var/local/tensorflow/examples/image_retraining/retrain.py \
    --bottleneck_dir=/tensor_flow_files/bottlenecks \
    --model_dir=/tensor_flow_files/inception \
    --output_graph=/tensor_flow_files/retrained_graph.pb \
    --output_labels=/tensor_flow_files/retrained_labels.txt \
    --image_dir=/tensor_flow_files/images
Just a quick rundown of what these command-line options refer to:
bottleneck_dir - This stores cached values (a 1x1024 matrix per image by default) for all the images being trained on. The bottleneck value for every image is cached here and reused if you rerun the script.
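The caching idea is worth spelling out, since it is what makes reruns cheap: the expensive forward pass through Inception happens once per image, and subsequent runs just read the saved vector. A minimal sketch of that pattern, with a fake `compute_bottleneck` standing in for the real network pass:

```python
import hashlib
import os
import pickle
import tempfile

# Plays the role of the --bottleneck_dir flag above.
BOTTLENECK_DIR = os.path.join(tempfile.gettempdir(), "bottlenecks")

def compute_bottleneck(image_path):
    """Stand-in for the expensive Inception forward pass.

    The real retrain.py runs the image through the network and keeps the
    penultimate layer's activations; here we derive a small fake vector
    from the path so the caching logic below is runnable on its own.
    """
    digest = hashlib.sha256(image_path.encode()).digest()
    return [b / 255.0 for b in digest[:8]]

def get_bottleneck(image_path):
    """Return the cached vector if present, else compute and cache it."""
    os.makedirs(BOTTLENECK_DIR, exist_ok=True)
    key = hashlib.sha256(image_path.encode()).hexdigest()
    cache_path = os.path.join(BOTTLENECK_DIR, key + ".pkl")
    if os.path.exists(cache_path):        # cache hit: skip the forward pass
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    vec = compute_bottleneck(image_path)  # cache miss: compute once...
    with open(cache_path, "wb") as f:
        pickle.dump(vec, f)               # ...and reuse on every rerun
    return vec
```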
model_dir - Since this is a retraining script, the script needs to know where the original model lives, in our case this points to the Inception model.
output_graph - Where the model graph will be stored.
output_labels - This file contains the classes we want to label our data as. In our case this was a simple three-line text file:
image_dir - Where the images live, the directory structure looks like this.
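For reference, the retraining script derives class labels from the subdirectory names under image_dir, one subdirectory per class. Assuming label names taken from the predictions shown later in this post, the layout would look something like:

```
/tensor_flow_files/images/
├── walmart/
│   ├── receipt_0001.jpg
│   └── ...
├── target/
│   └── ...
└── other/
    └── ...
```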
Kicking this off runs a default of 4000 training passes, hopefully getting better and better accuracy as it goes along:
2017-02-23 22:38:08.696257: Step 100: Train accuracy = 71.0%
2017-02-23 22:38:08.696344: Step 100: Cross entropy = 0.671635
2017-02-23 22:38:08.987991: Step 100: Validation accuracy = 73.0% (N=100)
The 4000 step default can be modified depending on individual use cases, but for our simple three class model it was more than enough.
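The "Cross entropy" figure in the log lines above is the loss the training loop is driving down: the negative log-probability the model assigned to the correct class, averaged over a batch. A quick sketch of the single-example case:

```python
import math

def cross_entropy(predicted_probs, true_index):
    """Negative log of the probability assigned to the correct class."""
    return -math.log(predicted_probs[true_index])

# A confident, correct prediction costs little...
print(cross_entropy([0.95, 0.03, 0.02], 0))   # ~0.05
# ...while a hesitant one costs more -- this is the number the
# training steps are pushing toward zero.
print(cross_entropy([0.50, 0.30, 0.20], 0))   # ~0.69
```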
Once the model finished learning what Walmart and Target receipts look like, the real question could be answered: could this data be used to make future predictions (on data the model has never seen before) quickly and accurately?
Running a few quick verifications against the model, we found that our initial 750 images did seem to be enough to make definitive predictions against a test set. For example, it could predict with 95% confidence that a Walmart receipt was, in fact, the banner walmart:
root@32d1cc496145:/tensorflow# python /tensor_flow_files/label_image.py /tensor_flow_files/walmart_image.jpeg
walmart (score = 0.95185)
other (score = 0.04815)
The opposite case was also true. Given a receipt that did not fall into one of our trained banners, the model was able to predict that it belonged to none of them with greater than 90% confidence:
root@32d1cc496145:/tensorflow# python /tensor_flow_files/label_image.py /tensor_flow_files/kroger_image2.jpeg
other (score = 0.92185)
target (score = 0.06680)
walmart (score = 0.01135)
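Under the hood, a label_image-style script runs the image through the retrained graph and pairs each output score with the corresponding line of the labels file. The formatting step that produces output like the above amounts to sorting those pairs (scores here are copied from the Kroger example; in the real script they come from the graph):

```python
def format_predictions(labels, scores):
    """Pair each class label with its score and sort, highest first --
    the same shape as the label_image output above."""
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ["%s (score = %.5f)" % (label, score) for label, score in ranked]

# Labels in the order they appear in the labels file (assumed),
# scores taken from the Kroger receipt example above.
labels = ["walmart", "target", "other"]
scores = [0.01135, 0.06680, 0.92185]
for line in format_predictions(labels, scores):
    print(line)
```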
The model's early returns were great, and diving further in to test it with loads of other types of receipts yielded similar results. However, if you've been following along closely, you'll notice that at the beginning of this post we said we wanted on-device ML, not on-MacBook-Pro-command-line ML. So we went one step further and put it directly onto phones.
On the Device
This actually proved simpler than one might think. While the actual steps are a bit more involved, it essentially boils down to:
- Put the model in an application on an Android Device
- Open the application
- Point the phone at a receipt, or other object
- See the results
Screenshots from my phone
As this was only a hackathon project, it served more as a proof of concept than production-ready code. Our ongoing work involves scaling this model to support thousands of different banners while still keeping it small enough and fast enough to run on a phone.
In our next blog post we’ll jump over to another major source of ML research for us, which is automatically classifying our >1 billion receipt line items to our taxonomy of 2200 categories and 1800 brands.
Following that, we’ll take a look at how to use the bottleneck files to create a reverse image search.
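As a teaser for that post: the bottleneck vectors double as compact image fingerprints, so reverse image search is essentially a nearest-neighbor lookup over them. A minimal cosine-similarity sketch, using tiny toy vectors in place of the real 1x1024 bottlenecks:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def most_similar(query, library):
    """Return the library key whose vector is closest to the query."""
    return max(library, key=lambda k: cosine_similarity(query, library[k]))

# Toy 4-dim "bottlenecks"; filenames and values are made up for illustration.
library = {
    "walmart_001.jpg": [0.9, 0.1, 0.0, 0.2],
    "target_004.jpg":  [0.1, 0.8, 0.3, 0.0],
    "kroger_002.jpg":  [0.2, 0.1, 0.9, 0.4],
}
print(most_similar([0.85, 0.15, 0.05, 0.1], library))  # walmart_001.jpg
```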
If you love ML, computer science, or market research and want a really cool place to work, InfoScout is hiring!