Only 124 of the 1,579 pizza places in New York are authentic, or so an AI says. This project combines computer vision and machine learning to ease the search for Neapolitan pizza, based on photos from public crowd-sourced reviews.

Check your city! The Neapolitan pizza finder is available at the following link: https://vasilykorf.com/pizza/

Outline

There are many different types of pizza to order: New York-style pizza, Chicago-style deep-dish pizza, Detroit-style square pizza, you name it.

However, only one pizza stands above all others in taste and simplicity: Neapolitan pizza. That’s a classic! Here are a few examples from a quick Google search:

I guess you get the idea. In other words, a really good, well-made Neapolitan pizza is a unique experience. It's something you can sit back and enjoy. This type of pizza was invented in Naples, Italy. Italian law insists that Neapolitan pizza must include wheat flour, yeast, mineral water, peeled tomatoes, mozzarella cheese, sea salt, and olive oil. That’s it. So why spend time scrolling through pizza photos in restaurant reviews if computer vision can do it for you?

Research design

I teamed up with my friend Dmitrii Stepakov to build this tool. The work included parsing and labeling data, defining the tech stack, applying suitable machine learning techniques, and deploying a website with predictions and a basic UI.

Dataset

This is a binary classification problem. Since our target is Neapolitan pizza, we have to look at pizza places in Naples and get photos from public reviews. We also need the same number of samples of non-Neapolitan pizza, like trashy pizza from Papa John’s or mediocre pizza from around the corner cooked on an electric grill. The Google, Yelp, and Foursquare APIs and the ImageNet database are the main sources.

Source: Hybrid Knowledge Routed Modules for Large-scale Object Detection

It's not as simple as that. Data scientists, not coincidentally, spend a lot of time cleaning, verifying, and organizing data. This project is no different. After reviewing the first batch of data, we noticed many photos of interiors, visitors, and street signs. That called for an additional module: image recognition, or pizza detection in our case.

We ended up having about 6k labeled photos of pizza.
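
The balanced two-class setup described in this section can be sketched in a few lines. This is only an illustration, not the code behind the project: the function name, the file names, the 80/20 split, and the idea of downsampling the larger class are all assumptions.

```python
import random


def balance_and_split(neapolitan, other, test_fraction=0.2, seed=42):
    """Downsample the larger class, then split into train and test sets.

    `neapolitan` and `other` are lists of photo identifiers (hypothetical).
    Returns two lists of (photo, label) pairs, where label 1 = Neapolitan.
    """
    rng = random.Random(seed)
    n = min(len(neapolitan), len(other))
    # Keep the same number of samples per class.
    positives = rng.sample(neapolitan, n)
    negatives = rng.sample(other, n)

    n_test = int(n * test_fraction)
    # Split each class separately so both sets stay balanced.
    test = [(p, 1) for p in positives[:n_test]] + [(p, 0) for p in negatives[:n_test]]
    train = [(p, 1) for p in positives[n_test:]] + [(p, 0) for p in negatives[n_test:]]
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


# Made-up file names standing in for labeled review photos.
neapolitan_photos = [f"naples_{i}.jpg" for i in range(30)]
other_photos = [f"other_{i}.jpg" for i in range(50)]
train, test = balance_and_split(neapolitan_photos, other_photos)
print(len(train), len(test))  # 48 12
```

Splitting each class on its own keeps the test set at exactly half Neapolitan, which makes a plain accuracy number easier to interpret.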

Food Detection with CV

The first step was to identify food in the photos from our dataset. This might sound easy if all you want from the model is labels such as “burger” or “pasta”. However, we were also counting on extracting object features for the downstream model, such as dough type and pizza size. We approached the problem from different angles and tried several solutions.

Pizza detection with CV: dataset and predicted labels

First attempt – Histograms of Oriented Gradients (HOG)

Histograms of Oriented Gradients for Human Detection is the scientific paper that inspired us.

The method gives very good results for person detection. Nevertheless, after we used HOG descriptors as features and trained an SVM to classify images, the result wasn’t promising at all: the implementation reached an accuracy of only 68-75%.
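
To give a feel for what a HOG feature is, here is a deliberately simplified sketch in plain Python. It computes the orientation histogram for a single cell only. A full HOG descriptor also interpolates votes between bins and normalizes over blocks of cells, and in practice a library routine such as scikit-image's `hog` would be used. The function name and the toy images are assumptions for illustration, not the code we ran.

```python
import math


def orientation_histogram(cell, bins=9):
    """Unsigned gradient-orientation histogram for one grayscale cell.

    `cell` is a list of rows of pixel intensities. Each interior pixel
    votes for the bin of its gradient angle (0-180 degrees), weighted by
    its gradient magnitude.
    """
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, len(cell) - 1):
        for x in range(1, len(cell[0]) - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # horizontal central difference
            gy = cell[y + 1][x] - cell[y - 1][x]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The final modulo guards against an angle that rounds up to 180.
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


# Toy 6x6 cells: dark left half / bright right half, and its transpose.
vertical_edge = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
horizontal_edge = [list(row) for row in zip(*vertical_edge)]

hist_v = orientation_histogram(vertical_edge)
hist_h = orientation_histogram(horizontal_edge)
print(hist_v.index(max(hist_v)))  # 0: gradient along the x-axis
print(hist_h.index(max(hist_h)))  # 4: gradient angle of 90 degrees
```

Concatenating such histograms over all cells of an image yields the feature vector that a classifier like an SVM is then trained on.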

Second attempt – OpenCV

Next, we reframed the problem and tried to segment and isolate the pizza in each photo with OpenCV and an SVM.

We also made many attempts to detect the pizza rim and use it as a classification feature. The initial idea came from this paper: Pizza sauce spread classification using colour vision and support vector machines. Also, here is a great article describing the SVM algorithm for image classification.

However, the result was pretty bad: accuracy was close to 60%. Here is an example:

Edge detection with OpenCV

Third time's a charm, right?

Sadly, both attempts above failed: nothing worked as well as we expected. So we had to keep it simple. The last and most effective attempt used a convolutional neural network (CNN) in TensorFlow. Of everything we tried, CNNs were the best way to solve this problem. TensorFlow has a great dataset, food101, which covers 101 food categories. This model shows 82.5% accuracy.
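
Once a food classifier is in place, using it as a filter is straightforward. The sketch below assumes the classifier returns a top label and a confidence score for each photo. The function name, the file names, and the 0.5 threshold are hypothetical, while "pizza" and "lasagna" are real food101 categories.

```python
def keep_pizza_photos(predictions, min_confidence=0.5):
    """Return photo ids whose top predicted food label is "pizza".

    `predictions` maps a photo id to a (label, confidence) pair, as a
    food classifier might produce. The threshold is an assumption.
    """
    return [
        photo
        for photo, (label, confidence) in predictions.items()
        if label == "pizza" and confidence >= min_confidence
    ]


# Made-up classifier outputs for four review photos.
predictions = {
    "review_001.jpg": ("pizza", 0.97),
    "review_002.jpg": ("lasagna", 0.64),  # a different dish, dropped
    "review_003.jpg": ("pizza", 0.41),    # too uncertain, dropped
    "review_004.jpg": ("pizza", 0.88),
}
pizza_photos = keep_pizza_photos(predictions)
print(pizza_photos)  # ['review_001.jpg', 'review_004.jpg']
```

A confidence threshold matters here because a classifier trained only on food will still assign some food label to a photo of an interior or a street sign.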

Pizza detection

For the second part, Neapolitan pizza detection, ResNet50 came to our aid. Broadly speaking, ResNet is a CNN architecture that adds shortcut connections between layers, which addresses the training problems of very deep plain CNNs. Out of the box, ResNet50 classifies photos into ImageNet categories, so we had to retrain it on our pizza dataset and fine-tune it. Surprisingly, it works.

ResNet50 architecture. Source: https://morioh.com/p/dd3ffff216c5
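
The shortcut idea fits in a few lines of plain Python. This toy block uses two small dense layers in place of the convolution and batch normalization layers that the real ResNet50 uses, and all names are made up for illustration. What matters is the `F(x) + x` step.

```python
def relu(vector):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in vector]


def dense(vector, weights):
    """Multiply a vector by a weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is two small dense layers."""
    fx = dense(relu(dense(x, w1)), w2)
    # The shortcut: add the block's input back onto its output.
    return relu([f + xi for f, xi in zip(fx, x)])


x = [1.0, 2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

print(residual_block(x, zeros, zeros))        # [1.0, 2.0]
print(residual_block(x, identity, identity))  # [2.0, 4.0]
```

With all-zero weights the block simply passes its input through. That is why stacking many such blocks does not have to hurt: each one only needs to learn a correction on top of the identity.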

Tech Stack

Project pipeline

To make a long story short, we use:

  • TensorFlow
  • ResNet50
  • Folium
  • OpenStreetMap

Conclusion

The model shows 94.6% accuracy on a test set. Given that, we applied the model to the biggest cities in the US and visualized the results using Folium and OpenStreetMap.
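
For readers who want to see how such numbers come about, here is a minimal sketch of test accuracy and of a per-city count. The rule for turning photo-level predictions into a verdict for a place (more than half of its photos predicted Neapolitan) is an assumption made for this example, as are the function names and the sample data.

```python
def accuracy(y_true, y_pred):
    """Share of test samples whose predicted label matches the true one."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def neapolitan_places(photo_predictions, min_share=0.5):
    """Flag a place when more than `min_share` of its photos look Neapolitan.

    `photo_predictions` maps a place name to a list of 0/1 photo-level
    predictions. The majority rule is an assumption for this sketch.
    """
    return [
        place
        for place, preds in photo_predictions.items()
        if preds and sum(preds) / len(preds) > min_share
    ]


print(accuracy([1, 0, 1, 1, 0], [1, 0, 0, 1, 0]))  # 0.8

# Made-up photo-level predictions for three places in one city.
city = {
    "Place A": [1, 1, 0, 1],
    "Place B": [0, 0, 1],
    "Place C": [],  # no usable pizza photos, never flagged
}
flagged = neapolitan_places(city)
print(f"{len(flagged)} of {len(city)} places flagged: {flagged}")
```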

The initial idea holds up in practice. I did some real-world testing and ordered pizza from two places in Denver suggested by the model, and my expectations were fully met.

Check your city – https://vasilykorf.com/pizza/

Share your feedback, bug reports, and new city requests in the comments below or via email.

And enjoy your meal!