Scaling neural network image classification using Kubernetes with TensorFlow Serving

In 2011, Google developed an internal deep learning infrastructure called

The process of introducing machine learning into your product involves creating and training a model on your dataset, and then pushing the model to production to serve requests. In this blog post, we’ll show you how you can use

Let’s use image classification as an

|

You can implement image classification with TensorFlow using the

Once the model is trained and

Inference can be very resource intensive. Our server executes the following TensorFlow graph to process every classification request it receives. The Inception-v3 model has over 27 million parameters and runs 5.7 billion floating point operations per inference.

|

Fortunately, this is where Kubernetes can help us. Kubernetes distributes inference request processing across a cluster using its External Load Balancer. Each pod in the cluster contains a

To help you try this out yourself, we’ve written a