Machine Learning Deployment: Shadow Mode


Deploying a machine learning product is essential to getting value out of it, but it is also one of the hardest parts of building the product.

In this post I will focus on a small piece of deployment: “How do I test my new model in production?” One answer, and a method I often employ when initially deploying models, is shadow mode.

If you’re interested in a broader overview of building and deploying machine learning products, I highly recommend Emmanuel Ameisen’s book, Building Machine Learning Powered Applications! [1]

What Is Shadow Mode?

To launch a model in shadow mode, you deploy the new, shadow model alongside the old, live model. [2] The live model continues to handle all requests, but the shadow model also runs on some (or all) of the requests. This allows you to safely test the new model against real data while avoiding the risk of service disruptions.

When Would I Use Shadow Mode?

Shadow mode is a great way to test a few things:

Shadow mode works well when the result of the model does not need a user action to validate it. Models that try to influence the user (for example, a recommendation model where success means more converted sales) are better tested with an A/B test. The big difference is that an A/B test splits traffic between the two models, whereas in shadow mode both models operate on the same events.
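To make that difference concrete, here is a minimal Python sketch of the two routing strategies. Everything in it is a hypothetical stand-in: the two model functions, the event fields (id and amount), the thresholds, and the 50/50 hash-based split are assumptions for illustration, not a description of any real system.

```python
import hashlib


def live_model(event: dict) -> bool:
    """Hypothetical live model: flag events above a fixed amount."""
    return event["amount"] > 100


def shadow_model(event: dict) -> bool:
    """Hypothetical candidate model with a lower threshold."""
    return event["amount"] > 80


def ab_route(event: dict) -> dict:
    """A/B test: each event is scored by exactly one of the two models."""
    # Hash the event id so the same event always lands in the same arm.
    digest = hashlib.sha256(event["id"].encode("utf-8")).digest()
    arm = "B" if digest[0] % 2 else "A"
    model = shadow_model if arm == "B" else live_model
    return {"arm": arm, "served": model(event)}


def shadow_route(event: dict) -> dict:
    """Shadow mode: both models score the same event; only live is served."""
    return {"served": live_model(event), "shadow": shadow_model(event)}


event = {"id": "evt-1", "amount": 90}
print(ab_route(event))      # one arm, one prediction
print(shadow_route(event))  # two predictions for the same event
```

In the A/B case you only ever see one prediction per event, so the models are compared across different traffic. In the shadow case you get a paired comparison on identical inputs, but the shadow prediction never reaches the user.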

How Do I Deploy In Shadow Mode?

There are two general methods that I use for deploying in shadow mode. They differ in where the shadow model sits relative to the API for the live model: either in front of the live API or behind it.

In Front of the API

To put a model in shadow mode in front of the API, you host two API endpoints: one for the live model and one for the shadow model. The caller calls both of them whenever they would normally call the live model. The caller can disregard the shadow model's response, but they should log it so that the results can be compared. I have drawn this structure below:

A diagram showing how in front of the API shadow mode is constructed.

This way of deploying is well-suited to situations where the calling team is change-averse or has very strict requirements for how the shadow model must perform, because it gives them control. I have found it useful for deploying models that have a large effect on some conversion funnel, like a model that runs at new user creation and blocks suspected bad actors.
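From the caller's side, this setup might look roughly like the sketch below. It is only an illustration under stated assumptions: call_with_shadow, the Endpoint type, and the log field names are hypothetical, and the two endpoints are passed in as plain Python callables rather than real HTTP clients so the example stays self-contained.

```python
import json
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("shadow_caller")

# Hypothetical type: anything that takes a request dict and returns a response dict.
Endpoint = Callable[[Dict[str, Any]], Dict[str, Any]]


def call_with_shadow(
    request: Dict[str, Any], call_live: Endpoint, call_shadow: Endpoint
) -> Dict[str, Any]:
    """Call both endpoints, log both responses, and act only on the live one."""
    live_response = call_live(request)

    shadow_response, shadow_error = None, None
    try:
        shadow_response = call_shadow(request)
    except Exception as exc:  # a failing shadow call must never affect the caller
        shadow_error = repr(exc)

    # One structured log line per request so the two responses can be compared later.
    logger.info(json.dumps({
        "request": request,
        "live": live_response,
        "shadow": shadow_response,
        "shadow_error": shadow_error,
    }))
    return live_response
```

In a real caller the two callables would presumably wrap requests to the two endpoints, and the shadow call would likely get a short timeout or run asynchronously so that it does not add latency to the live path.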

The advantages of this method are:

The main disadvantages are:

Behind the API

To put a model in shadow mode behind the API, you change the code that responds to API requests so that it calls both the live and the shadow model. You log the results of both models [3] but only return the result from the live model. I have drawn a schematic of this below:

A diagram showing how behind the API shadow mode is constructed.

This method is great when you want to move quickly (and break things), because you can change the shadow model without having to coordinate with the calling team. To the outside world the API looks unchanged and so hides the testing going on behind it.
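A request handler for this setup could be sketched as follows. The names handle_request, sample_rate, and rng are hypothetical, and the models are plain callables. The sample_rate parameter is one assumed way to run the shadow model on only some of the requests.

```python
import logging
import random
from typing import Any, Callable

logger = logging.getLogger("shadow_api")


def handle_request(
    features: Any,
    live_model: Callable[[Any], Any],
    shadow_model: Callable[[Any], Any],
    sample_rate: float = 1.0,
    rng: Callable[[], float] = random.random,
) -> Any:
    """Score with the live model, optionally score with the shadow model,
    log both results, and return only the live result."""
    live_result = live_model(features)

    shadow_result = None
    if rng() < sample_rate:  # run the shadow model on a fraction of requests
        try:
            shadow_result = shadow_model(features)
        except Exception:
            # The shadow model is untested: its failures are logged, never raised.
            logger.exception("shadow model failed")

    logger.info(
        "features=%r live=%r shadow=%r", features, live_result, shadow_result
    )
    return live_result  # the caller never sees the shadow result
```

Because the shadow call sits inside the same handler, swapping in a new shadow model is a change to this one function (or its configuration) and needs nothing from the calling team. The flip side is that a slow shadow model in this synchronous sketch would slow down the live response, so in practice you may want to run it in the background.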

The advantages of this method are:

The main disadvantages are:


Deploying a model in shadow mode is an easy way to test your model on live data. It is flexible and allows you to empower the right team to control the experiment.

  1. Disclaimer: I was a technical editor for the book, but make no money off sales. 

  2. By “live model”, I mean whatever system is currently doing the job that the shadow model will do. It could be a model, a heuristic, a simple if statement, or even nothing at all. 

  3. You are logging your live results, right?