Machine Learning Deployment:
Return Actions, Not Scores

A colorful pencil drawing of two robots fussing with some techno-thingy between them. Generated with Stable Diffusion. Prompt: A simple color pencil drawing a ((cute robot)), plugging cat5 cable into a network switch, white background

At a previous job, my team built models that stopped ATOs—Account takeovers, where a fraudster steals someone’s account credentials and attempts to use them. The engineering team that owned the login flow would call our model API, and we would return the model score. The engineering team had a threshold in their code, and if the score crossed that threshold, they would take some action.

You can probably already see the problem: APIs are meant to hide the inner workings behind them. But by returning the raw model scores, we revealed too much detail. The threshold in the calling code was tuned against the scores of one particular model, so any change to that model, like retraining it, could shift the scores and break the front end.
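To make the coupling concrete, here is a minimal sketch of what the calling side of such an API might look like. The names `LOCK_THRESHOLD` and `handle_login`, the threshold value, the example scores, and the choice of "Lock" as the action are all hypothetical and invented for illustration; this is not the original team's code.

```python
# Hypothetical caller-side code; every name and number here is illustrative.
LOCK_THRESHOLD = 0.8  # tuned against the score distribution of one specific model


def handle_login(score: float) -> str:
    """Decide what to do with a login, given the raw model score."""
    if score >= LOCK_THRESHOLD:
        return "Lock"
    return "Allow"


# The same risky login, scored by two versions of the model. A retrain can
# shift the scores while the caller's hard-coded threshold stays put.
old_model_score = 0.85
retrained_model_score = 0.75

print(handle_login(old_model_score))        # Lock
print(handle_login(retrained_model_score))  # Allow
```

Nothing in the caller changed between the two calls, yet the outcome flipped, because the threshold only had meaning relative to the old model's scores.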

In my guide to deploying machine learning models in shadow mode, I stated that deploying changes “in front of the API” has the advantage of giving the calling team control. This is precisely why we built the ATO API the way we did: to address the organizational issue that the engineering team did not trust the machine learning team.

But if your teams trust each other, there is a much better way to build.

What is a better way?

A better way is for the API to return an action drawn from a small, fixed set. For example, the ATO model API might return one of three actions: Allow, Step-up, or Lock.

These actions do a really good job of hiding the implementation behind the API. You can freely change thresholds when the model performance changes, retrain the model, or even replace it entirely.
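As a rough sketch of what this could look like behind the API, the score-to-action mapping moves to the machine learning team's side. The function name `score_to_action` and both threshold values are assumptions made up for this example.

```python
# Hypothetical thresholds, owned by the ML team and free to change with
# every retrain, because callers never see them.
STEP_UP_THRESHOLD = 0.5
LOCK_THRESHOLD = 0.9


def score_to_action(score: float) -> str:
    """Translate a raw model score into the action returned to the caller."""
    if score >= LOCK_THRESHOLD:
        return "Lock"
    if score >= STEP_UP_THRESHOLD:
        return "Step-up"
    return "Allow"


print(score_to_action(0.95))  # Lock
print(score_to_action(0.6))   # Step-up
print(score_to_action(0.1))   # Allow
```

The caller only ever handles the three action names, so a retrain that shifts the scores is absorbed by adjusting the two constants in one place.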

But you can do something else too: you can add more models!

Using multiple systems

A common fraud-prevention strategy is to train a model for each new fraud pattern identified. This allows each model to be highly precise, while also improving the recall of the overall system. These multi-model systems are often augmented with simple rules, such as “No logins from Russia allowed.” In the end, the system takes the outputs of the various models and rules and aggregates them in some way. In our ATO example, the system returns the most drastic action recommended by any model or rule.

In code:

def ato_api(event_token):
  # List of actions returned by all the models and rules,
  # consists of values from {'Allow', 'Step-up', 'Lock'}
  all_results = get_ato_system_results(event_token)

  if 'Lock' in all_results:
    return 'Lock'
  elif 'Step-up' in all_results:
    return 'Step-up'

  return 'Allow'

Of course, this is a great place to use enums and max:

from enum import IntEnum, unique

@unique
class Action(IntEnum):
  ALLOW = 0
  STEPUP = 1
  LOCK = 2

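def ato_api(event_token):
  # List of actions returned by all the models and rules,
  # consisting of members of the Action enum
  all_results = get_ato_system_results(event_token)

  # Return the most drastic action. If no model or rule returned anything,
  # fall back to ALLOW, matching the if/elif version above.
  return max(all_results, default=Action.ALLOW)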
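The snippets above leave `get_ato_system_results` undefined. One possible shape for it, purely as a sketch: keep a list of checks (models and rules), run each one on the event, and collect the actions. The in-memory `EVENTS` table, the `score_model` and `country_rule` functions, the thresholds, and the decision to map the Russia rule to a lock are all assumptions for illustration.

```python
from enum import IntEnum, unique
from typing import Callable, Dict, List


@unique
class Action(IntEnum):
    ALLOW = 0
    STEPUP = 1
    LOCK = 2


# Hypothetical event store, standing in for whatever the token resolves to.
EVENTS: Dict[str, dict] = {
    "token-1": {"country": "US", "score": 0.2},
    "token-2": {"country": "US", "score": 0.7},
    "token-3": {"country": "RU", "score": 0.1},
}


def score_model(event: dict) -> Action:
    """Stand-in for a trained model; the thresholds are made up."""
    if event["score"] >= 0.9:
        return Action.LOCK
    if event["score"] >= 0.5:
        return Action.STEPUP
    return Action.ALLOW


def country_rule(event: dict) -> Action:
    """Simple rule in the spirit of 'No logins from Russia allowed'."""
    return Action.LOCK if event["country"] == "RU" else Action.ALLOW


# Adding a model or rule means appending to this list; callers are unaffected.
CHECKS: List[Callable[[dict], Action]] = [score_model, country_rule]


def get_ato_system_results(event_token: str) -> List[Action]:
    event = EVENTS[event_token]
    return [check(event) for check in CHECKS]


def ato_api(event_token: str) -> Action:
    # Most drastic action wins; ALLOW if there are no checks at all.
    return max(get_ato_system_results(event_token), default=Action.ALLOW)


print(ato_api("token-1").name)  # ALLOW
print(ato_api("token-2").name)  # STEPUP
print(ato_api("token-3").name)  # LOCK
```

Because `Action` is an `IntEnum` ordered by severity, `max` picks the most drastic recommendation without any explicit comparison logic.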