© 2017 by Doran Bae 

You are reading Coding Otter, stuffed with articles about big data, data science development, ML in production, managing data science projects, and more.
About the host
I'm Doran Bae, Data Scientist @TVB turning data into products and stories. More about me.

Machine Learning in Production

Helpful tips on launching your ML model in production for the first time

Recently, I had the honor of speaking at the Big Data and Machine Learning Leaders Summit 2018 in Hong Kong. As a product gal, I thought this would be a great chance to talk about machine learning in production.

Why is this important?

If the answer to this question does not come to you naturally, chances are you are an R-type data scientist. I classify data scientists into two broad groups: R-type and F-type. The R in R-type stands for research. R-types work on projects that are highly focused on finding answers through algorithm optimization; their work is often to prove a concept or to find an insight. The F in F-type stands for factory. F-types work on projects that get served to users on a user platform. These projects need to be productized, and they focus more on functionality than on optimization. For F-types, productizing an ML model is the goal of every project they undertake. If the model is not productized, the project has failed or is on hold - either way, a negative outcome. For us F-types, productization is everything. We want to build an ML model, but what we want more is to launch that model in production.

What is the current problem?

Unfortunately, there is not much information out there about launching an ML model in production. Wait, let me rephrase that: there is not as much information as there is about building the model. So if it is your first time, you may well struggle with how to do it well. This is my attempt at a short introduction to what to do - and what not to do - when you launch your ML model for the first time.

1. Start simple and iterate

I think many data scientists want to work on cool things, like deep learning. I am speaking from experience. You hear about self-driving cars and AlphaGo, and you feel like you would be missing out if you are not working on something related to deep learning.

But my first piece of advice is: don't be afraid to launch a product without deep learning. There are a few reasons why I think this.

Simple logic can help you plan for a more complex model later

Start from a simple feature that you can understand and easily track in the data. For example, if you want to build a video recommendation for your users, the first step should be asking, "What would make our users watch something new?" It could be something related to what is trending right now, or dramas from the same genre.

Let's say we pick ranking. What you should do is build your model based on ranking and see how your users interact with it. Unless you are solving a very complex problem, ranking can get you at least halfway there. For instance, 80% of my company's video streams come from the top 10% of the videos we offer. This means that if we offer our users any video from that top 10%, the majority of them will find it useful.
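A popularity-ranking baseline like this fits in a few lines. Here is a minimal sketch - the play-log format and function names are my own, just for illustration:

```python
from collections import Counter

def top_videos(play_log, top_fraction=0.10):
    """Rank videos by play count and return the most-watched fraction."""
    counts = Counter(play_log)  # video_id -> number of plays
    n_top = max(1, int(len(counts) * top_fraction))
    ranked = [vid for vid, _ in counts.most_common()]
    return ranked[:n_top]

def recommend(user_history, play_log, k=5):
    """Recommend the globally top-ranked videos the user has not seen yet."""
    seen = set(user_history)
    ranked = top_videos(play_log, top_fraction=1.0)
    return [v for v in ranked if v not in seen][:k]
```

Simple as it is, this already gives you something to put in front of users and a behavior you can fully explain when someone asks why a video was recommended.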

Doing this over and over again, you can identify which features work and which don't. Only this way can you decide which features are meaningful enough for your more complex model, and how to engineer them.

In addition, your simple model provides a baseline metric against which you can compare more complex models.

Get your foot in the door and then update your model

There is a very practical reason why you shouldn't be afraid to launch a simple model.

You want to get your foot in the door first. Complex models don't grow on trees. You need to work hard for them - maybe for weeks or even months if you want to train them with enough data.

But the truth is, your project manager will not wait for you to finish training the model. If you can't deliver something now, you may not be able to deliver anything for a long time.

So if you are in a situation where you need to deliver something fast, don't play hard to get. My advice: get in there, even if it means you have to deliver a static file! Occupy a section on your product display. Get them hooked! Then you can start thinking about how to improve your model.
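"Delivering a static file" can be as simple as precomputing recommendations in a batch job and writing them where the serving layer can read them. A sketch, with a made-up file name and schema:

```python
import json

def export_recommendations(recs_by_user, path="recs.json"):
    """Write precomputed per-user recommendations to a static JSON file.
    The serving layer only has to look up a key - no model runtime needed."""
    with open(path, "w") as f:
        json.dump(recs_by_user, f)

def lookup(path, user_id, fallback):
    """Read the static file at request time; fall back to a default list
    for users the batch job has not seen."""
    with open(path) as f:
        recs = json.load(f)
    return recs.get(user_id, fallback)
```

The point is not the file format; it is that a daily batch job plus a static lookup gets your model onto the product surface today, and you can swap in online serving later without changing what the user sees.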

You will have plenty of time to upgrade your model - if not today, then tomorrow. I think it is a good idea to launch a model, or upgrade one, at least once every quarter. The point is: if possible, start simple and get your model out there first.

The incremental gain in accuracy may not justify the extra resources

This is another reason why we may want to choose simpler logic instead of deep learning.

Deep learning comes with more baggage than we want. In research, optimizing an algorithm for even a fractional increase in accuracy may be considered a breakthrough.

But in the production world, it won't receive the same sentiment. In software engineering, anything that decreases speed - calculating a prediction, for example - must be justified. It is not only about the computation power; the maintenance of the model and its infrastructure can also weigh heavier than you want. As a data scientist, you need to think about the trade-off between model complexity and accuracy. If your model is too complex for the potential gain, maybe it is time to simplify it.

2. Choose the right tool

In fact, 90% of the job of putting ML in production is setting up the infrastructure. You need to set up new infrastructure that can support the model front and back. For example, you need to deal with data aggregation issues, moving data between different clouds, and serving infrastructure. You also need a workflow manager like Airflow to manage your tasks, and version control for your models.
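Stripped of scheduler features like retries and alerting, the core of such a workflow is an ordered chain of dependent tasks. A toy sketch in plain Python - the step names and state fields are hypothetical, and a tool like Airflow layers scheduling on top of exactly this kind of chain:

```python
def run_pipeline(steps, state=None):
    """Run pipeline steps in order; each step reads and updates shared state.
    A failing step raises, halting the run before downstream steps touch
    half-built data."""
    state = {} if state is None else state
    completed = []
    for name, step in steps:
        step(state)
        completed.append(name)
    return state, completed

# Hypothetical daily batch: aggregate data, train, then publish predictions.
steps = [
    ("aggregate", lambda s: s.update(rows=[1, 2, 3])),
    ("train",     lambda s: s.update(model=sum(s["rows"]))),
    ("publish",   lambda s: s.update(published=True)),
]
```

The value of a real workflow manager is everything around this loop: schedules, retries on failure, backfills, and visibility into which step broke.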

It is not a small matter. It is huge, and it requires a lot of investment and learning time.

It also means a lot of solutions, packages, and options to choose from. We tried many different solutions and packages that claimed they could solve our problem. They all had sleek logos, and they all seemed to have really good references. We still use a few of them, but a lot of them had unhappy endings. Even very well-known tools with big brand names did not deliver what they promised, or had too small a community to guarantee seamless support.

You will have to find your own combination of tools, depending on your use case - but choose carefully. One thing I would like the data science community to take a bigger part in is sharing infrastructure designs for machine learning. I wish we shared more of these cases, so we could learn from each other and grow together.

3. Include a human in the loop

Before we deploy the model, there is a process called QA - checking for errors, which can save us from making fools of ourselves. While it is common in product development, it is really difficult to do QA for machine learning, because there is no right or wrong answer in machine learning. Everything is a probability.

Now this creates a potential problem, because it makes data scientists think that QA is not needed in machine learning - as long as the model promises a decent overall accuracy.

But accuracy does not tell us anything about the quality of the user experience. It is just an aggregate number at the end. As the data scientist putting the model in production, it is your job to do a sanity check, meaning you should always make sure that your model looks reasonable in the eyes of the customers. Of course, there is no way for us to QA every case.

But do at least this much: check your website in QA, and keep a set of test data where you can check the extreme cases - or the cases that are important to managers. We had an incident in the early days where a model produced decent recommendations for 95% of our videos. The model missed the remaining 5% because there was no data for them. In the mind of a data scientist, 5% is not that big a deal. But in that 5% were some very important videos that the business manager cared about. A simple check would have allowed us to discover this problem and fix it before we deployed the model.
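Such a check can live in a small pre-deploy test. Here is one possible shape of it - the coverage threshold and the "must cover" list of business-critical videos are illustrative, not our actual setup:

```python
def qa_check(catalog_ids, recommendations, must_cover, min_coverage=0.99):
    """Fail fast if the model misses required videos or too much of the catalog.
    `recommendations` maps video_id -> list of recommended ids."""
    covered = set(recommendations) & set(catalog_ids)
    coverage = len(covered) / len(catalog_ids)
    errors = []
    if coverage < min_coverage:
        errors.append(f"coverage {coverage:.0%} below {min_coverage:.0%}")
    missing = [v for v in must_cover if v not in recommendations]
    if missing:
        errors.append(f"no recommendations for key videos: {missing}")
    return errors  # an empty list means the model passed QA
```

Run this against every candidate model before it ships; in the 5% incident above, a check like this would have flagged the missing videos immediately.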

When your model goes to production, it is not a model anymore - it is a product. And you need to confirm that your product is good enough for your users.

4. Follow business metrics

A while after you deploy your model, you will be required to measure its performance - or the performance of several models. In the dev stage, we usually use accuracy to determine whether one model is better than another.

In production, you will find yourself quickly ditching accuracy for more pragmatic metrics.

For example, we had two competing models in production. Model A was giving us a higher CTR - the better model in terms of accuracy. But the users in group B were spending more on our platform. In this case, there is no doubt that a sound person will choose model B.
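In code, that decision is just aggregating the experiment by the metric the business cares about instead of by clicks alone. A sketch with made-up event fields:

```python
def compare_models(events, metric="spend"):
    """Average a per-event metric by experiment group.
    `events` is a list of dicts like {"group": "A", "clicked": 1, "spend": 3.5}."""
    totals, counts = {}, {}
    for e in events:
        g = e["group"]
        totals[g] = totals.get(g, 0.0) + e[metric]
        counts[g] = counts.get(g, 0) + 1
    return {g: totals[g] / counts[g] for g in totals}

# Toy data mirroring the story: group A clicks more, group B spends more.
events = [
    {"group": "A", "clicked": 1, "spend": 1.0},
    {"group": "A", "clicked": 1, "spend": 2.0},
    {"group": "B", "clicked": 0, "spend": 5.0},
    {"group": "B", "clicked": 1, "spend": 7.0},
]
```

With the toy data, model A wins on click-through but model B wins on spend - the same comparison, decided differently depending on which metric you aggregate.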

Accuracy does not carry the same weight as it did in the dev stage, because accuracy only tells one side of the story - the model's side. In production, you want to listen to the feedback from the customers. Accuracy is important, but what is more important is how you use the predictions.

5. Watch for hidden failures

Maintenance is a very important step in ML in production. After you deploy your model, you need to keep watching your model. Now what is the worst case scenario of a model failure?

It is when the model breaks without you knowing. A hidden failure is a situation where the model's predictive power deteriorates without alerting the data science team.

This actually happens more than you think.

Let's assume one of the features our model relies on comes from a certain data table. For example, we have a table that all our models rely on: the table we get our videos' hierarchy from - which episode belongs to which program, and which program to which genre. When a new program is uploaded, this table should be updated with it. If this table is outdated for even a couple of days, the model will not be able to recommend any new programs.
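One cheap guard against this particular failure is a freshness check on the upstream table before the model runs. A sketch - the two-day threshold and the idea of a single "last updated" timestamp are assumptions for illustration:

```python
from datetime import datetime, timedelta

def assert_fresh(last_updated, max_age_days=2, now=None):
    """Raise before training/serving if the upstream table is stale,
    turning a hidden failure into a loud one."""
    now = now or datetime.utcnow()
    age = now - last_updated
    if age > timedelta(days=max_age_days):
        raise RuntimeError(f"hierarchy table is {age.days} days old")
    return age
```

The check costs one timestamp query per run, and it converts "the model quietly stopped recommending new programs" into an alert on the day the table stops updating.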

Hidden failures are my worst nightmare, because on the outside everything looks fine - but the model is actually broken and I am not aware of it. You should avoid any situation where someone else finds your mistake before you do.

One way to prevent hidden failures is to track and log your model all the time. Without adequate maintenance, your model may slowly decay, and in the worst case it can break. By then, it is already too late.
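A lightweight version of "track and log your model all the time" is logging one summary number per day - CTR, coverage, whatever you trust - and alerting when it drifts from its own recent history. The window and tolerance below are arbitrary placeholders:

```python
def drift_alert(daily_metric, window=7, tolerance=0.2):
    """Return True when the latest daily metric falls well below the average
    of the preceding `window` days - a crude but cheap decay detector."""
    if len(daily_metric) <= window:
        return False  # not enough history to judge yet
    recent = daily_metric[-(window + 1):-1]  # the window before today
    baseline = sum(recent) / len(recent)
    return daily_metric[-1] < baseline * (1 - tolerance)
```

It will not catch every failure mode, but it is the difference between discovering a broken model in your logs and discovering it from an angry business manager.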

Final thoughts

So these are some of my tips for deploying your model in production. Even if you build the best model, if it does not fulfill its requirements as a product, it has no value to your users. And it is our job as data scientists to make sure our effort in building the model can be safely extended to production. Building a model is fun, but productizing your machine learning model is something you can be proud of, because your model is reaching your users and shaping their experience of your product.