Trained model, prediction service, and performance monitoring

The output of model validation is a trained model that can be pushed to the model registry.

A machine learning model registry is a centralized tracking system that stores lineage, versioning, and related metadata for published machine learning models.

A registry may capture governance data required for auditing purposes, such as:

Who trained and published a model

Which datasets were used for training

The values of metrics measuring predictive performance

When the model was deployed to production

In short, a registry is a place to find, publish, and use ML models or model pipeline components.
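
To make the governance fields above concrete, here is a minimal in-memory sketch of a registry. The `ModelRegistry` and `ModelVersion` classes, their field names, and the example model name, author, and bucket paths are hypothetical illustrations, not the API of any particular registry product; a real registry would persist this metadata in a database and control who can publish to it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class ModelVersion:
    """Governance metadata recorded for one published model version (hypothetical schema)."""
    name: str
    version: int
    trained_by: str                          # who trained and published the model
    training_datasets: List[str]             # which datasets were used for training
    metrics: Dict[str, float]                # predictive-performance metrics
    artifact_uri: str                        # where the serialized model is stored
    deployed_at: Optional[datetime] = None   # set when the version reaches production


class ModelRegistry:
    """Hypothetical in-memory registry: keeps an ordered version history per model name."""

    def __init__(self) -> None:
        self._models: Dict[str, List[ModelVersion]] = {}

    def publish(self, name: str, trained_by: str, training_datasets: List[str],
                metrics: Dict[str, float], artifact_uri: str) -> ModelVersion:
        versions = self._models.setdefault(name, [])
        mv = ModelVersion(name, len(versions) + 1, trained_by,
                          list(training_datasets), dict(metrics), artifact_uri)
        versions.append(mv)
        return mv

    def get(self, name: str, version: Optional[int] = None) -> ModelVersion:
        """Return a specific version, or the latest one when version is None."""
        versions = self._models[name]
        return versions[-1] if version is None else versions[version - 1]

    def mark_deployed(self, name: str, version: int) -> ModelVersion:
        """Record when a version was deployed to production."""
        mv = self.get(name, version)
        mv.deployed_at = datetime.now(timezone.utc)
        return mv


registry = ModelRegistry()
registry.publish(
    name="churn-classifier",                                   # hypothetical model name
    trained_by="alice@example.com",                            # hypothetical author
    training_datasets=["gs://example-bucket/data/train.csv"],  # hypothetical dataset URI
    metrics={"auc": 0.91},                                     # illustrative value only
    artifact_uri="gs://example-bucket/models/churn/1/",        # hypothetical artifact path
)
registry.mark_deployed("churn-classifier", version=1)
print(registry.get("churn-classifier"))
```

Keeping every version rather than overwriting the latest one is what gives the registry its lineage: an auditor can look up exactly which datasets and metrics stood behind the model that was live at a given time.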

Machine learning uses data to answer questions.

So prediction, or inference, is the step where we get to answer the questions we posed, whether they come from a business problem or an academic research problem.

The trained model is deployed to production as a prediction service.

It’s important to note that this step is concerned only with deploying the trained model as a prediction service (for example, a microservice with a REST API), rather than deploying the entire ML system.
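
As a rough illustration of "a microservice with a REST API", the sketch below wraps a model in a tiny WSGI app built only from the Python standard library. The `MeanThresholdModel` class, the `/predict` route, port 8080, and the `{"instances": [...]}` request shape are assumptions chosen for the example; a production service would load a real serialized model and run behind a proper WSGI server.

```python
import json
from wsgiref.simple_server import make_server


class MeanThresholdModel:
    """Hypothetical stand-in for a trained model: predicts 1 if the mean feature exceeds a threshold."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def predict(self, instances):
        return [int(sum(x) / len(x) > self.threshold) for x in instances]


MODEL = MeanThresholdModel(threshold=0.5)


def app(environ, start_response):
    """WSGI app exposing POST /predict with a JSON body {"instances": [[...], ...]}."""
    if environ.get("PATH_INFO") != "/predict" or environ.get("REQUEST_METHOD") != "POST":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "not found"}']
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length))
        predictions = MODEL.predict(payload["instances"])
    except (ValueError, KeyError, TypeError, ZeroDivisionError) as exc:
        # Malformed JSON, missing "instances", or badly shaped instances.
        start_response("400 Bad Request", [("Content-Type", "application/json")])
        return [json.dumps({"error": str(exc)}).encode("utf-8")]
    start_response("200 OK", [("Content-Type", "application/json")])
    return [json.dumps({"predictions": predictions}).encode("utf-8")]


def serve(port: int = 8080) -> None:
    """Run the service locally until interrupted. Try it with:
    curl -X POST localhost:8080/predict -d '{"instances": [[0.2, 0.9]]}'
    """
    with make_server("localhost", port, app) as server:
        server.serve_forever()
```

Only the model sits behind the endpoint; data ingestion, training, and validation stay outside the service, which is the distinction the paragraph above draws.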

For example, Google’s AI Platform Prediction service has an API for serving predictions from machine learning models.

In this particular example, the trained model is saved as a pickle file in Cloud Storage, and AI Platform Prediction retrieves it from there to serve predictions.

Pickle is Python’s standard-library module for serializing objects.
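
The sketch below shows pickle serialization in isolation: a stand-in model is written to a file and loaded back. `MeanThresholdModel` is a hypothetical placeholder for a real trained model. The file name `model.pkl` follows, as far as we know, the name AI Platform Prediction expects for pickled scikit-learn models in the Cloud Storage model directory; copying the file to a bucket (for example with `gsutil cp`) is left out. Because unpickling can execute arbitrary code, only load pickle files from sources you trust.

```python
import pickle


class MeanThresholdModel:
    """Hypothetical stand-in for a trained model; any picklable object works the same way."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def predict(self, instances):
        return [int(sum(x) / len(x) > self.threshold) for x in instances]


model = MeanThresholdModel(threshold=0.5)

# Serialize the trained model to a local file named model.pkl.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Deserialize it, as the serving side would. Only unpickle files you trust.
with open("model.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored.predict([[0.2, 0.9], [0.1, 0.2]]))
```

Note that pickle stores a reference to the class, not its code, so the serving environment must be able to import the same class (for real models, the same library versions) for loading to succeed.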

Trained models deployed in the AI Platform Prediction service are exposed as REST endpoints that can be invoked from any standard client that supports HTTP, such as a JupyterLab notebook.
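
A call to such an endpoint is an authenticated HTTP POST with a JSON body. The sketch below only builds the request with the standard library, assuming the global `ml.googleapis.com` endpoint, the v1 `projects/.../models/.../versions/...:predict` URL pattern, and the `{"instances": [...]}` request / `{"predictions": [...]}` response format. The project, model, and version names and the token are placeholders, and models deployed to a regional endpoint would use a region-prefixed host instead.

```python
import json
import urllib.request

# Hypothetical identifiers: replace with your own project, model, and version.
PROJECT = "my-project"
MODEL = "my_model"
VERSION = "v1"


def build_predict_request(instances, access_token: str) -> urllib.request.Request:
    """Build (but do not send) an online-prediction request for AI Platform Prediction."""
    url = (f"https://ml.googleapis.com/v1/projects/{PROJECT}"
           f"/models/{MODEL}/versions/{VERSION}:predict")
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


req = build_predict_request([[0.2, 0.9]], access_token="PLACEHOLDER_TOKEN")
# Sending it needs a real OAuth token, e.g. from `gcloud auth print-access-token`:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["predictions"])
```

From a notebook, the same request could be sent with any HTTP library; the Google API client libraries wrap this call and handle authentication for you.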

The AI Platform Prediction service can host models trained in popular machine learning frameworks, including TensorFlow, XGBoost, and scikit-learn.

As a best practice, you need a way to actively monitor the quality of your model in production.

Monitoring lets you detect model performance degradation or model staleness.

The output of monitoring for these changes then feeds into the data analysis component, where it can trigger a new run of the pipeline or a new experimental cycle.

For example, monitoring should be designed to detect data skews, which occur when your model training data is not representative of the live data.

That is to say, the data we used to train the model in the research or production environment does not represent the data our live system actually receives, and this can lead to model staleness.
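
One common way to quantify this kind of skew for a single numeric feature is the population stability index (PSI), which compares the binned distribution of the training data, e_i, with that of recent serving data, a_i, as PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i). PSI is our choice for illustration, not something this section prescribes; the bin count, the 0.2 alert threshold (a frequently quoted rule of thumb), and the retraining hook are assumptions.

```python
import bisect
import math
from typing import List, Sequence, Tuple

PSI_THRESHOLD = 0.2  # rule-of-thumb alert level (assumption; tune per feature)


def bin_fractions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of values in each bin; values outside the edges go to the end bins."""
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        i = bisect.bisect_right(edges, v) - 1
        counts[min(max(i, 0), n_bins - 1)] += 1
    return [c / len(values) for c in counts]


def population_stability_index(expected: Sequence[float], actual: Sequence[float],
                               edges: Sequence[float], eps: float = 1e-6) -> float:
    """PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i); eps avoids log(0) for empty bins."""
    e = bin_fractions(expected, edges)
    a = bin_fractions(actual, edges)
    return sum((ai - ei) * math.log((ai + eps) / (ei + eps)) for ei, ai in zip(e, a))


def check_feature_skew(train: Sequence[float], live: Sequence[float],
                       n_bins: int = 10) -> Tuple[float, bool]:
    """Bin over the training range and flag skew when PSI exceeds the threshold."""
    lo, hi = min(train), max(train)
    edges = [lo + (hi - lo) * k / n_bins for k in range(n_bins + 1)]
    psi = population_stability_index(train, live, edges)
    return psi, psi > PSI_THRESHOLD


train = [i / 100 for i in range(100)]        # feature values seen at training time
live = [0.5 + i / 200 for i in range(100)]   # serving values shifted upward
psi, skewed = check_feature_skew(train, live)
if skewed:
    # Hypothetical hook: a real system would start the pipeline or a new experiment here.
    print(f"PSI={psi:.2f} exceeds {PSI_THRESHOLD}: trigger retraining")
```

Identical distributions give a PSI of zero; the further the live histogram moves from the training histogram, the larger the index grows, which is what makes it usable as an automatic trigger.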

To understand other performance metrics, you can configure Google’s Cloud Monitoring to monitor your model’s:

Traffic patterns

Error rates

Latency

Resource utilization

These metrics can help you spot problems with your models and choose the right machine type to balance latency and cost.
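
Cloud Monitoring can report these signals for you, but the arithmetic behind three of them is simple enough to sketch from a list of request records. The `RequestLog` record, the choice to count only 5xx responses as errors, and the nearest-rank percentile method are assumptions for illustration; resource utilization has to come from the serving infrastructure itself and is not computed here.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RequestLog:
    latency_ms: float  # time taken to serve one prediction request
    status: int        # HTTP status code returned to the client


def percentile(sorted_values: List[float], p: int) -> float:
    """Nearest-rank percentile of a non-empty, already-sorted list (0 < p <= 100)."""
    rank = math.ceil(p * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def summarize(logs: List[RequestLog], window_s: float) -> Dict[str, float]:
    """Traffic, error rate, and latency for requests observed over window_s seconds."""
    if not logs:
        raise ValueError("no requests in window")
    latencies = sorted(r.latency_ms for r in logs)
    server_errors = sum(1 for r in logs if r.status >= 500)  # count 5xx responses only
    return {
        "requests_per_s": len(logs) / window_s,
        "error_rate": server_errors / len(logs),
        "p50_latency_ms": percentile(latencies, 50),
        "p95_latency_ms": percentile(latencies, 95),
    }


# Example: 20 requests over a 10-second window, two of which failed with 503.
logs = [RequestLog(latency_ms=float(i + 1), status=503 if i in (3, 7) else 200)
        for i in range(20)]
print(summarize(logs, window_s=10.0))
```

Comparing p95 latency and utilization for the same traffic on different machine types is one way to make the latency and cost trade-off concrete.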

Eduardo Avelar
