Actions to mitigate concept drift


As previously mentioned, both data drift and concept drift lead to model drift.

Data drift - examples of data drift include:

The change in the spamming behavior to try to fool the model.

A rule update in the app, such as a change in the limit of user messages per minute.

Selection bias.

A non-stationary environment, such as training data from one season that fails to generalize to another season.

eCommerce apps are good examples of potential concept drift because they rely on personalization, and people’s preferences ultimately change over time.

Sensors may also be subject to concept drift due to the nature of the data they collect and how it may change over time.

Movie recommendations, like eCommerce apps, rely on user preferences, which may change.

Demand forecasting heavily relies on time, and as we have seen, time is a major contributor to potential concept drift.

So, what if you diagnose data drift?

If you diagnose data drift, enough new data needs to be labeled to capture any new classes, and the model retrained.

What if you diagnose concept drift? If you diagnose concept drift, the old data needs to be relabeled and the model retrained.

Also, for concept drift, you can design your systems to detect changes.
Periodically updating your static model with more recent historical data, for example, is a common way to mitigate concept drift.
You can either discard the static model completely and retrain from scratch, or use its existing state as the starting point and update it with a sample of the most recent historical data.

You can also use an ensemble approach to train your new model in order to correct the predictions from prior models.
The prior knowledge learnt from the old concept is used to improve the learning of the new concept.
For example, ensembles that learnt the old concept with high diversity can be trained with low diversity on the new concept.
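A minimal version of the ensemble idea is to keep the old model, fit a new one on recent data, and weight each member by its accuracy on the newest labelled batch, so the new concept corrects the old predictions. The models, synthetic data, and weighting scheme below are illustrative assumptions, not a specific published algorithm:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)

# Old model, trained on the old concept (label follows feature 0).
X_old = rng.normal(size=(400, 2))
y_old = (X_old[:, 0] > 0).astype(int)
old_model = LogisticRegression().fit(X_old, y_old)

# New concept: label now follows feature 1. Fit a new member on it.
X_new = rng.normal(size=(400, 2))
y_new = (X_new[:, 1] > 0).astype(int)
new_model = LogisticRegression().fit(X_new, y_new)

def ensemble_proba(X, models, weights):
    """Weighted average of member probabilities; each member's recent
    accuracy on the new concept sets its weight."""
    probas = np.array([m.predict_proba(X)[:, 1] for m in models])
    w = np.asarray(weights) / np.sum(weights)
    return w @ probas

# Weight members by accuracy on the newest labelled batch.
weights = [old_model.score(X_new, y_new), new_model.score(X_new, y_new)]
preds = (ensemble_proba(X_new, [old_model, new_model], weights) > 0.5).astype(int)
print("ensemble accuracy on new concept:", (preds == y_new).mean())
```

Because the old member still carries weight, knowledge from the old concept is retained, which helps if the drift later reverses or is only partial.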

Remember, concept drift is the change in relationships between the model inputs and the model output.
After your diagnosis and mitigation efforts, retraining or refreshing the model over time will help to maintain model quality.

As the world changes, your data may change.
The change can be gradual, sudden, or seasonal.
These changes will impact model performance.
Thus, machine learning models can be expected to degrade or decay.
Sometimes, though, the performance drop is due to low data quality, broken data pipelines, or technical bugs rather than drift.
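Whatever the cause, degradation only becomes visible if you measure it. A simple starting point is tracking accuracy over a sliding window of recent predictions and alerting when it drops below a threshold; the class name, window size, and threshold below are illustrative choices:

```python
from collections import deque

class RollingAccuracyMonitor:
    """Tracks accuracy over a sliding window of recent predictions
    and flags degradation when it falls below a threshold."""

    def __init__(self, window=100, threshold=0.8):
        self.window = deque(maxlen=window)
        self.threshold = threshold

    def update(self, prediction, label):
        # Record 1 for a correct prediction, 0 for an incorrect one.
        self.window.append(int(prediction == label))
        return self.accuracy()

    def accuracy(self):
        return sum(self.window) / len(self.window)

    def degraded(self):
        # Only alert once the window is full, to avoid noisy early alerts.
        return len(self.window) == self.window.maxlen and self.accuracy() < self.threshold

monitor = RollingAccuracyMonitor(window=50, threshold=0.8)
for i in range(50):
    # Simulate a model that is right at first, then starts failing.
    monitor.update(prediction=1, label=1 if i < 30 else 0)
print("degraded:", monitor.degraded())  # 30/50 correct = 0.6 < 0.8
```

An alert from a monitor like this is the cue to run the drift diagnostics above and decide between relabeling, retraining, or fixing the pipeline.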