Changing distributions

Earlier you saw how, in the context of ingesting an upstream model, our model’s performance would degrade if it expected one input but ingested another.

The statistical term for a change in the likelihood of observed values, such as model inputs, is a change in the distribution.

And it turns out that the distribution of the data can change for all sorts of reasons.

For example, sometimes the distribution of the label changes.

We’ve looked at the natality dataset in BigQuery and tried to predict baby weight.

Baby weight has actually changed over time.

It peaked in the 1980s and has since been declining.

In 1969, babies weighed significantly less than they did in 1984.

When the distribution of the label changes, it could mean that the relationship between features and labels is changing as well.

At the very least, it’s likely that our model’s predictions, which will typically match the distribution of the labels in the training set, will be significantly less accurate.
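As a quick illustration (assuming access to the public bigquery-public-data.samples.natality table and the google-cloud-bigquery client library), you could check this label drift directly:

```python
# Sketch: inspect how the label (baby weight) has shifted over time.
from google.cloud import bigquery

client = bigquery.Client()

query = """
SELECT
  year,
  AVG(weight_pounds) AS avg_weight_pounds
FROM
  `bigquery-public-data.samples.natality`
WHERE
  weight_pounds IS NOT NULL
GROUP BY
  year
ORDER BY
  year
"""

# Print one row per year; a drifting average is a drifting label distribution.
for row in client.query(query).result():
    print(row.year, round(row.avg_weight_pounds, 3))
```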

However, sometimes it’s not the labels, but the features, that change their distribution.

For example, say you’ve trained your model to predict population movement patterns using postal code as a feature.

Surprisingly, postal codes aren’t fixed.

Every year, governments release new ones and deprecate old ones.

Now, as an ML practitioner, you know that postal codes aren’t really numbers.

So you’ve chosen to represent them as categorical feature columns, but this might lead to problems.

If you chose to specify a vocabulary, set the number of out-of-vocabulary buckets to 0, and didn’t specify a default, then every unseen postal code is mapped to the default value, -1, and the distribution becomes skewed toward it.

And this might be problematic because the model may be forced to make predictions in regions of the feature space which were not well represented in the training data.
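As a rough sketch of that configuration, using TensorFlow’s tf.feature_column API with a made-up vocabulary: with zero out-of-vocabulary buckets, every postal code the model didn’t see at training time collapses onto the default id of -1, while reserving a few buckets spreads new codes out instead.

```python
import tensorflow as tf

# Hypothetical vocabulary of postal codes observed at training time.
train_postal_codes = ["10001", "10002", "10003"]

# With num_oov_buckets=0 and the default value of -1, any postal code
# introduced after training maps to the single id -1, skewing the
# serving-time distribution toward that one value.
postal_code = tf.feature_column.categorical_column_with_vocabulary_list(
    key="postal_code",
    vocabulary_list=train_postal_codes,
    num_oov_buckets=0,   # no buckets reserved for unseen codes
    default_value=-1,    # everything unseen collapses here
)

# One mitigation: reserve hash buckets for out-of-vocabulary codes instead.
postal_code_with_oov = tf.feature_column.categorical_column_with_vocabulary_list(
    key="postal_code",
    vocabulary_list=train_postal_codes,
    num_oov_buckets=10,  # unseen codes are spread across 10 buckets
)
```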

There’s another name for when models are asked to make predictions on points in feature space that are far away from the training data, and that’s extrapolation.

Extrapolation means to generalize outside the bounds of what we’ve previously seen.

Interpolation is the opposite.

It means to generalize within the bounds of what we’ve previously seen.

Interpolation is always much easier.

For example, let’s say that the model got to see the yellow data and not the gray data.

The blue line reflects a linear regression on the yellow data.

Predictions in the yellow region are interpolated and reasonably accurate.

In contrast, predictions in the gray region are extrapolated and are increasingly inaccurate the farther we get from the yellow region.
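Here’s a small, self-contained sketch of that picture with made-up data: a line is fit only on the region the model gets to see, and the error is compared inside and outside that region.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# The true relationship is curved, but the model only sees x in [0, 5]
# (the "yellow" region); x in (5, 10] plays the role of the "gray" region.
x = np.linspace(0, 10, 200)
y = np.sin(x) + 0.5 * x + rng.normal(scale=0.1, size=x.shape)

seen = x <= 5

# Fit a simple linear regression on the seen (yellow) region only.
slope, intercept = np.polyfit(x[seen], y[seen], deg=1)
pred = slope * x + intercept

interp_mae = np.mean(np.abs(y[seen] - pred[seen]))    # interpolation error
extrap_mae = np.mean(np.abs(y[~seen] - pred[~seen]))  # extrapolation error

print(f"interpolation MAE: {interp_mae:.3f}")
print(f"extrapolation MAE: {extrap_mae:.3f}")  # much larger
```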

You can protect yourself from changing distributions using a few different methods.

The first thing you can do is be vigilant through monitoring.

You can look at the descriptive summaries of your inputs and compare them to what the model has seen.

If, for example, the mean or the variance has changed substantially, then you can analyze this new segment of the input space to see if the relationships learned still hold.
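As a minimal sketch of that kind of check (the threshold and the specific statistics are assumptions, not a prescribed recipe):

```python
import numpy as np

def drifted(train_values, serving_values, threshold=0.25):
    """Flag a feature whose mean or variance has moved substantially
    relative to what the model saw at training time."""
    train_mean, train_std = np.mean(train_values), np.std(train_values)
    serv_mean, serv_std = np.mean(serving_values), np.std(serving_values)

    # Shift in the mean, measured in training standard deviations.
    mean_shift = abs(serv_mean - train_mean) / (train_std + 1e-8)
    # Relative change in spread.
    std_change = abs(serv_std - train_std) / (train_std + 1e-8)

    return mean_shift > threshold or std_change > threshold

# Example: training inputs vs. a recent window of serving inputs.
train = np.random.normal(loc=0.0, scale=1.0, size=10_000)
recent = np.random.normal(loc=0.6, scale=1.0, size=1_000)  # mean has drifted
print(drifted(train, recent))  # True: worth analyzing this segment
```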

You can also look to see whether the model’s residuals, that is, the differences between its predictions and the labels, have changed as a function of your inputs.

If, for example, you used to have small errors on one slice of the input and large errors on another, and now the pattern has switched, this could be evidence of a change in the relationship.
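One way to make that check concrete, sketched here with pandas and hypothetical column names (slice, label, prediction):

```python
import pandas as pd

def error_by_slice(df):
    """Mean absolute residual per input slice."""
    residuals = df["label"] - df["prediction"]
    return residuals.abs().groupby(df["slice"]).mean()

# Hypothetical evaluation frames: one from training time, one from today.
train_eval = pd.DataFrame({
    "slice": ["urban", "urban", "rural", "rural"],
    "label": [10.0, 12.0, 5.0, 6.0],
    "prediction": [10.1, 11.8, 3.9, 7.2],
})
recent_eval = pd.DataFrame({
    "slice": ["urban", "urban", "rural", "rural"],
    "label": [10.0, 12.0, 5.0, 6.0],
    "prediction": [8.7, 13.5, 5.1, 5.9],
})

# If the slice that used to have the small errors now has the large ones,
# the learned relationship may have changed.
print(error_by_slice(train_eval))
print(error_by_slice(recent_eval))
```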

Finally, if you have reason to believe that the relationship is changing over time, you can force the model to treat more recent observations as more important by writing a custom loss function, or by retraining the model on the most recent data.
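As an illustrative sketch of the recency-weighting idea (the half-life and the weighting scheme are assumptions, not a prescribed implementation):

```python
import numpy as np
import tensorflow as tf

def recency_weights(ages_in_days, half_life_days=90.0):
    """Exponentially down-weight older examples: an example that is one
    half-life old counts half as much as one from today."""
    ages = np.asarray(ages_in_days, dtype=np.float32)
    return 0.5 ** (ages / half_life_days)

def weighted_mse(y_true, y_pred, weights):
    """Mean squared error where each example contributes according to
    how recent it is. Expects 1-D tensors of equal length."""
    per_example = tf.square(y_true - y_pred)
    return tf.reduce_sum(weights * per_example) / tf.reduce_sum(weights)

# In Keras, the simplest route to the same effect is per-example weights:
#   model.fit(features, labels, sample_weight=recency_weights(ages))
```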