Training design decisions

One of the key decisions you’ll need to make about your production ML system concerns training.

Here’s a question: how is physics unlike fashion?

If we assume that science is about discovering relationships that already exist in the world, then the answer is that physics is constant, whereas fashion isn’t.

To see some proof, just look at some old pictures of yourself.

Now, you might be asking, why is this relevant?

Well, when making decisions about training, you have to decide whether the phenomenon you’re modelling is more like physics or more like fashion.

When training your model, there are two paradigms: static training and dynamic training.

In static training, we gather our data, we partition it, we train our model, and then we deploy it.

In dynamic training, we do this repeatedly as more data arrives.
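
As a rough sketch of the difference, here is what the two paradigms might look like in code. This is purely illustrative, using scikit-learn as a stand-in trainer; the actual model, partitioning, and deployment mechanism would depend on your system.

```python
# Purely illustrative sketch of static vs. dynamic training, using
# scikit-learn as a stand-in for whatever trainer you actually use.
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

def static_training(X, y):
    """Gather the data once, partition it, train, and deploy; never retrain."""
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    model = SGDClassifier().fit(X_train, y_train)
    print("held-out accuracy:", model.score(X_test, y_test))
    return model  # pushed to serving once and left alone

def dynamic_training(batches):
    """Retrain (here, incrementally) every time a new batch of data arrives."""
    model = SGDClassifier()
    for X_batch, y_batch in batches:
        model.partial_fit(X_batch, y_batch, classes=np.array([0, 1]))
        # in a real system, each pass would be followed by evaluation,
        # validation, and redeployment of the updated model
    return model
```

In practice, dynamic training can mean either incremental updates like this or full retraining on the accumulated dataset; the point is that the training step runs repeatedly rather than once.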

This leads to the fundamental trade-off between static and dynamic training.

Static training is simpler to build and test, but likely to become stale. Dynamic training is harder to build and test, but will adapt to changes.

The tendency to become stale, or not, is what was being alluded to earlier when we contrasted physics and fashion.

If the relationship you’re trying to model is one that’s constant, like physics, then a statically trained model may be sufficient.

If the relationship you’re trying to model is one that changes, then a dynamically trained model might be more appropriate.

Part of the reason dynamic training is harder to build and test is that new data may have all sorts of bugs in it.

And that’s something we’ll talk about more deeply in a later module on designing adaptable ML systems.

Engineering might also be harder because we need more monitoring, model rollback, and data quarantine capabilities.

Let’s explore some use cases and think about which sort of training style would be most appropriate.

The first use case concerns spam detection, and the question you should ask yourself is: how fresh does spam detection need to be?

You could do this as static, but spammers are a crafty and determined bunch.

They will probably discover ways of passing whatever filter you impose within a short time.

So, dynamic training is likely to be more effective over time.

What about Android Voice-to-Text?

Note that this question has some subtlety.

For a global model, training offline is probably fine.

But if you want to personalize the voice recognition, you may need to do something online, or at least something different, on the phone.

So this could be static or dynamic, depending on whether you want global or personalized transcription.

What about ad conversion rate?

The interesting subtlety here is that conversions may come in very late.

For example, if I’m shopping for a car online, I’m unlikely to buy for a very long time.

This system could use dynamic training, regularly going back at different intervals to catch up on conversion data that arrived late.

So in practice, most of the time you’ll need to use dynamic training, but you might start with static because it’s simpler.

In a reference architecture for static training, models are trained once and then pushed to AI Platform.

Now, for dynamic training, there are three potential architectures to explore: Cloud Functions, App Engine, or Cloud Dataflow.

In a general architecture for dynamic training using Cloud Functions, a new data file appears in Cloud Storage, and then the Cloud Function is launched. The Cloud Function starts an AI Platform training job, and AI Platform writes out a new model.
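
Here is a rough sketch of what such a Cloud Function might look like. It is not the reference implementation: the project, region, bucket, and trainer package names are made up, and the job is submitted through the AI Platform Training REST API via the Google API client library.

```python
# Illustrative only: a background Cloud Function triggered when a new data
# file is finalized in a Cloud Storage bucket. All names below are hypothetical.
from datetime import datetime
from googleapiclient import discovery

PROJECT_ID = "my-project"                                    # hypothetical
REGION = "us-central1"                                       # hypothetical
PACKAGE_URI = "gs://my-bucket/trainer/trainer-0.1.tar.gz"    # hypothetical

def retrain_on_new_data(event, context):
    """Entry point for a google.storage.object.finalize trigger."""
    new_file = f"gs://{event['bucket']}/{event['name']}"
    job_id = "retrain_" + datetime.utcnow().strftime("%Y%m%d_%H%M%S")

    training_job = {
        "jobId": job_id,
        "trainingInput": {
            "scaleTier": "BASIC",
            "packageUris": [PACKAGE_URI],
            "pythonModule": "trainer.task",
            "args": ["--train-files", new_file],
            "region": REGION,
            "runtimeVersion": "2.11",
            "pythonVersion": "3.7",
        },
    }

    # Submit the training job to AI Platform (ml.googleapis.com).
    ml = discovery.build("ml", "v1", cache_discovery=False)
    ml.projects().jobs().create(
        parent=f"projects/{PROJECT_ID}", body=training_job
    ).execute()
```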

In a general architecture for dynamic training using App Engine, the user makes a web request, perhaps from a dashboard, to App Engine. An AI Platform training job is launched, and the job writes a new model to Cloud Storage. From there, the statistics of the training job are displayed to the user when the job is complete.
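
A minimal sketch of that request flow, assuming a Flask app of the kind App Engine serves; the endpoint names and job parameters are invented for illustration.

```python
# Illustrative only: a small Flask app that launches a training job on request
# and reports its status back to the dashboard. All names are hypothetical.
import uuid
from flask import Flask, jsonify
from googleapiclient import discovery

app = Flask(__name__)
PROJECT_ID = "my-project"   # hypothetical
ml = discovery.build("ml", "v1", cache_discovery=False)

@app.route("/retrain", methods=["POST"])
def retrain():
    job_id = "dashboard_retrain_" + uuid.uuid4().hex[:8]
    job = {
        "jobId": job_id,
        "trainingInput": {   # same shape as in the Cloud Function sketch above
            "scaleTier": "BASIC",
            "packageUris": ["gs://my-bucket/trainer/trainer-0.1.tar.gz"],
            "pythonModule": "trainer.task",
            "region": "us-central1",
            "runtimeVersion": "2.11",
            "pythonVersion": "3.7",
        },
    }
    ml.projects().jobs().create(parent=f"projects/{PROJECT_ID}", body=job).execute()
    return jsonify({"launched": job_id})

@app.route("/status/<job_id>")
def status(job_id):
    # Poll the job so the dashboard can show training statistics when it finishes.
    name = f"projects/{PROJECT_ID}/jobs/{job_id}"
    info = ml.projects().jobs().get(name=name).execute()
    return jsonify({"state": info.get("state"),
                    "trainingOutput": info.get("trainingOutput")})
```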

The third option is a general architecture for dynamic training using Cloud Dataflow; it’s possible that the Dataflow pipeline is also invoking the model for predictions. Here, streaming messages are ingested into a Pub/Sub topic. The messages are then aggregated with Dataflow, and the aggregated data is stored in BigQuery. AI Platform training is launched on the arrival of new data in BigQuery, and an updated model is then deployed.
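
Here is a sketch of the aggregation stage with Apache Beam, the SDK that Dataflow runs. The topic, table, and field names are made up, and the piece that launches AI Platform training when new rows land in BigQuery would be a separate component not shown here.

```python
# Illustrative only: a streaming Beam pipeline that reads from Pub/Sub,
# aggregates per fixed window, and writes the results to BigQuery.
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions
from apache_beam.transforms.window import FixedWindows

def run():
    options = PipelineOptions()   # add --runner=DataflowRunner etc. as needed
    options.view_as(StandardOptions).streaming = True

    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                  topic="projects/my-project/topics/events")       # hypothetical topic
            | "Parse" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
            | "Window" >> beam.WindowInto(FixedWindows(60))         # 60-second windows
            | "KeyByUser" >> beam.Map(lambda e: (e["user_id"], 1))
            | "CountPerUser" >> beam.CombinePerKey(sum)
            | "ToRow" >> beam.Map(lambda kv: {"user_id": kv[0], "event_count": kv[1]})
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                  "my-project:analytics.aggregated_events",         # hypothetical table
                  schema="user_id:STRING,event_count:INTEGER",
                  write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
        )

if __name__ == "__main__":
    run()
```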