1/25 Lab: Adapting to data

When leading a team of engineers, many decisions come down to weighing technical debt and other costs against the expected benefits.

The best teams get very high rates of return on their investments.

With that in mind, let’s consider a few scenarios.

Let’s imagine that you’re the leader of a team of engineers and you are nearing the end of a code sprint. One of the team’s goals for the sprint is to improve the model’s performance by 5%.

Currently, however, the best-performing model is only marginally better than what was around before.

One of the engineers acknowledges this but still insists that it’s worth spending time on an extensive ablation analysis, where the value of an individual feature is computed by comparing the full model to a model trained without that feature.

What might this engineer be concerned about?

The engineer might be concerned about legacy and bundled features.

Legacy features are older features that were added because they were valuable at the time. Since then, better features have been added that have made them redundant without our knowledge.

Bundled features, on the other hand, are features that were added as part of a bundle that is valuable collectively, even though its individual members may not be.

Both kinds of feature represent additional, unnecessary data dependencies.
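To make the ablation idea concrete, here is a minimal sketch of a leave-one-feature-out comparison. Everything in it is an assumption made for illustration: the feature names (`new_feature`, `legacy_feature`, `unrelated_feature`) are hypothetical, the data is synthetic and deterministic, and a tiny nearest-centroid classifier stands in for whatever model the team actually uses.

```python
"""Toy leave-one-feature-out ablation on synthetic data (illustrative only)."""

# Hypothetical feature names, invented for this sketch.
FEATURES = ["new_feature", "legacy_feature", "unrelated_feature"]


def make_rows(n=200):
    """Build a deterministic synthetic dataset of (features, label) rows."""
    rows = []
    for i in range(n):
        label = i % 2
        flipped = i % 8 < 2  # legacy_feature disagrees with the label on 25% of rows
        rows.append(({
            "new_feature": float(label),  # newer feature that fully explains the label
            "legacy_feature": float(1 - label if flipped else label),
            "unrelated_feature": float((i // 2) % 2),  # carries no label signal
        }, label))
    return rows


def train_centroids(rows, features):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for x, y in rows:
        acc = sums.setdefault(y, [0.0] * len(features))
        for j, name in enumerate(features):
            acc[j] += x[name]
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def accuracy(centroids, rows, features):
    """Fraction of rows whose nearest class centroid matches the label."""
    correct = 0
    for x, y in rows:
        point = [x[name] for name in features]
        pred = min(
            centroids,
            key=lambda c: sum((p - m) ** 2 for p, m in zip(point, centroids[c])),
        )
        correct += pred == y
    return correct / len(rows)


def ablation(train, test, features):
    """Retrain without each feature in turn and report the accuracy drop."""
    baseline = accuracy(train_centroids(train, features), test, features)
    drops = {}
    for name in features:
        kept = [f for f in features if f != name]
        score = accuracy(train_centroids(train, kept), test, kept)
        drops[name] = baseline - score
    return baseline, drops


rows = make_rows()
train, test = rows[:120], rows[120:]  # simple holdout split
baseline, drops = ablation(train, test, FEATURES)
print(f"baseline accuracy: {baseline:.2f}")
for name, drop in drops.items():
    print(f"without {name}: accuracy drop {drop:.2f}")
```

On this toy data, only removing `new_feature` lowers held-out accuracy. Removing `legacy_feature` or `unrelated_feature` changes nothing, which is the signal that a feature may be a redundant data dependency. Real data is noisier, so in practice the drops would need to be compared against run-to-run variation before dropping anything.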

In another scenario, another engineer has found a new data source that is strongly related to the label.

The problem is that it’s in a unique format and there’s no parser written in Python, which is the language the codebase is written in.

Thankfully, there is a parser on the web, but it’s closed source and written in a different language.

The engineer is focused on model performance.

Something in the back of your mind seems wrong.

What is it? It’s the smell.

No, really! There’s a concept called code smell, and it applies in ML as well.

In this case, you might be thinking, “I wonder what introducing code that we can’t inspect and are unable to easily modify into our testing and production frameworks will do.”