TPU strategy

Similar to the mirrored strategy, the TPU strategy uses a single machine where the same model is replicated on each core, with its variables mirrored and kept in sync across each replica of the model.

The main difference, however, is that the TPU strategy will all-reduce across TPU cores, whereas the mirrored strategy will all-reduce across the devices (typically GPUs) on the machine.

tf.distribute.TPUStrategy lets you run your TensorFlow training on Tensor Processing Units (TPUs).

TPUs are Google’s specialized ASICs designed to dramatically accelerate machine learning workloads.

TPUs provide their own implementation of efficient all-reduce and other collective operations across multiple TPU cores, which are used in TPUStrategy.

You’ll also need a variable called strategy, but this time you’ll instantiate the tf.distribute.TPUStrategy class.
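
A minimal sketch of that setup, assuming a Colab or Cloud TPU runtime where the cluster resolver can discover the TPU automatically (the empty tpu="" argument is that assumption):

```python
import tensorflow as tf

# Assumption: a Colab/Cloud TPU runtime where the resolver can locate the TPU
# automatically; pass an explicit TPU name or address otherwise.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

strategy = tf.distribute.TPUStrategy(resolver)
print("Replicas in sync:", strategy.num_replicas_in_sync)
```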

Because TPUs are very fast, many models ported to the TPU end up with a data bottleneck.

The TPU sits idle, waiting for data, for most of each training epoch.

TPUs read training data exclusively from Google Cloud Storage (GCS).

GCS can sustain very high throughput, but only when it is continuously streaming from multiple files in parallel. Following a few best practices will maximize that throughput.

With too few files, GCS will not have enough parallel streams to reach maximum throughput; with too many files, time is wasted opening and accessing each individual file.
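
One common way to follow those practices is a tf.data pipeline that shards the training data into a moderate number of TFRecord files on GCS and reads several of them in parallel. The sketch below assumes a hypothetical bucket path and a simple image/label TFRecord schema; both are placeholders to adapt to your data.

```python
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE

# Hypothetical GCS location and shard pattern.
filenames = tf.io.gfile.glob("gs://your-bucket/train-*.tfrec")


def parse_example(serialized):
    # Placeholder feature spec for an image/label dataset; adjust to your schema.
    features = {
        "image": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized, features)
    image = tf.io.decode_jpeg(parsed["image"], channels=3)
    image = tf.image.resize(image, [224, 224]) / 255.0
    return image, parsed["label"]


dataset = (
    tf.data.Dataset.from_tensor_slices(filenames)
    .shuffle(len(filenames))
    # Read several files at once so GCS keeps multiple streams open.
    .interleave(
        lambda f: tf.data.TFRecordDataset(f),
        cycle_length=16,
        num_parallel_calls=AUTOTUNE,
    )
    .map(parse_example, num_parallel_calls=AUTOTUNE)
    .batch(1024, drop_remainder=True)  # fixed batch shape for the TPU
    .prefetch(AUTOTUNE)                # overlap input preparation with TPU compute
)
```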

Let’s summarize the distribution strategies using code.

Our base case is a Keras Sequential model built in the default scope, with no distribution strategy.
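
As a reference point, here is a minimal sketch of that base case; the layer sizes are placeholders, not anything prescribed by the lesson.

```python
import tensorflow as tf


def build_and_compile_model():
    # Placeholder architecture used throughout these sketches.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


# Base case: no distribution strategy, so the model lives in the default scope.
model = build_and_compile_model()
```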

Now, to speed up training on a single machine with multiple GPUs, we can use the mirrored strategy.
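
The only change from the base-case sketch is that model creation and compilation move inside the strategy's scope:

```python
# One replica per local GPU; variables are mirrored and kept in sync.
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    model = build_and_compile_model()  # helper from the base-case sketch
```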

Or, for faster training across multiple machines, the multi-worker mirrored strategy.
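
The pattern is the same across machines; this sketch assumes each worker is launched with a TF_CONFIG environment variable describing the cluster, which is how MultiWorkerMirroredStrategy discovers its peers.

```python
# Each worker runs this same script; TF_CONFIG (set outside this sketch)
# tells every worker who the other workers are.
strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():
    model = build_and_compile_model()  # helper from the base-case sketch
```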

And for really fast training, the TPU strategy.
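
Reusing the resolver and strategy from the TPU setup sketch shown earlier, only the scope changes:

```python
# Assumes `strategy = tf.distribute.TPUStrategy(resolver)` from the setup above.
with strategy.scope():
    model = build_and_compile_model()  # helper from the base-case sketch
```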