Highlights of TensorFlow Dev Summit 2018 from an Infra Engineer’s Perspective

I quickly watched the infra-related talks; here are the highlights that matched my personal interests.

1) TensorFlow Hub.

Link: https://www.tensorflow.org/hub/

This is definitely a must-have piece of TensorFlow. Today, reusing a model in TF is basically copy-paste. Suppose you’re a machine learning engineer and want to use a pre-trained Inception V3 network: you have to write lots of glue code to download the model, wire its outputs into your own network, and so on.

As transfer learning becomes more and more popular, deep neural networks are increasingly reused as the input to other networks. The core concept of TensorFlow Hub is the Module. A module is essentially a group of functions (called signatures), and downstream modules / TensorFlow programs can decide which one to use.

Besides using pre-trained modules as a black box (like a DLL or a JAR), engineers can fine-tune a module by feeding it their own data, which can save tens of thousands of GPU hours compared with training all parameters from scratch (all $$$).
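To make the "black box plus your own data" idea concrete, here is a toy sketch in plain Python. It does not use TensorFlow or the TensorFlow Hub API: `pretrained_features`, `train_head`, and `predict` are hypothetical names I made up, and the "pre-trained module" is just a fixed function. The point is only that the reused part stays frozen while a small head is trained on new data.

```python
# Toy illustration only: plain Python, no TensorFlow, hypothetical names.

def pretrained_features(x):
    """Stand-in for a frozen pre-trained module: nothing in here is trained."""
    return [x, x * x]


def train_head(data, lr=0.01, epochs=2000):
    """Fit only a small linear head on top of the frozen features."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0]
        for x, y in data:
            f = pretrained_features(x)
            err = sum(wi * fi for wi, fi in zip(w, f)) - y
            for i in range(len(w)):
                # Gradient of the mean squared error with respect to w[i].
                grad[i] += 2 * err * f[i] / len(data)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def predict(w, x):
    """Run the frozen features, then the trained head."""
    return sum(wi * fi for wi, fi in zip(w, pretrained_features(x)))


# "Own data" for the downstream task: y = 3x + 0.5x^2.
data = [(x, 3 * x + 0.5 * x * x) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
head = train_head(data)
print(head, predict(head, 3.0))
```

Only the two head weights are updated here. Real fine-tuning would also let (some of) the pre-trained weights move, which this sketch deliberately leaves out.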

2) Distributed learning


This year, the TF team announced the new DistributionStrategy API. As I understand it, it is a way to better control how distributed tasks execute, so that a distributed program doesn’t need to be aware of the underlying communication implementation.

If you know MPI (Message Passing Interface), you will better understand what it is trying to achieve: it looks like MPI for TensorFlow. DistributionStrategy is still at an early stage; as of now, it only supports a single node with multiple GPU cards. I’m not sure how popular it will become, since it is a lower-level API. Many TF users still use Estimator’s train_and_evaluate, since it is more straightforward and supports multi-node training.
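For readers who have not used MPI, the pattern I have in mind is synchronous data-parallel training: every replica computes a gradient on its own slice of the data, an all-reduce step averages the gradients, and every replica applies the same update. Below is a toy sketch of that pattern in plain Python. It is not the DistributionStrategy API and involves no real GPUs or communication; `local_gradient`, `all_reduce_mean`, and `train` are hypothetical names.

```python
# Toy illustration only: plain Python, no TensorFlow or MPI, hypothetical names.

def local_gradient(w, shard):
    """Gradient of the mean squared error for y = w * x on one replica's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(values):
    """Stand-in for the communication step: every replica receives the average."""
    avg = sum(values) / len(values)
    return [avg] * len(values)


def train(shards, lr=0.05, steps=200):
    # Each replica keeps its own copy of the weight; all copies start equal.
    weights = [0.0] * len(shards)
    for _ in range(steps):
        grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
        synced = all_reduce_mean(grads)
        # Same averaged gradient everywhere, so the copies never diverge.
        weights = [w - lr * g for w, g in zip(weights, synced)]
    return weights


# Two pretend GPUs, each seeing half of the data for y = 2x.
shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
replica_weights = train(shards)
print(replica_weights)
```

The training loop never needs to know how `all_reduce_mean` moves data around, which is the kind of separation I understand the new API to be aiming for.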

3) Preferred High-level APIs:

The TF team recommends the following high-level APIs. (These are just screenshots, so you can get an overview and dig into the details if you’re interested.)

(Captured from the keynote, 9min-12min.)

4) Eager execution.

Video: https://www.youtube.com/watch?v=T8AW0fKP0Hs&list=PLQY2H8rRoyvxjVx3zfw4vA4cvlKogyLNN&index=3

This is a really important piece for winning over users from communities like PyTorch. Prior to eager execution, TensorFlow only supported static graphs, which are harder to construct and to debug, and you had to write a lot of boilerplate code.

If you have tried to debug a TF program before, you know how unfriendly static graphs are for debugging: variables only get values during the training execution, and that execution happens inside TensorFlow, which is basically a black box, so it’s not simple to add breakpoints from your Python IDE.
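A toy sketch of why deferred graphs are awkward to inspect, in plain Python. This is not how TensorFlow is implemented; `Node`, `constant`, `mul`, and `run` are hypothetical names. The thing to notice is that building the graph produces no values, so a breakpoint at graph-construction time has nothing useful to show.

```python
# Toy illustration only: plain Python, no TensorFlow, hypothetical names.

class Node:
    """A deferred operation: records what to compute, holds no result yet."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = inputs
        self.value = value


def constant(v):
    return Node("const", value=v)


def mul(a, b):
    return Node("mul", (a, b))


def run(node):
    """Only here does any arithmetic actually happen."""
    if node.op == "const":
        return node.value
    if node.op == "mul":
        a, b = (run(i) for i in node.inputs)
        return a * b
    raise ValueError("unknown op: {}".format(node.op))


x = constant(2.0)
m = mul(x, x)            # graph style: m is just a node, no number inside
graph_result = run(m)    # the value only exists after a separate run step

eager_result = 2.0 * 2.0  # eager style: the value exists on this very line
print(m.value, graph_result, eager_result)
```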

With the brand-new eager execution, you can write a program like:

import tensorflow as tf       # on TF 1.x, also call tf.enable_eager_execution() first

tf.executing_eagerly()        # => True
x = [[2.]]
m = tf.matmul(x, x)
print("hello, {}".format(m))  # => "hello, [[4.]]"

And use your favorite IDE to debug your program.

5) TensorFlow.js.

I think this is the biggest announcement of this year’s TensorFlow Dev Summit.

It looks useful for the TF community, but not very useful for our use case (from an enterprise customer’s POV): it trains and serves models using the browser’s resources, which means it cannot support larger models or datasets.

Enterprise use cases rely on a cluster to handle models that are hard for a less powerful machine to handle.
