Highlights of Tensorflow Dev Summit 2018 From Infra Engineer’s perspective

I quickly watched the infra-related talks; here are the highlights that matched my personal interests. 1) TensorFlow Hub. Link: https://www.tensorflow.org/hub/ This is definitely a must-have piece of TensorFlow. Existing TF training relies on a copy-paste model. Suppose you're a machine learning engineer and want to use a pre-trained Inception V3 network. You have to write lots …

Maximum Likelihood Estimation, Cross Entropy and Deep Learning Network

I have been reading Ian Goodfellow's Deep Learning book recently, and the 5th chapter (Machine Learning Basics) is really great. Compared to Bishop's Pattern Recognition and Machine Learning, it includes less mathematics and fewer formulas, which is good for a casual read. Today I want to share the topic of maximum likelihood estimation (MLE), which might not be straightforward to …

Deep understand locality in CapacityScheduler and how to control it.

Locality settings in CapacityScheduler look straightforward, but they are not so simple in reality. We have two levels of locality delay: node->rack (delay1) and rack->off-switch (delay2). As of now, delay in CapacityScheduler is based on missed opportunities (abbr. MO: how many node allocations were skipped) instead of wall-clock time. Let’s start with an example. We have a …
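The two-level delay described in this excerpt can be sketched roughly as follows. This is a minimal illustration, not CapacityScheduler code: the names `LocalityDelay` and `allowed_locality` are hypothetical, and it assumes the second delay is counted on top of the first (so off-switch placement is allowed only after delay1 + delay2 missed opportunities). The real scheduler also tracks the counter per application and priority, which is omitted here.

```python
from dataclasses import dataclass


@dataclass
class LocalityDelay:
    """Hypothetical container for the two delays named in the excerpt."""
    node_to_rack: int        # delay1: missed opportunities before relaxing to rack-local
    rack_to_off_switch: int  # delay2: further missed opportunities before off-switch


def allowed_locality(missed_opportunities: int, delay: LocalityDelay) -> str:
    """Return the loosest locality level permitted for a request.

    The count is in missed scheduling opportunities (skipped node
    allocations), not wall-clock time. Assumption: delay2 is cumulative
    with delay1.
    """
    if missed_opportunities < delay.node_to_rack:
        return "node-local"
    if missed_opportunities < delay.node_to_rack + delay.rack_to_off_switch:
        return "rack-local"
    return "off-switch"


if __name__ == "__main__":
    delay = LocalityDelay(node_to_rack=2, rack_to_off_switch=3)
    for mo in range(7):
        print(mo, allowed_locality(mo, delay))
```

Because the counter advances once per skipped node rather than per second, the effective waiting time under a model like this depends on how quickly nodes offer themselves to the scheduler.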

An updated feature comparison between Capacity Scheduler and Fair Scheduler

INTRODUCTION As every Hadoop YARN user knows, YARN has two schedulers: Fair Scheduler and Capacity Scheduler (actually there's a third, the FIFO scheduler, but it is not as widely adopted). "Which scheduler should I use?" is one of the most common questions asked by YARN users. I am writing this blog post to help you understand the …

Suggestions About How To Better Use YARN Node Label

Introduction Node label is an attractive feature of YARN, available since Apache Hadoop 2.6. It can solve problems in different scenarios. However, judging from Hadoop JIRA and the mailing lists, many users run into issues setting up and using node labels. As a major designer and maintainer of this feature, I highly recommend that you read …