An updated feature comparison between Capacity Scheduler and Fair Scheduler


INTRODUCTION

As every Hadoop YARN user knows, YARN has two schedulers: Fair Scheduler and Capacity Scheduler. (There is actually a third, the FIFO scheduler, but it is not as widely adopted.) “Which scheduler should I use?” is one of the most common questions asked by YARN users.

I wrote this blog post to help you understand the latest feature-wise comparison between the two schedulers. I hope it leaves you with fewer doubts when choosing between them.

FEATURE-WISE COMPARISON OF THE TWO SCHEDULERS

The following table lists the features compared between the two schedulers.

Feature Fair Scheduler Capacity Scheduler
1. CPU Scheduling
2. Node Label (Partition)
3. Fairness between applications
4. Hierarchical XML configuration
5. Basic inter-queue preemption
6. Preemption for pending request
7. Resource limit of users
8. Reservation system
9. On-demand queue creation
10. Container resizing

Short explanations of these features:

1) CPU Scheduling

This means taking CPU into account while scheduling. By default, the YARN scheduler ignores CPU in the resource requests from applications.
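To make the difference concrete, here is a minimal Python sketch comparing a memory-only view of usage with one that also looks at CPU by taking the larger of the two shares. The `Resource` class and both functions are hypothetical names made up for illustration; this is not YARN code, only a rough picture of why ignoring CPU can hide a CPU-heavy workload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical resource vector: memory in MB plus virtual cores."""
    memory_mb: int
    vcores: int


def memory_only_share(used: Resource, cluster: Resource) -> float:
    # Only memory is compared; CPU in the request is ignored.
    return used.memory_mb / cluster.memory_mb


def dominant_share(used: Resource, cluster: Resource) -> float:
    # CPU-aware view: the larger of the memory share and the CPU share.
    return max(used.memory_mb / cluster.memory_mb,
               used.vcores / cluster.vcores)


cluster = Resource(memory_mb=100_000, vcores=100)
cpu_heavy_app = Resource(memory_mb=10_000, vcores=60)

mem_view = memory_only_share(cpu_heavy_app, cluster)
cpu_aware_view = dominant_share(cpu_heavy_app, cluster)
print(f"memory-only share: {mem_view:.2f}, CPU-aware share: {cpu_aware_view:.2f}")
```

With these made-up numbers, the application looks small when only memory is counted, even though it holds more than half of the cluster's cores.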

2) Node Partition (Label)

Node partitioning is a way to divide a big cluster into several smaller clusters based on hardware or purpose. Capacities and ACLs can be assigned to each partition.

You can take a look at YARN-796 and the Hadoop Summit node label talk for more details.

3) Fairness between applications

Fairness between applications makes sure that all applications within a queue can make progress. Fair Scheduler has supported it from the start; Capacity Scheduler supports it starting with YARN-3319.
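As a rough illustration of the idea (not how either scheduler is actually implemented), the sketch below hands out containers one at a time to whichever application currently holds the fewest. The function name and the application names are hypothetical, and real schedulers also account for demand, weights and resource sizes.

```python
def allocate_fairly(apps, containers):
    """Give each container to the app that currently holds the fewest.

    Ties are broken by application name so the result is deterministic.
    """
    usage = {app: 0 for app in apps}
    for _ in range(containers):
        neediest = min(usage, key=lambda app: (usage[app], app))
        usage[neediest] += 1
    return usage


fair_result = allocate_fairly(["app-a", "app-b", "app-c"], containers=7)
print(fair_result)
```

Every application ends up with some containers, so none of them is starved while another one runs.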

4) Hierarchical XML configuration

Fair Scheduler uses nested XML configuration to mirror the hierarchy of queues, which is more intuitive than the traditional flat Hadoop-style configuration.
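The sketch below shows why nesting reads naturally: a small, simplified allocation-style XML snippet is parsed with Python's standard `xml.etree.ElementTree`, and each nested `<queue>` element is flattened into a dotted path. The queue names are invented for this example, and the snippet leaves out every real setting (weights, limits and so on).

```python
import xml.etree.ElementTree as ET

# Simplified, made-up queue layout; a real allocation file carries more settings.
ALLOCATIONS = """
<allocations>
  <queue name="engineering">
    <queue name="batch"/>
    <queue name="adhoc"/>
  </queue>
  <queue name="marketing"/>
</allocations>
"""


def queue_paths(element, prefix=""):
    """Walk nested <queue> elements and return dotted queue paths."""
    paths = []
    for child in element.findall("queue"):
        name = child.get("name")
        path = f"{prefix}.{name}" if prefix else name
        paths.append(path)
        paths.extend(queue_paths(child, path))
    return paths


paths = queue_paths(ET.fromstring(ALLOCATIONS.strip()))
for path in paths:
    print(path)
```

In the nested form, the parent and child relationship is visible from the indentation alone, whereas a flat configuration has to spell out the full path in every property name.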

5) Basic Preemption

Preemption is a way to balance resource usage between queues: when the cluster doesn’t have enough idle resources, and one queue is over-utilized while another is under-utilized, the scheduler can preempt resources from the over-utilized queue.
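A minimal sketch of that balancing step, assuming a full cluster and a single resource dimension (GB of memory), might look like this. The function name, the queue names and the numbers are all hypothetical; the actual preemption policies in YARN are considerably more involved.

```python
def resources_to_preempt(guaranteed, used, pending):
    """Return {queue: amount} to take back from over-utilized queues.

    guaranteed, used and pending map queue name -> GB. Only as much is
    preempted as under-utilized queues are actually asking for, and never
    more than what they need to get back up to their guarantee.
    """
    needed = sum(
        min(pending.get(q, 0), guaranteed[q] - used[q])
        for q in used
        if used[q] < guaranteed[q]
    )
    plan = {}
    for q in sorted(used):
        excess = used[q] - guaranteed[q]
        take = min(excess, needed)
        if take > 0:
            plan[q] = take
            needed -= take
    return plan


plan = resources_to_preempt(
    guaranteed={"prod": 50, "dev": 50},
    used={"prod": 20, "dev": 80},
    pending={"prod": 40},
)
print(plan)
```

Here `dev` is 30GB over its guarantee and `prod` is 30GB under with 40GB pending, so the sketch reclaims 30GB from `dev`.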

6) Preemption for pending request

Previously, preemption was a shotgun: it didn’t consider what the actual resource request was. As a result, when an under-utilized queue needed one single 60GB container, the scheduler could preempt 60 1GB containers on different nodes. Such preempted resources cannot be used by the target resource request.

Capacity Scheduler has supported preemption based on the size of pending resource requests since YARN-4390.
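The 60GB example can be checked with a few lines of Python. This is only a toy model: it assumes a container must fit entirely on one node, it ignores any resources that were already free on a node, and the node names and the helper function are made up.

```python
def usable_for_request(freed_gb_by_node, request_gb):
    """True if at least one node freed enough for the whole container."""
    return any(freed >= request_gb for freed in freed_gb_by_node.values())


# "Shotgun" preemption: 60 containers of 1GB, each on a different node.
shotgun = {f"node{i}": 1 for i in range(60)}

# Request-aware preemption: the same 60GB, all freed on a single node.
targeted = {"node0": 60}

shotgun_ok = usable_for_request(shotgun, request_gb=60)
targeted_ok = usable_for_request(targeted, request_gb=60)
print(sum(shotgun.values()), shotgun_ok)
print(sum(targeted.values()), targeted_ok)
```

Both cases free the same total amount of memory, but only the second one leaves room for the pending 60GB container.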

7) Resource limits for users

Capacity Scheduler supports limiting how much resource each user can use within a queue. This prevents one user from taking over all the resources in the cluster.
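A per-user cap can be pictured with the following sketch, which assumes a single queue, a single resource dimension and a fixed percentage of the queue that any one user may hold. The function and parameter names are hypothetical and do not correspond to actual Capacity Scheduler settings.

```python
def grant(user_usage, user, request_gb, queue_capacity_gb, user_limit_percent):
    """Grant as much of the request as the per-user cap and free space allow."""
    user_cap = queue_capacity_gb * user_limit_percent // 100
    queue_free = queue_capacity_gb - sum(user_usage.values())
    user_free = user_cap - user_usage.get(user, 0)
    granted = max(0, min(request_gb, user_free, queue_free))
    user_usage[user] = user_usage.get(user, 0) + granted
    return granted


usage = {}
first = grant(usage, "alice", 80, queue_capacity_gb=100, user_limit_percent=50)
second = grant(usage, "bob", 30, queue_capacity_gb=100, user_limit_percent=50)
third = grant(usage, "alice", 10, queue_capacity_gb=100, user_limit_percent=50)
print(first, second, third, usage)
```

In this toy run, `alice` asks for 80GB but is held at her 50GB cap, which leaves room for `bob` to get his full request.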

8) Reservation System

The reservation system (YARN-1051) lets applications reserve resources ahead of time.

9) On-demand queue creation

Fair Scheduler supports dynamic creation of queues if the queue requested by an application does not exist.

10) Container resizing

Applications can update the size of their running containers based on workload changes. This was added to Capacity Scheduler in YARN-1197.

END

I hope you enjoy this blog post! Credit for these features goes to the Apache Hadoop community. Feel free to send mail to the Hadoop user/dev mailing lists if you have further questions.
