As every Hadoop YARN user knows, YARN has two main schedulers: the Fair Scheduler and the Capacity Scheduler. (There is actually a third one, the FIFO Scheduler, but it is not widely adopted.) “Which scheduler should I use?” is one of the most common questions YARN users ask.
I wrote this blog post to give you an up-to-date, feature-by-feature comparison of the two schedulers. I hope it makes choosing between them a little easier.
FEATURE-WISE COMPARISON OF THE TWO SCHEDULERS
The following table compares the two schedulers feature by feature.
| Feature | Fair Scheduler | Capacity Scheduler |
|---|---|---|
| 1. CPU Scheduling | √ | √ |
| 2. Node Partition (Label) | | √ |
| 3. Fairness between applications | √ | √ |
| 4. Hierarchical XML configuration | √ | |
| 5. Basic inter-queue preemption | √ | √ |
| 6. Preemption for pending request | | √ |
| 7. Resource limits for users | | √ |
| 8. Reservation system | √ | √ |
| 9. On-demand queue creation | √ | |
| 10. Container resizing | | √ |
Short explanations of these features:
1) CPU Scheduling
The scheduler takes CPU (vcores) into account when scheduling. By default, the YARN scheduler considers only memory and ignores the CPU part of an application's resource requests.
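To see why ignoring CPU can mislead the scheduler, here is a minimal Python sketch contrasting a memory-only view of an application's usage with a dominant-share view, which is the idea behind CPU-aware scheduling (for example Capacity Scheduler's `DominantResourceCalculator` or Fair Scheduler's `drf` policy). The `Resource` type, the cluster size and the application demands are hypothetical; this illustrates the concept and is not YARN's code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Resource:
    memory_mb: int
    vcores: int

# Hypothetical cluster capacity, used only for this illustration.
CLUSTER = Resource(memory_mb=100_000, vcores=100)

def memory_share(usage: Resource, cluster: Resource = CLUSTER) -> float:
    """Fraction of the cluster as seen by a memory-only scheduler."""
    return usage.memory_mb / cluster.memory_mb

def dominant_share(usage: Resource, cluster: Resource = CLUSTER) -> float:
    """Fraction of the cluster's most heavily used resource (memory or CPU)."""
    return max(usage.memory_mb / cluster.memory_mb,
               usage.vcores / cluster.vcores)

# Two hypothetical applications: one CPU-heavy, one memory-heavy.
cpu_heavy = Resource(memory_mb=10_000, vcores=60)
mem_heavy = Resource(memory_mb=40_000, vcores=5)

for name, usage in [("cpu_heavy", cpu_heavy), ("mem_heavy", mem_heavy)]:
    print(f"{name}: memory-only={memory_share(usage):.2f} "
          f"dominant={dominant_share(usage):.2f}")
```

With memory alone, `cpu_heavy` looks like the lighter application (10% vs. 40%), even though it holds 60% of the cluster's vcores.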
2) Node Partition (Label)
A node partition divides a big cluster into several smaller sub-clusters based on hardware or purpose. Queue capacities and ACLs can be configured for each partition.
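As a rough illustration of how a partition behaves like a sub-cluster, the sketch below groups nodes by label and derives each queue's guaranteed memory from per-partition capacity percentages. The host names, the `gpu` label, the queue names and the percentages are all hypothetical; a missing entry simply stands in for a queue that has no access to that partition.

```python
from collections import defaultdict

# Hypothetical nodes: (hostname, partition label, memory in GB).
# The empty string stands for the default partition.
NODES = [
    ("host1", "", 64),
    ("host2", "", 64),
    ("host3", "gpu", 128),
    ("host4", "gpu", 128),
]

# Hypothetical queue capacities, in percent of each partition's resources.
QUEUE_CAPACITY = {
    "": {"default": 70, "analytics": 30},
    "gpu": {"analytics": 100},
}

def partition_sizes(nodes):
    """Total memory (GB) per partition label."""
    sizes = defaultdict(int)
    for _host, label, mem_gb in nodes:
        sizes[label] += mem_gb
    return dict(sizes)

def queue_guarantees(nodes, capacities):
    """Memory (GB) guaranteed to each (queue, partition) pair."""
    sizes = partition_sizes(nodes)
    result = {}
    for label, queues in capacities.items():
        for queue, pct in queues.items():
            result[(queue, label)] = sizes.get(label, 0) * pct / 100
    return result

for (queue, label), gb in sorted(queue_guarantees(NODES, QUEUE_CAPACITY).items()):
    print(f"queue={queue:<9} partition={label or '<default>':<9} guaranteed={gb:.1f} GB")
```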
3) Fairness between applications
Fairness between applications ensures that all applications within a queue can make progress. Fair Scheduler supports it natively, and Capacity Scheduler supports it starting with YARN-3319.
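The difference fairness makes can be shown with a toy allocator that hands out containers one at a time inside a single queue, either in submission (FIFO) order or to whichever application currently holds the fewest containers. This is a simplified sketch assuming equal-sized containers and equal weights; the real schedulers compare resource usage and also apply weights and limits.

```python
def allocate(apps, containers, policy):
    """Hand out `containers` one at a time to the applications in one queue.

    apps: list of (app_id, demand) in submission order.
    policy: "fifo" serves the earliest application that still has demand;
            "fair" serves the application currently holding the fewest containers.
    """
    usage = {app_id: 0 for app_id, _ in apps}
    demand = dict(apps)
    for _ in range(containers):
        waiting = [app_id for app_id, _ in apps if usage[app_id] < demand[app_id]]
        if not waiting:
            break  # every application is fully satisfied
        if policy == "fifo":
            chosen = waiting[0]
        elif policy == "fair":
            # min() returns the first of several equal candidates, so ties go
            # to the earlier-submitted application.
            chosen = min(waiting, key=lambda app_id: usage[app_id])
        else:
            raise ValueError(f"unknown policy: {policy}")
        usage[chosen] += 1
    return usage

apps = [("app1", 10), ("app2", 3), ("app3", 3)]
print("fifo:", allocate(apps, 8, "fifo"))
print("fair:", allocate(apps, 8, "fair"))
```

With 8 containers and demands of 10, 3 and 3, FIFO gives all 8 to `app1`, while the fair policy ends with 3, 3 and 2, so every application makes progress.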
4) Hierarchical XML configuration
Fair Scheduler uses a nested XML configuration (the allocation file) that mirrors the queue hierarchy, which is more intuitive than the traditional flat, Hadoop-style key/value configuration.
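To show how a nested file mirrors the queue tree, the sketch below parses a small configuration in the style of a Fair Scheduler allocation file (`<allocations>` with nested `<queue name="...">` elements carrying a `<weight>`) and flattens it into dotted queue paths. The queue names and weights are made up. The dotted paths resemble the naming that Capacity Scheduler's flat properties use (for example `yarn.scheduler.capacity.root.queues`).

```python
import xml.etree.ElementTree as ET

# A small nested configuration in the style of a Fair Scheduler allocation file.
# Queue names and weights are hypothetical.
ALLOCATIONS = """
<allocations>
  <queue name="root">
    <queue name="engineering">
      <weight>3.0</weight>
      <queue name="etl"><weight>1.0</weight></queue>
      <queue name="adhoc"><weight>2.0</weight></queue>
    </queue>
    <queue name="marketing">
      <weight>1.0</weight>
    </queue>
  </queue>
</allocations>
"""

def flatten(elem, prefix=""):
    """Walk nested <queue> elements and yield (dotted queue path, weight) pairs."""
    for q in elem.findall("queue"):
        path = f"{prefix}.{q.get('name')}" if prefix else q.get("name")
        weight = q.findtext("weight")
        if weight is not None:
            yield path, float(weight)
        yield from flatten(q, path)

for path, weight in flatten(ET.fromstring(ALLOCATIONS.strip())):
    print(f"{path}: weight={weight}")
```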
5) Basic inter-queue preemption
Preemption balances resource usage between queues. When the cluster doesn’t have enough idle resources, and one queue is over-utilized while another is under-utilized, the scheduler can preempt containers from the over-utilized queue so that the under-utilized queue can get its share.
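A minimal sketch of the balancing decision, assuming queues are described only by guaranteed, used and pending memory: an under-utilized queue may reclaim up to its guarantee, idle memory is used first, and the rest is taken from queues running above their guarantee. The function name and numbers are hypothetical, and unlike a real scheduler the sketch takes from over-utilized queues in the order given rather than spreading preemption proportionally.

```python
def preemption_plan(queues, cluster_gb):
    """Return {queue: GB to preempt} for a simplified inter-queue preemption model.

    queues maps a queue name to (guaranteed_gb, used_gb, pending_gb).
    """
    idle = cluster_gb - sum(used for _, used, _ in queues.values())
    # Memory that under-utilized queues ask for and are entitled to (up to their guarantee).
    entitled = sum(min(pending, max(0, guaranteed - used))
                   for guaranteed, used, pending in queues.values())
    # Idle memory is handed out first; only the remainder must be preempted.
    to_reclaim = max(0, entitled - idle)
    plan = {}
    for name, (guaranteed, used, _pending) in queues.items():
        surplus = max(0, used - guaranteed)  # only usage above the guarantee is preemptable
        take = min(surplus, to_reclaim)
        if take > 0:
            plan[name] = take
            to_reclaim -= take
    return plan

queues = {"a": (50, 80, 0), "b": (50, 20, 40)}
print(preemption_plan(queues, cluster_gb=100))
```

Here queue `a` runs 30 GB above its guarantee while `b` is waiting for 30 GB it is entitled to, so 30 GB is preempted from `a`.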
6) Preemption for pending request
Previously, preemption was a shotgun approach: it didn’t consider the actual pending resource request. For example, when an under-utilized queue needed a single 60 GB container, the scheduler could preempt 60 containers of 1 GB each on different nodes. Since no single node then has 60 GB free, the preempted resources cannot be used by the request that triggered the preemption.
Since YARN-4390, Capacity Scheduler’s preemption takes the size of the pending resource request into account.
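The 60 GB example can be reproduced with a toy model. Both strategies below free the same 60 GB, but only the request-aware one frees it on a single node where the container can actually start. The cluster layout and function names are hypothetical, and this is not the selection logic from YARN-4390, only a contrast between the two approaches.

```python
# Hypothetical cluster: four 64 GB nodes, each completely filled with 1 GB
# containers from an over-utilized queue (so everything in use is preemptable).
NODES = {f"node{i}": {"free_gb": 0, "preemptable_gb": 64} for i in range(1, 5)}

def shotgun(nodes, amount_gb):
    """Request-unaware: kill 1 GB containers round-robin across all nodes."""
    freed = dict.fromkeys(nodes, 0)
    while amount_gb > 0:
        progress = False
        for name, node in nodes.items():
            if amount_gb > 0 and freed[name] < node["preemptable_gb"]:
                freed[name] += 1
                amount_gb -= 1
                progress = True
        if not progress:
            break  # nothing left to preempt anywhere
    return freed

def request_aware(nodes, request_gb):
    """Request-aware: free just enough space on one node that can host the request."""
    for name, node in nodes.items():
        shortfall = max(0, request_gb - node["free_gb"])
        if shortfall <= node["preemptable_gb"]:
            return {name: shortfall}
    return {}  # no single node can ever host the request

def fits_somewhere(nodes, freed, request_gb):
    """True if some node has request_gb free after the preemptions in `freed`."""
    return any(node["free_gb"] + freed.get(name, 0) >= request_gb
               for name, node in nodes.items())

for strategy in (shotgun, request_aware):
    freed = strategy(NODES, 60)
    print(strategy.__name__, freed, "fits:", fits_somewhere(NODES, freed, 60))
```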
7) Resource limits for users
Capacity Scheduler can limit how many resources each user may consume within a queue. This prevents a single user from taking over all the resources.
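A simplified sketch of a per-user cap. Capacity Scheduler exposes per-queue settings named `minimum-user-limit-percent` and `user-limit-factor`; the hypothetical function below uses them in an approximate way (an even split among active users, never below the minimum percentage, never above `user_limit_factor` times the queue capacity) and does not reproduce the scheduler's exact formula.

```python
def user_limit_gb(queue_capacity_gb, active_users,
                  minimum_user_limit_percent=100, user_limit_factor=1.0):
    """Approximate per-user cap inside one queue (not Capacity Scheduler's exact formula).

    Active users split the queue evenly, but the cap never drops below
    minimum_user_limit_percent of the queue, and it never exceeds
    user_limit_factor times the queue's capacity.
    """
    even_share = queue_capacity_gb / max(1, active_users)
    floor = queue_capacity_gb * minimum_user_limit_percent / 100
    ceiling = queue_capacity_gb * user_limit_factor
    return min(max(even_share, floor), ceiling)

for users in (1, 2, 3, 4, 8):
    print(users, "users ->", round(user_limit_gb(100, users, 25), 1), "GB each")
```

For a 100 GB queue with a minimum user limit of 25 percent, one user can use 100 GB, two users 50 GB each, and from four users on each is held to 25 GB.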
8) Reservation System
The reservation system (YARN-1051) lets applications reserve resources ahead of time.
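Reserving ahead of time boils down to admission control over a timeline. The toy `ReservationPlan` below (a hypothetical class, not YARN's reservation API) divides time into slots and accepts a reservation only if the requested memory fits in every slot of its window.

```python
class ReservationPlan:
    """Toy admission control: fixed capacity per time slot, bookings made in advance."""

    def __init__(self, capacity_gb, num_slots):
        self.capacity_gb = capacity_gb
        self.booked = [0] * num_slots  # GB already reserved in each slot

    def try_reserve(self, start, end, gb):
        """Reserve `gb` for slots [start, end) if it fits in all of them; return success."""
        if not (0 <= start < end <= len(self.booked)):
            return False
        if any(self.booked[t] + gb > self.capacity_gb for t in range(start, end)):
            return False
        for t in range(start, end):
            self.booked[t] += gb
        return True

plan = ReservationPlan(capacity_gb=100, num_slots=24)
print(plan.try_reserve(8, 12, 60))
print(plan.try_reserve(10, 14, 50))
print(plan.try_reserve(12, 16, 50))
```

A 60 GB booking for slots [8, 12) is accepted, an overlapping 50 GB request for slots [10, 14) is rejected, and the same request shifted to slots [12, 16) fits.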
9) On-demand queue creation
Fair Scheduler can dynamically create a queue when the queue requested by an application does not exist.
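A minimal sketch of the behavior, assuming a flat set of queue names: the requested queue is used if it exists, created on the fly if creation is allowed, and otherwise the application falls back to a default queue. In Fair Scheduler this is governed by the queue placement policy (for example a `specified` rule with `create="true"`); the `QueueManager` class here is hypothetical.

```python
class QueueManager:
    """Toy queue registry illustrating on-demand queue creation (hypothetical)."""

    def __init__(self, queues, create_missing=True):
        self.queues = set(queues)
        self.create_missing = create_missing

    def place(self, requested, default="root.default"):
        """Return the queue an application lands in, creating it if allowed."""
        name = requested if requested.startswith("root.") else f"root.{requested}"
        if name in self.queues:
            return name
        if self.create_missing:
            self.queues.add(name)  # only the leaf name is registered in this sketch
            return name
        return default

qm = QueueManager({"root.default", "root.etl"})
print(qm.place("etl"))      # existing queue
print(qm.place("newteam"))  # created on demand
```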
10) Container resizing
Applications can change the size of their running containers as their workload changes. This was added to Capacity Scheduler in YARN-1197.
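To illustrate resizing, the toy `Node` class below (hypothetical, not a YARN API) keeps a running container under the same id while its memory allocation changes: an increase succeeds only if the node has enough free memory, and a decrease gives memory back to the node.

```python
class Node:
    """Toy node tracking the memory held by running containers (hypothetical, not a YARN API)."""

    def __init__(self, capacity_mb):
        self.capacity_mb = capacity_mb
        self.containers = {}  # container id -> allocated MB

    def free_mb(self):
        return self.capacity_mb - sum(self.containers.values())

    def launch(self, cid, mb):
        if mb > self.free_mb():
            raise ValueError(f"not enough memory to launch {cid}")
        self.containers[cid] = mb

    def resize(self, cid, new_mb):
        """Change a running container's allocation in place; return True on success."""
        if new_mb <= 0:
            raise ValueError("new size must be positive")
        delta = new_mb - self.containers[cid]
        if delta > self.free_mb():
            return False  # an increase must fit into the node's free memory
        self.containers[cid] = new_mb  # a decrease always succeeds and frees memory
        return True

node = Node(capacity_mb=8192)
node.launch("c1", 2048)
node.launch("c2", 2048)
print(node.resize("c1", 4096), node.free_mb())  # grow c1 while the workload peaks
print(node.resize("c2", 6144), node.free_mb())  # too big: only 2048 MB free
print(node.resize("c1", 1024), node.free_mb())  # shrink c1 when the load drops
```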
I hope you enjoyed this post! Credit for these features goes to the Apache Hadoop community. If you have further questions, feel free to email the Hadoop user or dev mailing list.