Gang Scheduling
What is Gang Scheduling
When Gang Scheduling is enabled, YuniKorn schedules an app only when the app’s minimal resource request can be satisfied. Otherwise, the app waits in the queue. Apps are queued in hierarchical queues; with gang scheduling enabled, each resource queue is assigned a maximum number of applications that can run concurrently with their minimal resources guaranteed.
Enable Gang Scheduling
There is no cluster-wide configuration needed to enable Gang Scheduling. The scheduler actively monitors the metadata of each app; if the app includes a valid taskGroups definition, it is treated as requesting gang scheduling.
A task group is a “gang” of tasks in an app. These tasks have the same resource profile and the same placement constraints, so the scheduler can treat them as homogeneous requests of the same kind.
Prerequisite
For queues that run gang scheduling enabled applications, the queue sorting policy should be set to FIFO.
To configure the queue sorting policy, refer to the doc: app sorting policies.
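As a rough sketch of what this prerequisite can look like, the fragment below sets the policy on a single leaf queue in the YuniKorn queue configuration. The queue name sandbox is only an illustration, and the property name application.sort.policy is an assumption that should be checked against the app sorting policies doc for the YuniKorn version in use.

```yaml
# Assumed queues.yaml fragment; "sandbox" is an illustrative queue name.
partitions:
  - name: default
    queues:
      - name: root
        queues:
          - name: sandbox
            properties:
              # Sort apps in this queue in FIFO order (assumed property name).
              application.sort.policy: fifo
```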
Why the FIFO sorting policy
When Gang Scheduling is enabled, the scheduler proactively reserves resources for each application. If the queue sorting policy is not FIFO based (StateAware is a FIFO based sorting policy), the scheduler might reserve partial resources for each app, causing resource segmentation issues.
Side effects of StateAware sorting policy
We do not recommend using StateAware, even though it is a FIFO based policy. A failure of the first pod, or a long initialisation period of that pod, could slow down the processing.
This is specifically an issue with Spark jobs when the driver performs a lot of pre-processing before requesting the executors.
The StateAware timeout in those cases would slow down processing to just one application per timeout.
In effect, this overrules the gang reservation and causes slowdowns and excessive resource usage.
StateAware sorting is deprecated in YuniKorn 1.5.0 and will be removed from YuniKorn 1.6.0.
App Configuration
On Kubernetes, YuniKorn discovers apps by loading metadata from individual pods. The first pod of the app is required to include a full copy of the app metadata. If the app has no notion of a first or second pod, then all pods are required to carry the same taskGroups info. Gang scheduling requires a taskGroups definition, which can be specified via pod annotations. The required fields are:
Annotation | Value |
---|---|
yunikorn.apache.org/task-group-name | Task group name; it must be unique within the application |
yunikorn.apache.org/task-groups | A list of task groups; each item contains all the info defined for that task group |
yunikorn.apache.org/schedulingPolicyParameters | Optional. Arbitrary key-value pairs that define scheduling policy parameters. See the schedulingPolicyParameters section |
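
A minimal sketch of how these annotations might sit in a pod's metadata is shown here. The group name, the minMember and minResource values, and the schedulingPolicyParameters keys and values are illustrative assumptions rather than recommended settings; the schedulingPolicyParameters section is the place to confirm which keys are supported.

```yaml
metadata:
  annotations:
    # Task group this pod belongs to (illustrative name).
    yunikorn.apache.org/task-group-name: "task-group-example"
    # JSON list describing the app's task groups (illustrative values).
    yunikorn.apache.org/task-groups: |-
      [{
        "name": "task-group-example",
        "minMember": 2,
        "minResource": {
          "cpu": "100m",
          "memory": "50M"
        }
      }]
    # Optional; assumed example of space-separated key=value pairs.
    yunikorn.apache.org/schedulingPolicyParameters: "placeholderTimeoutInSeconds=60 gangSchedulingStyle=Hard"
```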
How many task groups are needed?
This depends on how many different types of pods the app requests from K8s. A task group is a “gang” of tasks in an app. These tasks have the same resource profile and the same placement constraints, so the scheduler can treat them as homogeneous requests of the same kind. Using Spark as an example, each job needs 2 task groups: one for the driver pod and one for the executor pods.
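To make the Spark case concrete, the sketch below shows annotations that a driver pod could carry, listing one task group for the driver and one for the executors. The group names, member counts, and resource amounts are hypothetical and would need to match the actual job.

```yaml
# Annotations on the driver pod, assumed to be the first pod of the app.
metadata:
  annotations:
    yunikorn.apache.org/task-group-name: "spark-driver"
    yunikorn.apache.org/task-groups: |-
      [{
        "name": "spark-driver",
        "minMember": 1,
        "minResource": {"cpu": "1", "memory": "2Gi"}
      },
      {
        "name": "spark-executor",
        "minMember": 10,
        "minResource": {"cpu": "1", "memory": "2Gi"}
      }]
```

Executor pods would then set yunikorn.apache.org/task-group-name to the executor group's name.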
How to define task groups?
The task group definition is a copy of the app’s real pod definition: values for fields like resources, node-selector, toleration and affinity should be the same as those of the real pods. This ensures the scheduler reserves resources using exactly the same specification as the real pods.
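The pod below is a sketch of this mirroring, in which the task group repeats the pod's resource requests, node selector, and toleration. All names, labels, and values are hypothetical, and the applicationId and queue labels and the schedulerName field are assumptions about how the pod is handed to YuniKorn in a given deployment.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gang-example-pod             # hypothetical pod name
  labels:
    applicationId: "gang-app-0001"   # hypothetical app ID
    queue: "root.sandbox"            # hypothetical queue
  annotations:
    yunikorn.apache.org/task-group-name: "workers"
    # The values below repeat what is set under spec.
    yunikorn.apache.org/task-groups: |-
      [{
        "name": "workers",
        "minMember": 3,
        "minResource": {"cpu": "500m", "memory": "1Gi"},
        "nodeSelector": {"disktype": "ssd"},
        "tolerations": [{"key": "dedicated", "operator": "Equal",
                         "value": "batch", "effect": "NoSchedule"}]
      }]
spec:
  schedulerName: yunikorn
  nodeSelector:
    disktype: ssd
  tolerations:
    - key: dedicated
      operator: Equal
      value: batch
      effect: NoSchedule
  containers:
    - name: worker
      image: busybox
      command: ["sleep", "3600"]
      resources:
        requests:
          cpu: "500m"
          memory: "1Gi"
```

If the task group and the pod spec drift apart, for example a different memory request, the reserved resources may no longer fit the real pods.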