Event System

Overview

The Yunikorn scheduler core generates well-defined events at various points during execution. Because an event has a fixed structure, it is an ideal input for automated tools. Yunikorn also has extensive logging, but log output is textual and has to be parsed. Parsing is error-prone, and nothing guarantees that a log message stays stable over time: it might be removed in a future release, or its log level might change.

The event object is defined in the scheduler interface repository along with the constants that describe a certain event.

Certain types of events, namely those related to pods and nodes, are also sent to Kubernetes itself. This lets users identify problems that arise from misconfiguration, such as insufficient queue or user quota. If a pod is pending because the quota is exceeded, an event is attached to the pod that explains why it cannot be scheduled. Previously this had to be inferred from the logs or the state dump.

Various properties of the event subsystem are configurable. Changes are picked up and applied immediately.

Event types

Events always describe a specific object type. There are four types of events: application, queue, node and user/group. When it is appropriate, the event is also sent to Kubernetes using the K8s event API. Since that API requires a "regarding" object (i.e., the K8s object the event is about), only node and pod events are sent.

The following table summarizes when they are generated.

| Type | Event | Notes |
|---|---|---|
| Application | A new application is created | |
| Application | Application state transition | |
| Application | Successful allocation | |
| Application | An ask is added to the application | |
| Application | Application removed | |
| Application | Application rejected | |
| Application | Ask removed | |
| Application | Allocation removed | |
| Application | Placeholder timeout | |
| Application | Allocation replacement (gang scheduling) | |
| Application | Application is not runnable due to maxApplications Queue or user maxApps limit | |
| Application | Application has become runnable | Only sent if limit was hit before |
| Application | Ask is not schedulable due to quota limitations | Sent to Kubernetes to the respective pod |
| Application | Ask has become schedulable (there is quota now) | Sent to Kubernetes to the respective pod. Only sent if quota was not available before |
| Application | Ask is not schedulable due to predicate error * | Sent to Kubernetes to the respective pod. Rate limited: sent once in every 15 seconds |
| Application | Real allocation is larger than the placeholder | Sent to Kubernetes to the respective pod |
| Node | New node added to the scheduler core | Sent to Kubernetes to the respective node |
| Node | Node removed from the scheduler core | Sent to Kubernetes to the respective node |
| Node | Allocation added to a node | |
| Node | Allocation removed from a node | |
| Node | Node "ready" status changed | |
| Node | Node "schedulable" status changed | |
| Queue | Queue added to the scheduler core | |
| Queue | Application added to a queue | |
| Queue | Queue removed from the scheduler core | |
| Queue | Max resource changed | |
| Queue | Queue type changed | From leaf to parent or vice-versa |
| User/group | Limit configured for a user | |
| User/group | Limit configured for a group | |
| User/group | Limit removed for a user | |
| User/group | Limit removed for a group | |
| User/group | Resource usage increased for a user | |
| User/group | Resource usage increased for a group | Only sent if group limit is active |
| User/group | Group tracking got associated with a user | Only sent if group limit is active |
| User/group | Group tracking removed from a user | Only sent if group limit is active |

* Predicates are plugins located in the default scheduler. They are responsible for functionality like node selectors, affinity, anti-affinity, etc. Yunikorn runs the predicates for every allocation to see if it fits the candidate node.
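
The rate limiting noted for predicate-error events can be pictured as a timestamp check per pod. The sketch below is not Yunikorn's implementation: the class name PerKeyRateLimiter, the keying by pod name and the injectable clock are assumptions made for illustration. Only the 15-second interval comes from the table.

```python
import time


class PerKeyRateLimiter:
    """Allows at most one event per key within each interval (hypothetical helper)."""

    def __init__(self, interval_seconds=15.0, clock=time.monotonic):
        self._interval = interval_seconds
        self._clock = clock      # injectable so the behaviour can be tested
        self._last_sent = {}     # key -> time of the last allowed event

    def allow(self, key):
        """Return True if an event for this key may be sent now."""
        now = self._clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self._interval:
            return False         # still inside the window: suppress the event
        self._last_sent[key] = now
        return True
```

A caller would check allow("pod-name") before sending the event and skip the send when it returns False.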

In-memory storage of historical events

Yunikorn stores previously generated event objects in a ring buffer. Once the buffer is full, the oldest elements are overwritten. The default size of the buffer is 100000, which might not be enough on a busy cluster. The size can be changed in the configmap by using the key event.ringBufferCapacity. Keep in mind that increasing the size results in higher memory usage and longer GC pause times. Events that are put in the ring buffer are never removed explicitly, only overwritten. If the buffer becomes smaller after a config change, the oldest entries that no longer fit are discarded.

Every event is assigned an ID starting from 0. This can be used to fetch them based on ID.
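
The behaviour described in this section (overwrite when full, IDs counting up from 0, oldest entries dropped on shrink) can be illustrated with a small model. This is a sketch only, not the scheduler's code: it uses Python's collections.deque with maxlen as a stand-in for a real ring buffer, and the class and method names are hypothetical.

```python
from collections import deque


class EventRingBuffer:
    """Toy model of a bounded event history with monotonically increasing IDs."""

    def __init__(self, capacity):
        self._events = deque(maxlen=capacity)
        self._next_id = 0        # ID that the next added event will receive

    def add(self, event):
        """Store an event and return its ID; the oldest entry is dropped if full."""
        event_id = self._next_id
        self._events.append(event)
        self._next_id += 1
        return event_id

    def lowest_id(self):
        """ID of the oldest event still held (equals the next ID when empty)."""
        return self._next_id - len(self._events)

    def get(self, event_id):
        """Return the event with this ID, or None if it was overwritten or never existed."""
        index = event_id - self.lowest_id()
        if 0 <= index < len(self._events):
            return self._events[index]
        return None

    def resize(self, new_capacity):
        """Change the capacity; when shrinking, only the newest entries are kept."""
        self._events = deque(self._events, maxlen=new_capacity)
```

Because IDs keep increasing while old entries disappear, a client that fetches by ID has to expect that the lowest available ID is no longer 0 on a long-running instance.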

Yunikorn currently does not offer a way to store the generated events in persistent storage. However, this might change in upcoming releases.

Retrieving events

The REST API provides two ways to see the generated events.

Batch REST endpoint

The batch endpoint is available at /ws/v1/events/batch. By default, the start ID is 0 and a maximum of 10000 events are returned. Both values can be changed with the "start" and "count" URL query parameters.

See batch interface for details.
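
As an illustration, a client could query the batch endpoint as sketched below. The base URL (for example http://localhost:9080 after a port-forward) and the helper names batch_url and fetch_batch are assumptions. Only the path and the "start" and "count" parameters come from the text above. The response is returned as decoded JSON without assuming anything about its fields.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BATCH_PATH = "/ws/v1/events/batch"


def batch_url(base_url, start=None, count=None):
    """Build the batch endpoint URL; omitted parameters fall back to server defaults."""
    params = {}
    if start is not None:
        params["start"] = start
    if count is not None:
        params["count"] = count
    query = "?" + urlencode(params) if params else ""
    return base_url.rstrip("/") + BATCH_PATH + query


def fetch_batch(base_url, start=None, count=None, timeout=10):
    """Fetch one batch of events and return the decoded JSON body as-is."""
    with urlopen(batch_url(base_url, start, count), timeout=timeout) as resp:
        return json.load(resp)
```

A polling client would remember the highest ID it has seen and pass the next ID as "start" on the following call.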

Streaming REST endpoint

The streaming endpoint is available at /ws/v1/events/stream. Unlike the batch interface, which closes the connection after the query, streaming keeps the HTTP connection open. There is no idle timeout, so as long as the connection is stable, it is never closed. New events are sent immediately to active clients. The URL parameter "count" is also accepted; it tells Yunikorn how many of the most recent events, generated before the client connected, should be sent first.
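
A streaming client could look roughly like the sketch below. It assumes that the server writes one JSON object per line, which this page does not state, and that the REST service is reachable at a base URL such as http://localhost:9080. Both are assumptions, as are the function names iter_events and follow_events.

```python
import json
from urllib.request import urlopen

STREAM_PATH = "/ws/v1/events/stream"


def iter_events(lines):
    """Yield one decoded event per non-empty line (assumes newline-delimited JSON)."""
    for raw in lines:
        line = raw.strip()
        if line:
            yield json.loads(line)


def follow_events(base_url, count=None):
    """Yield events until the connection ends; blocks while waiting for new ones."""
    url = base_url.rstrip("/") + STREAM_PATH
    if count is not None:
        url += f"?count={int(count)}"
    # No timeout is passed, since the stream is meant to stay open.
    with urlopen(url) as resp:
        yield from iter_events(resp)
```

Iterating over follow_events(...) in a for loop processes each event as it arrives; the loop ends only when the connection is closed.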

Since this approach uses more resources inside Yunikorn, the number of streaming connections is limited. The limits are configurable; see streaming for details. Existing connections are not closed after a configuration change.

The active streaming connections are also shown in the state dump.

Event structure

A Yunikorn event has the following fields:

| Name | Description | Always set |
|---|---|---|
| ObjectID | Identifies the object that the event is about. | Yes |
| ReferenceID | Identifies a secondary object which is related to the first one. It can be an allocation ID, queue, etc. | No |
| Message | Textual details. Relevant if it is sent to Kubernetes. | No |
| Type | The type of the ObjectID: application, queue, node, user/group. | Yes |
| EventChangeType | The nature of the change: adding, setting, removing or none. | Yes |
| EventChangeDetail | Further details about the change itself. | Yes |
| TimestampNano | Unix nanotime inside Yunikorn when the event was generated. | Yes |
| Resource | Set if the event involves a resource (e.g. an allocation occurs). | No |

Example: when an allocation occurs, we generate three events: an application event, a node event and a user event about the usage (there's a group event if the limits are configured in a certain way, but it's not relevant in this example). We use the following unique identifiers:

  • Application is "app-1"
  • The request is "req-1"
  • The selected node is "node-1"
  • Application belongs to the user "yunikorn"
  • The application has been placed in the queue "root.test"

| What the event describes | ObjectID | ReferenceID | Type | EventChangeType | EventChangeDetail |
|---|---|---|---|---|---|
| Allocation occurred for a request | app-1 | req-1-0 * | EventRecord_APP | EventRecord_ADD | EventRecord_APP_ALLOC |
| Allocation on a node | node-1 | req-1-0 * | EventRecord_NODE | EventRecord_ADD | EventRecord_NODE_ALLOC |
| Resource usage changed for a user | yunikorn | root.test | EventRecord_USERGROUP | EventRecord_ADD | EventRecord_UG_USER_RESOURCE |

* the allocation ID is based on the ask ID plus a counter.

All three events contain the allocated resource.
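
To make the example concrete, the three events can be modelled as plain records. The Python class below only mirrors the field names from the table; the actual event object is defined in the scheduler interface repository, and the snake_case attribute names, the helper allocation_events and the dict used for the resource are assumptions made for illustration.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Event:
    """Illustrative stand-in for a Yunikorn event record."""
    object_id: str
    type: str
    event_change_type: str
    event_change_detail: str
    reference_id: str = ""                                   # optional
    message: str = ""                                        # optional
    resource: dict = field(default_factory=dict)             # optional, shape assumed
    timestamp_nano: int = field(default_factory=time.time_ns)


def allocation_events(app_id, allocation_id, node_id, user, queue, resource):
    """Build the application, node and user events generated for one allocation."""
    return [
        Event(app_id, "EventRecord_APP", "EventRecord_ADD",
              "EventRecord_APP_ALLOC", allocation_id, resource=dict(resource)),
        Event(node_id, "EventRecord_NODE", "EventRecord_ADD",
              "EventRecord_NODE_ALLOC", allocation_id, resource=dict(resource)),
        Event(user, "EventRecord_USERGROUP", "EventRecord_ADD",
              "EventRecord_UG_USER_RESOURCE", queue, resource=dict(resource)),
    ]
```

Calling allocation_events("app-1", "req-1-0", "node-1", "yunikorn", "root.test", {...}) yields records that match the three rows of the table, each carrying the same resource.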

Disabling the event system

Although the event system has negligible overhead, it is possible to disable Yunikorn events completely by setting event.trackingEnabled to false. This does NOT clear the ring buffer contents, so the history recorded until the change is still available.