RoleBasedGroup (RBG)

RoleBasedGroup (RBG) is a Kubernetes API for orchestrating distributed, stateful AI inference workloads with multi-role collaboration and built-in service discovery.

It provides a common deployment pattern for production LLM inference, especially disaggregated architectures such as prefill/decode separation.

Why RBG?

Traditional Kubernetes primitives (e.g. plain StatefulSets / Deployments) are ill-suited for LLM inference services that:

  • run as multi-role topologies (gateway / router / prefill / decode),
  • are performance-sensitive to GPU / network topology,
  • and require atomic, cross-role operations (deploy, upgrade, scale, failover).

RBG treats an inference service as a role-based group rather than a loose set of workloads. It models the service as a topology-aware, stateful, coordinated multi-role organism and manages it as a single unit.

Key Concepts

Role

The basic scheduling and rollout unit. Each role (e.g. prefill, decode) has its own spec, lifecycle and policies.

RoleBasedGroup

A group of roles that together form one logical service (e.g. one LLM inference deployment).
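
The relationship between the two concepts can be sketched in a few lines of Python. This is only an illustrative data model, not the RBG API schema: the class fields (name, replicas, image) and the scale() helper are hypothetical names chosen for this example. It shows the idea of a group-level operation that validates every role first, so either all requested roles change or none do.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Role:
    """One role in the group, e.g. prefill or decode (hypothetical fields)."""
    name: str
    replicas: int
    image: str


@dataclass(frozen=True)
class RoleBasedGroup:
    """A set of roles that together form one logical service."""
    name: str
    roles: tuple = ()

    def role(self, role_name: str) -> Role:
        """Look up a role by name."""
        for r in self.roles:
            if r.name == role_name:
                return r
        raise KeyError(role_name)

    def scale(self, targets: dict) -> "RoleBasedGroup":
        """Return a new group with the requested roles rescaled.

        All targets are validated before anything changes, so the
        operation applies to every requested role or to none.
        """
        known = {r.name for r in self.roles}
        unknown = set(targets) - known
        if unknown:
            raise ValueError(f"unknown roles: {sorted(unknown)}")
        if any(n < 0 for n in targets.values()):
            raise ValueError("replica counts must be non-negative")
        new_roles = tuple(
            replace(r, replicas=targets.get(r.name, r.replicas))
            for r in self.roles
        )
        return replace(self, roles=new_roles)


# Example: one inference service with two roles, scaled as a unit.
group = RoleBasedGroup(
    name="llm-inference",
    roles=(
        Role("prefill", replicas=2, image="example/prefill:latest"),
        Role("decode", replicas=4, image="example/decode:latest"),
    ),
)
scaled = group.scale({"prefill": 3, "decode": 6})
```

Using immutable values here is a design choice for the sketch: a failed validation leaves the original group untouched, which mirrors the "single unit" behaviour described above without claiming anything about how RBG implements it.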

Project Status

RBG Version    Kubernetes Version    LeaderWorkerSet Version
main           >=v1.22.x             >=v0.7.0
v0.4.0         >=v1.28.x             >=v0.6.0
v0.3.0         >=v1.28.x             >=v0.6.0
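
As a rough aid for reading the compatibility table, the following sketch checks a cluster against the listed minimums. It is not part of RBG: the MINIMUMS mapping simply copies the table rows, and parse_version() and is_supported() are hypothetical helpers written for this example. A trailing ".x" is treated as patch level 0, and vendor suffixes such as "-gke.100" are ignored.

```python
import re

# Minimum versions copied from the compatibility table (assumed current).
MINIMUMS = {
    "main": {"kubernetes": "v1.22.x", "lws": "v0.7.0"},
    "v0.4.0": {"kubernetes": "v1.28.x", "lws": "v0.7.0"},
    "v0.3.0": {"kubernetes": "v1.28.x", "lws": "v0.6.0"},
}

_VERSION_RE = re.compile(r"v?(\d+)\.(\d+)(?:\.(\d+|x))?")


def parse_version(text: str) -> tuple:
    """Turn 'v1.28.3' or 'v1.28.x' into a comparable (major, minor, patch)."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised version: {text!r}")
    major, minor, patch = match.groups()
    patch_num = 0 if patch in (None, "x") else int(patch)
    return (int(major), int(minor), patch_num)


def is_supported(rbg_version: str, kubernetes: str, lws: str) -> bool:
    """Check installed versions against the minimums for one RBG version."""
    required = MINIMUMS[rbg_version]
    return (
        parse_version(kubernetes) >= parse_version(required["kubernetes"])
        and parse_version(lws) >= parse_version(required["lws"])
    )


# Example: a v1.29 cluster with LeaderWorkerSet v0.7.0 against RBG v0.4.0.
ok = is_supported("v0.4.0", kubernetes="v1.29.2", lws="v0.7.0")
```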

Community