Container Runtime Interface streaming explained

The Kubernetes Container Runtime Interface (CRI)
acts as the main connection between the kubelet
and the container runtime.
Runtimes have to provide a gRPC server that
fulfills the Kubernetes-defined Protocol Buffer interface.
This API definition
evolves over time, for example when contributors add new features or deprecate
fields.

In this blog post, I’d like to dive into the functionality and history of three
special Remote Procedure Calls (RPCs) that stand out in terms of how they work:
Exec, Attach and PortForward.

Exec can be used to run dedicated commands within the container and stream
the output to a client like kubectl or
crictl. It also allows interaction with
that process using standard input (stdin), for example if users want to run a
new shell instance within an existing workload.

Attach streams the output of the currently running process via standard I/O
from the container to the client and also allows interaction with it. This is
particularly useful if users want to see what is going on in the container and
be able to interact with the process.

PortForward can be utilized to forward a port from the host to the container
so that users can interact with it using third party network tools. This makes
it possible to bypass Kubernetes services
for a certain workload and interact with its network interface directly.

What is so special about them?

All RPCs of the CRI either use gRPC unary calls
for communication or the server side streaming
feature (only GetContainerEvents right now). This means that nearly all RPCs
receive a single client request and have to return a single server response.
The same applies to Exec, Attach, and PortForward, whose protocol definitions
look like this:

// Exec prepares a streaming endpoint to execute a command in the container.
rpc Exec(ExecRequest) returns (ExecResponse) {}
// Attach prepares a streaming endpoint to attach to a running container.
rpc Attach(AttachRequest) returns (AttachResponse) {}
// PortForward prepares a streaming endpoint to forward ports from a PodSandbox.
rpc PortForward(PortForwardRequest) returns (PortForwardResponse) {}

The requests carry everything required to allow the server to do the work,
for example, the ContainerId or command (Cmd) to be run in case of Exec.
More interestingly, all of their responses only contain a url:

message ExecResponse {
    // Fully qualified URL of the exec streaming server.
    string url = 1;
}
message AttachResponse {
    // Fully qualified URL of the attach streaming server.
    string url = 1;
}
message PortForwardResponse {
    // Fully qualified URL of the port-forward streaming server.
    string url = 1;
}
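
To illustrate that, here is a minimal Go client sketch which requests an Exec
session over the CRI and only gets the streaming URL back. The socket path and
container ID are placeholders (CRI-O, for instance, listens on
/var/run/crio/crio.sock by default):

package main

import (
    "context"
    "fmt"
    "log"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
    runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

func main() {
    // Connect to the runtime's CRI socket (placeholder path).
    conn, err := grpc.Dial(
        "unix:///var/run/crio/crio.sock",
        grpc.WithTransportCredentials(insecure.NewCredentials()),
    )
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    client := runtimeapi.NewRuntimeServiceClient(conn)

    // Request a new exec session; the response only carries the URL of
    // the streaming server, no process output.
    resp, err := client.Exec(context.Background(), &runtimeapi.ExecRequest{
        ContainerId: "<container-id>", // placeholder
        Cmd:         []string{"echo", "hello"},
        Stdout:      true,
    })
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("Streaming endpoint:", resp.Url)
}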

Why is it implemented like that? Well, the original design document
for those RPCs even predates Kubernetes Enhancement Proposals (KEPs)
and was originally outlined back in 2016. The kubelet had a native
implementation for Exec, Attach, and PortForward before the
initiative to bring that functionality to the CRI started. Before that,
everything was bound to Docker or the later abandoned
container runtime rkt.

The CRI related design document also elaborates on the option to use native RPC
streaming for exec, attach, and port forward. The downsides of that approach
outweighed its benefits: the kubelet would still create a network bottleneck,
and future runtimes would not be free in choosing the server implementation
details. Another option, in which the kubelet implements a portable,
runtime-agnostic solution, was abandoned in favor of the final one as well,
because it would have meant maintaining another project which would
nevertheless be runtime dependent.

This means that the basic flow for Exec, Attach and PortForward
was proposed to look like this:

sequenceDiagram
    participant crictl
    participant kubectl
    participant API as API Server
    participant kubelet
    participant runtime as Container Runtime
    participant streaming as Streaming Server
    alt Client alternatives
        Note over kubelet,runtime: Container Runtime Interface (CRI)
        kubectl->>API: exec, attach, port-forward
        API->>kubelet:
        kubelet->>runtime: Exec, Attach, PortForward
    else
        Note over crictl,runtime: Container Runtime Interface (CRI)
        crictl->>runtime: Exec, Attach, PortForward
    end
    runtime->>streaming: New Session
    streaming->>runtime: HTTP endpoint (URL)
    alt Client alternatives
        runtime->>kubelet: Response URL
        kubelet->>API:
        API-->>streaming: Connection upgrade (SPDY or WebSocket)
        streaming-)API: Stream data
        API-)kubectl: Stream data
    else
        runtime->>crictl: Response URL
        crictl-->>streaming: Connection upgrade (SPDY or WebSocket)
        streaming-)crictl: Stream data
    end

Clients like crictl or the kubelet (via kubectl) request a new exec, attach or
port forward session from the runtime using the gRPC interface. The runtime
implements a streaming server that also manages the active sessions. This
streaming server provides an HTTP endpoint for the client to connect to. The
client upgrades the connection to use the SPDY
streaming protocol or (in the future) to a WebSocket
connection and starts to stream the data back and forth.
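
For direct connections (like crictl establishes them), that upgrade step can be
sketched with the remotecommand package from k8s.io/client-go, assuming a
recent client-go version; the TLS settings below are an assumption for
illustration only:

package cristream

import (
    "context"
    "net/http"
    "net/url"
    "os"

    "k8s.io/client-go/rest"
    "k8s.io/client-go/tools/remotecommand"
)

// stream connects to a URL returned by the runtime, upgrades the
// connection via SPDY and wires the local standard streams to it.
func stream(ctx context.Context, rawURL string) error {
    parsed, err := url.Parse(rawURL)
    if err != nil {
        return err
    }

    // Assumption: skip TLS verification for the sake of the example.
    config := &rest.Config{TLSClientConfig: rest.TLSClientConfig{Insecure: true}}

    executor, err := remotecommand.NewSPDYExecutor(config, http.MethodPost, parsed)
    if err != nil {
        return err
    }

    // Stream stdin, stdout and stderr until the session terminates.
    return executor.StreamWithContext(ctx, remotecommand.StreamOptions{
        Stdin:  os.Stdin,
        Stdout: os.Stdout,
        Stderr: os.Stderr,
    })
}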

This implementation gives runtimes the flexibility to implement
Exec, Attach and PortForward the way they want, and also allows a
simple test path. Runtimes can change the underlying implementation to support
any kind of feature without needing to modify the CRI at all.

Many smaller enhancements to this overall approach have been merged into
Kubernetes over the past years, but the general pattern has always stayed the
same. The kubelet source code was transformed into a reusable library,
which container runtimes can nowadays use to implement the basic
streaming capability.

How does the streaming actually work?

At first glance, it looks like all three RPCs work the same way, but that’s
not the case. It’s possible to group the functionality of Exec and
Attach, while PortForward follows a distinct internal protocol
definition.

Exec and Attach

Kubernetes defines Exec and Attach as remote commands, whose
protocol definition exists in five different versions:

| # | Version | Note |
| --- | --- | --- |
| 1 | channel.k8s.io | Initial (unversioned) SPDY sub protocol (#13394, #13395) |
| 2 | v2.channel.k8s.io | Resolves the issues present in the first version (#15961) |
| 3 | v3.channel.k8s.io | Adds support for resizing container terminals (#25273) |
| 4 | v4.channel.k8s.io | Adds support for exit codes using JSON errors (#26541) |
| 5 | v5.channel.k8s.io | Adds support for a CLOSE signal (#119157) |

On top of that, there is an overall effort to replace the SPDY transport
protocol with WebSockets as part of KEP #4006.
Runtimes have to satisfy those protocols over their life cycle to stay up to
date with the Kubernetes implementation.

Let’s assume that a client uses the latest (v5) version of the protocol and
communicates over WebSockets. In that case, the general flow would be:

  1. The client requests a URL endpoint for Exec or Attach using the CRI.

    • The server (runtime) validates the request, inserts it into a connection
      tracking cache, and provides the HTTP endpoint URL for that request.
  2. The client connects to that URL, upgrades the connection to establish
    a WebSocket, and starts to stream data.

    • In the case of Attach, the server has to stream the main container process
      data to the client.
    • In the case of Exec, the server has to create the subprocess command within
      the container and then stream the output to the client.

    If stdin is required, then the server needs to listen for that as well and
    redirect it to the corresponding process.

Interpreting data for the defined protocol is fairly simple: The first
byte of every input and output packet defines
the actual stream:

| First Byte | Type | Description |
| --- | --- | --- |
| 0 | standard input | Data streamed from stdin |
| 1 | standard output | Data streamed to stdout |
| 2 | standard error | Data streamed to stderr |
| 3 | stream error | A streaming error occurred |
| 4 | stream resize | A terminal resize event |
| 255 | stream close | Stream should be closed (for WebSockets) |
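
Based on that table, a minimal sketch of how a server could demultiplex such
frames looks like this. It is a simplified assumption, not the kubelet
library’s actual code, which additionally handles resize events and error
encoding:

package cristream

import (
    "errors"
    "fmt"
    "io"
)

// Stream type prefixes as listed in the table above.
const (
    streamStdin  = 0
    streamStdout = 1
    streamStderr = 2
    streamErr    = 3
    streamResize = 4
    streamClose  = 255 // v5 only, used with WebSockets
)

// demux routes a single protocol frame to the right destination based
// on its first byte. Resize events are left out for brevity.
func demux(frame []byte, stdout, stderr io.Writer) error {
    if len(frame) == 0 {
        return errors.New("empty frame")
    }
    payload := frame[1:]
    switch frame[0] {
    case streamStdout:
        _, err := stdout.Write(payload)
        return err
    case streamStderr:
        _, err := stderr.Write(payload)
        return err
    case streamErr:
        return fmt.Errorf("stream error: %s", payload)
    case streamClose:
        return io.EOF
    default:
        return fmt.Errorf("unsupported stream type: %d", frame[0])
    }
}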

How should runtimes implement the streaming server methods for Exec and
Attach using the provided kubelet library? The key is that the streaming
server implementation in the kubelet outlines an interface
called Runtime, which has to be fulfilled by the actual container runtime if it
wants to use that library:

// Runtime is the interface to execute the commands and provide the streams.
type Runtime interface {
    Exec(ctx context.Context, containerID string, cmd []string, in io.Reader, out, err io.WriteCloser, tty bool, resize <-chan remotecommand.TerminalSize) error
    Attach(ctx context.Context, containerID string, in io.Reader, out, err io.WriteCloser, tty bool, resize <-chan remotecommand.TerminalSize) error
    PortForward(ctx context.Context, podSandboxID string, port int32, stream io.ReadWriteCloser) error
}

Everything related to the protocol interpretation is
already in place, and runtimes only have to implement the actual Exec and
Attach logic. For example, the container runtime CRI-O
does it like this (pseudocode):

func (s StreamService) Exec(
    ctx context.Context,
    containerID string,
    cmd []string,
    stdin io.Reader, stdout, stderr io.WriteCloser,
    tty bool,
    resizeChan <-chan remotecommand.TerminalSize,
) error {
    // Retrieve the container by the provided containerID
    // …

    // Update the container status and verify that the workload is running
    // …

    // Execute the command and stream the data
    return s.runtimeServer.Runtime().ExecContainer(
        s.ctx, c, cmd, stdin, stdout, stderr, tty, resizeChan,
    )
}
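
On the CRI side, the Exec RPC then only has to ask the streaming server for the
session URL. A hedged sketch of that wiring, using the reusable streaming
library published at k8s.io/kubelet/pkg/cri/streaming (the address and the
embedding server type are assumptions):

package cristream

import (
    "context"

    runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
    "k8s.io/kubelet/pkg/cri/streaming"
)

// criServer is a hypothetical CRI implementation embedding the
// streaming server from the kubelet library.
type criServer struct {
    streamingServer streaming.Server
}

// newCRIServer wires a Runtime implementation (like the StreamService
// above) into the reusable streaming server.
func newCRIServer(rt streaming.Runtime) (*criServer, error) {
    config := streaming.DefaultConfig
    config.Addr = "127.0.0.1:10101" // assumption: local streaming endpoint

    s, err := streaming.NewServer(config, rt)
    if err != nil {
        return nil, err
    }

    // Serve the HTTP streaming endpoints in the background.
    go func() { _ = s.Start(true) }()

    return &criServer{streamingServer: s}, nil
}

// Exec implements the CRI Exec RPC: the streaming server validates the
// request, tracks the session, and returns the response with the URL.
func (c *criServer) Exec(ctx context.Context, req *runtimeapi.ExecRequest) (*runtimeapi.ExecResponse, error) {
    return c.streamingServer.GetExec(req)
}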

PortForward

Forwarding ports to a container works a bit differently compared to
streaming I/O data from a workload. The server still has to provide a URL
endpoint for the client to connect to, but then the container runtime has to
enter the network namespace of the container, allocate the port, and
stream the data back and forth. There is no simple protocol definition available
like for Exec or Attach. This means that the client will stream the
plain SPDY frames (with or without an additional WebSocket connection), which can
be interpreted using libraries like moby/spdystream.

Luckily, the kubelet library already provides the PortForward interface method
which has to be implemented by the runtime. CRI-O does that like this (simplified):

func (s StreamService) PortForward(
    ctx context.Context,
    podSandboxID string,
    port int32,
    stream io.ReadWriteCloser,
) error {
    // Retrieve the pod sandbox by the provided podSandboxID
    sandboxID, err := s.runtimeServer.PodIDIndex().Get(podSandboxID)
    sb := s.runtimeServer.GetSandbox(sandboxID)
    // …

    // Get the network namespace path on disk for that sandbox
    netNsPath := sb.NetNsPath()
    // …

    // Enter the network namespace and stream the data
    return s.runtimeServer.Runtime().PortForwardContainer(
        ctx, sb.InfraContainer(), netNsPath, port, stream,
    )
}
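
What such a backend could do under the hood can be sketched as follows. This is
a simplified assumption rather than CRI-O’s actual code; it uses the ns helper
from github.com/containernetworking/plugins to enter the namespace:

package cristream

import (
    "fmt"
    "io"
    "net"

    "github.com/containernetworking/plugins/pkg/ns"
)

// portForward enters the sandbox network namespace, dials the target
// port on the loopback interface and copies data in both directions.
func portForward(netNsPath string, port int32, stream io.ReadWriteCloser) error {
    defer stream.Close()

    return ns.WithNetNSPath(netNsPath, func(ns.NetNS) error {
        conn, err := net.Dial("tcp", fmt.Sprintf("127.0.0.1:%d", port))
        if err != nil {
            return fmt.Errorf("dial port %d: %w", port, err)
        }
        defer conn.Close()

        errCh := make(chan error, 2)
        go func() { _, err := io.Copy(conn, stream); errCh <- err }() // client -> container
        go func() { _, err := io.Copy(stream, conn); errCh <- err }() // container -> client

        // Return when the first direction terminates.
        return <-errCh
    })
}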

Future work

The flexibility Kubernetes provides for the RPCs Exec, Attach and
PortForward is truly outstanding compared to other methods. Nevertheless,
container runtimes have to keep up with the latest and greatest implementations
to support those features in a meaningful way. The general effort to support
WebSockets is not limited to Kubernetes itself; it also has to be supported by
container runtimes as well as clients like crictl.

For example, crictl v1.30 features a new --transport flag for the
subcommands exec, attach and port-forward
(#1383,
#1385)
to allow choosing between websocket and spdy.

CRI-O is taking an experimental path by moving the streaming server
implementation into conmon-rs
(a substitute for the container monitor conmon). conmon-rs is
a Rust implementation of the original container
monitor and allows streaming WebSockets directly using supported libraries
(#2070). The major benefit
of this approach is that CRI-O does not even have to be running while conmon-rs
keeps active Exec, Attach and PortForward sessions open. The
simplified flow when using crictl directly will then look like this:

sequenceDiagram
    autonumber
    participant crictl
    participant runtime as Container Runtime
    participant conmon-rs
    Note over crictl,runtime: Container Runtime Interface (CRI)
    crictl->>runtime: Exec, Attach, PortForward
    Note over runtime,conmon-rs: Cap’n Proto
    runtime->>conmon-rs: Serve Exec, Attach, PortForward
    conmon-rs->>runtime: HTTP endpoint (URL)
    runtime->>crictl: Response URL
    crictl-->>conmon-rs: Connection upgrade to WebSocket
    conmon-rs-)crictl: Stream data

All of those enhancements require iterative design decisions, with the original,
well-conceived implementation acting as their foundation. I really hope
you’ve enjoyed this compact journey through the history of CRI RPCs. Feel free
to reach out to me anytime for suggestions or feedback using the
official Kubernetes Slack.

Originally posted on Kubernetes Blog