Key Concepts¶
This section covers the key concepts that will help users better understand Observability in DataOS.
Runnable Resources¶
Runnable Resources are dynamic components within DataOS that perform operations, execute tasks, or handle workloads in real time. These are the building blocks responsible for actively running services, jobs, or orchestrations.
These Resources are dynamic and continuously emit Observability Data, such as logs and metrics, making them a core focus of observability strategies. Their behavior and health are critical for system stability, and they are typically monitored and managed using tools like Grafana, CLI interfaces, and Pager.
Static Resources¶
Static Resources are DataOS Resources that do not perform runtime execution but serve as foundational infrastructure or configuration inputs.
Both Runnable and Static Resources contribute to Observability Data, which is consumed by Observability Endpoints such as Grafana, Monitor, and the CLI. Runnable Resources generate telemetry during execution, while Static Resources provide the context (such as health state) essential for interpreting telemetry signals effectively.
Observability Data¶
Observability Data includes metrics, logs, and other data generated by the Resources and the Resource infrastructure that provide information about Resource health and performance. Resource-centric observability refers to tools that let you visualize and analyze the observability data from the perspective of a Resource.
Metrics¶
Metrics are numerical values collected at regular intervals that reflect the health and performance of a Resource. Common examples include CPU usage, memory consumption, and request latency. Anomalies in metric trends often signal underlying issues, while long-term patterns can reveal usage trends and guide resource planning.
Logs¶
Logs are timestamped records of discrete events generated by the Resources. Each log entry captures what happened at a specific point in time, often including detailed context such as error messages, user actions, or system responses.
While logs offer deep visibility into component-level activity, they may not clearly show how events in one part of a system relate to those in another. This is where traces provide additional clarity.
Status¶
Status indicates the Resource's lifecycle state in DataOS, providing a high-level view of whether a Resource is available and functioning as expected. Common status values include:
- active - The Resource is currently accessible and usable by other DataOS Resources
- error - The Resource has encountered configuration issues or operational failures
- deleted - The Resource has been removed from the system
Status helps users quickly assess Resource availability and detect configuration issues, misconfigurations, or unexpected deletions that may impact dependent components.
Runtime¶
Runtime reflects the Resource's execution state, capturing what is actively happening behind the scenes, typically at the container or pod level. Common runtime states include:
- running - The Resource is actively executing (ideal for long-running services)
- succeeded - The Resource has completed successfully (ideal for batch jobs or workflows)
- failed - The Resource execution has encountered errors
- pending - The Resource is waiting to execute or stuck in a waiting state
A Resource is considered healthy when its status is active and its runtime is either running or succeeded, depending on the type of workload it handles. Both status and runtime signals together help users detect operational failures and disruptions that could impact downstream workflows.
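The health rule above can be sketched as a small predicate. This is illustrative only, not part of any DataOS SDK; the function name is an assumption, while the state strings mirror the status and runtime values listed above.

```python
# Hypothetical health check combining the status and runtime signals
# described above. Not a DataOS API; the state strings match the docs.

HEALTHY_RUNTIMES = {"running", "succeeded"}

def is_healthy(status: str, runtime: str) -> bool:
    """A Resource is healthy when its status is 'active' and its runtime
    is 'running' (long-running services) or 'succeeded' (batch jobs)."""
    return status == "active" and runtime in HEALTHY_RUNTIMES

print(is_healthy("active", "running"))    # a healthy long-running service
print(is_healthy("active", "failed"))     # execution failure
print(is_healthy("error", "succeeded"))   # configuration issue despite a clean run
```

Evaluating both signals together, rather than either one alone, is what lets this check catch a misconfigured Resource whose last run happened to succeed.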
Contextual Data¶
Observability Data becomes significantly more actionable when paired with contextual metadata, such as status, runtime, aggregated status, etc. This added context aids in filtering, debugging, and correlating information during incident response.
Workload¶
A workload is an application running on Kubernetes. Whether it's a single component or multiple components working together, you run your workload inside a set of pods. In Kubernetes, a Pod represents a set of running containers on your cluster. Kubernetes pods follow a defined lifecycle. When a critical fault occurs on a node running a pod, all pods on that node fail. Kubernetes considers this failure permanent; you must create a new Pod to recover, even if the node returns to a healthy state.
Pod¶
A Pod is a group of one or more containers that share storage and network resources, along with specifications for running those containers for each DataOS Resource. Think of a Pod as an application-specific "logical host"; it contains one or more tightly coupled application containers. This is similar to how applications run on the same physical or virtual machine in traditional non-cloud environments. Besides application containers, a Pod may include init containers that run during startup and can be injected with ephemeral containers for debugging purposes.
Static Pod¶
Static Pods are managed directly by the kubelet daemon on a specific node, without the API server observing them. Whereas most Pods are managed by the control plane (for example, a Deployment), for static Pods, the kubelet directly supervises each static Pod (and restarts it if it fails). Static Pods are always bound to one kubelet on a specific node. The kubelet automatically tries to create a mirror Pod on the Kubernetes API server for each static Pod. This means that the Pods running on a node are visible on the API server, but cannot be controlled from there.
Containers¶
Containers are a technology for packaging an application along with its runtime dependencies. Each container that you run is repeatable; the standardization from having dependencies included means that you get the same behavior wherever you run it. Containers decouple applications from the underlying host infrastructure. This makes deployment easier in different cloud or OS environments. Each node in a Kubernetes cluster runs the containers that form the Pods assigned to that node. Containers in a Pod are co-located and co-scheduled to run on the same node.
Init container¶
A Pod can have multiple containers running apps within it, but it can also have one or more init containers, which are run before the app containers are started. Init containers are exactly like regular containers, except:
- Init containers always run to completion.
- Each init container must complete successfully before the next one starts.
If a Pod's init container fails, the kubelet repeatedly restarts that init container until it succeeds. However, if the Pod has a restartPolicy of Never, and an init container fails during startup of that Pod, Kubernetes treats the overall Pod as failed. To specify an init container for a Pod, add the initContainers field to the Pod specification, as an array of container items (similar to the app containers field and its contents).
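A minimal Pod spec with one init container can be sketched as a Python dictionary, as it would look before being serialized to YAML or JSON. The names and images below are placeholders, not real registry entries.

```python
import json

# Sketch of a Pod manifest with an init container, expressed as a Python
# dict. Names and image references are placeholders for illustration.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "demo-pod"},
    "spec": {
        # Init containers run to completion, in order, before the app
        # containers below are started.
        "initContainers": [
            {"name": "init-prepare", "image": "example/init:1.0",
             "command": ["sh", "-c", "echo preparing"]},
        ],
        "containers": [
            {"name": "app", "image": "example/app:1.0"},
        ],
    },
}

print(json.dumps(pod, indent=2))
```

Note that initContainers sits alongside, not inside, the containers array in the Pod spec.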
Ephemeral containers¶
Ephemeral containers differ from other containers in that they lack guarantees for resources or execution, and they will never be automatically restarted, so they are not appropriate for building applications. Like regular containers, you may not change or remove an ephemeral container after you have added it to a Pod.
API server¶
The API server is a component of the Kubernetes control plane that exposes the Kubernetes API. The API server is the front end for the Kubernetes control plane. The main implementation of a Kubernetes API server is kube-apiserver. kube-apiserver is designed to scale horizontally, that is, it scales by deploying more instances. You can run several instances of kube-apiserver and balance traffic between those instances.
Node¶
Kubernetes runs your workload by placing containers into Pods to run on Nodes. A node may be a virtual or physical machine, depending on the cluster. Each node is managed by the control plane and contains the services necessary to run Pods. The name uniquely identifies a Node, and no two Nodes can share the same name simultaneously. Kubernetes treats resources with identical names as the same object. The components on a node include the kubelet, a container runtime, and the kube-proxy.
Kubelet¶
The kubelet is the primary "node agent" that runs on each node. It can register the node with the apiserver using one of: the hostname, a flag to override the hostname, or specific logic for a cloud provider. The kubelet works in terms of a PodSpec. A PodSpec is a YAML or JSON object that describes a pod. The kubelet takes a set of PodSpecs that are provided through various mechanisms (primarily through the apiserver) and ensures that the containers described in those PodSpecs are running and healthy. The kubelet doesn't manage containers that were not created by Kubernetes.
Kube-proxy¶
The Kubernetes network proxy runs on each node. It reflects Services as defined in the Kubernetes API on each node and can do simple TCP, UDP, and SCTP stream forwarding or round-robin TCP, UDP, and SCTP forwarding across a set of backends. Service cluster IPs and ports are currently found through Docker-links-compatible environment variables specifying ports opened by the service proxy. There is an optional addon that provides cluster DNS for these cluster IPs. The user must create a Service with the apiserver API to configure the proxy.
Namespace¶
In Kubernetes, namespaces provide a mechanism for isolating groups of resources within a single cluster. Names of resources need to be unique within a namespace, but not across namespaces. Namespace-based scoping is applicable only for namespaced objects (e.g., Deployments, Services, etc.) and not for cluster-wide objects (e.g., StorageClass, Nodes, PersistentVolumes, etc.). Namespaces are a way to divide cluster resources between multiple users. It is not necessary to use multiple namespaces to separate slightly different resources, such as different versions of the same software: use labels to distinguish resources within the same namespace.
CPU¶
CPU refers to one of the primary compute resources that can be requested, limited, and monitored for containers running in Pods. In Kubernetes, CPU refers to the processing power available to a containerized workload, measured in cores or millicores.
- 1 CPU = 1 virtual core on a node (equivalent to one AWS vCPU, one GCP core, etc.)
- 1000m (millicores) = 1 CPU
Kubernetes allows developers and operators to request and limit CPU resources for each container in a Pod, enabling precise workload management.
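The millicore arithmetic above can be illustrated with a small helper. The function is hypothetical, written only to show the 1000m = 1 CPU conversion; it is not part of any Kubernetes client library.

```python
def millicores_to_cpus(millicores: int) -> float:
    """Convert Kubernetes millicores to whole CPUs (1000m = 1 CPU)."""
    return millicores / 1000

print(millicores_to_cpus(250))   # a quarter of one virtual core
print(millicores_to_cpus(1000))  # exactly one virtual core
print(millicores_to_cpus(2500))  # two and a half cores
```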
Disk¶
"Disk" in Kubernetes refers to block or file storage attached to a Pod via a volume, commonly used to persist application state beyond the lifecycle of a Pod.
There are two primary disk use cases:
- Ephemeral Storage: Temporary data that lives and dies with the Pod.
- Persistent Storage: Durable data retained across Pod restarts or reschedules.
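The two use cases map onto different volume types in a Pod spec. As a sketch (volume names and the claim name are placeholders), expressed as Python dicts as they would appear before serialization:

```python
# Ephemeral storage: an emptyDir volume is created with the Pod and
# deleted when the Pod goes away. Names here are placeholders.
ephemeral_volume = {"name": "scratch", "emptyDir": {}}

# Persistent storage: a PersistentVolumeClaim-backed volume survives
# Pod restarts and reschedules.
persistent_volume = {
    "name": "data",
    "persistentVolumeClaim": {"claimName": "app-data"},
}

print(ephemeral_volume)
print(persistent_volume)
```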
Memory¶
Memory refers to the RAM (Random Access Memory) usage of a containerized application within a Pod. It is measured in bytes, typically represented in:
- Mi (Mebibytes, 1 Mi = 1,048,576 bytes)
- Gi (Gibibytes)
- Kubernetes also supports SI units such as MB or GB, but Mi/Gi are preferred for precision
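The difference between the binary (Mi/Gi) and SI (MB/GB) units above is easy to see numerically. A short sketch, with the constant names chosen here for illustration:

```python
# Binary units, as used in Kubernetes memory requests/limits.
MI = 1024 ** 2   # 1 Mi = 1,048,576 bytes
GI = 1024 ** 3   # 1 Gi = 1,073,741,824 bytes

# SI units, for comparison.
MB = 1000 ** 2   # 1 MB = 1,000,000 bytes

print(512 * MI)             # bytes in a 512Mi request
print(512 * MI - 512 * MB)  # how much larger 512Mi is than 512MB
```

The gap grows with the quantity, which is why the docs recommend Mi/Gi when precision matters.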
Microservices¶
A microservice is a small, modular, and autonomous service that performs a specific business function. Unlike monolithic applications, microservices:
- Are independently deployable
- Have isolated data stores
- Use well-defined APIs for communication
- Are developed and scaled individually