Skip to content

Monitor the CPU and Memory Usage of a Cluster

When a Cluster Resource is created, a corresponding pod is automatically provisioned in the backend. This section involves steps to monitor the CPU and memory usage of a Cluster’s pod using different endpoints.

Metis UI

To monitor the CPU and memory usage of a Cluster on the Metis Catalog UI, follow the steps below:

  1. Open the Metis Catalog.

    Open the Metis Catalog (Metis UI)

  2. Search for the Cluster by name.

    Search the Cluster in Metis

  3. Click on the Cluster that needs to be monitored and navigate to the ‘Runtime’ section.

    Cluster details in Metis

  4. Click on the run name for which you want to monitor the CPU and memory usage, and navigate to the ‘Pod Usage’ section.

    Pod CPU usage vs request and limit

    CPU usage:

    • Usage (blue line) shows actual CPU consumed by the Cluster pod, increasing steadily and reaching just above 600 millicores.
    • Request (green line) is fixed at approximately 250 millicores, indicating the CPU guaranteed for the pod at scheduling time.
    • Limit (yellow line) is flat at 2000 millicores, representing the maximum CPU the pod is allowed to use if resources are available.

    Pod memory usage vs limit

    Memory usage:

    • Usage (blue line) shows actual memory consumed by the workflow pod, increasing over time and reaching just under 1000 MiB.
    • Request (green line) is not visible in the graph, which indicates that memory was not explicitly requested or the request value is not available in this dataset.
    • Limit (yellow line) remains constant at 2500 MiB, indicating the maximum memory the pod is allowed to consume.

    Aggregated CPU and memory statistics

    Details:

    • Type: Indicates whether the metric refers to CPU or memory.
    • Name: Describes the metric subtype usage (actual consumption), request (minimum reserved), or limit (maximum allowed).
    • Mean / Median / Min / Max: Statistical summaries over the measured time slices.
    • Sum: Total usage across all time slices.
    • Unit: CPU is in millicores (1000 millicores = 1 core); memory is in mebibytes (1 MiB = 1,048,576 bytes).
    • CPU Usage: Average CPU usage was 460 millicores, peaking at 690 millicores. The pod used significantly more than its request, but stayed within the limit.
    • CPU Request: Set at 200 millicores consistently, meaning the pod was guaranteed a small baseline of CPU.
    • CPU Limit: Fixed at 2000 millicores (2 cores), providing ample room for the workload to scale under load.
    • Memory Usage: Average memory usage was approximately 527.72 MiB, with a maximum usage of 791.58 MiB.
    • Memory Request: Set at 2432 MiB, meaning this much memory was reserved and guaranteed to the pod.
    • Memory Limit: Also set at 2432 MiB, indicating that the pod could not use more memory than requested.

    The Cluster was allowed to consume significantly more CPU than it requested, and it did so, but stayed under the defined limit. Memory usage remained far below both the request and limit. This suggests an opportunity to reduce memory allocation to free up resources for other workloads.

Operations App

When a Cluster Resource is created, a corresponding pod is automatically provisioned in the backend. You can monitor the CPU and memory usage of this pod directly through the Operations app.

To monitor the CPU and memory usage of a Cluster on the Operations app, follow the steps below:

  1. Open the Operations app.

    Open the Operations app

  2. Navigate to User Space → Resources → types, select the Cluster as type, and search for the Cluster that needs to be monitored.

    Operations > User Space > Resources

  3. Click on the Cluster, navigate to the ‘Resource Runtime’ section.

    Cluster resource runtime in Operations

  4. Click on the pod name for which you want to monitor the CPU and memory usage, and navigate to the ‘Runtime Node Usage’ section.

    Runtime Node Usage details

    Details:

    • Type: Indicates whether the metric is related to CPU or memory.
    • Name: Specifies the metric subtype, usage (actual consumption), request (guaranteed minimum), and limit (maximum allocable resource).
    • Mean / Median / Min / Max: Statistical summaries of the metric over the evaluated time slices.
    • Sum: Total value accumulated across all time slices.
    • Unit: Measurement unit (millicore for CPU, mebibyte for memory).
    • CPU Usage: Average CPU usage was ~55.95 millicores, peaking at 535 millicores (1 core = 1000 millicores). Millicores provide fine-grained control over how much CPU each pod or container should receive.
    • CPU Request: The pods are configured to request 1200 millicores of CPU each, consistently across time slices.
    • CPU Limit: The pods are configured to limit 1200 millicores of CPU each, consistently across time slices.
    • Memory Usage: Average memory consumption stands at ~1305.46 MiB, with a peak at 1808.15 MiB (1 MiB = 2²⁰ bytes = 1,048,576 bytes).
    • Memory Request: Each pod has been allocated 2048.00 MiB as a guaranteed memory reservation.
    • Memory Limit: Each pod has been allocated 2048.00 MiB as a limited memory.

    CPU usage over time in Operations

    CPU usage:

    • Usage (blue line) shows intermittent spikes in CPU consumption, peaking close to 600 millicores. Between spikes, CPU usage drops near zero, indicating short bursts of activity followed by idle periods.
    • Request (green line) is not plotted, suggesting no CPU request was explicitly configured or it’s not captured here.
    • Limit (yellow line) remains constant at around 1200 millicores, representing the maximum CPU the pod can consume.

    Memory usage:

    • Usage (blue line) steadily increases, reaching up to 1900 MiB, then dips temporarily and stabilizes again around 1200–1400 MiB. This indicates a consistent memory footprint with some momentary drops.
    • Request (green line) is missing, indicating it was either not set or not shown in this view.
    • Limit (yellow line) is fixed at 2048 MiB, defining the upper bound for memory allocation.