ops0

Nodes

Monitor and manage Kubernetes cluster nodes, including health status, capacity, and resource allocation.

Node List

View all nodes in a cluster with status and resource usage:

Column  Description
──────  ───────────
Name    Node name or hostname
Status  Ready, NotReady, Unknown
Roles   master, worker, or custom labels
CPU     Current usage / Total capacity
Memory  Current usage / Total capacity
Pods    Running pods / Max pods
Age     Time since node joined cluster

Node Status

Status              Color   Meaning
──────              ─────   ───────
Ready               Green   Node is healthy and accepting pods
NotReady            Red     Node has issues (network, kubelet, resources)
Unknown             Gray    Node status cannot be determined
SchedulingDisabled  Yellow  Node is cordoned (no new pods)
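
The display status above can be derived from the node's Ready condition plus its unschedulable (cordon) flag. A minimal sketch of that mapping, using plain dicts as stand-ins for real Node objects (field names mirror the Kubernetes API, but the values here are hypothetical):

```python
def display_status(node):
    """Map a node's Ready condition and cordon flag to a display status."""
    ready = next(
        (c["status"] for c in node["conditions"] if c["type"] == "Ready"),
        "Unknown",  # no Ready condition reported at all
    )
    if node.get("unschedulable") and ready == "True":
        return "SchedulingDisabled"  # cordoned: healthy but not accepting pods
    return {"True": "Ready", "False": "NotReady"}.get(ready, "Unknown")

healthy = {"conditions": [{"type": "Ready", "status": "True"}]}
cordoned = {"conditions": [{"type": "Ready", "status": "True"}], "unschedulable": True}
print(display_status(healthy))   # Ready
print(display_status(cordoned))  # SchedulingDisabled
```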

Node Details

Click a node to view comprehensive information:

System Information

Field              Description
─────              ───────────
Name               Node hostname
Provider ID        Cloud provider instance ID
Instance Type      EC2 instance type, GCE machine type, etc.
OS Image           Operating system and version
Kernel Version     Linux kernel version
Container Runtime  containerd, CRI-O, or Docker
Kubelet Version    Kubernetes version running on node

Capacity and Allocatable

Resource           Capacity                Allocatable         Usage
────────           ────────                ───────────         ─────
CPU                Total cores             Available for pods  Current usage
Memory             Total RAM               Available for pods  Current usage
Pods               Max pods (usually 110)  Pod limit           Running pods
Ephemeral Storage  Total disk              Available disk      Current usage

Capacity vs Allocatable:

  • Capacity: Total resources on node
  • Allocatable: Resources available for user pods (after system pods and kubelet overhead)
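
The kubelet computes allocatable by subtracting its reservations (the `--system-reserved` and `--kube-reserved` flags) and the hard eviction threshold from capacity. A sketch of the arithmetic, with hypothetical values for a 32 GiB node:

```python
def allocatable(capacity, system_reserved, kube_reserved, eviction_threshold):
    """Allocatable = capacity minus reservations and the eviction threshold."""
    return capacity - system_reserved - kube_reserved - eviction_threshold

# Hypothetical 32 GiB node (all values in GiB):
# 1 GiB for the OS, 2 GiB for kubelet/runtime, 1 GiB eviction headroom
print(allocatable(32.0, 1.0, 2.0, 1.0))  # 28.0
```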

Resource Utilization

Visual breakdown of resource usage:

CPU Allocation (12 cores total, 10 allocatable)
─────────────────────────────────────────────────
System Reserved:  ██░░░░░░░░  2 cores (16.7%)
Allocated Pods:   ████████░░  8 cores (80% of allocatable)
Free:             ██░░░░░░░░  2 cores (20% of allocatable)

Memory Allocation (32 GB total, 28 GB allocatable)
─────────────────────────────────────────────────
System Reserved:  ██░░░░░░░░  4 GB (12.5%)
Allocated Pods:   ██████████  20 GB (71% of allocatable)
Free:             ████░░░░░░  8 GB (29% of allocatable)
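
The percentages in the bars above are simply each figure over the allocatable total, rounded to the nearest whole percent. A quick sketch that reproduces the chart's numbers:

```python
def pct(value, allocatable_total):
    """Percent of allocatable, rounded to the nearest whole percent."""
    return round(100 * value / allocatable_total)

# CPU chart: 10 allocatable cores
print(pct(8, 10))   # 80 (allocated)
print(pct(2, 10))   # 20 (free)

# Memory chart: 28 GB allocatable
print(pct(20, 28))  # 71 (allocated)
print(pct(8, 28))   # 29 (free)
```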

Node Conditions

Condition           Status      Description
─────────           ──────      ───────────
Ready               True/False  Node is healthy and ready for pods
MemoryPressure      True/False  Node running low on memory
DiskPressure        True/False  Node running low on disk
PIDPressure         True/False  Too many processes running
NetworkUnavailable  True/False  Network not properly configured
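
Note the polarity: Ready is healthy when True, while the other four conditions are healthy when False. A sketch of a health check over a node's condition list (the dicts are simplified stand-ins for real NodeCondition objects):

```python
PROBLEM_WHEN_TRUE = {"MemoryPressure", "DiskPressure", "PIDPressure", "NetworkUnavailable"}

def active_problems(conditions):
    """Return the condition types that currently indicate a problem."""
    problems = [c["type"] for c in conditions
                if c["type"] in PROBLEM_WHEN_TRUE and c["status"] == "True"]
    # Ready has the opposite polarity: its absence or False is the problem.
    if not any(c["type"] == "Ready" and c["status"] == "True" for c in conditions):
        problems.append("NotReady")
    return problems

conds = [{"type": "Ready", "status": "True"},
         {"type": "DiskPressure", "status": "True"},
         {"type": "MemoryPressure", "status": "False"}]
print(active_problems(conds))  # ['DiskPressure']
```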

Taints and Tolerations

View node taints that prevent pods from scheduling:

Taint                             Effect      Description
─────                             ──────      ───────────
node-role.kubernetes.io/master    NoSchedule  Master nodes don't run user pods
node.kubernetes.io/disk-pressure  NoSchedule  Node has disk pressure
example.com/special=true          NoExecute   Custom taint for specialized workloads

Effects:

  • NoSchedule: Pods without toleration won't be scheduled
  • PreferNoSchedule: Avoid scheduling if possible
  • NoExecute: Evict existing pods without toleration
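
A toleration matches a taint when key, value (unless the operator is Exists), and effect (an empty effect matches any) all line up. A simplified sketch of that matching rule, omitting edge cases like tolerationSeconds:

```python
def tolerates(toleration, taint):
    """Simplified check: does this toleration match this taint?"""
    # An empty effect on the toleration matches any taint effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # With Exists, only the key must match; an empty key matches all taints.
        return toleration.get("key") in (None, taint["key"])
    return (toleration.get("key") == taint["key"]
            and toleration.get("value") == taint.get("value"))

taint = {"key": "example.com/special", "value": "true", "effect": "NoExecute"}
print(tolerates({"key": "example.com/special", "operator": "Exists"}, taint))  # True
print(tolerates({"key": "other", "value": "x", "effect": "NoExecute"}, taint))  # False
```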

Node Labels

View node labels used for pod scheduling:

Label                             Value               Usage
─────                             ─────               ─────
kubernetes.io/hostname            node-1.example.com  Node hostname
node.kubernetes.io/instance-type  m5.2xlarge          Instance type
topology.kubernetes.io/zone       us-east-1a          Availability zone
custom/workload                   high-memory         Custom label for targeting
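
Pods target these labels through a nodeSelector: a node is eligible only when its labels contain every key/value pair the pod asks for. A minimal sketch (label values are illustrative):

```python
def matches_node_selector(node_labels, node_selector):
    """True when every nodeSelector key/value pair is present in the node's labels."""
    return all(node_labels.get(k) == v for k, v in node_selector.items())

labels = {"topology.kubernetes.io/zone": "us-east-1a",
          "custom/workload": "high-memory"}
print(matches_node_selector(labels, {"custom/workload": "high-memory"}))  # True
print(matches_node_selector(labels, {"custom/workload": "gpu"}))          # False
```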

Pods Running on Node

List all pods scheduled on this node:

Pod                 Namespace    CPU Request  Memory Request  Status
───                 ─────────    ───────────  ──────────────  ──────
api-gateway-abc123  production   500m         1Gi             Running
nginx-xyz789        production   100m         256Mi           Running
fluentd-daemon-set  kube-system  100m         200Mi           Running

Node Events

View recent Kubernetes events for the node:

Type    Reason           Age   Message
────    ──────           ───   ───────
Normal  Starting         45d   Starting kubelet.
Normal  NodeHasSufficientMemory  45d   Node has sufficient memory
Normal  NodeReady        45d   Node is ready
Warning DiskPressure     2h    Node has disk pressure
Normal  DiskPressureCleared  1h    Disk pressure cleared

Troubleshooting

Node NotReady
Check node conditions for MemoryPressure, DiskPressure, or NetworkUnavailable. Verify kubelet is running. Check cloud provider for instance issues.
High Resource Usage
View pods on node to identify resource-heavy workloads. Consider scaling horizontally or moving pods to larger nodes.
Pods Not Scheduling
Check allocatable resources vs pod requests. Verify taints don't block scheduling. Review pod events for FailedScheduling reasons.
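
The "requests vs allocatable" check above is the core of what the scheduler does: a pod fits only if its requests, added to what is already requested on the node, stay within allocatable for every resource. A simplified sketch with hypothetical figures (millicores and MiB):

```python
def fits(node_allocatable, already_requested, pod_request):
    """True when the pod's requests fit within the node's remaining allocatable."""
    return all(
        already_requested.get(r, 0) + pod_request.get(r, 0) <= node_allocatable[r]
        for r in node_allocatable
    )

alloc = {"cpu_m": 7500, "memory_mi": 30720}   # 7.5 cores, 30 GiB allocatable
used = {"cpu_m": 6200, "memory_mi": 22528}    # requests already scheduled

print(fits(alloc, used, {"cpu_m": 1000, "memory_mi": 2048}))  # True:  fits
print(fits(alloc, used, {"cpu_m": 2000, "memory_mi": 2048}))  # False: CPU over
```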

Example: Node Resource Analysis

Node Overview

Node: ip-10-0-1-45.ec2.internal
Status: Ready
Instance Type: m5.2xlarge (8 vCPU, 32 GB RAM)
Zone: us-east-1a
Kubelet: v1.28.2

Resource Breakdown

CPU Capacity: 8 cores
  System Reserved: 0.5 cores
  Allocatable: 7.5 cores
  Requested: 6.2 cores (83%)
  Used: 5.1 cores (68%)
  Free: 1.3 cores (17%)

Memory Capacity: 32 GB
  System Reserved: 2 GB
  Allocatable: 30 GB
  Requested: 22 GB (73%)
  Used: 18 GB (60%)
  Free: 8 GB (27%)

Pods: 18 / 110 (16%)
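
The percentages in this breakdown are all taken against allocatable, not raw capacity. A small sketch that recomputes them from the figures above:

```python
def breakdown(capacity, reserved, requested, used):
    """Recompute the percentages shown above (all relative to allocatable)."""
    alloc = capacity - reserved
    free = alloc - requested
    return {
        "allocatable": alloc,
        "requested_pct": round(100 * requested / alloc),
        "used_pct": round(100 * used / alloc),
        "free_pct": round(100 * free / alloc),
    }

print(breakdown(8, 0.5, 6.2, 5.1))  # CPU: requested 83%, used 68%, free 17%
print(breakdown(32, 2, 22, 18))     # Memory: requested 73%, used 60%, free 27%
```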

Top Resource Consumers

Pod                           CPU Usage  Memory Usage
───                           ─────────  ────────────
api-gateway-7d9f8c6b4d-2xkjp  1.2 cores  4 GB
worker-5f8d9c7b2-kp3mn        0.9 cores  3.5 GB
redis-0                       0.5 cores  2 GB

Recommendations

  • Node has 17% free CPU capacity - can accommodate more pods
  • Memory usage is healthy at 60%
  • No resource pressure conditions
  • 92 pod slots available

Example: Troubleshooting NotReady Node

Node Status

Node: ip-10-0-2-34.ec2.internal
Status: NotReady
Last Heartbeat: 15 minutes ago

Node Conditions

Condition       Status  Last Transition  Message
─────────       ──────  ───────────────  ───────
Ready           False   15m              Kubelet stopped posting node status
MemoryPressure  False   2d               kubelet has sufficient memory
DiskPressure    True    20m              kubelet has disk pressure
PIDPressure     False   2d               kubelet has sufficient PID

Recent Events

Type    Reason           Age   Message
────    ──────           ───   ───────
Normal  NodeReady        2d    Node is ready
Warning DiskPressure     20m   Node has disk pressure
Warning Rebooted         15m   Node rebooted, reason: unknown
Warning NodeNotReady     15m   Kubelet stopped posting status

Investigation Steps

  1. Check Disk Usage: DiskPressure condition is True
  2. Check Instance: Node rebooted 15 minutes ago
  3. Check Kubelet: Kubelet not posting status (may be stopped)
  4. Review Pods: 12 pods on node, likely evicted or pending

Resolution

Action: SSH to node and check kubelet status
Result: Kubelet service crashed after reboot

Fix:
$ sudo systemctl start kubelet
$ sudo systemctl status kubelet

Node recovered after 2 minutes:
Status: Ready
All pods rescheduled and running

Best Practices

  • Monitor conditions - Watch for MemoryPressure, DiskPressure, and PIDPressure before they cause failures
  • Track capacity - Keep at least 20% of allocatable resources free to absorb traffic spikes
  • Label nodes - Use labels for targeted workload placement (GPU, high-memory, etc.)
  • Review taints - Ensure critical workloads have the tolerations they need
  • Update gradually - Roll node updates out slowly to minimize disruption