40 Kubernetes Interview Questions and Answers (2026)
40 Kubernetes interview questions covering Pods, Deployments, networking, RBAC, and real troubleshooting scenarios like CrashLoopBackOff and OOMKilled. Updated for 2026.
Kubernetes shows up on nearly every backend, DevOps, and platform engineering job posting in 2026. Knowing the definitions is the easy part. What separates strong candidates is troubleshooting instinct: when a Pod won't start, when do you check logs versus events versus node conditions, and what does each exit code actually tell you.
These 40 questions cover both. The first half builds the conceptual foundation interviewers expect everyone to know. The second half focuses on the scenario-based and troubleshooting questions that actually separate candidates in 2026 interviews. Answers are written to be said out loud, not read from a textbook. If you're also preparing for cloud and infrastructure roles, our cloud and DevOps interview questions guide covers AWS Lambda, Docker, and microservices, pairing well with the Kubernetes content here.
Category 1: Architecture and Core Concepts (Q1-Q10)
Architecture questions test whether you understand the system you're deploying to, not just the kubectl commands. Expect questions about the control plane, worker nodes, and how Pods actually work under the hood.
Q1. What is Kubernetes and why do teams need it instead of just Docker?
Kubernetes (K8s) is an open-source container orchestration platform that automates deployment, scaling, and management of containerized applications. Google open-sourced it in 2014, based on lessons from their internal system called Borg. It is now maintained by the CNCF.
Docker alone manages individual containers on a single host. It does not solve the problems that appear when you run hundreds of containers across dozens of machines: which node should run which container, what happens when a container crashes, how do containers find each other, and how do you update an application with zero downtime.
Kubernetes solves these with five core capabilities:
- Self-healing: automatically restarts failed containers and reschedules Pods from dead nodes.
- Auto-scaling: adds or removes Pods based on CPU, memory, or custom metrics.
- Rolling updates and rollbacks: zero-downtime deployments, with a one-command rollback to a previous revision if a release goes wrong.
- Service discovery: stable DNS names for Pods that are constantly created and destroyed.
- Load balancing: distributes traffic across healthy Pod replicas.
Q2. What are the core components of the Kubernetes control plane?
The control plane makes global decisions about the cluster and detects and responds to cluster events. It runs on control plane nodes (historically called master nodes), or is fully managed by the provider in EKS/GKE/AKS.
kube-apiserver: the front end of the control plane. Every interaction with the cluster, including kubectl commands, goes through the API server. It validates requests and writes the desired state to etcd.
etcd: a distributed, consistent key-value store. The single source of truth for all cluster state: every Pod, Deployment, Service, and Secret is stored here. Uses the Raft consensus algorithm for consistency across replicas. If etcd is lost, cluster state is lost, which is why etcd backups are critical.
kube-scheduler: watches for newly created Pods with no assigned node and selects a suitable node based on resource requirements, affinity rules, and taints/tolerations.
kube-controller-manager: runs controller processes that watch the cluster state via the API server and move it toward the desired state. Includes the Node controller, Replication controller, and Endpoints controller.
Q3. What are the core components running on a worker node?
kubelet: the agent on every node. It receives Pod specs from the API server and ensures the described containers are running and healthy. It reports node and Pod status back to the control plane.
kube-proxy: maintains network rules on each node that route traffic sent to a Service's virtual IP to one of the Pods backing it. Implements the Service abstraction at the network level (via iptables or IPVS).
Container runtime: the software that actually runs containers. containerd or CRI-O are the standard choices today. Built-in support for Docker Engine as a runtime (the dockershim) was deprecated in 1.20 and removed in 1.24; runtimes now plug in through the Container Runtime Interface (CRI).
Q4. What is a Pod and why doesn't Kubernetes manage containers directly?
A Pod is the smallest deployable unit in Kubernetes. It wraps one or more containers that share the same network namespace (same IP, same port space) and can share storage volumes. Containers in a Pod communicate via localhost.
Kubernetes does not manage individual containers directly because real applications often need tightly coupled helper processes: a logging sidecar, a proxy, or an init step. The Pod abstraction lets you group these together as one schedulable, network-addressable unit.
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar
spec:
  containers:
    - name: app
      image: myapp:1.4
      ports:
        - containerPort: 8080
    - name: log-shipper
      image: fluent-bit:2.1
Pods are ephemeral. If a Pod dies, Kubernetes does not resurrect that exact Pod; it creates a new one with a new IP. This is why you never rely on a Pod's IP directly and instead use a Service.
Q5. What is the difference between a ReplicaSet and a Deployment?
A ReplicaSet ensures a specified number of identical Pod replicas are running at all times. If a Pod dies, the ReplicaSet creates a replacement. A ReplicaSet has no built-in support for rolling updates.
A Deployment is a higher-level object that manages ReplicaSets. It adds rolling updates, rollbacks, and revision history on top of what a ReplicaSet provides. When you update a Deployment's Pod template, it creates a new ReplicaSet and gradually shifts Pods from the old ReplicaSet to the new one.
kubectl get replicasets
# A Deployment named "web" has ReplicaSets like web-7d9f8c6b5
# (the hash suffix changes with each new Pod template revision)
In practice, you always create Deployments, never ReplicaSets or Pods directly. A Deployment creates a ReplicaSet, which creates Pods. If you manually create a standalone Pod and it dies, it is gone permanently. If a Deployment-managed Pod dies, the ReplicaSet immediately creates a new one.
Q6. What is a Namespace and when do you use one?
A Namespace is a way to divide cluster resources between multiple users, teams, or environments within a single cluster. It provides a scope for names, meaning two resources with the same name can coexist in different namespaces.
kubectl create namespace staging
kubectl create namespace production
kubectl get pods -n staging
kubectl config set-context --current --namespace=staging
Typical use cases:
- Separate environments (dev, staging, production) within one cluster.
- Multi-tenant clusters where different teams should not see or affect each other's resources.
- Applying ResourceQuotas to limit how much CPU/memory a team can consume.
- Combining with RBAC to restrict which users can access which namespace.
Some resources are not namespaced: Nodes, PersistentVolumes, and ClusterRoles exist at the cluster level, not within a namespace.
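The ResourceQuota use case in the list above can be sketched as a manifest. This is a minimal sketch, not a recommended sizing: the name `team-quota` and every number are illustrative assumptions, and it targets the `staging` namespace created earlier.

```yaml
# Hypothetical quota for the "staging" namespace; all values are illustrative.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: staging
spec:
  hard:
    requests.cpu: "4"      # sum of CPU requests across all Pods in the namespace
    requests.memory: 8Gi   # sum of memory requests across all Pods
    limits.cpu: "8"        # sum of CPU limits
    limits.memory: 16Gi    # sum of memory limits
    pods: "20"             # maximum number of Pods
```

Once a quota constrains CPU or memory, Pods in that namespace that omit the corresponding requests or limits are rejected unless a LimitRange supplies defaults.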
Q7. What are Labels and Selectors and how do they work together?
Labels are key-value pairs attached to Kubernetes objects (Pods, Services, Deployments) for identification and organization. They carry no inherent meaning to Kubernetes; you define what they mean.
metadata:
  labels:
    app: payment-service
    environment: production
    tier: backend
Selectors query objects by their labels. This is how a Deployment knows which Pods belong to it, and how a Service knows which Pods to route traffic to.
# Deployment's selector must match the Pod template's labels
spec:
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service # must match the selector above
# Query by label directly
kubectl get pods -l app=payment-service
kubectl get pods -l 'environment in (staging,production)'
If a Deployment's selector and the Pod template's labels do not match, Kubernetes rejects the configuration. This selector mechanism is how higher-level controllers find the Pods they manage.
Q8. What is the difference between a ConfigMap and a Secret?
Both store configuration data that is decoupled from your container image, letting you change configuration without rebuilding images. The difference is intent and handling.
ConfigMap: stores non-sensitive configuration such as feature flags, log levels, URLs, and non-secret settings. Stored as plain text in etcd.
kubectl create configmap app-config --from-literal=LOG_LEVEL=info
Secret: stores sensitive data such as passwords, API keys, and TLS certificates. Values are base64-encoded (not encrypted by default, just encoded). For real protection you must enable encryption at rest for etcd.
kubectl create secret generic db-credentials \
  --from-literal=username=admin \
  --from-literal=password='S3cur3P@ss'
Both can be consumed as environment variables or mounted as files:
env:
  - name: LOG_LEVEL
    valueFrom:
      configMapKeyRef:
        name: app-config
        key: LOG_LEVEL
  - name: DB_PASSWORD
    valueFrom:
      secretKeyRef:
        name: db-credentials
        key: password
Q9. What is the difference between ClusterIP, NodePort, and LoadBalancer Services?
These are the three core Service types, each exposing Pods differently.
ClusterIP (default): exposes the Service on an internal cluster IP, only reachable from within the cluster. Used for internal service-to-service communication.
apiVersion: v1
kind: Service
metadata:
  name: backend-service
spec:
  type: ClusterIP
  selector:
    app: backend
  ports:
    - port: 80
      targetPort: 8080
NodePort: exposes the Service on a static port (30000-32767 by default) on every node's IP. Accessible externally via NodeIP:NodePort, but exposes node IPs directly, which is not ideal for production.
LoadBalancer: provisions an external load balancer through the cloud provider (AWS ELB, GCP Cloud Load Balancer, Azure Load Balancer). Each LoadBalancer Service gets its own external IP. Works well for a handful of services but becomes expensive at scale since every Service gets a separate cloud load balancer.
The common production pattern: use ClusterIP for internal services, and a single Ingress (backed by one LoadBalancer Service for the Ingress Controller itself) to route external traffic to many internal Services.
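For comparison with the ClusterIP manifest, a NodePort variant of the same Service might look like the sketch below. The name `backend-nodeport` and the port `30080` are illustrative assumptions; the selector and target port mirror the ClusterIP example.

```yaml
# Hypothetical NodePort variant of the backend Service.
apiVersion: v1
kind: Service
metadata:
  name: backend-nodeport
spec:
  type: NodePort
  selector:
    app: backend
  ports:
    - port: 80          # port on the Service's cluster IP
      targetPort: 8080  # port the container listens on
      nodePort: 30080   # optional; omit to let Kubernetes pick one from the range
```

Changing `type` to `LoadBalancer` (and dropping `nodePort`) is all the manifest needs for the third variant; the cloud provider integration then provisions the external load balancer.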
Q10. What is Ingress and how does it differ from a LoadBalancer Service?
An Ingress is a Kubernetes resource that manages external HTTP/HTTPS traffic and routes it to different Services based on rules you define (hostname, path). It requires an Ingress Controller (NGINX Ingress, Traefik, AWS Load Balancer Controller) running in the cluster to actually implement the rules.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /users
            pathType: Prefix
            backend:
              service:
                name: users-service
                port:
                  number: 80
          - path: /orders
            pathType: Prefix
            backend:
              service:
                name: orders-service
                port:
                  number: 80
The key difference: a LoadBalancer Service gives you one external IP per Service, each with its own cloud load balancer cost. An Ingress lets you route many different paths and hostnames through a single entry point and a single load balancer, dramatically reducing cost and complexity as the number of services grows.
Category 2: Workloads and Scaling (Q11-Q18)
Workload questions test whether you know which Kubernetes object to use for a given scenario and how scaling, updates, and health checks actually work in production.
Q11. What is the difference between a Deployment, a StatefulSet, and a DaemonSet?
Deployment: for stateless applications where every replica is interchangeable. Pods get random names and are not guaranteed any particular identity. Best for web servers, APIs, and anything where replicas are clones of each other.
StatefulSet: for stateful applications that need stable, unique network identities and stable storage. Pods get predictable names (pod-0, pod-1, pod-2) and are created/deleted in order. Each Pod gets its own PersistentVolumeClaim that survives Pod rescheduling. Best for databases, message queues, and anything where each replica has a distinct role or owns specific data.
DaemonSet: ensures a copy of a specific Pod runs on every node (or a selected subset). When a new node joins, the DaemonSet automatically schedules a Pod on it. Best for log collectors, monitoring agents, and CNI network plugins that need to run on every node.
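The DaemonSet case can be sketched as a node-level log collector. This is a minimal sketch under assumptions: the name `log-collector` is hypothetical, the image tag is reused from the sidecar example in Q4, and a real agent would also need its own configuration.

```yaml
# Hypothetical log collector that runs one Pod per node.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
        - name: agent
          image: fluent-bit:2.1
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log   # read the node's own log directory
```

Note that there is no `replicas` field: the number of Pods follows the number of matching nodes.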
# StatefulSet snippet: note stable identity via volumeClaimTemplates
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 3
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
# Creates postgres-0, postgres-1, postgres-2, each with its own PVC
Q12. How do rolling updates and rollbacks work in a Deployment?
A rolling update gradually replaces old Pods with new ones, maintaining availability throughout. You control the pace with maxSurge and maxUnavailable.
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1 # can create 1 extra Pod above desired count during update
      maxUnavailable: 0 # never go below desired count (zero downtime)
# Trigger a rolling update by changing the image
kubectl set image deployment/web web=myapp:2.0
# Watch the rollout
kubectl rollout status deployment/web
# Check rollout history
kubectl rollout history deployment/web
# Rollback to the previous revision if something is wrong
kubectl rollout undo deployment/web
# Rollback to a specific revision
kubectl rollout undo deployment/web --to-revision=3
Kubernetes keeps old ReplicaSets around (by default, the last 10 revisions) so rollbacks are fast: it just scales the old ReplicaSet back up and scales the broken one down.
Q13. What is the difference between Horizontal Pod Autoscaler and Vertical Pod Autoscaler?
Horizontal Pod Autoscaler (HPA): adjusts the number of Pod replicas based on observed metrics (CPU, memory, or custom metrics via the Metrics API). Scales out (more Pods) under load, scales in when load drops.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
Vertical Pod Autoscaler (VPA): adjusts the CPU and memory requests/limits of existing Pods rather than the replica count. Useful when you're not sure what resource values to set and want Kubernetes to recommend or automatically apply them based on observed usage.
The two solve different scaling problems: HPA handles "I need more capacity," VPA handles "each instance needs more or less resources than I gave it." They can be combined carefully but require coordination since VPA restarts Pods to apply new resource values, which can conflict with HPA's scaling decisions if misconfigured.
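A VPA counterpart to the HPA manifest might look like the sketch below. It assumes the Vertical Pod Autoscaler components (CRD and controllers from the Kubernetes autoscaler project) are installed, since VPA is not part of a default cluster; the name `web-vpa` is hypothetical.

```yaml
# Hypothetical VPA in recommendation-only mode for the "web" Deployment.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Off"   # publish recommendations only; never evict Pods
```

Running in `"Off"` mode is one way to sidestep the HPA conflict described above: the recommendations show up in the object's status (for example via `kubectl describe vpa web-vpa`) and can be applied by hand.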
Q14. What are resource requests and limits, and what is a QoS class?
requests: the amount of CPU/memory Kubernetes guarantees the container. The scheduler uses this to decide which node has room for the Pod.
limits: the maximum amount the container is allowed to use. Exceeding the memory limit triggers an OOM kill. Exceeding the CPU limit causes throttling (not termination).
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"
Based on how requests and limits are set, every Pod gets a Quality of Service (QoS) class:
Guaranteed: requests equal limits for both CPU and memory, on every container in the Pod. Highest priority, least likely to be evicted under node pressure.
Burstable: requests are set but are lower than limits (or only some resources have limits). Medium priority.
BestEffort: no requests or limits set at all. Lowest priority, first to be evicted when a node runs low on resources.
QoS class matters during node memory pressure: Kubernetes evicts BestEffort Pods first, then Burstable, and Guaranteed Pods last.
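The resources example earlier in this answer would be classed as Burstable, because its requests are lower than its limits. A Guaranteed Pod sets them equal, as in this sketch (the Pod name is hypothetical and the image is reused from Q4):

```yaml
# Hypothetical Pod that qualifies for the Guaranteed QoS class.
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed-demo
spec:
  containers:
    - name: app
      image: myapp:1.4
      resources:
        requests:
          memory: "512Mi"
          cpu: "500m"
        limits:
          memory: "512Mi"   # equal to the request
          cpu: "500m"       # equal to the request
```

The assigned class is recorded on the Pod and can be read back with `kubectl get pod qos-guaranteed-demo -o jsonpath='{.status.qosClass}'`.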
Q15. What is the difference between a Job and a CronJob?
A Job runs a Pod (or several) to completion for a one-off task, then stops. Unlike a Deployment, it does not keep the Pod running indefinitely.
apiVersion: batch/v1
kind: Job
metadata:
  name: data-migration
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: myapp:migration
          command: ["python", "migrate.py"]
      restartPolicy: OnFailure
  backoffLimit: 3 # retry up to 3 times on failure
A CronJob creates Jobs on a recurring schedule, using standard cron syntax.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report
spec:
  schedule: "0 2 * * *" # every day at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: report
              image: myapp:reports
          restartPolicy: OnFailure
Use Jobs for database migrations, batch processing, and one-time data backfills. Use CronJobs for nightly reports, periodic cleanup tasks, and scheduled backups.
Q16. What is an init container and why would you use one?
An init container runs and completes before the main application containers in a Pod start. Each init container must complete successfully before the next one (or the main containers) begins. If an init container fails, the kubelet reruns it until it succeeds; if the Pod's restartPolicy is Never, the Pod is marked as failed instead.
spec:
  initContainers:
    - name: wait-for-db
      image: busybox
      command: ['sh', '-c', 'until nc -z postgres-service 5432; do sleep 2; done']
    - name: run-migrations
      image: myapp:migrate
      command: ['python', 'migrate.py']
  containers:
    - name: app
      image: myapp:latest
Typical use cases:
- Waiting for a dependency (database, another service) to become available before starting the main app.
- Running database migrations before the application starts.
- Cloning a git repository or fetching configuration before the app boots.
- Setting up file permissions or directory structure on a shared volume.
Init containers keep this setup logic separate from the application image and guarantee strict ordering, which a single container with a startup script cannot guarantee as cleanly.
Q17. What is the difference between liveness, readiness, and startup probes?
All three are health checks the kubelet runs against a container, but they answer different questions and trigger different actions.
Liveness probe: "Is this container still alive and functioning?" If it fails, Kubernetes kills and restarts the container. Use for detecting deadlocks or unrecoverable states.
Readiness probe: "Is this container ready to receive traffic?" If it fails, the Pod is removed from Service endpoints (no traffic routed to it) but is NOT restarted. Use for temporary unavailability, like a Pod still loading a large cache on startup.
Startup probe: "Has this container finished its slow startup process?" Liveness and readiness probes are disabled until the startup probe succeeds. Use for applications with a long, variable startup time, to avoid the liveness probe killing the container before it has even finished booting.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30
  periodSeconds: 10 # allows up to 300 seconds for slow-starting apps
Q18. What is a Pod Disruption Budget and when do you need one?
A Pod Disruption Budget (PDB) limits how many Pods of a replicated application can be down simultaneously during voluntary disruptions: node drains for maintenance, cluster autoscaler scale-downs, or manual evictions.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2 # at least 2 Pods must stay available
  selector:
    matchLabels:
      app: web
PDBs only protect against voluntary disruptions (things Kubernetes initiates deliberately). They cannot protect against involuntary disruptions like a node crashing unexpectedly or a hardware failure.
Without a PDB, a cluster admin draining a node for maintenance could accidentally take down every replica of a service at once if they all happen to be scheduled there. With a PDB, the drain operation respects the minimum availability and proceeds Pod by Pod instead.
Category 3: Networking (Q19-Q24)
Networking questions separate candidates who have run real clusters from those who have only read documentation. Expect questions about DNS, CNI plugins, and why Services sometimes fail to route traffic.
Q19. How does Pod-to-Pod networking work in Kubernetes?
Kubernetes requires a flat networking model: every Pod gets its own unique IP address, and every Pod can communicate with every other Pod across the entire cluster without NAT, regardless of which node they're on.
Kubernetes itself does not implement this networking. It defines the requirement and delegates implementation to a CNI (Container Network Interface) plugin: Calico, Cilium, Flannel, or the cloud provider's native CNI (AWS VPC CNI, Azure CNI).
The CNI plugin is responsible for assigning IP addresses to Pods, setting up routes between nodes so Pods on different nodes can reach each other, and enforcing NetworkPolicies if the plugin supports it (not all CNIs do; Calico and Cilium do, basic Flannel does not).
Without a CNI installed, Pods on different nodes have no route to each other. A fresh kubeadm cluster will show nodes stuck in NotReady until a CNI is applied.
Q20. How does Service discovery and DNS work in Kubernetes?
Every Service gets a stable DNS name automatically, resolved through CoreDNS (the default cluster DNS addon). The format is:
<service-name>.<namespace>.svc.cluster.local
Within the same namespace, you can just use the Service name:
// From a Pod in the same namespace as 'payment-service'
const sameNamespace = await fetch('http://payment-service/charge');
// From a different namespace, use the fully qualified name
const crossNamespace = await fetch('http://payment-service.billing.svc.cluster.local/charge');
This is the mechanism that solves the "Pods are ephemeral with changing IPs" problem. Application code never hardcodes a Pod IP; it always calls a Service name, and kube-proxy handles routing that request to one of the currently healthy Pod IPs behind that Service.
Q21. What is a headless Service and when do you use one?
A normal ClusterIP Service load-balances requests across Pods behind a single virtual IP. A headless Service (set clusterIP: None) skips the virtual IP entirely. DNS queries against a headless Service return the individual Pod IPs directly.
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  clusterIP: None # headless
  selector:
    app: postgres
  ports:
    - port: 5432
# DNS lookup against a headless Service returns each Pod's IP individually
nslookup postgres.default.svc.cluster.local
# Returns: postgres-0.postgres.default.svc.cluster.local -> 10.244.1.5
#          postgres-1.postgres.default.svc.cluster.local -> 10.244.2.7
Use headless Services with StatefulSets when clients need to address a specific replica directly (connect to the primary database instance, not just any instance) rather than being load-balanced to a random one. This pairs with the StatefulSet concept covered in Q11, and is especially relevant when running databases like MongoDB or Redis in Kubernetes.
Q22. What is a NetworkPolicy and what does it control?
A NetworkPolicy controls which Pods are allowed to communicate with which other Pods, and on which ports. By default, Kubernetes allows all Pods to talk to all other Pods unrestricted. NetworkPolicies let you implement a default-deny, allow-specific-traffic model.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
This policy says: only Pods labeled app=frontend may send traffic to Pods labeled app=backend, and only on port 8080. All other inbound traffic to the backend Pods is denied; traffic leaving them is not affected, because the policy covers Ingress only.
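The default-deny half of that model is usually a separate policy applied per namespace. A minimal sketch, with a hypothetical name:

```yaml
# Hypothetical default-deny policy for all inbound traffic in one namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}   # empty selector matches every Pod in the namespace
  policyTypes:
    - Ingress       # no ingress rules are listed, so all inbound traffic is denied
```

NetworkPolicies are additive, so an allow rule like the frontend-to-backend one then opens only the specific paths you need. Both policies have an effect only if the cluster's CNI plugin enforces NetworkPolicy.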
Q23. What is an Ingress Controller and why is it required separately from Ingress resources?
An Ingress resource is just a set of routing rules, a declaration of intent. It does nothing on its own. An Ingress Controller is the actual software that watches Ingress resources and implements the routing: it's typically a reverse proxy (NGINX, Traefik, HAProxy) running as a Deployment in your cluster, exposed via a LoadBalancer Service.
# Without an Ingress Controller installed, Ingress resources do nothing
kubectl get ingress
# Will show your rules but ADDRESS column stays empty
Different controllers support different annotation-based features (rate limiting, custom headers, canary deployments) beyond the base Ingress spec. Choosing an Ingress Controller is an architectural decision, not just a default that works the same everywhere.
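An Ingress resource can name the controller that should handle it through `ingressClassName`. The sketch below assumes an IngressClass called `nginx` exists in the cluster (controller installations typically create one, but the name is an assumption here); the host and backend are reused from Q10.

```yaml
# Hypothetical Ingress bound to a specific controller via its IngressClass.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
spec:
  ingressClassName: nginx   # assumed IngressClass name
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /users
            pathType: Prefix
            backend:
              service:
                name: users-service
                port:
                  number: 80
```

This matters most in clusters that run more than one controller, where each Ingress should be claimed by exactly one of them.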
Q24. What is the role of CNI plugins, and name a few popular ones.
CNI (Container Network Interface) is the standard interface Kubernetes uses to delegate Pod networking setup to a plugin. Without a CNI plugin installed, Pods cannot get assigned IP addresses or communicate across nodes.
Calico: widely used, supports NetworkPolicy enforcement, BGP-based routing, good performance at scale. Common choice for production clusters needing network policy enforcement.
Cilium: eBPF-based, increasingly popular for its performance and observability features. Supports advanced NetworkPolicy, including L7-aware policies (for example, allow only specific HTTP methods/paths).
Flannel: simple, easy to set up, but does not support NetworkPolicy enforcement. Good for simple clusters where network segmentation isn't a requirement.
AWS VPC CNI / Azure CNI: cloud-native plugins that assign Pods real VPC IPs, integrating tightly with the cloud provider's networking and security groups.
Category 4: Storage and Configuration (Q25-Q28)
Storage questions test whether you understand how Kubernetes decouples storage provisioning from consumption and how configuration data flows into containers.
Q25. What is the difference between a PersistentVolume and a PersistentVolumeClaim?
A PersistentVolume (PV) is a piece of storage in the cluster, provisioned either manually by an admin or dynamically via a StorageClass. It exists independently of any Pod's lifecycle.
A PersistentVolumeClaim (PVC) is a request for storage by a user/Pod. It specifies size and access mode, and Kubernetes binds it to a matching PV.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: gp3
# Pod references the PVC, not the PV directly
spec:
  containers:
    - name: app
      volumeMounts:
        - name: data
          mountPath: /var/lib/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data-pvc
The PV/PVC split decouples storage provisioning from storage consumption. Developers write PVCs without needing to know the underlying storage infrastructure. Admins (or dynamic provisioners) handle PVs.
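The admin side of the split, a manually provisioned PV, might look like the sketch below. The name `data-pv`, the class name `manual`, and the host path are illustrative assumptions; `hostPath` is only suitable for single-node test clusters.

```yaml
# Hypothetical statically provisioned PV for a single-node test cluster.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: data-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain   # keep the data after the claim is deleted
  storageClassName: manual
  hostPath:
    path: /mnt/data
```

A PVC binds to this volume only if it requests a compatible size and access mode and the same `storageClassName`, so a claim asking for `gp3` would not match it.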
Q26. What is a StorageClass and what does dynamic provisioning mean?
A StorageClass defines a "class" of storage with a specific provisioner and parameters (disk type, IOPS, replication). When a PVC requests a StorageClass, Kubernetes automatically provisions a matching PV on demand, rather than requiring an admin to pre-create volumes manually.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
volumeBindingMode: WaitForFirstConsumer delays volume creation until a Pod actually uses the PVC, which lets the scheduler factor in storage availability zone constraints when picking a node. Without dynamic provisioning, an admin would need to manually provision a cloud disk for every single PVC requested, which does not scale.
Q27. How are ConfigMaps and Secrets mounted into a Pod as files versus environment variables?
Both mounting strategies are common, and the choice matters for how configuration changes propagate.
As environment variables: simple, but changes to the ConfigMap/Secret do NOT automatically update already-running Pods. You need to restart the Pod to pick up new values.
envFrom:
  - configMapRef:
      name: app-config
As a mounted volume: Kubernetes updates the mounted files automatically when the underlying ConfigMap/Secret changes (after a short propagation delay, typically under a minute), without restarting the Pod. The application needs to watch the file for changes itself if it wants to reload configuration live.
volumeMounts:
  - name: config-volume
    mountPath: /etc/config
volumes:
  - name: config-volume
    configMap:
      name: app-config
Use environment variables for simple, rarely-changing settings. Use volume mounts when you want configuration to update without a full Pod restart, or when mounting TLS certificates that rotate periodically.
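The TLS certificate case works the same way with a Secret volume. This is a sketch: the Pod name and the Secret name `app-tls` are hypothetical, and the Secret is assumed to be of type `kubernetes.io/tls`.

```yaml
# Hypothetical Pod mounting a TLS Secret as read-only files.
apiVersion: v1
kind: Pod
metadata:
  name: tls-demo
spec:
  containers:
    - name: app
      image: myapp:1.4
      volumeMounts:
        - name: tls
          mountPath: /etc/tls   # files appear as /etc/tls/tls.crt and /etc/tls/tls.key
          readOnly: true
  volumes:
    - name: tls
      secret:
        secretName: app-tls     # assumed Secret of type kubernetes.io/tls
```

One caveat: a volume mounted with `subPath` does not receive these automatic updates, so rotated certificates only reach the container when the whole volume is mounted.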
Q28. How do you inject a database password from a Secret without exposing it in plain text?
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api # must match the selector above
    spec:
      containers:
        - name: api
          image: myapp:latest
          env:
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: password
The value is injected at runtime by the kubelet, never baked into the image or visible in the Deployment YAML itself (only the Secret name and key are referenced). To inspect it for debugging:
# View the Secret's keys (without decoding values)
kubectl describe secret db-credentials
# Decode a specific value (requires appropriate RBAC permission)
kubectl get secret db-credentials -o jsonpath='{.data.password}' | base64 -d
For stronger security than the Kubernetes-native Secret object, many production setups integrate an external secrets manager (AWS Secrets Manager, HashiCorp Vault) using a tool like External Secrets Operator, which syncs secrets from the external store into native Kubernetes Secrets automatically and supports rotation.
Category 5: Troubleshooting and Scenarios (Q29-Q36)
Troubleshooting questions are where senior candidates separate themselves. These scenario-based questions test whether you can systematically diagnose real production issues, not just recite definitions.
Q29. A Pod is stuck in CrashLoopBackOff. Walk through how you'd debug it.
CrashLoopBackOff is a status, not a root cause. It means the container starts, exits, Kubernetes restarts it, it exits again, and Kubernetes is now waiting an exponentially increasing delay (10s, 20s, 40s, up to 300s) before trying again.
# Step 1: check status and restart count
kubectl get pods
# High RESTARTS count confirms the loop
# Step 2: describe the pod (check State, Last State, and Events)
kubectl describe pod <pod-name>
# Look for: Last State: Terminated, Reason, Exit Code
# Step 3: check current logs (often empty if the container restarted recently)
kubectl logs <pod-name>
# Step 4: check logs from the PREVIOUS crashed instance (this is the key step)
kubectl logs <pod-name> --previous
# Step 5: if the pod is multi-container, specify which one
kubectl logs <pod-name> -c <container-name> --previous
Common root causes include missing or malformed environment variables/Secrets, a database or dependency that isn't reachable yet, an application bug causing an unhandled exception on startup, and a memory limit set too low (OOMKilled).
Q30. What do exit codes 0, 1, 137, and 143 mean in Kubernetes?
Exit codes communicate why a container stopped. Codes 0 through 125 are application-defined. Codes above 128 typically mean the process was terminated by a signal, following the convention: 128 + signal number.
| Exit Code | Meaning | Common Cause |
|---|---|---|
| 0 | Clean, successful exit | Intentional, no action needed |
| 1 | Application error | Unhandled exception, application-level failure |
| 127 | Command not found | Typo in Dockerfile CMD/ENTRYPOINT, missing binary |
| 137 | SIGKILL (128 + 9) | OOMKilled, or manual kill -9 |
| 139 | SIGSEGV (128 + 11) | Segmentation fault, memory access violation |
| 143 | SIGTERM (128 + 15) | Graceful shutdown request (normal during deploys) |
The 137 vs 143 distinction is a very common interview question. 143 means the container received SIGTERM and shut down gracefully within the grace period. 137 means it either ignored SIGTERM and got force-killed with SIGKILL after the grace period expired, or was killed immediately by the OOM killer.
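The 128 + signal number convention can be sketched as a small decoder. This is a minimal illustration, assuming only the three signals from the table; `explain_exit_code` is a hypothetical helper name.

```shell
# Decode a container exit code using the 128 + signal-number convention.
# explain_exit_code is an illustrative helper; it only names the signals from the table.
explain_exit_code() (
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "clean exit"
  elif [ "$code" -gt 128 ]; then
    signal=$((code - 128))
    case $signal in
      9)  name=SIGKILL ;;
      11) name=SIGSEGV ;;
      15) name=SIGTERM ;;
      *)  name="not in this sketch's table" ;;
    esac
    echo "killed by signal $signal ($name)"
  else
    echo "application or shell exit code $code"
  fi
)

for code in 0 1 137 139 143; do
  echo "$code: $(explain_exit_code "$code")"
done
```

Against a real cluster, the exit code of the last crash can be read with kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}' and passed to the helper.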
# Confirm OOMKilled specifically
kubectl describe pod <pod-name> | grep -A 5 "Last State"
# Last State: Terminated
# Reason: OOMKilled
# Exit Code: 137
Q31. A Pod shows ImagePullBackOff. What are the possible causes and how do you fix each?
ImagePullBackOff means Kubernetes cannot pull the container image specified in the Pod spec, and is backing off between retries.
kubectl describe pod <pod-name>
# Check the Events section for the specific pull error message
Wrong image name or tag: typo in the image reference, or the tag doesn't exist in the registry. Fix: verify the exact image name and tag.
Private registry without credentials: the cluster has no way to authenticate to a private registry. Fix: create an imagePullSecret and reference it in the Pod spec.
kubectl create secret docker-registry regcred \
  --docker-server=<registry-url> \
  --docker-username=<username> \
  --docker-password=<password>
spec:
  imagePullSecrets:
    - name: regcred
  containers:
    - name: app
      image: myregistry.com/myapp:1.0
Network or DNS issue reaching the registry: the node cannot resolve or reach the registry endpoint. Fix: check node network policies, proxy configuration, or registry status.
Rate limiting: public registries like Docker Hub rate-limit anonymous pulls. Fix: authenticate even for public images, or use a registry mirror.
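For the "wrong image name or tag" case, it helps to see exactly which tag the cluster will try to pull. The sketch below is a simplified parser, assuming references without a digest (no @sha256:... suffix); `image_tag` is a hypothetical helper name. It relies on the fact that a reference with no tag resolves to latest.

```shell
# Extract the tag from an image reference; no tag means "latest".
# image_tag is an illustrative helper and does not handle @sha256 digests.
image_tag() (
  ref=$1
  last=${ref##*/}          # drop the registry host (and any :port) plus the repository path
  case $last in
    *:*) echo "${last##*:}" ;;
    *)   echo "latest" ;;
  esac
)

image_tag "myregistry.com/myapp:1.0"     # explicit tag
image_tag "localhost:5000/myapp"         # registry port is not a tag, so this resolves to latest
```

Once you know the tag, confirm it actually exists in the registry before looking at credentials or networking.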
Q32. A Pod is stuck in Pending state. What would you check?
Pending means the API server has accepted the Pod, but it has not yet been scheduled to a node, or it has been scheduled and its containers have not started yet (for example, images are still being pulled).
kubectl describe pod <pod-name>
# The Events section almost always explains why
Insufficient cluster resources: no node has enough free CPU/memory to satisfy the Pod's resource requests.
kubectl describe nodes | grep -A 5 "Allocated resources"
Fix: scale the cluster, free up resources, or lower the Pod's requests.
Unsatisfiable scheduling constraints: nodeSelector, affinity rules, or taints/tolerations that no available node matches.
kubectl get nodes --show-labels # check labels match nodeSelector
kubectl describe node <node> | grep Taints
PersistentVolumeClaim not bound: the Pod references a PVC that has no matching PV and dynamic provisioning failed or is misconfigured.
kubectl get pvc
# Status should be "Bound", not "Pending"
No nodes available at all: cluster autoscaler is still spinning up capacity, or all nodes are NotReady.
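The insufficient-resources case above comes down to one comparison per node: allocatable capacity minus the requests already scheduled there, against the new Pod's request. The scheduler works from requests, not live usage. Here is a minimal sketch for CPU only, assuming quantities written as whole cores ("4") or millicores ("500m"); `to_millicores` and `fits_on_node` are hypothetical helper names.

```shell
# Convert a CPU quantity to millicores. Handles "500m" and whole cores like "4";
# decimal values such as "0.5" are out of scope for this sketch.
to_millicores() (
  case $1 in
    *m) echo "${1%m}" ;;
    *)  echo $(( $1 * 1000 )) ;;
  esac
)

# fits_on_node ALLOCATABLE ALREADY_REQUESTED POD_REQUEST
# Succeeds when the node still has enough unrequested CPU for the Pod.
fits_on_node() (
  allocatable=$(to_millicores "$1")
  requested=$(to_millicores "$2")
  pod=$(to_millicores "$3")
  [ $((allocatable - requested)) -ge "$pod" ]
)

# A 4-core node with 3500m already requested cannot take a 750m Pod
if fits_on_node 4 3500m 750m; then
  echo "schedulable"
else
  echo "Insufficient cpu"
fi
```

The inputs correspond to the Allocatable and Allocated resources sections of kubectl describe node. A node can sit at low actual utilization and still reject the Pod if its requests are fully booked.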
Q33. How would you debug a Service that isn't routing traffic to its Pods?
This is one of the most common real-world scenarios. Here is a systematic approach:
# Step 1: confirm the Service has registered endpoints
kubectl get endpoints <service-name>
# If empty, the Service's selector is not matching any Pod's labels
# Step 2: compare the Service selector to the Pod's actual labels
kubectl get service <service-name> -o yaml | grep -A 3 selector
kubectl get pods --show-labels
# Step 3: confirm the Pods are actually Ready
kubectl get pods -l app=<your-app-label>
# READY column should show 1/1, not 0/1
# A failing readinessProbe removes the Pod from Service endpoints even if it's Running
# Step 4: test connectivity directly to a Pod, bypassing the Service
kubectl port-forward pod/<pod-name> 8080:8080
curl localhost:8080/health
# Step 5: test from inside another Pod in the cluster
kubectl run debug --image=busybox -it --rm -- sh
wget -O- http://<service-name>.<namespace>.svc.cluster.local
The single most common root cause: a typo or mismatch between the Service's selector and the Pod's labels, resulting in zero matched endpoints. The second most common: a failing readiness probe silently removing otherwise-healthy Pods from rotation.
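The selector rule behind step 2 is worth being able to state precisely: every key=value pair in the Service's selector must be present on the Pod, and extra Pod labels are ignored. A minimal sketch of that check, assuming equality-based selectors written as comma-separated key=value pairs; `selector_matches` is a hypothetical helper name.

```shell
# selector_matches SELECTOR POD_LABELS
# Succeeds only if every key=value pair in SELECTOR also appears in POD_LABELS.
# selector_matches is an illustrative helper for equality-based selectors.
selector_matches() (
  selector=$1
  labels=",$2,"            # pad with commas so each pair can be matched whole
  IFS=,
  for pair in $selector; do
    case $labels in
      *",$pair,"*) ;;      # this pair is present on the Pod
      *) exit 1 ;;         # one missing pair means no match
    esac
  done
  exit 0
)

selector_matches "app=web" "app=web,tier=frontend" && echo "match"
selector_matches "app=web,tier=frontend" "app=wbe,tier=frontend" || echo "no match: check for typos"
```

This is why a single-character typo produces an empty endpoints list with no error anywhere: a selector that matches nothing is still a valid selector.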
Q34. What exactly happens when you run kubectl delete pod?
This is a deceptively deep question that tests whether you understand the graceful termination sequence.
- The API server marks the Pod for deletion and sets a deletionTimestamp.
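- The kubelet runs the preStop hook if one is defined, then sends SIGTERM to each container's main process. The terminationGracePeriodSeconds countdown (default 30 seconds) covers both.
- In parallel, the Pod is removed from the endpoints of any Service that selects it, so new traffic stops being routed to it. This happens asynchronously, so a few requests can still arrive in the first moments after SIGTERM.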
- The application should use this grace period to finish in-flight requests and shut down cleanly.
- If the process has not exited once the grace period expires, Kubernetes sends SIGKILL, forcefully terminating it (this produces exit code 137, not 143).
- If the Pod was managed by a Deployment/ReplicaSet, the controller immediately creates a replacement Pod to maintain the desired replica count.
spec:
  # Override the 30s default if your app needs longer to drain connections
  terminationGracePeriodSeconds: 60
If you manually created a standalone Pod (not managed by a Deployment), deleting it is permanent. No controller exists to recreate it.
Q35. Multiple services in your cluster suddenly start failing simultaneously. How do you approach this?
This tests incident response methodology, not just Kubernetes trivia. Start broad before going deep. A simultaneous multi-service failure usually points to a shared dependency, not N independent bugs.
# Check overall cluster health first
kubectl get nodes
# Are any nodes NotReady? That alone explains multi-service failures.
kubectl get events --all-namespaces --sort-by='.lastTimestamp' | tail -30
# Recent cluster-wide events often reveal the trigger
# Check for resource exhaustion across the cluster
kubectl top nodes
kubectl describe nodes | grep -A 5 "Conditions"
Likely root causes for simultaneous failures: a shared dependency went down (database, DNS, a common upstream API), a node or availability zone outage, a NetworkPolicy or DNS change that broke connectivity cluster-wide, an etcd or API server issue affecting the whole control plane, or a recent cluster-wide change (a Helm upgrade, a CNI update, a cert rotation) that landed just before the failures started.
The instinct to demonstrate here: check what changed recently and what is shared across the failing services, rather than debugging each failing service in isolation as if they were unrelated.
Q36. How would you debug unexpected resource exhaustion on a node?
# Identify which nodes are under pressure
kubectl top nodes
# Identify which Pods are consuming the most on that node
kubectl top pods --all-namespaces --sort-by=memory
kubectl top pods --all-namespaces --sort-by=cpu
# Check what's actually scheduled on the affected node
kubectl get pods --all-namespaces -o wide | grep <node-name>
# Review requests/limits on the affected node for over-commitment
kubectl describe node <node-name> | grep -A 10 "Allocated resources"
Things to look for: a Pod with no resource limits set (BestEffort QoS) consuming far more than expected, a memory leak in a long-running Pod that has been up for days, too many Pods scheduled onto one node because of poor anti-affinity configuration, or a DaemonSet (running on every node) that recently got more resource-hungry after an update.
Longer-term fix: set sane resource requests and limits everywhere, use the Vertical Pod Autoscaler in recommendation mode to right-size requests based on actual historical usage, and configure pod anti-affinity for resource-heavy workloads so they spread across nodes rather than clustering.
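Since BestEffort Pods are the usual suspects here, it helps to be able to derive a Pod's QoS class from its requests and limits. The sketch below covers a single-container Pod only and compares quantities as plain strings (so "500m" and "0.5" would not count as equal); `qos_class` is a hypothetical helper name.

```shell
# qos_class CPU_REQUEST CPU_LIMIT MEM_REQUEST MEM_LIMIT  (pass "" for anything unset)
# Illustrative helper for a single-container Pod; quantities are compared as strings.
qos_class() (
  cpu_req=$1 cpu_lim=$2 mem_req=$3 mem_lim=$4
  # An unset request defaults to the limit when a limit is present
  cpu_req=${cpu_req:-$cpu_lim}
  mem_req=${mem_req:-$mem_lim}
  if [ -z "$cpu_req$cpu_lim$mem_req$mem_lim" ]; then
    echo "BestEffort"        # nothing set at all
  elif [ -n "$cpu_lim" ] && [ -n "$mem_lim" ] &&
       [ "$cpu_req" = "$cpu_lim" ] && [ "$mem_req" = "$mem_lim" ]; then
    echo "Guaranteed"        # requests equal limits for both CPU and memory
  else
    echo "Burstable"         # anything in between
  fi
)

qos_class "" "" "" ""
qos_class 500m 500m 256Mi 256Mi
qos_class 250m 500m 256Mi 512Mi
```

On a live cluster the computed class is available directly as .status.qosClass on the Pod, for example via kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'.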
Category 6: Security and Production Practices (Q37-Q40)
Security questions round out the interview with topics that matter in production clusters: access control, secrets management, and packaging. Expect at least one question from this category in any senior-level interview.
Q37. What is RBAC in Kubernetes and how do Roles and RoleBindings work together?
RBAC (Role-Based Access Control) controls who can perform which actions on which resources in the cluster. Two object types work together.
Role (or ClusterRole for cluster-wide scope): defines a set of permissions, specifying which verbs (get, list, create, delete) are allowed on which resources, within a namespace (Role) or cluster-wide (ClusterRole).
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: pod-reader
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
RoleBinding (or ClusterRoleBinding): grants the permissions defined in a Role to a specific user, group, or ServiceAccount.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods-binding
  namespace: production
subjects:
  - kind: User
    name: jane@example.com
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
The principle: a Role defines WHAT is allowed, a RoleBinding defines WHO gets that permission. Follow least privilege: grant the narrowest Role that lets someone do their job, scoped to a namespace whenever possible rather than cluster-wide.
Q38. Are Kubernetes Secrets encrypted by default?
No, and this catches many people off guard. By default, Secret values are only base64-ENCODED (not encrypted) and stored as plain text within etcd. Anyone with access to etcd, or with sufficient RBAC permissions to read Secret objects via the API, can trivially retrieve the real value.
To actually encrypt Secrets at rest, you must explicitly configure encryption at the API server level using an EncryptionConfiguration:
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded-32-byte-key>
      - identity: {}
Managed Kubernetes services (EKS, GKE, AKS) typically offer this as a configurable option (often integrated with the cloud provider's KMS) rather than something you configure on raw etcd yourself. Always verify whether encryption at rest is actually enabled, rather than assuming Kubernetes secrets are secure by default just because the name says "Secret."
Q39. What are Pod Security Standards and what do they replace?
Pod Security Standards (PSS) define three security policy levels that restrict what a Pod is allowed to do. They are applied at the cluster or namespace level by the built-in Pod Security Admission controller, and together the two replaced PodSecurityPolicy (PSP), which was deprecated in 1.21 and removed in 1.25.
Privileged: unrestricted, allows known privilege escalations. Use only for trusted, infrastructure-level workloads (CNI plugins, monitoring agents that need host access).
Baseline: prevents known privilege escalations while allowing default Pod configurations to work. Disallows things like running with the host network namespace, host PID namespace, or privileged containers.
Restricted: heavily restricted, follows current Pod hardening best practices. Requires running as non-root, disallows privilege escalation, and requires containers to drop all Linux capabilities (only NET_BIND_SERVICE may be added back).
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
Applying these labels to a namespace tells the built-in Pod Security Admission controller to check Pods at admission time. The enforce label rejects any Pod that violates the chosen level, while audit and warn only record the violation or return a warning to the user.
Q40. What is Helm and when should a team adopt it?
Helm is the most widely used package manager for Kubernetes. A Helm chart packages a complete application's Kubernetes resources (Deployments, Services, ConfigMaps, Ingress, etc.) into a single, versioned, reusable template.
# Create a new chart scaffold
helm create my-app
# Install a chart, providing custom values
helm install my-release ./my-app --values production-values.yaml
# Upgrade an existing release
helm upgrade my-release ./my-app --set image.tag=2.1.0
# Roll back to a previous release
helm rollback my-release 3
A chart's values.yaml lets you parameterize environment-specific settings (replica count, image tag, resource limits) without duplicating YAML files per environment:
# values.yaml
replicaCount: 3
image:
  repository: myapp
  tag: "1.0"
resources:
  limits:
    memory: "512Mi"
Adopt Helm when you're deploying the same application across multiple environments with different configuration, you want versioned, rollback-able releases instead of raw kubectl apply, or you're installing third-party software (most popular open-source Kubernetes tools ship official Helm charts as the primary installation method).
Skip Helm when you have a single simple application with one environment and raw YAML manifests are still easy to manage by hand. Helm adds real value once you have multiple environments or multiple similar deployments that would otherwise mean copy-pasting YAML.
Quick Reference: All 40 Questions at a Glance
| Q# | Question | Core Concept |
|---|---|---|
| Q1 | What is Kubernetes and why not just Docker | Orchestration, self-healing, scaling, rolling updates |
| Q2 | Control plane components | API server, etcd, scheduler, controller manager |
| Q3 | Worker node components | kubelet, kube-proxy, container runtime |
| Q4 | What is a Pod | Smallest unit, shared network namespace, ephemeral |
| Q5 | ReplicaSet vs Deployment | Replica count vs rolling updates/rollbacks |
| Q6 | What is a Namespace | Multi-tenancy, scoping, RBAC + ResourceQuota pairing |
| Q7 | Labels and Selectors | Key-value tags, how controllers find their Pods |
| Q8 | ConfigMap vs Secret | Non-sensitive vs sensitive, base64 is not encryption |
| Q9 | ClusterIP vs NodePort vs LoadBalancer | Internal vs node-exposed vs cloud LB per service |
| Q10 | Ingress vs LoadBalancer Service | Single entry point, path/host routing, cost savings |
| Q11 | Deployment vs StatefulSet vs DaemonSet | Stateless vs stable identity vs one-per-node |
| Q12 | Rolling updates and rollbacks | maxSurge/maxUnavailable, rollout undo |
| Q13 | HPA vs VPA | Scale replica count vs scale resource allocation |
| Q14 | Resource requests/limits and QoS classes | Guaranteed, Burstable, BestEffort eviction order |
| Q15 | Job vs CronJob | Run-to-completion vs scheduled recurring |
| Q16 | Init containers | Run-before-main, strict ordering, dependency waits |
| Q17 | Liveness vs Readiness vs Startup probes | Restart vs remove-from-service vs delay-other-probes |
| Q18 | Pod Disruption Budget | Limits voluntary disruption, minAvailable |
| Q19 | Pod-to-Pod networking | Flat network model, CNI plugin responsibility |
| Q20 | Service discovery and DNS | CoreDNS, service-name.namespace.svc.cluster.local |
| Q21 | Headless Service | ClusterIP: None, direct Pod IP resolution |
| Q22 | NetworkPolicy | Default-allow-all without it, CNI support required |
| Q23 | Ingress Controller | Implements Ingress rules, NGINX/Traefik/HAProxy |
| Q24 | CNI plugins | Calico, Cilium, Flannel, cloud-native CNIs |
| Q25 | PersistentVolume vs PersistentVolumeClaim | Storage resource vs storage request |
| Q26 | StorageClass and dynamic provisioning | On-demand PV creation, WaitForFirstConsumer |
| Q27 | ConfigMap/Secret as env vars vs mounted files | Restart required vs live-update propagation |
| Q28 | Injecting Secrets safely | secretKeyRef, base64 decode, external secrets managers |
| Q29 | Debugging CrashLoopBackOff | describe, logs --previous, exponential backoff |
| Q30 | Exit codes 0, 1, 137, 143 | Signal math (128+N), OOMKilled vs graceful SIGTERM |
| Q31 | Debugging ImagePullBackOff | Wrong tag, missing imagePullSecret, rate limiting |
| Q32 | Debugging Pod stuck Pending | Resource shortage, scheduling constraints, PVC unbound |
| Q33 | Debugging a Service with no traffic | Selector/label mismatch, failing readiness probe |
| Q34 | What happens on kubectl delete pod | SIGTERM, grace period, endpoint removal, SIGKILL |
| Q35 | Multiple services failing simultaneously | Shared dependency, node/AZ outage, recent change |
| Q36 | Debugging node resource exhaustion | kubectl top, BestEffort QoS, anti-affinity fix |
| Q37 | RBAC: Roles and RoleBindings | What is allowed vs who gets it, least privilege |
| Q38 | Are Secrets encrypted by default | No, base64 only, EncryptionConfiguration required |
| Q39 | Pod Security Standards | Privileged, Baseline, Restricted; replaced PSP |
| Q40 | Helm and when to adopt it | Chart packaging, values.yaml, versioned rollbacks |
Frequently Asked Questions
What level of Kubernetes knowledge do these 40 questions target?
This guide spans mid-level fundamentals through senior production expertise. Questions 1 through 18 (architecture, workloads, scaling) cover the conceptual foundation that any candidate deploying to Kubernetes should know confidently.
Questions 19 through 28 (networking, storage) go deeper into operational knowledge that separates mid-level from senior candidates. Questions 29 through 36 (troubleshooting scenarios) are the hardest section: they test real-world debugging instinct with specific kubectl command sequences, which is what senior and staff-level interviews probe in 2026.
Do I need CKA or CKAD certification to answer these interview questions?
No. CKA (Certified Kubernetes Administrator) and CKAD (Certified Kubernetes Application Developer) are valuable credentials, but they are not required to answer these questions well. The questions here focus on concepts and troubleshooting instinct, which you can build through hands-on experience with any Kubernetes cluster.
That said, if you can answer these 40 questions confidently with real examples, you are well-positioned for CKA/CKAD exam preparation. The overlap is significant, especially in the architecture (Q1-Q10), workloads (Q11-Q18), and troubleshooting (Q29-Q36) categories.
How can I practice Kubernetes locally before an interview?
You can run a full single-node Kubernetes cluster on your laptop using several free tools. The most popular options for local development:
# Option 1: minikube (most widely used for learning)
minikube start
kubectl get nodes # single-node cluster running locally
# Option 2: kind (Kubernetes in Docker, great for CI and testing)
kind create cluster --name practice
kubectl cluster-info
# Option 3: Docker Desktop includes a built-in Kubernetes toggle
# Enable it in Docker Desktop > Settings > Kubernetes > Enable Kubernetes
Once your local cluster is running, practice every kubectl command from the troubleshooting section (Q29-Q36). Deploy a simple app, break it intentionally (wrong image tag, missing Secret, resource limits too low), and debug it using the exact sequences described in this guide.
What is the difference between kubectl apply and kubectl create?
kubectl create is imperative: it creates a resource and fails if the resource already exists. kubectl apply is declarative: it creates the resource if it doesn't exist, or updates it if it does, by comparing the desired state in your YAML file to the current state in the cluster.
# Imperative: fails on second run because the resource exists
kubectl create -f deployment.yaml
# Error: deployments.apps "web" already exists
# Declarative: creates on first run, updates on subsequent runs
kubectl apply -f deployment.yaml
# deployment.apps/web created (first time)
# deployment.apps/web configured (subsequent times)
In production workflows, kubectl apply is the standard because it supports the GitOps pattern: store manifests in Git, and apply them to the cluster on every change. kubectl create is useful for one-off operations like creating a Secret or Namespace interactively.
How do I check which Pods are running on a specific node?
Use the -o wide flag to see which node each Pod is scheduled on, then filter by node name:
# List all Pods with their node assignments
kubectl get pods --all-namespaces -o wide
# Filter to a specific node
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=worker-node-1
# See the heaviest CPU consumers across the cluster, then match them against the node's Pod list above
kubectl top pods --all-namespaces --sort-by=cpu | head -20
This is a common first step in the node resource exhaustion debugging flow described in Q36. Combine it with kubectl describe node to see allocated vs available resources.
What Kubernetes topics are most commonly asked in 2026 interviews?
Based on how often these topics appear in interview prep communities and job postings, the most commonly asked areas in 2026 are:
- Troubleshooting scenarios (CrashLoopBackOff, ImagePullBackOff, Pending Pods): asked in nearly every Kubernetes interview at mid-level and above.
- Deployment vs StatefulSet vs DaemonSet: the most common "compare these" question.
- Service types and Ingress: how traffic flows into and within the cluster.
- Resource requests/limits and QoS classes: especially the OOMKilled scenario and exit code 137.
- RBAC fundamentals: Roles, RoleBindings, least privilege.
- Probes (liveness, readiness, startup): understanding when Kubernetes restarts vs removes from traffic.
If you have limited prep time, focus on categories 1 and 5 (architecture and troubleshooting) first. These cover the questions you are most likely to face. For broader backend interview preparation, combine this guide with our NestJS interview questions for application-layer coverage.