Dev Encyclopedia
40 Kubernetes Interview Questions and Answers (2026)

40 Kubernetes interview questions covering Pods, Deployments, networking, RBAC, and real troubleshooting scenarios like CrashLoopBackOff and OOMKilled. Updated for 2026.

Zeeshan Tofiq
June 26, 2026
On this page

  • Category 1: Architecture and Core Concepts (Q1-Q10)
  • Q1. What is Kubernetes and why do teams need it instead of just Docker?
  • Q2. What are the core components of the Kubernetes control plane?
  • Q3. What are the core components running on a worker node?
  • Q4. What is a Pod and why doesn't Kubernetes manage containers directly?
  • Q5. What is the difference between a ReplicaSet and a Deployment?
  • Q6. What is a Namespace and when do you use one?
  • Q7. What are Labels and Selectors and how do they work together?
  • Q8. What is the difference between a ConfigMap and a Secret?
  • Q9. What is the difference between ClusterIP, NodePort, and LoadBalancer Services?
  • Q10. What is Ingress and how does it differ from a LoadBalancer Service?
  • Category 2: Workloads and Scaling (Q11-Q18)
  • Q11. What is the difference between a Deployment, a StatefulSet, and a DaemonSet?
  • Q12. How do rolling updates and rollbacks work in a Deployment?
  • Q13. What is the difference between Horizontal Pod Autoscaler and Vertical Pod Autoscaler?
  • Q14. What are resource requests and limits, and what is a QoS class?
  • Q15. What is the difference between a Job and a CronJob?
  • Q16. What is an init container and why would you use one?
  • Q17. What is the difference between liveness, readiness, and startup probes?
  • Q18. What is a Pod Disruption Budget and when do you need one?
  • Category 3: Networking (Q19-Q24)
  • Q19. How does Pod-to-Pod networking work in Kubernetes?
  • Q20. How does Service discovery and DNS work in Kubernetes?
  • Q21. What is a headless Service and when do you use one?
  • Q22. What is a NetworkPolicy and what does it control?
  • Q23. What is an Ingress Controller and why is it required separately from Ingress resources?
  • Q24. What is the role of CNI plugins, and name a few popular ones.
  • Category 4: Storage and Configuration (Q25-Q28)
  • Q25. What is the difference between a PersistentVolume and a PersistentVolumeClaim?
  • Q26. What is a StorageClass and what does dynamic provisioning mean?
  • Q27. How are ConfigMaps and Secrets mounted into a Pod as files versus environment variables?
  • Q28. How do you inject a database password from a Secret without exposing it in plain text?
  • Category 5: Troubleshooting and Scenarios (Q29-Q36)
  • Q29. A Pod is stuck in CrashLoopBackOff. Walk through how you'd debug it.
  • Q30. What do exit codes 0, 1, 137, and 143 mean in Kubernetes?
  • Q31. A Pod shows ImagePullBackOff. What are the possible causes and how do you fix each?
  • Q32. A Pod is stuck in Pending state. What would you check?
  • Q33. How would you debug a Service that isn't routing traffic to its Pods?
  • Q34. What exactly happens when you run kubectl delete pod?
  • Q35. Multiple services in your cluster suddenly start failing simultaneously. How do you approach this?
  • Q36. How would you debug unexpected resource exhaustion on a node?
  • Category 6: Security and Production Practices (Q37-Q40)
  • Q37. What is RBAC in Kubernetes and how do Roles and RoleBindings work together?
  • Q38. Are Kubernetes Secrets encrypted by default?
  • Q39. What are Pod Security Standards and what do they replace?
  • Q40. What is Helm and when should a team adopt it?
  • Quick Reference: All 40 Questions at a Glance
  • Frequently Asked Questions

Kubernetes shows up on nearly every backend, DevOps, and platform engineering job posting in 2026. Knowing the definitions is the easy part. What separates strong candidates is troubleshooting instinct: when a Pod won't start, knowing whether to check logs, events, or node conditions first, and what each exit code actually tells you.

These 40 questions cover both. The first half builds the conceptual foundation interviewers expect everyone to know. The second half focuses on the scenario-based and troubleshooting questions that actually separate candidates in 2026 interviews. Answers are written to be said out loud, not read from a textbook. If you're also preparing for cloud and infrastructure roles, our cloud and DevOps interview questions guide covers AWS Lambda, Docker, and microservices, pairing well with the Kubernetes content here.

💡 How to use this guide

Skim the Quick Reference table near the bottom to see all 40 questions and their core concept at a glance. Then use the table of contents to jump into the category you need most. Categories 1 through 3 (architecture, workloads, networking) cover the fundamentals every candidate must know. Category 5 (troubleshooting) is where mid-level and senior candidates get separated. Category 6 (security) rounds out production readiness topics.

Category 1: Architecture and Core Concepts (Q1-Q10)

Architecture questions test whether you understand the system you're deploying to, not just the kubectl commands. Expect questions about the control plane, worker nodes, and how Pods actually work under the hood.

Q1. What is Kubernetes and why do teams need it instead of just Docker?

Kubernetes (K8s) is an open-source container orchestration platform that automates deployment, scaling, and management of containerized applications. Google open-sourced it in 2014, based on lessons from their internal system called Borg. It is now maintained by the CNCF.

Docker alone manages individual containers on a single host. It does not solve the problems that appear when you run hundreds of containers across dozens of machines: which node should run which container, what happens when a container crashes, how do containers find each other, and how do you update an application with zero downtime.

Kubernetes solves these with five core capabilities:

  • Self-healing: automatically restarts failed containers and reschedules Pods from dead nodes.
  • Auto-scaling: adds or removes Pods based on CPU, memory, or custom metrics.
  • Rolling updates and rollbacks: zero-downtime deployments that stop progressing when new Pods fail their readiness checks, with one-command rollback to a previous revision.
  • Service discovery: stable DNS names for Pods that are constantly created and destroyed.
  • Load balancing: distributes traffic across healthy Pod replicas.

Q2. What are the core components of the Kubernetes control plane?

The control plane makes global decisions about the cluster and detects and responds to cluster events. It runs on control plane nodes (historically called master nodes), or is fully managed for you in EKS, GKE, and AKS.

kube-apiserver: the front end of the control plane. Every interaction with the cluster, including kubectl commands, goes through the API server. It validates requests and writes the desired state to etcd.

etcd: a distributed, consistent key-value store. The single source of truth for all cluster state: every Pod, Deployment, Service, and Secret is stored here. Uses the Raft consensus algorithm for consistency across replicas. If etcd is lost, cluster state is lost, which is why etcd backups are critical.

kube-scheduler: watches for newly created Pods with no assigned node and selects a suitable node based on resource requirements, affinity rules, and taints/tolerations.

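kube-controller-manager: runs the built-in controllers, control loops that watch cluster state through the API server and move the actual state toward the desired state. Examples include the Node, Deployment, ReplicaSet, Job, and EndpointSlice controllers. On cloud providers, a separate cloud-controller-manager runs the controllers that call the provider's API (for nodes, routes, and load balancers).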
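The "watch, compare, act" loop every controller runs can be sketched in a few lines. This is a toy model, assuming a ReplicaSet-like controller that only tracks Pod names in memory; real controllers watch the API server through informers and act by writing objects back to the API, none of which is modeled here. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyReplicaSet:
    """Hypothetical in-memory stand-in for a ReplicaSet: desired count plus observed Pods."""
    name: str
    desired_replicas: int
    pods: list = field(default_factory=list)
    _next_id: count = field(default_factory=count, repr=False)


def reconcile(rs: ToyReplicaSet) -> list:
    """One pass of a control loop: compare desired vs. observed state, then act."""
    actions = []
    # Observed < desired: create Pods with fresh names until the counts match.
    while len(rs.pods) < rs.desired_replicas:
        pod = f"{rs.name}-{next(rs._next_id)}"
        rs.pods.append(pod)
        actions.append(f"create {pod}")
    # Observed > desired: delete extras (toy policy: newest first).
    while len(rs.pods) > rs.desired_replicas:
        actions.append(f"delete {rs.pods.pop()}")
    return actions


rs = ToyReplicaSet("web", desired_replicas=3)
print(reconcile(rs))      # three creates: web-0, web-1, web-2
rs.pods.remove("web-1")   # simulate a Pod dying
print(reconcile(rs))      # one create (web-3) restores the count
print(reconcile(rs))      # converged: no actions
```

Running the same function repeatedly is safe because it acts only on the difference between desired and observed state, which is why controllers can simply loop forever.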

Q3. What are the core components running on a worker node?

kubelet: the agent on every node. It watches the API server for Pods scheduled to its node and ensures the described containers are running and healthy. It reports node and Pod status back to the control plane.

kube-proxy: maintains network rules on each node that allow Pods to communicate with each other and with Services. Implements the Service abstraction at the network level (via iptables or IPVS).

Container runtime: the software that actually runs containers, such as containerd or CRI-O, plugged in through the Container Runtime Interface (CRI). Kubernetes' built-in Docker Engine adapter (dockershim) was deprecated in 1.20 and removed in 1.24; images built with Docker still run fine on CRI runtimes.

Q4. What is a Pod and why doesn't Kubernetes manage containers directly?

A Pod is the smallest deployable unit in Kubernetes. It wraps one or more containers that share the same network namespace (same IP, same port space) and can share storage volumes. Containers in a Pod communicate via localhost.

Kubernetes does not manage individual containers directly because real applications often need tightly coupled helper processes: a logging sidecar, a proxy, or an init step. The Pod abstraction lets you group these together as one schedulable, network-addressable unit.

yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar
spec:
  containers:
  - name: app
    image: myapp:1.4
    ports:
    - containerPort: 8080
  - name: log-shipper
    image: fluent/fluent-bit:2.1

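Pods are ephemeral. If a Pod is deleted, evicted, or lost along with its node, Kubernetes does not resurrect that exact Pod; its controller creates a new one with a new name and a new IP. (A crashed container, by contrast, is restarted inside the same Pod.) This is why you never rely on a Pod's IP directly and instead use a Service.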

Q5. What is the difference between a ReplicaSet and a Deployment?

A ReplicaSet ensures a specified number of identical Pod replicas are running at all times. If a Pod dies, the ReplicaSet creates a replacement. A ReplicaSet has no built-in support for rolling updates.

A Deployment is a higher-level object that manages ReplicaSets. It adds rolling updates, rollbacks, and revision history on top of what a ReplicaSet provides. When you update a Deployment's Pod template, it creates a new ReplicaSet and gradually shifts Pods from the old ReplicaSet to the new one.

bash
kubectl get replicasets
# A Deployment named "web" has ReplicaSets like web-7d9f8c6b5
# (the hash suffix changes with each new Pod template revision)

In practice, for stateless applications you create Deployments rather than ReplicaSets or bare Pods. A Deployment creates a ReplicaSet, which creates Pods. If you manually create a standalone Pod and it is deleted or its node fails, it is gone permanently. If a Deployment-managed Pod dies, the ReplicaSet immediately creates a new one.

Q6. What is a Namespace and when do you use one?

A Namespace is a way to divide cluster resources between multiple users, teams, or environments within a single cluster. It provides a scope for names, meaning two resources with the same name can coexist in different namespaces.

bash
kubectl create namespace staging
kubectl create namespace production
kubectl get pods -n staging
kubectl config set-context --current --namespace=staging

Common reasons to use a Namespace:

  • Separate environments (dev, staging, production) within one cluster.
  • Multi-tenant clusters where different teams should not see or affect each other's resources.
  • Applying ResourceQuotas to limit how much CPU/memory a team can consume.
  • Combining with RBAC to restrict which users can access which namespace.

Some resources are not namespaced: Nodes, PersistentVolumes, and ClusterRoles exist at the cluster level, not within a namespace.

Q7. What are Labels and Selectors and how do they work together?

Labels are key-value pairs attached to Kubernetes objects (Pods, Services, Deployments) for identification and organization. They carry no inherent meaning to Kubernetes; you define what they mean.

yaml
metadata:
  labels:
    app: payment-service
    environment: production
    tier: backend

Selectors query objects by their labels. This is how a Deployment knows which Pods belong to it, and how a Service knows which Pods to route traffic to.

yaml
# Deployment's selector must match the Pod template's labels
spec:
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service   # must match the selector above

bash
# Query by label directly
kubectl get pods -l app=payment-service
kubectl get pods -l 'environment in (staging,production)'

If a Deployment's selector and the Pod template's labels do not match, Kubernetes rejects the configuration. This selector mechanism is how higher-level controllers find the Pods they manage.
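The matching rule behind both matchLabels and kubectl's -l flag can be sketched as a small function. This is a simplified model, assuming selectors arrive already parsed into the matchLabels/matchExpressions structure with the four set-based operators (In, NotIn, Exists, DoesNotExist); the function name is hypothetical and string parsing of selectors is not covered.

```python
from typing import Optional


def selector_matches(
    labels: dict,
    match_labels: Optional[dict] = None,
    match_expressions: Optional[list] = None,
) -> bool:
    """Return True if `labels` satisfy every matchLabels entry and every matchExpressions requirement."""
    # matchLabels: each key must be present with exactly this value.
    for key, value in (match_labels or {}).items():
        if labels.get(key) != value:
            return False
    # matchExpressions: set-based requirements, all ANDed together.
    for expr in match_expressions or []:
        key, op = expr["key"], expr["operator"]
        values = set(expr.get("values", []))
        if op == "In":
            ok = key in labels and labels[key] in values
        elif op == "NotIn":
            ok = key not in labels or labels[key] not in values  # a missing key also satisfies NotIn
        elif op == "Exists":
            ok = key in labels
        elif op == "DoesNotExist":
            ok = key not in labels
        else:
            raise ValueError(f"unknown operator: {op}")
        if not ok:
            return False
    return True


pod_labels = {"app": "payment-service", "environment": "production", "tier": "backend"}
print(selector_matches(pod_labels, {"app": "payment-service"}))  # True
print(selector_matches(pod_labels, match_expressions=[
    {"key": "environment", "operator": "In", "values": ["staging", "production"]},
]))  # True: same idea as -l 'environment in (staging,production)'
```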

Q8. What is the difference between a ConfigMap and a Secret?

Both store configuration data that is decoupled from your container image, letting you change configuration without rebuilding images. The difference is intent and handling.

ConfigMap: stores non-sensitive configuration such as feature flags, log levels, URLs, and non-secret settings. Stored as plain text in etcd.

bash
kubectl create configmap app-config --from-literal=LOG_LEVEL=info

Secret: stores sensitive data such as passwords, API keys, and TLS certificates. Values are base64-encoded (not encrypted by default, just encoded). For real protection you must enable encryption at rest for etcd.

bash
kubectl create secret generic db-credentials \
  --from-literal=username=admin \
  --from-literal=password='S3cur3P@ss'

Both can be consumed as environment variables or mounted as files:

yaml
env:
- name: LOG_LEVEL
  valueFrom:
    configMapKeyRef:
      name: app-config
      key: LOG_LEVEL
- name: DB_PASSWORD
  valueFrom:
    secretKeyRef:
      name: db-credentials
      key: password

⚠ Warning

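Never put real secrets in a ConfigMap, and don't treat a Secret as protected just because its values look scrambled. Base64 encoding is not encryption; anyone with read access to the Secret object can trivially decode it, so restrict access with RBAC and enable encryption at rest.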
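The round trip takes one standard-library call in each direction, which is the whole point of the warning. This sketch assumes the password from the db-credentials example above; the helper names are hypothetical, and the encode step only mirrors what ends up in a Secret's data field.

```python
import base64


def encode_secret_value(plaintext: str) -> str:
    """What ends up in a Secret's data field: base64 of the raw bytes, no key involved."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """What anyone with read access can do: reverse it with no key or password."""
    return base64.b64decode(encoded).decode("utf-8")


stored = encode_secret_value("S3cur3P@ss")
print(stored)                        # looks scrambled, but it is only an encoding
print(decode_secret_value(stored))   # S3cur3P@ss
```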

Q9. What is the difference between ClusterIP, NodePort, and LoadBalancer Services?

These are the three core Service types, each exposing Pods differently.

ClusterIP (default): exposes the Service on an internal cluster IP, only reachable from within the cluster. Used for internal service-to-service communication.

yaml
apiVersion: v1
kind: Service
metadata:
  name: backend-service
spec:
  type: ClusterIP
  selector:
    app: backend
  ports:
  - port: 80
    targetPort: 8080

NodePort: exposes the Service on a static port (30000-32767 by default) on every node's IP. Accessible externally via NodeIP:NodePort, but exposes node IPs directly, which is not ideal for production.

LoadBalancer: provisions an external load balancer through the cloud provider (AWS ELB, GCP Cloud Load Balancer, Azure Load Balancer). Each LoadBalancer Service gets its own external IP. Works well for a handful of services but becomes expensive at scale since every Service gets a separate cloud load balancer.

The common production pattern: use ClusterIP for internal services, and a single Ingress (backed by one LoadBalancer Service for the Ingress Controller itself) to route external traffic to many internal Services.

Q10. What is Ingress and how does it differ from a LoadBalancer Service?

An Ingress is a Kubernetes resource that manages external HTTP/HTTPS traffic and routes it to different Services based on rules you define (hostname, path). It requires an Ingress Controller (NGINX Ingress, Traefik, AWS Load Balancer Controller) running in the cluster to actually implement the rules.

yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
spec:
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /users
        pathType: Prefix
        backend:
          service:
            name: users-service
            port:
              number: 80
      - path: /orders
        pathType: Prefix
        backend:
          service:
            name: orders-service
            port:
              number: 80

The key difference: a LoadBalancer Service gives you one external IP per Service, each with its own cloud load balancer cost. An Ingress lets you route many different paths and hostnames through a single entry point and a single load balancer, dramatically reducing cost and complexity as the number of services grows.
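Roughly how a controller picks a backend for the Prefix rules above: the Ingress spec matches Prefix paths element by element on "/" boundaries, and the longest matching path wins. This sketch covers only pathType: Prefix for a single host; Exact paths, host matching, and controller-specific ImplementationSpecific behavior are left out, and the function names are hypothetical.

```python
from typing import Optional


def _elements(path: str) -> list:
    """Split a URL path into its '/'-separated elements: '/users/42/' -> ['users', '42']."""
    return [part for part in path.split("/") if part]


def prefix_matches(rule_path: str, request_path: str) -> bool:
    """Prefix match on whole path elements, so '/users' matches '/users/42' but not '/usersettings'."""
    rule = _elements(rule_path)
    return _elements(request_path)[: len(rule)] == rule


def route(prefix_rules: dict, request_path: str) -> Optional[str]:
    """Return the backend Service for the longest matching Prefix path, or None."""
    matching = [p for p in prefix_rules if prefix_matches(p, request_path)]
    if not matching:
        return None  # no rule matched: defaultBackend (if set) or a 404 from the controller
    longest = max(matching, key=lambda p: len(_elements(p)))
    return prefix_rules[longest]


rules = {"/users": "users-service", "/orders": "orders-service"}
print(route(rules, "/users/42"))      # users-service
print(route(rules, "/usersettings"))  # None
```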

Category 2: Workloads and Scaling (Q11-Q18)

Workload questions test whether you know which Kubernetes object to use for a given scenario and how scaling, updates, and health checks actually work in production.

Q11. What is the difference between a Deployment, a StatefulSet, and a DaemonSet?

Deployment: for stateless applications where every replica is interchangeable. Pods get random names and are not guaranteed any particular identity. Best for web servers, APIs, and anything where replicas are clones of each other.

StatefulSet: for stateful applications that need stable, unique network identities and stable storage. Pods get predictable ordinal names (postgres-0, postgres-1, postgres-2 for a StatefulSet named postgres) and by default are created and deleted in order. Each Pod gets its own PersistentVolumeClaim that survives Pod rescheduling. Best for databases, message queues, and anything where each replica has a distinct role or owns specific data.

DaemonSet: ensures a copy of a specific Pod runs on every node (or a selected subset). When a new node joins, the DaemonSet automatically schedules a Pod on it. Best for log collectors, monitoring agents, and CNI network plugins that need to run on every node.

yaml
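# StatefulSet snippet: stable names come from the ordinal index, stable storage from volumeClaimTemplates
# (selector and Pod template omitted for brevity; a real manifest requires both)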
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 3
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
# Creates postgres-0, postgres-1, postgres-2, each with its own PVC (data-postgres-0, data-postgres-1, data-postgres-2)
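The naming scheme is deterministic, which is what makes the identity stable. A small sketch, assuming the default starting ordinal of 0, Pods named <statefulset>-<ordinal>, and PVCs named <claim template>-<pod name>; the helper is hypothetical and only computes names, it does not talk to a cluster.

```python
def statefulset_identities(sts_name: str, replicas: int, claim_templates: list) -> dict:
    """Map each StatefulSet Pod name to the PVC names created from volumeClaimTemplates."""
    identities = {}
    for ordinal in range(replicas):             # ordinals start at 0 by default
        pod = f"{sts_name}-{ordinal}"           # stable name: <statefulset>-<ordinal>
        identities[pod] = [f"{claim}-{pod}" for claim in claim_templates]  # <template>-<pod>
    return identities


for pod, pvcs in statefulset_identities("postgres", 3, ["data"]).items():
    print(pod, pvcs)
# postgres-0 ['data-postgres-0']
# postgres-1 ['data-postgres-1']
# postgres-2 ['data-postgres-2']
```

Because the names never change, a rescheduled postgres-1 reattaches to data-postgres-1 instead of starting on a fresh, empty disk.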

Q12. How do rolling updates and rollbacks work in a Deployment?

A rolling update gradually replaces old Pods with new ones, maintaining availability throughout. You control the pace with maxSurge and maxUnavailable.

yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # can create 1 extra Pod above desired count during update
      maxUnavailable: 0  # never go below desired count (zero downtime)

bash
# Trigger a rolling update by changing the image
kubectl set image deployment/web web=myapp:2.0

# Watch the rollout
kubectl rollout status deployment/web

# Check rollout history
kubectl rollout history deployment/web

# Rollback to the previous revision if something is wrong
kubectl rollout undo deployment/web

# Rollback to a specific revision
kubectl rollout undo deployment/web --to-revision=3

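Kubernetes keeps old ReplicaSets around, scaled to zero (the last 10 by default, controlled by revisionHistoryLimit), so rollbacks are fast: undo re-applies the old Pod template, which scales the old ReplicaSet back up and the broken one down using the same rolling strategy.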
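maxSurge and maxUnavailable are easiest to reason about once resolved to Pod counts. A sketch, assuming the documented rounding rules (a percentage maxSurge rounds up, a percentage maxUnavailable rounds down, both default to 25%) and, like the Deployment controller, bumping maxUnavailable to 1 if both round to zero so the rollout can progress. Function names are hypothetical.

```python
from typing import Union


def resolve(value: Union[int, str], replicas: int, round_up: bool) -> int:
    """Turn an absolute number or a percentage string like '25%' into a Pod count."""
    if isinstance(value, int):
        return value
    scaled = replicas * int(value.rstrip("%"))
    # Integer ceil/floor of scaled / 100, avoiding float rounding surprises.
    return -(-scaled // 100) if round_up else scaled // 100


def rollout_bounds(replicas: int, max_surge: Union[int, str] = "25%",
                   max_unavailable: Union[int, str] = "25%") -> tuple:
    """Return (max total Pods, min available Pods) allowed during a rolling update."""
    surge = resolve(max_surge, replicas, round_up=True)               # maxSurge rounds up
    unavailable = resolve(max_unavailable, replicas, round_up=False)  # maxUnavailable rounds down
    if surge == 0 and unavailable == 0:
        unavailable = 1  # both rounded to zero: allow one so the rollout can make progress
    return replicas + surge, replicas - unavailable


print(rollout_bounds(10))        # (13, 8): 25% of 10 is 2.5 -> surge 3, unavailable 2
print(rollout_bounds(3, 1, 0))   # (4, 3): the zero-downtime settings shown above
```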

Q13. What is the difference between Horizontal Pod Autoscaler and Vertical Pod Autoscaler?

Horizontal Pod Autoscaler (HPA): adjusts the number of Pod replicas based on observed metrics: CPU and memory from the resource metrics API (usually served by metrics-server), or custom and external metrics from an adapter that implements those APIs. Scales out (more Pods) under load, scales in when load drops.

yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
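The HPA's core calculation for a target like the one above is a ratio. This sketch follows the formula in the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), with the default 10% tolerance and the min/max clamp. It deliberately ignores the scale-down stabilization window, missing metrics, and not-yet-ready Pods, all of which the real controller handles; the function name is hypothetical.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_utilization: float,
                         target_utilization: float, min_replicas: int,
                         max_replicas: int, tolerance: float = 0.1) -> int:
    """Core HPA calculation: scale replicas by the ratio of observed to target metric."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas                   # close enough to target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))  # clamp to minReplicas..maxReplicas


# Using the web-hpa settings above: target 70% CPU, 2..10 replicas.
print(hpa_desired_replicas(4, 140, 70, 2, 10))  # 8: double the load, double the Pods
print(hpa_desired_replicas(4, 75, 70, 2, 10))   # 4: within the 10% tolerance
print(hpa_desired_replicas(8, 140, 70, 2, 10))  # 10: capped at maxReplicas
```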

Vertical Pod Autoscaler (VPA): a separately installed add-on that adjusts the CPU and memory requests (and, proportionally, the limits) of Pods rather than the replica count. Useful when you're not sure what resource values to set and want it to recommend or automatically apply them based on observed usage.

The two solve different scaling problems: HPA handles "I need more instances," VPA handles "each instance needs more or less resources than I gave it." They can be combined, but avoid letting both act on the same CPU or memory metric for the same workload: VPA applies new values by evicting and recreating Pods, and changing requests shifts the utilization percentage HPA scales on, so the two can end up fighting each other.

Q14. What are resource requests and limits, and what is a QoS class?

requests: the amount of CPU/memory Kubernetes guarantees the container. The scheduler uses this to decide which node has room for the Pod.

limits: the maximum amount the container is allowed to use. Exceeding the memory limit triggers an OOM kill. Exceeding the CPU limit causes throttling (not termination).

yaml
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"

Based on how requests and limits are set, every Pod gets a Quality of Service (QoS) class:

Guaranteed: requests equal limits for both CPU and memory, on every container in the Pod. Highest priority, least likely to be evicted under node pressure.

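Burstable: the Pod doesn't meet the Guaranteed criteria, but at least one container has a CPU or memory request or limit (for example, requests lower than limits, or only some resources set). Medium priority.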

BestEffort: no requests or limits set at all. Lowest priority, first to be evicted when a node runs low on resources.

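QoS class matters during node memory pressure. As a rule of thumb, BestEffort Pods are evicted first, then Burstable Pods using more than they requested, and Guaranteed Pods last. (Strictly, the kubelet ranks Pods by whether usage exceeds requests, then by Pod priority, then by usage relative to requests, which in practice tracks QoS closely.)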
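The classification rules are mechanical enough to express in code. A sketch, assuming resources are plain dictionaries and equal quantities are written the same way (it compares strings, so "500m" and "0.5" would wrongly look different). It applies the documented default that an unset request takes the value of its limit, but ignores init containers, which the real classification also considers. The function name is hypothetical.

```python
def qos_class(containers: list) -> str:
    """Classify a Pod as Guaranteed, Burstable, or BestEffort from its containers' resources.

    Each container is a dict like {"requests": {"cpu": "250m"}, "limits": {"cpu": "500m"}}.
    Quantities are compared as strings, so write equal values the same way.
    """
    any_resources_set = False
    guaranteed = True
    for container in containers:
        limits = container.get("limits", {})
        # A request left unset defaults to the limit, as Kubernetes does at admission.
        requests = {**limits, **container.get("requests", {})}
        if any(resource in requests for resource in ("cpu", "memory")):
            any_resources_set = True
        for resource in ("cpu", "memory"):
            if resource not in limits or requests.get(resource) != limits[resource]:
                guaranteed = False
    if not any_resources_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


example = {"requests": {"memory": "256Mi", "cpu": "250m"},
           "limits": {"memory": "512Mi", "cpu": "500m"}}
print(qos_class([example]))                                         # Burstable
print(qos_class([{"limits": {"memory": "512Mi", "cpu": "500m"}}]))  # Guaranteed
print(qos_class([{}]))                                              # BestEffort
```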

Q15. What is the difference between a Job and a CronJob?

A Job runs a Pod (or several) to completion for a one-off task, then stops. Unlike a Deployment, it does not keep the Pod running indefinitely.

yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: data-migration
spec:
  template:
    spec:
      containers:
      - name: migrate
        image: myapp:migration
        command: ["python", "migrate.py"]
      restartPolicy: OnFailure
  backoffLimit: 3   # retry up to 3 times on failure

A CronJob creates Jobs on a recurring schedule, using standard cron syntax.

yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report
spec:
  schedule: "0 2 * * *"   # every day at 2 AM, in the controller's time zone unless spec.timeZone is set
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: report
            image: myapp:reports
          restartPolicy: OnFailure

Use Jobs for database migrations, batch processing, and one-time data backfills. Use CronJobs for nightly reports, periodic cleanup tasks, and scheduled backups.

Q16. What is an init container and why would you use one?

An init container runs to completion before the main application containers in a Pod start. Init containers run one at a time, in order, and each must succeed before the next one (or the main containers) begins. If an init container fails, the kubelet restarts it until it succeeds, unless the Pod's restartPolicy is Never, in which case the whole Pod is marked as failed.

yaml
spec:
  initContainers:
  - name: wait-for-db
    image: busybox
    command: ['sh', '-c', 'until nc -z postgres-service 5432; do sleep 2; done']
  - name: run-migrations
    image: myapp:migrate
    command: ['python', 'migrate.py']
  containers:
  - name: app
    image: myapp:latest

Common uses for init containers:

  • Waiting for a dependency (database, another service) to become available before starting the main app.
  • Running database migrations before the application starts.
  • Cloning a git repository or fetching configuration before the app boots.
  • Setting up file permissions or directory structure on a shared volume.

Init containers keep this setup logic separate from the application image and guarantee strict ordering, which a single container with a startup script cannot guarantee as cleanly.
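The ordering and retry rules can be modeled as a loop. A toy sketch, assuming each init container's successive exit codes are scripted in advance; it shows strict sequencing, retries under the default restartPolicy, and immediate failure under Never, but not the kubelet's exponential back-off between retries. Function and container names are hypothetical.

```python
def start_pod(init_containers: list, restart_policy: str = "Always") -> list:
    """Simulate Pod startup: init containers run one at a time, each until it succeeds.

    init_containers is a list of (name, exit_codes), where exit_codes are the
    results of successive attempts (hypothetical, scripted in advance).
    """
    events = []
    for name, exit_codes in init_containers:
        for code in exit_codes:
            events.append(f"{name} exited {code}")
            if code == 0:
                break                          # success: move on to the next init container
            if restart_policy == "Never":
                events.append("Pod Failed")    # no retries: the whole Pod fails
                return events
        else:
            # Every scripted attempt failed; the kubelet would keep retrying with back-off.
            events.append("Pod stuck in Init:CrashLoopBackOff")
            return events
    events.append("app containers started")    # only after every init container succeeded
    return events


# run-migrations fails twice, is retried, then succeeds; only then does the app start.
print(start_pod([("wait-for-db", [0]), ("run-migrations", [1, 1, 0])]))
```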

Q17. What is the difference between liveness, readiness, and startup probes?

All three are health checks the kubelet runs against a container, but they answer different questions and trigger different actions.

Liveness probe: "Is this container still alive and functioning?" If it fails, Kubernetes kills and restarts the container. Use for detecting deadlocks or unrecoverable states.

Readiness probe: "Is this container ready to receive traffic?" If it fails, the Pod is removed from Service endpoints (no traffic routed to it) but is NOT restarted. Use for temporary unavailability, like a Pod still loading a large cache on startup.

Startup probe: "Has this container finished its slow startup process?" Liveness and readiness probes are disabled until the startup probe succeeds. Use for applications with a long, variable startup time, to avoid the liveness probe killing the container before it has even finished booting.

yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5

startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30
  periodSeconds: 10   # allows up to 300 seconds for slow-starting apps

ℹ Info

A very common interview follow-up: "what happens if your liveness probe is too aggressive?" The container gets killed and restarted repeatedly even though the application is actually fine, just slow, often producing a CrashLoopBackOff that has nothing to do with an actual bug.

Q18. What is a Pod Disruption Budget and when do you need one?

A Pod Disruption Budget (PDB) limits how many Pods of a replicated application can be down simultaneously during voluntary disruptions: node drains for maintenance, cluster autoscaler scale-downs, or manual evictions.

yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2          # at least 2 Pods must stay available
  selector:
    matchLabels:
      app: web

PDBs only protect against voluntary disruptions (things Kubernetes initiates deliberately). They cannot protect against involuntary disruptions like a node crashing unexpectedly or a hardware failure.

Without a PDB, a cluster admin draining a node for maintenance could accidentally take down every replica of a service at once if they all happen to be scheduled there. With a PDB, kubectl drain evicts Pods through the Eviction API, which refuses any eviction that would drop the application below minAvailable; the drain keeps retrying and proceeds only as replacement Pods become Ready elsewhere.
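A toy model of that drain behavior, assuming an integer minAvailable, replicas that are all healthy before the drain, and a controller that replaces each evicted Pod with a Ready one on another node. It also shows a classic misconfiguration: minAvailable equal to the replica count leaves zero allowed disruptions, so the drain can never finish. The function names are hypothetical; in a real cluster the refusal shows up as the Eviction API returning HTTP 429.

```python
def disruptions_allowed(healthy_pods: int, min_available: int) -> int:
    """How many more Pods the PDB lets voluntary evictions remove right now."""
    return max(0, healthy_pods - min_available)


def drain(pods_on_node: list, replicas: int, min_available: int) -> list:
    """Toy node drain: evict one Pod at a time, waiting whenever the PDB refuses.

    Assumes all replicas start healthy and that the owning controller replaces
    each evicted Pod with a Ready one on another node.
    """
    healthy = replicas
    log = []
    for pod in pods_on_node:
        if disruptions_allowed(healthy, min_available) == 0:
            if healthy >= replicas:
                log.append("blocked")  # nothing is missing, so no replacement will ever free a slot
                return log
            log.append("wait")         # eviction refused; retry after a replacement is Ready
            healthy += 1
        log.append(f"evict {pod}")
        healthy -= 1
    return log


print(drain(["web-0", "web-1", "web-2"], replicas=3, min_available=2))
# ['evict web-0', 'wait', 'evict web-1', 'wait', 'evict web-2']
print(drain(["web-0", "web-1", "web-2"], replicas=3, min_available=3))
# ['blocked']
```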

Category 3: Networking (Q19-Q24)

Networking questions separate candidates who have run real clusters from those who have only read documentation. Expect questions about DNS, CNI plugins, and why Services sometimes fail to route traffic.

Q19. How does Pod-to-Pod networking work in Kubernetes?

Kubernetes requires a flat networking model: every Pod gets its own unique IP address, and every Pod can communicate with every other Pod across the entire cluster without NAT, regardless of which node they're on.

Kubernetes itself does not implement this networking. It defines the requirement and delegates implementation to a CNI (Container Network Interface) plugin: Calico, Cilium, Flannel, or the cloud provider's native CNI (AWS VPC CNI, Azure CNI).

The CNI plugin is responsible for assigning IP addresses to Pods, setting up routes between nodes so Pods on different nodes can reach each other, and enforcing NetworkPolicies if the plugin supports it (not all CNIs do; Calico and Cilium do, basic Flannel does not).

Without a CNI installed, Pods on different nodes have no route to each other. A fresh kubeadm cluster will show nodes stuck in NotReady until a CNI is applied.

Q20. How does Service discovery and DNS work in Kubernetes?

Every Service gets a stable DNS name automatically, resolved through CoreDNS (the default cluster DNS addon). The format is:

text
<service-name>.<namespace>.svc.cluster.local

Within the same namespace, you can just use the Service name:

javascript
// From a Pod in the same namespace as 'payment-service'
// (top-level await assumes an ES module; otherwise wrap this in an async function)
const sameNamespace = await fetch('http://payment-service/charge');

// From a different namespace, use the fully qualified name
const crossNamespace = await fetch('http://payment-service.billing.svc.cluster.local/charge');

This is the mechanism that solves the "Pods are ephemeral with changing IPs" problem. Application code never hardcodes a Pod IP; it calls a Service name, CoreDNS resolves that name to the Service's ClusterIP, and kube-proxy's rules forward each connection to one of the Pods currently ready behind that Service.
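The short name works because of the DNS search list Kubernetes writes into each Pod's /etc/resolv.conf under the default ClusterFirst DNS policy. A sketch of the lookup order, assuming the default cluster domain cluster.local, the default ndots:5 option, no extra search domains inherited from the node, and standard resolver behavior (names with fewer dots than ndots go through the search list first). The function is hypothetical and only generates candidate names; it does not query DNS.

```python
def dns_candidates(name: str, pod_namespace: str,
                   cluster_domain: str = "cluster.local", ndots: int = 5) -> list:
    """Names a Pod's resolver tries, in order, for a lookup of `name`."""
    if name.endswith("."):
        return [name]  # trailing dot: already absolute, search list skipped
    search = [f"{pod_namespace}.svc.{cluster_domain}",
              f"svc.{cluster_domain}",
              cluster_domain]
    expanded = [f"{name}.{domain}" for domain in search]
    # Fewer dots than ndots: try the search list first, the name as-is last.
    if name.count(".") < ndots:
        return expanded + [name]
    return [name] + expanded


print(dns_candidates("payment-service", "billing")[0])
# payment-service.billing.svc.cluster.local
print(dns_candidates("payment-service.billing", "shop")[:2])
# the first candidate fails (wrong namespace); the second is the real Service name
```

This also explains why even a fully written name without a trailing dot triggers a few failed lookups first: it has only four dots, fewer than ndots.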

Q21. What is a headless Service and when do you use one?

A normal ClusterIP Service load-balances requests across Pods behind a single virtual IP. A headless Service (set clusterIP: None) skips the virtual IP entirely. DNS queries against a headless Service return the individual Pod IPs directly.

yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  clusterIP: None    # headless
  selector:
    app: postgres
  ports:
  - port: 5432

bash
# A headless Service name resolves to one A record per ready Pod (example IPs):
nslookup postgres.default.svc.cluster.local
#   Address: 10.244.1.5
#   Address: 10.244.2.7
# With a StatefulSet, each Pod is also addressable by its own name:
#   postgres-0.postgres.default.svc.cluster.local -> 10.244.1.5

Use headless Services with StatefulSets when clients need to address a specific replica directly (connect to the primary database instance, not just any instance) rather than being load-balanced to a random one. This pairs with the StatefulSet concept covered in Q11, and is especially relevant when running databases like MongoDB or Redis in Kubernetes.

Q22. What is a NetworkPolicy and what does it control?

A NetworkPolicy controls which Pods are allowed to communicate with which other Pods, and on which ports. By default, Kubernetes allows all Pods to talk to all other Pods unrestricted. NetworkPolicies let you implement a default-deny, allow-specific-traffic model.

yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080

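This policy says: only Pods labeled app=frontend in the same namespace may open connections to Pods labeled app=backend, and only on TCP port 8080. All other inbound traffic to the backend Pods is denied. Outbound traffic is unaffected, because only the Ingress policy type is listed.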
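How policies combine can be sketched as a small evaluator. It is a simplified model, assuming same-namespace podSelector peers written as flat matchLabels, TCP only, and numeric ports; namespaceSelector, ipBlock, named ports, and egress are not modeled, and the function names are hypothetical. It does capture the two rules that matter most: a Pod is isolated only once some Ingress policy selects it, and after that any rule of any selecting policy can allow the traffic.

```python
def labels_match(labels: dict, match_labels: dict) -> bool:
    """matchLabels semantics: every key/value must be present. An empty selector matches all."""
    return all(labels.get(k) == v for k, v in match_labels.items())


def ingress_allowed(policies: list, dst_labels: dict, src_labels: dict, port: int) -> bool:
    """Would a connection from a Pod with src_labels reach dst_labels on port (same namespace)?"""
    selecting = [p for p in policies
                 if "Ingress" in p["policyTypes"] and labels_match(dst_labels, p["podSelector"])]
    if not selecting:
        return True  # no Ingress policy selects the Pod: it is not isolated, allow everything
    for policy in selecting:
        for rule in policy.get("ingress", []):
            peers, ports = rule.get("from", []), rule.get("ports", [])
            peer_ok = not peers or any(labels_match(src_labels, peer["podSelector"]) for peer in peers)
            port_ok = not ports or any(p["port"] == port for p in ports)
            if peer_ok and port_ok:
                return True  # rules are additive: one matching rule is enough
    return False


# The allow-frontend-to-backend policy above, as a dict (matchLabels flattened).
policy = {
    "podSelector": {"app": "backend"},
    "policyTypes": ["Ingress"],
    "ingress": [{"from": [{"podSelector": {"app": "frontend"}}],
                 "ports": [{"protocol": "TCP", "port": 8080}]}],
}
print(ingress_allowed([policy], {"app": "backend"}, {"app": "frontend"}, 8080))  # True
print(ingress_allowed([policy], {"app": "backend"}, {"app": "frontend"}, 5432))  # False
```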

⚠ Warning

NetworkPolicies only take effect if your CNI plugin supports them. Calico and Cilium do. Plain Flannel does not enforce NetworkPolicies at all, so applying one with Flannel installed silently does nothing.

Q23. What is an Ingress Controller and why is it required separately from Ingress resources?

An Ingress resource is just a set of routing rules, a declaration of intent. It does nothing on its own. An Ingress Controller is the actual software that watches Ingress resources and implements the routing: it's typically a reverse proxy (NGINX, Traefik, HAProxy) running as a Deployment in your cluster, exposed via a LoadBalancer Service.

bash
# Without an Ingress Controller installed, Ingress resources do nothing
kubectl get ingress
# Will show your rules but ADDRESS column stays empty

Different controllers support different annotation-based features (rate limiting, custom headers, canary deployments) beyond the base Ingress spec. Choosing an Ingress Controller is an architectural decision, not just a default that works the same everywhere.

Q24. What is the role of CNI plugins, and name a few popular ones.

CNI (Container Network Interface) is the standard interface Kubernetes uses to delegate Pod networking setup to a plugin. Without a CNI plugin installed, Pods cannot get assigned IP addresses or communicate across nodes.

Calico: widely used, supports NetworkPolicy enforcement, BGP-based routing, good performance at scale. Common choice for production clusters needing network policy enforcement.

Cilium: eBPF-based, increasingly popular for its performance and observability features. Supports advanced NetworkPolicy, including L7-aware policies (for example, allow only specific HTTP methods/paths).

Flannel: simple, easy to set up, but does not support NetworkPolicy enforcement. Good for simple clusters where network segmentation isn't a requirement.

AWS VPC CNI / Azure CNI: cloud-native plugins that assign Pods real VPC IPs, integrating tightly with the cloud provider's networking and security groups.

Category 4: Storage and Configuration (Q25-Q28)

Storage questions test whether you understand how Kubernetes decouples storage provisioning from consumption and how configuration data flows into containers.

Q25. What is the difference between a PersistentVolume and a PersistentVolumeClaim?

A PersistentVolume (PV) is a piece of storage in the cluster, provisioned either manually by an admin or dynamically via a StorageClass. It exists independently of any Pod's lifecycle.

A PersistentVolumeClaim (PVC) is a request for storage by a user/Pod. It specifies size and access mode, and Kubernetes binds it to a matching PV.

yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: gp3

yaml
# Pod references the PVC, not the PV directly
spec:
  containers:
  - name: app
    image: myapp:1.0
    volumeMounts:
    - name: data
      mountPath: /var/lib/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-pvc

The PV/PVC split decouples storage provisioning from storage consumption. Developers write PVCs without needing to know the underlying storage infrastructure. Admins (or dynamic provisioners) handle PVs.

Q26. What is a StorageClass and what does dynamic provisioning mean?

A StorageClass defines a "class" of storage with a specific provisioner and parameters (disk type, IOPS, replication). When a PVC requests a StorageClass, Kubernetes automatically provisions a matching PV on demand, rather than requiring an admin to pre-create volumes manually.

yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

volumeBindingMode: WaitForFirstConsumer delays binding and provisioning until a Pod that uses the PVC is scheduled, so the volume is created in the availability zone of the node the scheduler picks, rather than in a zone where the Pod may not be able to run. Without dynamic provisioning, an admin would need to manually provision a cloud disk for every PVC, which does not scale.

Q27. How are ConfigMaps and Secrets mounted into a Pod as files versus environment variables?

Both approaches are common, and the choice determines how configuration changes propagate.

As environment variables: simple, but changes to the ConfigMap/Secret do NOT automatically update already-running Pods. You need to restart the Pod to pick up new values.

yaml
envFrom:
- configMapRef:
    name: app-config

As a mounted volume: the kubelet updates the mounted files automatically when the underlying ConfigMap/Secret changes, without restarting the Pod. The update is not instant: the delay depends on the kubelet sync period and cache settings and is often around a minute. Files mounted with subPath never receive these updates. The application needs to watch the file for changes itself if it wants to reload configuration live.

yaml
volumeMounts:
- name: config-volume
  mountPath: /etc/config
volumes:
- name: config-volume
  configMap:
    name: app-config

Use environment variables for simple, rarely-changing settings. Use volume mounts when you want configuration to update without a full Pod restart, or when mounting TLS certificates that rotate periodically.
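When configuration is mounted as files, the application (or a small wrapper around it) has to notice the change itself. Below is a minimal polling sketch in bash; the mount path and the reload action are hypothetical placeholders. It compares a content checksum rather than relying on file-change events, because the kubelet applies ConfigMap volume updates by atomically swapping a `..data` symlink instead of editing the file in place.

```bash
#!/usr/bin/env bash
# Sketch: detect changes to a mounted ConfigMap file by polling its checksum.
# The path and the "reload" action are hypothetical placeholders.

# Print a fingerprint (CRC and byte count) of the file's current contents.
config_fingerprint() {
  cksum < "$1"
}

# Compare the file against the last fingerprint we saw.
# Logs a reload message on stderr when it changed; always prints the
# current fingerprint on stdout so the caller can store it.
reload_if_changed() {
  local file=$1 last=$2 current
  current=$(config_fingerprint "$file")
  if [ "$current" != "$last" ]; then
    echo "config changed, reloading $file" >&2
    # App-specific reload would go here, e.g. signalling the main process (hypothetical).
  fi
  echo "$current"
}

# Example polling loop (not run here):
#   fp=$(config_fingerprint /etc/config/app.properties)
#   while sleep 10; do
#     fp=$(reload_if_changed /etc/config/app.properties "$fp")
#   done
```

Polling content is deliberately simple: it keeps working across the symlink swap, at the cost of up to one polling interval of extra delay.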

Q28. How do you inject a database password from a Secret without exposing it in plain text?

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  template:
    spec:
      containers:
      - name: api
        image: myapp:latest
        env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: password

The value is injected at runtime by the kubelet, never baked into the image or visible in the Deployment YAML itself (only the Secret name and key are referenced). To inspect it for debugging:

bash
# View the Secret's keys (without decoding values)
kubectl describe secret db-credentials

# Decode a specific value (requires appropriate RBAC permission)
kubectl get secret db-credentials -o jsonpath='{.data.password}' | base64 -d

For stronger security than the Kubernetes-native Secret object, many production setups integrate an external secrets manager (AWS Secrets Manager, HashiCorp Vault) using a tool like External Secrets Operator, which syncs secrets from the external store into native Kubernetes Secrets automatically and supports rotation.
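On the consuming side, it helps if the container fails fast with a clear message when an expected variable is empty, so that `kubectl logs --previous` points straight at the cause. Below is a minimal entrypoint guard in bash; the `require_env` helper and the app path are hypothetical, and the helper deliberately never prints the value itself. (If the referenced Secret or key does not exist at all, the container does not start and the Pod reports CreateContainerConfigError instead.)

```bash
#!/usr/bin/env bash
# Sketch of an entrypoint guard: verify required variables are non-empty
# before starting the app. Never echo the values themselves.

require_env() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      echo "fatal: required environment variable $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Typical use at the top of an entrypoint script (hypothetical app binary):
#   require_env DB_PASSWORD || exit 1
#   exec /app/server
```

Using `exec` for the final command replaces the shell with the application, so the application itself receives SIGTERM when the Pod is deleted.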

Category 5: Troubleshooting and Scenarios (Q29-Q36)

Troubleshooting questions are where senior candidates separate themselves. These scenario-based questions test whether you can systematically diagnose real production issues, not just recite definitions.

Q29. A Pod is stuck in CrashLoopBackOff. Walk through how you'd debug it.

CrashLoopBackOff is a status, not a root cause. It means the container starts, exits (or crashes), the kubelet restarts it, it exits again, and the kubelet is now waiting out an exponentially increasing delay (10s, 20s, 40s, and so on, capped at 300s) before the next restart.

bash
# Step 1: check status and restart count
kubectl get pods
# High RESTARTS count confirms the loop

# Step 2: describe the pod (check State, Last State, and Events)
kubectl describe pod <pod-name>
# Look for: Last State: Terminated, Reason, Exit Code

# Step 3: check current logs (often empty if the container restarted recently)
kubectl logs <pod-name>

# Step 4: check logs from the PREVIOUS crashed instance (this is the key step)
kubectl logs <pod-name> --previous

# Step 5: if the pod is multi-container, specify which one
kubectl logs <pod-name> -c <container-name> --previous

💡 The --previous flag

The --previous flag is what most candidates forget. Without it, you're often looking at empty logs from a container that just started a few seconds ago, not the crash that actually happened.

Common root causes include missing or malformed configuration (an environment variable the application expects, or a wrong value from a ConfigMap or Secret), a database or other dependency that isn't reachable yet, an application bug that throws an unhandled exception on startup, and a memory limit set too low (OOMKilled).
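To make the backoff concrete, here is a small bash sketch that prints the delay before each successive restart, assuming the default behavior described above (start at 10s, double each time, cap at 300s). It is a model for reasoning about timing, not kubelet code; the kubelet also resets the backoff once a container has run successfully for 10 minutes, which this sketch ignores.

```bash
#!/usr/bin/env bash
# Model of CrashLoopBackOff delays: 10s, doubling, capped at 300s.

# Print the delays (in seconds) before each of the first N restarts.
crashloop_delays() {
  local restarts=$1 delay=10 i out=""
  for (( i = 0; i < restarts; i++ )); do
    out+="${out:+ }$delay"
    # Double the delay for the next attempt, but never exceed 300 seconds.
    delay=$(( delay * 2 > 300 ? 300 : delay * 2 ))
  done
  echo "$out"
}

# Total time spent waiting across those restarts.
crashloop_total_wait() {
  local total=0 d
  for d in $(crashloop_delays "$1"); do
    total=$(( total + d ))
  done
  echo "$total"
}
```

For example, `crashloop_delays 7` prints `10 20 40 80 160 300 300` and `crashloop_total_wait 7` prints `910`: after roughly 15 minutes of crash-looping, the Pod is only retried every 5 minutes, which is why a fix can appear to "take a while" to be picked up.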

Q30. What do exit codes 0, 1, 137, and 143 mean in Kubernetes?

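Exit codes communicate why a container stopped. Exit code 0 means success, and codes from 1 to 125 are generally defined by the application. Codes 126 and 127 conventionally come from a shell that could not execute or could not find the command. Codes above 128 typically mean the process was terminated by a signal, following the convention 128 + signal number.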

| Exit Code | Meaning | Common Cause |
|---|---|---|
| 0 | Clean, successful exit | Intentional, no action needed |
| 1 | Application error | Unhandled exception, application-level failure |
| 127 | Command not found | Typo in Dockerfile CMD/ENTRYPOINT, missing binary |
| 137 | SIGKILL (128 + 9) | OOMKilled, or manual kill -9 |
| 139 | SIGSEGV (128 + 11) | Segmentation fault, memory access violation |
| 143 | SIGTERM (128 + 15) | Graceful shutdown request (normal during deploys) |

The 137 vs 143 distinction is a very common interview question. 143 means the process was terminated by SIGTERM before the grace period ran out (an application that traps SIGTERM and shuts down cleanly often exits with 0 instead). 137 means the process was killed with SIGKILL: either it did not exit after SIGTERM and was force-killed when the grace period expired, or the kernel's OOM killer terminated it for exceeding its memory limit.

bash
# Confirm OOMKilled specifically
kubectl describe pod <pod-name> | grep -A 5 "Last State"
# Last State: Terminated
#   Reason: OOMKilled
#   Exit Code: 137
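You can reproduce the 128 + N arithmetic in any bash shell without a cluster. This sketch kills a throwaway `sleep` with a given signal and reports the status `wait` returns; the `describe_exit_code` helper is a hypothetical decoder that applies the same conventions as the table above.

```bash
#!/usr/bin/env bash
# Demonstrates the "128 + signal number" convention for exit statuses.

# Start a throwaway child, send it the named signal, and print its exit status.
exit_code_for_signal() {
  local sig=$1 pid status=0
  sleep 30 &
  pid=$!
  kill -s "$sig" "$pid"
  wait "$pid" 2>/dev/null || status=$?
  echo "$status"
}

# Decode an exit status using the conventions from the table above.
describe_exit_code() {
  local code=$1
  if (( code == 0 )); then
    echo "0: success"
  elif (( code == 126 || code == 127 )); then
    echo "$code: command could not be executed or was not found"
  elif (( code > 128 )); then
    # kill -l N prints the signal name for signal number N (e.g. 9 -> KILL).
    echo "$code: killed by signal $(( code - 128 )) (SIG$(kill -l $(( code - 128 ))))"
  else
    echo "$code: application-defined error"
  fi
}
```

`exit_code_for_signal TERM` prints `143` and `exit_code_for_signal KILL` prints `137`; in a cluster you read the same numbers from `Last State` in `kubectl describe pod`.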

Q31. A Pod shows ImagePullBackOff. What are the possible causes and how do you fix each?

ImagePullBackOff means Kubernetes cannot pull the container image specified in the Pod spec, and is backing off between retries.

bash
kubectl describe pod <pod-name>
# Check the Events section for the specific pull error message

Wrong image name or tag: typo in the image reference, or the tag doesn't exist in the registry. Fix: verify the exact image name and tag.

Private registry without credentials: the cluster has no way to authenticate to a private registry. Fix: create an imagePullSecret and reference it in the Pod spec.

bash
kubectl create secret docker-registry regcred \
  --docker-server=<registry-url> \
  --docker-username=<username> \
  --docker-password=<password>

yaml
spec:
  imagePullSecrets:
  - name: regcred
  containers:
  - name: app
    image: myregistry.com/myapp:1.0

Network or DNS issue reaching the registry: the node cannot resolve or reach the registry endpoint. Fix: check the node's DNS resolution, egress firewall or security-group rules, proxy configuration, and the registry's status.

Rate limiting: public registries like Docker Hub rate-limit anonymous pulls. Fix: authenticate even for public images, or use a registry mirror.
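When a pull still fails after you create the secret, it helps to know what `kubectl create secret docker-registry` stores: a `.dockerconfigjson` key whose `auths` entry for the registry host contains an `auth` field equal to base64 of `username:password`. The bash sketch below builds that structure by hand for comparison; the helper names and sample credentials are hypothetical. To inspect a real secret, decode it with `kubectl get secret regcred -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d` and check that the registry key matches the host in your image reference.

```bash
#!/usr/bin/env bash
# Sketch: reproduce the .dockerconfigjson payload that an imagePullSecret carries.
# Credentials are placeholders; never hard-code real ones.

# The "auth" field is base64("username:password") with no trailing newline.
docker_auth_field() {
  printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}

# Build {"auths":{"<server>":{"username":...,"password":...,"auth":...}}}.
# Assumes inputs contain no double quotes or backslashes (no JSON escaping is done).
docker_config_json() {
  local server=$1 user=$2 pass=$3
  printf '{"auths":{"%s":{"username":"%s","password":"%s","auth":"%s"}}}\n' \
    "$server" "$user" "$pass" "$(docker_auth_field "$user" "$pass")"
}
```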

Q32. A Pod is stuck in Pending state. What would you check?

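Pending means the Pod has been accepted by the API server but its containers are not running yet. Most often it has not been scheduled to a node; it can also stay Pending after scheduling while images are pulled or volumes are attached.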

bash
kubectl describe pod <pod-name>
# The Events section almost always explains why

Insufficient cluster resources: no node has enough free CPU/memory to satisfy the Pod's resource requests.

bash
kubectl describe nodes | grep -A 5 "Allocated resources"

Fix: scale the cluster, free up resources, or lower the Pod's requests.

Unsatisfiable scheduling constraints: nodeSelector, affinity rules, or taints/tolerations that no available node matches.

bash
kubectl get nodes --show-labels   # check labels match nodeSelector
kubectl describe node <node> | grep Taints

PersistentVolumeClaim not bound: the Pod references a PVC that has no matching PV and dynamic provisioning failed or is misconfigured.

bash
kubectl get pvc
# Status should be "Bound", not "Pending"

No nodes available at all: cluster autoscaler is still spinning up capacity, or all nodes are NotReady.
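The scheduler's resource check is arithmetic on requests, not on live usage: a Pod fits only if the requests already allocated on a node plus the new Pod's request stay within the node's allocatable capacity. That is why a node can look idle in `kubectl top` while Pods still sit in Pending. Below is a simplified bash sketch for CPU, assuming quantities written as whole cores or millicores (`2`, `500m`); decimal forms such as `1.5` are not handled, and the inputs would come from the Allocated resources section of `kubectl describe node`.

```bash
#!/usr/bin/env bash
# Simplified model of the scheduler's CPU fit check (requests vs allocatable).

# Convert a CPU quantity to millicores. Supports "2" (cores) and "500m" only.
to_millicores() {
  local q=$1
  if [[ $q =~ ^[0-9]+m$ ]]; then
    echo $(( 10#${q%m} ))
  elif [[ $q =~ ^[0-9]+$ ]]; then
    echo $(( 10#$q * 1000 ))
  else
    echo "unsupported CPU quantity: $q" >&2
    return 1
  fi
}

# Succeeds if a Pod requesting $1 fits on a node with allocatable $2
# when $3 is already requested by Pods scheduled there. Returns 2 on bad input.
cpu_request_fits() {
  local req alloc used
  req=$(to_millicores "$1") || return 2
  alloc=$(to_millicores "$2") || return 2
  used=$(to_millicores "$3") || return 2
  (( used + req <= alloc ))
}
```

For example, `cpu_request_fits 500m 2 1600m` fails (1600m + 500m exceeds 2000m) no matter how little CPU the node is actually using.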

Q33. How would you debug a Service that isn't routing traffic to its Pods?

This is one of the most common real-world scenarios. Here is a systematic approach:

bash
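# Step 1: confirm the Service has registered endpoints
kubectl get endpoints <service-name>
# If empty, the Service's selector is not matching any Pod's labels
# (on newer clusters the same data lives in EndpointSlices:
#  kubectl get endpointslices -l kubernetes.io/service-name=<service-name>)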

# Step 2: compare the Service selector to the Pod's actual labels
kubectl get service <service-name> -o yaml | grep -A 3 selector
kubectl get pods --show-labels

# Step 3: confirm the Pods are actually Ready
kubectl get pods -l app=<your-app-label>
# READY column should show 1/1, not 0/1
# A failing readinessProbe removes the Pod from Service endpoints even if it's Running

# Step 4: test connectivity directly to a Pod, bypassing the Service
kubectl port-forward pod/<pod-name> 8080:8080
curl localhost:8080/health

# Step 5: test from inside another Pod in the cluster
kubectl run debug --image=busybox -it --rm --restart=Never -- sh
# then, inside the debug shell (add the Service's port if it is not 80):
wget -O- http://<service-name>.<namespace>.svc.cluster.local:<port>

The single most common root cause: a typo or mismatch between the Service's selector and the Pod's labels, resulting in zero matched endpoints. The second most common: a failing readiness probe silently removing otherwise-healthy Pods from rotation.
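Selector matching is a strict subset test: every key=value pair in the Service's selector must appear, with identical spelling and case, among the Pod's labels, while extra Pod labels are ignored. This bash sketch models that rule using the comma-separated `key=value` form that `kubectl get pods --show-labels` prints; it is an illustration of the rule, not how the endpoints controller is implemented.

```bash
#!/usr/bin/env bash
# Model of equality-based selector matching (Service selector vs Pod labels).

# Succeeds if every key=value pair in $1 appears in $2.
# Both arguments use the comma-separated form printed by --show-labels.
selector_matches() {
  local selector=$1 labels=$2 pair
  if [ -z "$selector" ]; then
    echo "empty selector"
    return 1
  fi
  local IFS=','
  for pair in $selector; do
    # Wrap in commas so "app=web" cannot match inside "app=web2".
    case ",$labels," in
      *",$pair,"*) ;;
      *)
        echo "missing label: $pair"
        return 1
        ;;
    esac
  done
  return 0
}
```

For example, `selector_matches "app=web" "app=Web"` prints `missing label: app=web`: a one-character case difference is enough to leave a Service with zero endpoints.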

Q34. What exactly happens when you run kubectl delete pod?

This is a deceptively deep question that tests whether you understand the graceful termination sequence.

  1. The API server marks the Pod for deletion and sets a deletionTimestamp; the Pod shows as Terminating.
  2. The kubelet runs the container's preStop hook, if one is defined, then sends SIGTERM to the container's main process. The terminationGracePeriodSeconds countdown (default 30 seconds) covers both steps.
  3. In parallel, the Pod is removed from the endpoints of any Service that selects it. Because kube-proxy and external load balancers pick up that change asynchronously, a few requests can still arrive shortly after SIGTERM, which is why some teams add a short sleep in a preStop hook.
  4. The application should use this grace period to finish in-flight requests and shut down cleanly.
  5. If the process has not exited once the grace period expires, the kubelet sends SIGKILL, forcefully terminating it (this produces exit code 137, not 143).
  6. If the Pod was managed by a Deployment/ReplicaSet, the controller immediately creates a replacement Pod to maintain the desired replica count.

yaml
spec:
  terminationGracePeriodSeconds: 60   # override the 30s default if your app
                                        # needs longer to drain connections

If you manually created a standalone Pod (not managed by a Deployment), deleting it is permanent. No controller exists to recreate it.
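The grace period only helps if the process actually reacts to SIGTERM. This matters most for shell-script entrypoints: the entrypoint runs as PID 1 in the container, and the kernel does not apply the default "terminate" action to PID 1 for signals it has no handler for, so an untrapped SIGTERM is effectively ignored and the container is SIGKILLed (exit 137) when the grace period ends. Below is a minimal bash sketch of a trap-based drain loop; the loop body and messages are placeholders for real work.

```bash
#!/usr/bin/env bash
# Sketch: handle SIGTERM by finishing current work and exiting 0 before the grace period ends.

draining=0

serve_until_sigterm() {
  trap 'draining=1; echo "SIGTERM received, finishing in-flight work"' TERM
  echo "serving"
  while [ "$draining" -eq 0 ]; do
    # Sleep in the background and wait on it: bash runs the trap as soon as
    # "wait" is interrupted, whereas a foreground sleep would delay the trap.
    sleep 1 &
    wait $! || true
  done
  # Real drain logic (closing listeners, flushing queues) would go here.
  echo "drained, exiting cleanly"
  return 0
}
```

If the script instead ends with `exec` of the real server binary, the server becomes PID 1 and its own SIGTERM handling applies.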

Q35. Multiple services in your cluster suddenly start failing simultaneously. How do you approach this?

This tests incident response methodology, not just Kubernetes trivia. Start broad before going deep. A simultaneous multi-service failure usually points to a shared dependency, not N independent bugs.

bash
# Check overall cluster health first
kubectl get nodes
# Are any nodes NotReady? That alone explains multi-service failures.

kubectl get events --all-namespaces --sort-by='.lastTimestamp' | tail -30
# Recent cluster-wide events often reveal the trigger

# Check for resource exhaustion across the cluster
kubectl top nodes
kubectl describe nodes | grep -A 5 "Conditions"

Likely root causes for simultaneous failures: a shared dependency went down (database, DNS, a common upstream API), a node or availability zone outage, a NetworkPolicy or DNS change that broke connectivity cluster-wide, an etcd or API server issue affecting the whole control plane, or a recent cluster-wide change (a Helm upgrade, a CNI update, a cert rotation) that landed just before the failures started.

The instinct to demonstrate here: check what changed recently and what is shared across the failing services, rather than debugging each failing service in isolation as if they were unrelated.

Q36. How would you debug unexpected resource exhaustion on a node?

bash
# Identify which nodes are under pressure
kubectl top nodes

# Identify which Pods are consuming the most on that node
kubectl top pods --all-namespaces --sort-by=memory
kubectl top pods --all-namespaces --sort-by=cpu

# Check what's actually scheduled on the affected node
kubectl get pods --all-namespaces -o wide | grep <node-name>

# Review resource requests/limits cluster-wide for over-commitment
kubectl describe node <node-name> | grep -A 10 "Allocated resources"

Things to look for: a Pod with no requests or limits set (BestEffort QoS) consuming far more than expected, a memory leak in a long-running Pod that has been up for days, too many Pods scheduled onto one node because of poor anti-affinity configuration, or a DaemonSet (running on every node) that recently got more resource-hungry after an update.

Longer-term fix: set sane resource requests and limits everywhere, use the Vertical Pod Autoscaler in recommendation mode to right-size requests based on actual historical usage, and configure pod anti-affinity for resource-heavy workloads so they spread across nodes rather than clustering.
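Over-commitment is easy to quantify: sum the memory limits of the Pods on a node and compare them with the node's allocatable memory. Anything above 100% means the node cannot satisfy every Pod at its limit at the same time, so under real memory pressure the kubelet starts evicting Pods (typically BestEffort first). Below is a simplified bash sketch, assuming quantities expressed only in `Mi` or `Gi`; the sample values are hypothetical, and `kubectl describe node` already reports a limits percentage you can cross-check against.

```bash
#!/usr/bin/env bash
# Sketch: memory limit over-commitment on one node, as a percentage of allocatable.

# Convert "512Mi" or "4Gi" to MiB. Other units are intentionally unsupported.
to_mib() {
  local q=$1
  if [[ $q =~ ^([0-9]+)Gi$ ]]; then
    echo $(( 10#${BASH_REMATCH[1]} * 1024 ))
  elif [[ $q =~ ^([0-9]+)Mi$ ]]; then
    echo $(( 10#${BASH_REMATCH[1]} ))
  else
    echo "unsupported memory quantity: $q" >&2
    return 1
  fi
}

# Usage: limit_overcommit_pct <allocatable> <limit>...
# Prints the summed limits as an integer percentage of allocatable memory.
limit_overcommit_pct() {
  local alloc total=0 q mib
  alloc=$(to_mib "$1") || return 1
  shift
  for q in "$@"; do
    mib=$(to_mib "$q") || return 1
    total=$(( total + mib ))
  done
  echo $(( total * 100 / alloc ))
}
```

For example, three Pods with limits of 2Gi, 4Gi, and 4Gi on a node with 8Gi allocatable give `limit_overcommit_pct 8Gi 2Gi 4Gi 4Gi` = `125`.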

Category 6: Security and Production Practices (Q37-Q40)

Security questions round out the interview with topics that matter in production clusters: access control, secrets management, and packaging. Expect at least one question from this category in any senior-level interview.

Q37. What is RBAC in Kubernetes and how do Roles and RoleBindings work together?

RBAC (Role-Based Access Control) controls who can perform which actions on which resources in the cluster. Two object types work together.

Role (or ClusterRole for cluster-wide scope): defines a set of permissions, specifying which verbs (get, list, create, delete) are allowed on which resources, within a namespace (Role) or cluster-wide (ClusterRole).

yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]

RoleBinding (or ClusterRoleBinding): grants the permissions defined in a Role to a specific user, group, or ServiceAccount.

yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods-binding
  namespace: production
subjects:
- kind: User
  name: jane@example.com
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

The principle: a Role defines WHAT is allowed, a RoleBinding defines WHO gets that permission. Follow least privilege: grant the narrowest Role that lets someone do their job, scoped to a namespace whenever possible rather than cluster-wide.

Q38. Are Kubernetes Secrets encrypted by default?

No, and this catches many people off guard. Secret values are only base64-encoded (not encrypted), and by default they are stored unencrypted in etcd. Anyone with access to etcd (or its backups), or with sufficient RBAC permissions to read Secret objects via the API, can trivially retrieve the real value. Encryption at rest only addresses the etcd side; API access still has to be restricted with RBAC.
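To see how little protection encoding offers, compare what ends up in a Secret's `data` field with the original value. A bash round trip (the sample value is made up):

```bash
#!/usr/bin/env bash
# base64 is an encoding, not encryption: anyone can reverse it without a key.

encode_secret_value() {
  printf '%s' "$1" | base64
}

decode_secret_value() {
  printf '%s' "$1" | base64 -d
}

# Example: encode_secret_value 's3cr3t'   prints czNjcjN0
#          decode_secret_value czNjcjN0   prints s3cr3t
```

A Secret created from `s3cr3t` stores `czNjcjN0` under its key, and that is exactly the string `kubectl get secret <name> -o yaml` shows.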

To actually encrypt Secrets at rest, you must explicitly configure encryption at the API server level using an EncryptionConfiguration:

yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  providers:
  - aescbc:
      keys:
      - name: key1
        secret: <base64-encoded-32-byte-key>
  - identity: {}

Managed Kubernetes services (EKS, GKE, AKS) typically offer this as a configurable option (often integrated with the cloud provider's KMS) rather than something you configure on raw etcd yourself. Always verify whether encryption at rest is actually enabled, rather than assuming Kubernetes secrets are secure by default just because the name says "Secret."
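The `<base64-encoded-32-byte-key>` placeholder in the EncryptionConfiguration expects 32 random bytes, base64-encoded. The sketch below uses essentially the command the Kubernetes documentation suggests, wrapped in hypothetical helper functions; where the key is stored and how it is rotated is up to your setup, and with a managed KMS integration you do not handle raw keys at all.

```bash
#!/usr/bin/env bash
# Generate a random 32-byte key, base64-encoded, for a local-key provider such as aescbc.

gen_encryption_key() {
  head -c 32 /dev/urandom | base64 | tr -d '\n'
}

# Check that a base64 string decodes to exactly 32 bytes.
key_is_32_bytes() {
  local n
  n=$(printf '%s' "$1" | base64 -d | wc -c)
  [ $(( n )) -eq 32 ]
}
```

A 32-byte key always encodes to a 44-character string ending in `=`, which makes a truncated copy-paste easy to spot.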

Q39. What are Pod Security Standards and what do they replace?

Pod Security Standards (PSS) define three security policy levels that restrict what a Pod is allowed to do; the built-in Pod Security Admission controller enforces them per namespace, with optional cluster-wide defaults. Together they replaced PodSecurityPolicy (PSP), which was deprecated in 1.21 and removed in 1.25.

Privileged: unrestricted, allows known privilege escalations. Use only for trusted, infrastructure-level workloads (CNI plugins, monitoring agents that need host access).

Baseline: prevents known privilege escalations while allowing default Pod configurations to work. Disallows things like running with the host network namespace, host PID namespace, or privileged containers.

Restricted: heavily restricted, follows current Pod hardening best practices. Requires running as non-root, disallows privilege escalation, requires dropping ALL Linux capabilities (only NET_BIND_SERVICE may be added back), and requires a RuntimeDefault or Localhost seccomp profile.

yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

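Applying these labels to a namespace configures Pod Security Admission: enforce rejects any Pod that violates the chosen level at admission time, audit records violations in the API server audit log, and warn returns a warning to the user. Pods that are already running are not evicted when the labels are added.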

Q40. What is Helm and when should a team adopt it?

Helm is the most widely used package manager for Kubernetes. A Helm chart packages a complete application's Kubernetes resources (Deployments, Services, ConfigMaps, Ingress, etc.) into a single, versioned, reusable template.

bash
# Create a new chart scaffold
helm create my-app

# Install a chart, providing custom values
helm install my-release ./my-app --values production-values.yaml

# Upgrade an existing release
helm upgrade my-release ./my-app --set image.tag=2.1.0

# Roll back to revision 3 (list revisions with: helm history my-release)
helm rollback my-release 3

A chart's values.yaml lets you parameterize environment-specific settings (replica count, image tag, resource limits) without duplicating YAML files per environment:

yaml
# values.yaml
replicaCount: 3
image:
  repository: myapp
  tag: "1.0"
resources:
  limits:
    memory: "512Mi"

Adopt Helm when you're deploying the same application across multiple environments with different configuration, you want versioned, rollback-able releases instead of raw kubectl apply, or you're installing third-party software (most popular open-source Kubernetes tools ship official Helm charts as the primary installation method).

Skip Helm when you have a single simple application with one environment and raw YAML manifests are still easy to manage by hand. Helm adds real value once you have multiple environments or multiple similar deployments that would otherwise mean copy-pasting YAML.

Quick Reference: All 40 Questions at a Glance

| Q# | Question | Core Concept |
|---|---|---|
| Q1 | What is Kubernetes and why not just Docker | Orchestration, self-healing, scaling, rolling updates |
| Q2 | Control plane components | API server, etcd, scheduler, controller manager |
| Q3 | Worker node components | kubelet, kube-proxy, container runtime |
| Q4 | What is a Pod | Smallest unit, shared network namespace, ephemeral |
| Q5 | ReplicaSet vs Deployment | Replica count vs rolling updates/rollbacks |
| Q6 | What is a Namespace | Multi-tenancy, scoping, RBAC + ResourceQuota pairing |
| Q7 | Labels and Selectors | Key-value tags, how controllers find their Pods |
| Q8 | ConfigMap vs Secret | Non-sensitive vs sensitive, base64 is not encryption |
| Q9 | ClusterIP vs NodePort vs LoadBalancer | Internal vs node-exposed vs cloud LB per service |
| Q10 | Ingress vs LoadBalancer Service | Single entry point, path/host routing, cost savings |
| Q11 | Deployment vs StatefulSet vs DaemonSet | Stateless vs stable identity vs one-per-node |
| Q12 | Rolling updates and rollbacks | maxSurge/maxUnavailable, rollout undo |
| Q13 | HPA vs VPA | Scale replica count vs scale resource allocation |
| Q14 | Resource requests/limits and QoS classes | Guaranteed, Burstable, BestEffort eviction order |
| Q15 | Job vs CronJob | Run-to-completion vs scheduled recurring |
| Q16 | Init containers | Run-before-main, strict ordering, dependency waits |
| Q17 | Liveness vs Readiness vs Startup probes | Restart vs remove-from-service vs delay-other-probes |
| Q18 | Pod Disruption Budget | Limits voluntary disruption, minAvailable |
| Q19 | Pod-to-Pod networking | Flat network model, CNI plugin responsibility |
| Q20 | Service discovery and DNS | CoreDNS, service-name.namespace.svc.cluster.local |
| Q21 | Headless Service | ClusterIP: None, direct Pod IP resolution |
| Q22 | NetworkPolicy | Default-allow-all without it, CNI support required |
| Q23 | Ingress Controller | Implements Ingress rules, NGINX/Traefik/HAProxy |
| Q24 | CNI plugins | Calico, Cilium, Flannel, cloud-native CNIs |
| Q25 | PersistentVolume vs PersistentVolumeClaim | Storage resource vs storage request |
| Q26 | StorageClass and dynamic provisioning | On-demand PV creation, WaitForFirstConsumer |
| Q27 | ConfigMap/Secret as env vars vs mounted files | Restart required vs live-update propagation |
| Q28 | Injecting Secrets safely | secretKeyRef, base64 decode, external secrets managers |
| Q29 | Debugging CrashLoopBackOff | describe, logs --previous, exponential backoff |
| Q30 | Exit codes 0, 1, 137, 143 | Signal math (128+N), OOMKilled vs graceful SIGTERM |
| Q31 | Debugging ImagePullBackOff | Wrong tag, missing imagePullSecret, rate limiting |
| Q32 | Debugging Pod stuck Pending | Resource shortage, scheduling constraints, PVC unbound |
| Q33 | Debugging a Service with no traffic | Selector/label mismatch, failing readiness probe |
| Q34 | What happens on kubectl delete pod | SIGTERM, grace period, endpoint removal, SIGKILL |
| Q35 | Multiple services failing simultaneously | Shared dependency, node/AZ outage, recent change |
| Q36 | Debugging node resource exhaustion | kubectl top, BestEffort QoS, anti-affinity fix |
| Q37 | RBAC: Roles and RoleBindings | What is allowed vs who gets it, least privilege |
| Q38 | Are Secrets encrypted by default | No, base64 only, EncryptionConfiguration required |
| Q39 | Pod Security Standards | Privileged, Baseline, Restricted; replaced PSP |
| Q40 | Helm and when to adopt it | Chart packaging, values.yaml, versioned rollbacks |

💡 Five things to memorize before you walk in

  1. The CrashLoopBackOff debugging sequence: describe, logs --previous, check exit code (Q29).
  2. Exit code 137 vs 143: OOMKilled/SIGKILL vs graceful SIGTERM (Q30).
  3. The three Service types and when to use Ingress instead of LoadBalancer (Q9, Q10).
  4. Deployment vs StatefulSet: stateless clones vs stable identity with persistent storage (Q11).
  5. RBAC least privilege: Role defines what, RoleBinding defines who, always namespace-scoped when possible (Q37).

Frequently Asked Questions

What level of Kubernetes knowledge do these 40 questions target?

This guide spans mid-level fundamentals through senior production expertise. Questions 1 through 18 (architecture, workloads, scaling) cover the conceptual foundation that any candidate deploying to Kubernetes should know confidently.

Questions 19 through 28 (networking, storage) go deeper into operational knowledge that separates mid-level from senior candidates. Questions 29 through 36 (troubleshooting scenarios) are the hardest section: they test real-world debugging instinct with specific kubectl command sequences, which is what senior and staff-level interviews probe in 2026.

Do I need CKA or CKAD certification to answer these interview questions?

No. CKA (Certified Kubernetes Administrator) and CKAD (Certified Kubernetes Application Developer) are valuable credentials, but they are not required to answer these questions well. The questions here focus on concepts and troubleshooting instinct, which you can build through hands-on experience with any Kubernetes cluster.

That said, if you can answer these 40 questions confidently with real examples, you are well-positioned for CKA/CKAD exam preparation. The overlap is significant, especially in the architecture (Q1-Q10), workloads (Q11-Q18), and troubleshooting (Q29-Q36) categories.

How can I practice Kubernetes locally before an interview?

You can run a full single-node Kubernetes cluster on your laptop using several free tools. The most popular options for local development:

bash
# Option 1: minikube (most widely used for learning)
minikube start
kubectl get nodes   # single-node cluster running locally

# Option 2: kind (Kubernetes in Docker, great for CI and testing)
kind create cluster --name practice
kubectl cluster-info

# Option 3: Docker Desktop includes a built-in Kubernetes toggle
# Enable it in Docker Desktop > Settings > Kubernetes > Enable Kubernetes

Once your local cluster is running, practice every kubectl command from the troubleshooting section (Q29-Q36). Deploy a simple app, break it intentionally (wrong image tag, missing Secret, resource limits too low), and debug it using the exact sequences described in this guide.

What is the difference between kubectl apply and kubectl create?

kubectl create is imperative: it creates a resource and fails if the resource already exists. kubectl apply is declarative: it creates the resource if it doesn't exist, or updates it if it does, by comparing the desired state in your YAML file to the current state in the cluster.

bash
# Imperative: fails on second run because the resource exists
kubectl create -f deployment.yaml
# Error: deployments.apps "web" already exists

# Declarative: creates on first run, updates on subsequent runs
kubectl apply -f deployment.yaml
# deployment.apps/web created   (first time)
# deployment.apps/web configured (subsequent times)

In production workflows, kubectl apply is the standard because it supports the GitOps pattern: store manifests in Git, and apply them to the cluster on every change. kubectl create is useful for one-off operations like creating a Secret or Namespace interactively.

How do I check which Pods are running on a specific node?

Use the -o wide flag to see which node each Pod is scheduled on, then filter by node name:

bash
# List all Pods with their node assignments
kubectl get pods --all-namespaces -o wide

# Filter to a specific node
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=worker-node-1

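# See the heaviest Pods cluster-wide (kubectl top does not show node placement,
# so cross-check the names against the node's Pod list above)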
kubectl top pods --all-namespaces --sort-by=cpu | head -20

This is a common first step in the node resource exhaustion debugging flow described in Q36. Combine it with kubectl describe node to see allocated vs available resources.

What Kubernetes topics are most commonly asked in 2026 interviews?

Based on the frequency these topics appear in interview prep communities and job postings, the most commonly asked areas in 2026 are:

  1. Troubleshooting scenarios (CrashLoopBackOff, ImagePullBackOff, Pending Pods): asked in nearly every Kubernetes interview at mid-level and above.
  2. Deployment vs StatefulSet vs DaemonSet: the most common "compare these" question.
  3. Service types and Ingress: how traffic flows into and within the cluster.
  4. Resource requests/limits and QoS classes: especially the OOMKilled scenario and exit code 137.
  5. RBAC fundamentals: Roles, RoleBindings, least privilege.
  6. Probes (liveness, readiness, startup): understanding when Kubernetes restarts vs removes from traffic.

If you have limited prep time, focus on categories 1 and 5 (architecture and troubleshooting) first. These cover the questions you are most likely to face. For broader backend interview preparation, combine this guide with our NestJS interview questions for application-layer coverage.

Zeeshan Tofiq

Full Stack Developer

Full stack developer with over 6 years of experience building production applications. Writes practical guides on JavaScript, TypeScript, React, Node.js, and cloud infrastructure. Focused on helping developers solve real problems with clean, maintainable code.
