50 Cloud & DevOps Interview Questions and Answers (2026)

50 cloud and DevOps interview questions covering AWS Lambda, Docker, Microservices, API Gateway, S3, serverless, and Azure Entra ID. With code examples.

Zeeshan Tofiq

June 15, 2026

Cloud and DevOps knowledge is no longer optional for backend engineers. AWS Lambda, Docker, and microservices show up in job requirements for roles that were purely application-focused a few years ago. Azure Entra ID is now a standard topic for any role touching enterprise Microsoft environments.

These 50 questions cover what interviewers are actually asking in 2026 across six cloud and DevOps topics: serverless and AWS Lambda, API Gateway, S3, Docker, microservices, and Azure Entra ID. Answers are written to be said out loud in an interview: specific, grounded in real behavior, with working examples where they help. If you're also brushing up on the application side, our Node.js interview questions guide pairs well with the Lambda and serverless sections here, since most serverless backends in 2026 are written in Node.js.

Category 1: Serverless and AWS Lambda (Q1-Q8)

Serverless and Lambda questions test whether you understand the execution model behind the trend, not just the marketing pitch. Expect questions about cold starts, concurrency, and the operational tradeoffs of going serverless.

Q1. What is serverless computing and what problem does it solve?

Serverless computing is a cloud execution model where the cloud provider manages the server infrastructure entirely. You write and deploy code (a function), define what triggers it, and the provider handles provisioning, scaling, patching, and availability. You pay only for actual execution time, not for idle capacity.

The problem it solves: traditional server deployments require you to provision capacity in advance. A VM or container running 24/7 costs money even at 3am when no traffic arrives. Auto-scaling helps, but you still have to manage the scaling configuration. Serverless removes most of that operational work: there are no servers to size, patch, or scale by hand.

Key characteristics:

Stateless execution: functions do not persist state between invocations.
Cold starts: functions take extra time on first invocation after an idle period.
Maximum execution time: AWS Lambda caps at 15 minutes per invocation.
Best fit: event-driven tasks, API backends, data processing, scheduled jobs.

When NOT to use serverless:

Long-running processes like video encoding or ML training.
Applications needing persistent connections, such as WebSocket servers (unless you front them with an API Gateway WebSocket API, which holds the connection and invokes Lambda per message).
Workloads that run continuously at high volume, where an always-on EC2 instance or container may be cheaper.

Q2. How does AWS Lambda work? Explain the execution model.

Lambda runs your function code inside an execution environment: a managed container that AWS provisions, runs, and destroys. You never see the server.

Execution flow:

A trigger event arrives (HTTP request via API Gateway, S3 upload, SQS message, EventBridge rule, etc.).
Lambda allocates an execution environment with your chosen memory and CPU.
AWS downloads your deployment package (code plus dependencies), initializes the runtime, and runs your init code outside the handler (once per environment).
Lambda calls your handler function with the event and context objects.
Your handler processes the event and returns a response.
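The execution environment stays alive for a while after the invocation (AWS does not publish a fixed duration; several minutes is commonly observed) so it can handle subsequent invocations (warm start).
Between invocations the environment is frozen; after a period of inactivity it is destroyed.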

javascript — index.js

// Node.js Lambda handler
exports.handler = async (event, context) => {
  // event: the trigger payload (HTTP request, S3 event, etc.)
  // context: runtime info (function name, remaining time, etc.)

  console.log("Event:", JSON.stringify(event));
  console.log("Remaining time:", context.getRemainingTimeInMillis(), "ms");

  // Business logic here
  const result = await processEvent(event);

  // Return response (format depends on trigger type)
  return {
    statusCode: 200,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(result),
  };
};

// Code OUTSIDE the handler runs once per execution environment (cold start)
// Good for: database connections, SDK clients, config loading
// (DatabaseClient and processEvent are placeholders for your own code)
const dbClient = new DatabaseClient({ host: process.env.DB_HOST });

Lambda supports Node.js, Python, Java, Go, Ruby, .NET, and custom runtimes via the Runtime API.
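The once-per-environment behavior of init code can be sketched without any AWS dependency. The snippet below is a minimal illustration, assuming a hypothetical `createDbClient` factory that stands in for a real database client: the client is created lazily on the first invocation and reused by later warm invocations of the same environment.

```javascript
// Hypothetical stand-in for an expensive client constructor.
let initCount = 0;
function createDbClient() {
  initCount += 1;
  return { query: async (sql) => ({ sql, rows: [] }) };
}

// Module scope survives across warm invocations of the same environment.
let cachedClient = null;
function getClient() {
  if (cachedClient === null) {
    cachedClient = createDbClient(); // paid once per cold start
  }
  return cachedClient;
}

async function handler(event) {
  const db = getClient();
  const result = await db.query("SELECT 1");
  return {
    statusCode: 200,
    body: JSON.stringify({ initCount, rows: result.rows }),
  };
}
```

Calling `handler` twice in the same process leaves `initCount` at 1, which is the effect a warm start relies on.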

Q3. What is a Lambda cold start and how do you reduce it?

A cold start occurs when Lambda must create a new execution environment before running your function. This adds 100ms to several seconds of latency before your handler even starts. Warm starts reuse an existing environment and run in milliseconds.

Cold starts happen when:

The function has not been invoked recently (environment was destroyed).
Traffic spikes cause more concurrent executions than existing environments.
You deploy a new function version.
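The function runs inside a VPC: this used to add seconds of ENI attachment to every cold start. Since the 2019 move to Hyperplane ENIs, the network interface is set up when the function is configured rather than at invocation time, so the remaining VPC overhead is small.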

Cold start mitigation strategies:

1. Provisioned Concurrency: keeps N execution environments pre-initialized and always warm. Eliminates cold starts for those instances entirely. Costs money even when idle.

bash

# Set provisioned concurrency on a specific version or alias
aws lambda put-provisioned-concurrency-config \
  --function-name my-api \
  --qualifier prod \
  --provisioned-concurrent-executions 10

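2. SnapStart (Java, and more recently Python and .NET): takes a snapshot of the initialized execution environment when you publish a version and resumes new environments from that snapshot. For Java this typically cuts cold starts from several seconds to under a second.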
3. Keep deployment packages small: a smaller zip means a faster download and faster cold start. Remove unused dependencies and use Lambda Layers for shared libs.
4. Choose fast runtimes: Node.js, Python, and Go typically have the fastest cold starts. Java and .NET are typically the slowest.
5. Move init code outside the handler: database connections, SDK clients, and config loading done outside the handler run once per environment, not per invocation.
6. Schedule warm-up pings: invoke the function every 5 minutes via EventBridge to prevent the environment from going cold. Works but is imprecise and not reliable for burst traffic.
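For the warm-up ping strategy, the handler usually needs to recognize the ping and return early so that it does not run real business logic. The sketch below assumes the EventBridge rule is configured to send `{"warmup": true}` as its input; that field name and the `doRealWork` helper are illustrative, not an AWS convention.

```javascript
// Counts only genuine invocations, so the effect of the early return is visible.
let invocationsServed = 0;

// Hypothetical business logic.
async function doRealWork(event) {
  invocationsServed += 1;
  return { ok: true, orderId: event.orderId };
}

async function handler(event) {
  // Assumption: the scheduled rule sends {"warmup": true} as its payload.
  if (event && event.warmup === true) {
    // Return early: the ping keeps the environment warm at minimal cost.
    return { warmed: true };
  }
  return doRealWork(event);
}
```

A ping only keeps one environment warm per concurrent ping, which is why this approach does not help with burst traffic.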

Q4. What is Lambda concurrency and what are the two types?

Concurrency is the number of Lambda function instances executing at the same time. By default, Lambda can run up to 1,000 concurrent executions per account per region (a soft limit that can be raised).

Unreserved concurrency is the pool shared by all functions in your account. If one function consumes all available concurrency during a spike, other functions in the same account get throttled.

Reserved concurrency is a hard allocation for a specific function. It has two effects: it guarantees that function can always scale up to the reserved amount, and it hard-caps the function at that amount (it cannot use more even if the pool allows).

bash

# Reserve 100 concurrent executions for the payments function
aws lambda put-function-concurrency \
  --function-name payment-processor \
  --reserved-concurrent-executions 100

# Remove reserved concurrency (return to unreserved pool)
aws lambda delete-function-concurrency \
  --function-name payment-processor

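When a function exceeds its concurrency limit, Lambda throttles it. Synchronous invocations fail with a 429 TooManyRequestsException that the caller must handle. Asynchronous invocations are not lost: Lambda keeps the throttled event in its internal queue and retries it for up to 6 hours by default.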

Provisioned concurrency (different from reserved) pre-initializes environments. Reserved concurrency sets a hard maximum. You can combine both: set reserved concurrency to 50 and provisioned to 10 (10 always warm, up to 50 total).
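A quick way to reason about how much concurrency a function needs is to multiply the request rate by the average duration. The sketch below applies that rule of thumb and then models the hard cap that reserved concurrency imposes; the traffic numbers are made up for illustration.

```javascript
// Estimated concurrency = average requests per second * average duration in seconds.
function estimateConcurrency(requestsPerSecond, avgDurationMs) {
  return Math.ceil(requestsPerSecond * (avgDurationMs / 1000));
}

// Models the reserved-concurrency cap: anything above the cap is throttled.
function applyReservedLimit(neededConcurrency, reservedConcurrency) {
  const running = Math.min(neededConcurrency, reservedConcurrency);
  return { running, throttled: neededConcurrency - running };
}

const needed = estimateConcurrency(200, 500); // 200 req/s at 500 ms each
console.log(needed); // 100
console.log(applyReservedLimit(needed, 50)); // { running: 50, throttled: 50 }
```

The same arithmetic explains why shortening a function's duration is often the cheapest way to stay under a concurrency limit.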

Q5. What are Lambda Layers and when do you use them?

A Lambda Layer is a .zip archive containing libraries, a custom runtime, configuration, or other dependencies. You attach up to 5 layers to a function. Lambda merges the layer contents into the /opt directory at runtime.

Use layers for:

Sharing common libraries across multiple functions (avoid bundling the same 200MB dependency in every function zip).
Keeping deployment packages small for faster cold starts and easier deployments.
Distributing internal utilities or helper code across your team.
Custom runtimes for unsupported languages.

bash

# Create a layer from a zip containing Node.js dependencies
zip -r layer.zip nodejs/
aws lambda publish-layer-version \
  --layer-name my-shared-libs \
  --description "Shared utilities and database client" \
  --zip-file fileb://layer.zip \
  --compatible-runtimes nodejs20.x nodejs22.x

# Attach the layer to a function
aws lambda update-function-configuration \
  --function-name my-api \
  --layers arn:aws:lambda:us-east-1:123456789:layer:my-shared-libs:3

In your function, access layer contents at /opt:

javascript

const { sharedUtil } = require("/opt/nodejs/shared-util");

AWS also publishes public layers, such as the AWS X-Ray SDK layer and the Lambda Powertools layer for Python and Node.js.

Q6. What are the two Lambda invocation types and how does error handling differ?

Synchronous invocation: the caller waits for Lambda to finish and return a response. API Gateway, ALB, and direct SDK calls use synchronous invocation. If the function throws, the error is returned to the caller immediately. Lambda does NOT automatically retry synchronous failures.

javascript

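// Direct synchronous invocation with the AWS SDK for JavaScript v3
const { LambdaClient, InvokeCommand } = require("@aws-sdk/client-lambda");
const lambda = new LambdaClient({});

const result = await lambda.send(new InvokeCommand({
  FunctionName: "my-function",
  InvocationType: "RequestResponse", // synchronous
  Payload: JSON.stringify({ key: "value" }),
}));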
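Because Lambda does not retry synchronous failures, any retry policy lives in the caller. The following is a minimal sketch of exponential backoff with jitter around a caller-supplied function; `invokeFn` is a placeholder for whatever performs the invocation, and treating only `TooManyRequestsException` as retryable is a choice made for this example.

```javascript
// Retries a caller-supplied async function with exponential backoff and full jitter.
async function invokeWithRetry(invokeFn, options = {}) {
  const { maxAttempts = 3, baseDelayMs = 100, sleep } = options;
  const wait = sleep || ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    try {
      return await invokeFn(attempt);
    } catch (err) {
      lastError = err;
      // Only throttling is treated as retryable in this sketch.
      const retryable = err && err.name === "TooManyRequestsException";
      if (!retryable || attempt === maxAttempts) break;
      const cap = baseDelayMs * 2 ** (attempt - 1);
      await wait(Math.random() * cap); // full jitter: 0..cap ms
    }
  }
  throw lastError;
}
```

A caller would pass a function that performs the actual invocation, for example `invokeWithRetry(() => invokeMyFunction(payload))`, where `invokeMyFunction` is hypothetical.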

Asynchronous invocation: the caller sends the event and gets a 202 Accepted response immediately. Lambda processes the event in the background. S3 event notifications, SNS, EventBridge, and SES use async invocation. Lambda automatically retries failed async invocations up to 2 times (3 total attempts) with delays between retries.

javascript

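// Asynchronous invocation with the AWS SDK for JavaScript v3
const { LambdaClient, InvokeCommand } = require("@aws-sdk/client-lambda");
const lambda = new LambdaClient({});

const response = await lambda.send(new InvokeCommand({
  FunctionName: "my-function",
  InvocationType: "Event", // asynchronous
  Payload: JSON.stringify({ key: "value" }),
}));
// Returns immediately; response.StatusCode is 202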

Dead Letter Queue (DLQ): for async invocations that fail after all retries, configure a DLQ (SQS queue or SNS topic) to receive the failed event. Inspect DLQ messages to diagnose persistent failures.

bash

aws lambda update-function-configuration \
  --function-name my-function \
  --dead-letter-config TargetArn=arn:aws:sqs:us-east-1:123:my-dlq

Lambda Destinations (the newer approach): route invocation results to SQS, SNS, EventBridge, or another Lambda function on success OR failure. More flexible than DLQ because it captures both success and failure events.
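The retry-then-dead-letter behavior for asynchronous invocations can be modeled in a few lines. This is a simplified simulation of the documented semantics (one initial attempt plus two retries, then a failure target), not how Lambda is implemented; `deliverAsync` and `onFailure` are names invented for the example, and the real delays between retries are omitted.

```javascript
// Simplified model: initial attempt + maxRetries retries, then a failure target.
async function deliverAsync(handler, event, onFailure, maxRetries = 2) {
  let lastError;
  for (let attempt = 1; attempt <= maxRetries + 1; attempt += 1) {
    try {
      const result = await handler(event);
      return { status: "succeeded", attempts: attempt, result };
    } catch (err) {
      lastError = err;
    }
  }
  // All attempts failed: hand the event and the last error to the failure
  // target, as a DLQ or an on-failure destination would receive it.
  await onFailure({ event, errorMessage: lastError.message });
  return { status: "failed", attempts: maxRetries + 1 };
}
```

Since the same event can be delivered up to three times, handlers behind asynchronous triggers should be idempotent.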

Q7. What are Lambda's key limits and how do you work around them?

Limit	Value	Workaround
Max execution duration	15 minutes	Use Step Functions for longer workflows
Max memory	10,240 MB (10 GB)	Break into smaller functions
Deployment package (zip)	50 MB direct, 250 MB unzipped	Use Lambda Layers, container images
Container image size	10 GB	Fine for most use cases
/tmp storage	512 MB by default, configurable up to 10,240 MB (10 GB)	Raise the ephemeral storage setting; use S3 for larger temp files
Concurrency (default)	1,000 per region	Request limit increase
Environment variables	4 KB total	Use SSM Parameter Store or Secrets Manager
Payload (sync invocation)	6 MB request, 6 MB response	Stream response, use S3 for large payloads

VPC-specific behavior: Lambda functions inside a VPC can access private resources (RDS, ElastiCache) but cannot access the public internet unless routed through a NAT Gateway. Add a NAT Gateway if VPC Lambda functions need outbound internet access, or use VPC endpoints when they only need to reach AWS services such as S3 or DynamoDB.
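The payload limit from the table is the one that most often surprises callers, so it can help to check the size before invoking. The sketch below measures a JSON payload and falls back to passing an S3 pointer when it is too large. The exact byte boundary used for "6 MB" and the shape of the pointer object are assumptions made for illustration.

```javascript
// Assumption: 6 MB is treated as 6 * 1024 * 1024 bytes in this sketch.
const SYNC_PAYLOAD_LIMIT_BYTES = 6 * 1024 * 1024;

// Returns the UTF-8 size of the serialized payload in bytes.
function payloadSizeBytes(payload) {
  return Buffer.byteLength(JSON.stringify(payload), "utf8");
}

// Decides whether to send the payload inline or pass a pointer to S3 instead.
// The pointer shape ({ bucket, key }) is an illustrative convention.
function planInvocationPayload(payload, s3Pointer) {
  const size = payloadSizeBytes(payload);
  if (size <= SYNC_PAYLOAD_LIMIT_BYTES) {
    return { mode: "inline", size, body: payload };
  }
  return { mode: "s3-pointer", size, body: s3Pointer };
}
```

In the pointer case the caller would first upload the data to S3 and the function would read it from there.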

Q8. How do you monitor and debug AWS Lambda in production?

CloudWatch Logs: every console.log(), print(), or fmt.Println() call from your handler is captured automatically. Each function gets its own log group (/aws/lambda/function-name). Use structured logging (JSON) for better queryability.

javascript

// Structured logging for CloudWatch Insights queries
console.log(JSON.stringify({
  level: "INFO",
  message: "Order processed",
  orderId: event.orderId,
  duration: Date.now() - startTime,
  userId: event.userId,
}));

CloudWatch Metrics: Lambda automatically publishes:

Invocations: total calls.
Duration: execution time (p50, p95, p99).
Errors: function-level errors.
Throttles: invocations rejected due to concurrency limits.
ConcurrentExecutions: peak concurrent instances.
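The p50, p95, and p99 figures for Duration are percentiles, which are easy to misread. As a rough illustration, the snippet below computes nearest-rank percentiles over a made-up set of durations; CloudWatch's own percentile calculation may differ in detail.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

const durationsMs = [12, 15, 11, 14, 13, 250, 16, 12, 18, 900];
console.log(percentile(durationsMs, 50)); // 14
console.log(percentile(durationsMs, 95)); // 900
```

The median looks healthy here while p95 exposes the slow outliers (cold starts, for instance), which is why alarms are usually set on high percentiles rather than averages.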

AWS X-Ray: distributed tracing. Add the X-Ray SDK to trace downstream calls (DynamoDB, S3, HTTP calls) and view flame graphs of where time is spent.

javascript

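const AWSXRay = require("aws-xray-sdk-core");
const { DynamoDBClient } = require("@aws-sdk/client-dynamodb");
// Wrap an AWS SDK v3 client; calls made through it appear in X-Ray traces
const dynamo = AWSXRay.captureAWSv3Client(new DynamoDBClient({}));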

Lambda Powertools (Node.js / Python): an AWS-maintained utility library adding structured logging, tracing, and metrics with minimal code.

javascript

const { Logger } = require("@aws-lambda-powertools/logger");
const logger = new Logger({ serviceName: "order-service" });
logger.info("Processing order", { orderId: event.orderId });

Category 2: AWS API Gateway (Q9-Q14)

API Gateway questions test whether you know which API type and integration to reach for, and how authorization and throttling actually work under the hood.

Q9. What is AWS API Gateway and what are the three API types?

API Gateway is a fully managed service for creating, publishing, securing, and monitoring APIs at any scale. It acts as the front door for applications to access backend services: Lambda functions, EC2, ECS, or any HTTP endpoint.

REST API: the original, most feature-rich option. Supports request/response transformation, request validation, usage plans, API keys, custom domain names, caching, and fine-grained IAM permissions. More expensive and complex to configure.

HTTP API: launched in 2020 as a simpler, cheaper alternative. Supports Lambda and HTTP integrations, JWT authorizers, and OIDC/OAuth 2.0. Up to 71% cheaper than REST API. Lacks some REST API features (no built-in response transformation, no usage plans). Best for most modern serverless APIs.

WebSocket API: for two-way stateful communication. Maintains persistent connections. Used for real-time chat, live dashboards, and multiplayer games. Supports $connect, $disconnect, and custom route keys.

Q10. What integration types does API Gateway support?

API Gateway can route requests to different backends depending on the integration type.

Lambda Proxy: the most common. API Gateway passes the full HTTP request (headers, query params, body, path params) to Lambda as a structured event. Lambda returns a structured response object. Zero request/response transformation by API Gateway.

json

// Abridged event from a REST API proxy integration (payload format 1.0).
// HTTP APIs default to payload format 2.0, which uses different field names.
{
  "httpMethod": "POST",
  "path": "/users",
  "headers": { "Content-Type": "application/json", "Authorization": "Bearer ..." },
  "queryStringParameters": { "include": "profile" },
  "body": "{\"name\":\"Alice\",\"email\":\"alice@example.com\"}",
  "requestContext": { "requestId": "abc-123", "stage": "prod" }
}

HTTP: proxy the request to any publicly routable HTTP endpoint. Useful for putting API Gateway in front of an existing server or third-party service.
AWS Service: directly call an AWS service action (SQS SendMessage, DynamoDB PutItem) without going through Lambda. Reduces latency and cost by removing the Lambda hop.
Mock: return a hardcoded response from API Gateway itself. Useful for mocking endpoints during development or returning maintenance-mode responses.
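For the Lambda proxy integration, the handler is responsible for parsing the request and building the full HTTP response. The sketch below handles an event shaped like the sample above; the validation rule (requiring `email`) and the `jsonResponse` helper are assumptions added for the example.

```javascript
// Builds the response object that a proxy integration expects.
function jsonResponse(statusCode, data) {
  return {
    statusCode,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(data), // body must be a string
  };
}

// Handles a REST API (payload format 1.0) proxy event.
async function handler(event) {
  if (event.httpMethod !== "POST" || event.path !== "/users") {
    return jsonResponse(404, { message: "Not found" });
  }
  let payload;
  try {
    // The body arrives as a string (or null), never as a parsed object.
    payload = JSON.parse(event.body || "{}");
  } catch {
    return jsonResponse(400, { message: "Body must be valid JSON" });
  }
  if (!payload.email) {
    return jsonResponse(400, { message: "email is required" });
  }
  const include = (event.queryStringParameters || {}).include || null;
  return jsonResponse(201, { name: payload.name, email: payload.email, include });
}
```

Note that `queryStringParameters` is `null` rather than an empty object when the request has no query string, hence the fallback.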

Q11. How does API Gateway handle authorization?

Three authorization mechanisms come up most often (REST APIs additionally support Cognito user pool authorizers):

IAM Authorization: requests must be signed with AWS Signature Version 4. Best for internal service-to-service calls and AWS CLI/SDK access. Not for public APIs since every caller needs AWS credentials.

Lambda Authorizer (formerly Custom Authorizer): API Gateway calls a Lambda function with the request token (or full request). The Lambda returns an IAM policy (allow/deny) and optionally a context object passed to the backend. Results can be cached by API Gateway to reduce Lambda calls.

javascript

// Lambda Authorizer handler (token-based)
const jwt = require("jsonwebtoken"); // must be bundled with the function

exports.handler = async (event) => {
  const token = event.authorizationToken; // Bearer <jwt>

  try {
    const decoded = jwt.verify(token.split(" ")[1], process.env.JWT_SECRET);
    return {
      principalId: decoded.sub,
      policyDocument: {
        Version: "2012-10-17",
        Statement: [{ Effect: "Allow", Action: "execute-api:Invoke", Resource: event.methodArn }],
      },
      context: { userId: decoded.sub, email: decoded.email },
    };
  } catch {
    throw new Error("Unauthorized");
  }
};

JWT Authorizer (HTTP API only): API Gateway validates JWT tokens directly without invoking a Lambda function. Configure the issuer URL (Cognito, Auth0, Okta) and API Gateway verifies signature and claims automatically. No Lambda cost, lower latency.

yaml — serverless.yml

# Serverless Framework: HTTP API with JWT authorizer
provider:
  httpApi:
    authorizers:
      jwtAuthorizer:
        type: jwt
        identitySource: $request.header.Authorization
        issuerUrl: https://cognito-idp.us-east-1.amazonaws.com/us-east-1_XXX
        audience:
          - my-app-client-id
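To make the "verifies claims" part concrete, the sketch below shows the kind of issuer, audience, and expiry checks a JWT authorizer performs. It is an illustration only: signature verification against the issuer's public keys is deliberately omitted, so this must not be used to authenticate requests, and the function names are invented for the example.

```javascript
// Decodes the payload segment of a JWT. No signature check happens here.
function decodeClaims(token) {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("malformed token");
  return JSON.parse(Buffer.from(parts[1], "base64url").toString("utf8"));
}

// Checks issuer, audience, and expiry against the configured authorizer values.
function validateClaims(claims, { issuer, audience, nowSeconds }) {
  if (claims.iss !== issuer) return { valid: false, reason: "issuer mismatch" };
  const aud = Array.isArray(claims.aud) ? claims.aud : [claims.aud];
  if (!aud.some((a) => audience.includes(a))) {
    return { valid: false, reason: "audience mismatch" };
  }
  if (typeof claims.exp !== "number" || claims.exp <= nowSeconds) {
    return { valid: false, reason: "token expired" };
  }
  return { valid: true };
}
```

The `issuer` and `audience` arguments correspond to the `issuerUrl` and `audience` settings in the authorizer configuration.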

Q12. How does API Gateway throttling work?

API Gateway enforces throttling at two levels.

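Account-level throttle: by default 10,000 requests per second steady-state with a burst of 5,000 requests (the number of requests that can be absorbed at once before the steady-state rate applies). The quota is shared across all APIs in the account within a region.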

Stage-level and method-level throttle: set per API stage or per individual route/method. Overrides the account default for that specific resource.

When throttled, API Gateway returns 429 Too Many Requests.

Usage Plans (REST API): pair with API keys to set per-customer throttle limits and monthly request quotas. Useful for monetized APIs.

bash

# Set the default throttle for every method in a REST API stage
# (replace /*/* with /{resource-path}/{http-method} to target one method)
aws apigateway update-stage \
  --rest-api-id abc123 \
  --stage-name prod \
  --patch-operations \
    'op=replace,path=/*/*/throttling/rateLimit,value=1000' \
    'op=replace,path=/*/*/throttling/burstLimit,value=500'

Behind the scenes, API Gateway uses a token bucket algorithm. The burst limit is the bucket capacity (tokens available instantly). The rate limit is the refill rate (tokens added per second).
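The token bucket is small enough to implement in full. The class below is a generic sketch of the algorithm with an injected clock, not API Gateway's actual implementation; the rate and burst values are tiny so the behavior is easy to follow.

```javascript
class TokenBucket {
  constructor(ratePerSecond, burst, nowMs = 0) {
    this.rate = ratePerSecond; // refill rate (tokens per second)
    this.capacity = burst;     // bucket capacity
    this.tokens = burst;       // bucket starts full
    this.lastRefillMs = nowMs;
  }

  // Returns true if the request is admitted, false if it would be throttled (429).
  tryRemoveToken(nowMs) {
    const elapsedSeconds = (nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.rate);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

const bucket = new TokenBucket(2, 3); // 2 requests/second, burst of 3
const atTimeZero = [0, 0, 0, 0].map((t) => bucket.tryRemoveToken(t));
console.log(atTimeZero); // [ true, true, true, false ]
console.log(bucket.tryRemoveToken(500)); // true: one token refilled after 0.5 s
```

The fourth simultaneous request is rejected because the burst is exhausted, and half a second later exactly one more request fits.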

Q13. What are API Gateway stages and how do you use them for deployments?

A stage is a named reference to a specific deployment of your API (e.g., dev, staging, prod). Each stage has its own URL, throttle settings, logging configuration, and stage variables.

text

# Stage URLs
https://abc123.execute-api.us-east-1.amazonaws.com/dev/users
https://abc123.execute-api.us-east-1.amazonaws.com/prod/users

Stage variables work like environment variables for API Gateway. Use them to point different stages to different Lambda function aliases or backends.

text

# Stage variable: lambdaAlias = "dev" in dev stage, "prod" in prod stage
# Lambda integration URI uses the stage variable:
# arn:aws:apigateway:us-east-1:lambda:path/.../functions/${stageVariables.lambdaAlias}/invocations

Canary deployments: route a percentage of a stage's traffic to a new deployment before full release. If metrics look good, promote the canary. If not, roll back.

bash

aws apigateway create-deployment \
  --rest-api-id abc123 \
  --stage-name prod \
  --canary-settings 'percentTraffic=10,stageVariableOverrides={lambdaAlias=canary}'

Custom domains: map your own domain (api.yourcompany.com) to an API Gateway endpoint using Route 53 and ACM certificates, hiding the default execute-api URL.

Q14. What is the difference between REST API and HTTP API in API Gateway?

Feature	REST API	HTTP API
Price	~$3.50/million requests	~$1.00/million requests
JWT authorizers	No native JWT authorizer (use a Cognito user pool or Lambda authorizer)	Yes (native, no Lambda needed)
Response transformation	Yes (mapping templates)	No
Request validation	Yes	No
Usage plans + API keys	Yes	No
WebSocket	No	No (separate WebSocket API type)
Private integrations (VPC Link)	Yes	Yes
OpenAPI import/export	Yes	Partial
Caching	Yes	No
Latency	Slightly higher	Lower

Choose HTTP API when you are building a new serverless API, you use JWT auth (Cognito, Auth0), and you do not need request/response transformation or usage plans. It is simpler, cheaper, and faster for the majority of use cases.

Choose REST API when you need built-in caching, response transformation via mapping templates, usage plans for monetization, or API keys with per-key throttle controls.

Category 3: AWS S3 (Q15-Q20)

S3 questions test your understanding of object storage fundamentals: storage classes, versioning, access control, and the presigned URL pattern that almost every file-upload feature depends on. If your data layer also includes a NoSQL store alongside S3, our NoSQL interview questions guide covers DynamoDB and MongoDB patterns that frequently pair with S3 for media and document storage.

Q15. What is Amazon S3 and what are its core concepts?

Amazon S3 (Simple Storage Service) is AWS's object storage service. It stores any file type as an object inside a container called a bucket. S3 is designed for 99.999999999% (11 nines) durability and scales from bytes to exabytes.

Bucket: a container for objects. Bucket names are globally unique across all AWS accounts. Each bucket lives in one AWS region.

Object: a file stored in S3. Consists of the object data (binary) plus metadata (key-value pairs). Maximum object size is 5TB. Objects over 5GB require multipart upload.

Key: the full path and name of the object within the bucket, for example images/profile/user-1001/avatar.jpg. S3 is flat (no real folders): the slash is just part of the key name, but the console displays it as folders.

text

URL format: https://bucket-name.s3.amazonaws.com/path/to/object
Or regional: https://bucket-name.s3.us-east-1.amazonaws.com/path/to/object
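The "flat namespace that looks like folders" idea is easier to see in code. The sketch below mimics how a listing with a prefix and delimiter groups keys: keys directly under the prefix come back as objects, and deeper keys collapse into common prefixes. It is an in-memory illustration, and `listWithDelimiter` is a name invented here.

```javascript
// Groups flat keys the way a prefix + delimiter listing does.
function listWithDelimiter(keys, prefix = "", delimiter = "/") {
  const objects = [];
  const commonPrefixes = new Set();
  for (const key of keys) {
    if (!key.startsWith(prefix)) continue;
    const rest = key.slice(prefix.length);
    const idx = rest.indexOf(delimiter);
    if (idx === -1) {
      objects.push(key); // sits directly under the prefix
    } else {
      // Deeper keys collapse into one "folder" entry.
      commonPrefixes.add(prefix + rest.slice(0, idx + delimiter.length));
    }
  }
  return { objects, commonPrefixes: [...commonPrefixes].sort() };
}

const keys = [
  "images/profile/user-1001/avatar.jpg",
  "images/banner.png",
  "readme.txt",
];
console.log(listWithDelimiter(keys, "images/"));
// { objects: [ 'images/banner.png' ], commonPrefixes: [ 'images/profile/' ] }
```

No folder object exists anywhere in this model; the "folder" is derived from key names at listing time.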

Key S3 capabilities:

Object versioning: keep multiple versions of the same key.
Lifecycle policies: automatically transition or delete objects by age.
Replication: cross-region or same-region replication.
Event notifications: trigger Lambda, SQS, or SNS on object events.
Static website hosting: serve HTML/CSS/JS files as a static site.
Presigned URLs: temporary access to private objects.

Q16. What are S3 storage classes and when do you use each?

S3 offers several storage classes optimized for different access patterns and cost profiles.

S3 Standard: the default. Low latency, high availability (99.99%). Best for frequently accessed data: active user uploads, application assets, frequently read datasets. Most expensive per GB.
S3 Intelligent-Tiering: automatically moves objects between frequent-access, infrequent-access, and archive tiers based on access patterns. Small monthly monitoring fee per object. Best when access patterns are unpredictable.
S3 Standard-IA (Infrequent Access): lower storage cost than Standard, but adds a per-GB retrieval fee. Minimum storage duration of 30 days. Best for data accessed once a month or less: backups, disaster recovery copies.
S3 One Zone-IA: same as Standard-IA but stored in one AZ only. Lower cost, but lower availability and no resilience if that AZ is lost. Best for secondary backup copies or data you can recreate.
S3 Glacier Instant Retrieval: archival storage, millisecond retrieval. Best for quarterly-accessed data (medical images, annual reports).
S3 Glacier Flexible Retrieval: archival storage, retrieval in minutes to hours. Best for backups accessed a few times per year.
S3 Glacier Deep Archive: the cheapest storage class. Retrieval within 12 hours (standard) or 48 hours (bulk). Best for regulatory long-term retention (7+ year compliance data).

bash

# Upload directly to a specific storage class
aws s3 cp myfile.zip s3://my-bucket/backups/ \
  --storage-class STANDARD_IA

# Configure lifecycle rule to transition to Glacier after 90 days
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-bucket \
  --lifecycle-configuration file://lifecycle.json
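The CLI command above reads its rules from `lifecycle.json`, which is not shown. As a sketch, the script below builds one plausible version of that document and prints it; the rule ID and the `backups/` prefix are illustrative choices, and `GLACIER` is the storage class value for Glacier Flexible Retrieval.

```javascript
// Builds a lifecycle configuration with a single transition rule.
const lifecycleConfiguration = {
  Rules: [
    {
      ID: "archive-old-backups",      // illustrative rule name
      Status: "Enabled",
      Filter: { Prefix: "backups/" }, // illustrative prefix
      Transitions: [{ Days: 90, StorageClass: "GLACIER" }],
    },
  ],
};

const lifecycleJson = JSON.stringify(lifecycleConfiguration, null, 2);
console.log(lifecycleJson); // save this output as lifecycle.json
```

Running it with `node build-lifecycle.js > lifecycle.json` (the script name is hypothetical) produces the file the command expects.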

Q17. What is S3 versioning and what problems does it solve?

When versioning is enabled on a bucket, S3 stores every version of every object rather than overwriting. Each PUT creates a new version with a unique version ID. Deleting a versioned object adds a delete marker (it is not truly deleted until you delete all versions).

Problems it solves:

Accidental overwrites: restore a previous version with one API call.
Accidental deletions: delete the delete marker to restore the object.
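Ransomware protection: an overwrite creates a new version instead of destroying the old one, so history survives unless the attacker can also delete specific versions (pair with MFA Delete to close that gap).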
Audit trail: full history of every object mutation.

bash

# Enable versioning
aws s3api put-bucket-versioning \
  --bucket my-bucket \
  --versioning-configuration Status=Enabled

# List all versions of an object
aws s3api list-object-versions \
  --bucket my-bucket \
  --prefix images/avatar.jpg

# Restore previous version (copy old version over current)
aws s3api copy-object \
  --bucket my-bucket \
  --copy-source "my-bucket/images/avatar.jpg?versionId=OLD_VERSION_ID" \
  --key images/avatar.jpg

Combined with lifecycle policies, you can expire old versions automatically to control costs (keep last 5 versions, delete older ones after 90 days).

MFA Delete: requires MFA authentication to permanently delete a versioned object. Extra protection against accidental or malicious permanent deletion.
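Delete markers are the part of versioning that candidates most often get wrong, so a small model helps. The class below is an in-memory sketch of a single key in a versioning-enabled bucket; the sequential `v1`, `v2` IDs are a simplification, since real version IDs are opaque strings.

```javascript
// In-memory model of one key in a versioning-enabled bucket.
class VersionedKey {
  constructor() {
    this.versions = []; // newest last
    this.nextId = 1;
  }
  put(body) {
    const versionId = `v${this.nextId++}`;
    this.versions.push({ versionId, body, deleteMarker: false });
    return versionId;
  }
  // A plain delete only adds a delete marker on top of the stack.
  delete() {
    const versionId = `v${this.nextId++}`;
    this.versions.push({ versionId, deleteMarker: true });
    return versionId;
  }
  // Deleting a specific version ID removes that entry permanently.
  deleteVersion(versionId) {
    this.versions = this.versions.filter((v) => v.versionId !== versionId);
  }
  // A read without a version ID sees only the newest entry.
  get() {
    const latest = this.versions[this.versions.length - 1];
    return !latest || latest.deleteMarker ? null : latest.body;
  }
}

const avatar = new VersionedKey();
avatar.put("first upload");
avatar.put("second upload");
const marker = avatar.delete();
console.log(avatar.get()); // null (the delete marker hides the object)
avatar.deleteVersion(marker);
console.log(avatar.get()); // second upload (removing the marker restores it)
```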

Q18. What are S3 presigned URLs and when do you use them?

A presigned URL is a time-limited, signed URL that grants temporary access to a specific S3 object to anyone who has the URL, without requiring AWS credentials. The URL encodes the identity of the signer, the target object, and the expiration time.

Use cases:

Allow users to download a private file without making the bucket public.
Allow users to upload directly to S3 from a browser, bypassing your server: a PUT presigned URL lets the browser PUT directly to S3 without proxying through your backend.

javascript

// Generate a presigned GET URL (download)
const { S3Client, GetObjectCommand, PutObjectCommand } = require("@aws-sdk/client-s3");
const { getSignedUrl } = require("@aws-sdk/s3-request-presigner");

const client = new S3Client({ region: "us-east-1" });

const url = await getSignedUrl(
  client,
  new GetObjectCommand({ Bucket: "my-bucket", Key: "reports/q1-2026.pdf" }),
  { expiresIn: 3600 } // expires in 1 hour
);
// URL is safe to send to the client, only works for 1 hour

// Generate a presigned PUT URL (upload)
const uploadUrl = await getSignedUrl(
  client,
  new PutObjectCommand({
    Bucket: "my-bucket",
    Key: `uploads/user-${userId}/avatar.jpg`,
    ContentType: "image/jpeg",
  }),
  { expiresIn: 300 } // 5 minutes to start the upload
);
// Client uses this URL to PUT the file directly to S3
// Your backend never touches the file data

The direct-upload pattern (presigned PUT) is the right way to handle large file uploads. It offloads all the bandwidth and processing from your server to S3.
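The client half of the direct-upload pattern is a plain HTTP PUT to the presigned URL. The function below is a minimal sketch of that step; `fetchImpl` is an injectable parameter added here so the function can be exercised without a network, and it defaults to the global `fetch` available in browsers and recent Node.js versions.

```javascript
// Uploads a file body to a presigned PUT URL.
async function uploadToPresignedUrl(uploadUrl, body, contentType, fetchImpl = fetch) {
  const response = await fetchImpl(uploadUrl, {
    method: "PUT",
    // Should match the ContentType used when the URL was signed.
    headers: { "Content-Type": contentType },
    body,
  });
  if (!response.ok) {
    throw new Error(`Upload failed with status ${response.status}`);
  }
  return true;
}
```

In a browser, `body` would typically be the `File` object from an input element, and no AWS credentials or SDK are needed on the client.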

Q19. How do S3 event notifications work with Lambda?

S3 can invoke a Lambda function when specific events happen on a bucket:

ObjectCreated (PUT, POST, COPY, CompleteMultipartUpload).
ObjectRemoved (DELETE).
ObjectRestore (from Glacier).
Replication events.

bash

# 1. Add Lambda permission to allow S3 to invoke it
aws lambda add-permission \
  --function-name image-processor \
  --principal s3.amazonaws.com \
  --statement-id s3-invoke \
  --action lambda:InvokeFunction \
  --source-arn arn:aws:s3:::my-upload-bucket \
  --source-account 123456789012

# 2. Configure the S3 event notification
aws s3api put-bucket-notification-configuration \
  --bucket my-upload-bucket \
  --notification-configuration '{
    "LambdaFunctionConfigurations": [{
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123:function:image-processor",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {
        "Key": { "FilterRules": [
          { "Name": "prefix", "Value": "uploads/" },
          { "Name": "suffix", "Value": ".jpg" }
        ]}
      }
    }]
  }'

Lambda receives an S3 event:

javascript

exports.handler = async (event) => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));
    console.log(`Processing: s3://${bucket}/${key}`);
    await generateThumbnail(bucket, key);
  }
};

Q20. What is the difference between an S3 bucket policy and an ACL?

S3 Access Control Lists (ACLs) are a legacy mechanism. They define basic read/write permissions per object or bucket for specific AWS accounts or predefined groups (all users, authenticated users). AWS recommends disabling ACLs and using bucket policies instead for all new buckets.

S3 Bucket Policy: a JSON IAM-style resource policy attached to the bucket. Supports fine-grained conditions, IP restrictions, MFA requirements, and cross-account access. More powerful and auditable than ACLs.

json

// Bucket policy: allow a specific IAM role to read, and deny any request not sent over TLS
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowReadFromAppRole",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789:role/app-role" },
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ]
    },
    {
      "Sid": "DenyInsecureTransport",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": ["arn:aws:s3:::my-bucket/*"],
      "Condition": {
        "Bool": { "aws:SecureTransport": "false" }
      }
    }
  ]
}

Block Public Access settings: a separate account-level and bucket-level setting that overrides ACLs and bucket policies to prevent any public access, even if a policy accidentally grants it. Enable this on all non-public buckets.
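How Allow and Deny statements combine is worth being able to explain: an explicit Deny always wins, an Allow is required to grant access, and anything unmatched is implicitly denied. The evaluator below is a heavily simplified sketch of that rule. Real policy evaluation has many more inputs (identity policies, wildcards, condition operators, Block Public Access), and the statement shape used here, including `onlyWhenInsecure`, is invented for the example.

```javascript
// Simplified policy evaluation: explicit Deny > Allow > implicit deny.
function evaluate(statements, request) {
  const matches = (s) =>
    (s.principal === "*" || s.principal === request.principal) &&
    (s.actions.includes("s3:*") || s.actions.includes(request.action)) &&
    // Stand-in for a condition on aws:SecureTransport being false.
    (!s.onlyWhenInsecure || !request.secureTransport);

  const matched = statements.filter(matches);
  if (matched.some((s) => s.effect === "Deny")) return "Deny";
  if (matched.some((s) => s.effect === "Allow")) return "Allow";
  return "ImplicitDeny";
}

const statements = [
  { effect: "Allow", principal: "app-role", actions: ["s3:GetObject", "s3:ListBucket"] },
  { effect: "Deny", principal: "*", actions: ["s3:*"], onlyWhenInsecure: true },
];

console.log(evaluate(statements, { principal: "app-role", action: "s3:GetObject", secureTransport: true }));  // Allow
console.log(evaluate(statements, { principal: "app-role", action: "s3:GetObject", secureTransport: false })); // Deny
console.log(evaluate(statements, { principal: "other-role", action: "s3:GetObject", secureTransport: true })); // ImplicitDeny
```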

Category 4: Docker (Q21-Q32)

Docker questions cover the full lifecycle: images, containers, networking, volumes, multi-stage builds, security, and orchestration. This is the largest category in the guide because Docker fluency underpins almost every microservices and CI/CD question that follows. Once you're comfortable with the concepts here, our NestJS interview questions guide shows how these containers typically run a Node.js backend in production.

Q21. What is Docker and how does it differ from a virtual machine?

Docker is a platform for packaging, distributing, and running applications in containers. A container is an isolated process running on the host OS, packaged with all its dependencies.

The key difference from a virtual machine:

VM: runs a full guest OS on top of a hypervisor. Each VM has its own kernel, virtual hardware, and OS installation. Boot time is typically tens of seconds to minutes, and overhead is often gigabytes of memory per VM.

Container: shares the host OS kernel. Only packages the application and its user-space dependencies. Startup typically takes well under a second to a few seconds, and overhead is megabytes.

Practical result: you can run dozens of containers on a machine where only 3 to 4 VMs would fit. The fast startup also makes containers ideal for scaling and CI/CD.

The trade-off: VMs provide stronger isolation (separate kernel). Containers share the host kernel, so a kernel exploit can potentially escape container isolation. In practice, production environments often run containers inside VMs (e.g., EC2 instances running Docker) to get both performance and isolation.

Q22. Explain Docker's architecture (client, daemon, containerd, registry).

Docker uses a client-server architecture.

Docker CLI (client): the command-line tool you interact with. Sends REST API requests to the Docker daemon via a Unix socket (/var/run/docker.sock).

Docker daemon (dockerd): the long-running background service. Manages images, containers, networks, and volumes. Delegates container lifecycle management to containerd.

containerd: an industry-standard container runtime (CNCF project). Handles the actual container lifecycle: pulling images, creating, starting, and stopping containers. Kubernetes also uses containerd directly.

runc: the low-level container runtime that containerd calls to actually spawn containers using Linux namespaces and cgroups.

Docker Registry: stores and distributes Docker images. Docker Hub is the default public registry. ECR (AWS), GCR (Google), and ACR (Azure) are popular managed alternatives. You can also self-host with Harbor or a private registry.

text

CLI --> dockerd (REST API) --> containerd --> runc --> container process
                               |
                               --> registry (pull/push images)

Q23. What is the difference between a Docker image and a container?

Image: a read-only, immutable template. Built from a Dockerfile. Consists of stacked layers, each representing a filesystem change. An image is like a class in OOP.

Container: a running (or stopped) instance of an image. Created with docker run. Gets its own writable layer on top of the image layers. All writes go into this ephemeral writable layer: when the container is deleted, the writes are lost unless you use volumes. A container is like an object (instance) of the class.

text

Image layers (read-only):
  Layer 3: COPY . /app
  Layer 2: RUN npm install
  Layer 1: FROM node:22-alpine

Container:
  Writable layer (container-specific changes)
  [Image layers below: shared, read-only]

Multiple containers can run from the same image simultaneously. They all share the same read-only image layers (saving disk space) but each has its own writable layer.
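
The sharing described above can be modeled in a few lines. This is a conceptual sketch only, not how Docker's storage drivers are implemented: it uses Python's `collections.ChainMap` to stand in for a union filesystem, and the file paths and contents are made up for illustration.

```python
from collections import ChainMap

# Read-only image layers, keyed by file path (contents are illustrative).
base_layer = {"/usr/bin/node": "node binary"}
deps_layer = {"/app/node_modules/express": "express package"}
source_layer = {"/app/server.js": "v1"}


def start_container():
    """Return a container view: a fresh writable layer over the shared image layers."""
    writable_layer = {}
    # Lookups check the writable layer first, then fall through to the image layers.
    return ChainMap(writable_layer, source_layer, deps_layer, base_layer)


container_a = start_container()
container_b = start_container()

# A write lands in container A's writable layer only.
container_a["/app/server.js"] = "patched in A"

print(container_a["/app/server.js"])   # patched in A
print(container_b["/app/server.js"])   # v1
print(source_layer["/app/server.js"])  # v1 (the image layer is untouched)
```

Both containers read the same three layer dictionaries, which is the disk-saving part; only the first mapping in each chain is private to one container.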

bash

# Image vs container commands
docker images              # list images
docker build -t myapp .    # build image from Dockerfile
docker rmi myapp           # remove image

docker ps                  # list running containers
docker ps -a               # list all containers (including stopped)
docker run myapp           # create and start a container from image
docker rm my-container     # remove stopped container

Q24. How do you write an efficient Dockerfile?

A good Dockerfile produces a small, fast-to-build, secure image.

dockerfile

# Use a specific version tag, never just "latest"
FROM node:22-alpine

# Set working directory
WORKDIR /app

# Copy dependency files FIRST (before source code)
# This way the npm install layer is cached as long as package.json doesn't change
COPY package.json package-lock.json ./

# Install production dependencies only
# (--omit=dev replaces the deprecated --only=production flag)
RUN npm ci --omit=dev

# Copy source code AFTER installing dependencies
COPY src/ ./src/

# Run as non-root user for security
USER node

# Document the port (does not actually expose, use docker run -p or compose)
EXPOSE 3000

# Use ENTRYPOINT + CMD pattern
# ENTRYPOINT: the executable
# CMD: default arguments (can be overridden at runtime)
ENTRYPOINT ["node"]
CMD ["src/server.js"]

Key best practices:

Order layers from least-to-most frequently changed (dependencies before source).
Use .dockerignore to exclude node_modules, .git, test files, and README.
Use alpine or distroless base images to minimize attack surface and size.
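Never split apt-get update and apt-get install across separate RUN commands: chain them with && in one instruction (and clean up in that same instruction) so a cached, stale package list is never reused and no unnecessary intermediate layers are created.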
Never store secrets in ENV variables or COPY .env files: use secrets at runtime instead.
Run as non-root with the USER directive or a dedicated user.

Q25. What is Docker layer caching and how does it affect build speed?

Each Dockerfile instruction is a build step that Docker can cache (RUN, COPY, and ADD produce filesystem layers; most other instructions only add metadata). On a rebuild, Docker reuses cached steps until it reaches one whose instruction or input files have changed, then re-executes that step and every step after it.

This means layer order matters for build speed.

dockerfile

# Two alternative Dockerfiles, shown together for comparison.
# Dockerfile comments must sit on their own line: a trailing "# ..." after COPY is parsed as arguments.

# BAD: any source change invalidates the npm ci cache
FROM node:22-alpine
WORKDIR /app
# Copies everything, including package.json AND source code
COPY . .
# Re-runs every time ANY file changes
RUN npm ci

# GOOD: separate dependency install from source code copy
FROM node:22-alpine
WORKDIR /app
# Only the package manifests
COPY package*.json ./
# Cached as long as package.json and package-lock.json are unchanged
RUN npm ci
# Source code: a cache miss here is cheap
COPY . .

In the GOOD example, if you change only a source file, Docker reuses the npm ci layer (which is slow) and only re-executes the COPY . . layer (fast).
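
To see why one changed step forces every later step to rebuild, here is a simplified model of the cache lookup. It is a sketch under stated assumptions, not Docker's actual algorithm: the `cache_key`, `plan_build`, and `steps_for` helpers are hypothetical, and file contents are represented by short digest strings.

```python
import hashlib


def cache_key(parent_key, instruction, input_digest=""):
    """Derive a step's cache key from its parent's key, its instruction, and its inputs."""
    payload = f"{parent_key}|{instruction}|{input_digest}".encode()
    return hashlib.sha256(payload).hexdigest()


def plan_build(steps, cache):
    """Return (instruction, 'cached' | 'rebuilt') for each step, updating the cache."""
    plan, parent = [], ""
    for instruction, input_digest in steps:
        key = cache_key(parent, instruction, input_digest)
        plan.append((instruction, "cached" if key in cache else "rebuilt"))
        cache.add(key)
        parent = key  # the next step's key depends on this one
    return plan


def steps_for(package_digest, source_digest):
    """The GOOD Dockerfile, with digests standing in for the copied files."""
    return [
        ("FROM node:22-alpine", ""),
        ("COPY package*.json ./", package_digest),
        ("RUN npm ci", ""),
        ("COPY . .", source_digest),
    ]


cache = set()
first = plan_build(steps_for("pkg-v1", "src-v1"), cache)   # cold cache
second = plan_build(steps_for("pkg-v1", "src-v2"), cache)  # only source changed

for instruction, status in second:
    print(f"{status:8} {instruction}")
```

Because each key includes its parent's key, changing the package digest would change the key of RUN npm ci as well, even though that instruction's text is identical.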

bash

# Build with no cache (force full rebuild)
docker build --no-cache -t myapp .

# See image layers and their sizes
docker history myapp
docker inspect myapp

Q26. What is Docker Compose and when do you use it?

Docker Compose defines and runs multi-container applications using a single YAML file. Instead of running docker network create, docker run, and manually linking containers, you describe all services in docker-compose.yml and bring everything up with one command.

yaml — docker-compose.yml

services:
  api:
    build: .
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=development
      - DATABASE_URL=postgres://user:password@db:5432/myapp
      - REDIS_URL=redis://cache:6379
    depends_on:
      db:
        condition: service_healthy
      cache:
        condition: service_started
    volumes:
      - ./src:/app/src  # mount source code for hot reload in dev

  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
      POSTGRES_DB: myapp
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d myapp"]
      interval: 5s
      timeout: 5s
      retries: 5

  cache:
    image: redis:7-alpine
    ports:
      - "6379:6379"

volumes:
  postgres_data:

bash

docker compose up -d         # start all services in background
docker compose logs -f api   # follow logs for the api service
docker compose down          # stop and remove containers and networks
docker compose down -v       # also remove named volumes

Docker Compose is ideal for local development and CI/CD test environments. In production, use Kubernetes or ECS for orchestration.

Q27. What Docker network drivers exist and when do you use each?

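Docker ships several built-in network drivers (bridge, host, overlay, macvlan, ipvlan, and none). The five below come up most often.

bridge: the default for containers on the same host. Creates a virtual bridge network. On a user-defined bridge, containers can reach each other by container name (automatic DNS); the default bridge network only supports communication by IP address. Use for local development multi-container setups.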

bash

# Create a custom bridge network (recommended over default bridge)
docker network create my-network
docker run --network my-network --name api myapp
docker run --network my-network --name db postgres
# api can reach db at hostname "db"

host: the container shares the host's network stack directly. No NAT, no port mapping needed. Best performance, no network isolation. Use for performance-sensitive workloads where networking overhead matters.
overlay: spans multiple Docker hosts. Required for Docker Swarm multi-host communication. Containers on different machines communicate as if on the same network. Used in Docker Swarm clusters.
macvlan: gives the container its own MAC address on the physical network, so it appears as a separate physical device. Used for legacy applications that expect to be on the physical network.
none: complete network isolation. The container has only a loopback interface, with no external communication.

Q28. What is the difference between Docker volumes and bind mounts?

Volumes: managed by Docker, stored in Docker's storage area (/var/lib/docker/volumes/). Created explicitly or automatically. Portable, shareable between containers, and can be backed up with docker volume commands. Best for production persistent data.

Bind mounts: mount a specific host directory or file into the container. With --mount the host path must already exist; with -v Docker creates a missing path as an empty directory. Commonly used in development to mount source code for hot-reloading.

bash

# Volume (Docker-managed)
docker run -v postgres_data:/var/lib/postgresql/data postgres:16
docker volume ls
docker volume inspect postgres_data

# Bind mount (host directory mounted)
docker run -v $(pwd)/src:/app/src myapp  # source code hot reload
# Or the more explicit --mount syntax:
docker run --mount type=bind,source=$(pwd)/src,target=/app/src myapp

tmpfs mount: stored in host memory only, never written to disk. For temporary sensitive data (secrets, temp files) that must not persist.

Production rule: use volumes for databases and persistent state. Use bind mounts only in development for code mounting. Never bind-mount secrets.

Q29. What is a multi-stage Docker build and why is it important?

A multi-stage build uses multiple FROM instructions in a single Dockerfile. Each stage can use a different base image. Only the final stage becomes the shipped image; earlier stages are discarded.

This solves the "fat build image" problem: build tools (compilers, package managers, test frameworks) needed to build the app are not needed to run it.

dockerfile

# Stage 1: build
FROM node:22 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci                 # includes devDependencies
COPY . .
RUN npm run build          # compile TypeScript, bundle assets
RUN npm test               # run tests during build

# Stage 2: production image
FROM node:22-alpine AS production
WORKDIR /app

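# Only copy what is needed to RUN the app
COPY package*.json ./
# Production dependencies only (--omit=dev replaces the deprecated --only=production)
RUN npm ci --omit=dev
# Compiled output only (comments go on their own line: COPY does not allow trailing comments)
COPY --from=builder /app/dist ./dist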

USER node
EXPOSE 3000
CMD ["node", "dist/server.js"]

The production image contains only the Alpine Node.js runtime, production dependencies, and compiled output. The node:22 build environment (several times larger) is discarded.

Result: a production image might be 80MB instead of 800MB. Smaller images mean faster push and pull times, a smaller attack surface, and lower ECR storage costs.

Q30. How do you debug a crashing Docker container?

bash

# Step 1: Check container status and exit code
docker ps -a
# Exit code 0: clean exit | 1: error | 137: SIGKILL (128+9, often an OOM kill) | 143: SIGTERM (128+15)

# Step 2: Check logs
docker logs my-container
docker logs --tail 100 my-container   # last 100 lines
docker logs -f my-container           # follow (stream)

# Step 3: Inspect container config
docker inspect my-container
# Look for: environment variables, mount points, network config, exit code

# Step 4: Shell into a running container
docker exec -it my-container sh    # or bash if available
docker exec -it my-container env   # print environment variables

# Step 5: Start the container with shell override (if it crashes on start)
docker run -it --entrypoint sh my-image
# Manually run the start command to see errors

# Step 6: Check resource usage
docker stats my-container          # CPU, memory, network I/O

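# Step 7: For a suspected OOM kill (exit 137)
docker inspect my-container --format='{{.State.OOMKilled}}'    # true if the kernel OOM killer ended it
docker inspect my-container --format='{{.HostConfig.Memory}}'  # memory limit in bytes (0 = unlimited)
# Increase memory limit in run command or compose file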

Common causes of container crashes:

Application crash at startup, caused by a misconfigured env var or missing dependency.
OOM kill (exit 137): increase the memory limit, or fix the leak if usage keeps growing.
Port already in use: check what is bound to the host port.
Volume mount issue: the path does not exist, or there's a permissions problem.
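Health check failure: the container exceeds the configured retry threshold and is marked unhealthy. Docker alone does not stop it, but an orchestrator such as Swarm or ECS will replace it.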
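
Exit codes above 128 follow the shell convention of 128 plus the signal number, which is where 137 and 143 come from. The helper below is a small illustrative sketch (the function name is an assumption, and it relies on POSIX signal numbering, so run it on Linux or macOS):

```python
import signal


def describe_exit_code(code):
    """Explain a container exit code using the shell convention 128 + signal number."""
    if code == 0:
        return "clean exit"
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return "application error"


for code in (0, 1, 137, 143):
    print(code, describe_exit_code(code))
```

Note that 137 only tells you the process received SIGKILL. Confirm an out-of-memory kill with the OOMKilled field in docker inspect before raising limits.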

Q31. What are Docker container security best practices?

Run as non-root:

dockerfile

# Create a user in the Dockerfile
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser

Use minimal base images:

dockerfile

# Distroless: no shell, no package manager, minimal attack surface
FROM gcr.io/distroless/nodejs22-debian12

Scan images for vulnerabilities:

bash

docker scout cves myapp:latest  # Docker Scout
trivy image myapp:latest       # Trivy (free, widely used)

Use a read-only filesystem:

bash

docker run --read-only myapp
# Application can still write to explicitly defined tmpfs mounts

Drop capabilities:

bash

docker run --cap-drop ALL --cap-add NET_BIND_SERVICE myapp

Never run in privileged mode in production:

bash

# Bad: gives container root-level host access
docker run --privileged myapp

# Good: only specific needed capability
docker run --cap-add SYS_PTRACE myapp

Limit resources:

bash

docker run --memory 512m --cpus 0.5 myapp

Sign and verify images in production pipelines, for example with Docker Content Trust or a newer tool such as Sigstore cosign.

Q32. What is the difference between Docker Swarm and Kubernetes?

Both are container orchestration platforms that manage clusters of containers across multiple hosts.

Docker Swarm: Docker's native clustering tool. Simple to set up and operate. Built into Docker Engine. Uses docker-compose.yml (stack files) for service definitions. Suitable for smaller deployments and teams migrating from Compose.

Kubernetes: the industry-standard container orchestration platform (CNCF). More complex but far more powerful. Richer ecosystem (Helm, service meshes, operators). Supported by every major cloud provider as a managed service (EKS, GKE, AKS). The standard choice for production at scale.

Key differences:

Setup: Swarm initializes in minutes; Kubernetes requires significant configuration (or use a managed service).
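Scaling: both scale services horizontally, but Swarm has no built-in autoscaler (you change replica counts yourself or add external tooling), while Kubernetes offers HPA, VPA, and add-ons such as KEDA.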
Networking: Kubernetes has a richer networking model (Ingress, NetworkPolicy, CNI plugins).
Storage: Kubernetes has more storage options (PersistentVolumes, StorageClasses).
Ecosystem: Kubernetes has a vastly larger ecosystem and community.

Category 5: Microservices (Q33-Q42)

Microservices questions test whether you understand the distributed systems trade-offs, not just the buzzwords. Expect questions about communication patterns, failure handling, and data consistency across service boundaries.

Q33. What is a microservices architecture and how does it differ from monolith?

A monolith is a single deployable unit. All features, business logic, and data access live in one codebase and process. Simple to develop initially but grows harder to scale, deploy, and maintain as teams and complexity grow.

Microservices architecture decomposes an application into small, independently deployable services. Each service owns its domain, its data store, and its deployment lifecycle. Services communicate over the network via APIs or message queues.

Benefits:

Independent deployment: deploy the payment service without redeploying orders.
Independent scaling: scale the image processing service 10x without scaling auth.
Technology flexibility: each service can use the right language and database.
Team autonomy: each team owns and operates their service end to end.
Fault isolation: a crash in the notification service does not take down checkout.

Drawbacks:

Distributed system complexity: network failures, latency, partial failures.
Data consistency: no simple ACID transaction across service boundaries.
Operational overhead: more services means more things to monitor, deploy, and scale.
Debugging is harder: distributed tracing required across service boundaries.

When to start with a monolith: early-stage product, small team, unknown domain boundaries. Prematurely decomposing into microservices creates distributed system problems before you have the organizational scale to benefit.

Q34. How do microservices communicate with each other?

Two communication patterns: synchronous (request-response) and asynchronous (message-based).

Synchronous: the caller waits for a response.

REST over HTTP/HTTPS: simple, widely understood, human-readable. Uses HTTP methods (GET, POST, PUT, DELETE). Good for CRUD operations and request-response where the caller needs an immediate answer.
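gRPC: binary protocol using Protocol Buffers over HTTP/2. Usually faster than JSON over REST because payloads are smaller and cheaper to parse, though the gain depends heavily on the workload. Strongly typed contracts via .proto files, bidirectional streaming. Good for high-frequency inter-service calls, streaming data, and polyglot environments.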
GraphQL: flexible query language. The client specifies exact data needed. Good for API aggregation layers and mobile clients with varied data needs.

Asynchronous: the caller publishes a message and continues.

Message queues (SQS, RabbitMQ): point-to-point. Any number of producers can enqueue, but each message is processed by a single consumer (consumers compete for messages). Good for task queues, work distribution, and decoupling producer from consumer.
Event streaming (Kafka, Kinesis): producers append to a retained log, and many independent consumers read it at their own pace. Events are retained and replayable. Good for event sourcing, real-time analytics, and decoupled event-driven architectures.
Publish/Subscribe (SNS, Redis Pub/Sub): publisher sends to a topic, multiple subscribers receive. Good for broadcasting events to multiple services.

The general rule: use synchronous communication when you need an immediate response. Use async messaging for operations where eventual processing is acceptable (order placed event, then email notification, then inventory update).
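
The difference between a point-to-point queue and publish/subscribe is easiest to see side by side. The two classes below are in-memory toys with hypothetical names, not a client for SQS, RabbitMQ, or SNS:

```python
from collections import deque


class WorkQueue:
    """Point-to-point: each message is handed to exactly one consumer."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Whoever calls receive() first takes the message; nobody else sees it.
        return self._messages.popleft() if self._messages else None


class Topic:
    """Publish/subscribe: every subscriber gets its own copy of each message."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, message):
        for handler in self._subscribers:
            handler(message)


queue = WorkQueue()
queue.send("resize-image-1")
worker_a_got = queue.receive()
worker_b_got = queue.receive()

topic = Topic()
email_inbox, inventory_inbox = [], []
topic.subscribe(email_inbox.append)
topic.subscribe(inventory_inbox.append)
topic.publish("OrderPlaced")

print(worker_a_got, worker_b_got)   # resize-image-1 None
print(email_inbox, inventory_inbox)  # ['OrderPlaced'] ['OrderPlaced']
```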

Q35. What is service discovery and why does it matter in microservices?

Service discovery is the mechanism by which microservices find each other's network locations. In a static environment, you could hardcode IP addresses. In a dynamic cloud environment where services scale up and down and containers get new IPs constantly, hardcoding is unworkable.

Two patterns:

Client-side discovery: the calling service queries a service registry (Consul, Eureka, etcd) to get the current list of healthy instances for the target service. The client performs its own load balancing.

Server-side discovery: the client sends a request to a load balancer (AWS ALB, Kubernetes Service). The load balancer queries the registry and routes to a healthy instance. The client does not need to know about discovery.
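
Client-side discovery can be sketched as a registry lookup followed by a load-balancing choice made in the caller. The `ServiceRegistry` and `RoundRobinClient` classes below are hypothetical in-memory stand-ins for something like Consul or Eureka, and the addresses are made up:

```python
import itertools


class ServiceRegistry:
    """Tracks instances per service together with a health flag."""

    def __init__(self):
        self._instances = {}

    def register(self, service, address):
        self._instances.setdefault(service, {})[address] = True

    def mark_unhealthy(self, service, address):
        self._instances[service][address] = False

    def healthy_instances(self, service):
        entries = self._instances.get(service, {})
        return [address for address, healthy in entries.items() if healthy]


class RoundRobinClient:
    """The caller queries the registry and does its own load balancing."""

    def __init__(self, registry):
        self.registry = registry
        self._counter = itertools.count()

    def pick(self, service):
        instances = self.registry.healthy_instances(service)
        if not instances:
            raise LookupError(f"no healthy instances for {service}")
        return instances[next(self._counter) % len(instances)]


registry = ServiceRegistry()
for address in ("10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"):
    registry.register("payment-service", address)

client = RoundRobinClient(registry)
first_round = [client.pick("payment-service") for _ in range(3)]

registry.mark_unhealthy("payment-service", "10.0.0.2:8080")
after_failure = [client.pick("payment-service") for _ in range(4)]
```

With server-side discovery, the same lookup and choice happen inside the load balancer, so the calling service only needs one stable address.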

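Kubernetes handles service discovery automatically: every Service gets a DNS name, served by the cluster DNS add-on (CoreDNS by default), that resolves to the Service's stable virtual IP. kube-proxy (or the CNI data plane) then forwards traffic to a ready pod. The calling service uses http://payment-service/charge and Kubernetes routes it.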

AWS App Mesh and Consul Connect are service mesh solutions that add service discovery plus mTLS, circuit breaking, and observability through sidecar proxies, without changing application code.

Q36. What is the Circuit Breaker pattern and how does it work?

The Circuit Breaker prevents a single failing service from causing cascading failures across an entire system. It monitors calls to a downstream service and "trips" when the failure rate exceeds a threshold.

Three states:

Closed (normal): requests flow through. Success and failure rates are tracked.
Open (tripped): failure threshold exceeded. All requests fail immediately without contacting the downstream service. Returns cached data or a fallback response. Gives the failing service time to recover.
Half-Open (recovery check): after a timeout, a limited number of test requests are allowed through. If they succeed, the circuit closes. If they fail, it opens again.

javascript

// Example using opossum (Node.js circuit breaker library)
const CircuitBreaker = require("opossum");

// Placeholder downstream call: replace the body with your real HTTP client call
async function callPaymentService(paymentData) {
  throw new Error("payment service unavailable");
}

const logger = console; // stand-in for your application logger

const options = {
  timeout: 3000,                  // fail if takes longer than 3s
  errorThresholdPercentage: 50,   // open if more than 50% fail
  resetTimeout: 30000,            // try again after 30s
};

const breaker = new CircuitBreaker(callPaymentService, options);

breaker.fallback(() => ({ status: "deferred", message: "Payment queued for retry" }));

breaker.on("open", () => logger.warn("Payment service circuit OPEN"));
breaker.on("halfOpen", () => logger.info("Payment service circuit HALF-OPEN"));
breaker.on("close", () => logger.info("Payment service circuit CLOSED"));

// Usage (wrapped in an async function: CommonJS modules have no top-level await)
async function charge(paymentData) {
  return breaker.fire(paymentData);
}
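
If an interviewer asks you to whiteboard the three states, a minimal state machine is enough. The sketch below is an illustration, not how opossum is implemented: it trips on a count of consecutive failures rather than a failure percentage, and the class, parameter names, and injectable clock are assumptions made for the example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the downstream service."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so tests can control time
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, func, *args):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open")
            self.state = "half_open"  # timeout elapsed: allow one trial call
        try:
            result = func(*args)
        except Exception:
            self._on_failure()
            raise
        # Any success closes the circuit and resets the failure count.
        self.state = "closed"
        self.failures = 0
        return result

    def _on_failure(self):
        self.failures += 1
        # A failed trial call, or too many consecutive failures, opens the circuit.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```

A caller would wrap each downstream request in `breaker.call(...)` and catch `CircuitOpenError` to return a fallback response.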

Q37. What is the Saga pattern and when do you use it?

The Saga pattern manages distributed transactions across multiple microservices without two-phase commit (2PC). Since each service owns its own database, you cannot do a traditional SQL transaction across them. The Saga breaks the distributed transaction into a series of local transactions, each with a compensating transaction for rollback.

Two Saga implementations:

Choreography: each service publishes events. Downstream services listen and react. No central coordinator.

text

OrderService:  "OrderCreated" event -->
PaymentService: charges card, publishes "PaymentCompleted" or "PaymentFailed"
InventoryService: listens for PaymentCompleted, reserves stock
NotificationService: listens for both, sends email
If payment fails: OrderService listens for PaymentFailed, cancels the order

Orchestration: a central Saga orchestrator sends commands to each service and waits for responses. Step Functions (AWS) is a managed orchestrator.

text

Orchestrator:  Command "ChargeCard" --> PaymentService
PaymentService: "PaymentCompleted" --> Orchestrator
Orchestrator:  Command "ReserveStock" --> InventoryService
InventoryService: "StockInsufficient" --> Orchestrator
Orchestrator:  Command "RefundCard" --> PaymentService (compensating transaction)

Choreography is simpler for small flows. Orchestration is easier to reason about and debug for complex multi-step workflows.
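
The orchestration variant boils down to "run steps in order, and on failure undo the completed ones in reverse". Here is a minimal in-process sketch of that idea; the `run_saga` function and the step functions are hypothetical, and a real orchestrator would also persist progress and retry failed compensations.

```python
def run_saga(steps):
    """Run (name, action, compensation) steps in order; undo completed steps on failure."""
    completed, log = [], []
    for name, action, compensation in steps:
        try:
            action()
        except Exception as error:
            log.append(f"failed: {name} ({error})")
            # Compensate in reverse order, most recent step first.
            for done_name, compensate in reversed(completed):
                compensate()
                log.append(f"compensated: {done_name}")
            return False, log
        log.append(f"done: {name}")
        completed.append((name, compensation))
    return True, log


ledger = {"charged": False}


def charge_card():
    ledger["charged"] = True


def refund_card():
    ledger["charged"] = False


def reserve_stock():
    raise RuntimeError("insufficient stock")


def release_stock():
    pass


ok, log = run_saga([
    ("charge card", charge_card, refund_card),
    ("reserve stock", reserve_stock, release_stock),
])
for line in log:
    print(line)
```

The failed step itself is not compensated, only the steps that completed before it, which mirrors the RefundCard command in the orchestration flow above.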

Q38. What is event sourcing?

Event sourcing is a pattern where instead of storing the current state of an entity, you store the sequence of events that led to that state. The current state is derived by replaying all events.

text

Traditional: users table has row { id: 1, email: "bob@new.com", balance: 850 }

Event sourcing: events table has:
  { eventId: 1, type: "UserCreated", data: { email: "bob@old.com" } }
  { eventId: 2, type: "EmailChanged", data: { email: "bob@new.com" } }
  { eventId: 3, type: "Deposit",       data: { amount: 1000 } }
  { eventId: 4, type: "Withdrawal",    data: { amount: 150 } }

Current state = replay events 1-4: { email: "bob@new.com", balance: 850 }

Benefits:

Complete audit trail by default.
Time travel: reconstruct state at any point in history.
Event replay: rebuild read models, fix bugs by replaying events.
Natural fit for event-driven architectures.

Drawbacks:

Querying current state requires replaying events (use projections or read models).
Event schema changes require migration strategies.
Increased complexity for simple CRUD use cases.

Event sourcing is commonly paired with CQRS (Command Query Responsibility Segregation), where writes go through events and reads come from optimized read models (materialized projections).
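
Replaying the four events from the example above takes only a small fold over the event list. This sketch assumes a new user starts with a balance of 0, which the example implies but does not state; the function names are illustrative.

```python
def apply_event(state, event):
    """Return the new state produced by applying one event to the previous state."""
    kind, data = event["type"], event["data"]
    if kind == "UserCreated":
        return {"email": data["email"], "balance": 0}  # assumed starting balance
    if kind == "EmailChanged":
        return {**state, "email": data["email"]}
    if kind == "Deposit":
        return {**state, "balance": state["balance"] + data["amount"]}
    if kind == "Withdrawal":
        return {**state, "balance": state["balance"] - data["amount"]}
    raise ValueError(f"unknown event type: {kind}")


def replay(events, upto=None):
    """Rebuild state from events, optionally stopping after event ID `upto`."""
    state = {}
    for event in events:
        if upto is not None and event["eventId"] > upto:
            break
        state = apply_event(state, event)
    return state


events = [
    {"eventId": 1, "type": "UserCreated", "data": {"email": "bob@old.com"}},
    {"eventId": 2, "type": "EmailChanged", "data": {"email": "bob@new.com"}},
    {"eventId": 3, "type": "Deposit", "data": {"amount": 1000}},
    {"eventId": 4, "type": "Withdrawal", "data": {"amount": 150}},
]

print(replay(events))          # {'email': 'bob@new.com', 'balance': 850}
print(replay(events, upto=3))  # {'email': 'bob@new.com', 'balance': 1000}
```

The `upto` argument is the "time travel" benefit in miniature. A projection or read model is the same fold, run continuously and stored so queries do not need to replay.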

Q39. What is distributed tracing and which tools implement it?

Distributed tracing tracks a single request as it flows through multiple microservices. Each service adds a trace span with timing information. You can see the full request lifecycle and identify where latency lives.

Without distributed tracing, debugging a slow request in a 20-service architecture is nearly impossible: the log is split across 20 services with no shared request ID.

Key concepts:

Trace: the complete journey of one request through all services.
Span: a single unit of work within a trace (one service call).
Trace ID: a unique ID propagated in request headers on every hop (for example the W3C traceparent header) so spans from different services can be linked.
Parent Span ID: links child spans to their parent.
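
To make the propagation concrete, the sketch below builds and forwards a header in the W3C Trace Context `traceparent` format (`version-traceid-spanid-flags`). In practice the OpenTelemetry SDK does this for you; the two helper functions are hypothetical and exist only to show what travels between services.

```python
import secrets


def new_traceparent():
    """Start a new trace: fresh 16-byte trace ID and 8-byte span ID, sampled flag set."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    return f"00-{trace_id}-{span_id}-01"


def child_traceparent(incoming):
    """Continue a trace: keep the trace ID, replace the span ID with this service's span."""
    version, trace_id, _parent_span_id, flags = incoming.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


gateway_header = new_traceparent()
order_service_header = child_traceparent(gateway_header)

print(gateway_header)
print(order_service_header)
```

Both headers share the same trace ID, which is what lets a tracing backend stitch spans from different services into one trace. The span ID from the incoming header becomes the parent span ID of the span the receiving service creates.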

OpenTelemetry is the vendor-neutral standard for generating traces, metrics, and logs. Instrument your service once, export to any backend.

javascript

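// OpenTelemetry instrumentation (Node.js)
const { NodeSDK } = require("@opentelemetry/sdk-node");
const { OTLPTraceExporter } = require("@opentelemetry/exporter-trace-otlp-http");
const { getNodeAutoInstrumentations } = require("@opentelemetry/auto-instrumentations-node");

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({ url: "http://collector:4318/v1/traces" }),
  // Without this, the SDK starts but no libraries are instrumented
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();
// Load this file before the rest of the app so that supported libraries
// (HTTP, popular database clients, and others) are auto-instrumented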

Popular backends: Jaeger (open source), Zipkin (open source), AWS X-Ray (native AWS), Datadog APM, Honeycomb, and Grafana Tempo.

Q40. What is a service mesh and when do you need one?

A service mesh is an infrastructure layer that manages network communication between microservices. It typically runs as sidecar proxies (one per service pod) that intercept and handle all inbound and outbound traffic without changing application code, although some meshes now also offer sidecar-less modes.

Capabilities a service mesh provides:

mTLS between all services (automatic encryption and mutual authentication).
Circuit breaking and retry policies.
Traffic splitting (canary deployments, A/B testing).
Observability (automatic metrics and traces for all service-to-service calls).
Service discovery.

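Popular service meshes: Istio, Linkerd, Consul Connect, and AWS App Mesh (check the current support status before adopting App Mesh, since AWS has announced its end of support).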

When you need a service mesh:

Zero-trust security: every service-to-service call must be encrypted and authenticated, and you cannot implement this in every service's code.
Observability across 20+ services without adding SDK code everywhere.
Advanced traffic management (canaries, dark launches) at the platform level.

When you probably do NOT need it:

Fewer than 10 services.
The team does not have Kubernetes expertise to operate Istio or Linkerd.
The complexity of operating the mesh exceeds the benefit.

Start without a service mesh. Add it when you have a clear, specific pain point, usually around security or observability at scale.

Q41. What is the 12-factor app methodology?

The 12-factor app is a methodology for building cloud-native, scalable, maintainable software-as-a-service applications. Relevant factors for microservices interviews:

III. Config: store config in environment variables, not in code or config files checked into source control. No hardcoded URLs, credentials, or environment-specific values.
IV. Backing services: treat databases, queues, and caches as attached resources accessed via URL. Swapping a local Postgres for a managed RDS instance should require only a config change.
VI. Processes: execute the app as one or more stateless processes. Shared state lives in a backing service (Redis, database), not in process memory. This makes horizontal scaling trivial.
VII. Port binding: export services via port binding. The service is self-contained and does not rely on a web server being injected at runtime. Works naturally with Docker, where the container listens on a port and the platform maps it.
IX. Disposability: maximize robustness with fast startup and graceful shutdown. Handle SIGTERM, drain in-flight requests, release resources. Enables zero-downtime deployments and auto-scaling.
XI. Logs: treat logs as event streams. Write to stdout. Let the platform (Docker, Kubernetes, CloudWatch) collect and route them.
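
Factor III is the one interviewers most often ask you to demonstrate. A minimal sketch, assuming hypothetical variable names (`DATABASE_URL`, `PORT`, `LOG_LEVEL`) and an illustrative default port:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    database_url: str
    port: int
    log_level: str


def load_config(env=os.environ):
    """Build config from environment variables; fail fast if a required one is missing."""
    return Config(
        database_url=env["DATABASE_URL"],     # required: raises KeyError if absent
        port=int(env.get("PORT", "3000")),    # optional, with a default
        log_level=env.get("LOG_LEVEL", "info"),
    )


# Passing a plain dict keeps the example independent of the real environment.
config = load_config({"DATABASE_URL": "postgres://user:password@db:5432/myapp"})
print(config.port, config.log_level)  # 3000 info
```

Swapping the local database for a managed one (factor IV) then means changing `DATABASE_URL` in the deployment environment, with no code change.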

Q42. How do you handle authentication and authorization across microservices?

Two patterns dominate.

Centralized authentication with token propagation: one auth service issues JWTs. Each downstream service validates the JWT independently, with no network call to the auth service per request, just local signature verification.

text

Client -> API Gateway -> (validates JWT) -> Order Service (validates JWT) -> Payment Service
                                            [extracts userId from JWT claims]

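A common variant: the API Gateway or service mesh validates the token once at the edge, and services trust the validated identity propagated in headers (X-User-ID, X-User-Roles). This is only safe when services cannot be reached except through the gateway, because anyone who can call a service directly can forge those headers.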

Service-to-service auth: services calling each other must also authenticate.

Short-lived service JWTs signed with service-specific keys.
Mutual TLS (mTLS) via a service mesh, with no application code changes.
AWS IAM roles and SigV4 signing for AWS-native architectures (Lambda calling Lambda, ECS service calling DynamoDB).

Authorization: each service enforces its own authorization rules based on the user identity in the token. Central policy enforcement (Open Policy Agent, AWS IAM) handles complex permission models.

The key principle: never pass usernames and passwords between services. Use short-lived tokens. Rotate signing keys regularly.

Category 6: Azure Entra ID (Q43-Q50)

Azure Entra ID questions test identity and access management knowledge for enterprise Microsoft environments. Expect questions on OAuth flows, managed identities, and the policy engine that controls conditional access.

Q43. What is Microsoft Azure Entra ID (formerly Azure Active Directory)?

Microsoft Entra ID (renamed from Azure Active Directory in 2023, and still often called Azure Entra ID) is Microsoft's cloud-based Identity and Access Management (IAM) service. It is the identity backbone for Microsoft 365, Azure resources, and any third-party application registered with a tenant.

Core functions:

Authentication: verify who a user or application is (login).
Authorization: control what an authenticated identity can access.
Single Sign-On (SSO): users log in once and access many applications.
Multi-Factor Authentication (MFA): an extra verification step.
Conditional Access: a policy engine that controls access based on context, such as location, device compliance, and risk level.
Application management: register apps and define their permissions.

Entra ID is not the same as Windows Server Active Directory (AD DS). AD DS is an on-premises directory service using LDAP and Kerberos. Entra ID is cloud-native, uses OAuth 2.0 and OpenID Connect, and manages cloud identities. Microsoft Entra Connect (formerly Azure AD Connect) synchronizes on-premises AD users to Entra ID for hybrid environments.

Q44. What is the difference between an App Registration and a Service Principal?

App Registration: the global definition of an application in Entra ID. You create one App Registration in the home tenant. It defines the app's identity, redirect URIs, the API permissions it requests, and its certificate or secret credentials. Think of it as the blueprint.

Service Principal: the local instance of the App Registration within a specific tenant. When an app is registered through the Azure portal, or when a multi-tenant app is consented to in another tenant, a Service Principal is automatically created in that tenant. Apps created through the Azure CLI or Microsoft Graph need their Service Principal created explicitly. It carries the actual permissions granted to the app in that tenant.

For a single-tenant app, one App Registration creates one Service Principal in the same tenant. For a multi-tenant app, one App Registration creates N Service Principals: one per tenant that installs or consents to the app.

bash

# Create an app registration via Azure CLI
az ad app create \
  --display-name "my-backend-api" \
  --sign-in-audience AzureADMyOrg

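# Create the Service Principal (az ad app create does not create one automatically)
az ad sp create --id <app-id>

# Get the app's Service Principal object ID
az ad sp show --id <app-id>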

# Create a client secret for the app
az ad app credential reset \
  --id <app-id> \
  --append \
  --display-name "ci-cd-secret"

Q45. What are the OAuth 2.0 flows supported by Entra ID and when do you use each?

Entra ID supports several OAuth 2.0 and OpenID Connect flows. Picking the right one depends on whether a user is present and whether the client can keep a secret.

Authorization Code Flow: the standard flow for web apps where a user interactively logs in. The browser redirects to Entra ID, the user authenticates, Entra returns an authorization code, and the backend exchanges the code for tokens. Use for web applications where a human is present.
Authorization Code + PKCE: an extension of Authorization Code for public clients, such as single-page apps and mobile apps, that cannot securely store a client secret. PKCE (Proof Key for Code Exchange) replaces the client secret. Use for SPAs (React, Angular), mobile apps, and desktop apps.
Client Credentials Flow: the application authenticates directly with Entra ID using its own credentials (client ID plus secret or certificate). No user is involved, and the flow returns an access token for the application itself. Use for daemon processes, background services, CI/CD pipelines, and service-to-service calls.
On-Behalf-Of (OBO): a middle-tier API receives a user token and exchanges it for a token scoped to a downstream API, preserving the user's identity. Use for API-to-API calls where the user's identity must propagate downstream.
Device Code Flow: for devices with no browser or limited input capability. The device displays a code, and the user enters it on another device to approve. Use for CLI tools, IoT devices, and TV apps.

python

# Client credentials flow (Python MSAL)
import msal

app = msal.ConfidentialClientApplication(
    client_id="<app-id>",
    client_credential="<client-secret>",
    authority="https://login.microsoftonline.com/<tenant-id>"
)

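result = app.acquire_token_for_client(
    scopes=["https://graph.microsoft.com/.default"]
)
# MSAL returns a dict: "access_token" on success, "error" details on failure
if "access_token" not in result:
    raise RuntimeError(f"Token request failed: {result.get('error_description')}")
access_token = result["access_token"]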
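
For the Authorization Code + PKCE flow, the part worth being able to explain is how the verifier and challenge relate. The sketch below follows the S256 method from RFC 7636 using only the standard library. MSAL generates these values for you, so the helper names here are purely illustrative.

```python
import base64
import hashlib
import secrets


def _s256(verifier):
    """BASE64URL(SHA256(verifier)) without padding, as the S256 method requires."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair():
    """Client side: create a random verifier and the challenge sent with the auth request."""
    verifier = secrets.token_urlsafe(64)  # 86 characters, within the allowed 43 to 128
    return verifier, _s256(verifier)


def verify_pkce(verifier, challenge):
    """Server side: at token exchange, recompute the challenge from the presented verifier."""
    return secrets.compare_digest(_s256(verifier), challenge)


verifier, challenge = make_pkce_pair()
print(len(verifier), len(challenge))
print(verify_pkce(verifier, challenge))
```

The client sends only the challenge in the initial redirect and reveals the verifier when exchanging the code, so an attacker who intercepts the authorization code cannot redeem it.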

Q46. What are Managed Identities in Azure and why are they preferred?

A Managed Identity is a Service Principal whose credentials are automatically managed by Azure. You never create, store, or rotate a client secret or certificate: Azure handles the key lifecycle entirely.

There are two types:

System-assigned managed identity: tied to a specific Azure resource, such as a VM, App Service, or Azure Function. Created with the resource and deleted with the resource, in a one-to-one relationship.
User-assigned managed identity: created as a standalone Azure resource. Can be assigned to multiple Azure resources and has an independent lifecycle.

bash

# Enable system-assigned managed identity on an App Service
az webapp identity assign \
  --resource-group my-rg \
  --name my-api

# Grant that identity permission to read from Key Vault
az keyvault set-policy \
  --name my-keyvault \
  --object-id <managed-identity-principal-id> \
  --secret-permissions get list

python

# In application code, no credentials needed.
# The Azure SDK automatically acquires tokens using the managed identity.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

credential = DefaultAzureCredential()
client = SecretClient(vault_url="https://my-keyvault.vault.azure.net/", credential=credential)
secret = client.get_secret("database-password")

Why managed identities are preferred over client secrets:

No secret to rotate, store, or accidentally leak to source control.
Reduced attack surface, since there are no static credentials that can be stolen.
Automatic credential rotation by Azure.
Works seamlessly with Key Vault, Storage, SQL, Service Bus, and most Azure services.
Full audit trail via Entra ID sign-in logs.

Q47. What is Conditional Access in Entra ID?

Conditional Access is Entra ID's policy engine for access control decisions based on contextual signals. Instead of a simple allow or deny on identity, it evaluates who is accessing what, from where, on what device, and at what risk level.

Policy structure: if a set of conditions is met, then grant or block controls apply.

Conditions include:

User or group membership.
The application being accessed.
Sign-in risk level, detected by Identity Protection (low, medium, high).
Device compliance, such as Intune-managed or hybrid joined.
Location: named locations, IP ranges, or countries.
Client app: browser, mobile app, or legacy auth.

Controls include:

Block access.
Require MFA.
Require a compliant device.
Require a Microsoft Entra hybrid joined device (formerly hybrid Azure AD joined).
Require an approved client app.
Require a password change.

text

Example policy: "Require MFA for all admin portal access from outside the corporate network"

Conditions:
  Users: Admins group
  Application: Azure Management Portal
  Location: Exclude corporate IP range

Controls: Require MFA
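
The "if conditions, then controls" structure of the example policy can be sketched as a tiny evaluator. This is a toy model for intuition only, not how Entra ID implements Conditional Access; the class names and the IP ranges (taken from the reserved documentation blocks) are assumptions.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass
class SignIn:
    user_groups: set
    application: str
    source_ip: str


@dataclass
class Policy:
    group: str             # users the policy targets
    application: str       # application the policy targets
    excluded_network: str  # trusted range exempt from the policy (CIDR)
    control: str           # control applied when every condition matches

    def evaluate(self, sign_in: SignIn) -> str:
        in_scope = (
            self.group in sign_in.user_groups
            and sign_in.application == self.application
            and ip_address(sign_in.source_ip) not in ip_network(self.excluded_network)
        )
        return self.control if in_scope else "no additional control"


# Hypothetical corporate range 203.0.113.0/24
policy = Policy("Admins", "Azure Management Portal", "203.0.113.0/24", "require MFA")
outside = SignIn({"Admins"}, "Azure Management Portal", "198.51.100.7")
inside = SignIn({"Admins"}, "Azure Management Portal", "203.0.113.10")
```

Here `policy.evaluate(outside)` yields the MFA control, while the same admin signing in from the corporate range is not affected by this policy.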

Common use cases:

Require MFA for all external access.
Block legacy authentication protocols that do not support MFA.
Require compliant devices for accessing sensitive apps.
Block access from high-risk sign-ins automatically.

Q48. What is the difference between Azure RBAC and Entra ID roles?

Azure RBAC controls access to Azure resources: storage accounts, virtual machines, resource groups, and subscriptions. Roles are assigned at a scope (management group, subscription, resource group, or resource). Built-in roles include Owner, Contributor, Reader, and over 100 service-specific roles.

bash

# Grant a service principal Contributor access to a resource group
az role assignment create \
  --assignee <service-principal-object-id> \
  --role "Contributor" \
  --resource-group my-resource-group

# Grant read access to a specific storage account
az role assignment create \
  --assignee <user-or-sp-object-id> \
  --role "Storage Blob Data Reader" \
  --scope /subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>

Entra ID roles (directory roles) control access to Entra ID itself and to Microsoft 365 services. Examples include Global Administrator, User Administrator, Application Administrator, and Security Reader.

Managing Azure infrastructure (VMs, storage, networking): use Azure RBAC.
Managing users, groups, app registrations, and Conditional Access: use Entra ID roles.
Accessing Azure resources from an application: use Azure RBAC on a managed identity.
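
Azure RBAC assignments are inherited down the scope hierarchy, which is easy to picture as a prefix match on resource IDs. The sketch below is a simplification that covers the subscription, resource group, and resource levels only (management groups use a different ID shape); the function name and the IDs are made up for illustration.

```python
def scope_covers(assignment_scope: str, resource_id: str) -> bool:
    """True if a role assigned at assignment_scope is inherited by resource_id."""
    scope = assignment_scope.rstrip("/").lower()
    target = resource_id.rstrip("/").lower()
    # Require a "/" boundary so that my-rg does not match my-rg-2.
    return target == scope or target.startswith(scope + "/")


# Hypothetical resource IDs
sub = "/subscriptions/0000"
rg = sub + "/resourceGroups/my-rg"
other_rg = sub + "/resourceGroups/my-rg-2"
storage = rg + "/providers/Microsoft.Storage/storageAccounts/mydata"
```

A Contributor assignment on `rg` therefore reaches `storage` but not `other_rg`, and an assignment on the storage account alone never widens to its resource group.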

Q49. How does Single Sign-On (SSO) work in Entra ID?

SSO lets users authenticate once with Entra ID and access multiple applications without re-entering credentials. Entra ID supports three main SSO methods.

OpenID Connect (OIDC): modern, token-based. Entra ID returns an ID token (user identity) and an access token (API access). Best for new cloud-native apps.
SAML 2.0: an XML-based federation standard, common for enterprise SaaS apps such as Salesforce and ServiceNow. Entra ID acts as the Identity Provider (IdP) and the app is the Service Provider (SP). No passwords are exchanged: assertions are signed XML documents.
Password-based SSO: Entra ID stores the credentials for apps that do not support federated SSO and fills them into the app's sign-in page through a browser extension, effectively acting as a credential vault. Treat this as a last resort.

SSO session flow (OIDC):

User accesses App A while not logged in, and is redirected to the Entra ID login page.
User authenticates with credentials and MFA.
Entra ID sets a session cookie and issues tokens for App A.
User accesses App B; the browser sends the Entra ID session cookie.
Entra ID validates the existing session and issues tokens for App B without requiring re-authentication.

Token lifetime: access tokens last roughly an hour by default (Entra ID assigns a randomized lifetime between 60 and 90 minutes). Refresh tokens allow silent re-authentication for around 90 days before requiring an interactive login.
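
Because access tokens are short-lived, clients usually cache them and renew shortly before expiry. Libraries such as MSAL keep their own token cache, so the following is only a sketch of the idea; the class name and the five-minute margin are assumptions.

```python
import time


class TokenCache:
    """Caches one access token and reports when it should be renewed."""

    def __init__(self, refresh_margin_seconds: int = 300):
        self.refresh_margin = refresh_margin_seconds
        self.access_token = None
        self.expires_at = 0.0

    def store(self, access_token: str, expires_in: int, now: float = None) -> None:
        # expires_in is the lifetime in seconds, as returned in a token response
        now = time.time() if now is None else now
        self.access_token = access_token
        self.expires_at = now + expires_in

    def needs_refresh(self, now: float = None) -> bool:
        now = time.time() if now is None else now
        return self.access_token is None or now >= self.expires_at - self.refresh_margin


cache = TokenCache()
cache.store("example-token", expires_in=3600, now=0)
```

With a one-hour token stored at time 0, `cache.needs_refresh(now=3000)` is false and `cache.needs_refresh(now=3300)` is true, so the renewal happens before any request fails with an expired token.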

Q50. What is Privileged Identity Management (PIM) in Entra ID?

PIM is an Entra ID feature that manages just-in-time privileged access to Azure resources and Entra ID roles. Instead of giving permanent admin access (standing privileges), users are made eligible for privileged roles and must activate them when needed.

How it works:

An administrator marks a user as eligible for the Global Administrator role.
The user has no admin access by default.
When needed, the user activates the role via PIM, providing justification and optionally MFA or manager approval.
The user has admin access for a configurable time window, typically 1 to 8 hours.
Access expires automatically, and all activation requests are logged.
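
The eligible-then-activate lifecycle above can be modelled in a few lines. This is an illustrative sketch, not the PIM API: the class, its fields, and the 8-hour maximum are assumptions chosen to match the steps in the list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class EligibleRole:
    role: str
    max_hours: int = 8
    active_until: datetime = None
    audit_log: list = field(default_factory=list)

    def activate(self, justification: str, hours: int, now: datetime) -> None:
        if not justification.strip():
            raise ValueError("justification is required")
        if not 1 <= hours <= self.max_hours:
            raise ValueError("requested duration is outside the allowed window")
        self.active_until = now + timedelta(hours=hours)
        # Every activation is recorded: when, which role, why, for how long.
        self.audit_log.append((now, self.role, justification, hours))

    def is_active(self, now: datetime) -> bool:
        # Access lapses automatically once the window has passed.
        return self.active_until is not None and now < self.active_until


start = datetime(2026, 1, 1, 9, 0)
role = EligibleRole("Global Administrator")
was_active_before = role.is_active(start)
role.activate("Rotate signing keys", hours=2, now=start)
```

After activation the role is usable for two hours and then expires without anyone having to revoke it, which is the core difference from a standing assignment.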

Benefits:

Reduces attack surface: stolen credentials for a non-admin account cannot immediately be used for admin tasks.
Can require a justification for every role activation.
Provides a full audit trail of who used which privileged role, when, and why.
Supports approval workflows for sensitive roles.

bash

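# List current Azure RBAC role assignments
# (roles activated through PIM appear here while they are active)
az role assignment list --include-classic-administrators

# List a user's eligible Entra ID role assignments via Microsoft Graph
az rest --method get \
  --url "https://graph.microsoft.com/v1.0/roleManagement/directory/roleEligibilitySchedules?\$filter=principalId eq '<user-id>'"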

Quick Reference: All 50 Questions at a Glance

Use this table to scan every question and its core concept in one pass. It's the fastest way to spot the topics you need to revisit before an interview.

#	Question	Core concept
Q1	What is serverless computing	No server management, pay-per-execution, trade-offs
Q2	How does Lambda work	Execution environment lifecycle, handler, warm/cold start
Q3	Lambda cold starts and prevention	Causes, provisioned concurrency, SnapStart, runtime choice
Q4	Lambda concurrency types	Unreserved vs reserved vs provisioned
Q5	Lambda Layers	Shared dependencies, /opt directory, 5 max layers
Q6	Synchronous vs asynchronous invocation	Caller waits vs fire-and-forget, DLQ, retries
Q7	Lambda limits	15-minute timeout, 10GB memory, 1,000 concurrency, 6MB payload
Q8	Lambda monitoring and debugging	CloudWatch Logs, Metrics, X-Ray, Powertools
Q9	API Gateway and the three API types	REST vs HTTP vs WebSocket
Q10	API Gateway integration types	Lambda proxy, HTTP, AWS Service, Mock
Q11	API Gateway authorization	IAM, Lambda Authorizer, JWT Authorizer
Q12	API Gateway throttling	Token bucket, account/stage limits, 429 response
Q13	API Gateway stages and deployments	Stage variables, canary deployments, custom domains
Q14	REST API vs HTTP API	Price, features, JWT support, caching
Q15	S3 core concepts	Bucket, object, key, URL format
Q16	S3 storage classes	Standard, IA, Glacier, Intelligent-Tiering, Deep Archive
Q17	S3 versioning	Multiple versions, delete markers, MFA Delete
Q18	Presigned URLs	Temporary access, GET and PUT, direct upload pattern
Q19	S3 event notifications and Lambda	Event types, permission model, structured trigger event
Q20	Bucket policy vs ACL	JSON resource policies vs legacy ACLs, Block Public Access
Q21	Docker vs virtual machines	Shared kernel vs full OS, boot time, resource use
Q22	Docker architecture	CLI, dockerd, containerd, runc, registry
Q23	Docker image vs container	Read-only template vs running instance, writable layer
Q24	Writing an efficient Dockerfile	Layer order, dependencies before code, non-root, .dockerignore
Q25	Docker layer caching	Cache invalidation order, build speed optimization
Q26	Docker Compose	Multi-service YAML, depends_on, healthcheck, volumes
Q27	Docker network drivers	bridge, host, overlay, macvlan, none
Q28	Volumes vs bind mounts	Docker-managed vs host path, production vs dev
Q29	Multi-stage Docker builds	Fat build vs lean production image, --from copy
Q30	Debugging a crashing container	ps -a, logs, inspect, exec -it, exit codes
Q31	Container security best practices	Non-root, minimal image, scan, read-only FS, cap-drop
Q32	Docker Swarm vs Kubernetes	Simplicity vs ecosystem, production scale
Q33	Monolith vs microservices	Deployment, scaling, team autonomy, trade-offs
Q34	Inter-service communication	REST, gRPC, message queues, event streaming
Q35	Service discovery	Client-side vs server-side, Consul, Kubernetes DNS
Q36	Circuit Breaker pattern	Closed, Open, Half-Open states, opossum library
Q37	Saga pattern	Distributed transactions, choreography vs orchestration
Q38	Event sourcing	Events as the source of truth, replay, projections
Q39	Distributed tracing	Trace, span, OpenTelemetry, Jaeger, X-Ray
Q40	Service mesh	Sidecar proxy, mTLS, traffic management, Istio
Q41	12-factor app methodology	Config, stateless processes, port binding, disposability, logs
Q42	Auth across microservices	JWT propagation, mTLS, service-to-service tokens
Q43	What is Azure Entra ID	IAM service, authentication, SSO, Conditional Access
Q44	App Registration vs Service Principal	Blueprint vs instance, multi-tenant model
Q45	OAuth 2.0 flows in Entra ID	Auth Code, PKCE, Client Credentials, OBO, Device Code
Q46	Managed Identities	Auto-managed credentials, system vs user-assigned
Q47	Conditional Access	Policy engine, signals, grant/block controls
Q48	Azure RBAC vs Entra ID roles	Resource access vs directory access
Q49	Single Sign-On (SSO)	OIDC, SAML, session cookies, token lifetime, CAE
Q50	Privileged Identity Management (PIM)	Just-in-time access, eligible vs active, audit trail

Frequently Asked Questions

What level of cloud and DevOps knowledge do these 50 questions target?

This guide spans junior fundamentals through senior architecture decisions. Questions 1 through 20 (serverless, Lambda, API Gateway, S3) cover the AWS building blocks that most backend roles touch directly, and are reasonable for mid-level candidates to answer confidently.

Questions 21 through 42 (Docker and microservices) go deeper into operational and architectural tradeoffs, such as the Saga pattern, service mesh, and distributed tracing, which separate mid-level from senior and staff candidates. Questions 43 through 50 (Azure Entra ID) target roles in enterprise Microsoft environments and security-focused positions.

How does AWS Lambda compare to running containers on ECS or Kubernetes?

Both run your code without you managing physical servers, but the operational model and cost profile differ significantly.

	AWS Lambda	ECS / Kubernetes
Billing	Per invocation and execution time	Per running instance, regardless of traffic
Scaling	Automatic, near-instant, to zero	Configured autoscaling, rarely to zero
Max runtime	15 minutes per invocation	Unbounded, long-running processes are fine
Cold starts	Yes, mitigated with provisioned concurrency (Q3)	No, containers stay warm
Best fit	Event-driven, bursty, API backends	Steady high-traffic services, persistent connections

A common pattern is to start with Lambda for new features, since it has the lowest operational overhead, and move a service to ECS or Kubernetes once it runs continuously at high enough volume that always-on compute becomes cheaper than per-invocation billing.
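
The break-even point behind that pattern is simple arithmetic. The sketch below assumes Lambda's published x86 on-demand prices for us-east-1 ($0.20 per million requests and $0.0000166667 per GB-second), ignores the free tier, and treats the always-on cost as a number you supply; the $30 figure is purely hypothetical.

```python
def lambda_monthly_cost(requests, avg_ms, memory_gb,
                        price_per_million=0.20,
                        price_per_gb_second=0.0000166667):
    """Monthly Lambda cost in USD for a given request volume (assumed prices)."""
    request_cost = requests / 1_000_000 * price_per_million
    compute_cost = requests * (avg_ms / 1000) * memory_gb * price_per_gb_second
    return request_cost + compute_cost


def break_even_requests(always_on_monthly_cost, avg_ms, memory_gb,
                        price_per_million=0.20,
                        price_per_gb_second=0.0000166667):
    """Monthly requests at which Lambda costs as much as the always-on option."""
    per_request = (price_per_million / 1_000_000
                   + (avg_ms / 1000) * memory_gb * price_per_gb_second)
    return always_on_monthly_cost / per_request


# Hypothetical: a $30/month container versus a 100 ms, 512 MB function
threshold = break_even_requests(30.0, avg_ms=100, memory_gb=0.5)
```

Under these assumptions the threshold lands at roughly 29 million requests a month; below that, per-invocation billing is the cheaper option.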

How can I practice these AWS, Docker, and Azure concepts before an interview?

Most of these concepts can be tested locally without an AWS or Azure bill, using Docker Desktop, the AWS Free Tier, and LocalStack to emulate AWS services.

bash

# Run LocalStack to emulate S3, Lambda, and API Gateway locally
docker run -d -p 4566:4566 --name localstack localstack/localstack

# Create a bucket against the local endpoint
aws --endpoint-url=http://localhost:4566 s3 mb s3://my-test-bucket

# Build and run a multi-stage Dockerfile from Q29 locally
docker build -t myapp .
docker run -p 3000:3000 myapp

How do I generate an S3 presigned URL for a file upload?

Use the AWS SDK's request presigner to generate a time-limited PUT URL, then have the client upload directly to S3 without the file passing through your backend. This is the pattern covered in Q18.

javascript

const { S3Client, PutObjectCommand } = require("@aws-sdk/client-s3");
const { getSignedUrl } = require("@aws-sdk/s3-request-presigner");

const client = new S3Client({ region: "us-east-1" });

// Wrapped in an async function: CommonJS modules do not allow top-level await
async function createUploadUrl(userId) {
  return getSignedUrl(
    client,
    new PutObjectCommand({
      Bucket: "my-bucket",
      Key: `uploads/user-${userId}/avatar.jpg`,
      ContentType: "image/jpeg",
    }),
    { expiresIn: 300 } // 5 minutes to start the upload
  );
}
// Send the returned URL to the client; it PUTs the file directly to S3

What happens when a Lambda function hits its concurrency limit?

It depends on the invocation type, covered in Q4 and Q6. For synchronous invocations (API Gateway, ALB), Lambda returns a 429 TooManyRequestsException to the caller immediately, and the request is not retried automatically.

For asynchronous invocations (S3, SNS, EventBridge), Lambda queues the event and retries automatically once the throttle clears, up to its retry policy.
Reserved concurrency (Q4) can make this worse if set too low for a function's real traffic, since it hard-caps that function even when the account has unused capacity.
Provisioned concurrency (Q3) does not prevent throttling on its own. It only keeps a fixed number of environments warm; traffic above that number spills over to on-demand environments, with cold starts, and is still throttled once the function's reserved concurrency or the account limit is reached.

The fix is almost always to request a concurrency limit increase for the account or function, add a queue (SQS) in front of the function to smooth bursts, or both.
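
Since throttled synchronous calls are not retried for you, the caller typically retries with exponential backoff and jitter. The following is a minimal sketch of that client-side pattern; the function names, the base delay, and the cap are assumptions, and `invoke` stands in for whatever call returns the HTTP status.

```python
import random
import time


def backoff_delays(max_attempts, base=0.1, cap=5.0, rng=random.random):
    """Full-jitter delays: a random wait in [0, min(cap, base * 2**attempt))."""
    return [rng() * min(cap, base * 2 ** attempt) for attempt in range(max_attempts)]


def call_with_retries(invoke, max_attempts=5, sleep=time.sleep):
    """Call invoke() until it stops returning 429 or the attempts run out."""
    for delay in backoff_delays(max_attempts):
        status = invoke()
        if status != 429:
            return status
        sleep(delay)  # wait before the next attempt
    return 429
```

The jitter matters as much as the backoff: without it, every throttled client retries at the same instant and recreates the spike.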


Category 1: Serverless and AWS Lambda (Q1-Q8)

Q1. What is serverless computing and what problem does it solve?

Stateless execution: functions do not persist state between invocations.
Cold starts: functions take extra time on first invocation after an idle period.
Maximum execution time: AWS Lambda caps at 15 minutes per invocation.
Best fit: event-driven tasks, API backends, data processing, scheduled jobs.

When NOT to use serverless:

Long-running processes like video encoding or ML training.
Applications needing persistent connections, such as WebSocket servers (unless you use an API Gateway WebSocket API, which holds the connection and invokes Lambda per message).
Workloads that run continuously at high volume, where an always-on EC2 instance or container may be cheaper.

Q2. How does AWS Lambda work? Explain the execution model.

Lambda runs your function code inside an execution environment: an isolated, managed sandbox (a Firecracker microVM under the hood) that AWS provisions, runs, and destroys. You never see the server.

Execution flow:

A trigger event arrives (HTTP request via API Gateway, S3 upload, SQS message, EventBridge rule, etc.).
Lambda allocates an execution environment with your chosen memory and CPU.
AWS downloads your deployment package (code plus dependencies), initializes the runtime, and runs your init code outside the handler (once per environment).
Lambda calls your handler function with the event and context objects.
Your handler processes the event and returns a response.
Between invocations the execution environment is frozen but kept around for a while (often several minutes; AWS does not document or guarantee the duration) so it can serve subsequent invocations (warm start).
After a longer period of inactivity, the environment is destroyed.

javascript — index.js

// Node.js Lambda handler
exports.handler = async (event, context) => {
  // event: the trigger payload (HTTP request, S3 event, etc.)
  // context: runtime info (function name, remaining time, etc.)

  console.log("Event:", JSON.stringify(event));
  console.log("Remaining time:", context.getRemainingTimeInMillis(), "ms");

  // Business logic here
  const result = await processEvent(event);

  // Return response (format depends on trigger type)
  return {
    statusCode: 200,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(result),
  };
};

// Code OUTSIDE the handler runs once per execution environment (cold start)
// Good for: database connections, SDK clients, config loading
const dbClient = new DatabaseClient({ host: process.env.DB_HOST });

Lambda supports Node.js, Python, Java, Go, Ruby, .NET, and custom runtimes via the Runtime API.
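
The same init-once pattern in Python, as a small sketch. `create_client` is a stand-in for an expensive setup step such as opening a database connection, and the counter exists only to make the behavior visible; neither is part of the Lambda API.

```python
import json
import os

# Module scope runs once per execution environment (the cold start).
INIT_COUNT = 0


def create_client():
    """Placeholder for expensive setup, e.g. a database connection."""
    global INIT_COUNT
    INIT_COUNT += 1
    return {"host": os.environ.get("DB_HOST", "localhost")}


CLIENT = create_client()


def handler(event, context):
    # Runs on every invocation and reuses CLIENT on warm starts.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"db_host": CLIENT["host"], "inits": INIT_COUNT}),
    }
```

Calling `handler` repeatedly in the same process leaves `INIT_COUNT` at 1, which mirrors what happens across warm invocations of one environment.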

Q3. What is a Lambda cold start and how do you reduce it?

Cold starts happen when:

The function has not been invoked recently (environment was destroyed).
Traffic spikes cause more concurrent executions than existing environments.
You deploy a new function version.
The function runs inside a VPC (adds latency for ENI attachment, now much improved with Hyperplane ENI, but VPC still adds some overhead).

Cold start mitigation strategies:

1. Provisioned Concurrency: keeps N execution environments pre-initialized and always warm. Eliminates cold starts for those instances entirely. Costs money even when idle.

bash

# Set provisioned concurrency on a specific version or alias
aws lambda put-provisioned-concurrency-config \
  --function-name my-api \
  --qualifier prod \
  --provisioned-concurrent-executions 10

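2. SnapStart: snapshots the initialized execution environment when you publish a version and resumes new environments from that snapshot. It launched for Java, where it cuts cold starts from several seconds to sub-second, and has since been extended to Python and .NET.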
3. Keep deployment packages small: a smaller zip means a faster download and faster cold start. Remove unused dependencies and use Lambda Layers for shared libs.
4. Choose fast runtimes: Node.js and Python have the fastest cold starts. Java and .NET are slowest. Go is also fast.
5. Move init code outside the handler: database connections, SDK clients, and config loading done outside the handler run once per environment, not per invocation.
6. Schedule warm-up pings: invoke the function every 5 minutes via EventBridge to prevent the environment from going cold. Works but is imprecise and not reliable for burst traffic.

Q4. What is Lambda concurrency and what are the two types?

Unreserved concurrency is the pool shared by all functions in your account. If one function consumes all available concurrency during a spike, other functions in the same account get throttled.

bash

# Reserve 100 concurrent executions for the payments function
aws lambda put-function-concurrency \
  --function-name payment-processor \
  --reserved-concurrent-executions 100

# Remove reserved concurrency (return to unreserved pool)
aws lambda delete-function-concurrency \
  --function-name payment-processor

When Lambda exceeds its concurrency limit it throttles, returning a 429 TooManyRequestsException for synchronous invocations. For asynchronous invocations, Lambda keeps the event in its internal queue and retries automatically for up to 6 hours.
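
To judge whether a limit is high enough, estimate the concurrency a workload needs as request rate multiplied by average duration. A minimal sketch, with illustrative function names and numbers:

```python
import math


def required_concurrency(requests_per_second: float, avg_duration_seconds: float) -> int:
    """Concurrent executions needed to serve a steady request rate."""
    # round() guards against floating-point noise before taking the ceiling
    return math.ceil(round(requests_per_second * avg_duration_seconds, 9))


def will_throttle(requests_per_second, avg_duration_seconds, concurrency_limit) -> bool:
    needed = required_concurrency(requests_per_second, avg_duration_seconds)
    return needed > concurrency_limit
```

For example, 500 requests per second at 200 ms each needs 100 concurrent executions, so a reserved concurrency of 100 is just enough and 600 requests per second would start to throttle.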

Q5. What are Lambda Layers and when do you use them?

Use layers for:

Sharing common libraries across multiple functions (avoid bundling the same 200MB dependency in every function zip).
Keeping deployment packages small for faster cold starts and easier deployments.
Distributing internal utilities or helper code across your team.
Custom runtimes for unsupported languages.

bash

# Create a layer from a zip containing Node.js dependencies
zip -r layer.zip nodejs/
aws lambda publish-layer-version \
  --layer-name my-shared-libs \
  --description "Shared utilities and database client" \
  --zip-file fileb://layer.zip \
  --compatible-runtimes nodejs20.x nodejs22.x

# Attach the layer to a function
aws lambda update-function-configuration \
  --function-name my-api \
  --layers arn:aws:lambda:us-east-1:123456789012:layer:my-shared-libs:3

In your function, access layer contents at /opt:

javascript

const { sharedUtil } = require("/opt/nodejs/shared-util");

AWS also publishes public layers, such as the AWS X-Ray SDK layer and the Lambda Powertools layer for Python and Node.js.

Q6. What are the two Lambda invocation types and how does error handling differ?

javascript

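// Direct synchronous invocation (AWS SDK v3); run inside an async function
const { LambdaClient, InvokeCommand } = require("@aws-sdk/client-lambda");
const lambda = new LambdaClient({});

const result = await lambda.send(new InvokeCommand({
  FunctionName: "my-function",
  InvocationType: "RequestResponse", // synchronous
  Payload: JSON.stringify({ key: "value" }),
}));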

javascript

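// Asynchronous invocation (AWS SDK v3); run inside an async function
const { LambdaClient, InvokeCommand } = require("@aws-sdk/client-lambda");
const lambda = new LambdaClient({});

await lambda.send(new InvokeCommand({
  FunctionName: "my-function",
  InvocationType: "Event", // asynchronous
  Payload: JSON.stringify({ key: "value" }),
}));
// Returns immediately with 202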

bash

aws lambda update-function-configuration \
  --function-name my-function \
  --dead-letter-config TargetArn=arn:aws:sqs:us-east-1:123456789012:my-dlq

Q7. What are Lambda's key limits and how do you work around them?

Limit	Value	Workaround
Max execution duration	15 minutes	Use Step Functions for longer workflows
Max memory	10,240 MB (10 GB)	Break into smaller functions
Deployment package (zip)	50 MB direct, 250 MB unzipped	Use Lambda Layers, container images
Container image size	10 GB	Fine for most use cases
/tmp storage	512 MB default, configurable up to 10 GB	Use S3 for larger temp files
Concurrency (default)	1,000 per region	Request limit increase
Environment variables	4 KB total	Use SSM Parameter Store or Secrets Manager
Payload (sync invocation)	6 MB request, 6 MB response	Stream response, use S3 for large payloads
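
The payload workaround in the last row usually comes down to a size check before invoking. A minimal sketch, assuming a 6 MB limit as listed in the table; the function names and the "s3-pointer" label are illustrative.

```python
import json

SYNC_PAYLOAD_LIMIT_BYTES = 6 * 1024 * 1024  # 6 MB, taken from the table above


def payload_size_bytes(payload) -> int:
    """Size of the payload once serialized to UTF-8 JSON."""
    return len(json.dumps(payload).encode("utf-8"))


def delivery_mode(payload) -> str:
    """'inline' if the payload fits a synchronous invoke, else 's3-pointer'."""
    if payload_size_bytes(payload) <= SYNC_PAYLOAD_LIMIT_BYTES:
        return "inline"
    return "s3-pointer"
```

When the result is "s3-pointer", the caller would upload the data to S3 and pass only the bucket and key in the event.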

Q8. How do you monitor and debug AWS Lambda in production?

javascript

// Structured logging for CloudWatch Insights queries
console.log(JSON.stringify({
  level: "INFO",
  message: "Order processed",
  orderId: event.orderId,
  duration: Date.now() - startTime,
  userId: event.userId,
}));

CloudWatch Metrics: Lambda automatically publishes:

Invocations: total calls.
Duration: execution time (p50, p95, p99).
Errors: function-level errors.
Throttles: invocations rejected due to concurrency limits.
ConcurrentExecutions: peak concurrent instances.

AWS X-Ray: distributed tracing. Add the X-Ray SDK to trace downstream calls (DynamoDB, S3, HTTP calls) and view flame graphs of where time is spent.

javascript

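const AWSXRay = require("aws-xray-sdk-core");
const { DynamoDBClient } = require("@aws-sdk/client-dynamodb");
// Wrap each AWS SDK v3 client; its calls then appear in X-Ray traces
const dynamo = AWSXRay.captureAWSv3Client(new DynamoDBClient({}));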

Lambda Powertools (Node.js / Python): an AWS-maintained utility library adding structured logging, tracing, and metrics with minimal code.

javascript

const { Logger } = require("@aws-lambda-powertools/logger");
const logger = new Logger({ serviceName: "order-service" });
logger.info("Processing order", { orderId: event.orderId });

Category 2: AWS API Gateway (Q9-Q14)

API Gateway questions test whether you know which API type and integration to reach for, and how authorization and throttling actually work under the hood.

Q9. What is AWS API Gateway and what are the three API types?

Q10. What integration types does API Gateway support?

API Gateway can route requests to different backends depending on the integration type.

json

// Lambda receives this event from API Gateway proxy integration
{
  "httpMethod": "POST",
  "path": "/users",
  "headers": { "Content-Type": "application/json", "Authorization": "Bearer ..." },
  "queryStringParameters": { "include": "profile" },
  "body": "{\"name\":\"Alice\",\"email\":\"alice@example.com\"}",
  "requestContext": { "requestId": "abc-123", "stage": "prod" }
}

HTTP: proxy the request to any publicly routable HTTP endpoint. Useful for putting API Gateway in front of an existing server or third-party service.
AWS Service: directly call an AWS service action (SQS SendMessage, DynamoDB PutItem) without going through Lambda. Reduces latency and cost by removing the Lambda layer.
Mock: return a hardcoded response from API Gateway itself. Useful for mocking endpoints during development or returning maintenance-mode responses.
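
For the Lambda proxy case, the function receives the whole request as an event like the one shown above and must return the status code, headers, and body itself. A minimal Python sketch of a handler for that `POST /users` event; the validation rules and response fields are assumptions for illustration.

```python
import json


def handler(event, context):
    """Handle a REST API Lambda proxy event and return a proxy response."""
    if event.get("httpMethod") != "POST" or event.get("path") != "/users":
        return {"statusCode": 404, "body": json.dumps({"error": "not found"})}

    # The body arrives as a string, so it has to be parsed explicitly.
    try:
        user = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return {"statusCode": 400, "body": json.dumps({"error": "invalid JSON"})}

    if not isinstance(user, dict) or "email" not in user:
        return {"statusCode": 400, "body": json.dumps({"error": "email is required"})}

    # queryStringParameters is null when the request has no query string.
    include = (event.get("queryStringParameters") or {}).get("include")
    return {
        "statusCode": 201,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"created": user["email"], "include": include}),
    }
```

The response body must be a string as well, which is why the handler serializes it with `json.dumps` rather than returning a dict.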

Q11. How does API Gateway handle authorization?

Three built-in authorization mechanisms:

javascript

// Lambda Authorizer handler
const jwt = require("jsonwebtoken");

exports.handler = async (event) => {
  const token = event.authorizationToken; // Bearer <jwt>

  try {
    const decoded = jwt.verify(token.split(" ")[1], process.env.JWT_SECRET);
    return {
      principalId: decoded.sub,
      policyDocument: {
        Version: "2012-10-17",
        Statement: [{ Effect: "Allow", Action: "execute-api:Invoke", Resource: event.methodArn }],
      },
      context: { userId: decoded.sub, email: decoded.email },
    };
  } catch {
    throw new Error("Unauthorized");
  }
};

yaml — serverless.yml

# Serverless Framework: HTTP API with JWT authorizer
httpApi:
  authorizers:
    jwtAuthorizer:
      type: jwt
      identitySource: $request.header.Authorization
      issuerUrl: https://cognito-idp.us-east-1.amazonaws.com/us-east-1_XXX
      audience:
        - my-app-client-id

Q12. How does API Gateway throttling work?

API Gateway enforces throttling at two levels.

Account-level throttle: by default a steady-state rate of 10,000 requests/second with a burst of 5,000 (the number of requests that can be absorbed at once before the rate limit applies). Shared across all APIs in the account within a region.

Stage-level and method-level throttle: set per API stage or per individual route/method. Overrides the account default for that specific resource.

When throttled, API Gateway returns 429 Too Many Requests.

Usage Plans (REST API): pair with API keys to set per-customer throttle limits and monthly request quotas. Useful for monetized APIs.

bash

# Set the default throttle for every method in a REST API stage
# (replace /*/* with /{resource-path}/{HTTP-method} to target one method)
aws apigateway update-stage \
  --rest-api-id abc123 \
  --stage-name prod \
  --patch-operations \
    'op=replace,path=/*/*/throttling/rateLimit,value=1000' \
    'op=replace,path=/*/*/throttling/burstLimit,value=500'

Behind the scenes, API Gateway uses a token bucket algorithm. The burst limit is the bucket capacity (tokens available instantly). The rate limit is the refill rate (tokens added per second).
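
A token bucket is small enough to sketch in full. This illustrates the algorithm only and is not API Gateway's implementation; the class name is made up, and the clock is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    def __init__(self, rate: float, burst: int, now: float = 0.0):
        self.rate = rate            # tokens added per second (rate limit)
        self.capacity = burst       # maximum tokens held (burst limit)
        self.tokens = float(burst)  # the bucket starts full
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill for the elapsed time, never beyond capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # the caller would answer 429 Too Many Requests


bucket = TokenBucket(rate=2, burst=3)
first_four = [bucket.allow(now=0.0) for _ in range(4)]
```

With a burst of 3, the first three simultaneous requests pass and the fourth is rejected; half a second later the refill rate of 2 per second has restored exactly one token.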

Q13. What are API Gateway stages and how do you use them for deployments?

A stage is a named reference to a specific deployment of your API (e.g., dev, staging, prod). Each stage has its own URL, throttle settings, logging configuration, and stage variables.

text

# Stage URLs
https://abc123.execute-api.us-east-1.amazonaws.com/dev/users
https://abc123.execute-api.us-east-1.amazonaws.com/prod/users

Stage variables work like environment variables for API Gateway. Use them to point different stages to different Lambda function aliases or backends.

text

# Stage variable: lambdaAlias = "dev" in dev stage, "prod" in prod stage
# Lambda integration URI uses the stage variable:
# arn:aws:apigateway:us-east-1:lambda:path/.../functions/${stageVariables.lambdaAlias}/invocations

Canary deployments: route a percentage of traffic to a new stage deployment before full release. If metrics look good, promote the canary. If not, roll back.

bash

aws apigateway create-deployment \
  --rest-api-id abc123 \
  --stage-name prod \
  --canary-settings 'percentTraffic=10,stageVariableOverrides={lambdaAlias=canary}'

Custom domains: map your own domain (api.yourcompany.com) to an API Gateway endpoint using Route 53 and ACM certificates, hiding the default execute-api URL.

Q14. What is the difference between REST API and HTTP API in API Gateway?

Feature	REST API	HTTP API
Price	~$3.50/million requests	~$1.00/million requests
JWT authorizers	No native JWT authorizer (use a Cognito user pool or Lambda authorizer)	Yes (native, no Lambda needed)
Response transformation	Yes (mapping templates)	No
Request validation	Yes	No
Usage plans + API keys	Yes	No
WebSocket	No	No (separate WebSocket API type)
Private integrations (VPC Link)	Yes	Yes
OpenAPI import/export	Yes	Partial
Caching	Yes	No
Latency	Slightly higher	Lower

Choose REST API when you need built-in caching, response transformation via mapping templates, usage plans for monetization, or API keys with per-key throttle controls.

Category 3: AWS S3 (Q15-Q20)

Q15. What is Amazon S3 and what are its core concepts?

Bucket: a container for objects. Bucket names are globally unique across all AWS accounts. Each bucket lives in one AWS region.

Object: a file stored in S3. Consists of the object data (binary) plus metadata (key-value pairs). Maximum object size is 5TB. Objects over 5GB require multipart upload.

text

URL format: https://bucket-name.s3.amazonaws.com/path/to/object
Or regional: https://bucket-name.s3.us-east-1.amazonaws.com/path/to/object

Key S3 capabilities:

Object versioning: keep multiple versions of the same key.
Lifecycle policies: automatically transition or delete objects by age.
Replication: cross-region or same-region replication.
Event notifications: trigger Lambda, SQS, or SNS on object events.
Static website hosting: serve HTML/CSS/JS files as a static site.
Presigned URLs: temporary access to private objects.

Q16. What are S3 storage classes and when do you use each?

S3 offers several storage classes optimized for different access patterns and cost profiles.

S3 Standard: the default. Low latency, high availability (99.99%). Best for frequently accessed data: active user uploads, application assets, frequently read datasets. Most expensive per GB.
S3 Intelligent-Tiering: automatically moves objects between frequent-access, infrequent-access, and archive tiers based on access patterns. Small monthly monitoring fee per object. Best when access patterns are unpredictable.
S3 Standard-IA (Infrequent Access): lower storage cost than Standard, but adds a per-GB retrieval fee. Minimum storage duration of 30 days. Best for data accessed once a month or less: backups, disaster recovery copies.
S3 One Zone-IA: same as Standard-IA but stored in one AZ only. Lower cost, but lower availability, and the data is lost if that AZ is destroyed. Best for secondary backup copies or data you can recreate.
S3 Glacier Instant Retrieval: archival storage, millisecond retrieval. Best for quarterly-accessed data (medical images, annual reports).
S3 Glacier Flexible Retrieval: archival storage, retrieval in minutes to hours. Best for backups accessed a few times per year.
S3 Glacier Deep Archive: the cheapest storage class. Retrieval in 12 to 48 hours. Best for regulatory long-term retention (7+ year compliance data).

bash

# Upload directly to a specific storage class
aws s3 cp myfile.zip s3://my-bucket/backups/ \
  --storage-class STANDARD_IA

# Configure lifecycle rule to transition to Glacier after 90 days
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-bucket \
  --lifecycle-configuration file://lifecycle.json
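
The lifecycle.json file referenced above is not shown, so here is a minimal sketch of what it could contain, built as a JavaScript object and printed as JSON. The rule ID and the backups/ prefix are assumptions for illustration; the field names follow the S3 lifecycle configuration schema.

```javascript
// Builds a lifecycle configuration that moves objects under a prefix to
// Glacier Flexible Retrieval after a number of days.
// The rule ID and prefix are hypothetical values for this example.
function buildLifecycleConfig({ prefix, transitionDays }) {
  return {
    Rules: [
      {
        ID: "archive-old-objects",
        Status: "Enabled",
        Filter: { Prefix: prefix },
        Transitions: [{ Days: transitionDays, StorageClass: "GLACIER" }],
      },
    ],
  };
}

const lifecycleConfig = buildLifecycleConfig({ prefix: "backups/", transitionDays: 90 });

// Save this output as lifecycle.json for the put-bucket-lifecycle-configuration call.
console.log(JSON.stringify(lifecycleConfig, null, 2));
```

Generating the file from code keeps the retention numbers in one reviewed place, but a hand-written JSON file works just as well.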

Q17. What is S3 versioning and what problems does it solve?

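S3 versioning keeps every version of an object under the same key. Once it is enabled on a bucket, an overwrite creates a new version instead of replacing the old one, and a delete adds a delete marker instead of removing data.

Problems it solves: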

Accidental overwrites: restore a previous version with one API call.
Accidental deletions: delete the delete marker to restore the object.
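Ransomware protection: overwrites and deletes create new versions instead of destroying old ones, so earlier data stays recoverable unless the attacker can also delete versions. MFA Delete or Object Lock guards against that.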
Audit trail: full history of every object mutation.

bash

# Enable versioning
aws s3api put-bucket-versioning \
  --bucket my-bucket \
  --versioning-configuration Status=Enabled

# List all versions of an object
aws s3api list-object-versions \
  --bucket my-bucket \
  --prefix images/avatar.jpg

# Restore previous version (copy old version over current)
aws s3api copy-object \
  --bucket my-bucket \
  --copy-source "my-bucket/images/avatar.jpg?versionId=OLD_VERSION_ID" \
  --key images/avatar.jpg

Combined with lifecycle policies, you can expire old versions automatically to control costs (keep last 5 versions, delete older ones after 90 days).

MFA Delete: requires MFA authentication to permanently delete a versioned object. Extra protection against accidental or malicious permanent deletion.
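
To make the delete-marker behavior concrete, here is a toy in-memory model of a single versioned key. It is a sketch of the semantics only: the class and method names are invented for illustration and are not part of the AWS SDK.

```javascript
// Toy model of one versioned S3 key (illustrative only, not the AWS SDK).
class VersionedKey {
  constructor() {
    this.versions = []; // oldest first; the last entry is the current version
    this.nextId = 1;
  }

  // An overwrite never replaces data: it appends a new version.
  put(body) {
    const versionId = `v${this.nextId++}`;
    this.versions.push({ versionId, body, deleteMarker: false });
    return versionId;
  }

  // A plain delete only appends a delete marker.
  delete() {
    const versionId = `v${this.nextId++}`;
    this.versions.push({ versionId, body: null, deleteMarker: true });
    return versionId;
  }

  // Deleting a specific version ID (including a delete marker) is permanent.
  deleteVersion(versionId) {
    this.versions = this.versions.filter((v) => v.versionId !== versionId);
  }

  // A GET without a version ID only sees the current version.
  get() {
    const current = this.versions[this.versions.length - 1];
    if (!current || current.deleteMarker) return null; // S3 would answer 404
    return current.body;
  }
}

const avatar = new VersionedKey();
avatar.put("old image bytes");
avatar.put("new image bytes");
const markerId = avatar.delete();
const afterDelete = avatar.get();   // null: the key looks deleted
avatar.deleteVersion(markerId);     // remove the delete marker
const afterRestore = avatar.get();  // "new image bytes"
console.log({ afterDelete, afterRestore });
```

The same model shows why MFA Delete matters: deleteVersion is the only operation that actually destroys data.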

Q18. What are S3 presigned URLs and when do you use them?

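A presigned URL is a URL for one specific S3 operation on one object, signed with the credentials of whoever generated it and valid only until it expires. Anyone holding the URL can perform that operation without AWS credentials of their own.

Use cases: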

Allow users to download a private file without making the bucket public.
Allow users to upload directly to S3 from a browser, bypassing your server: a PUT presigned URL lets the browser PUT directly to S3 without proxying through your backend.

javascript

// Generate a presigned GET URL (download)
const { S3Client, GetObjectCommand, PutObjectCommand } = require("@aws-sdk/client-s3");
const { getSignedUrl } = require("@aws-sdk/s3-request-presigner");

const client = new S3Client({ region: "us-east-1" });

const url = await getSignedUrl(
  client,
  new GetObjectCommand({ Bucket: "my-bucket", Key: "reports/q1-2026.pdf" }),
  { expiresIn: 3600 } // expires in 1 hour
);
// URL is safe to send to the client, only works for 1 hour

// Generate a presigned PUT URL (upload)
const uploadUrl = await getSignedUrl(
  client,
  new PutObjectCommand({
    Bucket: "my-bucket",
    Key: `uploads/user-${userId}/avatar.jpg`,
    ContentType: "image/jpeg",
  }),
  { expiresIn: 300 } // 5 minutes to start the upload
);
// Client uses this URL to PUT the file directly to S3
// Your backend never touches the file data

The direct-upload pattern (presigned PUT) is the right way to handle large file uploads. It offloads all the bandwidth and processing from your server to S3.

Q19. How do S3 event notifications work with Lambda?

S3 can invoke a Lambda function when specific events happen on a bucket:

ObjectCreated (PUT, POST, COPY, CompleteMultipartUpload).
ObjectRemoved (DELETE).
ObjectRestore (from Glacier).
Replication events.

bash

# 1. Add Lambda permission to allow S3 to invoke it
aws lambda add-permission \
  --function-name image-processor \
  --principal s3.amazonaws.com \
  --statement-id s3-invoke \
  --action lambda:InvokeFunction \
  --source-arn arn:aws:s3:::my-upload-bucket \
  --source-account 123456789012

# 2. Configure the S3 event notification
aws s3api put-bucket-notification-configuration \
  --bucket my-upload-bucket \
  --notification-configuration '{
    "LambdaFunctionConfigurations": [{
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:image-processor",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {
        "Key": { "FilterRules": [
          { "Name": "prefix", "Value": "uploads/" },
          { "Name": "suffix", "Value": ".jpg" }
        ]}
      }
    }]
  }'

Lambda receives an S3 event:

javascript

exports.handler = async (event) => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));
    console.log(`Processing: s3://${bucket}/${key}`);
    await generateThumbnail(bucket, key);
  }
};
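
The key decoding in the handler is easy to get wrong, so here is the same step isolated. S3 URL-encodes object keys in event payloads and writes spaces as "+". The sample record below is hypothetical and trimmed to the fields used here.

```javascript
// Decode an object key exactly as the handler above does:
// turn "+" back into spaces first, then undo the percent-encoding.
function decodeS3Key(rawKey) {
  return decodeURIComponent(rawKey.replace(/\+/g, " "));
}

// Hypothetical event, reduced to bucket name and object key.
const sampleEvent = {
  Records: [
    {
      s3: {
        bucket: { name: "my-upload-bucket" },
        object: { key: "uploads/my+photo%281%29.jpg" },
      },
    },
  ],
};

const decodedKeys = sampleEvent.Records.map((r) => decodeS3Key(r.s3.object.key));
console.log(decodedKeys); // [ 'uploads/my photo(1).jpg' ]
```

Skipping this step is a common cause of NoSuchKey errors when the function later tries to read the object.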

Q20. What is the difference between an S3 bucket policy and an ACL?

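A bucket policy is a JSON, resource-based IAM policy attached to the bucket. It can grant or deny access to any principal, including other accounts, and supports conditions. An ACL is the older, coarser mechanism: a list of grants (read, write, full control) on a bucket or an individual object. AWS recommends keeping ACLs disabled and using policies for access control.

The policy below allows a specific IAM role to read and denies any request that does not use TLS:

json

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowReadFromAppRole",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789012:role/app-role" },
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ]
    },
    {
      "Sid": "DenyInsecureTransport",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ],
      "Condition": {
        "Bool": { "aws:SecureTransport": "false" }
      }
    }
  ]
}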

Category 4: Docker (Q21-Q32)

Q21. What is Docker and how does it differ from a virtual machine?

Docker is a platform for packaging, distributing, and running applications in containers. A container is an isolated process running on the host OS, packaged with all its dependencies.

The key difference from a virtual machine:

VM: runs a full guest OS on top of a hypervisor. Each VM has its own kernel, virtual hardware, and OS installation. Boot time is typically tens of seconds to minutes, and each VM carries the memory overhead of a full OS.

Container: shares the host OS kernel. Only packages the application and its user-space dependencies. Startup is typically well under a few seconds, and per-container overhead is megabytes.

Practical result: you can run dozens of containers on a machine where only a handful of VMs would fit. Fast startup also makes containers a good fit for scaling and CI/CD.

Q22. Explain Docker's architecture (client, daemon, containerd, registry).

Docker uses a client-server architecture.

Docker CLI (client): the command-line tool you interact with. Sends REST API requests to the Docker daemon via a Unix socket (/var/run/docker.sock).

Docker daemon (dockerd): the long-running background service. Manages images, containers, networks, and volumes. Delegates container lifecycle management to containerd.

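containerd: the container runtime daemon that dockerd delegates to. It supervises the container lifecycle (create, start, stop, delete).

runc: the low-level container runtime that containerd calls to actually spawn containers using Linux namespaces and cgroups.

Registry: a server that stores and distributes images (Docker Hub, Amazon ECR, GitHub Container Registry). docker pull and docker push talk to it.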

text

CLI --> dockerd (REST API) --> containerd --> runc --> container process
                               |
                               --> registry (pull/push images)

Q23. What is the difference between a Docker image and a container?

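Image: a read-only, immutable template. Built from a Dockerfile. Consists of stacked layers, each representing a filesystem change. An image is like a class in OOP.

Container: a running (or stopped) instance of an image. Docker adds a thin writable layer on top of the image layers and runs the process in isolated namespaces. A container is like an object instantiated from that class.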

text

Image layers (read-only):
  Layer 3: COPY . /app
  Layer 2: RUN npm install
  Layer 1: FROM node:22-alpine

Container:
  Writable layer (container-specific changes)
  [Image layers below: shared, read-only]

Multiple containers can run from the same image simultaneously. They all share the same read-only image layers (saving disk space) but each has its own writable layer.

bash

# Image vs container commands
docker images              # list images
docker build -t myapp .    # build image from Dockerfile
docker rmi myapp           # remove image

docker ps                  # list running containers
docker ps -a               # list all containers (including stopped)
docker run myapp           # create and start a container from image
docker rm my-container     # remove stopped container

Q24. How do you write an efficient Dockerfile?

A good Dockerfile produces a small, fast-to-build, secure image.

dockerfile

# Use a specific version tag, never just "latest"
FROM node:22-alpine

# Set working directory
WORKDIR /app

# Copy dependency files FIRST (before source code)
# This way the npm install layer is cached as long as package.json doesn't change
COPY package.json package-lock.json ./

# Install dependencies
RUN npm ci --omit=dev

# Copy source code AFTER installing dependencies
COPY src/ ./src/

# Run as non-root user for security
USER node

# Document the port (does not actually expose, use docker run -p or compose)
EXPOSE 3000

# Use ENTRYPOINT + CMD pattern
# ENTRYPOINT: the executable
# CMD: default arguments (can be overridden at runtime)
ENTRYPOINT ["node"]
CMD ["src/server.js"]

Key best practices:

Order layers from least-to-most frequently changed (dependencies before source).
Use .dockerignore to exclude node_modules, .git, test files, and README.
Use alpine or distroless base images to minimize attack surface and size.
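Chain related apt-get commands in a single RUN with && instead of separate RUN commands, so the package index update, install, and cleanup land in one layer.
Never store secrets in ENV instructions or COPY .env files into the image: both end up in image layers. Inject secrets at runtime or use BuildKit secret mounts.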
Run as non-root with the USER directive or a dedicated user.

Q25. What is Docker layer caching and how does it affect build speed?

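Docker caches the layer produced by each Dockerfile instruction. On a rebuild it reuses a cached layer when the instruction and its inputs (for COPY, the file contents) are unchanged. As soon as one layer is invalidated, every layer after it is rebuilt.

This means layer order matters for build speed.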

dockerfile

# BAD: source code changes every build, invalidates npm install cache
FROM node:22-alpine
WORKDIR /app
# Copies everything, including package.json AND source code
COPY . .
# Runs again every time ANY file changes
RUN npm ci

# GOOD: separate dependency install from source code copy
FROM node:22-alpine
WORKDIR /app
# Only the package manifests
COPY package*.json ./
# Cached as long as package.json and package-lock.json are unchanged
RUN npm ci
# Source code: a cache miss here is cheap
COPY . .

In the GOOD example, if you change only a source file, Docker reuses the npm ci layer (which is slow) and only re-executes the COPY . . layer (fast).
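
The invalidation rule can be modeled in a few lines. This is a simplified sketch, not how BuildKit is implemented: each layer's cache key is derived from its parent's key, its instruction text, and a fingerprint of the files it reads. The step objects and fingerprint strings are assumptions for illustration.

```javascript
const crypto = require("crypto");

// Returns, per step, whether a builder could reuse the cached layer,
// plus the cache keys to compare against on the next build.
function planBuild(steps, previousKeys) {
  let parentKey = "";
  const keys = [];
  const plan = steps.map((step, i) => {
    // The key chains through the parent, so one miss cascades to all later layers.
    const key = crypto
      .createHash("sha256")
      .update(`${parentKey}\n${step.instruction}\n${step.inputs}`)
      .digest("hex");
    keys.push(key);
    parentKey = key;
    return { instruction: step.instruction, cached: previousKeys[i] === key };
  });
  return { plan, keys };
}

// srcHash stands in for a fingerprint of the source tree.
const goodOrder = (srcHash) => [
  { instruction: "COPY package*.json ./", inputs: "manifest-v1" },
  { instruction: "RUN npm ci", inputs: "" },
  { instruction: "COPY . .", inputs: srcHash },
];
const badOrder = (srcHash) => [
  { instruction: "COPY . .", inputs: srcHash },
  { instruction: "RUN npm ci", inputs: "" },
];

// Build once, change only the source, build again.
const goodRebuild = planBuild(goodOrder("src-v2"), planBuild(goodOrder("src-v1"), []).keys);
const badRebuild = planBuild(badOrder("src-v2"), planBuild(badOrder("src-v1"), []).keys);

console.log(goodRebuild.plan.map((s) => s.cached)); // [ true, true, false ]
console.log(badRebuild.plan.map((s) => s.cached));  // [ false, false ]
```

With the good ordering only the final COPY is rebuilt; with the bad ordering the source change forces npm ci to run again.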

bash

# Build with no cache (force full rebuild)
docker build --no-cache -t myapp .

# See image layers and their sizes
docker history myapp
docker inspect myapp

Q26. What is Docker Compose and when do you use it?

Docker Compose is a tool for defining and running multi-container applications from a single YAML file. You declare each service, its image or build context, ports, environment, volumes, and dependencies, then start the whole stack with one command.

yaml — docker-compose.yml

services:
  api:
    build: .
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=development
      - DATABASE_URL=postgres://user:password@db:5432/myapp
      - REDIS_URL=redis://cache:6379
    depends_on:
      db:
        condition: service_healthy
      cache:
        condition: service_started
    volumes:
      - ./src:/app/src  # mount source code for hot reload in dev

  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
      POSTGRES_DB: myapp
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d myapp"]
      interval: 5s
      timeout: 5s
      retries: 5

  cache:
    image: redis:7-alpine
    ports:
      - "6379:6379"

volumes:
  postgres_data:

bash

docker compose up -d         # start all services in background
docker compose logs -f api   # follow logs for the api service
docker compose down          # stop and remove containers and networks
docker compose down -v       # also remove named volumes

Docker Compose is ideal for local development and CI/CD test environments. In production, use Kubernetes or ECS for orchestration.

Q27. What Docker network drivers exist and when do you use each?

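Docker ships several built-in network drivers. The five below come up most often in interviews (ipvlan is the other one you may see).

bridge: the default. Containers get a private network on the host and reach the outside world through NAT. On a user-defined bridge, containers can also resolve each other by name.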

bash

# Create a custom bridge network (recommended over default bridge)
docker network create my-network
docker run --network my-network --name api myapp
docker run --network my-network --name db postgres
# api can reach db at hostname "db"

host: the container shares the host's network stack directly. No NAT, no port mapping needed. Best performance, no network isolation. Use for performance-sensitive workloads where networking overhead matters.
overlay: spans multiple Docker hosts, so containers on different machines communicate as if they were on the same network. Required for multi-host communication in Docker Swarm.
macvlan: assigns a MAC address to the container, making it appear as a physical device on the network. Used for legacy applications that expect to be on the physical network.
none: complete network isolation. The container has only a loopback interface, with no external communication.

Q28. What is the difference between Docker volumes and bind mounts?

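Volumes: created and managed by Docker and stored in Docker's own storage area on the host. They outlive the container, do not depend on the host's directory layout, and are the preferred way to persist data.

Bind mounts: mount a specific host directory or file into the container. With --mount the host path must already exist; with -v Docker creates a missing path as a directory. Commonly used in development to mount source code for hot-reloading.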

bash

# Volume (Docker-managed)
docker run -v postgres_data:/var/lib/postgresql/data postgres:16
docker volume ls
docker volume inspect postgres_data

# Bind mount (host directory mounted)
docker run -v "$(pwd)/src:/app/src" myapp  # source code hot reload
# Or the more explicit --mount syntax:
docker run --mount type=bind,source="$(pwd)/src",target=/app/src myapp

tmpfs mount: stored in host memory only, never written to disk. For temporary sensitive data (secrets, temp files) that must not persist.

Production rule: use volumes for databases and persistent state. Use bind mounts only in development for code mounting. Never bind-mount secrets.

Q29. What is a multi-stage Docker build and why is it important?

A multi-stage build uses multiple FROM instructions in a single Dockerfile. Each stage can use a different base image. Only the final stage becomes the shipped image; earlier stages are discarded.

This solves the "fat build image" problem: build tools (compilers, package managers, test frameworks) needed to build the app are not needed to run it.

dockerfile

# Stage 1: build
FROM node:22 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci                 # includes devDependencies
COPY . .
RUN npm run build          # compile TypeScript, bundle assets
RUN npm test               # run tests during build

# Stage 2: production image
FROM node:22-alpine AS production
WORKDIR /app

# Only copy what is needed to RUN the app
COPY package*.json ./
RUN npm ci --omit=dev  # production deps only
# Compiled output only
COPY --from=builder /app/dist ./dist

USER node
EXPOSE 3000
CMD ["node", "dist/server.js"]

The production image contains only the Alpine Node.js runtime, production dependencies, and compiled output. The node:22 build environment (a full Debian-based image, several times larger) is discarded.

Result: a production image might be 80MB instead of 800MB. Smaller images mean faster push and pull times, a smaller attack surface, and lower ECR storage costs.

Q30. How do you debug a crashing Docker container?

bash

# Step 1: Check container status and exit code
docker ps -a
# Exit code 0: clean exit | 1: application error | 137: SIGKILL (often the OOM killer) | 143: SIGTERM

# Step 2: Check logs
docker logs my-container
docker logs --tail 100 my-container   # last 100 lines
docker logs -f my-container           # follow (stream)

# Step 3: Inspect container config
docker inspect my-container
# Look for: environment variables, mount points, network config, exit code

# Step 4: Shell into a running container
docker exec -it my-container sh    # or bash if available
docker exec -it my-container env   # print environment variables

# Step 5: Start the container with shell override (if it crashes on start)
docker run -it --entrypoint sh my-image
# Manually run the start command to see errors

# Step 6: Check resource usage
docker stats my-container          # CPU, memory, network I/O

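# Step 7: For exit 137, confirm it was an OOM kill and check the memory limit
docker inspect my-container --format='{{.State.OOMKilled}}'
docker inspect my-container --format='{{.HostConfig.Memory}}'   # bytes, 0 = no limit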
# Increase memory limit in run command or compose file
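
The exit codes in step 1 follow a convention: anything above 128 means the process was killed by a signal, and the signal number is the code minus 128. A small helper makes that explicit. The function name is made up for this example, and the signal table lists only a few common Linux signal numbers.

```javascript
// Common Linux signal numbers (not exhaustive).
const SIGNALS = { 1: "SIGHUP", 2: "SIGINT", 6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM" };

// Translate a container exit code into a human-readable hint.
function explainExitCode(code) {
  if (code === 0) return "clean exit";
  if (code > 128) {
    const signal = code - 128;
    return `killed by signal ${signal} (${SIGNALS[signal] || "unknown"})`;
  }
  // 125-127 are reserved by docker run for failures before or while starting the command.
  if (code === 125) return "docker run itself failed";
  if (code === 126) return "command found but could not be invoked";
  if (code === 127) return "command not found";
  return "application error";
}

console.log(explainExitCode(137)); // killed by signal 9 (SIGKILL)
console.log(explainExitCode(143)); // killed by signal 15 (SIGTERM)
```

Exit 137 only says SIGKILL was delivered. The OOMKilled flag from docker inspect tells you whether the kernel's OOM killer was the sender.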

Common causes of container crashes:

Application crash at startup, caused by a misconfigured env var or missing dependency.
OOM kill (exit 137): fix the memory leak or raise the memory limit.
Port already in use: check what is bound to the host port.
Volume mount issue: the path does not exist, or there's a permissions problem.
Health check failure: the container is marked unhealthy after repeated failed checks, and the orchestrator restarts it.

Q31. What are Docker container security best practices?

Run as non-root:

dockerfile

# Create a user in the Dockerfile
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser

Use minimal base images:

dockerfile

# Distroless: no shell, no package manager, minimal attack surface
FROM gcr.io/distroless/nodejs22-debian12

Scan images for vulnerabilities:

bash

docker scout cves myapp:latest  # Docker Scout
trivy image myapp:latest       # Trivy (free, widely used)

Use a read-only filesystem:

bash

docker run --read-only --tmpfs /tmp myapp
# The application can write only to explicitly mounted paths, such as the tmpfs at /tmp

Drop capabilities:

bash

docker run --cap-drop ALL --cap-add NET_BIND_SERVICE myapp

Never run in privileged mode in production:

bash

# Bad: gives container root-level host access
docker run --privileged myapp

# Good: only specific needed capability
docker run --cap-add SYS_PTRACE myapp

Limit resources:

bash

docker run --memory 512m --cpus 0.5 myapp

Sign images and verify signatures in production pipelines, for example with Docker Content Trust or Sigstore cosign.

Q32. What is the difference between Docker Swarm and Kubernetes?

Both are container orchestration platforms that manage clusters of containers across multiple hosts.

Key differences:

Setup: Swarm initializes in minutes; Kubernetes requires significant configuration (or use a managed service).
Scaling: Swarm scales services manually (docker service scale) and has no built-in autoscaler; Kubernetes autoscales with HPA and VPA, plus add-ons such as KEDA.
Networking: Kubernetes has a richer networking model (Ingress, NetworkPolicy, CNI plugins).
Storage: Kubernetes has more storage options (PersistentVolumes, StorageClasses).
Ecosystem: Kubernetes has a vastly larger ecosystem and community.

Category 5: Microservices (Q33-Q42)

Q33. What is a microservices architecture and how does it differ from monolith?

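A microservices architecture splits an application into small services that each own one business capability, run in their own process, own their data, and communicate over the network. A monolith packages all of that into a single deployable unit with one codebase and usually one database.

Benefits: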

Independent deployment: deploy the payment service without redeploying orders.
Independent scaling: scale the image processing service 10x without scaling auth.
Technology flexibility: each service can use the right language and database.
Team autonomy: each team owns and operates their service end to end.
Fault isolation: a crash in the notification service does not take down checkout.

Drawbacks:

Distributed system complexity: network failures, latency, partial failures.
Data consistency: no simple ACID transaction across service boundaries.
Operational overhead: more services means more things to monitor, deploy, and scale.
Debugging is harder: distributed tracing required across service boundaries.

Q34. How do microservices communicate with each other?

Two communication patterns: synchronous (request-response) and asynchronous (message-based).

Synchronous: the caller waits for a response.

REST over HTTP/HTTPS: simple, widely understood, human-readable. Uses HTTP methods (GET, POST, PUT, DELETE). Good for CRUD operations and request-response where the caller needs an immediate answer.
gRPC: binary protocol using Protocol Buffers over HTTP/2. Typically faster and more compact than JSON over REST (how much depends on payloads and workload), with strongly typed contracts via .proto files and bidirectional streaming. Good for high-frequency inter-service calls, streaming data, and polyglot environments.
GraphQL: flexible query language. The client specifies exact data needed. Good for API aggregation layers and mobile clients with varied data needs.

Asynchronous: the caller publishes a message and continues.

Message queues (SQS, RabbitMQ): point-to-point. Each message is processed by one consumer. Good for task queues, work distribution, and decoupling producer from consumer.
Event streaming (Kafka, Kinesis): one producer, many consumers. Events are retained and replayable. Good for event sourcing, real-time analytics, and decoupled event-driven architectures.
Publish/Subscribe (SNS, Redis Pub/Sub): publisher sends to a topic, multiple subscribers receive. Good for broadcasting events to multiple services.

Q35. What is service discovery and why does it matter in microservices?

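Service discovery is how a service finds the current network location of another service. In a containerized system, instances start, stop, and move constantly, so IP addresses cannot be hardcoded.

Two patterns:

Client-side discovery: the caller queries a service registry (Consul, Eureka) and picks an instance itself.
Server-side discovery: the caller sends the request to a stable endpoint (a load balancer, a Kubernetes Service, DNS) that routes it to a healthy instance.

AWS App Mesh and Consul Connect are service mesh solutions that add service discovery plus mTLS, circuit breaking, and observability through sidecar proxies, without changing application code.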

Q36. What is the Circuit Breaker pattern and how does it work?

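The Circuit Breaker pattern stops a service from repeatedly calling a dependency that is failing. It wraps the call, tracks failures, and fails fast once a threshold is crossed, which prevents cascading failures and gives the dependency room to recover.

Three states: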

Closed (normal): requests flow through. Success and failure rates are tracked.
Open (tripped): failure threshold exceeded. All requests fail immediately without contacting the downstream service. Returns cached data or a fallback response. Gives the failing service time to recover.
Half-Open (recovery check): after a timeout, a limited number of test requests are allowed through. If they succeed, the circuit closes. If they fail, it opens again.

javascript

// Example using opossum (Node.js circuit breaker library)
const CircuitBreaker = require("opossum");

const options = {
  timeout: 3000,                  // fail if takes longer than 3s
  errorThresholdPercentage: 50,   // open if more than 50% fail
  resetTimeout: 30000,            // try again after 30s
};

const breaker = new CircuitBreaker(callPaymentService, options);

breaker.fallback(() => ({ status: "deferred", message: "Payment queued for retry" }));

breaker.on("open", () => logger.warn("Payment service circuit OPEN"));
breaker.on("halfOpen", () => logger.info("Payment service circuit HALF-OPEN"));
breaker.on("close", () => logger.info("Payment service circuit CLOSED"));

// Usage
const result = await breaker.fire(paymentData);
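
To show the three states without a library, here is a minimal breaker written from scratch. It is a teaching sketch, not how opossum is implemented: it trips on consecutive failures instead of an error percentage, and the class name, options, and injectable clock are assumptions made for this example.

```javascript
// Minimal circuit breaker: closed -> open -> halfOpen -> closed.
class SimpleBreaker {
  constructor(action, { failureThreshold = 3, resetTimeout = 30000, now = Date.now } = {}) {
    this.action = action;
    this.failureThreshold = failureThreshold; // consecutive failures before opening
    this.resetTimeout = resetTimeout;         // ms to wait before a trial request
    this.now = now;                           // injectable clock, handy in tests
    this.state = "closed";
    this.failures = 0;
    this.openedAt = 0;
  }

  async fire(...args) {
    if (this.state === "open") {
      if (this.now() - this.openedAt < this.resetTimeout) {
        throw new Error("circuit open"); // fail fast, no downstream call
      }
      this.state = "halfOpen"; // timeout elapsed: let one trial request through
    }
    try {
      const result = await this.action(...args);
      this.state = "closed"; // any success closes the circuit
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed trial request, or too many failures in a row, opens the circuit.
      if (this.state === "halfOpen" || this.failures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}

// Usage with a stand-in for the downstream call.
const demoBreaker = new SimpleBreaker(async () => "charged", { failureThreshold: 2 });
demoBreaker.fire().then((value) => console.log(value, demoBreaker.state));
```

A production breaker also needs a rolling failure window, a cap on concurrent half-open trials, and fallbacks, which is what a library like opossum adds.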

Q37. What is the Saga pattern and when do you use it?

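The Saga pattern manages a business transaction that spans multiple services without a distributed ACID transaction. It runs a sequence of local transactions, one per service, and if a step fails it runs compensating transactions to undo the steps that already completed. Use it when one workflow must update data owned by several services.

Two Saga implementations: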

Choreography: each service publishes events. Downstream services listen and react. No central coordinator.

text

OrderService:  "OrderCreated" event -->
PaymentService: charges card, publishes "PaymentCompleted" or "PaymentFailed"
InventoryService: listens for PaymentCompleted, reserves stock
NotificationService: listens for both, sends email
If payment fails: OrderService listens for PaymentFailed, cancels the order

Orchestration: a central Saga orchestrator sends commands to each service and waits for responses. Step Functions (AWS) is a managed orchestrator.

text

Orchestrator:  Command "ChargeCard" --> PaymentService
PaymentService: "PaymentCompleted" --> Orchestrator
Orchestrator:  Command "ReserveStock" --> InventoryService
InventoryService: "StockInsufficient" --> Orchestrator
Orchestrator:  Command "RefundCard" --> PaymentService (compensating transaction)

Choreography is simpler for small flows. Orchestration is easier to reason about and debug for complex multi-step workflows.
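
The orchestration flow above can be sketched as a small runner that executes steps in order and, on failure, runs the compensations of completed steps in reverse. This is an in-process illustration with made-up step objects; a real orchestrator would persist its progress and talk to services over the network.

```javascript
// Each step has an action and a compensating action that undoes it.
async function runSaga(steps, context) {
  const completed = [];
  for (const step of steps) {
    try {
      await step.action(context);
      completed.push(step);
    } catch (err) {
      // Undo everything that already succeeded, most recent first.
      for (const done of [...completed].reverse()) {
        await done.compensate(context);
      }
      return { status: "rolledBack", failedStep: step.name, reason: err.message };
    }
  }
  return { status: "completed" };
}

// Hypothetical order flow: the card is charged, then stock turns out to be insufficient.
const sagaLog = [];
const orderSteps = [
  {
    name: "ChargeCard",
    action: async () => { sagaLog.push("card charged"); },
    compensate: async () => { sagaLog.push("card refunded"); },
  },
  {
    name: "ReserveStock",
    action: async () => { throw new Error("StockInsufficient"); },
    compensate: async () => { sagaLog.push("stock released"); },
  },
];

runSaga(orderSteps, {}).then((outcome) => console.log(outcome, sagaLog));
```

The failed step is not compensated because it never completed; only ChargeCard is rolled back, which mirrors the RefundCard command in the diagram.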

Q38. What is event sourcing?

Event sourcing is a pattern where instead of storing the current state of an entity, you store the sequence of events that led to that state. The current state is derived by replaying all events.

text

Traditional: users table has row { id: 1, email: "bob@new.com", balance: 850 }

Event sourcing: events table has:
  { eventId: 1, type: "UserCreated", data: { email: "bob@old.com" } }
  { eventId: 2, type: "EmailChanged", data: { email: "bob@new.com" } }
  { eventId: 3, type: "Deposit",       data: { amount: 1000 } }
  { eventId: 4, type: "Withdrawal",    data: { amount: 150 } }

Current state = replay events 1-4: { email: "bob@new.com", balance: 850 }
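
The replay above is just a fold over the event list. A minimal sketch using the same four events follows; the function names are illustrative, and starting the balance at 0 on UserCreated is an assumption.

```javascript
// Apply one event to the current state and return the next state.
function applyEvent(state, event) {
  switch (event.type) {
    case "UserCreated":
      return { email: event.data.email, balance: 0 };
    case "EmailChanged":
      return { ...state, email: event.data.email };
    case "Deposit":
      return { ...state, balance: state.balance + event.data.amount };
    case "Withdrawal":
      return { ...state, balance: state.balance - event.data.amount };
    default:
      return state; // unknown event types are ignored
  }
}

// Rebuild state by replaying events, optionally only up to a given event ID.
function replay(events, upToEventId = Infinity) {
  return events
    .filter((e) => e.eventId <= upToEventId)
    .reduce((state, event) => applyEvent(state, event), null);
}

const userEvents = [
  { eventId: 1, type: "UserCreated", data: { email: "bob@old.com" } },
  { eventId: 2, type: "EmailChanged", data: { email: "bob@new.com" } },
  { eventId: 3, type: "Deposit", data: { amount: 1000 } },
  { eventId: 4, type: "Withdrawal", data: { amount: 150 } },
];

console.log(replay(userEvents));    // { email: 'bob@new.com', balance: 850 }
console.log(replay(userEvents, 3)); // { email: 'bob@new.com', balance: 1000 }
```

The second call is the time-travel case: the same events, replayed only up to event 3.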

Benefits:

Complete audit trail by default.
Time travel: reconstruct state at any point in history.
Event replay: rebuild read models, fix bugs by replaying events.
Natural fit for event-driven architectures.

Drawbacks:

Querying current state requires replaying events (use projections or read models).
Event schema changes require migration strategies.
Increased complexity for simple CRUD use cases.

Event sourcing is commonly paired with CQRS (Command Query Responsibility Segregation), where writes go through events and reads come from optimized read models (materialized projections).

Q39. What is distributed tracing and which tools implement it?

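Distributed tracing follows a single request as it crosses service boundaries, recording how long each hop takes and where it fails.

Without distributed tracing, debugging a slow request in a 20-service architecture is nearly impossible: the logs are split across 20 services with no shared request ID.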

Key concepts:

Trace: the complete journey of one request through all services.
Span: a single unit of work within a trace (one service call).
Trace ID: a unique ID propagated with every outgoing call (typically in an HTTP header such as W3C traceparent) so spans from different services can be linked.
Parent Span ID: links child spans to their parent.
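
To see how these IDs travel between services, here is a sketch of the W3C Trace Context traceparent header, which has the form version-traceId-parentSpanId-flags. Tracing SDKs handle this for you; the helper names here are invented for illustration.

```javascript
const crypto = require("crypto");

// Start a new trace: 16-byte trace ID, 8-byte span ID, sampled flag set.
function newTraceparent() {
  const traceId = crypto.randomBytes(16).toString("hex");
  const spanId = crypto.randomBytes(8).toString("hex");
  return `00-${traceId}-${spanId}-01`;
}

// Parse an incoming header; returns null if it is malformed.
function parseTraceparent(header) {
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!match) return null;
  const [, version, traceId, parentSpanId, flags] = match;
  return { version, traceId, parentSpanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// Header for an outgoing call: same trace ID, fresh span ID.
function childTraceparent(incomingHeader) {
  const parent = parseTraceparent(incomingHeader);
  if (!parent) return newTraceparent();
  const spanId = crypto.randomBytes(8).toString("hex");
  return `00-${parent.traceId}-${spanId}-${parent.sampled ? "01" : "00"}`;
}

const incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
const parsedIncoming = parseTraceparent(incoming);
const outgoing = childTraceparent(incoming);
console.log(parsedIncoming.traceId, outgoing);
```

Every service keeps the trace ID and replaces the span ID with its own, which is what lets a backend stitch the spans into one trace.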

OpenTelemetry is the vendor-neutral standard for generating traces, metrics, and logs. Instrument your service once, export to any backend.

javascript

// OpenTelemetry instrumentation (Node.js)
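const { NodeSDK } = require("@opentelemetry/sdk-node");
const { OTLPTraceExporter } = require("@opentelemetry/exporter-trace-otlp-http");
const { getNodeAutoInstrumentations } = require("@opentelemetry/auto-instrumentations-node");

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({ url: "http://collector:4318/v1/traces" }),
  // Auto-instrumentation is opt-in: without this, no spans are created automatically
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();
// Supported libraries (HTTP, Express, pg, and others) are now auto-instrumented,
// as long as this file is loaded before they are required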

Popular backends: Jaeger (open source), Zipkin (open source), AWS X-Ray (native AWS), Datadog APM, Honeycomb, and Grafana Tempo.

Q40. What is a service mesh and when do you need one?

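A service mesh is a dedicated infrastructure layer that handles service-to-service communication. It usually runs a proxy next to every service instance (the data plane) and manages those proxies from a central control plane, so networking concerns move out of application code.

Capabilities a service mesh provides: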

mTLS between all services (automatic encryption and mutual authentication).
Circuit breaking and retry policies.
Traffic splitting (canary deployments, A/B testing).
Observability (automatic metrics and traces for all service-to-service calls).
Service discovery.

Popular service meshes: Istio, Linkerd, Consul Connect, and AWS App Mesh (which AWS has announced it will stop supporting on September 30, 2026).

When you need a service mesh:

Zero-trust security: every service-to-service call must be encrypted and authenticated, and you cannot implement this in every service's code.
Observability across 20+ services without adding SDK code everywhere.
Advanced traffic management (canaries, dark launches) at the platform level.

When you probably do NOT need it:

Fewer than 10 services.
The team does not have Kubernetes expertise to operate Istio or Linkerd.
The complexity of operating the mesh exceeds the benefit.

Start without a service mesh. Add it when you have a clear, specific pain point, usually around security or observability at scale.

Q41. What is the 12-factor app methodology?

The 12-factor app is a methodology for building cloud-native, scalable, maintainable software-as-a-service applications. Relevant factors for microservices interviews:

III. Config: store config in environment variables, not in code or config files checked into source control. No hardcoded URLs, credentials, or environment-specific values.
IV. Backing services: treat databases, queues, and caches as attached resources accessed via URL. Swapping a local Postgres for a managed RDS instance should require only a config change.
VI. Processes: execute the app as one or more stateless processes. Shared state lives in a backing service (Redis, database), not in process memory. This makes horizontal scaling trivial.
VII. Port binding: export services via port binding. The service is self-contained and listens on a port itself instead of relying on a web server injected at runtime. Works naturally with Docker and Kubernetes.
IX. Disposability: maximize robustness with fast startup and graceful shutdown. Handle SIGTERM, drain in-flight requests, release resources. Enables zero-downtime deployments and auto-scaling.
XI. Logs: treat logs as event streams. Write to stdout. Let the platform (Docker, Kubernetes, CloudWatch) collect and route them.
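
Factors III and IV are the ones most often asked about in code form. A minimal sketch of loading config from the environment and failing fast when something is missing follows. The variable names DATABASE_URL, REDIS_URL, and PORT are assumptions for this example.

```javascript
// Read all configuration from environment variables (factor III).
// Backing services are plain URLs (factor IV), so swapping a local
// Postgres for a managed one is a config change, not a code change.
function loadConfig(env = process.env) {
  const required = ["DATABASE_URL", "REDIS_URL"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    // Fail at startup, not on the first request that needs the value.
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return Object.freeze({
    port: Number(env.PORT || 3000),
    databaseUrl: env.DATABASE_URL,
    redisUrl: env.REDIS_URL,
  });
}

// Example with an explicit environment object instead of process.env.
const exampleConfig = loadConfig({
  DATABASE_URL: "postgres://user:password@db:5432/myapp",
  REDIS_URL: "redis://cache:6379",
  PORT: "8080",
});
console.log(exampleConfig.port); // 8080
```

Passing the environment in as a parameter keeps the function easy to test and makes the dependency on process.env explicit.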

Q42. How do you handle authentication and authorization across microservices?

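Two patterns dominate for end-user identity.

Token propagation: the client's JWT is forwarded on every hop, and each service validates the signature and reads the claims itself.

text

Client -> API Gateway -> (validates JWT) -> Order Service (validates JWT) -> Payment Service
                                            [extracts userId from JWT claims]

Edge validation: the API Gateway or service mesh validates the token once. Services trust the validated identity propagated in headers (X-User-ID, X-User-Roles), which is only safe when services cannot be reached except through the gateway.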

Service-to-service auth: services calling each other must also authenticate.

Short-lived service JWTs signed with service-specific keys.
Mutual TLS (mTLS) via a service mesh, with no application code changes.
AWS IAM roles and SigV4 signing for AWS-native architectures (Lambda calling Lambda, ECS service calling DynamoDB).

Authorization: each service enforces its own authorization rules based on the user identity in the token. Central policy enforcement (Open Policy Agent, AWS IAM) handles complex permission models.

The key principle: never pass usernames and passwords between services. Use short-lived tokens. Rotate signing keys regularly.
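
As an illustration of short-lived service tokens, here is a sketch that signs and verifies an HS256 JWT using only Node's built-in crypto module. It is a simplification: it uses one shared secret where the text above describes service-specific keys, the function names are invented, and production code should use a maintained JWT library and preferably asymmetric keys.

```javascript
const crypto = require("crypto");

const b64url = (text) => Buffer.from(text).toString("base64url");
const hmac = (data, secret) => crypto.createHmac("sha256", secret).update(data).digest();

// Issue a token that expires after ttlSeconds (5 minutes by default).
function signServiceToken(claims, secret, ttlSeconds = 300, nowSeconds = Math.floor(Date.now() / 1000)) {
  const header = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const payload = b64url(JSON.stringify({ ...claims, iat: nowSeconds, exp: nowSeconds + ttlSeconds }));
  const signature = hmac(`${header}.${payload}`, secret).toString("base64url");
  return `${header}.${payload}.${signature}`;
}

// Verify algorithm, signature, and expiry; returns the claims or throws.
function verifyServiceToken(token, secret, nowSeconds = Math.floor(Date.now() / 1000)) {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("malformed token");
  const [header, payload, signature] = parts;
  const { alg } = JSON.parse(Buffer.from(header, "base64url").toString("utf8"));
  if (alg !== "HS256") throw new Error("unexpected algorithm");
  const expected = hmac(`${header}.${payload}`, secret);
  const actual = Buffer.from(signature, "base64url");
  // Constant-time comparison; lengths must match before timingSafeEqual.
  if (actual.length !== expected.length || !crypto.timingSafeEqual(actual, expected)) {
    throw new Error("bad signature");
  }
  const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
  if (claims.exp <= nowSeconds) throw new Error("token expired");
  return claims;
}

// Order service calls payment service with a 5-minute token.
const serviceToken = signServiceToken({ sub: "order-service", aud: "payment-service" }, "demo-secret");
console.log(verifyServiceToken(serviceToken, "demo-secret").sub); // order-service
```

The short expiry is the point: a leaked token is only useful for a few minutes, and rotating the secret invalidates everything issued before it.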

Category 6: Azure Entra ID (Q43-Q50)

Q43. What is Microsoft Azure Entra ID (formerly Azure Active Directory)?

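Microsoft Entra ID is Microsoft's cloud-based identity and access management service, renamed from Azure Active Directory in 2023. It is the identity provider behind Microsoft 365, Azure, and any application you register with it.

Core functions: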

Authentication: verify who a user or application is (login).
Authorization: control what an authenticated identity can access.
Single Sign-On (SSO): users log in once and access many applications.
Multi-Factor Authentication (MFA): an extra verification step.
Conditional Access: a policy engine that controls access based on context, such as location, device compliance, and risk level.
Application management: register apps and define their permissions.

Q44. What is the difference between an App Registration and a Service Principal?

An App Registration is the global definition of an application in its home tenant: its client ID, redirect URIs, credentials, and the permissions it requests. A Service Principal is the local instance of that application in a specific tenant, and it is the identity that actually signs in and receives role assignments. One App Registration can have a Service Principal in many tenants.

bash

# Create an app registration via Azure CLI
az ad app create \
  --display-name "my-backend-api" \
  --sign-in-audience AzureADMyOrg

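# Create the Service Principal for the app (az ad app create does not create one)
az ad sp create --id <app-id>

# Get the app's Service Principal object ID
az ad sp show --id <app-id>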

# Create a client secret for the app
az ad app credential reset \
  --id <app-id> \
  --append \
  --display-name "ci-cd-secret"

Q45. What are the OAuth 2.0 flows supported by Entra ID and when do you use each?

Entra ID supports several OAuth 2.0 and OpenID Connect flows. Picking the right one depends on whether a user is present and whether the client can keep a secret.

Authorization Code Flow: the standard flow for web apps where a user interactively logs in. The browser redirects to Entra ID, the user authenticates, Entra returns an authorization code, and the backend exchanges the code for tokens. Use for web applications where a human is present.
Authorization Code + PKCE: an extension of Authorization Code for public clients, such as single-page apps and mobile apps, that cannot securely store a client secret. PKCE (Proof Key for Code Exchange) replaces the client secret. Use for SPAs (React, Angular), mobile apps, and desktop apps.
Client Credentials Flow: the application authenticates directly with Entra ID using its own credentials (client ID plus secret or certificate). No user is involved, and the flow returns an access token for the application itself. Use for daemon processes, background services, CI/CD pipelines, and service-to-service calls.
On-Behalf-Of (OBO): a middle-tier API receives a user token and exchanges it for a token scoped to a downstream API, preserving the user's identity. Use for API-to-API calls where the user's identity must propagate downstream.
Device Code Flow: for devices with no browser or limited input capability. The device displays a code, and the user enters it on another device to approve. Use for CLI tools, IoT devices, and TV apps.
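
PKCE is the part of this list that is easiest to show in code. The client generates a random code verifier, sends its SHA-256 hash (the code challenge) on the authorization request, and reveals the verifier only when it exchanges the code for tokens. A sketch using Node's crypto module follows; the function names are illustrative, and the fixed verifier is the test vector from RFC 7636, Appendix B.

```javascript
const crypto = require("crypto");

// 32 random bytes encode to a 43-character base64url string.
function createCodeVerifier() {
  return crypto.randomBytes(32).toString("base64url");
}

// S256 method: challenge = BASE64URL(SHA256(ASCII(verifier)))
function codeChallengeFor(verifier) {
  return crypto.createHash("sha256").update(verifier, "ascii").digest("base64url");
}

// Test vector from RFC 7636, Appendix B.
const rfcVerifier = "dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk";
const rfcChallenge = codeChallengeFor(rfcVerifier);
console.log(rfcChallenge); // E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM

// In a real flow, generate a fresh verifier per sign-in attempt.
const freshVerifier = createCodeVerifier();
console.log(freshVerifier.length); // 43
```

An attacker who intercepts the authorization code cannot redeem it, because only the original client knows the verifier that hashes to the challenge.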

python

# Client credentials flow (Python MSAL)
import msal

app = msal.ConfidentialClientApplication(
    client_id="<app-id>",
    client_credential="<client-secret>",
    authority="https://login.microsoftonline.com/<tenant-id>"
)

result = app.acquire_token_for_client(
    scopes=["https://graph.microsoft.com/.default"]
)
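# MSAL returns a dict with error details instead of raising on failure
if "access_token" not in result:
    raise RuntimeError(f"{result.get('error')}: {result.get('error_description')}")
access_token = result["access_token"]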

Q46. What are Managed Identities in Azure and why are they preferred?

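A managed identity is an identity in Entra ID that Azure creates and manages for an Azure resource. Code running on that resource can request tokens for other Azure services without any credential stored in code or configuration.

There are two types: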

System-assigned managed identity: tied to a specific Azure resource, such as a VM, App Service, or Azure Function. Created with the resource and deleted with the resource, in a one-to-one relationship.
User-assigned managed identity: created as a standalone Azure resource. Can be assigned to multiple Azure resources and has an independent lifecycle.

bash

# Enable system-assigned managed identity on an App Service
az webapp identity assign \
  --resource-group my-rg \
  --name my-api

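# Grant that identity permission to read from Key Vault
# (applies to vaults using access policies; with the Azure RBAC permission model,
# assign a role such as Key Vault Secrets User instead)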
az keyvault set-policy \
  --name my-keyvault \
  --object-id <managed-identity-principal-id> \
  --secret-permissions get list

python

# In application code, no credentials needed.
# The Azure SDK automatically acquires tokens using the managed identity.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

credential = DefaultAzureCredential()
client = SecretClient(vault_url="https://my-keyvault.vault.azure.net/", credential=credential)
secret = client.get_secret("database-password")

Why managed identities are preferred over client secrets:

No secret to rotate, store, or accidentally leak to source control.
Reduced attack surface, since there are no static credentials that can be stolen.
Automatic credential rotation by Azure.
Works seamlessly with Key Vault, Storage, SQL, Service Bus, and most Azure services.
Full audit trail via Entra ID sign-in logs.

Q47. What is Conditional Access in Entra ID?

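Conditional Access is the policy engine in Entra ID. It evaluates signals about each sign-in (who is signing in, to which app, from where, on which device) and decides whether to grant access, require additional controls, or block it.

Policy structure: if a set of conditions is met, then grant or block controls apply.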

Conditions include:

User or group membership.
The application being accessed.
Sign-in risk level, detected by Identity Protection (low, medium, high).
Device compliance, such as Intune-managed or hybrid joined.
Location: named locations, IP ranges, or countries.
Client app: browser, mobile app, or legacy auth.

Controls include:

Block access.
Require MFA.
Require a compliant device.
Require a Microsoft Entra hybrid joined device (formerly hybrid Azure AD joined).
Require an approved client app.
Require a password change.

text

Example policy: "Require MFA for all admin portal access from outside the corporate network"

Conditions:
  Users: Admins group
  Application: Azure Management Portal
  Location: Exclude corporate IP range

Controls: Require MFA
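
The if-conditions-then-controls structure can be modelled in a few lines. The sketch below is a toy evaluator, not how Entra ID is implemented: the `SignIn` and `Policy` classes, the group and application names, and the IP range are all hypothetical, and it mirrors only the example policy above.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

@dataclass
class SignIn:
    user_groups: set
    application: str
    ip: str

@dataclass
class Policy:
    name: str
    groups: set               # Users condition
    application: str          # Application condition
    excluded_networks: list   # Location condition (ranges the policy skips)
    control: str              # Control applied when every condition matches

    def applies_to(self, sign_in: SignIn) -> bool:
        in_scope_user = bool(self.groups & sign_in.user_groups)
        in_scope_app = sign_in.application == self.application
        from_excluded_location = any(
            ip_address(sign_in.ip) in ip_network(net)
            for net in self.excluded_networks
        )
        return in_scope_user and in_scope_app and not from_excluded_location

def evaluate(policies, sign_in):
    """Return the controls required by every policy whose conditions match."""
    return [p.control for p in policies if p.applies_to(sign_in)]

mfa_for_admins = Policy(
    name="Require MFA for admin portal access outside the corporate network",
    groups={"Admins"},
    application="Azure Management Portal",
    excluded_networks=["203.0.113.0/24"],  # hypothetical corporate range
    control="Require MFA",
)

remote = SignIn({"Admins"}, "Azure Management Portal", "198.51.100.7")
office = SignIn({"Admins"}, "Azure Management Portal", "203.0.113.10")

print(evaluate([mfa_for_admins], remote))  # ['Require MFA']
print(evaluate([mfa_for_admins], office))  # []
```

The same admin gets an MFA prompt from a remote address and none from the excluded corporate range, which is the behavior the example policy describes.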

Common use cases:

Require MFA for all external access.
Block legacy authentication protocols that do not support MFA.
Require compliant devices for accessing sensitive apps.
Block access from high-risk sign-ins automatically.

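Q48. What is the difference between Azure RBAC and Entra ID roles?

Azure RBAC controls access to Azure resources. Roles such as Contributor or Storage Blob Data Reader are assigned at a scope: management group, subscription, resource group, or individual resource. Entra ID roles control access to the directory itself: roles such as Global Administrator or User Administrator govern users, groups, and app registrations. The two sets of roles are assigned separately.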

bash

# Grant a service principal Contributor access to a resource group
az role assignment create \
  --assignee <service-principal-object-id> \
  --role "Contributor" \
  --resource-group my-resource-group

# Grant read access to a specific storage account
az role assignment create \
  --assignee <user-or-sp-object-id> \
  --role "Storage Blob Data Reader" \
  --scope /subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>

Managing Azure infrastructure (VMs, storage, networking): use Azure RBAC.
Managing users, groups, app registrations, and Conditional Access: use Entra ID roles.
Accessing Azure resources from an application: use Azure RBAC on a managed identity.

Q49. How does Single Sign-On (SSO) work in Entra ID?

SSO lets users authenticate once with Entra ID and access multiple applications without re-entering credentials. The three SSO methods that come up most often are:

OpenID Connect (OIDC): modern, token-based. Entra ID returns an ID token (user identity) and an access token (API access). Best for new cloud-native apps.
SAML 2.0: an XML-based federation standard, common for enterprise SaaS apps such as Salesforce and ServiceNow. Entra ID acts as the Identity Provider (IdP) and the app is the Service Provider (SP). No passwords are exchanged: assertions are signed XML documents.
Password-based SSO: Entra ID stores credentials for apps that do not support federated SSO and fills them into the app's sign-in form through a browser extension. Treat this as a last resort.

SSO session flow (OIDC):

User accesses App A while not logged in, and is redirected to the Entra ID login page.
User authenticates with credentials and MFA.
Entra ID sets a session cookie and issues tokens for App A.
User accesses App B; the browser sends the Entra ID session cookie.
Entra ID validates the existing session and issues tokens for App B without requiring re-authentication.
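
The session flow above can be sketched as a toy identity provider that remembers a browser session and mints tokens per application. Everything here is an illustrative assumption (the `ToyIdP` class, the cookie format, the token strings); real Entra ID tokens are signed JWTs, and the browser redirects are omitted.

```python
import secrets

class ToyIdP:
    def __init__(self):
        self.sessions = {}            # session cookie -> user
        self.interactive_logins = 0   # how many times the user was prompted

    def authorize(self, app, cookie=None, credentials=None):
        """Return (cookie, token) for app, prompting only when no session exists."""
        user = self.sessions.get(cookie)
        if user is None:
            if credentials is None:
                raise PermissionError("interactive login required")
            user = credentials["username"]   # password and MFA checks omitted
            cookie = secrets.token_hex(8)
            self.sessions[cookie] = user
            self.interactive_logins += 1
        return cookie, f"token-for-{app}:{user}"

idp = ToyIdP()

# First three steps: App A has no session, so the user signs in once.
cookie, token_a = idp.authorize("app-a", credentials={"username": "alice"})

# Last two steps: App B presents the existing session cookie, so no prompt is needed.
_, token_b = idp.authorize("app-b", cookie=cookie)

print(token_a)
print(token_b)
print(idp.interactive_logins)
```

Each app still receives its own token; only the interactive prompt is shared, which is why the login counter stays at one.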

Token lifetime: access tokens last about an hour by default (Entra ID assigns a randomized lifetime between 60 and 90 minutes). Refresh tokens allow silent re-authentication for around 90 days before requiring an interactive login.

Q50. What is Privileged Identity Management (PIM) in Entra ID?

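Privileged Identity Management (PIM) provides just-in-time access to privileged roles. Instead of holding a role permanently, a user is made eligible for it and activates it only when needed, for a limited time.

How it works: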

An administrator marks a user as eligible for the Global Administrator role.
The user has no admin access by default.
When needed, the user activates the role via PIM, providing justification and optionally MFA or manager approval.
The user has admin access for a configurable time window, typically 1 to 8 hours.
Access expires automatically, and all activation requests are logged.
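
The eligible-then-activate lifecycle can be sketched as a small state model. This is a toy illustration under stated assumptions, not the PIM API: the `EligibleAssignment` class, its field names, and the 8-hour cap are hypothetical.

```python
from datetime import datetime, timedelta, timezone

class EligibleAssignment:
    MAX_HOURS = 8  # assumed policy cap for this sketch

    def __init__(self, user, role):
        self.user, self.role = user, role
        self.expires_at = None   # None means the role is not active
        self.audit_log = []      # one entry per activation

    def activate(self, justification, hours, now):
        if not justification:
            raise ValueError("justification is required")
        if not 0 < hours <= self.MAX_HOURS:
            raise ValueError(f"duration must be at most {self.MAX_HOURS} hours")
        self.expires_at = now + timedelta(hours=hours)
        self.audit_log.append((now, self.user, self.role, justification))

    def is_active(self, now):
        # Access lapses on its own once the activation window has passed.
        return self.expires_at is not None and now < self.expires_at

start = datetime(2026, 6, 15, 9, 0, tzinfo=timezone.utc)
assignment = EligibleAssignment("alice", "Global Administrator")

print(assignment.is_active(start))                        # False: eligible only
assignment.activate("Rotate break-glass credentials", 2, start)
print(assignment.is_active(start + timedelta(hours=1)))   # True: inside the window
print(assignment.is_active(start + timedelta(hours=3)))   # False: window expired
```

Nothing revokes the role explicitly; expiry is just a timestamp comparison, which is what makes just-in-time access safe to forget about.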

Benefits:

Reduces attack surface: stolen credentials for a non-admin account cannot immediately be used for admin tasks.
Requires justification for every privileged action.
Provides a full audit trail of who used which privileged role, when, and why.
Supports approval workflows for sensitive roles.

bash

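# List Azure RBAC role assignments for a principal
# (these are Azure resource roles, not Entra ID directory roles)
az role assignment list --assignee <user-object-id> --all

# List PIM-eligible Entra ID role assignments via Microsoft Graph
# (the signed-in identity needs permission to read role eligibility schedules)
az rest --method GET \
  --url "https://graph.microsoft.com/v1.0/roleManagement/directory/roleEligibilitySchedules?\$filter=principalId eq '<user-object-id>'"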

Quick Reference: All 50 Questions at a Glance

Use this table to scan every question and its core concept in one pass. It's the fastest way to spot the topics you need to revisit before an interview.

#	Question	Core concept
Q1	What is serverless computing	No server management, pay-per-execution, trade-offs
Q2	How does Lambda work	Execution environment lifecycle, handler, warm/cold start
Q3	Lambda cold starts and prevention	Causes, provisioned concurrency, SnapStart, runtime choice
Q4	Lambda concurrency types	Unreserved vs reserved vs provisioned
Q5	Lambda Layers	Shared dependencies, /opt directory, 5 max layers
Q6	Synchronous vs asynchronous invocation	Caller waits vs fire-and-forget, DLQ, retries
Q7	Lambda limits	15-minute timeout, 10GB memory, 1,000 concurrency, 6MB payload
Q8	Lambda monitoring and debugging	CloudWatch Logs, Metrics, X-Ray, Powertools
Q9	API Gateway and the three API types	REST vs HTTP vs WebSocket
Q10	API Gateway integration types	Lambda proxy, HTTP, AWS Service, Mock
Q11	API Gateway authorization	IAM, Lambda Authorizer, JWT Authorizer
Q12	API Gateway throttling	Token bucket, account/stage limits, 429 response
Q13	API Gateway stages and deployments	Stage variables, canary deployments, custom domains
Q14	REST API vs HTTP API	Price, features, JWT support, caching
Q15	S3 core concepts	Bucket, object, key, URL format
Q16	S3 storage classes	Standard, IA, Glacier, Intelligent-Tiering, Deep Archive
Q17	S3 versioning	Multiple versions, delete markers, MFA Delete
Q18	Presigned URLs	Temporary access, GET and PUT, direct upload pattern
Q19	S3 event notifications and Lambda	Event types, permission model, structured trigger event
Q20	Bucket policy vs ACL	JSON resource policies vs legacy ACLs, Block Public Access
Q21	Docker vs virtual machines	Shared kernel vs full OS, boot time, resource use
Q22	Docker architecture	CLI, dockerd, containerd, runc, registry
Q23	Docker image vs container	Read-only template vs running instance, writable layer
Q24	Writing an efficient Dockerfile	Layer order, dependencies before code, non-root, .dockerignore
Q25	Docker layer caching	Cache invalidation order, build speed optimization
Q26	Docker Compose	Multi-service YAML, depends_on, healthcheck, volumes
Q27	Docker network drivers	bridge, host, overlay, macvlan, none
Q28	Volumes vs bind mounts	Docker-managed vs host path, production vs dev
Q29	Multi-stage Docker builds	Fat build vs lean production image, --from copy
Q30	Debugging a crashing container	ps -a, logs, inspect, exec -it, exit codes
Q31	Container security best practices	Non-root, minimal image, scan, read-only FS, cap-drop
Q32	Docker Swarm vs Kubernetes	Simplicity vs ecosystem, production scale
Q33	Monolith vs microservices	Deployment, scaling, team autonomy, trade-offs
Q34	Inter-service communication	REST, gRPC, message queues, event streaming
Q35	Service discovery	Client-side vs server-side, Consul, Kubernetes DNS
Q36	Circuit Breaker pattern	Closed, Open, Half-Open states, opossum library
Q37	Saga pattern	Distributed transactions, choreography vs orchestration
Q38	Event sourcing	Events as the source of truth, replay, projections
Q39	Distributed tracing	Trace, span, OpenTelemetry, Jaeger, X-Ray
Q40	Service mesh	Sidecar proxy, mTLS, traffic management, Istio
Q41	12-factor app methodology	Config, stateless processes, port binding, disposability, logs
Q42	Auth across microservices	JWT propagation, mTLS, service-to-service tokens
Q43	What is Azure Entra ID	IAM service, authentication, SSO, Conditional Access
Q44	App Registration vs Service Principal	Blueprint vs instance, multi-tenant model
Q45	OAuth 2.0 flows in Entra ID	Auth Code, PKCE, Client Credentials, OBO, Device Code
Q46	Managed Identities	Auto-managed credentials, system vs user-assigned
Q47	Conditional Access	Policy engine, signals, grant/block controls
Q48	Azure RBAC vs Entra ID roles	Resource access vs directory access
Q49	Single Sign-On (SSO)	OIDC, SAML, session cookies, token lifetime, CAE
Q50	Privileged Identity Management (PIM)	Just-in-time access, eligible vs active, audit trail

Frequently Asked Questions

What level of cloud and DevOps knowledge do these 50 questions target?

How does AWS Lambda compare to running containers on ECS or Kubernetes?

Both run your code without you managing physical servers, but the operational model and cost profile differ significantly.

	AWS Lambda	ECS / Kubernetes
Billing	Per invocation and execution time	Per running instance, regardless of traffic
Scaling	Automatic, near-instant, to zero	Configured autoscaling, rarely to zero
Max runtime	15 minutes per invocation	Unbounded, long-running processes are fine
Cold starts	Yes, mitigated with provisioned concurrency (Q3)	No, containers stay warm
Best fit	Event-driven, bursty, API backends	Steady high-traffic services, persistent connections

How can I practice these AWS, Docker, and Azure concepts before an interview?

Most of these concepts can be tested locally without an AWS or Azure bill, using Docker Desktop, the AWS Free Tier, and LocalStack to emulate AWS services.

bash

# Run LocalStack to emulate S3, Lambda, and API Gateway locally
docker run -d -p 4566:4566 --name localstack localstack/localstack

# Create a bucket against the local endpoint
aws --endpoint-url=http://localhost:4566 s3 mb s3://my-test-bucket

# Build and run a multi-stage Dockerfile from Q29 locally
docker build -t myapp .
docker run -p 3000:3000 myapp

How do I generate an S3 presigned URL for a file upload?

Use the AWS SDK's request presigner to generate a time-limited PUT URL, then have the client upload directly to S3 without the file passing through your backend. This is the pattern covered in Q18.

javascript

const { S3Client, PutObjectCommand } = require("@aws-sdk/client-s3");
const { getSignedUrl } = require("@aws-sdk/s3-request-presigner");

const client = new S3Client({ region: "us-east-1" });

// Wrapped in an async function: top-level await is not available in CommonJS modules
async function createUploadUrl(userId) {
  return getSignedUrl(
    client,
    new PutObjectCommand({
      Bucket: "my-bucket",
      Key: `uploads/user-${userId}/avatar.jpg`,
      ContentType: "image/jpeg",
    }),
    { expiresIn: 300 } // 5 minutes to start the upload
  );
}
// Send the returned URL to the client; it PUTs the file directly to S3

What happens when a Lambda function hits its concurrency limit?

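Lambda throttles any invocation that would exceed the limit. What the caller sees depends on the invocation type and on how concurrency is configured:

For synchronous invocations (such as a direct Invoke call), Lambda rejects the request with a 429 TooManyRequestsException, and the caller is responsible for retrying.
For asynchronous invocations (S3, SNS, EventBridge), Lambda queues the event and retries automatically once the throttle clears, up to the limits of its retry policy.
Reserved concurrency (Q4) can make this worse if set too low for a function's real traffic, since it hard-caps that function even when the account has unused capacity.
Provisioned concurrency (Q3) does not prevent throttling on its own. It only keeps a fixed number of environments warm; traffic above that number spills over to on-demand environments, which can cold start and are still subject to the function's reserved limit and the account limit.

The fix is usually to request an account-level concurrency quota increase, raise a reserved concurrency setting that is too low, add a queue (SQS) in front of the function to smooth bursts, or some combination of these.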
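
How the account limit and reserved concurrency interact is easier to see in a toy model. The sketch below is a simplified simulation, not Lambda's scheduler: the `ConcurrencyPool` class, the function names, and the small numbers are hypothetical, and it ignores scaling rates and retries.

```python
class ConcurrencyPool:
    def __init__(self, account_limit, reserved=None):
        self.reserved = dict(reserved or {})   # function -> reserved concurrency
        # Reserved concurrency is carved out of the shared account pool.
        self.unreserved_limit = account_limit - sum(self.reserved.values())
        self.in_flight = {}                    # function -> running invocations

    def try_invoke(self, function):
        """Return True if the invocation starts, False if it is throttled."""
        running = self.in_flight.get(function, 0)
        if function in self.reserved:
            allowed = running < self.reserved[function]
        else:
            unreserved_running = sum(
                n for f, n in self.in_flight.items() if f not in self.reserved
            )
            allowed = unreserved_running < self.unreserved_limit
        if allowed:
            self.in_flight[function] = running + 1
        return allowed

pool = ConcurrencyPool(account_limit=10, reserved={"resize-image": 2})

# Three simultaneous requests to a function capped at 2: the third is throttled
# even though the account still has spare capacity.
capped = [pool.try_invoke("resize-image") for _ in range(3)]
print(capped)

# An unreserved function shares the remaining 8 slots, so the ninth call is throttled.
shared = [pool.try_invoke("send-email") for _ in range(9)]
print(shared.count(True), shared.count(False))
```

The first result shows the hard cap from reserved concurrency; the second shows that reserving capacity for one function shrinks the pool available to every other function.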
