Cloud platform case study

Turning repeated serverless wiring into reusable CDK constructs.

A durable workflow pattern built as composable AWS CDK constructs: API intake, state, asynchronous work, failure handling, notifications, alarms, and tests. The point is to keep the repeated wiring consistent without hiding the service behavior teams still need to own.

What this shows: AWS CDK platform design
Architecture: API Gateway, Lambda, DynamoDB, SQS/DLQ, SNS, CloudWatch
Code shape: Six custom constructs plus example stacks
Checks: Unit tests, construct tests, CDK assertion tests, local synth

Code map

Files behind the CDK pattern.

The public repository keeps the construct API, generated infrastructure, comparison stacks, and tests separate enough to read without guessing where the behavior lives.

Problem

The hard part is not one service. It is what gets rebuilt every time.

Teams often need the same workflow skeleton: accept a request, store durable state, queue asynchronous work, update status, publish lifecycle events, and show enough operational signal to notice when work is stuck.

If every team hand-builds that pattern, permissions, retry policy, alarm thresholds, table indexes, and event-source settings drift between services. The goal here is to package the repeated parts while keeping the business behavior visible.

Options

The useful middle ground is a small construct library.

Hand-wire every service

Maximum flexibility, but repeated infrastructure code makes drift and missed defaults more likely.

Use a template

Good for bootstrapping, but teams still own the copied code after the first service is created.

Build focused CDK constructs

Standardizes the risky repeated wiring while leaving workflow behavior configurable.

Build a full internal platform

Powerful, but too much ceremony for a pattern that still benefits from local CDK composition.

Architecture

A durable workflow path where failure has somewhere to go.

Diagram: Entry & flow lane (API Gateway with POST and GET routes, Submit, Status, and Worker Lambdas, SQS work queue) and State & signals lane (DynamoDB state table, SQS DLQ for exhausted retries, SNS lifecycle topic).
API Gateway routes both requests. POST goes to the Submit Lambda, which writes durable state and enqueues work. GET goes to the Status Lambda, which reads the state table directly. The Worker Lambda receives queued messages through an SQS event source, updates state, and publishes lifecycle events; messages that exhaust their retries land in the dead-letter queue. This is the AWS shape the example stack synthesizes when the six constructs are composed.
  • API Gateway REST API
  • 3 Lambda functions
  • DynamoDB state table with TTL, PITR, and status index
  • SQS work queue and dead-letter queue
  • SNS topic for lifecycle events
  • 2 CloudWatch alarms
  • least-privilege IAM policies
  • stack outputs for API URL, table name, and queue URL
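The request path above can be sketched without AWS at all. In this minimal model a `Map` stands in for the DynamoDB state table and two arrays stand in for the SQS work queue and the SNS topic; the status values (`QUEUED`, `COMPLETED`) and every function name are hypothetical, and real handlers would call the AWS SDK instead.

```typescript
// Hypothetical status values; the case study does not name them.
type WorkflowStatus = "QUEUED" | "COMPLETED";

interface WorkflowRecord {
  requestId: string;
  status: WorkflowStatus;
  createdAt: string; // ISO-8601 string, matching the table's string sort key
  payload: string;
}

// In-memory stand-ins: a Map for the DynamoDB state table, arrays for the
// SQS work queue and the SNS lifecycle topic.
const stateStore = new Map<string, WorkflowRecord>();
const pendingMessages: string[] = [];
const publishedEvents: string[] = [];

// POST path (Submit): write durable state first, then enqueue the request ID.
function submit(requestId: string, payload: string, now: Date): WorkflowRecord {
  const record: WorkflowRecord = {
    requestId,
    status: "QUEUED",
    createdAt: now.toISOString(),
    payload,
  };
  stateStore.set(requestId, record);
  pendingMessages.push(requestId);
  return record;
}

// GET path (Status): read state only; no queue or topic access.
function getStatus(requestId: string): WorkflowStatus | undefined {
  return stateStore.get(requestId)?.status;
}

// Worker: take one message, update state, publish a lifecycle event.
function processNext(): void {
  const requestId = pendingMessages.shift();
  if (requestId === undefined) return;
  const record = stateStore.get(requestId);
  if (record === undefined) return;
  stateStore.set(requestId, { ...record, status: "COMPLETED" });
  publishedEvents.push(`${requestId}:COMPLETED`);
}
```

Writing state before enqueueing means a GET never sees a request ID without a record. The sketch does not cover a failure between the two writes, which would leave a `QUEUED` record with no message behind it.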
Developer surface

The same architecture, with less repeated service code.

Before: each team wires the same defaults

const deadLetterQueue = new Queue(this, "WorkflowDeadLetterQueue", {
  retentionPeriod: Duration.days(14)
});

const workQueue = new Queue(this, "WorkflowWorkQueue", {
  visibilityTimeout: Duration.seconds(60),
  retentionPeriod: Duration.days(4),
  deadLetterQueue: {
    queue: deadLetterQueue,
    maxReceiveCount: 3
  }
});

stateTable.grantReadWriteData(submitFunction);
workQueue.grantSendMessages(submitFunction);
stateTable.grantReadData(statusFunction);
stateTable.grantReadWriteData(workerFunction);
workQueue.grantConsumeMessages(workerFunction);
notifications.grantPublish(workerFunction);

After: teams compose the pattern

const stateTable = new WorkflowStateTable(this, "WorkflowState");
const workQueue = new AsyncWorkQueue(this, "WorkQueue");
const notifications = new WorkflowNotifications(this, "WorkflowNotifications");

new WorkflowApi(this, "WorkflowApi", {
  stateTable,
  workQueue,
  submitEntry,
  statusEntry
});

new WorkflowWorker(this, "WorkflowWorker", {
  stateTable,
  workQueue,
  notifications,
  entry: workerEntry
});

new WorkflowObservability(this, "WorkflowObservability", {
  workQueue
});
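Composing `AsyncWorkQueue` and `WorkflowWorker` separately leaves one cross-construct setting for the caller to keep consistent. The AWS Lambda documentation recommends that an SQS source queue's visibility timeout be at least six times the function timeout, so a slow or throttled invocation does not make the message visible again mid-run. The check below is a hypothetical helper on plain seconds, not a CDK API, and the worker timeouts in it are assumed because the case study does not state one.

```typescript
// AWS Lambda guidance for SQS event sources: the queue's visibility timeout
// should be at least six times the function timeout.
const VISIBILITY_MULTIPLIER = 6;

interface TimeoutCheck {
  ok: boolean;
  minimumVisibilitySeconds: number;
}

// Hypothetical helper working on plain seconds; not part of aws-cdk-lib.
function checkVisibilityTimeout(
  functionTimeoutSeconds: number,
  visibilityTimeoutSeconds: number,
): TimeoutCheck {
  const minimumVisibilitySeconds = functionTimeoutSeconds * VISIBILITY_MULTIPLIER;
  return {
    ok: visibilityTimeoutSeconds >= minimumVisibilitySeconds,
    minimumVisibilitySeconds,
  };
}

// AsyncWorkQueue defaults to a 60-second visibility timeout, so an assumed
// 10-second worker timeout fits; a 30-second timeout would need 180 seconds.
const shortWorker = checkVisibilityTimeout(10, 60);
const slowWorker = checkVisibilityTimeout(30, 60);
```

A construct test could run this against the values passed to both constructs, so raising the worker timeout without the queue's visibility timeout fails before deployment.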
Constructs

Each construct owns one operational concern.

WorkflowStateTable

Owns the DynamoDB table shape, TTL, point-in-time recovery, and the status index teams would otherwise rebuild by hand.

AsyncWorkQueue

Owns the queue, dead-letter queue, visibility timeout, retention, and retry count, because those defaults tend to drift.

WorkflowApi

Owns API Gateway routes and the submit/status Lambda wiring, while leaving service behavior outside the construct.

WorkflowWorker

Owns the worker Lambda, SQS event source, grants, timeout, and environment needed to process work safely.

WorkflowNotifications

Owns the lifecycle notification topic and the narrow publish permission.

WorkflowObservability

Owns queue-age and dead-letter queue alarms, so stuck work is visible by default.
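What the two alarms decide can be modeled in a few lines. This sketch assumes they watch the standard SQS CloudWatch metrics `ApproximateAgeOfOldestMessage` (work queue) and `ApproximateNumberOfMessagesVisible` (DLQ); the threshold, the all-periods-breach rule, and the names are assumptions, and real CloudWatch alarms add M-of-N datapoints and missing-data handling that the sketch ignores.

```typescript
// One evaluation period's worth of the two SQS metrics the alarms watch.
interface QueueSample {
  ageOfOldestMessageSeconds: number; // ApproximateAgeOfOldestMessage, work queue
  deadLetterMessagesVisible: number; // ApproximateNumberOfMessagesVisible, DLQ
}

// Hypothetical thresholds; the case study does not state its values.
interface AlarmSettings {
  maxAgeSeconds: number;
  evaluationPeriods: number;
}

interface AlarmStates {
  queueAge: boolean;
  deadLetter: boolean;
}

// Simplified alarm logic: the queue-age alarm fires when every one of the last
// `evaluationPeriods` samples breaches; the DLQ alarm fires on any message in
// the latest sample.
function evaluateAlarms(samples: QueueSample[], settings: AlarmSettings): AlarmStates {
  const recent = samples.slice(-settings.evaluationPeriods);
  const queueAge =
    recent.length === settings.evaluationPeriods &&
    recent.every((s) => s.ageOfOldestMessageSeconds > settings.maxAgeSeconds);
  const latest = samples[samples.length - 1];
  const deadLetter = latest !== undefined && latest.deadLetterMessagesVisible > 0;
  return { queueAge, deadLetter };
}
```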

State table default

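import { AttributeType, BillingMode, ProjectionType, Table } from "aws-cdk-lib/aws-dynamodb";
import { Construct } from "constructs";

export interface WorkflowStateTableProps {
  /** Name of the TTL attribute (epoch seconds). Defaults to "expiresAt". */
  readonly ttlAttribute?: string;
  /** Point-in-time recovery. Defaults to enabled. */
  readonly pointInTimeRecovery?: boolean;
}

export class WorkflowStateTable extends Construct {
  public readonly table: Table;
  public readonly requestIdKey = "requestId";

  public constructor(scope: Construct, id: string, props: WorkflowStateTableProps = {}) {
    super(scope, id);

    // On-demand table keyed by request ID, with TTL and PITR on by default.
    this.table = new Table(this, "Table", {
      partitionKey: { name: this.requestIdKey, type: AttributeType.STRING },
      billingMode: BillingMode.PAY_PER_REQUEST,
      timeToLiveAttribute: props.ttlAttribute ?? "expiresAt",
      pointInTimeRecoverySpecification: {
        pointInTimeRecoveryEnabled: props.pointInTimeRecovery ?? true
      }
    });

    // Status index: list requests in a given status, ordered by creation time.
    this.table.addGlobalSecondaryIndex({
      indexName: "status-createdAt-index",
      partitionKey: { name: "status", type: AttributeType.STRING },
      sortKey: { name: "createdAt", type: AttributeType.STRING },
      projectionType: ProjectionType.ALL
    });
  }
}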
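The table construct fixes key names, not item formats, so two details still depend on handler code. DynamoDB TTL only acts on a Number attribute holding epoch seconds, and the `createdAt` sort key is a string, so it orders chronologically only if every writer uses one fixed-width format such as ISO-8601 UTC. The sketch below assumes that format and a hypothetical `PROCESSING` status, and filters an array in memory to mimic a query on `status-createdAt-index` rather than calling the AWS SDK.

```typescript
interface StateItem {
  requestId: string;
  status: string;
  createdAt: string; // fixed-width ISO-8601 UTC, e.g. from Date.prototype.toISOString()
  expiresAt: number; // DynamoDB TTL only acts on a Number holding epoch seconds
}

const SECONDS_PER_DAY = 24 * 60 * 60;

// TTL value in epoch seconds (Date.getTime() returns milliseconds).
function ttlEpochSeconds(now: Date, retainDays: number): number {
  return Math.floor(now.getTime() / 1000) + retainDays * SECONDS_PER_DAY;
}

// In-memory stand-in for a Query on status-createdAt-index with
//   KeyConditionExpression: "#status = :status AND createdAt < :cutoff"
// ("status" is a DynamoDB reserved word, hence the #status placeholder).
// Fixed-width ISO-8601 UTC strings compare lexicographically in time order,
// which is what lets a STRING sort key support this range condition.
function findOlderThan(items: StateItem[], status: string, cutoff: Date): StateItem[] {
  const cutoffKey = cutoff.toISOString();
  return items
    .filter((item) => item.status === status && item.createdAt < cutoffKey)
    .sort((a, b) => (a.createdAt < b.createdAt ? -1 : a.createdAt > b.createdAt ? 1 : 0));
}
```

A query like this is one way to find work that is stuck in a status, which the queue-age alarm cannot see once a message has been deleted.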

Queue failure path

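import { Duration } from "aws-cdk-lib";
import { Queue } from "aws-cdk-lib/aws-sqs";
import { Construct } from "constructs";

export interface AsyncWorkQueueProps {
  readonly visibilityTimeout?: Duration;
  readonly retentionPeriod?: Duration;
  readonly maxReceiveCount?: number;
}

export class AsyncWorkQueue extends Construct {
  public readonly queue: Queue;
  public readonly deadLetterQueue: Queue;

  public constructor(scope: Construct, id: string, props: AsyncWorkQueueProps = {}) {
    super(scope, id);

    // Keep the DLQ at the 14-day SQS maximum instead of reusing the work queue's
    // retentionPeriod override: a standard queue keeps a message's original enqueue
    // timestamp when it moves to the DLQ, so the DLQ needs the longer retention.
    this.deadLetterQueue = new Queue(this, "DeadLetterQueue", {
      retentionPeriod: Duration.days(14)
    });

    // Work queue: after maxReceiveCount receives without a delete, SQS moves the message to the DLQ.
    this.queue = new Queue(this, "Queue", {
      visibilityTimeout: props.visibilityTimeout ?? Duration.seconds(60),
      retentionPeriod: props.retentionPeriod ?? Duration.days(4),
      deadLetterQueue: {
        queue: this.deadLetterQueue,
        maxReceiveCount: props.maxReceiveCount ?? 3
      }
    });
  }
}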
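With the default `maxReceiveCount` of 3, a message that keeps failing is delivered three times and then moved to the DLQ. The sketch below models only that counting rule, with hypothetical names; real SQS also waits out the visibility timeout between attempts and delivers messages in batches.

```typescript
interface SimulatedMessage {
  body: string;
  receiveCount: number;
}

interface DeliveryResult {
  attempts: number;
  outcome: "succeeded" | "dead-lettered";
}

// Simplified model of an SQS redrive policy: each receive increments the
// message's receive count, a message is delivered at most maxReceiveCount
// times, and after the last failed attempt it goes to the dead-letter queue.
function deliverWithRedrive(
  message: SimulatedMessage,
  maxReceiveCount: number,
  handler: (body: string) => boolean, // true = processed, false = error or timeout
): DeliveryResult {
  while (message.receiveCount < maxReceiveCount) {
    message.receiveCount += 1;
    if (handler(message.body)) {
      return { attempts: message.receiveCount, outcome: "succeeded" };
    }
    // On failure, SQS would make the message visible again after the visibility timeout.
  }
  return { attempts: message.receiveCount, outcome: "dead-lettered" };
}
```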
Permissions

Permissions stay close to the workflow boundary.

The example keeps permissions close to the construct that needs them. Submit, status, and worker functions get different grants, and the stack test checks that generated IAM actions do not fall back to broad wildcard access.

Submit Lambda

Write workflow state and send work messages.

No queue consumption or notification publishing.

Status Lambda

Read workflow state for status lookups.

No queue, notification, or write path access.

Worker Lambda

Consume queued work, update state, and publish lifecycle notifications.

No API management permissions or broad wildcard policy.

Grant separation test

expect(submitActions).toEqual(
  expect.arrayContaining(["dynamodb:PutItem", "dynamodb:UpdateItem", "sqs:SendMessage"])
);
expect(submitActions).not.toContain("sqs:ReceiveMessage");
expect(submitActions).not.toContain("sns:Publish");

expect(statusActions).toEqual(expect.arrayContaining(["dynamodb:GetItem", "dynamodb:Query"]));
expect(statusActions).not.toContain("dynamodb:PutItem");
expect(statusActions).not.toContain("sqs:SendMessage");

expect(workerActions).toEqual(
  expect.arrayContaining(["dynamodb:UpdateItem", "sqs:ReceiveMessage", "sns:Publish"])
);
expect(workerActions).not.toContain("sqs:SendMessage");
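The grant test above assumes per-function action lists (`submitActions` and the others) and a wildcard check, neither of which the excerpt shows. A minimal, dependency-free sketch of both works on plain template JSON, such as the object returned by `Template.toJSON()` in `aws-cdk-lib/assertions`; mapping each `AWS::IAM::Policy` resource to its function's role is left out, and all helper names here are hypothetical.

```typescript
// Shapes of the IAM policy JSON found in a synthesized CloudFormation template.
type PolicyValue = string | { [key: string]: unknown };

interface PolicyStatement {
  Action: string | string[];
  Effect: "Allow" | "Deny";
  Resource: PolicyValue | PolicyValue[];
}

interface PolicyDocument {
  Statement: PolicyStatement[];
  Version?: string;
}

// CloudFormation accepts either a single value or a list; normalize to a list.
function asArray<T>(value: T | T[]): T[] {
  return Array.isArray(value) ? (value as T[]) : [value as T];
}

// Every action granted by Allow statements, in statement order.
function allowedActions(doc: PolicyDocument): string[] {
  const actions: string[] = [];
  for (const statement of doc.Statement) {
    if (statement.Effect === "Allow") actions.push(...asArray<string>(statement.Action));
  }
  return actions;
}

// Flag wildcard actions (e.g. "*", "sqs:*", "dynamodb:Get*") and "*" resources.
function wildcardFindings(doc: PolicyDocument): string[] {
  const findings: string[] = [];
  for (const statement of doc.Statement) {
    for (const action of asArray<string>(statement.Action)) {
      if (action.includes("*")) findings.push(`action ${action}`);
    }
    for (const resource of asArray<PolicyValue>(statement.Resource)) {
      if (resource === "*") findings.push("resource *");
    }
  }
  return findings;
}
```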
Synth output

From construct definition to reproducible CloudFormation.

The two excerpts below are what cdk synth emits for the example stack. They are abbreviated for reading: logical IDs are truncated and unrelated metadata is removed. Anyone running npm run synth in the companion repository can reproduce the full template.

State table — synthesized

From WorkflowStateTable: PAY_PER_REQUEST, PITR, TTL, and the status index.

{
  "Type": "AWS::DynamoDB::Table",
  "Properties": {
    "AttributeDefinitions": [
      { "AttributeName": "requestId", "AttributeType": "S" },
      { "AttributeName": "status",    "AttributeType": "S" },
      { "AttributeName": "createdAt", "AttributeType": "S" }
    ],
    "KeySchema": [
      { "AttributeName": "requestId", "KeyType": "HASH" }
    ],
    "BillingMode": "PAY_PER_REQUEST",
    "PointInTimeRecoverySpecification": { "PointInTimeRecoveryEnabled": true },
    "TimeToLiveSpecification":          { "AttributeName": "expiresAt", "Enabled": true },
    "GlobalSecondaryIndexes": [
      {
        "IndexName": "status-createdAt-index",
        "KeySchema": [
          { "AttributeName": "status",    "KeyType": "HASH" },
          { "AttributeName": "createdAt", "KeyType": "RANGE" }
        ],
        "Projection": { "ProjectionType": "ALL" }
      }
    ]
  }
}
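Assertion tests typically pin output like this with `Template.hasResourceProperties("AWS::DynamoDB::Table", {...})`, which by default matches the given properties partially. The dependency-free sketch below models that partial-matching idea so it is clear what such a test does and does not constrain; it is a simplification written for illustration, not the `aws-cdk-lib/assertions` matcher, and `matchesLike` is a hypothetical name.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Simplified deep partial match: objects match when every key named in the
// pattern matches, arrays must match element by element, and primitives are
// compared with ===.
function matchesLike(target: Json, pattern: Json): boolean {
  if (Array.isArray(pattern)) {
    if (!Array.isArray(target) || target.length !== pattern.length) return false;
    const items: Json[] = target;
    return pattern.every((p, i) => matchesLike(items[i], p));
  }
  if (pattern !== null && typeof pattern === "object") {
    if (target === null || typeof target !== "object" || Array.isArray(target)) return false;
    const actual: { [key: string]: Json } = target;
    const expected: { [key: string]: Json } = pattern;
    return Object.keys(expected).every(
      (key) => key in actual && matchesLike(actual[key], expected[key]),
    );
  }
  return target === pattern;
}
```

One consequence: a pattern that omits `GlobalSecondaryIndexes` still passes if the status index disappears, so a test that cares about the index has to name it.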

Submit Lambda role — synthesized

Generated by WorkflowApi: scoped to the table and queue, with no consumer or publish actions.

{
  "PolicyDocument": {
    "Statement": [
      {
        "Action": ["dynamodb:PutItem", "dynamodb:UpdateItem"],
        "Effect": "Allow",
        "Resource": { "Fn::GetAtt": ["WorkflowStateTableTable...", "Arn"] }
      },
      {
        "Action":   "sqs:SendMessage",
        "Effect":   "Allow",
        "Resource": { "Fn::GetAtt": ["WorkQueueQueue...", "Arn"] }
      }
    ],
    "Version": "2012-10-17"
  }
}
Tests

The tests check both behavior and infrastructure shape.

The implementation has workflow unit tests, construct-level tests, full-stack assertion tests, and a before/after comparison test. The comparison is there because the developer interface changed, but the architecture should not quietly change with it.

it("keeps the same architectural shape while changing the developer interface", () => {
  const beforeApp = new App();
  const afterApp = new App();
  const before = Template.fromStack(new BeforeHandWiredStack(beforeApp, "BeforeStack"));
  const after = Template.fromStack(new AfterConstructLibraryStack(afterApp, "AfterStack"));

  expect(selectedResourceCounts(before)).toEqual(selectedResourceCounts(after));
});
TypeScript build: passed
Vitest suite: 4 files, 15 tests
CDK synth: default and comparison apps synthesize successfully
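The comparison test calls a `selectedResourceCounts` helper that the excerpt does not show. One plausible shape, assumed here rather than taken from the repository, counts a fixed list of resource types in the template JSON (for an assertions `Template`, `before.toJSON()` would supply it). The list of types is hypothetical.

```typescript
// Minimal shape of a synthesized template, e.g. the object from Template.toJSON().
interface CfnTemplate {
  Resources: { [logicalId: string]: { Type: string } };
}

// Hypothetical selection: the resource types that define the architecture.
const SELECTED_TYPES: readonly string[] = [
  "AWS::ApiGateway::RestApi",
  "AWS::Lambda::Function",
  "AWS::DynamoDB::Table",
  "AWS::SQS::Queue",
  "AWS::SNS::Topic",
  "AWS::CloudWatch::Alarm",
];

// Count only the selected types, so incidental resources (IAM policies,
// API Gateway deployments, and similar) may differ between the two stacks
// without failing the comparison.
function selectedResourceCounts(template: CfnTemplate): { [type: string]: number } {
  const counts: { [type: string]: number } = {};
  for (const type of SELECTED_TYPES) counts[type] = 0;
  for (const logicalId of Object.keys(template.Resources)) {
    const type = template.Resources[logicalId].Type;
    if (SELECTED_TYPES.includes(type)) counts[type] += 1;
  }
  return counts;
}
```

Starting every selected type at zero matters: a type that vanishes from one stack shows up as a count mismatch instead of a missing key.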
Tradeoffs

What this design chooses, and what it leaves open.

Several small constructs instead of one large construct

The composition code is slightly longer, but each construct has a clear reason to exist. That makes the pattern easier to adopt in pieces instead of forcing one large abstraction on every service.

SQS introduces eventual consistency

The workflow accepts that tradeoff because the work is durable, retryable, and observable. I would rather make the async behavior explicit than pretend every downstream operation is synchronous.
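Retryable also means repeatable: SQS standard queues deliver at least once, so the worker can see the same message twice. One common guard, sketched here with an in-memory `Map` standing in for a DynamoDB conditional `UpdateItem`, is to make each status change conditional on the current status. The status names and helpers are hypothetical.

```typescript
// Hypothetical status values for the sketch.
type Status = "QUEUED" | "PROCESSING" | "COMPLETED";

interface WorkItem {
  requestId: string;
  status: Status;
}

// Stand-in for an UpdateItem with a ConditionExpression such as
// "#status = :expected": the write applies only if the item is still in the
// expected state. DynamoDB would reject a failed condition with
// ConditionalCheckFailedException; here the function returns false.
function transition(store: Map<string, WorkItem>, requestId: string, from: Status, to: Status): boolean {
  const item = store.get(requestId);
  if (item === undefined || item.status !== from) return false;
  store.set(requestId, { ...item, status: to });
  return true;
}

// Worker handler: a duplicate delivery finds the item already claimed or
// completed and returns without repeating the work or the notification.
function handleMessage(store: Map<string, WorkItem>, requestId: string, published: string[]): void {
  if (!transition(store, requestId, "QUEUED", "PROCESSING")) return;
  // ... the actual work would run here ...
  transition(store, requestId, "PROCESSING", "COMPLETED");
  published.push(`${requestId}:COMPLETED`);
}
```

The sketch also shows the cost: if the worker fails between the two transitions, a retry finds the item in `PROCESSING` and does nothing, so it stays there. The status index is what makes such items findable.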

REST remains the front door, not the core idea

API Gateway makes the example easy to understand, but the main design decision is the reusable cloud pattern behind it.

Production path

What I would change before running this publicly.

  • Authentication and caller identity mapped into request context.
  • API throttling, budgets, and alarms for cost control.
  • Deployment promotion across environments with retained data policies.
  • Structured logs, traces, and dashboard views for support handoff.

This version stays local on purpose. Tests and synth show the construct design without exposing a public endpoint that would need bot protection, request throttling, and spend controls.

Limits

This is a focused platform pattern, not a complete internal platform.

The first slice is local and single-region. Authentication, deployment promotion, multi-region failover, structured logging helpers, and publishable package automation are intentionally outside this version. Those become more useful once the construct boundaries are stable.