Architecture communication case study

Standardizing async workflows without taking ownership away from teams.

A platform decision case for a repeated workflow shape: intake API, durable state, async processing, failure handling, notification hooks, and observability. The tradeoff is how much to standardize before the platform starts taking useful ownership away from teams.


Skill: Architecture communication
Decision: Reusable construct library
Audiences: Engineering · delivery · adoption
Companion: CDK workflow case

Problem
Constraints
Criteria
Options
Recommendation
Communication
Adoption
Boundaries

01 — Problem

Repeated workflow decisions create delivery and operational risk.

Multiple teams need the same class of workflow. The visible work is AWS wiring, but the bigger problem is uneven team execution: duplicated design effort, inconsistent operational defaults, and support boundaries negotiated service by service.

Design effort gets repeated for each new workflow.
Retry, alarm, and recovery defaults drift between services.
Support boundaries become ambiguous when behavior is locally invented.
Cross-team improvements stop compounding.

02 — Constraints

The decision has to protect autonomy and consistency at the same time.

Team autonomy

Teams still need to own service behavior, deployment timing, data contracts, and domain-specific failure choices.

Service variability

Workflows can share an intake-state-async-notification shape, but they still differ in payloads, authorization, deadlines, and downstream systems.

Operational consistency

Retry limits, dead-letter handling, state retention, alarms, and dashboards should not depend on which team implemented the pattern last.

Migration cost

The path has to work for new services first and let existing services adopt only where the return is worth the change.

Support model

The shared layer needs clear ownership for defaults, bugs, versioning, and support boundaries.

Reversibility

A team should be able to stop using the pattern without turning it into a major platform extraction project.

03 — Criteria

Five criteria kept the decision grounded.

The criteria are intentionally practical. They test whether the choice helps teams ship, operate, and change direction, not only whether it looks clean on an architecture diagram.

Criterion: Question

Repeatability: Does this remove repeated design and wiring decisions across services?

Operability: Does it make failure modes, alarms, and recovery paths more consistent?

Adoption cost: Can teams adopt it without waiting for a central roadmap or rewriting service logic?

Ownership clarity: Can teams tell who owns the platform layer, the service behavior, and production support?

Reversibility: Can the decision be changed if the workflow class becomes too broad or too critical?

04 — Options

Three shapes of platform ownership.

01

Service-specific implementations

Where it fits. When a workflow is unusual, low-volume, or tightly coupled to one service's domain model.

Primary risk. Every team re-decides state shape, queue policy, retries, alerts, and support handoff. Delivery looks autonomous, but platform learning does not compound.

02 (Recommended)

Reusable construct library

Where it fits. When teams need local ownership, but the organization needs consistent workflow defaults and a shared vocabulary.

Primary risk. It requires disciplined versioning, examples, and support ownership. Without that operating model, the library becomes another dependency teams work around.

03

Central platform service

Where it fits. When the workflow becomes a high-scale product capability with uniform contracts, centralized operations, and strong governance needs.

Primary risk. It can over-centralize too early. Teams wait on platform capacity, special cases pile up, and local product behavior becomes harder to change.

05 — Recommendation

Choose a reusable construct library, but treat it as an operating model.

The recommendation is not "share code." It is a shift in how teams approach the workflow class: start from a maintained platform pattern, accept common operational defaults, configure the service-specific behavior, and raise gaps as improvements to the shared layer.

Before

Each team independently chooses queue settings, state shape, notification behavior, alarm coverage, and failure recovery. Reviews catch drift but do not prevent it.

After

Teams compose a known workflow foundation, keep domain logic local, and discuss exceptions through a small set of shared decision points.
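
One way to picture "accept common operational defaults, configure the service-specific behavior" is a configuration shape in which the shared layer owns the defaults and the service supplies only its own pieces plus any explicit exceptions. The sketch below is plain TypeScript for illustration, not the construct library's actual API. Every name (`PlatformDefaults`, `ServiceConfig`, `resolveWorkflow`) and every default value is an assumption made for the example.

```typescript
// Hypothetical sketch: every name and default value here is an assumption
// for illustration, not the construct library's real surface.

/** Operational settings the shared layer owns and versions. */
interface PlatformDefaults {
  maxRetries: number;
  stateRetentionDays: number;
  deadLetterAlarmThreshold: number;
}

type DefaultKey = keyof PlatformDefaults;

/** What the service team supplies and keeps owning. */
interface ServiceConfig {
  workflowName: string;
  handler: (payload: unknown) => void; // domain logic stays local
  overrides?: Partial<PlatformDefaults>; // explicit, reviewable exceptions
}

interface ResolvedWorkflow {
  workflowName: string;
  handler: (payload: unknown) => void;
  settings: PlatformDefaults;
  overriddenKeys: DefaultKey[];
}

// Illustrative values only; a real library would document and version these.
const PLATFORM_DEFAULTS: PlatformDefaults = {
  maxRetries: 3,
  stateRetentionDays: 30,
  deadLetterAlarmThreshold: 1,
};

/** Start from the shared defaults and record each place a service deviates. */
function resolveWorkflow(service: ServiceConfig): ResolvedWorkflow {
  const settings: PlatformDefaults = { ...PLATFORM_DEFAULTS };
  const overriddenKeys: DefaultKey[] = [];
  const overrides: Partial<PlatformDefaults> = service.overrides ?? {};

  for (const key of Object.keys(PLATFORM_DEFAULTS) as DefaultKey[]) {
    const value = overrides[key];
    if (value !== undefined && value !== PLATFORM_DEFAULTS[key]) {
      settings[key] = value;
      overriddenKeys.push(key);
    }
  }

  return {
    workflowName: service.workflowName,
    handler: service.handler,
    settings,
    overriddenKeys,
  };
}
```

Recording `overriddenKeys` is a design choice in this sketch: it gives reviewers a short list of exceptions to discuss instead of a full configuration to re-audit, which is the "small set of shared decision points" idea in miniature.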

06 — Communication

The same recommendation needs different wording for each audience.

01

Engineering detail

Use a construct library to standardize the workflow skeleton: intake boundary, durable state, async queueing, retry and dead-letter defaults, lifecycle hooks, and baseline observability. Services keep their own handlers, payload schemas, authorization, and domain transitions.

02

Delivery and business impact

The decision removes repeated platform design from each team's backlog while keeping teams able to ship service-specific behavior. It reduces inconsistent operational defaults without creating a central service dependency for every workflow change.

03

Team adoption guidance

Adopt the construct for new workflows that match the common shape. Opt out when the workflow needs unusual ordering, latency guarantees, regional topology, or a different state model. Raise gaps as versioned library changes, not one-off copies.
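
The opt-out rules in the adoption guidance can be written down as a small, reviewable check. The following is a minimal sketch, assuming each opt-out reason is captured as a yes/no answer during design review. The `WorkflowTraits` type and `assessFit` function are hypothetical names introduced for this example.

```typescript
// Hypothetical sketch: the four traits mirror the opt-out reasons named in the
// adoption guidance; the type and function names are illustrative assumptions.

interface WorkflowTraits {
  needsUnusualOrdering: boolean;
  needsLatencyGuarantees: boolean;
  needsCustomRegionalTopology: boolean;
  needsDifferentStateModel: boolean;
}

interface FitResult {
  useConstruct: boolean;
  optOutReasons: string[];
}

/** Collect every reason a workflow falls outside the common shape. */
function assessFit(traits: WorkflowTraits): FitResult {
  const optOutReasons: string[] = [];

  if (traits.needsUnusualOrdering) optOutReasons.push("unusual ordering");
  if (traits.needsLatencyGuarantees) optOutReasons.push("latency guarantees");
  if (traits.needsCustomRegionalTopology) optOutReasons.push("regional topology");
  if (traits.needsDifferentStateModel) optOutReasons.push("different state model");

  // The construct is the default choice; any single reason is enough to opt out.
  return { useConstruct: optOutReasons.length === 0, optOutReasons };
}
```

Returning the reasons, not just a boolean, keeps the opt-out visible: the same list can go into the decision note or be raised as a gap against the shared layer.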

07 — Adoption

Adoption succeeds only if the pattern is easy to use and easy to question.

01. Publish a short decision note with the problem, selected option, non-goals, and opt-out rules.
02. Provide a reference implementation that creates the intake route, state store, queue, worker, notification hook, and alarms.
03. Create a design checklist for retry policy, state retention, notification consumers, dashboard needs, and runbook ownership.
04. Version defaults deliberately, with migration notes for breaking changes and a support window for older versions.
05. Keep an escape hatch for teams that need raw infrastructure composition or a different workflow topology.
06. Name adoption risks up front: documentation drift, unclear bug ownership, slow updates, and teams copying older examples.
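
The "support window for older versions" step is easier to hold teams and maintainers to when it is expressed as a rule, not a convention. The sketch below assumes the library follows semantic versioning and that the window is defined as a number of previous major versions. Both are assumptions for illustration, and `SupportPolicy`, `parseMajor`, and `supportStatus` are hypothetical names.

```typescript
// Hypothetical sketch: assumes semantic versioning and a support window counted
// in previous major versions. Neither is stated as the library's real policy.

interface SupportPolicy {
  currentMajor: number;
  supportedPreviousMajors: number;
}

type SupportStatus = "current" | "supported" | "unsupported";

/** Extract the major version from a MAJOR.MINOR.PATCH string. */
function parseMajor(version: string): number {
  const match = /^(\d+)\.\d+\.\d+/.exec(version);
  if (match === null) {
    throw new Error("Not a semantic version: " + version);
  }
  return Number(match[1]);
}

/** Classify a consumer's library version against the support window. */
function supportStatus(version: string, policy: SupportPolicy): SupportStatus {
  const major = parseMajor(version);
  if (major > policy.currentMajor) {
    throw new Error("Version " + version + " is newer than the policy's current major");
  }
  if (major === policy.currentMajor) return "current";
  if (policy.currentMajor - major <= policy.supportedPreviousMajors) return "supported";
  return "unsupported";
}
```

A check like this could run in a consuming service's build to flag when it has fallen out of the window, which addresses the "teams copying older examples" risk named in step 06.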

08 — Boundaries

The decision leaves room for opt-outs and later evolution.

When this is the wrong move

If the workflow is highly specialized, needs orchestration semantics beyond the shared shape, or needs central runtime governance from day one, the construct library is not the right answer.

What stays service-owned

Domain validation, authorization, payload design, data retention beyond the default, and incident ownership for service behavior all stay with the team.

When this evolves

A central platform service becomes more attractive once many teams need the same runtime contract, cross-service workflow visibility, centralized audit controls, or shared SLOs that local composition cannot handle cleanly.

Implementation details for the recommended pattern live in the CDK workflow case. That page covers the construct surface, generated AWS shape, IAM grant boundaries, and local synth validation. This page stays on the decision and adoption model.
