Architecture communication case study

Standardizing async workflows without taking ownership away from teams.

A platform decision case for a repeated workflow shape: intake API, durable state, async processing, failure handling, notification hooks, and observability. The tradeoff is how much to standardize before the platform starts taking useful ownership away from teams.

Skill: Architecture communication
Decision: Reusable construct library
Audiences: Engineering · delivery · adoption
Companion: CDK workflow case

01 — Problem

Repeated workflow decisions create delivery and operational risk.

Multiple teams need the same class of workflow. The visible work is AWS wiring, but the bigger problem is uneven team execution: duplicated design effort, inconsistent operational defaults, and support boundaries negotiated service by service.

  • Design effort gets repeated for each new workflow.
  • Retry, alarm, and recovery defaults drift between services.
  • Support boundaries become ambiguous when behavior is locally invented.
  • Cross-team improvements stop compounding.
02 — Constraints

The decision has to protect autonomy and consistency at the same time.

03 — Criteria

Five criteria kept the decision grounded.

The criteria are intentionally practical. They test whether the choice helps teams ship, operate, and change direction, not only whether it looks clean on an architecture diagram.

Criterion          Question
Repeatability      Does this remove repeated design and wiring decisions across services?
Operability        Does it make failure modes, alarms, and recovery paths more consistent?
Adoption cost      Can teams adopt it without waiting for a central roadmap or rewriting service logic?
Ownership clarity  Can teams tell who owns the platform layer, the service behavior, and production support?
Reversibility      Can the decision be changed if the workflow class becomes too broad or too critical?

04 — Options

Three shapes of platform ownership.

  1. Service-specific implementations

    Where it fits. Fits when a workflow is unusual, low-volume, or tightly coupled to one service's domain model.

    Primary risk. Every team re-decides state shape, queue policy, retries, alerts, and support handoff. Delivery looks autonomous, but platform learning does not compound.

  2. Reusable construct library (recommended)

    Where it fits. Fits when teams need local ownership, but the organization needs consistent workflow defaults and a shared vocabulary.

    Primary risk. Requires disciplined versioning, examples, and support ownership. Without that operating model, the library becomes another dependency teams work around.

  3. Central platform service

    Where it fits. Fits when the workflow becomes a high-scale product capability with uniform contracts, centralized operations, and strong governance needs.

    Primary risk. Can over-centralize too early. Teams wait on platform capacity, special cases pile up, and local product behavior becomes harder to change.

05 — Recommendation

Choose a reusable construct library, but treat it as an operating model.

The recommendation is not "share code." It is a shift in how teams approach the workflow class: start from a maintained platform pattern, accept common operational defaults, configure the service-specific behavior, and raise gaps as improvements to the shared layer.

Before

Each team independently chooses queue settings, state shape, notification behavior, alarm coverage, and failure recovery. Reviews catch drift but do not prevent it.

After

Teams compose a known workflow foundation, keep domain logic local, and discuss exceptions through a small set of shared decision points.
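
The before/after shift can be sketched in code. This is a minimal illustration, not the library's actual surface: `PLATFORM_DEFAULTS`, `defineWorkflow`, the field names, and every numeric value are hypothetical. It shows the platform layer owning operational defaults, the service supplying its own validation and handler, and any deviation from the defaults appearing as an explicit, reviewable override.

```typescript
// Hypothetical names and values throughout; none come from the actual library.

/** Operational defaults owned by the platform layer (values are illustrative). */
interface WorkflowDefaults {
  maxReceiveCount: number; // deliveries before a message moves to the dead-letter queue
  visibilityTimeoutSeconds: number;
  stateRetentionDays: number;
  alarmOnDeadLetter: boolean;
}

/** Behavior that stays with the service team. */
interface ServiceConfig<TPayload> {
  name: string;
  validate: (input: unknown) => TPayload; // payload schema and domain validation
  handle: (payload: TPayload) => Promise<void>; // domain logic
  overrides?: Partial<WorkflowDefaults>; // explicit exceptions to the defaults
}

interface WorkflowDefinition<TPayload> {
  name: string;
  settings: WorkflowDefaults;
  overriddenKeys: (keyof WorkflowDefaults)[];
  validate: (input: unknown) => TPayload;
  handle: (payload: TPayload) => Promise<void>;
}

const PLATFORM_DEFAULTS: WorkflowDefaults = {
  maxReceiveCount: 3,
  visibilityTimeoutSeconds: 60,
  stateRetentionDays: 30,
  alarmOnDeadLetter: true,
};

/** Merge service overrides onto platform defaults and record which keys changed. */
function defineWorkflow<TPayload>(
  config: ServiceConfig<TPayload>,
): WorkflowDefinition<TPayload> {
  const overrides = config.overrides ?? {};
  return {
    name: config.name,
    settings: { ...PLATFORM_DEFAULTS, ...overrides },
    overriddenKeys: Object.keys(overrides) as (keyof WorkflowDefaults)[],
    validate: config.validate,
    handle: config.handle,
  };
}

// Usage: a hypothetical service that only changes the retry count.
const exportWorkflow = defineWorkflow<{ reportId: string }>({
  name: "report-export",
  validate: (input) => {
    const reportId = (input as { reportId?: unknown } | null)?.reportId;
    if (typeof reportId !== "string") throw new Error("reportId is required");
    return { reportId };
  },
  handle: async () => {
    /* service-owned domain logic goes here */
  },
  overrides: { maxReceiveCount: 5 },
});
```

Recording `overriddenKeys` is one possible way to make exceptions visible in review, which is the kind of shared decision point the "After" state describes.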

06 — Communication

The same recommendation needs different wording for each audience.

  1. Engineering detail

    Use a construct library to standardize the workflow skeleton: intake boundary, durable state, async queueing, retry and dead-letter defaults, lifecycle hooks, and baseline observability. Services keep their own handlers, payload schemas, authorization, and domain transitions.

  2. Delivery and business impact

    The decision removes repeated platform design from each team's backlog while keeping teams able to ship service-specific behavior. It reduces inconsistent operational defaults without creating a central service dependency for every workflow change.

  3. Team adoption guidance

    Adopt the construct for new workflows that match the common shape. Opt out when the workflow needs unusual ordering, latency guarantees, regional topology, or a different state model. Raise gaps as versioned library changes, not one-off copies.
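
The opt-out rule in the adoption guidance is small enough to write down as a check. The sketch below is an assumption about how a team might encode it, for example in a design-review template. The trait names and `decideAdoption` are hypothetical; the four conditions are the ones named in the guidance above.

```typescript
// Hypothetical trait names, mirroring the four opt-out conditions in the guidance.
interface WorkflowTraits {
  needsUnusualOrdering: boolean;
  needsLatencyGuarantees: boolean;
  needsRegionalTopology: boolean;
  needsDifferentStateModel: boolean;
}

type AdoptionDecision =
  | { decision: "adopt" }
  | { decision: "opt-out"; reasons: string[] };

const OPT_OUT_REASONS: Record<keyof WorkflowTraits, string> = {
  needsUnusualOrdering: "workflow needs unusual ordering",
  needsLatencyGuarantees: "workflow needs latency guarantees",
  needsRegionalTopology: "workflow needs a specific regional topology",
  needsDifferentStateModel: "workflow needs a different state model",
};

/** Adopt the shared construct unless at least one opt-out condition applies. */
function decideAdoption(traits: WorkflowTraits): AdoptionDecision {
  const reasons = (Object.keys(OPT_OUT_REASONS) as (keyof WorkflowTraits)[])
    .filter((key) => traits[key])
    .map((key) => OPT_OUT_REASONS[key]);
  return reasons.length === 0
    ? { decision: "adopt" }
    : { decision: "opt-out", reasons };
}

// Usage: a workflow that matches the common shape, and one that does not.
const commonShape = decideAdoption({
  needsUnusualOrdering: false,
  needsLatencyGuarantees: false,
  needsRegionalTopology: false,
  needsDifferentStateModel: false,
});

const strictOrdering = decideAdoption({
  needsUnusualOrdering: true,
  needsLatencyGuarantees: false,
  needsRegionalTopology: false,
  needsDifferentStateModel: false,
});
```

Returning the reasons alongside the decision keeps an opt-out documented, which makes it easier to raise the gap later as a versioned library change.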

07 — Adoption

Adoption succeeds only if the pattern is easy to use and easy to question.

  1. Publish a short decision note with the problem, selected option, non-goals, and opt-out rules.

  2. Provide a reference implementation that creates the intake route, state store, queue, worker, notification hook, and alarms.

  3. Create a design checklist for retry policy, state retention, notification consumers, dashboard needs, and runbook ownership.

  4. Version defaults deliberately, with migration notes for breaking changes and a support window for older versions.

  5. Keep an escape hatch for teams that need raw infrastructure composition or a different workflow topology.

  6. Name adoption risks up front: documentation drift, unclear bug ownership, slow updates, and teams copying older examples.
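
The support window in the versioning step can be made checkable. The following is a sketch under assumptions: the policy shape, the rule of counting major versions behind the latest release, and the names `SupportPolicy` and `supportStatus` are all hypothetical, and a real policy might be time-based instead.

```typescript
// Hypothetical policy: a fixed number of major versions behind the latest stay supported.
interface SupportPolicy {
  latestVersion: string; // semantic version of the newest library release
  supportedMajorsBehind: number; // how many older major versions still receive support
}

type SupportStatus = "current" | "supported" | "unsupported";

/** Extract the major component of a "major.minor.patch" version string. */
function majorOf(version: string): number {
  const major = Number.parseInt(version.split(".")[0] ?? "", 10);
  if (Number.isNaN(major)) throw new Error(`Not a semantic version: ${version}`);
  return major;
}

/** Classify an installed library version against the support policy. */
function supportStatus(installed: string, policy: SupportPolicy): SupportStatus {
  const gap = majorOf(policy.latestVersion) - majorOf(installed);
  if (gap <= 0) return "current"; // same major as the latest release (or newer)
  return gap <= policy.supportedMajorsBehind ? "supported" : "unsupported";
}

// Usage with illustrative version numbers.
const policy: SupportPolicy = { latestVersion: "3.1.0", supportedMajorsBehind: 1 };
const onLatestMajor = supportStatus("3.0.2", policy); // "current"
const onePrevious = supportStatus("2.4.0", policy); // "supported"
const twoPrevious = supportStatus("1.9.0", policy); // "unsupported"
```

A check like this could run in CI to flag services that have drifted outside the window, which addresses the named risk of teams copying older examples.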

08 — Boundaries

The decision leaves room for opt-outs and later evolution.

When this is the wrong move

If the workflow is highly specialized, needs orchestration semantics beyond the shared shape, or needs central runtime governance from day one, the construct library is not the right answer.

What stays service-owned

Domain validation, authorization, payload design, data retention beyond the default, and incident ownership for service behavior all stay with the team.

When this evolves

A central platform service becomes more attractive once many teams need the same runtime contract, cross-service workflow visibility, centralized audit controls, or shared SLOs that local composition cannot handle cleanly.

Implementation details for the recommended pattern live in the CDK workflow case. That page covers the construct surface, generated AWS shape, IAM grant boundaries, and local synth validation. This page stays on the decision and adoption model.
