Architecture communication case study
A platform decision case for a repeated workflow shape: intake API, durable state, async processing, failure handling, notification hooks, and observability. The tradeoff is how much to standardize before the platform starts taking useful ownership away from teams.
Multiple teams need the same class of workflow. The visible work is AWS wiring, but the bigger problem is uneven team execution: duplicated design effort, inconsistent operational defaults, and support boundaries negotiated service by service.
Teams still need to own service behavior, deployment timing, data contracts, and domain-specific failure choices.
Workflows can share an intake-state-async-notification shape, but they still differ in payloads, authorization, deadlines, and downstream systems.
Retry limits, dead-letter handling, state retention, alarms, and dashboards should not depend on which team implemented the pattern last.
The path has to work for new services first and let existing services adopt only where the return is worth the change.
The shared layer needs clear ownership for defaults, bugs, versioning, and support boundaries.
A team should be able to stop using the pattern without turning it into a major platform extraction project.
The criteria are intentionally practical. They test whether the choice helps teams ship, operate, and change direction, not only whether it looks clean on an architecture diagram.
Does this remove repeated design and wiring decisions across services?
Does it make failure modes, alarms, and recovery paths more consistent?
Can teams adopt it without waiting for a central roadmap or rewriting service logic?
Can teams tell who owns the platform layer, the service behavior, and production support?
Can the decision be changed if the workflow class becomes too broad or too critical?
Where it fits. Building each workflow independently fits when the workflow is unusual, low-volume, or tightly coupled to one service's domain model.
Primary risk. Every team re-decides state shape, queue policy, retries, alerts, and support handoff. Delivery looks autonomous, but platform learning does not compound.
Where it fits. A shared construct library fits when teams need local ownership, but the organization needs consistent workflow defaults and a shared vocabulary.
Primary risk. Requires disciplined versioning, examples, and support ownership. Without that operating model, the library becomes another dependency teams work around.
Where it fits. A central platform service fits when the workflow becomes a high-scale product capability with uniform contracts, centralized operations, and strong governance needs.
Primary risk. Can over-centralize too early. Teams wait on platform capacity, special cases pile up, and local product behavior becomes harder to change.
The recommendation is not "share code." It is a shift in how teams approach the workflow class: start from a maintained platform pattern, accept common operational defaults, configure the service-specific behavior, and raise gaps as improvements to the shared layer.
Before
Each team independently chooses queue settings, state shape, notification behavior, alarm coverage, and failure recovery. Reviews catch drift but do not prevent it.
After
Teams compose a known workflow foundation, keep domain logic local, and discuss exceptions through a small set of shared decision points.
Use a construct library to standardize the workflow skeleton: intake boundary, durable state, async queueing, retry and dead-letter defaults, lifecycle hooks, and baseline observability. Services keep their own handlers, payload schemas, authorization, and domain transitions.
The decision removes repeated platform design from each team's backlog while keeping teams able to ship service-specific behavior. It reduces inconsistent operational defaults without creating a central service dependency for every workflow change.
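The split between shared operational defaults and service-owned behavior could be sketched roughly as follows. This is a minimal illustration in plain TypeScript, not the construct library's actual surface: the names `OperationalDefaults`, `ServiceConfig`, `PLATFORM_DEFAULTS`, and `resolveWorkflow`, and all numeric values, are assumptions made for the example.

```typescript
// Hypothetical shapes for illustration only; not the real library API.
interface OperationalDefaults {
  maxReceiveCount: number;    // deliveries before a message moves to the dead-letter queue
  stateRetentionDays: number; // how long workflow state is kept
  alarmOnDeadLetter: boolean; // baseline alarm when dead-lettered messages appear
}

interface ServiceConfig {
  workflowName: string;
  handler: string;        // service-owned handler entry point
  payloadSchema: string;  // service-owned schema reference
  overrides?: Partial<OperationalDefaults>; // explicit, reviewable deviations
}

// Placeholder values; a real platform team would choose and version these.
const PLATFORM_DEFAULTS: OperationalDefaults = {
  maxReceiveCount: 3,
  stateRetentionDays: 30,
  alarmOnDeadLetter: true,
};

// Merge the platform defaults with whatever the service explicitly overrides.
function resolveWorkflow(
  config: ServiceConfig,
): ServiceConfig & { operations: OperationalDefaults } {
  return {
    ...config,
    operations: { ...PLATFORM_DEFAULTS, ...config.overrides },
  };
}

// Usage: the service overrides one default and inherits the rest.
const orders = resolveWorkflow({
  workflowName: "orders",
  handler: "src/orders/handler.ts",
  payloadSchema: "order.v1",
  overrides: { maxReceiveCount: 5 },
});
// orders.operations.maxReceiveCount is 5; retention and alarm settings stay at the defaults.
```

Keeping overrides in a single optional field is one way to make exceptions visible in review instead of buried in copied infrastructure code.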
Adopt the construct for new workflows that match the common shape. Opt out when the workflow needs unusual ordering, latency guarantees, regional topology, or a different state model. Raise gaps as versioned library changes, not one-off copies.
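The adoption rule above can be expressed as a small check that a team might run during design review. This is a sketch under assumptions: the `WorkflowTraits` fields and the `decideAdoption` function are hypothetical names that mirror the four opt-out conditions in the text, not part of any existing tool.

```typescript
// Hypothetical traits recorded during design review; names are illustrative.
interface WorkflowTraits {
  needsUnusualOrdering: boolean;
  needsLatencyGuarantees: boolean;
  needsCustomRegionalTopology: boolean;
  needsDifferentStateModel: boolean;
}

type AdoptionDecision =
  | { adopt: true }
  | { adopt: false; reasons: string[] };

// Returns "adopt" only when none of the opt-out conditions apply.
function decideAdoption(traits: WorkflowTraits): AdoptionDecision {
  const reasons: string[] = [];
  if (traits.needsUnusualOrdering) reasons.push("unusual ordering");
  if (traits.needsLatencyGuarantees) reasons.push("latency guarantees");
  if (traits.needsCustomRegionalTopology) reasons.push("regional topology");
  if (traits.needsDifferentStateModel) reasons.push("different state model");
  return reasons.length === 0 ? { adopt: true } : { adopt: false, reasons };
}
```

Returning the reasons alongside the decision gives the platform team a record of why workflows opt out, which is useful input when deciding whether a gap deserves a versioned library change.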
Publish a short decision note with the problem, selected option, non-goals, and opt-out rules.
Provide a reference implementation that creates the intake route, state store, queue, worker, notification hook, and alarms.
Create a design checklist for retry policy, state retention, notification consumers, dashboard needs, and runbook ownership.
Version defaults deliberately, with migration notes for breaking changes and a support window for older versions.
Keep an escape hatch for teams that need raw infrastructure composition or a different workflow topology.
Name adoption risks up front: documentation drift, unclear bug ownership, slow updates, and teams copying older examples.
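The versioning step in the list above (migration notes for breaking changes, a support window for older versions) could be checked mechanically. The following is a minimal sketch: the `LibraryRelease` shape, the 180-day `SUPPORT_WINDOW_DAYS` value, and the rule that an older version stays supported until the window after its successor's release has passed are all assumptions chosen for illustration.

```typescript
// Hypothetical release record; field names are illustrative.
interface LibraryRelease {
  version: string;
  releasedOn: Date;
  breaking: boolean;
  migrationNote?: string; // expected whenever breaking is true
}

const SUPPORT_WINDOW_DAYS = 180; // placeholder; the real window is a policy choice
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Versions of breaking releases that were published without a migration note.
function missingMigrationNotes(releases: LibraryRelease[]): string[] {
  return releases
    .filter((r) => r.breaking && !r.migrationNote)
    .map((r) => r.version);
}

// A version is supported if it has no successor yet, or if its successor
// shipped no more than SUPPORT_WINDOW_DAYS before the given date.
function isSupported(
  releases: LibraryRelease[],
  version: string,
  today: Date,
): boolean {
  const ordered = [...releases].sort(
    (a, b) => a.releasedOn.getTime() - b.releasedOn.getTime(),
  );
  const index = ordered.findIndex((r) => r.version === version);
  if (index === -1) return false; // unknown versions are treated as unsupported
  const successor = ordered[index + 1];
  if (!successor) return true;
  const daysSinceSuccessor =
    (today.getTime() - successor.releasedOn.getTime()) / MS_PER_DAY;
  return daysSinceSuccessor <= SUPPORT_WINDOW_DAYS;
}
```

A check like this could run in the library's release pipeline so that a breaking change without a migration note is caught before teams encounter it.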
If the workflow is highly specialized, needs orchestration semantics beyond the shared shape, or needs central runtime governance from day one, the construct library is not the right answer.
Domain validation, authorization, payload design, data retention beyond the default, and incident ownership for service behavior all stay with the team.
A central platform service becomes more attractive once many teams need the same runtime contract, cross-service workflow visibility, centralized audit controls, or shared SLOs that local composition cannot handle cleanly.
Implementation details for the recommended pattern live in the CDK workflow case. That page covers the construct surface, generated AWS shape, IAM grant boundaries, and local synth validation. This page stays on the decision and adoption model.