Building Supply Chain Management Systems

13 min read · Specialized Domains · Intermediate

Why resilient, observable, and adaptable supply chains matter more than ever

[Figure: conceptual illustration of a connected supply chain network showing factories, warehouses, and retailers linked by data flows]

I did not truly appreciate the fragility of supply chains until a late-night pager alert turned into a multi-day firefight. An upstream supplier’s system had quietly changed a date format, and our ingestion pipeline started silently dropping delivery-window updates. Support tickets piled up, planning spreadsheets diverged from reality, and a small data mismatch rippled into real missed shipments. That experience is the impetus for this post. It is not a marketing overview; it is a practical guide for engineers who need to build systems that model reality, degrade gracefully, and stay trustworthy when the world gets messy.

In this article, I will walk through how to think about supply chain management (SCM) systems from an engineering perspective. We will cover context and tradeoffs, core modeling concepts, and practical code patterns using Python and some common tooling. If you have ever wondered how to represent orders, inventory, or shipments in code without painting yourself into a corner, or how to handle asynchronous updates across organizations without losing your mind, this is for you.

Context: Where SCM fits in the modern engineering landscape

SCM systems coordinate the flow of goods, information, and money across a network of suppliers, manufacturers, distributors, and retailers. In practice, that means integrating ERPs, warehouse management systems (WMS), transportation management systems (TMS), and custom services that model orders, inventory, and logistics events.

Engineers rarely build all of these from scratch. More often, we integrate and orchestrate. We design a central service layer that translates between domain models and external systems, enforces invariants around inventory and orders, and provides observability into exceptions. The closest alternatives to a bespoke SCM are buying an off-the-shelf platform or relying entirely on a single ERP’s capabilities. The tradeoff is flexibility versus maintenance burden. If your business model evolves quickly or you operate in a heterogeneous ecosystem, a custom orchestration layer often outperforms a rigid monolith. If you are standardizing on a single vendor and have limited engineering capacity, a packaged solution may be more pragmatic.

Who typically builds these systems? Midsize to large companies with multi-echelon inventory, complex fulfillment logic, or integrations across many suppliers. Common personas include platform engineers, integration specialists, and domain-focused backend developers who care about consistency, throughput, and auditability.

Core concepts and practical modeling

SCM domain models tend to revolve around a few core entities: organizations, locations, items, inventory, orders, shipments, and events that describe state transitions. A practical approach is to treat the system as an event-driven architecture where services emit and consume immutable facts (events), while the core models remain as consistent projections.

A quick mental model:

  • Orders represent demand.
  • Shipments represent supply in motion.
  • Inventory is the truth at rest.
  • Events are the glue that reconciles intent and reality.

Domain modeling with event-sourced aggregates

Let’s start with a simple order aggregate that emits events. This pattern is useful because it creates an audit trail and makes it easier to reconstruct state or feed downstream consumers.

from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Dict
from enum import Enum
import json


class OrderStatus(Enum):
    CREATED = "CREATED"
    PARTIALLY_FULFILLED = "PARTIALLY_FULFILLED"
    FULFILLED = "FULFILLED"
    CANCELLED = "CANCELLED"


@dataclass
class Event:
    event_id: str
    aggregate_id: str
    event_type: str
    timestamp: datetime
    payload: Dict


@dataclass
class Order:
    order_id: str
    customer_id: str
    status: Optional[OrderStatus]  # None until create() is called
    line_items: List[Dict]  # e.g., [{"sku": "ABC-123", "qty": 2}]
    events: List[Event] = field(default_factory=list)

    def create(self, line_items: List[Dict]) -> None:
        assert self.status is None, "Order already exists"
        self.line_items = line_items
        self.status = OrderStatus.CREATED
        self.events.append(
            Event(
                event_id=f"evt-{self.order_id}-1",
                aggregate_id=self.order_id,
                event_type="OrderCreated",
                timestamp=datetime.utcnow(),
                payload={"line_items": line_items},
            )
        )

    def partially_fulfilled(self, fulfillment: Dict) -> None:
        # fulfillment: {"sku": "ABC-123", "shipped": 1, "backordered": 1}
        if self.status == OrderStatus.CREATED:
            self.status = OrderStatus.PARTIALLY_FULFILLED
        self.events.append(
            Event(
                event_id=f"evt-{self.order_id}-{len(self.events)+1}",
                aggregate_id=self.order_id,
                event_type="OrderPartiallyFulfilled",
                timestamp=datetime.utcnow(),
                payload={"fulfillment": fulfillment},
            )
        )

    def fulfill(self) -> None:
        self.status = OrderStatus.FULFILLED
        self.events.append(
            Event(
                event_id=f"evt-{self.order_id}-{len(self.events)+1}",
                aggregate_id=self.order_id,
                event_type="OrderFulfilled",
                timestamp=datetime.utcnow(),
                payload={},
            )
        )

    def cancel(self, reason: str) -> None:
        self.status = OrderStatus.CANCELLED
        self.events.append(
            Event(
                event_id=f"evt-{self.order_id}-{len(self.events)+1}",
                aggregate_id=self.order_id,
                event_type="OrderCancelled",
                timestamp=datetime.utcnow(),
                payload={"reason": reason},
            )
        )


# Example usage
order = Order(order_id="ord-1001", customer_id="cust-42", status=None, line_items=[])
order.create([{"sku": "ABC-123", "qty": 2}])
order.partially_fulfilled({"sku": "ABC-123", "shipped": 1, "backordered": 1})
order.fulfill()

print(json.dumps([e.payload for e in order.events], indent=2))

This is intentionally simple. In a real system, events would be persisted to an event store, and the aggregate would be reconstructed by replaying events. That gives you auditability, temporal queries, and the ability to feed multiple projections (e.g., inventory, billing, analytics).
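To make the replay idea concrete, here is a minimal, hedged sketch of rehydration: the current status is rebuilt purely by folding over the event stream. The `replay_status` function and the `_TRANSITIONS` mapping are illustrative assumptions, not part of the aggregate above, and a real event store would replay full payloads, not just status transitions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class OrderStatus(Enum):
    CREATED = "CREATED"
    PARTIALLY_FULFILLED = "PARTIALLY_FULFILLED"
    FULFILLED = "FULFILLED"
    CANCELLED = "CANCELLED"


@dataclass
class Event:
    event_type: str
    payload: Dict


# Hypothetical mapping from event type to the status it implies.
_TRANSITIONS = {
    "OrderCreated": OrderStatus.CREATED,
    "OrderPartiallyFulfilled": OrderStatus.PARTIALLY_FULFILLED,
    "OrderFulfilled": OrderStatus.FULFILLED,
    "OrderCancelled": OrderStatus.CANCELLED,
}


def replay_status(events: List[Event]) -> Optional[OrderStatus]:
    """Rebuild the current status by folding over the event stream."""
    status = None
    for event in events:
        status = _TRANSITIONS[event.event_type]
    return status


history = [
    Event("OrderCreated", {"line_items": [{"sku": "ABC-123", "qty": 2}]}),
    Event("OrderPartiallyFulfilled", {"fulfillment": {"sku": "ABC-123", "shipped": 1}}),
    Event("OrderFulfilled", {}),
]
print(replay_status(history))  # OrderStatus.FULFILLED
```

The same fold, stopped at an earlier point in the list, answers temporal queries like "what was this order's status yesterday?" for free.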

Inventory tracking and reservations

Inventory is tricky because of concurrency and partial visibility. You want to reserve inventory when an order is created and adjust reservations when shipments move or cancellations occur. One practical approach is to treat reservations as a ledger that sits against a SKU and location.

from dataclasses import dataclass, field
from typing import Dict, List
from datetime import datetime


@dataclass
class InventoryLedgerEntry:
    sku: str
    location: str
    qty: int  # positive for inbound, negative for outbound
    reference: str  # e.g., order_id or shipment_id
    timestamp: datetime


@dataclass
class InventoryView:
    ledger: List[InventoryLedgerEntry] = field(default_factory=list)

    def on_hand(self, sku: str, location: str) -> int:
        return sum(e.qty for e in self.ledger if e.sku == sku and e.location == location)

    def reserve(self, sku: str, location: str, order_id: str, qty: int) -> bool:
        # naive optimistic check
        available = self.on_hand(sku, location)
        if available < qty:
            return False
        self.ledger.append(
            InventoryLedgerEntry(
                sku=sku,
                location=location,
                qty=-qty,  # outbound reservation
                reference=order_id,
                timestamp=datetime.utcnow(),
            )
        )
        return True

    def ship(self, sku: str, location: str, order_id: str, qty: int) -> None:
        # The reservation already reduced on-hand, so shipping only records
        # a zero-quantity marker line for traceability. A fuller model would
        # track reservations and physical stock as separate balances.
        self.ledger.append(
            InventoryLedgerEntry(
                sku=sku,
                location=location,
                qty=0,  # marker only; stock was decremented at reservation time
                reference=f"ship-{order_id}",
                timestamp=datetime.utcnow(),
            )
        )


# Example usage
inv = InventoryView()
inv.ledger.append(InventoryLedgerEntry(sku="ABC-123", location="WH-A", qty=100, reference="initial", timestamp=datetime.utcnow()))
inv.reserve("ABC-123", "WH-A", "ord-1001", 2)
inv.reserve("ABC-123", "WH-A", "ord-1002", 98)
inv.reserve("ABC-123", "WH-A", "ord-1003", 1)  # will fail due to insufficient stock

print(inv.on_hand("ABC-123", "WH-A"))

In production, this naive ledger would be replaced with a more robust mechanism that includes optimistic concurrency control or a dedicated inventory service with strong consistency guarantees. Tools like Apache Kafka or Amazon Kinesis can be used to stream inventory events and update projections in near real-time.
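As a hedged sketch of what that optimistic concurrency control might look like, the idea is that a write succeeds only if the record's version is unchanged since it was read; otherwise the caller re-reads and retries. The class and method names here are illustrative, and the in-process lock merely stands in for the atomicity a database would provide.

```python
import threading
from dataclasses import dataclass


@dataclass
class StockRecord:
    on_hand: int
    version: int = 0


class OptimisticInventory:
    """Version-checked reservations: a stale writer is rejected rather
    than silently overselling stock."""

    def __init__(self):
        self._records = {}
        self._lock = threading.Lock()  # stands in for the database's atomicity

    def put(self, key, on_hand: int) -> None:
        self._records[key] = StockRecord(on_hand=on_hand)

    def get(self, key):
        rec = self._records[key]
        return rec.on_hand, rec.version

    def reserve(self, key, qty: int, expected_version: int) -> bool:
        with self._lock:
            rec = self._records[key]
            if rec.version != expected_version:
                return False  # someone else wrote first; caller must re-read and retry
            if rec.on_hand < qty:
                return False  # insufficient stock
            rec.on_hand -= qty
            rec.version += 1
            return True


inv = OptimisticInventory()
inv.put(("ABC-123", "WH-A"), 100)
_, version = inv.get(("ABC-123", "WH-A"))
assert inv.reserve(("ABC-123", "WH-A"), 2, expected_version=version)
# A second writer still holding the stale version is rejected:
assert not inv.reserve(("ABC-123", "WH-A"), 1, expected_version=version)
```

In a relational store the same check collapses into a single conditional UPDATE on the version column, which is why this pattern needs no long-held locks.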

Asynchronous order-to-shipment workflow

Real-world SCM workflows are asynchronous. Orders are placed, reservations are made, shipments are scheduled, and exceptions happen. Using an event-driven approach and a workflow orchestrator like Cadence, Temporal, or even a simple state machine helps manage this complexity.

Here is a sketch of a simple workflow using a state machine in Python. It demonstrates order creation, inventory reservation, shipment planning, and fulfillment. In practice, you would wrap this in a durable orchestrator.

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class WorkflowState(Enum):
    CREATED = "CREATED"
    RESERVED = "RESERVED"
    SHIPPED = "SHIPPED"
    FULFILLED = "FULFILLED"
    FAILED = "FAILED"


@dataclass
class WorkflowContext:
    order_id: str
    sku: str
    qty: int
    location: str
    state: WorkflowState
    attempts: int = 0
    last_error: Optional[str] = None


def reserve_inventory(ctx: WorkflowContext, inv: InventoryView) -> bool:
    success = inv.reserve(ctx.sku, ctx.location, ctx.order_id, ctx.qty)
    if success:
        ctx.state = WorkflowState.RESERVED
    else:
        ctx.last_error = "Inventory reservation failed"
    return success


def plan_shipment(ctx: WorkflowContext) -> bool:
    # Simulate a call to a TMS; could be HTTP, gRPC, etc.
    # A real implementation would log and emit events.
    ctx.state = WorkflowState.SHIPPED
    return True


def fulfill_order(ctx: WorkflowContext) -> bool:
    # Simulate fulfillment confirmation
    ctx.state = WorkflowState.FULFILLED
    return True


def run_order_workflow(order_id: str, sku: str, qty: int, location: str, inv: InventoryView) -> WorkflowContext:
    ctx = WorkflowContext(order_id=order_id, sku=sku, qty=qty, location=location, state=WorkflowState.CREATED)

    # Step 1: Reserve inventory
    if not reserve_inventory(ctx, inv):
        ctx.state = WorkflowState.FAILED
        return ctx

    # Step 2: Plan shipment
    plan_shipment(ctx)

    # Step 3: Fulfill
    fulfill_order(ctx)

    return ctx


# Example usage
inv = InventoryView()
inv.ledger.append(InventoryLedgerEntry(sku="ABC-123", location="WH-A", qty=100, reference="initial", timestamp=datetime.utcnow()))
ctx = run_order_workflow("ord-2001", "ABC-123", 2, "WH-A", inv)
print(f"Workflow state: {ctx.state}")

This is intentionally minimal. In the real world, you would run this in a durable orchestrator with retries, timeouts, and compensation logic for partial failures. For instance, if shipment planning fails, you must release the inventory reservation. That is where the event log shines: you can build compensating actions by reading the event stream.
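A compensating action for a failed shipment plan can be sketched against a ledger like the one above: release the reservation by appending a positive entry that cancels the earlier negative one. The standalone `Ledger` and `release_reservation` names here are assumptions for illustration, not part of the earlier classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LedgerEntry:  # minimal stand-in for InventoryLedgerEntry above
    sku: str
    qty: int
    reference: str


@dataclass
class Ledger:
    entries: List[LedgerEntry] = field(default_factory=list)

    def on_hand(self, sku: str) -> int:
        return sum(e.qty for e in self.entries if e.sku == sku)


def release_reservation(ledger: Ledger, sku: str, order_id: str, qty: int) -> None:
    """Compensating action: append a positive entry that undoes the
    earlier negative reservation entry written for this order."""
    ledger.entries.append(LedgerEntry(sku=sku, qty=qty, reference=f"release-{order_id}"))


ledger = Ledger()
ledger.entries.append(LedgerEntry("ABC-123", 100, "initial"))
ledger.entries.append(LedgerEntry("ABC-123", -2, "ord-2001"))  # reservation
release_reservation(ledger, "ABC-123", "ord-2001", 2)          # shipment planning failed
assert ledger.on_hand("ABC-123") == 100                        # stock restored
```

Because compensation is just another appended fact, the audit trail shows both the reservation and its release, which is exactly what you want during a reconciliation.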

Error handling and retries with exponential backoff

Network calls to external systems (e.g., a supplier’s EDI endpoint) are prone to transient errors. A robust retry strategy is essential.

import time
import random
from functools import wraps


def retry(times: int, delay: float = 0.5, backoff: float = 2.0):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            last_exc = None
            for attempt in range(times):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    last_exc = exc
                    if attempt < times - 1:
                        sleep = delay * (backoff ** attempt) + (random.random() * 0.2)
                        time.sleep(sleep)
            raise last_exc
        return wrapper
    return decorator


@retry(times=5, delay=0.2, backoff=1.5)
def call_supplier_api(supplier_url: str, payload: dict):
    # Simulate flakiness; with five attempts, a 30% per-call failure rate
    # leaves only about a 0.2% chance the decorator exhausts its retries
    if random.random() < 0.3:
        raise ConnectionError("Supplier API transient failure")
    return {"status": "accepted", "payload": payload}


result = call_supplier_api("https://supplier.example.com/ship", {"order_id": "ord-2001"})
print(result)

Real-world architecture patterns

Most mature SCM systems I have worked with share a similar structure: a core domain service, an event bus, and a set of adapters for external systems. The core domain service handles business rules, while adapters translate between external formats and internal models.

Event-driven architecture with adapters

Adapters are essential for normalization. Suppliers may send EDI X12 or JSON via SFTP, APIs, or message queues. A robust adapter layer can normalize events into a canonical format, then publish them to a central event bus (Kafka, NATS, or AWS EventBridge). Downstream services consume these events to update projections and trigger workflows.

For example, when a shipment event arrives, it should be normalized to a canonical event like ShipmentDeparted or ShipmentDelivered. These events can then update inventory projections and notify downstream systems. The key is to keep adapters thin and push domain logic into the core service.
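As a hedged sketch of that normalization step, here is a thin adapter translating a hypothetical supplier payload into the canonical shape. Every field name in `raw` and the `STATUS_TO_EVENT` codes are invented for illustration; real suppliers will each need their own mapping.

```python
from typing import Dict

# Hypothetical raw payload from one supplier's API; every supplier
# names these fields differently.
raw = {
    "shpmt_no": "S-9912",
    "status_cd": "DEP",
    "loc": "WH-A",
    "ts": "2024-03-01T08:30:00Z",
}

# Adapter-local mapping from this supplier's status codes to canonical event types.
STATUS_TO_EVENT = {"DEP": "ShipmentDeparted", "DLV": "ShipmentDelivered"}


def normalize_shipment_event(payload: Dict) -> Dict:
    """Translate a supplier-specific payload into the canonical event
    shape used on the internal bus. Deliberately thin: only renaming
    and mapping, no domain logic."""
    return {
        "event_type": STATUS_TO_EVENT[payload["status_cd"]],
        "shipment_id": payload["shpmt_no"],
        "location": payload["loc"],
        "occurred_at": payload["ts"],
    }


event = normalize_shipment_event(raw)
print(event["event_type"])  # ShipmentDeparted
```

Keeping the adapter this small means onboarding a new supplier is mostly a matter of writing a new mapping, while the core service never changes.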

Honest evaluation: strengths, weaknesses, and tradeoffs

Strengths

  • Event-sourced models provide auditability and flexibility. You can reconstruct state at any point in time and feed multiple projections.
  • Asynchronous workflows with durable orchestrators provide resilience to partial failures and allow for complex compensation logic.
  • Adapters enable integration with heterogeneous external systems without forcing all participants to adopt a single standard.

Weaknesses

  • Event-driven systems introduce eventual consistency. This is fine for many SCM workflows but problematic for scenarios requiring strong, immediate consistency.
  • The operational complexity increases: you need robust monitoring, dead-letter handling, and replay capabilities.
  • Building adapters for every supplier can be time-consuming and requires domain knowledge of formats like EDI.

Tradeoffs

  • Use eventual consistency with projections when you can tolerate slight delays (e.g., inventory availability).
  • Use strong consistency for critical steps like final inventory allocation when stock is limited.
  • Choose your orchestrator carefully: a general-purpose workflow engine like Temporal adds durability but also infrastructure overhead. For simpler workflows, a lightweight state machine might suffice.
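To illustrate the strong-consistency case from the tradeoffs above, final allocation can be an atomic check-and-decrement. This is a sketch under assumptions: in a relational store it would be a single conditional UPDATE (e.g. `UPDATE stock SET qty = qty - n WHERE sku = s AND qty >= n`); here a lock stands in for the database's atomicity, and the class name is invented.

```python
import threading


class AllocationStore:
    """Strongly consistent final allocation: the last unit can never be
    sold twice, because the check and the decrement happen atomically."""

    def __init__(self, stock: dict):
        self._stock = dict(stock)
        self._lock = threading.Lock()  # stands in for the database's atomicity

    def allocate(self, sku: str, qty: int) -> bool:
        with self._lock:
            if self._stock.get(sku, 0) < qty:
                return False  # refuse rather than oversell
            self._stock[sku] -= qty
            return True


store = AllocationStore({"ABC-123": 1})
assert store.allocate("ABC-123", 1)      # the last unit goes to the first caller
assert not store.allocate("ABC-123", 1)  # the second caller is refused
```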

Personal experience: Lessons from the trenches

I learned the importance of modeling events the hard way. On one project, we tried to store only the current state of orders and shipments. When a supplier changed their message format, we lost the original event data, making it impossible to reconcile discrepancies. Moving to an event-sourced model not only fixed our audit trail but also made it easier to onboard new suppliers. Each adapter could emit events in a canonical format, and the core service stayed unchanged.

Another lesson is to treat data contracts as living documents. Early on, we defined JSON schemas for key events and stored them in version control. When a supplier added a new field, we could version the schema and update the adapter without breaking downstream consumers. This small investment paid off during a peak season when we had to onboard a new distributor under tight deadlines.
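A toy sketch of that versioned-contract idea: schemas live in version control keyed by event type and version, and the adapter validates payloads against the version it claims to emit. The field names and the required-fields-only check are illustrative assumptions; a real setup would use full JSON Schema validation rather than this stand-in.

```python
# Versioned contracts for the OrderCreated event, kept in version control
# alongside the adapters. Field names here are illustrative.
SCHEMAS = {
    ("OrderCreated", 1): {"required": ["order_id", "line_items"]},
    ("OrderCreated", 2): {"required": ["order_id", "line_items", "channel"]},
}


def validate(event_type: str, version: int, payload: dict) -> list:
    """Return the list of missing required fields (empty means valid)."""
    schema = SCHEMAS[(event_type, version)]
    return [f for f in schema["required"] if f not in payload]


v1_payload = {"order_id": "ord-1001", "line_items": []}
assert validate("OrderCreated", 1, v1_payload) == []
# The same payload checked against v2 shows exactly what a supplier must add:
assert validate("OrderCreated", 2, v1_payload) == ["channel"]
```

Because old versions stay in the registry, downstream consumers keep working while suppliers migrate at their own pace.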

A common mistake is to over-index on the tool rather than the domain. Picking a fancy orchestrator or a hot database does not replace understanding the supply chain’s realities. The best systems I have worked on were built by engineers who spent time with planners and warehouse staff to understand constraints and exceptions.

Getting started: Setup and workflow

For engineers new to SCM systems, start with a simple project structure that separates domain logic from adapters. Focus on a clear mental model: events drive updates, and services react to events. Use a durable orchestrator for workflows and an event bus for inter-service communication.

Here is a typical project structure:

supply-chain-core/
├── domain/
│   ├── models.py
│   ├── events.py
│   └── aggregates.py
├── services/
│   ├── order_service.py
│   ├── inventory_service.py
│   └── workflow_service.py
├── adapters/
│   ├── supplier_adapter.py
│   ├── tms_adapter.py
│   └── erp_adapter.py
├── infrastructure/
│   ├── event_bus.py
│   ├── orchestrator.py
│   └── persistence.py
├── tests/
│   ├── unit/
│   └── integration/
└── scripts/
    └── seed.py

A minimal Docker compose setup for local development can include Kafka and a Postgres database:

version: "3.8"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.4.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.4.0
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  postgres:
    image: postgres:14
    environment:
      POSTGRES_USER: scm
      POSTGRES_PASSWORD: scm
      POSTGRES_DB: scm
    ports:
      - "5432:5432"

To run a simple event consumer in Python, you can use the kafka-python library:

from kafka import KafkaConsumer
import json

consumer = KafkaConsumer(
    "supply-chain-events",
    bootstrap_servers=["localhost:9092"],
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

for message in consumer:
    event = message.value
    print(f"Consumed event: {event['event_type']} for {event['aggregate_id']}")

What makes SCM systems stand out

SCM systems stand out because they live at the intersection of data integration and domain complexity. The key differentiators are:

  • The ability to model events as immutable facts, enabling auditability and flexible projections.
  • The capacity to integrate diverse external systems via adapters, translating between formats and protocols.
  • The resilience to partial failures via asynchronous workflows, retries, and compensating actions.

Developer experience is improved when you have a clear separation between domain logic and infrastructure, and when you invest in observability from day one. Metrics, tracing, and structured logging are not optional; they are the difference between a manageable incident and a chaotic one.
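As a hedged sketch of the structured-logging point, the standard library is enough to emit one JSON object per log line so an aggregator can index fields like order_id without regex parsing. The `JsonFormatter` class and the `context` key passed via `extra=` are naming assumptions for this example.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line so log aggregators can index
    structured fields instead of parsing free text."""

    def format(self, record):
        entry = {
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Merge structured context passed via the `extra=` argument.
        entry.update(getattr(record, "context", {}))
        return json.dumps(entry)


logger = logging.getLogger("scm")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Emits: {"level": "INFO", "message": "reservation failed", "order_id": ..., "sku": ...}
logger.info("reservation failed", extra={"context": {"order_id": "ord-1001", "sku": "ABC-123"}})
```

During an incident, being able to filter every log line for a single order_id across services is the difference between minutes and hours.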

Free learning resources

  • Supply Chain Analytics on Coursera — A good primer on the business side of SCM, which helps engineers understand the constraints that drive system requirements.
  • EDI University (TrueCommerce) — Practical examples of EDI X12 documents. Essential reading for adapter development.
  • Temporal documentation — A durable execution platform for workflows. Useful for modeling complex SCM processes with retries and compensation.
  • Kafka documentation — Event streaming fundamentals. Crucial for building event-driven architectures.
  • Inventory Management: Principles and Concepts by CSCMP — A foundational text that clarifies the business logic behind inventory and fulfillment.

Summary and final thoughts

SCM systems are best approached as event-driven, adapter-rich platforms that balance consistency and resilience. If you are building a system that must integrate with multiple suppliers, manage inventory across locations, and handle exceptions gracefully, the patterns described here will serve you well. If you are operating in a single-vendor environment with minimal integration needs, a packaged solution may be more practical.

Start small. Model your core domain entities as aggregates, emit events for every meaningful state change, and design adapters that normalize external data. Invest in observability early, and choose an orchestrator that matches your complexity. The supply chain is a living system; your software should be able to adapt alongside it.