Building Supply Chain Management Systems
Why resilient, observable, and adaptable supply chains matter more than ever

I did not truly appreciate the fragility of supply chains until a late-night pager alert turned into a multi-day firefight. An upstream supplier’s system had quietly changed a date format, and our ingestion pipeline began silently dropping delivery-window updates. Support tickets piled up, planning spreadsheets diverged from reality, and a small data mismatch rippled into real missed shipments. That experience is the impetus for this post. It is not a marketing overview; it is a practical guide for engineers who need to build systems that model reality, degrade gracefully, and stay trustworthy when the world gets messy.
In this article, I will walk through how to think about supply chain management (SCM) systems from an engineering perspective. We will cover context and tradeoffs, core modeling concepts, and practical code patterns using Python and some common tooling. If you have ever wondered how to represent orders, inventory, or shipments in code without painting yourself into a corner, or how to handle asynchronous updates across organizations without losing your mind, this is for you.
Context: Where SCM fits in the modern engineering landscape
SCM systems coordinate the flow of goods, information, and money across a network of suppliers, manufacturers, distributors, and retailers. In practice, that means integrating ERPs, warehouse management systems (WMS), transportation management systems (TMS), and custom services that model orders, inventory, and logistics events.
Engineers rarely build all of these from scratch. More often, we integrate and orchestrate. We design a central service layer that translates between domain models and external systems, enforce invariants around inventory and orders, and provide observability into exceptions. The closest alternatives to a bespoke SCM are buying an off-the-shelf platform or relying entirely on a single ERP’s capabilities. The tradeoff is flexibility versus maintenance burden. If your business model evolves quickly or you operate in a heterogeneous ecosystem, a custom orchestration layer often outperforms a rigid monolith. If you are standardizing on a single vendor and have limited engineering capacity, a packaged solution may be more pragmatic.
Who typically builds these systems? Midsize to large companies with multi-echelon inventory, complex fulfillment logic, or integrations across many suppliers. Common personas include platform engineers, integration specialists, and domain-focused backend developers who care about consistency, throughput, and auditability.
Core concepts and practical modeling
SCM domain models tend to revolve around a few core entities: organizations, locations, items, inventory, orders, shipments, and events that describe state transitions. A practical approach is to treat the system as an event-driven architecture where services emit and consume immutable facts (events), while the core models remain as consistent projections.
A quick mental model:
- Orders represent demand.
- Shipments represent supply in motion.
- Inventory is the truth at rest.
- Events are the glue that reconciles intent and reality.
Domain modeling with event-sourced aggregates
Let’s start with a simple order aggregate that emits events. This pattern is useful because it creates an audit trail and makes it easier to reconstruct state or feed downstream consumers.
```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Dict
from enum import Enum
import json


class OrderStatus(Enum):
    CREATED = "CREATED"
    PARTIALLY_FULFILLED = "PARTIALLY_FULFILLED"
    FULFILLED = "FULFILLED"
    CANCELLED = "CANCELLED"


@dataclass
class Event:
    event_id: str
    aggregate_id: str
    event_type: str
    timestamp: datetime
    payload: Dict


@dataclass
class Order:
    order_id: str
    customer_id: str
    status: Optional[OrderStatus]  # None until create() is called
    line_items: List[Dict]  # e.g., [{"sku": "ABC-123", "qty": 2}]
    events: List[Event] = field(default_factory=list)

    def _emit(self, event_type: str, payload: Dict) -> None:
        # Every state transition is recorded as an immutable event.
        self.events.append(
            Event(
                event_id=f"evt-{self.order_id}-{len(self.events) + 1}",
                aggregate_id=self.order_id,
                event_type=event_type,
                timestamp=datetime.utcnow(),
                payload=payload,
            )
        )

    def create(self, line_items: List[Dict]) -> None:
        assert self.status is None, "Order already exists"
        self.line_items = line_items
        self.status = OrderStatus.CREATED
        self._emit("OrderCreated", {"line_items": line_items})

    def partially_fulfilled(self, fulfillment: Dict) -> None:
        # fulfillment: {"sku": "ABC-123", "shipped": 1, "backordered": 1}
        if self.status == OrderStatus.CREATED:
            self.status = OrderStatus.PARTIALLY_FULFILLED
        self._emit("OrderPartiallyFulfilled", {"fulfillment": fulfillment})

    def fulfill(self) -> None:
        self.status = OrderStatus.FULFILLED
        self._emit("OrderFulfilled", {})

    def cancel(self, reason: str) -> None:
        self.status = OrderStatus.CANCELLED
        self._emit("OrderCancelled", {"reason": reason})


# Example usage
order = Order(order_id="ord-1001", customer_id="cust-42", status=None, line_items=[])
order.create([{"sku": "ABC-123", "qty": 2}])
order.partially_fulfilled({"sku": "ABC-123", "shipped": 1, "backordered": 1})
order.fulfill()
print(json.dumps([e.payload for e in order.events], indent=2))
```
This is intentionally simple. In a real system, events would be persisted to an event store, and the aggregate would be reconstructed by replaying events. That gives you auditability, temporal queries, and the ability to feed multiple projections (e.g., inventory, billing, analytics).
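To make the replay idea concrete, here is a minimal sketch that folds an ordered list of event dicts (the shape the aggregate above produces) into the current order status. The transition table is an illustration only; a real event store would hydrate full aggregates, not just a status field.

```python
from typing import Dict, List, Optional

# Map each event type to the status it implies; illustrative only.
TRANSITIONS: Dict[str, str] = {
    "OrderCreated": "CREATED",
    "OrderPartiallyFulfilled": "PARTIALLY_FULFILLED",
    "OrderFulfilled": "FULFILLED",
    "OrderCancelled": "CANCELLED",
}


def replay_status(events: List[Dict]) -> Optional[str]:
    """Fold an ordered event stream into the latest order status."""
    status: Optional[str] = None
    for event in events:
        status = TRANSITIONS.get(event["event_type"], status)
    return status


history = [
    {"event_type": "OrderCreated"},
    {"event_type": "OrderPartiallyFulfilled"},
    {"event_type": "OrderFulfilled"},
]
print(replay_status(history))  # FULFILLED
```

Because replay is a pure fold over the event list, the same stream can feed any number of projections: the order service folds it into a status, while an analytics service might fold it into cycle-time metrics.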
Inventory tracking and reservations
Inventory is tricky because of concurrency and partial visibility. You want to reserve inventory when an order is created and adjust reservations when shipments move or cancellations occur. One practical approach is to treat reservations as a ledger that sits against a SKU and location.
```python
from dataclasses import dataclass, field
from typing import List
from datetime import datetime


@dataclass
class InventoryLedgerEntry:
    sku: str
    location: str
    qty: int  # positive for inbound, negative for outbound
    reference: str  # e.g., order_id or shipment_id
    timestamp: datetime


@dataclass
class InventoryView:
    ledger: List[InventoryLedgerEntry] = field(default_factory=list)

    def on_hand(self, sku: str, location: str) -> int:
        return sum(e.qty for e in self.ledger if e.sku == sku and e.location == location)

    def reserve(self, sku: str, location: str, order_id: str, qty: int) -> bool:
        # Naive optimistic check: no locking, so two concurrent callers
        # could both pass this check and oversell.
        available = self.on_hand(sku, location)
        if available < qty:
            return False
        self.ledger.append(
            InventoryLedgerEntry(
                sku=sku,
                location=location,
                qty=-qty,  # outbound reservation
                timestamp=datetime.utcnow(),
                reference=order_id,
            )
        )
        return True

    def ship(self, sku: str, location: str, order_id: str, qty: int) -> None:
        # The reservation already deducted on-hand stock, so in this naive
        # model shipping only records a zero-delta audit line.
        self.ledger.append(
            InventoryLedgerEntry(
                sku=sku,
                location=location,
                qty=0,  # accounting line
                reference=f"ship-{order_id}",
                timestamp=datetime.utcnow(),
            )
        )


# Example usage
inv = InventoryView()
inv.ledger.append(InventoryLedgerEntry(sku="ABC-123", location="WH-A", qty=100, reference="initial", timestamp=datetime.utcnow()))
inv.reserve("ABC-123", "WH-A", "ord-1001", 2)
inv.reserve("ABC-123", "WH-A", "ord-1002", 98)
inv.reserve("ABC-123", "WH-A", "ord-1003", 1)  # fails: on-hand is already 0
print(inv.on_hand("ABC-123", "WH-A"))  # 0
```
In production, this naive ledger would be replaced with a more robust mechanism that includes optimistic concurrency control or a dedicated inventory service with strong consistency guarantees. Tools like Apache Kafka or Amazon Kinesis can be used to stream inventory events and update projections in near real-time.
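To sketch what "optimistic concurrency control" could look like here, one option is a compare-and-swap on a ledger version: readers record the version they observed, and a write is rejected if the ledger has moved on since. The `VersionedLedger` and `VersionConflict` names below are hypothetical; a real service would back this with a database's conditional update, not an in-memory list.

```python
from typing import List, Tuple


class VersionConflict(Exception):
    """Raised when the ledger changed between read and write."""


class VersionedLedger:
    def __init__(self) -> None:
        self.entries: List[int] = []  # qty deltas; simplified to bare ints
        self.version = 0

    def read(self) -> Tuple[int, int]:
        """Return (on_hand, version) as of this read."""
        return sum(self.entries), self.version

    def append(self, qty_delta: int, expected_version: int) -> None:
        # Compare-and-swap: reject writes based on a stale read.
        if expected_version != self.version:
            raise VersionConflict("ledger changed since read; re-read and retry")
        self.entries.append(qty_delta)
        self.version += 1


ledger = VersionedLedger()
ledger.append(100, expected_version=0)       # initial stock
on_hand, version = ledger.read()             # (100, 1)
ledger.append(-2, expected_version=version)  # reservation succeeds
try:
    ledger.append(-5, expected_version=version)  # stale version: rejected
except VersionConflict:
    pass
print(ledger.read())  # (98, 2)
```

The caller that hits `VersionConflict` simply re-reads, re-checks availability, and retries, which is exactly the loop a relational database gives you for free with a conditional `UPDATE ... WHERE version = ?`.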
Asynchronous order-to-shipment workflow
Real-world SCM workflows are asynchronous. Orders are placed, reservations are made, shipments are scheduled, and exceptions happen. Using an event-driven approach and a workflow orchestrator like Cadence, Temporal, or even a simple state machine helps manage this complexity.
Here is a sketch of a simple workflow using a state machine in Python. It demonstrates order creation, inventory reservation, shipment planning, and fulfillment. In practice, you would wrap this in a durable orchestrator.
```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class WorkflowState(Enum):
    CREATED = "CREATED"
    RESERVED = "RESERVED"
    SHIPPED = "SHIPPED"
    FULFILLED = "FULFILLED"
    FAILED = "FAILED"


@dataclass
class WorkflowContext:
    order_id: str
    sku: str
    qty: int
    location: str
    state: WorkflowState
    attempts: int = 0
    last_error: Optional[str] = None


def reserve_inventory(ctx: WorkflowContext, inv: InventoryView) -> bool:
    success = inv.reserve(ctx.sku, ctx.location, ctx.order_id, ctx.qty)
    if success:
        ctx.state = WorkflowState.RESERVED
    else:
        ctx.last_error = "Inventory reservation failed"
    return success


def plan_shipment(ctx: WorkflowContext) -> bool:
    # Simulate a call to a TMS; could be HTTP, gRPC, etc.
    # A real implementation would log and emit events.
    ctx.state = WorkflowState.SHIPPED
    return True


def fulfill_order(ctx: WorkflowContext) -> bool:
    # Simulate fulfillment confirmation
    ctx.state = WorkflowState.FULFILLED
    return True


def run_order_workflow(order_id: str, sku: str, qty: int, location: str, inv: InventoryView) -> WorkflowContext:
    ctx = WorkflowContext(order_id=order_id, sku=sku, qty=qty, location=location, state=WorkflowState.CREATED)
    # Step 1: Reserve inventory
    if not reserve_inventory(ctx, inv):
        ctx.state = WorkflowState.FAILED
        return ctx
    # Step 2: Plan shipment
    plan_shipment(ctx)
    # Step 3: Fulfill
    fulfill_order(ctx)
    return ctx


# Example usage
inv = InventoryView()
inv.ledger.append(InventoryLedgerEntry(sku="ABC-123", location="WH-A", qty=100, reference="initial", timestamp=datetime.utcnow()))
ctx = run_order_workflow("ord-2001", "ABC-123", 2, "WH-A", inv)
print(f"Workflow state: {ctx.state}")
```
This is intentionally minimal. In the real world, you would run this in a durable orchestrator with retries, timeouts, and compensation logic for partial failures. For instance, if shipment planning fails, you must release the inventory reservation. That is where the event log shines: you can build compensating actions by reading the event stream.
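The compensation idea can be sketched concretely with the ledger model: releasing a reservation means appending an offsetting entry, never mutating history. The stripped-down `reserve`/`release` helpers and the always-failing `plan_shipment` stub below are hypothetical, kept minimal to show the shape of the compensating step.

```python
from typing import Dict, List

# Reservations are negative ledger deltas; releasing one appends the
# offsetting positive delta, preserving the audit trail.
Ledger = List[Dict]


def reserve(ledger: Ledger, order_id: str, qty: int) -> None:
    ledger.append({"reference": order_id, "qty": -qty})


def release(ledger: Ledger, order_id: str, qty: int) -> None:
    # Compensating entry: offsets the reservation instead of deleting it.
    ledger.append({"reference": f"release-{order_id}", "qty": qty})


def plan_shipment(order_id: str) -> bool:
    return False  # simulate a TMS failure to trigger compensation


def on_hand(ledger: Ledger) -> int:
    return sum(e["qty"] for e in ledger)


ledger: Ledger = [{"reference": "initial", "qty": 100}]
reserve(ledger, "ord-3001", 2)
if not plan_shipment("ord-3001"):
    release(ledger, "ord-3001", 2)  # compensate: stock is available again
print(on_hand(ledger))  # 100
```

Because the failed reservation and its release both remain in the ledger, an auditor can later see that the order was attempted and backed out, which a destructive "delete the reservation row" approach would hide.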
Error handling and retries with exponential backoff
Network calls to external systems (e.g., a supplier’s EDI endpoint) are prone to transient errors. A robust retry strategy is essential.
```python
import time
import random
from functools import wraps


def retry(times: int, delay: float = 0.5, backoff: float = 2.0):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            last_exc = None
            for attempt in range(times):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    last_exc = exc
                    if attempt < times - 1:
                        # Exponential backoff with jitter to avoid
                        # hammering a recovering service in lockstep.
                        sleep = delay * (backoff ** attempt) + (random.random() * 0.2)
                        time.sleep(sleep)
            raise last_exc
        return wrapper
    return decorator


@retry(times=5, delay=0.2, backoff=1.5)
def call_supplier_api(supplier_url: str, payload: dict):
    # Simulate flakiness
    if random.random() < 0.7:
        raise ConnectionError("Supplier API transient failure")
    return {"status": "accepted", "payload": payload}


result = call_supplier_api("https://supplier.example.com/ship", {"order_id": "ord-2001"})
print(result)
```
Real-world architecture patterns
Most mature SCM systems I have worked with share a similar structure: a core domain service, an event bus, and a set of adapters for external systems. The core domain service handles business rules, while adapters translate between external formats and internal models.
Event-driven architecture with adapters
Adapters are essential for normalization. Suppliers may send EDI X12 or JSON via SFTP, APIs, or message queues. A robust adapter layer can normalize events into a canonical format, then publish them to a central event bus (Kafka, NATS, or AWS EventBridge). Downstream services consume these events to update projections and trigger workflows.
For example, when a shipment event arrives, it should be normalized to a canonical event like ShipmentDeparted or ShipmentDelivered. These events can then update inventory projections and notify downstream systems. The key is to keep adapters thin and push domain logic into the core service.
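A thin adapter can be sketched as a single pure function from the partner's payload shape to the canonical event. The supplier field names below ("shpmt_no", "dep_dt") are invented for illustration; a real adapter maps whatever the partner actually sends, whether EDI segments, CSV columns, or JSON.

```python
from datetime import datetime
from typing import Dict


def normalize_supplier_shipment(raw: Dict) -> Dict:
    """Translate one hypothetical supplier payload into a canonical event."""
    return {
        "event_type": "ShipmentDeparted",
        "aggregate_id": raw["shpmt_no"],
        # Supplier sends compact YYYYMMDD dates; canonical events use ISO 8601.
        "timestamp": datetime.strptime(raw["dep_dt"], "%Y%m%d").isoformat(),
        "payload": {"carrier": raw.get("carrier", "UNKNOWN")},
    }


raw = {"shpmt_no": "SHP-77", "dep_dt": "20240115", "carrier": "ACME"}
event = normalize_supplier_shipment(raw)
print(event["event_type"], event["aggregate_id"])  # ShipmentDeparted SHP-77
```

Keeping the function pure, with no business rules and no I/O, makes it trivially unit-testable and leaves all domain decisions (does this shipment satisfy the order? should inventory move?) to the core service that consumes the canonical event.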
Honest evaluation: strengths, weaknesses, and tradeoffs
Strengths
- Event-sourced models provide auditability and flexibility. You can reconstruct state at any point in time and feed multiple projections.
- Asynchronous workflows with durable orchestrators provide resilience to partial failures and allow for complex compensation logic.
- Adapters enable integration with heterogeneous external systems without forcing all participants to adopt a single standard.
Weaknesses
- Event-driven systems introduce eventual consistency. This is fine for many SCM workflows but problematic for scenarios requiring strong, immediate consistency.
- The operational complexity increases: you need robust monitoring, dead-letter handling, and replay capabilities.
- Building adapters for every supplier can be time-consuming and requires domain knowledge of formats like EDI.
Tradeoffs
- Use eventual consistency with projections when you can tolerate slight delays (e.g., inventory availability).
- Use strong consistency for critical steps like final inventory allocation when stock is limited.
- Choose your orchestrator carefully: a general-purpose workflow engine like Temporal adds durability but also infrastructure overhead. For simpler workflows, a lightweight state machine might suffice.
Personal experience: Lessons from the trenches
I learned the importance of modeling events the hard way. On one project, we tried to store only the current state of orders and shipments. When a supplier changed their message format, we lost the original event data, making it impossible to reconcile discrepancies. Moving to an event-sourced model not only fixed our audit trail but also made it easier to onboard new suppliers. Each adapter could emit events in a canonical format, and the core service stayed unchanged.
Another lesson is to treat data contracts as living documents. Early on, we defined JSON schemas for key events and stored them in version control. When a supplier added a new field, we could version the schema and update the adapter without breaking downstream consumers. This small investment paid off during a peak season when we had to onboard a new distributor under tight deadlines.
A common mistake is to over-index on the tool rather than the domain. Picking a fancy orchestrator or a hot database does not replace understanding the supply chain’s realities. The best systems I have worked on were built by engineers who spent time with planners and warehouse staff to understand constraints and exceptions.
Getting started: Setup and workflow
For engineers new to SCM systems, start with a simple project structure that separates domain logic from adapters. Focus on a clear mental model: events drive updates, and services react to events. Use a durable orchestrator for workflows and an event bus for inter-service communication.
Here is a typical project structure:
```text
supply-chain-core/
├── domain/
│   ├── models.py
│   ├── events.py
│   └── aggregates.py
├── services/
│   ├── order_service.py
│   ├── inventory_service.py
│   └── workflow_service.py
├── adapters/
│   ├── supplier_adapter.py
│   ├── tms_adapter.py
│   └── erp_adapter.py
├── infrastructure/
│   ├── event_bus.py
│   ├── orchestrator.py
│   └── persistence.py
├── tests/
│   ├── unit/
│   └── integration/
└── scripts/
    └── seed.py
```
A minimal Docker compose setup for local development can include Kafka and a Postgres database:
```yaml
version: "3.8"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.4.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.4.0
    depends_on:
      - zookeeper
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    ports:
      - "9092:9092"
  postgres:
    image: postgres:14
    environment:
      POSTGRES_USER: scm
      POSTGRES_PASSWORD: scm
      POSTGRES_DB: scm
    ports:
      - "5432:5432"
```
To run a simple event consumer in Python, you can use the kafka-python library:
```python
from kafka import KafkaConsumer
import json

consumer = KafkaConsumer(
    "supply-chain-events",
    bootstrap_servers=["localhost:9092"],
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

for message in consumer:
    event = message.value
    print(f"Consumed event: {event['event_type']} for {event['aggregate_id']}")
```
What makes SCM systems stand out
SCM systems stand out because they live at the intersection of data integration and domain complexity. The key differentiators are:
- The ability to model events as immutable facts, enabling auditability and flexible projections.
- The capacity to integrate diverse external systems via adapters, translating between formats and protocols.
- The resilience to partial failures via asynchronous workflows, retries, and compensating actions.
Developer experience is improved when you have a clear separation between domain logic and infrastructure, and when you invest in observability from day one. Metrics, tracing, and structured logging are not optional; they are the difference between a manageable incident and a chaotic one.
Free learning resources
- Supply Chain Analytics on Coursera — A good primer on the business side of SCM, which helps engineers understand the constraints that drive system requirements.
- EDI University (TrueCommerce) — Practical examples of EDI X12 documents. Essential reading for adapter development.
- Temporal documentation — A durable execution platform for workflows. Useful for modeling complex SCM processes with retries and compensation.
- Kafka documentation — Event streaming fundamentals. Crucial for building event-driven architectures.
- Inventory Management: Principles and Concepts by CSCMP — A foundational text that clarifies the business logic behind inventory and fulfillment.
Summary and final thoughts
SCM systems are best approached as event-driven, adapter-rich platforms that balance consistency and resilience. If you are building a system that must integrate with multiple suppliers, manage inventory across locations, and handle exceptions gracefully, the patterns described here will serve you well. If you are operating in a single-vendor environment with minimal integration needs, a packaged solution may be more practical.
Start small. Model your core domain entities as aggregates, emit events for every meaningful state change, and design adapters that normalize external data. Invest in observability early, and choose an orchestrator that matches your complexity. The supply chain is a living system; your software should be able to adapt alongside it.




