Event Mesh Architecture Patterns


Why distributed systems need better event routing in an increasingly connected world

[Figure: abstract network diagram showing interconnected nodes with event arrows flowing between microservices and cloud platforms]

If you have ever tried to orchestrate events across microservices, serverless functions, and legacy monoliths, you know the pain. One service emits an event that three other services need, but only one of them is online. Another service needs the same event but only if it satisfies certain criteria. Suddenly you are writing brittle point-to-point integrations, or worse, you are coupling services that should never know about each other.

I ran into this exact problem a few years back while helping a retail client migrate from a monolithic order processor to microservices. The inventory service, the recommendation engine, and the fraud detection module all needed order events, but each needed a different subset of fields and different delivery guarantees. Our initial approach was a shared message broker with a fan-out pattern. It worked until it did not. Routing logic crept into every service, debugging became a nightmare, and event schemas drifted quietly. That is when we started exploring event mesh patterns in earnest.

In this post, we will look at what an event mesh is, why it matters right now, and how you can implement practical patterns without over-engineering. We will walk through code examples in Go and TypeScript for realistic scenarios, discuss tradeoffs, and share lessons learned the hard way. If you are building distributed systems where events matter, this should give you a grounded view of the architectural landscape.

Context: Where event mesh fits in the modern stack

Event-driven architectures are not new, but the complexity of modern systems has pushed the need for a more flexible event routing layer. Microservices, serverless functions, edge deployments, and SaaS integrations all generate and consume events. A traditional message broker is great for point-to-point or pub/sub within a single environment, but it often falls short when events need to flow across heterogeneous environments with varying delivery semantics, security boundaries, and data formats.

An event mesh is a network of interconnected event brokers that provides dynamic routing, transformation, and policy enforcement for events across the organization. Think of it as a nervous system for your services, where events flow across boundaries without each service needing to know the topology. It decouples producers from consumers, supports multiple protocols, and enables sophisticated routing based on content, context, or consumer subscriptions.

You will typically see event meshes implemented using technologies like Apache Kafka, NATS JetStream, RabbitMQ, or cloud-native brokers such as Azure Event Hubs, AWS EventBridge, and Google Pub/Sub. Some vendors offer proprietary meshes, but the patterns remain similar. The goal is to route events reliably, enforce governance, and allow services to evolve independently.

Compared to a single broker, an event mesh adds layers of abstraction and routing. This introduces complexity, but the payoff is flexibility. If you run services across clouds, on-prem, and the edge, an event mesh can simplify integration. If you are a small team with a handful of services, a single broker may be enough. The decision is situational, and we will get into that.

Core concepts and architectural patterns

Before diving into code, it helps to have a mental model. At the heart of an event mesh are event producers, consumers, brokers, and the routing logic that ties them together. Events flow from producers to brokers, which route them to consumers based on topics, queues, and subscriptions. An event mesh extends this by connecting multiple brokers and adding transformation and policy layers.

Event routing and subscription models

In pub/sub models, producers publish events to a topic, and consumers subscribe to topics. In queue-based models, events are delivered to a single consumer. An event mesh often supports both. The key is to avoid hard-coding endpoints. Instead, define subscription rules that describe what an event is and where it should go.

For example, consider an order service that emits OrderCreated events. Inventory needs the event to reserve stock. Analytics needs the event to update dashboards. Fraud detection needs the event for scoring. You could publish the event to a single topic and let consumers filter. But if consumers need different formats or delivery guarantees, you may want separate topics or downstream transformation streams.
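To make "subscription rules" concrete, here is a toy matcher in Go that mimics NATS-style subject wildcards: "*" matches exactly one token, ">" matches one or more trailing tokens. The real broker performs this matching server-side; this sketch only illustrates how hierarchical subjects (the subject names here are illustrative) let consumers pick their own granularity without hard-coded endpoints.

```go
package main

import (
	"fmt"
	"strings"
)

// matches reports whether a dot-separated subject matches a subscription
// pattern: "*" matches exactly one token, ">" matches one or more
// trailing tokens. Brokers like NATS implement this server-side.
func matches(pattern, subject string) bool {
	pt := strings.Split(pattern, ".")
	st := strings.Split(subject, ".")
	for i, p := range pt {
		if p == ">" {
			return i < len(st) // ">" must cover at least one token
		}
		if i >= len(st) || (p != "*" && p != st[i]) {
			return false
		}
	}
	return len(pt) == len(st)
}

func main() {
	// Inventory subscribes per event type, any region.
	fmt.Println(matches("orders.created.*", "orders.created.us-east")) // true
	fmt.Println(matches("orders.created.*", "orders.cancelled.us-east")) // false
	// Analytics takes everything under orders.
	fmt.Println(matches("orders.>", "orders.cancelled.us-east")) // true
}
```

Because the pattern, not the producer, decides who receives the event, adding a new consumer is a subscription change rather than a code change.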

Schema governance and data contracts

Event schemas are first-class citizens. Without governance, services will evolve independently and break consumers. A practical approach is to use a schema registry and enforce compatibility rules. Avro, JSON Schema, and Protobuf are common choices. In an event mesh, schemas help routing decisions too. You can route based on the event type, version, or content.
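As a sketch of routing on type and version, the snippet below decodes only a small envelope (the field names are assumptions for this sketch, chosen to mirror the OrderCreated event used later in this post) and picks a destination subject by major schema version, so a v2 producer can roll out without breaking v1 consumers.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// envelope decodes only the fields the mesh needs for routing; the rest
// of the payload stays opaque to the router.
type envelope struct {
	EventType string `json:"event_type"`
	SchemaVer string `json:"schema_ver"`
}

// routeSubject picks a destination subject from the event type and the
// schema's major version.
func routeSubject(raw []byte) (string, error) {
	var env envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return "", err
	}
	switch {
	case env.EventType == "OrderCreated" && strings.HasPrefix(env.SchemaVer, "1."):
		return "orders.v1", nil
	case env.EventType == "OrderCreated" && strings.HasPrefix(env.SchemaVer, "2."):
		return "orders.v2", nil
	default:
		return "dlq.unroutable", nil // unknown events get quarantined, not dropped
	}
}

func main() {
	s, _ := routeSubject([]byte(`{"event_type":"OrderCreated","schema_ver":"1.0"}`))
	fmt.Println(s) // orders.v1
}
```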

Idempotency and ordering

Events can be delivered more than once, and ordering is not guaranteed across brokers. Consumers must be idempotent. For ordering within a partition, use keys. In our retail example, we keyed OrderCreated events by orderId to keep related operations in sequence within a single partition. For global ordering across brokers, you often need to accept tradeoffs in throughput.
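The per-key ordering described above rests on a stable hash from key to partition, the same idea behind Kafka's default partitioner. This minimal sketch (partition count and key are illustrative) shows why all events for one order land on one partition:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor maps a key (here, an order ID) to a stable partition.
// Events sharing a key always land on the same partition and are
// therefore consumed in publish order relative to each other.
func partitionFor(key string, numPartitions int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32()) % numPartitions
}

func main() {
	// Every OrderCreated, OrderUpdated, etc. for order 12345 hashes
	// to the same partition, so per-order ordering is preserved.
	fmt.Println(partitionFor("12345", 8) == partitionFor("12345", 8)) // true
}
```

Note the tradeoff: changing the partition count reshuffles keys, which is why global ordering across brokers is so expensive to guarantee.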

Error handling and dead letter patterns

When a consumer fails repeatedly, move the event to a dead letter queue for analysis. In an event mesh, you might want centralized observability for DLQs to avoid hidden failures. Consider retries with exponential backoff and circuit breakers to prevent cascading failures.

Security and multi-tenancy

Event meshes often span environments with different security policies. Use TLS for transport, authentication with mTLS or OAuth, and fine-grained authorization on topics. Multi-tenancy may require namespacing or separate clusters. In cloud environments, use VPC peering or private links to keep traffic off the public internet.

Practical implementation: Code and patterns

Let’s build a minimal event mesh using two brokers connected through a router service. We will simulate a retail scenario where orders, inventory, and analytics consume events with different needs. We will use NATS JetStream for lightweight streaming and a custom router in Go to transform and route events. For consumers, we will show a TypeScript service that handles idempotency and DLQs.

Project structure

This is a simplified structure you can adapt. It includes the router, a producer, and two consumers.

event-mesh-demo/
├── router/
│   ├── main.go
│   ├── config.yaml
│   ├── Dockerfile
│   ├── go.mod
│   ├── go.sum
│   └── internal/
│       ├── broker/
│       │   ├── nats.go
│       │   └── routes.go
│       └── model/
│           └── event.go
├── producer/
│   ├── main.go
│   ├── go.mod
│   └── go.sum
├── consumer-analytics/
│   ├── src/
│   │   ├── index.ts
│   │   ├── consumer.ts
│   │   └── idempotency.ts
│   ├── package.json
│   ├── tsconfig.json
│   └── Dockerfile
├── consumer-inventory/
│   ├── src/
│   │   ├── index.ts
│   │   ├── consumer.ts
│   │   └── idempotency.ts
│   ├── package.json
│   ├── tsconfig.json
│   └── Dockerfile
├── docker-compose.yaml
└── README.md

NATS JetStream configuration

We will run two NATS servers with JetStream enabled to simulate separate brokers. The router will bridge them. Here is a docker-compose file to start them.

version: "3.8"

services:
  nats-broker-a:
    image: nats:2.9-alpine
    command: "-js -sd /data -m 8222"
    ports:
      - "4222:4222"
      - "8222:8222"
    volumes:
      - ./data/broker-a:/data

  nats-broker-b:
    image: nats:2.9-alpine
    command: "-js -sd /data -m 8222"
    ports:
      - "4223:4222"
      - "8223:8222"
    volumes:
      - ./data/broker-b:/data

  router:
    build: ./router
    environment:
      - NATS_URL_A=nats://nats-broker-a:4222
      - NATS_URL_B=nats://nats-broker-b:4222
    depends_on:
      - nats-broker-a
      - nats-broker-b
    ports:
      - "8080:8080"

  producer:
    build: ./producer
    environment:
      - NATS_URL=nats://nats-broker-a:4222
    depends_on:
      - nats-broker-a

  consumer-analytics:
    build: ./consumer-analytics
    environment:
      - NATS_URL=nats://nats-broker-b:4222
      # the consumers read REDIS_URL for the idempotency store shown below
      - REDIS_URL=redis://redis:6379
    depends_on:
      - nats-broker-b
      - redis

  consumer-inventory:
    build: ./consumer-inventory
    environment:
      - NATS_URL=nats://nats-broker-b:4222
      - REDIS_URL=redis://redis:6379
    depends_on:
      - nats-broker-b
      - redis

  redis:
    image: redis:7-alpine

Producer: Emitting OrderCreated events

The producer emits OrderCreated events to broker A. We include a schema version and a deterministic event ID for idempotency. In production, you would validate against a schema registry before publishing.

// producer/main.go
package main

import (
	"encoding/json"
	"log"
	"os"
	"time"

	"github.com/nats-io/nats.go"
)

type OrderCreated struct {
	EventID    string    `json:"event_id"`
	OrderID    string    `json:"order_id"`
	CustomerID string    `json:"customer_id"`
	Amount     float64   `json:"amount"`
	Currency   string    `json:"currency"`
	Items      []string  `json:"items"`
	SchemaVer  string    `json:"schema_ver"`
	OccurredAt time.Time `json:"occurred_at"`
}

func main() {
	// Honor the NATS_URL set in docker-compose, falling back to localhost
	url := os.Getenv("NATS_URL")
	if url == "" {
		url = "nats://localhost:4222"
	}
	nc, err := nats.Connect(url)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	stream := "orders"
	event := OrderCreated{
		EventID:    "ord-12345-20251015-001",
		OrderID:    "12345",
		CustomerID: "cust-678",
		Amount:     199.99,
		Currency:   "USD",
		Items:      []string{"sku-1", "sku-2"},
		SchemaVer:  "1.0",
		OccurredAt: time.Now().UTC(),
	}

	data, err := json.Marshal(event)
	if err != nil {
		log.Fatal(err)
	}

	// Tag the event type in a header so the router can route without
	// parsing the payload. (Core NATS has no partition keys; keying by
	// order_id applies once events land in a partitioned stream.)
	err = nc.PublishMsg(&nats.Msg{
		Subject: stream,
		Header:  nats.Header{"X-Event-Type": []string{"OrderCreated"}},
		Data:    data,
	})
	if err != nil {
		log.Fatal(err)
	}

	log.Printf("Published OrderCreated: %s", event.EventID)
}

Router: Bridging brokers with transformation and routing

The router subscribes to events from broker A and publishes transformed events to broker B. It routes based on event type and content. For example, inventory gets a minimal payload, analytics gets the full payload.

// router/internal/broker/nats.go
package broker

import (
	"github.com/nats-io/nats.go"
)

type Broker struct {
	ncA *nats.Conn
	ncB *nats.Conn
}

func NewBroker(urlA, urlB string) (*Broker, error) {
	ncA, err := nats.Connect(urlA)
	if err != nil {
		return nil, err
	}
	ncB, err := nats.Connect(urlB)
	if err != nil {
		return nil, err
	}
	return &Broker{ncA: ncA, ncB: ncB}, nil
}

func (b *Broker) Close() {
	b.ncA.Close()
	b.ncB.Close()
}

// router/internal/broker/routes.go
package broker

import (
	"encoding/json"
	"log"

	"github.com/nats-io/nats.go"
)

type OrderCreated struct {
	EventID    string   `json:"event_id"`
	OrderID    string   `json:"order_id"`
	CustomerID string   `json:"customer_id"`
	Amount     float64  `json:"amount"`
	Currency   string   `json:"currency"`
	Items      []string `json:"items"`
	SchemaVer  string   `json:"schema_ver"`
	OccurredAt string   `json:"occurred_at"`
}

func (b *Broker) SubscribeAndRoute() error {
	// Subscribe to orders stream on broker A
	_, err := b.ncA.Subscribe("orders", func(msg *nats.Msg) {
		var event OrderCreated
		if err := json.Unmarshal(msg.Data, &event); err != nil {
			log.Printf("Failed to unmarshal event: %v", err)
			return
		}

		// Route to analytics (full event)
		analyticsPayload, _ := json.Marshal(event)
		if err := b.ncB.PublishMsg(&nats.Msg{
			Subject: "analytics.orders",
			Header:  nats.Header{"X-Event-Type": []string{"OrderCreated"}},
			Data:    analyticsPayload,
		}); err != nil {
			log.Printf("Failed to publish analytics: %v", err)
		}

		// Route to inventory (minimal payload)
		minPayload := map[string]interface{}{
			"order_id": event.OrderID,
			"amount":   event.Amount,
			"items":    event.Items,
		}
		minData, _ := json.Marshal(minPayload)
		if err := b.ncB.PublishMsg(&nats.Msg{
			Subject: "inventory.orders",
			Header:  nats.Header{"X-Event-Type": []string{"OrderCreated"}},
			Data:    minData,
		}); err != nil {
			log.Printf("Failed to publish inventory: %v", err)
		}
	})
	if err != nil {
		return err
	}
	return nil
}

Consumer: TypeScript service with idempotency and DLQ

Consumers on broker B handle events. We implement idempotency with a Redis-backed store and route failures to a DLQ. This is a simplified pattern, but it reflects typical production setups.

// consumer-analytics/src/idempotency.ts
import Redis from 'ioredis';

export class IdempotencyStore {
  private redis: Redis;
  constructor(redisUrl: string) {
    this.redis = new Redis(redisUrl);
  }

  async isProcessed(eventId: string): Promise<boolean> {
    const key = `processed:${eventId}`;
    const val = await this.redis.get(key);
    return val === '1';
  }

  async markProcessed(eventId: string, ttlSeconds: number = 3600): Promise<void> {
    const key = `processed:${eventId}`;
    await this.redis.set(key, '1', 'EX', ttlSeconds);
  }
}

// consumer-analytics/src/consumer.ts
import { AckPolicy, DeliverPolicy, JSONCodec, connect } from 'nats';
import { IdempotencyStore } from './idempotency';

const jc = JSONCodec();

export async function startConsumer(natsUrl: string, redisUrl: string) {
  const nc = await connect({ servers: natsUrl });
  const js = nc.jetstream();
  const jsm = await nc.jetstreamManager();
  const store = new IdempotencyStore(redisUrl);

  // Create the stream and durable consumer if they do not already exist
  try {
    await jsm.streams.info('analytics');
  } catch {
    await jsm.streams.add({ name: 'analytics', subjects: ['analytics.orders'] });
  }

  try {
    await jsm.consumers.info('analytics', 'analytics-dashboard');
  } catch {
    await jsm.consumers.add('analytics', {
      durable_name: 'analytics-dashboard',
      deliver_policy: DeliverPolicy.All,
      ack_policy: AckPolicy.Explicit,
    });
  }

  const c = await js.consumers.get('analytics', 'analytics-dashboard');
  for await (const msg of await c.consume()) {
    try {
      const event = jc.decode(msg.data) as Record<string, unknown>;
      const eventId = event.event_id as string;

      // Idempotency check
      if (await store.isProcessed(eventId)) {
        msg.ack(); // Already processed, acknowledge to move on
        continue;
      }

      // Business logic: update analytics store
      await updateAnalytics(event);

      // Mark processed
      await store.markProcessed(eventId, 3600);
      msg.ack();
    } catch (err) {
      console.error('Processing error:', err);

      // NAK with delay, or move to DLQ
      // For demo, publish to DLQ and ack to avoid blocking
      try {
        const dlqPayload = { error: String(err), original: jc.decode(msg.data) };
        await nc.publish('analytics.orders.dlq', jc.encode(dlqPayload));
      } catch (dlqErr) {
        console.error('Failed to publish to DLQ:', dlqErr);
      }
      msg.ack(); // Acknowledge to remove from active queue
    }
  }
}

async function updateAnalytics(event: any) {
  // Placeholder for real analytics logic
  console.log('Analytics update for order', event.order_id);
}

Real-world patterns: When to use which approach

A single pub/sub broker can suffice for homogeneous systems. But as you scale and integrate across boundaries, patterns emerge:

  • Multi-broker routing: Connect brokers via routers or gateways to move events across environments. Use content-based routing to avoid duplicating events across topics.
  • Schema-based filtering: Route events by schema version or type. This reduces consumer coupling and avoids sending irrelevant events.
  • Edge-to-cloud: For IoT or mobile, you may need lightweight brokers at the edge (MQTT or NATS) that forward events to a central Kafka cluster. The router can transform MQTT messages into a standard Avro format.
  • Event sourcing with projection: Store events in a durable log, build projections in downstream services. An event mesh can feed multiple projections without changing the source.
  • Multi-tenant isolation: Use separate streams per tenant or encode tenant ID in keys. Ensure authorization policies match.
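As a small sketch of the multi-tenant isolation pattern above, a helper can namespace subjects per tenant and reject IDs that could escape their namespace. The subject layout is illustrative:

```go
package main

import (
	"fmt"
	"strings"
)

// tenantSubject builds a per-tenant stream subject. Rejecting wildcard
// and separator characters keeps a tenant from escaping its namespace
// via subject injection (e.g. a tenant ID of "acme.>").
func tenantSubject(tenantID, stream string) (string, error) {
	if tenantID == "" || strings.ContainsAny(tenantID, ".*> ") {
		return "", fmt.Errorf("invalid tenant id %q", tenantID)
	}
	return fmt.Sprintf("tenant.%s.%s", tenantID, stream), nil
}

func main() {
	s, _ := tenantSubject("acme", "orders")
	fmt.Println(s) // tenant.acme.orders
	_, err := tenantSubject("acme.>", "orders")
	fmt.Println(err != nil) // true
}
```

The same namespacing makes it straightforward to scope authorization policies to a subject prefix per tenant.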

Example: Edge-to-cloud routing

Suppose you have edge devices publishing telemetry via MQTT. You want to route to Kafka for long-term storage and to a stream processor for real-time alerts. A gateway service can bridge MQTT to Kafka, set headers for routing, and handle backpressure.

// gateway/edge.go
package main

import (
	"context"
	"encoding/json"
	"log"

	mqtt "github.com/eclipse/paho.mqtt.golang"
	"github.com/segmentio/kafka-go"
)

type Telemetry struct {
	DeviceID string  `json:"device_id"`
	Temp     float64 `json:"temp"`
	Humidity float64 `json:"humidity"`
}

func main() {
	mqttOpts := mqtt.NewClientOptions().AddBroker("tcp://localhost:1883")
	mqttClient := mqtt.NewClient(mqttOpts)
	if token := mqttClient.Connect(); token.Wait() && token.Error() != nil {
		log.Fatal(token.Error())
	}

	kafkaWriter := kafka.NewWriter(kafka.WriterConfig{
		Brokers:  []string{"localhost:9092"},
		Topic:    "telemetry.raw",
		Balancer: &kafka.LeastBytes{},
	})
	defer kafkaWriter.Close()

	// Subscribe to telemetry from all devices
	token := mqttClient.Subscribe("devices/+/telemetry", 1, func(c mqtt.Client, m mqtt.Message) {
		var t Telemetry
		if err := json.Unmarshal(m.Payload(), &t); err != nil {
			log.Printf("Failed to unmarshal telemetry: %v", err)
			return
		}

		data, _ := json.Marshal(t)
		// Key by device ID so each device's readings stay ordered per partition
		err := kafkaWriter.WriteMessages(context.Background(), kafka.Message{
			Key:   []byte(t.DeviceID),
			Value: data,
		})
		if err != nil {
			log.Printf("Failed to write to Kafka: %v", err)
		}
	})
	if token.Wait() && token.Error() != nil {
		log.Fatal(token.Error())
	}

	// Block forever; the MQTT callback does the work
	select {}
}

This gateway is a simple router. In a full event mesh, you might have multiple gateways, policy engines, and schema validators. The pattern, however, remains the same: transform, route, and ensure delivery guarantees.

Honest evaluation: Strengths, weaknesses, and tradeoffs

Strengths

  • Decoupling: Services evolve independently when events are the contract.
  • Flexibility: Multiple protocols and delivery semantics can coexist.
  • Observability: With the right tooling, you can trace event flows across brokers.
  • Scalability: Partitioned streams scale horizontally and support high throughput.

Weaknesses

  • Complexity: A mesh introduces more moving parts. You need monitoring, alerting, and operational expertise.
  • Cost: Running multiple brokers and routers increases infrastructure and operational costs.
  • Latency: Additional hops add latency. If low latency is critical, you may prefer direct broker-to-consumer.
  • Data privacy: Routing across borders may require careful compliance handling.

When to use an event mesh

  • You operate across multiple clouds or on-prem environments.
  • You have heterogeneous consumers with different formats and guarantees.
  • You need centralized governance for schemas and security.
  • You want to avoid tight coupling and brittle integrations.

When to avoid an event mesh

  • You have a small number of services in a single environment.
  • Strict low-latency requirements demand minimal hops.
  • Your team lacks the operational bandwidth to manage a mesh.
  • Your events are mostly RPC-like request/reply rather than asynchronous streams.

Personal experience: Lessons learned

In that retail migration, we started with a single Kafka cluster. It worked until we needed to integrate a third-party logistics provider that required a different event format and delivery acknowledgment. Our initial approach was to duplicate events and transform them inside every consumer. This quickly became unmanageable.

We introduced a routing layer that normalized events into a canonical schema and produced separate streams for consumers. The biggest win was observability: we added headers for trace IDs and tenant IDs, and built a dashboard showing event flow across clusters. We also learned the hard way that idempotency must be built into consumers early. We had duplicate events during a network partition, and our analytics counts were off by single-digit percentages. That was enough to trigger a remediation plan.

Another surprise was how often schemas changed. We adopted a compatibility check in CI that rejected schema changes that would break existing consumers. This slowed feature velocity slightly, but it paid off in stability. If I could give one piece of advice, it would be to treat event schemas like API contracts with versioning and deprecation policies.

Getting started: Workflow and tooling

If you are new to event mesh patterns, start small. Build a proof of concept that bridges two brokers and routes events to two consumers. Focus on the mental model: events are immutable facts, consumers are idempotent, and routing decisions are declarative.
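One way to keep routing decisions declarative is a rules file the router loads at startup instead of routing logic baked into code. This is a hypothetical sketch of the config.yaml in the project structure above, not a format any broker ships with; rule and field names are assumptions:

```yaml
# router/config.yaml -- hypothetical declarative routing rules
routes:
  - match:
      subject: orders
      header:
        X-Event-Type: OrderCreated
    destinations:
      - subject: analytics.orders    # full payload, unchanged
        transform: passthrough
      - subject: inventory.orders    # minimal payload for stock reservation
        transform: project
        fields: [order_id, amount, items]
```

With routes in data rather than code, adding a consumer or changing a projection becomes a config review instead of a router redeploy.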

Tooling

  • Brokers: NATS JetStream for lightweight streaming, Kafka for large-scale partitioned streams. Use cloud-managed services if you can.
  • Schema registry: Confluent Schema Registry or Redpanda’s schema registry for Avro and JSON Schema.
  • Observability: OpenTelemetry for tracing, Prometheus and Grafana for metrics. Link events to trace IDs.
  • Local development: Docker Compose for multi-broker setups, as shown above.
  • Testing: Use replayable streams. Test idempotency, ordering, and DLQ handling.
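For the testing point above, an in-memory stand-in for the Redis-backed idempotency store makes replay tests cheap. This sketch (type and method names are illustrative) asserts the property that matters: redelivering the same event ID must not change results.

```go
package main

import "fmt"

// memStore is an in-memory stand-in for a Redis-backed idempotency
// store, useful for replay tests that run without infrastructure.
type memStore struct{ seen map[string]bool }

func newMemStore() *memStore { return &memStore{seen: map[string]bool{}} }

// Process runs the handler only the first time an event ID is seen and
// reports whether the handler ran, mimicking an idempotent consumer.
func (s *memStore) Process(eventID string, handle func()) bool {
	if s.seen[eventID] {
		return false // duplicate delivery, skipped
	}
	handle()
	s.seen[eventID] = true
	return true
}

func main() {
	store := newMemStore()
	count := 0
	// Simulate an at-least-once broker redelivering the same event.
	for i := 0; i < 3; i++ {
		store.Process("ord-12345-20251015-001", func() { count++ })
	}
	fmt.Println(count) // 1
}
```

The real store needs a TTL and an atomic check-and-set (as the Redis version earlier approximates), but the replay test itself stays this small.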

Project workflow

  • Define event types and schemas before writing code.
  • Use a router or gateway for cross-environment flows.
  • Build consumers with idempotency and DLQ support early.
  • Add observability from day one. Without it, debugging is guesswork.

Summary: Who should use event mesh and who might skip it

Event mesh architecture is not a silver bullet. It is a practical response to the growing complexity of integrating distributed systems. If you operate across clouds, manage heterogeneous consumers, or need governance and observability for event flows, an event mesh can provide a robust foundation. You will pay with added complexity and operational overhead, but you gain flexibility and resilience.

If you are a small team with a handful of services in a single environment, a well-designed pub/sub broker and clear contracts may be enough. If low latency is paramount, keep hops minimal and avoid unnecessary routing layers. If your events are mostly synchronous RPC, consider whether an event-driven approach is the right fit at all.

For the rest of us, building and evolving event meshes is a skill worth developing. Start with a single use case, measure outcomes, and grow deliberately. The patterns here, combined with practical tooling, can help you avoid the mistakes that many teams make early on. The key is to treat events as first-class citizens, invest in schemas and idempotency, and design for failure from the start.