Google Cloud’s Anthos for Hybrid Cloud
Why Anthos matters now: a single control plane for apps that span on‑prem, edge, and multiple clouds.

I’ve spent enough nights staring at blinking status lights in server rooms to know that “hybrid” is rarely a clean architecture decision. It’s a business reality. Regulations, latency, cost, legacy systems, and organizational politics all converge into a messy map of clusters that you must keep humming. The promise of the cloud was to simplify, but in practice many of us ended up managing a fleet of Kubernetes distributions, each with its own way of doing things. Anthos is Google Cloud’s answer to that mess, built around the idea that you should be able to manage applications consistently across on‑prem, edge, and public clouds without rewrites or duplicated pipelines.
If you’re reading this, you’ve likely wrestled with EKS, AKS, GKE, OpenShift, or even a home‑grown K8s flavor, and you’ve felt the friction when policies, observability, and deployment strategies differ. Anthos leans into Kubernetes as the universal runtime and layers a consistent control plane, policy engine, and service mesh on top so teams can treat a mix of environments as one logical platform. In this post, I’ll walk through what Anthos actually is, when it makes sense, how it looks in practice with code and configuration, and where I’ve found it to be a huge win… and where it’s been overkill.
[Image: multiple Kubernetes clusters arranged in a grid with policy overlays and a central control plane dashboard]
Where Anthos fits today
In real projects, Anthos sits at the intersection of platform engineering and delivery. It targets organizations that need to:
- Keep certain workloads on‑prem for data residency, latency, or compliance.
- Run services at the edge, close to customers or factory floors.
- Avoid cloud lock‑in by maintaining a consistent way to deploy across multiple clouds.
From a developer’s perspective, you push container images and Kubernetes manifests. From an operator’s perspective, you have one place to enforce policies, view metrics, and roll out changes. At a high level, Anthos resembles managed Kubernetes offerings (like GKE, EKS, AKS) but with added layers for multi‑cluster governance and a service mesh that works across environments. It also competes conceptually with platforms like Rancher and OpenShift, which provide multi‑cluster management, but Anthos ties more tightly into Google Cloud’s operational tooling and billing model.
The key difference is consistency. Instead of writing separate Helm values for each environment or crafting bespoke CI jobs, you define a config repository that Anthos syncs to clusters. You define policies once and they’re enforced everywhere. The service mesh lets you route traffic between services across clusters and clouds using a unified model. It’s not magic, but it reduces the number of places you have to remember to configure something.
Core concepts and capabilities
The control plane: Config Management and the fleet concept
Anthos organizes clusters into a “fleet.” Policies and configs can be applied fleet‑wide. Under the hood, Config Sync and Policy Controller (which uses Gatekeeper) are the workhorses for declarative state.
Config Sync pulls from a Git repository and reconciles cluster state with what’s declared there. This is GitOps‑style, but without requiring you to pick and wire up your own tooling. Policy Controller enforces constraints (for example, “no LoadBalancer services in a specific cluster” or “images must come from a trusted registry”) by intercepting requests to the Kubernetes API server.
A typical project layout might look like this:
acme-platform/
├── clusters/
│   ├── prod-us-central/
│   │   ├── config.yaml
│   │   └── sync-branch.yaml
│   └── dev-eu-west/
│       ├── config.yaml
│       └── sync-branch.yaml
├── configs/
│   ├── base/
│   │   ├── namespaces/
│   │   │   ├── team-a.yaml
│   │   │   └── team-b.yaml
│   │   ├── network/
│   │   │   └── networkpolicies/
│   │   │       └── deny-all.yaml
│   │   └── policy/
│   │       └── constraints/
│   │           ├── require-trusted-images.yaml
│   │           └── restrict-loadbalancer.yaml
│   └── overlays/
│       ├── prod/
│       │   └── ingress/
│       │       └── tls-issuer.yaml
│       └── dev/
│           └── ingress/
│               └── tls-issuer.yaml
├── apps/
│   └── checkout/
│       ├── base/
│       │   ├── deployment.yaml
│       │   ├── service.yaml
│       │   └── kustomization.yaml
│       └── overlays/
│           ├── prod/
│           │   └── replica-patch.yaml
│           └── dev/
│               └── replica-patch.yaml
└── README.md
This structure mirrors how platform teams actually work: base configs are shared, overlays differ by environment, and apps are versioned alongside policies.
Service mesh: Anthos Service Mesh (ASM)
ASM is based on Istio and provides traffic management, security (mTLS), and observability for services. It’s optional but powerful when you have services talking across clusters or clouds. Instead of each team configuring ingress and egress in ad‑hoc ways, ASM lets you define virtual services and destinations that route consistently.
Think of it as the connective tissue of your fleet. If you have a checkout service in AWS and a payment service in GCP, ASM makes it straightforward to define secure, observable routes without hardcoding IPs or reinventing retry logic.
Migrate for Anthos
When you need to lift‑and‑shift VMs into containers, Migrate for Anthos helps rehost without rewriting. It’s not a daily developer tool, but it’s a pragmatic bridge for legacy apps. In practice, I’ve seen teams migrate a handful of monoliths this way, then gradually refactor them into microservices.
Bare Metal and edge
Anthos can run on bare metal in your data center, which is useful if you don’t have a hypervisor or prefer to avoid it. For edge, the ability to deploy small clusters with consistent policies and observability is a real advantage, especially in retail or industrial scenarios where connectivity is intermittent.
A practical example: GitOps with Config Sync
Let’s walk through a realistic pattern: you want to enforce a default deny network policy across all clusters and allow specific ingress for an app. You also want to require that all images come from your trusted registry. We’ll define this once and let Anthos propagate it.
1. Policy Controller constraints
Constraint templates define what’s enforceable. Here’s a simple constraint that requires images to come from your trusted registry (using the K8sAllowedRepos constraint from the Gatekeeper library):
# configs/base/policy/constraints/require-trusted-images.yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: require-trusted-registry
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    repos:
      - "gcr.io/acme-registry/"
      - "us.gcr.io/acme-registry/"
This policy will block pods that use images outside the allowed repos. You can layer additional constraints for resource limits, host networking, or node selectors depending on your security model.
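For instance, a resource‑limits guardrail might look like the following, assuming the K8sContainerLimits template from the Gatekeeper library is installed in your clusters; the constraint name and thresholds here are illustrative:
# configs/base/policy/constraints/require-container-limits.yaml (illustrative)
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sContainerLimits
metadata:
  name: require-container-limits
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    # Maximum limits a container may declare; pods without limits, or above these, are rejected.
    cpu: "1000m"
    memory: "1Gi"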
2. Default deny network policy
# configs/base/network/networkpolicies/deny-all.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: team-a
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
In a real project, you’d extend this with per‑namespace policies to allow traffic between specific services. The goal is a baseline of security that you don’t have to remember to add per app.
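As a sketch of what such an allowance could look like, here’s a policy that opens ingress to the checkout pods from an ingress‑gateway namespace; the labels, namespace name, and port are assumptions for illustration, not values from a real deployment:
# configs/base/network/networkpolicies/allow-checkout-ingress.yaml (illustrative)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-checkout-ingress
  namespace: team-a
spec:
  podSelector:
    matchLabels:
      app: checkout
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Allow traffic only from pods in the ingress-gateway namespace.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-gateway
      ports:
        - protocol: TCP
          port: 8080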
3. App overlay for production
For an app called checkout, here’s a minimal Kustomize overlay to bump replicas in production:
# apps/checkout/overlays/prod/replica-patch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout
spec:
  replicas: 5
And the Kustomization that ties it together:
# apps/checkout/overlays/prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patchesStrategicMerge:
  - replica-patch.yaml
This pattern scales well. Teams can version their apps, and platform teams can version policies. When you connect the Git repo to Config Sync, clusters continuously reconcile toward this desired state.
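For completeness, the base kustomization that ../../base points to might look something like this; the file names follow the repo layout shown earlier, but treat it as a sketch rather than a prescription:
# apps/checkout/base/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
commonLabels:
  app: checkout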
A real‑world workflow
In a multi‑cloud project I worked on, we used Anthos to unify a GKE cluster in GCP, an AKS cluster in Azure, and an on‑prem K8s cluster. Each had different ingress controllers. With ASM, we standardized traffic flow and security. Config Sync ensured that network policies and quota defaults were identical. We added a small edge cluster for a retail location; the same policies applied, and we used overlays to tweak resource requests for the constrained hardware.
Here’s a condensed view of how we set up Config Sync per cluster:
# This is a conceptual workflow; exact commands vary by environment.
# Enable required APIs in GCP.
gcloud services enable anthos.googleapis.com \
  gkehub.googleapis.com \
  anthosconfigmanagement.googleapis.com

# Register the cluster with the fleet.
gcloud container fleet memberships register acme-membership \
  --project=acme-project \
  --location=us-central1 \
  --gke-cluster=us-central1/acme-cluster
# Apply Config Management to the cluster.
cat <<EOF | kubectl apply -f -
apiVersion: configmanagement.gke.io/v1
kind: ConfigManagement
metadata:
  name: config-management
spec:
  git:
    syncRepo: https://github.com/acme/acme-platform.git
    syncBranch: main
    policyDir: "configs/base"
    secretType: token  # or ssh/none, depending on how the repo is reached
  policyController:
    enabled: true
  sourceFormat: unstructured  # works with Kustomize-style layouts
EOF
From here, you structure your Git repo as shown earlier. When a developer merges a PR to the apps/checkout/overlays/prod folder, the production cluster reconciles the change through Config Sync. Meanwhile, Policy Controller ensures that any accidental image from a public registry is rejected.
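To verify that a cluster has actually converged after a merge, I usually reach for the Config Sync tooling. Assuming the nomos CLI is installed and your kubeconfig has contexts for the registered clusters, a quick check looks roughly like this:
# Show sync status for every cluster context in your kubeconfig.
nomos status

# Fleet-wide view from Google Cloud (the project name is illustrative).
gcloud beta container fleet config-management status --project=acme-project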
Service mesh in practice
ASM shines when you need mTLS and advanced routing. For example, you might route 90% of traffic to the current version and 10% to a new version of a service. With Istio’s VirtualService, this becomes declarative:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: checkout
  namespace: team-a
spec:
  hosts:
    - checkout.team-a.svc.cluster.local
  http:
    - route:
        - destination:
            host: checkout.team-a.svc.cluster.local
            subset: v1
          weight: 90
        - destination:
            host: checkout.team-a.svc.cluster.local
            subset: v2
          weight: 10
In practice, you’ll also define DestinationRule subsets tied to labels. Because ASM integrates with Anthos, you can apply these rules across clusters. The real win is that you don’t need a separate service mesh deployment per cloud. It’s one model, enforced consistently.
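The matching DestinationRule for the VirtualService above might look like this, assuming the checkout pods carry a version label that distinguishes v1 from v2:
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: checkout
  namespace: team-a
spec:
  host: checkout.team-a.svc.cluster.local
  subsets:
    # Subsets select pods by label; the version label is an assumption here.
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2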
Honest evaluation
Strengths
- Consistency across environments: The same policies, deployment patterns, and observability across on‑prem, edge, and multiple clouds.
- Security posture: Policy Controller and network policies provide guardrails that don’t rely on developer diligence alone.
- Developer experience: Teams push to Git and see changes reflected without crafting bespoke pipelines for each cluster.
- Operational visibility: With Anthos integrated into Google Cloud’s operations suite, you get unified metrics and logs. This is not exclusive to Anthos, but it’s well integrated.
- Service mesh: For cross‑cluster communication, ASM reduces operational complexity and improves security by default.
Weaknesses
- Cost and licensing: Anthos has licensing costs, which can be significant for smaller organizations. You must evaluate whether the consistency gains offset the spend.
- Complexity: It’s not trivial. Setting up fleet membership, Git repositories, overlays, and service mesh takes time and expertise.
- Opinionated stack: You’re buying into Google Cloud’s way of doing GitOps and mesh. If you already have strong in‑house tooling (like Argo CD and Linkerd), migration may not be worth it.
- Learning curve: For teams new to Kubernetes or GitOps, the combined surface area can be overwhelming. Training and iteration are required.
When to use Anthos
- You have a genuine hybrid strategy, with workloads that must live in multiple environments.
- Security and compliance require uniform guardrails across clusters.
- You want to reduce tool fragmentation and standardize on a single control plane.
When to skip it
- All your workloads are in a single cloud and your Kubernetes strategy is stable.
- Your team already has robust, standardized tooling (GitOps with Argo, service mesh with Linkerd) and doesn’t benefit from integration.
- Budget is tight and the operational savings don’t justify the licensing.
Personal experience and lessons learned
I learned the hard way that “policy as code” only works when you make it easy to test. Early on, we merged a network policy that unintentionally blocked health checks. The cluster didn’t go down, but monitoring broke. To avoid this, we introduced a small “policy test” step that runs conftest or OPA tests against PRs. That step catches obvious mistakes before Config Sync reconciles.
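Our CI step was nothing fancy: conceptually, it rendered the overlays and ran them through conftest before anything reached Config Sync. A minimal sketch, with the paths and policy directory as assumptions:
# Render the prod overlay and test it against OPA/Rego policies before merge.
kustomize build apps/checkout/overlays/prod > /tmp/rendered.yaml
conftest test /tmp/rendered.yaml --policy policy/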
Another common mistake is treating overlays as “magic.” If you don’t keep a clear map of what overlays do, you’ll end up with subtle configuration drift. We now maintain a simple README.md next to each overlay explaining its purpose and ownership. For example:
# overlays/prod
- Purpose: Increase replicas, enforce production TLS issuer
- Owner: platform-team
- Applies to: prod-us-central, prod-eu-west
- Rollout: merges to this folder trigger automatic sync
The moment Anthos proved its value was during a cross‑cloud incident. A latency spike hit one cloud. With ASM, we adjusted traffic weights and enabled circuit breaking rules in minutes, without touching application code. Observability across clusters let us correlate metrics and isolate the issue. That’s the kind of leverage that’s hard to achieve with fragmented tooling.
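The circuit‑breaking piece was a DestinationRule traffic policy along these lines; the service name and thresholds below are illustrative, not what we shipped:
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: payment
  namespace: team-a
spec:
  host: payment.team-a.svc.cluster.local
  trafficPolicy:
    outlierDetection:
      # Eject endpoints after five consecutive 5xx responses.
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 2m
      maxEjectionPercent: 50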
Getting started: tooling and workflow
You don’t need to boil the ocean. Start with a small fleet: one dev cluster and one prod cluster. Use Google Cloud’s Console to register clusters with the fleet and enable Config Management. Keep your Git repository simple at first: base configs for namespaces, network policies, and one policy constraint.
Workflow mental model
- Declarative over imperative: Your Git repo is the source of truth. Merges drive change.
- Overlays for environments: Keep differences small and explicit. Avoid environment‑specific logic in base configs.
- Policy as code: Start with guardrails (image registry, resource quotas). Add more constraints as you gain confidence.
- Service mesh gradually: Start with observability and mTLS before complex routing rules.
- Test before sync: Use policy tests or dry‑runs in CI to catch mistakes early.
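For the last point, even a server‑side dry run in CI catches a surprising number of mistakes. A minimal sketch, assuming you keep a kubeconfig context per cluster (the context name follows the cluster directory from the repo layout above):
# Render an overlay and validate it against the prod cluster without applying anything.
kustomize build apps/checkout/overlays/prod \
  | kubectl --context=prod-us-central apply --dry-run=server -f -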
Tooling checklist
- A Git repository (GitHub, GitLab, or Cloud Source Repositories).
- GKE or Anthos‑compatible Kubernetes clusters registered to the fleet.
- Config Management enabled with policy controller.
- A container registry (GCR or Artifact Registry) with appropriate IAM.
- Optional: Anthos Service Mesh for cross‑cluster traffic.
Free learning resources
- Anthos documentation: The official docs are the best place to understand fleet management, Config Sync, and ASM. See Google Cloud Anthos documentation.
- Gatekeeper policy library: Explore constraint templates for Policy Controller in the Gatekeeper project.
- Istio’s user guides: For ASM patterns, Istio’s docs are invaluable. See Istio documentation.
- Kustomize tutorials: To structure overlays cleanly, review Kustomize docs.
- Open Policy Agent: To deepen policy as code skills, OPA documentation is excellent.
Summary: who should use it and who might skip it
Anthos is a strong fit for organizations with hybrid or multi‑cloud footprints that want consistent security, deployment, and observability across clusters. It’s particularly valuable for platform teams tasked with standardizing operations while giving developers a predictable path to ship code. If you’re looking to reduce tool sprawl and enforce policy at the fleet level, Anthos delivers.
You might skip it if your stack is already uniform, your budgets are tight, or your organization’s needs are comfortably served by a single cloud and existing GitOps tooling. In those cases, Anthos may add complexity without enough ROI.
The ultimate takeaway is pragmatic: hybrid is the reality for many, and Anthos offers a way to tame that reality. Treat it as a platform investment, not a silver bullet. Start small, codify your policies, and iterate. When done well, you’ll spend less time fighting configuration drift and more time delivering features that matter.