Automating Infrastructure Compliance
Why shifting compliance checks into code is becoming essential for fast-moving teams

Keeping cloud infrastructure compliant used to mean long checklists, manual ticket approvals, and waiting for a quarterly audit to uncover drift. In the last few years, teams have moved to infrastructure as code (IaC) and continuous deployment, which changes the equation: if infrastructure is defined in code, compliance can be too. Automating infrastructure compliance lets you validate security baselines, policy rules, and configuration standards as part of the build and deploy pipeline, catching missteps early and proving governance continuously. It turns compliance from a one-time event into a routine, repeatable, and testable practice.
For many developers and engineers, compliance sounds like a legal or audit concern, something other teams "own." In reality, anyone deploying cloud resources touches compliance. Whether you are writing Terraform for a microservice, configuring Kubernetes, or managing S3 buckets, you are setting controls that auditors will eventually inspect. Automation helps close the gap between how we build and how we prove we built it correctly. It can also reduce toil, especially in regulated industries where evidence collection is time consuming. But it is not magic. It introduces new tradeoffs, new abstractions, and a need for thoughtful governance that balances speed with risk.
Where compliance automation fits today
Compliance automation sits at the intersection of DevOps, security, and governance. In practice, it shows up as policy-as-code checks in CI pipelines, pre-deploy validations, and ongoing posture monitoring. Teams commonly enforce rules like "no public S3 buckets," "encryption at rest for databases," or "no wildcard IAM policies." The goal is to codify standards derived from internal policies, regulatory frameworks (like SOC 2, HIPAA, PCI DSS), or cloud security benchmarks (like the CIS AWS Foundations Benchmark).
The ecosystem is mature enough for production use, with options spanning multiple layers. For Terraform, Open Policy Agent’s Terraform integration and Sentinel (a commercial product for Terraform Cloud/Enterprise) are popular. Cloud providers also offer native guardrails: AWS has AWS Config Rules and AWS Control Tower guardrails; Azure Policy and Azure Blueprints; GCP Security Command Center and Organization Policies. Kubernetes teams often lean on admission controllers like OPA Gatekeeper or Kyverno. These approaches are not mutually exclusive. It is common to combine pre-deploy static checks (e.g., terraform plan analyzed by OPA) with post-deploy continuous validation (e.g., AWS Config evaluating deployed resources).
Who uses these tools? Platform engineers and DevOps teams implement policy-as-code pipelines. Security teams define the rules and thresholds. Developers benefit by receiving faster feedback instead of waiting for security reviews. Compared to manual audits or ticket-based approvals, automation reduces cycle time, improves consistency, and creates an audit trail. Against alternative approaches like relying solely on cloud-native controls, policy-as-code gives portability across environments and better developer experience, because rules live in your repo and can be tested.
Core concepts of infrastructure compliance automation
Compliance automation relies on a few foundational concepts:
- Policy as code: Rules expressed in a declarative language (Rego for OPA, Sentinel policy language, YAML/JSON for native cloud policies). These rules inspect infrastructure plans or live resources and return pass/fail decisions.
- Preventive vs detective controls: Preventive controls block non-compliant changes before deployment (e.g., CI checks). Detective controls monitor running infrastructure and flag drift (e.g., periodic scans).
- Scope and boundaries: Policies apply at different scopes—organization, account, project, or environment. Think in layers: guardrails at the organization level, finer-grained rules at the team level.
- Evidence and auditability: Compliance automation must produce durable evidence. This includes logs of policy evaluations, decisions, and context (who, what, when). That evidence should be easy to export or present to auditors.
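To make the evidence idea concrete, here is a minimal Python sketch of what one durable evidence record might look like. The field names are illustrative, not a standard schema; real systems would also capture the policy version and input digest.

```python
# Sketch of a durable evidence record for one policy evaluation:
# who ran it, against what, the decision, and when.
# All field names here are illustrative assumptions.
import json
from datetime import datetime, timezone

def evidence_record(policy: str, target: str, decision: str, actor: str) -> str:
    """Serialize one policy decision as an audit-friendly JSON record."""
    return json.dumps({
        "policy": policy,            # which rule was evaluated
        "target": target,            # resource address or identifier
        "decision": decision,        # "pass" or "fail"
        "actor": actor,              # CI identity or user who triggered it
        "evaluated_at": datetime.now(timezone.utc).isoformat(),
    }, sort_keys=True)

record = evidence_record("deny_public_s3", "aws_s3_bucket.example", "fail", "ci-bot")
print(record)
```

Appending records like this to durable storage gives auditors a timestamped, attributable trail without any manual screenshotting.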
A typical modern workflow looks like this:
- Author infrastructure as code (Terraform, CloudFormation, Pulumi, Helm, etc.).
- On pull request, run static analysis and policy checks against the plan or manifest.
- On merge, deploy to a staging environment and run a second layer of checks.
- Continuously evaluate deployed resources against compliance rules and alert on drift.
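To illustrate the pull-request step, here is a minimal Python sketch of a preventive check walking a plan document shaped like `terraform show -json` output. The function name is hypothetical and only a slice of the plan schema is modeled; it simply returns violation messages, where an empty list means compliant.

```python
# Minimal sketch of a preventive policy check over a Terraform plan JSON
# (as produced by `terraform show -json tfplan`). The function name and
# the subset of plan fields covered are illustrative assumptions.

def check_no_public_access(plan: dict) -> list[str]:
    """Return a list of violation messages (empty list means compliant)."""
    violations = []
    resources = plan.get("planned_values", {}).get("root_module", {}).get("resources", [])
    for resource in resources:
        if resource.get("type") != "aws_s3_bucket_public_access_block":
            continue
        values = resource.get("values", {})
        # Flag the resource if every public-access safeguard is disabled.
        if not any(values.get(k) for k in (
            "block_public_acls", "block_public_policy",
            "ignore_public_acls", "restrict_public_buckets",
        )):
            violations.append(f"public access not blocked: {resource['address']}")
    return violations

plan = {
    "planned_values": {"root_module": {"resources": [
        {"address": "aws_s3_bucket_public_access_block.example",
         "type": "aws_s3_bucket_public_access_block",
         "values": {"block_public_acls": False, "block_public_policy": False,
                    "ignore_public_acls": False, "restrict_public_buckets": False}},
    ]}}
}
print(check_no_public_access(plan))
```

A CI job would fail the build whenever such a function returns a non-empty list, which is exactly the pattern the OPA examples below implement in Rego.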
Practical implementation: OPA + Terraform in CI
Open Policy Agent (OPA) is a general-purpose policy engine widely used for infrastructure compliance. It uses Rego, a declarative policy language designed for complex data queries. A common pattern is to generate a Terraform plan JSON and evaluate it with Rego policies.
Here’s a minimal end-to-end example of a CI workflow that runs OPA checks against a Terraform plan:
Project structure:
terraform/
  main.tf
  variables.tf
  outputs.tf
policy/
  terraform/
    deny_public_s3.rego
    require_encryption.rego
.github/
  workflows/
    compliance.yml
A simple Terraform snippet that creates an S3 bucket (intentionally non-compliant for demonstration):
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

resource "aws_s3_bucket" "example" {
  bucket = "example-bucket-12345"

  tags = {
    Environment = "dev"
  }
}

resource "aws_s3_bucket_public_access_block" "example" {
  bucket = aws_s3_bucket.example.id

  block_public_acls       = false
  block_public_policy     = false
  ignore_public_acls      = false
  restrict_public_buckets = false
}
Rego policy that denies public S3 buckets:
package terraform.deny_public_s3

# Deny if an S3 bucket's public access block leaves all four
# public-access safeguards disabled
deny[msg] {
    resource := input.planned_values.root_module.resources[_]
    resource.type == "aws_s3_bucket_public_access_block"
    block := resource.values
    not block.block_public_acls
    not block.block_public_policy
    not block.ignore_public_acls
    not block.restrict_public_buckets
    msg := sprintf("S3 bucket public access block allows public exposure: %s", [resource.address])
}
Rego policy that requires encryption at rest for S3 buckets:
package terraform.require_encryption

import future.keywords.in

# Deny if an S3 bucket lacks server-side encryption configuration
deny[msg] {
    resource := input.planned_values.root_module.resources[_]
    resource.type == "aws_s3_bucket"
    bucket_name := resource.values.bucket
    # Look for an aws_s3_bucket_server_side_encryption_configuration
    # resource whose bucket attribute matches this bucket's name.
    # Note: comparing plan values is brittle when the bucket reference
    # is computed and unknown at plan time.
    not has_encryption(bucket_name, input.planned_values.root_module.resources)
    msg := sprintf("S3 bucket missing encryption at rest: %s", [resource.address])
}

has_encryption(bucket_name, resources) {
    some resource in resources
    resource.type == "aws_s3_bucket_server_side_encryption_configuration"
    resource.values.bucket == bucket_name
}
GitHub Actions workflow that generates a Terraform plan and runs OPA:
name: Compliance Check

on:
  pull_request:
    branches:
      - main

jobs:
  opa-compliance:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3

      - name: Setup OPA
        uses: open-policy-agent/setup-opa@v2
        with:
          version: latest

      - name: Terraform Init
        run: |
          cd terraform
          terraform init -input=false

      - name: Terraform Plan
        run: |
          cd terraform
          terraform plan -out=tfplan -input=false
          terraform show -json tfplan > tfplan.json

      - name: Run OPA Policy Tests
        run: |
          cd policy/terraform
          opa test . --verbose

      - name: Evaluate Terraform Plan
        run: |
          cd policy/terraform
          opa eval --format pretty \
            --input ../../terraform/tfplan.json \
            --data . \
            "data.terraform[_].deny"
In a real pipeline, you would gate merges on OPA exit codes and annotate PRs with policy violations. A helpful pattern is to wrap the evaluation in a script that fails the job if any deny set is non-empty:
#!/bin/bash
set -e
cd policy/terraform

# Collect the deny messages from every policy package under data.terraform
# into a single JSON array (the `[]?` guards against a missing result).
violations=$(opa eval --format json \
  --input ../../terraform/tfplan.json \
  --data . \
  "data.terraform[_].deny" | jq '[.result[]?.expressions[].value[]]')

if [[ "$violations" != "[]" ]]; then
  echo "Policy violations found:"
  echo "$violations" | jq .
  exit 1
else
  echo "No policy violations."
fi
This setup is pragmatic: it integrates with developer workflows, gives fast feedback, and ensures decisions are captured in logs. When paired with branch protections and mandatory PR reviews, it becomes a reliable preventive control.
Native cloud controls: AWS Config as a detective layer
Pre-deploy checks are only half the story. Once resources are deployed, you need detective controls that continuously monitor compliance. AWS Config is a common choice, offering managed rules and custom Lambda-backed rules. Here’s a minimal AWS Config rule defined in Terraform that checks for S3 bucket encryption:
resource "aws_config_configuration_recorder" "main" {
  name     = "primary-recorder"
  role_arn = aws_iam_role.config_role.arn
}

resource "aws_config_delivery_channel" "main" {
  name           = "primary-channel"
  s3_bucket_name = aws_s3_bucket.config_bucket.id

  depends_on = [aws_config_configuration_recorder.main]
}

resource "aws_config_config_rule" "s3_encryption" {
  name = "s3-bucket-server-side-encryption-enabled"

  source {
    owner             = "AWS"
    source_identifier = "S3_BUCKET_SERVER_SIDE_ENCRYPTION_ENABLED"
  }

  depends_on = [aws_config_configuration_recorder.main]
}
To make this work, you need IAM permissions for AWS Config, an S3 bucket for delivery, and the recorder enabled. The rule runs periodically and flags non-compliant resources. The results appear in the AWS Config console and can trigger SNS notifications or EventBridge rules to feed into ticketing systems or chatops.
It’s worth noting the interplay between preventive and detective controls. AWS Config can catch misconfigurations introduced outside of IaC (e.g., manual changes). However, it does not block deployments in the moment; that’s what the CI checks above provide. Together, they form a balanced approach: prevent drift and detect exceptions.
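For intuition, here is a small Python sketch of what a detective control does conceptually: sweep an inventory of live resources on a schedule and classify each one against a rule. The inventory shape and field names are hypothetical; in practice AWS Config performs this evaluation and stores the findings for you.

```python
# Illustrative sketch of a detective control: evaluate a live resource
# inventory against a rule and report per-resource compliance.
# The inventory structure and field names are hypothetical assumptions,
# loosely mirroring the S3 encryption managed rule above.

def evaluate_encryption_rule(inventory: list[dict]) -> list[dict]:
    """Mark each S3 bucket COMPLIANT or NON_COMPLIANT based on encryption."""
    findings = []
    for resource in inventory:
        if resource["type"] != "s3_bucket":
            continue  # the rule only applies to S3 buckets
        status = "COMPLIANT" if resource.get("sse_enabled") else "NON_COMPLIANT"
        findings.append({"id": resource["id"], "status": status})
    return findings

inventory = [
    {"type": "s3_bucket", "id": "logs-bucket", "sse_enabled": True},
    {"type": "s3_bucket", "id": "example-bucket-12345", "sse_enabled": False},
]
for finding in evaluate_encryption_rule(inventory):
    print(finding["id"], finding["status"])
```

The key property is that this loop runs against what is actually deployed, so it catches console edits and drift that never passed through CI.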
Kubernetes admission control with OPA Gatekeeper
For Kubernetes, policy-as-code can enforce constraints at admission time. OPA Gatekeeper is a CNCF project that integrates OPA with Kubernetes. Here’s a simple constraint template and a constraint that disallows host networking in pods:
ConstraintTemplate:
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8sdisallowhostnetwork
spec:
  crd:
    spec:
      names:
        kind: K8sDisallowHostNetwork
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sdisallowhostnetwork

        violation[{"msg": msg}] {
          input.review.object.spec.hostNetwork == true
          msg := "hostNetwork is not allowed"
        }
Constraint:
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sDisallowHostNetwork
metadata:
  name: disallow-host-network
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
This pattern is powerful because it prevents non-compliant resources from entering the cluster. Developers see immediate feedback when applying manifests via kubectl. It also reduces reliance on post-deploy scanning, although scanning is still valuable for drift detection. Many organizations build a library of constraint templates and version them alongside cluster definitions.
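Conceptually, the admission decision reduces to a small predicate over the incoming object. The following Python sketch mirrors the Rego rule above against a simplified pod spec; the function and input shape are illustrative, not the Gatekeeper or Kubernetes API.

```python
# Sketch of the admission-time decision Gatekeeper makes for the rule above:
# reject any pod whose spec sets hostNetwork. The AdmissionReview object is
# simplified to a bare pod dict for illustration.

def admit(pod: dict) -> tuple[bool, str]:
    """Return (allowed, reason) for a simplified pod spec."""
    if pod.get("spec", {}).get("hostNetwork") is True:
        return False, "hostNetwork is not allowed"
    return True, ""

allowed, reason = admit({"spec": {"hostNetwork": True}})
print(allowed, reason)  # prints: False hostNetwork is not allowed
```

Because the decision happens before the object is persisted, a rejected manifest never reaches the cluster at all, which is what makes admission control a preventive rather than detective layer.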
Tradeoffs and honest evaluation
Policy-as-code is not a silver bullet. The strengths are clear: faster feedback, consistent enforcement, and better auditability. But there are tradeoffs:
- Complexity: Rego can be tricky for newcomers. Writing robust policies requires careful testing against varied inputs. Tooling like opa test helps, but there is a learning curve.
- False positives: Overly strict rules can block legitimate changes. Start with audit mode (warn-only) and gradually move to enforcement. Ensure policies have clear, actionable messages.
- Multi-cloud portability: Writing universal policies across AWS, Azure, and GCP is challenging due to differing resource models. Often, you end up with provider-specific policy sets, which is pragmatic.
- Operational overhead: Continuous compliance requires maintaining rules, managing policy scope, and handling alerts. Avoid “policy sprawl” where dozens of overlapping rules create noise.
- Vendor lock-in: Sentinel is powerful but tied to Terraform Cloud/Enterprise. OPA is open source and widely adopted, making it a safer choice for diverse environments.
When is this approach a great fit? If you deploy frequently, have multiple teams touching infrastructure, or operate under compliance frameworks, automation will pay off quickly. If you are a small team with a handful of static resources, manual reviews may be sufficient; however, even then, a small policy suite can help guard against common mistakes. If your infrastructure is mostly managed by a cloud console with minimal IaC, the first step should be adopting IaC itself; compliance automation is most effective when infrastructure is code.
Personal experience: learning curves and useful habits
Over the last few years, I’ve implemented compliance automation for teams moving from ad-hoc console changes to Terraform-based deployments. The learning curve is steepest in two areas: writing good Rego policies and deciding enforcement levels. A common mistake I made early on was writing overly specific policies that encoded assumptions about how resources were named or structured. When Terraform modules evolved, policies broke. The fix was to express policies against stable fields (resource types, required attributes) and avoid brittle string matching.
Another lesson was the value of “policy tests.” Just as you write unit tests for application code, write test cases for policies. OPA’s opa test framework makes this straightforward. For example, a test file for the S3 encryption policy might include a minimal plan JSON fixture that violates the rule and one that complies, asserting expected outcomes. Over time, we built a small library of fixtures that represented real-world misconfigurations observed in audits, turning incident learnings into preventative checks.
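The same habit can be sketched outside Rego. Below is a plain-Python analogue of such a policy test: a fixture that violates the encryption rule, a fixture that complies, and assertions on both. The bucket_has_encryption helper is hypothetical, standing in for the Rego rule it shadows.

```python
# Policy tests as plain assertions: one fixture that violates the
# encryption rule and one that complies. The helper below is a
# hypothetical stand-in for the require_encryption Rego rule.

def bucket_has_encryption(resources: list[dict], bucket: str) -> bool:
    """True if some encryption configuration resource targets this bucket."""
    return any(
        r["type"] == "aws_s3_bucket_server_side_encryption_configuration"
        and r["values"].get("bucket") == bucket
        for r in resources
    )

# Fixture representing a real misconfiguration seen in audits
NONCOMPLIANT_FIXTURE = [
    {"type": "aws_s3_bucket", "values": {"bucket": "example-bucket-12345"}},
]
# Same plan plus the encryption resource the policy requires
COMPLIANT_FIXTURE = NONCOMPLIANT_FIXTURE + [
    {"type": "aws_s3_bucket_server_side_encryption_configuration",
     "values": {"bucket": "example-bucket-12345"}},
]

assert not bucket_has_encryption(NONCOMPLIANT_FIXTURE, "example-bucket-12345")
assert bucket_has_encryption(COMPLIANT_FIXTURE, "example-bucket-12345")
print("policy tests passed")
```

In Rego itself the structure is identical: a `_test.rego` file feeds each fixture to the rule with the `with input as` keyword and asserts on the deny set.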
I also learned that messaging matters. A policy failure should tell a developer what changed, why it’s not allowed, and how to fix it. Instead of “S3 bucket violation,” say “S3 bucket ‘example-bucket-12345’ is missing server-side encryption. Add an aws_s3_bucket_server_side_encryption_configuration resource or enable encryption in your module.” This reduces friction and speeds remediation.
Lastly, the moment this approach proved most valuable was during an unexpected audit request. Instead of scrambling to collect screenshots and tickets, we exported policy evaluation logs and AWS Config findings. The evidence was timestamped, scoped, and clear. It turned a stressful process into a routine review, which is the real outcome: less panic, more confidence.
Getting started: workflow, tooling, and mental models
If you are starting from scratch, focus on workflow rather than a specific tool. A minimal path looks like this:
- Pick an IaC tool you already use. Terraform is common, but the same ideas apply to CloudFormation, Pulumi, or Helm.
- Choose one preventive tool (OPA for Terraform or Kubernetes) and one detective tool (AWS Config/Azure Policy/GCP Org Policy).
- Identify one high-impact rule: public resource exposure, encryption at rest, or tagging for cost allocation.
- Write the rule in policy-as-code and run it in audit mode for two weeks. Collect violations and tune the rule.
- Enable enforcement in CI for new deployments. Keep detective controls running to catch drift and manual changes.
- Document policies in the same repo as your IaC. Treat them like code: version, review, and test.
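The audit-then-enforce rollout in the steps above can be sketched as a small runner that executes the same rules in both modes: audit mode only reports violations, while enforce mode also fails the build. The rule function, plan shape, and mode names are illustrative assumptions.

```python
# Sketch of the audit-then-enforce rollout: identical checks run in both
# modes, but only enforce mode turns violations into a failing exit code.
# The require_tags rule and the plan structure are hypothetical.

def run_policies(plan: dict, rules: list, mode: str = "audit") -> int:
    """Run all rules; return a non-zero exit code only when enforcing."""
    violations = [msg for rule in rules for msg in rule(plan)]
    for msg in violations:
        print(f"[{mode}] {msg}")
    return 1 if (mode == "enforce" and violations) else 0

def require_tags(plan):  # hypothetical tagging rule
    return [
        f"missing Environment tag: {r['address']}"
        for r in plan.get("resources", [])
        if "Environment" not in r.get("values", {}).get("tags", {})
    ]

plan = {"resources": [{"address": "aws_s3_bucket.example", "values": {"tags": {}}}]}
print(run_policies(plan, [require_tags], mode="audit"))    # reports, returns 0
print(run_policies(plan, [require_tags], mode="enforce"))  # reports, returns 1
```

Running in audit mode for a couple of weeks lets you measure noise and tune messages before a single developer is ever blocked.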
Example directory layout for a combined Terraform and OPA setup:
root/
  terraform/
    modules/
      s3/
        main.tf
        variables.tf
    environments/
      dev/
        main.tf
        terraform.tfvars
    main.tf
  policy/
    terraform/
      require_encryption.rego
      deny_public_s3.rego
      tests/
        require_encryption_test.rego
        deny_public_s3_test.rego
  scripts/
    evaluate_plan.sh
  .github/
    workflows/
      compliance.yml
A small script for local evaluation (scripts/evaluate_plan.sh):
#!/bin/bash
set -e

# Usage:
#   ./scripts/evaluate_plan.sh path/to/tfplan.json

PLAN_JSON="$1"
POLICY_DIR="policy/terraform"

if [[ ! -f "$PLAN_JSON" ]]; then
  echo "Plan JSON not found at $PLAN_JSON"
  exit 1
fi

# Run OPA tests to ensure policy specs pass
opa test "$POLICY_DIR" --verbose

# Evaluate plan against policies
opa eval --format pretty \
  --input "$PLAN_JSON" \
  --data "$POLICY_DIR" \
  "data.terraform[_].deny"

echo "Evaluation complete."
Mental models that help:
- Think in two layers: preventive (block early) and detective (catch drift).
- Write policies as constraints (what is not allowed) rather than permissions (what is allowed), which tends to be easier to reason about.
- Start small and enforce gradually. Policies should be discoverable and well-documented.
- Treat policy code with the same care as application code: review, version, and test.
What makes this approach stand out
Several aspects distinguish policy-as-code from traditional compliance:
- Developer experience: Feedback arrives in the same interface where code is written (PRs and pipelines). This shortens remediation cycles.
- Maintainability: Rules are versioned and tested. When standards change, you update policy code and roll it out predictably.
- Auditability: Every policy decision is logged and attributable. This is invaluable during external reviews.
- Portability: OPA works across Kubernetes, Terraform, APIs, and more. A single engine can cover multiple surfaces.
On the ecosystem side, the tooling is strong. For OPA, the community provides examples and libraries. Native cloud controls reduce setup friction. For Kubernetes, Gatekeeper and Kyverno have robust feature sets. The key is to choose the right tool for each layer and avoid overlapping scopes that create duplicate noise.
Free learning resources
- Open Policy Agent documentation (https://www.openpolicyagent.org/docs/latest/): Clear, practical guides for writing and testing Rego policies, including Terraform and Kubernetes examples.
- OPA Terraform integration (https://www.openpolicyagent.org/docs/latest/terraform/): Walkthroughs for evaluating Terraform plans with OPA, including patterns for CI integration.
- AWS Config documentation (https://docs.aws.amazon.com/config/): Details on managed rules, custom rules, and delivery channels, useful for detective compliance.
- CIS Benchmarks (https://www.cisecurity.org/cis-benchmarks/): Industry-standard security configuration guidelines, often the source material for compliance policies.
- OPA Gatekeeper (https://open-policy-agent.github.io/gatekeeper/website/docs/): Tutorials and reference for admission control policies in Kubernetes.
- Azure Policy (https://learn.microsoft.com/en-us/azure/governance/policy/overview): Overview of policy-as-code for Azure, including built-in definitions and custom policies.
- GCP Organization Policies (https://cloud.google.com/resource-manager/docs/organization-policy/overview): Guide to constraint-based policies for Google Cloud.
Summary: who should use this, and who might skip it
Teams that deploy infrastructure frequently, operate across multiple environments, or need to demonstrate compliance to customers and auditors will benefit from infrastructure compliance automation. It scales well with IaC adoption and pays dividends in risk reduction and operational efficiency. Developers get fast feedback, security teams get enforceable standards, and auditors get clear evidence.
If you rarely change infrastructure, rely on a single cloud with simple resources, and have low regulatory pressure, you might skip full-scale automation—at least initially. But even then, starting with one or two critical policies (like public exposure and encryption) can prevent common mistakes and build good habits.
The practical takeaway is to blend preventive and detective controls, keep policies close to your infrastructure code, and invest in testability and clear messaging. Compliance automation is not about eliminating human judgment; it’s about amplifying good practices, making them consistent, and turning governance into a living part of your engineering workflow.




