Security Metrics and Reporting: Turning Noise into Narrative
In an era of constant alerts and audit pressure, engineering teams need signals, not just data.

Every security team I’ve worked with has lived through the same storm: a SIEM dumping thousands of alerts per day, a vulnerability scanner spitting out CSVs, and a cloud CSPM adding another hundred “critical” misconfigurations. The business wants a single risk number and a clear plan; engineers want to fix what matters and get back to building. Metrics and reporting are how we bridge that gap. Without them, security becomes a theater of dashboards; with them, it’s a feedback loop that improves the system.
In this post, I’ll share practical patterns for building a security metrics program that engineering teams actually respect. We’ll avoid buzzwords, focus on outcomes, and walk through real, runnable examples for ingesting vulnerability data, enriching it with context, and producing reports leaders can act on. If you’ve ever been asked “Are we secure?” on Slack and felt the question deserved more than a single number, this is for you.
Context: Where security metrics fit in modern engineering
Today’s stack is heterogeneous: services in multiple clouds, legacy workloads on-prem, mobile apps, third-party SaaS, and a maze of CI pipelines. Security teams must align with product velocity, cost optimization, and reliability goals. That’s why metrics and reporting have evolved from quarterly spreadsheets to continuously updated, engineering-owned pipelines.
In real-world projects, security metrics show up in a few common places:
- Engineering dashboards (Grafana, Datadog) for near-term signals.
- Incident reviews and postmortems for learning.
- Executive risk summaries for budget and strategy.
- Compliance artifacts for audits like SOC 2 or ISO 27001.
Alternatives exist, but they focus on different audiences:
- SIEM analytics and XDR platforms (e.g., Splunk, Elastic, Sentinel) specialize in detection and telemetry aggregation.
- GRC tools (e.g., Drata, Vanta) automate compliance evidence collection.
- Cloud-native posture tools (e.g., AWS Security Hub, Azure Defender) provide managed findings.
The engineering-centric approach described here complements these by placing metrics close to the code, enabling automation, reproducibility, and tighter integration with developer workflows. You might still keep a SIEM for detection; the pipeline here consumes its outputs rather than replacing it.
Core concepts: What to measure and how to think about it
Effective metrics start with goals. Pick measures that connect to business risk and engineering effort. Avoid vanity metrics that look good but don’t drive decisions.
Guiding principles
- Outcome over output: “Mean time to remediate (MTTR) critical vulnerabilities” is better than “Number of scans run.”
- Segmentation: Break data by service, team, environment, and severity to reveal hotspots.
- Trendlines: A single snapshot misleads; changes over time tell the story.
- Context: Raw severity is weak; add exploitability, exposure, and business impact.
- Actionability: Every metric should map to a decision or a task.
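To make “outcome over output” concrete: MTTR is just the mean age of findings at close time, segmented by severity. A minimal sketch, assuming each finding carries opened_at and closed_at timestamps (the field names are illustrative):

```python
from datetime import datetime
from statistics import mean

# Hypothetical closed findings; a real export would include closed_at timestamps
closed_findings = [
    {"severity": "critical", "opened_at": "2025-07-01T00:00:00+00:00", "closed_at": "2025-07-04T00:00:00+00:00"},
    {"severity": "critical", "opened_at": "2025-07-10T00:00:00+00:00", "closed_at": "2025-07-17T00:00:00+00:00"},
    {"severity": "low", "opened_at": "2025-07-01T00:00:00+00:00", "closed_at": "2025-08-01T00:00:00+00:00"},
]

def mttr_days(findings, severity):
    """Mean time to remediate, in days, for one severity band."""
    durations = [
        (datetime.fromisoformat(f["closed_at"]) - datetime.fromisoformat(f["opened_at"])).days
        for f in findings
        if f["severity"] == severity
    ]
    return mean(durations) if durations else 0.0

print(mttr_days(closed_findings, "critical"))  # mean of the 3-day and 7-day fixes
```

Trending this number per team, per quarter, says far more than a count of scans run.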
Common metrics categories
- Vulnerability posture: Critical/open vulns by service, MTTR, SLA compliance.
- Incident response: Time to detect (TTD), time to respond (TTR), recurrence rate.
- Access and identity: Excessive permissions, MFA coverage, unused credentials.
- Configuration hygiene: Public storage, open security groups, unencrypted secrets.
- Supply chain: Code dependencies with known CVEs, outdated base images.
These aren’t prescriptive; pick what matters for your risk profile. For instance, a SaaS with customer data might prioritize MTTR for critical vulns and access hygiene, while an embedded IoT team might focus on update coverage and SBOM completeness.
Practical pipeline: From findings to actionable reports
Let’s build a small but realistic pipeline that ingests vulnerability findings (e.g., from a scanner or cloud service), enriches them with ownership and environment context, and outputs a CSV and a simple JSON report suitable for dashboards. We’ll use Python because it’s common in data-heavy engineering workflows, but the concepts apply to any stack.
Project structure
security-metrics/
├── README.md
├── requirements.txt
├── config/
│   └── sources.yaml
├── data/
│   └── raw/
│       ├── findings.json
│       ├── services.csv
│       └── owners.csv
├── src/
│   ├── main.py
│   ├── ingest.py
│   ├── enrich.py
│   ├── report.py
│   ├── integrations/
│   │   └── slack.py
│   └── utils.py
├── output/
│   ├── enriched/
│   └── reports/
└── tests/
    └── test_enrich.py
Dependencies (requirements.txt)
pyyaml>=6.0
pandas>=2.0
requests>=2.31
Configuration (config/sources.yaml)
This config defines sources for findings and context. In real life, sources might be AWS Security Hub exports, a vulnerability scanner API (e.g., Nessus or Qualys), or a CSV from a third party. We’ll start with a local JSON for demonstration.
sources:
  findings:
    type: local_json
    path: data/raw/findings.json
  context:
    service_inventory:
      type: csv
      path: data/raw/services.csv
    ownership:
      type: csv
      path: data/raw/owners.csv
Example raw data (data/raw/findings.json)
Real scanners differ, but most return a list of findings with fields like title, severity, CVE, asset, and timestamps. Here’s a sample:
[
  {
    "id": "f-001",
    "title": "Apache Log4j Code Injection (Log4Shell)",
    "cve": "CVE-2021-44228",
    "severity": "critical",
    "asset": "payment-service",
    "environment": "production",
    "opened_at": "2025-08-01T12:00:00Z",
    "scanner": "example-scanner"
  },
  {
    "id": "f-002",
    "title": "Open SSL Vulnerability",
    "cve": "CVE-2022-3602",
    "severity": "high",
    "asset": "analytics-worker",
    "environment": "staging",
    "opened_at": "2025-08-05T09:30:00Z",
    "scanner": "example-scanner"
  }
]
We’ll also need lightweight service inventory and ownership data to enrich findings.
- data/raw/services.csv
service,environment,team,exposure
payment-service,production,checkout,internet
analytics-worker,staging,data,limited
- data/raw/owners.csv
team,slack_channel
checkout,#team-checkout
data,#team-data
Ingestion (src/ingest.py)
We’ll read the findings and basic context. In a real system, you’d fetch from APIs and paginate; this function is the entry point for the pipeline.
import csv
import json
from pathlib import Path
from typing import Dict, List, Any

import yaml

def load_config(path: Path) -> Dict[str, Any]:
    with open(path, "r") as f:
        return yaml.safe_load(f)

def load_findings(src: Dict[str, Any]) -> List[Dict[str, Any]]:
    if src["type"] == "local_json":
        with open(src["path"], "r") as f:
            return json.load(f)
    raise ValueError(f"Unsupported source type: {src['type']}")

def load_csv_as_dict(path: Path) -> List[Dict[str, str]]:
    with open(path, "r") as f:
        return list(csv.DictReader(f))

def ingest(config_path: Path = Path("config/sources.yaml")):
    cfg = load_config(config_path)
    findings = load_findings(cfg["sources"]["findings"])
    services = load_csv_as_dict(Path(cfg["sources"]["context"]["service_inventory"]["path"]))
    owners = load_csv_as_dict(Path(cfg["sources"]["context"]["ownership"]["path"]))
    return {
        "findings": findings,
        "services": services,
        "owners": owners,
    }
This is intentionally straightforward: the core idea is to unify data sources so later stages can enrich and analyze. In real projects, ingestion also handles schema normalization, retries, and deduplication. Don’t underestimate the effort here; most “bad” metrics come from inconsistent inputs.
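To illustrate that normalization and deduplication work (it isn’t part of the pipeline above), here’s a sketch; the severity alias table is hypothetical, since every scanner has its own labels:

```python
from typing import Any, Dict, List

# Hypothetical alias table mapping scanner-specific labels to one shared scale
SEVERITY_ALIASES = {"crit": "critical", "sev1": "critical", "moderate": "medium"}

def normalize(finding: Dict[str, Any]) -> Dict[str, Any]:
    """Lowercase and remap the severity label onto the shared scale."""
    sev = str(finding.get("severity", "unknown")).lower()
    return {**finding, "severity": SEVERITY_ALIASES.get(sev, sev)}

def dedupe(findings: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep one finding per (cve, asset) pair; scanners often re-report."""
    seen = set()
    unique = []
    for f in findings:
        key = (f.get("cve"), f.get("asset"))
        if key not in seen:
            seen.add(key)
            unique.append(f)
    return unique

raw = [
    {"cve": "CVE-2021-44228", "asset": "payment-service", "severity": "CRIT"},
    {"cve": "CVE-2021-44228", "asset": "payment-service", "severity": "critical"},
]
clean = dedupe([normalize(f) for f in raw])
print(len(clean), clean[0]["severity"])  # 1 critical
```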
Enrichment (src/enrich.py)
Raw severity is noisy. Enriching with environment, exposure, and ownership lets us prioritize. We’ll add team, exposure, and compute days-open.
from datetime import datetime, timezone
from typing import List, Dict, Any

def parse_date(ts: str) -> datetime:
    # Example ISO format; adjust for your scanner
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def enrich_findings(findings: List[Dict[str, Any]],
                    services: List[Dict[str, str]],
                    owners: List[Dict[str, str]]) -> List[Dict[str, Any]]:
    # Build lookup maps keyed by service|environment and by team
    svc_map = {f"{row['service']}|{row['environment']}": row for row in services}
    owner_map = {row["team"]: row for row in owners}
    enriched = []
    # Timezone-aware "now", so subtraction against parse_date() results is valid
    now = datetime.now(timezone.utc)
    for f in findings:
        key = f"{f['asset']}|{f.get('environment', 'unknown')}"
        svc_meta = svc_map.get(key, {})
        team = svc_meta.get("team", "unassigned")
        exposure = svc_meta.get("exposure", "unknown")
        opened = parse_date(f["opened_at"])
        days_open = (now - opened).days
        record = {
            **f,
            "team": team,
            "exposure": exposure,
            "days_open": days_open,
            "slack": owner_map.get(team, {}).get("slack_channel", ""),
            # Simple risk score: heuristic combining severity, exposure, and age
            "risk_score": compute_risk_score(f["severity"], exposure, days_open),
        }
        enriched.append(record)
    return enriched

def compute_risk_score(severity: str, exposure: str, days_open: int) -> float:
    # Heuristic: severity weight, exposure multiplier, age penalty
    severity_weights = {"critical": 10, "high": 7, "medium": 3, "low": 1}
    exposure_multipliers = {"internet": 2.0, "limited": 1.2, "internal": 1.0, "unknown": 1.0}
    age_penalty = min(days_open / 7, 2.0)  # grows ~0.14/day, capped at 2.0 (at most a 20% bump)
    base = severity_weights.get(severity.lower(), 1)
    mult = exposure_multipliers.get(exposure.lower(), 1.0)
    score = base * mult * (1.0 + age_penalty * 0.1)
    return round(score, 2)
This risk score is intentionally simple. In real-world practice, you might tune weights with historical breach data or correlate with exploitability (e.g., EPSS). The point is to add context that makes the queue manageable for engineers.
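If you later pull exploitability data, one simple blend is to scale the heuristic score by the EPSS probability. This is a hypothetical extension, not part of the pipeline above; the 0.5 floor and the linear blend are illustrative, not calibrated:

```python
def epss_adjusted_score(base_score: float, epss_probability: float) -> float:
    """Scale a heuristic risk score by exploit likelihood.

    epss_probability is the 0-1 score published in FIRST's EPSS feed
    (fetched separately); the 0.5 floor keeps low-EPSS criticals visible.
    """
    return round(base_score * max(0.5, 0.5 + epss_probability), 2)

# Log4Shell sits near the top of the EPSS distribution; a niche CVE near the bottom
print(epss_adjusted_score(22.0, 0.97))  # 32.34
print(epss_adjusted_score(22.0, 0.01))  # 11.22
```

The effect is that two findings with identical CVSS severity diverge sharply once exploit likelihood is factored in, which is usually what you want the queue to reflect.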
Reporting (src/report.py)
We’ll produce two outputs: a CSV for ad-hoc analysis and a JSON summary that could feed a dashboard. The report aggregates by team and service, highlighting the highest-risk items.
from pathlib import Path
from typing import List, Dict, Any
import csv
import json

def write_csv(records: List[Dict[str, Any]], path: Path):
    path.parent.mkdir(parents=True, exist_ok=True)
    if not records:
        path.write_text("")
        return
    fieldnames = list(records[0].keys())
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for r in records:
            writer.writerow(r)

def summarize(records: List[Dict[str, Any]]) -> Dict[str, Any]:
    # Aggregate by team
    by_team: Dict[str, List[Dict[str, Any]]] = {}
    for r in records:
        by_team.setdefault(r["team"], []).append(r)
    summary = {
        "total_findings": len(records),
        "critical_count": sum(1 for r in records if r["severity"].lower() == "critical"),
        "teams": {},
        "top_risks": sorted(records, key=lambda r: r["risk_score"], reverse=True)[:5],
    }
    for team, items in by_team.items():
        summary["teams"][team] = {
            "count": len(items),
            "critical": sum(1 for i in items if i["severity"].lower() == "critical"),
            "mean_risk": round(sum(i["risk_score"] for i in items) / len(items), 2),
            "oldest_days": max(i["days_open"] for i in items) if items else 0,
        }
    return summary

def generate_reports(enriched: List[Dict[str, Any]], output_dir: Path):
    output_dir.mkdir(parents=True, exist_ok=True)
    write_csv(enriched, output_dir / "enriched_findings.csv")
    summary = summarize(enriched)
    (output_dir / "summary.json").write_text(json.dumps(summary, indent=2))
Putting it together (src/main.py)
from pathlib import Path

from src.ingest import ingest
from src.enrich import enrich_findings
from src.report import generate_reports

def main():
    data = ingest()
    enriched = enrich_findings(data["findings"], data["services"], data["owners"])
    generate_reports(enriched, Path("output/reports"))

if __name__ == "__main__":
    main()
Running the pipeline
# Install dependencies
pip install -r requirements.txt

# Run the pipeline as a module, so the "from src.ingest import ..." imports resolve
python -m src.main
After running, check output/reports for enriched_findings.csv and summary.json. The CSV is useful for slicing in Excel or pandas; the JSON is ready for a dashboard widget.
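For quick slicing without Excel, pandas handles the CSV directly. The two sample rows are inlined below so the snippet stands alone; in practice you would read the report file instead:

```python
import pandas as pd

# In practice: df = pd.read_csv("output/reports/enriched_findings.csv")
df = pd.DataFrame([
    {"team": "checkout", "severity": "critical", "exposure": "internet", "days_open": 7, "risk_score": 22.0},
    {"team": "data", "severity": "high", "exposure": "limited", "days_open": 3, "risk_score": 8.76},
])

# Mean risk and oldest finding per team, riskiest teams first
by_team = (
    df.groupby("team")
      .agg(findings=("risk_score", "size"),
           mean_risk=("risk_score", "mean"),
           oldest_days=("days_open", "max"))
      .sort_values("mean_risk", ascending=False)
)
print(by_team)
```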
Example outputs
- output/reports/enriched_findings.csv (preview)
id,title,cve,severity,asset,environment,opened_at,scanner,team,exposure,days_open,slack,risk_score
f-001,Apache Log4j Code Injection (Log4Shell),CVE-2021-44228,critical,payment-service,production,2025-08-01T12:00:00Z,example-scanner,checkout,internet,7,#team-checkout,22.0
f-002,Open SSL Vulnerability,CVE-2022-3602,high,analytics-worker,staging,2025-08-05T09:30:00Z,example-scanner,data,limited,3,#team-data,8.76
- output/reports/summary.json
{
  "total_findings": 2,
  "critical_count": 1,
  "teams": {
    "checkout": {
      "count": 1,
      "critical": 1,
      "mean_risk": 22.0,
      "oldest_days": 7
    },
    "data": {
      "count": 1,
      "critical": 0,
      "mean_risk": 8.76,
      "oldest_days": 3
    }
  },
  "top_risks": [
    {
      "id": "f-001",
      "title": "Apache Log4j Code Injection (Log4Shell)",
      "cve": "CVE-2021-44228",
      "severity": "critical",
      "asset": "payment-service",
      "environment": "production",
      "opened_at": "2025-08-01T12:00:00Z",
      "scanner": "example-scanner",
      "team": "checkout",
      "exposure": "internet",
      "days_open": 7,
      "slack": "#team-checkout",
      "risk_score": 22.0
    },
    {
      "id": "f-002",
      "title": "Open SSL Vulnerability",
      "cve": "CVE-2022-3602",
      "severity": "high",
      "asset": "analytics-worker",
      "environment": "staging",
      "opened_at": "2025-08-05T09:30:00Z",
      "scanner": "example-scanner",
      "team": "data",
      "exposure": "limited",
      "days_open": 3,
      "slack": "#team-data",
      "risk_score": 8.76
    }
  ]
}
Integrating into CI and team workflows
Metrics become powerful when they show up where work happens. Two practical integrations:
Posting summaries to Slack
A nightly job can post the top risks to team channels. Here’s a minimal script using requests:
# src/integrations/slack.py
import os
import json
from pathlib import Path

import requests

def post_to_slack(summary_path: Path, webhook_url: str | None = None):
    webhook = webhook_url or os.getenv("SLACK_WEBHOOK_URL")
    if not webhook:
        print("No Slack webhook configured; skipping post.")
        return
    summary = json.loads(summary_path.read_text())
    blocks = [
        {"type": "header", "text": {"type": "plain_text", "text": "Security Report: Top Risks"}},
        {"type": "section", "text": {"type": "mrkdwn", "text": f"*Total findings:* {summary['total_findings']} | *Critical:* {summary['critical_count']}"}},
    ]
    for r in summary["top_risks"]:
        text = (
            f"• *{r['asset']}* ({r['environment']}) — {r['title']} "
            f"(CVE: {r.get('cve', 'N/A')})\n"
            f"Risk: {r['risk_score']} | Age: {r['days_open']} days | Team: {r['team']} | Slack: {r['slack']}"
        )
        blocks.append({"type": "section", "text": {"type": "mrkdwn", "text": text}})
    payload = {"blocks": blocks}
    resp = requests.post(webhook, json=payload)
    resp.raise_for_status()

if __name__ == "__main__":
    post_to_slack(Path("output/reports/summary.json"))
This keeps security visible without overwhelming channels. Tune frequency and thresholds to avoid alert fatigue.
GitHub Actions workflow
Run the pipeline nightly and upload the report as an artifact.
# .github/workflows/security-metrics.yml
name: Security Metrics
on:
  schedule:
    - cron: "0 8 * * *"  # 08:00 UTC daily
  workflow_dispatch:
jobs:
  metrics:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run pipeline
        run: python -m src.main
      - name: Upload report
        uses: actions/upload-artifact@v4
        with:
          name: security-report
          path: output/reports/
In real teams, you might also add a step to post to Slack or open issues for items violating SLAs. The key is closing the loop: metrics feed action.
Honest evaluation: Strengths, weaknesses, and tradeoffs
Strengths
- Engineering-owned: Metrics live close to code and pipelines; you can version control them.
- Context-driven: Adding ownership, exposure, and age yields actionable prioritization.
- Automatable: From ingestion to reporting, everything can be scheduled and monitored.
- Flexible: Works with any scanner or source; you can adapt schemas and risk heuristics.
Weaknesses
- Heuristics require tuning: Simple risk scores may misrank items; you’ll need iteration.
- Data quality is hard: Incomplete ownership or mislabeled environments degrade trust.
- Tool sprawl: Maintaining connectors to scanners, clouds, and ticketing systems takes effort.
- Governance: Someone must own the pipeline, review changes, and ensure compliance controls.
Tradeoffs
- Build vs buy: Building gives control but incurs maintenance; buying a GRC or CNAPP reduces build time but may lack flexibility. Many teams start with a hybrid: buy for compliance evidence, build for engineering-specific prioritization.
- Real-time vs batch: Real-time reporting increases engineering overhead; nightly batches often suffice for remediation workflows.
- Simplicity vs sophistication: Start simple; add exploitability (EPSS), asset criticality, and business context gradually. See FIRST’s EPSS: https://www.first.org/epss/.
When it’s not a good fit
- Small, single-product teams with low regulatory pressure may not need a dedicated metrics pipeline; a lightweight spreadsheet or managed GRC may suffice.
- If your security maturity is low and data quality is poor, prioritize basic hygiene (asset inventory, SBOM, patch cadence) before building metrics.
- If your environment is heavily air-gapped or regulated, ensure your pipeline respects data residency and change control before adopting.
Personal experience: Lessons from the trenches
I once joined a team that received a weekly CSV from a vulnerability scanner. It had 2,000 rows; the security lead would manually highlight criticals and email managers. Remediation stalled because teams argued about ownership and severity. Our first move wasn’t a fancy dashboard; it was a tiny enrichment step mapping assets to teams and environments. That alone cut through the noise.
Later, we added a “days-open” penalty to the risk score. A medium-severity vuln on an internet-facing service that had sat for 60 days surfaced higher than a critical vuln in a staging environment patched in two days. That shift sparked the right conversations: engineers asked for SLA clarity, and product helped prioritize fixes affecting customer-facing services.
We also learned to avoid “security theater” metrics. Reporting “findings closed per week” sounded good until teams started closing low-risk items to hit targets. We replaced that with “critical MTTR” and “SLA compliance,” which nudged behavior toward impact.
A final lesson: integrate early. When we added Slack posts with the top five risks and a link to the enriched CSV, engagement improved. Engineers didn’t need to log into a separate portal; the report met them where they worked.
Getting started: Mental model and workflow
Set up your workflow
- Identify inputs: Pick one source to start (e.g., a scanner export or cloud security findings). Avoid connecting everything at once.
- Define context: Map services to teams and environments. Even a simple CSV makes a huge difference.
- Choose a risk model: Start with severity + exposure + age. Write it down, so it’s reviewable.
- Automate incrementally: Ingestion → enrichment → reporting. Add Slack/issue creation only after the core pipeline is stable.
- Test your data: Write basic tests for parsing and enrichment. Treat metrics like code.
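As a concrete starting point, tests/test_enrich.py might pin down timestamp parsing and a few scoring properties. The functions are inlined below so the snippet stands alone; in the repo you would import them from src.enrich instead:

```python
# tests/test_enrich.py -- run with: pytest tests/
# In the repo: from src.enrich import parse_date, compute_risk_score
from datetime import datetime

def parse_date(ts: str) -> datetime:
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def compute_risk_score(severity: str, exposure: str, days_open: int) -> float:
    severity_weights = {"critical": 10, "high": 7, "medium": 3, "low": 1}
    exposure_multipliers = {"internet": 2.0, "limited": 1.2, "internal": 1.0, "unknown": 1.0}
    age_penalty = min(days_open / 7, 2.0)
    base = severity_weights.get(severity.lower(), 1)
    mult = exposure_multipliers.get(exposure.lower(), 1.0)
    return round(base * mult * (1.0 + age_penalty * 0.1), 2)

def test_scanner_timestamps_parse_as_utc():
    dt = parse_date("2025-08-01T12:00:00Z")
    assert dt.tzinfo is not None
    assert dt.utcoffset().total_seconds() == 0

def test_unknown_severity_gets_floor_weight():
    # Unmapped labels should sink to the bottom, not crash the pipeline
    assert compute_risk_score("informational", "internal", 0) == 1.0

def test_age_penalty_is_capped():
    # A year-old finding scores the same as a two-week-old one
    assert compute_risk_score("high", "internet", 365) == compute_risk_score("high", "internet", 14)
```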
Tooling suggestions
- Orchestration: GitHub Actions or Airflow for scheduling.
- Storage: A simple object store or Git repo for outputs; avoid over-engineering early.
- Dashboards: Grafana or Datadog for trendlines; connect JSON endpoints.
- Ticketing: Auto-create issues in Jira/GitHub when SLAs are breached (e.g., critical vulns open >7 days on internet-facing services).
- SBOM and exploitability: Consider integrating tools like Trivy or Grype for scanning, and EPSS data for exploit likelihood. References: Trivy: https://aquasecurity.github.io/trivy/; Grype: https://github.com/anchore/grype; EPSS: https://www.first.org/epss/.
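The SLA-breach check behind the auto-ticketing suggestion above can be a small pure function. The policy table here is hypothetical, and the print stands in for a real Jira or GitHub API call:

```python
from typing import Any, Dict, List

# Hypothetical SLA policy: max days open, keyed by (severity, exposure)
SLA_DAYS = {
    ("critical", "internet"): 7,
    ("critical", "internal"): 14,
    ("high", "internet"): 14,
}

def sla_breaches(findings: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return enriched findings that have exceeded their SLA window."""
    breaches = []
    for f in findings:
        limit = SLA_DAYS.get((f["severity"], f["exposure"]))
        if limit is not None and f["days_open"] > limit:
            breaches.append(f)
    return breaches

enriched = [
    {"id": "f-001", "severity": "critical", "exposure": "internet", "days_open": 9},
    {"id": "f-002", "severity": "high", "exposure": "limited", "days_open": 30},
]
for f in sla_breaches(enriched):
    # A real integration would open a Jira/GitHub issue here
    print(f"SLA breach: {f['id']} open {f['days_open']} days")
```

Keeping the policy in a reviewable table makes SLA changes a pull request rather than a tribal-knowledge update.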
Common pitfalls
- Over-reliance on CVSS: CVSS is a baseline, not a business risk. Add context.
- Too many metrics: Focus on 3–5 KPIs tied to decisions.
- Ignoring data quality: Invest in ownership and environment labeling.
- One-size-fits-all: Different services need different SLAs; consider criticality and exposure.
Free learning resources
- OWASP ASVS (Application Security Verification Standard): Useful for defining measurable controls; map metrics to controls. https://owasp.org/www-project-application-security-verification-standard/
- FIRST EPSS: Learn exploitability scoring to complement severity. https://www.first.org/epss/
- NIST Cybersecurity Framework (CSF): Provides a structure for security outcomes; great for reporting to leadership. https://csrc.nist.gov/projects/cybersecurity-framework
- MITRE ATT&CK: Contextualize detection and response metrics by mapping incidents to techniques. https://attack.mitre.org/
- Trivy and Grype docs: Practical scanning and SBOM generation; good for building a source of findings. https://aquasecurity.github.io/trivy/ and https://github.com/anchore/grype
Summary: Who should use this approach and who might skip it
Use this approach if you’re an engineering-driven team dealing with a steady flow of security findings and you want metrics that help prioritize work, not just document it. It’s especially valuable when:
- You have multiple services and teams, and ownership is unclear.
- You need to align security with developer workflows (Slack, GitHub, CI).
- You want transparency and reproducibility for audits and postmortems.
Consider skipping or postponing if:
- Your environment is very small and simple, and a managed GRC tool already meets your needs.
- You lack the bandwidth to maintain data pipelines or define ownership maps.
- Compliance requires rigid, vendor-aligned reporting that a custom pipeline can’t easily satisfy.
In closing, good security metrics don’t chase perfection; they improve decisions. Start small, measure outcomes, and iterate with engineers. The goal isn’t to eliminate every finding; it’s to make sure the ones that matter get fixed quickly.