Configuration Drift Control

Name: ops0
Author: ops0

Configuration drift is what happens when your actual cloud no longer matches your Terraform. Someone SSH'd in during an outage, a teammate changed something in the AWS console, or two teams applied conflicting configs. It happens to every team eventually. ops0's Resource Graph catches this by continuously comparing your state files against what's running across AWS, GCP, and Azure. It shows you exactly what changed and when, then routes the fix through review, or prevents the drift entirely with policy gates covering the major compliance frameworks.

Most drift tools tell you something changed. They don't stop it from happening again. That's the real problem.

What if drift could be solved, not only detected?

Why Drift Exists

Drift isn't a mystery. It has clear causes:

Emergency changes. Production is down. You SSH in and change something. It works. You forget to update the Terraform. Three months later, someone runs terraform apply and your fix disappears.

Console convenience. The cloud console is right there. It's faster than writing a PR. You're adding a tag. What could go wrong?

Automation side effects. Auto-scaling adds instances. Functions rotate credentials. Cloud providers do things to your infrastructure that your state files don't know about.

Multi-team coordination. The database team made a change. The security team made a change. Neither team told the infrastructure team. The state file doesn't match reality.

Time. Cloud providers update services. Defaults change. What was compliant last year isn't compliant now.

Drift is structural, not cultural. You can have the best processes in the world and still experience drift, because drift is what happens when there's a gap between your declared state and your actual state.
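To make the "gap between declared and actual state" concrete, here is a minimal Python sketch of a drift diff. It assumes both sides have already been flattened into dictionaries keyed by resource address. The function name diff_state, the report keys, and the sample resources are illustrative assumptions, not ops0's or Terraform's API.

```python
from typing import Any


def diff_state(declared: dict[str, dict[str, Any]],
               actual: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Compare declared and actual resources, both keyed by resource address."""
    # Declared in code but not found running.
    missing = sorted(set(declared) - set(actual))
    # Running in the cloud but not declared anywhere.
    unmanaged = sorted(set(actual) - set(declared))
    # Present on both sides, but with differing attribute values.
    changed: dict[str, dict[str, Any]] = {}
    for address in sorted(set(declared) & set(actual)):
        want, have = declared[address], actual[address]
        attrs = {
            key: {"declared": want.get(key), "actual": have.get(key)}
            for key in sorted(set(want) | set(have))
            if want.get(key) != have.get(key)
        }
        if attrs:
            changed[address] = attrs
    return {"missing": missing, "unmanaged": unmanaged, "changed": changed}


# Hypothetical sample data for illustration only.
declared = {
    "aws_instance.web": {"instance_type": "t3.small", "tags": {"env": "prod"}},
    "aws_s3_bucket.logs": {"versioning": True},
}
actual = {
    "aws_instance.web": {"instance_type": "t3.large", "tags": {"env": "prod"}},
    "aws_security_group.debug": {"ingress_port": 22},
}
report = diff_state(declared, actual)
```

In this sample, the report lists the logs bucket as missing, the debug security group as unmanaged, and the web instance's instance_type as changed. Those three categories map onto the causes above: deleted resources, console additions, and in-place edits.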

The Detection Trap

The industry's answer to drift has been detection. Detect drift, alert on drift, remediate drift.

That means reacting to problems after the fact instead of preventing them.

Detection tools tell you something is wrong after it's already wrong. By the time you find out, the damage is often done. The security group has been open for three days. The configuration change has been live for a week. The compliance violation existed during the audit.

And detection creates toil. Someone has to review the drift alerts. Someone has to determine if the drift is intentional. Someone has to remediate it, which means writing code and going through the PR process. Someone has to verify the remediation worked.

Detection doesn't solve drift. It makes drift visible.

The Remediation Gap

Even when you detect drift, fixing it is hard.

Is the drift intentional? Maybe someone made a change for a good reason and didn't document it. If you revert it, you might break something.

What's the right state? Your Terraform says one thing, but maybe Terraform is wrong and reality is right. You need to determine which side of the gap to close.

How do you fix it? Do you update Terraform to match reality? Or change reality to match Terraform? Each choice has different implications.
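The choice of which side of the gap to close can be modeled explicitly. The sketch below is one hypothetical way to represent it in Python. The Direction enum, the Drift record, and plan_remediation are invented names for illustration, and the plan is only a description of intent, not something that talks to a cloud API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class Direction(Enum):
    UPDATE_IAC = "update_iac"      # accept reality, change the code
    REVERT_CLOUD = "revert_cloud"  # keep the code, change reality


@dataclass(frozen=True)
class Drift:
    address: str
    attribute: str
    declared: Any
    actual: Any


def plan_remediation(drift: Drift, direction: Direction) -> dict[str, Any]:
    """Describe the change needed to close the gap in the chosen direction."""
    if direction is Direction.UPDATE_IAC:
        target, new_value = "iac", drift.actual
    else:
        target, new_value = "cloud", drift.declared
    return {
        "target": target,
        "address": drift.address,
        "attribute": drift.attribute,
        "set_to": new_value,
        # Assumption: either direction still goes through human review.
        "requires_review": True,
    }


drift = Drift("aws_instance.web", "instance_type", "t3.small", "t3.large")
accept = plan_remediation(drift, Direction.UPDATE_IAC)
revert = plan_remediation(drift, Direction.REVERT_CLOUD)
```

Here accept sets the code to t3.large, while revert sets the running instance back to t3.small. Making the direction an explicit input keeps the "which side is right?" decision with a person rather than burying it in tooling.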

Most drift goes unfixed. Teams see the alerts, triage them, and decide they're not worth the effort. The drift becomes permanent. The gap between declared and actual state widens.

The Closed-Loop Solution

The only way to solve drift is to close the loop.

Not detect, then alert, then triage, then remediate, then verify.

Instead: prevent.

A closed-loop system maintains consistency by design. When someone attempts a change that would cause drift, the system flags it immediately. When drift does occur, the system routes a reviewed remediation through policy and approval. The gap between declared and actual never grows because it's continuously closed.

This requires three capabilities:

Policy enforcement at change time. When someone tries to make a change - through the console, through the CLI, through automation - the system intercepts it. If the change conflicts with declared state, it's blocked or flagged. If the change is approved, the declared state is updated simultaneously.

Continuous reconciliation. The system doesn't wait for a periodic scan. It continuously compares declared and actual state. When they diverge, it acts immediately. Seconds, not days.

Intelligent remediation. The system doesn't revert changes blindly. It understands context. It knows which changes are intentional. It remediates in ways that don't break things.
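The first two capabilities can be sketched in a few lines of Python. This is a simplified model under stated assumptions: state is a flat dictionary of attributes, approval is a boolean supplied by some external review step, and observe is a caller-provided function that returns the live state. The names gate_change and reconcile_once are hypothetical.

```python
from typing import Any, Callable

State = dict[str, Any]


def gate_change(declared: State, attribute: str, new_value: Any,
                approved: bool) -> tuple[str, State]:
    """Decide what happens to a proposed change at change time."""
    if declared.get(attribute) == new_value:
        # No conflict with declared state: nothing to gate.
        return "allowed", declared
    if not approved:
        # Conflicts with declared state and nobody signed off.
        return "flagged", declared
    # Approved: update declared state in the same step as the change.
    return "allowed_and_recorded", {**declared, attribute: new_value}


def reconcile_once(declared: State,
                   observe: Callable[[], State]) -> dict[str, tuple[Any, Any]]:
    """One reconciliation pass: return (declared, actual) per diverging key."""
    actual = observe()
    return {
        key: (declared.get(key), actual.get(key))
        for key in sorted(set(declared) | set(actual))
        if declared.get(key) != actual.get(key)
    }


declared = {"instance_type": "t3.small"}
flagged = gate_change(declared, "instance_type", "t3.large", approved=False)
recorded = gate_change(declared, "instance_type", "t3.large", approved=True)
divergence = reconcile_once(declared, lambda: {"instance_type": "t3.large"})
```

A real system would call something like reconcile_once in response to cloud change events rather than on a timer. The point of the sketch is only that an approved change and the update to declared state happen in one step, so they cannot diverge.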

What This Looks Like in Practice

Your infrastructure is defined. This is your declared state - not only Terraform, but the complete specification of what your infrastructure should be.

ops0's Resource Graph continuously observes actual state. Not through periodic scans, but through real-time monitoring of your cloud.

When they diverge, Resource Graph shows you exactly what changed. Upstream and downstream dependencies are mapped, so you see the full impact.
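Mapping upstream and downstream dependencies is a graph traversal. The following Python sketch assumes a simple adjacency map in which each resource lists what it depends on. Here "upstream" means everything a resource depends on and "downstream" means everything that depends on it. The resource names and helper functions are illustrative, not Resource Graph's internal model.

```python
from collections import deque


def reachable(graph: dict[str, list[str]], start: str) -> set[str]:
    """Breadth-first search: every node reachable from start, excluding the seed step."""
    seen: set[str] = set()
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, []))
    return seen


def invert(graph: dict[str, list[str]]) -> dict[str, list[str]]:
    """Flip every edge so 'depends on' becomes 'is depended on by'."""
    inverted: dict[str, list[str]] = {}
    for node, deps in graph.items():
        inverted.setdefault(node, [])
        for dep in deps:
            inverted.setdefault(dep, []).append(node)
    return inverted


# Hypothetical dependency map: resource -> resources it depends on.
depends_on = {
    "aws_instance.web": ["aws_security_group.web", "aws_subnet.public"],
    "aws_security_group.web": ["aws_vpc.main"],
    "aws_subnet.public": ["aws_vpc.main"],
    "aws_lb.front": ["aws_instance.web"],
    "aws_vpc.main": [],
}

changed = "aws_security_group.web"
upstream = reachable(depends_on, changed)
downstream = reachable(invert(depends_on), changed)
```

For a change to the web security group in this sample, upstream contains only the VPC, while downstream contains the web instance and the load balancer in front of it. The downstream set is the blast radius a reviewer would want to see before approving a fix.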

And here's the key: you route the fix through review. Choose to update your IaC to match reality, or revert reality to match your IaC. The choice is yours, and the change moves through policy and approval.

If someone opens a security group port that violates policy, you see it immediately. If auto-scaling adds instances, the change is tracked. If a configuration changes without authorization, you know about it before it causes problems.
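As an example of the kind of policy that would catch the security group case, here is a small Python check. It assumes ingress rules are plain dictionaries with cidr, from_port, and to_port keys, and that the policy forbids exposing ports 22 and 3389 to the whole internet. Both the rule shape and the port list are assumptions chosen for illustration.

```python
import ipaddress

# Assumed policy: these ports must never be open to the whole internet.
FORBIDDEN_PUBLIC_PORTS = {22, 3389}


def violations(rules: list[dict]) -> list[str]:
    """Return a message for each forbidden port exposed to 0.0.0.0/0 or ::/0."""
    found = []
    for rule in rules:
        network = ipaddress.ip_network(rule["cidr"])
        # A prefix length of 0 covers every address in that IP version.
        is_public = network.prefixlen == 0
        for port in sorted(FORBIDDEN_PUBLIC_PORTS):
            if is_public and rule["from_port"] <= port <= rule["to_port"]:
                found.append(f"port {port} open to {rule['cidr']}")
    return found


# Hypothetical ingress rules observed on a security group.
rules = [
    {"cidr": "0.0.0.0/0", "from_port": 22, "to_port": 22},
    {"cidr": "10.0.0.0/8", "from_port": 22, "to_port": 22},
    {"cidr": "0.0.0.0/0", "from_port": 443, "to_port": 443},
]
found = violations(rules)
```

Only the first rule is reported. SSH from a private range and public HTTPS both pass, which is the distinction a blanket "port changed" alert cannot make.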

The gap between declared and actual state stays closed. Not trending toward closed. Closed.

The Compliance Angle

For regulated industries, drift isn't only an operational problem. It's also a compliance risk.

Auditors want to know that your infrastructure matches your documentation. They want to know that changes are authorized and tracked. They want evidence that your controls are effective.

Drift detection provides evidence of ongoing issues. "We found drift 47 times this quarter and remediated it 43 times" is not a story auditors want to hear.

Drift prevention provides evidence of control. "Our system maintains continuous compliance with minimal drift" is a much better story. And you can prove it, because you have logs of every attempted deviation and every remediation.
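The audit evidence described here amounts to an append-only event log that can be summarized on demand. The Python sketch below assumes three event kinds and ISO 8601 UTC timestamps. The AuditEvent record, the kind names, and the sample events are invented for illustration and do not reflect ops0's actual log format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditEvent:
    timestamp: str  # ISO 8601 UTC, so string order equals time order
    address: str
    kind: str  # "deviation_blocked", "drift_detected" or "drift_remediated"


def summarize(events: list[AuditEvent]) -> dict:
    """Count events by kind and list resources whose drift is still open."""
    counts = Counter(event.kind for event in events)
    open_drift: set[str] = set()
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.kind == "drift_detected":
            open_drift.add(event.address)
        elif event.kind == "drift_remediated":
            open_drift.discard(event.address)
    return {"counts": dict(counts), "open_drift": sorted(open_drift)}


# Hypothetical events for illustration only.
events = [
    AuditEvent("2024-01-05T10:00:00Z", "aws_security_group.web", "drift_detected"),
    AuditEvent("2024-01-05T10:04:00Z", "aws_security_group.web", "drift_remediated"),
    AuditEvent("2024-01-09T16:30:00Z", "aws_instance.api", "deviation_blocked"),
    AuditEvent("2024-01-12T08:15:00Z", "aws_s3_bucket.logs", "drift_detected"),
]
summary = summarize(events)
```

The summary shows one blocked deviation, one remediated drift, and one resource still open. An auditor can trace each number back to individual log entries.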

ops0's Resource Graph

This is what ops0's Resource Graph delivers.

It builds a real-time graph based on your state files and your actual cloud resources. The disparity between IaC and real state is always visible.

Drift detection is continuous, not scheduled. When declared and actual diverge, you see it immediately.

Remediation moves through policy and approval. Update the IaC to match reality, or push reality to match the IaC, depending on what you need.

The full dependency chain is mapped. Upstream and downstream relationships are visible, so you understand the impact of any change.

Configuration drift is a solved problem. The only question is whether you want to keep managing it manually or let a system handle it for you.

Quick answers

What is configuration drift?

Configuration drift is the gap between declared infrastructure state and the live cloud resources that are running in production.

How does ops0 help control drift?

ops0 compares declared state with discovered cloud state, shows affected resources in Resource Graph, and routes remediation through governed workflows.

Why is drift a compliance risk?

Drift can create unreviewed infrastructure changes, missing audit evidence, exposed resources, and policy exceptions that were never approved.
