← All research
DevOps

GitOps, progressive delivery, and observability that survives on-call

Kiran Reddy·

From Terraform modules to Argo rollouts: how we structure repos, environments, and SLO-based alerts so incidents are rare and recoveries are boring.

Treat Git as the control plane and the cluster as cattle. Every change—infra, app config, network policy—flows through reviewable PRs with policy-as-code gates.

We standardise progressive delivery: canary traffic shifts tied to automated checks on error budgets, not gut feel. Rollbacks are one revert, not a runbook marathon.

Observability is structured around user journeys: RED metrics for services, USE for nodes, and trace exemplars wired into incident channels. We invest in high-cardinality dimensions only where queries justify cost.

On-call load dropped materially once alerts were tied to SLO burn rates instead of static thresholds. The team spends more time improving the system and less time chasing noisy pages.