Infrastructure as Code: Why CFOs Should Care About How Software Gets Deployed

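In January 2017, a GitLab engineer was troubleshooting replication between the company's primary production database and a secondary replica. The engineer ran a deletion command on the primary instead of the replica. By the time the command was stopped, roughly 300 GB of production data was gone. In seconds, GitLab (a software development platform used by tens of thousands of companies) had lost most of the production database behind its hosted service, GitLab.com.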

What followed was a public outage of roughly 18 hours. During it, the engineering team worked through five different backup and replication mechanisms and discovered in real time that most of them did not work as expected:

- Scheduled database dumps had been failing silently.
- Disk snapshots had never been enabled for the database servers.
- The storage bucket that was supposed to hold off-site backups was empty.

GitLab eventually restored from a database snapshot that an engineer had taken about six hours before the deletion. It lost the database changes made in that window.

The story became famous in engineering circles not because it was exceptional, but because GitLab made the entire recovery process public - streaming the recovery attempt on YouTube, documenting the failure modes in a public post-mortem, and announcing a comprehensive remediation plan.

The remediation plan included, prominently, infrastructure as code.


What Infrastructure as Code Actually Means

Traditional infrastructure management treated servers, databases, and network configurations as physical objects managed by hand. An engineer would log in to a server, install software, change configuration files, and document the changes - or not. Over time, servers accumulated configuration changes with no reliable record of what had been done, when, or why. This is called "snowflake infrastructure" - every server is unique, fragile, and impossible to reproduce.

Infrastructure as code (IaC) replaces manual management with code. Instead of logging in to a server and running commands, you write code files that define the desired state of your infrastructure:

"There should be three servers with this operating system, these software packages, this network configuration, connected to this load balancer, backed by this database."

Those code files are stored in a version control system (like Git), reviewed like any other code, and executed by automation tools that provision and configure the actual infrastructure. The tools - Terraform, Ansible, AWS CloudFormation, Pulumi - read the code and make the real infrastructure match the declared state.
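The reconciliation step described above can be sketched in a few lines of Python. This is a simplified illustration, not how Terraform or any other tool is implemented. Resources are plain dictionaries with hypothetical names and attributes, and real tools also track state, call provider APIs, and handle dependencies. The sketch compares declared and actual state and produces a plan of creates, updates, and destroys. It also shows that applying the same code twice changes nothing the second time, which is what makes the result reproducible.

```python
from dataclasses import dataclass, field

# Hypothetical model: resource name -> attributes. Real tools use provider schemas.
Resources = dict[str, dict[str, str]]


@dataclass
class Plan:
    create: list[str] = field(default_factory=list)
    update: list[str] = field(default_factory=list)
    destroy: list[str] = field(default_factory=list)

    def is_empty(self) -> bool:
        return not (self.create or self.update or self.destroy)


def make_plan(desired: Resources, actual: Resources) -> Plan:
    """Compare the declared state in code with what currently exists."""
    plan = Plan()
    for name in sorted(desired):
        if name not in actual:
            plan.create.append(name)       # declared in code but missing
        elif actual[name] != desired[name]:
            plan.update.append(name)       # exists but differs from the code
    for name in sorted(actual):
        if name not in desired:
            plan.destroy.append(name)      # exists but is not declared anywhere
    return plan


def apply(desired: Resources, actual: Resources) -> Resources:
    """Simulate making the real infrastructure match the declaration."""
    plan = make_plan(desired, actual)
    result = {n: dict(a) for n, a in actual.items() if n not in plan.destroy}
    for name in plan.create + plan.update:
        result[name] = dict(desired[name])
    return result


if __name__ == "__main__":
    web = {"os": "ubuntu-22.04", "lb": "main"}
    desired = {"web-1": web, "web-2": web, "web-3": web,
               "db": {"engine": "postgres", "tls": "required"}}
    actual = {
        "web-1": dict(web),
        "web-2": {"os": "ubuntu-20.04", "lb": "main"},   # hand-edited, drifted
        "test-db": {"engine": "postgres", "tls": "off"},  # forgotten test database
    }
    print(make_plan(desired, actual))
    # Applying the same code a second time finds nothing left to do.
    print(make_plan(desired, apply(desired, actual)).is_empty())
```

Run as a script, the first plan creates `db` and `web-3`, updates the hand-edited `web-2`, and destroys the undeclared `test-db`. The second plan is empty.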

The key properties that result: infrastructure is reproducible (running the same code gives you the same infrastructure), documented (the code is the documentation), auditable (every change is tracked in version control), and automated (no manual steps that can be forgotten or done inconsistently).


The Three Business Benefits Executives Should Understand

1. Cost Control: Ending Mystery Cloud Bills

Cloud computing creates a new problem that on-premises infrastructure did not have: it is trivially easy to spin up infrastructure and forget to turn it off. An engineer provisions a database for testing, finishes the test, and moves on. The database keeps running. At $400/month, it costs $4,800 over a year if nobody notices.

Multiply this by hundreds of engineers across dozens of projects and you have the "mystery cloud bill" phenomenon - cloud spending that nobody can fully explain, growing month over month.

Infrastructure as code attacks this problem from both directions. First, it creates a canonical record of what infrastructure should exist. If a database is not defined in the IaC code, it should not be running - anything running without a code definition is "drift" that can be found and cleaned up. Second, automated provisioning and deprovisioning - spinning up environments for testing and automatically destroying them when the test is done - eliminates the "forgot to turn it off" failure mode.
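Both directions can be sketched as simple checks over an inventory of running resources. In this hypothetical Python example, the following are invented for illustration:

- the resource names
- the monthly costs
- the time-to-live (`ttl`) tag on temporary environments

In practice, the inventory would come from the cloud provider and the declared names from the IaC code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class RunningResource:
    name: str
    monthly_cost: float                 # hypothetical figure from a billing export
    created_at: datetime
    ttl: Optional[timedelta] = None     # hypothetical lifetime tag on temporary environments


def find_orphans(running: list[RunningResource], declared: set[str]) -> list[RunningResource]:
    """Anything running without a definition in code is drift."""
    return [r for r in running if r.name not in declared]


def find_expired(running: list[RunningResource], now: datetime) -> list[RunningResource]:
    """Temporary environments whose lifetime has elapsed and should be destroyed."""
    return [r for r in running if r.ttl is not None and now >= r.created_at + r.ttl]


def annual_cost(resources: list[RunningResource]) -> float:
    """Yearly spend if these resources keep running unchanged."""
    return sum(r.monthly_cost for r in resources) * 12


if __name__ == "__main__":
    now = datetime(2025, 3, 1, tzinfo=timezone.utc)
    running = [
        RunningResource("prod-db", 1200.0, now - timedelta(days=400)),
        RunningResource("test-db-q3", 400.0, now - timedelta(days=200)),
        RunningResource("pr-1234-env", 150.0, now - timedelta(days=3), ttl=timedelta(days=2)),
    ]
    declared = {"prod-db", "pr-1234-env"}    # names found in the IaC code
    orphans = find_orphans(running, declared)
    print([r.name for r in orphans], annual_cost(orphans))
    print([r.name for r in find_expired(running, now)])
```

With these made-up figures, the forgotten `test-db-q3` alone accounts for $4,800 a year. The pull-request environment is flagged for teardown because its two-day lifetime has passed.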

Large financial services organizations with mature IaC practices report cloud spend reductions of 20-40% from eliminating orphaned infrastructure and automating environment lifecycle management.

2. Compliance: Making Audits Answerable

Financial services organizations face regular audits - internal, external, and regulatory - that ask questions about their infrastructure: Who changed this configuration, and when? Is this server running the approved operating system version? Are these database access controls current? What changed in the month before this incident?

In a manually managed infrastructure environment, answering these questions requires painstaking reconstruction - digging through logs, interviewing engineers, reviewing change tickets. The answers are often incomplete. Regulators - particularly under frameworks like DORA (Digital Operational Resilience Act) in the EU and SR 11-7 from the Federal Reserve - are increasingly specific about expecting documented, auditable infrastructure management processes.

Infrastructure as code makes audit questions answerable through version control history. Every change to infrastructure code is a commit - with a timestamp, the name of the engineer who made it, a description of the change, and the exact technical specification of what changed. The audit trail is inherent in how the system works, not a separate logging exercise.
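An auditor's question such as "what changed in the month before this incident?" becomes a query over that history. The sketch below parses output in the shape produced by `git log --pretty=format:'%H|%an|%aI|%s'` (full hash, author name, strict ISO 8601 author date, subject) and filters it by date. In practice `git log --since=... --until=...` can narrow the window at the source. The hashes, engineer names, and commit messages here are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Commit:
    sha: str
    author: str
    when: datetime
    subject: str


def parse_log(text: str) -> list[Commit]:
    """Parse lines shaped like `git log --pretty=format:'%H|%an|%aI|%s'`."""
    commits = []
    for line in text.strip().splitlines():
        sha, author, when, subject = line.split("|", 3)  # subject may itself contain '|'
        commits.append(Commit(sha, author, datetime.fromisoformat(when), subject))
    return commits


def changes_between(commits: list[Commit], start: datetime, end: datetime) -> list[Commit]:
    """Who changed what, and when, inside an audit window (inclusive), oldest first."""
    return sorted((c for c in commits if start <= c.when <= end), key=lambda c: c.when)


if __name__ == "__main__":
    sample_log = (
        "a1b2c3d|Priya Shah|2025-02-14T09:30:00+00:00|Tighten security group for payments-db\n"
        "d4e5f6a|Tom Lee|2025-01-03T16:05:00+00:00|Upgrade web servers to ubuntu-22.04\n"
        "0789abc|Priya Shah|2025-02-27T11:45:00+00:00|Require TLS connections on payments-db\n"
    )
    incident = datetime.fromisoformat("2025-03-01T00:00:00+00:00")
    window = changes_between(parse_log(sample_log), incident - timedelta(days=28), incident)
    for c in window:
        print(c.when.date(), c.author, c.sha, c.subject)
```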

Additionally, security configuration can be enforced through code rather than policy. A policy that says "all databases must use encrypted connections" can be expressed as a code requirement that is automatically checked in every infrastructure deployment. Drift from the policy becomes immediately visible.
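A policy such as "all databases must use encrypted connections" becomes a function that runs against every proposed change before deployment. Dedicated policy-as-code tools exist for this. The Python below is only a minimal sketch of the idea, assuming resource declarations have already been loaded into dictionaries. The `type` and `require_tls` fields and the resource names are hypothetical.

```python
from typing import Any

# Hypothetical declarations, as a pipeline might load them from IaC files.
Declarations = dict[str, dict[str, Any]]


class PolicyViolation(Exception):
    """Raised to stop a deployment that breaks a policy."""


def databases_require_tls(resources: Declarations) -> list[str]:
    """Policy: every database must accept only encrypted connections."""
    return [
        f"{name}: database does not require encrypted connections"
        for name, res in sorted(resources.items())
        if res.get("type") == "database" and res.get("require_tls") is not True
    ]


def gate_deployment(resources: Declarations) -> None:
    """Run the policy before anything is deployed; fail loudly on violations."""
    violations = databases_require_tls(resources)
    if violations:
        raise PolicyViolation("\n".join(violations))


if __name__ == "__main__":
    change = {
        "payments-db": {"type": "database", "require_tls": True},
        "reporting-db": {"type": "database"},        # setting omitted by mistake
        "web-1": {"type": "server"},
    }
    try:
        gate_deployment(change)
    except PolicyViolation as err:
        print("Deployment blocked:", err)
```

Because the check runs on the code rather than on live servers, a non-compliant database is rejected before it exists, not discovered at the next audit.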

3. Disaster Recovery: Rebuild in Hours, Not Weeks

The GitLab incident illustrates the DR implication directly. When infrastructure exists only as a collection of manually configured servers, recovering from a catastrophic failure requires either restoring from backups (which may be stale, incomplete, or non-functional, as GitLab discovered) or manually rebuilding the infrastructure from memory and incomplete documentation.

With infrastructure as code, a complete environment can be rebuilt by running the code against empty infrastructure. The code defines everything: the servers, the database configuration, the network topology, the security groups, and the load balancer rules. A rebuild takes as long as the automation takes to run, not as long as engineers need to reconstruct months of manual configuration from memory. The data itself must still come from backups, which is why those backups must be tested. IaC ensures that the environment they are restored into can be recreated on demand.
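Rebuilding from code works because the tool knows which resources depend on which. Terraform, for example, builds a dependency graph and creates resources in an order that respects it. The sketch below shows that ordering idea with Python's standard-library `graphlib`. The resource names and dependencies are hypothetical. Restoring data into the rebuilt database is a separate step and is not shown.

```python
from graphlib import TopologicalSorter

# Hypothetical environment: each resource maps to the resources it depends on.
ENVIRONMENT: dict[str, set[str]] = {
    "network": set(),
    "security_groups": {"network"},
    "database": {"network", "security_groups"},
    "app_servers": {"network", "security_groups", "database"},
    "monitoring": {"network"},
    "load_balancer": {"app_servers"},
}


def rebuild_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Creation order for rebuilding from empty: dependencies always come first.

    TopologicalSorter raises graphlib.CycleError if the definitions are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, name in enumerate(rebuild_order(ENVIRONMENT), start=1):
        # A real tool would call the cloud provider's API here; this sketch only prints.
        print(f"{step}. create {name}")
```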

For financial services organizations under regulatory requirements for business continuity and disaster recovery - PCI DSS requires DR testing; financial regulators expect demonstrable recovery capabilities - IaC is the practical foundation that makes stated RTO (recovery time objective) targets achievable.


The Main Tools: Terraform, CloudFormation, Ansible, Pulumi

The IaC tool landscape has converged around a few dominant options, each with different strengths.

Terraform (created by HashiCorp; IBM announced its acquisition of HashiCorp in 2024 and completed it in 2025) is the most widely used cross-cloud IaC tool. It supports AWS, Azure, Google Cloud, and hundreds of other providers through a plugin model. Terraform code describes infrastructure in a declarative format: you describe what should exist, not the steps to create it. Terraform records the infrastructure it manages in a state file. It can plan changes before applying them, showing what will be created, modified, or destroyed.

AWS CloudFormation is Amazon's native IaC tool, tightly integrated with AWS services. Organizations that run primarily on AWS often choose CloudFormation for that integration, at the cost of multi-cloud portability. AWS CDK (Cloud Development Kit) builds on CloudFormation. Teams write infrastructure in general-purpose languages such as TypeScript or Python, and the CDK synthesizes CloudFormation templates from that code. This suits teams that prefer programming languages over YAML or JSON templates.
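To make the relationship concrete: the CDK lets a program generate a CloudFormation template. The sketch below is not the CDK API, which provides its own construct classes. It is plain Python, assuming only the standard CloudFormation template layout, and it builds one template containing an encrypted S3 bucket per environment. The environment names and logical IDs are hypothetical.

```python
import json


def encrypted_bucket(logical_id: str) -> dict:
    """One AWS::S3::Bucket resource with default server-side encryption."""
    return {
        logical_id: {
            "Type": "AWS::S3::Bucket",
            "Properties": {
                "BucketEncryption": {
                    "ServerSideEncryptionConfiguration": [
                        {"ServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
                    ]
                }
            },
        }
    }


def synthesize(environments: list[str]) -> str:
    """Build one CloudFormation template (as JSON) with a bucket per environment."""
    resources: dict = {}
    for env in environments:
        resources.update(encrypted_bucket(f"{env.capitalize()}ReportsBucket"))
    template = {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}
    return json.dumps(template, indent=2)


if __name__ == "__main__":
    print(synthesize(["dev", "staging", "prod"]))
```

The loop is the point: adding an environment means adding one more list entry, not copying another block of YAML.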

Ansible is primarily a configuration management tool - it focuses on configuring software on existing servers rather than provisioning the servers themselves. It is commonly used alongside Terraform: Terraform provisions the infrastructure, Ansible configures the software running on it.
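Configuration management tasks describe an end state and are designed to be safe to run repeatedly. If a server is already configured correctly, the task reports no change. The Python below imitates the spirit of Ansible's `lineinfile` module for a single setting. It is not Ansible code, and the SSH setting is only an example.

```python
def ensure_line(content: str, line: str) -> tuple[str, bool]:
    """Ensure `line` is present in a config file's content.

    Returns the new content and whether anything changed, mirroring the
    "changed" status that configuration management tools report.
    """
    lines = content.splitlines()
    if line in lines:
        return content, False          # already in the desired state: do nothing
    lines.append(line)
    return "\n".join(lines) + "\n", True


if __name__ == "__main__":
    sshd_config = "PermitRootLogin no\n"
    for run in (1, 2):
        sshd_config, changed = ensure_line(sshd_config, "PasswordAuthentication no")
        print(f"run {run}: changed={changed}")
```

The first run adds the setting and reports a change. The second run finds it already present and leaves the file untouched.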

Pulumi represents a newer generation of IaC tools that lets teams define infrastructure in general-purpose programming languages (Python, JavaScript/TypeScript, Go, and others) rather than domain-specific configuration formats. Teams that would rather write ordinary code than learn a dedicated configuration language often find Pulumi more natural.


The Regulatory Dimension for Financial Services

Regulatory frameworks for financial services are increasingly specific about infrastructure management expectations, and IaC aligns directly with what regulators are asking for.

DORA (Digital Operational Resilience Act) - the EU regulation that applies from 17 January 2025 - requires financial institutions to:

- maintain detailed documentation of their ICT systems
- test their resilience through regular exercises
- demonstrate the ability to recover from ICT incidents

Infrastructure defined as code is inherently documented, and recovery can be rehearsed by rebuilding environments from that code.

SR 11-7 and OCC Third-Party Risk - SR 11-7 (the Federal Reserve and OCC guidance on model risk management) and the agencies' guidance on third-party technology relationships expect financial institutions to understand and document the technology environments supporting regulated activities. IaC provides this documentation as a natural output of the development process.

SOC 2 and ISO 27001 - a SOC 2 attestation report and ISO 27001 certification are common requirements for financial services vendors and for many institutions themselves. Both include controls around change management, configuration management, and access control. IaC practices provide direct evidence for these controls through version-controlled, reviewed, automated change processes.

The practical implication: financial services organizations that have not adopted IaC practices are increasingly finding that explaining their infrastructure management to regulators and auditors requires manual documentation that is expensive to produce and inherently incomplete.


Key Takeaways

  • Infrastructure as code defines infrastructure in version-controlled files, creating reproducible, auditable, automatically documented environments rather than manually configured "snowflake" servers.
  • The primary business benefits are cost control (eliminating orphaned cloud resources through code-as-record), compliance (inherent audit trail through version control), and disaster recovery (rebuild from code in hours rather than weeks).
  • Terraform is the dominant multi-cloud tool; AWS CloudFormation is the native option for AWS-first organizations. Ansible is commonly used for configuration management alongside infrastructure provisioning tools.
  • Regulatory frameworks including DORA and SR 11-7 align directly with IaC practices - documented, auditable, tested infrastructure management is what regulators are increasingly requiring.
  • The organizational change is the hard part. IaC tools are mature. The challenge is building engineering practices - code review for infrastructure changes, automated testing, continuous compliance checks - that capture the full benefit.
