Software Delivery Performance
Move Fast and Break Nothing
In Silicon Valley, they celebrate the mantra "move fast and break things." In banking, if you break things, people cannot access their money, trades fail to settle, and regulators show up at your door. The stakes are fundamentally different, and the delivery philosophy must reflect that difference.
Over twenty-five years of leading engineering organizations, Dwayne Helena developed a conviction that the "move fast and break things" philosophy is not just inappropriate for regulated industries -- it is wrong everywhere. Breaking things is not a sign of speed; it is a sign of insufficient engineering discipline. The best teams -- including the teams that drove the DevSecOps transformation at the Tier-1 bank -- were both the fastest and the most reliable. They deployed multiple times per day with change failure rates under three percent. They did not achieve this by being reckless or by being cautious. They achieved it by investing in the engineering practices that make speed and safety complementary: automated testing, continuous integration, deployment pipelines with built-in safety gates, canary deployments, and instant automated rollback. Jez Humble and David Farley described this approach in "Continuous Delivery" in 2010, and the intervening years have only strengthened the evidence. The organizations that deliver the fastest are the ones that invest the most in making delivery safe. This is not a paradox -- it is an engineering truth.
Measuring Software Delivery Effectiveness
This section focuses on measuring the effectiveness of your software delivery process using the four key DORA metrics:
- Deployment Frequency: How often your organization releases code to production.
- Lead Time for Changes: The time it takes for a code change to go from commit to running in production.
- Change Failure Rate: The percentage of changes that result in a failure in production.
- Mean Time to Restore (MTTR): The average time it takes to restore service after a failure.
In financial services, these metrics carry additional weight because they map directly to operational risk indicators that regulators track. Deployment frequency reflects change management maturity. Lead time reflects organizational agility. Change failure rate reflects quality and testing adequacy. MTTR reflects operational resilience and incident response capability. When you present DORA metrics to a bank examiner, they understand the story the data tells -- even if they have never heard of DORA.
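To make the four metrics concrete, here is a minimal sketch of how they can be computed from a log of deployments and their outcomes. The record shape and field names are assumptions for illustration, not a standard schema:

```python
from datetime import datetime, timedelta

# Hypothetical deployment records: commit time, deploy time, whether the
# change failed in production, and (for failures) when service was restored.
deployments = [
    {"committed": datetime(2024, 3, 1, 9, 0), "deployed": datetime(2024, 3, 1, 11, 0),
     "failed": False, "restored": None},
    {"committed": datetime(2024, 3, 1, 10, 0), "deployed": datetime(2024, 3, 1, 14, 0),
     "failed": True, "restored": datetime(2024, 3, 1, 14, 30)},
    {"committed": datetime(2024, 3, 2, 9, 0), "deployed": datetime(2024, 3, 2, 10, 0),
     "failed": False, "restored": None},
]
days_observed = 2

# Deployment Frequency: deployments per day over the observation window.
deployment_frequency = len(deployments) / days_observed

# Lead Time for Changes: average time from commit to production.
lead_times = [d["deployed"] - d["committed"] for d in deployments]
avg_lead_time = sum(lead_times, timedelta()) / len(lead_times)

# Change Failure Rate: share of deployments that caused a production failure.
failures = [d for d in deployments if d["failed"]]
change_failure_rate = len(failures) / len(deployments)

# MTTR: average time from failure to restored service, over failed deployments.
restore_times = [d["restored"] - d["deployed"] for d in failures]
mttr = sum(restore_times, timedelta()) / len(restore_times)
```

In practice these numbers come from pipeline and incident-management tooling rather than hand-built lists, but the arithmetic is exactly this simple.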
Benefits
- Improved ability to deliver value to customers quickly
- Enhanced reliability and stability of the software
- Better alignment between development and operations teams
- Demonstrable compliance with regulatory expectations for change management
- Reduced operational risk through smaller, more frequent, and more reversible changes
- Data-driven investment decisions for engineering improvement initiatives
Detailed Explanations and Examples
Deployment Frequency
Deployment Frequency measures how often your organization releases code to production. High deployment frequency indicates a mature and efficient software delivery process. Examples of practices that can improve deployment frequency include:
- Continuous Integration (CI): Regularly integrating code changes and running automated tests to provide early feedback.
- Continuous Delivery (CD): Automating the build, test, and deployment process to enable frequent and reliable releases.
- Feature Toggles: Using feature toggles to deploy code changes without affecting end users.
- Trunk-Based Development: Working on short-lived branches or directly on trunk to reduce merge complexity and enable continuous integration.
Example: Continuous Integration (CI)
Continuous Integration (CI) means merging code changes into a shared mainline frequently, with automated builds and tests running on every merge to provide early feedback. This surfaces integration problems within minutes of their introduction rather than at the end of a release cycle.
At the bank, teams moved from long-lived feature branches (some of which lived for months) to trunk-based development with short-lived branches that merged within one to two days. The impact was immediate: integration conflicts dropped dramatically, and the feedback loop from commit to test result shortened from days to minutes. Kim, Humble, Debois, and Willis describe this pattern extensively in "The DevOps Handbook" -- trunk-based development is one of the strongest predictors of delivery performance in the DORA research.
Example: Feature Toggles in Banking
Feature toggles are particularly powerful in regulated environments because they decouple deployment from release. At the bank, teams deployed code for new features to production behind toggles, allowing them to test in the production environment with real infrastructure while limiting exposure. When the feature was validated, the toggle was enabled for a small percentage of users (canary release), monitored for issues, and gradually rolled out to full availability. If a problem was detected, the toggle was disabled -- no rollback, no redeployment, no change advisory board emergency meeting. The feature was simply turned off. This pattern reduced the risk of production incidents from new features by over seventy percent.
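The percentage-based rollout described above can be sketched as follows. The flag names and toggle store are invented for this example; a production system would use a feature-management platform such as LaunchDarkly rather than an in-process dictionary:

```python
import hashlib

# Illustrative toggle store: flag -> on/off state and rollout percentage.
TOGGLES = {"new_payments_ui": {"enabled": True, "rollout_percent": 5}}

def is_enabled(flag: str, user_id: str) -> bool:
    """Return True if this user falls inside the flag's rollout percentage."""
    toggle = TOGGLES.get(flag)
    if toggle is None or not toggle["enabled"]:
        return False
    # Hash flag + user id so each user lands in a stable bucket in [0, 100):
    # the same user always gets the same answer as the rollout widens.
    bucket = int(hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest(), 16) % 100
    return bucket < toggle["rollout_percent"]
```

Turning the feature off is a single state change (`enabled: False`) rather than a redeployment, which is what makes toggles a recovery mechanism and not just a release mechanism.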
Lead Time for Changes
Lead Time for Changes measures the time it takes for a code change to be implemented and deployed. Short lead times indicate an efficient development process. Examples of practices that can reduce lead time for changes include:
- Automated Testing: Running tests automatically to verify the correctness of code changes.
- Frequent Commits: Committing code changes frequently to detect issues early.
- Incremental Development: Developing features incrementally to reduce the risk of large changes.
- Pipeline Optimization: Identifying and eliminating bottlenecks in the delivery pipeline through systematic measurement.
Example: Automated Testing
Automated testing verifies the correctness of every code change without human intervention, turning verification from a multi-day manual phase into a step that runs on every commit. This shortens the feedback loop and directly reduces lead time for changes.
Humble and Farley's "Continuous Delivery" (2010) makes the case that automated testing is the single most important investment an organization can make in delivery performance. Manual testing is the largest contributor to lead time in most organizations. At the bank, manual regression testing for a single release consumed three to five days. By investing in automated test suites -- unit tests, integration tests, contract tests, and security tests -- the team reduced testing time to under thirty minutes and gained confidence that was previously unachievable through manual testing alone.
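As a minimal illustration of the kind of fast, automated check that replaces manual regression passes, consider a unit test for a business calculation. The interest function and its rules are invented for this sketch:

```python
from decimal import Decimal

def daily_interest(balance: Decimal, annual_rate: Decimal) -> Decimal:
    """Accrue one day of simple interest, rounded to the cent."""
    return (balance * annual_rate / Decimal(365)).quantize(Decimal("0.01"))

# Tests like these run in seconds on every commit, so a regression in the
# calculation is caught minutes after it is introduced, not days later.
def test_daily_interest_positive_balance():
    assert daily_interest(Decimal("36500.00"), Decimal("0.0365")) == Decimal("3.65")

def test_daily_interest_zero_balance():
    assert daily_interest(Decimal("0.00"), Decimal("0.05")) == Decimal("0.00")

test_daily_interest_positive_balance()
test_daily_interest_zero_balance()
```

Thousands of checks of this shape, run in parallel by the pipeline, are what compressed the bank's three-to-five-day manual regression cycle to under thirty minutes.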
Example: Pipeline Bottleneck Analysis
At the bank, every stage of the delivery pipeline was instrumented and a lead time decomposition dashboard was built. This dashboard showed, for each team, how much of their lead time was spent in coding, review, build, test, security scan, approval, and deployment stages. The data revealed patterns that contradicted team assumptions:
- Teams that blamed "slow security scans" discovered that security scanning accounted for seven percent of their lead time while code review queues accounted for thirty-five percent.
- Teams that blamed "too many approvals" discovered that their approval gates added less than two hours while their test suite took over an hour due to poorly written integration tests.
This data-driven approach to bottleneck identification is a core principle of the Theory of Constraints, which Gene Kim applied to IT operations in "The Phoenix Project" (2013). You cannot optimize what you cannot measure, and you should not optimize the wrong thing.
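The decomposition behind the dashboard is straightforward once each pipeline stage is timestamped. This sketch uses illustrative per-stage durations (the stage names and numbers are assumptions, loosely echoing the patterns above) to show how ranking stages by share of total lead time exposes the real bottleneck:

```python
# Illustrative average minutes a change spends in each pipeline stage.
stage_minutes = {
    "coding": 1200, "review_queue": 1400, "build": 40,
    "test": 300, "security_scan": 280, "approval": 110, "deploy": 20,
}

total = sum(stage_minutes.values())
shares = {stage: minutes / total for stage, minutes in stage_minutes.items()}

# Rank stages by share of total lead time; the top entry is where
# improvement effort pays off, regardless of what teams assume.
for stage, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{stage:>14}: {share:6.1%}")
```

With these numbers, the review queue dominates lead time even though engineers experience the security scan as the more visible delay, which is exactly the kind of mismatch the bank's dashboard surfaced.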
Change Failure Rate
Change Failure Rate measures the percentage of changes that result in a failure in production. Low change failure rates indicate a stable and reliable software delivery process. Examples of practices that can reduce change failure rates include:
- Automated Testing: Running tests automatically to verify the correctness of code changes.
- Code Reviews: Conducting code reviews to ensure code quality and reduce the risk of defects.
- Continuous Monitoring: Monitoring the system in real-time to detect and address issues quickly.
- Canary Deployments: Deploying to a small subset of production traffic before full rollout.
- Automated Rollback: Automatically reverting changes when health checks or anomaly detection indicate a problem.
Example: Code Reviews
Code reviews put a second set of eyes on every change, catching defects, design problems, and risky patterns before they reach production. Beyond reducing the change failure rate, mandatory review also satisfies the segregation-of-duties expectations discussed later in this section.
Example: Change Failure Categorization in Banking
At the bank, every change failure was categorized to understand root causes systematically:
- Functional defects (application bugs): 45% of failures. Addressed through improved test coverage requirements.
- Configuration errors (environment-specific settings): 25% of failures. Addressed through infrastructure as code and configuration management.
- Security vulnerabilities (detected post-deployment): 15% of failures. Addressed through enhanced SAST and DAST in the pipeline.
- Dependency failures (third-party service or library issues): 15% of failures. Addressed through contract testing and dependency scanning.
This categorization allowed the organization to invest in the highest-impact prevention strategies rather than applying generic "improve quality" guidance. Within one year, the overall change failure rate dropped from eleven percent to four percent, and the security-related failure rate dropped from three percent to under one percent.
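The categorization itself is a simple frequency count over tagged incident records. This sketch (the incident log and category names are illustrative, with proportions chosen to match the percentages above) shows the Pareto ranking that guided the bank's investment decisions:

```python
from collections import Counter

# Hypothetical incident log: each change failure tagged with a root-cause category.
incidents = (["functional"] * 9 + ["configuration"] * 5 +
             ["security"] * 3 + ["dependency"] * 3)

counts = Counter(incidents)
total = len(incidents)

# Rank categories by share of failures so prevention investment targets
# the largest sources first, rather than generic "improve quality" guidance.
ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
for category, n in ranked:
    print(f"{category:>13}: {n / total:5.1%}")
```

The value is not in the counting but in the discipline of tagging every failure at the time it occurs, so the ranking reflects real data rather than the loudest recent incident.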
Mean Time to Restore (MTTR)
Mean Time to Restore (MTTR) measures the average time it takes to restore service after a failure. Low MTTR indicates a resilient and responsive software delivery process. Examples of practices that can reduce MTTR include:
- Incident Response Plans: Developing and implementing incident response plans to guide the response to failures.
- Automated Rollbacks: Implementing automated rollback mechanisms to quickly revert to a previous stable state.
- Continuous Monitoring: Monitoring the system in real-time to detect and address issues quickly.
- Chaos Engineering: Proactively testing failure modes to build confidence in recovery procedures.
Example: Incident Response Plans
An incident response plan defines, in advance, who is paged, how severity is assessed, and what the escalation and communication steps are. Responders following a rehearsed plan restore service faster than responders improvising under pressure, which directly reduces MTTR.
Example: Automated Rollback in Core Banking
At the bank, automated rollback was not optional for core banking services -- it was a requirement. Every deployment of a Tier-1 service (payments, account management, transaction processing) included:
- Health check validation: Automated checks within sixty seconds of deployment that validated key business functions.
- Anomaly detection: Real-time comparison of error rates, latency, and business metrics against baseline.
- Automatic rollback trigger: If health checks failed or anomaly detection fired, the previous version was restored automatically within ninety seconds.
- Incident creation: A structured incident record was created automatically with deployment metadata, failure signals, and a link to the deployment diff for rapid diagnosis.
This automation reduced MTTR for deployment-related incidents from an average of four hours (under the previous manual process) to under three minutes. More importantly, it changed the psychology of deployment: teams stopped being afraid of deploying because they knew that failures would be caught and reversed automatically.
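The rollback trigger described above is, at its core, a comparison of post-deployment signals against a baseline. This is a sketch of that decision logic; the thresholds, metric names, and function shape are illustrative assumptions, not the bank's actual implementation:

```python
def should_roll_back(baseline: dict, current: dict,
                     max_error_ratio: float = 2.0,
                     max_latency_ratio: float = 1.5) -> bool:
    """Compare post-deployment metrics to baseline; True triggers rollback."""
    # Hard failure: any business-function health check failed.
    if not current["health_checks_passed"]:
        return True
    # Anomaly detection: error rate or latency degraded beyond tolerance.
    if current["error_rate"] > baseline["error_rate"] * max_error_ratio:
        return True
    if current["p99_latency_ms"] > baseline["p99_latency_ms"] * max_latency_ratio:
        return True
    return False

baseline = {"error_rate": 0.002, "p99_latency_ms": 180}
healthy = {"health_checks_passed": True, "error_rate": 0.003, "p99_latency_ms": 190}
degraded = {"health_checks_passed": True, "error_rate": 0.02, "p99_latency_ms": 185}
```

In a real pipeline this check runs on a timer after every deployment, and a `True` result invokes the platform's rollback mechanism and opens the structured incident record automatically.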
Software Delivery in Regulated Environments
Delivering software in regulated financial services requires practices that satisfy both engineering excellence and regulatory expectations. The key principles:
Auditability
Every change must be traceable from business requirement through code change, security review, test execution, and deployment. At the bank, the golden path pipeline generated a complete audit trail as a by-product of normal operation. No additional documentation was required from engineering teams. The audit trail included:
- The Git commit and pull request with reviewer approvals
- SAST, DAST, and SCA scan results with finding dispositions
- Automated test execution results
- Deployment metadata (who, when, what, where)
- Post-deployment health check results
This audit trail was stored immutably and indexed for rapid retrieval during examinations.
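One common way to make such a trail tamper-evident is hash chaining, where each entry incorporates the hash of its predecessor. The record fields here mirror the list above, but the chaining scheme itself is an illustrative assumption, not a description of the bank's storage:

```python
import hashlib
import json

def append_record(trail: list, record: dict) -> None:
    """Append a record whose hash covers both its content and its predecessor."""
    prev_hash = trail[-1]["hash"] if trail else "0" * 64
    payload = json.dumps(record, sort_keys=True)
    digest = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    trail.append({"record": record, "prev_hash": prev_hash, "hash": digest})

def verify(trail: list) -> bool:
    """Recompute the chain; any edited or reordered entry breaks verification."""
    prev_hash = "0" * 64
    for entry in trail:
        payload = json.dumps(entry["record"], sort_keys=True)
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != hashlib.sha256((prev_hash + payload).encode()).hexdigest():
            return False
        prev_hash = entry["hash"]
    return True

trail = []
append_record(trail, {"commit": "a1b2c3", "approvals": ["reviewer1", "reviewer2"]})
append_record(trail, {"deployed_by": "svc-deploy", "env": "production"})
```

Because verification is mechanical, an examiner (or an automated control) can confirm the trail's integrity without trusting the team that produced it.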
Segregation of Duties
Regulators require that the person who writes the code is not the same person who approves and deploys it. In a continuous delivery pipeline, this is enforced through:
- Pull request approval requirements (minimum two reviewers, at least one from outside the authoring team for Tier-1 services)
- Automated deployment triggered by merge, not manual action
- Deployment credentials managed through service accounts, not individual developer access
Change Management
Traditional change advisory boards (CABs) that meet weekly to approve changes are incompatible with continuous delivery. At the bank, the weekly CAB was replaced with a tiered change management model:
- Standard changes: Pre-approved change types (deployed through the golden path with all automated checks passing) required no human approval beyond code review.
- Normal changes: Changes that deviated from the golden path required lightweight approval from the service owner and a security reviewer.
- Emergency changes: Expedited process with post-deployment review, used for critical security patches and production incidents.
This model satisfied regulatory expectations while enabling teams to deploy multiple times per day. The key insight from "The DevOps Handbook" applies directly: standard changes should be the default, and the goal of continuous improvement is to convert normal changes into standard changes by automating the checks that make them safe.
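The tiered routing above amounts to a small classification rule evaluated for every change. The field names and rule ordering in this sketch are illustrative:

```python
def classify_change(change: dict) -> str:
    """Route a change into the tiered change-management model."""
    if change.get("emergency"):
        # Expedited path with mandatory post-deployment review.
        return "emergency"
    if change["golden_path"] and change["all_checks_passed"]:
        # Pre-approved: code review only, no additional human gate.
        return "standard"
    # Deviates from the golden path: needs service-owner
    # and security-reviewer approval.
    return "normal"
```

The continuous-improvement goal stated above translates directly into this function: automate enough checks that more and more changes satisfy the `standard` branch.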
Examples of Tools and Technologies for Software Delivery Performance
- Jenkins: An open-source automation server for continuous integration and continuous delivery (CI/CD).
- GitHub Actions: A CI/CD tool that allows you to automate workflows directly from your GitHub repository.
- Prometheus: An open-source monitoring and alerting toolkit.
- Grafana: An open-source platform for monitoring and observability, used to visualize metrics collected by Prometheus and other sources.
- ELK Stack: A set of tools for log aggregation and analysis, including Elasticsearch, Logstash, and Kibana.
- Argo CD: A declarative, GitOps continuous delivery tool for Kubernetes.
- Spinnaker: A multi-cloud continuous delivery platform for releasing software changes with high confidence.
- LaunchDarkly: A feature management platform for feature toggles and controlled rollouts.
References
- Humble, J. and Farley, D. Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation. Addison-Wesley, 2010.
- Kim, G., Behr, K., and Spafford, G. The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win. IT Revolution Press, 2013.
- Kim, G., Humble, J., Debois, P., and Willis, J. The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations. IT Revolution Press, 2016.
- Forsgren, N., Humble, J., and Kim, G. Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations. IT Revolution Press, 2018.
- DORA State of DevOps Report 2023. Google Cloud, 2023. Available at: https://dora.dev/research/2023/dora-report/
- DORA State of DevOps Report 2024. Google Cloud, 2024. Available at: https://dora.dev/research/2024/dora-report/
- APRA Prudential Standard CPS 234 -- Information Security. Australian Prudential Regulation Authority, July 2019. Available at: https://www.apra.gov.au/sites/default/files/cps_234_july_2019_for_public_release.pdf
- Goldratt, E. M. The Goal: A Process of Ongoing Improvement. North River Press, 1984. (Foundational work on the Theory of Constraints, applied to software delivery in "The Phoenix Project.")