Fast Flow

The Platform That Unlocked Everything

The single most impactful investment during the DevSecOps transformation at the Tier-1 bank was not a security tool, a scanning engine, or a compliance framework. It was a platform engineering team.

At the outset, every delivery team was building and maintaining its own CI/CD pipeline, its own deployment scripts, its own monitoring stack, and its own approach to security scanning. Teams spent thirty to forty percent of their capacity on undifferentiated infrastructure work -- the same problems solved differently (and often poorly) by every team in the organization. Cognitive load was crushing. Teams that should have been focused on building differentiated banking products were instead debugging Kubernetes networking, writing custom Terraform modules from scratch, and fighting with certificate management. Matthew Skelton and Manuel Pais describe this failure mode precisely in "Team Topologies" (2019): when every team must be an expert in every layer of the stack, cognitive load exceeds capacity, flow collapses, and delivery performance degrades. The solution is to organize teams around the flow of value and build platform capabilities that absorb the complexity that would otherwise be distributed across every team.

The bank built a platform engineering team with a simple mission: make it trivially easy for any delivery team to build, test, secure, deploy, and operate its services. The platform team did not write the applications -- they built the golden paths, the shared infrastructure, the self-service capabilities, and the observability stack that every team consumed. Within a year, delivery teams reduced their infrastructure overhead from thirty-five percent of capacity to under ten percent. That freed capacity flowed directly into product development and security improvement. Deployment frequency across the organization tripled. Lead time dropped by seventy percent. Don Reinertsen's "The Principles of Product Development Flow" (2009) establishes that flow efficiency is determined by the ratio of value-adding time to total time. Platform engineering is the discipline of minimizing the non-value-adding time.

Key Components

  • Continuous Delivery: Automating the build, test, and deployment process to enable frequent and reliable releases.
  • Working in Small Batches: Breaking down work into smaller, manageable units to reduce risk and improve flow.
  • Streamlining Change Approval: Implementing efficient change management processes to minimize delays.
  • Flexible Infrastructure: Utilizing infrastructure that can be easily provisioned and scaled to meet changing demands.
  • Team Topologies: Organizing teams around the flow of value, with clear interaction modes and deliberate reduction of cognitive load.
  • Platform Engineering: Building internal platforms that abstract away complexity and provide self-service capabilities to stream-aligned teams.

Team Topologies and Fast Flow

Skelton and Pais identify four fundamental team types that enable fast flow:

Stream-Aligned Teams

Stream-aligned teams are organized around a single valuable stream of work -- a product, a service, a set of features, or a user journey. They have end-to-end ownership and can deliver value independently without waiting for other teams.

At the bank, the organization moved from component-based teams (a "database team," a "middleware team," a "frontend team") to stream-aligned teams organized around banking capabilities: payments, account management, lending, fraud detection. Each stream-aligned team owned their services end-to-end, from code to production. This reorganization eliminated the handoff queues that had been the largest source of lead time waste. A change that previously required coordination across four teams now required coordination within a single team.

Platform Teams

Platform teams build and maintain the internal platform that stream-aligned teams consume. The platform reduces cognitive load by abstracting away infrastructure complexity and providing self-service capabilities.

The platform built at the bank included:

  • Golden path pipeline templates: Pre-configured CI/CD pipelines with security scanning, testing, and deployment built in.
  • Self-service infrastructure: Terraform modules and Kubernetes namespaces that teams could provision through a portal without filing tickets.
  • Observability stack: Centralized logging, metrics, tracing, and alerting that teams adopted by adding a single configuration file.
  • Secret management: HashiCorp Vault integration that teams consumed through a simple API without managing Vault infrastructure.
  • Compliance automation: Policy-as-code enforcement and automated evidence generation embedded in the platform.

The platform team treated stream-aligned teams as their customers. They ran product discovery, collected feedback, measured adoption, and iterated on their offerings. The platform was not mandated -- it was adopted because it was genuinely easier and better than the alternative.
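The compliance automation bullet above, policy-as-code enforcement, can be sketched as a set of declarative rules evaluated against a deployment manifest before anything ships. This is a simplified illustration in Python rather than the bank's actual OPA policies; the rule names, manifest fields, and registry address are all hypothetical.

```python
# Minimal policy-as-code sketch: each rule inspects a deployment manifest
# (a plain dict here) and returns a violation message or None.
# Rule names and manifest fields are illustrative, not the bank's real policies.

APPROVED_REGISTRY = "registry.internal.example"  # hypothetical hardened registry

def image_from_approved_registry(manifest):
    image = manifest.get("image", "")
    if not image.startswith(APPROVED_REGISTRY + "/"):
        return f"image '{image}' is not from the approved registry"
    return None

def no_privileged_containers(manifest):
    if manifest.get("privileged", False):
        return "privileged containers are not allowed"
    return None

def resource_limits_set(manifest):
    if "cpu_limit" not in manifest or "memory_limit" not in manifest:
        return "cpu and memory limits must be set"
    return None

RULES = [image_from_approved_registry, no_privileged_containers, resource_limits_set]

def evaluate(manifest):
    """Return the list of policy violations; an empty list means the deploy may proceed."""
    return [v for v in (rule(manifest) for rule in RULES) if v is not None]
```

Because the rules are code, they can be version-controlled, reviewed, and tested like any other artifact, and every evaluation can be logged as compliance evidence.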

Platform engineering has accelerated from an emerging practice to an industry standard. Gartner projects that by 2026, 80% of software engineering organizations will establish platform teams to provide reusable services and tools. The frontier is the convergence of platform engineering and AI: platform teams are beginning to define "agent golden paths" -- opinionated, guardrailed pathways for deploying and operating AI agents within organizational infrastructure -- mirroring the same golden path pattern used for traditional software delivery. This convergence means that the platform engineering investment at the bank is not just paying dividends for current delivery performance; it is building the foundation for AI-native development workflows.

Enabling Teams

Enabling teams help stream-aligned teams acquire new capabilities. They do not build the software for the stream-aligned teams; they coach, teach, and facilitate skill development.

At the bank, the security champions program functioned as a distributed enabling capability. Security engineers rotated through stream-aligned teams for two-week engagements, pairing with developers on threat modeling, secure coding practices, and security test development. After the engagement, the stream-aligned team had the capability to sustain the practice independently. This is the enabling team pattern from Team Topologies applied to security.

Complicated-Subsystem Teams

Complicated-subsystem teams own components that require deep specialist knowledge -- cryptographic services, regulatory calculation engines, or machine learning model serving infrastructure.

At the bank, the cryptographic services team was a complicated-subsystem team. They maintained the HSM integration, key management infrastructure, and encryption libraries that every stream-aligned team consumed. Stream-aligned teams did not need to understand the intricacies of key rotation or FIPS 140-2 compliance; they consumed the cryptographic services through well-documented APIs provided by the specialist team.

Detailed Explanations and Examples

Continuous Delivery

Continuous Delivery involves automating the build, test, and deployment process to enable frequent and reliable releases. Key practices include:

  • Automated Builds: Using tools like Jenkins or GitHub Actions to automate the build process.
  • Automated Testing: Running tests automatically to verify the correctness of code changes.
  • Frequent Deployments: Deploying code changes frequently to reduce the risk of integration problems.
  • Deployment Pipelines: Multi-stage pipelines that enforce quality and security gates automatically.

Humble and Farley's "Continuous Delivery" (2010) defines the deployment pipeline as the central artifact of continuous delivery: an automated manifestation of your process for getting software from version control into the hands of your users. Every organization has a process for this. The question is whether that process is automated, repeatable, and auditable -- or manual, error-prone, and opaque.

Example: Automated Builds

Automated builds use tools such as Jenkins or GitHub Actions to compile, package, and test every change without manual steps, so integration problems surface within minutes of a commit rather than at release time.

At the bank, the golden path pipeline executed the following stages automatically on every pull request:

  1. Compile and build: Source code compilation and artifact generation.
  2. Unit tests: Execution of the team's unit test suite with coverage enforcement.
  3. SAST scan: Static application security testing against OWASP Top 10 vulnerability classes.
  4. SCA scan: Software composition analysis for dependency vulnerabilities.
  5. Container image build: Docker image construction from hardened base images.
  6. Container scan: Image scanning against CIS Benchmarks.
  7. Integration tests: Execution against a transient test environment.
  8. Policy check: Open Policy Agent evaluation of deployment policies.

The entire pipeline completed in under twelve minutes for most services. Teams that built their own pipelines typically took thirty to forty-five minutes and missed several of these stages.
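The fail-fast ordering of those stages can be sketched as data: a list of named checks run in sequence, stopping at the first failure so a broken build never reaches the slower integration stages. The stage names follow the list above; the boolean check functions are stand-ins for real build, scan, and test steps.

```python
# Fail-fast pipeline sketch: stages run in order and the run stops at the
# first failing stage, mirroring the golden-path ordering described above.
# The lambda check functions are stand-ins for real build/scan/test steps.

def run_pipeline(stages):
    """stages: list of (name, check) pairs. Returns (passed, completed_stage_names)."""
    completed = []
    for name, check in stages:
        if not check():
            return False, completed  # fail fast: later stages never run
        completed.append(name)
    return True, completed

stages = [
    ("compile", lambda: True),
    ("unit-tests", lambda: True),
    ("sast-scan", lambda: True),
    ("sca-scan", lambda: False),   # simulate a dependency vulnerability
    ("image-build", lambda: True),
    ("image-scan", lambda: True),
    ("integration-tests", lambda: True),
    ("policy-check", lambda: True),
]
```

Ordering cheap checks before expensive ones is what keeps the feedback loop short: most failures are caught in the first few minutes, and the transient integration environment is only provisioned for changes that have already passed every static gate.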

Working in Small Batches

Working in Small Batches involves breaking down work into smaller, manageable units to reduce risk and improve flow. Key practices include:

  • Incremental Development: Developing features incrementally to reduce the risk of large changes.
  • Frequent Commits: Committing code changes frequently to detect issues early.
  • Continuous Integration: Integrating code changes frequently to ensure that they work well together.

Reinertsen's "The Principles of Product Development Flow" provides the mathematical foundation for why small batches outperform large batches. Transaction costs in software delivery (build time, test time, deployment time) have decreased dramatically due to automation, while holding costs (the cost of unreleased work, integration risk, and delayed feedback) remain high. When transaction costs are low and holding costs are high, the economically optimal batch size is small. This is not a philosophical preference -- it is a mathematical inevitability.
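Reinertsen's batch-size economics can be made concrete with the standard U-curve model: per-item cost is the transaction cost amortized over the batch plus a holding cost that grows with batch size, and the minimum sits at sqrt(2T/H) (the economic order quantity form). The cost figures below are illustrative, not drawn from the bank.

```python
import math

def cost_per_item(batch_size, transaction_cost, holding_cost):
    """Per-item cost: transaction cost amortized over the batch, plus
    holding cost that grows linearly with batch size (items wait longer)."""
    return transaction_cost / batch_size + holding_cost * batch_size / 2

def optimal_batch_size(transaction_cost, holding_cost):
    """Closed-form minimum of the U-curve: sqrt(2T / H)."""
    return math.sqrt(2 * transaction_cost / holding_cost)
```

As automation drives the transaction cost T down while the holding cost H stays high, sqrt(2T/H) shrinks: smaller batches become the economically optimal choice, which is the inevitability the text describes.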

Example: Incremental Development

Incremental development means delivering a feature as a sequence of small, independently testable changes rather than one large change. Each increment can be completed, reviewed, tested, and deployed on its own, so risk is paid down continuously instead of accumulating until a big-bang release.

At the bank, teams were coached to break down features into increments that could be merged and deployed independently within one to three days. A new payment feature that might have been specified as a single three-month project was decomposed into dozens of independently deployable increments: API endpoint scaffolding, database schema migration, business logic for each payment type, validation rules, error handling, monitoring instrumentation, and feature toggle configuration. Each increment delivered value (or at least reduced risk) independently, and the full feature was assembled incrementally in production behind a feature toggle.
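Assembling a feature incrementally in production relies on a toggle that keeps merged-but-incomplete code dormant until the full feature is ready. A minimal in-process toggle might look like the sketch below; real deployments typically use a managed flag service, and the flag name here is hypothetical.

```python
# Minimal feature-toggle sketch: increments are merged and deployed "dark",
# then enabled per flag once the full feature is assembled.
# The flag name "instant-payments-v2" is hypothetical.

class FeatureToggles:
    def __init__(self, flags=None):
        self._flags = dict(flags or {})

    def is_enabled(self, name):
        return self._flags.get(name, False)  # default off: dark until enabled

    def enable(self, name):
        self._flags[name] = True

toggles = FeatureToggles({"instant-payments-v2": False})

def route_payment(payment, toggles):
    """Route to the new path only when the toggle is on."""
    if toggles.is_enabled("instant-payments-v2"):
        return "v2"   # new, incrementally assembled path
    return "v1"       # existing behavior remains the default
```

Defaulting flags to off is the safety property: a partially assembled feature can live in production for weeks without affecting a single customer, and enabling it is a configuration change rather than a deployment.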

Streamlining Change Approval

Streamlining Change Approval involves implementing efficient change management processes to minimize delays. Key practices include:

  • Automated Approval Workflows: Using tools to automate the approval process for code changes.
  • Peer Reviews: Conducting peer reviews to ensure code quality and reduce the risk of defects.
  • Continuous Feedback: Providing continuous feedback to developers on code quality and security issues.
  • Standard Change Pre-Approval: Pre-approving change types that follow established, automated processes.

Example: Automated Approval Workflows

Automated approval workflows use CI/CD tooling such as GitHub Actions or Jenkins to gate changes on objective, machine-checkable criteria -- tests passing, scans clean, policies satisfied -- instead of manual sign-offs, removing the queue time that dominates traditional approval processes.

At the bank, the change approval process was the single largest source of lead time waste before the transformation. Changes required manual approval from a change advisory board that met weekly. If you missed the meeting, you waited another week. Emergency changes required phone calls to multiple approvers. This was replaced with the tiered model described in the software delivery section: standard changes (golden path, all checks passing) were pre-approved and deployed automatically, while normal and emergency changes followed streamlined, asynchronous approval workflows. This reduced change approval lead time from an average of five days to under two hours.
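The tiered model reduces to a small classification rule: a change that stays on the golden path with all automated checks green is a pre-approved standard change, and everything else falls to the normal (asynchronous review) or emergency tier. The sketch below is illustrative; the field names are invented, not the bank's actual change schema.

```python
def classify_change(change):
    """Classify a change into an approval tier.

    change: dict with illustrative fields, e.g.
      {"golden_path": True, "checks_passed": True, "emergency": False}
    Returns one of "standard", "normal", "emergency".
    """
    if change.get("emergency", False):
        return "emergency"   # streamlined, asynchronous human approval
    if change.get("golden_path", False) and change.get("checks_passed", False):
        return "standard"    # pre-approved: deploys automatically
    return "normal"          # asynchronous review before deployment
```

The key design choice is that the standard tier is earned, not granted: only changes produced by the fully automated, policy-checked golden path qualify, which is what makes pre-approval defensible to auditors.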

Flexible Infrastructure

Flexible Infrastructure involves utilizing infrastructure that can be easily provisioned and scaled to meet changing demands. Key practices include:

  • Infrastructure as Code (IaC): Managing infrastructure using code to ensure consistency and repeatability.
  • Containerization: Using containers to package and deploy applications consistently across different environments.
  • Scalability: Ensuring that infrastructure can be easily scaled to handle increased workloads.
  • Self-Service Provisioning: Enabling teams to provision the infrastructure they need without filing tickets or waiting for a central team.

Example: Infrastructure as Code (IaC)

Infrastructure as Code (IaC) manages and provisions infrastructure through version-controlled definitions. With tools such as Terraform or Ansible, the same code produces the same environment every time, and infrastructure changes are reviewed, tested, and audited like any other code.

At the bank, self-service infrastructure provisioning was a transformative capability. Before the platform team built it, provisioning a new environment required filing a ticket with the infrastructure team, waiting two to four weeks, and then discovering that the environment did not match the specification. After the platform was built, a developer could provision a complete, production-grade Kubernetes namespace with networking, logging, monitoring, secret management, and security policies applied -- in under five minutes, through a self-service portal. The infrastructure was defined in code, version-controlled, and reproducible. This capability alone reduced the average time to onboard a new microservice from six weeks to two days.
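A self-service portal of this kind is typically a thin layer that expands a short request into a fully specified, policy-compliant environment definition. The sketch below renders a namespace spec from a team's request; the defaults and field names are invented for illustration and are not the bank's actual module.

```python
# Self-service provisioning sketch: a short request expands into a complete,
# policy-compliant namespace spec with logging, monitoring, and secret
# management wired in by default. Field names and defaults are illustrative.

PLATFORM_DEFAULTS = {
    "logging": "central-loki",
    "metrics": "central-prometheus",
    "secrets": "vault-sidecar",
    "network_policy": "default-deny",
}

def render_namespace(team, service, cpu_quota="4", memory_quota="8Gi"):
    """Expand a minimal (team, service) request into a full namespace spec."""
    return {
        "name": f"{team}-{service}",
        "quotas": {"cpu": cpu_quota, "memory": memory_quota},
        **PLATFORM_DEFAULTS,  # every namespace gets the paved-road integrations
    }
```

Because teams can only vary the few fields the function exposes, every provisioned environment is guaranteed to carry the platform's security and observability defaults; the golden path is easier than the bespoke alternative precisely because those integrations arrive for free.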

Reducing Cognitive Load for Fast Flow

Skelton and Pais argue that cognitive load is the primary constraint on team effectiveness. When teams must understand too many things -- the domain they serve, the programming languages they use, the infrastructure they deploy to, the security controls they must implement, the compliance evidence they must generate -- they slow down, make mistakes, and burn out.

The platform engineering approach directly addresses cognitive load by absorbing complexity into the platform. At the bank, cognitive load was measured through team surveys and tracked as a leading indicator of delivery performance. Teams with high cognitive load scores had lower deployment frequency, higher change failure rates, and higher attrition. As platform adoption increased, cognitive load scores decreased and delivery metrics improved in lockstep.

The three types of cognitive load from Skelton and Pais map to specific platform investments:

  • Intrinsic cognitive load (the complexity inherent in the problem domain): Cannot be reduced by the platform, but can be managed through team boundaries that limit the scope of the domain each team owns.
  • Extraneous cognitive load (the complexity of the environment and tools): Directly reduced by platform capabilities that abstract away infrastructure, security, and compliance complexity.
  • Germane cognitive load (the complexity of learning and improving): Supported by enabling teams, documentation, and communities of practice.

The goal of platform engineering is to minimize extraneous cognitive load so that teams can devote maximum capacity to intrinsic and germane cognitive load -- the work that actually matters.

Benefits

  • Reduced lead time for changes
  • Increased deployment frequency
  • Improved ability to respond to customer needs
  • Reduced cognitive load on stream-aligned teams
  • Higher developer satisfaction and retention
  • More consistent security and compliance posture across the organization

Reduced Lead Time for Changes

Reducing the lead time for changes involves minimizing the time it takes to implement and deploy code changes. Key practices include:

  • Automated Testing: Running tests automatically to verify the correctness of code changes.
  • Continuous Integration: Integrating code changes frequently to detect issues early.
  • Frequent Deployments: Deploying code changes frequently to reduce the risk of integration problems.
  • Platform-Provided Capabilities: Consuming pre-built platform capabilities rather than building from scratch.

Example: Automated Testing

Automated testing runs the full test suite on every change through the CI system, so defects are found and fixed within minutes of being introduced rather than discovered during a slow manual verification cycle at the end of the release.

Increased Deployment Frequency

Increasing the deployment frequency involves deploying code changes frequently to reduce the risk of integration problems. Key practices include:

  • Automated Builds: Using tools like Jenkins or GitHub Actions to automate the build process.
  • Frequent Commits: Committing code changes frequently to detect issues early.
  • Continuous Delivery: Automating the build, test, and deployment process to enable frequent and reliable releases.

Example: Frequent Commits

Committing small changes frequently keeps each integration trivial: merge conflicts are rare and easy to resolve, and a failing build points to a handful of lines rather than weeks of accumulated work.

Improved Ability to Respond to Customer Needs

Improving the ability to respond to customer needs involves optimizing the software development and delivery process to enable frequent and reliable releases. Key practices include:

  • Continuous Delivery: Automating the build, test, and deployment process to enable frequent and reliable releases.
  • Working in Small Batches: Breaking down work into smaller, manageable units to reduce risk and improve flow.
  • Streamlining Change Approval: Implementing efficient change management processes to minimize delays.

Example: Continuous Delivery

With continuous delivery, every change that passes the automated pipeline is releasable. When a customer need or regulatory demand arises, the organization can ship the response as soon as the code is ready, rather than waiting for the next scheduled release window.

At the bank, fast flow translated directly to competitive advantage. When a regulatory change required all payment services to implement a new validation rule within thirty days, teams on the golden path delivered the change in under a week. Teams that had not adopted the platform required the full thirty days. The difference was not skill -- it was flow. Teams with fast flow had low lead times, automated testing, and deployment confidence. They could respond to urgent requirements quickly because their delivery infrastructure was not a bottleneck.

Tools and Technologies for Fast Flow

Several tools and technologies can help in implementing fast flow practices, including:

  • Jenkins: An open-source automation server for continuous integration and continuous delivery (CI/CD).
  • GitHub Actions: A CI/CD tool that allows you to automate workflows directly from your GitHub repository.
  • Terraform: An open-source tool for defining and provisioning infrastructure as code.
  • Ansible: An open-source automation tool for managing and provisioning infrastructure.
  • Docker: A platform for developing, shipping, and running applications in containers.
  • Kubernetes: An open-source platform for automating the deployment, scaling, and management of containerized applications.
  • Backstage: An open-source developer portal for building internal developer platforms (originally created by Spotify).
  • Crossplane: A Kubernetes-native platform for building and consuming infrastructure through declarative APIs.
  • Argo CD: A declarative, GitOps continuous delivery tool for Kubernetes.

Flow Metrics and Measurement

Beyond DORA metrics, Reinertsen's product development flow framework provides additional metrics for measuring and optimizing flow:

  • Flow Efficiency: The ratio of active work time to total lead time. In most organizations, flow efficiency is below fifteen percent -- meaning that work items spend over eighty-five percent of their time waiting in queues. At the bank, flow efficiency improved from twelve percent to thirty-eight percent through queue management, work-in-progress limits, and platform-provided automation.
  • Work in Progress (WIP): The number of items actively being worked on. Reinertsen demonstrates that WIP is the primary driver of lead time (per Little's Law). Reducing WIP reduces lead time proportionally. WIP limits were implemented at the team level and the organizational level.
  • Queue Length: The number of items waiting in queues between stages. Long queues indicate bottlenecks. Queue lengths were visualized in real-time dashboards and used to trigger process improvements.
  • Cycle Time Distribution: Rather than tracking average cycle time (which masks variability), the team tracked the full distribution and focused on reducing the tail -- the items that took disproportionately long to deliver.
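The relationships behind these metrics are easy to verify numerically: Little's Law gives average lead time as WIP divided by throughput, and flow efficiency is active time over total lead time. The figures below are illustrative, chosen to echo the percentages above rather than taken from the bank's raw data.

```python
def lead_time(wip, throughput):
    """Little's Law: average lead time = average WIP / average throughput."""
    return wip / throughput

def flow_efficiency(active_time, total_lead_time):
    """Fraction of lead time spent on value-adding work (the rest is queue wait)."""
    return active_time / total_lead_time

# Illustrative numbers: 30 items in flight, 2 completed per day.
before = lead_time(wip=30, throughput=2)  # 15-day average lead time
after = lead_time(wip=10, throughput=2)   # a WIP limit of 10 cuts it to 5 days
```

Note what the law implies: at constant throughput, the only way to cut lead time is to cut WIP, which is why WIP limits rather than "working faster" were the lever at the bank.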

References

  1. Skelton, M. and Pais, M. Team Topologies: Organizing Business and Technology Teams for Fast Flow. IT Revolution Press, 2019.

  2. Reinertsen, D. G. The Principles of Product Development Flow: Second Generation Lean Product Development. Celeritas Publishing, 2009.

  3. Humble, J. and Farley, D. Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation. Addison-Wesley, 2010.

  4. Forsgren, N., Humble, J., and Kim, G. Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations. IT Revolution Press, 2018.

  5. Kim, G., Humble, J., Debois, P., and Willis, J. The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations. IT Revolution Press, 2016.

  6. Kim, G., Behr, K., and Spafford, G. The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win. IT Revolution Press, 2013.

  7. DORA State of DevOps Report 2024. Google Cloud, 2024. Available at: https://dora.dev/research/2024/dora-report/

  8. Little, J. D. C. "A Proof for the Queuing Formula: L = λW." Operations Research, Vol. 9, No. 3, 1961, pp. 383-387. (Mathematical foundation for the relationship between WIP and lead time.)

  9. Gartner platform team adoption projection, as cited in "10 Platform Engineering Predictions for 2026," platformengineering.org. Available at: https://platformengineering.org/blog/10-platform-engineering-predictions-for-2026