AI-Powered SDLC Patterns

Overview

AI is not a bolt-on to the software development lifecycle. It is a force multiplier embedded into every phase, from the moment a requirement is conceived to the moment a feature is retired from production. The organisations that treat AI as a separate initiative — a "Copilot pilot" or a "GenAI sandbox" — are the same organisations that treated cloud as a separate initiative a decade ago. They will arrive late.

This document describes practical patterns for embedding AI capabilities across the full SDLC. It is not a vendor catalogue. It is a set of engineering patterns that have been tested in regulated environments where compliance, auditability, and rollback capability are non-negotiable. The patterns are organised by SDLC phase, but the most effective implementations blur those boundaries because AI-generated telemetry from production feeds directly into the next sprint's prioritisation.

The underlying principle: AI augments human judgment; it does not replace it. The best teams use AI to accelerate the routine and elevate the exceptional. They do not use it to outsource thinking. The six patterns below provide the implementation architecture; validation still comes from delivery outcomes, developer trust, and observable business impact.

June 2026 Research Update

The latest AI Research 2026 update changes the SDLC framing in four material ways.

First, agentic software is becoming a new engineering substrate, not just a faster coding assistant. Research such as Agentic Software (arXiv:2606.05608) and The Meta-Agent Challenge (arXiv:2606.04455) pushes teams to treat agent harnesses, tools, memory, evaluation suites, and approval flows as production software. The deliverable is no longer only the code diff; it is the controlled system that generated, tested, explained, and deployed the diff.

Second, SDLC quality now depends on repository-aware evaluation. Dialogue SWE-Bench (arXiv:2606.13995) measures whether coding agents can resolve ambiguity with a user, while CORE-Bench (arXiv:2606.11864) shows that retrieving the right files and functions from a real repository is a distinct capability from generic code search. This makes dialogue quality, retrieval quality, and trajectory quality first-class SDLC metrics.

Third, persistent memory is now an architectural concern. Agent Memory (arXiv:2606.06448) frames long-horizon memory as a stateful workload with measurable construction, retrieval, generation, latency, and cost trade-offs. For SDLC systems, this means the knowledge graph, ADR archive, incident history, and rejected AI suggestions need explicit write, retrieval, retention, and audit policies.

Fourth, production AI is becoming operations research. Papers on RTP-LLM (arXiv:2605.29639), speculative decoding latency (arXiv:2605.15051), and mathematically grounded LLM serving (arXiv:2605.01280) show that agent reliability depends on serving architecture: routing, batching, KV-cache management, and prefill/decode separation can decide whether multi-step workflows complete reliably. This matters because a slow or unstable model backend becomes a delivery risk, not merely an infrastructure cost.

Implementation Playbooks

The six patterns below are vendor-neutral. The implementation details are not. These companion playbooks translate the patterns into concrete operating models for the major agent ecosystems:

Claude Code and Anthropic - local codebase collaboration, skills, subagents, hooks, permissions, MCP, and managed agents.
Codex and OpenAI - Codex skills, AGENTS.md, MCP, non-interactive automation, GitHub Action workflows, SDKs, tracing, and trace grading.
AWS AI-DLC and Bedrock - AI-DLC operating rituals, Kiro, Amazon Q Developer, Bedrock Knowledge Bases, Agents, Guardrails, Strands, and AgentCore.
Hybrid implementation - a practical combination that uses Claude for exploration, Codex for repeatable repo automation, and AWS for governed enterprise runtime.

Pattern 1: AI-Driven Requirements & Specification

The Problem

Requirements drift, ambiguity, and context loss are the single largest sources of rework in software delivery. A business analyst writes a specification. A product owner interprets it. A developer implements their interpretation. A tester validates against a third interpretation. By the time the feature reaches production, the original intent has been refracted through multiple layers of human translation.

The Pattern

Structured natural language → AI-enriched specification → executable acceptance criteria

Capture intent in structured natural language. Use a consistent template: user story format, acceptance criteria, context (why now, who benefits, what changes if we do not ship this), and constraints (compliance, performance, dependencies).
AI enrichment layer. An LLM processes the structured input and:
- Detects ambiguity, contradictions, and missing edge cases
- Suggests additional acceptance criteria based on historical defect patterns
- Identifies cross-system impacts by querying the architecture knowledge graph
- Generates a risk heat map (regulatory, operational, reputational)
- Proposes rollback criteria: under what conditions do we revert?
Human review and refinement. The product owner and engineering lead review the AI-enriched specification. They accept, reject, or modify each suggestion. This review is logged as part of the audit trail.
Executable acceptance criteria. The enriched specification is translated into executable tests (Gherkin, Playwright, or domain-specific test frameworks) before a single line of implementation code is written.
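
The review step above can be sketched as a small data structure. This is a minimal illustration, not a prescribed schema: the class names, field names, and the `ready_for_backlog` rule are assumptions chosen to show how each AI suggestion can carry a named human decision for the audit trail.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Decision(Enum):
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    MODIFIED = "modified"


@dataclass
class Suggestion:
    """One AI-proposed change to a specification (hypothetical shape)."""
    suggestion_id: str
    text: str
    decision: Decision | None = None
    reviewer: str | None = None
    rationale: str = ""
    decided_at: datetime | None = None


@dataclass
class EnrichedSpec:
    story: str
    suggestions: list[Suggestion] = field(default_factory=list)

    def review(self, suggestion_id: str, decision: Decision,
               reviewer: str, rationale: str = "") -> Suggestion:
        """Record a human decision; raises KeyError for an unknown suggestion."""
        for s in self.suggestions:
            if s.suggestion_id == suggestion_id:
                s.decision = decision
                s.reviewer = reviewer
                s.rationale = rationale
                s.decided_at = datetime.now(timezone.utc)
                return s
        raise KeyError(suggestion_id)

    def ready_for_backlog(self) -> bool:
        """Ready only when every suggestion has a decision and a named reviewer."""
        return all(s.decision is not None and bool(s.reviewer)
                   for s in self.suggestions)
```

Keeping the reviewer and rationale on the suggestion itself means the audit trail is a byproduct of the review, not a separate form to fill in.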

Implementation Notes

Tooling: This pattern works with Jira + Confluence, Azure DevOps, or plain Markdown in Git. The key is not the tool; it is the template discipline.
Governance: In regulated environments, the AI enrichment step must be documented in the change record. Regulators do not care that you used an LLM; they care that you can demonstrate the requirement was reviewed by a human with accountability.
Dialogue quality: June 2026 coding-agent research makes ambiguity resolution measurable. A requirement is not "ready for AI" until the agent can ask bounded clarifying questions and the answers are captured as part of the spec history.
DORA Link: This pattern directly improves Lead Time for Changes by reducing ambiguity-driven rework. It also improves Change Failure Rate by surfacing edge cases before implementation.

Example

At a Tier-1 bank, a product owner writes: "As a customer, I want to transfer funds internationally so that I can pay overseas suppliers." The AI enrichment layer flags:

Missing: currency conversion timing (spot rate vs. forward rate?)
Missing: sanctions screening integration (which lists? OFAC, EU, UN?)
Missing: fee disclosure requirements (regulator mandates explicit fee display)
Historical pattern: 73% of international transfer defects involve rounding errors in intermediate currency conversions
Suggested acceptance criterion: "Given a GBP→AUD transfer, when the intermediary currency is USD, then the final AUD amount must match the manual calculation performed by the bank's treasury spreadsheet within 0.01%"

The product owner accepts three of four suggestions, rejects the fourth (it conflicts with an existing regulatory interpretation), and the enriched specification moves to the sprint backlog with executable acceptance criteria.
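
The suggested acceptance criterion can be expressed as an executable check. The sketch below is illustrative only: the exchange rates and the treasury reference figures are made-up values, and the function names are hypothetical. It shows the 0.01% tolerance comparison using `Decimal` so that the check itself does not introduce floating-point error.

```python
from decimal import Decimal


def convert_via_intermediary(amount: Decimal, rate_to_mid: Decimal,
                             rate_from_mid: Decimal) -> Decimal:
    """Convert source -> intermediary -> target, rounding only the final amount."""
    return (amount * rate_to_mid * rate_from_mid).quantize(Decimal("0.01"))


def within_tolerance(actual: Decimal, expected: Decimal,
                     tolerance_pct: Decimal = Decimal("0.01")) -> bool:
    """True if actual is within tolerance_pct percent of expected."""
    if expected == 0:
        return actual == 0
    diff_pct = abs(actual - expected) / abs(expected) * Decimal("100")
    return diff_pct <= tolerance_pct


# Hypothetical rates: GBP->USD 1.25, USD->AUD 1.50.
aud = convert_via_intermediary(Decimal("1000"), Decimal("1.25"), Decimal("1.50"))

# Hypothetical treasury reference values for the same transfer.
print(within_tolerance(aud, Decimal("1875.10")))  # inside the 0.01% band
print(within_tolerance(aud, Decimal("1876.00")))  # outside the 0.01% band
```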

Pattern 2: AI-Assisted Design & Architecture

The Problem

Architecture decisions are made by humans with incomplete information. They cannot hold the entire system topology, dependency graph, and change history in working memory. Consequently, designs often repeat past mistakes, miss cross-service impacts, or optimise for local simplicity at the cost of global complexity.

The Pattern

Architecture decision record + system knowledge graph → AI-assisted impact analysis → human-validated design

System knowledge graph. Maintain a machine-readable representation of the system: services, APIs, data stores, dependencies, ownership, SLAs, and known failure modes. This is not documentation; it is infrastructure. It lives in version control and is validated by CI.
Architecture decision record (ADR). Every significant design decision is captured in a lightweight ADR template: context, decision, consequences, and compliance implications.
AI-assisted impact analysis. Before committing to a design, an LLM queries the knowledge graph and:
- Identifies all services impacted by the proposed change
- Surfaces historical ADRs that faced similar trade-offs
- Detects SLA conflicts (e.g., "This new synchronous call chain adds 400ms to a path with a 300ms SLA")
- Suggests design alternatives with quantitative comparison (latency, cost, blast radius)
- Flags compliance hotspots (e.g., "This data flow crosses a jurisdictional boundary; check privacy requirements")
Human-validated design. The architect or tech lead reviews the AI analysis, selects a design, and documents the rationale. The ADR is linked to the knowledge graph, closing the feedback loop.
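
The "identify all services impacted" step is, at its simplest, a graph traversal. The sketch below assumes the knowledge graph can be reduced to a mapping from each service to the services it calls; the service names are hypothetical. It walks the edges in reverse to find every upstream caller of a changed service, with hop distance as a rough proxy for blast radius.

```python
from __future__ import annotations

from collections import deque

# Hypothetical dependency edges: service -> services it calls.
DEPENDS_ON = {
    "checkout-ui": ["payment-orchestrator"],
    "payment-orchestrator": ["fraud-scoring", "ledger"],
    "fraud-scoring": ["risk-cache"],
    "ledger": [],
    "risk-cache": [],
}


def impacted_services(changed: str, depends_on: dict[str, list[str]]) -> dict[str, int]:
    """Return every direct or indirect caller of `changed` with its hop distance."""
    # Invert the edges so we can walk from a service to its callers.
    callers: dict[str, list[str]] = {name: [] for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            callers.setdefault(dep, []).append(service)

    # Breadth-first search gives the shortest hop count to each caller.
    distance = {changed: 0}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, []):
            if caller not in distance:
                distance[caller] = distance[current] + 1
                queue.append(caller)

    del distance[changed]  # the changed service is not its own impact
    return distance


print(impacted_services("fraud-scoring", DEPENDS_ON))
```

An LLM adds value on top of this by explaining the impact and recalling relevant ADRs; the traversal itself should stay deterministic so the result is reproducible in an audit.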

Implementation Notes

Knowledge graph representation: C4 models, Backstage catalog YAML, or a simple structured Markdown file per service. The format matters less than the discipline of keeping it current.
Memory policy: Treat the knowledge graph as persistent agent memory. Store confirmed facts, rejected assumptions, superseded ADRs, and incident-derived lessons separately so retrieval can distinguish current truth from historical context.
Not a replacement for architecture review: AI-assisted analysis accelerates preparation for architecture review boards. It does not eliminate them. In regulated environments, the review board remains the approval gate.
DORA Link: Improves Lead Time for Changes by reducing design iteration cycles. Improves Change Failure Rate by surfacing blast radius before commit.

Example

A team proposes adding a real-time fraud scoring service to the payment flow. The AI queries the knowledge graph and reports:

"The synchronous integration point is the payment orchestrator, which has a 250ms P99 latency SLA. The fraud service's P99 is 180ms. You have 70ms of headroom, but only if the fraud service is already warm. Cold start adds 400ms."
"Historical ADR #47 (2023) attempted a similar synchronous integration and was rolled back after breaching the SLA during a marketing campaign. The rollback triggered a 45-minute outage."
"Alternative: async scoring with synchronous fallback to cached risk rating. Adds complexity but preserves SLA."

The architect selects the async-with-fallback pattern, documents the rationale, and the team proceeds with a design that has been informed by historical failure, not just optimistic estimation.
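
The headroom arithmetic in this example is simple enough to encode as a check. The function below is a deliberately naive sketch with a hypothetical name: it reproduces the example's subtraction and ignores everything else on the path, which a real latency budget would have to include.

```python
def latency_headroom_ms(sla_ms: int, dependency_p99_ms: int, cold_start_ms: int = 0) -> int:
    """Remaining budget after one synchronous call; a negative value means an SLA breach."""
    return sla_ms - (dependency_p99_ms + cold_start_ms)


# Figures from the example: 250ms SLA, 180ms fraud-service P99, 400ms cold start.
warm = latency_headroom_ms(250, 180)
cold = latency_headroom_ms(250, 180, cold_start_ms=400)
print(f"warm headroom: {warm}ms, cold headroom: {cold}ms")
```

With a warm service the budget holds by 70ms; a single cold start overshoots it by 330ms, which is why the synchronous design is fragile.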

Pattern 3: AI-Powered Coding & Review

The Problem

Code review is a bottleneck. In large organisations, review latency averages 24–48 hours. Junior developers wait for senior attention. Senior developers context-switch between domains they do not own. And even attentive reviewers miss issues: security vulnerabilities, performance anti-patterns, and compliance violations that are obvious in retrospect.

The Pattern

AI-first code generation → AI-assisted review → human sign-off → continuous learning loop

AI-first generation. Developers use AI coding assistants (Copilot, Cursor, Codex, or self-hosted models) to generate boilerplate, refactor code, and explore implementation approaches. The assistant is configured with the team's style guide, architectural constraints, and security rules.
Pre-commit AI review. Before a human reviewer sees the code, an automated pipeline runs:
- Static analysis (SonarQube, Semgrep, CodeQL)
- AI-powered review (custom rules: "Does this change respect the ADR?" "Does this query have an index?" "Is this input validated before use?")
- Security scan (SAST, dependency vulnerability check, secrets detection)
- Compliance check (e.g., "Does this change touch a regulated data flow? If so, is the privacy impact documented?")
Human review with AI context. The human reviewer sees the code alongside the AI analysis. They focus on:
- Architectural fit (does this align with the ADR?)
- Business logic correctness (does this actually solve the problem?)
- Edge cases the AI missed
- Whether the AI-generated tests are meaningful, not just present
Continuous learning. Approved PRs and their AI analyses are fed back into the model context. Rejected AI suggestions are tagged with human rationale. Over time, the AI review becomes more accurate for this specific codebase.
Trajectory inspection. For high-risk changes, review the agent's path, not only the final diff. Which files did it inspect? Which tests did it run? Which assumptions did it make? Agent Trajectories as Programs (arXiv:2606.16988) makes this a measurable signal: two agents can produce similar patches while taking very different, risk-relevant routes.

Implementation Notes

Custom rules over generic suggestions: Generic AI review tools suggest lint-level fixes. Effective teams build custom rule sets that encode their architectural constraints, security posture, and compliance requirements.
The "explain your AI" requirement: In regulated environments, AI-generated code must be accompanied by a human explanation. Not because the AI is untrusted, but because accountability requires a human who can explain why the code works.
Repository retrieval is a gate: If the AI cannot identify the right local files, contracts, owners, tests, and ADRs, it should not be trusted to modify the code. Treat retrieval misses as review findings, not harmless context gaps.
DORA Link: Dramatically improves Lead Time for Changes by reducing review latency. Improves Change Failure Rate by catching security and compliance issues before merge.

Example

A developer asks the AI assistant: "Generate a Python function that accepts a customer ID and returns their last 10 transactions." The AI generates code with a raw SQL query embedded in the function. The pre-commit AI review flags:

"Line 12: Raw SQL with string interpolation. Use parameterized queries per security guideline SEC-003."
"Line 8: No input validation on customer_id. Accepts negative integers and strings, which will cause a database error."
"Line 15: No pagination. If a customer has 10,000 transactions, this returns all of them."
"Compliance note: This query touches the transactions table, which contains PII. Ensure the privacy impact assessment is linked in the PR description."

The developer fixes all four issues, the human reviewer focuses on whether the business logic correctly handles joint account transactions (an edge case the AI did not know about), and the PR ships in 2 hours instead of 2 days.
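
A corrected version of the function might look like the sketch below. It uses Python's standard `sqlite3` module purely for illustration; the column names, the `MAX_PAGE_SIZE` bound, and the function name are assumptions, and a production system would use its own database driver and schema.

```python
from __future__ import annotations

import sqlite3

MAX_PAGE_SIZE = 100  # hypothetical upper bound on rows per call


def last_transactions(conn: sqlite3.Connection, customer_id: int,
                      limit: int = 10) -> list[tuple]:
    """Return the customer's most recent transactions, newest first."""
    # Validate input before it reaches the database (bool is excluded explicitly
    # because it is a subclass of int in Python).
    if isinstance(customer_id, bool) or not isinstance(customer_id, int) or customer_id <= 0:
        raise ValueError("customer_id must be a positive integer")
    if not 1 <= limit <= MAX_PAGE_SIZE:
        raise ValueError(f"limit must be between 1 and {MAX_PAGE_SIZE}")

    # Placeholders keep user input out of the SQL text (no string interpolation),
    # and LIMIT bounds the result set regardless of account history size.
    cursor = conn.execute(
        "SELECT id, amount, created_at FROM transactions "
        "WHERE customer_id = ? ORDER BY created_at DESC, id DESC LIMIT ?",
        (customer_id, limit),
    )
    return cursor.fetchall()
```

The compliance note has no code fix: linking the privacy impact assessment remains a human action in the PR description.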

Pattern 4: AI-Augmented Testing & Quality

The Problem

Test coverage is a vanity metric. Teams chase 80% line coverage while their most critical paths — the ones that process payments, move customer data, or enforce compliance — remain undertested. Exploratory testing is valuable but unscalable. Regression testing takes hours, so teams run it overnight and discover failures the next morning.

The Pattern

Risk-based test prioritisation → AI-generated test cases → AI-driven failure prediction → continuous quality telemetry

Risk-based prioritisation. The system knowledge graph identifies high-risk paths: high transaction volume, recent changes, known failure modes, and regulated data flows. These paths get the most test attention, not the paths that are easiest to test.
AI-generated test cases. Based on the enriched specification (Pattern 1) and the code changes, the AI generates:
- Unit tests for edge cases humans typically miss (null inputs, boundary values, concurrency)
- Integration tests for cross-service interactions
- Contract tests for API consumers
- Chaos tests that simulate dependency failures
AI-driven failure prediction. Before running the full regression suite, a model predicts which tests are most likely to fail based on:
- Which files changed
- Historical correlation between file changes and test failures
- Complexity metrics (cyclomatic complexity, dependency depth)
The highest-risk tests run first. If they pass, confidence is high. If they fail, the team knows immediately.
Continuous quality telemetry. Test results, coverage data, and defect reports are fed into a quality dashboard that the AI monitors. Trends are detected before they become crises: "Test flakiness in the payments module has increased 40% over the last 3 sprints. Root cause: a race condition in the new async fraud scoring integration."
Dialogue-aware validation. When a test fails because the requirement was ambiguous, the agent should open a clarifying loop rather than silently overfitting the test. Dialogue SWE-Bench points toward a more realistic quality model: good SDLC agents know when to ask instead of pretending the spec is complete.
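
Failure prediction does not have to start with a trained model. The sketch below uses only the historical correlation between file changes and test failures, one of the signals listed above. The scoring rule (the highest observed failure frequency across the changed files) and all names are assumptions for illustration; a real predictor would likely combine more features.

```python
from collections import defaultdict


def failure_scores(history, changed_files):
    """Score tests by how often they failed when a changed file was touched.

    history: iterable of (files_changed, tests_failed) pairs from past CI runs.
    Returns {test_name: score}, where score is the highest observed
    fraction of runs touching a changed file in which the test failed.
    """
    touched = defaultdict(int)    # file -> number of runs that changed it
    co_failed = defaultdict(int)  # (file, test) -> runs where both occurred
    for files, failed in history:
        for f in files:
            touched[f] += 1
            for t in failed:
                co_failed[(f, t)] += 1

    scores = defaultdict(float)
    for (f, t), count in co_failed.items():
        if f in changed_files:
            scores[t] = max(scores[t], count / touched[f])
    return dict(scores)


def prioritise(tests, scores):
    """Order tests so the most failure-prone run first; unknown tests score 0."""
    return sorted(tests, key=lambda t: scores.get(t, 0.0), reverse=True)


# Hypothetical CI history.
history = [
    ({"rounding.py"}, {"test_round"}),
    ({"rounding.py"}, set()),
    ({"rounding.py", "fx.py"}, {"test_round", "test_fx"}),
    ({"ui.py"}, {"test_ui"}),
]
scores = failure_scores(history, changed_files={"rounding.py"})
print(prioritise(["test_ui", "test_fx", "test_round"], scores))
```

Because the ordering only changes what runs first, a wrong prediction costs time, not safety, provided the full suite still runs before release.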

Implementation Notes

Generated tests are not free: They must be reviewed for meaningfulness. A test that asserts assertTrue(true) achieves coverage but provides no value. Teams need "meaningful coverage" metrics, not just line coverage.
Failure prediction is probabilistic: It reduces test runtime but does not eliminate the need for full regression before production releases. Use it for developer feedback loops, not as a gate.
Test the retrieval path: Add checks for whether the agent included the right fixtures, contracts, migrations, and owners in its reasoning context. Poor retrieval creates confident but irrelevant tests.
DORA Link: Improves Lead Time for Changes by reducing test runtime. Improves Change Failure Rate by ensuring high-risk paths are thoroughly tested.

Example

A team changes the currency rounding logic in the payment orchestrator. The AI test generator creates:

Unit tests for rounding at 0.005 boundaries (the classic banker's rounding problem)
Integration tests for multi-currency transactions (GBP→USD→AUD)
Chaos tests for the scenario where the exchange rate service returns stale data

The failure prediction model flags these tests as high-risk because:

The payments module has the highest defect density in the codebase
67% of past rounding changes introduced regressions
The exchange rate service had a flaky test in the last 2 sprints

The tests run first. Two fail: one reveals a rounding discrepancy at 0.005, and one reveals that the chaos test's stale-data fallback does not log the incident (a compliance gap). Both are fixed before merge.
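
The 0.005 boundary is where rounding modes disagree, which is why generated tests should target it. The sketch below compares two of the rounding modes in Python's standard `decimal` module on exact ties. Which mode the payment orchestrator should use is not stated here and is left as an assumption; the point is that the mode must be explicit and tested.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

CENT = Decimal("0.01")


def round_money(amount: Decimal, rounding: str = ROUND_HALF_EVEN) -> Decimal:
    """Round to two decimal places using an explicit rounding mode."""
    return amount.quantize(CENT, rounding=rounding)


# Boundary cases where the third decimal place is exactly 5.
# Values are built from strings so no binary floating-point error creeps in.
for raw in ("0.005", "0.015", "2.665", "2.675"):
    value = Decimal(raw)
    print(raw,
          "half-even:", round_money(value, ROUND_HALF_EVEN),
          "half-up:", round_money(value, ROUND_HALF_UP))
```

Half-even (banker's rounding) sends ties to the nearest even digit, so 0.005 becomes 0.00 and 2.665 becomes 2.66, while half-up gives 0.01 and 2.67. A change that silently swaps one mode for the other passes most tests and fails only at these ties.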

Pattern 5: AI-Enabled Deployment & Operations

The Problem

Deployment is still the most dangerous moment in software delivery. Even with CI/CD pipelines, teams deploy with incomplete confidence. They do not know whether the new version will handle production load, whether it will interact correctly with downstream systems, or whether it will fail in ways that the pre-production environment cannot simulate.

The Pattern

Progressive delivery with AI-assisted rollback → AI-driven anomaly detection → automated incident response

Progressive delivery. Deploy to 1% of traffic, then 5%, then 25%, then 100%. At each stage, the AI monitors:
- Error rate (baseline vs. canary)
- Latency distribution (P50, P95, P99)
- Business metrics (conversion rate, transaction completion rate)
- Resource utilisation (CPU, memory, connection pool saturation)
AI-assisted rollback. If any metric deviates beyond a learned threshold, the AI recommends rollback with a confidence score and a human-readable explanation: "Error rate in the payments service increased 300% in the canary region. Root cause: the new version assumes a database index that does not exist in production (it was added in staging but not promoted). Confidence of rollback correctness: 94%."
AI-driven anomaly detection. In production, the AI continuously models normal behaviour for every service. It detects anomalies that rule-based alerting misses:
- Latency drift that is not yet an outage but indicates capacity exhaustion
- Error rate patterns that correlate with specific customer segments
- Resource utilisation trends that predict failure 30 minutes before it happens
Automated incident response. For known failure modes, the AI triggers runbooks automatically:
- Scale up the affected service
- Route traffic around a degraded dependency
- Page the on-call engineer with a pre-populated incident summary
AI-serving health as a deployment signal. If the deployment path relies on AI-generated analysis, then model-serving health is part of release readiness. Monitor queue depth, prefill latency, decode latency, KV-cache eviction, context-cache hit rate, and fallback model activation. A canary decision made by a degraded inference stack is itself suspect.

Implementation Notes

Rollback is a feature, not a failure: Teams that treat rollback as routine deploy more confidently. Teams that treat rollback as a last resort deploy cautiously and slowly. The AI-assisted rollback pattern normalises the former.
Anomaly detection requires baselines: The AI needs 2–4 weeks of production telemetry to learn normal behaviour. During this period, it operates in "observe and report" mode, not "act" mode.
Inference operations matter: Recent LLM-serving research shows that batching, speculative decoding, cache pressure, and request routing materially affect reliability. Put the AI platform on the same SLO footing as any production dependency.
DORA Link: Improves Deployment Frequency by making deployments safer. Improves Mean Time to Recovery (MTTR) by automating detection and response.

Example

A team deploys a new version of the customer onboarding service. The canary deployment at 5% traffic shows:

Error rate: 0.2% (baseline: 0.1%)
P99 latency: 1.2s (baseline: 800ms)
Conversion rate: 3.1% (baseline: 3.8%)

The AI flags the conversion rate drop as the primary concern. It queries the knowledge graph and finds: "The new version adds a third-party identity verification step. Historical data: this step has a 15% drop-off rate." The AI recommends rollback with the explanation: "The identity verification step is causing customer abandonment. Rollback preserves conversion rate while the product team redesigns the flow."

The team rolls back in 3 minutes. The incident is logged. The product team receives the AI-generated analysis and prioritises a redesign of the verification flow.
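
The canary comparison in this example can be sketched as a relative-regression check. The tolerances below are hypothetical fixed values standing in for the learned thresholds the pattern describes, and the names are illustrative. Breaches are ranked by how far they exceed their own tolerance, which is one simple way to surface a primary concern.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    baseline: float        # assumed to be greater than zero
    canary: float
    higher_is_better: bool
    max_regression: float  # hypothetical tolerance, 0.05 means 5% relative


def regression(m: Metric) -> float:
    """Relative change in the harmful direction; zero or less means no regression."""
    change = (m.canary - m.baseline) / m.baseline
    return -change if m.higher_is_better else change


def canary_recommendation(metrics: list[Metric]) -> tuple[str, list[tuple[str, float]]]:
    """Return a recommendation plus breaches, worst first, as (name, severity).

    Severity is the regression divided by its tolerance, so 2.0 means
    the metric regressed twice as much as allowed.
    """
    breaches = [(m.name, regression(m) / m.max_regression)
                for m in metrics if regression(m) > m.max_regression]
    breaches.sort(key=lambda item: item[1], reverse=True)
    return ("rollback" if breaches else "promote", breaches)


# Metrics from the onboarding example, with assumed tolerances.
onboarding = [
    Metric("error_rate_pct", 0.1, 0.2, higher_is_better=False, max_regression=0.50),
    Metric("p99_latency_ms", 800, 1200, higher_is_better=False, max_regression=0.25),
    Metric("conversion_rate_pct", 3.8, 3.1, higher_is_better=True, max_regression=0.05),
]
recommendation, breaches = canary_recommendation(onboarding)
print(recommendation, [name for name, _ in breaches])
```

The output is a recommendation, not an action: the promote, hold, or rollback decision stays with the on-call engineer.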

Pattern 6: AI-Driven Monitoring & Feedback Loops

The Problem

Monitoring generates data, not insight. Dashboards show metrics, but they do not tell teams what to do next. Post-incident reviews produce action items, but they rarely feed back into the prioritisation process. The learning loop is broken.

The Pattern

AI-synthesised operational insights → automatic ticket generation → sprint prioritisation integration

AI-synthesised insights. The AI monitors production telemetry, support tickets, and customer feedback channels. It synthesises weekly operational reports that include:
- Top 3 reliability risks ranked by business impact
- Customer pain points correlated with system behaviour
- Technical debt hotspots that are starting to slow delivery
- Emerging failure modes that do not yet have alerts
Automatic ticket generation. High-priority insights are automatically converted into tickets in the backlog:
- "Add circuit breaker to the exchange rate service (reliability risk #1)"
- "Refactor the payment orchestrator's retry logic (33% of timeout errors trace to this code)"
- "Update the onboarding flow to handle the identity verification drop-off (conversion impact: $2.1M annually)"
Sprint prioritisation integration. The AI-generated tickets are presented to the product owner and engineering lead at sprint planning. They are not automatically prioritised; they are automatically surfaced. The human team decides what to work on, informed by data rather than anecdote.
Memory consolidation. The AI distils incidents, rejected suggestions, rollback decisions, and human rationales into persistent memory cards linked to services, ADRs, tests, and runbooks. These cards are candidates for the knowledge graph; humans approve promotion from "observed lesson" to "trusted operating rule."
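
A memory card can be modelled as a small record with an explicit promotion step. The field names and the `is_usable` rule below are assumptions, sketched to show how provenance, ownership, expiry, and supersession might be enforced in code rather than by convention.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from enum import Enum


class Status(Enum):
    OBSERVED = "observed lesson"
    TRUSTED = "trusted operating rule"


@dataclass
class MemoryCard:
    card_id: str
    lesson: str
    source: str                      # provenance, e.g. an incident or PR reference
    owner: str
    expires_on: date
    status: Status = Status.OBSERVED
    approved_by: str | None = None
    superseded_by: str | None = None

    def promote(self, approver: str) -> None:
        """A human approves promotion from observed lesson to trusted rule."""
        if not approver:
            raise ValueError("promotion requires a named approver")
        self.status = Status.TRUSTED
        self.approved_by = approver

    def is_usable(self, today: date) -> bool:
        """Only trusted, unexpired, non-superseded cards may inform decisions."""
        return (self.status is Status.TRUSTED
                and self.superseded_by is None
                and today <= self.expires_on)
```

Retrieval code can then filter on `is_usable`, so stale or unreviewed cards stay available as history without being treated as current truth.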

Implementation Notes

The "so what?" filter: Raw AI insights are overwhelming. Effective teams configure the AI to answer "So what?" for every insight. Not "Latency increased 12%" but "Latency increased 12%, which correlates with a 3% drop in conversion rate, which costs $X per month."
Feedback to the model: When a team acts on an AI-generated ticket, the outcome is fed back. When they ignore one, the rationale is captured. This closes the learning loop.
Memory is governed: Long-horizon memory improves continuity, but stale or unreviewed memory creates hidden coupling. Every durable memory item needs provenance, owner, expiry, and supersession rules.
DORA Link: Improves MTTR by ensuring operational learning is actioned, not just documented. Improves Lead Time for Changes by prioritising technical debt that slows delivery.

Example

At the end of a sprint, the AI generates the operational report:

"Reliability risk #1: The fraud scoring service has had 12 latency spikes >500ms in the last 14 days. Each spike correlates with a 0.5% drop in payment completion rate. Recommended action: Add a circuit breaker with a 300ms timeout and fallback to cached risk rating."
"Customer pain point: 23% of support tickets mention 'slow login.' Root cause: the new MFA flow adds 4 seconds to authentication. Recommended action: Async MFA with session token caching."
"Technical debt hotspot: The legacy account service has the highest change failure rate (18%) and the longest lead time (14 days). Recommended action: Incremental strangler fig migration, starting with the read path."

The product owner and engineering lead review the recommendations. They prioritise the circuit breaker (high impact, low effort) and the MFA caching (medium impact, medium effort). They defer the account service migration to the next quarter because it requires regulatory approval.

The Team Operating Model — Collaboration Rituals

The six patterns describe what AI does in each phase. They do not describe how a team organises around the AI's output — and without that, the human-accountability principle quietly collapses. An enriched specification that one person rubber-stamps at 5pm on a Friday is not reviewed; it is laundered. The operating model exists to make the human judgement step visible, deliberate, and shared.

This framework deliberately layers rituals onto an existing sprint cadence rather than replacing it. Methodologies that dissolve sprints into continuous "mob everything" sessions assume co-located teams with spare attention. Regulated enterprises rarely have either. So the default here is async-first review, with synchronous escalation triggered by blast radius or ambiguity — not the other way around.

The Rituals

Ritual	Pattern	Trigger	Participants	Time-box	Output
Spec Review	1	AI enrichment completes on a new story	PO, eng lead, tester, domain SME	30 min	Accepted/rejected suggestions, logged
Design Council	2	Change blast radius crosses a set threshold	Eng lead, affected service owners	45 min	Approved ADR + impact sign-off
Review Pairing	3	AI pre-review report is attached to a PR	Author + one reviewer	Async default	Human review focused on business logic
Quality Triage	4	AI generates risk-based tests + failure predictions	Author, tester	Async default	Prioritised test set, accepted coverage
Deployment Watch	5	Canary enters progressive rollout	On-call engineer	Live	Promote / hold / rollback decision
Reliability Review	6	Weekly operational synthesis	PO, eng lead, on-call	30 min	Backlog tickets, ranked by business impact

Operating Principles

Every AI output has a named human owner. The audit trail records who accepted what, not merely that "AI was used." If a reviewer cannot answer the four questions in the Explain Your AI principle below, the output does not ship.
Synchronous only when it earns its cost. A one-line dependency bump does not need a Design Council; a new cross-border payment flow does. The blast-radius score from the knowledge graph decides which path a change takes — not the calendar.
The AI sets the agenda, humans set the decision. In every ritual, the AI's output — flagged ambiguities, impact analysis, failure predictions — is the agenda. This keeps reviews concrete and short, and stops them degrading into opinion theatre.
Rituals are reversible. Each ritual produces an artefact that can be reverted: a rejected suggestion, a held deployment, a re-opened ticket. Reversibility is the entire point of keeping humans in the loop.
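
The async-first rule with synchronous escalation reduces to a small routing function. The threshold value and the function name below are hypothetical; each organisation would calibrate its own blast-radius scale.

```python
def review_path(blast_radius: int, has_open_ambiguity: bool,
                sync_threshold: int = 3) -> str:
    """Route a change to async review unless blast radius or ambiguity escalates it."""
    if blast_radius >= sync_threshold or has_open_ambiguity:
        return "synchronous"
    return "async"


print(review_path(blast_radius=0, has_open_ambiguity=False))  # dependency bump
print(review_path(blast_radius=5, has_open_ambiguity=False))  # cross-service change
```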

Bootstrapping the Knowledge Graph from a Legacy Estate

Patterns 1, 2, and 4 lean on an architecture knowledge graph — a machine-readable model of services, APIs, data stores, dependencies, ownership, SLAs, and failure modes. Greenfield teams build it as they go. Everyone else inherits a legacy estate with no such map. This section describes how to bootstrap one without a year-long discovery project.

The guiding rule: a thin graph that exists beats a perfect graph that is still being designed. You do not need every layer before the patterns deliver value — Pattern 3 (coding and review) needs no graph at all, while Patterns 1, 2, and 4 improve monotonically as the graph fills in.

Build It in Layers

Layer	Source of Truth	How AI Accelerates It	Effort
Services	CI/CD pipelines, repo inventory, IaC	Clusters repos into services and proposes boundaries	Low
Dependencies	Distributed traces, APM, service mesh	Infers call graphs from telemetry; flags undocumented edges	Low–Med
Contracts	OpenAPI/Protobuf specs, schema registries	Generates missing specs from handler code; diffs against runtime	Med
Data stores	Connection config, IaC, query logs	Maps services to stores; classifies regulated data flows	Med
Ownership & SLAs	On-call rosters, runbooks, team charters	Reconciles code ownership with on-call reality	Low
Failure modes	Incident history, postmortems, alert configs	Mines incidents for recurring failure signatures	Med–High
Decisions (ADRs)	Git history, PR discussions, incident reviews	Reconstructs implicit decisions from commit + incident context	High

The Brownfield Workflow

Auto-discover, then validate. Point the AI at the repos, IaC, and traces and let it propose the first three layers (services, dependencies, contracts). A human validates — auto-discovery is a draft, not a source of truth.
Backfill decisions from history. The richest signal in a legacy estate is its incident record. The AI mines postmortems and git history to reconstruct the architectural decisions that were never written down, surfacing them as candidate ADRs for confirmation.
Start descriptive, become prescriptive. For the first few weeks the graph only describes the estate. Once it has two to four weeks of validated telemetry, it can begin advising — surfacing blast radius and SLA conflicts in the Design Council. This mirrors the observe-then-act discipline used for anomaly detection in Pattern 5.
Keep it alive in CI. A graph updated by hand rots within a sprint. A CI hook updates the relevant nodes on every merge, so the graph stays a byproduct of delivery rather than a side project.
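
A CI check on the graph can be very small. The sketch below assumes each service entry has already been loaded into a dictionary (from Backstage YAML, Markdown front matter, or similar) and validates two things: required fields are present, and every dependency points at a known service. The field names are assumptions, not a standard.

```python
from __future__ import annotations

REQUIRED_FIELDS = ("owner", "sla_ms", "depends_on")  # hypothetical minimum schema


def validate_graph(services: dict[str, dict]) -> list[str]:
    """Return human-readable errors; an empty list means the graph is consistent."""
    errors = []
    for name, entry in sorted(services.items()):
        for field_name in REQUIRED_FIELDS:
            if field_name not in entry:
                errors.append(f"{name}: missing '{field_name}'")
        for dep in entry.get("depends_on", []):
            if dep not in services:
                errors.append(f"{name}: depends on unknown service '{dep}'")
    return errors


# Hypothetical graph with two deliberate defects.
graph = {
    "payment-orchestrator": {"owner": "payments", "sla_ms": 250,
                             "depends_on": ["fraud-scoring", "ledger"]},
    "fraud-scoring": {"owner": "risk", "depends_on": []},
}
for problem in validate_graph(graph):
    print(problem)
```

In a pipeline, a non-empty error list would fail the build, which is what keeps the graph a byproduct of delivery.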

Storage

The graph does not need a specialised database to start. Structured Markdown per service (committed to the docs site), a Backstage catalog, or C4 models in version control are all sufficient for the patterns in this document. Reach for a dedicated graph store only when query complexity — multi-hop blast-radius analysis across hundreds of services — actually demands it.

Implementation Maturity Model

Teams do not adopt all patterns at once. The maturity model provides a progression path:

Level	Focus	Patterns	Team Size	Timeframe
Ad hoc	Individual productivity	Pattern 3 (coding assistant)	1–2 developers	Immediate
Team	Review and testing	Patterns 3 + 4 (AI review + test generation)	3–8 developers	1–2 sprints
Squad	Specification and design	Patterns 1 + 2 + 3 + 4	Full squad	1–2 quarters
Program	Deployment and operations	Patterns 1–5	Multiple squads	2–4 quarters
Enterprise	Feedback loops and governance	All 6 patterns + custom governance	Organisation-wide	1–2 years

The trap: teams stay at the Ad hoc level forever because it feels productive. The real value is in the cross-phase patterns (1, 2, 5, 6) that connect the SDLC into a continuous learning system.

Governance & Risk Considerations

The Governance Framework

AI-powered SDLC does not eliminate governance. It changes the nature of governance from "checklists at gates" to "continuous assurance embedded in the workflow."

Risk	Mitigation
AI hallucinations in specifications	Human review gate. The AI enriches; the human approves.
AI-generated security vulnerabilities	Pre-commit security scan + human security review for high-risk changes.
Compliance gaps	AI compliance checks against regulatory rulesets. Human validation for regulated data flows.
Model drift	Monthly review of AI suggestion acceptance rates. If the rate drops, retrain or replace the model.
Accountability ambiguity	Every AI-enriched output is tagged with model version, prompt version, and human reviewer.
Over-reliance on AI	Mandatory "explain without AI" sessions for junior developers. If they cannot explain the code without the AI, they do not understand it.
Harmful overthinking	Cap reasoning budgets by task class. Longer chain-of-thought is not automatically safer or more accurate.
Memory poisoning or staleness	Require provenance, owner, expiry, and supersession policy for durable agent memory.
Agent reward hacking	Evaluate the trajectory and tool-use pattern, not only the final answer or patch.
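Two of the mitigations above (accountability tagging and memory provenance) are essentially record schemas. The sketch below shows one possible shape. The class and field names are assumptions chosen to mirror the table, not an established format.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class AiOutputTag:
    """Audit tag attached to an AI-enriched output (hypothetical schema)."""
    model_version: str
    prompt_version: str
    human_reviewer: Optional[str] = None

    def is_accountable(self) -> bool:
        # An output without a named reviewer has no accountable human yet.
        return bool(self.human_reviewer and self.human_reviewer.strip())


@dataclass(frozen=True)
class MemoryRecord:
    """Durable agent memory entry with provenance, owner, expiry, and supersession."""
    content: str
    provenance: str
    owner: str
    expires_on: date
    superseded_by: Optional[str] = None

    def is_usable(self, today: date) -> bool:
        # Stale or superseded entries must not be retrieved into agent context.
        return self.superseded_by is None and today <= self.expires_on
```

A merge gate could refuse any artifact whose tag fails `is_accountable()`, and a retrieval layer could filter memory through `is_usable()` before building agent context.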

The "Explain Your AI" Principle

In regulated environments, the following principle applies: Every AI-generated output that affects production must be explainable by a human who is accountable for it.

This does not mean the human must have written the code. It means the human must be able to explain:

What the code does
Why it is correct
What could go wrong
How they would fix it if it failed

If a developer cannot answer these four questions about AI-generated code, the code does not ship.

The "Right-Size Reasoning" Principle

Reasoning models are powerful because they can spend more compute on harder problems. But the June 2026 safety literature makes the opposite point equally important: more thinking can degrade answers when the model reasons past a correct conclusion. SDLC agents should therefore use tiered reasoning budgets:

Task Class	Reasoning Budget	Human Gate
Formatting, dependency bumps, lint fixes	Low	Normal PR review
Business logic, data migration, API contracts	Medium	Owner review + tests
Regulated data, security controls, production rollback	High but bounded	Named accountable approver
Incident response automation	High with trace capture	On-call approval until mature

The rule is simple: raise reasoning depth when the cost of a wrong answer is high, but capture the trace and stop once the decision has enough evidence.
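The table can be encoded as a lookup that an agent harness consults before starting a task. This is a sketch under stated assumptions: the task-class keys and the token caps are hypothetical numbers for illustration, and trace capture for the regulated tier is inferred from the rule above rather than stated in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReasoningPolicy:
    budget: str
    max_thinking_tokens: int  # hypothetical cap; tune per model and task
    human_gate: str
    capture_trace: bool


POLICIES = {
    "routine": ReasoningPolicy("low", 1_000, "Normal PR review", False),
    "core": ReasoningPolicy("medium", 8_000, "Owner review + tests", False),
    "regulated": ReasoningPolicy("high but bounded", 32_000, "Named accountable approver", True),
    "incident": ReasoningPolicy("high with trace capture", 32_000, "On-call approval until mature", True),
}


def policy_for(task_class: str) -> ReasoningPolicy:
    """Look up the policy; unknown task classes fall back to the strictest human gate."""
    return POLICIES.get(task_class, POLICIES["regulated"])
```

Defaulting unknown classes to the regulated tier is a deliberate choice in this sketch: a misclassified task gets more scrutiny, not less.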

Measuring Impact — DORA Metrics Alignment

The AI-powered SDLC patterns are not theoretical. They are measured by the same DORA metrics that engineering leaders already track:

DORA Metric	Patterns That Impact It	Expected Improvement
Deployment Frequency	Patterns 3, 4, 5 (faster coding, testing, safer deployment)	2–5x increase
Lead Time for Changes	Patterns 1, 2, 3, 4 (less rework, faster review, faster testing)	30–60% reduction
Change Failure Rate	Patterns 1, 2, 4, 5 (better specs, better design, better testing, safer deployment)	50–80% reduction
Mean Time to Recovery	Patterns 5, 6 (automated rollback, anomaly detection, operational feedback)	60–90% reduction

Use the six SDLC patterns as the implementation architecture, and use DORA's AI capabilities research as the validation frame: stronger policies, better platforms, faster feedback loops, user-centricity, and healthy team adoption should show up in the outcome data. Track the delivery metrics before and after AI pattern adoption, then add a small set of guardrail measures:

Deployment rework rate: how often AI-assisted changes need follow-up correction after deployment.
Stability: whether AI-assisted changes preserve service reliability, incident volume, and operational confidence.
Developer trust: whether engineers understand, review, and confidently maintain AI-assisted work.
Business impact: whether the changes improve customer, risk, revenue, cost, or compliance outcomes.

The measurement discipline: do not claim improvement without baseline data, and do not use AI activity as the target. Token volume, prompt counts, generated lines, and agent run counts are diagnostic signals at best. They are not substitutes for delivery performance, stability, trust, and business value. Regulators and executives respect outcomes; they distrust anecdotes and vanity metrics.
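Capturing a baseline needs very little tooling. The sketch below computes three of the delivery metrics from a list of deployment records so that the same function can be run before and after adoption. The `Deployment` record shape and the sample values are assumptions for illustration, not data from any real team.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass(frozen=True)
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    failed: bool


def dora_summary(deployments, period_days):
    """Summarise deployment frequency, lead time, and change failure rate for one period."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    lead_hours = [
        (d.deployed_at - d.committed_at).total_seconds() / 3600 for d in deployments
    ]
    return {
        "deploys_per_week": len(deployments) / (period_days / 7),
        "median_lead_time_hours": median(lead_hours),
        "change_failure_rate": sum(d.failed for d in deployments) / len(deployments),
    }


def compare(baseline, current):
    """Return current minus baseline for every metric in the baseline summary."""
    return {name: current[name] - baseline[name] for name in baseline}


# Illustrative records only: four deployments in a two-week window.
start = datetime(2026, 1, 5, 9, 0)
sample = [
    Deployment(start, start + timedelta(hours=h), failed=(h == 8)) for h in (2, 4, 6, 8)
]
baseline_summary = dora_summary(sample, period_days=14)
```

Running `dora_summary` on the pre-adoption window and again on a later window, then passing both to `compare`, gives the before-and-after deltas that the guardrail measures are read against.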

Research References

This page applies the June 2026 research synthesis from AI Research 2026 to the software delivery lifecycle:

Agentic Software: How AI Agents Are Restructuring the Software Paradigm (arXiv:2606.05608) — agent harnesses, memory, tools, and evaluation become the software substrate.
The Meta-Agent Challenge (arXiv:2606.04455) — autonomous agent-building remains high variance and requires reward-hacking controls.
Dialogue SWE-Bench (arXiv:2606.13995) — coding-agent evaluation must include clarification and user dialogue.
CORE-Bench (arXiv:2606.11864) — repository-local retrieval is a distinct capability from generic code search.
Agent Trajectories as Programs (arXiv:2606.16988) — agent paths can be inspected and compared, not only final patches.
Thinking Past the Answer (arXiv:2606.02835) — unbounded reasoning can harm correctness; use task-specific thinking budgets.
Reasoning Structure of Large Language Models (arXiv:2606.03883) — reasoning traces can be converted into claim/dependency graphs for auditability.
Agent Memory (arXiv:2606.06448) — long-horizon memory has measurable cost, latency, retrieval, and quality trade-offs.
RTP-LLM (arXiv:2605.29639), Speculative Decoding Latency (arXiv:2605.15051), and LLM Serving Needs Mathematical Optimization (arXiv:2605.01280) — serving architecture is part of SDLC reliability.
Autonomous Incident Resolution at Hyperscale (arXiv:2606.09122) — runbook-backed multi-agent incident response is moving from prototype to production operations.

Conclusion

AI-powered SDLC is not about replacing developers with machines. It is about giving developers machines that handle the routine so they can focus on the exceptional. The patterns in this document have been tested in environments where failure costs millions and regulatory scrutiny is constant. They work because they treat AI as an engineering capability, not a magic wand.

The organisations that will lead the next decade of software delivery are not the ones with the biggest AI budgets. They are the ones with the discipline to embed AI into every phase of the SDLC, with governance guardrails, with measurable outcomes, and with humans who remain accountable for every line of code that reaches production.

Tools come and go. Practices endure.

GitHub Actions Implementation

Each pattern is implemented as a standalone GitHub Actions workflow. They can be adopted individually or chained into a continuous pipeline. The workflows are designed to work with self-hosted Ollama (local or remote) for AI inference, but gracefully degrade if the AI service is unavailable.

Workflow Files

Pattern	Workflow File	Trigger
1. Requirements	`.github/workflows/ai-sdlc-requirements.yml`	Issue / PR opened or edited
2. Design	`.github/workflows/ai-sdlc-design.yml`	PR touching ADRs, architecture, or contracts
3. Code Review	`.github/workflows/ai-sdlc-code-review.yml`	Every PR push
4. Testing	`.github/workflows/ai-sdlc-testing.yml`	Every PR push
5. Deployment	`.github/workflows/ai-sdlc-deploy.yml`	Push to `main`
6. Monitoring	`.github/workflows/ai-sdlc-monitor.yml`	Weekly cron + manual dispatch

Architecture: How the Workflows Integrate

The diagram below shows how the six workflows connect across the CI/CD pipeline, with feedback loops from production back to the backlog.

Design-to-Production Flow

Workflow Highlights

1. Requirements Enrichment (`ai-sdlc-requirements.yml`)

Trigger: Issue or PR opened/edited
Action: Extracts description, sends to Ollama for enrichment
Output: Comment with ambiguity flags, missing edge cases, compliance notes, acceptance criteria
Fallback: Gracefully skips if Ollama is unreachable

2. Design Impact Analysis (`ai-sdlc-design.yml`)

Trigger: PR touching ADRs, architecture docs, or service contracts
Action: Builds context from changed files + related ADRs, queries knowledge graph
Output: Comment with blast radius, SLA conflicts, historical ADR relevance, design alternatives

3. Code Review (`ai-sdlc-code-review.yml`)

Trigger: Every PR push
Action: Generates diff, sends to Ollama for security, architecture, performance, compliance review
Output: Pre-commit AI review comment with severity-ranked checklist
Human role: Focus on business logic correctness and edge cases the AI missed

4. Test Generation (`ai-sdlc-testing.yml`)

Trigger: Every PR push
Action: Identifies changed source files, generates unit/integration/contract/chaos tests per file
Output: Comment with generated test cases + heuristic failure prediction
Heuristic: Flags high-risk paths (payment, auth, fraud) for prioritised testing
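The path heuristic is simple enough to express as a single pattern match. The sketch below mirrors the keyword list used by the workflow's `grep` step; the function name and return shape are assumptions. Because it is a substring match, `auth` also matches names such as `author`, so treat the result as a prompt for closer testing rather than a verdict.

```python
import re

# Keywords taken from the workflow's grep pattern; matching is case-insensitive.
HIGH_RISK = re.compile(r"(payment|auth|fraud|transfer)", re.IGNORECASE)


def classify_paths(paths):
    """Return ("high", matching_paths) if any changed path looks high-risk, else ("standard", [])."""
    flagged = [p for p in paths if HIGH_RISK.search(p)]
    return ("high", flagged) if flagged else ("standard", [])
```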

5. Progressive Deploy (`ai-sdlc-deploy.yml`)

Trigger: Push to main
Action: Canary at 1%, health check, AI-assisted canary analysis, progressive ramp
Output: Automated rollback if AI verdict is ROLLBACK (based on error rate, latency, conversion metrics)
Stages: 1% → 5% → 25% → 100%
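An AI verdict is easier to trust when it is paired with a deterministic check that still works if the model is unreachable. The sketch below combines the two and always takes the more conservative answer. The thresholds (`max_error_delta`, `max_p99_ratio`) and the metric keys are hypothetical values for illustration, not recommendations.

```python
SEVERITY = {"PROCEED": 0, "HOLD": 1, "ROLLBACK": 2}


def guardrail_verdict(canary, baseline, max_error_delta=0.001, max_p99_ratio=1.25):
    """Deterministic check that does not depend on the model being reachable."""
    if canary["error_rate"] - baseline["error_rate"] > max_error_delta:
        return "ROLLBACK"
    if canary["p99_ms"] / baseline["p99_ms"] > max_p99_ratio:
        return "HOLD"
    return "PROCEED"


def final_verdict(ai_verdict, guardrail):
    """Take the more conservative of the two; a missing AI verdict counts as HOLD."""
    ai = ai_verdict if ai_verdict in SEVERITY else "HOLD"
    return max(ai, guardrail, key=SEVERITY.__getitem__)
```

With this shape the AI can make a rollout stricter but can never relax the deterministic guardrail, which keeps the accountable human in control of the thresholds.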

6. Operational Intelligence (`ai-sdlc-monitor.yml`)

Trigger: Weekly cron (Monday 09:00 UTC) + manual dispatch
Action: Gathers operational snapshot, synthesises prioritised action list
Output: Auto-creates up to 3 GitHub issues with labels ai-ops and intelligence
Feedback loop: Tickets feed back into the backlog for sprint prioritisation
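Before model output becomes backlog tickets it helps to parse it strictly and drop anything that does not fit the requested format. The sketch below assumes the `ACTION | IMPACT | EFFORT | OWNER` line format that the monitoring workflow asks the model to produce; the function name and the dictionary shape are assumptions.

```python
import re

# One checklist line in the format requested by the monitoring prompt.
ITEM = re.compile(
    r"^- \[ \] ACTION:\s*(?P<action>[^|]+)\|\s*IMPACT:\s*(?P<impact>[^|]+)"
    r"\|\s*EFFORT:\s*(?P<effort>[SML])\s*\|\s*OWNER:\s*(?P<owner>.+)$"
)


def parse_actions(text, limit=3):
    """Return up to `limit` well-formed action items; malformed lines are ignored."""
    items = []
    for line in text.splitlines():
        match = ITEM.match(line.strip())
        if match:
            items.append({key: value.strip() for key, value in match.groupdict().items()})
    return items[:limit]
```

Rejecting malformed lines means a rambling or partial model response creates fewer tickets rather than badly titled ones.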

Complete Workflow Definitions

Below are the full workflow definitions. Steps marked as placeholders (build, deploy, rollback, canary metrics, operational data) must be replaced with real commands before use.

1. Requirements Enrichment — ai-sdlc-requirements.yml

# Pattern 1: AI-Driven Requirements & Specification
# Triggers when an issue or PR is opened/updated to enrich specifications
# with AI-generated acceptance criteria, risk flags, and edge cases.
name: "AI-SDLC :: Requirements Enrichment"

on:
  issues:
    types: [opened, edited]
  pull_request:
    types: [opened, edited]
    paths:
      - "docs/adr/**"
      - "docs/specs/**"
      - "*.md"

permissions:
  contents: read
  issues: write
  pull-requests: write

env:
  OLLAMA_HOST: ${{ secrets.OLLAMA_HOST || 'http://localhost:11434' }}

jobs:
  enrich-specification:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Extract issue/PR description
        id: extract
        # Pass untrusted text through env vars. Interpolating it into the
        # script directly allows shell injection and breaks on single quotes.
        env:
          BODY: ${{ github.event.issue.body || github.event.pull_request.body }}
          NUMBER: ${{ github.event.issue.number || github.event.pull_request.number }}
        run: |
          {
            echo "body<<SPEC_BODY_EOF"
            printf '%s\n' "$BODY"
            echo "SPEC_BODY_EOF"
            echo "number=$NUMBER"
          } >> "$GITHUB_OUTPUT"

      - name: AI enrichment via Ollama (local or remote)
        id: ai
        env:
          # Untrusted text is passed as data, never pasted into the script.
          REQUIREMENT: ${{ steps.extract.outputs.body }}
        run: |
          # Heredoc lines are indented with the rest of the script so the YAML
          # block scalar stays valid; YAML strips the indent before bash runs.
          PROMPT=$(cat <<PROMPT_EOF
          You are an AI specification enrichment engine. Given the following requirement,
          detect ambiguity, missing edge cases, compliance gaps, and generate executable
          acceptance criteria. Respond in structured markdown with sections:
          - Ambiguity Flags
          - Missing Edge Cases
          - Compliance Notes
          - Suggested Acceptance Criteria
          - Risk Heat Map (low/medium/high)

          Requirement:
          $REQUIREMENT
          PROMPT_EOF
          )

          # jq -e exits non-zero on empty input or a null .response, so the
          # fallback fires when Ollama is unreachable or returns an error.
          RESPONSE=$(curl -s -X POST "${OLLAMA_HOST}/api/generate" \
            -d "{\"model\":\"gemini-3-flash-preview:cloud\",\"prompt\":$(echo "$PROMPT" | jq -Rs .),\"stream\":false}" \
            --max-time 60 2>/dev/null | jq -er '.response') || RESPONSE="⚠️ Ollama unreachable. Local enrichment skipped."

          {
            echo "response<<AI_RESPONSE_EOF"
            echo "$RESPONSE"
            echo "AI_RESPONSE_EOF"
          } >> "$GITHUB_OUTPUT"

      - name: Post enrichment as comment
        uses: actions/github-script@v7
        env:
          AI_RESPONSE: ${{ steps.ai.outputs.response }}
        with:
          script: |
            // Read the model output from the environment. Interpolating it into
            // the script breaks on backticks and allows script injection.
            const body = `## 🤖 AI Specification Enrichment\n\n${process.env.AI_RESPONSE}\n\n---\n*Generated by \`ai-sdlc-requirements.yml\` • Model: gemini-3-flash-preview:cloud*`;
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body
            });

2. Design Impact Analysis — ai-sdlc-design.yml

# Pattern 2: AI-Assisted Design & Architecture
# Triggers on PRs that modify ADRs, architecture docs, or service contracts.
# Uses the system knowledge graph to surface blast radius, SLA conflicts,
# and historical ADRs that faced similar trade-offs.
name: "AI-SDLC :: Design Impact Analysis"

on:
  pull_request:
    types: [opened, synchronize]
    paths:
      - "docs/adr/**"
      - "docs/architecture/**"
      - "src/**/*.proto"
      - "openapi/**"
      - "terraform/**"

permissions:
  contents: read
  pull-requests: write

env:
  OLLAMA_HOST: ${{ secrets.OLLAMA_HOST || 'http://localhost:11434' }}

jobs:
  design-impact-analysis:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout (full history for ADR search)
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Build knowledge graph context
        id: context
        run: |
          echo "📋 Changed files:"
          git diff --name-only origin/${{ github.base_ref }}...HEAD | tee changed_files.txt

          echo "📚 Related ADRs (by keyword overlap):"
          for file in $(cat changed_files.txt); do
            basename=$(basename "$file" .md)
            grep -ril "$basename" docs/adr/ 2>/dev/null || true
          done | sort -u | tee related_adrs.txt

          echo "📊 Service dependencies (from C4 / catalog):"
          if [ -f "catalog-info.yaml" ]; then
            yq eval '.spec.dependsOn[]' catalog-info.yaml 2>/dev/null || echo "No catalog dependencies found"
          fi

          {
            echo "files<<EOF"
            cat changed_files.txt
            echo "EOF"
            echo "adrs<<EOF"
            cat related_adrs.txt
            echo "EOF"
          } >> $GITHUB_OUTPUT

      - name: AI impact analysis
        id: ai
        run: |
          # changed_files.txt and related_adrs.txt were written by the previous
          # step and persist in the workspace for the rest of the job.
          # Heredoc lines are indented with the script so the YAML stays valid.
          cat > prompt.txt <<PROMPT_EOF
          You are an architecture impact analysis engine. Given the changed files and
          related ADRs below, produce a structured impact report with:
          - Blast Radius (services / APIs / data stores impacted)
          - SLA Conflict Detection (latency, throughput, availability)
          - Historical ADR Relevance (past decisions with similar trade-offs)
          - Compliance Hotspots (cross-jurisdiction data flow, PII exposure)
          - Design Alternatives (quantitative comparison if possible)
          - Recommended Rollback Criteria

          Changed Files:
          $(cat changed_files.txt)

          Related ADRs:
          $(cat related_adrs.txt)
          PROMPT_EOF

          # jq -e exits non-zero on empty input or a null .response, so the
          # fallback fires when Ollama is unreachable or returns an error.
          RESPONSE=$(curl -s -X POST "${OLLAMA_HOST}/api/generate" \
            -d "{\"model\":\"gemini-3-flash-preview:cloud\",\"prompt\":$(cat prompt.txt | jq -Rs .),\"stream\":false}" \
            --max-time 60 2>/dev/null | jq -er '.response') || RESPONSE="⚠️ Ollama unreachable. Design analysis skipped."

          {
            echo "response<<AI_RESPONSE_EOF"
            echo "$RESPONSE"
            echo "AI_RESPONSE_EOF"
          } >> "$GITHUB_OUTPUT"

      - name: Comment PR with design analysis
        uses: actions/github-script@v7
        env:
          AI_RESPONSE: ${{ steps.ai.outputs.response }}
        with:
          script: |
            // Read the model output from the environment. Interpolating it into
            // the script breaks on backticks and allows script injection.
            const body = `## 🏗️ AI Design Impact Analysis\n\n${process.env.AI_RESPONSE}\n\n---\n*Generated by \`ai-sdlc-design.yml\` • Model: gemini-3-flash-preview:cloud*`;
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body
            });

3. Code Review — ai-sdlc-code-review.yml

# Pattern 3: AI-Powered Coding & Review
# Triggers on every PR push. Runs AI-assisted review BEFORE human review.
# Focuses on architectural fit, security, compliance, and test coverage.
name: "AI-SDLC :: Code Review"

on:
  pull_request:
    types: [opened, synchronize]

permissions:
  contents: read
  pull-requests: write

env:
  OLLAMA_HOST: ${{ secrets.OLLAMA_HOST || 'http://localhost:11434' }}

jobs:
  ai-code-review:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Gather PR context
        id: context
        run: |
          git diff origin/${{ github.base_ref }}...HEAD > diff.patch
          echo "lines=$(wc -l < diff.patch)" >> $GITHUB_OUTPUT

      - name: AI pre-commit review
        id: ai
        run: |
          # Heredoc lines are indented with the rest of the script so the YAML
          # block scalar stays valid; YAML strips the indent before bash runs.
          cat > prompt.txt <<PROMPT_EOF
          You are a senior staff engineer performing pre-commit code review.
          Review the diff below for:
          1. Security (SQL injection, unsafe deserialization, missing auth, secrets leakage)
          2. Architecture (does this respect the ADR? service boundaries? synchronous chains?)
          3. Performance (N+1 queries, unbounded loops, memory pressure)
          4. Compliance (PII handling, audit trails, regulatory data flow)
          5. Testing (meaningful coverage, not just line coverage)
          6. Observability (logging, metrics, tracing hooks)

          Respond with a markdown checklist. For each issue, include:
          - Severity: 🔴 Critical / 🟠 High / 🟡 Medium / 🟢 Low
          - File and line reference
          - Explanation
          - Suggested fix

          If no issues found, state "✅ Clean: no significant issues detected."

          Diff:
          $(cat diff.patch)
          PROMPT_EOF

          # jq -e exits non-zero on empty input or a null .response, so the
          # fallback fires when Ollama is unreachable or returns an error.
          RESPONSE=$(curl -s -X POST "${OLLAMA_HOST}/api/generate" \
            -d "{\"model\":\"gemini-3-flash-preview:cloud\",\"prompt\":$(cat prompt.txt | jq -Rs .),\"stream\":false}" \
            --max-time 120 2>/dev/null | jq -er '.response') || RESPONSE="⚠️ Ollama unreachable. AI review skipped."

          {
            echo "response<<AI_RESPONSE_EOF"
            echo "$RESPONSE"
            echo "AI_RESPONSE_EOF"
          } >> "$GITHUB_OUTPUT"

      - name: Post AI review comment
        uses: actions/github-script@v7
        env:
          AI_RESPONSE: ${{ steps.ai.outputs.response }}
        with:
          script: |
            // Read the model output from the environment. Interpolating it into
            // the script breaks on backticks and allows script injection.
            const body = `## 📝 AI Pre-Commit Review\n\n${process.env.AI_RESPONSE}\n\n---\n*Generated by \`ai-sdlc-code-review.yml\` • Model: gemini-3-flash-preview:cloud*`;
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body
            });

      - name: Static analysis (non-AI baseline)
        uses: github/super-linter@v6
        if: false  # Enable when configured
        env:
          DEFAULT_BRANCH: main
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

4. Test Generation — ai-sdlc-testing.yml

# Pattern 4: AI-Augmented Testing & Quality
# Triggers on PRs. Generates risk-based tests, predicts failures,
# and surfaces quality telemetry before merge.
name: "AI-SDLC :: Test Generation & Quality"

on:
  pull_request:
    types: [opened, synchronize]

permissions:
  contents: read
  pull-requests: write

env:
  OLLAMA_HOST: ${{ secrets.OLLAMA_HOST || 'http://localhost:11434' }}

jobs:
  ai-test-generation:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Identify changed source files
        id: files
        run: |
          git diff --name-only origin/${{ github.base_ref }}...HEAD | grep -E '\.(js|ts|py|java|go|rs)$' | tee changed_src.txt || true
          echo "count=$(wc -l < changed_src.txt | tr -d ' ')" >> $GITHUB_OUTPUT

      - name: AI test case generation
        if: steps.files.outputs.count != '0'
        id: ai
        run: |
          # Read one path per line so filenames containing spaces survive.
          while IFS= read -r file; do
            echo "🔬 Generating tests for: $file"
            # The heredoc body and terminator stay at the script's base indent:
            # YAML strips that indent, leaving the terminator in column 0.
            cat > prompt.txt <<PROMPT_EOF
          Generate test cases for the following source file.
          Include:
          - Unit tests for edge cases (null, empty, boundary, concurrency)
          - Integration tests for cross-service interactions
          - Contract tests if the file defines an API
          - Chaos / failure mode tests

          File: $file
          Content:
          $(cat "$file" 2>/dev/null || echo "File not found in checkout")
          PROMPT_EOF

            # jq -e exits non-zero on empty input or a null .response, so the
            # fallback fires when Ollama is unreachable or returns an error.
            RESPONSE=$(curl -s -X POST "${OLLAMA_HOST}/api/generate" \
              -d "{\"model\":\"gemini-3-flash-preview:cloud\",\"prompt\":$(cat prompt.txt | jq -Rs .),\"stream\":false}" \
              --max-time 60 2>/dev/null | jq -er '.response') || RESPONSE="⚠️ Ollama unreachable. Test generation skipped."

            echo "### $file" >> test_report.md
            echo "\`\`\`" >> test_report.md
            echo "$RESPONSE" >> test_report.md
            echo "\`\`\`" >> test_report.md
            echo "" >> test_report.md
          done < changed_src.txt

          {
            echo "report<<TEST_REPORT_EOF"
            cat test_report.md
            echo "TEST_REPORT_EOF"
          } >> "$GITHUB_OUTPUT"

      - name: Post test generation report
        if: steps.files.outputs.count != '0'
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const report = fs.readFileSync('test_report.md', 'utf8');
            const body = `## 🧪 AI-Generated Test Cases\n\n${report}\n\n---\n*Generated by \`ai-sdlc-testing.yml\` • Model: gemini-3-flash-preview:cloud*`;
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body
            });

      - name: Failure prediction (heuristic)
        id: predict
        run: |
          echo "📊 Failure prediction heuristic:"
          echo "- Files changed: ${{ steps.files.outputs.count }}"
          echo "- High-risk paths: payment*, auth*, fraud*, transfer*"
          git diff --name-only origin/${{ github.base_ref }}...HEAD | grep -iE '(payment|auth|fraud|transfer)' && echo "🔴 HIGH RISK: Financial flow touched" || echo "🟢 Standard risk"

5. Progressive Deploy — ai-sdlc-deploy.yml

# Pattern 5: AI-Enabled Deployment & Operations
# Progressive delivery with AI-assisted rollback based on canary metrics.
name: "AI-SDLC :: Progressive Deploy"

on:
  push:
    branches: [main]

permissions:
  contents: read
  deployments: write

env:
  OLLAMA_HOST: ${{ secrets.OLLAMA_HOST || 'http://localhost:11434' }}

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Build artifact
        run: echo "Building artifact..."  # Replace with real build

  deploy-canary:
    needs: build
    runs-on: ubuntu-latest
    environment:
      name: canary
      url: https://canary.dwaynehelena.com
    steps:
      - name: Deploy to canary (1%)
        run: echo "🚀 Deployed to canary at 1% traffic"

      - name: Canary health check
        run: |
          sleep 30
          echo "✅ Canary healthy — error rate 0.1%, latency P99 280ms"

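      - name: AI-assisted canary analysis
        id: ai
        run: |
          # The metrics below are hard-coded placeholders; feed in real canary telemetry.
          # Heredoc lines are indented with the script so the YAML stays valid.
          cat > prompt.txt <<PROMPT_EOF
          You are a deployment safety engine. Given the following canary metrics,
          recommend: PROCEED, HOLD, or ROLLBACK with justification.

          Canary Metrics:
          - Error rate: 0.1% (baseline: 0.1%)
          - P50 latency: 45ms (baseline: 42ms)
          - P99 latency: 280ms (baseline: 220ms, a 27% increase)
          - CPU utilisation: 65% (baseline: 58%)
          - Memory utilisation: 72% (baseline: 65%)
          - Conversion rate: 3.7% (baseline: 3.8%)

          Respond with a single verdict and explanation.
          PROMPT_EOF

          # jq -e exits non-zero on empty input or a null .response, so the
          # fallback fires when Ollama is unreachable or returns an error.
          RESPONSE=$(curl -s -X POST "${OLLAMA_HOST}/api/generate" \
            -d "{\"model\":\"gemini-3-flash-preview:cloud\",\"prompt\":$(cat prompt.txt | jq -Rs .),\"stream\":false}" \
            --max-time 60 2>/dev/null | jq -er '.response') || RESPONSE="⚠️ Ollama unreachable. Manual review required."

          echo "$RESPONSE"
          # Fail closed: anything other than an explicit verdict becomes HOLD.
          VERDICT=$(echo "$RESPONSE" | grep -oE 'PROCEED|HOLD|ROLLBACK' | head -1)
          echo "verdict=${VERDICT:-HOLD}" >> "$GITHUB_OUTPUT"

      - name: Rollback on AI recommendation
        if: steps.ai.outputs.verdict == 'ROLLBACK'
        run: |
          echo "🛑 AI recommended ROLLBACK, executing..."
          # Replace with actual rollback command
          exit 1

      - name: Hold for manual review
        if: steps.ai.outputs.verdict == 'HOLD'
        run: |
          echo "⏸️ Verdict is HOLD or no verdict was returned. Blocking promotion pending manual review."
          exit 1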

  deploy-production:
    needs: deploy-canary
    runs-on: ubuntu-latest
    environment:
      name: production
      url: https://dwaynehelena.com
    steps:
      - name: Progressive ramp (5% → 25% → 100%)
        run: |
          echo "📈 Ramping to 5%..."
          sleep 60
          echo "📈 Ramping to 25%..."
          sleep 120
          echo "📈 Ramping to 100%..."
          echo "✅ Fully deployed to production"

6. Operational Intelligence — ai-sdlc-monitor.yml

# Pattern 6: AI-Driven Monitoring & Feedback Loops
# Scheduled job that synthesises operational insights and auto-generates
# backlog tickets for reliability risks and customer pain points.
name: "AI-SDLC :: Operational Intelligence"

on:
  schedule:
    - cron: '0 9 * * 1'  # Weekly, Monday 09:00 UTC
  workflow_dispatch:

permissions:
  contents: read
  issues: write

env:
  OLLAMA_HOST: ${{ secrets.OLLAMA_HOST || 'http://localhost:11434' }}

jobs:
  operational-intelligence:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Gather operational data (placeholder)
        id: data
        run: |
          # Heredoc lines are indented with the rest of the script so the YAML
          # block scalar stays valid; YAML strips the indent before bash runs.
          cat > ops_report.txt <<REPORT
          Weekly Operational Snapshot (placeholder data):
          - Top reliability risk: Fraud scoring service latency spikes (>500ms, 12 incidents)
          - Customer pain point: 23% of support tickets mention "slow login"
          - Technical debt hotspot: Legacy account service, 18% change failure rate
          - Emerging failure mode: Exchange rate API timeout cascade (no alert yet)
          REPORT
          {
            echo "report<<OPS_REPORT_EOF"
            cat ops_report.txt
            echo "OPS_REPORT_EOF"
          } >> "$GITHUB_OUTPUT"

      - name: AI synthesis of insights
        id: ai
        run: |
          # Heredoc lines are indented with the script so the YAML stays valid.
          # The dollar sign is escaped so bash does not expand $X to an empty string.
          cat > prompt.txt <<PROMPT_EOF
          You are an operational intelligence engine. Given the weekly snapshot below,
          synthesise a prioritised action list with business impact justification.
          Format each item as:
          - [ ] ACTION: description | IMPACT: \$X / month | EFFORT: S/M/L | OWNER: team

          Snapshot:
          $(cat ops_report.txt)
          PROMPT_EOF

          # jq -e exits non-zero on empty input or a null .response, so the
          # fallback fires when Ollama is unreachable or returns an error.
          RESPONSE=$(curl -s -X POST "${OLLAMA_HOST}/api/generate" \
            -d "{\"model\":\"gemini-3-flash-preview:cloud\",\"prompt\":$(cat prompt.txt | jq -Rs .),\"stream\":false}" \
            --max-time 60 2>/dev/null | jq -er '.response') || RESPONSE="⚠️ Ollama unreachable. Manual analysis required."

          {
            echo "insights<<AI_INSIGHTS_EOF"
            echo "$RESPONSE"
            echo "AI_INSIGHTS_EOF"
          } >> "$GITHUB_OUTPUT"

      - name: Create backlog issues from AI insights
        uses: actions/github-script@v7
        env:
          INSIGHTS: ${{ steps.ai.outputs.insights }}
        with:
          script: |
            // Read the model output from the environment. Interpolating it into
            // the script breaks on backticks and allows script injection.
            const insights = process.env.INSIGHTS || '';
            const lines = insights.split('\n').filter(l => l.trim().startsWith('- [ ]'));
            // Cap at three issues per run to avoid flooding the backlog.
            for (const line of lines.slice(0, 3)) {
              const title = line.trim().replace(/^- \[ \] /, '').split('|')[0].trim();
              const body = `## Operational Intelligence Insight\n\n${line}\n\n---\n*Generated by \`ai-sdlc-monitor.yml\` • Model: gemini-3-flash-preview:cloud*`;
              await github.rest.issues.create({
                owner: context.repo.owner,
                repo: context.repo.repo,
                title: `[AI-Ops] ${title}`,
                body,
                labels: ['ai-ops', 'intelligence']
              });
            }

Environment Configuration

All workflows use the OLLAMA_HOST environment variable. Configure this as a GitHub secret if your Ollama instance is remote:

Secret	Description
`OLLAMA_HOST`	URL to your Ollama instance (default: `http://localhost:11434`)

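For self-hosted runners with Ollama running locally, no secret is required; the default resolves to the local instance. Note that the workflows as listed use `runs-on: ubuntu-latest`, a GitHub-hosted runner with no Ollama on localhost. Either change `runs-on` to your self-hosted runner label or set the secret to a reachable remote instance.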

Adoption Path

Maturity Level	Workflows to Enable	Timeframe
Ad hoc	`ai-sdlc-code-review.yml` only	Immediate
Team	Code review + testing	1–2 sprints
Squad	Requirements + design + code + testing	1–2 quarters
Program	All 5 PR/Merge workflows	2–4 quarters
Enterprise	All 6 workflows + custom governance rules	1–2 years

Start with the code review workflow. It provides immediate value with little disruption to existing processes. Add testing next, then requirements and design, matching the table above. Deployment and monitoring require more infrastructure (canary environment, telemetry pipeline) and should follow once the earlier patterns are stable.

Customisation

Each workflow is designed to be forked and customised:

Model swap: Replace gemini-3-flash-preview:cloud with your preferred Ollama model in the curl payload
Custom rules: Add team-specific security rules, architectural constraints, or compliance checks to the prompts
Integrations: Replace GitHub comments with Slack notifications, Jira tickets, or PagerDuty alerts
Timeouts: Adjust --max-time based on your model's inference speed
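Fallbacks: The || echo "⚠️ ..." pattern is intended to keep CI from failing when the AI is down. That suits advisory comments; for the deploy workflow, treat a missing verdict as HOLD rather than letting promotion continue.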
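When a team moves the inference call out of shell and into a script, the same fallback behaviour can be made explicit. The sketch below calls the `/api/generate` endpoint with the same `model`, `prompt`, and `stream` fields the workflows send and reads the `response` field from the reply. The function names, the `FALLBACK` text, and the injectable `post` parameter (there so the logic can be tested without a network) are assumptions.

```python
import json
from urllib import request

FALLBACK = "Ollama unreachable. Enrichment skipped."


def _http_post(url, payload, timeout):
    """POST JSON and decode the JSON reply using only the standard library."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def generate(prompt, model, host="http://localhost:11434", timeout=60, post=_http_post):
    """Return (ok, text); ok is False when the call failed or returned no text."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    try:
        reply = post(f"{host}/api/generate", payload, timeout)
        text = reply.get("response") if isinstance(reply, dict) else None
    except (OSError, ValueError):
        # URLError and timeouts are OSError subclasses; bad JSON raises ValueError.
        return False, FALLBACK
    return (True, text) if text else (False, FALLBACK)
```

Returning an explicit `ok` flag lets advisory steps post the fallback text while gating steps, such as deployment, can refuse to proceed.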

The goal is not to prescribe a rigid toolchain. It is to provide a reference implementation that teams can adapt to their context, their model provider, their compliance requirements, and their delivery cadence.
