AI-Powered SDLC Patterns

Overview

AI is not a bolt-on to the software development lifecycle. It is a force multiplier embedded into every phase, from the moment a requirement is conceived to the moment a feature is retired from production. The organisations that treat AI as a separate initiative — a "Copilot pilot" or a "GenAI sandbox" — are the same organisations that treated cloud as a separate initiative a decade ago. They will arrive late.

This document describes practical patterns for embedding AI capabilities across the full SDLC. It is not a vendor catalogue. It is a set of engineering patterns that have been tested in regulated environments where compliance, auditability, and rollback capability are non-negotiable. The patterns are organised by SDLC phase, but the most effective implementations blur those boundaries because AI-generated telemetry from production feeds directly into the next sprint's prioritisation.

The underlying principle: AI augments human judgment; it does not replace it. The best teams use AI to accelerate the routine and elevate the exceptional. They do not use it to outsource thinking.


Pattern 1: AI-Driven Requirements & Specification

The Problem

Requirements drift, ambiguity, and context loss are among the largest sources of rework in software delivery. A business analyst writes a specification. A product owner interprets it. A developer implements their interpretation. A tester validates against a third interpretation. By the time the feature reaches production, the original intent has been refracted through multiple layers of human translation.

The Pattern

Structured natural language → AI-enriched specification → executable acceptance criteria

  1. Capture intent in structured natural language. Use a consistent template: user story format, acceptance criteria, context (why now, who benefits, what changes if we do not ship this), and constraints (compliance, performance, dependencies).

  2. AI enrichment layer. An LLM processes the structured input and:

    • Detects ambiguity, contradictions, and missing edge cases
    • Suggests additional acceptance criteria based on historical defect patterns
    • Identifies cross-system impacts by querying the architecture knowledge graph
    • Generates a risk heat map (regulatory, operational, reputational)
    • Proposes rollback criteria: under what conditions do we revert?
  3. Human review and refinement. The product owner and engineering lead review the AI-enriched specification. They accept, reject, or modify each suggestion. This review is logged as part of the audit trail.

  4. Executable acceptance criteria. The enriched specification is translated into executable tests (Gherkin, Playwright, or domain-specific test frameworks) before a single line of implementation code is written.

Implementation Notes

  • Tooling: This pattern works with Jira + Confluence, Azure DevOps, or plain Markdown in Git. The key is not the tool; it is the template discipline.
  • Governance: In regulated environments, the AI enrichment step must be documented in the change record. Regulators do not care that you used an LLM; they care that you can demonstrate the requirement was reviewed by a human with accountability.
  • DORA Link: This pattern directly improves Lead Time for Changes by reducing ambiguity-driven rework. It also improves Change Failure Rate by surfacing edge cases before implementation.

Example

At a Tier-1 bank, a product owner writes: "As a customer, I want to transfer funds internationally so that I can pay overseas suppliers." The AI enrichment layer flags:

  • Missing: currency conversion timing (spot rate vs. forward rate?)
  • Missing: sanctions screening integration (which lists? OFAC, EU, UN?)
  • Missing: fee disclosure requirements (the regulator mandates explicit fee display)
  • Historical pattern: 73% of international transfer defects involve rounding errors in intermediate currency conversions
  • Suggested acceptance criterion: "Given a GBP→AUD transfer, when the intermediary currency is USD, then the final AUD amount must match the manual calculation performed by the bank's treasury spreadsheet within 0.01%"

The product owner accepts three of four suggestions, rejects the fourth (it conflicts with an existing regulatory interpretation), and the enriched specification moves to the sprint backlog with executable acceptance criteria.
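
The suggested acceptance criterion above can be written as an executable check. The following is a minimal sketch, not the bank's implementation: the exchange rates, the helper names `convert_via_usd` and `within_tolerance`, and the unrounded reference value standing in for the treasury spreadsheet are all assumptions made for illustration.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Hypothetical rates for illustration only; real rates come from the rate service.
GBP_USD = Decimal("1.2500")
USD_AUD = Decimal("1.5000")
CENTS = Decimal("0.01")


def convert_via_usd(amount_gbp: Decimal) -> Decimal:
    """Convert GBP to AUD through USD, rounding each leg to cents."""
    usd = (amount_gbp * GBP_USD).quantize(CENTS, rounding=ROUND_HALF_EVEN)
    return (usd * USD_AUD).quantize(CENTS, rounding=ROUND_HALF_EVEN)


def within_tolerance(actual: Decimal, reference: Decimal,
                     pct: Decimal = Decimal("0.01")) -> bool:
    """True when actual differs from reference by at most pct percent."""
    return abs(actual - reference) <= reference * pct / Decimal("100")


# Stand-in for the treasury spreadsheet: the unrounded product of both rates.
reference_aud = Decimal("100.00") * GBP_USD * USD_AUD
actual_aud = convert_via_usd(Decimal("100.00"))
assert within_tolerance(actual_aud, reference_aud)
```

Rounding each leg separately is where intermediate-currency discrepancies tend to appear, so the check compares the two-leg result against a reference computed without intermediate rounding.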


Pattern 2: AI-Assisted Design & Architecture

The Problem

Architecture decisions are made by humans with incomplete information. They cannot hold the entire system topology, dependency graph, and change history in working memory. Consequently, designs often repeat past mistakes, miss cross-service impacts, or optimise for local simplicity at the cost of global complexity.

The Pattern

Architecture decision record + system knowledge graph → AI-assisted impact analysis → human-validated design

  1. System knowledge graph. Maintain a machine-readable representation of the system: services, APIs, data stores, dependencies, ownership, SLAs, and known failure modes. This is not documentation; it is infrastructure. It lives in version control and is validated by CI.

  2. Architecture decision record (ADR). Every significant design decision is captured in a lightweight ADR template: context, decision, consequences, and compliance implications.

  3. AI-assisted impact analysis. Before committing to a design, an LLM queries the knowledge graph and:

    • Identifies all services impacted by the proposed change
    • Surfaces historical ADRs that faced similar trade-offs
    • Detects SLA conflicts (e.g., "This new synchronous call chain adds 400ms to a path with a 300ms SLA")
    • Suggests design alternatives with quantitative comparison (latency, cost, blast radius)
    • Flags compliance hotspots (e.g., "This data flow crosses a jurisdictional boundary; check privacy requirements")
  4. Human-validated design. The architect or tech lead reviews the AI analysis, selects a design, and documents the rationale. The ADR is linked to the knowledge graph, closing the feedback loop.
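
The impact-analysis step depends on being able to walk the knowledge graph from a changed service to everything that consumes it. A minimal sketch of that traversal follows, assuming the graph is available as a plain mapping from each service to its dependencies. The service names and the `blast_radius` helper are hypothetical.

```python
from collections import deque

# Hypothetical knowledge graph: service -> services it depends on.
DEPENDS_ON = {
    "payment-orchestrator": ["fraud-scoring", "exchange-rate"],
    "checkout-ui": ["payment-orchestrator"],
    "reporting": ["payment-orchestrator"],
    "fraud-scoring": [],
    "exchange-rate": [],
}


def blast_radius(changed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every service that directly or transitively depends on `changed`."""
    # Invert the edges so we can walk from a service to its consumers.
    consumers: dict[str, list[str]] = {}
    for service, deps in depends_on.items():
        for dep in deps:
            consumers.setdefault(dep, []).append(service)

    impacted: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for consumer in consumers.get(current, []):
            if consumer not in impacted:
                impacted.add(consumer)
                queue.append(consumer)
    return impacted


print(sorted(blast_radius("fraud-scoring", DEPENDS_ON)))
```

In practice the mapping would be generated from whatever representation the team maintains (C4 models, Backstage catalog YAML, or per-service Markdown), which is why the sketch takes it as a parameter.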

Implementation Notes

  • Knowledge graph representation: C4 models, Backstage catalog YAML, or a simple structured Markdown file per service. The format matters less than the discipline of keeping it current.
  • Not a replacement for architecture review: AI-assisted analysis accelerates preparation for architecture review boards. It does not eliminate them. In regulated environments, the review board remains the approval gate.
  • DORA Link: Improves Lead Time for Changes by reducing design iteration cycles. Improves Change Failure Rate by surfacing blast radius before commit.

Example

A team proposes adding a real-time fraud scoring service to the payment flow. The AI queries the knowledge graph and reports:

  • "The synchronous integration point is the payment orchestrator, which has a 250ms P99 latency SLA. The fraud service's P99 is 180ms. You have 70ms of headroom, but only if the fraud service is already warm. Cold start adds 400ms."
  • "Historical ADR #47 (2023) attempted a similar synchronous integration and was rolled back after breaching the SLA during a marketing campaign. The rollback triggered a 45-minute outage."
  • "Alternative: async scoring with synchronous fallback to cached risk rating. Adds complexity but preserves SLA."

The architect selects the async-with-fallback pattern, documents the rationale, and the team proceeds with a design that has been informed by historical failure, not just optimistic estimation.
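
The headroom figures quoted in the example follow from simple budget arithmetic. The sketch below reproduces them, following the example's own simplification of charging the whole SLA budget to the fraud call. The helper name `headroom_ms` is an assumption.

```python
SLA_MS = 250         # payment orchestrator P99 latency SLA from the example
FRAUD_P99_MS = 180   # fraud service P99 when warm
COLD_START_MS = 400  # additional latency on a cold start


def headroom_ms(sla_ms: int, added_ms: int) -> int:
    """Remaining budget after adding a synchronous call; negative means an SLA breach."""
    return sla_ms - added_ms


warm = headroom_ms(SLA_MS, FRAUD_P99_MS)
cold = headroom_ms(SLA_MS, FRAUD_P99_MS + COLD_START_MS)
print(f"warm headroom: {warm}ms, cold headroom: {cold}ms")
```

A negative cold-start headroom is the quantitative reason the synchronous design is risky and the async-with-fallback alternative preserves the SLA.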


Pattern 3: AI-Powered Coding & Review

The Problem

Code review is a bottleneck. In large organisations, review latency averages 24–48 hours. Junior developers wait for senior attention. Senior developers context-switch between domains they do not own. And even attentive reviewers miss issues: security vulnerabilities, performance anti-patterns, and compliance violations that are obvious in retrospect.

The Pattern

AI-first code generation → AI-assisted review → human sign-off → continuous learning loop

  1. AI-first generation. Developers use AI coding assistants (Copilot, Cursor, Codex, or self-hosted models) to generate boilerplate, refactor code, and explore implementation approaches. The assistant is configured with the team's style guide, architectural constraints, and security rules.

  2. Pre-commit AI review. Before a human reviewer sees the code, an automated pipeline runs:

    • Static analysis (SonarQube, Semgrep, CodeQL)
    • AI-powered review (custom rules: "Does this change respect the ADR?" "Does this query have an index?" "Is this input validated before use?")
    • Security scan (SAST, dependency vulnerability check, secrets detection)
    • Compliance check (e.g., "Does this change touch a regulated data flow? If so, is the privacy impact documented?")
  3. Human review with AI context. The human reviewer sees the code alongside the AI analysis. They focus on:

    • Architectural fit (does this align with the ADR?)
    • Business logic correctness (does this actually solve the problem?)
    • Edge cases the AI missed
    • Whether the AI-generated tests are meaningful, not just present
  4. Continuous learning. Approved PRs and their AI analyses are fed back into the model context. Rejected AI suggestions are tagged with human rationale. Over time, the AI review becomes more accurate for this specific codebase.

Implementation Notes

  • Custom rules over generic suggestions: Generic AI review tools suggest lint-level fixes. Effective teams build custom rule sets that encode their architectural constraints, security posture, and compliance requirements.
  • The "explain your AI" requirement: In regulated environments, AI-generated code must be accompanied by a human explanation. Not because the AI is untrusted, but because accountability requires a human who can explain why the code works.
  • DORA Link: Dramatically improves Lead Time for Changes by reducing review latency. Improves Change Failure Rate by catching security and compliance issues before merge.

Example

A developer asks the AI assistant: "Generate a Python function that accepts a customer ID and returns their last 10 transactions." The AI generates code with a raw SQL query embedded in the function. The pre-commit AI review flags:

  • "Line 12: Raw SQL with string interpolation. Use parameterized queries per security guideline SEC-003."
  • "Line 8: No input validation on customer_id. Accepts negative integers and strings, which will cause a database error."
  • "Line 15: No pagination. If a customer has 10,000 transactions, this returns all of them."
  • "Compliance note: This query touches the transactions table, which contains PII. Ensure the privacy impact assessment is linked in the PR description."

The developer fixes all four issues, the human reviewer focuses on whether the business logic correctly handles joint account transactions (an edge case the AI did not know about), and the PR ships in 2 hours instead of 2 days.
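
For illustration, a version of the function that addresses the first three review findings might look like the sketch below. It assumes a SQLite connection and a hypothetical `transactions` table with `id`, `customer_id`, `amount`, and `created_at` columns. The `MAX_LIMIT` cap is likewise an assumption, and the compliance note is a process step that code cannot satisfy.

```python
import sqlite3

MAX_LIMIT = 100  # hypothetical upper bound on page size


def last_transactions(conn: sqlite3.Connection, customer_id: int,
                      limit: int = 10) -> list[tuple]:
    """Return the customer's most recent transactions, newest first."""
    # Validate before touching the database: reject strings, booleans,
    # zero, and negative IDs.
    if (isinstance(customer_id, bool) or not isinstance(customer_id, int)
            or customer_id <= 0):
        raise ValueError("customer_id must be a positive integer")
    if isinstance(limit, bool) or not isinstance(limit, int) or not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be an integer between 1 and {MAX_LIMIT}")

    # Placeholders keep caller input out of the SQL text (no string interpolation).
    cursor = conn.execute(
        "SELECT id, amount, created_at FROM transactions "
        "WHERE customer_id = ? ORDER BY created_at DESC, id DESC LIMIT ?",
        (customer_id, limit),
    )
    return cursor.fetchall()
```

The explicit `LIMIT` bounds the result size regardless of how many rows the customer has. The joint-account edge case from the example still needs a human who knows the domain.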


Pattern 4: AI-Augmented Testing & Quality

The Problem

Test coverage is a vanity metric. Teams chase 80% line coverage while their most critical paths — the ones that process payments, move customer data, or enforce compliance — remain undertested. Exploratory testing is valuable but unscalable. Regression testing takes hours, so teams run it overnight and discover failures the next morning.

The Pattern

Risk-based test prioritisation → AI-generated test cases → AI-driven failure prediction → continuous quality telemetry

  1. Risk-based prioritisation. The system knowledge graph identifies high-risk paths: high transaction volume, recent changes, known failure modes, and regulated data flows. These paths get the most test attention, not the paths that are easiest to test.

  2. AI-generated test cases. Based on the enriched specification (Pattern 1) and the code changes, the AI generates:

    • Unit tests for edge cases humans typically miss (null inputs, boundary values, concurrency)
    • Integration tests for cross-service interactions
    • Contract tests for API consumers
    • Chaos tests that simulate dependency failures
  3. AI-driven failure prediction. Before running the full regression suite, a model predicts which tests are most likely to fail based on:

    • Which files changed
    • Historical correlation between file changes and test failures
    • Complexity metrics (cyclomatic complexity, dependency depth)

     The highest-risk tests run first. If they pass, confidence is high. If they fail, the team knows immediately.
  4. Continuous quality telemetry. Test results, coverage data, and defect reports are fed into a quality dashboard that the AI monitors. Trends are detected before they become crises: "Test flakiness in the payments module has increased 40% over the last 3 sprints. Root cause: a race condition in the new async fraud scoring integration."
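
The failure-prediction step can start far simpler than a trained model. The sketch below orders tests using only the historical correlation between changed files and failing tests. It assumes CI history is available as `(changed_file, failed_test)` pairs, and the function name `rank_tests` is hypothetical.

```python
from collections import defaultdict


def rank_tests(changed_files: list[str],
               history: list[tuple[str, str]],
               all_tests: list[str]) -> list[str]:
    """Order tests so those that historically failed with these files run first."""
    changed = set(changed_files)
    score: dict[str, int] = defaultdict(int)
    for changed_file, failed_test in history:
        if changed_file in changed:
            score[failed_test] += 1
    # sorted() is stable, so tests with equal scores keep their original order.
    return sorted(all_tests, key=lambda test: -score[test])


history = [
    ("rounding.py", "test_round"),
    ("rounding.py", "test_round"),
    ("rounding.py", "test_fx"),
    ("auth.py", "test_login"),
]
order = rank_tests(["rounding.py"], history, ["test_login", "test_fx", "test_round"])
print(order)
```

Because the function only reorders the suite and never drops a test, it fits the developer feedback loop without acting as a gate.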

Implementation Notes

  • Generated tests are not free: They must be reviewed for meaningfulness. A test whose only assertion is assertTrue(true) achieves coverage but provides no value. Teams need "meaningful coverage" metrics, not just line coverage.
  • Failure prediction is probabilistic: It reduces test runtime but does not eliminate the need for full regression before production releases. Use it for developer feedback loops, not as a gate.
  • DORA Link: Improves Lead Time for Changes by reducing test runtime. Improves Change Failure Rate by ensuring high-risk paths are thoroughly tested.

Example

A team changes the currency rounding logic in the payment orchestrator. The AI test generator creates:

  • Unit tests for rounding at 0.005 boundaries (the classic banker's rounding problem)
  • Integration tests for multi-currency transactions (GBP→USD→AUD)
  • Chaos tests for the scenario where the exchange rate service returns stale data

The failure prediction model flags these tests as high-risk because:

  • The payments module has the highest defect density in the codebase
  • 67% of past rounding changes introduced regressions
  • The exchange rate service had a flaky test in the last 2 sprints

The tests run first. Two fail: one reveals a rounding discrepancy at 0.005, and one reveals that the chaos test's stale-data fallback does not log the incident (a compliance gap). Both are fixed before merge.
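
The 0.005 boundary matters because the rounding mode decides which way a half cent goes. A minimal sketch with Python's `decimal` module shows the difference between round-half-even (banker's rounding) and round-half-up. The helper name `round_cents` is an assumption.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

CENTS = Decimal("0.01")


def round_cents(amount: str, mode: str) -> Decimal:
    """Round a decimal string to two places using the given rounding mode."""
    return Decimal(amount).quantize(CENTS, rounding=mode)


# At an exact half cent the two modes can disagree.
for amount in ("0.005", "0.015", "2.665", "2.675"):
    even = round_cents(amount, ROUND_HALF_EVEN)
    up = round_cents(amount, ROUND_HALF_UP)
    print(f"{amount}: half-even={even} half-up={up}")
```

Amounts are passed as strings so the boundary value is represented exactly. Building the `Decimal` from a binary float would move the input off the boundary before any rounding rule applies.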


Pattern 5: AI-Enabled Deployment & Operations

The Problem

Deployment is still the most dangerous moment in software delivery. Even with CI/CD pipelines, teams deploy with incomplete confidence. They do not know whether the new version will handle production load, whether it will interact correctly with downstream systems, or whether it will fail in ways that the pre-production environment cannot simulate.

The Pattern

Progressive delivery with AI-assisted rollback → AI-driven anomaly detection → automated incident response

  1. Progressive delivery. Deploy to 1% of traffic, then 5%, then 25%, then 100%. At each stage, the AI monitors:

    • Error rate (baseline vs. canary)
    • Latency distribution (P50, P95, P99)
    • Business metrics (conversion rate, transaction completion rate)
    • Resource utilisation (CPU, memory, connection pool saturation)
  2. AI-assisted rollback. If any metric deviates beyond a learned threshold, the AI recommends rollback with a confidence score and a human-readable explanation: "Error rate in the payments service increased 300% in the canary region. Root cause: the new version assumes a database index that does not exist in production (it was added in staging but not promoted). Confidence of rollback correctness: 94%."

  3. AI-driven anomaly detection. In production, the AI continuously models normal behaviour for every service. It detects anomalies that rule-based alerting misses:

    • Latency drift that is not yet an outage but indicates capacity exhaustion
    • Error rate patterns that correlate with specific customer segments
    • Resource utilisation trends that predict failure 30 minutes before it happens
  4. Automated incident response. For known failure modes, the AI triggers runbooks automatically:

    • Scale up the affected service
    • Route traffic around a degraded dependency
    • Page the on-call engineer with a pre-populated incident summary

Implementation Notes

  • Rollback is a feature, not a failure: Teams that treat rollback as routine deploy more confidently. Teams that treat rollback as a last resort deploy cautiously and slowly. The AI-assisted rollback pattern normalises the former.
  • Anomaly detection requires baselines: The AI needs 2–4 weeks of production telemetry to learn normal behaviour. During this period, it operates in "observe and report" mode, not "act" mode.
  • DORA Link: Improves Deployment Frequency by making deployments safer. Improves Mean Time to Recovery (MTTR) by automating detection and response.

Example

A team deploys a new version of the customer onboarding service. The canary deployment at 5% traffic shows:

  • Error rate: 0.2% (baseline: 0.1%)
  • P99 latency: 1.2s (baseline: 800ms)
  • Conversion rate: 3.1% (baseline: 3.8%)

The AI flags the conversion rate drop as the primary concern. It queries the knowledge graph and finds: "The new version adds a third-party identity verification step. Historical data: this step has a 15% drop-off rate." The AI recommends rollback with the explanation: "The identity verification step is causing customer abandonment. Rollback preserves conversion rate while the product team redesigns the flow."

The team rolls back in 3 minutes. The incident is logged. The product team receives the AI-generated analysis and prioritises a redesign of the verification flow.
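
The canary comparison in this example can be sketched as a relative-degradation check per metric. The thresholds below are fixed, hypothetical values chosen for illustration, whereas the pattern itself describes learned thresholds. The `Metric` type and the `PROMOTE` verdict name are also assumptions.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str
    baseline: float
    canary: float
    higher_is_worse: bool
    max_relative_change: float  # hypothetical fixed threshold


def relative_degradation(metric: Metric) -> float:
    """Positive when the canary is worse than the baseline."""
    change = (metric.canary - metric.baseline) / metric.baseline
    return change if metric.higher_is_worse else -change


def verdict(metrics: list[Metric]) -> tuple[str, list[str]]:
    """Return ROLLBACK with the breaching metric names, or PROMOTE with none."""
    breaches = [m.name for m in metrics
                if relative_degradation(m) > m.max_relative_change]
    return ("ROLLBACK" if breaches else "PROMOTE", breaches)


# Figures from the onboarding example; thresholds are illustrative only.
canary = [
    Metric("error_rate_pct", baseline=0.1, canary=0.2, higher_is_worse=True, max_relative_change=1.5),
    Metric("p99_latency_ms", baseline=800, canary=1200, higher_is_worse=True, max_relative_change=0.75),
    Metric("conversion_rate_pct", baseline=3.8, canary=3.1, higher_is_worse=False, max_relative_change=0.10),
]
print(verdict(canary))
```

Returning the breaching metric names alongside the verdict gives the human-readable explanation something concrete to build on.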


Pattern 6: AI-Driven Monitoring & Feedback Loops

The Problem

Monitoring generates data, not insight. Dashboards show metrics, but they do not tell teams what to do next. Post-incident reviews produce action items, but they rarely feed back into the prioritisation process. The learning loop is broken.

The Pattern

AI-synthesised operational insights → automatic ticket generation → sprint prioritisation integration

  1. AI-synthesised insights. The AI monitors production telemetry, support tickets, and customer feedback channels. It synthesises weekly operational reports that include:

    • Top 3 reliability risks ranked by business impact
    • Customer pain points correlated with system behaviour
    • Technical debt hotspots that are starting to slow delivery
    • Emerging failure modes that do not yet have alerts
  2. Automatic ticket generation. High-priority insights are automatically converted into tickets in the backlog:

    • "Add circuit breaker to the exchange rate service (reliability risk #1)"
    • "Refactor the payment orchestrator's retry logic (33% of timeout errors trace to this code)"
    • "Update the onboarding flow to handle the identity verification drop-off (conversion impact: $2.1M annually)"
  3. Sprint prioritisation integration. The AI-generated tickets are presented to the product owner and engineering lead at sprint planning. They are not automatically prioritised; they are automatically surfaced. The human team decides what to work on, informed by data rather than anecdote.

Implementation Notes

  • The "so what?" filter: Raw AI insights are overwhelming. Effective teams configure the AI to answer "So what?" for every insight. Not "Latency increased 12%" but "Latency increased 12%, which correlates with a 3% drop in conversion rate, which costs $X per month."
  • Feedback to the model: When a team acts on an AI-generated ticket, the outcome is fed back. When they ignore one, the rationale is captured. This closes the learning loop.
  • DORA Link: Improves MTTR by ensuring operational learning is actioned, not just documented. Improves Lead Time for Changes by prioritising technical debt that slows delivery.

Example

At the end of a sprint, the AI generates the operational report:

  • "Reliability risk #1: The fraud scoring service has had 12 latency spikes >500ms in the last 14 days. Each spike correlates with a 0.5% drop in payment completion rate. Recommended action: Add a circuit breaker with a 300ms timeout and fallback to cached risk rating."
  • "Customer pain point: 23% of support tickets mention 'slow login.' Root cause: the new MFA flow adds 4 seconds to authentication. Recommended action: Async MFA with session token caching."
  • "Technical debt hotspot: The legacy account service has the highest change failure rate (18%) and the longest lead time (14 days). Recommended action: Incremental strangler fig migration, starting with the read path."

The product owner and engineering lead review the recommendations. They prioritise the circuit breaker (high impact, low effort) and the MFA caching (medium impact, medium effort). They defer the account service migration to the next quarter because it requires regulatory approval.


Implementation Maturity Model

Teams do not adopt all patterns at once. The maturity model provides a progression path:

| Level | Focus | Patterns | Team Size | Timeframe |
|---|---|---|---|---|
| Ad hoc | Individual productivity | Pattern 3 (coding assistant) | 1–2 developers | Immediate |
| Team | Review and testing | Patterns 3 + 4 (AI review + test generation) | 3–8 developers | 1–2 sprints |
| Squad | Specification and design | Patterns 1 + 2 + 3 + 4 | Full squad | 1–2 quarters |
| Program | Deployment and operations | Patterns 1–5 | Multiple squads | 2–4 quarters |
| Enterprise | Feedback loops and governance | All 6 patterns + custom governance | Organisation-wide | 1–2 years |

The trap: teams stay at the first level (Ad hoc) forever because it feels productive. The real value is in the cross-phase patterns (1, 2, 5, 6) that connect the SDLC into a continuous learning system.


Governance & Risk Considerations

The Governance Framework

AI-powered SDLC does not eliminate governance. It changes the nature of governance from "checklists at gates" to "continuous assurance embedded in the workflow."

| Risk | Mitigation |
|---|---|
| AI hallucinations in specifications | Human review gate. The AI enriches; the human approves. |
| AI-generated security vulnerabilities | Pre-commit security scan + human security review for high-risk changes. |
| Compliance gaps | AI compliance checks against regulatory rulesets. Human validation for regulated data flows. |
| Model drift | Monthly review of AI suggestion acceptance rates. If the rate drops, retrain or replace the model. |
| Accountability ambiguity | Every AI-enriched output is tagged with model version, prompt version, and human reviewer. |
| Over-reliance on AI | Mandatory "explain without AI" sessions for junior developers. If they cannot explain the code without the AI, they do not understand it. |
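
The accountability mitigation (tagging every AI-enriched output with model version, prompt version, and human reviewer) can be sketched as a small immutable record. The field names, the content hash, and the `tag_output` helper are assumptions for illustration. The three decision values mirror the accept, reject, or modify review described in Pattern 1.

```python
import hashlib
from dataclasses import dataclass

ALLOWED_DECISIONS = {"accepted", "rejected", "modified"}


@dataclass(frozen=True)
class AiOutputRecord:
    model_version: str
    prompt_version: str
    human_reviewer: str
    decision: str
    output_sha256: str  # hypothetical addition: ties the record to the exact text reviewed


def tag_output(output_text: str, model_version: str, prompt_version: str,
               human_reviewer: str, decision: str) -> AiOutputRecord:
    """Build an audit record; refuse outputs with no accountable reviewer."""
    if not human_reviewer.strip():
        raise ValueError("every AI-enriched output needs a named human reviewer")
    if decision not in ALLOWED_DECISIONS:
        raise ValueError(f"decision must be one of {sorted(ALLOWED_DECISIONS)}")
    digest = hashlib.sha256(output_text.encode("utf-8")).hexdigest()
    return AiOutputRecord(model_version, prompt_version, human_reviewer, decision, digest)


record = tag_output("Suggested criterion text", "model-v1", "prompt-v3", "j.doe", "accepted")
print(record)
```

Making the record frozen means a reviewer decision cannot be silently edited after the fact; a changed decision requires a new record.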

The "Explain Your AI" Principle

In regulated environments, the following principle applies: Every AI-generated output that affects production must be explainable by a human who is accountable for it.

This does not mean the human must have written the code. It means the human must be able to explain:

  • What the code does
  • Why it is correct
  • What could go wrong
  • How they would fix it if it failed

If a developer cannot answer these four questions about AI-generated code, the code does not ship.


Measuring Impact — DORA Metrics Alignment

The AI-powered SDLC patterns are not theoretical. They are measured by the same DORA metrics that engineering leaders already track:

| DORA Metric | Patterns That Impact It | Expected Improvement |
|---|---|---|
| Deployment Frequency | Patterns 3, 4, 5 (faster coding, testing, safer deployment) | 2–5x increase |
| Lead Time for Changes | Patterns 1, 2, 3, 4 (less rework, faster review, faster testing) | 30–60% reduction |
| Change Failure Rate | Patterns 1, 2, 4, 5 (better specs, better design, better testing, safer deployment) | 50–80% reduction |
| Mean Time to Recovery | Patterns 5, 6 (automated rollback, anomaly detection, operational feedback) | 60–90% reduction |

The measurement discipline: track DORA metrics before and after AI pattern adoption. Do not claim improvement without baseline data. Regulators and executives respect metrics; they distrust anecdotes.
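
Capturing a baseline requires computing the four metrics the same way before and after adoption. The sketch below derives them from deployment records. The `Deployment` fields and the choice of median lead time and mean recovery time are assumptions, and the sample data is invented purely to exercise the function.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def dora_summary(deployments: list[Deployment], window_days: int) -> dict[str, float]:
    """Summarise the four DORA metrics over a non-empty list of deployments."""
    failures = [d for d in deployments if d.failed]
    recoveries = [hours(d.restored_at - d.deployed_at)
                  for d in failures if d.restored_at is not None]
    return {
        "deployments_per_day": len(deployments) / window_days,
        "median_lead_time_hours": median(hours(d.deployed_at - d.committed_at)
                                         for d in deployments),
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_recovery_hours": mean(recoveries) if recoveries else 0.0,
    }


# Invented sample: four deployments in a two-day window, one failure.
base = datetime(2024, 1, 1, 9, 0)
sample = [
    Deployment(base, base + timedelta(hours=2)),
    Deployment(base, base + timedelta(hours=4)),
    Deployment(base, base + timedelta(hours=6), failed=True,
               restored_at=base + timedelta(hours=6, minutes=30)),
    Deployment(base, base + timedelta(hours=8)),
]
summary = dora_summary(sample, window_days=2)
print(summary)
```

Running the same function over the pre-adoption and post-adoption windows gives the like-for-like comparison that the measurement discipline asks for.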


Conclusion

AI-powered SDLC is not about replacing developers with machines. It is about giving developers machines that handle the routine so they can focus on the exceptional. The patterns in this document have been tested in environments where failure costs millions and regulatory scrutiny is constant. They work because they treat AI as an engineering capability, not a magic wand.

The organisations that will lead the next decade of software delivery are not the ones with the biggest AI budgets. They are the ones with the discipline to embed AI into every phase of the SDLC, with governance guardrails, with measurable outcomes, and with humans who remain accountable for every line of code that reaches production.

Tools come and go. Practices endure.


GitHub Actions Implementation

Each pattern is implemented as a standalone GitHub Actions workflow. They can be adopted individually or chained into a continuous pipeline. The workflows are designed to work with self-hosted Ollama (local or remote) for AI inference, but gracefully degrade if the AI service is unavailable.

Workflow Files

| Pattern | Workflow File | Trigger |
|---|---|---|
| 1. Requirements | .github/workflows/ai-sdlc-requirements.yml | Issue / PR opened or edited |
| 2. Design | .github/workflows/ai-sdlc-design.yml | PR touching ADRs, architecture, or contracts |
| 3. Code Review | .github/workflows/ai-sdlc-code-review.yml | Every PR push |
| 4. Testing | .github/workflows/ai-sdlc-testing.yml | Every PR push |
| 5. Deployment | .github/workflows/ai-sdlc-deploy.yml | Push to main |
| 6. Monitoring | .github/workflows/ai-sdlc-monitor.yml | Weekly cron + manual dispatch |

Architecture: How the Workflows Integrate

The diagram below shows how the six workflows connect across the CI/CD pipeline, with feedback loops from production back to the backlog.

[Figure: Design-to-Production Flow]

Workflow Highlights

1. Requirements Enrichment (ai-sdlc-requirements.yml)

  • Trigger: Issue or PR opened/edited
  • Action: Extracts description, sends to Ollama for enrichment
  • Output: Comment with ambiguity flags, missing edge cases, compliance notes, acceptance criteria
  • Fallback: Gracefully skips if Ollama is unreachable

2. Design Impact Analysis (ai-sdlc-design.yml)

  • Trigger: PR touching ADRs, architecture docs, or service contracts
  • Action: Builds context from changed files + related ADRs, queries knowledge graph
  • Output: Comment with blast radius, SLA conflicts, historical ADR relevance, design alternatives

3. Code Review (ai-sdlc-code-review.yml)

  • Trigger: Every PR push
  • Action: Generates diff, sends to Ollama for security, architecture, performance, compliance review
  • Output: Pre-commit AI review comment with severity-ranked checklist
  • Human role: Focus on business logic correctness and edge cases the AI missed

4. Test Generation (ai-sdlc-testing.yml)

  • Trigger: Every PR push
  • Action: Identifies changed source files, generates unit/integration/contract/chaos tests per file
  • Output: Comment with generated test cases + heuristic failure prediction
  • Heuristic: Flags high-risk paths (payment, auth, fraud) for prioritised testing
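
The high-risk path heuristic can be as simple as keyword matching on changed file paths. The sketch below uses the three keywords named in the workflow summary. The function name and the substring-matching approach are assumptions, not the workflow's actual logic.

```python
HIGH_RISK_KEYWORDS = ("payment", "auth", "fraud")


def high_risk_files(changed_files: list[str]) -> list[str]:
    """Return changed files whose path mentions a high-risk domain keyword."""
    return [
        path for path in changed_files
        if any(keyword in path.lower() for keyword in HIGH_RISK_KEYWORDS)
    ]


changed = ["src/payments/rounding.py", "docs/readme.md", "src/Auth/login.py"]
print(high_risk_files(changed))
```

Substring matching over-flags (a path containing "author" also matches "auth"), which is usually acceptable for a heuristic whose only effect is to prioritise tests.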

5. Progressive Deploy (ai-sdlc-deploy.yml)

  • Trigger: Push to main
  • Action: Canary at 1%, health check, AI-assisted canary analysis, progressive ramp
  • Output: Automated rollback if AI verdict is ROLLBACK (based on error rate, latency, conversion metrics)
  • Stages: 1% → 5% → 25% → 100%

6. Operational Intelligence (ai-sdlc-monitor.yml)

  • Trigger: Weekly cron (Monday 9 AM) + manual dispatch
  • Action: Gathers operational snapshot, synthesises prioritised action list
  • Output: Auto-creates up to 3 GitHub issues with labels ai-ops and intelligence
  • Feedback loop: Tickets feed back into the backlog for sprint prioritisation

Complete Workflow Definitions

Below are the full, copy-paste-ready workflow definitions.

1. Requirements Enrichment — ai-sdlc-requirements.yml

# Pattern 1: AI-Driven Requirements & Specification
# Triggers when an issue or PR is opened/updated to enrich specifications
# with AI-generated acceptance criteria, risk flags, and edge cases.
name: "AI-SDLC :: Requirements Enrichment"

on:
  issues:
    types: [opened, edited]
  pull_request:
    types: [opened, edited]
    paths:
      - "docs/adr/**"
      - "docs/specs/**"
      - "*.md"

permissions:
  contents: read
  issues: write
  pull-requests: write

env:
  OLLAMA_HOST: ${{ secrets.OLLAMA_HOST || 'http://localhost:11434' }}

jobs:
  enrich-specification:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Extract issue/PR description
        id: extract
        env:
          # Pass untrusted text through the environment instead of
          # interpolating it into the script, which would allow shell injection.
          BODY: ${{ github.event.issue.body || github.event.pull_request.body }}
          NUMBER: ${{ github.event.issue.number || github.event.pull_request.number }}
        run: |
          {
            echo "body<<EOF"
            printf '%s\n' "$BODY"
            echo "EOF"
            echo "number=$NUMBER"
          } >> "$GITHUB_OUTPUT"

      - name: AI enrichment via Ollama (local or remote)
        id: ai
        env:
          BODY: ${{ steps.extract.outputs.body }}
        run: |
          # Quoted delimiter: the instructions below are taken literally.
          INSTRUCTIONS=$(cat <<'PROMPT'
          You are an AI specification enrichment engine. Given the following requirement,
          detect ambiguity, missing edge cases, compliance gaps, and generate executable
          acceptance criteria. Respond in structured markdown with sections:
          - Ambiguity Flags
          - Missing Edge Cases
          - Compliance Notes
          - Suggested Acceptance Criteria
          - Risk Heat Map (low/medium/high)

          Requirement:
          PROMPT
          )
          PROMPT="${INSTRUCTIONS}"$'\n'"${BODY}"

          # Call Ollama (self-hosted runner). Check curl separately so the
          # fallback also applies when the request itself fails.
          FALLBACK="⚠️ Ollama unreachable. Local enrichment skipped."
          PAYLOAD=$(jq -n --arg prompt "$PROMPT" \
            '{model: "gemini-3-flash-preview:cloud", prompt: $prompt, stream: false}')
          if RAW=$(curl -sf -X POST "${OLLAMA_HOST}/api/generate" -d "$PAYLOAD" --max-time 60 2>/dev/null); then
            RESPONSE=$(printf '%s' "$RAW" | jq -r '.response // empty' 2>/dev/null) || true
          fi
          RESPONSE="${RESPONSE:-$FALLBACK}"

          echo "response<<EOF" >> "$GITHUB_OUTPUT"
          echo "$RESPONSE" >> "$GITHUB_OUTPUT"
          echo "EOF" >> "$GITHUB_OUTPUT"

      - name: Post enrichment as comment
        uses: actions/github-script@v7
        env:
          AI_RESPONSE: ${{ steps.ai.outputs.response }}
          ISSUE_NUMBER: ${{ steps.extract.outputs.number }}
        with:
          script: |
            // Read the model output from the environment so backticks in the
            // response cannot break out of the template literal.
            const body = `## 🤖 AI Specification Enrichment\n\n${process.env.AI_RESPONSE}\n\n---\n*Generated by \`ai-sdlc-requirements.yml\` • Model: gemini-3-flash-preview:cloud*`;
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: Number(process.env.ISSUE_NUMBER),
              body
            });

2. Design Impact Analysis — ai-sdlc-design.yml
# Pattern 2: AI-Assisted Design & Architecture
# Triggers on PRs that modify ADRs, architecture docs, or service contracts.
# Uses the system knowledge graph to surface blast radius, SLA conflicts,
# and historical ADRs that faced similar trade-offs.
name: "AI-SDLC :: Design Impact Analysis"

on:
  pull_request:
    types: [opened, synchronize]
    paths:
      - "docs/adr/**"
      - "docs/architecture/**"
      - "src/**/*.proto"
      - "openapi/**"
      - "terraform/**"

permissions:
  contents: read
  pull-requests: write

env:
  OLLAMA_HOST: ${{ secrets.OLLAMA_HOST || 'http://localhost:11434' }}

jobs:
  design-impact-analysis:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout (full history for ADR search)
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Build knowledge graph context
        id: context
        run: |
          echo "📋 Changed files:"
          git diff --name-only "origin/${{ github.base_ref }}...HEAD" | tee changed_files.txt

          echo "📚 Related ADRs (by keyword overlap):"
          # Read one path per line so filenames containing spaces survive.
          while IFS= read -r file; do
            name=$(basename "$file" .md)
            grep -rilF -- "$name" docs/adr/ 2>/dev/null || true
          done < changed_files.txt | sort -u | tee related_adrs.txt

          echo "📊 Service dependencies (from C4 / catalog):"
          if [ -f "catalog-info.yaml" ]; then
            yq eval '.spec.dependsOn[]' catalog-info.yaml 2>/dev/null || echo "No catalog dependencies found"
          fi

          {
            echo "files<<EOF"
            cat changed_files.txt
            echo "EOF"
            echo "adrs<<EOF"
            cat related_adrs.txt
            echo "EOF"
          } >> "$GITHUB_OUTPUT"

      - name: AI impact analysis
        id: ai
        run: |
          # The file lists are read from disk: the step outputs hold the list
          # contents, not file names, so they cannot be passed to cat.
          cat > prompt.txt <<PROMPT
          You are an architecture impact analysis engine. Given the changed files and
          related ADRs below, produce a structured impact report with:
          - Blast Radius (services / APIs / data stores impacted)
          - SLA Conflict Detection (latency, throughput, availability)
          - Historical ADR Relevance (past decisions with similar trade-offs)
          - Compliance Hotspots (cross-jurisdiction data flow, PII exposure)
          - Design Alternatives (quantitative comparison if possible)
          - Recommended Rollback Criteria

          Changed Files:
          $(cat changed_files.txt)

          Related ADRs:
          $(cat related_adrs.txt)
          PROMPT

          # An empty result (unreachable host, HTTP error, or malformed JSON)
          # falls through to the warning below.
          RESPONSE=$(curl -sf -X POST "${OLLAMA_HOST}/api/generate" \
            -d "{\"model\":\"gemini-3-flash-preview:cloud\",\"prompt\":$(jq -Rs . < prompt.txt),\"stream\":false}" \
            --max-time 60 2>/dev/null | jq -r '.response // empty' 2>/dev/null || true)
          if [ -z "$RESPONSE" ]; then
            RESPONSE="⚠️ Ollama unreachable. Design analysis skipped."
          fi

          echo "response<<EOF" >> "$GITHUB_OUTPUT"
          echo "$RESPONSE" >> "$GITHUB_OUTPUT"
          echo "EOF" >> "$GITHUB_OUTPUT"

      - name: Comment PR with design analysis
        uses: actions/github-script@v7
        env:
          AI_RESPONSE: ${{ steps.ai.outputs.response }}
        with:
          script: |
            // The model output arrives via the environment so it is never
            // evaluated as part of this script.
            const body = `## 🏗️ AI Design Impact Analysis\n\n${process.env.AI_RESPONSE}\n\n---\n*Generated by \`ai-sdlc-design.yml\` • Model: gemini-3-flash-preview:cloud*`;
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body
            });
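
The "Related ADRs" lookup above matches each changed file's base name against the ADR directory. A minimal sketch of that lookup as a standalone function, under two assumptions that differ from the workflow: it strips any final extension (not only `.md`) and it matches the name as a fixed string. `related_adrs` is a hypothetical name.

```shell
# related_adrs CHANGED_FILES_LIST ADR_DIR
# Prints, one per line and de-duplicated, the ADR files that mention the
# base name (without extension) of any path listed in CHANGED_FILES_LIST.
related_adrs() {
  local list=$1 adr_dir=$2 path stem
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    stem=$(basename "$path")
    stem=${stem%.*}                 # drop the final extension, if any
    [ -n "$stem" ] || continue      # skip dotfiles such as ".gitignore"
    # -F: fixed string, -i: ignore case, -l: file names only, -r: recurse
    grep -rilF -- "$stem" "$adr_dir" 2>/dev/null || true
  done < "$list" | sort -u
}
```

Keyword overlap is a coarse signal: short or generic file names (`index`, `main`) will match many unrelated ADRs, so the result is best treated as a candidate list for the model and the reviewer rather than a dependency graph.
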
3. Code Review — ai-sdlc-code-review.yml
# Pattern 3: AI-Powered Coding & Review
# Triggers on every PR push. Runs AI-assisted review BEFORE human review.
# Focuses on architectural fit, security, compliance, and test coverage.
name: "AI-SDLC :: Code Review"

on:
  pull_request:
    types: [opened, synchronize]

permissions:
  contents: read
  pull-requests: write

env:
  OLLAMA_HOST: ${{ secrets.OLLAMA_HOST || 'http://localhost:11434' }}

jobs:
  ai-code-review:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Gather PR context
        id: context
        run: |
          git diff "origin/${{ github.base_ref }}...HEAD" > diff.patch
          echo "lines=$(wc -l < diff.patch | tr -d ' ')" >> "$GITHUB_OUTPUT"

      - name: AI pre-commit review
        id: ai
        run: |
          cat > prompt.txt <<PROMPT
          You are a senior staff engineer performing pre-commit code review.
          Review the diff below for:
          1. Security (SQL injection, unsafe deserialization, missing auth, secrets leakage)
          2. Architecture (does this respect the ADR? service boundaries? synchronous chains?)
          3. Performance (N+1 queries, unbounded loops, memory pressure)
          4. Compliance (PII handling, audit trails, regulatory data flow)
          5. Testing (meaningful coverage, not just line coverage)
          6. Observability (logging, metrics, tracing hooks)

          Respond with a markdown checklist. For each issue, include:
          - Severity: 🔴 Critical / 🟠 High / 🟡 Medium / 🟢 Low
          - File and line reference
          - Explanation
          - Suggested fix

          If no issues found, state "✅ Clean: no significant issues detected."

          Diff:
          $(cat diff.patch)
          PROMPT

          # An empty result (unreachable host, HTTP error, or malformed JSON)
          # falls through to the warning below.
          RESPONSE=$(curl -sf -X POST "${OLLAMA_HOST}/api/generate" \
            -d "{\"model\":\"gemini-3-flash-preview:cloud\",\"prompt\":$(jq -Rs . < prompt.txt),\"stream\":false}" \
            --max-time 120 2>/dev/null | jq -r '.response // empty' 2>/dev/null || true)
          if [ -z "$RESPONSE" ]; then
            RESPONSE="⚠️ Ollama unreachable. AI review skipped."
          fi

          echo "response<<EOF" >> "$GITHUB_OUTPUT"
          echo "$RESPONSE" >> "$GITHUB_OUTPUT"
          echo "EOF" >> "$GITHUB_OUTPUT"

      - name: Post AI review comment
        uses: actions/github-script@v7
        env:
          AI_RESPONSE: ${{ steps.ai.outputs.response }}
        with:
          script: |
            // The model output arrives via the environment so it is never
            // evaluated as part of this script.
            const body = `## 📝 AI Pre-Commit Review\n\n${process.env.AI_RESPONSE}\n\n---\n*Generated by \`ai-sdlc-code-review.yml\` • Model: gemini-3-flash-preview:cloud*`;
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body
            });

      - name: Static analysis (non-AI baseline)
        uses: super-linter/super-linter@v6
        if: false # Enable when configured
        env:
          DEFAULT_BRANCH: main
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
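
The "Gather PR context" step records the diff's line count, but nothing above uses it, so a very large PR is sent to the model in full. A minimal sketch of a size guard that could sit between the two steps; `cap_diff` and the line budget are assumptions, not part of the workflow:

```shell
# cap_diff DIFF_FILE MAX_LINES
# Prints the diff unchanged when it fits within MAX_LINES; otherwise prints
# the first MAX_LINES lines followed by a marker stating what was dropped.
cap_diff() {
  local diff_file=$1 max_lines=$2 total
  total=$(wc -l < "$diff_file" | tr -d ' ')
  if [ "$total" -le "$max_lines" ]; then
    cat "$diff_file"
  else
    head -n "$max_lines" "$diff_file"
    printf '[diff truncated: showing %s of %s lines]\n' "$max_lines" "$total"
  fi
}
```

In the prompt heredoc, `$(cat diff.patch)` would become something like `$(cap_diff diff.patch 2000)`. The explicit marker matters: without it the model has no way to know it reviewed only part of the change.
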
4. Test Generation — ai-sdlc-testing.yml
# Pattern 4: AI-Augmented Testing & Quality
# Triggers on PRs. Generates risk-based tests, predicts failures,
# and surfaces quality telemetry before merge.
name: "AI-SDLC :: Test Generation & Quality"

on:
  pull_request:
    types: [opened, synchronize]

permissions:
  contents: read
  pull-requests: write

env:
  OLLAMA_HOST: ${{ secrets.OLLAMA_HOST || 'http://localhost:11434' }}

jobs:
  ai-test-generation:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Identify changed source files
        id: files
        run: |
          git diff --name-only "origin/${{ github.base_ref }}...HEAD" | grep -E '\.(js|ts|py|java|go|rs)$' | tee changed_src.txt || true
          echo "count=$(wc -l < changed_src.txt | tr -d ' ')" >> "$GITHUB_OUTPUT"

      - name: AI test case generation
        if: steps.files.outputs.count != '0'
        id: ai
        run: |
          : > test_report.md
          # Read one path per line so filenames containing spaces survive.
          while IFS= read -r file; do
            echo "🔬 Generating tests for: $file"
          # The heredoc body and its terminator stay at the block's base
          # indentation so the terminator is recognised by the shell.
          cat > prompt.txt <<PROMPT
          Generate test cases for the following source file.
          Include:
          - Unit tests for edge cases (null, empty, boundary, concurrency)
          - Integration tests for cross-service interactions
          - Contract tests if the file defines an API
          - Chaos / failure mode tests

          File: $file
          Content:
          $(cat "$file" 2>/dev/null || echo "File not found in checkout")
          PROMPT

            # An empty result (unreachable host, HTTP error, or malformed JSON)
            # falls through to the warning below.
            RESPONSE=$(curl -sf -X POST "${OLLAMA_HOST}/api/generate" \
              -d "{\"model\":\"gemini-3-flash-preview:cloud\",\"prompt\":$(jq -Rs . < prompt.txt),\"stream\":false}" \
              --max-time 60 2>/dev/null | jq -r '.response // empty' 2>/dev/null || true)
            if [ -z "$RESPONSE" ]; then
              RESPONSE="⚠️ Ollama unreachable. Test generation skipped."
            fi

            echo "### $file" >> test_report.md
            echo "\`\`\`" >> test_report.md
            echo "$RESPONSE" >> test_report.md
            echo "\`\`\`" >> test_report.md
            echo "" >> test_report.md
          done < changed_src.txt

          {
            echo "report<<EOF"
            cat test_report.md
            echo "EOF"
          } >> "$GITHUB_OUTPUT"

      - name: Post test generation report
        if: steps.files.outputs.count != '0'
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const report = fs.readFileSync('test_report.md', 'utf8');
            const body = `## 🧪 AI-Generated Test Cases\n\n${report}\n\n---\n*Generated by \`ai-sdlc-testing.yml\` • Model: gemini-3-flash-preview:cloud*`;
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body
            });

      - name: Failure prediction (heuristic)
        id: predict
        run: |
          echo "📊 Failure prediction heuristic:"
          echo "- Files changed: ${{ steps.files.outputs.count }}"
          echo "- High-risk paths: payment*, auth*, fraud*, transfer*"
          git diff --name-only "origin/${{ github.base_ref }}...HEAD" | grep -iE '(payment|auth|fraud|transfer)' && echo "🔴 HIGH RISK: Financial flow touched" || echo "🟢 Standard risk"
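
The failure-prediction heuristic above is a case-insensitive substring match on changed paths. A minimal sketch of the same rule as a reusable function, so a later step could branch on the result; `classify_risk` and its two labels are hypothetical names:

```shell
# classify_risk
# Reads changed file paths on stdin (one per line) and prints HIGH when any
# path contains payment, auth, fraud, or transfer (case-insensitive),
# otherwise STANDARD.
classify_risk() {
  if grep -qiE '(payment|auth|fraud|transfer)'; then
    echo "HIGH"
  else
    echo "STANDARD"
  fi
}
```

A step could call it as `git diff --name-only "origin/$BASE...HEAD" | classify_risk`, where `BASE` is assumed to hold the PR's base branch. Because the match is on substrings, a path such as `docs/authors.md` is also classified HIGH; anchoring the pattern to directory names would reduce such false positives.
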
5. Progressive Deploy — ai-sdlc-deploy.yml
# Pattern 5: AI-Enabled Deployment & Operations
# Progressive delivery with AI-assisted rollback based on canary metrics.
name: "AI-SDLC :: Progressive Deploy"

on:
  push:
    branches: [main]

permissions:
  contents: read
  deployments: write

env:
  OLLAMA_HOST: ${{ secrets.OLLAMA_HOST || 'http://localhost:11434' }}

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Build artifact
        run: echo "Building artifact..." # Replace with real build

  deploy-canary:
    needs: build
    runs-on: ubuntu-latest
    environment:
      name: canary
      url: https://canary.dwaynehelena.com
    steps:
      - name: Deploy to canary (1%)
        run: echo "🚀 Deployed to canary at 1% traffic"

      - name: Canary health check
        run: |
          # Placeholder: replace with a real query against your metrics backend.
          sleep 30
          echo "✅ Canary healthy: error rate 0.1%, latency P99 280ms"

      - name: AI-assisted canary analysis
        id: ai
        run: |
          cat > prompt.txt <<PROMPT
          You are a deployment safety engine. Given the following canary metrics,
          recommend: PROCEED, HOLD, or ROLLBACK with justification.

          Canary Metrics:
          - Error rate: 0.1% (baseline: 0.1%)
          - P50 latency: 45ms (baseline: 42ms)
          - P99 latency: 280ms (baseline: 220ms), a 27% increase
          - CPU utilisation: 65% (baseline: 58%)
          - Memory utilisation: 72% (baseline: 65%)
          - Conversion rate: 3.7% (baseline: 3.8%)

          Respond with a single verdict and explanation.
          PROMPT

          # An empty result (unreachable host, HTTP error, or malformed JSON)
          # falls through to the warning below.
          RESPONSE=$(curl -sf -X POST "${OLLAMA_HOST}/api/generate" \
            -d "{\"model\":\"gemini-3-flash-preview:cloud\",\"prompt\":$(jq -Rs . < prompt.txt),\"stream\":false}" \
            --max-time 60 2>/dev/null | jq -r '.response // empty' 2>/dev/null || true)
          if [ -z "$RESPONSE" ]; then
            RESPONSE="⚠️ Ollama unreachable. Manual review required."
          fi

          echo "$RESPONSE"
          echo "verdict=$(echo "$RESPONSE" | grep -oE 'PROCEED|HOLD|ROLLBACK' | head -1 || true)" >> "$GITHUB_OUTPUT"

      - name: Rollback on AI recommendation
        if: contains(steps.ai.outputs.verdict, 'ROLLBACK')
        run: |
          echo "🛑 AI recommended ROLLBACK: executing..."
          # Replace with actual rollback command
          exit 1

  deploy-production:
    needs: deploy-canary
    runs-on: ubuntu-latest
    environment:
      name: production
      url: https://dwaynehelena.com
    steps:
      - name: Progressive ramp (5% → 25% → 100%)
        run: |
          echo "📈 Ramping to 5%..."
          sleep 60
          echo "📈 Ramping to 25%..."
          sleep 120
          echo "📈 Ramping to 100%..."
          echo "✅ Fully deployed to production"
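
In the canary job above, only a ROLLBACK verdict stops the pipeline. When the model is unreachable or returns no recognisable verdict, the verdict is empty and the production job still runs. A minimal sketch of a fail-closed parser that treats a missing verdict as HOLD; `parse_verdict` is a hypothetical name, and defaulting to HOLD is an assumption about the desired policy:

```shell
# parse_verdict
# Reads the model response on stdin and prints the first of PROCEED, HOLD,
# or ROLLBACK that appears in it. Prints HOLD when none is present, so an
# outage or an off-format answer never reads as approval.
parse_verdict() {
  local verdict
  verdict=$(grep -oE 'PROCEED|HOLD|ROLLBACK' | head -n 1)
  printf '%s\n' "${verdict:-HOLD}"
}
```

A gate step could then fail unless the verdict is exactly PROCEED. First-match parsing is still fragile: a response such as "do not PROCEED" yields PROCEED, so asking the model for a fixed first line or a JSON field is likely a safer contract.
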
6. Operational Intelligence — ai-sdlc-monitor.yml
# Pattern 6: AI-Driven Monitoring & Feedback Loops
# Scheduled job that synthesises operational insights and auto-generates
# backlog tickets for reliability risks and customer pain points.
name: "AI-SDLC :: Operational Intelligence"

on:
  schedule:
    - cron: '0 9 * * 1' # Weekly, Monday 09:00 UTC
  workflow_dispatch:

permissions:
  contents: read
  issues: write

env:
  OLLAMA_HOST: ${{ secrets.OLLAMA_HOST || 'http://localhost:11434' }}

jobs:
  operational-intelligence:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Gather operational data (placeholder)
        id: data
        run: |
          cat > ops_report.txt <<REPORT
          Weekly Operational Snapshot (placeholder data):
          - Top reliability risk: Fraud scoring service latency spikes (>500ms, 12 incidents)
          - Customer pain point: 23% of support tickets mention "slow login"
          - Technical debt hotspot: Legacy account service, 18% change failure rate
          - Emerging failure mode: Exchange rate API timeout cascade (no alert yet)
          REPORT
          echo "report<<EOF" >> "$GITHUB_OUTPUT"
          cat ops_report.txt >> "$GITHUB_OUTPUT"
          echo "EOF" >> "$GITHUB_OUTPUT"

      - name: AI synthesis of insights
        id: ai
        run: |
          # \$X is escaped so the shell does not expand it to an empty string.
          cat > prompt.txt <<PROMPT
          You are an operational intelligence engine. Given the weekly snapshot below,
          synthesise a prioritised action list with business impact justification.
          Format each item as:
          - [ ] ACTION: description | IMPACT: \$X / month | EFFORT: S/M/L | OWNER: team

          Snapshot:
          $(cat ops_report.txt)
          PROMPT

          # An empty result (unreachable host, HTTP error, or malformed JSON)
          # falls through to the warning below.
          RESPONSE=$(curl -sf -X POST "${OLLAMA_HOST}/api/generate" \
            -d "{\"model\":\"gemini-3-flash-preview:cloud\",\"prompt\":$(jq -Rs . < prompt.txt),\"stream\":false}" \
            --max-time 60 2>/dev/null | jq -r '.response // empty' 2>/dev/null || true)
          if [ -z "$RESPONSE" ]; then
            RESPONSE="⚠️ Ollama unreachable. Manual analysis required."
          fi

          echo "insights<<EOF" >> "$GITHUB_OUTPUT"
          echo "$RESPONSE" >> "$GITHUB_OUTPUT"
          echo "EOF" >> "$GITHUB_OUTPUT"

      - name: Create backlog issues from AI insights
        uses: actions/github-script@v7
        env:
          AI_INSIGHTS: ${{ steps.ai.outputs.insights }}
        with:
          script: |
            // The model output arrives via the environment so it is never
            // evaluated as part of this script.
            const insights = process.env.AI_INSIGHTS || '';
            const lines = insights.split('\n').map(l => l.trim()).filter(l => l.startsWith('- [ ]'));
            // Cap at three issues per run to keep the backlog reviewable.
            for (const line of lines.slice(0, 3)) {
              const title = line.replace(/^- \[ \] /, '').split('|')[0].trim();
              const body = `## Operational Intelligence Insight\n\n${line}\n\n---\n*Generated by \`ai-sdlc-monitor.yml\` • Model: gemini-3-flash-preview:cloud*`;
              await github.rest.issues.create({
                owner: context.repo.owner,
                repo: context.repo.repo,
                title: `[AI-Ops] ${title}`,
                body,
                labels: ['ai-ops', 'intelligence']
              });
            }
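
The issue-creation step above turns each unchecked `- [ ]` line of the model output into an issue title by taking the text before the first `|`. A minimal sketch of the same extraction in shell, useful for checking locally what titles a given response would produce; `action_titles` is a hypothetical name:

```shell
# action_titles [MAX]
# Reads model output on stdin and prints at most MAX (default 3) issue
# titles: the text of each unchecked "- [ ]" item up to the first "|".
# Leading indentation is tolerated; checked "- [x]" items are ignored.
action_titles() {
  local max=${1:-3}
  grep -E '^[[:space:]]*- \[ \] ' \
    | sed -E 's/^[[:space:]]*- \[ \] //; s/[[:space:]]*\|.*$//' \
    | head -n "$max"
}
```

For example, `printf '%s\n' "$RESPONSE" | action_titles 3` lists the titles without creating anything, which gives a cheap dry run before granting the workflow `issues: write`.
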

Environment Configuration

All workflows use the OLLAMA_HOST environment variable. Configure this as a GitHub secret if your Ollama instance is remote:

| Secret | Description |
| --- | --- |
| OLLAMA_HOST | URL of your Ollama instance (default: http://localhost:11434) |

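On a self-hosted runner with Ollama running locally, no secret is required and the default applies. The workflows as written specify runs-on: ubuntu-latest, which is a GitHub-hosted runner with no local Ollama, so either change runs-on to your self-hosted runner label or set the secret to a reachable instance.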

Adoption Path

| Maturity Level | Workflows to Enable | Timeframe |
| --- | --- | --- |
| Ad hoc | ai-sdlc-code-review.yml only | Immediate |
| Team | Code review + testing | 1–2 sprints |
| Squad | Requirements + design + code + testing | 1–2 quarters |
| Program | All 5 PR/Merge workflows | 2–4 quarters |
| Enterprise | All 6 workflows + custom governance rules | 1–2 years |

Start with the code review workflow. It adds a comment to each PR and changes nothing else in the existing process, so it pays off immediately. Add testing next, then requirements and design. Deployment and monitoring require more infrastructure (a canary environment, a telemetry pipeline) and should follow once the earlier patterns are stable.

Customisation

Each workflow is designed to be forked and customised:

  • Model swap: Replace gemini-3-flash-preview:cloud with your preferred Ollama model in the curl payload
  • Custom rules: Add team-specific security rules, architectural constraints, or compliance checks to the prompts
  • Integrations: Replace GitHub comments with Slack notifications, Jira tickets, or PagerDuty alerts
  • Timeouts: Adjust --max-time based on your model's inference speed
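  • Fallbacks: An unreachable model should produce a "⚠️ ..." warning rather than a failed job. Check for an empty response as well as a non-zero exit code, because a curl | jq pipeline reports only jq's exit status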
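
On the fallback point: unless `pipefail` is set, a pipeline's exit status is that of its last command, so a fallback attached with `||` may not fire when `curl` fails but `jq` exits cleanly on empty input. A minimal sketch of a wrapper that treats both a failed command and empty output as "model unavailable". `or_fallback` and `query_ollama` are hypothetical names, and `MODEL` is an assumed variable:

```shell
# or_fallback FALLBACK CMD [ARGS...]
# Runs CMD and prints its output. Prints FALLBACK instead when CMD fails
# or prints nothing. Always returns 0, so the calling step does not fail.
or_fallback() {
  local fallback=$1 out
  shift
  out=$("$@" 2>/dev/null) || out=""
  if [ -z "$out" ]; then
    printf '%s\n' "$fallback"
  else
    printf '%s\n' "$out"
  fi
}

# query_ollama PROMPT_FILE
# Hypothetical helper: posts the prompt to Ollama's /api/generate endpoint
# and prints the "response" field. Assumes curl and jq 1.6+ are installed
# and that OLLAMA_HOST and MODEL are set by the caller.
query_ollama() {
  local payload
  payload=$(jq -n --arg model "$MODEL" --rawfile prompt "$1" \
    '{model: $model, prompt: $prompt, stream: false}') || return 1
  (
    set -o pipefail   # a curl failure now fails the whole pipeline
    curl -sf --max-time 60 -X POST "${OLLAMA_HOST}/api/generate" -d "$payload" \
      | jq -r '.response // empty'
  )
}
```

A step would then read `RESPONSE=$(or_fallback "⚠️ Ollama unreachable. AI review skipped." query_ollama prompt.txt)`. Building the payload with `jq -n` also avoids hand-escaping quotes in the JSON body.
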

The goal is not to prescribe a rigid toolchain. It is to provide a reference implementation that teams can adapt to their context, their model provider, their compliance requirements, and their delivery cadence.