SpecFlow Methodology
SpecFlow applies proven frameworks to each pillar. Think of these as checklists that experts have refined over decades. Instead of reinventing the wheel for every feature, you stand on the shoulders of security researchers, cost analysts, and testing practitioners.
Overview
Each pillar uses specific frameworks:
| Pillar | Framework | Purpose |
|---|---|---|
| Security | STRIDE | Threat modeling |
| Requirements | BOSS | Acceptance criteria quality |
| Testing | TEA + Gherkin | Test specification |
| Cost | FinOps | Cloud cost analysis |
STRIDE Threat Modeling
STRIDE is a mnemonic for six threat categories. Security analyst Jordan applies STRIDE to every architecture review:
| Letter | Threat | Question |
|---|---|---|
| S | Spoofing | Can attackers impersonate legitimate users? |
| T | Tampering | Can data be modified without detection? |
| R | Repudiation | Can users deny their actions? |
| I | Information Disclosure | Can sensitive data leak? |
| D | Denial of Service | Can the system be overwhelmed? |
| E | Elevation of Privilege | Can users gain unauthorized access? |
How STRIDE Works
Jordan analyzes the architecture in seven steps:
- Identify assets requiring protection (user data, credentials, etc.)
- Review architecture diagrams and data flows
- Map trust boundaries (where data crosses security domains)
- Identify attack surfaces (entry points for attackers)
- Analyze controls at each layer
- Assess threats (likelihood and impact)
- Recommend mitigations for each threat
STRIDE Output
For a login feature, Jordan might produce:
## STRIDE Analysis
### Spoofing
**Threat**: Attacker impersonates legitimate user
**Attack Vector**: Credential stuffing, phishing
**Mitigation**: Rate limiting, MFA, secure password hashing
### Information Disclosure
**Threat**: Password or session token exposure
**Attack Vector**: Logging, network sniffing
**Mitigation**: No password logging, HTTPS only, httpOnly cookies
When STRIDE Applies
STRIDE analysis triggers for medium+ scope features involving:
- Authentication or authorization
- User data (PII, passwords, tokens)
- Payment processing
- Public API endpoints
- File uploads
- External service credentials
BOSS Criteria
BOSS ensures acceptance criteria are testable. The acronym stands for:
| Letter | Meaning | Good Example | Bad Example |
|---|---|---|---|
| B | Binary | "Returns 401" | "Handles errors well" |
| O | Observable | "Displays toast message" | "User feels satisfied" |
| S | Specific | "Under 200ms" | "Fast response" |
| S | Scope-bound | "Login endpoint" | "User experience" |
Writing BOSS Criteria
Analyst Mary transforms vague requirements into BOSS-compliant acceptance criteria:
Before (vague):
User should be able to log out securely
After (BOSS):
AC-01: When user clicks logout button, session token is invalidated
AC-02: After logout, redirects to login page within 500ms
AC-03: After logout, previous session token returns 401 on API calls
AC-04: Logout button visible on all authenticated pages
Each criterion is:
- Binary: Either the token is invalidated or it is not
- Observable: Can test via API call
- Specific: Exact behavior defined
- Scope-bound: Only covers logout functionality
The BOSS Test
Before finalizing any acceptance criterion, apply the BOSS test:
Can a test script verify this criterion passes or fails?
If yes, the criterion is BOSS-compliant. If no, refine until it is.
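The BOSS test can be made concrete: a criterion is compliant exactly when a script can assert its pass/fail outcome. A minimal sketch using AC-03 (previous session token returns 401); the `SessionStore` below is a hypothetical stand-in for a real backend, not part of SpecFlow:

```python
# Sketch: AC-03 is binary and observable, so a script can verify it.
# SessionStore is an illustrative stand-in for a real session backend.

class SessionStore:
    def __init__(self):
        self._active = set()

    def login(self, user: str) -> str:
        token = f"token-{user}"
        self._active.add(token)
        return token

    def logout(self, token: str) -> None:
        self._active.discard(token)   # AC-01: token is invalidated

    def api_call(self, token: str) -> int:
        # Returns an HTTP-style status code.
        return 200 if token in self._active else 401   # AC-03


def check_ac03(store: SessionStore) -> bool:
    token = store.login("test@example.com")
    store.logout(token)
    # Binary outcome, observable via the API, no human judgment needed.
    return store.api_call(token) == 401


print(check_ac03(SessionStore()))  # True
```

Contrast with "user should be able to log out securely": no script can assert "securely", so it fails the BOSS test.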
Test Engineering Analysis (TEA)
TEA plans tests at the right depth for each scope level.
Test Depth by Scope
| Scope | Test Scenarios | Types |
|---|---|---|
| trivial | 1-2 | Happy path only |
| small | 3-5 | Happy + one error case |
| medium | 6-10 | Full Gherkin coverage |
| large | 10-15 | Integration + E2E |
| complex | 15+ | Performance, security |
Gherkin Scenarios
TEA writes tests in Gherkin format, which reads like natural language:
Feature: User Authentication
@AC-01 @happy-path
Scenario: Successful login with valid credentials
Given I am on the login page
And user "test@example.com" exists with password "SecurePass123"
When I enter "test@example.com" as email
And I enter "SecurePass123" as password
And I click the login button
Then I should be redirected to the dashboard
And my session should be active
@AC-02 @error-case
Scenario: Failed login with invalid password
Given I am on the login page
And user "test@example.com" exists with password "SecurePass123"
When I enter "test@example.com" as email
And I enter "WrongPassword" as password
And I click the login button
Then I should see error message "Invalid credentials"
And I should remain on the login page
Traceability Matrix
TEA creates a matrix linking acceptance criteria to tests:
| AC | Unit | Integration | E2E | Coverage |
|---|---|---|---|---|
| AC-01 | UT-01 | IT-01 | E2E-01 | FULL |
| AC-02 | UT-02 | - | E2E-02 | FULL |
| AC-03 | - | IT-02 | - | PARTIAL |
| AC-04 | - | - | - | MISSING |
MISSING coverage triggers PM review before development proceeds.
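The matrix check can run mechanically. A sketch under one plausible rule (FULL when an E2E test exists, PARTIAL when only lower-level tests exist, MISSING when none do); the data shape is illustrative, not SpecFlow's actual model:

```python
# Illustrative AC-to-test mapping, mirroring the matrix above.
AC_TESTS = {
    "AC-01": {"unit": ["UT-01"], "integration": ["IT-01"], "e2e": ["E2E-01"]},
    "AC-02": {"unit": ["UT-02"], "integration": [], "e2e": ["E2E-02"]},
    "AC-03": {"unit": [], "integration": ["IT-02"], "e2e": []},
    "AC-04": {"unit": [], "integration": [], "e2e": []},
}

def coverage(ac: str) -> str:
    levels = AC_TESTS[ac]
    if not any(levels.values()):
        return "MISSING"        # triggers PM review before development
    return "FULL" if levels["e2e"] else "PARTIAL"

print({ac: coverage(ac) for ac in AC_TESTS})
```

Running this reproduces the FULL/FULL/PARTIAL/MISSING column shown in the matrix.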
Flow Recommendation
TEA recommends whether QA should write tests before (qa-first) or after (dev-first) development:
qa-first when:
- Feature is behavior-heavy (E2E scenarios dominate)
- Most AC describe user-facing behavior
- Replacing existing functionality (regression risk)
dev-first when:
- Feature is algorithm-heavy (unit tests dominate)
- Most AC describe internal behavior
- Greenfield with no existing contracts
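The routing rule above can be sketched as a simple signal count, assuming the "2+ QA-first signals" threshold used by PM routing; the signal names are illustrative:

```python
# Sketch of qa-first vs dev-first routing. Signal names are illustrative;
# the >= 2 threshold is an assumption matching the PM routing rule.

QA_FIRST_SIGNALS = (
    "behavioral_tests_dominate",
    "user_facing_acs",
    "regression_risk",
    "medium_plus_scope",
)

def recommend_flow(signals: set) -> str:
    hits = sum(1 for s in QA_FIRST_SIGNALS if s in signals)
    return "qa-first" if hits >= 2 else "dev-first"

print(recommend_flow({"user_facing_acs", "regression_risk"}))   # qa-first
print(recommend_flow({"behavioral_tests_dominate"}))            # dev-first
```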
Cost Analysis Framework
Taylor applies FinOps principles to estimate cloud costs:
Five-Step Process
- Identify resources created or modified by the feature
- Estimate usage patterns (requests per day, storage growth)
- Calculate costs using cloud pricing
- Compare alternatives (different instance types, regions)
- Recommend optimizations (reserved capacity, auto-scaling)
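Step 3 can be sketched numerically. A minimal back-of-envelope helper for Lambda: the default rates are placeholders in the shape of AWS's per-million-request and GB-second charges, not current prices, and a real estimate should pull rates from the provider's pricing data:

```python
# Back-of-envelope Lambda cost sketch. Rates are placeholders, not
# current AWS pricing; usage inputs come from step 2 (usage patterns).

def lambda_cost(requests_per_month: float,
                avg_duration_ms: float,
                memory_gb: float,
                price_per_million_requests: float = 0.20,
                price_per_gb_second: float = 0.0000166667) -> float:
    request_charge = requests_per_month / 1_000_000 * price_per_million_requests
    gb_seconds = requests_per_month * (avg_duration_ms / 1000) * memory_gb
    duration_charge = gb_seconds * price_per_gb_second
    return request_charge + duration_charge

# 100K requests/month, 200 ms average, 512 MB:
print(round(lambda_cost(100_000, 200, 0.5), 2))  # 0.19
```

Step 4 (compare alternatives) is then a matter of re-running the helper with different memory sizes or a provisioned-capacity rate.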
Cost Output by Scope
| Scope | Analysis Depth |
|---|---|
| trivial/small | Skipped |
| medium | Total estimate only |
| large | Breakdown by component |
| complex | Multi-scenario projections |
Sample Cost Analysis
## Cost Estimate
**Monthly Total**: $47/month (production)
### Breakdown
| Component | Estimate | Notes |
|-----------|----------|-------|
| Lambda | $12 | 100K requests @ $0.20/M |
| DynamoDB | $25 | 1GB + 10K read/write units |
| CloudWatch | $10 | Logs + metrics |
### Optimization Opportunities
1. Reserved capacity: -30% on DynamoDB
2. Scheduled scaling: -20% during off-hours
3. Consider: Lambda provisioned vs on-demand
Proportional Ceremony
SpecFlow applies methodology proportional to risk:
A typo fix should not trigger STRIDE analysis. A payment integration should not skip it.
Scope-Based Framework Depth
| Framework | Trivial | Small | Medium | Large | Complex |
|---|---|---|---|---|---|
| STRIDE | Skip | Skip | Light | Full | Deep |
| BOSS | 1-2 AC | 3-5 AC | 6-10 AC | 10-15 AC | 15+ AC |
| TEA | 1 test | 3-5 tests | 6-10 tests | 10-15 tests | 15+ tests |
| FinOps | Skip | Skip | Estimate | Breakdown | Projections |
Framework Integration
The frameworks reinforce each other:
- BOSS criteria define what to build
- STRIDE identifies what could go wrong
- TEA ensures everything is testable
- Gherkin makes tests readable
- QA verifies everything works
TDD Workflow Enforcement
SpecFlow enforces test-driven development at the workflow level, not just the code level.
QA-First vs Dev-First Routing
Based on TEA's analysis, PM routes work differently:
TEA Analysis
↓
Signals Detected
↓
┌─────────────────────────────────┐
│ 2+ QA-First Signals? │
│ • Behavioral tests dominate │
│ • User-facing ACs │
│ • Medium+ scope │
│ • Regression risk │
└─────────────────────────────────┘
↓ Yes ↓ No
QA-First Dev-First
↓ ↓
QA writes Dev implements
failing tests with TDD
↓ ↓
Dev implements QA validates
to pass tests
Test Evidence Requirements
Stories cannot be marked complete without test evidence:
## Test Evidence (Required)
| Field | Value |
|-------|-------|
| test_file | tests/auth/test_logout.py |
| pass_count | 7/7 |
| timestamp | 2024-02-05T10:30:00Z |
| command | pytest tests/auth/test_logout.py |
The validate_completion() function enforces:
- Test file exists
- All tests pass (no failures)
- Timestamp within current session
- Coverage meets AC requirements
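A minimal sketch of what such enforcement looks like, using the field names from the evidence table. This is an illustration, not SpecFlow's actual implementation; the coverage check is omitted for brevity:

```python
# Sketch of validate_completion() enforcement. Field names follow the
# test-evidence table above; everything else is an assumption.

from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path

@dataclass
class TestEvidence:
    test_file: str
    pass_count: str       # e.g. "7/7"
    timestamp: datetime
    command: str

def validate_completion(ev: TestEvidence,
                        session_start: datetime) -> list:
    problems = []
    if not Path(ev.test_file).exists():
        problems.append("test file missing")
    passed, total = (int(n) for n in ev.pass_count.split("/"))
    if passed != total:
        problems.append(f"{total - passed} test(s) failing")
    if ev.timestamp < session_start:
        problems.append("evidence predates current session")
    return problems   # empty list == story may be marked complete
```

A story with any problem in the returned list stays open until the evidence is regenerated.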
Automated User Acceptance Testing (UAT)
SpecFlow includes automated UAT triggers for user-facing features.
UAT Signal Detection
The system monitors for UAT signals:
| Signal | Detection | Weight |
|---|---|---|
| UI Components | Component files created | 1 |
| User Workflows | Gherkin with user steps | 1 |
| Behavioral ACs | "User should see..." | 1 |
| Form Handling | Input validation logic | 1 |
| Navigation | Route changes | 1 |
Three or more signals trigger automatic UAT routing.
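Since every signal weighs 1, routing reduces to a count of fired detectors. A sketch with illustrative detectors (the real detection logic is richer than these string checks):

```python
# Sketch of UAT signal counting. All signals weigh 1, per the table;
# the detector implementations are illustrative stand-ins.

UAT_SIGNALS = {
    "ui_components":  lambda c: any(f.endswith((".tsx", ".vue")) for f in c["files"]),
    "user_workflows": lambda c: "Given I" in c.get("gherkin", ""),
    "behavioral_acs": lambda c: "user should see" in c.get("acs", "").lower(),
    "form_handling":  lambda c: "validate" in c.get("code", ""),
    "navigation":     lambda c: c.get("routes_changed", False),
}

def needs_uat(change: dict) -> bool:
    score = sum(1 for detect in UAT_SIGNALS.values() if detect(change))
    return score >= 3   # 3+ signals -> automatic UAT routing

change = {"files": ["LoginForm.tsx"],
          "gherkin": "Given I am on the login page",
          "acs": "User should see a toast",
          "code": "", "routes_changed": False}
print(needs_uat(change))   # True: three signals fired
```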
UAT Verification Flow
Feature Complete
↓
Signal Count Check
↓ (3+ signals)
/gsd:verify-work
↓
Conversational UAT
↓
Goal-Backward Verification
↓
VERIFICATION.md Created
↓
Pass? → Ship
Fail? → Gap Closure Plans
Goal-Backward Verification
The verifier checks what actually exists in the codebase against phase goals:
- Read phase must_haves from plan
- Verify each artifact exists with required content
- Check key_links (integrations work)
- Produce passed, human_needed, or gaps_found status
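The verification steps above can be sketched as a single pass over the plan. The plan shape and field names here are illustrative assumptions, not SpecFlow's actual schema:

```python
# Sketch of goal-backward verification: check must_haves against the
# working tree, probe key_links, derive a status. Plan shape is assumed.

from pathlib import Path

def verify_phase(plan: dict) -> str:
    gaps = []
    for artifact in plan["must_haves"]:
        path = Path(artifact["path"])
        if not path.exists():
            gaps.append(f"{path}: missing")
        elif artifact.get("contains") and artifact["contains"] not in path.read_text():
            gaps.append(f"{path}: required content absent")
    for link in plan.get("key_links", []):
        if not link["check"]():        # callable integration probe
            gaps.append(f"{link['name']}: integration broken")
    if gaps:
        return "gaps_found"            # -> gap closure plans
    if plan.get("needs_human_judgment"):
        return "human_needed"
    return "passed"                    # -> ship
```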
Multi-Lens Review Loop
SpecFlow implements dynamic, skill-based code review.
Skill Discovery
Review automatically detects which skills apply:
/sf:review # Discovers skills from code changes
The skill-detector analyzes:
- File types modified
- Import statements
- Pattern matches
- Capability requirements
Parallel Review Execution
Multiple reviewers run simultaneously:
Code Changes
↓
Skill Discovery
↓
┌────────────────────────────────────────┐
│ Parallel Review │
│ │
│ Security ──┐ │
│ │ │
│ Performance ──→ Aggregated Findings │
│ │ │
│ Accessibility ─┘ │
└────────────────────────────────────────┘
↓
PM Synthesis
↓
Drift Detection
↓
Ship or Iterate
Drift Detection
Review compares implementation against specification:
| Drift Type | Detection | Response |
|---|---|---|
| Missing AC | Feature not implemented | Block + report |
| Modified AC | Behavior differs | Classify (bug/drift) |
| Added behavior | Not in spec | Document + approve/reject |
| Security regression | STRIDE threat introduced | Block + escalate |
Heuristics classify drift automatically:
- File-based: Did spec file change?
- Assertion-based: Do tests match ACs?
- Semantic: Does behavior match intent?
Low-confidence cases escalate to PM or user.
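The three heuristics compose into a simple decision ladder. A sketch; the inputs and the escalation rule are assumptions about how the checks combine, not SpecFlow internals:

```python
# Sketch of drift classification. A None semantic result models an
# inconclusive check, which escalates rather than auto-classifying.

def classify_drift(spec_changed: bool,
                   tests_match_acs: bool,
                   behavior_matches_intent) -> str:
    # File-based: a deliberate spec update is not drift.
    if spec_changed:
        return "spec-updated"
    # Assertion-based: failing AC assertions point at a bug.
    if not tests_match_acs:
        return "bug"
    # Semantic: None means the check was inconclusive.
    if behavior_matches_intent is None:
        return "escalate"      # low confidence -> PM or user decides
    return "ok" if behavior_matches_intent else "drift"

print(classify_drift(False, True, None))   # escalate
```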
Slop Detection
Prevents common AI-generated anti-patterns.
What is Slop?
"Slop" refers to AI-generated code that:
- Adds unnecessary abstractions
- Over-engineers simple solutions
- Creates premature generalizations
- Duplicates existing library functionality
- Adds excessive error handling for impossible cases
Detection Patterns
| Pattern | Example | Flag |
|---|---|---|
| Unnecessary wrapper | Wrapping built-in with identical API | HIGH |
| Premature abstraction | Interface with single implementation | MEDIUM |
| Feature creep | Adding config for hardcoded values | MEDIUM |
| NIH syndrome | Custom implementation of lodash function | HIGH |
| Over-validation | Checking null after guaranteed initialization | LOW |
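One of these patterns, NIH syndrome, lends itself to a crude textual check: flag function definitions whose names collide with well-known utility functions. A rough sketch; the name list is a small illustrative sample and real detection would be AST-based:

```python
# Rough sketch of the NIH-syndrome heuristic: flag custom definitions
# that duplicate common library utilities. Name list is illustrative.

import re

COMMON_UTILS = {"debounce", "throttle", "deepClone", "groupBy", "chunk"}

def flag_nih(source: str) -> list:
    findings = []
    for match in re.finditer(r"function\s+(\w+)|def\s+(\w+)", source):
        name = match.group(1) or match.group(2)
        if name in COMMON_UTILS:
            findings.append(f"HIGH: custom '{name}' duplicates a library utility")
    return findings

print(flag_nih("function debounce(fn, ms) { /* ... */ }"))  # flags 'debounce'
```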
Library-First Enforcement
SpecFlow prefers battle-tested libraries:
- Analyst searches for existing libraries during spec
- Dev checks before implementing common patterns
- Custom requires justification in PROGRESS.md
## Library Assessment
| Pattern | Library Option | Decision |
|---------|---------------|----------|
| Date formatting | date-fns | USE LIBRARY |
| Validation | zod | USE LIBRARY |
| Custom business logic | - | IMPLEMENT (no library exists) |
Slop Review Integration
Review skills flag slop patterns:
## Review Findings
### SLOP-01: Unnecessary Abstraction
**File**: src/utils/stringHelper.ts
**Pattern**: Wrapper around `String.prototype.trim()`
**Recommendation**: Delete file, use native method
### SLOP-02: Premature Generalization
**File**: src/types/config.ts
**Pattern**: 50-field config interface, 3 fields used
**Recommendation**: Remove unused fields
Summary
SpecFlow methodology ensures:
- Security: STRIDE catches threats before code is written
- Quality: BOSS criteria are testable and specific
- Verification: TEA creates comprehensive test coverage
- Cost: FinOps estimates resource impact early
- TDD: Test evidence required before completion
- UAT: Automated user acceptance testing for user-facing features
- Review: Multi-lens parallel review with drift detection
- Slop Prevention: Library-first, anti-pattern detection
These frameworks prevent the common failure mode: discovering problems after implementation when fixes are expensive.