SpecFlow Methodology
SpecFlow applies proven frameworks to each pillar. Think of these as checklists that experts have refined over decades. Instead of reinventing the wheel for every feature, you stand on the shoulders of security researchers, cost analysts, and testing practitioners.
Overview
Each pillar uses specific frameworks:
| Pillar | Framework | Purpose |
|---|---|---|
| Security | STRIDE | Threat modeling |
| Requirements | BOSS | Acceptance criteria quality |
| Testing | TEA + Gherkin | Test specification |
| Cost | FinOps | Cloud cost analysis |
STRIDE Threat Modeling
STRIDE is a mnemonic for six threat categories. Security analyst Jordan applies STRIDE to every architecture review:
| Letter | Threat | Question |
|---|---|---|
| S | Spoofing | Can attackers impersonate legitimate users? |
| T | Tampering | Can data be modified without detection? |
| R | Repudiation | Can users deny their actions? |
| I | Information Disclosure | Can sensitive data leak? |
| D | Denial of Service | Can the system be overwhelmed? |
| E | Elevation of Privilege | Can users gain unauthorized access? |
How STRIDE Works
Jordan analyzes the architecture in seven steps:
- Identify assets requiring protection (user data, credentials, etc.)
- Review architecture diagrams and data flows
- Map trust boundaries (where data crosses security domains)
- Identify attack surfaces (entry points for attackers)
- Analyze controls at each layer
- Assess threats (likelihood and impact)
- Recommend mitigations for each threat
STRIDE Output
For a login feature, Jordan might produce:
## STRIDE Analysis
### Spoofing
**Threat**: Attacker impersonates legitimate user
**Attack Vector**: Credential stuffing, phishing
**Mitigation**: Rate limiting, MFA, secure password hashing
### Information Disclosure
**Threat**: Password or session token exposure
**Attack Vector**: Logging, network sniffing
**Mitigation**: No password logging, HTTPS only, httpOnly cookies
When STRIDE Applies
STRIDE analysis triggers for medium+ scope features involving:
- Authentication or authorization
- User data (PII, passwords, tokens)
- Payment processing
- Public API endpoints
- File uploads
- External service credentials
BOSS Criteria
BOSS ensures acceptance criteria are testable. The acronym stands for:
| Letter | Meaning | Good Example | Bad Example |
|---|---|---|---|
| B | Binary | "Returns 401" | "Handles errors well" |
| O | Observable | "Displays toast message" | "User feels satisfied" |
| S | Specific | "Under 200ms" | "Fast response" |
| S | Scope-bound | "Login endpoint" | "User experience" |
Writing BOSS Criteria
Analyst Mary transforms vague requirements into BOSS-compliant acceptance criteria:
Before (vague):
User should be able to log out securely
After (BOSS):
AC-01: When user clicks logout button, session token is invalidated
AC-02: After logout, redirects to login page within 500ms
AC-03: After logout, previous session token returns 401 on API calls
AC-04: Logout button visible on all authenticated pages
Each criterion is:
- Binary: Either the token is invalidated or it is not
- Observable: Can test via API call
- Specific: Exact behavior defined
- Scope-bound: Only covers logout functionality
The BOSS Test
Before finalizing any acceptance criterion, apply the BOSS test:
Can a test script verify this criterion passes or fails?
If yes, the criterion is BOSS-compliant. If no, refine until it is.
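The BOSS test can be made concrete: a criterion is compliant exactly when a script can assert its pass/fail outcome. A minimal sketch using AC-03 (previous session token returns 401); the `SessionStore` below is a hypothetical stand-in for a real backend, not part of SpecFlow:

```python
# Sketch: AC-03 is binary and observable, so a script can verify it.
# SessionStore is an illustrative stand-in for a real session backend.

class SessionStore:
    def __init__(self):
        self._active = set()

    def login(self, user: str) -> str:
        token = f"token-{user}"
        self._active.add(token)
        return token

    def logout(self, token: str) -> None:
        self._active.discard(token)   # AC-01: token is invalidated

    def api_call(self, token: str) -> int:
        # Returns an HTTP-style status code.
        return 200 if token in self._active else 401   # AC-03


def check_ac03(store: SessionStore) -> bool:
    token = store.login("test@example.com")
    store.logout(token)
    # Binary outcome, observable via the API, no human judgment needed.
    return store.api_call(token) == 401


print(check_ac03(SessionStore()))  # True
```

Contrast with "user should be able to log out securely": no script can assert "securely", so it fails the BOSS test.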
Test Engineering Analysis (TEA)
TEA plans tests at the right depth for each scope level.
Test Depth by Scope
| Scope | Test Scenarios | Types |
|---|---|---|
| trivial | 1-2 | Happy path only |
| small | 3-5 | Happy + one error case |
| medium | 6-10 | Full Gherkin coverage |
| large | 10-15 | Integration + E2E |
| complex | 15+ | Performance, security |
Gherkin Scenarios
TEA writes tests in Gherkin format, which reads like natural language:
Feature: User Authentication
@AC-01 @happy-path
Scenario: Successful login with valid credentials
Given I am on the login page
And user "test@example.com" exists with password "SecurePass123"
When I enter "test@example.com" as email
And I enter "SecurePass123" as password
And I click the login button
Then I should be redirected to the dashboard
And my session should be active
@AC-02 @error-case
Scenario: Failed login with invalid password
Given I am on the login page
And user "test@example.com" exists with password "SecurePass123"
When I enter "test@example.com" as email
And I enter "WrongPassword" as password
And I click the login button
Then I should see error message "Invalid credentials"
And I should remain on the login page
Traceability Matrix
TEA creates a matrix linking acceptance criteria to tests:
| AC | Unit | Integration | E2E | Coverage |
|---|---|---|---|---|
| AC-01 | UT-01 | IT-01 | E2E-01 | FULL |
| AC-02 | UT-02 | - | E2E-02 | FULL |
| AC-03 | - | IT-02 | - | PARTIAL |
| AC-04 | - | - | - | MISSING |
MISSING coverage triggers PM review before development proceeds.
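The matrix check can run mechanically. A sketch under one plausible rule (FULL when an E2E test exists, PARTIAL when only lower-level tests exist, MISSING when none do); the data shape is illustrative, not SpecFlow's actual model:

```python
# Illustrative AC-to-test mapping, mirroring the matrix above.
AC_TESTS = {
    "AC-01": {"unit": ["UT-01"], "integration": ["IT-01"], "e2e": ["E2E-01"]},
    "AC-02": {"unit": ["UT-02"], "integration": [], "e2e": ["E2E-02"]},
    "AC-03": {"unit": [], "integration": ["IT-02"], "e2e": []},
    "AC-04": {"unit": [], "integration": [], "e2e": []},
}

def coverage(ac: str) -> str:
    levels = AC_TESTS[ac]
    if not any(levels.values()):
        return "MISSING"        # triggers PM review before development
    return "FULL" if levels["e2e"] else "PARTIAL"

print({ac: coverage(ac) for ac in AC_TESTS})
```

Running this reproduces the FULL/FULL/PARTIAL/MISSING column shown in the matrix.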
Flow Recommendation
TEA recommends whether QA should write tests before (qa-first) or after (dev-first) development:
qa-first when:
- Feature is behavior-heavy (E2E scenarios dominate)
- Most AC describe user-facing behavior
- Replacing existing functionality (regression risk)
dev-first when:
- Feature is algorithm-heavy (unit tests dominate)
- Most AC describe internal behavior
- Greenfield with no existing contracts
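The routing rule above can be sketched as a simple signal count, assuming the "2+ QA-first signals" threshold used by PM routing; the signal names are illustrative:

```python
# Sketch of qa-first vs dev-first routing. Signal names are illustrative;
# the >= 2 threshold is an assumption matching the PM routing rule.

QA_FIRST_SIGNALS = (
    "behavioral_tests_dominate",
    "user_facing_acs",
    "regression_risk",
    "medium_plus_scope",
)

def recommend_flow(signals: set) -> str:
    hits = sum(1 for s in QA_FIRST_SIGNALS if s in signals)
    return "qa-first" if hits >= 2 else "dev-first"

print(recommend_flow({"user_facing_acs", "regression_risk"}))   # qa-first
print(recommend_flow({"behavioral_tests_dominate"}))            # dev-first
```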
Cost Analysis Framework
Taylor applies FinOps principles to estimate cloud costs:
Five-Step Process
- Identify resources created or modified by the feature
- Estimate usage patterns (requests per day, storage growth)
- Calculate costs using cloud pricing
- Compare alternatives (different instance types, regions)
- Recommend optimizations (reserved capacity, auto-scaling)
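Step 3 can be sketched numerically. A minimal back-of-envelope helper for Lambda: the default rates are placeholders in the shape of AWS's per-million-request and GB-second charges, not current prices, and a real estimate should pull rates from the provider's pricing data:

```python
# Back-of-envelope Lambda cost sketch. Rates are placeholders, not
# current AWS pricing; usage inputs come from step 2 (usage patterns).

def lambda_cost(requests_per_month: float,
                avg_duration_ms: float,
                memory_gb: float,
                price_per_million_requests: float = 0.20,
                price_per_gb_second: float = 0.0000166667) -> float:
    request_charge = requests_per_month / 1_000_000 * price_per_million_requests
    gb_seconds = requests_per_month * (avg_duration_ms / 1000) * memory_gb
    duration_charge = gb_seconds * price_per_gb_second
    return request_charge + duration_charge

# 100K requests/month, 200 ms average, 512 MB:
print(round(lambda_cost(100_000, 200, 0.5), 2))  # 0.19
```

Step 4 (compare alternatives) is then a matter of re-running the helper with different memory sizes or a provisioned-capacity rate.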
Cost Output by Scope
| Scope | Analysis Depth |
|---|---|
| trivial/small | Skipped |
| medium | Total estimate only |
| large | Breakdown by component |
| complex | Multi-scenario projections |
Sample Cost Analysis
## Cost Estimate
**Monthly Total**: $47/month (production)
### Breakdown
| Component | Estimate | Notes |
|-----------|----------|-------|
| Lambda | $12 | 100K requests @ $0.20/M |
| DynamoDB | $25 | 1GB + 10K read/write units |
| CloudWatch | $10 | Logs + metrics |
### Optimization Opportunities
1. Reserved capacity: -30% on DynamoDB
2. Scheduled scaling: -20% during off-hours
3. Consider: Lambda provisioned vs on-demand
Proportional Ceremony
SpecFlow applies methodology proportional to risk:
A typo fix should not trigger STRIDE analysis. A payment integration should not skip it.
Scope-Based Framework Depth
| Framework | Trivial | Small | Medium | Large | Complex |
|---|---|---|---|---|---|
| STRIDE | Skip | Skip | Light | Full | Deep |
| BOSS | 1-2 AC | 3-5 AC | 6-10 AC | 10-15 AC | 15+ AC |
| TEA | 1 test | 3-5 tests | 6-10 tests | 10-15 tests | 15+ tests |
| FinOps | Skip | Skip | Estimate | Breakdown | Projections |
Framework Integration
The frameworks reinforce each other:
- BOSS criteria define what to build
- STRIDE identifies what could go wrong
- TEA ensures everything is testable
- Gherkin makes tests readable
- QA verifies everything works
TDD Workflow Enforcement
SpecFlow enforces test-driven development at the workflow level, not just the code level.
QA-First vs Dev-First Routing
Based on TEA's analysis, PM routes work differently:
TEA Analysis
↓
Signals Detected
↓
┌─────────────────────────────────┐
│ 2+ QA-First Signals? │
│ • Behavioral tests dominate │
│ • User-facing ACs │
│ • Medium+ scope │
│ • Regression risk │
└─────────────────────────────────┘
↓ Yes ↓ No
QA-First Dev-First
↓ ↓
QA writes Dev implements
failing tests with TDD
↓ ↓
Dev implements QA validates
to pass tests
Test Evidence Requirements
Stories cannot be marked complete without test evidence:
## Test Evidence (Required)
| Field | Value |
|-------|-------|
| test_file | tests/auth/test_logout.py |
| pass_count | 7/7 |
| timestamp | 2024-02-05T10:30:00Z |
| command | pytest tests/auth/test_logout.py |
The validate_completion() function enforces:
- Test file exists
- All tests pass (no failures)
- Timestamp within current session
- Coverage meets AC requirements
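A minimal sketch of what such enforcement looks like, using the field names from the evidence table. This is an illustration, not SpecFlow's actual implementation; the coverage check is omitted for brevity:

```python
# Sketch of validate_completion() enforcement. Field names follow the
# test-evidence table above; everything else is an assumption.

from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path

@dataclass
class TestEvidence:
    test_file: str
    pass_count: str       # e.g. "7/7"
    timestamp: datetime
    command: str

def validate_completion(ev: TestEvidence,
                        session_start: datetime) -> list:
    problems = []
    if not Path(ev.test_file).exists():
        problems.append("test file missing")
    passed, total = (int(n) for n in ev.pass_count.split("/"))
    if passed != total:
        problems.append(f"{total - passed} test(s) failing")
    if ev.timestamp < session_start:
        problems.append("evidence predates current session")
    return problems   # empty list == story may be marked complete
```

A story with any problem in the returned list stays open until the evidence is regenerated.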
Automated User Acceptance Testing (UAT)
SpecFlow includes automated UAT triggers for user-facing features.
UAT Signal Detection
The system monitors for UAT signals:
| Signal | Detection | Weight |
|---|---|---|
| UI Components | Component files created | 1 |
| User Workflows | Gherkin with user steps | 1 |
| Behavioral ACs | "User should see..." | 1 |
| Form Handling | Input validation logic | 1 |
| Navigation | Route changes | 1 |
Three or more signals trigger automatic UAT routing.
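Since every signal weighs 1, routing reduces to a count of fired detectors. A sketch with illustrative detectors (the real detection logic is richer than these string checks):

```python
# Sketch of UAT signal counting. All signals weigh 1, per the table;
# the detector implementations are illustrative stand-ins.

UAT_SIGNALS = {
    "ui_components":  lambda c: any(f.endswith((".tsx", ".vue")) for f in c["files"]),
    "user_workflows": lambda c: "Given I" in c.get("gherkin", ""),
    "behavioral_acs": lambda c: "user should see" in c.get("acs", "").lower(),
    "form_handling":  lambda c: "validate" in c.get("code", ""),
    "navigation":     lambda c: c.get("routes_changed", False),
}

def needs_uat(change: dict) -> bool:
    score = sum(1 for detect in UAT_SIGNALS.values() if detect(change))
    return score >= 3   # 3+ signals -> automatic UAT routing

change = {"files": ["LoginForm.tsx"],
          "gherkin": "Given I am on the login page",
          "acs": "User should see a toast",
          "code": "", "routes_changed": False}
print(needs_uat(change))   # True: three signals fired
```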
UAT Verification Flow
Feature Complete
↓
Signal Count Check
↓ (3+ signals)
/gsd:verify-work
↓
Conversational UAT
↓
Goal-Backward Verification
↓
VERIFICATION.md Created
↓
Pass? → Ship
Fail? → Gap Closure Plans
Goal-Backward Verification
The verifier checks what actually exists in the codebase against phase goals:
- Read phase must_haves from plan
- Verify each artifact exists with required content
- Check key_links (integrations work)
- Produce passed, human_needed, or gaps_found status
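The verification steps above can be sketched as a single pass over the plan. The plan shape and field names here are illustrative assumptions, not SpecFlow's actual schema:

```python
# Sketch of goal-backward verification: check must_haves against the
# working tree, probe key_links, derive a status. Plan shape is assumed.

from pathlib import Path

def verify_phase(plan: dict) -> str:
    gaps = []
    for artifact in plan["must_haves"]:
        path = Path(artifact["path"])
        if not path.exists():
            gaps.append(f"{path}: missing")
        elif artifact.get("contains") and artifact["contains"] not in path.read_text():
            gaps.append(f"{path}: required content absent")
    for link in plan.get("key_links", []):
        if not link["check"]():        # callable integration probe
            gaps.append(f"{link['name']}: integration broken")
    if gaps:
        return "gaps_found"            # -> gap closure plans
    if plan.get("needs_human_judgment"):
        return "human_needed"
    return "passed"                    # -> ship
```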
Multi-Lens Review Loop
SpecFlow implements dynamic, skill-based code review.
Skill Discovery
Review automatically detects which skills apply:
/sf:review # Discovers skills from code changes
The skill-detector analyzes:
- File types modified
- Import statements
- Pattern matches
- Capability requirements
Parallel Review Execution
Multiple reviewers run simultaneously:
Code Changes
↓
Skill Discovery
↓
┌────────────────────────────────────────┐
│ Parallel Review │
│ │
│ Security ──┐ │
│ │ │
│ Performance ──→ Aggregated Findings │
│ │ │
│ Accessibility ─┘ │
└────────────────────────────────────────┘
↓
PM Synthesis
↓
Drift Detection
↓
Ship or Iterate
Drift Detection
Review compares implementation against specification:
| Drift Type | Detection | Response |
|---|---|---|
| Missing AC | Feature not implemented | Block + report |
| Modified AC | Behavior differs | Classify (bug/drift) |
| Added behavior | Not in spec | Document + approve/reject |
| Security regression | STRIDE threat introduced | Block + escalate |
Heuristics classify drift automatically:
- File-based: Did spec file change?
- Assertion-based: Do tests match ACs?
- Semantic: Does behavior match intent?
Low-confidence cases escalate to PM or user.
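The three heuristics compose into a simple decision ladder. A sketch; the inputs and the escalation rule are assumptions about how the checks combine, not SpecFlow internals:

```python
# Sketch of drift classification. A None semantic result models an
# inconclusive check, which escalates rather than auto-classifying.

def classify_drift(spec_changed: bool,
                   tests_match_acs: bool,
                   behavior_matches_intent) -> str:
    # File-based: a deliberate spec update is not drift.
    if spec_changed:
        return "spec-updated"
    # Assertion-based: failing AC assertions point at a bug.
    if not tests_match_acs:
        return "bug"
    # Semantic: None means the check was inconclusive.
    if behavior_matches_intent is None:
        return "escalate"      # low confidence -> PM or user decides
    return "ok" if behavior_matches_intent else "drift"

print(classify_drift(False, True, None))   # escalate
```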
Slop Detection
Prevents common AI-generated anti-patterns.
What is Slop?
"Slop" refers to AI-generated code that:
- Adds unnecessary abstractions
- Over-engineers simple solutions
- Creates premature generalizations
- Duplicates existing library functionality
- Adds excessive error handling for impossible cases
Detection Patterns
| Pattern | Example | Flag |
|---|---|---|
| Unnecessary wrapper | Wrapping built-in with identical API | HIGH |
| Premature abstraction | Interface with single implementation | MEDIUM |
| Feature creep | Adding config for hardcoded values | MEDIUM |
| NIH syndrome | Custom implementation of lodash function | HIGH |
| Over-validation | Checking null after guaranteed initialization | LOW |
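One of these patterns, NIH syndrome, lends itself to a crude textual check: flag function definitions whose names collide with well-known utility functions. A rough sketch; the name list is a small illustrative sample and real detection would be AST-based:

```python
# Rough sketch of the NIH-syndrome heuristic: flag custom definitions
# that duplicate common library utilities. Name list is illustrative.

import re

COMMON_UTILS = {"debounce", "throttle", "deepClone", "groupBy", "chunk"}

def flag_nih(source: str) -> list:
    findings = []
    for match in re.finditer(r"function\s+(\w+)|def\s+(\w+)", source):
        name = match.group(1) or match.group(2)
        if name in COMMON_UTILS:
            findings.append(f"HIGH: custom '{name}' duplicates a library utility")
    return findings

print(flag_nih("function debounce(fn, ms) { /* ... */ }"))  # flags 'debounce'
```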
Library-First Enforcement
SpecFlow prefers battle-tested libraries:
- Analyst searches for existing libraries during spec
- Dev checks before implementing common patterns
- Custom requires justification in PROGRESS.md
## Library Assessment
| Pattern | Library Option | Decision |
|---------|---------------|----------|
| Date formatting | date-fns | USE LIBRARY |
| Validation | zod | USE LIBRARY |
| Custom business logic | - | IMPLEMENT (no library exists) |
Slop Review Integration
Review skills flag slop patterns:
## Review Findings
### SLOP-01: Unnecessary Abstraction
**File**: src/utils/stringHelper.ts
**Pattern**: Wrapper around `String.prototype.trim()`
**Recommendation**: Delete file, use native method
### SLOP-02: Premature Generalization
**File**: src/types/config.ts
**Pattern**: 50-field config interface, 3 fields used
**Recommendation**: Remove unused fields
Summary
SpecFlow methodology ensures:
- Security: STRIDE catches threats before code is written
- Quality: BOSS criteria are testable and specific
- Verification: TEA creates comprehensive test coverage
- Cost: FinOps estimates resource impact early
- TDD: Test evidence required before completion
- UAT: Automated user acceptance testing for user-facing features
- Review: Multi-lens parallel review with drift detection
- Slop Prevention: Library-first, anti-pattern detection
These frameworks prevent the common failure mode: discovering problems after implementation when fixes are expensive.