
SpecFlow Methodology

SpecFlow applies proven frameworks to each pillar. Think of these as checklists that experts have refined over decades. Instead of reinventing the wheel for every feature, you stand on the shoulders of security researchers, cost analysts, and testing practitioners.

Overview

Each pillar uses specific frameworks:

| Pillar | Framework | Purpose |
|--------|-----------|---------|
| Security | STRIDE | Threat modeling |
| Requirements | BOSS | Acceptance criteria quality |
| Testing | TEA + Gherkin | Test specification |
| Cost | FinOps | Cloud cost analysis |

STRIDE Threat Modeling

STRIDE is a mnemonic for six threat categories. Security analyst Jordan applies STRIDE to every architecture review:

| Letter | Threat | Question |
|--------|--------|----------|
| S | Spoofing | Can attackers impersonate legitimate users? |
| T | Tampering | Can data be modified without detection? |
| R | Repudiation | Can users deny their actions? |
| I | Information Disclosure | Can sensitive data leak? |
| D | Denial of Service | Can the system be overwhelmed? |
| E | Elevation of Privilege | Can users gain unauthorized access? |

How STRIDE Works

Jordan analyzes the architecture in seven steps:

  1. Identify assets requiring protection (user data, credentials, etc.)
  2. Review architecture diagrams and data flows
  3. Map trust boundaries (where data crosses security domains)
  4. Identify attack surfaces (entry points for attackers)
  5. Analyze controls at each layer
  6. Assess threats (likelihood and impact)
  7. Recommend mitigations for each threat
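
The scoring behind steps 6 and 7 can be made concrete. The sketch below assumes a simple likelihood × impact model; the `Threat` class, the 1-5 scales, and `prioritize` are illustrative assumptions, not SpecFlow internals.

```python
# Illustrative model of STRIDE steps 6-7: score each threat, then order
# mitigation work by risk. The 1-5 scales are assumptions.
from dataclasses import dataclass

@dataclass
class Threat:
    category: str        # one of the six STRIDE categories
    description: str
    likelihood: int      # 1 (rare) .. 5 (almost certain)
    impact: int          # 1 (negligible) .. 5 (critical)

    @property
    def risk(self) -> int:
        # step 6: simple likelihood x impact scoring
        return self.likelihood * self.impact

def prioritize(threats: list[Threat]) -> list[Threat]:
    """Step 7 input: mitigate the highest-risk threats first."""
    return sorted(threats, key=lambda t: t.risk, reverse=True)
```

Any scoring scheme works as long as it is applied consistently across the six categories.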

STRIDE Output

For a login feature, Jordan might produce:

## STRIDE Analysis
 
### Spoofing
**Threat**: Attacker impersonates legitimate user
**Attack Vector**: Credential stuffing, phishing
**Mitigation**: Rate limiting, MFA, secure password hashing
 
### Information Disclosure
**Threat**: Password or session token exposure
**Attack Vector**: Logging, network sniffing
**Mitigation**: No password logging, HTTPS only, httpOnly cookies

When STRIDE Applies

STRIDE analysis triggers for medium+ scope features involving:

  • Authentication or authorization
  • User data (PII, passwords, tokens)
  • Payment processing
  • Public API endpoints
  • File uploads
  • External service credentials

BOSS Criteria

BOSS ensures acceptance criteria are testable. The acronym stands for:

| Letter | Meaning | Good Example | Bad Example |
|--------|---------|--------------|--------------|
| B | Binary | "Returns 401" | "Handles errors well" |
| O | Observable | "Displays toast message" | "User feels satisfied" |
| S | Specific | "Under 200ms" | "Fast response" |
| S | Scope-bound | "Login endpoint" | "User experience" |

Writing BOSS Criteria

Analyst Mary transforms vague requirements into BOSS-compliant acceptance criteria:

Before (vague):

User should be able to log out securely

After (BOSS):

AC-01: When user clicks logout button, session token is invalidated
AC-02: After logout, redirects to login page within 500ms
AC-03: After logout, previous session token returns 401 on API calls
AC-04: Logout button visible on all authenticated pages

Each criterion is:

  • Binary: Either the token is invalidated or it is not
  • Observable: Can test via API call
  • Specific: Exact behavior defined
  • Scope-bound: Only covers logout functionality

The BOSS Test

Before finalizing any acceptance criterion, apply the BOSS test:

Can a test script verify this criterion passes or fails?

If yes, the criterion is BOSS-compliant. If no, refine until it is.
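
As a sketch of what "a test script can verify it" means in practice, here is how AC-01 and AC-03 reduce to single assertions. The `logout` and `api_call` functions are hypothetical stand-ins for a real session store and API client.

```python
# Hypothetical stand-ins: the point is that each BOSS-compliant criterion
# becomes exactly one pass/fail assertion.
def logout(session: dict) -> None:
    session["valid"] = False                   # AC-01: token invalidated

def api_call(session: dict) -> int:
    return 200 if session["valid"] else 401    # AC-03 behavior

session = {"valid": True}
logout(session)
assert api_call(session) == 401                # binary, observable, specific
```

A criterion like "handles errors well" cannot be written this way, which is exactly why it fails the BOSS test.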

Test Engineering Analysis (TEA)

TEA plans tests at the right depth for each scope level.

Test Depth by Scope

| Scope | Test Scenarios | Types |
|-------|----------------|-------|
| trivial | 1-2 | Happy path only |
| small | 3-5 | Happy + one error case |
| medium | 6-10 | Full Gherkin coverage |
| large | 10-15 | Integration + E2E |
| complex | 15+ | Performance, security |

Gherkin Scenarios

TEA writes tests in Gherkin format, which reads like natural language:

Feature: User Authentication
 
  @AC-01 @happy-path
  Scenario: Successful login with valid credentials
    Given I am on the login page
    And user "test@example.com" exists with password "SecurePass123"
    When I enter "test@example.com" as email
    And I enter "SecurePass123" as password
    And I click the login button
    Then I should be redirected to the dashboard
    And my session should be active
 
  @AC-02 @error-case
  Scenario: Failed login with invalid password
    Given I am on the login page
    And user "test@example.com" exists with password "SecurePass123"
    When I enter "test@example.com" as email
    And I enter "WrongPassword" as password
    And I click the login button
    Then I should see error message "Invalid credentials"
    And I should remain on the login page

Traceability Matrix

TEA creates a matrix linking acceptance criteria to tests:

| AC | Unit | Integration | E2E | Coverage |
|----|------|-------------|-----|----------|
| AC-01 | UT-01 | IT-01 | E2E-01 | FULL |
| AC-02 | UT-02 | - | E2E-02 | FULL |
| AC-03 | - | IT-02 | - | PARTIAL |
| AC-04 | - | - | - | MISSING |

MISSING coverage triggers PM review before development proceeds.
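
One way to read the matrix as a rule (inferred from the sample rows, not an official SpecFlow definition): unit plus E2E coverage is FULL, any test at all is PARTIAL, and no tests is MISSING.

```python
def coverage_status(unit, integration, e2e) -> str:
    """Inferred coverage rule; a real matrix may weigh test types differently."""
    if unit and e2e:
        return "FULL"
    if unit or integration or e2e:
        return "PARTIAL"
    return "MISSING"

def needs_pm_review(matrix: dict) -> list:
    """ACs with MISSING coverage, which block development until reviewed."""
    return [ac for ac, tests in matrix.items()
            if coverage_status(*tests) == "MISSING"]
```

Applied to the sample matrix, this reproduces FULL for AC-01/AC-02, PARTIAL for AC-03, and MISSING for AC-04.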

Flow Recommendation

TEA recommends whether QA should write tests before (qa-first) or after (dev-first) development:

qa-first when:

  • Feature is behavior-heavy (E2E scenarios dominate)
  • Most AC describe user-facing behavior
  • Replacing existing functionality (regression risk)

dev-first when:

  • Feature is algorithm-heavy (unit tests dominate)
  • Most AC describe internal behavior
  • Greenfield with no existing contracts
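
The routing decision can be sketched as a signal count, assuming the "2+ qa-first signals" threshold shown in the TDD workflow section below; the signal names here are illustrative.

```python
# Illustrative signal names; the threshold of two matches the TDD
# workflow diagram, everything else is an assumption.
QA_FIRST_SIGNALS = {"behavioral_tests_dominate", "user_facing_acs",
                    "medium_plus_scope", "regression_risk"}

def recommend_flow(detected: set) -> str:
    if len(detected & QA_FIRST_SIGNALS) >= 2:
        return "qa-first"
    return "dev-first"
```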

Cost Analysis Framework

Taylor applies FinOps principles to estimate cloud costs:

Five-Step Process

  1. Identify resources created or modified by the feature
  2. Estimate usage patterns (requests per day, storage growth)
  3. Calculate costs using cloud pricing
  4. Compare alternatives (different instance types, regions)
  5. Recommend optimizations (reserved capacity, auto-scaling)
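
Step 3 can be sketched for a Lambda-backed feature. The unit prices below are illustrative placeholders, not quoted rates; a real estimate should use the provider's current pricing.

```python
# Illustrative AWS-style unit prices (assumptions; check current pricing).
PRICE_PER_M_REQUESTS = 0.20         # USD per million invocations
PRICE_PER_GB_SECOND = 0.0000166667  # USD per GB-second of compute

def lambda_monthly_cost(requests: int, avg_ms: float, memory_mb: int) -> float:
    request_cost = requests / 1_000_000 * PRICE_PER_M_REQUESTS
    gb_seconds = requests * (avg_ms / 1000) * (memory_mb / 1024)
    return request_cost + gb_seconds * PRICE_PER_GB_SECOND
```

For most workloads the duration (GB-second) term dominates the per-request term, which is why memory size and execution time matter more than raw request counts.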

Cost Output by Scope

| Scope | Analysis Depth |
|-------|----------------|
| trivial/small | Skipped |
| medium | Total estimate only |
| large | Breakdown by component |
| complex | Multi-scenario projections |

Sample Cost Analysis

## Cost Estimate
 
**Monthly Total**: $47/month (production)
 
### Breakdown
 
| Component | Estimate | Notes |
|-----------|----------|-------|
| Lambda | $12 | 100K requests @ $0.20/M |
| DynamoDB | $25 | 1GB + 10K read/write units |
| CloudWatch | $10 | Logs + metrics |
 
### Optimization Opportunities
 
1. Reserved capacity: -30% on DynamoDB
2. Scheduled scaling: -20% during off-hours
3. Consider: Lambda provisioned vs on-demand

Proportional Ceremony

SpecFlow applies methodology proportional to risk:

A typo fix should not trigger STRIDE analysis. A payment integration should not skip it.

Scope-Based Framework Depth

| Framework | Trivial | Small | Medium | Large | Complex |
|-----------|---------|-------|--------|-------|---------|
| STRIDE | Skip | Skip | Light | Full | Deep |
| BOSS | 1-2 AC | 3-5 AC | 6-10 AC | 10-15 AC | 15+ AC |
| TEA | 1 test | 3-5 tests | 6-10 tests | 10-15 tests | 15+ tests |
| FinOps | Skip | Skip | Estimate | Breakdown | Projections |

Framework Integration

The frameworks reinforce each other:

  • BOSS criteria define what to build
  • STRIDE identifies what could go wrong
  • TEA ensures everything is testable
  • Gherkin makes tests readable
  • QA verifies everything works

TDD Workflow Enforcement

SpecFlow enforces test-driven development at the workflow level, not just the code level.

QA-First vs Dev-First Routing

Based on TEA's analysis, PM routes work differently:

TEA Analysis

Signals Detected

┌─────────────────────────────────┐
│ 2+ QA-First Signals?            │
│   • Behavioral tests dominate   │
│   • User-facing ACs             │
│   • Medium+ scope               │
│   • Regression risk             │
└─────────────────────────────────┘
    ↓ Yes              ↓ No
QA-First           Dev-First
    ↓                  ↓
QA writes         Dev implements
failing tests     with TDD
    ↓                  ↓
Dev implements    QA validates
to pass tests

Test Evidence Requirements

Stories cannot be marked complete without test evidence:

## Test Evidence (Required)
 
| Field | Value |
|-------|-------|
| test_file | tests/auth/test_logout.py |
| pass_count | 7/7 |
| timestamp | 2024-02-05T10:30:00Z |
| command | pytest tests/auth/test_logout.py |

The validate_completion() function enforces:

  • Test file exists
  • All tests pass (no failures)
  • Timestamp within current session
  • Coverage meets AC requirements
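
A hedged sketch of what `validate_completion()` might check, using the evidence fields from the table above; the session-window logic, return shape, and omission of the coverage check are assumptions.

```python
import os
from datetime import datetime, timezone

def validate_completion(evidence: dict, session_start: datetime) -> list:
    """Return blocking problems; an empty list means the story may close."""
    problems = []
    if not os.path.exists(evidence["test_file"]):
        problems.append("test file missing")
    passed, total = map(int, evidence["pass_count"].split("/"))
    if passed != total:
        problems.append(f"{total - passed} failing test(s)")
    ts = datetime.fromisoformat(evidence["timestamp"].replace("Z", "+00:00"))
    if ts < session_start:
        problems.append("evidence predates current session")
    return problems
```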

Automated User Acceptance Testing (UAT)

SpecFlow includes automated UAT triggers for user-facing features.

UAT Signal Detection

The system monitors for UAT signals:

| Signal | Detection | Weight |
|--------|-----------|--------|
| UI Components | Component files created | 1 |
| User Workflows | Gherkin with user steps | 1 |
| Behavioral ACs | "User should see..." | 1 |
| Form Handling | Input validation logic | 1 |
| Navigation | Route changes | 1 |

A combined weight of 3+ triggers automatic UAT routing.
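
The weighted-signal check reduces to a small function; the signal keys below are illustrative.

```python
# Weights mirror the detection table; keys are illustrative assumptions.
UAT_SIGNAL_WEIGHTS = {
    "ui_components": 1,
    "user_workflows": 1,
    "behavioral_acs": 1,
    "form_handling": 1,
    "navigation": 1,
}

def should_route_to_uat(detected: set) -> bool:
    score = sum(UAT_SIGNAL_WEIGHTS.get(s, 0) for s in detected)
    return score >= 3
```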

UAT Verification Flow

Feature Complete

Signal Count Check
    ↓ (3+ signals)
/gsd:verify-work

Conversational UAT

Goal-Backward Verification

VERIFICATION.md Created

Pass? → Ship
Fail? → Gap Closure Plans

Goal-Backward Verification

The verifier checks what actually exists in the codebase against phase goals:

  1. Read phase must_haves from plan
  2. Verify each artifact exists with required content
  3. Check key_links (integrations work)
  4. Produce passed, human_needed, or gaps_found status
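
A simplified sketch of the status derivation, covering only the artifact-existence check from step 2; the `requires_human` flag and the exact mapping to the three statuses are assumptions.

```python
import os

def verify_phase(must_haves: list) -> str:
    """Simplified: checks artifacts on disk, not content or key_links."""
    gaps = [m for m in must_haves if not os.path.exists(m["path"])]
    if not gaps:
        return "passed"
    if all(m.get("requires_human") for m in gaps):
        return "human_needed"
    return "gaps_found"
```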

Multi-Lens Review Loop

SpecFlow implements dynamic, skill-based code review.

Skill Discovery

Review automatically detects which skills apply:

/sf:review  # Discovers skills from code changes

The skill-detector analyzes:

  • File types modified
  • Import statements
  • Pattern matches
  • Capability requirements
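
As a sketch of pattern-based detection, changed file paths can be mapped to review skills with regexes. The rules below are illustrative; the real skill-detector's rules are not documented here.

```python
import re

# Illustrative mapping from change patterns to review skills.
SKILL_RULES = {
    "security": [r"auth", r"crypto", r"password"],
    "performance": [r"\.sql$", r"cache"],
    "accessibility": [r"\.tsx$", r"\.html$"],
}

def discover_skills(changed_files: list) -> set:
    skills = set()
    for path in changed_files:
        for skill, patterns in SKILL_RULES.items():
            if any(re.search(p, path) for p in patterns):
                skills.add(skill)
    return skills
```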

Parallel Review Execution

Multiple reviewers run simultaneously:

Code Changes

Skill Discovery

┌────────────────────────────────────────┐
│ Parallel Review                        │
│                                        │
│  Security ──┐                          │
│             │                          │
│  Performance ──→ Aggregated Findings   │
│             │                          │
│  Accessibility ─┘                      │
└────────────────────────────────────────┘

PM Synthesis

Drift Detection

Ship or Iterate

Drift Detection

Review compares implementation against specification:

| Drift Type | Detection | Response |
|------------|-----------|----------|
| Missing AC | Feature not implemented | Block + report |
| Modified AC | Behavior differs | Classify (bug/drift) |
| Added behavior | Not in spec | Document + approve/reject |
| Security regression | STRIDE threat introduced | Block + escalate |

Heuristics classify drift automatically:

  • File-based: Did spec file change?
  • Assertion-based: Do tests match ACs?
  • Semantic: Does behavior match intent?

Low-confidence cases escalate to PM or user.
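
The escalation rule can be sketched as a small routing function; the 0.8 confidence threshold is an illustrative assumption.

```python
def route_drift(drift_type: str, confidence: float) -> str:
    """Sketch of drift routing; the 0.8 threshold is an assumption."""
    if drift_type == "security_regression":
        return "block_and_escalate"      # blocked regardless of confidence
    if confidence < 0.8:
        return "escalate_to_pm"          # low-confidence cases go to a human
    return drift_type                    # confident classification stands
```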

Slop Detection

Prevents common AI-generated anti-patterns.

What is Slop?

"Slop" refers to AI-generated code that:

  • Adds unnecessary abstractions
  • Over-engineers simple solutions
  • Creates premature generalizations
  • Duplicates existing library functionality
  • Adds excessive error handling for impossible cases

Detection Patterns

| Pattern | Example | Flag |
|---------|---------|------|
| Unnecessary wrapper | Wrapping built-in with identical API | HIGH |
| Premature abstraction | Interface with single implementation | MEDIUM |
| Feature creep | Adding config for hardcoded values | MEDIUM |
| NIH syndrome | Custom implementation of lodash function | HIGH |
| Over-validation | Checking null after guaranteed initialization | LOW |

Library-First Enforcement

SpecFlow prefers battle-tested libraries:

  1. Analyst searches for existing libraries during spec
  2. Dev checks before implementing common patterns
  3. Custom implementations require justification in PROGRESS.md

## Library Assessment
 
| Pattern | Library Option | Decision |
|---------|---------------|----------|
| Date formatting | date-fns | USE LIBRARY |
| Validation | zod | USE LIBRARY |
| Custom business logic | - | IMPLEMENT (no library exists) |

Slop Review Integration

Review skills flag slop patterns:

## Review Findings
 
### SLOP-01: Unnecessary Abstraction
**File**: src/utils/stringHelper.ts
**Pattern**: Wrapper around `String.prototype.trim()`
**Recommendation**: Delete file, use native method
 
### SLOP-02: Premature Generalization
**File**: src/types/config.ts
**Pattern**: 50-field config interface, 3 fields used
**Recommendation**: Remove unused fields

Summary

SpecFlow methodology ensures:

  • Security: STRIDE catches threats before code is written
  • Quality: BOSS criteria are testable and specific
  • Verification: TEA creates comprehensive test coverage
  • Cost: FinOps estimates resource impact early
  • TDD: Test evidence required before completion
  • UAT: Automated user acceptance testing for user-facing features
  • Review: Multi-lens parallel review with drift detection
  • Slop Prevention: Library-first, anti-pattern detection

These frameworks prevent the common failure mode: discovering problems after implementation when fixes are expensive.