mirror of
https://github.com/mgeeky/decode-spam-headers.git
synced 2026-02-22 13:33:30 +01:00
6.1 KiB
Phase 02: Engine Refactoring
This phase decomposes the monolithic decode-spam-headers.py (6,931 lines, 106 test methods, 3 classes) into independently testable scanner modules that the API can invoke programmatically. This is a prerequisite for all user stories — without modular scanners, the backend cannot expose individual tests or stream progress. TDD Red-Green: write failing tests first, then implement parser, scanner base, registry, 10 vendor-grouped scanner modules, and the analyzer orchestrator.
Spec Kit Context
- Feature: 1-web-header-analyzer
- Specification: .specify/specs/1-web-header-analyzer/spec.md
- Plan: .specify/specs/1-web-header-analyzer/plan.md
- Tasks: .specify/specs/1-web-header-analyzer/tasks.md
- Data Model: .specify/specs/1-web-header-analyzer/data-model.md
- Constitution: .specify/memory/constitution.md (TDD mandate: P6)
Architecture Reference
The existing monolith structure (read-only reference):
- decode-spam-headers.py lines 209–419: Logger class
- decode-spam-headers.py lines 421–439: Verstring class
- decode-spam-headers.py lines 441+: SMTPHeadersAnalysis class
- decode-spam-headers.py lines 1896–2027: getAllTests() defining all 106 tests
- decode-spam-headers.py lines 2437–6504: All test method implementations
Target modular structure:
backend/app/engine/
├── __init__.py
├── models.py # AnalysisRequest, AnalysisResult, TestResult, HopChainNode, SecurityAppliance
├── logger.py # Adapted Logger class (Python logging module)
├── parser.py # HeaderParser.parse(raw_text) -> list[ParsedHeader]
├── scanner_base.py # BaseScanner protocol: id, name, run(headers) -> TestResult | None
├── scanner_registry.py # ScannerRegistry: get_all(), get_by_ids(), list_tests()
├── analyzer.py # HeaderAnalyzer orchestrator with progress callback
└── scanners/
    ├── received_headers.py # Tests 1–3
    ├── forefront_antispam.py # Tests 12–16, 63–64
    ├── spamassassin.py # Tests 18–21, 74
    ├── ironport.py # Tests 27–29, 38–43, 88–89
    ├── mimecast.py # Tests 30, 61–62, 65
    ├── trendmicro.py # Tests 47–59, 97
    ├── barracuda.py # Tests 69–73
    ├── proofpoint.py # Tests 66–67
    ├── microsoft_general.py # Tests 31–34, 80, 83–85, 99–102
    └── general.py # Remaining tests: 4–11, 17, 22–26, 36–37, 44–46, 68, 75–79, 82, 86–87, 90–96, 98, 103–106
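The contracts named in the tree above (scanner_base.py and scanner_registry.py) can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the field names on ParsedHeader and TestResult, the explicit register() method, and the example ReceivedHopCountScanner are all assumptions made for the sketch. Plain dataclasses stand in for the Pydantic models the tasks call for, and the real registry is meant to use auto-discovery rather than manual registration.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Protocol


class Severity(str, Enum):
    # Severity names taken from the task list; colors live in data-model.md.
    SPAM = "spam"
    SUSPICIOUS = "suspicious"
    CLEAN = "clean"
    INFO = "info"


@dataclass(frozen=True)
class ParsedHeader:
    # Hypothetical fields: position in the message, header name, raw value.
    index: int
    name: str
    value: str


@dataclass
class TestResult:
    # Hypothetical fields; the authoritative definitions are in data-model.md.
    test_id: int
    test_name: str
    severity: Severity
    analysis: str


class BaseScanner(Protocol):
    """Structural interface: any object with id, name and run() qualifies."""

    id: int
    name: str

    def run(self, headers: list[ParsedHeader]) -> TestResult | None: ...


class ScannerRegistry:
    """Holds scanners keyed by test id (manual registration in this sketch)."""

    def __init__(self) -> None:
        self._scanners: dict[int, BaseScanner] = {}

    def register(self, scanner: BaseScanner) -> None:
        if scanner.id in self._scanners:
            raise ValueError(f"duplicate scanner id {scanner.id}")
        self._scanners[scanner.id] = scanner

    def get_all(self) -> list[BaseScanner]:
        return [self._scanners[i] for i in sorted(self._scanners)]

    def get_by_ids(self, ids: list[int]) -> list[BaseScanner]:
        # Unknown ids are silently ignored in this sketch.
        return [self._scanners[i] for i in ids if i in self._scanners]

    def list_tests(self) -> list[tuple[int, str]]:
        return [(s.id, s.name) for s in self.get_all()]


class ReceivedHopCountScanner:
    """Illustrative scanner only; not one of the 106 original tests."""

    id = 1
    name = "Received hop count (illustrative)"

    def run(self, headers: list[ParsedHeader]) -> TestResult | None:
        hops = [h for h in headers if h.name.lower() == "received"]
        if not hops:
            return None  # nothing to report for this message
        return TestResult(
            self.id, self.name, Severity.INFO, f"{len(hops)} Received header(s) found"
        )
```

Returning None from run() lets a scanner signal that it has nothing to report for a message, which matches the TestResult | None return type in the tree above.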
Tasks
- T007 Write failing tests (TDD Red) in backend/tests/engine/test_parser.py (header parsing with sample EML), backend/tests/engine/test_scanner_registry.py (discovery returns 106+ scanners, filtering by ID), and backend/tests/engine/test_analyzer.py (full pipeline with reference fixture). Create backend/tests/fixtures/sample_headers.txt with a representative header set extracted from the existing test infrastructure
- T008 Create backend/app/engine/__init__.py and backend/app/engine/models.py: Pydantic models for AnalysisRequest, AnalysisResult, TestResult, HopChainNode, SecurityAppliance. Refer to .specify/specs/1-web-header-analyzer/data-model.md for field definitions and severity enum values (spam→#ff5555, suspicious→#ffb86c, clean→#50fa7b, info→#bd93f9)
- T009 Create backend/app/engine/logger.py: extract the Logger class from decode-spam-headers.py (lines 209–419), adapt it to use the Python logging module instead of direct stdout
- T010 Create backend/app/engine/parser.py: extract header parsing from SMTPHeadersAnalysis.collect() and getHeader() (lines ~2137–2270). Expose HeaderParser.parse(raw_text: str) -> list[ParsedHeader] including MIME boundary and line-break handling. Verify test_parser.py passes (TDD Green)
- T011 Create backend/app/engine/scanner_base.py: abstract BaseScanner (Protocol or ABC) with interface: id: int, name: str, run(headers: list[ParsedHeader]) -> TestResult | None (implemented as a Protocol in backend/app/engine/scanner_base.py)
- T012 Create backend/app/engine/scanner_registry.py: ScannerRegistry with auto-discovery: get_all(), get_by_ids(ids), list_tests(). Verify test_scanner_registry.py passes (TDD Green)
- T013 [P] Create scanner modules by extracting test methods from SMTPHeadersAnalysis into backend/app/engine/scanners/. Each file implements BaseScanner:
  - backend/app/engine/scanners/received_headers.py (tests 1–3)
  - backend/app/engine/scanners/forefront_antispam.py (tests 12–16, 63–64)
  - backend/app/engine/scanners/spamassassin.py (tests 18–21, 74)
  - backend/app/engine/scanners/ironport.py (tests 27–29, 38–43, 88–89)
  - backend/app/engine/scanners/mimecast.py (tests 30, 61–62, 65)
  - backend/app/engine/scanners/trendmicro.py (tests 47–59, 97)
  - backend/app/engine/scanners/barracuda.py (tests 69–73)
  - backend/app/engine/scanners/proofpoint.py (tests 66–67)
  - backend/app/engine/scanners/microsoft_general.py (tests 31–34, 80, 83–85, 99–102)
  - backend/app/engine/scanners/general.py (remaining tests: 4–11, 17, 22–26, 36–37, 44–46, 68, 75–79, 82, 86–87, 90–96, 98, 103–106)
- T014 Create backend/app/engine/analyzer.py: HeaderAnalyzer orchestrator: accepts AnalysisRequest, uses HeaderParser + ScannerRegistry, runs scanners with per-test timeout, collects results (marking failed tests with error status per FR-25), supports progress callback Callable[[int, int, str], None]. Verify test_analyzer.py passes (TDD Green)
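The orchestration loop described in T014 (per-test timeout, error marking, progress callback) could look roughly like the sketch below. It is a hedged illustration under several assumptions: run_scanners and ScanOutcome are hypothetical names, the callback arguments are assumed to mean (current index, total, test name), and the "error" status string is a placeholder because the exact FR-25 wording is not reproduced here. The timeout uses concurrent.futures from the standard library, which can stop waiting for a scanner but cannot kill its thread.

```python
from __future__ import annotations

import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence

# Assumed meaning of the three arguments: (current index, total, test name).
ProgressCallback = Callable[[int, int, str], None]


@dataclass
class ScanOutcome:
    # Hypothetical result record; status is "ok", "skipped" or "error".
    test_id: int
    test_name: str
    status: str
    detail: str = ""


def run_scanners(
    scanners: Sequence[Any],
    headers: list,
    timeout_s: float = 5.0,
    progress: Optional[ProgressCallback] = None,
) -> list[ScanOutcome]:
    """Run each scanner in turn; one failure or timeout never aborts the rest."""
    outcomes: list[ScanOutcome] = []
    total = len(scanners)
    # One worker per scanner so a hung scanner cannot block the ones after it.
    pool = ThreadPoolExecutor(max_workers=max(1, total))
    try:
        for index, scanner in enumerate(scanners, start=1):
            future = pool.submit(scanner.run, headers)
            try:
                result = future.result(timeout=timeout_s)
            except FutureTimeout:
                detail = f"timed out after {timeout_s}s"
                outcomes.append(ScanOutcome(scanner.id, scanner.name, "error", detail))
            except Exception as exc:  # scanner bug: record it, keep going
                detail = f"{type(exc).__name__}: {exc}"
                outcomes.append(ScanOutcome(scanner.id, scanner.name, "error", detail))
            else:
                status = "skipped" if result is None else "ok"
                detail = "" if result is None else str(result)
                outcomes.append(ScanOutcome(scanner.id, scanner.name, status, detail))
            if progress is not None:
                progress(index, total, scanner.name)
    finally:
        # Do not wait for timed-out scanners; their threads finish on their own.
        pool.shutdown(wait=False)
    return outcomes


# Tiny stand-in scanners used only to exercise the loop.
class OkScanner:
    id, name = 1, "ok"

    def run(self, headers):
        return "fine"


class BrokenScanner:
    id, name = 2, "boom"

    def run(self, headers):
        raise ValueError("bad header")


class SlowScanner:
    id, name = 3, "slow"

    def run(self, headers):
        time.sleep(0.3)
        return "late"
```

Calling run_scanners([OkScanner(), BrokenScanner(), SlowScanner()], [], timeout_s=0.05, progress=print) reports three progress steps and marks the second and third scanners as errors while still returning a result for the first.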
Completion
- pytest backend/tests/engine/ passes with all tests green
- All 106+ tests are registered in the scanner registry (ScannerRegistry.get_all() returns 106+ scanners)
- Analysis of backend/tests/fixtures/sample_headers.txt produces results matching original CLI output
- ruff check backend/ passes with zero errors
- Run /speckit.analyze to verify consistency