Capability Progression
Level 1 → Level 4 Across the Trimester
Level 1 — Manual: Where you started · Everything by hand · No code, no scripts · W1–W3 baseline
Level 2 → 3: Minimum by W5 · One automated step · GitHub Actions · Structured outputs
Level 3: Expected by W8 · Multi-agent pipeline · LLM API calls · Delta Engine live
Level 4: Distinction — W10 · Full pipeline auto · Prescriptive Rx output · UI + 11 sectors
Sprint-by-Sprint Roadmap — Exact Dates
W3 (8 Jun) → W10 (3 Aug) · Study Week 29 Jun Blanked
Each sprint requires a market prediction with a release tag (due Sunday midnight SGT) plus that sprint's software increment. The gold dot is where we are now.
W3 — Manual Pipeline Complete
✓ Done
✓SPX, NDX, IWM prediction locked · tag vW23 · presented Mon 8 Jun
✓Full manual pipeline: Sources → Agents → 4 LLMs → Human Score → GitHub
✓Scrum ceremonies completed · Sprint 4 software development announced
W4 — First Software Increment + Sector Expansion
▶ NOW · tag vW24 due Sun 14 Jun
Choose your capability tier. Ship something that runs. Expand prediction to cover at least 3 S&P sectors.
📌Prediction: SPX + NDX + IWM + minimum 3 S&P 500 sectors · tag vW24
📌Software increment: at minimum one automated data fetch script committed to GitHub
📌DECISION.md: what did you choose to automate first and why?
📌W3 delta report: how accurate was your W3 prediction? Direction correct? Error size?
★Level 3+: one LLM API call with logged prompt + response in repo
★Level 4: Delta Engine script computing W3 accuracy automatically
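The W3 delta report asks two questions: was the direction correct, and how large was the error? A minimal sketch of that calculation follows. The `Delta` record, the `compute_delta` name, and the sample percentages are illustrative assumptions, not figures or code from the course.

```python
from dataclasses import dataclass


@dataclass
class Delta:
    """One predicted-vs-actual comparison (hypothetical record layout)."""
    ticker: str
    predicted_pct: float
    actual_pct: float
    direction_correct: bool
    abs_error_pct: float


def _sign(x: float) -> int:
    """Return 1 for a gain, -1 for a loss, 0 for flat."""
    return (x > 0) - (x < 0)


def compute_delta(ticker: str, predicted_pct: float, actual_pct: float) -> Delta:
    """Compare a predicted weekly % move with the realised % move."""
    return Delta(
        ticker=ticker,
        predicted_pct=predicted_pct,
        actual_pct=actual_pct,
        # Direction is correct only when both moves share a sign.
        direction_correct=_sign(predicted_pct) == _sign(actual_pct),
        # Error size in percentage points, rounded for reporting.
        abs_error_pct=round(abs(predicted_pct - actual_pct), 4),
    )


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    for ticker, predicted, actual in [("SPX", 1.2, 0.5), ("IWM", 0.8, -0.3)]:
        print(compute_delta(ticker, predicted, actual))
```

Keeping direction and error size as separate fields lets later sprints track directional accuracy without re-deriving it.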
W5 — Agent Automation + Delta Engine
tag vW25 due Sun 21 Jun
→Prediction: SPX + NDX + IWM + minimum 5 S&P sectors · tag vW25
→At least 2 agents producing structured output files automatically (JSON or CSV)
→Delta Engine: W4 predicted vs actual calculated in code, output as file
→GitHub Actions workflow: pipeline runs on push or schedule, no manual trigger needed
★Level 4: multi-LLM comparison table generated programmatically from API responses
★Level 4: FinBERT sentiment on one macro data source (earnings transcript or Fed minutes)
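One way an agent could produce a structured output file is sketched below. The field names (`agent`, `week_tag`, `signals`), the file naming scheme, and the `write_agent_output` helper are assumptions for illustration; teams are free to choose their own schema.

```python
import json
from pathlib import Path


def write_agent_output(out_dir, agent: str, week_tag: str, signals: dict) -> Path:
    """Write one agent's signals to <out_dir>/<agent>_<week_tag>.json.

    The record layout is a hypothetical schema, not a course requirement.
    """
    record = {"agent": agent, "week_tag": week_tag, "signals": signals}
    path = Path(out_dir) / f"{agent}_{week_tag}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    # sort_keys keeps the file stable between runs, so git diffs stay readable.
    path.write_text(json.dumps(record, indent=2, sort_keys=True), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Example values are made up.
    written = write_agent_output("outputs", "macro", "vW25", {"SPX": "up", "IWM": "down"})
    print(f"wrote {written}")
```

A fixed schema per agent is what makes the Delta Engine and the comparison steps scriptable: downstream code can load every file the same way.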
📚 STUDY WEEK — 29 June · No session · No prediction · No sprint tag · Rest and prepare for exams
W7 — LLM Integration + Calibration Suite
tag vW28 due Sun 5 Jul
Back from study week. Two-week gap — resume with sharpened focus. Pipeline should be running automatically by now.
→Prediction: SPX + NDX + IWM + all 11 S&P sectors · tag vW28
→At least 2 LLMs called via API with identical prompts, responses logged and compared
→Calibration suite: accuracy tracked across W3–W5, directional accuracy per model in code
→Human Score interface: structured form or guided template replacing unstructured notes
★Level 4: agreement matrix auto-generated, disagreement zones flagged automatically
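The agreement matrix and disagreement flags can be computed from logged model calls alone. The sketch below assumes each model's response has already been reduced to a direction label per ticker; the model names and the `up`/`down` labels are placeholders.

```python
from itertools import combinations


def _shared_tickers(calls: dict) -> list:
    """Tickers that every model made a call on."""
    if not calls:
        return []
    return sorted(set.intersection(*(set(by_ticker) for by_ticker in calls.values())))


def agreement_matrix(calls: dict) -> dict:
    """Pairwise fraction of shared tickers on which two models agree."""
    models = sorted(calls)
    tickers = _shared_tickers(calls)
    matrix = {m: {m: 1.0} for m in models}
    for a, b in combinations(models, 2):
        same = sum(calls[a][t] == calls[b][t] for t in tickers)
        frac = same / len(tickers) if tickers else 0.0
        matrix[a][b] = matrix[b][a] = frac
    return matrix


def disagreement_zones(calls: dict) -> list:
    """Tickers where the models do not all give the same direction."""
    return [t for t in _shared_tickers(calls)
            if len({by_ticker[t] for by_ticker in calls.values()}) > 1]


if __name__ == "__main__":
    # Hypothetical calls from two models.
    calls = {
        "model_a": {"SPX": "up", "NDX": "up", "IWM": "down"},
        "model_b": {"SPX": "up", "NDX": "down", "IWM": "down"},
    }
    print(agreement_matrix(calls))
    print(disagreement_zones(calls))  # ['NDX']
```

Flagged tickers are natural candidates for closer attention in the Human Score step.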
W8 — Prescriptive Engine · Mid-Trimester Milestone
🎯 Milestone · tag vW29 due Sun 12 Jul
Checkpoint: every team must have an end-to-end running pipeline by this session. If a team is still at Level 1, intervention occurs here.
🎯Prescriptive output: system generates at least one specific agent weight adjustment from delta history
🎯Mid-trimester demo: full pipeline runs live, no manual data entry steps visible
🎯Architecture document updated: what was planned vs what was actually built and why it differs
★Level 4: simple web UI showing pipeline state, latest prediction, and calibration scores
★Bonus: two-week prediction horizon (Almanac + Macro dominant, wider range)
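A prescriptive output can be as simple as nudging each agent's weight toward its recent directional hit rate. The rule below is one possible sketch, not the course's prescribed method; the agent names, the `learning_rate` parameter, and the hit histories are assumptions.

```python
def prescribe_weights(weights: dict, hit_history: dict, learning_rate: float = 0.5) -> dict:
    """Scale each agent's weight by how far its hit rate sits from the mean.

    weights:     current agent weights, e.g. {"macro": 0.5, "almanac": 0.5}
    hit_history: per-agent list of booleans (direction correct per week)
    Returns new weights normalised to sum to 1.
    """
    hit_rates = {a: sum(h) / len(h) for a, h in hit_history.items() if h}
    if not hit_rates:
        return dict(weights)  # no history yet, so nothing to prescribe
    mean_rate = sum(hit_rates.values()) / len(hit_rates)
    raw = {}
    for agent, weight in weights.items():
        rate = hit_rates.get(agent, mean_rate)  # agents without history stay neutral
        raw[agent] = max(weight * (1 + learning_rate * (rate - mean_rate)), 0.0)
    total = sum(raw.values())
    if total == 0:
        return dict(weights)
    return {agent: value / total for agent, value in raw.items()}


if __name__ == "__main__":
    # Made-up history: macro was right twice, almanac once out of two.
    old = {"almanac": 0.5, "macro": 0.5}
    new = prescribe_weights(old, {"almanac": [True, False], "macro": [True, True]})
    for agent in old:
        print(f"{agent}: {old[agent]:.4f} -> {new[agent]:.4f}")
```

Printing the old and new weight side by side gives the "specific agent weight adjustment" the milestone asks for, in a form the team can review before accepting.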
W9 — System Integration + Sector Depth
tag vW30 due Sun 19 Jul
→Full pipeline runs from one command or one GitHub Actions trigger, no exceptions
→All 11 S&P sectors covered with at least one automated signal per sector
→Calibration history: W3–W8 accuracy tracked, which model has best directional record?
→Pipeline hardening: error handling for missing data, API failures, malformed responses
★Bonus: one-month prediction horizon using Almanac + FinBERT + Macro agents
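Pipeline hardening for API failures and malformed responses might look like the sketch below: retry a flaky call with exponential backoff, and reject responses that do not parse or lack the expected field. The `direction` field, its allowed values, and both helper names are assumptions about the team's own response format.

```python
import json
import time


def call_with_retry(fetch, attempts: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Call fetch() up to `attempts` times, doubling the wait after each failure."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as exc:  # real code should catch narrower error types
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error


def parse_prediction(raw: str):
    """Return the parsed response dict, or None if it is malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    if data.get("direction") not in ("up", "down", "flat"):
        return None
    return data


if __name__ == "__main__":
    print(parse_prediction('{"direction": "up"}'))   # {'direction': 'up'}
    print(parse_prediction("sorry, I cannot help"))  # None
```

Returning `None` for a bad response, rather than raising, lets the pipeline log the failure and continue with the remaining agents.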
W10 Prep — Final Sprint + Demo Rehearsal
tag vW31 due Sun 26 Jul
Last prediction sprint before Demo Day. System should produce the prediction with minimal human input. Rehearse every role's two-minute demo segment.
→Final prediction sprint: system does the heavy lifting, team reviews and approves
→Demo rehearsal: every role presents their pipeline segment in under two minutes
→System story prepared: Level 1 in W1 → Level 3–4 now, what changed and what was learned
FINAL DEMO DAY — Mon 3 August 2026
🏁 End of Trimester
No new prediction this week. You demo the system you built. Every role speaks. The pipeline runs live.
🏁Live demo: pipeline runs from data fetch to prediction output in real time, no slides replacing it
🏁Calibration story: W3–W10 accuracy shown — what improved, what the system learned to prescribe
🏁Scrum story: how did the team mature from Level 1 manual to the system running today?
🏁Architecture diagram: from the manual pipeline you specified to the code you shipped
★Distinction: system demonstrates self-improvement across sprints — calibration loop visibly tightens prediction accuracy week by week
Core Requirement — All 11 S&P 500 Sectors
Full Sector Coverage Expected by W9 (20 July)
Start with the sectors most relevant to the week's macro theme. Build coverage progressively: 3 sectors by W4, 5 by W5, all 11 by W9.
Energy: Oil · USD strength
Materials: Global commodities
Industrials: GDP · PMI · ISM
Cons. Disc.: Consumer confidence
Cons. Staples: Defensive · Inflation
Healthcare: Defensive · Policy risk
Financials: Rates · Yield curve
Info Tech: Rate sensitive · Earnings
Comm. Svcs: Ad spend · Streaming
Utilities: Bond proxy · Rates
Real Estate: REITs · Mortgage rates
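A small coverage check can confirm which sectors a prediction file still lacks before the tag is cut. The sector labels below are the ones listed above; the function names and the idea of passing the sprint's minimum as a parameter are illustrative assumptions.

```python
SECTORS = [
    "Energy", "Materials", "Industrials", "Cons. Disc.", "Cons. Staples",
    "Healthcare", "Financials", "Info Tech", "Comm. Svcs", "Utilities",
    "Real Estate",
]


def missing_sectors(covered) -> list:
    """Sectors from the full list of 11 that are not yet covered."""
    covered_set = set(covered)
    return [s for s in SECTORS if s not in covered_set]


def meets_minimum(covered, minimum: int) -> bool:
    """True when at least `minimum` recognised sectors are covered."""
    return len(set(covered) & set(SECTORS)) >= minimum


if __name__ == "__main__":
    covered = ["Energy", "Financials", "Info Tech"]
    print(meets_minimum(covered, 3))  # True
    print(missing_sectors(covered))
```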