Commit Graph

6 Commits

Author SHA1 Message Date
FullStackDev
a5bf2ad59c feat: v0.2.0 sprint — ground truth eval, crossover/mutation, checkpointing, similarity guards, dataset loader, CLI commands, extended test coverage
Aggregates all v0.2.0 sprint work (GARAA-30 through GARAA-40) and fixes
two integration tests that broke when the codebase went async (the DSPyLLMAdapter
and full-pipeline tests now properly await coroutines).

277 tests pass (260 unit + 17 integration).

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-03-29 19:13:50 +00:00
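The integration-test fix in this commit (tests that now await coroutines) can be illustrated with a minimal sketch. The `generate` function below is a hypothetical stand-in for an async port method, not the project's actual code. It shows why a test that calls an async function without awaiting it compares against a coroutine object rather than the result.

```python
import asyncio
import inspect


async def generate(prompt: str) -> str:
    # Hypothetical async port method; the real adapters are not shown in the log.
    return prompt.upper()


def broken_check() -> bool:
    # Without await, the call returns a coroutine object, not the string.
    result = generate("hi")
    is_coroutine = inspect.iscoroutine(result)
    result.close()  # close it so Python does not warn about a never-awaited coroutine
    return is_coroutine


async def fixed_check() -> str:
    # Awaiting the coroutine yields the actual return value.
    return await generate("hi")
```

A synchronous test body would typically run the fixed version through `asyncio.run(fixed_check())` or an async-aware test runner.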
FullStackDev
b9745566c8 feat: custom judge criteria and multi-dimensional scoring
Add configurable judge rubrics and multi-dimensional scoring with
weighted aggregation. New config fields: judge_criteria (free text)
and judge_dimensions (list of {name, weight, description}). CLI
--judge-criteria flag provides quick overrides. The judge adapter
computes weighted aggregate scores and enriches feedback with
per-dimension breakdowns.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-03-29 15:40:21 +00:00
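The weighted aggregation described in this commit might look roughly like the sketch below. The `JudgeDimension` class mirrors the `{name, weight, description}` shape from the message, but the class name, the function names, and the choice to normalise by the total weight are assumptions for illustration, not the project's confirmed implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgeDimension:
    # Mirrors the {name, weight, description} fields named in the commit message.
    name: str
    weight: float
    description: str = ""


def aggregate_score(dimensions, scores):
    """Weighted mean of per-dimension scores (assumed: normalised by total weight)."""
    total_weight = sum(d.weight for d in dimensions)
    if total_weight <= 0:
        raise ValueError("dimension weights must sum to a positive value")
    return sum(d.weight * scores[d.name] for d in dimensions) / total_weight


def format_breakdown(dimensions, scores):
    """One feedback line per dimension, suitable for appending to judge feedback."""
    return "\n".join(
        f"{d.name}: {scores[d.name]:.2f} (weight {d.weight:g})" for d in dimensions
    )
```

With weights 3 and 1 and scores 0.5 and 1.0, the aggregate is (3 × 0.5 + 1 × 1.0) / 4 = 0.625.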
FullStackDev
c92ca4a2b8 feat: async/parallel execution with configurable concurrency
Parallelize LLM calls across minibatches to reduce wall-clock time.
All domain ports (LLMPort, JudgePort, ProposerPort) are now async.
Adapter implementations wrap synchronous DSPy calls with asyncio.to_thread.
Judge calls run in parallel within a batch using asyncio.gather + semaphore.
Evaluator parallelizes minibatch execution with configurable concurrency.
Evolution loop and use case are fully async. Proposer stays sequential.
Added --max-concurrency CLI flag and max_concurrency YAML config field.
Added async_retry_with_backoff for async error handling.
All 139 unit tests pass.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-03-29 13:15:34 +00:00
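The pattern this commit names (wrapping synchronous calls with `asyncio.to_thread` and bounding parallel judge calls with `asyncio.gather` plus a semaphore) can be sketched as follows. `blocking_judge` and `judge_batch` are hypothetical names; the blocking function merely simulates a synchronous DSPy call.

```python
import asyncio
import time


def blocking_judge(example: str) -> float:
    # Hypothetical stand-in for a synchronous DSPy call.
    time.sleep(0.01)
    return float(len(example))


async def judge_batch(examples, max_concurrency=4):
    # The semaphore caps how many blocking calls run at the same time.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def judge_one(example):
        async with semaphore:
            # to_thread moves the blocking call off the event loop thread.
            return await asyncio.to_thread(blocking_judge, example)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(judge_one(e) for e in examples))
```

Because `asyncio.gather` preserves input order, scores can be zipped back onto their examples even though the calls finish in arbitrary order.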
FullStackDev
e2d111ce5b feat: error handling, retry with backoff, and circuit breaker
Add robust error handling to the evolution loop and LLM adapters:
- Retry utility with exponential backoff for transient errors (429, 5xx, timeouts)
- Per-call error isolation in evaluator and judge adapter
- Circuit breaker in EvolutionLoop (trips after N consecutive failures)
- CLI flags: --max-retries, --error-strategy (skip|retry|abort)
- Config fields: max_retries, retry_delay_base, circuit_breaker_threshold, error_strategy
- 16 new unit tests covering all error handling paths

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-03-29 12:47:55 +00:00
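A minimal sketch of the two mechanisms listed in this commit, retry with exponential backoff and a circuit breaker that trips after N consecutive failures. The parameter names `max_retries` and `retry_delay_base` come from the config fields in the message; `TransientError`, the doubling schedule, and resetting the failure count on success are assumptions.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures (429, 5xx, timeouts)."""


def retry_with_backoff(fn, max_retries=3, retry_delay_base=0.5, sleep=time.sleep):
    """Call fn(); on TransientError wait retry_delay_base * 2**attempt, then retry."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_retries:
                raise  # retries exhausted, surface the error to the caller
            sleep(retry_delay_base * 2 ** attempt)


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; a success resets the count."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.consecutive_failures = 0

    def record_success(self):
        self.consecutive_failures = 0

    def record_failure(self):
        self.consecutive_failures += 1

    @property
    def is_open(self):
        return self.consecutive_failures >= self.threshold
```

Passing `sleep` as a parameter keeps the helper testable without real delays; a loop would check `breaker.is_open` before each iteration and stop once it trips.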
FullStackDev
f516ca4be6 fix: multi-model routing — each adapter uses own dspy.LM instance
- DSPyLLMAdapter now accepts dspy.LM instead of model string, uses dspy.context(lm=...)
- DSPyJudgeAdapter, DSPyProposerAdapter, DSPySyntheticAdapter each accept and use their own LM
- OptimizationConfig gains per-model api_base/api_key_env override fields
- cli/app.py creates separate dspy.LM per adapter with per-model overrides
- New unit tests verify each adapter isolates its LM from global config

Fixes Bug #1 (multi-model config not wired) and Bug #2 (DSPyLLMAdapter ignores model param).

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-03-29 12:31:48 +00:00
837a44970f Initial commit: PROMETHEUS v0.1.0 - Prompt optimizer
- Clean architecture (domain/application/infrastructure)
- DSPy-based evolution engine with scoring
- CLI via pyproject.toml entry point
- Unit + integration tests (~300 tests)
- Configs for glm-5.1 and glm-4.5-air models
- Z.AI endpoint integration
2026-03-29 11:44:03 +00:00