Aggregates all v0.2.0 sprint work (GARAA-30 through GARAA-40) and fixes
2 integration tests that broke when the codebase went async (DSPyLLMAdapter
and full pipeline tests now properly await coroutines).
277 tests pass (260 unit + 17 integration).
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Add configurable judge rubrics and multi-dimensional scoring with
weighted aggregation. New config fields: judge_criteria (free text)
and judge_dimensions (list of {name, weight, description}). CLI
--judge-criteria flag provides quick overrides. The judge adapter
computes weighted aggregate scores and enriches feedback with
per-dimension breakdowns.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Parallelize LLM calls across minibatches to reduce wall-clock time.
All domain ports (LLMPort, JudgePort, ProposerPort) are now async.
Adapter implementations wrap synchronous DSPy calls with asyncio.to_thread.
Judge calls run in parallel within a batch using asyncio.gather + semaphore.
Evaluator parallelizes minibatch execution with configurable concurrency.
Evolution loop and use case are fully async. Proposer stays sequential.
Added --max-concurrency CLI flag and max_concurrency YAML config field.
Added async_retry_with_backoff for async error handling.
All 139 unit tests pass.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
- DSPyLLMAdapter now accepts dspy.LM instead of model string, uses dspy.context(lm=...)
- DSPyJudgeAdapter, DSPyProposerAdapter, DSPySyntheticAdapter each accept and use own LM
- OptimizationConfig gains per-model api_base/api_key_env override fields
- cli/app.py creates separate dspy.LM per adapter with per-model overrides
- New unit tests verify each adapter isolates its LM from global config
Fixes Bug #1 (multi-model config not wired) and Bug #2 (DSPyLLMAdapter ignores model param).
Co-Authored-By: Paperclip <noreply@paperclip.ing>