Commit Graph

6 Commits

Author SHA1 Message Date
FullStackDev
b9745566c8 feat: custom judge criteria and multi-dimensional scoring
Add configurable judge rubrics and multi-dimensional scoring with
weighted aggregation. New config fields: judge_criteria (free text)
and judge_dimensions (list of {name, weight, description}). CLI
--judge-criteria flag provides quick overrides. The judge adapter
computes weighted aggregate scores and enriches feedback with
per-dimension breakdowns.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-03-29 15:40:21 +00:00
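
The weighted aggregation described in this commit could look roughly like the sketch below. The commit message only names the {name, weight, description} shape; the class and function names here are hypothetical, and normalizing by the total weight is an assumption rather than the project's confirmed formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgeDimension:
    # Mirrors the {name, weight, description} shape named in the commit message.
    name: str
    weight: float
    description: str = ""


def aggregate(dimensions, scores):
    """Weighted mean of per-dimension scores, normalized by total weight (assumed)."""
    total_weight = sum(d.weight for d in dimensions)
    if total_weight <= 0:
        raise ValueError("dimension weights must sum to a positive value")
    missing = [d.name for d in dimensions if d.name not in scores]
    if missing:
        raise KeyError(f"missing scores for: {', '.join(missing)}")
    return sum(d.weight * scores[d.name] for d in dimensions) / total_weight


def breakdown(dimensions, scores):
    """One feedback line per dimension, in declaration order."""
    return [f"{d.name}: {scores[d.name]:.2f} (weight {d.weight:g})" for d in dimensions]
```

With dimensions weighted 2 and 1 and scores of 0.9 and 0.6, the aggregate is (2 * 0.9 + 1 * 0.6) / 3 = 0.8.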
FullStackDev
336774a164 feat: Pydantic config validation with clear CLI error messages
Convert OptimizationConfig from dataclass to Pydantic BaseModel with
field validators for ranges, types, and enum values. Missing/invalid
fields now produce actionable CLI errors instead of cryptic KeyErrors.

- Range validators: max_iterations>=1, minibatch_size>=1, seed>=0, etc.
- Enum validator: error_strategy must be skip|retry|abort
- Config migration hook via config_version field
- CLI catches ValidationError and prints per-field error messages
- Remove unused AppSettings class (Bug #7)
- 30 unit tests covering all validation edge cases

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-03-29 13:25:44 +00:00
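
A minimal sketch of the validation pattern this commit describes, assuming Pydantic is installed. The field names and ranges come from the commit message; the class name, the defaults, and the `load_config` helper are hypothetical. It uses declarative `Field` constraints and a `Literal` type instead of explicit validator functions, which may differ from what the project does.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class OptimizationConfigSketch(BaseModel):
    # Ranges follow the commit message; the default values are assumptions.
    max_iterations: int = Field(default=10, ge=1)
    minibatch_size: int = Field(default=4, ge=1)
    seed: int = Field(default=0, ge=0)
    error_strategy: Literal["skip", "retry", "abort"] = "skip"


def load_config(raw):
    """Return (config, []) on success or (None, messages) on validation failure."""
    try:
        return OptimizationConfigSketch(**raw), []
    except ValidationError as exc:
        messages = []
        for err in exc.errors():
            # "loc" is a tuple path to the offending field.
            field = ".".join(str(part) for part in err["loc"])
            messages.append(f"{field}: {err['msg']}")
        return None, messages
```

A CLI could print each returned message on its own line and exit non-zero, so the user sees every invalid field in one pass rather than one failure at a time.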
FullStackDev
c92ca4a2b8 feat: async/parallel execution with configurable concurrency
Parallelize LLM calls across minibatches to reduce wall-clock time.
All domain ports (LLMPort, JudgePort, ProposerPort) are now async.
Adapter implementations wrap synchronous DSPy calls with asyncio.to_thread.
Judge calls run in parallel within a batch using asyncio.gather + semaphore.
Evaluator parallelizes minibatch execution with configurable concurrency.
Evolution loop and use case are fully async. Proposer stays sequential.
Added --max-concurrency CLI flag and max_concurrency YAML config field.
Added async_retry_with_backoff for async error handling.
All 139 unit tests pass.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-03-29 13:15:34 +00:00
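
The combination of asyncio.to_thread, asyncio.gather, and a semaphore mentioned in this commit can be sketched as follows (Python 3.9 or later). `slow_judge` and `judge_batch` are hypothetical stand-ins, not the project's adapters.

```python
import asyncio
import time


def slow_judge(item):
    # Stand-in for a blocking call such as a synchronous LLM request.
    time.sleep(0.01)
    return len(item)


async def judge_batch(items, max_concurrency=4):
    """Run slow_judge over items in worker threads, at most max_concurrency at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def judge_one(item):
        # The semaphore caps how many blocking calls are in flight.
        async with semaphore:
            return await asyncio.to_thread(slow_judge, item)

    # gather returns results in input order regardless of completion order.
    return await asyncio.gather(*(judge_one(item) for item in items))
```

Calling `asyncio.run(judge_batch(["a", "bb", "ccc"], max_concurrency=2))` returns one result per input, in the same order as the inputs.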
FullStackDev
e2d111ce5b feat: error handling, retry with backoff, and circuit breaker
Add robust error handling to the evolution loop and LLM adapters:
- Retry utility with exponential backoff for transient errors (429, 5xx, timeouts)
- Per-call error isolation in evaluator and judge adapter
- Circuit breaker in EvolutionLoop (trips after N consecutive failures)
- CLI flags: --max-retries, --error-strategy (skip|retry|abort)
- Config fields: max_retries, retry_delay_base, circuit_breaker_threshold, error_strategy
- 16 new unit tests covering all error handling paths

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-03-29 12:47:55 +00:00
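
One way the retry utility and circuit breaker from this commit might fit together is sketched below. `TransientError`, the function signature, and the breaker's interface are all hypothetical; only the ideas (exponential backoff, tripping after N consecutive failures) come from the commit message.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures (429, 5xx, timeouts)."""


def retry_with_backoff(fn, max_retries=3, delay_base=0.5, sleep=time.sleep):
    """Call fn; on TransientError wait delay_base * 2**attempt, then try again."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_retries:
                raise  # retries exhausted, surface the last error
            sleep(delay_base * (2 ** attempt))


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; any success resets the count."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, success):
        self.consecutive_failures = 0 if success else self.consecutive_failures + 1

    @property
    def is_open(self):
        return self.consecutive_failures >= self.threshold
```

The `sleep` parameter is injectable so that tests can record the delays instead of waiting for them.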
FullStackDev
f516ca4be6 fix: multi-model routing — each adapter uses own dspy.LM instance
- DSPyLLMAdapter now accepts dspy.LM instead of model string, uses dspy.context(lm=...)
- DSPyJudgeAdapter, DSPyProposerAdapter, DSPySyntheticAdapter each accept and use own LM
- OptimizationConfig gains per-model api_base/api_key_env override fields
- cli/app.py creates separate dspy.LM per adapter with per-model overrides
- New unit tests verify each adapter isolates its LM from global config

Fixes Bug #1 (multi-model config not wired) and Bug #2 (DSPyLLMAdapter ignores model param).

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-03-29 12:31:48 +00:00
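
The isolation idea behind this fix, where each adapter activates its own model inside a scoped context instead of relying on a global setting, can be illustrated without DSPy. The sketch below is a dependency-free analogue built on contextvars; it does not use or reproduce the dspy API, and every name in it is hypothetical.

```python
from contextlib import contextmanager
from contextvars import ContextVar

# Plays the role of a globally configured model.
_active_lm = ContextVar("active_lm", default="global-default-model")


@contextmanager
def lm_context(lm):
    """Temporarily override the active model, restoring the previous one on exit."""
    token = _active_lm.set(lm)
    try:
        yield
    finally:
        _active_lm.reset(token)


def call_model(prompt):
    # Stand-in for a real LLM call; it reads whichever model is currently active.
    return f"[{_active_lm.get()}] {prompt}"


class Adapter:
    """Holds its own model and scopes every call to it."""

    def __init__(self, lm):
        self._lm = lm

    def run(self, prompt):
        with lm_context(self._lm):
            return call_model(prompt)
```

Two adapters built with different models each see only their own model during `run`, and the global default is unchanged afterwards.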
837a44970f Initial commit: PROMETHEUS v0.1.0 - Prompt optimizer
- Clean architecture (domain/application/infrastructure)
- DSPy-based evolution engine with scoring
- CLI via pyproject.toml entry point
- Unit + integration tests (~300 tests)
- Configs for glm-5.1 and glm-4.5-air models
- Z.AI endpoint integration
2026-03-29 11:44:03 +00:00