# PROMETHEUS MVP: Detailed Technical Specification

**Version**: 0.1.0

**Stack**: Python 3.12+ · uv · DSPy · Typer

**Architecture**: Clean Architecture (hexagonal)

**Date**: 2025

---

## Table of Contents

1. [Overview & Goals](#1-overview--goals)
2. [Project Structure](#2-project-structure)
3. [Domain Layer](#3-domain-layer-entities--ports)
4. [Application Layer](#4-application-layer-use-cases)
5. [Infrastructure Layer](#5-infrastructure-layer-dspy-adapters)
6. [Presentation Layer (CLI)](#6-couche-présentation--cli)
7. [Core Algorithm: Detailed Pseudo-Code](#7-algorithme-central)
8. [I/O File Formats](#8-format-des-fichiers-io)
9. [Configuration & Environment](#9-configuration--environnement)
10. [Tests](#10-stratégie-de-tests)
11. [Full Architecture Diagrams](#11-diagrammes-darchitecture)

---

## 1. Overview & Goals

### 1.1 Problem Statement

Prompt-optimization frameworks (GEPA, TextGrad, Promptolution) all require
a labeled dataset to compute a quality signal. PROMETHEUS removes this
dependency by synthesizing its own test data and using an
LLM-as-Judge as the evaluation function.

### 1.2 MVP Goals

| # | Goal | Acceptance criterion |
|---|------|----------------------|
| O1 | Optimize a prompt without any labeled data | Seed prompt → improved prompt, 0 data files required |
| O2 | Simple CLI interface | `prometheus optimize -i config.yaml -o result.yaml` |
| O3 | Controlled budget | < 500 LLM calls for a complete run |
| O4 | Reproducible | Deterministic seed: identical results for the same seed + same model (given deterministic or cached LLM responses) |
| O5 | Observable | Structured logging, per-iteration metrics |

### 1.3 Nominal Flow

```
┌──────────────┐     ┌───────────────┐     ┌──────────────────┐     ┌────────────┐
│ File         │     │               │     │                  │     │ File       │
│ config.yaml  ├───► │  Bootstrap    ├───► │  Evolution Loop  ├───► │ output     │
│ (seed prompt │     │ (synth inputs │     │ (judge + mutate  │     │ (optimized │
│  + params)   │     │  generation)  │     │  + accept)       │     │  prompt)   │
└──────────────┘     └───────────────┘     └──────────────────┘     └────────────┘
```

---

## 2. Project Structure

```
prometheus/
├── pyproject.toml                  # uv project config
├── README.md
├── specs/
│   └── technical-spec.md           # this file
│
├── src/
│   └── prometheus/
│       ├── __init__.py
│       ├── cli/                    # PRESENTATION LAYER
│       │   ├── __init__.py
│       │   └── app.py              # Typer CLI app
│       │
│       ├── domain/                 # DOMAIN LAYER (zero dependencies)
│       │   ├── __init__.py
│       │   ├── entities.py         # Dataclasses: Prompt, SyntheticExample, Trajectory, EvalResult, Candidate, OptimizationState
│       │   ├── ports.py            # Abstract interfaces (ABC classes)
│       │   └── scoring.py          # Score combination logic, acceptance criteria
│       │
│       ├── application/            # APPLICATION LAYER (depends on domain only)
│       │   ├── __init__.py
│       │   ├── use_cases.py        # OptimizePromptUseCase
│       │   ├── bootstrap.py        # SyntheticBootstrap
│       │   ├── evolution.py        # EvolutionLoop (reflective mutation)
│       │   ├── evaluator.py        # PromptEvaluator (execution + judge)
│       │   └── dto.py              # Config & Result dataclasses
│       │
│       ├── infrastructure/         # INFRASTRUCTURE LAYER (depends on domain + application)
│       │   ├── __init__.py
│       │   ├── dspy_signatures.py  # DSPy Signature definitions
│       │   ├── dspy_modules.py     # DSPy Module implementations
│       │   ├── llm_adapter.py      # DSPyLLMAdapter (implements LLMPort)
│       │   ├── judge_adapter.py    # DSPyJudgeAdapter (implements JudgePort)
│       │   ├── proposer_adapter.py # DSPyProposerAdapter (implements ProposerPort)
│       │   ├── synth_adapter.py    # DSPySyntheticAdapter (implements SyntheticGeneratorPort)
│       │   └── file_io.py          # FileReader, FileWriter
│       │
│       └── config.py               # Settings (pydantic-settings)
│
├── tests/
│   ├── unit/
│   │   ├── test_entities.py
│   │   ├── test_scoring.py
│   │   ├── test_evolution.py
│   │   └── test_bootstrap.py
│   ├── integration/
│   │   ├── test_dspy_adapters.py
│   │   └── test_full_pipeline.py
│   └── conftest.py
│
└── examples/
    ├── basic_usage.py
    └── sample_config.yaml
```

### 2.1 `pyproject.toml`

```toml
[project]
name = "prometheus"
version = "0.1.0"
description = "Prompt evolution without reference data"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
    "dspy>=2.6",
    "typer>=0.15",
    "pydantic>=2.10",
    "pydantic-settings>=2.7",
    "pyyaml>=6.0",
    "rich>=13.9",
]

[project.optional-dependencies]
dev = [
    "pytest>=8.3",
    "pytest-cov>=6.0",
    "ruff>=0.9",
    "mypy>=1.14",
    "types-PyYAML>=6.0",  # type stubs for pyyaml, required by mypy strict mode
]

[project.scripts]
prometheus = "prometheus.cli.app:app"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.ruff]
line-length = 100
target-version = "py312"

[tool.mypy]
python_version = "3.12"
strict = true
```

---

## 3. Domain Layer: Entities & Ports

### Goal

Define the business core without any external dependency:
no imports of `dspy`, `pydantic`, or anything outside the standard library.

### 3.1 `entities.py`

```python
"""Domain entities: pure data, zero dependencies."""

from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Prompt:
    """
    A candidate prompt.

    frozen=True makes it immutable, which is safe for Pareto tracking.
    `metadata` is excluded from the hash (a dict is unhashable), so Prompt
    instances can be used in sets and as dict keys.
    """

    text: str
    metadata: dict[str, Any] = field(default_factory=dict, hash=False)

    def __len__(self) -> int:
        return len(self.text)


@dataclass(frozen=True)
class SyntheticExample:
    """
    A synthetic example: an input generated from the task description.

    There is no expected output; the judge evaluates the response directly.
    """

    input_text: str
    category: str = "default"  # for future stratified sampling
    id: int = 0


@dataclass
class Trajectory:
    """
    Execution trace of a prompt on one input.

    Used by the reflective mutation to understand failures.
    """

    input_text: str
    output_text: str
    score: float
    feedback: str  # textual feedback from the judge
    prompt_used: str


@dataclass
class EvalResult:
    """Result of an evaluation on a minibatch."""

    scores: list[float]
    feedbacks: list[str]
    trajectories: list[Trajectory]

    @property
    def total_score(self) -> float:
        return sum(self.scores)

    @property
    def mean_score(self) -> float:
        return sum(self.scores) / len(self.scores) if self.scores else 0.0


@dataclass
class Candidate:
    """
    A candidate in the evolution pool.

    Holds the prompt and its best observed total score.
    """

    prompt: Prompt
    best_score: float = 0.0
    generation: int = 0  # iteration at which it was created
    parent_id: int | None = None


@dataclass
class OptimizationState:
    """Complete optimization state: a serializable snapshot."""

    iteration: int = 0
    best_candidate: Candidate | None = None
    candidates: list[Candidate] = field(default_factory=list)
    synthetic_pool: list[SyntheticExample] = field(default_factory=list)
    history: list[dict[str, Any]] = field(default_factory=list)
    total_llm_calls: int = 0
```

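Two properties of these entities matter downstream. A frozen dataclass with a `dict` field can only be hashed if that field is excluded with `field(hash=False)`, which is what allows a `Prompt` to be placed in a set, for example to deduplicate candidates. `EvalResult` aggregates per-input scores. The sketch below is self-contained: both classes are trimmed to the relevant fields, and the scores are illustrative.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Prompt:
    text: str
    # Excluded from __hash__ because dicts are unhashable; still part of __eq__.
    metadata: dict[str, Any] = field(default_factory=dict, hash=False)


@dataclass
class EvalResult:
    scores: list[float]

    @property
    def total_score(self) -> float:
        return sum(self.scores)

    @property
    def mean_score(self) -> float:
        return sum(self.scores) / len(self.scores) if self.scores else 0.0


seed = Prompt("Summarize the text.", metadata={"origin": "seed"})
seen = {seed}  # works only because metadata does not take part in the hash
result = EvalResult(scores=[0.5, 0.75, 1.0])
print(result.total_score, result.mean_score)  # 2.25 0.75
```

Equality still compares `metadata`. Two prompts with the same text and different metadata therefore hash identically but are not equal, which is allowed by Python's hash contract.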
### 3.2 `ports.py`

```python
"""
Domain ports: abstract interfaces implemented by the infrastructure layer.

Uses ABCs (nominal typing): each adapter explicitly subclasses its port, and
instantiating an adapter that misses an abstract method raises TypeError.
"""

from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any

from prometheus.domain.entities import Prompt, SyntheticExample, Trajectory


class LLMPort(ABC):
    """
    Port for running a prompt on an input.

    The infrastructure layer provides a DSPy-based implementation.
    """

    @abstractmethod
    def execute(self, prompt: Prompt, input_text: str) -> str:
        """Run the prompt on the input and return the raw response."""
        ...


class JudgePort(ABC):
    """
    LLM-as-Judge evaluation port.

    Takes (input, output) pairs plus the task description.
    Returns one score and one textual feedback per pair.
    """

    @abstractmethod
    def judge_batch(
        self,
        task_description: str,
        pairs: list[tuple[str, str]],
    ) -> list[tuple[float, str]]:
        """
        Evaluate a batch of (input, output) pairs.

        Returns a list of (score, feedback), in the same order as `pairs`.
        """
        ...


class ProposerPort(ABC):
    """
    Port for proposing a new prompt.

    Uses the evaluation trajectories to propose an improvement.
    """

    @abstractmethod
    def propose(
        self,
        current_prompt: Prompt,
        trajectories: list[Trajectory],
        task_description: str,
    ) -> Prompt:
        """Propose a new prompt based on the failure trajectories."""
        ...


class SyntheticGeneratorPort(ABC):
    """Port for generating synthetic inputs."""

    @abstractmethod
    def generate_inputs(
        self,
        task_description: str,
        n_examples: int,
    ) -> list[SyntheticExample]:
        """Generate N diverse synthetic inputs."""
        ...


class PersistencePort(ABC):
    """Port for reading and writing files."""

    @abstractmethod
    def read_config(self, path: str) -> dict[str, Any]:
        ...

    @abstractmethod
    def write_result(self, path: str, data: dict[str, Any]) -> None:
        ...
```

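These ports are `abc.ABC` subclasses, not `typing.Protocol` classes. An adapter must therefore inherit from its port, and Python refuses to instantiate an adapter that leaves an abstract method unimplemented. The minimal, self-contained sketch below shows that behavior, together with a toy stub judge of the kind a unit test could inject instead of an LLM. The port is trimmed to `judge_batch`, and `IncompleteJudge` and `LengthJudge` are hypothetical names.

```python
from abc import ABC, abstractmethod


class JudgePort(ABC):
    @abstractmethod
    def judge_batch(
        self, task_description: str, pairs: list[tuple[str, str]]
    ) -> list[tuple[float, str]]: ...


class IncompleteJudge(JudgePort):
    """Forgot to implement judge_batch."""


class LengthJudge(JudgePort):
    """Toy judge (hypothetical): 1.0 for a non-empty output, 0.0 otherwise."""

    def judge_batch(
        self, task_description: str, pairs: list[tuple[str, str]]
    ) -> list[tuple[float, str]]:
        return [(1.0, "ok") if out.strip() else (0.0, "empty output") for _, out in pairs]


try:
    IncompleteJudge()  # abstract method missing: instantiation fails
except TypeError as exc:
    print("rejected:", exc)

print(LengthJudge().judge_batch("echo", [("a", "a"), ("b", "")]))
# [(1.0, 'ok'), (0.0, 'empty output')]
```

With `typing.Protocol`, adapters would not need to inherit from the port. The trade-off is that the missing-method error would surface only at type-check time (mypy), not when the adapter is instantiated.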
### 3.3 `scoring.py`

```python
"""Scoring logic and acceptance criteria: pure domain."""

from prometheus.domain.entities import EvalResult


def should_accept(
    old_result: EvalResult,
    new_result: EvalResult,
    min_improvement: float = 0.0,
) -> bool:
    """
    Strict acceptance criterion.

    The new candidate must improve the total score by strictly more than
    `min_improvement`. Both results are expected to come from the same
    minibatch, so their totals are directly comparable.
    """
    return new_result.total_score > old_result.total_score + min_improvement


def normalize_score(raw: float, min_val: float = 0.0, max_val: float = 1.0) -> float:
    """Clamp a score into [min_val, max_val]."""
    return max(min_val, min(max_val, raw))
```

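The following worked example applies the acceptance rule to a five-input minibatch. The scores are made-up illustrative values. To stay self-contained, the helper takes pre-computed totals rather than `EvalResult` objects. The rule itself is the same comparison as `should_accept`.

```python
def should_accept(old_total: float, new_total: float, min_improvement: float = 0.0) -> bool:
    # Same rule as scoring.should_accept, applied to pre-computed totals.
    return new_total > old_total + min_improvement


old_scores = [0.5, 0.75, 0.25, 1.0, 0.5]   # current prompt, total 3.0
new_scores = [0.75, 0.75, 0.5, 1.0, 0.25]  # proposed prompt, total 3.25
old_total, new_total = sum(old_scores), sum(new_scores)

print(should_accept(old_total, new_total))                        # True  (3.25 > 3.0)
print(should_accept(old_total, new_total, min_improvement=0.25))  # False (3.25 > 3.25 is false)
print(should_accept(old_total, old_total))                        # False (a tie is rejected)
```

Because the comparison is strict, a candidate that merely matches the current prompt is rejected. In this example, the proposed prompt is also accepted even though it regressed on the last input, since only the total counts.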
---

## 4. Application Layer: Use Cases

### Goal

Orchestrate the business logic using only the domain ports.
This layer never depends on the concrete infrastructure.

### 4.1 `dto.py`

```python
"""Data Transfer Objects: configuration and results."""

from dataclasses import dataclass, field
from typing import Any


@dataclass
class OptimizationConfig:
    """Complete configuration of a PROMETHEUS run."""

    # --- Prompt ---
    seed_prompt: str
    task_description: str

    # --- Models ---
    task_model: str = "openai/gpt-4o-mini"
    judge_model: str = "openai/gpt-4o"
    proposer_model: str = "openai/gpt-4o"
    synth_model: str = "openai/gpt-4o"

    # --- Evolution parameters ---
    max_iterations: int = 30
    n_synthetic_inputs: int = 20
    minibatch_size: int = 5
    perfect_score: float = 1.0

    # --- Reproducibility ---
    seed: int = 42

    # --- Output ---
    output_path: str = "output.yaml"
    verbose: bool = False


@dataclass
class OptimizationResult:
    """Result of a complete optimization run."""

    optimized_prompt: str
    initial_prompt: str
    iterations_used: int
    total_llm_calls: int
    initial_score: float
    final_score: float
    improvement: float
    history: list[dict[str, Any]] = field(default_factory=list)
```

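The YAML config passed to the CLI (O2) has to be mapped onto `OptimizationConfig`. The minimal sketch below assumes the YAML has already been parsed into a dict, for example by `yaml.safe_load`. `config_from_dict` is a hypothetical helper, and the dataclass is trimmed to a few fields with the same defaults as above.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class OptimizationConfig:
    # Trimmed copy of the DTO (subset of fields, same defaults).
    seed_prompt: str
    task_description: str
    max_iterations: int = 30
    minibatch_size: int = 5
    seed: int = 42


def config_from_dict(raw: dict[str, Any]) -> OptimizationConfig:
    """Hypothetical helper: build the DTO from a parsed YAML mapping.

    Unknown keys raise instead of being silently ignored, so a typo such as
    `max_iteration` cannot fall back to the default without warning.
    Missing required keys raise TypeError from the dataclass constructor.
    """
    known = {f.name for f in fields(OptimizationConfig)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return OptimizationConfig(**raw)


cfg = config_from_dict(
    {"seed_prompt": "Summarize.", "task_description": "News summaries", "max_iterations": 10}
)
print(cfg.max_iterations, cfg.minibatch_size)  # 10 5
```

Rejecting unknown keys is a design choice. It trades some leniency for catching configuration typos early, which matters when a misspelled parameter silently changes the LLM budget.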
### 4.2 `bootstrap.py`

```python
"""
Bootstrap: synthetic input generation.

Goal: build a pool of test inputs from the task description.
This replaces the labeled dataset.
"""

from __future__ import annotations

import random

from prometheus.domain.entities import SyntheticExample
from prometheus.domain.ports import SyntheticGeneratorPort


class SyntheticBootstrap:
    """
    Orchestrates synthetic input generation.

    Depends only on the abstract port, not on DSPy directly.
    """

    def __init__(self, generator: SyntheticGeneratorPort, seed: int = 42) -> None:
        self._generator = generator
        self._rng = random.Random(seed)

    def run(self, task_description: str, n_examples: int) -> list[SyntheticExample]:
        """
        Generate the synthetic pool in a single call.

        Why a single call?
        - Minimizes LLM cost (1 call instead of N)
        - The LLM can ensure diversity within a single generation
        - Batching everything in one prompt gives better coverage
        """
        examples = self._generator.generate_inputs(task_description, n_examples)
        # Seeded shuffle (in place) so the pool order does not depend on generation order
        self._rng.shuffle(examples)
        return examples

    def sample_minibatch(
        self,
        pool: list[SyntheticExample],
        size: int,
    ) -> list[SyntheticExample]:
        """Sample a minibatch (without replacement) from the synthetic pool."""
        size = min(size, len(pool))
        return self._rng.sample(pool, size)
```

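Objective O4 relies on this seeded randomness. `SyntheticBootstrap` uses a single `random.Random(seed)` for both the initial shuffle and every minibatch draw. Two runs with the same seed and the same generated pool therefore draw the same sequence of minibatches. The self-contained sketch below demonstrates that property; `sample_sequence` is a hypothetical helper that mirrors the class. It does not cover the LLM outputs themselves, which are only reproducible if the responses are too (for example, when served from a cache).

```python
import random


def sample_sequence(pool: list[str], seed: int, size: int, n_batches: int) -> list[list[str]]:
    # Mirrors SyntheticBootstrap: one seeded RNG, shuffle once, then sample repeatedly.
    rng = random.Random(seed)
    pool = list(pool)  # copy, because shuffle() mutates in place
    rng.shuffle(pool)
    return [rng.sample(pool, min(size, len(pool))) for _ in range(n_batches)]


pool = [f"input-{i}" for i in range(20)]
run_a = sample_sequence(pool, seed=42, size=5, n_batches=3)
run_b = sample_sequence(pool, seed=42, size=5, n_batches=3)
print(run_a == run_b)  # True: same seed and same pool, same minibatches
```

Because the RNG is shared, any extra call to it (for example, an additional `sample_minibatch` added later in the loop) shifts every subsequent draw. Reproducibility therefore holds per code version, not across changes to the loop.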
### 4.3 `evaluator.py`

```python
"""
Evaluator: execution + judging.

Goal: produce a quality signal without ground truth.
Combines running the candidate prompt with an LLM-as-Judge evaluation.
"""

from __future__ import annotations

from prometheus.domain.entities import EvalResult, Prompt, SyntheticExample, Trajectory
from prometheus.domain.ports import JudgePort, LLMPort


class PromptEvaluator:
    """
    Evaluates a prompt on a minibatch of synthetic inputs.

    Pipeline: execute → judge → build the trajectories.

    This component replaces GEPA's EvaluatorFn: instead of comparing
    against a ground truth, it relies on an LLM-as-Judge.
    """

    def __init__(self, executor: LLMPort, judge: JudgePort) -> None:
        self._executor = executor
        self._judge = judge

    def evaluate(
        self,
        prompt: Prompt,
        minibatch: list[SyntheticExample],
        task_description: str,
    ) -> EvalResult:
        """
        Evaluate the prompt on the minibatch.

        Steps:
        1. Run the prompt on each input of the minibatch
        2. Judge each (input, output) pair
        3. Build the trajectories with the feedback

        Returns an EvalResult with scores, feedbacks and trajectories.
        """
        # ── Step 1: Execution ──
        outputs: list[str] = [
            self._executor.execute(prompt, example.input_text) for example in minibatch
        ]

        # ── Step 2: Judging ──
        pairs = [(ex.input_text, out) for ex, out in zip(minibatch, outputs)]
        judge_results = self._judge.judge_batch(task_description, pairs)
        if len(judge_results) != len(pairs):
            raise ValueError(
                f"Judge returned {len(judge_results)} results for {len(pairs)} pairs"
            )

        # ── Step 3: Building the trajectories ──
        scores: list[float] = []
        feedbacks: list[str] = []
        trajectories: list[Trajectory] = []
        for example, output, (score, feedback) in zip(minibatch, outputs, judge_results):
            scores.append(score)
            feedbacks.append(feedback)
            trajectories.append(
                Trajectory(
                    input_text=example.input_text,
                    output_text=output,
                    score=score,
                    feedback=feedback,
                    prompt_used=prompt.text,
                )
            )

        return EvalResult(
            scores=scores,
            feedbacks=feedbacks,
            trajectories=trajectories,
        )
```

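Because the evaluator only talks to its two ports, the execute → judge → trajectories pipeline can be exercised without any LLM. The minimal, self-contained sketch below assumes plain callables in place of `LLMPort` and `JudgePort`, with a trimmed `Trajectory`. `fake_execute` and `fake_judge` are hypothetical stand-ins of the kind a unit test in `tests/unit` could use.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class Trajectory:
    input_text: str
    output_text: str
    score: float
    feedback: str


def evaluate(
    prompt: str,
    inputs: list[str],
    execute: Callable[[str, str], str],
    judge_batch: Callable[[list[tuple[str, str]]], list[tuple[float, str]]],
) -> list[Trajectory]:
    # Step 1: run the prompt on every input.
    outputs = [execute(prompt, text) for text in inputs]
    # Step 2: judge all (input, output) pairs in one batch call.
    verdicts = judge_batch(list(zip(inputs, outputs)))
    # Step 3: zip everything back into trajectories (order is preserved).
    return [
        Trajectory(inp, out, score, fb)
        for inp, out, (score, fb) in zip(inputs, outputs, verdicts)
    ]


def fake_execute(prompt: str, text: str) -> str:
    # Stub executor: "follows" the prompt only when it asks for UPPERCASE.
    return text.upper() if "UPPERCASE" in prompt else text


def fake_judge(pairs: list[tuple[str, str]]) -> list[tuple[float, str]]:
    # Stub judge: full marks when the output is entirely uppercase.
    return [(1.0, "ok") if out.isupper() else (0.0, "not uppercase") for _, out in pairs]


inputs = ["hello", "world"]
weak = evaluate("Repeat the input.", inputs, fake_execute, fake_judge)
strong = evaluate("Repeat the input in UPPERCASE.", inputs, fake_execute, fake_judge)
print(sum(t.score for t in weak), sum(t.score for t in strong))  # 0.0 2.0
```

The two prompts are judged on the same inputs. That is also how the evolution loop compares a current prompt with its proposed successor.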
### 4.4 `evolution.py`

```python
"""
Evolution loop: core of the PROMETHEUS engine.

Goal: orchestrate the select → evaluate → propose → accept cycle.
This is the equivalent of GEPAEngine.run(), adapted to work without a valset.
"""

from __future__ import annotations

from prometheus.application.bootstrap import SyntheticBootstrap
from prometheus.application.evaluator import PromptEvaluator
from prometheus.domain.entities import (
    Candidate,
    OptimizationState,
    Prompt,
    SyntheticExample,
)
from prometheus.domain.ports import ProposerPort
from prometheus.domain.scoring import should_accept


class EvolutionLoop:
    """
    Main evolution loop.

    Design:
    - Keeps only the best candidate (no full population)
    - This is much simpler than GEPA (no Pareto front, no merge)
    - If the MVP works, a population will be added in v2
    """

    def __init__(
        self,
        evaluator: PromptEvaluator,
        proposer: ProposerPort,
        bootstrap: SyntheticBootstrap,
        max_iterations: int = 30,
        minibatch_size: int = 5,
        perfect_score: float = 1.0,
        verbose: bool = False,
    ) -> None:
        self._evaluator = evaluator
        self._proposer = proposer
        self._bootstrap = bootstrap
        self._max_iterations = max_iterations
        self._minibatch_size = minibatch_size
        self._perfect_score = perfect_score
        self._verbose = verbose

    def run(
        self,
        seed_prompt: Prompt,
        synthetic_pool: list[SyntheticExample],
        task_description: str,
    ) -> OptimizationState:
        """
        Run the complete evolution loop.

        Pseudo-code:

            best = Candidate(seed_prompt, score=evaluate(seed_prompt, batch_0))
            for i in 1..max_iterations:
                batch = sample_minibatch(pool)
                old_eval = evaluate(best.prompt, batch)
                if all scores are perfect: continue
                new_prompt = propose(best.prompt, old_eval.trajectories)
                new_eval = evaluate(new_prompt, batch)
                if new_eval.total > old_eval.total:
                    best = Candidate(new_prompt, score=new_eval.total)
            return state
        """
        state = OptimizationState(synthetic_pool=synthetic_pool)

        # ── Evaluate the seed ──
        initial_batch = self._bootstrap.sample_minibatch(synthetic_pool, self._minibatch_size)
        initial_eval = self._evaluator.evaluate(seed_prompt, initial_batch, task_description)
        state.total_llm_calls += self._eval_calls(initial_batch)

        best_candidate = Candidate(
            prompt=seed_prompt,
            best_score=initial_eval.total_score,
            generation=0,
        )
        state.best_candidate = best_candidate
        state.candidates.append(best_candidate)
        self._log(f"Initial score: {initial_eval.total_score:.2f}")

        # ── Main loop ──
        for i in range(1, self._max_iterations + 1):
            state.iteration = i

            # 1. Sample a fresh minibatch
            batch = self._bootstrap.sample_minibatch(synthetic_pool, self._minibatch_size)

            # 2. Evaluate the current candidate
            current_eval = self._evaluator.evaluate(
                best_candidate.prompt, batch, task_description
            )
            state.total_llm_calls += self._eval_calls(batch)

            # 3. Skip if every score is already perfect
            if all(s >= self._perfect_score for s in current_eval.scores):
                self._log(f"Iter {i}: All scores perfect, skipping.")
                state.history.append({
                    "iteration": i,
                    "event": "skip_perfect",
                    "current_score": current_eval.total_score,
                })
                continue

            # 4. Propose a new prompt (reflective mutation)
            new_prompt = self._proposer.propose(
                best_candidate.prompt,
                current_eval.trajectories,
                task_description,
            )
            state.total_llm_calls += 1  # 1 proposal call

            # 5. Evaluate the new prompt on the same minibatch
            new_eval = self._evaluator.evaluate(new_prompt, batch, task_description)
            state.total_llm_calls += self._eval_calls(batch)

            # 6. Accept or reject
            if should_accept(current_eval, new_eval):
                best_candidate = Candidate(
                    prompt=new_prompt,
                    best_score=new_eval.total_score,
                    generation=i,
                    # Index of the parent in state.candidates: the current best is
                    # always the last appended candidate. Unlike id(), this is stable
                    # and meaningful once the state is serialized.
                    parent_id=len(state.candidates) - 1,
                )
                state.best_candidate = best_candidate
                state.candidates.append(best_candidate)
                self._log(
                    f"Iter {i}: ACCEPTED "
                    f"({current_eval.total_score:.2f} → {new_eval.total_score:.2f})"
                )
                state.history.append({
                    "iteration": i,
                    "event": "accepted",
                    "old_score": current_eval.total_score,
                    "new_score": new_eval.total_score,
                    "improvement": new_eval.total_score - current_eval.total_score,
                })
            else:
                self._log(
                    f"Iter {i}: REJECTED "
                    f"({new_eval.total_score:.2f} ≤ {current_eval.total_score:.2f})"
                )
                state.history.append({
                    "iteration": i,
                    "event": "rejected",
                    "old_score": current_eval.total_score,
                    "new_score": new_eval.total_score,
                })

        return state

    @staticmethod
    def _eval_calls(batch: list[SyntheticExample]) -> int:
        """LLM calls made by one evaluation: one execution per input plus one
        judge call per pair (DSPyJudgeAdapter judges the pairs one by one)."""
        return 2 * len(batch)

    def _log(self, msg: str) -> None:
        if self._verbose:
            print(f"[PROMETHEUS] {msg}")
```

### 4.5 `use_cases.py`

```python
"""
Main use case: high-level orchestration.

Goal: business entry point. Coordinates bootstrap → evolution → result.
Contains no technical logic, only orchestration.
"""

from __future__ import annotations

from prometheus.application.bootstrap import SyntheticBootstrap
from prometheus.application.dto import OptimizationConfig, OptimizationResult
from prometheus.application.evaluator import PromptEvaluator
from prometheus.application.evolution import EvolutionLoop
from prometheus.domain.entities import Prompt
from prometheus.domain.ports import ProposerPort


class OptimizePromptUseCase:
    """
    The single use case of the MVP.

    Dependencies are injected through the constructor (dependency injection).
    """

    def __init__(
        self,
        evaluator: PromptEvaluator,
        proposer: ProposerPort,
        bootstrap: SyntheticBootstrap,
    ) -> None:
        self._evaluator = evaluator
        self._proposer = proposer
        self._bootstrap = bootstrap

    def execute(self, config: OptimizationConfig) -> OptimizationResult:
        """
        Complete pipeline:
        1. Bootstrap → generate the synthetic inputs
        2. Evolution → optimization loop
        3. Return the result
        """
        # ── Phase 0: Bootstrap ──
        synthetic_pool = self._bootstrap.run(
            task_description=config.task_description,
            n_examples=config.n_synthetic_inputs,
        )

        # ── Phase 1: Evolution ──
        loop = EvolutionLoop(
            evaluator=self._evaluator,
            proposer=self._proposer,
            bootstrap=self._bootstrap,
            max_iterations=config.max_iterations,
            minibatch_size=config.minibatch_size,
            perfect_score=config.perfect_score,
            verbose=config.verbose,
        )
        seed_prompt = Prompt(text=config.seed_prompt)
        state = loop.run(seed_prompt, synthetic_pool, config.task_description)

        # ── Phase 2: Result ──
        # The seed is always the first candidate. history[0] cannot be used here:
        # it is the first iteration's event, which only carries "current_score"
        # for skipped iterations. Note that the initial and final scores are
        # totals measured on different minibatches, so `improvement` is indicative.
        initial_score = state.candidates[0].best_score if state.candidates else 0.0
        best = state.best_candidate
        final_score = best.best_score if best else 0.0

        return OptimizationResult(
            optimized_prompt=best.prompt.text if best else config.seed_prompt,
            initial_prompt=config.seed_prompt,
            iterations_used=state.iteration,
            total_llm_calls=state.total_llm_calls + 1,  # +1 for the bootstrap call
            initial_score=initial_score,
            final_score=final_score,
            improvement=final_score - initial_score,
            history=state.history,
        )
```

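`OptimizationResult` is a plain dataclass, so writing it to `output_path` can go through `dataclasses.asdict`. The minimal sketch below uses illustrative, made-up values. It emits JSON to stay stdlib-only. Since `asdict` produces only dicts, lists, strings and numbers, the same mapping should also be accepted by PyYAML's `yaml.safe_dump` for the YAML output.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class OptimizationResult:
    # Same fields as the DTO in section 4.1.
    optimized_prompt: str
    initial_prompt: str
    iterations_used: int
    total_llm_calls: int
    initial_score: float
    final_score: float
    improvement: float
    history: list[dict[str, Any]] = field(default_factory=list)


result = OptimizationResult(
    optimized_prompt="Summarize the article in 3 bullet points.",
    initial_prompt="Summarize.",
    iterations_used=2,
    total_llm_calls=53,
    initial_score=3.0,
    final_score=4.0,
    improvement=1.0,
    history=[
        {"iteration": 1, "event": "rejected", "old_score": 3.0, "new_score": 2.5},
        {"iteration": 2, "event": "accepted", "old_score": 3.5, "new_score": 4.0,
         "improvement": 0.5},
    ],
)

payload = asdict(result)  # nested plain dicts/lists, safe to serialize
print(json.dumps(payload, indent=2, ensure_ascii=False))
```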
---

## 5. Infrastructure Layer: DSPy Adapters

### Goal

Implement the domain ports with DSPy.
Each adapter wraps a `dspy.Signature` and a `dspy.Module`.

### 5.1 `dspy_signatures.py`

```python
"""
DSPy Signatures: declarative LLM contracts.

Goal: define WHAT each LLM call does, not HOW.
DSPy Signature = input fields → output fields + instruction (the class docstring).
DSPy takes care of the prompting, the parsing and the structuring.
"""

import dspy


class GenerateSyntheticInputs(dspy.Signature):
    """Generate diverse, realistic input examples for a given task."""

    task_description: str = dspy.InputField(
        desc="Description of the task the prompt should accomplish."
    )
    n_examples: int = dspy.InputField(
        desc="Number of examples to generate."
    )
    examples: str = dspy.OutputField(
        desc=(
            "A JSON array of strings, each being a realistic input "
            "for the task. Cover: normal cases, edge cases, long inputs, "
            "short inputs, ambiguous cases, and tricky scenarios."
        ),
    )


class JudgeOutput(dspy.Signature):
    """
    Evaluate the quality of an LLM output for a given task and input.
    Score: 0.0 (completely wrong) to 1.0 (perfect).
    Feedback: specific, actionable criticism.
    """

    task_description: str = dspy.InputField(
        desc="What the assistant is supposed to do."
    )
    input_text: str = dspy.InputField(
        desc="The input provided to the assistant."
    )
    output_text: str = dspy.InputField(
        desc="The assistant's response to evaluate."
    )
    score: float = dspy.OutputField(
        desc="Quality score from 0.0 (wrong) to 1.0 (perfect)."
    )
    feedback: str = dspy.OutputField(
        desc=(
            "Specific, actionable feedback explaining what's wrong "
            "with the output and how to improve it. Be critical."
        ),
    )


class ProposeInstruction(dspy.Signature):
    """
    Given a current prompt and examples of where it fails with feedback,
    propose an improved version of the prompt.
    The new prompt should address all the issues identified in the feedback.
    """

    current_instruction: str = dspy.InputField(
        desc="The current prompt/instruction to improve."
    )
    task_description: str = dspy.InputField(
        desc="Description of the task."
    )
    failure_examples: str = dspy.InputField(
        desc=(
            "Examples of inputs, outputs, scores, and feedback "
            "showing where the current instruction fails."
        ),
    )
    new_instruction: str = dspy.OutputField(
        desc="An improved version of the instruction."
    )
```

### 5.2 `dspy_modules.py`

```python
"""
DSPy Modules: composition of signatures.

Goal: declarative orchestration of the LLM calls through DSPy.
"""

import json
import math
import re

import dspy

from prometheus.infrastructure.dspy_signatures import (
    GenerateSyntheticInputs,
    JudgeOutput,
    ProposeInstruction,
)


class SyntheticInputGenerator(dspy.Module):
    """
    Generates synthetic inputs in a single batch call.

    Uses ChainOfThought, with the aim of improving diversity.
    """

    def __init__(self) -> None:
        super().__init__()
        self.generate = dspy.ChainOfThought(GenerateSyntheticInputs)

    def forward(self, task_description: str, n_examples: int) -> dspy.Prediction:
        result = self.generate(
            task_description=task_description,
            n_examples=n_examples,
        )
        # Parse the JSON array
        try:
            parsed = json.loads(result.examples)
        except json.JSONDecodeError:
            # Fallback: extract the double-quoted strings from the raw text
            parsed = self._parse_fallback(result.examples)
        if not isinstance(parsed, list):
            parsed = [parsed]
        # Coerce to non-empty strings (the LLM may emit numbers or blank items)
        examples = [str(x).strip() for x in parsed if str(x).strip()]
        return dspy.Prediction(examples=examples)

    @staticmethod
    def _parse_fallback(text: str) -> list[str]:
        """Extract double-quoted strings from non-JSON output."""
        matches = re.findall(r'"([^"]+)"', text)
        return matches if matches else [text]


class OutputJudge(dspy.Module):
    """Judges a single output. Called once per pair by DSPyJudgeAdapter."""

    def __init__(self) -> None:
        super().__init__()
        self.judge = dspy.ChainOfThought(JudgeOutput)

    def forward(
        self, task_description: str, input_text: str, output_text: str
    ) -> dspy.Prediction:
        result = self.judge(
            task_description=task_description,
            input_text=input_text,
            output_text=output_text,
        )
        # Parse the score defensively (DSPy may return a string)
        try:
            score = float(result.score)
        except (ValueError, TypeError):
            score = 0.5  # neutral fallback
        if not math.isfinite(score):
            score = 0.5  # NaN/inf would otherwise be clamped to a perfect 1.0
        score = max(0.0, min(1.0, score))
        return dspy.Prediction(score=score, feedback=result.feedback)


class InstructionProposer(dspy.Module):
    """
    Proposes a new prompt from the failure trajectories.

    This is the equivalent of GEPA's InstructionProposalSignature.
    """

    def __init__(self) -> None:
        super().__init__()
        self.propose = dspy.ChainOfThought(ProposeInstruction)

    def forward(
        self,
        current_instruction: str,
        task_description: str,
        failure_examples: str,
    ) -> dspy.Prediction:
        result = self.propose(
            current_instruction=current_instruction,
            task_description=task_description,
            failure_examples=failure_examples,
        )
        return dspy.Prediction(new_instruction=result.new_instruction)
```

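LLMs often wrap a JSON array in a markdown code fence or in surrounding prose, so the whole-text `json.loads` fails. The regex fallback then takes over, but it splits incorrectly on items that contain escaped quotes such as `\"`. A more tolerant parser could try a few JSON candidates before falling back. `parse_examples` is a hypothetical helper, not part of the spec, and the fence is matched with `` `{3} `` so the snippet does not contain a literal fence.

```python
import json
import re

_FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def parse_examples(text: str) -> list[str]:
    """Hypothetical, more tolerant parser for the `examples` output field.

    Tries, in order: the whole text as JSON, the content of a json code fence,
    the outermost [...] span, then falls back to double-quoted strings.
    """
    candidates = [text]
    fence = _FENCE.search(text)
    if fence:
        candidates.append(fence.group(1))
    start, end = text.find("["), text.rfind("]")
    if 0 <= start < end:
        candidates.append(text[start : end + 1])
    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, list):
            return [str(x).strip() for x in parsed if str(x).strip()]
    return re.findall(r'"([^"]+)"', text) or [text]


TICKS = "`" * 3  # built at runtime to keep this snippet fence-safe
raw = f'Sure, here are the inputs:\n{TICKS}json\n["Hi there", "Refund my order #42"]\n{TICKS}'
print(parse_examples(raw))  # ['Hi there', 'Refund my order #42']
```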
### 5.3 `llm_adapter.py`

```python
"""
Adapter: running a prompt on an input.

Goal: implement the LLMPort port with DSPy.
"""

import dspy

from prometheus.domain.entities import Prompt
from prometheus.domain.ports import LLMPort


class DSPyLLMAdapter(LLMPort):
    """Runs a prompt using dspy.Predict with a simple signature."""

    class _ExecuteSignature(dspy.Signature):
        """Execute the instruction on the given input."""

        instruction: str = dspy.InputField(desc="The instruction/prompt to follow.")
        input_text: str = dspy.InputField(desc="The input to process.")
        output: str = dspy.OutputField(desc="The response following the instruction.")

    def __init__(self, model: str) -> None:
        self._predictor = dspy.Predict(self._ExecuteSignature)
        # Bind this adapter to its own model (e.g. config.task_model) instead of
        # silently using whatever LM was configured globally via dspy.configure().
        self._lm = dspy.LM(model)

    def execute(self, prompt: Prompt, input_text: str) -> str:
        # dspy.context overrides the LM only for the duration of this call
        with dspy.context(lm=self._lm):
            result = self._predictor(
                instruction=prompt.text,
                input_text=input_text,
            )
        return str(result.output)
```

### 5.4 `judge_adapter.py`

```python
"""
Adapter: LLM-as-Judge.

Goal: implement the JudgePort port with the DSPy OutputJudge module.
"""

from prometheus.domain.ports import JudgePort
from prometheus.infrastructure.dspy_modules import OutputJudge


class DSPyJudgeAdapter(JudgePort):
    """
    Evaluates a batch of (input, output) pairs by calling the judge once per pair,
    using the LM configured globally via dspy.configure().

    Future optimization: parallelize the calls with dspy.Parallel.
    For the MVP, the calls stay sequential (N pairs cost N LLM calls).
    """

    def __init__(self) -> None:
        self._judge = OutputJudge()

    def judge_batch(
        self,
        task_description: str,
        pairs: list[tuple[str, str]],
    ) -> list[tuple[float, str]]:
        results: list[tuple[float, str]] = []
        for input_text, output_text in pairs:
            pred = self._judge(
                task_description=task_description,
                input_text=input_text,
                output_text=output_text,
            )
            results.append((float(pred.score), str(pred.feedback)))
        return results
```

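The docstring above names `dspy.Parallel` as the intended route to parallel judging. The sketch below swaps in the standard library's `ThreadPoolExecutor` instead, to show the one property that matters: results must come back in the same order as `pairs`, because `PromptEvaluator` zips them with its inputs. It assumes the underlying LM client can be called safely from several threads, which should be verified for the DSPy version in use. `judge_batch_parallel` and `fake_judge` are hypothetical. Parallelism reduces wall-clock time, not the number of LLM calls counted against O3.

```python
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor


def judge_batch_parallel(
    judge_one: Callable[[str, str, str], tuple[float, str]],
    task_description: str,
    pairs: list[tuple[str, str]],
    max_workers: int = 4,
) -> list[tuple[float, str]]:
    """Judge pairs concurrently; results are returned in the same order as `pairs`.

    `judge_one(task, input, output)` is a hypothetical per-pair callable, e.g. a
    thin wrapper around OutputJudge. LLM calls are I/O-bound, so threads help.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, even if calls finish out of order.
        return list(
            pool.map(lambda pair: judge_one(task_description, pair[0], pair[1]), pairs)
        )


def fake_judge(task: str, inp: str, out: str) -> tuple[float, str]:
    # Stub standing in for an LLM call.
    return (1.0, "ok") if out == inp.upper() else (0.0, f"expected {inp.upper()!r}")


print(judge_batch_parallel(fake_judge, "uppercase", [("ab", "AB"), ("cd", "cd")]))
# [(1.0, 'ok'), (0.0, "expected 'CD'")]
```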
### 5.5 `proposer_adapter.py`
|
||
```python
|
||
"""
|
||
Adapter: Reflective Mutation Proposer.
|
||
Objectif: Implémenter le port ProposerPort via le DSPy InstructionProposer.
|
||
Convertit les trajectoires en format lisible pour le LLM proposer.
|
||
"""
|
||
from prometheus.domain.ports import ProposerPort
|
||
from prometheus.domain.entities import Prompt, Trajectory
|
||
from prometheus.infrastructure.dspy_modules import InstructionProposer
|
||
class DSPyProposerAdapter(ProposerPort):
|
||
"""
|
||
Utilise les trajectoires d'évaluation pour construire
|
||
un "failure report" et proposer un nouveau prompt.
|
||
"""
|
||
def __init__(self):
|
||
self._proposer = InstructionProposer()
|
||
def propose(
|
||
self,
|
||
current_prompt: Prompt,
|
||
trajectories: list[Trajectory],
|
||
task_description: str,
|
||
) -> Prompt:
|
||
# Formater les trajectoires en exemples d'échec
|
||
failure_examples = self._format_failures(trajectories)
|
||
pred = self._proposer(
|
||
current_instruction=current_prompt.text,
|
||
task_description=task_description,
|
||
failure_examples=failure_examples,
|
||
)
|
||
return Prompt(text=pred.new_instruction)
|
||
@staticmethod
|
||
def _format_failures(trajectories: list[Trajectory]) -> str:
|
||
"""
|
||
Convertit les trajectoires en un rapport textuel structuré.
|
||
Format inspiré du InstructionProposalSignature de GEPA:
|
||
# Example 1
|
||
## Input
|
||
<input_text>
|
||
## Generated Output
|
||
<output_text>
|
||
## Score
|
||
<score>
|
||
## Feedback
|
||
<feedback>
|
||
"""
|
||
sections = []
|
||
for i, t in enumerate(trajectories, 1):
|
||
section = (
|
||
f"# Example {i}\n"
|
||
f"## Input\n{t.input_text}\n\n"
|
||
f"## Generated Output\n{t.output_text}\n\n"
|
||
f"## Score\n{t.score:.2f}\n\n"
|
||
f"## Feedback\n{t.feedback}\n"
|
||
)
|
||
sections.append(section)
|
||
return "\n---\n".join(sections)
|
||
```
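To make the report format concrete, here is a standalone replica of `_format_failures` run on two toy trajectories. The `Trajectory` dataclass below is a minimal stand-in that carries only the fields the formatter reads. The real domain entity has more fields, such as `prompt_used`.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """Minimal stand-in holding only the fields the formatter reads."""
    input_text: str
    output_text: str
    score: float
    feedback: str


def format_failures(trajectories: list[Trajectory]) -> str:
    # Same layout as DSPyProposerAdapter._format_failures
    sections = []
    for i, t in enumerate(trajectories, 1):
        sections.append(
            f"# Example {i}\n"
            f"## Input\n{t.input_text}\n\n"
            f"## Generated Output\n{t.output_text}\n\n"
            f"## Score\n{t.score:.2f}\n\n"
            f"## Feedback\n{t.feedback}\n"
        )
    return "\n---\n".join(sections)


report = format_failures([
    Trajectory("What is 2+2?", "5", 0.0, "Arithmetic error."),
    Trajectory("Capital of France?", "Paris", 1.0, "Correct."),
])
print(report)
```

Nothing in the formatter filters out high-scoring trajectories. The proposer therefore sees successes (score `1.00` here) next to failures, which gives it a contrast to reason from.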

### 5.6 `synth_adapter.py`

```python
"""
Adapter: synthetic input generation.

Purpose: implement the SyntheticGeneratorPort port with DSPy.
"""

from prometheus.domain.ports import SyntheticGeneratorPort
from prometheus.domain.entities import SyntheticExample
from prometheus.infrastructure.dspy_modules import SyntheticInputGenerator


class DSPySyntheticAdapter(SyntheticGeneratorPort):
    """
    Generates synthetic inputs in a single batched DSPy call.
    """

    def __init__(self):
        self._generator = SyntheticInputGenerator()

    def generate_inputs(
        self,
        task_description: str,
        n_examples: int,
    ) -> list[SyntheticExample]:
        pred = self._generator(
            task_description=task_description,
            n_examples=n_examples,
        )
        # Keep at most n_examples items; a shorter list is passed through unchanged
        return [
            SyntheticExample(
                input_text=text,
                id=i,
            )
            for i, text in enumerate(pred.examples[:n_examples])
        ]
```
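Apart from truncation, the adapter trusts the generator's list as-is. The sketch below shows extra normalization that could run before the `SyntheticExample` objects are built. It strips whitespace, drops empty strings and exact duplicates, then truncates. `clean_synthetic_inputs` is a hypothetical helper, not part of this specification. It returns plain strings so the adapter can keep assigning `id=i` itself.

```python
import warnings


def clean_synthetic_inputs(raw: list[str], n_examples: int) -> list[str]:
    """Strip whitespace, drop blanks and exact duplicates (first one wins), then truncate."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for text in raw:
        text = text.strip()
        if not text or text in seen:
            continue  # empty or already seen
        seen.add(text)
        cleaned.append(text)
    if len(cleaned) < n_examples:
        # Make under-generation visible instead of silently shrinking the pool
        warnings.warn(f"got {len(cleaned)} usable inputs, expected {n_examples}", stacklevel=2)
    return cleaned[:n_examples]


inputs = clean_synthetic_inputs(["  What is VAT? ", "What is VAT?", "", "Define APR."], 2)
```

In `generate_inputs`, the comprehension would then enumerate `clean_synthetic_inputs(pred.examples, n_examples)` instead of `pred.examples[:n_examples]`.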

### 5.7 `file_io.py`

```python
"""
File I/O: reading and writing the config and result files.

Purpose: implement the PersistencePort port with YAML.
"""

import yaml

from prometheus.domain.ports import PersistencePort


class YamlPersistence(PersistencePort):
    """Reads and writes YAML files."""

    def read_config(self, path: str) -> dict:
        with open(path, "r", encoding="utf-8") as f:
            # safe_load only builds plain Python types, never arbitrary objects
            return yaml.safe_load(f)

    def write_result(self, path: str, data: dict) -> None:
        with open(path, "w", encoding="utf-8") as f:
            # sort_keys=False keeps the dict's insertion order (the dataclass
            # field order when data comes from asdict) instead of sorting keys
            yaml.dump(
                data,
                f,
                default_flow_style=False,
                allow_unicode=True,
                sort_keys=False,
            )
```
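`yaml.safe_load` returns `None` for an empty file, and it can return a list or a scalar for other documents. The CLI, however, indexes the result as a dict. The sketch below is a hypothetical validation step, in pure Python, that runs on whatever `read_config` returned and fails early with a clear message. The name `validate_raw_config` and the `REQUIRED_KEYS` tuple are assumptions.

```python
from typing import Any

REQUIRED_KEYS = ("seed_prompt", "task_description")


def validate_raw_config(data: Any, path: str) -> dict:
    """Return `data` if it is a mapping with non-empty required keys, else raise ValueError."""
    if data is None:
        raise ValueError(f"{path}: config file is empty")
    if not isinstance(data, dict):
        raise ValueError(f"{path}: expected a mapping at top level, got {type(data).__name__}")
    # `not data.get(key)` treats missing keys and empty strings the same way
    missing = [key for key in REQUIRED_KEYS if not data.get(key)]
    if missing:
        raise ValueError(f"{path}: missing or empty required key(s): {', '.join(missing)}")
    return data
```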

---

## 6. Presentation Layer: CLI

### Purpose

Provide a simple CLI built with Typer.

Single entry point: `prometheus optimize -i config.yaml -o result.yaml`

### 6.1 `config.py`

```python
"""
Global application settings.

Purpose: hold non-sensitive, application-level settings. For the MVP this is a
plain dataclass with hardcoded defaults; loading from file + environment
variables (e.g. with pydantic-settings) is left for a later version.
"""

from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AppSettings:
    """Non-sensitive settings, hardcoded for the MVP."""

    app_name: str = "prometheus"
    version: str = "0.1.0"
```

### 6.2 `cli/app.py`

```python
"""
CLI: user entry point.

Purpose: Typer interface with -i (input) and -o (output) options.
"""

from dataclasses import asdict
from pathlib import Path

import dspy
import typer
from rich.console import Console
from rich.panel import Panel
from rich.table import Table

from prometheus.application.bootstrap import SyntheticBootstrap
from prometheus.application.dto import OptimizationConfig, OptimizationResult
from prometheus.application.evaluator import PromptEvaluator
from prometheus.application.use_cases import OptimizePromptUseCase
from prometheus.infrastructure.file_io import YamlPersistence
from prometheus.infrastructure.judge_adapter import DSPyJudgeAdapter
from prometheus.infrastructure.llm_adapter import DSPyLLMAdapter
from prometheus.infrastructure.proposer_adapter import DSPyProposerAdapter
from prometheus.infrastructure.synth_adapter import DSPySyntheticAdapter

app = typer.Typer(
    name="prometheus",
    help="🔥 PROMETHEUS — Prompt evolution without reference data.",
    no_args_is_help=True,
)
console = Console()


@app.command()
def optimize(
    input: Path = typer.Option(
        ..., "-i", "--input",
        help="Path to input YAML config file.",
        # exists/readable checks are only enforced for Path-typed options
        exists=True, readable=True, dir_okay=False,
    ),
    output: str = typer.Option(
        "output.yaml", "-o", "--output",
        help="Path to output YAML result file.",
    ),
    verbose: bool = typer.Option(
        False, "-v", "--verbose",
        help="Print detailed progress.",
    ),
) -> None:
    """
    Optimize a prompt without any reference data.

    Usage:
        prometheus optimize -i config.yaml -o result.yaml
    """
    console.print(Panel.fit(
        "🔥 [bold red]PROMETHEUS[/bold red] — Prompt Evolution Engine",
        subtitle="No reference data required",
    ))

    # ── 1. Load the config ──
    persistence = YamlPersistence()
    raw_config = persistence.read_config(str(input))
    config = OptimizationConfig(
        seed_prompt=raw_config["seed_prompt"],
        task_description=raw_config["task_description"],
        task_model=raw_config.get("task_model", "openai/gpt-4o-mini"),
        judge_model=raw_config.get("judge_model", "openai/gpt-4o"),
        proposer_model=raw_config.get("proposer_model", "openai/gpt-4o"),
        synth_model=raw_config.get("synth_model", "openai/gpt-4o"),
        max_iterations=raw_config.get("max_iterations", 30),
        n_synthetic_inputs=raw_config.get("n_synthetic_inputs", 20),
        minibatch_size=raw_config.get("minibatch_size", 5),
        seed=raw_config.get("seed", 42),
        output_path=output,
        verbose=verbose,
    )
    console.print(f"[dim]Task: {config.task_description[:80]}...[/dim]")
    console.print(f"[dim]Seed prompt: {config.seed_prompt[:80]}...[/dim]")

    # ── 2. Configure DSPy ──
    # One LM per role, as declared in the config
    task_lm = dspy.LM(config.task_model)
    judge_lm = dspy.LM(config.judge_model)
    proposer_lm = dspy.LM(config.proposer_model)
    synth_lm = dspy.LM(config.synth_model)

    # ── 3. Build the adapters (dependency injection) ──
    # task_lm becomes DSPy's default LM. MVP limitation: judge_lm, proposer_lm
    # and synth_lm are created but not passed to any adapter here, so unless the
    # DSPy modules select their own LM, every call goes to task_model.
    dspy.configure(lm=task_lm)
    synth_adapter = DSPySyntheticAdapter()
    llm_adapter = DSPyLLMAdapter(model=config.task_model)
    judge_adapter = DSPyJudgeAdapter()
    proposer_adapter = DSPyProposerAdapter()

    bootstrap = SyntheticBootstrap(generator=synth_adapter, seed=config.seed)
    evaluator = PromptEvaluator(executor=llm_adapter, judge=judge_adapter)
    use_case = OptimizePromptUseCase(
        evaluator=evaluator,
        proposer=proposer_adapter,
        bootstrap=bootstrap,
    )

    # ── 4. Run ──
    with console.status("[bold green]Evolving prompt..."):
        result = use_case.execute(config)

    # ── 5. Display the results ──
    _display_result(result)

    # ── 6. Save ──
    _save_result(persistence, output, result)
    console.print(f"\n[green]✅ Results saved to {output}[/green]")


def _display_result(result: OptimizationResult) -> None:
    """Print a Rich summary in the terminal."""
    console.print()
    console.print(Panel(
        f"[bold green]Optimized Prompt[/bold green]\n\n{result.optimized_prompt}",
        title="🔥 Result",
    ))
    table = Table(title="Metrics")
    table.add_column("Metric", style="cyan")
    table.add_column("Value", style="bold")
    table.add_row("Initial Score", f"{result.initial_score:.2f}")
    table.add_row("Final Score", f"{result.final_score:.2f}")
    table.add_row("Improvement", f"{result.improvement:+.2f}")
    table.add_row("Iterations", str(result.iterations_used))
    table.add_row("LLM Calls", str(result.total_llm_calls))
    console.print(table)


def _save_result(
    persistence: YamlPersistence,
    path: str,
    result: OptimizationResult,
) -> None:
    """Save the result as YAML."""
    persistence.write_result(path, asdict(result))


if __name__ == "__main__":
    app()
```
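The CLI fills missing keys with `raw_config.get(key, default)`. That approach silently ignores misspelled keys: a `max_iteration: 10` line would leave the default of 30 in place. Below is a table-driven alternative. It uses a hypothetical `DEFAULTS` mapping that mirrors the defaults hardcoded above, merges the file over those defaults, and reports any keys it does not recognize.

```python
from typing import Any

# Hypothetical constant mirroring the defaults hardcoded in optimize()
DEFAULTS: dict[str, Any] = {
    "task_model": "openai/gpt-4o-mini",
    "judge_model": "openai/gpt-4o",
    "proposer_model": "openai/gpt-4o",
    "synth_model": "openai/gpt-4o",
    "max_iterations": 30,
    "n_synthetic_inputs": 20,
    "minibatch_size": 5,
    "seed": 42,
}
REQUIRED = ("seed_prompt", "task_description")


def merge_with_defaults(raw: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Overlay the file's known keys on DEFAULTS; also return the unrecognized keys."""
    known = set(DEFAULTS) | set(REQUIRED)
    unknown = sorted(key for key in raw if key not in known)
    merged = {**DEFAULTS, **{k: v for k, v in raw.items() if k in known}}
    return merged, unknown


merged, unknown = merge_with_defaults(
    {"seed_prompt": "p", "task_description": "t", "max_iteration": 10}
)
```

The CLI could print `unknown` as a warning and then call `OptimizationConfig(**merged, output_path=output, verbose=verbose)`, since `merged` holds exactly the keyword names the CLI already passes. A missing required key would then surface as a `TypeError` from the constructor.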

---

## 7. Core Algorithm

### Detailed Flow Diagram

```mermaid
flowchart TB
    START(["prometheus optimize<br/>-i config.yaml<br/>-o result.yaml"]) --> LOAD
    LOAD["Load config.yaml"] --> INIT_DSPY["Configure DSPy LMs"]
    INIT_DSPY --> BOOTSTRAP

    subgraph BOOTSTRAP["Phase 0: Bootstrap"]
        direction TB
        B1["DSPySyntheticAdapter<br/>.generate_inputs()"] --> B2["SyntheticInputGenerator<br/>dspy.ChainOfThought<br/>(GenerateSyntheticInputs)"]
        B2 --> B3["Synthetic input pool<br/>[input₁, input₂, ..., input₂₀]"]
    end

    B3 --> LOOP_START

    subgraph LOOP["Phase 1: Evolution Loop (×30)"]
        direction TB
        LOOP_START --> SELECT["Take the current best candidate"]
        SELECT --> SAMPLE["Bootstrap.sample_minibatch()<br/>5 random inputs"]
        SAMPLE --> EXEC

        subgraph EXEC["Evaluate Current"]
            direction TB
            E1["DSPyLLMAdapter.execute()<br/>→ 5 outputs"] --> E2["DSPyJudgeAdapter.judge_batch()<br/>→ 5 × (score, feedback)"]
            E2 --> E3["Build Trajectories<br/>(input, output, score, feedback)"]
        end

        E3 --> CHECK_PERFECT{"All scores ≥ 1.0?"}
        CHECK_PERFECT -->|"Yes: skip proposal"| NEXT_ITER["Next iteration"]
        CHECK_PERFECT -->|No| PROPOSE

        subgraph PROPOSE["Reflective Mutation"]
            direction TB
            P1["DSPyProposerAdapter.propose()"] --> P2["Format failure report<br/>from the Trajectories"]
            P2 --> P3["InstructionProposer<br/>dspy.ChainOfThought<br/>(ProposeInstruction)"]
            P3 --> P4["new_prompt"]
        end

        PROPOSE --> EVAL_NEW

        subgraph EVAL_NEW["Evaluate New (same minibatch)"]
            direction TB
            EN1["DSPyLLMAdapter.execute()<br/>→ 5 outputs"] --> EN2["DSPyJudgeAdapter.judge_batch()<br/>→ 5 × (score, feedback)"]
        end

        EVAL_NEW --> ACCEPT{"new_score > old_score?"}
        ACCEPT -->|Yes| UPDATE["best_candidate = new prompt"]
        ACCEPT -->|No| NEXT_ITER
        UPDATE --> NEXT_ITER
    end

    NEXT_ITER --> MORE{"iterations < max?"}
    MORE -->|Yes| SELECT
    MORE -->|No| SAVE["Save result.yaml"]
    SAVE --> DONE(["✅ Done"])

    style BOOTSTRAP fill:#0f3460,stroke:#00d2ff,color:#fff
    style LOOP fill:#1a1a2e,stroke:#e94560,color:#fff
    style EXEC fill:#16213e,stroke:#00d2ff,color:#fff
    style PROPOSE fill:#16213e,stroke:#e94560,color:#fff
    style EVAL_NEW fill:#16213e,stroke:#00d2ff,color:#fff
```
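The loop in the diagram can be sketched in plain Python. This is a simplified, self-contained illustration, not the `EvolutionLoop` implementation. `evaluate` and `propose` are hypothetical callables standing in for `PromptEvaluator.evaluate` and `ProposerPort.propose`. The minibatch score is assumed to be the sum of the per-example scores. Both prompts are scored on the same minibatch, so the comparison is fair.

```python
import random
from typing import Callable

# Hypothetical stand-ins: evaluate(prompt, batch) -> scores in [0, 1]
Evaluate = Callable[[str, list[str]], list[float]]
# propose(prompt, batch, scores) -> new prompt text
Propose = Callable[[str, list[str], list[float]], str]


def evolve(
    seed_prompt: str,
    pool: list[str],
    evaluate: Evaluate,
    propose: Propose,
    max_iterations: int = 30,
    minibatch_size: int = 5,
    seed: int = 42,
) -> tuple[str, list[tuple[int, str]]]:
    rng = random.Random(seed)  # seeded so minibatch sampling is reproducible
    best = seed_prompt
    history: list[tuple[int, str]] = []
    for it in range(1, max_iterations + 1):
        batch = rng.sample(pool, minibatch_size)      # random minibatch, no repeats
        old_scores = evaluate(best, batch)            # re-score current best on this batch
        if all(s >= 1.0 for s in old_scores):         # nothing to fix: no proposal
            history.append((it, "skipped"))
            continue
        candidate = propose(best, batch, old_scores)  # reflective mutation
        new_scores = evaluate(candidate, batch)       # same batch: fair comparison
        if sum(new_scores) > sum(old_scores):         # accept strict improvements only
            best = candidate
            history.append((it, "accepted"))
        else:
            history.append((it, "rejected"))
    return best, history


def toy_eval(prompt: str, batch: list[str]) -> list[float]:
    """Toy judge for illustration: each '!' in the prompt adds 1/3, capped at 1.0."""
    return [min(1.0, prompt.count("!") / 3)] * len(batch)


pool = [f"question {i}" for i in range(20)]
best, history = evolve("Answer", pool, toy_eval, lambda p, b, s: p + "!", max_iterations=5)
```

Re-scoring the current best on every fresh minibatch costs `2 × minibatch_size` extra calls per iteration. The benefit is that scores measured on different inputs are never compared.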

### Detailed LLM Budget per Iteration

Typical iteration (`minibatch_size=5`):

| Operation | Calls |
|-----------|------:|
| Execute current (`task_lm`) | 5 |
| Judge current (`judge_lm`) | 5 |
| Propose new (`proposer_lm`) | 1 |
| Execute new (`task_lm`) | 5 |
| Judge new (`judge_lm`) | 5 |
| **Total per iteration** | **21** |
| Bootstrap | 1 |
| 30 iterations × 21 | 630 |
| **Total MVP** | **~631** |

With the default parameters, this estimate (~631 calls) exceeds objective O3 (< 500 LLM calls per run). The estimate assumes no iteration is skipped. An iteration skipped because every minibatch score is perfect costs only the 10 calls needed to execute and judge the current prompt.
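The arithmetic behind the table generalizes to `1 + max_iterations × (4 × minibatch_size + 1)`. The small helpers below (hypothetical names) make it easy to size a run against objective O3.

```python
def llm_calls(max_iterations: int, minibatch_size: int, bootstrap_calls: int = 1) -> int:
    """Call count from the budget table, assuming no iteration is skipped."""
    # Execute + judge the current prompt, then the proposed one, plus 1 proposal call
    per_iteration = 4 * minibatch_size + 1
    return bootstrap_calls + max_iterations * per_iteration


def max_iterations_within(budget: int, minibatch_size: int, bootstrap_calls: int = 1) -> int:
    """Largest max_iterations whose call count stays strictly below `budget`."""
    per_iteration = 4 * minibatch_size + 1
    return (budget - 1 - bootstrap_calls) // per_iteration


print(llm_calls(30, 5))               # 631 with the config.yaml defaults
print(max_iterations_within(500, 5))  # 23 iterations (484 calls) meet O3
```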

---

## 8. I/O File Formats

### 8.1 Input: `config.yaml`

```yaml
# PROMETHEUS Configuration File
# ==================================

# The initial prompt to optimize
seed_prompt: |
  You are an expert assistant in contract analysis.
  Analyze the provided text and identify potentially unfair clauses.
  Be precise and quote the relevant passages.

# Task description (used to generate the synthetic inputs,
# and also passed to the judge and the proposer)
task_description: |
  Legal analysis of contracts to identify unfair clauses.
  The assistant must examine a contract text and flag
  any clause that could be considered unfair under
  French consumer law.

# LLM models (DSPy/LiteLLM "provider/model" format)
task_model: "openai/gpt-4o-mini"
judge_model: "openai/gpt-4o"
proposer_model: "openai/gpt-4o"
synth_model: "openai/gpt-4o"

# Evolution parameters
max_iterations: 30
n_synthetic_inputs: 20
minibatch_size: 5
seed: 42
```

### 8.2 Output: `result.yaml`

```yaml
# PROMETHEUS Optimization Result
# ================================

optimized_prompt: |
  You are a legal analyst specializing in French consumer law.
  For each contract you analyze, apply this methodology:
  1. Identify every clause that restricts the consumer
  2. Compare each clause against the unfairness criteria of Article L.212-1
  3. Flag unfair clauses with: the exact text, the reason it is unfair,
     and the associated legal risk
  Be thorough and always quote the relevant passages.

initial_prompt: |
  You are an expert assistant in contract analysis.
  Analyze the provided text and identify potentially unfair clauses.
  Be precise and quote the relevant passages.

initial_score: 6.8
final_score: 8.9
improvement: 2.1
iterations_used: 30
total_llm_calls: 631

history:
  - iteration: 1
    event: "accepted"
    old_score: 1.2
    new_score: 1.8
    improvement: 0.6
  - iteration: 2
    event: "rejected"
    old_score: 1.8
    new_score: 1.5
  # ... etc
```

---

## 9. Configuration & Environment

### 9.1 Environment Variables

```bash
# Required (when using OpenAI models)
export OPENAI_API_KEY="sk-..."

# Optional (when using other providers)
export ANTHROPIC_API_KEY="..."
export TOGETHER_API_KEY="..."

# Optional
export PROMETHEUS_LOG_LEVEL="INFO"  # DEBUG for detailed traces
```
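The CLI in section 6.2 does not read `PROMETHEUS_LOG_LEVEL`. Below is a minimal sketch with the standard `logging` module. It assumes a hypothetical `configure_logging()` called at CLI start-up, and it falls back to INFO when the value is not a known level name.

```python
import logging
import os


def configure_logging() -> int:
    """Set the root log level from PROMETHEUS_LOG_LEVEL (default INFO) and return it."""
    name = os.environ.get("PROMETHEUS_LOG_LEVEL", "INFO").strip().upper()
    # Given a level name, getLevelName returns its number; unknown names give a string
    level = logging.getLevelName(name)
    if not isinstance(level, int):
        level = logging.INFO  # unrecognized value: fall back to INFO
    logging.basicConfig(level=level, format="%(levelname)s %(name)s: %(message)s")
    # basicConfig is a no-op when the root logger already has handlers
    logging.getLogger().setLevel(level)
    return level
```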

### 9.2 Installation and Usage

```bash
# Installation
git clone <repo>
cd prometheus
uv sync

# Run
uv run prometheus optimize -i config.yaml -o result.yaml -v

# With options
uv run prometheus optimize \
  -i examples/legal_contract.yaml \
  -o results/legal_optimized.yaml \
  --verbose
```

---

## 10. Testing Strategy

### 10.1 Test Pyramid

```
┌──────────────┐
│  E2E Test    │  test_full_pipeline.py
│  (1-2 tests) │  → mocked LLM, checks the full flow
├──────────────┤
│ Integration  │  test_dspy_adapters.py
│  (3-5 tests) │  → real DSPy signatures, mocked LM
├──────────────┤
│    Unit      │  test_entities.py
│ (10+ tests)  │  test_scoring.py
│              │  test_evolution.py (with mocks)
└──────────────┘
```

### 10.2 `tests/conftest.py`

```python
"""Shared test fixtures."""

from unittest.mock import MagicMock

import pytest

from prometheus.domain.entities import (
    EvalResult,
    Prompt,
    SyntheticExample,
    Trajectory,
)

# Shared by mock_eval_result so scores and feedbacks stay aligned
SCORES = [0.3, 0.5, 0.4, 0.6, 0.2]
FEEDBACKS = [
    "Incomplete answer",
    "Missing key detail",
    "Wrong format",
    "Partially correct",
    "Completely off topic",
]


@pytest.fixture
def seed_prompt():
    return Prompt(text="You are a helpful assistant. Answer the question.")


@pytest.fixture
def task_description():
    return "Answer factual questions accurately and concisely."


@pytest.fixture
def synthetic_pool():
    return [
        SyntheticExample(input_text=f"Test input {i}", id=i)
        for i in range(20)
    ]


@pytest.fixture
def mock_eval_result():
    return EvalResult(
        scores=list(SCORES),
        feedbacks=list(FEEDBACKS),
        trajectories=[
            Trajectory(
                input_text=f"Input {i}",
                output_text=f"Output {i}",
                score=s,
                feedback=f,
                prompt_used="test prompt",
            )
            for i, (s, f) in enumerate(zip(SCORES, FEEDBACKS))
        ],
    )


@pytest.fixture
def mock_llm_port():
    """Mock LLMPort that returns canned responses."""
    port = MagicMock()
    port.execute.return_value = "This is a mock response."
    return port


@pytest.fixture
def mock_judge_port():
    """Mock JudgePort that returns moderate scores."""
    port = MagicMock()
    port.judge_batch.return_value = [
        (0.5, "Moderate quality, needs improvement."),
    ] * 5
    return port


@pytest.fixture
def mock_proposer_port():
    """Mock ProposerPort that returns a slightly modified prompt."""
    port = MagicMock()
    port.propose.return_value = Prompt(
        text="You are a very helpful assistant. Answer the question precisely."
    )
    return port
```

### 10.3 `tests/unit/test_evolution.py`

```python
"""Unit tests for the evolution loop, with full mocking."""

from unittest.mock import MagicMock, patch

from prometheus.application.bootstrap import SyntheticBootstrap
from prometheus.application.evaluator import PromptEvaluator
from prometheus.application.evolution import EvolutionLoop
from prometheus.domain.entities import EvalResult, Trajectory


class TestEvolutionLoop:
    """Tests the accept/reject logic of the evolution loop."""

    def test_accepts_improvement(self, seed_prompt, synthetic_pool, task_description,
                                 mock_llm_port, mock_judge_port, mock_proposer_port):
        """
        Scenario: the new prompt improves the score.
        Expected: the best candidate is updated.
        """
        evaluator = PromptEvaluator(mock_llm_port, mock_judge_port)
        bootstrap = MagicMock(spec=SyntheticBootstrap)
        bootstrap.sample_minibatch.return_value = synthetic_pool[:5]

        # Old eval = low scores, new eval = high scores
        old_eval = EvalResult(
            scores=[0.3, 0.4, 0.3, 0.5, 0.2],
            feedbacks=["bad"] * 5,
            trajectories=[
                Trajectory(f"input{i}", f"output{i}", s, "bad", "prompt")
                for i, s in enumerate([0.3, 0.4, 0.3, 0.5, 0.2])
            ],
        )
        new_eval = EvalResult(
            scores=[0.8, 0.9, 0.7, 0.8, 0.9],
            feedbacks=["good"] * 5,
            trajectories=[],
        )
        # evaluator.evaluate is called twice per iteration (current + proposed)
        evaluator.evaluate = MagicMock(side_effect=[old_eval, new_eval])

        loop = EvolutionLoop(
            evaluator=evaluator,
            proposer=mock_proposer_port,
            bootstrap=bootstrap,
            max_iterations=1,
            minibatch_size=5,
        )
        with patch.object(loop, "_log"):
            state = loop.run(seed_prompt, synthetic_pool, task_description)

        # The proposed prompt must have replaced the seed prompt
        proposed = mock_proposer_port.propose.return_value
        assert state.best_candidate.prompt.text == proposed.text
        assert state.best_candidate.best_score > 0

    def test_rejects_regression(self, seed_prompt, synthetic_pool, task_description,
                                mock_llm_port, mock_judge_port, mock_proposer_port):
        """
        Scenario: the new prompt lowers the score.
        Expected: the best candidate is unchanged.
        """
        evaluator = PromptEvaluator(mock_llm_port, mock_judge_port)
        bootstrap = MagicMock(spec=SyntheticBootstrap)
        bootstrap.sample_minibatch.return_value = synthetic_pool[:5]

        old_eval = EvalResult(
            scores=[0.7, 0.8, 0.7, 0.8, 0.9],
            feedbacks=["ok"] * 5,
            trajectories=[
                Trajectory(f"input{i}", f"output{i}", s, "ok", "prompt")
                for i, s in enumerate([0.7, 0.8, 0.7, 0.8, 0.9])
            ],
        )
        new_eval = EvalResult(
            scores=[0.2, 0.1, 0.3, 0.2, 0.1],
            feedbacks=["bad"] * 5,
            trajectories=[],
        )
        evaluator.evaluate = MagicMock(side_effect=[old_eval, new_eval])

        loop = EvolutionLoop(
            evaluator=evaluator,
            proposer=mock_proposer_port,
            bootstrap=bootstrap,
            max_iterations=1,
            minibatch_size=5,
        )
        with patch.object(loop, "_log"):
            state = loop.run(seed_prompt, synthetic_pool, task_description)

        # The seed prompt should still be the best
        assert state.best_candidate.prompt.text == seed_prompt.text

    def test_skips_perfect_scores(self, seed_prompt, synthetic_pool, task_description,
                                  mock_llm_port, mock_judge_port, mock_proposer_port):
        """
        Scenario: all scores are perfect.
        Expected: no proposal; the loop moves on to the next iteration.
        """
        evaluator = PromptEvaluator(mock_llm_port, mock_judge_port)
        bootstrap = MagicMock(spec=SyntheticBootstrap)
        bootstrap.sample_minibatch.return_value = synthetic_pool[:5]

        perfect_eval = EvalResult(
            scores=[1.0, 1.0, 1.0, 1.0, 1.0],
            feedbacks=["perfect"] * 5,
            trajectories=[
                Trajectory(f"input{i}", f"output{i}", 1.0, "perfect", "prompt")
                for i in range(5)
            ],
        )
        evaluator.evaluate = MagicMock(return_value=perfect_eval)

        loop = EvolutionLoop(
            evaluator=evaluator,
            proposer=mock_proposer_port,
            bootstrap=bootstrap,
            max_iterations=3,
            minibatch_size=5,
        )
        with patch.object(loop, "_log"):
            loop.run(seed_prompt, synthetic_pool, task_description)

        # The proposer must never have been called
        mock_proposer_port.propose.assert_not_called()
```

---

## 11. Architecture Diagrams

### 11.1 Hexagonal Architecture: Component View

```mermaid
flowchart TB
    subgraph PRESENTATION["🎯 PRESENTATION (CLI)"]
        CLI["typer CLI<br/>prometheus/cli/app.py"]
    end

    subgraph APPLICATION["⚙️ APPLICATION (Use Cases)"]
        UC["OptimizePromptUseCase"]
        BOOT["SyntheticBootstrap"]
        EVAL["PromptEvaluator"]
        EVO["EvolutionLoop"]
    end

    subgraph DOMAIN["💎 DOMAIN (Entities + Ports)"]
        ENT["Prompt<br/>SyntheticExample<br/>Trajectory<br/>EvalResult<br/>Candidate<br/>OptimizationState"]
        PORTS["LLMPort<br/>JudgePort<br/>ProposerPort<br/>SyntheticGeneratorPort<br/>PersistencePort"]
        SCORE["scoring.py"]
    end

    subgraph INFRA["🔧 INFRASTRUCTURE (DSPy)"]
        DSPY_SIG["dspy_signatures.py<br/>GenerateSyntheticInputs<br/>JudgeOutput<br/>ProposeInstruction"]
        DSPY_MOD["dspy_modules.py<br/>SyntheticInputGenerator<br/>OutputJudge<br/>InstructionProposer"]
        ADAPTERS["DSPyLLMAdapter<br/>DSPyJudgeAdapter<br/>DSPyProposerAdapter<br/>DSPySyntheticAdapter"]
        FILE_IO["YamlPersistence"]
    end

    CLI -->|"OptimizationConfig"| UC
    UC --> BOOT
    UC --> EVO
    EVO --> EVAL
    EVO -->|"ProposerPort"| ADAPTERS
    BOOT -->|"SyntheticGeneratorPort"| ADAPTERS
    EVAL -->|"LLMPort"| ADAPTERS
    EVAL -->|"JudgePort"| ADAPTERS
    ADAPTERS --> DSPY_MOD
    DSPY_MOD --> DSPY_SIG
    CLI -->|"PersistencePort"| FILE_IO

    UC -.->|"depends on"| ENT
    UC -.->|"depends on"| PORTS
    EVO -.->|"depends on"| ENT
    EVO -.->|"depends on"| SCORE
    EVAL -.->|"depends on"| ENT
    ADAPTERS -.->|"implements"| PORTS
    FILE_IO -.->|"implements"| PORTS

    style PRESENTATION fill:#1a1a2e,stroke:#00d2ff,color:#fff
    style APPLICATION fill:#0f3460,stroke:#00d2ff,color:#fff
    style DOMAIN fill:#16213e,stroke:#e94560,color:#fff
    style INFRA fill:#1a1a2e,stroke:#e94560,color:#fff
```

### 11.2 Dependency Rule

```mermaid
flowchart LR
    CLI["CLI"] --> APP["Application"]
    APP --> DOMAIN["Domain"]
    INFRA["Infrastructure"] --> DOMAIN
    CLI --> INFRA

    style DOMAIN fill:#e94560,color:#fff
    style APP fill:#0f3460,color:#fff
    style INFRA fill:#1a1a2e,color:#fff
    style CLI fill:#16213e,color:#fff
```

> **Rule**: arrows NEVER point from the Domain outward.
> The Domain knows nothing about DSPy, Typer, or YAML.
> The CLI is the composition root: it is the only layer that instantiates the Infrastructure adapters and injects them into the Application layer, hence its arrow to Infrastructure.
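
This rule can be enforced mechanically. The sketch below is an architecture check that uses only the standard library's `ast` module. It parses a module's source and reports imports of frameworks the Domain must not know about. The `FORBIDDEN` set and the function name are assumptions.

```python
import ast

# Top-level packages the Domain layer must never import (assumed list)
FORBIDDEN = {"dspy", "typer", "yaml", "rich"}


def forbidden_imports(source: str) -> set[str]:
    """Return the forbidden top-level packages imported by `source`."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]  # absolute "from x.y import z"; relative imports are skipped
        else:
            continue
        found.update(n.split(".")[0] for n in names if n.split(".")[0] in FORBIDDEN)
    return found
```

A pytest test could loop over `Path("prometheus/domain").rglob("*.py")` and assert that `forbidden_imports(path.read_text())` is empty for every file.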

### 11.3 Sequence Diagram: Full Run

```mermaid
sequenceDiagram
    participant User
    participant CLI as CLI (Typer)
    participant UC as OptimizePromptUseCase
    participant BOOT as SyntheticBootstrap
    participant SYNTH as DSPySyntheticAdapter
    participant LOOP as EvolutionLoop
    participant EVAL as PromptEvaluator
    participant LLM as DSPyLLMAdapter
    participant JUDGE as DSPyJudgeAdapter
    participant PROP as DSPyProposerAdapter
    participant FS as YamlPersistence

    User->>CLI: prometheus optimize -i in.yaml -o out.yaml
    CLI->>FS: read_config("in.yaml")
    FS-->>CLI: raw_config dict
    CLI->>UC: execute(config)

    Note over UC,SYNTH: Phase 0: Bootstrap
    UC->>BOOT: run(task_desc, 20)
    BOOT->>SYNTH: generate_inputs(task_desc, 20)
    SYNTH->>SYNTH: dspy.ChainOfThought(GenerateSyntheticInputs)
    SYNTH-->>BOOT: [20 SyntheticExample]
    BOOT-->>UC: synthetic_pool

    Note over UC,PROP: Phase 1: Evolution
    UC->>LOOP: run(seed_prompt, pool, task_desc)
    loop 30 iterations
        LOOP->>BOOT: sample_minibatch(pool, 5)
        BOOT-->>LOOP: [5 examples]
        LOOP->>EVAL: evaluate(current_prompt, batch)
        EVAL->>LLM: execute(prompt, input) ×5
        LLM-->>EVAL: 5 outputs
        EVAL->>JUDGE: judge_batch(task_desc, pairs)
        JUDGE->>JUDGE: dspy.ChainOfThought(JudgeOutput) ×5
        JUDGE-->>EVAL: [(score, feedback) ×5]
        EVAL-->>LOOP: EvalResult
        LOOP->>PROP: propose(prompt, trajectories, task_desc)
        PROP->>PROP: dspy.ChainOfThought(ProposeInstruction)
        PROP-->>LOOP: new Prompt
        LOOP->>EVAL: evaluate(new_prompt, batch)
        EVAL->>LLM: execute(new_prompt, input) ×5
        LLM-->>EVAL: 5 outputs
        EVAL->>JUDGE: judge_batch(task_desc, pairs)
        JUDGE-->>EVAL: [(score, feedback) ×5]
        EVAL-->>LOOP: new EvalResult
        alt new_score > old_score
            LOOP->>LOOP: best = new_prompt
        end
    end
    LOOP-->>UC: OptimizationState
    UC-->>CLI: OptimizationResult
    CLI->>FS: write_result("out.yaml", result)
    CLI-->>User: ✅ Optimized prompt + metrics
```

### 11.4 Data Flow Diagram

```mermaid
flowchart LR
    subgraph INPUT
        YAML["config.yaml"]
    end

    subgraph GENERATION
        SYNTH["Synthetic Pool<br/>20 inputs"]
    end

    subgraph EVAL["Evaluation Pipeline"]
        EXEC["Execute<br/>(task_lm)"]
        JUDGE["Judge<br/>(judge_lm)"]
    end

    subgraph PROPOSAL
        PROP["Propose<br/>(proposer_lm)"]
    end

    subgraph OUTPUT
        RESULT["result.yaml"]
    end

    YAML --> SYNTH
    SYNTH -->|"minibatch 5"| EXEC
    EXEC -->|"outputs"| JUDGE
    JUDGE -->|"scores + feedbacks"| PROP
    PROP -->|"new_prompt"| EXEC
    JUDGE -->|"scores"| RESULT
    PROP --> RESULT

    style INPUT fill:#1a1a2e,stroke:#00d2ff,color:#fff
    style GENERATION fill:#0f3460,stroke:#00d2ff,color:#fff
    style EVAL fill:#16213e,stroke:#e94560,color:#fff
    style PROPOSAL fill:#1a1a2e,stroke:#e94560,color:#fff
    style OUTPUT fill:#0f3460,stroke:#e94560,color:#fff
```

---

## Section Summary

| Section | Purpose | Key files |
|---------|---------|-----------|
| **Domain** | Pure business core, zero dependencies | `entities.py`, `ports.py`, `scoring.py` |
| **Application** | Business orchestration through the ports | `use_cases.py`, `bootstrap.py`, `evaluator.py`, `evolution.py` |
| **Infrastructure** | DSPy implementation of the ports | `dspy_signatures.py`, `dspy_modules.py`, `*_adapter.py` |
| **CLI** | Typer user interface | `cli/app.py` |
| **I/O** | YAML config in, YAML result out | `file_io.py` |
| **Tests** | Unit → integration → e2e pyramid | `tests/` |

**The flow**: `config.yaml` → CLI → UseCase → Bootstrap (synthetic inputs) → EvolutionLoop (evaluate × propose × accept) × N → `result.yaml`