PROMETHEUS MVP: Detailed Technical Specification
Version: 0.1.0
Stack: Python 3.12+ · uv · DSPy · Typer
Architecture: Clean Architecture (hexagonal)
Date: 2025
Table of Contents
- Overview & Objectives
- Project Structure
- Domain Layer
- Application Layer
- Infrastructure Layer
- Presentation Layer (CLI)
- Core Algorithm: Detailed Pseudo-Code
- I/O File Format
- Configuration & Environment
- Tests
- Complete Architecture Diagrams
1. Overview & Objectives
1.1 Problem Statement
Prompt-optimization frameworks (GEPA, TextGrad, Promptolution) all require a labeled dataset to compute a quality signal. PROMETHEUS removes this dependency by synthesizing its own test inputs and using an LLM-as-Judge as the evaluation function.
1.2 MVP Objectives
| # | Objective | Acceptance criterion |
|---|---|---|
| O1 | Optimize a prompt without any labeled data | Seed prompt → improved prompt, 0 data files required |
| O2 | Simple CLI | prometheus optimize -i config.yaml -o result.yaml |
| O3 | Controlled budget | < 500 LLM calls for a complete run |
| O4 | Reproducible | Deterministic seed; identical results given the same seed and the same model |
| O5 | Observable | Structured logging, per-iteration metrics |
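As a rough check on objective O3, the call count can be estimated from the run parameters. The sketch below assumes the accounting applied by the evolution loop in section 4.4: one task-model call per minibatch example plus one judge call per evaluation, two evaluations and one proposal per iteration, one initial evaluation, and one bootstrap call. The function name `estimate_llm_calls` is illustrative and not part of the codebase.

```python
def estimate_llm_calls(max_iterations: int = 30, minibatch_size: int = 5) -> int:
    """Upper-bound estimate of LLM calls for one run (assumed accounting)."""
    per_evaluation = minibatch_size + 1      # executions + 1 batched judge call
    bootstrap = 1                            # single synthetic-input generation call
    initial = per_evaluation                 # scoring the seed prompt
    per_iteration = 2 * per_evaluation + 1   # current eval + proposal + candidate eval
    return bootstrap + initial + max_iterations * per_iteration


if __name__ == "__main__":
    print(estimate_llm_calls())
```

With the defaults (30 iterations, minibatch of 5) this gives 1 + 6 + 30 × 13 = 397 calls, under the 500-call budget. If the judge were called once per pair rather than once per batch, each evaluation would cost 2 × minibatch_size calls and the same defaults would come to 1 + 10 + 30 × 21 = 641, so the budget depends on how the judge adapter batches its work.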
1.3 Nominal Flow
┌──────────────┐ ┌───────────────┐ ┌──────────────────┐ ┌────────────┐
│ Fichier │ │ │ │ │ │ Fichier │
│ config.yaml ├───► │ Bootstrap ├───► │ Evolution Loop ├───► │ output │
│ (seed prompt│ │ (synth inputs│ │ (judge + mutate │ │ (optimized│
│ + params) │ │ generation) │ │ + accept) │ │ prompt) │
└──────────────┘ └───────────────┘ └──────────────────┘ └────────────┘
2. Project Structure
prometheus/
├── pyproject.toml                      # uv project config
├── README.md
├── specs/
│   └── technical-spec.md               # this file
│
├── src/
│   └── prometheus/
│       ├── __init__.py
│       ├── cli/                        # PRESENTATION LAYER
│       │   ├── __init__.py
│       │   └── app.py                  # Typer CLI app
│       │
│       ├── domain/                     # DOMAIN LAYER (zero dependencies)
│       │   ├── __init__.py
│       │   ├── entities.py             # Dataclasses: Prompt, Candidate, EvalResult, SyntheticExample
│       │   ├── ports.py                # Abstract interfaces (ABC classes)
│       │   └── scoring.py              # Score combination logic, acceptance criteria
│       │
│       ├── application/                # APPLICATION LAYER (depends on domain only)
│       │   ├── __init__.py
│       │   ├── use_cases.py            # OptimizePromptUseCase
│       │   ├── bootstrap.py            # SyntheticBootstrap
│       │   ├── evolution.py            # EvolutionLoop (reflective mutation loop)
│       │   ├── evaluator.py            # PromptEvaluator (execution + judge)
│       │   └── dto.py                  # Config & Result dataclasses
│       │
│       ├── infrastructure/             # INFRASTRUCTURE LAYER (depends on domain + application)
│       │   ├── __init__.py
│       │   ├── dspy_signatures.py      # DSPy Signature definitions
│       │   ├── dspy_modules.py         # DSPy Module implementations
│       │   ├── llm_adapter.py          # DSPyLLMAdapter (implements domain port)
│       │   ├── judge_adapter.py        # DSPyJudgeAdapter (implements domain port)
│       │   ├── proposer_adapter.py     # DSPyProposerAdapter (implements domain port)
│       │   ├── synth_adapter.py        # DSPySyntheticAdapter (implements domain port)
│       │   └── file_io.py              # YamlPersistence (YAML read/write)
│       │
│       └── config.py                   # Settings (pydantic-settings)
│
├── tests/
│   ├── unit/
│   │   ├── test_entities.py
│   │   ├── test_scoring.py
│   │   ├── test_evolution.py
│   │   └── test_bootstrap.py
│   ├── integration/
│   │   ├── test_dspy_adapters.py
│   │   └── test_full_pipeline.py
│   └── conftest.py
│
└── examples/
    ├── basic_usage.py
    └── sample_config.yaml
2.1 pyproject.toml
[project]
name = "prometheus"
version = "0.1.0"
description = "Prompt evolution without reference data"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
"dspy>=2.6",
"typer>=0.15",
"pydantic>=2.10",
"pydantic-settings>=2.7",
"pyyaml>=6.0",
"rich>=13.9",
]
[project.optional-dependencies]
dev = [
"pytest>=8.3",
"pytest-cov>=6.0",
"ruff>=0.9",
"mypy>=1.14",
]
[project.scripts]
prometheus = "prometheus.cli.app:app"
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[tool.ruff]
line-length = 100
target-version = "py312"
[tool.mypy]
python_version = "3.12"
strict = true
3. Domain Layer: Entities & Ports
Objective
Define the business core with no external dependencies.
No imports of dspy, pydantic, or anything else outside the standard library.
3.1 entities.py
"""Domain entities: pure data, zero dependencies."""
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Prompt:
    """
    Represents a candidate prompt.
    frozen=True makes it immutable, which is safe for Pareto tracking.
    """
    text: str
    metadata: dict[str, Any] = field(default_factory=dict)

    def __len__(self) -> int:
        return len(self.text)


@dataclass(frozen=True)
class SyntheticExample:
    """
    A synthetic example: an input generated from the task description.
    There is no expected output; the judge evaluates the output directly.
    """
    input_text: str
    category: str = "default"  # for future stratified sampling
    id: int = 0


@dataclass
class Trajectory:
    """
    Execution trace of a prompt on one input.
    Used by the reflective mutation to understand failures.
    """
    input_text: str
    output_text: str
    score: float
    feedback: str  # textual feedback from the judge
    prompt_used: str


@dataclass
class EvalResult:
    """Result of an evaluation on a minibatch."""
    scores: list[float]
    feedbacks: list[str]
    trajectories: list[Trajectory]

    @property
    def total_score(self) -> float:
        return sum(self.scores)

    @property
    def mean_score(self) -> float:
        return sum(self.scores) / len(self.scores) if self.scores else 0.0


@dataclass
class Candidate:
    """
    A candidate in the evolution pool.
    Holds the prompt and its best score.
    """
    prompt: Prompt
    best_score: float = 0.0
    generation: int = 0  # iteration at which it was created
    parent_id: int | None = None


@dataclass
class OptimizationState:
    """Complete optimization state: a serializable snapshot."""
    iteration: int = 0
    best_candidate: Candidate | None = None
    candidates: list[Candidate] = field(default_factory=list)
    synthetic_pool: list[SyntheticExample] = field(default_factory=list)
    history: list[dict[str, Any]] = field(default_factory=list)
    total_llm_calls: int = 0
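A minimal sketch of how these entities behave, using trimmed-down copies of `Prompt` and `EvalResult` (the `feedbacks` and `trajectories` fields are omitted here, which is a simplification made for this example only). It shows that a frozen dataclass rejects attribute assignment and how the two score properties relate.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass, field
from typing import Any


@dataclass(frozen=True)
class Prompt:
    text: str
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class EvalResult:
    scores: list[float]

    @property
    def total_score(self) -> float:
        return sum(self.scores)

    @property
    def mean_score(self) -> float:
        return sum(self.scores) / len(self.scores) if self.scores else 0.0


prompt = Prompt(text="Summarize the text in one sentence.")
try:
    prompt.text = "changed"  # type: ignore[misc]
    mutated = True
except FrozenInstanceError:
    mutated = False  # frozen=True blocks attribute assignment

result = EvalResult(scores=[0.5, 1.0, 0.0])
empty = EvalResult(scores=[])  # mean_score falls back to 0.0 instead of dividing by zero
```

Note that `frozen=True` prevents reassigning fields but does not freeze the `metadata` dict itself, and a `Prompt` that holds a dict is not hashable, so it cannot be used directly as a dict key or set member.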
3.2 ports.py
"""
Domain ports: abstract interfaces that the infrastructure implements.
Defined as abstract base classes (ABC), so each adapter subclasses its port explicitly.
"""
from __future__ import annotations

from abc import ABC, abstractmethod

from prometheus.domain.entities import (
    Prompt, SyntheticExample, Trajectory, EvalResult
)


class LLMPort(ABC):
    """
    Port for executing a prompt on an input.
    The infrastructure provides an implementation via DSPy.
    """

    @abstractmethod
    def execute(self, prompt: Prompt, input_text: str) -> str:
        """Run the prompt on the input and return the raw response."""
        ...


class JudgePort(ABC):
    """
    Port for LLM-as-Judge evaluation.
    Takes (input, output) pairs plus the task description.
    Returns a score and textual feedback for each pair.
    """

    @abstractmethod
    def judge_batch(
        self,
        task_description: str,
        pairs: list[tuple[str, str]],
    ) -> list[tuple[float, str]]:
        """
        Evaluate a batch of (input, output) pairs.
        Returns a list of (score, feedback) tuples, in the same order.
        """
        ...


class ProposerPort(ABC):
    """
    Port for proposing a new prompt.
    Uses the evaluation trajectories to propose an improvement.
    """

    @abstractmethod
    def propose(
        self,
        current_prompt: Prompt,
        trajectories: list[Trajectory],
        task_description: str,
    ) -> Prompt:
        """Propose a new prompt based on the failure trajectories."""
        ...


class SyntheticGeneratorPort(ABC):
    """
    Port for generating synthetic inputs.
    """

    @abstractmethod
    def generate_inputs(
        self,
        task_description: str,
        n_examples: int,
    ) -> list[SyntheticExample]:
        """Generate N diverse synthetic inputs."""
        ...


class PersistencePort(ABC):
    """Port for reading and writing files."""

    @abstractmethod
    def read_config(self, path: str) -> dict:
        ...

    @abstractmethod
    def write_result(self, path: str, data: dict) -> None:
        ...
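Because the application layer depends only on these ports, a test can swap in a deterministic stub instead of a real LLM. The sketch below re-declares `JudgePort` locally so that it runs on its own; `KeywordJudge` and its scoring rule are hypothetical and exist only for illustration.

```python
from abc import ABC, abstractmethod


class JudgePort(ABC):
    @abstractmethod
    def judge_batch(
        self, task_description: str, pairs: list[tuple[str, str]]
    ) -> list[tuple[float, str]]: ...


class KeywordJudge(JudgePort):
    """Hypothetical stub: scores 1.0 when the output contains a keyword."""

    def __init__(self, keyword: str) -> None:
        self._keyword = keyword.lower()

    def judge_batch(
        self, task_description: str, pairs: list[tuple[str, str]]
    ) -> list[tuple[float, str]]:
        results: list[tuple[float, str]] = []
        for _input_text, output_text in pairs:
            if self._keyword in output_text.lower():
                results.append((1.0, "ok"))
            else:
                results.append((0.0, f"missing keyword '{self._keyword}'"))
        return results


judge = KeywordJudge("summary")
verdicts = judge.judge_batch("Summarize", [("a", "Summary: a"), ("b", "no idea")])
```

A stub like this keeps unit tests of the evaluator and the evolution loop free of network calls, while the abstract base class still refuses to be instantiated without an implementation of `judge_batch`.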
3.3 scoring.py
"""Scoring logic and acceptance criteria: pure domain."""
from prometheus.domain.entities import EvalResult


def should_accept(
    old_result: EvalResult,
    new_result: EvalResult,
    min_improvement: float = 0.0,
) -> bool:
    """
    Strict acceptance criterion.
    The new candidate must beat the old total score by more than
    min_improvement (strictly better when min_improvement is 0.0).
    """
    return new_result.total_score > old_result.total_score + min_improvement


def normalize_score(raw: float, min_val: float = 0.0, max_val: float = 1.0) -> float:
    """Clamp a score to [min_val, max_val]."""
    return max(min_val, min(max_val, raw))
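A short worked example of the acceptance rule, with a stand-in `EvalResult` that only carries scores (a simplification made to keep the example self-contained). It shows that a tie is rejected and that `min_improvement` raises the bar.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    scores: list[float]

    @property
    def total_score(self) -> float:
        return sum(self.scores)


def should_accept(
    old_result: EvalResult, new_result: EvalResult, min_improvement: float = 0.0
) -> bool:
    return new_result.total_score > old_result.total_score + min_improvement


old = EvalResult([0.5, 0.5, 0.5])     # total 1.5
tie = EvalResult([1.0, 0.5, 0.0])     # total 1.5
better = EvalResult([1.0, 0.5, 0.5])  # total 2.0

accept_tie = should_accept(old, tie)        # tie: not strictly better
accept_better = should_accept(old, better)  # 2.0 > 1.5
accept_with_margin = should_accept(old, better, min_improvement=0.5)  # 2.0 > 2.0 fails
```

Since both results are total scores, the comparison is only meaningful when the two evaluations use the same minibatch, which is what the evolution loop does.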
4. Application Layer: Use Cases
Objective
Orchestrate the business logic using only the domain ports. Never depends on concrete infrastructure.
4.1 dto.py
"""Data Transfer Objects: configuration and results."""
from dataclasses import dataclass, field


@dataclass
class OptimizationConfig:
    """Complete configuration of a PROMETHEUS run."""
    # --- Prompt ---
    seed_prompt: str
    task_description: str
    # --- Models ---
    task_model: str = "openai/gpt-4o-mini"
    judge_model: str = "openai/gpt-4o"
    proposer_model: str = "openai/gpt-4o"
    synth_model: str = "openai/gpt-4o"
    # --- Evolution parameters ---
    max_iterations: int = 30
    n_synthetic_inputs: int = 20
    minibatch_size: int = 5
    perfect_score: float = 1.0
    # --- Reproducibility ---
    seed: int = 42
    # --- Output ---
    output_path: str = "output.yaml"
    verbose: bool = False


@dataclass
class OptimizationResult:
    """Result of a complete optimization."""
    optimized_prompt: str
    initial_prompt: str
    iterations_used: int
    total_llm_calls: int
    initial_score: float
    final_score: float
    improvement: float
    history: list[dict] = field(default_factory=list)
4.2 bootstrap.py
"""
Bootstrap: generation of synthetic inputs.
Objective: create a pool of test inputs from the task description.
This pool replaces the labeled dataset.
"""
from __future__ import annotations

import random

from prometheus.domain.ports import SyntheticGeneratorPort
from prometheus.domain.entities import SyntheticExample


class SyntheticBootstrap:
    """
    Orchestrates the generation of synthetic inputs.
    Depends only on the abstract port, not on DSPy directly.
    """

    def __init__(self, generator: SyntheticGeneratorPort, seed: int = 42):
        self._generator = generator
        self._rng = random.Random(seed)

    def run(self, task_description: str, n_examples: int) -> list[SyntheticExample]:
        """
        Generate the synthetic pool in a single call.
        Why a single call?
        - Minimizes LLM cost (1 call instead of N)
        - The LLM can ensure diversity within one generation
        - Batching in a single prompt allows better coverage
        """
        examples = self._generator.generate_inputs(task_description, n_examples)
        # Shuffle for randomization
        self._rng.shuffle(examples)
        return examples

    def sample_minibatch(
        self,
        pool: list[SyntheticExample],
        size: int,
    ) -> list[SyntheticExample]:
        """Sample a minibatch from the synthetic pool."""
        size = min(size, len(pool))
        return self._rng.sample(pool, size)
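The reproducibility of minibatch sampling rests on the seeded `random.Random` instance. A minimal illustration, with a plain list of strings standing in for the synthetic pool:

```python
import random

pool = [f"input-{i}" for i in range(20)]

# Two generators built from the same seed yield the same minibatch.
rng_a = random.Random(42)
rng_b = random.Random(42)
batch_a = rng_a.sample(pool, 5)
batch_b = rng_b.sample(pool, 5)

# Requesting more than the pool holds is clamped, as in sample_minibatch.
size = min(50, len(pool))
full = rng_a.sample(pool, size)
```

The same seed reproduces the same sequence of minibatches only if the sampling calls happen in the same order, because every call advances the generator state.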
4.3 evaluator.py
"""
Evaluator: execution + judgment.
Objective: produce a quality signal without ground truth.
Combines running the candidate prompt with evaluation by an LLM-as-Judge.
"""
from __future__ import annotations

from prometheus.domain.entities import (
    Prompt, SyntheticExample, Trajectory, EvalResult
)
from prometheus.domain.ports import LLMPort, JudgePort


class PromptEvaluator:
    """
    Evaluates a prompt on a minibatch of synthetic inputs.
    Pipeline: execute → judge → build the trajectories.
    This component replaces GEPA's EvaluatorFn.
    Instead of comparing against a ground truth, it uses an LLM-as-Judge.
    """

    def __init__(self, executor: LLMPort, judge: JudgePort):
        self._executor = executor
        self._judge = judge

    def evaluate(
        self,
        prompt: Prompt,
        minibatch: list[SyntheticExample],
        task_description: str,
    ) -> EvalResult:
        """
        Evaluate the prompt on the minibatch.
        Steps:
        1. Run the prompt on each input of the minibatch
        2. Judge each (input, output) pair
        3. Build the trajectories with the feedback
        Returns an EvalResult with scores + feedbacks + trajectories.
        """
        # ── Step 1: Execution ──
        outputs: list[str] = []
        for example in minibatch:
            raw_output = self._executor.execute(prompt, example.input_text)
            outputs.append(raw_output)

        # ── Step 2: Judgment ──
        pairs = [(ex.input_text, out) for ex, out in zip(minibatch, outputs)]
        judge_results = self._judge.judge_batch(task_description, pairs)

        # ── Step 3: Build the trajectories ──
        scores: list[float] = []
        feedbacks: list[str] = []
        trajectories: list[Trajectory] = []
        for i, (example, output) in enumerate(zip(minibatch, outputs)):
            score, feedback = judge_results[i]
            scores.append(score)
            feedbacks.append(feedback)
            trajectories.append(Trajectory(
                input_text=example.input_text,
                output_text=output,
                score=score,
                feedback=feedback,
                prompt_used=prompt.text,
            ))

        return EvalResult(
            scores=scores,
            feedbacks=feedbacks,
            trajectories=trajectories,
        )
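The execute → judge → trajectory pipeline can be exercised without any LLM by passing plain functions in place of the two ports. The following is a simplified functional sketch, not the class above: `evaluate`, `fake_execute`, and `fake_judge` are hypothetical names, and trajectories are plain dicts here.

```python
from typing import Callable

Execute = Callable[[str, str], str]
JudgeBatch = Callable[[str, list[tuple[str, str]]], list[tuple[float, str]]]


def evaluate(
    prompt: str, inputs: list[str], task: str, execute: Execute, judge_batch: JudgeBatch
) -> list[dict]:
    # Step 1: run the prompt on every input.
    outputs = [execute(prompt, text) for text in inputs]
    # Step 2: judge all (input, output) pairs in one batch.
    pairs = list(zip(inputs, outputs))
    verdicts = judge_batch(task, pairs)
    # Step 3: merge everything into trajectory records.
    return [
        {"input": i, "output": o, "score": s, "feedback": f}
        for (i, o), (s, f) in zip(pairs, verdicts)
    ]


def fake_execute(prompt: str, text: str) -> str:
    """Stand-in for the task model: obeys only an 'uppercase' instruction."""
    return text.upper() if "uppercase" in prompt.lower() else text


def fake_judge(task: str, pairs: list[tuple[str, str]]) -> list[tuple[float, str]]:
    """Stand-in for the judge: full marks only for uppercase output."""
    return [(1.0, "ok") if out.isupper() else (0.0, "not uppercase") for _, out in pairs]


good = evaluate("Return the input in uppercase.", ["abc", "Hello"],
                "Uppercase the input", fake_execute, fake_judge)
weak = evaluate("Echo the input.", ["abc"],
                "Uppercase the input", fake_execute, fake_judge)
```

The feedback string attached to each low-scoring record is what the proposer later reads to decide how to rewrite the prompt.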
4.4 evolution.py
"""
Evolution loop: the core of the PROMETHEUS engine.
Objective: orchestrate the select → evaluate → propose → accept cycle.
This is the equivalent of GEPAEngine.run(), adapted to work without a valset.
"""
from __future__ import annotations

from prometheus.domain.entities import (
    Prompt, Candidate, EvalResult, OptimizationState, SyntheticExample
)
from prometheus.domain.ports import ProposerPort
from prometheus.domain.scoring import should_accept
from prometheus.application.evaluator import PromptEvaluator
from prometheus.application.bootstrap import SyntheticBootstrap


class EvolutionLoop:
    """
    Main evolution loop.
    Design:
    - Keeps only the best candidate (no full population)
    - This is much simpler than GEPA (no Pareto front, no merge)
    - If the MVP works, a population will be added in v2
    """

    def __init__(
        self,
        evaluator: PromptEvaluator,
        proposer: ProposerPort,
        bootstrap: SyntheticBootstrap,
        max_iterations: int = 30,
        minibatch_size: int = 5,
        perfect_score: float = 1.0,
        verbose: bool = False,
    ):
        self._evaluator = evaluator
        self._proposer = proposer
        self._bootstrap = bootstrap
        self._max_iterations = max_iterations
        self._minibatch_size = minibatch_size
        self._perfect_score = perfect_score
        self._verbose = verbose

    def run(
        self,
        seed_prompt: Prompt,
        synthetic_pool: list[SyntheticExample],
        task_description: str,
    ) -> OptimizationState:
        """
        Run the complete evolution loop.
        Pseudo-code:
        ```
        state.best = Candidate(seed_prompt)
        state.best.score = evaluate(seed_prompt)
        for i in range(max_iterations):
            batch = sample_minibatch(pool)
            old_eval = evaluate(state.best.prompt, batch)
            if all perfect: continue
            new_prompt = propose(state.best.prompt, old_eval.trajectories)
            new_eval = evaluate(new_prompt, batch)
            if new_eval > old_eval:
                state.best = Candidate(new_prompt, score=new_eval)
        return state
        ```
        """
        state = OptimizationState()

        # ── Evaluate the seed ──
        initial_batch = self._bootstrap.sample_minibatch(
            synthetic_pool, self._minibatch_size
        )
        initial_eval = self._evaluator.evaluate(
            seed_prompt, initial_batch, task_description
        )
        # One execution per example + 1 judge_batch call. A judge adapter that
        # works pair by pair makes len(batch) judge calls instead.
        state.total_llm_calls += len(initial_batch) + 1
        best_candidate = Candidate(
            prompt=seed_prompt,
            best_score=initial_eval.total_score,
            generation=0,
        )
        state.best_candidate = best_candidate
        state.candidates.append(best_candidate)
        self._log(f"Initial score: {initial_eval.total_score:.2f}")

        # ── Main loop ──
        for i in range(1, self._max_iterations + 1):
            state.iteration = i

            # 1. Sample a fresh minibatch
            batch = self._bootstrap.sample_minibatch(
                synthetic_pool, self._minibatch_size
            )

            # 2. Evaluate the current candidate
            current_eval = self._evaluator.evaluate(
                best_candidate.prompt, batch, task_description
            )
            state.total_llm_calls += len(batch) + 1

            # 3. Skip if perfect
            if all(s >= self._perfect_score for s in current_eval.scores):
                self._log(f"Iter {i}: All scores perfect, skipping.")
                state.history.append({
                    "iteration": i,
                    "event": "skip_perfect",
                    "current_score": current_eval.total_score,
                })
                continue

            # 4. Propose a new prompt (reflective mutation)
            new_prompt = self._proposer.propose(
                best_candidate.prompt,
                current_eval.trajectories,
                task_description,
            )
            state.total_llm_calls += 1  # 1 proposal call

            # 5. Evaluate the new prompt on the same minibatch
            new_eval = self._evaluator.evaluate(
                new_prompt, batch, task_description
            )
            state.total_llm_calls += len(batch) + 1

            # 6. Accept or reject
            if should_accept(current_eval, new_eval):
                best_candidate = Candidate(
                    prompt=new_prompt,
                    best_score=new_eval.total_score,
                    generation=i,
                    # The parent is always the last accepted candidate, so its
                    # index in state.candidates is a stable, serializable id.
                    parent_id=len(state.candidates) - 1,
                )
                state.best_candidate = best_candidate
                state.candidates.append(best_candidate)
                self._log(
                    f"Iter {i}: ACCEPTED "
                    f"({current_eval.total_score:.2f} → {new_eval.total_score:.2f})"
                )
                state.history.append({
                    "iteration": i,
                    "event": "accepted",
                    "old_score": current_eval.total_score,
                    "new_score": new_eval.total_score,
                    "improvement": new_eval.total_score - current_eval.total_score,
                })
            else:
                self._log(
                    f"Iter {i}: REJECTED "
                    f"({new_eval.total_score:.2f} ≤ {current_eval.total_score:.2f})"
                )
                state.history.append({
                    "iteration": i,
                    "event": "rejected",
                    "old_score": current_eval.total_score,
                    "new_score": new_eval.total_score,
                })

        return state

    def _log(self, msg: str) -> None:
        if self._verbose:
            print(f"[PROMETHEUS] {msg}")
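To make the accept/reject dynamics concrete, here is a toy version of the loop in which the "LLM" parts are replaced by deterministic functions. Everything in it is hypothetical scaffolding for illustration (`RULES`, `toy_evaluate`, `toy_propose`, `evolve`): the toy judge scores a prompt by the fraction of required rules it mentions, and the toy proposer appends the first missing rule.

```python
import random

RULES = ["be concise", "cite sources"]


def toy_evaluate(prompt: str, batch: list[str]) -> list[float]:
    """Toy judge: same score for every input, based on rules present in the prompt."""
    quality = sum(rule in prompt for rule in RULES) / len(RULES)
    return [quality for _ in batch]


def toy_propose(prompt: str) -> str:
    """Toy proposer: append the first missing rule, or return the prompt unchanged."""
    for rule in RULES:
        if rule not in prompt:
            return f"{prompt} Always {rule}."
    return prompt


def evolve(seed_prompt: str, pool: list[str], max_iterations: int = 4,
           minibatch_size: int = 3, seed: int = 42) -> tuple[str, list[tuple[int, str]]]:
    rng = random.Random(seed)
    best = seed_prompt
    history: list[tuple[int, str]] = []
    for i in range(1, max_iterations + 1):
        batch = rng.sample(pool, min(minibatch_size, len(pool)))
        current = toy_evaluate(best, batch)
        if all(score >= 1.0 for score in current):
            history.append((i, "skip_perfect"))
            continue
        candidate = toy_propose(best)
        new = toy_evaluate(candidate, batch)  # same minibatch for a fair comparison
        if sum(new) > sum(current):           # strict improvement, as in should_accept
            best = candidate
            history.append((i, "accepted"))
        else:
            history.append((i, "rejected"))
    return best, history


best_prompt, events = evolve("Answer the question.", [f"q{i}" for i in range(10)])
```

In this toy run the first two proposals are accepted and the remaining iterations are skipped because every score is already perfect. With a real judge the scores are noisy, so comparing both prompts on the same minibatch is what keeps the acceptance test meaningful.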
4.5 use_cases.py
"""
Main use case: high-level orchestration.
Objective: business entry point. Coordinates bootstrap → evolution → result.
Contains no technical logic, only orchestration.
"""
from __future__ import annotations

from prometheus.domain.entities import Prompt
from prometheus.domain.ports import ProposerPort
from prometheus.application.dto import OptimizationConfig, OptimizationResult
from prometheus.application.bootstrap import SyntheticBootstrap
from prometheus.application.evaluator import PromptEvaluator
from prometheus.application.evolution import EvolutionLoop


class OptimizePromptUseCase:
    """
    The single use case of the MVP.
    Dependencies are injected through the constructor (dependency injection).
    """

    def __init__(
        self,
        evaluator: PromptEvaluator,
        proposer: ProposerPort,
        bootstrap: SyntheticBootstrap,
    ):
        self._evaluator = evaluator
        self._proposer = proposer
        self._bootstrap = bootstrap

    def execute(self, config: OptimizationConfig) -> OptimizationResult:
        """
        Complete pipeline:
        1. Bootstrap → generate the synthetic inputs
        2. Evolution → optimization loop
        3. Return the result
        """
        # ── Phase 0: Bootstrap ──
        synthetic_pool = self._bootstrap.run(
            task_description=config.task_description,
            n_examples=config.n_synthetic_inputs,
        )

        # ── Phase 1: Evolution ──
        loop = EvolutionLoop(
            evaluator=self._evaluator,
            proposer=self._proposer,
            bootstrap=self._bootstrap,
            max_iterations=config.max_iterations,
            minibatch_size=config.minibatch_size,
            perfect_score=config.perfect_score,
            verbose=config.verbose,
        )
        seed_prompt = Prompt(text=config.seed_prompt)
        state = loop.run(seed_prompt, synthetic_pool, config.task_description)

        # ── Phase 2: Result ──
        # The seed candidate is always candidates[0]; its best_score is the
        # initial total score. (history[0] is the first iteration event and
        # only carries "current_score" for skip_perfect events.)
        initial_score = state.candidates[0].best_score if state.candidates else 0.0
        final_score = state.best_candidate.best_score if state.best_candidate else 0.0
        return OptimizationResult(
            optimized_prompt=state.best_candidate.prompt.text if state.best_candidate else config.seed_prompt,
            initial_prompt=config.seed_prompt,
            iterations_used=state.iteration,
            total_llm_calls=state.total_llm_calls + 1,  # +1 for the bootstrap
            initial_score=initial_score,
            final_score=final_score,
            improvement=final_score - initial_score,
            history=state.history,
        )
5. Infrastructure Layer: DSPy Adapters
Objective
Implement the domain ports with DSPy.
Each adapter wraps a dspy.Signature plus a dspy.Module.
5.1 dspy_signatures.py
"""
DSPy Signatures: declarative LLM contracts.
Objective: define WHAT each LLM call does, not HOW.
DSPy Signature = input_fields → output_fields + instruction.
DSPy takes care of prompting, parsing, and structuring.
"""
import dspy


class GenerateSyntheticInputs(dspy.Signature):
    """Generate diverse, realistic input examples for a given task."""

    task_description: str = dspy.InputField(
        desc="Description of the task the prompt should accomplish."
    )
    n_examples: int = dspy.InputField(
        desc="Number of examples to generate."
    )
    examples: str = dspy.OutputField(
        desc=(
            "A JSON array of strings, each being a realistic input "
            "for the task. Cover: normal cases, edge cases, long inputs, "
            "short inputs, ambiguous cases, and tricky scenarios."
        ),
    )


class JudgeOutput(dspy.Signature):
    """
    Evaluate the quality of an LLM output for a given task and input.
    Score: 0.0 (completely wrong) to 1.0 (perfect).
    Feedback: specific, actionable criticism.
    """

    task_description: str = dspy.InputField(
        desc="What the assistant is supposed to do."
    )
    input_text: str = dspy.InputField(
        desc="The input provided to the assistant."
    )
    output_text: str = dspy.InputField(
        desc="The assistant's response to evaluate."
    )
    score: float = dspy.OutputField(
        desc="Quality score from 0.0 (wrong) to 1.0 (perfect)."
    )
    feedback: str = dspy.OutputField(
        desc=(
            "Specific, actionable feedback explaining what's wrong "
            "with the output and how to improve it. Be critical."
        ),
    )


class ProposeInstruction(dspy.Signature):
    """
    Given a current prompt and examples of where it fails with feedback,
    propose an improved version of the prompt.
    The new prompt should address all the issues identified in the feedback.
    """

    current_instruction: str = dspy.InputField(
        desc="The current prompt/instruction to improve."
    )
    task_description: str = dspy.InputField(
        desc="Description of the task."
    )
    failure_examples: str = dspy.InputField(
        desc=(
            "Examples of inputs, outputs, scores, and feedback "
            "showing where the current instruction fails."
        ),
    )
    new_instruction: str = dspy.OutputField(
        desc="An improved version of the instruction."
    )
5.2 dspy_modules.py
"""
DSPy Modules: composition of signatures.
Objective: declarative orchestration of LLM calls via DSPy.
"""
import json
import re

import dspy

from prometheus.infrastructure.dspy_signatures import (
    GenerateSyntheticInputs, JudgeOutput, ProposeInstruction
)


class SyntheticInputGenerator(dspy.Module):
    """
    Generates synthetic inputs in a single batch call.
    Uses ChainOfThought for better diversity.
    """

    def __init__(self):
        super().__init__()
        self.generate = dspy.ChainOfThought(GenerateSyntheticInputs)

    def forward(self, task_description: str, n_examples: int):
        result = self.generate(
            task_description=task_description,
            n_examples=n_examples,
        )
        # Parse the JSON array
        try:
            examples = json.loads(result.examples)
        except json.JSONDecodeError:
            # Fallback: extract the quoted strings from the text
            examples = self._parse_fallback(result.examples)
        # Valid JSON that is not an array (e.g. a bare string) is wrapped in a list
        if not isinstance(examples, list):
            examples = [examples]
        return dspy.Prediction(examples=[str(e) for e in examples])

    @staticmethod
    def _parse_fallback(text: str) -> list[str]:
        """Extract strings from non-JSON output."""
        # Collect every double-quoted string found in the text
        matches = re.findall(r'"([^"]+)"', text)
        return matches if matches else [text]


class OutputJudge(dspy.Module):
    """
    Judges a single output. Called once per pair by the JudgeAdapter.
    """

    def __init__(self):
        super().__init__()
        self.judge = dspy.ChainOfThought(JudgeOutput)

    def forward(self, task_description: str, input_text: str, output_text: str):
        result = self.judge(
            task_description=task_description,
            input_text=input_text,
            output_text=output_text,
        )
        # Parse the score (DSPy may return a string)
        try:
            score = float(result.score)
        except (ValueError, TypeError):
            score = 0.5  # neutral fallback
        score = max(0.0, min(1.0, score))
        return dspy.Prediction(score=score, feedback=result.feedback)


class InstructionProposer(dspy.Module):
    """
    Proposes a new prompt from the failure trajectories.
    This is the equivalent of GEPA's InstructionProposalSignature.
    """

    def __init__(self):
        super().__init__()
        self.propose = dspy.ChainOfThought(ProposeInstruction)

    def forward(
        self,
        current_instruction: str,
        task_description: str,
        failure_examples: str,
    ):
        result = self.propose(
            current_instruction=current_instruction,
            task_description=task_description,
            failure_examples=failure_examples,
        )
        return dspy.Prediction(new_instruction=result.new_instruction)
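The two defensive parsing steps in these modules (JSON with a regex fallback, and score coercion with clamping) can be checked in isolation, without DSPy. The helper names `parse_examples` and `parse_score` below are hypothetical; the sketch only mirrors the fallback logic described above using the standard library.

```python
import json
import re
from typing import Any


def parse_examples(raw: str) -> list[str]:
    """Return a list of strings from LLM output that should be a JSON array."""
    try:
        parsed = json.loads(raw)
        if isinstance(parsed, list):
            return [str(item) for item in parsed]
    except json.JSONDecodeError:
        pass
    # Fallback: collect every double-quoted string, else keep the raw text.
    quoted = re.findall(r'"([^"]+)"', raw)
    return quoted if quoted else [raw]


def parse_score(raw: Any, fallback: float = 0.5) -> float:
    """Coerce a judge score to float and clamp it to [0.0, 1.0]."""
    try:
        score = float(raw)
    except (ValueError, TypeError):
        score = fallback  # neutral value when the score is unparseable
    return max(0.0, min(1.0, score))


clean = parse_examples('["first input", "second input"]')
chatty = parse_examples('Here are two inputs: "first input" and "second input".')
unquoted = parse_examples("no structure at all")
```

The neutral 0.5 fallback keeps a run going when the judge returns an unparseable score, at the cost of injecting an uninformative data point into the comparison.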
5.3 llm_adapter.py
"""
Adapter: execution of a prompt on an input.
Objective: implement the LLMPort port via DSPy.
"""
import dspy

from prometheus.domain.ports import LLMPort
from prometheus.domain.entities import Prompt


class DSPyLLMAdapter(LLMPort):
    """
    Runs a prompt using dspy.Predict with a simple signature.
    """

    class _ExecuteSignature(dspy.Signature):
        """Execute the instruction on the given input."""

        instruction: str = dspy.InputField(desc="The instruction/prompt to follow.")
        input_text: str = dspy.InputField(desc="The input to process.")
        output: str = dspy.OutputField(desc="The response following the instruction.")

    def __init__(self, model: str):
        self._predictor = dspy.Predict(self._ExecuteSignature)
        # The model is configured globally via dspy.configure(), so the
        # `model` argument is not used here. It could also be configured
        # locally if needed.

    def execute(self, prompt: Prompt, input_text: str) -> str:
        result = self._predictor(
            instruction=prompt.text,
            input_text=input_text,
        )
        return result.output
5.4 judge_adapter.py
"""
Adapter: LLM-as-Judge.
Objective: implement the JudgePort port via the DSPy OutputJudge module.
"""
from prometheus.domain.ports import JudgePort
from prometheus.infrastructure.dspy_modules import OutputJudge


class DSPyJudgeAdapter(JudgePort):
    """
    Evaluates a batch of (input, output) pairs by calling the judge once per pair.
    Future optimization: parallelize the calls via dspy.Parallel.
    For the MVP, the calls stay sequential.
    """

    def __init__(self):
        self._judge = OutputJudge()

    def judge_batch(
        self,
        task_description: str,
        pairs: list[tuple[str, str]],
    ) -> list[tuple[float, str]]:
        results = []
        for input_text, output_text in pairs:
            pred = self._judge(
                task_description=task_description,
                input_text=input_text,
                output_text=output_text,
            )
            results.append((pred.score, pred.feedback))
        return results
5.5 proposer_adapter.py
"""
Adapter: Reflective Mutation Proposer.
Objective: implement the ProposerPort port via the DSPy InstructionProposer.
Converts the trajectories into a format the proposer LLM can read.
"""
from prometheus.domain.ports import ProposerPort
from prometheus.domain.entities import Prompt, Trajectory
from prometheus.infrastructure.dspy_modules import InstructionProposer


class DSPyProposerAdapter(ProposerPort):
    """
    Uses the evaluation trajectories to build
    a "failure report" and propose a new prompt.
    """

    def __init__(self):
        self._proposer = InstructionProposer()

    def propose(
        self,
        current_prompt: Prompt,
        trajectories: list[Trajectory],
        task_description: str,
    ) -> Prompt:
        # Format the trajectories as failure examples
        failure_examples = self._format_failures(trajectories)
        pred = self._proposer(
            current_instruction=current_prompt.text,
            task_description=task_description,
            failure_examples=failure_examples,
        )
        return Prompt(text=pred.new_instruction)

    @staticmethod
    def _format_failures(trajectories: list[Trajectory]) -> str:
        """
        Convert the trajectories into a structured text report.
        Format inspired by GEPA's InstructionProposalSignature:
        # Example 1
        ## Input
        <input_text>
        ## Generated Output
        <output_text>
        ## Score
        <score>
        ## Feedback
        <feedback>
        """
        sections = []
        for i, t in enumerate(trajectories, 1):
            section = (
                f"# Example {i}\n"
                f"## Input\n{t.input_text}\n\n"
                f"## Generated Output\n{t.output_text}\n\n"
                f"## Score\n{t.score:.2f}\n\n"
                f"## Feedback\n{t.feedback}\n"
            )
            sections.append(section)
        return "\n---\n".join(sections)
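To see what the proposer actually receives, the report format can be reproduced with a stand-alone copy of the formatter and a trimmed `Trajectory` (the `prompt_used` field is left out here). The `worst_first` helper is a hypothetical addition, not part of the adapter above: it shows one way the report could be limited to the lowest-scoring examples if prompt length became a concern.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    input_text: str
    output_text: str
    score: float
    feedback: str


def format_failures(trajectories: list[Trajectory]) -> str:
    """Render trajectories as the sectioned text report described above."""
    sections = []
    for i, t in enumerate(trajectories, 1):
        sections.append(
            f"# Example {i}\n"
            f"## Input\n{t.input_text}\n\n"
            f"## Generated Output\n{t.output_text}\n\n"
            f"## Score\n{t.score:.2f}\n\n"
            f"## Feedback\n{t.feedback}\n"
        )
    return "\n---\n".join(sections)


def worst_first(trajectories: list[Trajectory], limit: int) -> list[Trajectory]:
    """Hypothetical helper: keep the `limit` lowest-scoring trajectories."""
    return sorted(trajectories, key=lambda t: t.score)[:limit]


runs = [
    Trajectory("2+2", "4", 1.0, "Correct."),
    Trajectory("10/4", "2", 0.2, "Truncated the result; expected 2.5."),
    Trajectory("3*3", "six", 0.0, "Wrong value and not a number."),
]
report = format_failures(worst_first(runs, limit=2))
print(report)
```

Sending only the weakest examples keeps the proposer focused on failures, though including a few successes can also help it avoid regressions; which trade-off works better is an empirical question for this engine.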
5.6 synth_adapter.py
"""
Adapter: generation of synthetic inputs.
Objective: implement the SyntheticGeneratorPort port via DSPy.
"""
from prometheus.domain.ports import SyntheticGeneratorPort
from prometheus.domain.entities import SyntheticExample
from prometheus.infrastructure.dspy_modules import SyntheticInputGenerator


class DSPySyntheticAdapter(SyntheticGeneratorPort):
    """
    Generates synthetic inputs in a single batch call via DSPy.
    """

    def __init__(self):
        self._generator = SyntheticInputGenerator()

    def generate_inputs(
        self,
        task_description: str,
        n_examples: int,
    ) -> list[SyntheticExample]:
        pred = self._generator(
            task_description=task_description,
            n_examples=n_examples,
        )
        return [
            SyntheticExample(
                input_text=text,
                id=i,
            )
            for i, text in enumerate(pred.examples[:n_examples])
        ]
5.7 file_io.py
"""
File I/O: reading and writing config and result files.
Objective: implement the PersistencePort port with YAML.
"""
import yaml

from prometheus.domain.ports import PersistencePort


class YamlPersistence(PersistencePort):
    """Reads and writes YAML files."""

    def read_config(self, path: str) -> dict:
        with open(path, "r", encoding="utf-8") as f:
            return yaml.safe_load(f)

    def write_result(self, path: str, data: dict) -> None:
        with open(path, "w", encoding="utf-8") as f:
            yaml.dump(data, f, default_flow_style=False, allow_unicode=True)
6. Presentation Layer: CLI
Objective
Provide a simple CLI via Typer.
Single entry point: prometheus optimize -i config.yaml -o result.yaml
6.1 config.py
"""
Global configuration.
Objective: load the configuration from file + env vars + defaults.
Note: the MVP version below is a plain dataclass with hardcoded values;
it does not use pydantic-settings yet.
"""
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AppSettings:
    """Non-sensitive settings, hardcoded for the MVP."""
    app_name: str = "prometheus"
    version: str = "0.1.0"
6.2 cli/app.py
"""
CLI: user entry point.
Objective: Typer interface with the -i (input) and -o (output) options.
"""
import typer
from rich.console import Console
from rich.panel import Panel
from rich.table import Table
import dspy

from prometheus.application.dto import OptimizationConfig, OptimizationResult
from prometheus.application.use_cases import OptimizePromptUseCase
from prometheus.application.bootstrap import SyntheticBootstrap
from prometheus.application.evaluator import PromptEvaluator
from prometheus.infrastructure.file_io import YamlPersistence
from prometheus.infrastructure.llm_adapter import DSPyLLMAdapter
from prometheus.infrastructure.judge_adapter import DSPyJudgeAdapter
from prometheus.infrastructure.proposer_adapter import DSPyProposerAdapter
from prometheus.infrastructure.synth_adapter import DSPySyntheticAdapter

app = typer.Typer(
    name="prometheus",
    help="🔥 PROMETHEUS — Prompt evolution without reference data.",
    no_args_is_help=True,
)
console = Console()


@app.command()
def optimize(
    input: str = typer.Option(
        ..., "-i", "--input",
        help="Path to input YAML config file.",
        exists=True, readable=True,
    ),
    output: str = typer.Option(
        "output.yaml", "-o", "--output",
        help="Path to output YAML result file.",
    ),
    verbose: bool = typer.Option(
        False, "-v", "--verbose",
        help="Print detailed progress.",
    ),
) -> None:
    """
    Optimize a prompt without any reference data.
    Usage:
        prometheus optimize -i config.yaml -o result.yaml
    """
    console.print(Panel.fit(
        "🔥 [bold red]PROMETHEUS[/bold red] — Prompt Evolution Engine",
        subtitle="No reference data required",
    ))

    # ── 1. Load the config ──
    persistence = YamlPersistence()
    raw_config = persistence.read_config(input)
    config = OptimizationConfig(
        seed_prompt=raw_config["seed_prompt"],
        task_description=raw_config["task_description"],
        task_model=raw_config.get("task_model", "openai/gpt-4o-mini"),
        judge_model=raw_config.get("judge_model", "openai/gpt-4o"),
        proposer_model=raw_config.get("proposer_model", "openai/gpt-4o"),
        synth_model=raw_config.get("synth_model", "openai/gpt-4o"),
        max_iterations=raw_config.get("max_iterations", 30),
        n_synthetic_inputs=raw_config.get("n_synthetic_inputs", 20),
        minibatch_size=raw_config.get("minibatch_size", 5),
        seed=raw_config.get("seed", 42),
        output_path=output,
        verbose=verbose,
    )
    console.print(f"[dim]Task: {config.task_description[:80]}...[/dim]")
    console.print(f"[dim]Seed prompt: {config.seed_prompt[:80]}...[/dim]")

    # ── 2. Configure DSPy ──
    # One LM object per role. Only task_lm is wired in below: judge_lm,
    # proposer_lm and synth_lm are created but not yet used in the MVP.
    task_lm = dspy.LM(config.task_model)
    judge_lm = dspy.LM(config.judge_model)
    proposer_lm = dspy.LM(config.proposer_model)
    synth_lm = dspy.LM(config.synth_model)

    # ── 3. Build the adapters (Dependency Injection) ──
    dspy.configure(lm=task_lm)  # default LM
    synth_adapter = DSPySyntheticAdapter()
    # The synthesis model could be configured specifically here
    # (in the MVP, the default LM is used)
    llm_adapter = DSPyLLMAdapter(model=config.task_model)
    judge_adapter = DSPyJudgeAdapter()
    proposer_adapter = DSPyProposerAdapter()
    bootstrap = SyntheticBootstrap(generator=synth_adapter, seed=config.seed)
    evaluator = PromptEvaluator(executor=llm_adapter, judge=judge_adapter)
    use_case = OptimizePromptUseCase(
        evaluator=evaluator,
        proposer=proposer_adapter,
        bootstrap=bootstrap,
    )

    # ── 4. Run ──
    with console.status("[bold green]Evolving prompt..."):
        result = use_case.execute(config)

    # ── 5. Display the results ──
    _display_result(result)

    # ── 6. Save ──
    _save_result(persistence, output, result)
    console.print(f"\n[green]✅ Results saved to {output}[/green]")


def _display_result(result: OptimizationResult) -> None:
    """Print a Rich summary in the terminal."""
    console.print()
    console.print(Panel(
        f"[bold green]Optimized Prompt[/bold green]\n\n{result.optimized_prompt}",
        title="🔥 Result",
    ))
    table = Table(title="Metrics")
    table.add_column("Metric", style="cyan")
    table.add_column("Value", style="bold")
    table.add_row("Initial Score", f"{result.initial_score:.2f}")
    table.add_row("Final Score", f"{result.final_score:.2f}")
    table.add_row("Improvement", f"{result.improvement:+.2f}")
    table.add_row("Iterations", str(result.iterations_used))
    table.add_row("LLM Calls", str(result.total_llm_calls))
    console.print(table)


def _save_result(
    persistence: YamlPersistence,
    path: str,
    result: OptimizationResult,
) -> None:
    """Save the result as YAML."""
    from dataclasses import asdict
    persistence.write_result(path, asdict(result))


if __name__ == "__main__":
    app()
7. Core Algorithm
Detailed Flow Diagram
flowchart TB
START(["prometheus optimize<br/>-i config.yaml<br/>-o result.yaml"]) --> LOAD
LOAD["Load config.yaml"] --> INIT_DSPY["Configure DSPy LMs"]
INIT_DSPY --> BOOTSTRAP
subgraph BOOTSTRAP["Phase 0: Bootstrap"]
direction TB
B1["DSPySyntheticAdapter<br/>.generate_inputs()"] --> B2["SyntheticInputGenerator<br/>dspy.ChainOfThought<br/>(GenerateSyntheticInputs)"]
B2 --> B3["Synthetic input pool<br/>[input₁, input₂, ..., input₂₀]"]
end
B3 --> LOOP_START
subgraph LOOP["Phase 1: Evolution Loop (×30)"]
direction TB
LOOP_START --> SELECT["Keep the best candidate"]
SELECT --> SAMPLE["Bootstrap.sample_minibatch()<br/>5 random inputs"]
SAMPLE --> EXEC
subgraph EXEC["Evaluate Current"]
direction TB
E1["DSPyLLMAdapter.execute()<br/>→ 5 outputs"] --> E2["DSPyJudgeAdapter.judge_batch()<br/>→ 5 × (score, feedback)"]
E2 --> E3["Build Trajectories<br/>(input, output, score, feedback)"]
end
E3 --> CHECK_PERFECT{"All scores ≥ 1.0 ?"}
CHECK_PERFECT -->|Yes| NEXT_ITER["Skip → next iteration"]
CHECK_PERFECT -->|No| PROPOSE
subgraph PROPOSE["Reflective Mutation"]
direction TB
P1["DSPyProposerAdapter.propose()"] --> P2["Format failure report<br/>from the Trajectories"]
P2 --> P3["InstructionProposer<br/>dspy.ChainOfThought<br/>(ProposeInstruction)"]
P3 --> P4["new_prompt"]
end
PROPOSE --> EVAL_NEW
subgraph EVAL_NEW["Evaluate New"]
direction TB
EN1["DSPyLLMAdapter.execute()<br/>→ 5 outputs"] --> EN2["DSPyJudgeAdapter.judge_batch()<br/>→ 5 × (score, feedback)"]
end
EVAL_NEW --> ACCEPT{"new_score > old_score ?"}
ACCEPT -->|Yes| UPDATE["best_candidate = new candidate"]
ACCEPT -->|No| NEXT_ITER
UPDATE --> NEXT_ITER
end
NEXT_ITER --> MORE{"iterations < max ?"}
MORE -->|Yes| SELECT
MORE -->|No| SAVE["Save result.yaml"]
SAVE --> DONE(["✅ Done"])
style BOOTSTRAP fill:#0f3460,stroke:#00d2ff,color:#fff
style LOOP fill:#1a1a2e,stroke:#e94560,color:#fff
style EXEC fill:#16213e,stroke:#00d2ff,color:#fff
style PROPOSE fill:#16213e,stroke:#e94560,color:#fff
style EVAL_NEW fill:#16213e,stroke:#00d2ff,color:#fff
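The accept/reject logic in the flowchart can be sketched in plain Python. This is a minimal illustration, not the project's `EvolutionLoop`: the `evolve` helper, its callback signatures, and the toy `toy_evaluate` / `toy_propose` functions are all hypothetical, and comparing candidates by the sum of minibatch scores is an assumption.

```python
import random
from typing import Callable, List, Sequence


def evolve(
    seed_prompt: str,
    pool: Sequence[str],
    evaluate: Callable[[str, Sequence[str]], List[float]],
    propose: Callable[[str, Sequence[str], List[float]], str],
    max_iterations: int = 30,
    minibatch_size: int = 5,
    seed: int = 42,
) -> str:
    """Greedy accept/reject loop mirroring the flowchart (hypothetical helper)."""
    rng = random.Random(seed)  # seeded so minibatch sampling is repeatable
    best = seed_prompt
    for _ in range(max_iterations):
        batch = rng.sample(list(pool), minibatch_size)
        old_scores = evaluate(best, batch)
        if all(score >= 1.0 for score in old_scores):
            continue  # perfect minibatch: skip the proposal step
        candidate = propose(best, batch, old_scores)
        new_scores = evaluate(candidate, batch)
        if sum(new_scores) > sum(old_scores):  # accept strict improvements only
            best = candidate
    return best


# Toy stand-ins for executor + judge and for the proposer (no LLM involved).
def toy_evaluate(prompt: str, batch: Sequence[str]) -> List[float]:
    return [min(1.0, prompt.count("!") / 3)] * len(batch)


def toy_propose(prompt: str, batch: Sequence[str], scores: List[float]) -> str:
    return prompt + "!"


pool = [f"input {i}" for i in range(20)]
result = evolve("Answer the question", pool, toy_evaluate, toy_propose)
print(result)
```

With these toy callbacks the prompt gains one `!` per accepted iteration until every score reaches 1.0, after which the remaining iterations take the skip branch.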
Detailed LLM Budget per Iteration
Typical iteration (minibatch_size=5):
┌──────────────────────────────────────┬──────────┐
│ Operation                            │ Calls    │
├──────────────────────────────────────┼──────────┤
│ Execute current (task_lm)            │ 5        │
│ Judge current (judge_lm)             │ 5        │
│ Propose new (proposer_lm)            │ 1        │
│ Execute new (task_lm)                │ 5        │
│ Judge new (judge_lm)                 │ 5        │
├──────────────────────────────────────┼──────────┤
│ TOTAL per iteration                  │ 21       │
├──────────────────────────────────────┼──────────┤
│ Bootstrap                            │ 1        │
│ 30 iterations × 21                   │ 630      │
├──────────────────────────────────────┼──────────┤
│ TOTAL MVP                            │ ~631     │
└──────────────────────────────────────┴──────────┘
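The arithmetic in the table generalizes to other settings. The sketch below assumes every iteration goes through the full evaluate, propose, re-evaluate path (iterations that skip on perfect scores cost less), and the function names are hypothetical. With the defaults it gives 631 calls, which is above the "< 500 calls" target of objective O3; under the same assumptions, 23 iterations would stay below that target.

```python
def llm_call_budget(
    max_iterations: int = 30,
    minibatch_size: int = 5,
    bootstrap_calls: int = 1,
) -> int:
    """Worst-case LLM calls: execute + judge twice per example, plus one proposal."""
    per_iteration = 4 * minibatch_size + 1
    return bootstrap_calls + max_iterations * per_iteration


def max_iterations_under(
    limit: int,
    minibatch_size: int = 5,
    bootstrap_calls: int = 1,
) -> int:
    """Largest iteration count whose worst-case budget is strictly below `limit`."""
    per_iteration = 4 * minibatch_size + 1
    return max(0, (limit - bootstrap_calls - 1) // per_iteration)


print(llm_call_budget())          # defaults from the table
print(max_iterations_under(500))  # iterations that fit under a 500-call cap
```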
8. I/O File Formats
8.1 Input: config.yaml
# PROMETHEUS Configuration File
# ==================================

# The initial prompt to optimize
seed_prompt: |
  You are an expert assistant in contract analysis.
  Analyze the provided text and identify potentially unfair clauses.
  Be precise and quote the relevant passages.

# Task description (used to generate the synthetic inputs)
task_description: |
  Legal analysis of contracts to identify unfair clauses.
  The assistant must examine a contract text and flag
  any clause that could be considered unfair under
  French consumer law.

# LLM models (DSPy/litellm format)
task_model: "openai/gpt-4o-mini"
judge_model: "openai/gpt-4o"
proposer_model: "openai/gpt-4o"
synth_model: "openai/gpt-4o"

# Evolution parameters
max_iterations: 30
n_synthetic_inputs: 20
minibatch_size: 5
seed: 42
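Only `seed_prompt` and `task_description` are required; the CLI falls back to defaults for every other key. A minimal sketch of that defaulting plus some basic validation is shown below. `ConfigSketch` and `config_from_dict` are hypothetical names, not the project's `OptimizationConfig`, and the unknown-key and minibatch-size checks are assumptions about what a stricter loader might enforce.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ConfigSketch:
    """Hypothetical mirror of the config keys and the CLI's default values."""
    seed_prompt: str
    task_description: str
    task_model: str = "openai/gpt-4o-mini"
    judge_model: str = "openai/gpt-4o"
    proposer_model: str = "openai/gpt-4o"
    synth_model: str = "openai/gpt-4o"
    max_iterations: int = 30
    n_synthetic_inputs: int = 20
    minibatch_size: int = 5
    seed: int = 42


def config_from_dict(raw: dict) -> ConfigSketch:
    """Build a config from parsed YAML, rejecting missing or unknown keys."""
    missing = [k for k in ("seed_prompt", "task_description") if not raw.get(k)]
    if missing:
        raise ValueError(f"missing required keys: {', '.join(missing)}")
    known = {f.name for f in fields(ConfigSketch)}
    unknown = sorted(set(raw) - known)
    if unknown:
        raise ValueError(f"unknown keys: {', '.join(unknown)}")
    cfg = ConfigSketch(**raw)
    # A minibatch cannot be larger than the pool it is sampled from.
    if not 1 <= cfg.minibatch_size <= cfg.n_synthetic_inputs:
        raise ValueError("minibatch_size must be between 1 and n_synthetic_inputs")
    return cfg


cfg = config_from_dict({"seed_prompt": "You are...", "task_description": "Analyze..."})
print(cfg.max_iterations, cfg.minibatch_size, cfg.seed)
```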
8.2 Output: result.yaml
# PROMETHEUS Optimization Result
# ================================
optimized_prompt: |
  You are a legal analyst specialized in French consumer law.
  For each contract analyzed, apply this methodology:
  1. Identify every clause that restricts the consumer
  2. Compare each clause against the unfairness criteria of Article L.212-1
  3. Flag unfair clauses with: the exact text, the grounds for unfairness,
     and the associated legal risk
  Be exhaustive and always quote the relevant passages.
initial_prompt: |
  You are an expert assistant in contract analysis.
  Analyze the provided text and identify potentially unfair clauses.
  Be precise and quote the relevant passages.
initial_score: 6.8
final_score: 8.9
improvement: 2.1
iterations_used: 30
total_llm_calls: 631
history:
  - iteration: 1
    event: "accepted"
    old_score: 1.2
    new_score: 1.8
    improvement: 0.6
  - iteration: 2
    event: "rejected"
    old_score: 1.8
    new_score: 1.5
  # ... etc
9. Configuration & Environment
9.1 Environment Variables
# Required (when using OpenAI)
export OPENAI_API_KEY="sk-..."
# Optional (when using other providers)
export ANTHROPIC_API_KEY="..."
export TOGETHER_API_KEY="..."
# Optional
export PROMETHEUS_LOG_LEVEL="INFO" # DEBUG for detailed traces
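One way `PROMETHEUS_LOG_LEVEL` could be consumed at startup is sketched below using only the standard library. The `configure_logging` helper, the fallback to INFO for unrecognized values, and the log format are assumptions for illustration; the spec does not state how the variable is read.

```python
import logging
import os
from typing import Mapping, Optional

_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def configure_logging(env: Optional[Mapping[str, str]] = None) -> int:
    """Read PROMETHEUS_LOG_LEVEL and configure the root logger (hypothetical helper)."""
    env = os.environ if env is None else env
    name = env.get("PROMETHEUS_LOG_LEVEL", "INFO").upper()
    level = _LEVELS.get(name, logging.INFO)  # unknown values fall back to INFO
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
    )
    return level


level = configure_logging({"PROMETHEUS_LOG_LEVEL": "debug"})
print(logging.getLevelName(level))
```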
9.2 Installation and Execution
# Installation
git clone <repo>
cd prometheus
uv sync
# Run
uv run prometheus optimize -i config.yaml -o result.yaml -v
# With options
uv run prometheus optimize \
  -i examples/legal_contract.yaml \
  -o results/legal_optimized.yaml \
  --verbose
10. Testing Strategy
10.1 Test Pyramid
┌─────────────┐
│ E2E Test    │ test_full_pipeline.py
│ (1-2 tests) │ → Mock LLM, verifies the full flow
├─────────────┤
│ Integration │ test_dspy_adapters.py
│ (3-5 tests) │ → Real DSPy signatures, mock LM
├─────────────┤
│ Unit        │ test_entities.py
│ (10+ tests) │ test_scoring.py
│             │ test_evolution.py (with mocks)
└─────────────┘
10.2 tests/conftest.py
"""Shared test fixtures."""
import pytest
from unittest.mock import MagicMock

from prometheus.domain.entities import (
    Prompt, SyntheticExample, Trajectory, EvalResult, Candidate
)


@pytest.fixture
def seed_prompt():
    return Prompt(text="You are a helpful assistant. Answer the question.")


@pytest.fixture
def task_description():
    return "Answer factual questions accurately and concisely."


@pytest.fixture
def synthetic_pool():
    return [
        SyntheticExample(input_text=f"Test input {i}", id=i)
        for i in range(20)
    ]


@pytest.fixture
def mock_eval_result():
    # Scores and feedbacks are shared by the result and its trajectories.
    scores = [0.3, 0.5, 0.4, 0.6, 0.2]
    feedbacks = [
        "Incomplete answer",
        "Missing key detail",
        "Wrong format",
        "Partially correct",
        "Completely off topic",
    ]
    return EvalResult(
        scores=scores,
        feedbacks=feedbacks,
        trajectories=[
            Trajectory(
                input_text=f"Input {i}",
                output_text=f"Output {i}",
                score=s,
                feedback=f,
                prompt_used="test prompt",
            )
            for i, (s, f) in enumerate(zip(scores, feedbacks))
        ],
    )


@pytest.fixture
def mock_llm_port():
    """Mock LLMPort that returns canned responses."""
    port = MagicMock()
    port.execute.return_value = "This is a mock response."
    return port


@pytest.fixture
def mock_judge_port():
    """Mock JudgePort that returns moderate scores."""
    port = MagicMock()
    port.judge_batch.return_value = [
        (0.5, "Moderate quality, needs improvement."),
    ] * 5
    return port


@pytest.fixture
def mock_proposer_port():
    """Mock ProposerPort that returns a slightly modified prompt."""
    port = MagicMock()
    port.propose.return_value = Prompt(
        text="You are a very helpful assistant. Answer the question precisely."
    )
    return port
10.3 tests/unit/test_evolution.py
"""Unit tests for the evolution loop, with full mocking."""
import pytest
from unittest.mock import MagicMock, patch

from prometheus.domain.entities import Prompt, SyntheticExample, EvalResult, Trajectory
from prometheus.application.evolution import EvolutionLoop
from prometheus.application.evaluator import PromptEvaluator
from prometheus.application.bootstrap import SyntheticBootstrap


class TestEvolutionLoop:
    """Tests the accept/reject logic of the evolution loop."""

    def test_accepts_improvement(self, seed_prompt, synthetic_pool, task_description,
                                 mock_llm_port, mock_judge_port, mock_proposer_port):
        """
        Scenario: the new prompt improves the score.
        Expected: the best candidate is updated.
        """
        evaluator = PromptEvaluator(mock_llm_port, mock_judge_port)
        bootstrap = MagicMock(spec=SyntheticBootstrap)
        bootstrap.sample_minibatch.return_value = synthetic_pool[:5]

        # Old eval = low scores, new eval = high scores
        old_eval = EvalResult(
            scores=[0.3, 0.4, 0.3, 0.5, 0.2],
            feedbacks=["bad"] * 5,
            trajectories=[
                Trajectory(f"input{i}", f"output{i}", s, "bad", "prompt")
                for i, s in enumerate([0.3, 0.4, 0.3, 0.5, 0.2])
            ],
        )
        new_eval = EvalResult(
            scores=[0.8, 0.9, 0.7, 0.8, 0.9],
            feedbacks=["good"] * 5,
            trajectories=[],
        )
        # evaluator.evaluate is called twice per iteration (old + new)
        evaluator.evaluate = MagicMock(side_effect=[old_eval, new_eval])

        loop = EvolutionLoop(
            evaluator=evaluator,
            proposer=mock_proposer_port,
            bootstrap=bootstrap,
            max_iterations=1,
            minibatch_size=5,
        )
        with patch.object(loop, '_log'):
            state = loop.run(seed_prompt, synthetic_pool, task_description)

        # The proposed prompt should have replaced the seed prompt
        proposed = mock_proposer_port.propose.return_value
        assert state.best_candidate.prompt.text == proposed.text
        assert state.best_candidate.best_score > 0

    def test_rejects_regression(self, seed_prompt, synthetic_pool, task_description,
                                mock_llm_port, mock_judge_port, mock_proposer_port):
        """
        Scenario: the new prompt degrades the score.
        Expected: the best candidate stays unchanged.
        """
        evaluator = PromptEvaluator(mock_llm_port, mock_judge_port)
        bootstrap = MagicMock(spec=SyntheticBootstrap)
        bootstrap.sample_minibatch.return_value = synthetic_pool[:5]

        old_eval = EvalResult(
            scores=[0.7, 0.8, 0.7, 0.8, 0.9],
            feedbacks=["ok"] * 5,
            trajectories=[
                Trajectory(f"input{i}", f"output{i}", s, "ok", "prompt")
                for i, s in enumerate([0.7, 0.8, 0.7, 0.8, 0.9])
            ],
        )
        new_eval = EvalResult(
            scores=[0.2, 0.1, 0.3, 0.2, 0.1],
            feedbacks=["bad"] * 5,
            trajectories=[],
        )
        evaluator.evaluate = MagicMock(side_effect=[old_eval, new_eval])

        loop = EvolutionLoop(
            evaluator=evaluator,
            proposer=mock_proposer_port,
            bootstrap=bootstrap,
            max_iterations=1,
            minibatch_size=5,
        )
        with patch.object(loop, '_log'):
            state = loop.run(seed_prompt, synthetic_pool, task_description)

        # The seed prompt should remain the best candidate
        assert state.best_candidate.prompt.text == seed_prompt.text

    def test_skips_perfect_scores(self, seed_prompt, synthetic_pool, task_description,
                                  mock_llm_port, mock_judge_port, mock_proposer_port):
        """
        Scenario: all scores are perfect.
        Expected: no proposal is made; the loop moves on to the next iteration.
        """
        evaluator = PromptEvaluator(mock_llm_port, mock_judge_port)
        bootstrap = MagicMock(spec=SyntheticBootstrap)
        bootstrap.sample_minibatch.return_value = synthetic_pool[:5]

        perfect_eval = EvalResult(
            scores=[1.0, 1.0, 1.0, 1.0, 1.0],
            feedbacks=["perfect"] * 5,
            trajectories=[
                Trajectory(f"input{i}", f"output{i}", 1.0, "perfect", "prompt")
                for i in range(5)
            ],
        )
        evaluator.evaluate = MagicMock(return_value=perfect_eval)

        loop = EvolutionLoop(
            evaluator=evaluator,
            proposer=mock_proposer_port,
            bootstrap=bootstrap,
            max_iterations=3,
            minibatch_size=5,
        )
        with patch.object(loop, '_log'):
            state = loop.run(seed_prompt, synthetic_pool, task_description)

        # The proposer should never have been called
        mock_proposer_port.propose.assert_not_called()
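These tests lean on two `unittest.mock` behaviors: a list passed as `side_effect` yields one value per call, and `assert_not_called()` fails as soon as the mock has been used. The standalone sketch below illustrates both with placeholder strings instead of `EvalResult` objects. It also shows why a test with more than one iteration needs two queued results per iteration: an exhausted `side_effect` list raises `StopIteration`.

```python
from unittest.mock import MagicMock

# One queued return value per expected call: old evaluation, then new evaluation.
evaluate = MagicMock(side_effect=["old_eval", "new_eval"])
first = evaluate("seed prompt", ["input 0"])
second = evaluate("proposed prompt", ["input 0"])

# A third call has nothing left to return and raises StopIteration.
try:
    evaluate("another prompt", ["input 0"])
    exhausted = False
except StopIteration:
    exhausted = True

# A proposer mock that was never used passes assert_not_called().
proposer = MagicMock()
proposer.propose.assert_not_called()

print(first, second, exhausted, evaluate.call_count)
```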
11. Architecture Diagrams
11.1 Hexagonal Architecture: Component View
flowchart TB
subgraph PRESENTATION["🎯 PRESENTATION (CLI)"]
CLI["typer CLI<br/>prometheus/cli/app.py"]
end
subgraph APPLICATION["⚙️ APPLICATION (Use Cases)"]
UC["OptimizePromptUseCase"]
BOOT["SyntheticBootstrap"]
EVAL["PromptEvaluator"]
EVO["EvolutionLoop"]
end
subgraph DOMAIN["💎 DOMAIN (Entities + Ports)"]
ENT["Prompt<br/>SyntheticExample<br/>Trajectory<br/>EvalResult<br/>Candidate<br/>OptimizationState"]
PORTS["LLMPort<br/>JudgePort<br/>ProposerPort<br/>SyntheticGeneratorPort<br/>PersistencePort"]
SCORE["scoring.py"]
end
subgraph INFRA["🔧 INFRASTRUCTURE (DSPy)"]
DSPY_SIG["dspy_signatures.py<br/>GenerateSyntheticInputs<br/>JudgeOutput<br/>ProposeInstruction"]
DSPY_MOD["dspy_modules.py<br/>SyntheticInputGenerator<br/>OutputJudge<br/>InstructionProposer"]
ADAPTERS["DSPyLLMAdapter<br/>DSPyJudgeAdapter<br/>DSPyProposerAdapter<br/>DSPySyntheticAdapter"]
FILE_IO["YamlPersistence"]
end
CLI -->|"OptimizationConfig"| UC
UC --> BOOT
UC --> EVO
EVO --> EVAL
EVO -->|"ProposerPort"| ADAPTERS
BOOT -->|"SyntheticGeneratorPort"| ADAPTERS
EVAL -->|"LLMPort"| ADAPTERS
EVAL -->|"JudgePort"| ADAPTERS
ADAPTERS --> DSPY_MOD
DSPY_MOD --> DSPY_SIG
CLI -->|"PersistencePort"| FILE_IO
UC -.->|"depends on"| ENT
UC -.->|"depends on"| PORTS
EVO -.->|"depends on"| ENT
EVO -.->|"depends on"| SCORE
EVAL -.->|"depends on"| ENT
ADAPTERS -.->|"implements"| PORTS
style PRESENTATION fill:#1a1a2e,stroke:#00d2ff,color:#fff
style APPLICATION fill:#0f3460,stroke:#00d2ff,color:#fff
style DOMAIN fill:#16213e,stroke:#e94560,color:#fff
style INFRA fill:#1a1a2e,stroke:#e94560,color:#fff
11.2 Dependency Rule
flowchart LR
CLI["CLI"] --> APP["Application"]
APP --> DOMAIN["Domain"]
INFRA["Infrastructure"] --> DOMAIN
CLI --> INFRA
style DOMAIN fill:#e94560,color:#fff
style APP fill:#0f3460,color:#fff
style INFRA fill:#1a1a2e,color:#fff
style CLI fill:#16213e,color:#fff
Rule: arrows NEVER point from the Domain outward. The Domain knows nothing about DSPy, Typer, or YAML.
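The dependency rule can be checked mechanically. The sketch below parses a module's source with the standard `ast` module and reports any import of a banned package. The `forbidden_imports` helper and the banned set are illustrative assumptions; wiring it over the files in the domain package (for example from an architecture test) is left out.

```python
import ast
from typing import Iterable, List

BANNED = frozenset({"dspy", "typer", "yaml"})


def forbidden_imports(source: str, banned: Iterable[str] = BANNED) -> List[str]:
    """Return the banned top-level packages imported by `source` (hypothetical helper)."""
    banned = set(banned)
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports such as "from . import x" have module=None.
            names = [node.module or ""]
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root in banned:
                found.add(root)
    return sorted(found)


domain_ok = "from dataclasses import dataclass\nfrom typing import Protocol\n"
domain_bad = "import dspy\nfrom typer import Option\nimport os\n"
print(forbidden_imports(domain_ok))
print(forbidden_imports(domain_bad))
```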
11.3 Sequence Diagram: Full Run
sequenceDiagram
participant User
participant CLI as CLI (Typer)
participant UC as OptimizePromptUseCase
participant BOOT as SyntheticBootstrap
participant SYNTH as DSPySyntheticAdapter
participant LOOP as EvolutionLoop
participant EVAL as PromptEvaluator
participant LLM as DSPyLLMAdapter
participant JUDGE as DSPyJudgeAdapter
participant PROP as DSPyProposerAdapter
participant FS as YamlPersistence
User->>CLI: prometheus optimize -i in.yaml -o out.yaml
CLI->>FS: read_config("in.yaml")
FS-->>CLI: raw_config dict
CLI->>UC: execute(config)
Note over UC,SYNTH: Phase 0: Bootstrap
UC->>BOOT: run(task_desc, 20)
BOOT->>SYNTH: generate_inputs(task_desc, 20)
SYNTH->>SYNTH: dspy.ChainOfThought(GenerateSyntheticInputs)
SYNTH-->>BOOT: [20 SyntheticExample]
BOOT-->>UC: synthetic_pool
Note over UC,PROP: Phase 1: Evolution
UC->>LOOP: run(seed_prompt, pool, task_desc)
loop 30 iterations
LOOP->>BOOT: sample_minibatch(pool, 5)
BOOT-->>LOOP: [5 examples]
LOOP->>EVAL: evaluate(current_prompt, batch)
EVAL->>LLM: execute(prompt, input) ×5
LLM-->>EVAL: 5 outputs
EVAL->>JUDGE: judge_batch(task_desc, pairs)
JUDGE->>JUDGE: dspy.ChainOfThought(JudgeOutput) ×5
JUDGE-->>EVAL: [(score, feedback) ×5]
EVAL-->>LOOP: EvalResult
LOOP->>PROP: propose(prompt, trajectories)
PROP->>PROP: dspy.ChainOfThought(ProposeInstruction)
PROP-->>LOOP: new Prompt
LOOP->>EVAL: evaluate(new_prompt, batch)
EVAL->>LLM: execute(new_prompt, input) ×5
EVAL->>JUDGE: judge_batch(task_desc, pairs)
EVAL-->>LOOP: new EvalResult
alt new_score > old_score
LOOP->>LOOP: best = new_prompt
end
end
LOOP-->>UC: OptimizationState
UC-->>CLI: OptimizationResult
CLI->>FS: write_result("out.yaml", result)
CLI-->>User: ✅ Optimized prompt + metrics
11.4 Data Flow Diagram
flowchart LR
subgraph INPUT
YAML["config.yaml"]
end
subgraph GENERATION
SYNTH["Synthetic Pool<br/>20 inputs"]
end
subgraph EVAL["Evaluation Pipeline"]
EXEC["Execute<br/>(task_lm)"]
JUDGE["Judge<br/>(judge_lm)"]
end
subgraph PROPOSAL
PROP["Propose<br/>(proposer_lm)"]
end
subgraph OUTPUT
RESULT["result.yaml"]
end
YAML --> SYNTH
SYNTH -->|"minibatch 5"| EXEC
EXEC -->|"outputs"| JUDGE
JUDGE -->|"scores + feedbacks"| PROP
PROP -->|"new_prompt"| EXEC
JUDGE -->|"scores"| RESULT
PROP --> RESULT
style INPUT fill:#1a1a2e,stroke:#00d2ff,color:#fff
style GENERATION fill:#0f3460,stroke:#00d2ff,color:#fff
style EVAL fill:#16213e,stroke:#e94560,color:#fff
style PROPOSAL fill:#1a1a2e,stroke:#e94560,color:#fff
style OUTPUT fill:#0f3460,stroke:#e94560,color:#fff
Section Summary
| Section | Purpose | Key files |
|---|---|---|
| Domain | Pure business core, zero dependencies | entities.py, ports.py, scoring.py |
| Application | Business orchestration through the ports | use_cases.py, bootstrap.py, evaluator.py, evolution.py |
| Infrastructure | DSPy implementation of the ports | dspy_signatures.py, dspy_modules.py, *_adapter.py |
| CLI | Typer user interface | cli/app.py |
| I/O | YAML config in, YAML result out | file_io.py |
| Tests | Pyramid: unit → integration → e2e | tests/ |
The flow: config.yaml → CLI → UseCase → Bootstrap (synthetic inputs) → EvolutionLoop (evaluate → propose → accept) × N → result.yaml