# PROMETHEUS MVP: Detailed Technical Specification

**Version**: 0.1.0
**Stack**: Python 3.12+ · uv · DSPy · Typer
**Architecture**: Clean Architecture (hexagonal)
**Date**: 2025

---

## Table of Contents

1. [Overview & Objectives](#1-overview--objectives)
2. [Project Structure](#2-structure-du-projet)
3. [Domain Layer](#3-couche-domaine--entities--ports)
4. [Application Layer](#4-couche-application--use-cases)
5. [Infrastructure Layer](#5-couche-infrastructure--dspy-adapters)
6. [Presentation Layer (CLI)](#6-couche-présentation--cli)
7. [Core Algorithm: Detailed Pseudo-Code](#7-algorithme-central)
8. [I/O File Formats](#8-format-des-fichiers-io)
9. [Configuration & Environment](#9-configuration--environnement)
10. [Tests](#10-stratégie-de-tests)
11. [Complete Architecture Diagrams](#11-diagrammes-darchitecture)

---

## 1. Overview & Objectives

### 1.1 Problem Statement

Prompt-optimization frameworks (GEPA, TextGrad, Promptolution) all require a labeled dataset to compute a quality signal. PROMETHEUS removes this dependency by synthesizing its own test inputs and using an LLM-as-Judge as the evaluation function.

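The difference between a reference-based metric and a reference-free one can be sketched as follows. This is only an illustration: `exact_match_metric`, `judge_metric`, the `Judge` type, and `toy_judge` are hypothetical names that do not appear elsewhere in this spec, and `toy_judge` is a deterministic stand-in for what would be an LLM call.

```python
from typing import Callable

# Hypothetical type: a judge maps (task, input, output) to a score in [0, 1].
Judge = Callable[[str, str, str], float]


def exact_match_metric(output: str, expected: str) -> float:
    """Reference-based metric: needs a labeled `expected` answer."""
    return 1.0 if output.strip() == expected.strip() else 0.0


def judge_metric(judge: Judge, task: str, input_text: str, output: str) -> float:
    """Reference-free metric: the judge scores the output directly."""
    score = judge(task, input_text, output)
    return max(0.0, min(1.0, score))  # clamp to [0, 1]


def toy_judge(task: str, input_text: str, output: str) -> float:
    """Stand-in for an LLM call: rewards non-empty outputs shorter than the input."""
    return 1.0 if 0 < len(output) < len(input_text) else 0.0


score = judge_metric(toy_judge, "Summarize the text.", "A long input text here.", "Short.")
```

The first function cannot run without a labeled `expected` value; the second needs only the task description, the input, and the output, which is the property the rest of this spec relies on.
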
### 1.2 MVP Objectives

| # | Objective | Acceptance criterion |
|---|-----------|----------------------|
| O1 | Optimize a prompt without any labeled data | Seed prompt → improved prompt, 0 data files required |
| O2 | Simple CLI | `prometheus optimize -i config.yaml -o result.yaml` |
| O3 | Controlled budget | < 500 LLM calls for a full run |
| O4 | Reproducible | Deterministic seed, identical results given the same seed and the same model |
| O5 | Observable | Structured logging, per-iteration metrics |

### 1.3 Nominal Flow

```
┌──────────────┐     ┌───────────────┐     ┌──────────────────┐     ┌────────────┐
│  File        │     │               │     │                  │     │  File      │
│  config.yaml ├───► │  Bootstrap    ├───► │  Evolution Loop  ├───► │  output    │
│  (seed prompt│     │  (synth inputs│     │  (judge + mutate │     │  (optimized│
│  + params)   │     │  generation)  │     │  + accept)       │     │  prompt)   │
└──────────────┘     └───────────────┘     └──────────────────┘     └────────────┘
```

---

## 2. Project Structure

```
prometheus/
├── pyproject.toml                  # uv project config
├── README.md
├── specs/
│   └── technical-spec.md           # this file
│
├── src/
│   └── prometheus/
│       ├── __init__.py
│       ├── cli/                    # PRESENTATION LAYER
│       │   ├── __init__.py
│       │   └── app.py              # Typer CLI app
│       │
│       ├── domain/                 # DOMAIN LAYER (zero dependencies)
│       │   ├── __init__.py
│       │   ├── entities.py         # Dataclasses: Prompt, Candidate, EvalResult, SyntheticExample
│       │   ├── ports.py            # Abstract interfaces (ABC classes)
│       │   └── scoring.py          # Score combination logic, acceptance criteria
│       │
│       ├── application/            # APPLICATION LAYER (depends on domain only)
│       │   ├── __init__.py
│       │   ├── use_cases.py        # OptimizePromptUseCase
│       │   ├── bootstrap.py        # SyntheticBootstrap
│       │   ├── evolution.py        # EvolutionLoop, ReflectiveMutation
│       │   ├── evaluator.py        # PromptEvaluator (execution + judge)
│       │   └── dto.py              # Config & Result dataclasses
│       │
│       ├── infrastructure/         # INFRASTRUCTURE LAYER (depends on domain + application)
│       │   ├── __init__.py
│       │   ├── dspy_signatures.py  # DSPy Signature definitions
│       │   ├── dspy_modules.py     # DSPy Module implementations
│       │   ├── llm_adapter.py      # LLMAdapter (implements domain port)
│       │   ├── judge_adapter.py    # JudgeAdapter (implements domain port)
│       │   ├── proposer_adapter.py # ProposerAdapter (implements domain port)
│       │   ├── synth_adapter.py    # SyntheticGeneratorAdapter (implements domain port)
│       │   └── file_io.py          # FileReader, FileWriter
│       │
│       └── config.py               # Settings (pydantic-settings)
│
├── tests/
│   ├── unit/
│   │   ├── test_entities.py
│   │   ├── test_scoring.py
│   │   ├── test_evolution.py
│   │   └── test_bootstrap.py
│   ├── integration/
│   │   ├── test_dspy_adapters.py
│   │   └── test_full_pipeline.py
│   └── conftest.py
│
└── examples/
    ├── basic_usage.py
    └── sample_config.yaml
```

### 2.1 `pyproject.toml`

```toml
[project]
name = "prometheus"
version = "0.1.0"
description = "Prompt evolution without reference data"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
    "dspy>=2.6",
    "typer>=0.15",
    "pydantic>=2.10",
    "pydantic-settings>=2.7",
    "pyyaml>=6.0",
    "rich>=13.9",
]

[project.optional-dependencies]
dev = [
    "pytest>=8.3",
    "pytest-cov>=6.0",
    "ruff>=0.9",
    "mypy>=1.14",
]

[project.scripts]
prometheus = "prometheus.cli.app:app"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.ruff]
line-length = 100
target-version = "py312"

[tool.mypy]
python_version = "3.12"
strict = true
```

---

## 3. Domain Layer: Entities & Ports

### Objective

Define the business core with no external dependencies: no import of `dspy`, `pydantic`, or anything else outside the standard library.

### 3.1 `entities.py`

```python
"""Domain entities: pure data, zero dependencies."""
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Prompt:
    """
    Represents a candidate prompt.
    frozen=True → immutable, safe for Pareto tracking.
    """
    text: str
    # compare=False keeps the (unhashable) dict out of __eq__/__hash__,
    # so a Prompt can be hashed and used as a dict key or set member.
    metadata: dict[str, Any] = field(default_factory=dict, compare=False)

    def __len__(self) -> int:
        return len(self.text)


@dataclass(frozen=True)
class SyntheticExample:
    """
    A synthetic example: an input generated from the task description.
    No expected output; the judge evaluates the response directly.
    """
    input_text: str
    category: str = "default"  # for future stratified sampling
    id: int = 0


@dataclass
class Trajectory:
    """
    Execution trace of a prompt on one input.
    Used by the reflective mutation to understand failures.
    """
    input_text: str
    output_text: str
    score: float
    feedback: str  # textual feedback from the judge
    prompt_used: str


@dataclass
class EvalResult:
    """Result of an evaluation on a minibatch."""
    scores: list[float]
    feedbacks: list[str]
    trajectories: list[Trajectory]

    @property
    def total_score(self) -> float:
        return sum(self.scores)

    @property
    def mean_score(self) -> float:
        return sum(self.scores) / len(self.scores) if self.scores else 0.0


@dataclass
class Candidate:
    """
    A candidate in the evolution pool.
    Holds the prompt and its cumulative scores.
    """
    prompt: Prompt
    best_score: float = 0.0
    generation: int = 0  # iteration at which it was created
    parent_id: int | None = None


@dataclass
class OptimizationState:
    """Complete optimization state: a serializable snapshot."""
    iteration: int = 0
    best_candidate: Candidate | None = None
    candidates: list[Candidate] = field(default_factory=list)
    synthetic_pool: list[SyntheticExample] = field(default_factory=list)
    history: list[dict[str, Any]] = field(default_factory=list)
    total_llm_calls: int = 0
```

### 3.2 `ports.py`

```python
"""
Domain ports: abstract interfaces implemented by the infrastructure.
Uses abstract base classes (ABC), so adapters subclass the port explicitly.
"""
from __future__ import annotations

from abc import ABC, abstractmethod

from prometheus.domain.entities import Prompt, SyntheticExample, Trajectory


class LLMPort(ABC):
    """
    Port for executing a prompt on an input.
    The infrastructure will provide an implementation via DSPy.
    """

    @abstractmethod
    def execute(self, prompt: Prompt, input_text: str) -> str:
        """Runs the prompt on the input and returns the raw response."""
        ...


class JudgePort(ABC):
    """
    Port for LLM-as-Judge evaluation.
    Takes (input, output) pairs plus the task description.
    Returns a score and textual feedback for each pair.
    """

    @abstractmethod
    def judge_batch(
        self,
        task_description: str,
        pairs: list[tuple[str, str]],
    ) -> list[tuple[float, str]]:
        """
        Evaluates a batch of (input, output) pairs.
        Returns a list of (score, feedback).
        """
        ...


class ProposerPort(ABC):
    """
    Port for proposing a new prompt.
    Uses the evaluation trajectories to propose an improvement.
    """

    @abstractmethod
    def propose(
        self,
        current_prompt: Prompt,
        trajectories: list[Trajectory],
        task_description: str,
    ) -> Prompt:
        """Proposes a new prompt based on the failure trajectories."""
        ...


class SyntheticGeneratorPort(ABC):
    """
    Port for generating synthetic inputs.
    """

    @abstractmethod
    def generate_inputs(
        self,
        task_description: str,
        n_examples: int,
    ) -> list[SyntheticExample]:
        """Generates N diverse synthetic inputs."""
        ...


class PersistencePort(ABC):
    """Port for reading and writing files."""

    @abstractmethod
    def read_config(self, path: str) -> dict:
        ...

    @abstractmethod
    def write_result(self, path: str, data: dict) -> None:
        ...
```

### 3.3 `scoring.py`

```python
"""Scoring logic and acceptance criteria: pure domain."""
from prometheus.domain.entities import EvalResult


def should_accept(
    old_result: EvalResult,
    new_result: EvalResult,
    min_improvement: float = 0.0,
) -> bool:
    """
    Strict acceptance criterion.
    The new candidate must strictly improve the total score
    (by more than min_improvement).
    """
    return new_result.total_score > old_result.total_score + min_improvement


def normalize_score(raw: float, min_val: float = 0.0, max_val: float = 1.0) -> float:
    """Clamps a score to [min_val, max_val]."""
    return max(min_val, min(max_val, raw))
```

---

## 4. Application Layer: Use Cases

### Objective

Orchestrate the business logic using only the domain ports. This layer never depends on the concrete infrastructure.

### 4.1 `dto.py`

```python
"""Data Transfer Objects: configuration and results."""
from dataclasses import dataclass, field


@dataclass
class OptimizationConfig:
    """Complete configuration of a PROMETHEUS run."""
    # --- Prompt ---
    seed_prompt: str
    task_description: str

    # --- Models ---
    task_model: str = "openai/gpt-4o-mini"
    judge_model: str = "openai/gpt-4o"
    proposer_model: str = "openai/gpt-4o"
    synth_model: str = "openai/gpt-4o"

    # --- Evolution parameters ---
    max_iterations: int = 30
    n_synthetic_inputs: int = 20
    minibatch_size: int = 5
    perfect_score: float = 1.0

    # --- Reproducibility ---
    seed: int = 42

    # --- Output ---
    output_path: str = "output.yaml"
    verbose: bool = False


@dataclass
class OptimizationResult:
    """Result of a complete optimization."""
    optimized_prompt: str
    initial_prompt: str
    iterations_used: int
    total_llm_calls: int
    initial_score: float
    final_score: float
    improvement: float
    history: list[dict] = field(default_factory=list)
```

### 4.2 `bootstrap.py`

```python
"""
Bootstrap: synthetic input generation.

Goal: create a pool of test inputs from the task description.
This replaces the labeled dataset.
"""
from __future__ import annotations

import random

from prometheus.domain.entities import SyntheticExample
from prometheus.domain.ports import SyntheticGeneratorPort


class SyntheticBootstrap:
    """
    Orchestrates the generation of synthetic inputs.
    Depends only on the abstract port, not on DSPy directly.
    """

    def __init__(self, generator: SyntheticGeneratorPort, seed: int = 42):
        self._generator = generator
        self._rng = random.Random(seed)

    def run(self, task_description: str, n_examples: int) -> list[SyntheticExample]:
        """
        Generates the synthetic pool in a single call.

        Why a single call?
        - Minimizes LLM cost (1 call instead of N)
        - The LLM can ensure diversity within one generation
        - Batching in one prompt allows better coverage
        """
        examples = self._generator.generate_inputs(task_description, n_examples)

        # Shuffle for randomization
        self._rng.shuffle(examples)
        return examples

    def sample_minibatch(
        self,
        pool: list[SyntheticExample],
        size: int,
    ) -> list[SyntheticExample]:
        """Samples a minibatch from the synthetic pool."""
        size = min(size, len(pool))
        return self._rng.sample(pool, size)
```

### 4.3 `evaluator.py`

```python
"""
Evaluator: execution + judgment.

Goal: produce a quality signal without ground truth.
Combines running the candidate prompt with an LLM-as-Judge evaluation.
"""
from __future__ import annotations

from prometheus.domain.entities import (
    EvalResult,
    Prompt,
    SyntheticExample,
    Trajectory,
)
from prometheus.domain.ports import JudgePort, LLMPort


class PromptEvaluator:
    """
    Evaluates a prompt on a minibatch of synthetic inputs.
    Pipeline: execute → judge → build the trajectories.

    This component replaces GEPA's EvaluatorFn.
    Instead of comparing against a ground truth, it uses an LLM-as-Judge.
    """

    def __init__(self, executor: LLMPort, judge: JudgePort):
        self._executor = executor
        self._judge = judge

    def evaluate(
        self,
        prompt: Prompt,
        minibatch: list[SyntheticExample],
        task_description: str,
    ) -> EvalResult:
        """
        Evaluates the prompt on the minibatch.

        Steps:
        1. Run the prompt on each input of the minibatch
        2. Judge each (input, output) pair
        3. Build the trajectories with the feedback

        Returns an EvalResult with scores, feedbacks, and trajectories.
        """
        # ── Step 1: Execution ──
        outputs: list[str] = []
        for example in minibatch:
            raw_output = self._executor.execute(prompt, example.input_text)
            outputs.append(raw_output)

        # ── Step 2: Judgment ──
        pairs = [(ex.input_text, out) for ex, out in zip(minibatch, outputs)]
        judge_results = self._judge.judge_batch(task_description, pairs)

        # ── Step 3: Build the trajectories ──
        scores: list[float] = []
        feedbacks: list[str] = []
        trajectories: list[Trajectory] = []

        # strict=True raises ValueError if the judge returns a different
        # number of results than there are inputs.
        for example, output, (score, feedback) in zip(
            minibatch, outputs, judge_results, strict=True
        ):
            scores.append(score)
            feedbacks.append(feedback)
            trajectories.append(Trajectory(
                input_text=example.input_text,
                output_text=output,
                score=score,
                feedback=feedback,
                prompt_used=prompt.text,
            ))

        return EvalResult(
            scores=scores,
            feedbacks=feedbacks,
            trajectories=trajectories,
        )
```

### 4.4 `evolution.py`

```python
"""
Evolution loop: the core of the PROMETHEUS engine.

Goal: orchestrate the select → evaluate → propose → accept cycle.
This is the equivalent of GEPAEngine.run(), adapted to work without a valset.
"""
from __future__ import annotations

from prometheus.application.bootstrap import SyntheticBootstrap
from prometheus.application.evaluator import PromptEvaluator
from prometheus.domain.entities import (
    Candidate,
    OptimizationState,
    Prompt,
    SyntheticExample,
)
from prometheus.domain.ports import ProposerPort
from prometheus.domain.scoring import should_accept


class EvolutionLoop:
    """
    Main evolution loop.
    Design:
    - Keeps only the best candidate (no full population)
    - This is much simpler than GEPA (no Pareto front, no merge)
    - If the MVP works, the population will be added in v2
    """

    def __init__(
        self,
        evaluator: PromptEvaluator,
        proposer: ProposerPort,
        bootstrap: SyntheticBootstrap,
        max_iterations: int = 30,
        minibatch_size: int = 5,
        perfect_score: float = 1.0,
        verbose: bool = False,
    ):
        self._evaluator = evaluator
        self._proposer = proposer
        self._bootstrap = bootstrap
        self._max_iterations = max_iterations
        self._minibatch_size = minibatch_size
        self._perfect_score = perfect_score
        self._verbose = verbose

    def run(
        self,
        seed_prompt: Prompt,
        synthetic_pool: list[SyntheticExample],
        task_description: str,
    ) -> OptimizationState:
        """
        Runs the complete evolution loop.

        Pseudo-code:

            state.best = Candidate(seed_prompt)
            state.best.score = evaluate(seed_prompt)

            for i in range(max_iterations):
                batch = sample_minibatch(pool)
                old_eval = evaluate(state.best.prompt, batch)

                if all perfect: continue

                new_prompt = propose(state.best.prompt, old_eval.trajectories)
                new_eval = evaluate(new_prompt, batch)

                if new_eval > old_eval:
                    state.best = Candidate(new_prompt, score=new_eval)

            return state
        """
        state = OptimizationState()

        # ── Evaluate the seed ──
        initial_batch = self._bootstrap.sample_minibatch(
            synthetic_pool, self._minibatch_size
        )
        initial_eval = self._evaluator.evaluate(
            seed_prompt, initial_batch, task_description
        )
        # Counts one call per execution plus one judge_batch call.
        # Note: the sequential DSPyJudgeAdapter (5.4) issues one LLM call
        # per pair, so actual usage with that adapter is higher.
        state.total_llm_calls += len(initial_batch) + 1

        best_candidate = Candidate(
            prompt=seed_prompt,
            best_score=initial_eval.total_score,
            generation=0,
        )
        state.best_candidate = best_candidate
        state.candidates.append(best_candidate)

        self._log(f"Initial score: {initial_eval.total_score:.2f}")

        # ── Main loop ──
        for i in range(1, self._max_iterations + 1):
            state.iteration = i

            # 1. Sample a fresh minibatch
            batch = self._bootstrap.sample_minibatch(
                synthetic_pool, self._minibatch_size
            )

            # 2. Evaluate the current candidate
            current_eval = self._evaluator.evaluate(
                best_candidate.prompt, batch, task_description
            )
            state.total_llm_calls += len(batch) + 1

            # 3. Skip if perfect
            if all(s >= self._perfect_score for s in current_eval.scores):
                self._log(f"Iter {i}: All scores perfect, skipping.")
                state.history.append({
                    "iteration": i,
                    "event": "skip_perfect",
                    "current_score": current_eval.total_score,
                })
                continue

            # 4. Propose a new prompt (reflective mutation)
            new_prompt = self._proposer.propose(
                best_candidate.prompt,
                current_eval.trajectories,
                task_description,
            )
            state.total_llm_calls += 1  # 1 proposal call

            # 5. Evaluate the new prompt on the same minibatch
            new_eval = self._evaluator.evaluate(
                new_prompt, batch, task_description
            )
            state.total_llm_calls += len(batch) + 1

            # 6. Accept or reject
            if should_accept(current_eval, new_eval):
                best_candidate = Candidate(
                    prompt=new_prompt,
                    best_score=new_eval.total_score,
                    generation=i,
                    # Index of the parent in state.candidates: stable and
                    # serializable, unlike id(), which is a memory address.
                    parent_id=len(state.candidates) - 1,
                )
                state.best_candidate = best_candidate
                state.candidates.append(best_candidate)

                self._log(
                    f"Iter {i}: ACCEPTED "
                    f"({current_eval.total_score:.2f} → {new_eval.total_score:.2f})"
                )
                state.history.append({
                    "iteration": i,
                    "event": "accepted",
                    "old_score": current_eval.total_score,
                    "new_score": new_eval.total_score,
                    "improvement": new_eval.total_score - current_eval.total_score,
                })
            else:
                self._log(
                    f"Iter {i}: REJECTED "
                    f"({new_eval.total_score:.2f} ≤ {current_eval.total_score:.2f})"
                )
                state.history.append({
                    "iteration": i,
                    "event": "rejected",
                    "old_score": current_eval.total_score,
                    "new_score": new_eval.total_score,
                })

        return state

    def _log(self, msg: str) -> None:
        if self._verbose:
            print(f"[PROMETHEUS] {msg}")
```

### 4.5 `use_cases.py`

```python
"""
Main use case: high-level orchestration.

Goal: business entry point. Coordinates bootstrap → evolution → result.
Contains no technical logic, only orchestration.
"""
from __future__ import annotations

from prometheus.application.bootstrap import SyntheticBootstrap
from prometheus.application.dto import OptimizationConfig, OptimizationResult
from prometheus.application.evaluator import PromptEvaluator
from prometheus.application.evolution import EvolutionLoop
from prometheus.domain.entities import Prompt
from prometheus.domain.ports import ProposerPort


class OptimizePromptUseCase:
    """
    The single use case of the MVP.
    Dependencies are injected through the constructor (dependency injection).
    """

    def __init__(
        self,
        evaluator: PromptEvaluator,
        proposer: ProposerPort,
        bootstrap: SyntheticBootstrap,
    ):
        self._evaluator = evaluator
        self._proposer = proposer
        self._bootstrap = bootstrap

    def execute(self, config: OptimizationConfig) -> OptimizationResult:
        """
        Complete pipeline:
        1. Bootstrap → generate the synthetic inputs
        2. Evolution → optimization loop
        3. Return the result
        """
        # ── Phase 0: Bootstrap ──
        synthetic_pool = self._bootstrap.run(
            task_description=config.task_description,
            n_examples=config.n_synthetic_inputs,
        )

        # ── Phase 1: Evolution ──
        loop = EvolutionLoop(
            evaluator=self._evaluator,
            proposer=self._proposer,
            bootstrap=self._bootstrap,
            max_iterations=config.max_iterations,
            minibatch_size=config.minibatch_size,
            perfect_score=config.perfect_score,
            verbose=config.verbose,
        )

        seed_prompt = Prompt(text=config.seed_prompt)
        state = loop.run(seed_prompt, synthetic_pool, config.task_description)

        # ── Phase 2: Result ──
        # The seed is always the first candidate, so its score is the initial
        # score. (Reading history[0]["current_score"] would return 0.0 for
        # "accepted"/"rejected" events, which do not carry that key.)
        initial_score = state.candidates[0].best_score if state.candidates else 0.0
        final_score = state.best_candidate.best_score if state.best_candidate else 0.0

        return OptimizationResult(
            optimized_prompt=(
                state.best_candidate.prompt.text
                if state.best_candidate
                else config.seed_prompt
            ),
            initial_prompt=config.seed_prompt,
            iterations_used=state.iteration,
            total_llm_calls=state.total_llm_calls + 1,  # +1 for the bootstrap
            initial_score=initial_score,
            final_score=final_score,
            improvement=final_score - initial_score,
            history=state.history,
        )
```

---

## 5. Infrastructure Layer: DSPy Adapters

### Objective

Implement the domain ports with DSPy. Each adapter wraps a `dspy.Signature` and a `dspy.Module`.

### 5.1 `dspy_signatures.py`

```python
"""
DSPy Signatures: declarative LLM contracts.

Goal: define WHAT each LLM call does, not HOW.
DSPy Signature = input_fields → output_fields + instruction.
DSPy takes care of prompting, parsing, and structuring.
"""
import dspy


class GenerateSyntheticInputs(dspy.Signature):
    """Generate diverse, realistic input examples for a given task."""

    task_description: str = dspy.InputField(
        desc="Description of the task the prompt should accomplish."
    )
    n_examples: int = dspy.InputField(
        desc="Number of examples to generate."
    )
    examples: str = dspy.OutputField(
        desc=(
            "A JSON array of strings, each being a realistic input "
            "for the task. Cover: normal cases, edge cases, long inputs, "
            "short inputs, ambiguous cases, and tricky scenarios."
        ),
    )


class JudgeOutput(dspy.Signature):
    """
    Evaluate the quality of an LLM output for a given task and input.
    Score: 0.0 (completely wrong) to 1.0 (perfect).
    Feedback: specific, actionable criticism.
    """

    task_description: str = dspy.InputField(
        desc="What the assistant is supposed to do."
    )
    input_text: str = dspy.InputField(
        desc="The input provided to the assistant."
    )
    output_text: str = dspy.InputField(
        desc="The assistant's response to evaluate."
    )
    score: float = dspy.OutputField(
        desc="Quality score from 0.0 (wrong) to 1.0 (perfect)."
    )
    feedback: str = dspy.OutputField(
        desc=(
            "Specific, actionable feedback explaining what's wrong "
            "with the output and how to improve it. Be critical."
        ),
    )


class ProposeInstruction(dspy.Signature):
    """
    Given a current prompt and examples of where it fails with feedback,
    propose an improved version of the prompt.
    The new prompt should address all the issues identified in the feedback.
    """

    current_instruction: str = dspy.InputField(
        desc="The current prompt/instruction to improve."
) task_description: str = dspy.InputField( desc="Description of the task." ) failure_examples: str = dspy.InputField( desc=( "Examples of inputs, outputs, scores, and feedback " "showing where the current instruction fails." ), ) new_instruction: str = dspy.OutputField( desc="An improved version of the instruction." ) ``` ### 5.2 `dspy_modules.py` ```python """ DSPy Modules — composition de signatures. Objectif: Orchestration déclarative des appels LLM via DSPy. """ import dspy import json class SyntheticInputGenerator(dspy.Module): """ Génère des inputs synthétiques en un seul appel batch. Utilise ChainOfThought pour une meilleure diversité. """ def __init__(self): super().__init__() self.generate = dspy.ChainOfThought(GenerateSyntheticInputs) def forward(self, task_description: str, n_examples: int): result = self.generate( task_description=task_description, n_examples=n_examples, ) # Parser le JSON array try: examples = json.loads(result.examples) except json.JSONDecodeError: # Fallback: extraire les strings du texte examples = self._parse_fallback(result.examples) return dspy.Prediction(examples=examples) @staticmethod def _parse_fallback(text: str) -> list[str]: """Extract strings from non-JSON output.""" # Tenter de trouver un JSON array dans le texte import re matches = re.findall(r'"([^"]+)"', text) return matches if matches else [text] class OutputJudge(dspy.Module): """ Juge un output unique. Sera appelé en batch par le JudgeAdapter. 
""" def __init__(self): super().__init__() self.judge = dspy.ChainOfThought(JudgeOutput) def forward(self, task_description: str, input_text: str, output_text: str): result = self.judge( task_description=task_description, input_text=input_text, output_text=output_text, ) # Parser le score (DSPy peut retourner un string) try: score = float(result.score) except (ValueError, TypeError): score = 0.5 # fallback neutre score = max(0.0, min(1.0, score)) return dspy.Prediction(score=score, feedback=result.feedback) class InstructionProposer(dspy.Module): """ Propose un nouveau prompt à partir des trajectoires d'échec. C'est l'équivalent du InstructionProposalSignature de GEPA. """ def __init__(self): super().__init__() self.propose = dspy.ChainOfThought(ProposeInstruction) def forward( self, current_instruction: str, task_description: str, failure_examples: str, ): result = self.propose( current_instruction=current_instruction, task_description=task_description, failure_examples=failure_examples, ) return dspy.Prediction(new_instruction=result.new_instruction) ``` ### 5.3 `llm_adapter.py` ```python """ Adapter: Exécution d'un prompt sur un input. Objectif: Implémenter le port LLMPort via DSPy. """ import dspy from prometheus.domain.ports import LLMPort from prometheus.domain.entities import Prompt class DSPyLLMAdapter(LLMPort): """ Exécute un prompt en utilisant dspy.Predict avec une signature simple. 
""" class _ExecuteSignature(dspy.Signature): """Execute the instruction on the given input.""" instruction: str = dspy.InputField(desc="The instruction/prompt to follow.") input_text: str = dspy.InputField(desc="The input to process.") output: str = dspy.OutputField(desc="The response following the instruction.") def __init__(self, model: str): self._predictor = dspy.Predict(self._ExecuteSignature) # Le modèle est configuré globalement via dspy.configure() # Mais on peut aussi le configurer localement si besoin def execute(self, prompt: Prompt, input_text: str) -> str: result = self._predictor( instruction=prompt.text, input_text=input_text, ) return result.output ``` ### 5.4 `judge_adapter.py` ```python """ Adapter: LLM-as-Judge. Objectif: Implémenter le port JudgePort via le DSPy OutputJudge module. """ from prometheus.domain.ports import JudgePort from prometheus.infrastructure.dspy_modules import OutputJudge class DSPyJudgeAdapter(JudgePort): """ Évalue un batch de (input, output) en appelant le Judge pour chaque paire. Optimisation future: paralléliser les appels via dspy.Parallel. Pour le MVP, on reste séquentiel. """ def __init__(self): self._judge = OutputJudge() def judge_batch( self, task_description: str, pairs: list[tuple[str, str]], ) -> list[tuple[float, str]]: results = [] for input_text, output_text in pairs: pred = self._judge( task_description=task_description, input_text=input_text, output_text=output_text, ) results.append((pred.score, pred.feedback)) return results ``` ### 5.5 `proposer_adapter.py` ```python """ Adapter: Reflective Mutation Proposer. Objectif: Implémenter le port ProposerPort via le DSPy InstructionProposer. Convertit les trajectoires en format lisible pour le LLM proposer. 
""" from prometheus.domain.ports import ProposerPort from prometheus.domain.entities import Prompt, Trajectory from prometheus.infrastructure.dspy_modules import InstructionProposer class DSPyProposerAdapter(ProposerPort): """ Utilise les trajectoires d'évaluation pour construire un "failure report" et proposer un nouveau prompt. """ def __init__(self): self._proposer = InstructionProposer() def propose( self, current_prompt: Prompt, trajectories: list[Trajectory], task_description: str, ) -> Prompt: # Formater les trajectoires en exemples d'échec failure_examples = self._format_failures(trajectories) pred = self._proposer( current_instruction=current_prompt.text, task_description=task_description, failure_examples=failure_examples, ) return Prompt(text=pred.new_instruction) @staticmethod def _format_failures(trajectories: list[Trajectory]) -> str: """ Convertit les trajectoires en un rapport textuel structuré. Format inspiré du InstructionProposalSignature de GEPA: # Example 1 ## Input ## Generated Output ## Score ## Feedback """ sections = [] for i, t in enumerate(trajectories, 1): section = ( f"# Example {i}\n" f"## Input\n{t.input_text}\n\n" f"## Generated Output\n{t.output_text}\n\n" f"## Score\n{t.score:.2f}\n\n" f"## Feedback\n{t.feedback}\n" ) sections.append(section) return "\n---\n".join(sections) ``` ### 5.6 `synth_adapter.py` ```python """ Adapter: Génération d'inputs synthétiques. Objectif: Implémenter le port SyntheticGeneratorPort via DSPy. """ from prometheus.domain.ports import SyntheticGeneratorPort from prometheus.domain.entities import SyntheticExample from prometheus.infrastructure.dspy_modules import SyntheticInputGenerator class DSPySyntheticAdapter(SyntheticGeneratorPort): """ Génère des inputs synthétiques en un seul appel batch via DSPy. 
""" def __init__(self): self._generator = SyntheticInputGenerator() def generate_inputs( self, task_description: str, n_examples: int, ) -> list[SyntheticExample]: pred = self._generator( task_description=task_description, n_examples=n_examples, ) return [ SyntheticExample( input_text=text, id=i, ) for i, text in enumerate(pred.examples[:n_examples]) ] ``` ### 5.7 `file_io.py` ```python """ File I/O — lecture/écriture des fichiers config et résultats. Objectif: Implémenter le port PersistencePort avec YAML. """ import yaml from prometheus.domain.ports import PersistencePort class YamlPersistence(PersistencePort): """Lit et écrit des fichiers YAML.""" def read_config(self, path: str) -> dict: with open(path, "r", encoding="utf-8") as f: return yaml.safe_load(f) def write_result(self, path: str, data: dict) -> None: with open(path, "w", encoding="utf-8") as f: yaml.dump(data, f, default_flow_style=False, allow_unicode=True) ``` --- ## 6. Couche Présentation — CLI ### Objectif Fournir une interface CLI simple via Typer. Point d'entrée unique: `prometheus optimize -i config.yaml -o result.yaml` ### 6.1 `config.py` ```python """ Configuration globale — pydantic-settings. Objectif: Charger la config depuis fichier + env vars + defaults. """ from __future__ import annotations from dataclasses import dataclass @dataclass class AppSettings: """Settings non-sensibles, hardcoded pour le MVP.""" app_name: str = "prometheus" version: str = "0.1.0" ``` ### 6.2 `cli/app.py` ```python """ CLI — point d'entrée utilisateur. Objectif: Interface Typer avec options -i (input) et -o (output). 
""" import typer from rich.console import Console from rich.panel import Panel from rich.table import Table import dspy from prometheus.application.dto import OptimizationConfig, OptimizationResult from prometheus.application.use_cases import OptimizePromptUseCase from prometheus.application.bootstrap import SyntheticBootstrap from prometheus.application.evaluator import PromptEvaluator from prometheus.application.evolution import EvolutionLoop from prometheus.infrastructure.file_io import YamlPersistence from prometheus.infrastructure.llm_adapter import DSPyLLMAdapter from prometheus.infrastructure.judge_adapter import DSPyJudgeAdapter from prometheus.infrastructure.proposer_adapter import DSPyProposerAdapter from prometheus.infrastructure.synth_adapter import DSPySyntheticAdapter app = typer.Typer( name="prometheus", help="🔥 PROMETHEUS — Prompt evolution without reference data.", no_args_is_help=True, ) console = Console() @app.command() def optimize( input: str = typer.Option( ..., "-i", "--input", help="Path to input YAML config file.", exists=True, readable=True, ), output: str = typer.Option( "output.yaml", "-o", "--output", help="Path to output YAML result file.", ), verbose: bool = typer.Option( False, "-v", "--verbose", help="Print detailed progress.", ), ) -> None: """ Optimize a prompt without any reference data. Usage: prometheus optimize -i config.yaml -o result.yaml """ console.print(Panel.fit( "🔥 [bold red]PROMETHEUS[/bold red] — Prompt Evolution Engine", subtitle="No reference data required", )) # ── 1. 
    # ── 1. Load the config ──
    persistence = YamlPersistence()
    raw_config = persistence.read_config(str(input))
    config = OptimizationConfig(
        seed_prompt=raw_config["seed_prompt"],
        task_description=raw_config["task_description"],
        task_model=raw_config.get("task_model", "openai/gpt-4o-mini"),
        judge_model=raw_config.get("judge_model", "openai/gpt-4o"),
        proposer_model=raw_config.get("proposer_model", "openai/gpt-4o"),
        synth_model=raw_config.get("synth_model", "openai/gpt-4o"),
        max_iterations=raw_config.get("max_iterations", 30),
        n_synthetic_inputs=raw_config.get("n_synthetic_inputs", 20),
        minibatch_size=raw_config.get("minibatch_size", 5),
        seed=raw_config.get("seed", 42),
        output_path=output,
        verbose=verbose,
    )

    console.print(f"[dim]Task: {config.task_description[:80]}...[/dim]")
    console.print(f"[dim]Seed prompt: {config.seed_prompt[:80]}...[/dim]")

    # ── 2. Configure DSPy ──
    # One LM per role. Only task_lm is wired in (as the default LM); the judge,
    # proposer and synth LMs are created but not yet passed to their adapters,
    # so in the MVP every DSPy module runs on the default LM.
    task_lm = dspy.LM(config.task_model)
    judge_lm = dspy.LM(config.judge_model)
    proposer_lm = dspy.LM(config.proposer_model)
    synth_lm = dspy.LM(config.synth_model)
    dspy.configure(lm=task_lm)  # default LM

    # ── 3. Build the adapters (dependency injection) ──
    synth_adapter = DSPySyntheticAdapter()
    llm_adapter = DSPyLLMAdapter(model=config.task_model)
    judge_adapter = DSPyJudgeAdapter()
    proposer_adapter = DSPyProposerAdapter()

    bootstrap = SyntheticBootstrap(generator=synth_adapter, seed=config.seed)
    evaluator = PromptEvaluator(executor=llm_adapter, judge=judge_adapter)

    use_case = OptimizePromptUseCase(
        evaluator=evaluator,
        proposer=proposer_adapter,
        bootstrap=bootstrap,
    )

    # ── 4. Run ──
    with console.status("[bold green]Evolving prompt..."):
        result = use_case.execute(config)

    # ── 5. Display the results ──
    _display_result(result)

    # ── 6. Save ──
    _save_result(persistence, output, result)
    console.print(f"\n[green]✅ Results saved to {output}[/green]")


def _display_result(result: OptimizationResult) -> None:
    """Print a Rich summary to the terminal."""
    console.print()
    console.print(Panel(
        f"[bold green]Optimized Prompt[/bold green]\n\n{result.optimized_prompt}",
        title="🔥 Result",
    ))

    table = Table(title="Metrics")
    table.add_column("Metric", style="cyan")
    table.add_column("Value", style="bold")
    table.add_row("Initial Score", f"{result.initial_score:.2f}")
    table.add_row("Final Score", f"{result.final_score:.2f}")
    table.add_row("Improvement", f"{result.improvement:+.2f}")
    table.add_row("Iterations", str(result.iterations_used))
    table.add_row("LLM Calls", str(result.total_llm_calls))
    console.print(table)


def _save_result(
    persistence: YamlPersistence,
    path: str,
    result: OptimizationResult,
) -> None:
    """Save the result as YAML."""
    from dataclasses import asdict

    persistence.write_result(path, asdict(result))


if __name__ == "__main__":
    app()
```

---

## 7. Core Algorithm

### Detailed Flow Diagram

```mermaid
flowchart TB
    START(["prometheus optimize<br/>-i config.yaml<br/>-o result.yaml"]) --> LOAD
    LOAD["Load config.yaml"] --> INIT_DSPY["Configure DSPy LMs"]
    INIT_DSPY --> BOOTSTRAP

    subgraph BOOTSTRAP["Phase 0: Bootstrap"]
        direction TB
        B1["DSPySyntheticAdapter<br/>.generate_inputs()"] --> B2["SyntheticInputGenerator<br/>dspy.ChainOfThought<br/>(GenerateSyntheticInputs)"]
        B2 --> B3["Synthetic input pool<br/>[input₁, input₂, ..., input₂₀]"]
    end

    B3 --> LOOP_START

    subgraph LOOP["Phase 1: Evolution Loop (×30)"]
        direction TB
        LOOP_START --> SELECT["Keep the best candidate"]
        SELECT --> SAMPLE["Bootstrap.sample_minibatch()<br/>5 random inputs"]
        SAMPLE --> EXEC

        subgraph EXEC["Evaluate Current"]
            direction TB
            E1["DSPyLLMAdapter.execute()<br/>→ 5 outputs"] --> E2["DSPyJudgeAdapter.judge_batch()<br/>→ 5 × (score, feedback)"]
            E2 --> E3["Build Trajectories<br/>(input, output, score, feedback)"]
        end

        E3 --> CHECK_PERFECT{"All scores ≥ 1.0 ?"}
        CHECK_PERFECT -->|Yes| NEXT_ITER["Skip → next iteration"]
        CHECK_PERFECT -->|No| PROPOSE

        subgraph PROPOSE["Reflective Mutation"]
            direction TB
            P1["DSPyProposerAdapter.propose()"] --> P2["Format failure report<br/>from the Trajectories"]
            P2 --> P3["InstructionProposer<br/>dspy.ChainOfThought<br/>(ProposeInstruction)"]
            P3 --> P4["new_prompt"]
        end

        PROPOSE --> EVAL_NEW

        subgraph EVAL_NEW["Evaluate New"]
            direction TB
            EN1["DSPyLLMAdapter.execute()<br/>→ 5 outputs"] --> EN2["DSPyJudgeAdapter.judge_batch()<br/>→ 5 × (score, feedback)"]
        end

        EVAL_NEW --> ACCEPT{"new_score > old_score ?"}
        ACCEPT -->|Yes| UPDATE["best_candidate = new"]
        ACCEPT -->|No| NEXT_ITER
        UPDATE --> NEXT_ITER
    end

    NEXT_ITER --> MORE{"iterations < max ?"}
    MORE -->|Yes| SELECT
    MORE -->|No| SAVE["Save output.yaml"]
    SAVE --> DONE(["✅ Done"])

    style BOOTSTRAP fill:#0f3460,stroke:#00d2ff,color:#fff
    style LOOP fill:#1a1a2e,stroke:#e94560,color:#fff
    style EXEC fill:#16213e,stroke:#00d2ff,color:#fff
    style PROPOSE fill:#16213e,stroke:#e94560,color:#fff
    style EVAL_NEW fill:#16213e,stroke:#00d2ff,color:#fff
```

### Detailed LLM Budget per Iteration

```
Typical iteration (minibatch_size=5):
┌──────────────────────────────────────┬──────────┐
│ Operation                            │ Calls    │
├──────────────────────────────────────┼──────────┤
│ Execute current (task_lm)            │ 5        │
│ Judge current (judge_lm)             │ 5        │
│ Propose new (proposer_lm)            │ 1        │
│ Execute new (task_lm)                │ 5        │
│ Judge new (judge_lm)                 │ 5        │
├──────────────────────────────────────┼──────────┤
│ TOTAL per iteration                  │ 21       │
├──────────────────────────────────────┼──────────┤
│ Bootstrap                            │ 1        │
│ 30 iterations × 21                   │ 630      │
├──────────────────────────────────────┼──────────┤
│ TOTAL MVP                            │ ~631     │
└──────────────────────────────────────┴──────────┘
```

With the default settings a run can therefore use up to about 631 LLM calls (an iteration skipped on perfect scores costs only 10). This is above the O3 target of fewer than 500 calls; at `minibatch_size=5`, at most 23 full iterations fit that budget (1 + 23 × 21 = 484).

---

## 8. I/O File Formats

### 8.1 Input: `config.yaml`

```yaml
# PROMETHEUS Configuration File
# ==================================

# Initial prompt to optimize
seed_prompt: |
  You are an expert assistant in contract analysis.
  Analyze the provided text and identify potentially unfair clauses.
  Be precise and quote the relevant passages.

# Task description (used to generate the synthetic inputs)
task_description: |
  Legal analysis of contracts to identify unfair clauses.
  The assistant must examine a contract text and flag any clause
  that could be considered unfair under French consumer law.

# LLM models (DSPy/litellm format)
task_model: "openai/gpt-4o-mini"
judge_model: "openai/gpt-4o"
proposer_model: "openai/gpt-4o"
synth_model: "openai/gpt-4o"

# Evolution parameters
max_iterations: 30
n_synthetic_inputs: 20
minibatch_size: 5
seed: 42
```

### 8.2 Output: `result.yaml`

```yaml
# PROMETHEUS Optimization Result
# ================================

optimized_prompt: |
  You are a legal analyst specializing in French consumer law.
  For each contract analyzed, apply this methodology:
  1. Identify all clauses that restrict the consumer
  2. Compare each clause against the unfairness criteria of Article L.212-1
  3. Flag unfair clauses with: the exact text, the grounds for unfairness,
     and the associated legal risk
  Be exhaustive and systematically quote the relevant passages.

initial_prompt: |
  You are an expert assistant in contract analysis.
  Analyze the provided text and identify potentially unfair clauses.
  Be precise and quote the relevant passages.

initial_score: 6.8
final_score: 8.9
improvement: 2.1
iterations_used: 30
total_llm_calls: 631

history:
  - iteration: 1
    event: "accepted"
    old_score: 1.2
    new_score: 1.8
    improvement: 0.6
  - iteration: 2
    event: "rejected"
    old_score: 1.8
    new_score: 1.5
  # ... etc
```

---

## 9. Configuration & Environment

### 9.1 Environment Variables

```bash
# Required (when using OpenAI)
export OPENAI_API_KEY="sk-..."

# Optional (when using other providers)
export ANTHROPIC_API_KEY="..."
export TOGETHER_API_KEY="..."

# Optional
export PROMETHEUS_LOG_LEVEL="INFO"  # DEBUG for detailed traces
```

### 9.2 Installation and Execution

```bash
# Installation (<repo-url> is a placeholder; the repository URL is not given here)
git clone <repo-url>
cd prometheus
uv sync

# Execution
uv run prometheus optimize -i config.yaml -o result.yaml -v

# With options
uv run prometheus optimize \
    -i examples/legal_contract.yaml \
    -o results/legal_optimized.yaml \
    --verbose
```

---

## 10. Testing Strategy

### 10.1 Test Pyramid

```
        ┌─────────────┐
        │ E2E Test    │  test_full_pipeline.py
        │ (1-2 tests) │  → Mock LLM, checks the full flow
        ├─────────────┤
        │ Integration │  test_dspy_adapters.py
        │ (3-5 tests) │  → Real DSPy signatures, mock LM
        ├─────────────┤
        │ Unit        │  test_entities.py
        │ (10+ tests) │  test_scoring.py
        │             │  test_evolution.py (with mocks)
        └─────────────┘
```

### 10.2 `tests/conftest.py`

```python
"""Shared test fixtures."""
import pytest
from unittest.mock import MagicMock

from prometheus.domain.entities import (
    Prompt, SyntheticExample, Trajectory, EvalResult, Candidate
)


@pytest.fixture
def seed_prompt():
    return Prompt(text="You are a helpful assistant. Answer the question.")


@pytest.fixture
def task_description():
    return "Answer factual questions accurately and concisely."


@pytest.fixture
def synthetic_pool():
    return [
        SyntheticExample(input_text=f"Test input {i}", id=i)
        for i in range(20)
    ]


@pytest.fixture
def mock_eval_result():
    return EvalResult(
        scores=[0.3, 0.5, 0.4, 0.6, 0.2],
        feedbacks=[
            "Incomplete answer",
            "Missing key detail",
            "Wrong format",
            "Partially correct",
            "Completely off topic",
        ],
        trajectories=[
            Trajectory(
                input_text=f"Input {i}",
                output_text=f"Output {i}",
                score=s,
                feedback=f,
                prompt_used="test prompt",
            )
            for i, (s, f) in enumerate(zip(
                [0.3, 0.5, 0.4, 0.6, 0.2],
                [
                    "Incomplete answer",
                    "Missing key detail",
                    "Wrong format",
                    "Partially correct",
                    "Completely off topic",
                ],
            ))
        ],
    )


@pytest.fixture
def mock_llm_port():
    """Mock LLMPort that returns canned responses."""
    port = MagicMock()
    port.execute.return_value = "This is a mock response."
    return port


@pytest.fixture
def mock_judge_port():
    """Mock JudgePort that returns moderate scores."""
    port = MagicMock()
    port.judge_batch.return_value = [
        (0.5, "Moderate quality, needs improvement."),
    ] * 5
    return port


@pytest.fixture
def mock_proposer_port():
    """Mock ProposerPort that returns a slightly modified prompt."""
    port = MagicMock()
    port.propose.return_value = Prompt(
        text="You are a very helpful assistant. Answer the question precisely."
    )
    return port
```

### 10.3 `tests/unit/test_evolution.py`

```python
"""Unit tests for the evolution loop — with full mocking."""
import pytest
from unittest.mock import MagicMock, patch

from prometheus.domain.entities import Prompt, SyntheticExample, EvalResult, Trajectory
from prometheus.application.evolution import EvolutionLoop
from prometheus.application.evaluator import PromptEvaluator
from prometheus.application.bootstrap import SyntheticBootstrap


class TestEvolutionLoop:
    """Tests the accept/reject logic of the evolution loop."""

    def test_accepts_improvement(self, seed_prompt, synthetic_pool, task_description,
                                 mock_llm_port, mock_judge_port, mock_proposer_port):
        """
        Scenario: the new prompt improves the score.
        Expected: the best candidate is updated.
        """
        evaluator = PromptEvaluator(mock_llm_port, mock_judge_port)
        bootstrap = MagicMock(spec=SyntheticBootstrap)
        bootstrap.sample_minibatch.return_value = synthetic_pool[:5]

        # Old eval = low scores, new eval = high scores
        old_eval = EvalResult(
            scores=[0.3, 0.4, 0.3, 0.5, 0.2],
            feedbacks=["bad"] * 5,
            trajectories=[
                Trajectory(f"input{i}", f"output{i}", s, "bad", "prompt")
                for i, s in enumerate([0.3, 0.4, 0.3, 0.5, 0.2])
            ],
        )
        new_eval = EvalResult(
            scores=[0.8, 0.9, 0.7, 0.8, 0.9],
            feedbacks=["good"] * 5,
            trajectories=[],
        )

        # evaluator.evaluate is called twice per iteration (old + new)
        evaluator.evaluate = MagicMock(side_effect=[old_eval, new_eval])

        loop = EvolutionLoop(
            evaluator=evaluator,
            proposer=mock_proposer_port,
            bootstrap=bootstrap,
            max_iterations=1,
            minibatch_size=5,
        )

        with patch.object(loop, '_log'):
            state = loop.run(seed_prompt, synthetic_pool, task_description)

        assert state.best_candidate.best_score > 0
        # The accepted candidate should carry the proposer's prompt
        assert (
            state.best_candidate.prompt.text
            == mock_proposer_port.propose.return_value.text
        )

    def test_rejects_regression(self, seed_prompt, synthetic_pool, task_description,
                                mock_llm_port, mock_judge_port, mock_proposer_port):
        """
        Scenario: the new prompt degrades the score.
        Expected: the best candidate stays unchanged.
        """
        evaluator = PromptEvaluator(mock_llm_port, mock_judge_port)
        bootstrap = MagicMock(spec=SyntheticBootstrap)
        bootstrap.sample_minibatch.return_value = synthetic_pool[:5]

        old_eval = EvalResult(
            scores=[0.7, 0.8, 0.7, 0.8, 0.9],
            feedbacks=["ok"] * 5,
            trajectories=[
                Trajectory(f"input{i}", f"output{i}", s, "ok", "prompt")
                for i, s in enumerate([0.7, 0.8, 0.7, 0.8, 0.9])
            ],
        )
        new_eval = EvalResult(
            scores=[0.2, 0.1, 0.3, 0.2, 0.1],
            feedbacks=["bad"] * 5,
            trajectories=[],
        )

        evaluator.evaluate = MagicMock(side_effect=[old_eval, new_eval])

        loop = EvolutionLoop(
            evaluator=evaluator,
            proposer=mock_proposer_port,
            bootstrap=bootstrap,
            max_iterations=1,
            minibatch_size=5,
        )

        with patch.object(loop, '_log'):
            state = loop.run(seed_prompt, synthetic_pool, task_description)

        # The seed prompt should remain the best
        assert state.best_candidate.prompt.text == seed_prompt.text

    def test_skips_perfect_scores(self, seed_prompt, synthetic_pool, task_description,
                                  mock_llm_port, mock_judge_port, mock_proposer_port):
        """
        Scenario: all scores are perfect.
        Expected: no proposal, move on to the next iteration.
        """
        evaluator = PromptEvaluator(mock_llm_port, mock_judge_port)
        bootstrap = MagicMock(spec=SyntheticBootstrap)
        bootstrap.sample_minibatch.return_value = synthetic_pool[:5]

        perfect_eval = EvalResult(
            scores=[1.0, 1.0, 1.0, 1.0, 1.0],
            feedbacks=["perfect"] * 5,
            trajectories=[
                Trajectory(f"input{i}", f"output{i}", 1.0, "perfect", "prompt")
                for i in range(5)
            ],
        )
        evaluator.evaluate = MagicMock(return_value=perfect_eval)

        loop = EvolutionLoop(
            evaluator=evaluator,
            proposer=mock_proposer_port,
            bootstrap=bootstrap,
            max_iterations=3,
            minibatch_size=5,
        )

        with patch.object(loop, '_log'):
            loop.run(seed_prompt, synthetic_pool, task_description)

        # The proposer should never have been called
        mock_proposer_port.propose.assert_not_called()
```

---

## 11. Architecture Diagrams

### 11.1 Hexagonal Architecture — Component View

```mermaid
flowchart TB
    subgraph PRESENTATION["🎯 PRESENTATION (CLI)"]
        CLI["typer CLI<br/>prometheus/cli/app.py"]
    end

    subgraph APPLICATION["⚙️ APPLICATION (Use Cases)"]
        UC["OptimizePromptUseCase"]
        BOOT["SyntheticBootstrap"]
        EVAL["PromptEvaluator"]
        EVO["EvolutionLoop"]
    end

    subgraph DOMAIN["💎 DOMAIN (Entities + Ports)"]
        ENT["Prompt<br/>SyntheticExample<br/>Trajectory<br/>EvalResult<br/>Candidate<br/>OptimizationState"]
        PORTS["LLMPort<br/>JudgePort<br/>ProposerPort<br/>SyntheticGeneratorPort<br/>PersistencePort"]
        SCORE["scoring.py"]
    end

    subgraph INFRA["🔧 INFRASTRUCTURE (DSPy)"]
        DSPY_SIG["dspy_signatures.py<br/>GenerateSyntheticInputs<br/>JudgeOutput<br/>ProposeInstruction"]
        DSPY_MOD["dspy_modules.py<br/>SyntheticInputGenerator<br/>OutputJudge<br/>InstructionProposer"]
        ADAPTERS["DSPyLLMAdapter<br/>DSPyJudgeAdapter<br/>DSPyProposerAdapter<br/>DSPySyntheticAdapter"]
        FILE_IO["YamlPersistence"]
    end

    CLI -->|"OptimizationConfig"| UC
    UC --> BOOT
    UC --> EVO
    EVO --> EVAL
    EVO -->|"ProposerPort"| ADAPTERS
    BOOT -->|"SyntheticGeneratorPort"| ADAPTERS
    EVAL -->|"LLMPort"| ADAPTERS
    EVAL -->|"JudgePort"| ADAPTERS
    ADAPTERS --> DSPY_MOD
    DSPY_MOD --> DSPY_SIG
    CLI -->|"PersistencePort"| FILE_IO

    UC -.->|"depends on"| ENT
    UC -.->|"depends on"| PORTS
    EVO -.->|"depends on"| ENT
    EVO -.->|"depends on"| SCORE
    EVAL -.->|"depends on"| ENT
    ADAPTERS -.->|"implements"| PORTS

    style PRESENTATION fill:#1a1a2e,stroke:#00d2ff,color:#fff
    style APPLICATION fill:#0f3460,stroke:#00d2ff,color:#fff
    style DOMAIN fill:#16213e,stroke:#e94560,color:#fff
    style INFRA fill:#1a1a2e,stroke:#e94560,color:#fff
```

### 11.2 Dependency Rule

```mermaid
flowchart LR
    CLI["CLI"] --> APP["Application"]
    APP --> DOMAIN["Domain"]
    INFRA["Infrastructure"] --> DOMAIN
    CLI --> INFRA

    style DOMAIN fill:#e94560,color:#fff
    style APP fill:#0f3460,color:#fff
    style INFRA fill:#1a1a2e,color:#fff
    style CLI fill:#16213e,color:#fff
```

> **Rule**: Arrows NEVER point from the Domain outward.
> The Domain knows nothing about DSPy, Typer, or YAML.
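
One way to keep this rule from eroding is a unit test that scans the domain modules for banned imports. The sketch below is illustrative only: `forbidden_imports` and `BANNED_IN_DOMAIN` are hypothetical names, not part of the project, and the check runs on source text passed as a string rather than on the real files under `prometheus/domain/`.

```python
import ast

# Assumption: packages the domain layer must never import (DSPy, Typer, PyYAML).
BANNED_IN_DOMAIN = frozenset({"dspy", "typer", "yaml"})


def forbidden_imports(source: str, banned: frozenset = BANNED_IN_DOMAIN) -> list:
    """Return the banned top-level packages imported by `source`, sorted."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            # level == 0 means an absolute import; relative imports are ignored
            names = [node.module or ""]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level in banned:
                found.add(top_level)
    return sorted(found)


clean_domain = "from dataclasses import dataclass\nfrom typing import Protocol\n"
leaky_domain = "import dspy\nfrom yaml import safe_load\n"

print(forbidden_imports(clean_domain))  # []
print(forbidden_imports(leaky_domain))  # ['dspy', 'yaml']
```

In a real test the source strings would presumably come from reading each file in the domain package, with the test failing on any non-empty result.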

### 11.3 Sequence Diagram — Full Run

```mermaid
sequenceDiagram
    participant User
    participant CLI as CLI (Typer)
    participant UC as OptimizePromptUseCase
    participant BOOT as SyntheticBootstrap
    participant SYNTH as DSPySyntheticAdapter
    participant LOOP as EvolutionLoop
    participant EVAL as PromptEvaluator
    participant LLM as DSPyLLMAdapter
    participant JUDGE as DSPyJudgeAdapter
    participant PROP as DSPyProposerAdapter
    participant FS as YamlPersistence

    User->>CLI: prometheus optimize -i in.yaml -o out.yaml
    CLI->>FS: read_config("in.yaml")
    FS-->>CLI: raw_config dict
    CLI->>UC: execute(config)

    Note over UC,SYNTH: Phase 0: Bootstrap
    UC->>BOOT: run(task_desc, 20)
    BOOT->>SYNTH: generate_inputs(task_desc, 20)
    SYNTH->>SYNTH: dspy.ChainOfThought(GenerateSyntheticInputs)
    SYNTH-->>BOOT: [20 SyntheticExample]
    BOOT-->>UC: synthetic_pool

    Note over UC,PROP: Phase 1: Evolution
    UC->>LOOP: run(seed_prompt, pool, task_desc)
    loop 30 iterations
        LOOP->>BOOT: sample_minibatch(pool, 5)
        BOOT-->>LOOP: [5 examples]

        LOOP->>EVAL: evaluate(current_prompt, batch)
        EVAL->>LLM: execute(prompt, input) ×5
        LLM-->>EVAL: 5 outputs
        EVAL->>JUDGE: judge_batch(task_desc, pairs)
        JUDGE->>JUDGE: dspy.ChainOfThought(JudgeOutput) ×5
        JUDGE-->>EVAL: [(score, feedback) ×5]
        EVAL-->>LOOP: EvalResult

        LOOP->>PROP: propose(prompt, trajectories)
        PROP->>PROP: dspy.ChainOfThought(ProposeInstruction)
        PROP-->>LOOP: new Prompt

        LOOP->>EVAL: evaluate(new_prompt, batch)
        EVAL->>LLM: execute(new_prompt, input) ×5
        EVAL->>JUDGE: judge_batch(task_desc, pairs)
        EVAL-->>LOOP: new EvalResult

        alt new_score > old_score
            LOOP->>LOOP: best = new_prompt
        end
    end

    LOOP-->>UC: OptimizationState
    UC-->>CLI: OptimizationResult
    CLI->>FS: write_result("out.yaml", result)
    CLI-->>User: ✅ Optimized prompt + metrics
```

### 11.4 Data Flow Diagram

```mermaid
flowchart LR
    subgraph INPUT
        YAML["config.yaml"]
    end

    subgraph GENERATION
        SYNTH["Synthetic Pool<br/>20 inputs"]
    end

    subgraph EVAL["Evaluation Pipeline"]
        EXEC["Execute<br/>(task_lm)"]
        JUDGE["Judge<br/>(judge_lm)"]
    end

    subgraph PROPOSAL
        PROP["Propose<br/>(proposer_lm)"]
    end

    subgraph OUTPUT
        RESULT["result.yaml"]
    end

    YAML --> SYNTH
    SYNTH -->|"minibatch 5"| EXEC
    EXEC -->|"outputs"| JUDGE
    JUDGE -->|"scores + feedbacks"| PROP
    PROP -->|"new_prompt"| EXEC
    JUDGE -->|"scores"| RESULT
    PROP --> RESULT

    style INPUT fill:#1a1a2e,stroke:#00d2ff,color:#fff
    style GENERATION fill:#0f3460,stroke:#00d2ff,color:#fff
    style EVAL fill:#16213e,stroke:#e94560,color:#fff
    style PROPOSAL fill:#1a1a2e,stroke:#e94560,color:#fff
    style OUTPUT fill:#0f3460,stroke:#e94560,color:#fff
```

---

## Section Summary

| Section | Purpose | Key files |
|---------|---------|-----------|
| **Domain** | Pure business core, zero dependencies | `entities.py`, `ports.py`, `scoring.py` |
| **Application** | Business orchestration through the ports | `use_cases.py`, `bootstrap.py`, `evaluator.py`, `evolution.py` |
| **Infrastructure** | DSPy implementation of the ports | `dspy_signatures.py`, `dspy_modules.py`, `*_adapter.py` |
| **CLI** | Typer user interface | `cli/app.py` |
| **I/O** | YAML config in, YAML result out | `file_io.py` |
| **Tests** | Pyramid: unit → integration → e2e | `tests/` |

**The flow**: `config.yaml` → CLI → UseCase → Bootstrap (synth inputs) → EvolutionLoop (evaluate × propose × accept) × N → `result.yaml`
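
The evaluate, propose and accept cycle in this flow can be sketched with toy stand-ins. This is a minimal illustration under stated assumptions, not the project's `EvolutionLoop`: `evolve`, `LoopResult`, `toy_score` and `toy_propose` are hypothetical names, `score_fn` stands in for one execute call plus one judge call, and the call counting follows the budget table in Section 7 (1 bootstrap call, 2 calls per scored input, 1 call per proposal).

```python
import random
from dataclasses import dataclass, field


@dataclass
class LoopResult:
    best_prompt: str
    best_score: float
    llm_calls: int
    history: list = field(default_factory=list)


def evolve(seed_prompt, pool, score_fn, propose_fn,
           max_iterations=30, minibatch_size=5, seed=42) -> LoopResult:
    """Greedy loop: keep a candidate only if it beats the current best on the same minibatch."""
    rng = random.Random(seed)  # deterministic minibatch sampling
    result = LoopResult(seed_prompt, 0.0, llm_calls=1)  # 1 = bootstrap call
    for _ in range(max_iterations):
        batch = rng.sample(pool, minibatch_size)
        old_scores = [score_fn(result.best_prompt, x) for x in batch]
        result.llm_calls += 2 * minibatch_size  # execute + judge per input
        result.best_score = sum(old_scores)
        if all(s >= 1.0 for s in old_scores):
            result.history.append("skipped")  # nothing to improve on this batch
            continue
        candidate = propose_fn(result.best_prompt, old_scores)
        result.llm_calls += 1  # one proposer call
        new_scores = [score_fn(candidate, x) for x in batch]
        result.llm_calls += 2 * minibatch_size
        if sum(new_scores) > result.best_score:
            result.best_prompt, result.best_score = candidate, sum(new_scores)
            result.history.append("accepted")
        else:
            result.history.append("rejected")
    return result


def toy_score(prompt: str, _input: str) -> float:
    # Toy judge: longer prompts score higher, capped at 1.0.
    return min(1.0, len(prompt.split()) / 10)


def toy_propose(prompt: str, _scores: list) -> str:
    # Toy proposer: append a fixed instruction.
    return prompt + " Be precise."


pool = [f"input {i}" for i in range(20)]
demo = evolve("Answer the question.", pool, toy_score, toy_propose, max_iterations=6)
print(demo.history)    # ['accepted', 'accepted', 'accepted', 'accepted', 'skipped', 'skipped']
print(demo.llm_calls)  # 105

# Worst case for the budget: no iteration is skipped and nothing is accepted.
worst = evolve("Answer the question.", pool, lambda p, x: 0.5, lambda p, s: p)
print(worst.llm_calls)  # 631
```

The worst-case run reproduces the ~631 calls of the budget table, while skipped iterations cost only 10 calls each, so real runs should land at or below that figure.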