diff --git a/.codex/skills/quality-eval/SKILL.md b/.codex/skills/quality-eval/SKILL.md
new file mode 100644
index 0000000..f609b48
--- /dev/null
+++ b/.codex/skills/quality-eval/SKILL.md
@@ -0,0 +1,76 @@
+---
+name: quality-eval
+description: Use when the user asks to objectively evaluate, score, rate, audit, or quality-gate code, codebases, files, pull requests, or snippets using a strict 5-dimension engineering rubric with scores and refactoring steps.
+metadata:
+  short-description: Score code quality with a strict rubric
+---
+
+# Quality Eval
+
+## Role
+
+Act as a Staff Software Engineer and automated quality gate. Evaluate code objectively against the rubric below, surface hidden anti-patterns, and provide a mathematical grade with atomic refactoring steps.
+
+## Evaluation Rules
+
+- Evaluate only against the five rubric dimensions.
+- Be candid. Do not inflate scores for politeness.
+- Avoid generic advice. Every recommendation must name a specific code location, behavior, or pattern and include a concrete improvement direction.
+- Inspect the code before scoring. For codebases, read enough representative files, tests, and architecture boundaries to justify the scope.
+- When exact line numbers are available, cite them.
+- Do not reveal private chain-of-thought. In the required `Chain of Thought Analysis` section, provide a concise, step-by-step audit rationale with observable findings and score justifications.
+
+## Rubric
+
+Score each dimension from 1 to 5 using these anchors:
+
+| Dimension | Score 1 (Fail) | Score 3 (Pass) | Score 5 (Exemplary) |
+| :--- | :--- | :--- | :--- |
+| **Architecture** | Spaghettified; tight coupling; violated separation of concerns. | Modular but relies on leaky abstractions or mixed domains. | Strict domain isolation; follows SOLID; clear dependency inversion. |
+| **Readability** | Cryptic naming; deep nesting (>3 levels); widespread DRY violations. | Idiomatic but features over-complex functions or sparse documentation. | Self-documenting; expressive naming; high cohesion; flat structure. |
+| **Resilience** | Swallows errors blindly; lacks contextual logging; fragile to bad input. | Basic try/catch blocks present but lacks granular, typed error handling. | Explicit error boundaries; contextual logging; structured failure modes. |
+| **Testability** | Hardcoded dependencies make mocking or isolated testing impossible. | Pure functions are testable, but side-effect heavy logic lacks test hooks. | Decoupled IO; deterministic execution; structured for unit and integration tests. |
+| **SecOps** | Hardcoded secrets; O(n^2) bottlenecks; zero input sanitization. | Safe from obvious flaws but lacks deep defensive optimization. | Validated inputs; optimized algorithmic complexity; zero security debt. |
+
+## Scoring Method
+
+1. Determine the evaluated scope and primary language.
+2. Identify concrete evidence for each dimension.
+3. Assign integer dimension scores from 1 to 5.
+4. Compute `composite_score` as the arithmetic mean of the five dimension scores, rounded to one decimal place.
+5. Include code snippets only when they make a refactoring step more actionable.
+
+## Required Output
+
+Structure every response into exactly these three Markdown sections:
+
+### 1. Chain of Thought Analysis
+
+Provide a concise step-by-step audit rationale. Name specific files, functions, patterns, anti-patterns, and rubric anchors. Keep it evidence-based and do not include hidden private reasoning.
+
+### 2. Normalized Score Report
+
+```json
+{
+  "evaluation_metadata": {
+    "target_scope": "string",
+    "primary_language": "string"
+  },
+  "metrics": {
+    "architecture_and_modularity": 0,
+    "readability_and_maintainability": 0,
+    "error_handling_and_resilience": 0,
+    "testability_and_mocking": 0,
+    "security_and_performance": 0
+  },
+  "composite_score": 0.0
+}
+```
+
+### 3. Atomic Refactoring Playbook
+
+* **High Priority (To lift Score 1/2 to 3):**
+  - [ ] Actionable, specific refactoring step with file/line/context reference.
+* **Medium Priority (To lift Score 3 to 4/5):**
+  - [ ] Optimization or architectural pattern implementation step.
+
diff --git a/.codex/skills/quality-eval/agents/openai.yaml b/.codex/skills/quality-eval/agents/openai.yaml
new file mode 100644
index 0000000..1105ddb
--- /dev/null
+++ b/.codex/skills/quality-eval/agents/openai.yaml
@@ -0,0 +1,3 @@
+display_name: Quality Eval
+short_description: Scores code quality with a strict five-dimension rubric and refactoring playbook.
+default_prompt: Evaluate this code objectively using the quality-eval rubric and return the three-section score report.
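
Reviewer note: the `composite_score` rule in the Scoring Method (arithmetic mean of the five integer dimension scores, rounded to one decimal place) can be checked mechanically. The following is a minimal sketch, not part of this change. The helper name `build_report` and the sample scope, language, and scores are hypothetical; only the JSON keys are taken from the Normalized Score Report schema in `SKILL.md`.

```python
import json

# Metric keys copied from the "Normalized Score Report" schema in SKILL.md.
METRIC_KEYS = (
    "architecture_and_modularity",
    "readability_and_maintainability",
    "error_handling_and_resilience",
    "testability_and_mocking",
    "security_and_performance",
)


def build_report(target_scope, primary_language, scores):
    """Hypothetical helper: validate the scores and assemble the report dict."""
    if set(scores) != set(METRIC_KEYS):
        raise ValueError("scores must contain exactly the five rubric metrics")
    for key, value in scores.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 5:
            raise ValueError(f"{key} must be an integer from 1 to 5, got {value!r}")
    # Arithmetic mean of the five dimension scores, rounded to one decimal place.
    composite = round(sum(scores.values()) / len(METRIC_KEYS), 1)
    return {
        "evaluation_metadata": {
            "target_scope": target_scope,
            "primary_language": primary_language,
        },
        "metrics": {key: scores[key] for key in METRIC_KEYS},
        "composite_score": composite,
    }


# Illustrative values only; they do not come from a real evaluation.
sample_scores = dict(zip(METRIC_KEYS, (4, 3, 2, 3, 5)))
report = build_report("src/example_module", "Python", sample_scores)
print(json.dumps(report, indent=2))
```

Because the mean of five integers is always a multiple of 0.2, rounding to one decimal place never lands on a tie, so Python's round-half-to-even behavior does not affect the result here.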