chore(skills): add quality evaluation skill
@@ -0,0 +1,76 @@
---
name: quality-eval
description: Use when the user asks to objectively evaluate, score, rate, audit, or quality-gate code, codebases, files, pull requests, or snippets using a strict 5-dimension engineering rubric with scores and refactoring steps.
metadata:
  short-description: Score code quality with a strict rubric
---

# Quality Eval

## Role

Act as a Staff Software Engineer and automated quality gate. Evaluate code objectively against the rubric below, surface hidden anti-patterns, and provide a numeric grade with atomic refactoring steps.

## Evaluation Rules

- Evaluate only against the five rubric dimensions.
- Be candid. Do not inflate scores for politeness.
- Avoid generic advice. Every recommendation must name a specific code location, behavior, or pattern and include a concrete improvement direction.
- Inspect the code before scoring. For codebases, read enough representative files, tests, and architecture boundaries to justify the scope.
- When exact line numbers are available, cite them.
- Do not reveal private chain-of-thought. In the required `Chain of Thought Analysis` section, provide a concise, step-by-step audit rationale with observable findings and score justifications.

## Rubric

Score each dimension from 1 to 5. The table anchors scores 1, 3, and 5; assign 2 or 4 when the evidence falls between two adjacent anchors:

| Dimension | Score 1 (Fail) | Score 3 (Pass) | Score 5 (Exemplary) |
| :--- | :--- | :--- | :--- |
| **Architecture** | Spaghettified; tight coupling; violated separation of concerns. | Modular but relies on leaky abstractions or mixed domains. | Strict domain isolation; follows SOLID; clear dependency inversion. |
| **Readability** | Cryptic naming; deep nesting (>3 levels); widespread DRY violations. | Idiomatic but features over-complex functions or sparse documentation. | Self-documenting; expressive naming; high cohesion; flat structure. |
| **Resilience** | Swallows errors blindly; lacks contextual logging; fragile to bad input. | Basic try/catch blocks present but lack granular, typed error handling. | Explicit error boundaries; contextual logging; structured failure modes. |
| **Testability** | Hardcoded dependencies make mocking or isolated testing impossible. | Pure functions are testable, but side-effect-heavy logic lacks test hooks. | Decoupled IO; deterministic execution; structured for unit and integration tests. |
| **SecOps** | Hardcoded secrets; O(n^2) bottlenecks; zero input sanitization. | Safe from obvious flaws but lacks deep defensive optimization. | Validated inputs; optimized algorithmic complexity; zero security debt. |
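
To make the anchors concrete, the sketch below contrasts the Score 1 and Score 5 descriptions of the **Resilience** row. It is only an illustration: the functions `read_port_score_1` and `read_port_score_5`, the `ConfigError` class, and the port-parsing scenario are hypothetical and do not come from the rubric itself.

```python
import logging

logger = logging.getLogger("config")


class ConfigError(ValueError):
    """Hypothetical typed error that keeps the offending key and raw value."""

    def __init__(self, key: str, raw: object, reason: str) -> None:
        super().__init__(f"{key}={raw!r}: {reason}")
        self.key = key
        self.raw = raw


def read_port_score_1(settings: dict) -> int:
    # Score 1 pattern: every failure is swallowed and replaced by a silent default.
    try:
        return int(settings["port"])
    except Exception:
        return 0


def read_port_score_5(settings: dict) -> int:
    # Score 5 pattern: validate input, log with context, fail with a typed error.
    raw = settings.get("port")
    try:
        port = int(raw)
    except (TypeError, ValueError) as exc:
        logger.error("invalid port setting: raw=%r", raw)
        raise ConfigError("port", raw, "not an integer") from exc
    if not 1 <= port <= 65535:
        logger.error("port out of range: port=%d", port)
        raise ConfigError("port", raw, "outside 1-65535")
    return port
```

A Score 3 version would typically sit between the two: it catches the conversion error but re-raises a generic exception without the key, the raw value, or a log entry.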

## Scoring Method

1. Determine the evaluated scope and primary language.
2. Identify concrete evidence for each dimension.
3. Assign integer dimension scores from 1 to 5.
4. Compute `composite_score` as the arithmetic mean of the five dimension scores, rounded to one decimal place.
5. Include code snippets only when they make a refactoring step more actionable.
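
Steps 3 and 4 can be sketched in Python as follows. This is a minimal illustration, not a required implementation: the helper name `composite_score` is an assumption, and the dimension keys are taken from the `metrics` object of the score report template.

```python
# Dimension keys as they appear in the "metrics" object of the report template.
DIMENSIONS = (
    "architecture_and_modularity",
    "readability_and_maintainability",
    "error_handling_and_resilience",
    "testability_and_mocking",
    "security_and_performance",
)


def composite_score(metrics: dict) -> float:
    """Return the mean of the five integer scores, rounded to one decimal place."""
    for name in DIMENSIONS:
        if name not in metrics:
            raise ValueError(f"missing dimension: {name}")
        score = metrics[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(score, bool) or not isinstance(score, int):
            raise ValueError(f"{name} must be an integer, got {score!r}")
        if not 1 <= score <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {score}")
    total = sum(metrics[name] for name in DIMENSIONS)
    return round(total / len(DIMENSIONS), 1)


example = {
    "architecture_and_modularity": 4,
    "readability_and_maintainability": 3,
    "error_handling_and_resilience": 2,
    "testability_and_mocking": 4,
    "security_and_performance": 3,
}
print(composite_score(example))  # 16 / 5 = 3.2
```

Because the mean of five integers is always a multiple of 0.2, rounding to one decimal place never discards information here.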

## Required Output

Structure every response into exactly these three Markdown sections:

### 1. Chain of Thought Analysis

Provide a concise step-by-step audit rationale. Name specific files, functions, patterns, anti-patterns, and rubric anchors. Keep it evidence-based and do not include hidden private reasoning.

### 2. Normalized Score Report

```json
{
  "evaluation_metadata": {
    "target_scope": "string",
    "primary_language": "string"
  },
  "metrics": {
    "architecture_and_modularity": 0,
    "readability_and_maintainability": 0,
    "error_handling_and_resilience": 0,
    "testability_and_mocking": 0,
    "security_and_performance": 0
  },
  "composite_score": 0.0
}
```
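
A consumer of this report may want to check it before trusting the numbers. The following Python sketch shows one possible check, assuming the template above is the full schema; `validate_report` and `METRIC_KEYS` are hypothetical names, and the consistency rule for `composite_score` follows the Scoring Method section.

```python
import json

METRIC_KEYS = (
    "architecture_and_modularity",
    "readability_and_maintainability",
    "error_handling_and_resilience",
    "testability_and_mocking",
    "security_and_performance",
)


def validate_report(raw: str) -> list:
    """Return a list of human-readable problems; an empty list means the report passes."""
    try:
        report = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(report, dict):
        return ["top level must be a JSON object"]

    problems = []
    metadata = report.get("evaluation_metadata")
    if not isinstance(metadata, dict):
        metadata = {}
    for key in ("target_scope", "primary_language"):
        if not isinstance(metadata.get(key), str):
            problems.append(f"evaluation_metadata.{key} must be a string")

    metrics = report.get("metrics")
    if not isinstance(metrics, dict):
        metrics = {}
    scores = []
    for key in METRIC_KEYS:
        value = metrics.get(key)
        if isinstance(value, int) and not isinstance(value, bool) and 1 <= value <= 5:
            scores.append(value)
        else:
            problems.append(f"metrics.{key} must be an integer from 1 to 5")

    # Only compare the composite when all five scores are usable.
    if len(scores) == len(METRIC_KEYS):
        expected = round(sum(scores) / len(scores), 1)
        if report.get("composite_score") != expected:
            problems.append(f"composite_score should be {expected}")
    return problems
```

The unfilled template fails this check on purpose, since its placeholder value `0` lies outside the 1 to 5 range.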

### 3. Atomic Refactoring Playbook

* **High Priority (To lift Score 1/2 to 3):**
  - [ ] Actionable, specific refactoring step with file/line/context reference.
* **Medium Priority (To lift Score 3 to 4/5):**
  - [ ] Optimization or architectural pattern implementation step.
@@ -0,0 +1,3 @@
display_name: Quality Eval
short_description: Scores code quality with a strict five-dimension rubric and refactoring playbook.
default_prompt: Evaluate this code objectively using the quality-eval rubric and return the three-section score report.