Short answer: Model Evaluator is a verified OpenClaw skill for AI & LLMs. Its Trust Score is 92/100, based on source transparency, permission scope, install safety, update recency, community signal, and documentation quality.
`npx clawhub@latest install model-evaluator`
Model Evaluator provides a comprehensive framework for evaluating AI model performance. Run standardized benchmarks, create custom evaluation suites, compare models head-to-head with statistical significance testing, and track quality over time. Supports automated grading with rubrics, human preference collection, and regression detection.
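As a rough illustration of what a head-to-head comparison with significance testing can involve, here is a minimal Python sketch of a paired bootstrap over per-example scores. It is not Model Evaluator's actual implementation or API: the function name `paired_bootstrap`, its parameters, and the sample scores are all hypothetical, chosen only for this example.

```python
import random
from statistics import mean


def paired_bootstrap(scores_a, scores_b, n_resamples=2000, seed=0):
    """Compare two models scored on the same evaluation items.

    Hypothetical helper, not part of the skill. Returns a tuple of
    (observed mean difference A - B, fraction of resamples where A wins).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")

    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)

    wins = 0
    for _ in range(n_resamples):
        # Resample the paired differences with replacement.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        if mean(sample) > 0:
            wins += 1

    return mean(diffs), wins / n_resamples


if __name__ == "__main__":
    # Made-up per-example scores for two models on the same five prompts.
    model_a = [0.82, 0.75, 0.91, 0.64, 0.88]
    model_b = [0.79, 0.70, 0.93, 0.60, 0.81]
    diff, win_rate = paired_bootstrap(model_a, model_b)
    print(f"mean difference: {diff:.3f}, A wins in {win_rate:.1%} of resamples")
```

A win rate close to 1.0 (or 0.0) suggests the gap is unlikely to be noise; values near 0.5 suggest the two models are hard to tell apart on this data. The same comparison between an old and a new version of one model is one simple way to think about regression detection.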
`npm install -g clawhub@latest`
`npx clawhub@latest install model-evaluator`
`openclaw skills list`
This skill is currently classified as Verified with a low risk profile. Our reviewers inspected the SKILL.md manifest, dependency tree, declared permissions, network calls, and shell commands before publishing this score. See our editorial policy and Trust Score methodology for the full rubric.
Accuracy, fluency, relevance, factuality, latency, cost, and custom rubric scores. All are configurable per use case.
Yes — it uses LLM-as-judge with configurable rubrics and supports human preference annotation.
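To make the idea of a configurable rubric concrete, here is a minimal Python sketch that combines per-criterion scores into one weighted score. It assumes the per-criterion scores have already been produced elsewhere (by an LLM judge or a human annotator). The function `rubric_score`, the criterion names, and the weights are hypothetical and are not taken from the skill's own configuration format.

```python
def rubric_score(criterion_scores, weights):
    """Combine per-criterion scores (each in [0, 1]) into one weighted score.

    Hypothetical helper for illustration; not the skill's API.
    """
    missing = set(weights) - set(criterion_scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")

    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    # Weighted average over the criteria named in the rubric.
    weighted_sum = sum(criterion_scores[name] * w for name, w in weights.items())
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Example rubric that values accuracy three times as much as fluency.
    weights = {"accuracy": 3, "fluency": 1}
    judged = {"accuracy": 1.0, "fluency": 0.5}
    print(rubric_score(judged, weights))
```

Changing the weights per use case (for example, raising the weight on factuality for a retrieval task) changes the final score without touching the judging step.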
Run `npx clawhub@latest install model-evaluator` from any directory with Claude Code or OpenClaw installed. The skill is added to your local SKILL.md registry and is available to your agent immediately — no restart required.