...
Evaluate taxonomy quality via pointwise rubrics. Use when the user wants to assess whether a taxonomy's dimensions and categories are well-defined enough for reliable LLM labeling — e.g. "evaluate this taxonomy", "run rubrics on my taxonomy", "check taxonomy quality", "are these categories too ambiguous?", "test if this taxonomy is production-ready". Measures inter-run consistency (Cohen's kappa) across N independent labeling passes. Requires a taxonomy JSON; prompts default to a built-in 29-prompt sample.
Created by: songlin she
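
The consistency measure mentioned in the description can be illustrated with a minimal sketch. Cohen's kappa is defined for two label sequences, and the description does not say how it is aggregated across N passes, so averaging over all pairs of runs is an assumption here. The function names `cohen_kappa` and `mean_pairwise_kappa` are hypothetical and not part of the tool.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items given the same label in both runs.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each run's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both runs used one identical label throughout; kappa is otherwise undefined.
        return 1.0
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(runs):
    """Average Cohen's kappa over every pair of labeling runs (assumed aggregation)."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("need at least two runs")
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Three hypothetical labeling passes over the same four prompts.
    run_1 = ["x", "x", "y", "y"]
    run_2 = ["x", "x", "y", "y"]
    run_3 = ["x", "x", "y", "x"]
    print(mean_pairwise_kappa([run_1, run_2, run_3]))
```

A value near 1 suggests the categories are labeled the same way on every pass, while a value near 0 suggests agreement no better than chance, which may point to ambiguous category definitions.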