LLM-as-a-Judge is a method where one large language model (LLM) evaluates the output of another model using defined criteria, such as relevance, factual accuracy, helpfulness, or tone.
It’s used as a scalable alternative to manual review: you give the judge model the prompt, the generated answer, and a rubric (the scoring criteria and scale), and it returns a score, label, or comparison verdict. This is especially useful for evaluating open-ended text, where exact-match or overlap-based automatic metrics are weak or unavailable.
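A minimal sketch of what this looks like in practice, assuming the OpenAI Python SDK, an illustrative rubric, and an arbitrary model name (`gpt-4o-mini`); the judge receives the rubric, the question, and the generated answer, and returns a JSON score object:

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative rubric; real rubrics should be tailored to the task.
RUBRIC = """Rate the ANSWER to the QUESTION on a 1-5 scale for each criterion:
- relevance: does it address the question?
- factual_accuracy: are its claims correct?
- helpfulness: would it actually help the asker?
Return JSON: {"relevance": int, "factual_accuracy": int, "helpfulness": int, "rationale": str}"""

def judge(question: str, answer: str, model: str = "gpt-4o-mini") -> dict:
    """Ask the judge model to score a single answer against the rubric."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"QUESTION:\n{question}\n\nANSWER:\n{answer}"},
        ],
        response_format={"type": "json_object"},  # ask for parseable JSON
        temperature=0,  # reduce scoring variance across runs
    )
    return json.loads(response.choices[0].message.content)

if __name__ == "__main__":
    scores = judge(
        question="What causes seasons on Earth?",
        answer="Seasons are caused by the tilt of Earth's axis relative to its orbital plane.",
    )
    print(scores)
```

The same pattern extends to pairwise comparison: instead of one answer, the judge is shown two candidate answers and asked to return a verdict such as "A", "B", or "tie".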