LJPCheck: Functional Tests for Legal Judgment Prediction

Yuan Zhang, Wanhong Huang, Yi Feng, Chuanyi Li, Zhiwei Fei, Jidong Ge, Bin Luo, and Vincent Ng.
Findings of the Association for Computational Linguistics: ACL 2024, pp. 5878-5894, 2024.


Abstract

Legal Judgment Prediction (LJP) refers to the task of automatically predicting judgment results (e.g., charges, law articles, and term of penalty) given the fact description of a case. While SOTA models have achieved high accuracy and F1 scores on public datasets, existing datasets fail to evaluate specific aspects of these models (e.g., legal fairness), which significantly impact their application in real-world scenarios. Inspired by functional testing in software engineering, we introduce LJPCHECK, a suite of functional tests for LJP models, to help comprehend LJP models' behaviors and offer diagnostic insights. We illustrate the utility of LJPCHECK on five SOTA LJP models. Extensive experiments reveal vulnerabilities in these models, prompting an in-depth discussion of the underlying reasons for these shortcomings.
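To give a flavor of what a functional test for an LJP model might look like, below is a minimal, hypothetical sketch of a CheckList-style invariance test: a judgment-irrelevant perturbation (here, renaming the defendant) is applied to each fact description, and the model's predicted charge should not change. The `predict_charge` interface, the `name_swap` perturbation, and the toy data are illustrative assumptions, not the actual test functions or data released with LJPCHECK.

    # Hypothetical sketch of an invariance-style functional test for an LJP model.
    # Assumes a model exposed as a callable predict_charge(fact) -> charge label.
    from typing import Callable, List

    def name_swap(fact: str, old: str = "Zhang San", new: str = "Li Si") -> str:
        """Perturbation that should not affect the judgment: rename the defendant."""
        return fact.replace(old, new)

    def invariance_pass_rate(predict_charge: Callable[[str], str],
                             facts: List[str]) -> float:
        """Fraction of cases whose predicted charge is unchanged after the
        judgment-irrelevant perturbation (higher is better)."""
        unchanged = sum(
            predict_charge(fact) == predict_charge(name_swap(fact)) for fact in facts
        )
        return unchanged / len(facts)

    if __name__ == "__main__":
        # Toy stand-in for a trained LJP classifier, used only to make the sketch runnable.
        dummy_model = lambda fact: "theft" if "stole" in fact else "fraud"
        facts = ["Zhang San stole a bicycle.", "Zhang San forged a contract."]
        print(f"Invariance pass rate: {invariance_pass_rate(dummy_model, facts):.2f}")

Other functional tests could follow the same pattern, pairing a targeted perturbation of the fact description with an expectation on how the predicted charges, law articles, or term of penalty should (or should not) change.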

BibTeX entry

@InProceedings{Zhang+etal:24a,
  author = {Yuan Zhang and Wanhong Huang and Yi Feng and Chuanyi Li and Zhiwei Fei and Jidong Ge and Bin Luo and Vincent Ng},
  title = {LJPCheck: Functional Tests for Legal Judgment Prediction},
  booktitle = {Findings of the Association for Computational Linguistics: ACL 2024},
  pages = {5878--5894}, 
  year = 2024}