I recently did a quick language analysis of appellate case briefs to see whether any linguistic traits of a brief could predict the case outcome. I collected about 100 briefs from cases where Westlaw had both the appellee and the appellant brief. Each brief was then coded by procedural posture, word count, subjectivity, and sentiment polarity. The subjectivity and polarity analyses were both done in Python with the handy SentiWordNet lexical resource (http://sentiwordnet.isti.cnr.it/).
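SentiWordNet gives each WordNet synset a positive, a negative, and an objectivity score that sum to 1. The post doesn't say how those scores were rolled up into one polarity and one subjectivity number per brief, so the sketch below assumes a simple average over matched words: polarity as positive minus negative, and subjectivity as positive plus negative. `TOY_LEXICON` and `score_text` are hypothetical, and the scores in the lexicon are made up. A real pipeline would look words up in SentiWordNet itself (NLTK ships a reader, `nltk.corpus.sentiwordnet`) and would have to decide how to handle words with several senses.

```python
import re

# Hypothetical toy lexicon standing in for SentiWordNet: word -> (pos, neg).
# Objectivity is implicitly 1 - pos - neg. Real SentiWordNet scores synsets,
# not bare words, so a word with several senses needs an aggregation rule.
TOY_LEXICON = {
    "clearly": (0.5, 0.0),
    "erred": (0.0, 0.625),
    "correct": (0.625, 0.0),
    "unreasonable": (0.0, 0.75),
    "court": (0.0, 0.0),
}


def score_text(text, lexicon=TOY_LEXICON):
    """Return (polarity, subjectivity) averaged over words found in the lexicon.

    polarity     = mean(pos - neg), ranges from -1 to +1
    subjectivity = mean(pos + neg), ranges from 0 to 1 (i.e. 1 - objectivity)
    """
    words = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[w] for w in words if w in lexicon]
    if not hits:
        return 0.0, 0.0  # no scored words: treat as neutral and objective
    polarity = sum(pos - neg for pos, neg in hits) / len(hits)
    subjectivity = sum(pos + neg for pos, neg in hits) / len(hits)
    return polarity, subjectivity


print(score_text("The court clearly erred."))
```

On the sample sentence this gives a polarity of about -0.042 and a subjectivity of 0.375: slightly negative and mostly objective, since "court" carries no sentiment at all.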
Below are a few histograms followed by the results of a logistic regression using case outcome (i.e. whether the brief’s side won or lost on appeal) as the dependent variable.
The histogram above shows word counts for the briefs analyzed. Mean brief length was 11,805 words.
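Word counts and histogram bins take only a few lines to compute. Here is a minimal sketch, assuming each brief is a plain-text string and that a "word" is any whitespace-separated token containing a letter or digit. The counts in the usage lines are hypothetical, not this post's data, and the 2,500-word bin width is an arbitrary choice.

```python
import re
from collections import Counter
from statistics import mean


def word_count(text):
    """Count whitespace-separated tokens that contain at least one word character."""
    return len([tok for tok in text.split() if re.search(r"\w", tok)])


def histogram(counts, bin_width=2500):
    """Bin word counts into fixed-width bins; returns {bin_start: frequency}."""
    bins = Counter((c // bin_width) * bin_width for c in counts)
    return dict(sorted(bins.items()))


counts = [9800, 11200, 12500, 14000, 7300]  # hypothetical brief lengths
print(mean(counts))       # 10960
print(histogram(counts))  # {5000: 1, 7500: 1, 10000: 1, 12500: 2}
```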
The polarity histogram shows brief sentiment on a scale from -1 to +1. Briefs cluster very close to neutral, with very little variance. It would be interesting to compare these scores with other forms of writing, such as district court briefs.
The histogram above shows the distribution of subjectivity scores (0 to 1) for the briefs. There's a bit more variance here, but also a clear tendency for lawyers to avoid overly subjective language. Again, it would be interesting to compare these results with district court briefs and other forms of writing.
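One way to put a number on "very little variance" is to report each score's standard deviation alongside the share of its possible scale that the briefs actually span. The sketch below uses hypothetical scores, since the post reports only histograms and not these statistics. `spread_summary` is an illustrative helper.

```python
from statistics import mean, stdev


def spread_summary(scores, scale_min, scale_max):
    """Summarize a score distribution relative to the width of its scale."""
    width = scale_max - scale_min
    return {
        "mean": mean(scores),
        "sd": stdev(scores),  # sample standard deviation (needs >= 2 scores)
        # Fraction of the possible scale covered by the observed scores.
        "range_share": (max(scores) - min(scores)) / width,
    }


polarity_scores = [0.0, 0.1, 0.2]  # hypothetical polarity scores on the -1..+1 scale
print(spread_summary(polarity_scores, -1, 1))
```

For these made-up scores the mean and the standard deviation are both about 0.1, and the briefs use only about a tenth of the available polarity scale.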
Here are the logistic regression results, with case outcome (won or lost) as the dependent variable:
Predictor | Estimate | Std. Error | t value | p-value | Significance
(Intercept) | -0.0325 | 0.278 | -0.117 | 0.907 |
Subjectivity | 0.815 | 1.06 | 0.766 | 0.446 |
Polarity | 0.154 | 1.64 | 0.094 | 0.926 |
Word Count | 5.51e-06 | 8.11e-06 | 0.679 | 0.499 |
Procedural Posture | 0.574 | 0.0902 | 6.364 | 9.87e-09 | ***

*** p < 0.001
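For readers who want to fit this kind of model themselves, the sketch below estimates a logistic regression by Newton-Raphson, the iteratively reweighted least squares idea behind R's `glm` and statsmodels' `Logit`. It reports estimates, standard errors, Wald statistics, and two-sided p-values. It runs on synthetic data, not the briefs from this post, and `fit_logistic` is a hypothetical helper. A standard binomial fit reports normal-based z statistics (R labels that column "z value"), so its p-values can differ slightly from a table built on t values.

```python
import math

import numpy as np


def fit_logistic(X, y, max_iter=100, tol=1e-10):
    """Logistic regression by Newton-Raphson (equivalently IRLS).

    X: (n, k) predictors without an intercept column; y: 0/1 outcomes.
    Returns (coef, std_err, z, p) with the intercept in position 0.
    """
    X = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))          # predicted win probabilities
        info = X.T @ (X * (mu * (1 - mu))[:, None])    # Fisher information matrix
        step = np.linalg.solve(info, X.T @ (y - mu))   # Newton step: info^-1 @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (mu * (1 - mu))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    z = beta / se
    p = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])  # two-sided normal p
    return beta, se, z, p


# Synthetic stand-in data (NOT this post's briefs): a 0/1 posture flag that
# genuinely shifts the odds of winning, plus a subjectivity score that doesn't.
rng = np.random.default_rng(0)
n = 200
posture = rng.integers(0, 2, n)
subjectivity = rng.uniform(0.2, 0.6, n)
true_logit = -0.5 + 1.5 * posture
won = (rng.uniform(size=n) < 1 / (1 + np.exp(-true_logit))).astype(int)

coef, se, z, p = fit_logistic(np.column_stack([posture, subjectivity]), won)
for name, b, s, zz, pp in zip(["(Intercept)", "posture", "subjectivity"], coef, se, z, p):
    print(f"{name:<13} {b:8.3f} {s:8.3f} {zz:7.2f} {pp:9.2e}")
```

In practice a library fit is the safer choice. The hand-rolled version is just meant to show where each column of the table comes from: the estimate, its standard error, and their ratio.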
The takeaway here is that there appears to be no statistically significant relationship between a case's outcome and either how positive or negative a brief's language is or how subjective it is. Nor is a brief's length a useful predictor of how it will fare. The only statistically significant predictor of whether a brief's side will prevail on appeal is procedural posture, that is, whether that side won at the district court level. In this sample, appellees were more likely to lose while appellants were more likely to win.
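If the Procedural Posture estimate is read as an ordinary logistic regression coefficient (a change in log-odds), exponentiating it gives an odds ratio that's easier to talk about. The arithmetic below uses only that row's estimate and standard error, with a normal-approximation 95% interval. The log-odds reading is an assumption: the table's t-value column leaves the exact model settings unclear, and the post doesn't say which posture is coded as 1.

```python
import math

# Procedural Posture row of the regression table.
estimate, std_err = 0.574, 0.0902

# Assumption: the estimate is a log-odds coefficient from a standard logistic fit.
odds_ratio = math.exp(estimate)
z_crit = 1.96  # approximate two-sided 95% critical value of the standard normal
ci_low = math.exp(estimate - z_crit * std_err)
ci_high = math.exp(estimate + z_crit * std_err)

print(f"odds ratio {odds_ratio:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
# prints: odds ratio 1.78, 95% CI 1.49 to 2.12
```

Under that reading, a one-unit change in the posture coding multiplies a side's odds of winning by roughly 1.8.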
I think the most interesting finding of this little spring break project is how little variance there is across all of the measures. Law schools, professional norms, and court rules likely all serve to homogenize brief style and lawyerly writing. The result is that analyzing briefs with these sorts of natural language processing techniques doesn't help much in predicting whether a brief will prevail. Perhaps a much larger sample of briefs would reveal some significant effects, but no doubt they'd pale in comparison to the importance of the case's merits.