I’m currently creating a dataset of all publicly available 2018 decisions from Canada’s superior, appellate, and federal common law courts (just under 13,000 decisions):

  • The dataset will rely on decisions from 2018 rather than 2019 because appeals from 2018 decisions are more likely to have become perfected appeals, which will let me examine the relationship between dataset features and appeal rates and appeal success. Additionally, 2018 is a representative year in terms of the number of decisions each jurisdiction publishes.

  • The dataset will include more than 24 features to enable robust descriptive and predictive analysis (a hypothetical sample row is sketched after this list).

  • Quebec is omitted from the dataset because comparing common law and civil law decisions could produce misleading conclusions. That comparison, however, is worth pursuing in future research.
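
To make the feature set concrete, here is a minimal sketch of what one row of the dataset might look like. The column names and values are my illustrative assumptions, not the actual feature list:

```python
import pandas as pd

# Hypothetical schema: every column name and value below is an
# illustrative assumption, not the dataset's actual feature list.
sample = pd.DataFrame([{
    "case_id": "2018ONSC0001",       # neutral citation (hypothetical)
    "court_level": "superior",       # superior / appellate / federal
    "jurisdiction": "ON",
    "decision_length_words": 4850,
    "delivery_time_days": 62,        # hearing end to decision release
    "complexity_score": 0.41,        # e.g. a readability-based measure
    "appealed": 1,                   # 1 if a perfected appeal followed
}])
print(sample.T)
```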

I will use this dataset for both descriptive and predictive studies:

  • Descriptive. The descriptive goal is to benchmark court and judicial output: decision length, delivery time, complexity, and structure; judicial behaviour; and appeal rates. Canada currently lacks this information, so the dataset will provide a status quo benchmark for Canadian judicial decisions. Aside from legal publishing companies, no one has data on the average length, delivery time, or complexity of Canadian decisions, so any guidance to judges about acceptable or desirable length, delivery time, or complexity rests on anecdote. (A minimal sketch of these summary statistics appears after this list.)

  • Predictive. The predictive goal is determining which dataset features are statistically significant predictors of four dependent variables: decision length, delivery time, complexity, and appeal. I’ll rely on random forest machine learning models to make these predictions (see the sketch after this list).
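
Here is a minimal sketch of the descriptive side, assuming the dataset loads into a pandas DataFrame with the hypothetical columns sketched above ("decisions.csv" is a placeholder filename, not the dataset's actual location):

```python
import pandas as pd

# Placeholder path; assumes the hypothetical schema sketched earlier.
df = pd.read_csv("decisions.csv")

# Benchmark length, delivery time, and complexity by court level.
benchmarks = (
    df.groupby("court_level")[
        ["decision_length_words", "delivery_time_days", "complexity_score"]
    ]
    .agg(["mean", "median", "std"])
)
print(benchmarks)

# Appeal rate by jurisdiction.
print(df.groupby("jurisdiction")["appealed"].mean())
```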
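And a minimal sketch of the predictive side, using scikit-learn's random forest with permutation importance to rank predictors of the appeal outcome. The feature names are the same hypothetical assumptions as above, and the real models would use the full feature set:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

df = pd.read_csv("decisions.csv")  # placeholder path, as above

# Hypothetical predictors; one-hot encode the categorical columns.
X = pd.get_dummies(
    df[["court_level", "jurisdiction", "decision_length_words",
        "delivery_time_days", "complexity_score"]]
)
y = df["appealed"]  # binary outcome: did a perfected appeal follow?

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=500, random_state=42)
model.fit(X_train, y_train)

# Permutation importance on held-out data indicates which features
# actually drive the appeal predictions.
result = permutation_importance(
    model, X_test, y_test, n_repeats=20, random_state=42
)
for name, score in sorted(
    zip(X.columns, result.importances_mean), key=lambda t: -t[1]
):
    print(f"{name}: {score:.4f}")
```

The same pattern, with RandomForestRegressor in place of the classifier, would apply to the three continuous dependent variables (length, delivery time, and complexity).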