Does Semantic Search Performs Better than Lexical Search in the Task of Assisting Legal Opinion Writing?

Many of the criminal cases analysed by the Prosecution Office of the Federal District and Territories are repetitive and processing them can be streamlined by providing similar previous cases as template. We investigate the use of information retrieval techniques to enable automated identification o...

Full description

Saved in:
Bibliographic Details
Published in:2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA) pp. 680 - 685
Main Authors: de Souza Costa Pedroso, Daniel, Ladeira, Marcelo, de Paulo Faleiros, Thiago
Format: Conference Proceeding
Language:English
Published: IEEE 01-12-2019
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Many of the criminal cases analysed by the Prosecution Office of the Federal District and Territories are repetitive and processing them can be streamlined by providing similar previous cases as template. We investigate the use of information retrieval techniques to enable automated identification of similar cases and evaluate if semantic search performs better than lexical search in the task of assisting legal opinion writing. As a proof of concept, syntactic indexing (TF-IDF and BM25) and semantic indexing (Latent Semantic Indexing - LSI and Latent Dirichlet Allocation - LDA) techniques were evaluated using document collections from two public prosecutors offices. In addition, we evaluate model enrichment with the use of recorded data about the cases, and also with the legal norm citations observed in documents. Baseline document collections sampled from full document collection from two public prosecutors offices were used for model evaluation utilizing Normalized Discounted Cumulated Gain (NDCG) as metric. We conclude that there is no significant performance difference between semantic and syntactic indexing techniques. In addition, we observe no significant performance gain with model enrichment. We chose the BM25 technique as more adequate because it has a good balance between performance and simplicity.
DOI:10.1109/ICMLA.2019.00123