The legal domain is a challenging focus of attention for scholars in computer science and related fields, as it lends itself to a unique blend of research opportunities at the convergence not only with law and jurisprudence, but also with the humanities, linguistics, social sciences, economics, cognitive psychology, and other disciplines. This has long been witnessed by a number of venues for developing and publishing computer-science research applied to the legal domain, for which the volume of data of interest is rapidly growing, thanks also to the Internet and online media platforms. Moreover, recent breakthroughs in data science, machine learning, and cybersecurity have unveiled a range of new opportunities and solutions for dealing with legal information sources and providing a deeper understanding of laws, legal systems, legal reasoning, and the role and impact of laws in our society. The aim of the Special Issue on “Managing, Mining and Learning in the Legal Data Domain” was to address relevant and challenging topics related to the processing, management and analysis of legal databases and text corpora, covering models, methodologies, algorithms, evaluation benchmarks and tools for the development and application of legal information systems and knowledge engineering. The Call for Papers for the Special Issue was open to all interested scholars working at the intersection of information and knowledge-based systems, artificial intelligence, data science, and legal informatics. In addition, authors who contributed to the CAiSE for Legal Documents (COUrT) Workshop, held in conjunction with the 32nd International Conference on Advanced Information Systems Engineering (CAiSE 2020), were invited to submit revised and extended versions of their workshop papers to this Special Issue.
The Special Issue on “Managing, Mining and Learning in the Legal Data Domain” includes 8 accepted articles out of 15 received manuscripts, each subjected to a peer-review process, with two or even three rounds of review for some papers. In the following, we briefly introduce the featured articles. “Legal Information Retrieval systems: State-of-the-art and open issues”, by Carlo Sansone and Giancarlo Sperlí, offers an overview of information retrieval frameworks for legal data, including a discussion of the main concepts and tasks involved and of the corresponding approaches based on natural language processing, ontology management, and deep learning. “ECHR-OD: On building an integrated open repository of legal documents for machine learning applications”, by Alexandre Quemy and Robert Wrembel, presents a novel database for storing and managing all cases judged by the European Court of Human Rights. The repository is designed to be maintained automatically and is intended to serve as a unified benchmark for comparing machine learning methods in the legal domain. The authors have released the whole data extraction, transformation, integration, and loading pipeline used to generate the benchmark data repository as open-source software. “A knowledge-centered framework for exploration and retrieval of legal documents”, by Silvana Castano, Mattia Falduti, Alfio Ferrara, and Stefano Montanelli, proposes CRIKE (CRIme Knowledge Extraction), a knowledge-based framework that supports legal knowledge extraction from a collection of legal documents, based on a reference legal ontology, LATO (Legal Abstract Term Ontology). CRIKE aims to improve the automatic recognition of new terminology useful for document exploration and retrieval, thus reducing the involvement of legal domain experts in manual annotation tasks.
“Lynx: A knowledge-based AI service platform for content processing, enrichment and analysis for the legal domain”, by Julián Moreno Schneider, Georg Rehm, Elena Montiel-Ponsoda, Victor Rodriguez-Doncel, Patricia Martin-Chozas, Maria Navas-Loro, Martin Kaltenböck, Artem Revenko, Sotirios Karampatakis, Christian Sageder, Jorge Gracia, Filippo Maganza, Ilan Kernerman, Dorielle Lonke, Andis Lagzdins, Julia Bosque Gil, Pieter Verhoeven, Elsa Gomez Diaz, and Pascual Boil Ballesteros, presents the EU-funded project Lynx, which focuses on the creation of a knowledge graph for the legal domain and its use for the semantic and linguistic processing, analysis and enrichment of legal documents. The legal knowledge graph is multilingual, so as to capture the legal landscape of multilingual Europe. The Lynx project includes three use cases (geothermal energy challenges, contract analysis, and labour law), which demonstrate the usefulness of the legal knowledge graph, the Lynx services and the service platform. “Graph-based managing and mining of processes and data in the domain of intellectual property”, by Gerd Hübscher, Verena Geist, Dagmar Auer, Andreas Ekelhart, Rudolf Mayer, Stefan Nadschläger, and Josef Küng, addresses the need to integrate knowledge work and administrative tasks in communication-intensive contexts, such as the legal domain. To this end, the authors propose a bottom-up approach that uses a continuously evolving graph of integrated data objects and tasks to model and store both static and dynamic aspects of administrative and knowledge work. The proposed method is tested in an intellectual property protection scenario.
“The GDPR enforcement fines at glance”, by Jukka Ruohonen and Kalle Hjerppe, investigates which General Data Protection Regulation (GDPR) articles are referenced in enforcement decisions, and studies how to predict the amount of enforcement fines from available metadata and text mining features extracted from the enforcement decision documents. The predictions are discussed in terms of the administrative and political aspects of the GDPR, as well as in the context of a broader debate on automatic decision-making systems used in the public sector. The authors have made their evaluation data available to the research community. “Exploiting co-occurrence networks for classification of implicit inter-relationships in legal texts”, by Emilio Sulis, Llio Humphreys, Fabiana Vernero, Ilaria Angela Amantea, Davide Audrito, and Luigi Di Caro, describes a general framework for the identification and classification of implicit relations between parts of a legal text. Based on a co-occurrence network of the law terms, a binary classification task is defined to identify the existence and the type of inter-relationships, using a bag-of-ngrams model integrated with network analysis features. “Multi-label legal document classification: A deep learning-based approach with label-attention and domain-specific pre-training”, by Dezhao Song, Andrew Vold, Kanika Madan, and Frank Schilder, presents a deep learning architecture that adopts domain-specific pre-training of a RoBERTa model, along with a label-attention mechanism for multi-label document classification and label-attended multi-task learning for handling low-frequency classes. The proposed method is tested on a newly created corpus of legal opinions and their manually labeled legal procedural postures.
In summary, the articles contributed to this Special Issue advance legal data management, mining and learning, with timely theoretical and application-oriented studies that can help enhance our understanding of problems at the convergence of databases and data mining, machine learning, natural language processing, and law. We expect that this Special Issue will trigger further developments in legal data analysis and related topics. The Guest Editors wish to thank the authors for making this Special Issue possible with their contributions, as well as the reviewers for providing timely comments and suggestions throughout a multi-stage reviewing process. We are also grateful to the staff of the Information Systems journal for their help and kind support in the production of this Special Issue.

“Managing, Mining and Learning in the Legal Data Domain” / Tagarelli, Andrea; Zumpano, Ester; Anastasiu, David C.; Calì, Andrea; Vossen, Gottfried. - In: INFORMATION SYSTEMS. - ISSN 0306-4379. - 106:(2022). [10.1016/j.is.2022.101981]

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/990719