DeTermIt! Workshop
Evaluating Text Difficulty in a Multilingual Context

21 May 2024, Turin, Italy

Co-Located with the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation LREC-COLING 2024

DeTermIt! 2024 will be an onsite event

Workshop theme

In today's interconnected world, where information dissemination knows no linguistic bounds, it is essential to ensure that knowledge is accessible to diverse audiences, regardless of their language proficiency. Automatic Text Simplification (ATS) is the process of reducing the linguistic complexity of a text to enhance its comprehensibility and readability. ATS therefore plays a pivotal role in conveying clear, unambiguous information to diverse audiences.
The DeTermIt! workshop builds upon the recent achievements of several initiatives that addressed specific areas within our realm of interest. It is aligned with the CLEF SimpleText Track, which provides appropriate reusable data and benchmarks for scientific text summarization and simplification.
DeTermIt! aims to bring together researchers and practitioners in the field of text simplification, with a particular focus on the intersection of lexicography, terminology, and keyword extraction. This workshop will explore the theoretical and practical perspectives surrounding the evaluation of text difficulty in a multilingual context, and it will serve as a platform for discussing advancements, methodologies, and applications in simplification techniques that target different linguistic nuances and audiences.
We welcome contributions that present different viewpoints on automatic text simplification, considering document genres, diverse languages, and the challenges posed by linguistic complexities in general. In particular, we encourage authors to explore theoretical elements for identifying text or lexical complexity, as well as experimental analyses for aligning text with the reading proficiency of diverse audiences. The workshop seeks contributions including, but not limited to, the following themes:

  1. Theoretical Perspectives:
    • Refinement of models and strategies for Automated Text Simplification.
    • Identification of common linguistic patterns and challenges in different languages for ATS.
    • Role of multilingual resources in simplifying complex terminology.
    • Exploration of innovative methodologies for simplifying complex terminologies without compromising meaning.
    • Study of the role of lexicography in simplifying texts, for example the development of lexicons and dictionaries tailored for simplification tasks.
  2. Practical Applications:
    • Creation of effective tools and multilingual resources for linguistic inclusivity.
    • Development and utilization of language resources like bilingual and multilingual glossaries, translation memories, and terminology databases.
    • Evaluation of machine translation and NLP techniques in text simplification across languages.
    • Analysis of practical methods to adapt domain-specific terminology for enhanced accessibility in various fields such as medicine, law, or technology.
    • Creation of lexical resources that assist in the automatic generation of simplified texts across different domains and languages.
    • Enhancement of summarization techniques by effectively identifying and prioritizing key information in simplified content.

We invite original contributions, including research papers, case studies, and system demonstrations. Submissions may include previously unpublished work or work in progress.
When submitting a paper from the START page, authors will be asked to provide essential information about resources (in a broad sense, i.e., also technologies, standards, evaluation kits, etc.) that have been used for the work described in the paper or are a new result of their research. Moreover, ELRA encourages all LREC-COLING authors to share the described LRs (data, tools, services, etc.) to enable their reuse and the replicability of experiments (including evaluation ones).
Papers must be compliant with the stylesheet adopted for the LREC-COLING conference Proceedings. Workshop Proceedings will be published on the LREC-COLING 2024 website.

Paper types.

Submissions may be of three types:

Authors must submit their papers via the SoftConf platform at the following link: DeTermIt! 2024.

Important dates

  • 10 March 2024: Submission Deadline
  • 25 March 2024: Acceptance Notification
  • 5 April 2024: Camera Ready
  • 2 May 2024: Workshop Program

General Chairs

Giorgio Maria Di Nunzio - Università degli Studi di Padova, Italy
Federica Vezzani - Università degli Studi di Padova, Italy
Liana Ermakova - Université de Bretagne Occidentale, France
Hosein Azarbonyad - Elsevier, The Netherlands
Jaap Kamps - University of Amsterdam, The Netherlands

Scientific committee

Florian Boudin - Nantes University, France
Lynne Bowker - University of Ottawa, Canada
Sara Carvalho - Universidade NOVA de Lisboa / Universidade de Aveiro, Portugal
Rute Costa - Universidade NOVA de Lisboa, Portugal
Giorgio Maria Di Nunzio - Università degli Studi di Padova, Italy
Eric Gaussier - University Grenoble Alpes, France
Natalia Grabar - CNRS, France
Rodolfo Maslias - Head of Terminology Coordination, European Parliament (2008-2022), Luxembourg
Ana Ostroški Anić - Institute of Croatian Language and Linguistics, Croatia
Horacio Saggion - University Pompeu Fabra, Spain
Grigorios Tsoumakas - Aristotle University of Thessaloniki, Greece
Sara Vecchiato - University of Udine, Italy
Federica Vezzani - Università degli Studi di Padova, Italy
Cornelia Wermuth - KU Leuven, Belgium

Conference Venue

The DeTermIt! 2024 Workshop will be hosted at the Lingotto Centro Congressi, co-located with the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024).

Keynote Speaker

We are pleased to announce the following keynote speaker:

Prof. Sara Carvalho

University of Aveiro, Aveiro, Portugal

Title: Clear Communication, Better Healthcare: Leveraging Terminological Data for Automatic Text Simplification

Abstract: Effective communication lies at the heart of quality healthcare delivery. Yet, the complexity of medical terminology often creates barriers in patient-healthcare provider interactions and may hamper patient engagement. Automatic Text Simplification (ATS) has emerged as a promising approach for improving the readability and understanding of medical texts. However, the success of ATS systems relies on consistent and structured terminological data. Drawing upon a double-dimensional approach to terminology work, this talk explores how the systematic representation, organisation, and sharing of terminological data in healthcare can contribute to the development of ATS tools that can better address the unique needs of healthcare communication.
While there are still many challenges to overcome in this regard (namely ambiguity, polysemy, and terminological variation, along with conceptual multidimensionality and interoperability), there is undeniable potential in leveraging terminological data to help advance ATS tools in this subject field, for example by tailoring outputs to health literacy levels and background knowledge, enhancing the training of machine learning models, and improving the precision of simplification algorithms.
By underscoring the tangible benefits of incorporating terminological data and its underlying organisation principles into the development pipeline of ATS tools, this talk highlights the broader implications of clear communication in healthcare, emphasising its role in improving health literacy, fostering more effective patient-healthcare provider interactions, enhancing patient satisfaction and engagement, and ultimately driving better healthcare outcomes.

Biography: Sara Carvalho is an Assistant Professor at the Department of Languages and Cultures, University of Aveiro, where she teaches courses in the fields of terminology, specialised translation, English and German linguistics, and technical communication.
She holds a PhD in Linguistics, with a specialisation in Lexicology, Lexicography and Terminology, from the Faculty of Social Sciences and Humanities – Universidade NOVA de Lisboa (NOVA FCSH). Her thesis, entitled “A terminological approach to knowledge organisation within the scope of endometriosis: the EndoTerm project”, was developed within the scope of a co-tutelle agreement between the Universidade NOVA de Lisboa and the Communauté Université Grenoble Alpes. She holds an MA in German Studies – specialisation in German Linguistics – from the University of Aveiro (UA), and graduated in Modern Languages and Literature (English and German Studies) at the Faculty of Arts and Humanities – University of Coimbra.
She is a researcher at the Languages, Literatures and Cultures Research Centre of the University of Aveiro (CLLC-UA) and at the Linguistics Research Centre of the Universidade NOVA de Lisboa (NOVA CLUNL). In addition, she is a member of ISO/TC 37 "Language and terminology" and of the Portuguese mirror committee "CT 221 – Terminologia, Língua e Linguagens" at IPQ. She is also part of COST Action 18209 - European network for Web-centred linguistic data science, where she currently leads Working Group 4 (Use cases and applications).

Program Outline

Tuesday 21 May 2024

08:30 - 09:00
09:00 - 09:10
Opening and welcome
09:10 - 10:00
Keynote speaker: Sara Carvalho, University of Aveiro, Aveiro, Portugal
Clear Communication, Better Healthcare: Leveraging Terminological Data for Automatic Text Simplification
10:00 - 10:30
Session 1 (short papers: 12 mins presentation + 3 mins questions)
10:00 - 10:15
Plain Language Summarization of Clinical Trials
Polydoros Giannouris, Theodoros Myridis, Tatiana Passali and Grigorios Tsoumakas
10:15 - 10:30
Pre-Gamus: Reducing Complexity of Scientific Literature as a Support against Misinformation
Nico Colic, Jin-Dong Kim and Fabio Rinaldi
10:30 - 11:00
Coffee break
11:00 - 13:00
Session 2 (long papers: 15 mins presentation + 5 mins questions)
11:00 - 11:20
Reproduction of German Text Simplification Systems
Regina Stodden
11:20 - 11:40
Simplification Strategies in French Spontaneous Speech
Lucía Ormaechea, Nikos Tsourakis, Didier Schwab, Pierrette Bouillon and Benjamin Lecouteux
11:40 - 12:00
Towards Automatic Finnish Text Simplification
Anna Dmitrieva and Jörg Tiedemann
12:00 - 12:20
Complexity-Aware Scientific Literature Search: Searching for Relevant and Accessible Scientific Text
Liana Ermakova and Jaap Kamps
12:20 - 12:40
DARES: Dataset for Arabic Readability Estimation of School Materials
Mo El-Haj, Sultan Almujaiwel, Damith Premasiri, Tharindu Ranasinghe and Ruslan Mitkov
12:40 - 13:00
Legal Text Reader Profiling: Evidences from Eye Tracking and Surprisal Based Analysis
Calogero J. Scozzaro, Davide Colla, Matteo Delsanto, Antonio Mastropaolo, Enrico Mensa, Luisa Revelli and Daniele P. Radicioni
13:00 - 14:00
Lunch break
14:00 - 16:00
Session 3 (long papers: 15 mins presentation + 5 mins questions)
14:00 - 14:20
Beyond Sentence-level Text Simplification: Reproducibility Study of Context-Aware Document Simplification
Jan Bakker and Jaap Kamps
14:20 - 14:40
A Multilingual Survey of Recent Lexical Complexity Prediction Resources through the Recommendations of the Complex 2.0 Framework
Matthew Shardlow, Kai North and Marcos Zampieri
14:40 - 15:00
LARGEMED: A Resource for Identifying and Generating Paraphrases for French Medical Terms
Ioana Buhnila and Amalia Todirascu
15:00 - 15:20
Enhancing Lexical Complexity Prediction through Few-shot Learning with Gpt-3
Jenny Alexandra Ortiz-Zambrano, César Humberto Espín-Riofrío and Arturo Montejo-Ráez
15:20 - 15:40
Clearer Governmental Communication: Text Simplification with ChatGPT Evaluated by Quantitative and Qualitative Research
Nadine Beks van Raaij, Daan Kolkman and Ksenia Podoynitsyna
15:40 - 16:00
Simpler Becomes Harder: Do LLMs Exhibit a Coherent Behavior on Simplified Corpora?
Miriam Anschütz, Edoardo Mosca and Georg Groh
16:00 - 16:30
Coffee break
16:30 - 18:00
Session 4 (short papers: 12 mins presentation + 3 mins of questions)
16:30 - 16:45
An Approach towards Unsupervised Text Simplification on Paragraph-Level for German Texts
Leon Fruth, Robin Jegan and Andreas Henrich
16:45 - 17:00
Legal Science and Computer Science: A Preliminary Discussions on How to Represent the "Penumbra" Cone with AI
Angela Condello and Giorgio Maria Di Nunzio
17:00 - 17:15
The Simplification of the Language of Public Administration: The Case of Ombudsman Institutions
Gabriel Gonzalez-Delgado and Borja Navarro-Colorado
17:15 - 17:30
Term Variation in Institutional Languages: Degrees of Specialization in Municipal Waste Management Terminology
Nicola Cirillo and Daniela Vellutino
17:30 - 18:00
