Special Day on Equity, Diversity and Inclusion

Promoting Inclusivity through Natural Language Processing

Speaker: Moreno La Quatra
Pronouns: He/Him/His

moreno.laquatra@unikore.it
https://mlaquatra.me

Moreno La Quatra, Kore University of Enna @ KDD 2024

Outline

  1. What is Inclusive Language
  2. Biases in Language Models
  3. Multilingual Inclusivity
  4. The E-MIMIC Project
  5. The Inclusively Tool

What is Inclusive Language?

Inclusive language avoids the use of expressions that might be considered to exclude particular groups of people, especially those already marginalized.

Goal: To ensure that communication is fair, respectful, and does not exclude anyone based on gender, race, disability, or any other characteristic.


Why is Inclusive Language Important?

  • Promotes Equality: Everyone should feel included and respected.
  • Prevents Discrimination: Avoids reinforcing stereotypes.
  • Legal Compliance: Required in legal, academic, and corporate settings.

Inclusive language goes beyond political correctness; it's about making sure our communication truly reflects the diversity and equality we value in society.


Examples of Inclusive Language in English

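| Category   | Non-Inclusive         | Inclusive                             |
|------------|-----------------------|---------------------------------------|
| Race       | Third World Countries | Low-income countries                  |
| LGBTQ+     | Homosexual            | LGBTQ+ individual                     |
| Disability | Mentally ill          | Person with a mental health condition |
| Gender     | Chairman              | Chairperson                           |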

University of Edinburgh Inclusive Language Guide, https://blogs.ed.ac.uk/website-communications/inclusive-language-guide/


Challenges in Gendered Languages

Gendered languages like Italian, Spanish, and French inherently include gender in their grammar, making inclusivity more complex.

🇮🇹 I cittadini sono responsabili per il loro paese.
The male citizens are responsible for their country.

🇫🇷 Les étudiants doivent soumettre leurs devoirs.
The male students must submit their assignments.

🇪🇸 Los doctores están en la sala de conferencias.
The male doctors are in the conference room.


Non-Inclusive Language in AI Models

  • Language models are often trained on vast datasets that include non-inclusive, biased language.
  • Impact: These models, widely used in content creation, reinforce and propagate biases across various platforms.

[Figure: samples generated using Microsoft's Phi 3 Instruct model.]


The Need for Inclusive Language Models

AI models are widely used, so the biases they carry have real-world consequences, making it essential to address them [2][3].

  • Retrain models with inclusive datasets covering all aspects of diversity (gender, race, disability, etc.).
  • Design methods to detect and correct non-inclusive language.
  • Incorporate continuous feedback to enhance model inclusivity.

Instead of retraining models, this talk takes a different perspective.

Leverage the linguistic knowledge and writing abilities of language models to detect and rewrite non-inclusive language in text.


Training LMs to Detect and Correct Biases

The goal is to specialize language models to detect and correct non-inclusive language. This involves two main stages:

  1. Detection: Identifying biased or non-inclusive expressions.
  2. Rewriting: Suggesting more inclusive alternatives.
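
The two stages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the talk's approach specializes language models, whereas the sketch swaps in a tiny lookup table built from the English examples shown earlier. The names `LEXICON`, `Finding`, `detect`, and `rewrite` are hypothetical and not part of E-MIMIC or Inclusively.

```python
import re
from dataclasses import dataclass

# Toy lexicon taken from the English examples table in this talk.
# Assumption: a lookup table stands in for the trained models.
LEXICON = {
    "chairman": "chairperson",
    "third world countries": "low-income countries",
}


@dataclass
class Finding:
    start: int        # character offset where the expression begins
    end: int          # character offset where it ends (exclusive)
    text: str         # the matched expression as written
    suggestion: str   # proposed inclusive alternative


def detect(sentence: str) -> list[Finding]:
    """Stage 1: locate non-inclusive expressions (case-insensitive)."""
    findings = []
    for term, suggestion in LEXICON.items():
        pattern = rf"\b{re.escape(term)}\b"
        for m in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            findings.append(Finding(m.start(), m.end(), m.group(0), suggestion))
    return sorted(findings, key=lambda f: f.start)


def rewrite(sentence: str, findings: list[Finding]) -> str:
    """Stage 2: apply suggestions right to left so earlier offsets stay valid."""
    for f in sorted(findings, key=lambda f: f.start, reverse=True):
        sentence = sentence[:f.start] + f.suggestion + sentence[f.end:]
    return sentence


sentence = "The chairman must approve the budget"
print(rewrite(sentence, detect(sentence)))
# prints: The chairperson must approve the budget
```

Keeping detection and rewriting as separate functions mirrors the two stages: detections can be shown to a user for review before any rewrite is applied.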


The E-MIMIC Project

E-MIMIC (Empowering Multilingual Inclusive Communication) develops AI tools that promote inclusivity across multiple languages [4].


E-MIMIC Overview

  • 🌍 Multilingual Challenge: Addressing inclusivity across languages with different grammatical structures.
  • 💾 Data Collection: Annotating large corpora of text in multiple languages to identify non-inclusive language patterns.
  • 🤖 Model Development: Designing, training, and evaluating AI models.

Data Annotation in E-MIMIC

Expert-annotated data is crucial for training and evaluating models. The E-MIMIC project employs a multi-faceted annotation approach to capture the complexity of language use.

  • Human-in-the-Loop: Experts annotate texts to identify non-inclusive language.
  • Multi-Faceted Annotations: Beyond inclusiveness, annotations also cover legal constraints, communication type (web, legal, administrative, academic), and other relevant aspects.

Sentence: "The chairman must approve the budget"

  • Class of Inclusiveness:
    ☐ Inclusive
    ☐ Not-pertinent
    ☐ Non-inclusive

  • Intended Context of Use:
    ☐ Standard
    ☐ Specialized
    ☐ Informative/Educational

  • Discourse Type or Genre:
    ☐ Legal
    ☐ Administrative
    ☐ Technical
    ☐ Informative/Educational

  • Reformulation:
    [Insert reformulation here]
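
One way to hold such an annotation in code is a small record type. The sketch below is an assumption, not the project's actual data format: the label values are copied from the form above, while the class names and the rule that a non-inclusive sentence must carry a reformulation are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Inclusiveness(Enum):
    INCLUSIVE = "Inclusive"
    NOT_PERTINENT = "Not-pertinent"
    NON_INCLUSIVE = "Non-inclusive"


class ContextOfUse(Enum):
    STANDARD = "Standard"
    SPECIALIZED = "Specialized"
    INFORMATIVE_EDUCATIONAL = "Informative/Educational"


class DiscourseType(Enum):
    LEGAL = "Legal"
    ADMINISTRATIVE = "Administrative"
    TECHNICAL = "Technical"
    INFORMATIVE_EDUCATIONAL = "Informative/Educational"


@dataclass
class Annotation:
    sentence: str
    inclusiveness: Inclusiveness
    context: ContextOfUse
    discourse: DiscourseType
    reformulation: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumed validation rule (not stated in the talk): a sentence
        # marked non-inclusive must come with a proposed reformulation.
        if self.inclusiveness is Inclusiveness.NON_INCLUSIVE and not self.reformulation:
            raise ValueError("non-inclusive sentences need a reformulation")


# Illustrative labels only; the context and discourse choices are guesses.
example = Annotation(
    sentence="The chairman must approve the budget",
    inclusiveness=Inclusiveness.NON_INCLUSIVE,
    context=ContextOfUse.STANDARD,
    discourse=DiscourseType.ADMINISTRATIVE,
    reformulation="The chairperson must approve the budget",
)
print(example.inclusiveness.value, "->", example.reformulation)
```

Closed enumerations keep annotators' labels consistent, and the free-text reformulation field gives the rewriting stage its training target.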


[6] La Quatra, M., Greco, S., Cagliero, L., Tonti, M., Dragotto, F., Raus, R., Cavagnoli, S., & Cerquitelli, T. (2024). "Building Foundations for Inclusiveness through Expert-Annotated Data." In EDBT/ICDT Workshops. https://ceur-ws.org/Vol-3651/DARLI-AP-3.pdf

❌ 🇪🇸 Los alumnos necesitan sus libros.
The male students need their books.

✅ 🇪🇸 El alumnado necesita sus libros.
The students need their books.


❌ 🇫🇷 Le directeur doit signer les documents.
The male director must sign the documents.

✅ 🇫🇷 La direction doit signer les documents.
The management must sign the documents.


Using neutral terms like "management" instead of gender-specific titles.


The Inclusively Tool: Overview

A writing tool designed to detect and correct non-inclusive language.

  • Text analysis for inclusivity.
  • Option to provide feedback (human-in-the-loop).
  • Model analysis and explanations for data experts.


Inclusively Tool: Writing Assistant


Inclusively Tool: Feedback


Inclusively Tool: Model Analysis


Inclusively is a tool developed within the E-MIMIC project to promote inclusive writing practices [5].

  • Color-coded proposals for users.
  • Feedback mechanism through which experts can provide suggestions.
  • Model analysis that provides insights for data experts and researchers.

🇮🇹 Working demo available
🇪🇸 🇫🇷 Coming soon



Future Directions & Call to Action

🌍 Expand Language Support
Support more languages, including less-studied ones.

🤝 Global Collaboration
Work with experts to improve inclusivity standards.

🚀 Innovate Inclusivity
Create new methods to ease the use of inclusive language.


Takeaways

🏳️‍🌈 Inclusive Language
Promotes equality and prevents discrimination.

🤖 Bias in AI Models
AI models can spread biases; fixing this is crucial.

🌍 E-MIMIC & Inclusively
AI tools for inclusive, multilingual communication.

🚀 Future Directions
Expand, collaborate, and innovate for inclusivity.


References

[1] University of Edinburgh Inclusive Language Guide, https://blogs.ed.ac.uk/website-communications/inclusive-language-guide/

[2] Bartl, M., & Leavy, S. (2024). "From ‘Showgirls’ to ‘Performers’: Fine-tuning with Gender-inclusive Language for Bias Reduction in LLMs." In Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP) (pp. 280–294). Bangkok, Thailand: Association for Computational Linguistics. https://aclanthology.org/2024.gebnlp-1.18/

[3] Gupta, V., Venkit, P. N., Wilson, S., & Passonneau, R. (2024). "Sociodemographic Bias in Language Models: A Survey and Forward Path." In Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP) (pp. 295–322). Bangkok, Thailand: Association for Computational Linguistics. https://aclanthology.org/2024.gebnlp-1.19/

[4] Attanasio, G., Greco, S., La Quatra, M., Cagliero, L., Tonti, M., Cerquitelli, T., & Raus, R. (2021). "E-mimic: Empowering multilingual inclusive communication." In 2021 IEEE International Conference on Big Data (Big Data) (pp. 4227–4234). IEEE. https://ieeexplore.ieee.org/document/9671868

[5] La Quatra, M., Greco, S., Cagliero, L., & Cerquitelli, T. (2023). "Inclusively: An AI-based Assistant for Inclusive Writing." In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 361–365). Cham: Springer Nature Switzerland. https://link.springer.com/chapter/10.1007/978-3-031-43430-3_31

[6] La Quatra, M., Greco, S., Cagliero, L., Tonti, M., Dragotto, F., Raus, R., Cavagnoli, S., & Cerquitelli, T. (2024). "Building Foundations for Inclusiveness through Expert-Annotated Data." In EDBT/ICDT Workshops. https://ceur-ws.org/Vol-3651/DARLI-AP-3.pdf

[7] Röttger, P., Vidgen, B., Hovy, D., & Pierrehumbert, J. (2022). "Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks." In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 175–190). Seattle, United States: Association for Computational Linguistics. https://aclanthology.org/2022.naacl-main.13/

[8] Piergentili, A., Savoldi, B., Fucci, D., Negri, M., & Bentivogli, L. (2023). "Hi Guys or Hi Folks? Benchmarking Gender-Neutral Machine Translation with the GeNTE Corpus." In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (pp. 14124–14140). Singapore: Association for Computational Linguistics. https://aclanthology.org/2023.emnlp-main.873/

[9] Zhu, S., Du, B., Zhao, J., Liu, Y., & Liu, P. (2024). "Do PLMs and Annotators Share the Same Gender Bias? Definition, Dataset, and Framework of Contextualized Gender Bias." In Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP) (pp. 20–32). Bangkok, Thailand: Association for Computational Linguistics. https://aclanthology.org/2024.gebnlp-1.2/

Thank you for your attention!

Speaker: Moreno La Quatra
Pronouns: He/Him/His
🏫 Institution: Kore University of Enna, Italy
✉️ Email: moreno.laquatra@unikore.it
🌐 Website: https://mlaquatra.me
🐦 Twitter: @MorenoLaQuatra


That's all, ~~guys~~ everyone! 😊

language in shaping perceptions and attitudes, and it can have a significant impact on how people are treated and perceived in society.

  • Third World Countries -> Low-income countries: "Third World" is outdated and can carry negative connotations, implying inferiority.
  • Homosexual -> LGBTQ+ individual: The term "homosexual" has clinical and outdated connotations that can be seen as offensive. "LGBTQ+ individual" is more inclusive, reflecting a broader spectrum of sexual orientations and identities.
  • Mentally ill -> Person with a mental health condition: "Mentally ill" can be stigmatizing and dehumanizing. "Person with a mental health condition" is more respectful and person-centered.
  • Chairman -> Chairperson: "Chairman" is gender-specific and excludes women and non-binary individuals.


## Beyond Bias: The Importance of Research in NLP
