Master's Thesis Proposals
Below are open thesis topics I am actively interested in supervising. Each comes with some background and a clear research direction. If something looks interesting, reach out and we can refine the scope together. You can also pitch a topic of your own at the bottom.
Emotional Text-to-Speech for Italian

Modern text-to-speech (TTS) systems can sound natural, but making them sound human is still hard. Emotional TTS synthesizes speech with the right tone and prosody for the context: a calm reading voice, an excited announcement, or an empathetic assistant. This matters a lot in 2026, as TTS is now used in audiobooks, voice assistants, accessibility tools, and customer service. Italian is still largely left out of this space.
The thesis would explore how to build or adapt an emotional TTS system for Italian, using fine-tuning or newer approaches like activation steering, which lets you control the emotional tone of speech at inference time without extra training. A core challenge is the lack of Italian emotional speech data, opening directions in data collection, augmentation, and cross-lingual transfer.
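To make the steering idea concrete, here is a minimal PyTorch sketch. Everything in it is a placeholder: the tiny network stands in for a real TTS model, and the random tensors stand in for hidden features of happy versus neutral utterances. The mechanism is the real point: take the difference of mean activations at some layer between two emotional conditions, then add that vector back, scaled, at inference time.

```python
# Sketch of activation steering on a toy stand-in for a TTS model.
# The model, layer choice, and feature batches are all assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 80))
steer_layer = model[1]  # layer whose output we will steer

def mean_activation(inputs: torch.Tensor) -> torch.Tensor:
    """Mean hidden activation at the steered layer over a batch."""
    captured = {}
    handle = steer_layer.register_forward_hook(
        lambda mod, inp, out: captured.update(act=out.detach())
    )
    model(inputs)
    handle.remove()
    return captured["act"].mean(dim=0)

# Hypothetical feature batches from "happy" and neutral utterances.
happy_feats, neutral_feats = torch.randn(16, 80), torch.randn(16, 80)

# Steering vector: difference of mean activations between the two conditions.
steering_vec = mean_activation(happy_feats) - mean_activation(neutral_feats)

# At inference, add the scaled vector to the layer output via a hook;
# returning a value from a forward hook replaces the layer's output.
alpha = 2.0  # steering strength; tuned by listening tests or evaluation
handle = steer_layer.register_forward_hook(
    lambda mod, inp, out: out + alpha * steering_vec
)
steered_output = model(torch.randn(1, 80))  # steered pass, no retraining
handle.remove()
```

In a real system, the interesting questions are which layer to steer, how strong to make the scaling, and how to evaluate whether listeners actually perceive the intended emotion; those could all be part of the thesis.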
Sign Language Recognition

Sign languages are full natural languages, but most automatic language systems ignore them. Sign language recognition is a multimodal problem: you need to understand hand gestures, facial expressions, and body posture from video, and map them to words or sentences.
What makes this exciting now is that the same techniques that changed NLP and speech (transformers, self-supervised learning, large pre-trained models) are being applied to sign language with good results. The thesis could go in several directions: recognizing continuous signing from video, translating signs to text, or adapting speech processing ideas to work with visual gestures instead of audio.
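To give a sense of the modeling side, here is a minimal sketch of an isolated-sign classifier, assuming each video frame has already been reduced to a flat keypoint vector by an off-the-shelf pose estimator. The feature dimension, vocabulary size, and architecture below are illustrative assumptions, not a recommended design.

```python
# Sketch of an isolated-sign classifier over per-frame pose keypoints.
# Feature dimension, vocabulary size, and hyperparameters are assumptions.
import torch
import torch.nn as nn

NUM_KEYPOINT_FEATS = 150  # e.g. 75 (x, y) keypoints per frame (placeholder)
NUM_SIGNS = 100           # size of the sign vocabulary (placeholder)

class SignClassifier(nn.Module):
    def __init__(self, d_model: int = 256, n_layers: int = 4):
        super().__init__()
        self.proj = nn.Linear(NUM_KEYPOINT_FEATS, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, NUM_SIGNS)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, NUM_KEYPOINT_FEATS)
        hidden = self.encoder(self.proj(frames))
        return self.head(hidden.mean(dim=1))  # pool over time, classify

model = SignClassifier()
logits = model(torch.randn(2, 64, NUM_KEYPOINT_FEATS))  # 2 clips, 64 frames
print(logits.shape)  # torch.Size([2, 100])
```

Continuous signing and sign-to-text translation would replace the mean-pooled classification head with sequence-level objectives (e.g., CTC or an encoder-decoder), but keypoints plus a transformer encoder is a common starting point.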
Post-Training for Speech Language Models

Pre-training gives a model broad knowledge of language. Post-training shapes its behavior: making it helpful, safe, or good at a specific task. This is one of the most active research areas right now, with methods like supervised fine-tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO).
The thesis would apply post-training techniques to speech language models, which process audio and text together. This brings new open questions: how do you define a reward signal for spoken responses? How do you align a model that works with audio tokens instead of text? The focus could be on a specific application like speech summarization, spoken question answering, or voice assistants.
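As one concrete post-training objective, here is a minimal sketch of the DPO loss. It is written over generic sequence log-probabilities, which is part of why it is attractive for speech LMs: the same objective applies whether a response is a sequence of text tokens or of discrete audio tokens. The numbers below are made up for illustration; in practice each log-probability is the sum of per-token log-probs of a response under the policy or a frozen reference model.

```python
# Sketch of the DPO objective over placeholder sequence log-probabilities.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen, policy_rejected,
             ref_chosen, ref_rejected, beta: float = 0.1):
    """DPO loss for a batch of preference pairs (higher logp = preferred)."""
    chosen_ratio = policy_chosen - ref_chosen        # log pi/pi_ref, chosen
    rejected_ratio = policy_rejected - ref_rejected  # log pi/pi_ref, rejected
    # Reward the policy for preferring the chosen response more strongly
    # than the frozen reference model does.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy batch of 4 preference pairs with made-up log-probabilities.
loss = dpo_loss(torch.tensor([-10.0, -12.0, -9.0, -11.0]),
                torch.tensor([-13.0, -11.0, -14.0, -12.0]),
                torch.tensor([-11.0, -12.0, -10.0, -11.0]),
                torch.tensor([-12.0, -11.0, -13.0, -12.0]))
print(loss.item())
```

The harder, thesis-worthy part sits upstream of this function: collecting or synthesizing preference pairs over spoken responses in the first place.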
Speech for Health

Speech carries a lot of information about health. Changes in voice quality, rhythm, and articulation can signal neurological conditions like Parkinson's or ALS, mental health states like depression, or speech disorders like dysarthria. Systems that detect and track these changes can help with diagnosis and monitoring.
The thesis could focus on a specific condition or task, such as Parkinson's detection from voice recordings, dysarthria severity assessment, or general voice pathology screening. Key challenges include small clinical datasets, working across different recording settings, and explainability (clinicians need to understand why a model raises a flag).
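For orientation, here is a minimal sketch of the kind of classical baseline such a thesis would likely start from: summary acoustic features plus a standard classifier, evaluated with cross-validation because clinical datasets are small. The waveforms and labels are synthetic placeholders, not real patient data.

```python
# Sketch of a classical voice-screening baseline on synthetic stand-in data.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def voice_features(y: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Summarize a recording as mean/std of MFCCs, a common first baseline."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Stand-in dataset: 40 fake one-second "recordings" with placeholder labels.
rng = np.random.default_rng(0)
X = np.stack([voice_features(rng.standard_normal(16000)) for _ in range(40)])
y = rng.integers(0, 2, size=40)  # 0 = control, 1 = patient (placeholder)

# Cross-validate: with small clinical datasets, a single train/test split
# gives an unreliable estimate of performance.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())
```

A simple pipeline like this also helps with the explainability requirement: the features have acoustic interpretations, and the classifier's feature importances are easy to inspect and discuss with clinicians.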
Have your own idea?
If you have a research question in mind related to speech, language, audio, or multimodal learning, I am happy to hear it. A good thesis topic often starts from genuine curiosity. Write me a short description of what you have in mind and we can figure out if and how to make it work together.