Master's Thesis Proposals
Below are open thesis topics I am actively interested in supervising. Each comes with some background and a clear research direction. If something looks interesting, reach out and we can refine the scope together. You can also pitch a topic of your own at the bottom.
Emotional Text-to-Speech for Italian

Modern text-to-speech (TTS) systems can sound natural, but making them sound human is still hard. Emotional TTS synthesizes speech with the right tone and prosody for the context: a calm reading voice, an excited announcement, or an empathetic assistant. This matters a lot in 2026, as TTS is now used in audiobooks, voice assistants, accessibility tools, and customer service. Italian is still largely left out of this space.
The thesis would explore how to build or adapt an emotional TTS system for Italian, using fine-tuning or newer approaches like activation steering, which lets you control the emotional tone of speech at inference time without extra training. A core challenge is the lack of Italian emotional speech data, opening directions in data collection, augmentation, and cross-lingual transfer.
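To make the steering idea concrete, here is a minimal PyTorch sketch. Everything in it is a placeholder: the tiny network stands in for a real TTS model, and the random tensors stand in for hidden features of happy versus neutral utterances. The mechanism is the real point: take the difference of mean activations at some layer between two emotional conditions, then add that vector back, scaled, at inference time.

```python
# Sketch of activation steering on a toy stand-in for a TTS model.
# The model, layer choice, and feature batches are all assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 80))
steer_layer = model[1]  # layer whose output we will steer

def mean_activation(inputs: torch.Tensor) -> torch.Tensor:
    """Mean hidden activation at the steered layer over a batch."""
    captured = {}
    handle = steer_layer.register_forward_hook(
        lambda mod, inp, out: captured.update(act=out.detach())
    )
    model(inputs)
    handle.remove()
    return captured["act"].mean(dim=0)

# Hypothetical feature batches from "happy" and neutral utterances.
happy_feats, neutral_feats = torch.randn(16, 80), torch.randn(16, 80)

# Steering vector: difference of mean activations between the two conditions.
steering_vec = mean_activation(happy_feats) - mean_activation(neutral_feats)

# At inference, add the scaled vector to the layer output via a hook;
# returning a value from a forward hook replaces the layer's output.
alpha = 2.0  # steering strength; tuned by listening tests or evaluation
handle = steer_layer.register_forward_hook(
    lambda mod, inp, out: out + alpha * steering_vec
)
steered_output = model(torch.randn(1, 80))  # steered pass, no retraining
handle.remove()
```

In a real system, the interesting questions are which layer to steer, how strong to make the scaling, and how to evaluate whether listeners actually perceive the intended emotion; those could all be part of the thesis.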
Sign Language Recognition

Sign languages are full natural languages, but most automatic language systems ignore them. Sign language recognition is a multimodal problem: you need to understand hand gestures, facial expressions, and body posture from video, and map them to words or sentences.
What makes this exciting now is that the same techniques that changed NLP and speech (transformers, self-supervised learning, large pre-trained models) are being applied to sign language with good results. The thesis could go in several directions: recognizing continuous signing from video, translating signs to text, or adapting speech processing ideas to work with visual gestures instead of audio.
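To give a sense of the modeling side, here is a minimal sketch of an isolated-sign classifier, assuming each video frame has already been reduced to a flat keypoint vector by an off-the-shelf pose estimator. The feature dimension, vocabulary size, and architecture below are illustrative assumptions, not a recommended design.

```python
# Sketch of an isolated-sign classifier over per-frame pose keypoints.
# Feature dimension, vocabulary size, and hyperparameters are assumptions.
import torch
import torch.nn as nn

NUM_KEYPOINT_FEATS = 150  # e.g. 75 (x, y) keypoints per frame (placeholder)
NUM_SIGNS = 100           # size of the sign vocabulary (placeholder)

class SignClassifier(nn.Module):
    def __init__(self, d_model: int = 256, n_layers: int = 4):
        super().__init__()
        self.proj = nn.Linear(NUM_KEYPOINT_FEATS, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, NUM_SIGNS)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, NUM_KEYPOINT_FEATS)
        hidden = self.encoder(self.proj(frames))
        return self.head(hidden.mean(dim=1))  # pool over time, classify

model = SignClassifier()
logits = model(torch.randn(2, 64, NUM_KEYPOINT_FEATS))  # 2 clips, 64 frames
print(logits.shape)  # torch.Size([2, 100])
```

Continuous signing and sign-to-text translation would replace the mean-pooled classification head with sequence-level objectives (e.g., CTC or an encoder-decoder), but keypoints plus a transformer encoder is a common starting point.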
Post-Training for Speech Language Models

Pre-training gives a model broad knowledge of language. Post-training shapes its behavior: making it helpful, safe, or good at a specific task. This is one of the most active research areas right now, with methods like supervised fine-tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO).
The thesis would apply post-training techniques to speech language models, which process audio and text together. This brings new open questions: how do you define a reward signal for spoken responses? How do you align a model that works with audio tokens instead of text? The focus could be on a specific application like speech summarization, spoken question answering, or voice assistants.
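As one concrete post-training objective, here is a minimal sketch of the DPO loss. It is written over generic sequence log-probabilities, which is part of why it is attractive for speech LMs: the same objective applies whether a response is a sequence of text tokens or of discrete audio tokens. The numbers below are made up for illustration; in practice each log-probability is the sum of per-token log-probs of a response under the policy or a frozen reference model.

```python
# Sketch of the DPO objective over placeholder sequence log-probabilities.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen, policy_rejected,
             ref_chosen, ref_rejected, beta: float = 0.1):
    """DPO loss for a batch of preference pairs (higher logp = preferred)."""
    chosen_ratio = policy_chosen - ref_chosen        # log pi/pi_ref, chosen
    rejected_ratio = policy_rejected - ref_rejected  # log pi/pi_ref, rejected
    # Reward the policy for preferring the chosen response more strongly
    # than the frozen reference model does.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy batch of 4 preference pairs with made-up log-probabilities.
loss = dpo_loss(torch.tensor([-10.0, -12.0, -9.0, -11.0]),
                torch.tensor([-13.0, -11.0, -14.0, -12.0]),
                torch.tensor([-11.0, -12.0, -10.0, -11.0]),
                torch.tensor([-12.0, -11.0, -13.0, -12.0]))
print(loss.item())
```

The harder, thesis-worthy part sits upstream of this function: collecting or synthesizing preference pairs over spoken responses in the first place.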
Speech for Health

Speech carries a lot of information about health. Changes in voice quality, rhythm, and articulation can signal neurological conditions like Parkinson's or ALS, mental health states like depression, or speech disorders like dysarthria. Systems that detect and track these changes can help with diagnosis and monitoring.
The thesis could focus on a specific condition or task, such as Parkinson's detection from voice recordings, dysarthria severity assessment, or general voice pathology screening. Key challenges include small clinical datasets, working across different recording settings, and explainability (clinicians need to understand why a model raises a flag).
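For orientation, here is a minimal sketch of the kind of classical baseline such a thesis would likely start from: summary acoustic features plus a standard classifier, evaluated with cross-validation because clinical datasets are small. The waveforms and labels are synthetic placeholders, not real patient data.

```python
# Sketch of a classical voice-screening baseline on synthetic stand-in data.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def voice_features(y: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Summarize a recording as mean/std of MFCCs, a common first baseline."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Stand-in dataset: 40 fake one-second "recordings" with placeholder labels.
rng = np.random.default_rng(0)
X = np.stack([voice_features(rng.standard_normal(16000)) for _ in range(40)])
y = rng.integers(0, 2, size=40)  # 0 = control, 1 = patient (placeholder)

# Cross-validate: with small clinical datasets, a single train/test split
# gives an unreliable estimate of performance.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())
```

A simple pipeline like this also helps with the explainability requirement: the features have acoustic interpretations, and the classifier's feature importances are easy to inspect and discuss with clinicians.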
Have your own idea?
If you have a research question in mind related to speech, language, audio, or multimodal learning, I am happy to hear it. A good thesis topic often starts from genuine curiosity. Write me a short description of what you have in mind and we can figure out if and how to make it work together.