# Deep Learning for Speech and Vision

*Fig. 1: Image generated using OpenDALL-E*
Welcome to the Jupyter Book on Deep Learning for Speech and Vision. This book is a collection of lecture notes for the course “Deep Learning for Speech and Vision” at the Kore University of Enna. The course is part of the PhD Program in Computer Science and Engineering.
The book is built with Jupyter Book and consists of pages written in Markdown and Jupyter Notebooks.
| | |
|---|---|
| Instructor | |
| Semester | Fall (Feb 2024) |
## Course Description
The course aims to provide students with basic knowledge of deep learning for speech and vision. It covers the following topics:
- Introduction to Deep Learning
- Convolutional Neural Networks
- Transformers and Attention Mechanisms
- Deep Learning libraries for complete project management
- Applications and case studies in Speech and Vision (with practical examples in PyTorch; see the sketch below)
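To give a flavour of the practical examples, here is a minimal sketch of a small convolutional classifier in PyTorch. The `TinyCNN` model, the 32×32 RGB input size, and the number of classes are illustrative assumptions, not course material or an assignment.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """A minimal convolutional classifier for 32x32 RGB images (illustrative only)."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # 3x32x32  -> 16x32x32
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 16x32x32 -> 16x16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # 16x16x16 -> 32x16x16
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 32x16x16 -> 32x8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(x.flatten(start_dim=1))

# Quick shape check on a random batch of 4 images.
model = TinyCNN()
logits = model(torch.randn(4, 3, 32, 32))
print(logits.shape)  # torch.Size([4, 10])
```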
The course introduces both the theoretical and practical aspects of deep learning. Students will be required to implement a deep learning model for a given task as the final course project.
## Prerequisites
The course requires preliminary knowledge of Machine Learning and the Python programming language. Students are expected to have a basic knowledge of:
If you are not familiar with these libraries, you can refer to the following resources:
Note: if you have any questions about the course, please open an issue on the GitHub repository.
During the course we will use Google Colab, a free cloud service that provides a Jupyter notebook environment. When possible, students are encouraged to use their own GPU to become familiar with the training process and model evaluation.
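As a quick sanity check, the following minimal sketch (assuming PyTorch is installed, as it is by default on Colab) shows how to detect whether the current runtime exposes a GPU and select it as the training device:

```python
import torch

# Pick the GPU if the runtime provides one, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

if device.type == "cuda":
    # Name of the accelerator assigned to the runtime, if any.
    print(torch.cuda.get_device_name(0))

# Models and tensors must be moved to the same device before training.
x = torch.randn(8, 3, 224, 224).to(device)
```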