Mohamed Lazhar Bellagha

Doctorate in Computer Science

About Me

I hold a doctorate in Computer Science and specialize in information retrieval from audio data, Natural Language Processing (NLP), and deep learning. My research focuses on speaker identification in Arabic television broadcasts and on Automatic Speech Recognition (ASR) for low-resource languages such as the Tunisian dialect.

Bio

Born on

September 6, 1992

Email

bellaghamohamed@gmail.com

Phone

+216 26003324

Address

Rue Ibn El Jazzar, Monastir 5000.

Research Focus Areas

Current research:

My current research focuses on developing Automatic Speech Recognition (ASR) models for the Tunisian dialect, addressing the challenges posed by the scarcity of linguistic and annotated resources. This work explores the integration of large language models into speech recognition and leverages multimodal features to improve performance in low-resource settings.

As part of this work, I founded the TuniSpeech-AI organization on Hugging Face, an initiative dedicated to the open sharing of corpora, models, and benchmarks for Tunisian Arabic and other underrepresented Arabic dialects.

Thesis:

My thesis focused on named speaker identification in Arabic television news, a task that consists of associating speaker clusters with their real identities. It is an essential step in high-level video analysis, particularly for semantic indexing, content summarization, and other advanced applications.

Master's (M2) research internship:

2017

We studied the evolution of the main approaches to speaker diarization and discussed each approach and its contributions in detail. This work led to the development of a speaker diarization system that automatically determines "who spoke when" in an audio document.

Graduation Project:

2014

Development of a modeling and diagnostic application based on Bayesian networks: a graphical interface for BNT (Bayes Net Toolbox for MATLAB) that simplifies learning and inference tasks.

Professional experiences

2025/2026

Contractual teacher at the Faculty of Sciences of Monastir.

2024/2025

Contractual teacher at the Faculty of Sciences of Monastir.

2023/2024

Contractual teacher at the Faculty of Sciences of Monastir.

2022/2023

Contractual teacher at the Faculty of Sciences of Monastir.

2021/2022

Contractual teacher at the Faculty of Sciences of Monastir.

2020/2021

Contractual teacher at the Faculty of Sciences of Monastir.

Education

Doctorate in Computer Science

2022

Doctorate in Computer Science, University of Sousse, awarded with the mention "very honorable".

Master's in Automatic Reasoning Systems

2017

Research Master's Degree in Computer Science (Automatic Reasoning Systems), Faculty of Sciences of Monastir.

Licence (Bachelor's Degree)

2014

Licence (Bachelor's degree) in Computer Science, Faculty of Sciences of Monastir.

Scientific publications

2022

Speaker Naming in Arabic TV programs. M. Bellagha, M. Zrigui. IAJIT 2022.

2021

Using the MGB-2 challenge data for creating a new multimodal Dataset for speaker role recognition in Arabic TV Broadcasts. M. Bellagha, M. Zrigui. KES 2021.

2020

Speaker Naming in TV programs Based on Speaker Role Recognition. M. Bellagha, M. Zrigui. AICCSA 2020.

2017

Speaker Segmentation Using Adapted GMMs. M. Bellagha, M. Labidi, M. Maraoui. ICEMIS 2017.

Key Research Assets & Open Science Contributions

These initiatives provide foundational resources and models for low-resource Arabic dialects.

Open Science Platforms

TuniSpeech-AI (Hugging Face Organization)

An open research platform dedicated to Tunisian speech and language. It centralizes corpora, recognition models, evaluation benchmarks, and linguistic processing scripts.

https://huggingface.co/TuniSpeech-AI

Data Resources & Corpora

TuniSpeech-21h

A multi-genre corpus of 21 hours of Tunisian speech, semi-automatically transcribed and aligned. This resource covers various domains (news, conversation, culture, music, etc.) and serves as a reference base for dialectal ASR research.

Arabic-Word-Embedding Data

A large-scale Arabic corpus compiled from various sources for learning continuous lexical representations. Word vectors (embeddings) were trained and published to support Natural Language Processing (NLP) applications.

https://github.com/MohamedBellagha/Arabic-Word-Embedding

Speaker-Role-Recognition

A public dataset and multimodal CNN-LSTM model designed for speaker role recognition in Arabic television news.

https://github.com/MohamedBellagha/Speaker-Role-Recognition

Models & Tools

TuniSpeech-models

A set of Whisper and Wav2Vec 2.0 models fine-tuned on the TuniSpeech-21h corpus, designed to improve Tunisian speech recognition in diverse acoustic contexts.

Training Certificates

Scrum Master Certification by Jim Sullivan

Scrum Methodologies

Acquired the foundational knowledge needed to work proficiently with Agile Scrum: user stories and how they are prioritized, velocity, backlog refinement, and market actions.

Natural Language Processing Specialization by Younes Bensouda Mourri

Natural Language Processing with Attention Models

Designed NLP applications for question answering and sentiment analysis, built tools for machine translation and text summarization, and built a chatbot.

Deep Learning Specialization by Andrew Ng

Structuring Machine Learning Projects

Learned how to build successful machine learning projects and practiced decision-making as a machine learning project lead.

Deep Learning Specialization by Andrew Ng

Neural Networks and Deep Learning

Built, trained, and applied fully connected deep neural networks; implemented efficient (vectorized) neural networks; identified key parameters in a neural network's architecture; and applied deep learning to new applications.

Deep Learning Specialization by Andrew Ng

Sequence Models

Built and trained Recurrent Neural Networks (RNNs) and common variants such as GRUs and LSTMs; applied RNNs to character-level language modeling; gained experience with natural language processing and word embeddings; and used Hugging Face tokenizers and transformer models for NLP tasks such as named entity recognition (NER) and question answering.

Deep Learning Specialization by Andrew Ng

Convolutional Neural Networks

Built convolutional neural networks, including recent variants such as residual networks; applied them to visual detection and recognition tasks; used neural style transfer to generate art; and applied these algorithms to image, video, and other 2D or 3D data.

Machine Learning in Python with scikit-learn by Inria

Machine learning

Deep Learning Specialization by Andrew Ng

Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization

Learned best practices for setting up train/dev/test sets and analyzing bias/variance when building deep learning applications; used standard neural network techniques such as initialization, L2 and dropout regularization, hyperparameter tuning...

Languages

Arabic

English

French

Other Projects

Web:

Created an educational website for kids (www.4kidss.com).

Environment: Django / MySQL

Skills

Interests

Artificial intelligence, Bayesian networks, signal processing, natural language processing, speech processing, and deep learning.

Programming languages

C/C++, Python, Bash.

Libraries

PyTorch, scikit-learn, TensorFlow, Keras.

Web development with python

Django / MySQL / HTML / CSS / JavaScript

Contact

bellaghamohamed@gmail.com