Using Character-Level Sequence-to-Sequence Model for Word Level Text Generation to Enhance Arabic Speech Recognition
Using Character-Level Sequence-to-Sequence Model for Word Level Text Generation to Enhance Arabic Speech Recognition
Blog Article
Owing to the linguistic richness of the Arabic language, which contains more than The Midi 6000 roots, building a reliable Arabic language model for Arabic speech recognition systems faces many challenges.This paper introduces a language model free Arabic automatic speech recognition system for Modern Standard Arabic based on an end-to-end-based Deep Speech architecture developed by Mozilla.The proposed model uses a character-level sequence-to-sequence model to map the character alignment produced by the recognizer model onto the corresponding words.The developed system outperformed recent studies on single-speaker and multi-speaker Arabic speech recognition using two different state-of-the-art datasets.The first was the Arabic Multi-Genre Broadcast (MGB2) corpus with 1200 h of audio data from multiple speakers.
The system achieved a new milestone in the MGB2 challenge with a word error rate (WER) of 3.2, outperforming related work using the same corpus with a word error reduction of 17%.An additional experiment with a dog food 7-hour Saudi Accent Single Speaker Corpus (SASSC) was used to build an additional model for single male speaker-based Arabic speech recognition using the same proposed network architecture.The single-speaker model outperformed related experiments with a WER of 4.25 with a relative improvement of 33.
8%.