AkademIndex

Article

Bridging Speech and Text using Multimodal Artificial Intelligence for Next-Gen Language Understanding

Abdurahim Mannonov (Tashkent State University of Oriental Studies, Uzbekistan)
Laith Jasim (The Islamic University, College of Technical Engineering, Department of Computers Techniques Engineering, Najaf, Iraq)
Abdullayeva Shakhnoza Anvarovna (Turan International University, Faculty of Humanities & Pedagogy, Namangan, Uzbekistan)
I Wayan Suryasa (ITB STIKOM Bali, Denpasar, Indonesia)
Ashu Nayak (Kalinga University, Department of CS & IT, Raipur, India)
2025, en
ABI

Abstract

Bridging speech and text through multimodal artificial intelligence (AI) is essential for advancing next-generation language understanding. Integrating voice and text modalities enhances comprehension, making AI-driven communication more seamless and effective. Existing voice-to-text transcription systems often struggle with accuracy, particularly in noisy environments or with diverse accents. Additionally, they lack sentiment analysis capabilities, which are crucial for understanding customer emotions in service calls. To address these issues, we propose a Multimodal AI-based Voice-to-Text Transcription System (MAI-VT-TS) with Sentiment Analysis. This system leverages deep learning models for speech recognition and natural language processing (NLP) to improve transcription accuracy. It integrates multimodal techniques, combining acoustic and textual cues for enhanced contextual understanding. Sentiment analysis is incorporated to assess customer emotions, enabling real-time insights into customer interactions. The proposed method is designed for customer service applications, helping businesses analyze calls effectively, improve response strategies, and enhance customer satisfaction. It provides a robust and efficient solution for handling large-scale call data. Experimental results demonstrate that MAI-VT-TS significantly improves transcription accuracy and sentiment detection compared to traditional methods. This advancement enables better customer engagement, data-driven decision-making, and a more intelligent AI-driven communication system.
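
The abstract does not say how acoustic and textual cues are combined, so the following is only a minimal sketch of one common option, late fusion, in which each modality produces its own sentiment score and the two are blended by a weight. Everything named here (the word lists, `text_sentiment`, `fuse`, `label`, the 0.6 weight, the 0.2 threshold) is a hypothetical placeholder for illustration and is not taken from MAI-VT-TS; the acoustic score is assumed to come from some upstream model and is passed in as a plain number.

```python
# Illustrative late-fusion sketch; NOT the authors' implementation.
# Word lists, weights, and thresholds below are assumptions.

POSITIVE = {"thanks", "great", "helpful", "resolved"}
NEGATIVE = {"angry", "cancel", "terrible", "waiting"}


def text_sentiment(transcript: str) -> float:
    """Return a lexicon-based score in [-1, 1] for a transcript."""
    words = [w.strip(".,!?").lower() for w in transcript.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    # No sentiment-bearing words: treat the text as neutral.
    return 0.0 if total == 0 else (pos - neg) / total


def fuse(text_score: float, acoustic_score: float, text_weight: float = 0.6) -> float:
    """Blend the two modality scores (both in [-1, 1]) by a convex weight."""
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be in [0, 1]")
    return text_weight * text_score + (1.0 - text_weight) * acoustic_score


def label(score: float, threshold: float = 0.2) -> str:
    """Map a fused score to a coarse sentiment label."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    t = text_sentiment("Thanks, that was great!")
    # Assume an upstream acoustic model reported a mildly negative tone.
    fused = fuse(t, acoustic_score=-0.5)
    print(label(fused))
```

A weighted blend like this lets the tone of voice temper what the words alone suggest; a real system would more likely learn the fusion from data than fix the weight by hand.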
