Article

Bridging Speech and Text using Multimodal Artificial Intelligence for Next-Gen Language Understanding

Abdurahim Mannonov (Tashkent State University of Oriental Studies, Uzbekistan)
Laith Jasim (The Islamic University, College of Technical Engineering, Department of Computers Techniques Engineering, Najaf, Iraq)
Abdullayeva Shakhnoza Anvarovna (Turan International University, Faculty of Humanities & Pedagogy, Namangan, Uzbekistan)
I Wayan Suryasa (ITB STIKOM Bali, Denpasar, Indonesia)
Ashu Nayak (Kalinga University, Department of CS & IT, Raipur, India)

2025 · English

ABI

Abstract

Bridging speech and text through multimodal artificial intelligence (AI) is essential for advancing next-generation language understanding. Integrating voice and text modalities enhances comprehension, making AI-driven communication more seamless and effective. Existing voice-to-text transcription systems often struggle with accuracy, particularly in noisy environments or with diverse accents. Additionally, they lack sentiment analysis capabilities, which are crucial for understanding customer emotions in service calls. To address these issues, we propose a Multimodal AI-based Voice-to-Text Transcription System (MAI-VT-TS) with Sentiment Analysis. This system leverages deep learning models for speech recognition and natural language processing (NLP) to improve transcription accuracy. It integrates multimodal techniques, combining acoustic and textual cues for enhanced contextual understanding. Sentiment analysis is incorporated to assess customer emotions, enabling real-time insights into customer interactions. The proposed method is designed for customer service applications, helping businesses analyze calls effectively, improve response strategies, and enhance customer satisfaction. It provides a robust and efficient solution for handling large-scale call data. Experimental results demonstrate that MAI-VT-TS significantly improves transcription accuracy and sentiment detection compared to traditional methods. This advancement enables better customer engagement, data-driven decision-making, and a more intelligent AI-driven communication system.

Topics

Natural Language Processing Techniques

Identifiers

DOI: 10.1109/iccies63851.2025.11032768

Citations and references

0 citations · 14 references
