Article

Multimodal Movie Recommendation With Multitasking Architecture and Learning User–Movie Representation: An Empirical Study

Subham Raj, Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, India
Sriparna Saha, Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, India
Brijraj Singh, Sony Research India, Bangalore, India
Niranjan Pedanekar, Sony Research India, Bangalore, India
2025
ABI

Abstract

With the increasing availability of multimodal movie data, there is growing interest in leveraging these data to improve movie recommendations. As the number of users and movies on OTT platforms such as Amazon Prime has grown, services such as personalized movie recommendation have become challenging. This article proposes a novel approach, $M^{2}RM^{2}UL$, which stands for multimodal movie recommendation with multitasking and movie–user learning. The initial phase preprocesses multimodal movie data to extract visual and textual features. A multitasking architecture then performs classification and regression tasks simultaneously to learn user and movie representations. The learned representations are used to make personalized and accurate movie recommendations. Additionally, the Netflix Prize dataset has been augmented with textual and visual features, rendering it multimodal. We conducted extensive experiments on three real-world multimodal movie datasets (Movielens-100K, MMTF-14K, and Netflix Prize) and compared our approach with several state-of-the-art movie recommendation algorithms. The experimental results show that our approach outperforms the baseline methods in terms of recommendation accuracy and diversity. Furthermore, we demonstrate the effectiveness of our approach in different scenarios, such as cold-start and sparse data. Our empirical study provides strong evidence for the effectiveness of the proposed approach in multimodal movie recommendation.
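
To make the multitasking idea concrete, the following is a minimal sketch of a joint regression and classification objective over a shared user and movie representation. It is not the authors' architecture: the concatenation-based fusion, the linear heads, the binary "liked" label, the weighting parameter `alpha`, and all function names are assumptions introduced only for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def movie_representation(text_feat, visual_feat):
    # Assumption: the two modalities are fused by simple concatenation.
    return list(text_feat) + list(visual_feat)


def forward(user_vec, movie_vec, w_reg, w_cls):
    # Shared input for both tasks: user vector followed by movie vector.
    shared = list(user_vec) + list(movie_vec)
    rating_pred = dot(w_reg, shared)      # regression head (rating)
    p_like = sigmoid(dot(w_cls, shared))  # classification head (liked or not)
    return rating_pred, p_like


def multitask_loss(rating_pred, rating_true, p_like, liked, alpha=0.5):
    # Hypothetical weighting: alpha * squared error + (1 - alpha) * cross-entropy.
    mse = (rating_pred - rating_true) ** 2
    eps = 1e-12  # guards against log(0)
    bce = -(liked * math.log(p_like + eps)
            + (1 - liked) * math.log(1 - p_like + eps))
    return alpha * mse + (1 - alpha) * bce


# Toy example with made-up feature values and zero-initialised heads.
user = [0.1, -0.2]
movie = movie_representation([0.3, 0.0], [0.5])
w_reg = [0.0] * 5
w_cls = [0.0] * 5
rating_pred, p_like = forward(user, movie, w_reg, w_cls)
loss = multitask_loss(rating_pred, 4.0, p_like, liked=1)
```

In a trained model, the gradient of this combined loss would update the user and movie vectors through both heads at once, which is the sense in which the two tasks share a representation in this sketch.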

Citations and references

Cited by 20 references