Multimodal Movie Recommendation With Multitasking Architecture and Learning User–Movie Representation: An Empirical Study
Abstract
The growing availability of multimodal movie data has spurred interest in leveraging these data to improve movie recommendation. As the number of users and movies on over-the-top (OTT) platforms such as Amazon Prime continues to grow, providing services such as personalized movie recommendation becomes increasingly challenging. This article proposes a novel approach, $M^{2}RM^{2}UL$, which stands for multimodal movie recommendation with multitasking and movie–user learning. In the first phase, multimodal movie data are preprocessed to extract visual and textual features. A multitasking architecture then performs classification and regression simultaneously to learn user and movie representations, which are used to make personalized and accurate movie recommendations. In addition, we augment the Netflix Prize dataset with textual and visual features, making it multimodal. We conduct extensive experiments on three real-world multimodal movie datasets (MovieLens-100K, MMTF-14K, and Netflix Prize) and compare our approach with several state-of-the-art movie recommendation algorithms. The experimental results show that our approach outperforms the baseline methods in terms of recommendation accuracy and diversity. Furthermore, we demonstrate its effectiveness in challenging scenarios such as cold-start and sparse data. Overall, our empirical study provides strong evidence for the effectiveness of the proposed approach in multimodal movie recommendation.
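To make the idea of learning user and movie representations through simultaneous classification and regression concrete, the following is a minimal sketch, not the article's actual model. It assumes a shared representation formed by concatenating a user embedding and a movie embedding, a regression head that predicts a rating on a 1 to 5 scale, a classification head that predicts whether the user likes the movie, and a joint loss that mixes mean squared error and binary cross-entropy with a weight `alpha`. The class `MultiTaskScorer`, the function `joint_loss`, the embedding size, the like threshold of 4.0, and the loss weighting are all hypothetical choices made for illustration; the multimodal (visual and textual) features are omitted.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_vector(dim, rng):
    """Small random initial weights (illustrative initialisation only)."""
    return [rng.uniform(-0.1, 0.1) for _ in range(dim)]


class MultiTaskScorer:
    """Hypothetical two-head model over shared user and movie embeddings."""

    def __init__(self, n_users, n_movies, dim=8, seed=0):
        rng = random.Random(seed)
        self.user_emb = [init_vector(dim, rng) for _ in range(n_users)]
        self.movie_emb = [init_vector(dim, rng) for _ in range(n_movies)]
        # Both heads read the same concatenated [user; movie] vector.
        self.w_reg = init_vector(2 * dim, rng)
        self.w_cls = init_vector(2 * dim, rng)

    def forward(self, user, movie):
        shared = self.user_emb[user] + self.movie_emb[movie]  # list concatenation
        rating = 1.0 + 4.0 * sigmoid(dot(self.w_reg, shared))  # regression head, in (1, 5)
        p_like = sigmoid(dot(self.w_cls, shared))  # classification head, in (0, 1)
        return rating, p_like


def joint_loss(model, batch, alpha=0.5, like_threshold=4.0):
    """Average of alpha * squared error + (1 - alpha) * binary cross-entropy.

    `batch` is a list of (user_index, movie_index, rating) triples. The binary
    label is derived from the rating, which is an assumption of this sketch.
    """
    eps = 1e-12  # guards the logarithms against exact 0 or 1
    total = 0.0
    for user, movie, rating in batch:
        pred_rating, p_like = model.forward(user, movie)
        squared_error = (pred_rating - rating) ** 2
        label = 1.0 if rating >= like_threshold else 0.0
        p = min(max(p_like, eps), 1.0 - eps)
        cross_entropy = -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))
        total += alpha * squared_error + (1.0 - alpha) * cross_entropy
    return total / len(batch)


model = MultiTaskScorer(n_users=3, n_movies=4)
toy_batch = [(0, 1, 5.0), (2, 3, 2.0)]  # made-up interactions, not real data
loss = joint_loss(model, toy_batch)
```

Because both heads share the same embeddings, minimising this joint loss with any gradient-based optimiser would push the user and movie vectors to serve both tasks at once, which is the general motivation for multitask representation learning. The sketch only evaluates the loss; training is left out to keep it short.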