Preprint

Contrastive Language-Image Pre-training for the Italian Language

Federico Bianchi (Bocconi University), Giuseppe Attanasio (Polytechnic University of Turin), Raphael Pisoni, Silvia Terragni (University of Milano-Bicocca), Gabriele Sarti (University of Groningen), Sri Lakshmi

arXiv (Cornell University), repository, 2021, en


Abstract

CLIP (Contrastive Language-Image Pre-training) is a very recent multi-modal model that jointly learns representations of images and texts. The model is trained on a massive amount of English data and shows impressive performance on zero-shot classification tasks. Training the same model on a different language is not trivial, since the data available in other languages may be insufficient and the model needs high-quality translations of the texts to guarantee good performance. In this paper, we present the first CLIP model for the Italian language (CLIP-Italian), trained on more than 1.4 million image-text pairs. Results show that CLIP-Italian outperforms the multilingual CLIP model on the tasks of image retrieval and zero-shot classification.


Topics

Linguistic Studies and Language Acquisition; Natural Language Processing Techniques; Second Language Learning and Teaching

Identifiers

DOI: 10.48550/arxiv.2108.08688

Citations and references

0 citations, 15 references
