Uzbek 65 million web corpus lemmatized FastText model

Surayyo Khajibaeva (Urgench State University), Khabibula Madatov (Urgench State University), Jernej Vičič (University of Nova Gorica)

Abstract

A FastText model trained on the lemmatized 65 million Uzbek web corpus (10.5281/zenodo.19462612). The model provides word embeddings (vector representations) for efficient text representation. A simple Python script for loading and using the model is included. FastText is a lightweight, open-source library developed by Facebook AI Research for efficient text representation and classification. The model was produced with the Gensim implementation of FastText, which can also be used to load and query it. The original corpus was lemmatized with the UzbekLemma lemmatizer: https://pypi.org/project/UzbekLemma/.
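
As an illustration of how such a model might be loaded and queried with Gensim, here is a minimal sketch. It is not the script distributed with the model. The file name `uzbek_fasttext_lemmatized.model`, the query words, and the helper names are assumptions made for the example, and the sketch assumes the model was saved in Gensim's native format (a Facebook-format `.bin` file would need `gensim.models.fasttext.load_facebook_model` instead). Because the training corpus was lemmatized, query words should be lemmas as well.

```python
import math
from pathlib import Path

# Hypothetical file name: replace with the actual model file from the record.
MODEL_PATH = Path("uzbek_fasttext_lemmatized.model")


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(float(a) * float(b) for a, b in zip(u, v))
    norm_u = math.sqrt(sum(float(a) ** 2 for a in u))
    norm_v = math.sqrt(sum(float(b) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def load_model(path):
    """Load a FastText model saved in Gensim's native format (an assumption)."""
    # Imported lazily so the helper above stays usable without Gensim installed.
    from gensim.models import FastText
    return FastText.load(str(path))


if __name__ == "__main__":
    if MODEL_PATH.exists():
        model = load_model(MODEL_PATH)
        # Example lemmas chosen for illustration only.
        word_a, word_b = "kitob", "daftar"
        vec_a = model.wv[word_a]  # embedding vector for the lemma
        vec_b = model.wv[word_b]
        print("vector size:", len(vec_a))
        print("similarity:", cosine_similarity(vec_a, vec_b))
        # Nearest neighbours in the embedding space.
        for neighbour, score in model.wv.most_similar(word_a, topn=5):
            print(neighbour, round(score, 3))
    else:
        print(f"Model file not found: {MODEL_PATH}")
```

FastText builds vectors from character n-grams, so `model.wv[word]` can usually return a vector even for a lemma that did not occur in the training corpus. Running raw text through the same lemmatizer before lookup should keep queries consistent with the training data.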
