Asosiy kontentga oʻtish
AkademIndex

Mahsulotlar

Ishlab chiquvchilar uchun

AkademBaseEkotizim uchun ochiq API
Maqola

Constructing Stoplists for Historical Languages

Patrick J. BurnsInstitute for the Study of the Ancient World (New York University)
2018en
ABI

Annotatsiya

Stoplists are lists of words that have been filtered from documents prior to text analysis tasks, usually words that are either high frequency or that have low semantic value. This paper describes the development of a generalizable method for building stoplists in the Classical Language Toolkit (CLTK), an open-source Python platform for natural language processing research on historical languages. Stoplists are not readily available for many historical languages, and those that are available often offer little documentation about their sources or method of construction. The development of a generalizable method for building historical-language stoplists offers the following benefits: 1. better support for well-documented, data-driven, and replicable results in the use of CLTK resources; 2. reduction of arbitrary decision-making in building stoplists; 3. increased consistency in how stopwords are extracted from documents across multiple languages; and 4. clearer guidelines and standards for CLTK developers and contributors, a helpful step forward in managing the complexity of a multi-language open-source project.

Hali tarjima qilinmagan

Identifikatorlar

Iqtiboslar va manbalar

2 ta iqtibos0 ta foydalanilgan manba