A Hybrid TF-IDF and RNN Model for Multi-label Classification of the Deep and Dark Web
Abstract
Classifying content on the deep and dark web is an active research problem, and as the volume of data on these platforms continues to grow, researchers are seeking more efficient and effective classification methods. Multi-label classification assigns each piece of content to several classes simultaneously. For this task, a hybrid approach combining Term Frequency-Inverse Document Frequency (TF-IDF) and a Recurrent Neural Network (RNN) is proposed. The approach preprocesses a dataset of Hypertext Markup Language (HTML) documents, selects specific HTML tags whose text is embedded using TF-IDF, and passes the resulting embeddings to an RNN model for multi-label classification. The proposed model was evaluated against commonly used methods (Binary Relevance, Classifier Chains, and Label Powerset) using precision, recall, and F1-score as evaluation metrics, and showed promising results in classifying data from the deep and dark web. These results make the approach a useful contribution for researchers and analysts working in this field.
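The tag selection and TF-IDF steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the tag set `SELECTED_TAGS`, the tokenizer, the smoothed IDF variant, and the two sample pages are all assumptions made for illustration, since the abstract does not specify them. The RNN classifier is omitted; vectors like these would serve as its input.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical tag choice; the abstract does not list the tags actually used.
SELECTED_TAGS = {"title", "h1", "p"}


class TagTextExtractor(HTMLParser):
    """Collects text that appears inside any of the selected tags."""

    def __init__(self, tags):
        super().__init__()
        self.tags = tags
        self.depth = 0    # number of selected tags currently open
        self.chunks = []  # text fragments found inside selected tags

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.tags and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_tokens(html, tags=SELECTED_TAGS):
    """Return lowercase alphabetic tokens from the selected tags only."""
    parser = TagTextExtractor(tags)
    parser.feed(html)
    parser.close()
    return re.findall(r"[a-z]+", " ".join(parser.chunks).lower())


def tfidf(docs_tokens):
    """Return (vocab, vectors) using relative term frequency and a smoothed IDF."""
    n_docs = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))  # count each term once per document
    vocab = sorted(df)
    # Assumed smoothing: idf(t) = ln((1 + N) / (1 + df(t))) + 1
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for tokens in docs_tokens:
        tf = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append([tf[t] / total * idf[t] for t in vocab])
    return vocab, vectors


# Invented sample pages, for illustration only.
pages = [
    "<html><title>Market listing</title><div>ignored</div><p>Market offers</p></html>",
    "<html><title>Forum thread</title><p>Forum rules</p><script>var x;</script></html>",
]
docs = [extract_tokens(page) for page in pages]
vocab, vectors = tfidf(docs)
```

Text outside the selected tags (here the `div` and `script` contents) never reaches the vocabulary, which is the point of tag selection: it restricts the TF-IDF features to the parts of a page assumed to carry topical signal.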