Multi-Class Detection of Humanized AI Text Using Machine Learning and Transformer Models
Abstract
The rise of advanced large language models (LLMs) has enabled the generation of human-like text, making it harder to detect both AI-generated and humanized AI content. This study evaluates Logistic Regression, Bidirectional LSTM, and DeBERTa on a three-class task: distinguishing human-written, AI-generated, and humanized AI text. We introduce a novel dataset of 30,000 texts, including 10,000 humanized samples created via a LangChain-based pipeline with GPT-4o and verified with ZeroGPT to have reduced AI detectability. Experimental results show that DeBERTa achieves 96.93% accuracy, outperforming Logistic Regression (93.43%) and Bidirectional LSTM (93.77%) in distinguishing the three classes. Our approach leverages stylometric features and deep contextual embeddings to address real-world challenges such as stylistic overlap and adversarial paraphrasing. Key contributions include the dataset, a comparative model evaluation, and insights into detecting humanized AI text, with implications for content moderation, academic integrity, and misinformation prevention.