GitHub - ulyavedenina/AI-Generated-German-Hate-Speech-Detector: The repository contains a comprehensive collection of models and data designed for the text-based detection of German bot-generated hate speech. It also comprises a metadata-based language-agnostic dataset on bot detection on Reddit

This repository contains a comprehensive collection of models and data designed for the text-based detection of German bot-generated hate speech. It also includes a metadata-based, language-agnostic dataset for bot detection on Reddit and corresponding models for identifying bots. The datasets for this project were predominantly developed internally.

Repository Structure

dataset
- text-based
  - train.tsv -- bot- and human-generated German hate speech comments
  - test.tsv -- bot- and human-generated German hate speech comments
- metadata-based
  - hate_speech_train.tsv -- hate and non-hate German comments
  - hate_speech_test.tsv -- hate and non-hate German comments
  - bot_human_metadata_train_english.tsv -- English bot- and human-metadata dataset -- contains no usernames
  - bot_human_metadata_test_english.tsv -- English bot- and human-metadata dataset -- contains no usernames
  - bot_human_metadata_test_german.tsv -- German bot- and human- hate speech metadata dataset -- contains no usernames
- datasets_llama
  - bot_ds -- bot comments in the dataset format
  - no_bot_ds -- human comments in the dataset format
models
- helper -- preprocessing functions
  - extract_features.py -- cleaning for stylometric preprocessing
  - extract_metadata.py -- file to extract metadata features from Reddit accounts
  - fetch_user_metadata.py -- code to fetch a Reddit user metadata
  - helper.py -- file with text preprocessing functions
  - light_clean.py -- cleaning for Bert preprocessing
  - predict_comment.py -- code based on a BERT-Large model to predict an authorship of a hate comment
- text-based -- models designed for text-based hate speech bot detection
  - Bert
  - Bert-CNN
  - Bert-Style
  - Llama2
  - Style
  - requirements.txt
- metadata-based_pipeline -- models designed for metadata-based hate bot detection
  - bot_detector.py
  - hate_speech_detector.ipynb
data_collection
- bot_comment_generation -- generation of AI-generated comments (training data)
- bot_comment_generation_mixtral -- generation of AI-generated comments (test data)
- offensive_words.txt -- the list of offensive words
- get_subreddit_users.ipynb -- extraction of users from German subreddits

Text-Based Bot-Generared Hate Speech Detection (for German)

Below you find the sources for the data collection. Note that models were tested on the outputs of an unseen LLM to ensure robustness.

Part 1: Human-Written Comments

Sources	Total Comments
DeTox	4,504
RP-MOD	2,813
HASOC 2019	543
GermEval-2018 (test set)	1598

Part 2: Bot-Generated Comments

Sources	Total Comments
GPT 3.5	1600
GPT 4	1601
TheBloke/em_german_13b_v01-GPTQ	1600
TheBloke/em_german_leo_mistral-GPTQ	1600
TheBloke/leo-hessianai-13B-chat-GPTQ	1600
mistralai/Mixtral-8x7B-Instruct-v0.1 (test set)	1599

Models

We have implemented the following models for the text-based approach:

Model	F-Score
BERT Base	0.974
BERT Large	0.986
BERT Base-CNN	0.980
BERT Base+Stylometric	0.949
Stylometric	0.881
LLM (Llama2) 7B	0.943
LLM (Llama2) 13B	0.962

Metadata-Based Reddit Bot Detection Pipeline

Dataset

The hate speech detector was trained on the open-source data, see below:

Sources	Hate	Non-Hate
DeTox	4,504	7682
RP-MOD	2,813	3412
HASOC 2019	543	5789

The Reddit metadata dataset comprises 818 Human + 816 English Bot Accounts and 627 Human + 9 German Bot Accounts with corresponsing features.

Feature Set for Metadata Classification

Feature name	Description
comment_karma	Comment karma of a user
post_karma	Post karma of a user
comment_activity_day	Number of comments per day
posts_activity_day	Number of posts per day
avg_frequency_posts	Average time between the posts (in seconds)
avg_frequency_all	Average time between any activity (in seconds)
min_time_all	Minimal time between any activity (in seconds)
num_url	Proportion of links in posts and comments
num_repeated_post	Proportion of repeated posts
num_repeated_comment	Proportion of repeated comments

Models

For the hate-speech detector, Llama 3 was implemented:

Test Set	F-Score
Test set 1 (Detox, RP-Mod, HASOC)	0.92
Test set 2 (Reddit Hate Speech)	0.76

Metadata classification with Random Forest:

Model	F-Score (Performance on the English validation set)
Random Forest	0.934

Request the complete dataset

To request complete metadata datasets, contact

Name		Name	Last commit message	Last commit date
Latest commit History 56 Commits
data_collection		data_collection
dataset		dataset
models		models
.DS_Store		.DS_Store
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Repository Structure

Text-Based Bot-Generared Hate Speech Detection (for German)

Part 1: Human-Written Comments

Part 2: Bot-Generated Comments

Models

Metadata-Based Reddit Bot Detection Pipeline

Dataset

Feature Set for Metadata Classification

Models

Request the complete dataset

Citation

About

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Repository Structure

Text-Based Bot-Generared Hate Speech Detection (for German)

Part 1: Human-Written Comments

Part 2: Bot-Generated Comments

Models

Metadata-Based Reddit Bot Detection Pipeline

Dataset

Feature Set for Metadata Classification

Models

Request the complete dataset

Citation

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Uh oh!

Contributors

Uh oh!

Languages