This repository contains models and data for the text-based detection of bot-generated German hate speech. It also includes a language-agnostic, metadata-based dataset for detecting bots on Reddit, together with the corresponding bot-detection models. Most of the datasets in this project were created in-house.

- dataset
  - text-based
    - train.tsv -- bot- and human-generated German hate speech comments
    - test.tsv -- bot- and human-generated German hate speech comments
  - metadata-based
    - hate_speech_train.tsv -- hate and non-hate German comments
    - hate_speech_test.tsv -- hate and non-hate German comments
    - bot_human_metadata_train_english.tsv -- English bot and human account metadata (no usernames)
    - bot_human_metadata_test_english.tsv -- English bot and human account metadata (no usernames)
    - bot_human_metadata_test_german.tsv -- German bot and human hate speech account metadata (no usernames)
  - datasets_llama
    - bot_ds -- bot comments in dataset format
    - no_bot_ds -- human comments in dataset format
    - text-based
- models
  - helper -- preprocessing functions
    - extract_features.py -- cleaning for stylometric preprocessing
    - extract_metadata.py -- extracts metadata features from Reddit accounts
    - fetch_user_metadata.py -- fetches the metadata of a Reddit user
    - helper.py -- text preprocessing functions
    - light_clean.py -- cleaning for BERT preprocessing
    - predict_comment.py -- predicts whether a hate comment was written by a bot or a human, using a BERT-Large model
  - text-based -- models for text-based hate speech bot detection
    - Bert
    - Bert-CNN
    - Bert-Style
    - Llama2
    - Style
    - requirements.txt
  - metadata-based_pipeline -- models for metadata-based hate bot detection
    - bot_detector.py
    - hate_speech_detector.ipynb
    - helper -- preprocessing functions
- data_collection
  - bot_comment_generation -- generation of AI-generated comments (training data)
  - bot_comment_generation_mixtral -- generation of AI-generated comments (test data)
  - offensive_words.txt -- list of offensive words
  - get_subreddit_users.ipynb -- extraction of users from German subreddits

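
The text-based splits are plain tab-separated files. Below is a minimal sketch for loading one with Python's standard `csv` module and counting its classes, assuming the file has a header row. The README does not document the column layout, so the `label` column name is a hypothetical placeholder to replace after checking the actual header.

```python
import csv
import io
from collections import Counter
from typing import Iterable, TextIO


def read_tsv(handle: TextIO) -> list[dict[str, str]]:
    """Read a tab-separated file with a header row into a list of dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    return list(reader)


def label_counts(rows: Iterable[dict[str, str]], label_column: str = "label") -> Counter:
    """Count rows per class; `label_column` is a hypothetical column name."""
    return Counter(row[label_column] for row in rows)


# In-memory stand-in for open("train.tsv", encoding="utf-8", newline="").
sample = io.StringIO(
    "text\tlabel\n"
    "Beispielkommentar eins\tbot\n"
    "Beispielkommentar zwei\thuman\n"
    "Beispielkommentar drei\tbot\n"
)
rows = read_tsv(sample)
print(label_counts(rows))
```
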

The data sources are listed below. To assess robustness, the models were tested on comments generated by an LLM that was not seen during training (Mixtral-8x7B-Instruct, marked as the test set).

| Human-written sources | Total Comments |
|---|---|
| DeTox | 4,504 |
| RP-MOD | 2,813 |
| HASOC 2019 | 543 |
| GermEval-2018 (test set) | 1,598 |

| Bot sources (LLMs) | Total Comments |
|---|---|
| GPT 3.5 | 1,600 |
| GPT 4 | 1,601 |
| TheBloke/em_german_13b_v01-GPTQ | 1,600 |
| TheBloke/em_german_leo_mistral-GPTQ | 1,600 |
| TheBloke/leo-hessianai-13B-chat-GPTQ | 1,600 |
| mistralai/Mixtral-8x7B-Instruct-v0.1 (test set) | 1,599 |
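
Assuming that the rows marked "(test set)" make up the test split and all other rows the training split (the tables imply this but the README does not state it explicitly), the size of each class per split can be tallied directly from the counts:

```python
# Comment counts copied from the two source tables.
human_sources = {
    "DeTox": 4504,
    "RP-MOD": 2813,
    "HASOC 2019": 543,
    "GermEval-2018 (test set)": 1598,
}
bot_sources = {
    "GPT 3.5": 1600,
    "GPT 4": 1601,
    "TheBloke/em_german_13b_v01-GPTQ": 1600,
    "TheBloke/em_german_leo_mistral-GPTQ": 1600,
    "TheBloke/leo-hessianai-13B-chat-GPTQ": 1600,
    "mistralai/Mixtral-8x7B-Instruct-v0.1 (test set)": 1599,
}


def split_sizes(sources: dict[str, int]) -> tuple[int, int]:
    """Return (train, test) totals, assuming only '(test set)' rows are held out."""
    test = sum(n for name, n in sources.items() if name.endswith("(test set)"))
    train = sum(sources.values()) - test
    return train, test


print("human:", split_sizes(human_sources))  # human: (7860, 1598)
print("bot:", split_sizes(bot_sources))  # bot: (8001, 1599)
```

Under that assumption both splits are close to balanced between the two classes, and every bot comment in the test split comes from Mixtral, the generator that is absent from training.
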

The following models were implemented for the text-based approach:

| Model | F-Score |
|---|---|
| BERT Base | 0.974 |
| BERT Large | 0.986 |
| BERT Base-CNN | 0.980 |
| BERT Base+Stylometric | 0.949 |
| Stylometric | 0.881 |
| LLM (Llama2) 7B | 0.943 |
| LLM (Llama2) 13B | 0.962 |
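
The README does not say which F-score variant is reported. A common choice for this kind of binary task is the F1 score of the positive class, the harmonic mean of precision and recall, equivalently 2·TP / (2·TP + FP + FN). The sketch below assumes that definition, with "bot" as the positive class; both are assumptions.

```python
def f1_score(y_true: list[str], y_pred: list[str], positive: str = "bot") -> float:
    """Binary F1: harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


y_true = ["bot", "bot", "human", "human", "bot"]
y_pred = ["bot", "human", "human", "bot", "bot"]
# TP = 2, FP = 1, FN = 1, so precision = recall = 2/3.
print(round(f1_score(y_true, y_pred), 3))  # 0.667
```
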

The hate speech detector was trained on the following open-source datasets:

| Sources | Hate | Non-Hate |
|---|---|---|
| DeTox | 4,504 | 7,682 |
| RP-MOD | 2,813 | 3,412 |
| HASOC 2019 | 543 | 5,789 |
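
Summing the hate and non-hate columns shows how imbalanced the combined corpus in this table is. This is plain arithmetic on the table; whether the table covers only the training split or the whole corpus is not stated.

```python
# Hate / non-hate counts per source, copied from the table.
counts = {
    "DeTox": (4504, 7682),
    "RP-MOD": (2813, 3412),
    "HASOC 2019": (543, 5789),
}
hate = sum(h for h, _ in counts.values())
non_hate = sum(n for _, n in counts.values())
share = hate / (hate + non_hate)
print(hate, non_hate, round(share, 3))  # 7860 16883 0.318
```

Roughly one comment in three is labeled as hate. The hate counts per source also match the human-written comment counts used for the text-based bot detection data.
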

The Reddit metadata dataset comprises 818 human and 816 bot accounts for English, and 627 human and 9 bot accounts for German, each described by the following features:

| Feature name | Description |
|---|---|
| comment_karma | Comment karma of a user |
| post_karma | Post karma of a user |
| comment_activity_day | Number of comments per day |
| posts_activity_day | Number of posts per day |
| avg_frequency_posts | Average time between posts (in seconds) |
| avg_frequency_all | Average time between activities of any kind (in seconds) |
| min_time_all | Minimum time between activities of any kind (in seconds) |
| num_url | Proportion of links in posts and comments |
| num_repeated_post | Proportion of repeated posts |
| num_repeated_comment | Proportion of repeated comments |
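
The table names the features but not how `extract_metadata.py` computes them. The following is a minimal sketch of one plausible interpretation, built on several assumptions: the input is a hypothetical `Activity` record per post or comment, per-day rates are taken over the observed time span (at least one day), a link is any `http://` or `https://` occurrence, and an item counts as repeated when its exact text occurs more than once. `comment_karma` and `post_karma` are read directly from the Reddit profile, so they are not computed here.

```python
import re
from collections import Counter
from dataclasses import dataclass
from statistics import mean

SECONDS_PER_DAY = 86_400
URL_PATTERN = re.compile(r"https?://")


@dataclass
class Activity:
    """One post or comment of a Reddit user (hypothetical input structure)."""

    kind: str  # "post" or "comment"
    created_utc: float  # Unix timestamp in seconds
    text: str


def _gaps(timestamps: list[float]) -> list[float]:
    """Differences in seconds between consecutive timestamps, after sorting."""
    ts = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


def _repeated_share(texts: list[str]) -> float:
    """Share of items whose exact text occurs more than once."""
    if not texts:
        return 0.0
    counts = Counter(texts)
    return sum(1 for t in texts if counts[t] > 1) / len(texts)


def metadata_features(activities: list[Activity]) -> dict[str, float]:
    """Compute the activity-based features (assumed definitions, see lead-in)."""
    if not activities:
        raise ValueError("at least one post or comment is required")
    posts = [a for a in activities if a.kind == "post"]
    comments = [a for a in activities if a.kind == "comment"]
    times = [a.created_utc for a in activities]
    # Assumption: rates use the observed time span, with a floor of one day.
    span_days = max((max(times) - min(times)) / SECONDS_PER_DAY, 1.0)
    post_gaps = _gaps([p.created_utc for p in posts])
    all_gaps = _gaps(times)
    return {
        "comment_activity_day": len(comments) / span_days,
        "posts_activity_day": len(posts) / span_days,
        "avg_frequency_posts": mean(post_gaps) if post_gaps else 0.0,
        "avg_frequency_all": mean(all_gaps) if all_gaps else 0.0,
        "min_time_all": min(all_gaps) if all_gaps else 0.0,
        "num_url": sum(1 for a in activities if URL_PATTERN.search(a.text)) / len(activities),
        "num_repeated_post": _repeated_share([p.text for p in posts]),
        "num_repeated_comment": _repeated_share([c.text for c in comments]),
    }


history = [
    Activity("comment", 0.0, "Hallo"),
    Activity("comment", 60.0, "Hallo"),
    Activity("post", 3600.0, "Siehe https://example.com"),
    Activity("comment", 7200.0, "Anders"),
]
features = metadata_features(history)
print(features["avg_frequency_all"], features["num_repeated_comment"])
```

For this sample history the gaps between activities are 60, 3,540 and 3,600 seconds, so `avg_frequency_all` is 2,400 seconds and `min_time_all` is 60 seconds; two of the three comments share the same text.
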

The hate speech detector is based on Llama 3 and achieves the following results:

| Test Set | F-Score |
|---|---|
| Test set 1 (DeTox, RP-MOD, HASOC 2019) | 0.92 |
| Test set 2 (Reddit Hate Speech) | 0.76 |

Metadata-based bot classification uses a Random Forest:

| Model | F-Score (English validation set) |
|---|---|
| Random Forest | 0.934 |
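
A minimal scikit-learn sketch of a Random Forest trained on the ten metadata features and scored with F1 on a held-out validation split. The data below is synthetic (one feature is shifted for bots so the example has a learnable signal), and the hyperparameters and 80/20 split are illustrative assumptions, not the settings used in `bot_detector.py`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

FEATURES = [
    "comment_karma", "post_karma", "comment_activity_day", "posts_activity_day",
    "avg_frequency_posts", "avg_frequency_all", "min_time_all",
    "num_url", "num_repeated_post", "num_repeated_comment",
]

# Synthetic stand-in data; real rows would come from the English metadata TSV files.
rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, len(FEATURES)))
y = rng.integers(0, 2, size=n)  # 1 = bot, 0 = human
X[y == 1, FEATURES.index("num_repeated_comment")] += 2.0  # give bots a signal

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
score = f1_score(y_val, clf.predict(X_val))  # F1 of the bot class (label 1)
print(f"validation F1: {score:.3f}")

# The three most important features according to the fitted forest.
ranked = sorted(zip(FEATURES, clf.feature_importances_), key=lambda p: -p[1])
for name, importance in ranked[:3]:
    print(name, round(float(importance), 3))
```

Random Forests need no feature scaling, which suits features on very different scales such as karma values and proportions.
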

To request the complete metadata datasets, contact