Our dataset contains works of fanfiction extracted from Archive of Our Own (AO3). Each work is between 50 and 6,000 words long and has one or more trigger warnings assigned. The label set contains 32 different trigger warnings with a long-tailed frequency distribution, i.e., a few labels are very common and most are rare. The training split contains 307,102 examples, the validation split 17,104, and the test split 17,040.
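
As a quick sanity check on the numbers above, the following minimal sketch computes each split's share of the data and shows one common way to represent a multi-label target over 32 warnings. The split sizes and label count come from the description; the helper names (`split_fractions`, `multi_hot`) and the use of integer label indices are assumptions made for illustration and do not reflect the dataset's actual file format.

```python
# Split sizes and label count as stated in the dataset description.
SPLIT_SIZES = {"train": 307_102, "validation": 17_104, "test": 17_040}
NUM_LABELS = 32


def split_fractions(sizes):
    """Return each split's share of the whole dataset."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def multi_hot(label_ids, num_labels=NUM_LABELS):
    """Encode one work's warnings as a 0/1 vector of length num_labels.

    label_ids are hypothetical integer indices in [0, num_labels);
    the real label names and storage format are not assumed here.
    """
    if not label_ids:
        raise ValueError("each work has at least one warning")
    vector = [0] * num_labels
    for i in label_ids:
        vector[i] = 1
    return vector


total = sum(SPLIT_SIZES.values())
fractions = split_fractions(SPLIT_SIZES)
print(f"total works: {total}")
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
```

With the stated sizes, the splits come out to roughly 90% training, 5% validation, and 5% test.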


To link to the dataset, please use its permalink [doi].

  • Download the dataset from Zenodo.