An Archive of Pining Stats

Welcome! This is a short and hastily thrown together project to extract trend information about certain pining-related tags from the Ao3 2021 Selective data dump for fan statisticians.

About the source data

🤓

This is an overview; more detailed information is available under Source data.

On March 3rd, 2021, the Archive of Our Own released a Selective data dump for fan statisticians. This data dump contains a subset of the data from the Archive, including all works, as well as all tags. The available data spans from 2008-09-13 to 2021-02-26.

The total sample size that this project covers is 7,269,693 works.

Tags

Ao3 uses a tagging system that allows users to tag their works with as many tags as they want. This means that a work can be tagged with all three of the tags above, or just one of them. This project is focused on the "yearning" tag and its related tags. The following tags are included in this project:

Ao3 has a freeform tagging system, which means that users can tag their works with whatever tags they want. For example, the following tags are all variations of the "pining" tag:

  • "Unacknowledged pining"
  • "Unnecessary Pining"
  • "Vágyakozás"
  • "暗恋"
  • "Y E A R N I N G"
  • "Yearning"
  • "Yearning and Longing"
  • "Yearning to the MAX"
  • "frustrated pining"
  • "gay ass yearning"
  • "Pining!"
  • "PINING!!"
  • "PINING!!!"
  • "Pining???"
  • "pining??"
  • "pining?"

Thankfully, Ao3 has Tag Wranglers who categorize and merge related tags, making my work a lot easier. I have tried to use Canonical Tags as much as possible.

Please note that the "Yearning" tag is folded into the "Pining" tag; Ao3 considers the two to be synonyms.

Presentation

Chart Controls

Group data by dates

The data ends on 2021-02-26, which leaves the final period only partially covered, so graphs are truncated at the end of Q1 2021.

A history of yearning

This chart shows how many works were published per quarter for each of the yearning-related tags.
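As a rough sketch of how works could be bucketed by quarter, assuming each creation date is a YYYY-MM-DD string as in the data dump. The helper name quarter_key and the sample dates are made up for illustration and are not taken from the project's own script.

```python
from collections import Counter
from datetime import date


def quarter_key(creation_date: str) -> str:
    """Map a YYYY-MM-DD date string to a label such as '2021-Q1'."""
    d = date.fromisoformat(creation_date)
    quarter = (d.month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
    return f"{d.year}-Q{quarter}"


# Hypothetical creation dates of works carrying one tag (not real data).
sample_dates = ["2020-12-31", "2021-01-01", "2021-02-26", "2021-02-26"]

# One counter entry per quarter, holding the number of works in it.
works_per_quarter = Counter(quarter_key(d) for d in sample_dates)
print(dict(works_per_quarter))
```

Grouping on a string label keeps the buckets easy to sort and to use as chart categories.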


Tag groupings

Queering yearning

🏳️‍🌈

What is queerness? Get a definitive answer in the Defining Queer section.

This chart shows how works about yearning break down along Straight/Gay/Polysexual/Aro lines. Please note that, just like in real life, bi erasure is a thing; in this case it happens because works tend to center the pairing itself (M/M, F/F, F/M) rather than the characters' sexualities.

Yearning queerness in proportion

This chart shows works with queer-related tags as a percentage of the total number of works for that period. Treat it as a hint only: a work can be counted twice if, for example, it is tagged with both "M/M" and "F/M".

The observant reader might note that the beginning of this graph is a LOT more unstable than the later part. This is because comparatively few works were published in the early years, so just a couple of works a day have a large impact on the graph.

Absolute yearning in context

The chart below adds a line for EVERYTHING EVER SUBMITTED TO AO3 (of which yearning takes up a non-trivial percentage).



Relative yearning

Since all of Ao3 has seen an increase in works over the years, it might be more interesting to see the relative increase in works for each of the tags. This chart shows, for each quarter, the works carrying each tag as a percentage of the total number of works for that period.
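The percentage described here is simply 100 × (works with the tag) / (all works) for each period. A minimal sketch, assuming the per-period counts are already available as dictionaries; the function name share_percent and all the numbers are invented for illustration.

```python
def share_percent(tagged: dict[str, int], total: dict[str, int]) -> dict[str, float]:
    """Per-period percentage of works carrying a tag.

    Periods with no works at all are skipped to avoid dividing by zero.
    """
    return {
        period: 100.0 * tagged.get(period, 0) / count
        for period, count in total.items()
        if count > 0
    }


# Made-up counts, for illustration only (not real Ao3 numbers).
total_works = {"2020-Q3": 2000, "2020-Q4": 2500}
pining_works = {"2020-Q3": 50, "2020-Q4": 100}

print(share_percent(pining_works, total_works))
```

In this invented example the tag doubles in absolute terms, while its share only grows from 2.5% to 4%, which is the distinction the relative chart is meant to bring out.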



Relative yearning in context

This chart sets the relative share of each tag (its works as a percentage of all works for that period) against the growth in works for all of Ao3.

🤓

This chart compares two different types of data, so the Y-axis is not directly comparable between the two kinds of line. The gray line shows how many works have been submitted to Ao3 and is meant to give an idea of the website's growth rate; the other lines show what percentage of those works are tagged with the selected tags.



Relative yearning compared with popular cousins.

In this chart we compare the yearning-related tags to some of the most popular tags on Ao3 (Fluff, AU and Angst), so that the growth of yearning can be compared with what might be expected on the website.



Method

Download and extract the data dump

The data was extracted from the Ao3 2021 Selective data dump for fan statisticians using a custom script. In its initial form the data comes in two files, tags-20210226.csv and works-20210226.csv.

Processing this many records takes a long time, especially when parsing is involved, so the data has to be simplified.

tags-20210226.csv

14,467,138 records - 581.5 MiB

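id   | type         | name                               | canonical | cached_count | merger_id
1    | Media        | TV Shows                           | true      | 910          |
2    | Media        | Movies                             | true      | 1164         |
3    | Media        | Books & Literature                 | true      | 134          |
4    | Media        | Cartoons & Comics & Graphic Novels | true      | 166          |
...
6716 | Character    | Hillary Clinton                    | true      | 659          |
6717 | Relationship | Hillary Clinton/Barack Obama       | true      | 3            |
6719 | Freeform     | Pining                             | true      | 122227       |
6720 | Relationship | Clark Kent/OFC                     | false     | 811395045 (cached_count and merger_id, digits run together)
6721 | Relationship | Redacted                           | false     | 212721 (cached_count and merger_id, digits run together)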

works-20210226.csv

7,269,693 records - 968.1 MiB

creation date | language | restricted | complete | word_count | tags
2021-02-26    | en       | false      | true     | 388        | 10+414093+1001939+4577144+1499536+110+4682892+21+16
2021-02-26    | en       | false      | true     | 1638       | 10+20350917+34816907+23666027+23269305+23269308+25382106+54629895+265399+105139+6207045+2509086+4483454+21741408+2791+21+16
2021-02-26    | en       | false      | true     | 1502       | 10+10613413+9780526+3763877+3741104+7657229+30052928+54862740+54862743+3958232+3741113+13041709+8689774+39239518+21073668+36386338+54862746+54862749+54862752+24+14
2021-02-26    | en       | false      | true     | 100        | 10+15322+54862755+20595867+32994286+663+4717518+2096+54862758+54862761+54862764+54862767+21+16
...

Filter out irrelevant data

To make this dataset more manageable, we filter out all tags that match the following criteria:

Count each of the ~7.2 million works, and relevant tags

This step is just number crunching, counting the following statistics:

  • How many works are published per day per selected tag
  • How many of those works have an associated "queerness" tag
  • How many works are published per day in total
  • How many works are published per day per selected tag, compared to the total number of works published that day (to determine relative growth rate)
  • How many works are published per day, compared to the total number of works ever (to determine growth rate)
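The counting above could be sketched roughly as follows. This is not the project's actual script: the function name count_works, the input shape (a date plus a set of tag IDs per work) and the sample rows are assumptions made for illustration. The ID 6719 is the "Pining" entry from the tags sample; whether works reference canonical IDs directly is also an assumption here.

```python
from collections import Counter, defaultdict

# Hypothetical selection of tag IDs to track.
SELECTED_TAGS = {6719}


def count_works(rows):
    """Count works per day in total and per selected tag.

    rows: iterable of (creation_date, set_of_tag_ids) pairs.
    """
    total_per_day = Counter()
    tagged_per_day = defaultdict(Counter)  # tag id -> Counter keyed by date
    for creation_date, tag_ids in rows:
        total_per_day[creation_date] += 1
        for tag_id in SELECTED_TAGS & tag_ids:
            tagged_per_day[tag_id][creation_date] += 1
    return total_per_day, tagged_per_day


# Made-up rows, for illustration only.
rows = [
    ("2021-02-25", {10, 6719}),
    ("2021-02-26", {10, 21}),
    ("2021-02-26", {6719, 23}),
]
total, tagged = count_works(rows)

# Share of each day's works that carry the tag (relative growth).
share = {day: tagged[6719][day] / n for day, n in total.items()}

# Running total of all works published so far (overall growth).
cumulative = {}
running = 0
for day in sorted(total):
    running += total[day]
    cumulative[day] = running
```

Everything is done in a single pass over the works, which matters when there are roughly 7.2 million rows to go through.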

Source data

On March 3rd, 2021, the Archive of Our Own released a Selective data dump for fan statisticians. This data dump contains a subset of the data from the Archive, including all works, as well as all tags. The available data spans from 2008-09-13 to 2021-02-26.

The total sample size that this project covers is 7,269,693 works.

tags-20210226.csv

14,467,138 records - 581.5 MiB

ID

The ID of the tag

Type

  • Media - A tag that describes the media type of the work, such as "TV Shows" or "Movies"
  • Rating - A tag that describes the rating of the work, such as "Mature" or "Explicit"
  • ArchiveWarning - A tag that describes the archive warning of the work, such as "Creator Chose Not To Use Archive Warnings" or "Graphic Depictions Of Violence"
  • Category - A tag that describes the category of the work, such as "F/M" or "Multi"
  • Character - A tag that describes the character of the work, such as "Harry Potter" or "Hermione Granger"
  • Fandom - A tag that describes the fandom of the work, such as "Harry Potter - J. K. Rowling" or "Marvel Cinematic Universe"
  • Relationship - A tag that describes the relationship of the work, such as "Harry Potter/Draco Malfoy" or "James Potter/Lily Evans Potter"
  • Freeform - A user created tag, such as "Yearning" or "Pining"
  • UnsortedTag - I don't know what this is.

Name

Name of the tag

Canonical

Whether or not the tag is a canonical tag, i.e. the "main" tag for a group of tags, like "Pining" is for "Yearning". Non-canonical variations such as "Yearning" are considered synonyms of "Pining" and are folded into the "Pining" tag.

Cached Count

The rough number of works that are tagged with this tag.

Merger ID

If the tag is not canonical, then this is the ID of the canonical tag that it is folded into.
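A minimal sketch of how a tag could be resolved to its canonical tag through the merger ID. The function name resolve_canonical, the dictionary layout and the IDs 900001 and 900002 are invented for illustration; only 6719 ("Pining") comes from the tags sample. Whether merges can chain through several tags is not stated in the dump description, so the loop simply allows for it.

```python
def resolve_canonical(tag_id, tags):
    """Follow merger_id links until a canonical tag (or a dead end) is reached."""
    seen = set()  # guards against accidental cycles in the data
    while tag_id in tags and tag_id not in seen:
        seen.add(tag_id)
        tag = tags[tag_id]
        if tag["canonical"] or tag["merger_id"] is None:
            return tag_id
        tag_id = tag["merger_id"]
    return tag_id


# Hypothetical excerpt of the tags table, keyed by tag ID.
tags = {
    6719: {"name": "Pining", "canonical": True, "merger_id": None},
    900001: {"name": "Yearning", "canonical": False, "merger_id": 6719},
    900002: {"name": "Y E A R N I N G", "canonical": False, "merger_id": 900001},
}

print(tags[resolve_canonical(900002, tags)]["name"])
```

Resolving every tag once up front means the per-work counting only has to deal with canonical IDs.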

works-20210226.csv

7,269,693 records - 968.1 MiB

Creation date

Date of creation, in the format YYYY-MM-DD

Language

Language of the work, in the format en for English, fr for French, etc.

Restricted

Whether or not the work is restricted to logged in users only.

Complete

Whether or not the work is marked as complete.

Word Count

The word count of the work.

Tags

IDs of the tags that the work is tagged with, separated by +.
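To make the field descriptions concrete, here is a rough sketch of reading one works record with Python's csv module. It assumes the file is comma-separated with the column names listed in this section; the two-line sample and the function name parse_work are made up for illustration.

```python
import csv
import io

# Made-up excerpt; the header follows the column names described above.
SAMPLE = """creation date,language,restricted,complete,word_count,tags
2021-02-26,en,false,true,388,10+6719+21+16
"""


def parse_work(row):
    """Convert one CSV row (a dict of strings) into typed values."""
    return {
        "date": row["creation date"],
        "language": row["language"],
        "restricted": row["restricted"] == "true",
        "complete": row["complete"] == "true",
        # Treat an empty word count as 0 rather than failing.
        "word_count": int(row["word_count"] or 0),
        # The tags field is a "+"-separated list of tag IDs.
        "tags": {int(t) for t in row["tags"].split("+") if t},
    }


works = [parse_work(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
print(works[0]["word_count"], sorted(works[0]["tags"]))
```

Storing the tag IDs as a set makes the later "does this work have tag X" checks cheap.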

Defining Queer

This project extracts whether a work has queer tags or not, but what does that mean? Sadly, it's difficult to define what is queer and what isn't, but I have tried to do so in the following way:

Queer tags

Using tags for defining how to "count" a work can be difficult, as tags are both contextual and subjective, and frankly, all over the place. I use a loose matching system where a tag only needs to partly match any of the search terms, and then it is considered a match.

For example: "Bisexual" matches "Bisexual mess". This is not ideal, but it is the best I can do with the time I have.

The following tags have been considered tags that define the "sexuality" of a work:

Gay

Straight

Straight

Aromantic / Asexual

Asexual, Aromantic

Because of how this filtering system is implemented, it will catch non-canonical tags like "ultra gay", but sadly also "not gay". The results should therefore be treated as a rough estimate, not a definitive source of data.
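The loose matching described in this section amounts to a substring test. A small sketch of the idea follows; the function name loose_match is invented, and ignoring upper/lower case is an assumption rather than something the project states.

```python
def loose_match(tag_name: str, search_terms: list[str]) -> bool:
    """True if any search term occurs anywhere in the tag name.

    Case is ignored here, which is an assumption of this sketch.
    """
    name = tag_name.lower()
    return any(term.lower() in name for term in search_terms)


terms = ["gay"]  # hypothetical search term
for name in ["ultra gay", "not gay", "Bisexual mess"]:
    print(name, loose_match(name, terms))
```

Both "ultra gay" and "not gay" contain the search term, so both match. A substring test has no notion of negation, which is exactly the weakness noted above.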

Downloads

CSV Files

Aggregate files relating to yearning and work statistics for Ao3.

Aggregate files relating to tags comparable to yearning.

Aggregate files relating to queer tags.