An Archive of Pining Stats
Welcome! This is a short and hastily thrown together project to extract trend information about certain Pining-related tags from the Ao3 2021 Selective data dump for fan statisticians (opens in a new tab)
About the source data
This is an overview, more detailed information here
On March 3rd, 2021, the Archive of Our Own released a Selective data dump for fan statisticians (opens in a new tab). This data dump contains a subset of the data from the Archive, including all works, as all tags. The availbable data spans between 2008-09-13 and 2021-02-26.
The total sample size that this project covers is 7,269,693 works.
Tags
Ao3 uses a tagging system that allows users to tag their works with as many tags as they want. This means that a work can be tagged with all three of the tags above, or just one of them. This project is focused on the "yearning" tag and it's related tags. The following tags are included in this project:
Ao3 has a freeform tagging system, which means that users can tag their works with whatever tags they want. For example, the following tags are all variations of the "pining
" tag:
- "Unacknowledged pining"
- "Unnecessary Pining"
- "Vágyakozás"
- "暗恋"
- "Y E A R N I N G"
- "Yearning"
- "Yearning and Longing"
- "Yearning to the MAX"
- "frustrated pining"
- "gay ass yearning"
- "Pining!"
- "PINING!!"
- "PINING!!!"
- "Pining???"
- "pining??"
- "pining?"
Thankfully, Ao3 uses Tag Wranglers (opens in a new tab) who work with categorizing and merging together related tags, making my work a lot easier. I have been trying to use Canonical Tags (opens in a new tab) as much as possible.
Please note that the Yearning (opens in a new tab) tag is folded into the "Pining" tag, and are by Ao3 considered Synonyms.
Presentation
Chart Controls
A history of yearning
This is a chart representing how many number of works for each of the yearning related tags are published Quarterly.
Tag groupings
Queering yearning
What is queerness? Get a definitive answer here
This chart is about showing how works about yearning break down along the Straight/Gay/Polysexual/Aro lines. Please note that just like in real life, bi erasure is a thing, and in this case happens because works tend to center the pairing itself (M/M, F/F, M/F) rather than characters sexualities.
Yearning queerness in proportion
This chart shows the percentage of queer-tag related to the total number of works for that period. This will give a hint, but it is possible that a work is counted twice, if for example, a work is tagged with both "M/M" and "F/M".
The observant reader might note that the beginning of this graph is a LOT more unstable than it is later, this is due to the comparatively small number of data being published making just couple of works a day have a large impact on the graph.
Absolute yearning in context
The below chart is with the added line of EVERYTHING EVER SUBMITTED TO AO3 (of which yearning takes up a non-trivial percentage).
Tag groupings
Relative yearning
Since all of Ao3 has seen an increase in works over the years, it might be more interesting to see the relative increase in works for each of the tags. This chart shows the Quarterly percentage of works, compared to the total number of works for that period.
Tag groupings
Relative yearning in context
This chart compares the relative increase in works for each of the tags, compared to the total number of works for that period, with the relative increase in works for all of Ao3.
This chart compares two different types of data, and as such, the Y-axis is not directly comparable between the two lines. The gray line is how many works has been submitted to Ao3, and is meant to give an idea of the growth rate of the website, and the other lines show what percentage of those works are tagged with the selected tags.
Tag groupings
Relative yearning compared with popular cousins.
In this chart we compare the yearning
related tags to some of the most popular tags on Ao3; Fluff
, AU
and Angst
, in order to be able to compare the growth of yearning with what might be expected on the website.
Tag groupings
Method
Download and extract the data dump
The data was extracted from the Ao3 2021 Selective data dump for fan statisticians (opens in a new tab) using a custom script.
In its initial form the data comes in two files, tags-20210226.csv
and works-20210226.csv
.
Processing these many records takes a long time, especially when parsing is involved, so the data has to be simplified.
tags-20210226.csv
14,467,138 records - 581,5MiB
id | type | name | canonical | cached_count | merger_id |
---|---|---|---|---|---|
1 | Media | TV Shows | true | 910 | |
2 | Media | Movies | true | 1164 | |
3 | Media | Books & Literature | true | 134 | |
4 | Media | Cartoons & Comics & Graphic Novels | true | 166 | |
... | ... | ... | ... | ... | |
6716 | Character | Hillary Clinton | true | 659 | |
6717 | Relationship | Hillary Clinton/Barack Obama | true | 3 | |
6719 | Freeform | Pining | true | 122227 | |
6720 | Relationship | Clark Kent/OFC | false | 8 | 11395045 |
6721 | Relationship | Redacted | false | 2 | 12721 |
works-20210226.csv
7,269,693 records - 968,1MiB
creation date | language | restricted | complete | word_count | tags |
---|---|---|---|---|---|
2021-02-26 | en | false | true | 388 | 10+414093+1001939+4577144+1499536+110+4682892+21+16 |
2021-02-26 | en | false | true | 1638 | 10+20350917+34816907+23666027+23269305+23269308+25382106+54629895+265399+105139+6207045+2509086+4483454+21741408+2791+21+16 |
2021-02-26 | en | false | true | 1502 | 10+10613413+9780526+3763877+3741104+7657229+30052928+54862740+54862743+3958232+3741113+13041709+8689774+39239518+21073668+36386338+54862746+54862749+54862752+24+14 |
2021-02-26 | en | false | true | 100 | 10+15322+54862755+20595867+32994286+663+4717518+2096+54862758+54862761+54862764+54862767+21+16 |
... | ... | ... | ... | ... |
Filter out irrelevant data
In order to make this dataset more manageable, we filter out all tags that matches the following criteria:
- The tag is not of "Freeform" type, which means that it is not a user-created tag.
- The tag is not related one of the tags below, and doesn't have a canonical related tag:
- The tag wasn't selected as "Comparable" to yearning, in order to be able to compare the growth of yearning with what might be expected on the website.
Count each of the ~7.2 million works, and relevant tags
This step is just number crunching, counting the following statistics:
- How many works are published per day per selected tag
- How many of those works have an associated "queerness" tag.
- How many works are published per day in total
- How many works are published per day per selected tag, compared to the total number of works published that day (To determine relative growth rate)
- How many works are published per day, compared to the total number of works ever (To determine growth rate)
Source data
On March 3rd, 2021, the Archive of Our Own released a Selective data dump for fan statisticians (opens in a new tab). This data dump contains a subset of the data from the Archive, including all works, as all tags. The availbable data spans between 2008-09-13 and 2021-02-26.
The total sample size that this project covers is 7,269,693 works.
tags-20210226.csv
14,467,138 records - 581,5MiB
ID
The ID of the tag
Type
- Media - A tag that describes the media type of the work, such as "TV Shows" or "Movies"
- Rating - A tag that describes the rating of the work, such as "Mature" or "Explicit"
- ArchiveWarning - A tag that describes the archive warning of the work, such as "Creator Chose Not To Use Archive Warnings" or "Graphic Depictions Of Violence"
- Category - A tag that describes the category of the work, such as "F/M" or "Multi"
- Character - A tag that describes the character of the work, such as "Harry Potter" or "Hermione Granger"
- Fandom - A tag that describes the fandom of the work, such as "Harry Potter - J. K. Rowling" or "Marvel Cinematic Universe"
- Relationship - A tag that describes the relationship of the work, such as "Harry Potter/Draco Malfoy" or "James Potter/Lily Evans Potter"
- Freeform - A user created tag, such as "Yearning" or "Pining"
- UnsortedTag - I don't know what this is.
Name
Name of the tag
Canonical
Whether or not the tag is a canonical tag, e.g. the "main" tag for a group of tags, like "Yearning" is for "Pining":
All of the above are considered synonyms of "Pining", and are folded into the "Pining" tag.
Cached Count
The rough number of works that are tagged with this tag.
Merger ID
If the tag is not canonical, then this is the ID of the canonical tag that it is folded into.
works-20210226.csv
7,269,693 records - 968,1MiB
Creation date
Date of creation, in the format YYYY-MM-DD
Language
Language of the work, in the format en
for English, fr
for French, etc.
Restricted
Whether or not the work is restricted to logged in users only.
Complete
Whether or not the work is marked as complete.
Word Count
The word count of the work.
Tags
IDs of the tags that the work is tagged with, separated by +
.
Defining Queer
This project extracts data whether a project has queer tags or not, but what does that mean? Sadly, it's difficult to define what is queer and what isn't, but I have tried to do so in the following way:
Queer tags
Using tags for defining how to "count" a work can be difficult, as tags are both contextual and subjective, and frankly, all over the place. I use a loose matching system where a tag only needs to partly match any of the search terms, and then it is considered a match.
For example: "Bisexual" matches "Bisexual mess". This is not ideal, but it is the best I can do with the time I have.
The following tags have been considered tags that define the "sexuality" of a work:
Gay
- M/M (opens in a new tab)
- F/F (opens in a new tab)
- Gay (opens in a new tab)
- Lesbian (opens in a new tab)
Straight
Straight
- Bisexual (opens in a new tab)
- Pansexual character (opens in a new tab)
- Polysexual (opens in a new tab)
Aromantic / Asexual
Asexual (opens in a new tab) Aromantic (opens in a new tab)
The way that this filtering system is implemented, it will catch non-canonical tags like "ultra gay", but sadly also "not gay". This is therefore to be considered a rough estimate, and not a definitive source of data.
Downloads
CSV Files
Aggregate files relating to yearning and work statistics for Ao3.
Aggregate files relating to tags comparable to yearning
Aggregate files relating to queer tags.