But if you look at the pre-digital-text archive period (pre-1990?), you won't see "fuck" in the actual articles, such as the ones in the 1880s...I tried other cusswords and saw that they would appear in the very-old stories but no other time in NYT history...So I'm guessing that there is some fuzzy matching going on to compensate for the text in the scanned archives.
There are several hits for 'Internet' in 1853, and hundreds for 'email' in the 1890s. To avoid that, maybe they could weight the OCR recognition based on a word's first occurrence in dictionaries (backdated a decade or two).
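A minimal sketch of that weighting idea, assuming a lookup table of first-attestation years is available. The table `FIRST_ATTESTED`, its placeholder years, and the function `plausibility_weight` are all hypothetical names made up for illustration; nothing here reflects how the NYT tool actually works.

```python
# Hypothetical first-attestation years. These are placeholder values for
# illustration only; a real system would load them from a dictionary source.
FIRST_ATTESTED = {"internet": 1974, "email": 1979}


def plausibility_weight(word, article_year, grace=20):
    """Weight an OCR match by how plausible the word is for that year.

    Returns 1.0 once the word is attested, 0.0 when the article is more than
    `grace` years older than the first attestation, and a linear ramp in
    between (the "backdated a decade or two" window).
    """
    first = FIRST_ATTESTED.get(word.lower())
    if first is None:
        return 1.0  # no dating information: do not penalize the match
    if article_year >= first:
        return 1.0
    if article_year <= first - grace:
        return 0.0
    return (article_year - (first - grace)) / grace


for word, year in [("Internet", 1853), ("email", 1895), ("email", 1995)]:
    print(word, year, plausibility_weight(word, year))
```

With these placeholder dates, the 1853 "Internet" and 1890s "email" matches would get a weight of zero and could be dropped or flagged as likely OCR errors.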
I was about to ask why the word "equation" has been trending upwards since 1960 even though the related word "equations" (plural) has not. It also doesn't exactly track the word "mathematics". But after I added the word "music", it reminded me that the Y axis shows "equation" at less than 1%. I suppose that's within the realm of random noise as far as word count statistics go. Maybe NYT had an odd mix of staff writers who happened to use "equation" more often, in a way that's unrelated to any larger cultural context.
The Google Books Ngram Viewer doesn't have any hits for "enter the equation". However, the words "equation" and "equations" have the opposite trend from NYT.[1] But to emphasize again, the hits are less than .01%, so calling this a "trend" is potentially abusing the word.
I would expect that to be more constant when individuals are addressed. My interest was towards addressing men and women as a group, which - to some degree - would reflect the 'relevance' of the group for the given time in media.
It took me a long time to realize that articles related to a topic can be seen by clicking on the year. This instruction is hidden below the screen fold. It'd also be nice to see actual numbers (along with the %) for each year during hover.
It'd also be useful to see a curated list of topics to select from, instead of just randomly picking some related topics on page load. Hopefully they harvest some interesting suggestions they receive through @nyt. There are some fun topic-suggestions in this thread already.
Overall, a nifty tool that I wish existed for all news sources in the world and worked across all languages.
I love how "Netscape", "Loudcloud", and "Opsware" were mere flickering events for the NYT, but "Andreessen" got traction and has been on an upwards trend for the last ten years. Also - a unique spelling, while probably annoying when he was growing up, certainly makes it easier to track his presence in media.
It was probably a change in style or format, not a reduction in the number of stories about New York, because the percentages of stories about other cities drop in a similar way at the same time.
What if more articles were published in a particular year? I assume the data would be skewed if more words existed in a particular year. I figured they'd measure how the language of, say, a random bunch of 5000 words has changed over the course of the life of NYT.
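For what it's worth, the chart's Y axis is a percentage, which would already compensate for this if it means "share of that year's articles that mention the term". That reading is an assumption on my part. A rough sketch of the normalization, with made-up numbers and a hypothetical `yearly_share` helper:

```python
def yearly_share(hits_by_year, totals_by_year):
    """Percent of each year's articles that contain the term.

    Dividing by the yearly total removes the skew from years in which
    more articles were published.
    """
    return {
        year: 100.0 * hits_by_year.get(year, 0) / total
        for year, total in totals_by_year.items()
        if total > 0
    }


# Made-up counts: twice as many raw hits in 2000, but also twice as many articles.
hits = {1950: 200, 2000: 400}
totals = {1950: 50_000, 2000: 100_000}

shares = yearly_share(hits, totals)
print(shares)
```

In this toy data the raw hit count doubles between the two years, yet the normalized share stays the same.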
It's very likely to be related to the National Recovery Administration, which was formed in 1933, and set price codes and codes of fair practice. It must have generated a huge amount of public debate and lots of news stories.
You're comparing different things there: Republican is both the noun and adjective form, while Democrat is only the noun form. "Clinton is a Democrat" and "Bush is a Republican", but "Obama was the 2008 Democratic candidate" vs. "Dole was the 1996 Republican candidate".
You could try to adjust for that by comparing Republican vs. the sum of Democratic+Democrat, but that also pulls in unrelated uses of both terms: "democratic reforms in $countryname" and "Irish republicans", especially since it isn't case-sensitive. That probably overcounts "democratic", because it's used in that non-US-party sense more than "republican" is.
In general this kind of thing makes it very tricky to conclude things from pure word or n-gram frequency counts, since without more semantic annotation there are a ton of confounding issues.
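To make the case-sensitivity point concrete, here is a small sketch on an invented sample sentence. The `count_terms` helper and the sample text are made up for illustration and have nothing to do with how the NYT tool counts.

```python
import re


def count_terms(text, terms, case_sensitive=True):
    """Count whole-word occurrences of each term in text."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return {
        term: len(re.findall(rf"\b{re.escape(term)}\b", text, flags))
        for term in terms
    }


# Invented sample mixing US-party uses with generic lowercase uses.
sample = (
    "The Democratic candidate met a Republican senator. "
    "Observers praised democratic reforms abroad, and an Irish republican "
    "group protested. She is a Democrat."
)
terms = ["Democratic", "Democrat", "Republican"]

strict = count_terms(sample, terms, case_sensitive=True)
loose = count_terms(sample, terms, case_sensitive=False)
print("case-sensitive:  ", strict)
print("case-insensitive:", loose)
```

Case-sensitive matching drops the generic lowercase uses here, but it is still only a heuristic: a sentence that starts with "Democratic reforms..." would be counted as a party reference, and "Irish Republicans" is often capitalized too.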
"Theater" vs. "Theatre": http://chronicle.nytlabs.com/?keyword=theater.theatre
As well as how quickly culturally accepted terms are replaced:
"Mrs." vs. "Ms.": http://chronicle.nytlabs.com/?keyword=mrs..ms.