journalism | design | textured backgrounds
investigative reporter at @thenyworld | mhkeller.com
The other day I was thinking about hashtag line graphs — charts that show traffic on a particular topic over time — and how to make them more interesting. Visualizing traffic around a hashtag over time usually tells the story that everyone already knows i.e. some huge event happened and people started tweeting about it. Not terribly surprising.
Other experiments into semantic analysis of tweets have tried to characterize what this conversation is about. The Guardian’s riot rumor visualization is one great example but it has some high barriers to entry even if you have the crazy datavis chops. First, you need a huge sample of tweets to analyze and you’ll have trouble getting that unless you’re Twitter white-listed or want to pay a company like DataSift. Second, to ensure accuracy, you need to either build a semantic tagger more advanced than what’s currently out there, or get a bunch of people to make sure your semantic analysis coded each tweet correctly and correct mistakes. So you need manpower.
So what story could you tell if you’re not a huge paper?
Clearly, identifying the meaning behind a sentence has some barriers, but what about individual tags? What if you could chart how the audience around an event shifted by looking at the evolution of tags around a single topic? Surely, sheer tweet volume will tell you something about how popular an event is, but it could be confounded if a spike in traffic comes from a small group tweeting a hundred times as fast as opposed to a hundred times as many people tweeting at the same rate. (Yes, you could use network analysis to get a picture of the audience, but again you need all that data.)
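To make the confound concrete, here is a minimal sketch that counts tweets and distinct users per time period side by side. It assumes you already have (period, user) pairs from somewhere; the sample values and the function name `volume_and_audience` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample: (period, user) pairs. Made-up values, not real tweets.
SAMPLE_TWEETS = [
    ("hour-1", "alice"), ("hour-1", "alice"), ("hour-1", "alice"),
    ("hour-1", "bob"), ("hour-1", "bob"), ("hour-1", "bob"),
    ("hour-2", "alice"), ("hour-2", "bob"), ("hour-2", "carol"),
    ("hour-2", "dave"), ("hour-2", "erin"), ("hour-2", "frank"),
]

def volume_and_audience(tweets):
    """Return {period: (tweet_count, unique_user_count)}."""
    counts = defaultdict(int)
    users = defaultdict(set)
    for period, user in tweets:
        counts[period] += 1
        users[period].add(user)
    return {period: (counts[period], len(users[period])) for period in counts}

summary = volume_and_audience(SAMPLE_TWEETS)
```

Both hours in the sample have the same tweet volume, but the second has three times the audience, which a plain volume line graph would hide.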
Like with most data visualizations, stories in data start to come out when you can mash together different datasets.
One experiment:
When Occupy Wall Street started, I remember the hashtag began as the cumbersome #occupywallstreet because no one knew about it. I briefly saw an #occupywallst and now #ows is the clear choice. The question is, when did Occupy Wall Street become commonplace enough that people were comfortable just referring to it by #ows?
In other words, by looking at hashtag evolution could you see the moment when an obscure march became part of the national discourse?
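If you had the raw tweets rather than a charting service, one rough way to look for that moment is to compute each variant's share of the day's tagged tweets and find the first day the short tag takes the majority. This is only a sketch under assumptions: the (day, hashtag) pairs are made-up sample values, not real Occupy counts, and the function names and the 50 percent threshold are my own choices for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical sample: (day, hashtag) pairs. Made-up values, not real data.
SAMPLE = [
    ("2011-09-17", "#occupywallstreet"),
    ("2011-09-17", "#occupywallstreet"),
    ("2011-09-17", "#occupywallst"),
    ("2011-10-01", "#occupywallstreet"),
    ("2011-10-01", "#ows"),
    ("2011-10-01", "#ows"),
    ("2011-10-01", "#ows"),
]

def variant_share_by_day(pairs):
    """Return {day: {hashtag: share of that day's tagged tweets}}."""
    counts = defaultdict(Counter)
    for day, tag in pairs:
        counts[day][tag.lower()] += 1
    shares = {}
    for day, tag_counts in counts.items():
        total = sum(tag_counts.values())
        shares[day] = {tag: n / total for tag, n in tag_counts.items()}
    return shares

def first_day_dominant(shares, tag, threshold=0.5):
    """Return the earliest ISO-dated day where `tag` exceeds `threshold`, else None."""
    for day in sorted(shares):
        if shares[day].get(tag, 0.0) > threshold:
            return day
    return None

shares = variant_share_by_day(SAMPLE)
crossover = first_day_dominant(shares, "#ows")
```

Working with shares instead of raw counts keeps the comparison about which tag people chose, separate from how much they were tweeting overall.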
Let’s use Trendistic. Tumblr won’t let me embed the graph, so click on the image to see the interactive version, or click here. (Trendistic doesn’t display this data forever, so depending on when you’re reading this, the data from fall 2011 may be gone. But the image below remains!)

The interactive story Emily Liedel, Jason Alcorn and I have been working on went up on the Washington Post this week. It’s a part of the Columbia J-School News21 Fellowship “Brave Old World: Our Future Selves” reporting on the aging population.
Click here for an interactive look at how population shifts, demographic health trends and retirement finances could affect people like you.
Click here to read the story in the Washington Post’s Health & Science section.
The goal of the project is to present the idea of the aging population in a way that will interest people besides the elderly. Unprecedented demographic changes will affect us all, and this interactive will hopefully give people a way into seeing themselves forty years down the road.
News21 is open-source so I’ll be posting some of the code here in the next few weeks. Check out the “Methodology” section to see our data or contact me if you have any questions.
fusion tables are sick. you can take a data set, like which countries people in Carnegie Mellon’s various graduate programs come from, and turn it into a map in, like, ten minutes. so here’s an example of playing around with some data. i got the excel spreadsheet from a google search and it was a random pick. download your own copy of the data set here: studentaffairs.cmu.edu/oie/admins/reports/admissions-stats-m09-f09.xls
It took about ten minutes to do but there are some errors to correct. For instance it labeled the united states as “georgia” since it thought i meant the state. there’s an option to correct stuff like that though.
This maps the percent representation of each country among CMU’s international students in their masters programs.
In their doctoral program:
exchange program: look at switzerland and germany come right out
non-degree/summer: canada dominates, some other european countries ranking
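if you want to work out the percent representation yourself before uploading, here's a rough sketch. the counts below are made up for illustration, not the numbers from the CMU spreadsheet, and `percent_by_country` is just a name i picked.

```python
# Hypothetical student counts per country. Made-up values, not the CMU data.
HYPOTHETICAL_COUNTS = {"India": 50, "China": 30, "South Korea": 15, "Canada": 5}

def percent_by_country(counts):
    """Return {country: percent of all students}, rounded to one decimal place."""
    total = sum(counts.values())
    if total == 0:
        # No students at all, so every share is zero.
        return {country: 0.0 for country in counts}
    return {country: round(100 * n / total, 1) for country, n in counts.items()}

percents = percent_by_country(HYPOTHETICAL_COUNTS)
```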