After visualizing chat room dialogue earlier this week, I thought it might be neat to try to visualize some Twitter dialogue from the #LAK11 course. For some reason I thought it would be simple to extract tweets from Twitter for analysis. It wasn’t! So if you happen to know an easier way, please leave a comment.
Getting the data out
I tried The Archivist but that only gives a summary of the tweets. Apparently they used to offer the ability to download the tweets in Excel, but that was discontinued as it violated Twitter’s terms of service. I also tried Yahoo Pipes but did not find a way to get lots of tweets in a decent record set.
I thought RSS would be a good option, but the default feed was only returning 10 tweets. So I sent out a call for help on Vark and got an answer! Unfortunately, it turned out that I could only pull a maximum of 100 tweets through RSS. I decided to go ahead anyhow. This analysis covers the 100 tweets with the hashtag #LAK11 posted between Twitter timestamps 2011-01-25T21:00:04Z and 2011-01-28T01:04:39Z.
Cleansing the data
After cleansing the data and getting it organized, I structured it for feeding into NodeXL. I had quite a bit of help here from my colleague Andrew Deacon. If a tweet contained an @, we extracted the username being referenced. We did not account for tweets which had more than one @; this is a limitation of this analysis and will be kept in mind for the future. We also extracted the link in the tweet if there was one. Again, we did not account for multiple links within a tweet.
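For anyone wanting to repeat this step in code, here is a minimal sketch in Python. It is not what we actually used; the function name `extract_first` and both patterns are my own assumptions, and it deliberately keeps the same limitation of taking only the first @ and the first link in each tweet.

```python
import re

# Assumed patterns: a mention is "@" followed by word characters,
# and a link is anything starting with http:// or https:// up to the next space.
MENTION_RE = re.compile(r"@(\w+)")
LINK_RE = re.compile(r"https?://\S+")


def extract_first(tweet_text):
    """Return (first mentioned username or None, first link or None)."""
    mention = MENTION_RE.search(tweet_text)
    link = LINK_RE.search(tweet_text)
    return (
        mention.group(1) if mention else None,
        link.group(0) if link else None,
    )


# Hypothetical tweet text, for illustration only.
example = "RT @alice worth a read http://bit.ly/abc #LAK11"
first_mention, first_link = extract_first(example)
```

Swapping `search` for `findall` would pick up every @ and every link in a tweet, which is one way to remove the limitation later.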
Creating the relationships
We imagined two relationships which could be created in this data: one based on users and the links they posted, and one based on people conversing using the @. So while we only had 100 tweets, each tweet could potentially produce two relationships, if a user tweeted both a link and an @. We ended up with 140 (ironic) relationships across the tweets.
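The two relationships can be sketched as a simple edge list. This is only an illustration under my own assumptions: the function name `build_edges`, the record layout `(author, mention, link)` and the sample records are all hypothetical, not the real #LAK11 data.

```python
def build_edges(records):
    """Turn (author, mention, link) records into (source, target, kind) edges.

    A tweet with both a mention and a link yields two edges,
    and a tweet with neither yields none.
    """
    edges = []
    for author, mention, link in records:
        if mention:
            edges.append((author, mention, "mention"))
        if link:
            edges.append((author, link, "link"))
    return edges


# Hypothetical records, for illustration only.
sample_records = [
    ("ann", "bob", "http://bit.ly/abc"),  # both an @ and a link: two edges
    ("bob", None, "http://bit.ly/abc"),   # link only: one edge
    ("cat", None, None),                  # neither: no edges
]
sample_edges = build_edges(sample_records)
```

Because links become nodes in their own right, two users who tweet the same URL end up connected through that shared link node.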
We used NodeXL’s cluster finding tool to identify the clusters of interactions within the set of tweets. Each cluster was given a colour, which you can see in the image. The smaller groups were minimized in the version below, and we pulled out all of the smaller clusters for the image above. Any node that does not have a name attached to it is a web link (which I didn’t label because most are tiny URLs anyhow). Connections between users are formed when someone mentions another (using @) or when two people tweet the same URL.
I have also managed to graph the interactions with pictures. This image does not show the grouping represented by the colours in the above diagrams. The blue dots here are people who were referenced with @ but whose Twitter image I did not have, as they did not tweet within the record set.