Bytes of Summer by moppet65535, on Flickr
Creative Commons Attribution-Share Alike 2.0 Generic License by  moppet65535

How many floppy disks would you need to install:

Adobe Photoshop CS4 = 358 disks
The Sims 3 = 1760 disks
Firefox 3 = 12 disks

Figures Approximate – Source
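
The source's assumptions about install size and disk capacity are not restated here, so the following is only a rough sketch of the arithmetic behind a count like this. It assumes a "1.44 MB" floppy that holds 1440 KB, and the 500 MB install size in the example is a hypothetical round number, not a figure from the source.

```python
def floppies_needed(size_kb, capacity_kb=1440):
    """Return how many disks are needed to hold size_kb kilobytes.

    capacity_kb defaults to 1440 KB (a "1.44 MB" floppy); this default
    is an assumption, not something stated by the source of the figures.
    """
    if size_kb <= 0:
        return 0
    # Integer ceiling division: a partly filled last disk still counts.
    return -(-size_kb // capacity_kb)


# Hypothetical example: a 500 MB installer, counted as 500 * 1024 KB.
example_size_kb = 500 * 1024
example_disks = floppies_needed(example_size_kb)
print(example_disks)
```

Rounding up matters here because the last disk is rarely full, which is one reason published figures like these are only approximate.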

It’s hard to imagine, but only 10-15 years ago we carried around media that held 1.2 megabytes of data. Remember the good old floppy disk? Now I use one as a mug coaster on my desk, just to remind me of what was. Today I carry around an 8 gigabyte USB stick, which is a 682,566.67% increase in capacity (counting 8 gigabytes as 8,192 megabytes) in little more than a decade. Our ability to capture and store electronic data has increased immensely.
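
For anyone who wants to check that percentage, here is a minimal sketch of the calculation. It assumes the 8 gigabyte stick is counted as 8 × 1024 megabytes and the floppy as 1.2 megabytes; the function name is just for illustration.

```python
def percent_increase(old, new):
    """Percentage increase going from old to new."""
    return (new - old) / old * 100


floppy_mb = 1.2      # capacity of the floppy, in megabytes
usb_mb = 8 * 1024    # 8 gigabytes counted as 8,192 megabytes (assumption)

increase = percent_increase(floppy_mb, usb_mb)
ratio = usb_mb / floppy_mb

print(f"{increase:,.2f}% increase")   # 682,566.67% increase
print(f"about {ratio:,.0f} times the capacity")
```

Put another way, the stick holds roughly 6,827 floppies' worth of data.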

With this increase in capacity we have seen an increase in the amount of data that gets stored electronically. Gradually our health records, shopping habits, grades, and library systems, among countless others, became digitized. Archaic systems like the library card catalogue became extinct. The first challenge was migrating all of the paper records to digital. Then we grappled with scaling databases up, and after that with getting databases to talk to one another. Today we are exploring how to make sense of that data.

Now we live in a world where data is almost ubiquitous. If you think about it, most of the things you do as you go through your day generate data that will end up being recorded somewhere: answer your cell phone, write an email, buy a coffee, print a document, take a test, order dinner, or watch TV. Each record is mostly meaningless on its own, but when combined with the millions of similar records being captured around the world, it may take on meaning to someone. Companies have begun using the data they collect in the course of doing business to gain strategic advantage. Enter the data scientist.

This week’s #LAK11 resources discussed the ways in which “Big Data” is changing our world: insurance firms calculating risk from masses of historical data, stores predicting when people will be shopping and for what, singles being matched flawlessly based on their characteristics (check this out for how they do this), and targeted advertising with Facebook Beacon and Google Ads. Even the way we examine and chart data is changing.

Now we are seeing a rise in public datasets. Just in the last few months I have seen a number of websites pop up offering datasets that anyone can access and use. Even datasets that weren’t meant for public consumption are now being opened, as with Wikileaks. Here are a few of the open data providers I have noted so far. If you like playing with data but don’t have a dataset, visit one of these sites and you can start playing around in Excel or OpenOffice.

Freebase http://www.freebase.com/
Amazon Public Data Sets http://aws.amazon.com/publicdatasets/
Windows Azure Datamarket https://datamarket.azure.com/
Yahoo! Query Language http://developer.yahoo.com/yql/
Infochimps http://infochimps.com/
DBpedia http://dbpedia.org/
Guardian DataStore http://www.guardian.co.uk/data
Google Public Data http://www.google.com/publicdata/home
UNESCO Stats http://stats.uis.unesco.org/

Educational systems also record an incredible amount of data, and I think the theme of the LAK11 course is to discuss ways in which we might harness that data to improve learning. My colleague and I have just conducted a review of some of the data that we are able to extract from our learning management system. We will be preparing a full report on it, which I will share with the LAK11 community.

One of the challenges we have discovered so far in examining our LMS activity data is that it tells such a small part of the story. We don’t know what people actually do with things, or how to interpret the activities they conduct on the site. The activity data leaves us with a very weak sense of how the activity might have led to cognitive changes. Maybe our data set is too small. We will, however, keep exploring.

One of the nice things we have gained from this exercise is that we have started to think about where we can extract data from learning environments. The data alone does not make a whole lot of sense unless you start combining it with other data. That way you can form a richer picture of what might be going on. More on this later.
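
As a rough illustration of what combining could look like, here is a small sketch that joins activity records with a second dataset on a shared student ID. Everything in it is hypothetical: the records are made up for the example, the field names are my own, and using grades as the "other data" is an assumption rather than what our review actually did.

```python
# Hypothetical, made-up records for illustration only; not real LMS data.
lms_activity = [
    {"student_id": "s1", "logins": 12, "forum_posts": 3},
    {"student_id": "s2", "logins": 2, "forum_posts": 0},
    {"student_id": "s3", "logins": 7, "forum_posts": 1},
]
grades = [
    {"student_id": "s1", "final_grade": 74},
    {"student_id": "s2", "final_grade": 58},
]


def combine(activity_rows, other_rows, key="student_id"):
    """Join two lists of records on a shared key (an inner join)."""
    other_by_key = {row[key]: row for row in other_rows}
    combined = []
    for row in activity_rows:
        match = other_by_key.get(row[key])
        if match is not None:
            # Merge the two records into one richer record.
            combined.append({**row, **match})
    return combined


combined = combine(lms_activity, grades)
for record in combined:
    print(record)
```

Students who appear in only one dataset (like the hypothetical "s3") drop out of the result, which is itself a reminder of how partial each individual source can be.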

Lastly, with this video I just want to emphasize how important it is to take care of your data.

LAK11 Week 2: Rise of “Big Data” and Data Scientists

CC BY 4.0 LAK11 Week 2: Rise of “Big Data” and Data Scientists by Michael Paskevicius is licensed under a Creative Commons Attribution 4.0 International License.
