The project will be due on November 20. Between now and then, you can work in the NYU computer labs and email me for help. I’ll be available to meet somewhere in the city if you give me advance notice (most likely, after work).
The project will count for 25 percent of your grade. However, if it is really stellar, I have no qualms about giving you extra credit for it. The most important thing you can take away from this class is the ability to independently dive into dense data and produce something from it.
This is what your project must consist of:
Short-form essay
- An explanation of what the story is. Why did you choose this particular angle for your project?
- A list of the data sources you used, including the foreign keys used to join the different tables together.
- A step-by-step guide to how you cleaned, transformed, and/or summarized the data, starting from the original data sets.
- A description of three to five shortcomings of the data in your project. These can be inherent limitations in the original data or just problems you had working with the data.
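If it helps to see what a documented clean, join, and summarize workflow might look like, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made up for illustration: the two tables, the column names, the `school_id` foreign key, and the helper functions are hypothetical, and your own project may use spreadsheets, Fusion Tables, or Open Refine instead.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw data: both tables and every column name are invented.
SCHOOLS_CSV = """school_id,name,borough
01,  P.S. 1 ,manhattan
02,P.S. 2,BROOKLYN
03,P.S. 3,Brooklyn
"""

SCORES_CSV = """school_id,year,score
01,2012,71
02,2012,64
03,2012,
03,2011,80
04,2012,90
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_schools(rows):
    """Step 1 (clean): trim whitespace and normalize borough capitalization."""
    return {
        row["school_id"]: {
            "name": row["name"].strip(),
            "borough": row["borough"].strip().title(),
        }
        for row in rows
    }


def join_scores(schools, score_rows):
    """Step 2 (transform): join scores to schools on the school_id foreign key.

    Rows with a blank score or an unknown school_id are set aside rather than
    silently dropped, so they can be reported as shortcomings of the data.
    """
    joined, skipped = [], []
    for row in score_rows:
        school = schools.get(row["school_id"])
        if school is None or not row["score"].strip():
            skipped.append(row)
            continue
        joined.append(
            {**school, "year": int(row["year"]), "score": float(row["score"])}
        )
    return joined, skipped


def average_by_borough(joined):
    """Step 3 (summarize): compute the average score per borough."""
    scores = defaultdict(list)
    for row in joined:
        scores[row["borough"]].append(row["score"])
    return {borough: sum(vals) / len(vals) for borough, vals in scores.items()}


schools = clean_schools(read_rows(SCHOOLS_CSV))
joined, skipped = join_scores(schools, read_rows(SCORES_CSV))
summary = average_by_borough(joined)
```

In this sketch, the `skipped` list holds the rows that could not be joined or had no score. Counting and describing rows like these is one way to come up with the shortcomings your essay needs to list.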
Basically, create something that tells a story that was not evident in the raw data alone.
Most of you will choose to do a visualization, but that is by no means necessary. Some of the best data stories involve finding needles in a haystack and describing them in words.
If you choose to make a visualization, whether it be an interactive map or simply a table, a la Edward Tufte’s small multiples, remember that the purpose of the visualization is to clarify the topic, not to turn words into (confusing) imagery.
Format of the story
There’s no required format for what you produce, except that it should be accessible online by me (you don’t need to make it publicly accessible if you don’t want to).
There are many ways to do this. You can follow the Amazon S3 guide to build and store your own webpage and/or files from scratch. You can create a Google Document and send the link. You can post the story on a blog or other content management system. You can even save a screenshot of your file and post it on an image hosting service.
Just make sure it’s a link. Again, I won’t share the projects with the public. It’s just important that you’re able to publish, even in a rudimentary fashion, in a way more portable than an email attachment.
Feel free to use the NYU computer labs to work on your projects. Here’s a short list of practical guides and references:
- Mashing and Mapping Data with Google Fusion Tables
- So you want to make a map…
- Cleaning Data with Open Refine
- Using Amazon S3 to host a static website
- Using Google Refine for Data Cleaning (note: this guide is for Open Refine back when it was called Google Refine)
- How to make a map with Fusion Tables
- Good Ol’ Excel Is The Ultimate Data Visualization Tool (In Most Cases)
And of course, check out the readings list for more insight and inspiration.