Tuesday, August 16, 2011

Useful data sets

If you are doing a performance test, it is always a good idea to run it against a real dataset. Following are several useful datasets.

Often you can find data in the CSV format, and then parsing and using it is pretty easy.
  1. DBLP catalog - http://kdl.cs.umass.edu/data/dblp/dblp-info.html - this is data about publications. About 900MB raw size.
  2. Google Fusion Tables - http://www.google.com/fusiontables/Home - this has several useful datasets as CSV.
  3. Federal Reserve economic data - http://research.stlouisfed.org/fred2/
  4. Amazon public datasets - aws.amazon.com/publicdatasets
There are a lot more. Following are some of them. If anyone knows of a list that gives the sizes and nature of the available datasets, please let me know.
  • http://dvn.iq.harvard.edu/dvn/dv/cid
  • http://www.thejanuarist.com/9-fascinating-datasets-available-online-for-free/
  • http://bios.dfg.ca.gov/dataset_index.asp
  • http://www.uic.edu/orgs/rin/dataset.html
  • http://www.nas.nasa.gov/Resources/datasets.html
  • http://www.datawrangling.com/some-datasets-available-on-the-web
  • http://news.ycombinator.com/item?id=2165497
  • https://bitly.com/bundles/hmason/1
  • http://www.gutenberg.org/wiki/Gutenberg:Feeds
  • http://www.quora.com/Data/Where-can-I-get-large-datasets-open-to-the-public
  • http://getthedata.org/
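
Since CSV data is easy to parse, here is a minimal sketch in Python of how a downloaded CSV file could drive a simple performance test. The column names (`title`, `year`), the file name in the comment, and the helpers `load_rows` and `time_operation` are hypothetical, made up for illustration; none of them come from the datasets listed above. The in-memory sample stands in for a real download so the sketch runs as is.

```python
import csv
import io
import time


def load_rows(source):
    """Read CSV text from an open file-like object into a list of dicts."""
    return list(csv.DictReader(source))


def time_operation(rows, operation):
    """Call operation(row) for every row; return (row count, elapsed seconds)."""
    start = time.perf_counter()
    count = 0
    for row in rows:
        operation(row)
        count += 1
    return count, time.perf_counter() - start


# Stand-in data (assumed columns). With a real download, pass
# open("publications.csv", newline="") to load_rows instead.
SAMPLE = io.StringIO("title,year\nPaper A,2009\nPaper B,2010\n")

rows = load_rows(SAMPLE)
seen_years = set()
# The operation under test here is just a set insert; swap in the real call.
count, elapsed = time_operation(rows, lambda row: seen_years.add(row["year"]))
print(f"{count} rows in {elapsed:.6f} s")
```

Loading all rows before starting the timer keeps parsing cost out of the measurement. For a file near the 900MB size mentioned above, holding every row in memory may not be practical, and iterating over the reader directly would be the safer choice.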
