Tuesday, August 16, 2011

Useful data sets

If you are doing a performance test, it is always a good idea to use a real dataset. Following are several useful datasets.

Often you can find data in CSV format, which is pretty easy to parse and use.
  1. DBLP catalog - http://kdl.cs.umass.edu/data/dblp/dblp-info.html - this is data about publications. About 900MB in raw size.
  2. Google Fusion Tables - http://www.google.com/fusiontables/Home
     - this has several useful datasets as CSV.
  3. Federal Reserve economic data - http://research.stlouisfed.org/fred2/
  4. Amazon public datasets - aws.amazon.com/publicdatasets
There are a lot more; these are just some of them. If anyone knows of a list giving the sizes and nature of available datasets, please let me know.
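
Since CSV is the common format here, a minimal sketch of loading such a file for a test run might look like the following. It uses only Python's standard csv module. The function name load_csv_rows, the limit parameter, and the two-column sample data are all made up for illustration and do not come from any of the datasets listed.

```python
import csv
import io


def load_csv_rows(source, limit=None):
    """Read rows from an open CSV file (or file-like object) into a list of dicts.

    The first line is treated as the header. `limit` is a hypothetical
    convenience for sampling only the first N rows of a large dataset.
    """
    reader = csv.DictReader(source)
    rows = []
    for i, row in enumerate(reader):
        if limit is not None and i >= limit:
            break
        rows.append(row)
    return rows


# Made-up sample standing in for a downloaded CSV file (an assumption,
# not real data from any of the sources above).
SAMPLE = "title,year\nPaper A,2009\nPaper B,2010\n"

rows = load_csv_rows(io.StringIO(SAMPLE))
first_row_only = load_csv_rows(io.StringIO(SAMPLE), limit=1)
```

For a real file, open it with open(path, newline="") and pass the file object in place of the StringIO. All values come back as strings, so numeric columns need converting before use.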
