Sunday, November 27, 2011

Big data quantitative analysis toolkit

Just as individual data bytes may become much more richly annotated with attributes (an extension of data provenance: inspectable elements such as create/review/launch time stamps, plus owner, quality, freshness, and controversy properties), so too may quantitative data sets.
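As a minimal sketch of what such attribute-annotated data might look like in practice: the class below wraps a raw value with the inspectable provenance properties named above. The class name `AnnotatedData`, the field names, and the freshness rule are all illustrative assumptions, not an established format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class AnnotatedData:
    """A raw data value carried together with inspectable provenance attributes.

    All field names here are hypothetical; they simply mirror the attributes
    mentioned in the post (time stamps, owner, quality, freshness, controversy).
    """
    value: bytes
    owner: str
    created: datetime
    quality: float        # e.g. a 0.0-1.0 review score
    freshness_days: int   # age at which the value should be re-verified
    controversy: float    # e.g. fraction of reviewers who disputed the value

    def is_fresh(self, now: datetime) -> bool:
        """True while the value is younger than its freshness window."""
        return (now - self.created).days < self.freshness_days

record = AnnotatedData(
    value=b"42.7",
    owner="sensor-17",
    created=datetime(2011, 11, 1, tzinfo=timezone.utc),
    quality=0.9,
    freshness_days=30,
    controversy=0.05,
)
print(record.is_fresh(datetime(2011, 11, 20, tzinfo=timezone.utc)))
```

A consumer could then filter or down-weight records by these attributes rather than treating every byte as equally trustworthy.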

There should be a standardized ‘2.0 format’ toolkit for quantitative data analysis that includes the top ten techniques most often used to analyze data sets. These tools should be user-friendly, ideally as a widget overlay to websites, or otherwise easily accessible and usable by non-quant laypersons.

Suggested candidates (eight so far) for the top ten most-useful data analysis tools:

  1. Fourier transforms
  2. Markov state models
  3. Entropy analysis
  4. Distribution analysis (e.g., power law, Gaussian, etc.)
  5. Progression analysis (e.g., linear, geometric, exponential, discontinuous)
  6. Qualitative math
  7. Network analysis (node/group theory, graph theory)
  8. Complexity, chaos, turbulence, and perturbation modeling

It could become standard that these kinds of techniques are automatically run and displayed on large data sets.
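A rough sketch of what an automatically generated report might contain, drawing on three techniques from the list above (Fourier transforms, entropy analysis, and distribution analysis). The function name `auto_report` and the choice of metrics are illustrative assumptions; a real toolkit would run a much richer battery.

```python
import numpy as np

def auto_report(samples: np.ndarray) -> dict:
    """Run a few standard analyses from the toolkit list on a 1-D data set."""
    report = {}

    # Fourier transform: dominant frequency (in cycles per sample),
    # after removing the mean so the DC component doesn't win.
    spectrum = np.abs(np.fft.rfft(samples - samples.mean()))
    freqs = np.fft.rfftfreq(samples.size)
    report["dominant_freq"] = float(freqs[spectrum.argmax()])

    # Entropy analysis: Shannon entropy of a 16-bin histogram, in bits.
    counts, _ = np.histogram(samples, bins=16)
    p = counts[counts > 0] / counts.sum()
    report["entropy_bits"] = float(-(p * np.log2(p)).sum())

    # Distribution analysis: sample skewness as a crude symmetry/normality cue.
    z = (samples - samples.mean()) / samples.std()
    report["skewness"] = float((z ** 3).mean())

    return report

# Example data set: a noisy sine wave, 8 cycles over 512 samples.
t = np.arange(512)
rng = np.random.default_rng(0)
data = np.sin(2 * np.pi * 8 * t / 512) + 0.1 * rng.standard_normal(512)
report = auto_report(data)
print(report["dominant_freq"])  # ~ 8/512, the injected frequency
```

The point is not these particular statistics but the workflow: any sufficiently large data set gets a default battery of analyses attached to it, displayed alongside the raw data.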
