Principles of good data analysis (gregreda.com)
108 points by gjreda on March 23, 2014 | 6 comments


Helpful; thanks!

"Ten Simple Rules for Reproducible Computational Research" http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fj...

* Rule 1: For Every Result, Keep Track of How It Was Produced

* Rule 2: Avoid Manual Data Manipulation Steps

* Rule 3: Archive the Exact Versions of All External Programs Used

* Rule 4: Version Control All Custom Scripts

* Rule 5: Record All Intermediate Results, When Possible in Standardized Formats

* Rule 6: For Analyses That Include Randomness, Note Underlying Random Seeds

* Rule 7: Always Store Raw Data behind Plots

* Rule 8: Generate Hierarchical Analysis Output, Allowing Layers of Increasing Detail to Be Inspected

* Rule 9: Connect Textual Statements to Underlying Results

* Rule 10: Provide Public Access to Scripts, Runs, and Results

Sandve GK, Nekrutenko A, Taylor J, Hovig E (2013) Ten Simple Rules for Reproducible Computational Research. PLoS Comput Biol 9(10): e1003285. doi:10.1371/journal.pcbi.1003285
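Rule 6 is easy to overlook in practice. A minimal sketch of what "note underlying random seeds" looks like in code (the seed value and sample size here are arbitrary, not from the paper):

```python
# Rule 6 sketch: fix and record the random seed so any stochastic
# step of the analysis can be rerun exactly.
import random

SEED = 42  # arbitrary example value; log it alongside the results
random.seed(SEED)

sample = [random.randint(1, 100) for _ in range(5)]

# Re-seeding with the recorded value reproduces the identical sample.
random.seed(SEED)
rerun = [random.randint(1, 100) for _ in range(5)]
assert sample == rerun
```

The same idea applies to NumPy, R, or any other PRNG-backed tool: the seed is part of the analysis record, per Rule 1.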


This is way more useful than the original post, thanks.


This is great, thank you!


Great post! You mention that it is important to "be skeptical" - I concur, and would add that it helps to approach the analysis from an unbiased standpoint. Even if you go into your analysis with certain goals in mind, it is not only more ethical but also more persuasive to point out any inconsistencies in your findings.


I think for "Profile your data", tools like OpenRefine really help. http://openrefine.org


I forgot to mention reproducibility. Show your work (share the code).



