Thursday, January 21, 2016

How to automate Data Analysis? #part1

When I first started this project, I was wondering how to speed up the work done by analysts. I figured out there is a lot to do here:
  • What can be pre-processed (a script to load a file and create the corresponding table and, as soon as a new file / table is created, compute the statistics for every column: the regex it matches, the NULL values)
  • What can be automatically discovered (this column can be used to join with this table, though you have one more element; users in the group [female] have an average .... compared to the group [male])
  • Things that can be generated on the fly (mainly code: R code, SAS code, SQL code, mostly based on templates)
  • Things that can be parameterized
    • How often have I heard, "I will try that later with this assumption"? What if we could easily parameterize every step of the data workflow? This would also be useful for test cases based on a multi-dimensional matrix
  • Things that can be automated / triggered
    • I just created a new variable: what does that bring to my workflow?
    • Variable reduction / transformation
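
To make the pre-processing idea a bit more concrete, here is a minimal sketch of per-column profiling in Python. It is only an illustration under my own assumptions, not the actual tool: the function names (`profile_columns`, `shape_of`), the list of NULL tokens and the sample data are all hypothetical, and the "regex" is reduced to a coarse shape where digits become 9 and letters become A.

```python
import csv
import io
import re
from collections import Counter

# Assumption: these tokens (case-insensitive) count as NULL values.
NULL_TOKENS = {"", "NULL", "NA", "N/A"}


def shape_of(value):
    """Reduce a value to a coarse pattern: digits -> 9, letters -> A."""
    shape = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", shape)


def profile_columns(csv_text):
    """Return row count, NULL count, distinct count and top shape per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    names = reader.fieldnames or []
    stats = {n: {"rows": 0, "nulls": 0, "values": set(), "shapes": Counter()}
             for n in names}
    for row in reader:
        for name in names:
            col = stats[name]
            col["rows"] += 1
            value = (row.get(name) or "").strip()
            if value.upper() in NULL_TOKENS:
                col["nulls"] += 1
                continue
            col["values"].add(value)
            col["shapes"][shape_of(value)] += 1
    report = {}
    for name, col in stats.items():
        top = col["shapes"].most_common(1)
        report[name] = {
            "rows": col["rows"],
            "nulls": col["nulls"],
            "distinct": len(col["values"]),
            "top_shape": top[0][0] if top else None,
        }
    return report


# Hypothetical sample file, inlined so the sketch runs on its own.
SAMPLE = "\n".join([
    "id,gender,zip",
    "1,female,75001",
    "2,male,",
    "3,female,9400A",
    "4,NULL,75011",
])

report = profile_columns(SAMPLE)
for column, summary in report.items():
    print(column, summary)
```

A report like this could be produced every time a new file or table shows up, so the analyst starts from the column statistics instead of computing them by hand. A shape that differs from the most common one in a column (the `9400A` zip code here) is the kind of value worth flagging.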
