When I first started this project, I was wondering how to speed up the work done by analysts. I figured out there is a lot to do here:
- What can be pre-processed (a script that loads a file and creates the corresponding table, so that as soon as a new file / table is created we know the statistics for every column, the regex its values follow, and its NULL values)
- What can be automatically discovered ("this column can be used to join with that table, which gives you one more element to work with", "users in the group [female] have an average .... compared to the group [male]")
- Things that can be generated on the fly (mainly code: R, SAS or SQL, generated from templates)
- Things that can be parameterized
- How often have I heard "I will try that later with this assumption"? What if we could easily parameterise every step of the data workflow? That would also be useful for test cases based on a multi-dimensional matrix
- Things that can be automated / triggered
- I just created a new variable: what does it bring to my workflow?
- Variable reduction / transformation
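The pre-processing item is the most mechanical one, so here is a minimal sketch of it in Python. It assumes the file is a CSV with a header row, that it is loaded into an in-memory SQLite table where every column is TEXT and empty fields count as NULL, and that the "regex" of a column is approximated by the most frequent shape of its values. The names `load_csv`, `profile_table` and `value_shape` are hypothetical, and triggering the profiling automatically when a new file arrives is left out.

```python
import csv
import io
import re
import sqlite3
from collections import Counter

# Runs of digits, runs of ASCII letters, or any other single character.
TOKEN = re.compile(r"(?P<digits>\d+)|(?P<letters>[A-Za-z]+)|(?P<other>.)", re.DOTALL)


def value_shape(value):
    """Describe a value as a regex, e.g. '2023-01-05' -> \\d{4}\\-\\d{2}\\-\\d{2}."""
    parts = []
    for m in TOKEN.finditer(value):
        token = m.group()
        if m.lastgroup == "digits":
            parts.append(rf"\d{{{len(token)}}}")
        elif m.lastgroup == "letters":
            parts.append(rf"[A-Za-z]{{{len(token)}}}")
        else:
            parts.append(re.escape(token))
    return "".join(parts)


def load_csv(conn, table, text):
    """Create a TEXT-only table from CSV text; assumption: empty fields become NULL."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    cols = ", ".join(f'"{c}" TEXT' for c in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    placeholders = ", ".join("?" for _ in header)
    rows = [[v if v != "" else None for v in row] for row in reader]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return header


def _as_numbers(values):
    """Return the values as floats if every one of them parses, else None."""
    try:
        return [float(v) for v in values]
    except ValueError:
        return None


def profile_table(conn, table, columns):
    """Per column: row count, NULL count, distinct count, dominant pattern, numeric mean."""
    report = {}
    for col in columns:
        total, nulls, distinct = conn.execute(
            f'SELECT COUNT(*), SUM("{col}" IS NULL), COUNT(DISTINCT "{col}") FROM "{table}"'
        ).fetchone()
        values = [v for (v,) in conn.execute(
            f'SELECT "{col}" FROM "{table}" WHERE "{col}" IS NOT NULL')]
        shapes = Counter(value_shape(v) for v in values)
        numbers = _as_numbers(values)
        report[col] = {
            "rows": total,
            "nulls": nulls or 0,  # SUM over an empty table is NULL
            "distinct": distinct,
            "top_pattern": shapes.most_common(1)[0][0] if shapes else None,
            "mean": sum(numbers) / len(numbers) if numbers else None,
        }
    return report


if __name__ == "__main__":
    sample = ("id,signup_date,gender,score\n"
              "1,2023-01-05,female,12.5\n"
              "2,2023-02-11,male,\n"
              "3,2023-03-20,female,9.0\n")
    conn = sqlite3.connect(":memory:")
    cols = load_csv(conn, "customers", sample)
    for col, stats in profile_table(conn, "customers", cols).items():
        print(col, stats)
```

Keeping the statistics in SQL (`COUNT`, `COUNT(DISTINCT ...)`) means the same queries could be pointed at the real database instead of SQLite; the pattern inference is done in Python because plain SQL has no convenient way to derive a regex from values.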