Lanciaux Maxime | BI | DWH | Hadoop | DevOps | Google Cloud | DataOps | PostgreSQL: How to automate Data Analysis ? #part2

Saturday, May 28, 2016

How to automate Data Analysis ? #part2

Here we go: I coded a prototype to

parse CSV files (database and JSON support will be added later)
load the data into Hadoop
create the corresponding Hive ORC table
run simple queries to extract information

MIN, MAX, AVG
Top 10
COUNT(DISTINCT ), COUNT(*) (for timestamp columns, by YEAR and by YEAR / MONTH) and NULL values
Regex matching the record
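
To make the steps above more concrete, here is a minimal Python sketch of the idea, not the prototype's actual code. It assumes the CSV has a header row, that empty strings stand for NULL, and that timestamps look like "2016-05-28 10:00:00". The function names (infer_hive_type, infer_schema, create_table_ddl, profiling_queries) are hypothetical. It only generates the HiveQL text; loading the file into Hadoop and running the queries are left out.

```python
import csv
import io
from datetime import datetime


def infer_hive_type(values):
    """Guess a Hive type from string samples; empty strings are treated as NULL."""
    non_null = [v for v in values if v != ""]
    if not non_null:
        return "STRING"
    candidates = (
        ("BIGINT", int),
        ("DOUBLE", float),
        # Assumed timestamp layout; real data may need more formats.
        ("TIMESTAMP", lambda v: datetime.strptime(v, "%Y-%m-%d %H:%M:%S")),
    )
    for hive_type, parse in candidates:
        try:
            for v in non_null:
                parse(v)
            return hive_type
        except ValueError:
            continue
    return "STRING"


def infer_schema(csv_text):
    """Return [(column_name, hive_type), ...] from CSV text with a header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    return [
        (name, infer_hive_type([row[i] for row in rows]))
        for i, name in enumerate(header)
    ]


def create_table_ddl(table, schema):
    """Build the CREATE TABLE statement for an ORC-backed Hive table."""
    cols = ",\n".join(f"  `{name}` {hive_type}" for name, hive_type in schema)
    return f"CREATE TABLE `{table}` (\n{cols}\n) STORED AS ORC"


def profiling_queries(table, schema):
    """Generate the simple per-column queries listed above."""
    queries = []
    for name, hive_type in schema:
        col = f"`{name}`"
        # Row count, distinct count and NULL count.
        queries.append(
            f"SELECT COUNT(*), COUNT(DISTINCT {col}), "
            f"SUM(CASE WHEN {col} IS NULL THEN 1 ELSE 0 END) FROM `{table}`"
        )
        # Top 10 most frequent values.
        queries.append(
            f"SELECT {col}, COUNT(*) AS cnt FROM `{table}` "
            f"GROUP BY {col} ORDER BY cnt DESC LIMIT 10"
        )
        if hive_type in ("BIGINT", "DOUBLE"):
            queries.append(
                f"SELECT MIN({col}), MAX({col}), AVG({col}) FROM `{table}`"
            )
        if hive_type == "TIMESTAMP":
            # Row count by YEAR / MONTH for timestamp columns.
            queries.append(
                f"SELECT YEAR({col}), MONTH({col}), COUNT(*) FROM `{table}` "
                f"GROUP BY YEAR({col}), MONTH({col})"
            )
    return queries


sample = "id,price,created\n1,9.5,2016-05-01 10:00:00\n2,,2016-05-02 11:30:00\n"
schema = infer_schema(sample)
print(create_table_ddl("sales", schema))
for query in profiling_queries("sales", schema):
    print(query)
```

The regex matching step is not covered here. Type inference reads the whole file, so a real tool would probably sample rows instead.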

You can find the code here!

The next step will probably be to add Spark code generation.
