Here we go: I coded a prototype to
- help parse CSV files (database and JSON support will be added later)
- load the data into Hadoop
- create the corresponding Hive ORC table
- run simple queries to extract information
- MIN, MAX, AVG
- Top 10
- COUNT(DISTINCT ), COUNT(*) (grouped by YEAR or YEAR / MONTH for timestamp columns) and NULL value counts
- Regex matching against the records
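
To give an idea of what the steps above could look like, here is a minimal Python sketch. It is not the prototype's actual code. The function names (`infer_hive_type`, `infer_schema`, `create_orc_table`, `profiling_queries`), the type-inference rules and the `yyyy-MM-dd HH:mm:ss` timestamp format are all assumptions made for illustration. It only builds the HiveQL strings: loading the file into Hadoop and running the statements are left out.

```python
import csv
import io
import re

# Hypothetical helpers: names and inference rules are illustrative assumptions,
# not the prototype's real implementation.

INT_RE = re.compile(r"[+-]?\d+")
FLOAT_RE = re.compile(r"[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?")
TS_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")  # assumed timestamp format


def infer_hive_type(values):
    """Guess a Hive type from sample strings; empty strings are treated as NULL."""
    non_null = [v for v in values if v != ""]
    if not non_null:
        return "STRING"
    if all(INT_RE.fullmatch(v) for v in non_null):
        return "BIGINT"
    if all(FLOAT_RE.fullmatch(v) for v in non_null):
        return "DOUBLE"
    if all(TS_RE.fullmatch(v) for v in non_null):
        return "TIMESTAMP"
    return "STRING"


def infer_schema(csv_text):
    """Return [(column_name, hive_type), ...] from CSV text with a header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [r for r in reader if r]  # skip blank lines
    columns = list(zip(*rows)) if rows else [() for _ in header]
    return [(name, infer_hive_type(col)) for name, col in zip(header, columns)]


def create_orc_table(table, schema):
    """Build a CREATE TABLE ... STORED AS ORC statement for the inferred schema."""
    cols = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in schema)
    return f"CREATE TABLE `{table}` (\n  {cols}\n) STORED AS ORC"


def profiling_queries(table, schema):
    """Build the simple profiling queries listed above, one string per query."""
    tbl = f"`{table}`"
    queries = [f"SELECT COUNT(*) FROM {tbl}"]
    for name, hive_type in schema:
        col = f"`{name}`"
        queries.append(f"SELECT COUNT(DISTINCT {col}) FROM {tbl}")
        queries.append(f"SELECT COUNT(*) FROM {tbl} WHERE {col} IS NULL")
        # Top 10 most frequent values
        queries.append(
            f"SELECT {col}, COUNT(*) AS cnt FROM {tbl} "
            f"GROUP BY {col} ORDER BY cnt DESC LIMIT 10"
        )
        if hive_type in ("BIGINT", "DOUBLE"):
            queries.append(f"SELECT MIN({col}), MAX({col}), AVG({col}) FROM {tbl}")
        if hive_type == "TIMESTAMP":
            # Row counts per YEAR / MONTH
            queries.append(
                f"SELECT YEAR({col}), MONTH({col}), COUNT(*) FROM {tbl} "
                f"GROUP BY YEAR({col}), MONTH({col})"
            )
    return queries
```

Calling `infer_schema` on the CSV text, then passing the result to `create_orc_table` and `profiling_queries`, gives the statements to submit to Hive. In practice the CSV would probably first go into a text-format staging table and then be copied into the ORC table with an INSERT ... SELECT, which this sketch does not cover.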
The next step will probably be to add Spark code generation.