Showing posts with label Automation. Show all posts

Saturday, September 21, 2019

How to do DataOps with Google Cloud Platform!

Hello from Budapest,

It has been a long time since I last had the chance to look at my blog, hence the lack of news. I am now restarting to share here, and the first topic is going to be DataOps on Google Cloud Platform using BigQuery, Data Studio, Jenkins, Stackdriver, Google Cloud Storage and more!



Stay Data tuned!

Monday, March 20, 2017

Chief DevOps Officer! #automation

This is the new trendy job. Here is what I think his/her mission should be:

  • Automation using DevOps
  • Improving metrics gathering & reporting
  • Quality improvement through Pareto analysis
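To make the Pareto point concrete: Pareto analysis ranks problem causes by frequency and keeps the "vital few" that account for most of the problems (conventionally 80%). Below is a minimal Python sketch, assuming incidents are logged as a plain list of cause labels. The incident data, the helper name `pareto_vital_few` and the 0.8 threshold are hypothetical choices for illustration.

```python
from collections import Counter


def pareto_vital_few(defect_causes, threshold=0.8):
    """Return the smallest list of (cause, count) pairs, most frequent first,
    whose cumulative share of all defects reaches `threshold`."""
    counts = Counter(defect_causes)
    total = sum(counts.values())
    vital, cumulative = [], 0
    # most_common() yields causes sorted by descending count
    for cause, count in counts.most_common():
        vital.append((cause, count))
        cumulative += count
        if cumulative / total >= threshold:
            break
    return vital


# Hypothetical incident log: one label per failed build / deployment
incidents = (
    ["config drift"] * 40
    + ["flaky test"] * 25
    + ["network timeout"] * 15
    + ["disk full"] * 10
    + ["typo in script"] * 6
    + ["other"] * 4
)
for cause, count in pareto_vital_few(incidents):
    print(f"{cause}: {count}")
```

With this data, three causes out of six cover 80% of the incidents, so that is where the improvement effort would go first.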

Wednesday, October 12, 2016

How to install Ansible on Ubuntu!

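# apt-add-repository is provided by this package
sudo apt-get install software-properties-common
# register the Ansible PPA
sudo apt-add-repository ppa:ansible/ansible
# refresh the package index so the PPA packages become visible
sudo apt-get update
sudo apt-get install ansible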

Tuesday, July 26, 2016

Minimum set of tools to do DevOps with Hadoop!

DevOps is a way of working, plus a set of frameworks, that eases the life of IT teams, from developers to admins / production, in a complex, multi-parameter, collaborative environment.
  • Continuous integration tool: Jenkins
  • Build automation tool: Maven
  • Team code collaboration tool: Gerrit Code Review
  • Database: PostgreSQL
    • Admin and application monitoring
  • Visualisation tool: Zeppelin
    • CEO / admin / dev dashboards
  • Versioning tool: Git
    • code
    • configuration
    • templates

Wednesday, July 6, 2016

List of Jenkins plugins and configuration for automatic Hadoop deployment!

Configure:
  • JDK
  • Maven
  • Security
  • Share the SSH public key from the Jenkins host(s)
Plugins:
Currently in testing:

Saturday, June 18, 2016

Saturday, May 28, 2016

How to automate Data Analysis? #part2

Here we go: I coded a prototype to

  • help parse CSV files (database and JSON support will be added later)
  • load the data into Hadoop
  • create the corresponding Hive ORC table
  • run simple queries to extract information:
    • MIN, MAX, AVG
    • Top 10
    • COUNT(DISTINCT), COUNT(*) (by YEAR and by YEAR / MONTH for timestamp columns) and NULL values
    • Regex matching the records
You can find the code here!

The next step will probably be to add Spark code generation.
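The prototype steps above can be sketched as template-based SQL generation. The minimal Python sketch below is not the linked prototype code. It reads a CSV header plus a sample of rows and guesses a Hive type per column (BIGINT, DOUBLE or STRING, a simplifying assumption). It then emits a `CREATE TABLE ... STORED AS ORC` statement followed by profiling queries: COUNT(*), COUNT(DISTINCT), the NULL count, MIN / MAX / AVG for numeric columns, and the top 10 values. The helper names are hypothetical, and column names are taken verbatim from the header without sanitizing. The actual load into Hadoop is not shown (for example, a text staging table followed by `INSERT ... SELECT`), and neither are the per-YEAR breakdown and the regex checks.

```python
import csv
import io
import itertools


def infer_hive_type(values):
    """Guess BIGINT, DOUBLE or STRING from sample values (simplifying assumption).
    Empty strings are treated as NULL and ignored."""
    non_null = [v for v in values if v != ""]
    if not non_null:
        return "STRING"
    for hive_type, cast in (("BIGINT", int), ("DOUBLE", float)):
        try:
            for v in non_null:
                cast(v)
            return hive_type
        except ValueError:
            continue  # at least one value does not fit, try a wider type
    return "STRING"


def generate_hive_sql(table, csv_text, sample_size=100):
    """Hypothetical helper: build ORC DDL plus profiling queries from a CSV."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    sample = list(itertools.islice(reader, sample_size))
    columns = [
        (name, infer_hive_type([row[i] for row in sample if i < len(row)]))
        for i, name in enumerate(header)
    ]
    cols_ddl = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    statements = [f"CREATE TABLE IF NOT EXISTS {table} (\n  {cols_ddl}\n) STORED AS ORC;"]
    for name, hive_type in columns:
        # Row count, distinct values and NULL count for every column
        statements.append(
            f"SELECT COUNT(*), COUNT(DISTINCT {name}), "
            f"SUM(CASE WHEN {name} IS NULL THEN 1 ELSE 0 END) FROM {table};"
        )
        if hive_type in ("BIGINT", "DOUBLE"):
            statements.append(f"SELECT MIN({name}), MAX({name}), AVG({name}) FROM {table};")
        # Top 10 most frequent values
        statements.append(
            f"SELECT {name}, COUNT(*) AS cnt FROM {table} "
            f"GROUP BY {name} ORDER BY cnt DESC LIMIT 10;"
        )
    return statements


# Tiny hypothetical CSV; the empty amount stands for a NULL value
csv_text = "id,city,amount\n1,Budapest,10.5\n2,Paris,\n3,Budapest,7\n"
for stmt in generate_hive_sql("sales", csv_text):
    print(stmt)
```

Type inference on a sample is only a heuristic. A real pipeline would probably let the user override the guessed types before running the DDL.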

Thursday, January 21, 2016

How to automate Data Analysis? #part1

When I first started this project, I was wondering how to speed up the work done by analysts. I figured out there is a lot to do here:
  • What can be pre-processed (a script to load a file and create the corresponding table, so that as soon as a new file / table is created we know the statistics, the regex and the NULL values for every column)
  • What can be automatically discovered (e.g. this column can be used to join with that table, which gives you one more element; users in the group [female] have an average .... compared to the group [male])
  • Things that can be generated on the fly (mainly code: R, SAS or SQL code, mostly based on templates)
  • Things that can be parameterized
    • How often have I heard "I will try that later with this assumption"? What if we could easily parameterize every step of the data workflow? This would also be useful for multi-dimensional, matrix-based test cases.
  • Things that can be automated / triggered
    • I just created a new variable: what does it bring to my workflow?
    • Variable reduction / transformation
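The pre-processing idea in the first point (statistics, regex and NULL values for every column) can be sketched in a few lines of Python. This is an illustrative assumption rather than the author's implementation. Values are read as raw strings, and an empty string counts as NULL. The "regex" is a coarse shape in which runs of digits become `[0-9]{n}` and runs of ASCII letters become `[A-Za-z]{n}`. The helper names `value_shape` and `profile_column` are hypothetical.

```python
import re
from collections import Counter

# Runs of ASCII digits, runs of ASCII letters, or any other single character
TOKEN = re.compile(r"(?P<digits>[0-9]+)|(?P<letters>[A-Za-z]+)|(?P<other>.)", re.DOTALL)


def value_shape(value):
    """Coarse regex 'shape' of a value, e.g. '2016-01-21' -> a pattern that
    matches any 4-2-2 digit date written with dashes."""
    parts = []
    for m in TOKEN.finditer(value):
        if m.lastgroup == "digits":
            parts.append(r"[0-9]{%d}" % len(m.group()))
        elif m.lastgroup == "letters":
            parts.append(r"[A-Za-z]{%d}" % len(m.group()))
        else:
            parts.append(re.escape(m.group()))
    return "^" + "".join(parts) + "$"


def profile_column(values):
    """Count, NULLs, distinct values and most common shapes for one column.
    Assumption: values are raw CSV strings and "" means NULL."""
    non_null = [v for v in values if v != ""]
    shapes = Counter(value_shape(v) for v in non_null)
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "top_shapes": shapes.most_common(3),
    }


# Hypothetical date column with one NULL and one value in another format
dates = ["2016-01-21", "2016-01-13", "", "21/01/2016"]
print(profile_column(dates))
```

A column whose values collapse onto one or two shapes, like the date column above, is a good candidate for an automatic format check. A long tail of shapes may point to dirty data worth a closer look.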

Wednesday, January 13, 2016

The HadoopAutomator!

2016 will be the year of automation: I am currently working on several projects to automate almost everything (from installation to automatic data analysis and reporting), mainly using:

Wednesday, November 11, 2015

Jenkins, Maven, SVN and Hortonworks HDP2.3 sandbox!

If you are also an automation and Open Source fan, and you are (or are not yet) in the process of building a Hadoop application, I strongly suggest using at least:
  • Continuous integration tool (Jenkins, TeamCity, Travis CI)
  • Build tool (Maven, Ant, Gradle)
  • Provisioning tool (Chef, Ansible, shell script, Puppet)
  • Versioning system (Git, SVN, CVS)
These help you improve overall project quality, stop losing time, ease Hadoop migration and testing, and be more efficient (yes, a lot of good reasons).

I have the pleasure of using SVN + Jenkins + Maven + a few shell scripts + the HDP sandbox on my laptop, and it is really awesome.

Thanks ;-)