Lanciaux Maxime | BI | DWH | Hadoop | DevOps | Google Cloud | DataOps


Saturday, September 21, 2019

How to do DataOps with Google Cloud Platform!

Hello from Budapest,

It has been a long time since I last had the chance to look at my blog, so here is a little news: I am restarting to share here, and the first topic will be DataOps on Google Cloud Platform using BigQuery, Data Studio, Jenkins, Stackdriver, Google Cloud Storage and more!

Stay Data tuned!

Monday, March 20, 2017

Chief DevOps Officer! #automation

This is the new trendy job. Here is what I think his/her mission should be:

Automation using DevOps
Improving metrics gathering and reporting
Quality improvement using the Pareto principle (80/20)
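As a rough illustration of Pareto-driven quality improvement, here is a minimal Python sketch that ranks issue categories by frequency and keeps the "vital few" that together account for about 80% of occurrences, i.e. where fixing effort pays off first. The category names and counts are made up for the example, and the 0.8 threshold is the usual 80/20 rule of thumb, not a fixed rule.

```python
from typing import Dict, List


def pareto_vital_few(counts: Dict[str, int], threshold: float = 0.8) -> List[str]:
    """Return the most frequent categories whose cumulative share first reaches `threshold`."""
    total = sum(counts.values())
    if total == 0:
        return []
    vital: List[str] = []
    cumulative = 0
    # Walk categories from most to least frequent, accumulating their share of the total.
    for name, count in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        vital.append(name)
        cumulative += count
        if cumulative / total >= threshold:
            break
    return vital


if __name__ == "__main__":
    # Hypothetical incident counts, for illustration only.
    incidents = {
        "config drift": 42,
        "missing dependency": 25,
        "flaky test": 15,
        "typo in script": 10,
        "network timeout": 8,
    }
    print(pareto_vital_few(incidents))  # ['config drift', 'missing dependency', 'flaky test']
```

In a DevOps setting the counts would naturally come from the metrics gathering mentioned above (failed builds, deployment incidents), so both items feed each other.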

Monday, February 27, 2017

Be ready for tomorrow's prez #hadoop #devops!

Friday, February 10, 2017

Monitoring Hadoop application deployment per environment #DevOps!

Wednesday, October 12, 2016

How to install Ansible on Ubuntu!

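# Provides the apt-add-repository / add-apt-repository command
sudo apt-get install software-properties-common
# Add the Ansible PPA
sudo apt-add-repository ppa:ansible/ansible
# Refresh the package index, then install Ansible
sudo apt-get update
sudo apt-get install ansible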

Monday, October 3, 2016

What is DevOps?

Monday, September 26, 2016

I find your lack of automation disturbing!

Tuesday, July 26, 2016

Minimum set of tools to do DevOps with Hadoop!

DevOps is a way of working, and a set of frameworks, that eases the life of IT teams, from developers to admins and production, in a complex, multi-parameter, collaborative environment.

Continuous integration tool: Jenkins
Build automation tool: Maven
Team code collaboration tool: Gerrit Code Review
Database: PostgreSQL (admin and application monitoring)
Visualisation tool: Zeppelin (CEO / admin / dev dashboards)
Versioning tool: Git (code, configuration, templates)
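A minimal sketch of how the database could serve "admin and application monitoring": record every deployment in a table, then query the latest version per application and environment. To stay runnable anywhere it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL; the deployments table and its columns are hypothetical. The SQL itself is portable, but a PostgreSQL driver such as psycopg2 uses %s placeholders instead of ?.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical monitoring table: one row per deployment.
SCHEMA = """
CREATE TABLE IF NOT EXISTS deployments (
    app         TEXT NOT NULL,
    env         TEXT NOT NULL,  -- e.g. dev, test, prod
    version     TEXT NOT NULL,
    deployed_at TEXT NOT NULL   -- ISO 8601, so text order matches time order
)
"""

# Latest deployed version for each (app, env) pair.
LATEST_PER_ENV = """
SELECT d.app, d.env, d.version
FROM deployments AS d
WHERE d.deployed_at = (
    SELECT MAX(x.deployed_at)
    FROM deployments AS x
    WHERE x.app = d.app AND x.env = d.env
)
ORDER BY d.app, d.env
"""


def record_deployment(conn, app: str, env: str, version: str, deployed_at: str) -> None:
    """Insert one deployment event (e.g. called at the end of a Jenkins job)."""
    conn.execute(
        "INSERT INTO deployments (app, env, version, deployed_at) VALUES (?, ?, ?, ?)",
        (app, env, version, deployed_at),
    )


def latest_versions(conn) -> List[Tuple[str, str, str]]:
    """Return (app, env, version) rows for the most recent deployment per environment."""
    return conn.execute(LATEST_PER_ENV).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    record_deployment(conn, "ingest-job", "dev", "1.1.0", "2016-07-01T10:00:00")
    record_deployment(conn, "ingest-job", "dev", "1.2.0", "2016-07-05T09:30:00")
    record_deployment(conn, "ingest-job", "prod", "1.1.0", "2016-07-04T18:00:00")
    for row in latest_versions(conn):
        print(row)
```

A visualisation tool such as Zeppelin could then chart this table for the dashboards.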

Wednesday, July 6, 2016

List of Jenkins plugins and configuration for automatic Hadoop deployment!

Configure:

JDK
Maven
Security
Share the SSH public key from the Jenkins hosts

Plugins:

Locale Plugin (set en_GB and tick "Ignore browser preference and force this language to all users")
GitHub Plugin (for Git interaction)
Nested View (to allow grouping job views into multiple levels)
SafeRestart (This plugin allows you to restart Jenkins safely)
Conditional BuildStep (It will allow you to define a condition controlling the execution of the step(s))
Maven Integration plugin (Jenkins plugin for building Maven 2/3 jobs via a special project type)
JobConfigHistory (Saves copies of all job and system configurations)
Email-ext plugin (email notification functionality)
PostgreSQL Database Plugin (This is a driver plugin for the Database Plugin)
thinBackup (simply backs up the global and job specific configurations)
Dynamic Parameter Plug-in (dynamic generation of default build parameter values)
Plot Plugin (This plugin provides generic plotting (or graphing) capabilities in Jenkins)
Build Pipeline (Build Pipeline View of upstream and downstream connected jobs)
View Job Filters (Create smart views with exactly the jobs you want)
Folder Plugin
xUnit Plugin
jUnit Plugin
R Plugin
Ansible plugin
Python Plugin
Vagrant Plugin

Currently in testing:

Saturday, June 18, 2016

Screen!

I use nohup a lot, but now I enjoy screen ;-) Have a wonderful weekend!

Thursday, June 16, 2016

Simple example of Jenkins-HDP integration!

I just created a how-to about Jenkins-HDP integration on Hortonworks Community Connection; please vote for it ;-)

Cheers

Saturday, May 28, 2016

How to automate Data Analysis? #part2

Here we go: I coded a prototype to

parse CSV files (database and JSON support will come later)
load the data into Hadoop
create the corresponding Hive ORC table
run simple queries to extract information:

MIN, MAX, AVG
Top 10
COUNT(DISTINCT), COUNT(*) (by YEAR and YEAR / MONTH for timestamp columns) and NULL values
Regex matching the records

You can find the code here!

The next step will probably be to add Spark code generation.
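This is not the prototype code mentioned above, just a minimal Python sketch of its "create the corresponding Hive ORC table" step, assuming the CSV has a header row: infer a Hive type for each column from a sample of rows, sanitise the column names, and emit a CREATE TABLE ... STORED AS ORC statement. The type rules (BIGINT, then DOUBLE, else STRING, with empty cells treated as NULL) and the function names are illustrative choices.

```python
import csv
import io
import re
from typing import List


def hive_column_name(header: str) -> str:
    """Turn a CSV header into a safe, lower-case Hive column name."""
    name = re.sub(r"[^0-9A-Za-z_]", "_", header.strip()).lower()
    # Avoid empty names and names starting with a digit.
    return name if name and not name[0].isdigit() else "c_" + name


def infer_hive_type(values: List[str]) -> str:
    """Guess BIGINT, DOUBLE or STRING from sample values; empty cells count as NULL."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "STRING"
    # Try the narrowest type first; fall through to the next one on the first failure.
    for hive_type, parse in (("BIGINT", int), ("DOUBLE", float)):
        try:
            for v in non_empty:
                parse(v)
            return hive_type
        except ValueError:
            continue
    return "STRING"


def csv_to_hive_ddl(csv_text: str, table: str, sample_rows: int = 1000) -> str:
    """Build a CREATE TABLE ... STORED AS ORC statement from a CSV with a header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Only the first `sample_rows` data rows are used for type inference.
    sample = [row for _, row in zip(range(sample_rows), reader)]
    columns = []
    for i, col in enumerate(header):
        values = [row[i] if i < len(row) else "" for row in sample]
        columns.append(f"  {hive_column_name(col)} {infer_hive_type(values)}")
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n"
        + ",\n".join(columns)
        + "\n)\nSTORED AS ORC;"
    )


if __name__ == "__main__":
    sample_csv = "id,amount,city\n1,10.5,Budapest\n2,,Paris\n"
    print(csv_to_hive_ddl(sample_csv, "sales"))
```

Since LOAD DATA only moves files, a CSV is usually loaded into a text-format staging table first and then copied into the ORC table with INSERT ... SELECT; that part is not shown.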

Thursday, January 21, 2016

How to automate Data Analysis? #part1

When I first started this project, I was wondering how to speed up the work done by analysts. I figured out there is a lot to do here:

What can be pre-processed (a script to load a file and create the corresponding table, so that as soon as a new file / table is created we get the statistics for every column, the regex each column matches and the NULL values)
What can be automatically discovered (this column can be used to join with that table, which gives you one more element to consider; users in group [female] have an average .... compared to group [male])
Things that can be generated on the fly (mainly code: R, SAS or SQL code, based on templates)
Things that can be parameterized
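For the "generated on the fly, based on templates" item, here is a minimal Python sketch using the standard library's string.Template: given a column-name to Hive-type mapping (the kind of output a load and type-inference step would produce), it generates per-column statistics (MIN/MAX/AVG for numeric columns, distinct and NULL counts for every column) plus a top-10 query. The templates, aliases and the NUMERIC_TYPES set are hypothetical, and column names are assumed to already be safe identifiers (no quoting is done).

```python
from string import Template
from typing import Dict, List

# Hypothetical SQL templates; $table and $col are filled in per column.
NUMERIC_STATS = Template(
    "SELECT MIN($col) AS min_val, MAX($col) AS max_val, AVG($col) AS avg_val,\n"
    "       COUNT(DISTINCT $col) AS distinct_vals,\n"
    "       SUM(CASE WHEN $col IS NULL THEN 1 ELSE 0 END) AS null_vals,\n"
    "       COUNT(*) AS total_rows\n"
    "FROM $table;"
)
GENERIC_STATS = Template(
    "SELECT COUNT(DISTINCT $col) AS distinct_vals,\n"
    "       SUM(CASE WHEN $col IS NULL THEN 1 ELSE 0 END) AS null_vals,\n"
    "       COUNT(*) AS total_rows\n"
    "FROM $table;"
)
TOP_10 = Template(
    "SELECT $col, COUNT(*) AS cnt\n"
    "FROM $table\n"
    "GROUP BY $col\n"
    "ORDER BY cnt DESC\n"
    "LIMIT 10;"
)

NUMERIC_TYPES = {"TINYINT", "SMALLINT", "INT", "BIGINT", "FLOAT", "DOUBLE", "DECIMAL"}


def generate_profiling_sql(table: str, columns: Dict[str, str]) -> List[str]:
    """Generate a statistics query and a top-10 query for every column of `table`."""
    queries = []
    for col, col_type in columns.items():
        # Numeric columns also get MIN/MAX/AVG; every column gets distinct and NULL counts.
        stats = NUMERIC_STATS if col_type.upper() in NUMERIC_TYPES else GENERIC_STATS
        queries.append(stats.substitute(table=table, col=col))
        queries.append(TOP_10.substitute(table=table, col=col))
    return queries


if __name__ == "__main__":
    for query in generate_profiling_sql("sales", {"amount": "DOUBLE", "city": "STRING"}):
        print(query, end="\n\n")
```

The same idea extends to R or SAS code: only the template text changes, while the column metadata driving it stays the same.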

How often have I heard "I will try that later with this assumption"? What if we could easily parameterise every step of the data workflow? This would also be useful for multi-dimensional, matrix-based test cases.
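A minimal Python sketch of that idea, assuming each workflow step can take its options as a dict: declare the candidate values of each parameter once, then expand them into the full matrix of runs with itertools.product. The parameter names and values below are hypothetical.

```python
from itertools import product
from typing import Any, Dict, Iterator, List


def parameter_grid(params: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per combination of parameter values (the full cartesian product)."""
    names = list(params)
    for values in product(*(params[name] for name in names)):
        yield dict(zip(names, values))


if __name__ == "__main__":
    # Hypothetical workflow parameters: each combination is one run / test case.
    grid = {
        "null_strategy": ["drop", "impute_mean"],
        "sample_rate": [0.1, 1.0],
        "period": ["YEAR", "YEAR/MONTH"],
    }
    for run_id, run_params in enumerate(parameter_grid(grid), start=1):
        print(run_id, run_params)  # 2 x 2 x 2 = 8 runs
```

Trying an assumption "later" then becomes adding one value to a list instead of editing the workflow itself.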

Things that can be automated / triggered

I just created a new variable: what does it bring to my workflow?
Variable reduction / transformation
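One way to answer "what does this new variable bring?" automatically, and to feed variable reduction, is to check whether it is almost a linear copy of a variable that already exists. Minimal Python sketch using the Pearson correlation coefficient with a hypothetical 0.95 threshold; the variable names are made up. Correlation only detects linear redundancy, so this is a first filter rather than a full variable-selection method.

```python
import math
from typing import Dict, List, Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Pearson correlation coefficient, or None when undefined (constant input or < 2 points)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        return None
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def redundant_with(new_values: Sequence[float],
                   existing: Dict[str, Sequence[float]],
                   threshold: float = 0.95) -> List[str]:
    """Names of existing variables whose |correlation| with the new one reaches `threshold`."""
    flagged = []
    for name, values in existing.items():
        r = pearson(new_values, values)
        if r is not None and abs(r) >= threshold:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Hypothetical new variable "age_months" checked against two existing ones.
    existing = {"age_years": [20, 30, 40, 50], "score": [5, 1, 4, 2]}
    print(redundant_with([240, 360, 480, 600], existing))  # ['age_years']
```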

Wednesday, January 13, 2016

The HadoopAutomator!

2016 will be the year of automation. I am currently working on several projects to automate almost everything (from installation to automatic data analysis and reporting), mainly using:

An orchestration tool: Ansible
A powerful big data platform: Hortonworks Data Platform
A database: I like elephants, so I will probably go for PostgreSQL
A programming language: Python (and some Java, because of Ambari Views)
Several REST APIs such as Ambari Blueprints, WebHCat and Ambari Metrics
And of course, the basic stack: Jenkins, SVN, Git, Maven, SSH, shell scripts and some web technologies
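As an illustration of the Ambari Blueprints REST API item, here is a minimal Python sketch that builds (without sending) the request registering a blueprint, i.e. a POST to /api/v1/blueprints/<name> on the Ambari server. Ambari expects an X-Requested-By header on modifying requests. The host name, credentials, blueprint name and the tiny blueprint body are placeholders; the body is not a complete, deployable cluster layout.

```python
import base64
import json
import urllib.request


def blueprint_request(ambari_url: str, name: str, blueprint: dict,
                      user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the POST that registers `blueprint` under `name`."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{ambari_url.rstrip('/')}/api/v1/blueprints/{name}",
        data=json.dumps(blueprint).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",  # HTTP basic authentication
            "X-Requested-By": "ambari",  # expected by Ambari on POST/PUT/DELETE
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder blueprint: a single host group on the HDP 2.3 stack.
    single_node = {
        "host_groups": [{
            "name": "host_group_1",
            "cardinality": "1",
            "components": [{"name": "NAMENODE"}, {"name": "DATANODE"}, {"name": "ZOOKEEPER_SERVER"}],
        }],
        "Blueprints": {"stack_name": "HDP", "stack_version": "2.3"},
    }
    req = blueprint_request("http://ambari.example.com:8080", "single-node", single_node, "admin", "admin")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually send it to a running Ambari server.
```

Creating the cluster itself is then a second POST, to /api/v1/clusters/<cluster_name>, with a template that maps real hosts to the blueprint's host groups; an Ansible task or a Jenkins job can drive both calls.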

Wednesday, November 11, 2015

Jenkins, Maven, SVN and Hortonworks HDP 2.3 sandbox!

If you are also an automation and open source fan, and whether or not you are in the process of building a Hadoop application, I strongly suggest using (at minimum):

Continuous integration tool (Jenkins, TeamCity, Travis CI)
Build tool (Maven, Ant, Gradle)
Provisioning tool (Chef, Ansible, shell script, Puppet)
Versioning system (Git, SVN, CVS)

These help to improve overall project quality, to stop losing time, to ease Hadoop migration and testing, and to be more efficient (yes, a lot of good reasons).

I have the pleasure of using SVN + Jenkins + Maven + a few shell scripts + the HDP sandbox on my laptop, and it is really awesome.

Thanks ;-)
