Showing posts with label Quality. Show all posts

Saturday, September 21, 2019

How to DataOps with Google Cloud Platform !

Hello from Budapest,

It's been a long time since I last had the chance to look at my blog, so a little news: I will now restart sharing here, and the first topic is going to be DataOps on Google Cloud Platform using BigQuery, Data Studio, Jenkins, Stackdriver, Google Cloud Storage and more !



Stay Data tuned !

Wednesday, May 30, 2018

Getting Things Done

Hello from Langkawi,

A quick one before boarding. I have had the chance to work with several customers, using several technologies, on several environments and several versions... Things keep changing / evolving, especially when the customer and / or management changes priorities, or when bugs occur in Production (it can happen with Hadoop).

How do you keep track of your tasks (from customer questions, to admin / expense tasks, to private items to achieve) ? I tried different ways, from emails, post-its, Wunderlist, Trello and Google Keep, and the only one that worked for me is Todoist.

Why ?
  • Easy to set up recurring tasks
  • Karma / graph view to see the number of tasks achieved per day / per week
  • You can assign a color to each project and also create hierarchies of projects
  • Possible to share a project with another user
Hope it will help ;-)
Cheers


Monday, March 20, 2017

Chief DevOps Officer ! #automation

This is the new trendy job, and this is what I think his / her mission should be :

  • Automation using DevOps
  • Improving metrics gathering & reporting
  • Quality improvement by Pareto
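The Pareto point can be made concrete: rank root-cause categories and focus on the "vital few" that account for ~80% of incidents. A minimal Python sketch (the incident log and category names below are made up for illustration):

```python
from collections import Counter

def pareto(incidents, threshold=0.8):
    """Return the smallest set of categories covering `threshold` of incidents."""
    counts = Counter(incidents)
    total = sum(counts.values())
    vital_few, covered = [], 0
    for category, count in counts.most_common():
        vital_few.append(category)
        covered += count
        if covered / total >= threshold:
            break
    return vital_few

# Hypothetical incident log: each entry is the root-cause category of one ticket
log = ["config"] * 45 + ["memory"] * 30 + ["network"] * 15 + ["disk"] * 7 + ["other"] * 3
print(pareto(log))  # → ['config', 'memory', 'network']
```

Fixing those three categories first addresses 90% of the tickets in this made-up log.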

Tuesday, July 26, 2016

Minimum set of tools to do DevOps with Hadoop !

DevOps is a way of working / a set of frameworks to use to ease the life of IT teams, from developers to admins / production, in a complex, multi-parameter, collaborative environment.
  • Continuous integration tool : Jenkins
  • Build automation tool : Maven
  • Team code collaboration tool : Gerrit Code Review
  • Database : PostgreSQL
    • Admin and application monitoring
  • Visualisation tool : Zeppelin
    • CEO / admin / dev dashboards
  • A Versioning tool : Git
    • code
    • configuration
    • template
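As a sketch of the PostgreSQL monitoring piece above, here is how job metrics could be written to a table that a Zeppelin dashboard then queries. The table name, columns and connection string are assumptions, not a prescribed schema:

```python
from datetime import datetime, timezone

# Hypothetical metrics table, created once, e.g.:
#   CREATE TABLE job_metrics (job TEXT, metric TEXT, value DOUBLE PRECISION, ts TIMESTAMPTZ);

def metric_row(job, metric, value):
    """Build one (job, metric, value, timestamp) row for insertion."""
    return (job, metric, float(value), datetime.now(timezone.utc))

def insert_sql(table="job_metrics"):
    # Parameterised INSERT, usable with a psycopg2 cursor.execute()
    return f"INSERT INTO {table} (job, metric, value, ts) VALUES (%s, %s, %s, %s)"

# With a real PostgreSQL connection this would look like:
#   import psycopg2
#   conn = psycopg2.connect("dbname=monitoring")
#   with conn, conn.cursor() as cur:
#       cur.execute(insert_sql(), metric_row("daily_etl", "duration_s", 421.0))
print(insert_sql())
```

A Zeppelin notebook can then chart `SELECT ts, value FROM job_metrics WHERE metric = 'duration_s'` for the CEO / admin / dev dashboards.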

Friday, June 10, 2016

My Hadoop is not efficient enough, what can I do ?

1. Review your memory configuration to maximize CPU utilisation
2. Review your YARN settings especially the Capacity Scheduler
3. Review your application design, parameter used, join strategy, file format

Of course, while checking your Ganglia / Ambari Metrics, voilà !
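For point 1, a common starting point is the Hortonworks-style sizing heuristic: derive the number of containers from cores, disks and available RAM, then size each container accordingly. A rough Python sketch (the reserved-memory and minimum-container values are illustrative defaults, not recommendations for your cluster):

```python
def yarn_settings(ram_gb, cores, disks, reserved_gb=8, min_container_mb=2048):
    """Rough YARN memory sizing per worker node (heuristic sketch only)."""
    avail_mb = (ram_gb - reserved_gb) * 1024
    # Containers are bounded by CPU, spindles, and available memory
    containers = int(min(2 * cores, 1.8 * disks, avail_mb / min_container_mb))
    container_mb = max(min_container_mb, avail_mb // containers)
    return {
        "yarn.nodemanager.resource.memory-mb": containers * container_mb,
        "yarn.scheduler.minimum-allocation-mb": container_mb,
        "yarn.scheduler.maximum-allocation-mb": containers * container_mb,
    }

# Example node: 64 GB RAM, 16 cores, 8 data disks
print(yarn_settings(ram_gb=64, cores=16, disks=8))
```

If containers are too large, cores sit idle; too small, and tasks spill or fail, which is why this pairs with checking actual utilisation in Ganglia / Ambari Metrics.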

PS : For those who don't trust Multi-tenant Hadoop cluster, please call me ;-)

Wednesday, November 11, 2015

Jenkins, Maven, SVN and Hortonworks HDP2.3 sandbox !

If you are also an automation and Open Source fan and you are (or are not yet) in the process of building a Hadoop application, I strongly suggest using (at minimum) :
  • Continuous integration tool (Jenkins, TeamCity, Travis CI)
  • Build tool (Maven, Ant, Gradle)
  • Provisioning tool (Chef, Ansible, shell script, Puppet)
  • Versioning system (Git, SVN, CVS)
In order to improve overall project quality / to stop losing time / to ease Hadoop migration and testing / to be more efficient (yes, a lot of good reasons).

I have the pleasure of using SVN + Jenkins + Maven + a few shell scripts + the HDP sandbox on my laptop, and this is really awesome.
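The heart of that setup is Jenkins polling SVN and launching Maven when the revision moves. A minimal Python sketch of that poll-and-build logic (the `svn info` sample output is made up, and the actual `mvn` call is left commented out so the sketch runs anywhere):

```python
import re

def head_revision(info_output):
    """Extract the revision number from `svn info` output."""
    match = re.search(r"^Revision:\s*(\d+)", info_output, re.MULTILINE)
    return int(match.group(1)) if match else None

def build_if_changed(last_rev, info_output, workdir="."):
    """Run the build only when the repository moved (what a Jenkins poll does)."""
    head = head_revision(info_output)
    if head is not None and head > last_rev:
        # On a real box this would be something like:
        #   subprocess.run(["mvn", "clean", "test"], cwd=workdir, check=True)
        return head  # new revision to remember for the next poll
    return last_rev

sample = "Path: .\nURL: svn://host/repo/trunk\nRevision: 128\nNode Kind: directory\n"
print(build_if_changed(120, sample))  # → 128
```

Jenkins does exactly this loop for you on a schedule, plus archiving, notifications and build history.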

Thanks ;-)

Wednesday, October 21, 2015

Hadoop and version control software !

For those who don't use Ambari, or for those [edge] nodes which are not synced, please be sure to use version control software so your team / admin will know which libraries / configuration files / links have been modified, by whom / when / why [and life will be easier].
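A minimal way to apply this on an edge node is to put the configuration directory itself under Git and commit after every change. A Python sketch (the directory path and author identity are placeholders, and the commands are built separately so they can be inspected before running):

```python
import subprocess
from pathlib import Path

def track_config_commands(conf_dir, message, author="hadoop-admin <admin@example.com>"):
    """Commands to put a config directory under Git and commit its current state.

    `conf_dir` would typically be /etc/hadoop/conf on an edge node.
    """
    cmds = []
    if not (Path(conf_dir) / ".git").exists():
        cmds.append(["git", "init"])          # first run only
    cmds.append(["git", "add", "-A"])         # stage every modified file / link
    cmds.append(["git", "commit", "-m", message, f"--author={author}"])
    return cmds

def track_config(conf_dir, message):
    # Run the commands for real (requires git installed on the node)
    for cmd in track_config_commands(conf_dir, message):
        subprocess.run(cmd, cwd=conf_dir, check=True)
```

Afterwards, `git log -p yarn-site.xml` answers the who / when / why question directly.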

Monday, December 29, 2014

Hadoop & Late binding !

Late binding is one of the key capabilities of Hadoop. It allows users to parse raw data (gzip, snappy, bzip2, csv, xml, json, pdf, jpeg, doc, others) stored in HDFS and to apply a structure on the fly.

Unfortunately, there can be some problems linked to data quality. Depending on the data and the SerDe used, you can lose some data, find it in the wrong column, or the entire job can fail (which is annoying when working with 4TB of data ;-))

So my advice is to do a minimum of checking [on the edge node] before data ingestion.
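Such a minimum check can be as simple as verifying that every row of an incoming CSV has the expected number of columns before it lands in HDFS. A Python sketch (the sample data is made up):

```python
import csv
import io

def check_csv(stream, expected_cols, delimiter=",", max_errors=10):
    """Minimal pre-ingestion check: every row must have the expected column count.

    Returns (row_count, list of (line_number, actual_column_count) for bad rows).
    """
    bad, rows = [], 0
    for lineno, row in enumerate(csv.reader(stream, delimiter=delimiter), start=1):
        rows += 1
        if len(row) != expected_cols:
            bad.append((lineno, len(row)))
            if len(bad) >= max_errors:
                break  # don't flood the report on a badly broken file
    return rows, bad

sample = io.StringIO("id,name,city\n1,Alice,Paris\n2,Bob\n3,Carol,Budapest\n")
print(check_csv(sample, expected_cols=3))  # → (4, [(3, 2)])
```

Catching the short row on the edge node costs seconds; discovering it after a failed multi-TB job costs hours.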

Tuesday, July 29, 2014

Dataiku !

Dataiku is a French startup providing a great web-based platform to accelerate data-science projects, and there is an open-source version !

Friday, October 18, 2013

Quality Function Deployment !

I like to use Japanese methods, and QFD is one of my favourites for improving / solving complex IT issues !