Sunday, March 15, 2020

How to do DataOps with Google Cloud Platform!

What do we want to achieve?

Use DataOps to monitor information from Twitter about Google.
  • No IaaS (infrastructure to manage): use only Google Cloud managed services or serverless technologies
  • Make sure all assets are stored in a repository with dev and master branches
  • No manual steps to test or push content to our Google Cloud project
  • Be able to adapt to data structure changes by replaying all data processing from scratch
  • Keep all data, in compressed form

What do we need:

Let's do it!

  1. Schedule a task every minute to gather tweets from the Twitter API, then store them in GCS
  2. Schedule a daily task to compress all previous data into a tar.gz file
  3. Read the compressed archive and load it into BigQuery with adaptive schema capabilities
  4. Build the corresponding reporting
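
While waiting for the real code, here is a minimal sketch of what steps 2 and 3 could look like in Python. It is only an illustration under assumptions: in-memory bytes stand in for the files stored in GCS, each file is assumed to hold one JSON tweet, and the function names (compress_day, read_archive, infer_schema) are hypothetical. The "adaptive schema" part is approximated here by taking the union of all top-level fields seen in the archive, which may differ from how BigQuery is finally configured.

```python
import io
import json
import tarfile


def compress_day(tweet_files: dict[str, bytes]) -> bytes:
    """Pack one day's raw JSON files into an in-memory tar.gz archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, payload in sorted(tweet_files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_archive(archive_bytes: bytes) -> list[dict]:
    """Read every JSON record back out of a tar.gz archive."""
    records = []
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue
            records.append(json.loads(archive.extractfile(member).read()))
    return records


def infer_schema(records: list[dict]) -> dict[str, str]:
    """Union of all top-level fields, so a new field simply widens the schema."""
    type_names = {bool: "BOOLEAN", int: "INTEGER", float: "FLOAT", str: "STRING"}
    schema: dict[str, str] = {}
    for record in records:
        for field, value in record.items():
            # Assumption: nested or null values are kept as STRING in this sketch.
            inferred = type_names.get(type(value), "STRING")
            # A field seen with conflicting types falls back to STRING.
            if schema.setdefault(field, inferred) != inferred:
                schema[field] = "STRING"
    return schema


if __name__ == "__main__":
    day = {
        "tweet-0001.json": b'{"id": 1, "text": "hello"}',
        "tweet-0002.json": b'{"id": 2, "text": "bye", "lang": "en"}',
    }
    tweets = read_archive(compress_day(day))
    print(infer_schema(tweets))
```

Because the schema is rebuilt from the archived raw data on every run, a change in the tweet structure only requires replaying the load from the archives, which is the point of keeping everything.
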
More information and code soon!
