Showing posts with label Aster. Show all posts

Saturday, December 6, 2014

Using Open Data & Machine Learning !

I used to doubt that Open Data could add much value to my projects. Lately, using only Open Data, I was able to build an efficient model to predict the dengue rate in Brazil with the Least Angle Regression algorithm. To do so, we used weather data (wind, temperature, precipitation, thunder / rain rates, ...), altitude, location, urbanization, Twitter / Wikipedia frequency and custom variables (mostly lags).
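
The lag variables mentioned above can be sketched as follows. This is only a minimal illustration, not the code used in the project: the function name `make_lagged_rows`, the choice of lags and the toy numbers are all assumptions made for the example. Each output row pairs the target at time t with the predictor values observed a few steps earlier.

```python
def make_lagged_rows(predictor, target, lags):
    """Pair target[t] with predictor[t - k] for every lag k in `lags`.

    Time steps without a full history (t < max(lags)) are skipped.
    """
    if len(predictor) != len(target):
        raise ValueError("predictor and target must have the same length")
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(target)):
        features = [predictor[t - k] for k in lags]
        rows.append((target[t], features))
    return rows


# Toy, made-up series (illustration only, not real weather or dengue data).
toy_precipitation = [1, 2, 3, 4]
toy_dengue_rate = [10, 20, 30, 40]

lagged_rows = make_lagged_rows(toy_precipitation, toy_dengue_rate, lags=[1, 2])
for y, x in lagged_rows:
    print(y, x)
```

Rows built this way could then be passed to a Least Angle Regression implementation such as scikit-learn's `Lars`; that step is not shown here.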

Tuesday, August 5, 2014

Scale Open Source R with AsterR or Teradata 15 !

I recently contributed to a great project about running R in a distributed way within Aster and Teradata. I rediscovered that R is really permissive, flexible and powerful.

Thursday, April 24, 2014

Python !

Python is already almost everywhere and is used in production at Google. It is a very powerful programming language for turning your ideas (from Web to GUI) into a script!

Friday, March 28, 2014

Machine Learning with Aster !

I am now working with Aster to do Machine Learning and statistics. Here are the functions you can use :
  • Approximate Distinct Count : quickly estimates the number of distinct values
  • Approximate Percentile : computes approximate percentiles
  • Correlation : determines whether one variable is useful for predicting another
  • Generalized Linear Regression & Prediction : fits a generalized linear model (linear regression being one case) and predicts from it
  • Principal Component Analysis : dimensionality reduction
  • Simple | Weighted | Exponential Moving Average : computes a moving average with the chosen weighting scheme
  • K-Nearest Neighbor : classification algorithm based on proximity
  • Support Vector Machines : builds an SVM model and makes predictions
  • Confusion Matrix [Plot] : visualizes the performance of an ML algorithm
  • Kmeans : well-known clustering algorithm
  • Minhash : another clustering technique, which groups users by the set of products they bought
  • Naïve Bayes : classification method, especially useful for documents
  • Random Forest Functions : predictive modelling approach widely used for supervised classification
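
To make the three moving averages in the list concrete, here is a minimal sketch using the textbook definitions. It does not reproduce the Aster functions themselves; the function names, the weight ordering and the initialisation of the exponential average with the first value are assumptions made for the example.

```python
def simple_moving_average(values, window):
    """Unweighted mean of each run of `window` consecutive values."""
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def weighted_moving_average(values, weights):
    """Weighted mean per window; weights[-1] applies to the most recent value."""
    window = len(weights)
    total = sum(weights)
    return [
        sum(w * v for w, v in zip(weights, values[i - window + 1:i + 1])) / total
        for i in range(window - 1, len(values))
    ]


def exponential_moving_average(values, alpha):
    """Recursive average: ema[t] = alpha * x[t] + (1 - alpha) * ema[t - 1].

    The first output is the first input (one common initialisation).
    """
    ema = [values[0]]
    for v in values[1:]:
        ema.append(alpha * v + (1 - alpha) * ema[-1])
    return ema


data = [1, 2, 3, 4]
print(simple_moving_average(data, window=2))
print(weighted_moving_average(data, weights=[1, 2]))
print(exponential_moving_average(data, alpha=0.5))
```

The simple and weighted versions return one value per full window, so their output is shorter than the input; the exponential version returns one value per input.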

Tuesday, March 11, 2014

Teradata’s SNAP Framework !

Teradata’s Seamless Network Analytic Processing (SNAP) Framework is one of the great ideas inside the Aster 6 database. It allows users to query different analytical engines and multiple types of storage through a SQL-like programming interface. It is composed of a query optimizer, a layer that integrates and manages resources, an execution engine and the unified SQL interface. These are the main components and their goals :
  • SQL-GR & Graph Engine : provide functions to work with edges, vertices, and [un|bi|]directed or cyclic graphs
  • SQL-MR : library (Machine Learning, Statistics, Search behaviour, Pattern matching, Time series, Text analysis, Geo-spatial, Parsing) to process data using the MapReduce framework
  • SQL-H : easy-to-use connection to HDFS for loading data from Hadoop
  • SQL : join, filter, aggregation, OLAP, insert, update, delete, CASE WHEN, table
  • AFS connector : SQL-MR function to map an AFS file to a table
  • Teradata connector : SQL-MR function to load data from / to the Teradata RDBMS
  • Stream API : plug in your Python, Ruby, Perl, C[|++|#] scripts and run them on the Aster worker nodes
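
As an illustration of the kind of script the Stream API item refers to, here is a minimal Python sketch. It assumes that rows reach the script as tab-delimited lines on standard input and that output rows are written the same way to standard output; that delivery format is an assumption of this example and is not stated in the post. The `process` function and the extra "length" column are hypothetical.

```python
import io
import sys


def process(rows_in, rows_out):
    """Read tab-delimited rows and append the length of the last field to each row."""
    for line in rows_in:
        fields = line.rstrip("\n").split("\t")
        fields.append(str(len(fields[-1])))
        rows_out.write("\t".join(fields) + "\n")


# Demo with in-memory streams so the sketch runs anywhere.
# A script plugged into a streaming function would instead call:
#     process(sys.stdin, sys.stdout)
demo_in = io.StringIO("1\thello\n2\tworld!\n")
demo_out = io.StringIO()
process(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

Keeping the row logic in a function that takes two file-like objects makes the script easy to test locally before it is run on the worker nodes.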