Showing posts from May, 2020

Telco Churn Prediction in Oracle Analytics Revisited

A couple of years ago I published the blog post Getting Started with Machine Learning in Oracle Data Visualization. I have been using it for Qubix Academy and the Oracle Analytics bootcamps which we deliver to Oracle partners in EMEA. With the development of Oracle Analytics, I thought it was the right time to revisit and refresh the content and pay a bit more attention to three areas I didn't focus on in my initial blog: data analysis, data preparation and model training/comparison. My general impression is that business users who want to include predictive models in their dashboards and analyses don't need to be data scientists. They need to know a few basics, but otherwise very simple tools are available to analyse data, prepare data and create machine learning models. The price they will pay is most likely less accurate predictions (compared to models built by data scientists), but I don't think this is a show stopper. Therefore I would encourage users to us

Telco Churn Prediction in Oracle Analytics Revisited: Model Training and Evaluation

This is the third part of the blog post series on Telco Churn Prediction, which I'm revisiting more than two years after my original blog post Getting Started with Machine Learning in Oracle Data Visualization. In my two previous posts in this series, I talked about Data Analysis and Data Preparation. Both are mandatory steps before any machine learning is applied. My plan in this blog is to demonstrate how to create a machine learning model, how to improve it by setting parameters or simply by replacing the algorithm, and finally, how to create a project in Oracle Analytics that compares all the created machine learning models with each other in order to decide which is the best model for my prediction.

Creating a new Machine Learning model

A new machine learning model can be created by using (again) the Data Flows functionality in Oracle Analytics. This hasn't changed much since my initial blog from two years ago. Basically, this is a 3-step process in which you need to:

Telco Churn Prediction in Oracle Analytics Revisited: Data Preparation

Data preparation in our example basically means bringing all four training datasets together (and the new-data datasets too). The final result of this exercise should be a table with 5298 rows, one row per customer. The data should be ready for training the model, which means performing a series of data transformations before we eventually start building the model. In my original post from 2+ years ago, I simply brought the four files together and did some minor transformations (for example, a one-hot transformation for the Services file), but no other transformations were done. In this respect, I am revisiting this process with the goal of automating the whole preparation cycle, including model training and deployment. As the title of this post suggests, we will look into data preparation first, and we will try to make the most of the Data Flow functionality. We analysed the data source files in the previous post, Telecom Churn Prediction Case Revisited: Data Analysis. The starting point for data preparation w

Telco Churn Prediction in Oracle Analytics Revisited: Data Analysis

It's been more than two years now since my post Getting started with Machine Learning in Oracle Data Visualization (https://zigavaupot.blogspot.com/2018/03/getting-started-with-machine-learning.html). In that post I tried to explain how Oracle Data Visualization (Oracle Analytics) can be used for predicting churn in a telecom company. For that exercise I used data files which I found on Kaggle. Interestingly enough, this data set is no longer available, but I still have the original files. The idea of the example below is to revisit that exercise and improve it where possible. To be honest, I wasn't paying much attention to data analysis and preparation back then. As we all know, the data preparation step is basically the most important one in the machine learning process. Therefore I will try to follow some of the guidelines and best practices in data preparation for machine learning and will try to use the Data Flows functionality in Oracle Analytics.

Data Analysis

Let's start with the source data a