
Using Oracle ADW machine learning models in Oracle Analytics (part 1)

A few weeks ago I was exploring how Oracle Analytics can be used for housing price prediction. In my post Housing Price Prediction in Oracle Data Visualization I discuss the tools available in Oracle Analytics to perform the task. In this post (there are actually two parts) I am looking at the options we have in Oracle Analytics if we want to use machine learning models that reside in Oracle Autonomous Data Warehouse.


Oracle Data Mining has been an Oracle Database option for more than two decades. As machine learning support in the database advanced, it became part of the Oracle Advanced Analytics database option. Developers were able to work with Oracle Data Mining through Oracle Data Miner, which is part of Oracle SQL Developer.

Oracle Data Mining is a PL/SQL-based set of libraries supporting several supervised and unsupervised data mining functions such as classification, regression, clustering and time series analysis. These functions are backed by a number of data mining algorithms such as decision trees, k-means, random forest, XGBoost and neural networks, to name just a few.

With the latest releases of Oracle Database, the Advanced Analytics option has been renamed to Machine Learning and is now free to all Oracle Database users.

In Oracle Autonomous Data Warehouse, Zeppelin notebooks have been added for the interactive development of ML models. 

Oracle Analytics can now connect to Oracle ADW and run ML models within the database, which is a very interesting option because no data needs to be moved to or from the database. All operations, such as predictions, happen within the database.

In this short exercise, we will take a look at Oracle ADW Machine Learning and create a simple prediction model. We will implement regression in order to predict housing prices in Boston; I used this exact same example with Oracle Analytics some time ago. We will then register the ML model from ADW with Oracle Analytics and run the prediction from there.

Machine Learning in Oracle ADW

We will use Zeppelin notebooks to explore the data, prepare it and create a machine learning model. If you haven’t used Zeppelin notebooks before, you will find them easy to use even without prior experience. However, good knowledge of SQL is desired: SQL is the language of choice here, so Python won’t help you, and R only in some cases. This is obviously good news for Oracle database developers who are skilled in SQL and/or PL/SQL. You will also find that using Machine Learning in the Oracle database is pretty straightforward, but a general understanding of ML functions and algorithms is definitely needed.

So, let’s start from the beginning. 

To follow this “tutorial”, an Autonomous database is required. Just as a note: an “on-premises” database, for example Oracle Database 19c, supports Oracle Data Miner and all of its functions, but Oracle Data Miner cannot be used with Autonomous.

Setting Machine Learning users

To begin, navigate to the Oracle Cloud console and open the Service Console.

Navigate to Administration and click on Manage Oracle ML Users.

Click on Create User and create one.

Once the user is created, you can log in to the ML notebooks.

Creating a new notebook

In Service Console navigate to Development and select Oracle Machine Learning Notebooks.

Sign in with the credentials of the user you just created.

The Machine Learning workspace opens:

From here you can navigate to Notebooks and open your projects, explore Examples (there is one example for every function you can implement), and check out some other options which you are more than welcome to explore in detail.

We will create a new notebook, so click on Notebooks. The list of all notebooks is presented; since we need a new one, click Create.

Enter a new notebook’s name, add comments, leave Connection set to Global and click OK.


An empty notebook is now created:

Let’s check the connections first. You can do this simply by clicking the (wheel) icon in the top right corner.

If you don’t set these connections, you will not be able to connect to the database and you will wonder why.

Exploratory Data Analysis and Data Preparation

Let’s check if our database table exists in the database schema. If you want to run a SQL statement in a notebook paragraph, type %sql before the SQL statement you want to execute. If you are running a script, type %script instead.

Run the following SQL statement and observe the results:

%sql
SELECT * FROM HOUSING;
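Before digging deeper, a quick summary paragraph helps to get a feel for the data. The following is just a minimal sketch (not from the original exercise), assuming MV is the target column described below:

%sql
-- Quick first look: row count and basic statistics of the target column MV.
SELECT COUNT(*)          AS row_count,
       ROUND(MIN(MV), 2) AS min_mv,
       ROUND(AVG(MV), 2) AS avg_mv,
       ROUND(MAX(MV), 2) AS max_mv
  FROM HOUSING;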

We can run a similar exploratory analysis as in my Housing Prediction exercise, which I prepared for Oracle Analytics, and explore the relationships between the predictors and the target attribute (column MV).


Then we can take a look at the distribution of the target attribute and explore correlations in the dataset.
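As an illustration, correlations can be checked directly in SQL with the CORR aggregate. The paragraph below is only a sketch; RM, LSTAT and PTRATIO are predictor names taken from the standard Boston housing dataset and may need to be adjusted to match your HOUSING table:

%sql
-- Sketch: pairwise correlation of a few (assumed) predictors with the target MV.
SELECT ROUND(CORR(RM, MV), 3)      AS corr_rm_mv,
       ROUND(CORR(LSTAT, MV), 3)   AS corr_lstat_mv,
       ROUND(CORR(PTRATIO, MV), 3) AS corr_ptratio_mv
  FROM HOUSING;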

The findings are mostly the same, and we don’t need to perform any additional data preparation. For example, there are no strong correlations that would be a reason to remove attributes.

Creating a machine learning model

With Oracle Machine Learning, everything happens inside the database. This also includes preparing and storing the models.

But before we do that, let’s just make a copy of our original dataset. The first script we will run is as follows:

%script

--- Drop HOUSING_ALL table if exists.

BEGIN
    EXECUTE IMMEDIATE 'DROP TABLE HOUSING_ALL';
EXCEPTION
    WHEN OTHERS THEN NULL;
END;
/

--- Create a new table HOUSING_ALL which is a copy of original HOUSING table and is used in this exercise.

CREATE TABLE HOUSING_ALL AS SELECT * FROM HOUSING;

In the next step we need to split the original dataset into TRAIN and TEST datasets. The latter will act as a new dataset to which we will apply the model for prediction, and we will compare the predicted values with the actual target values.

%script
-- Split the data into HOUSING_TRAIN and HOUSING_TEST. Split train vs. test ratio = 60%.

BEGIN
    EXECUTE IMMEDIATE 'DROP TABLE HOUSING_TRAIN';
EXCEPTION
    WHEN OTHERS THEN NULL;
END;
/
BEGIN
    EXECUTE IMMEDIATE 'DROP TABLE HOUSING_TEST';
EXCEPTION
    WHEN OTHERS THEN NULL;
END;
/
BEGIN
    EXECUTE IMMEDIATE 'CREATE TABLE HOUSING_TRAIN AS SELECT * FROM HOUSING_ALL SAMPLE (60) SEED (1)';
    DBMS_OUTPUT.PUT_LINE ('Created HOUSING_TRAIN');
    EXECUTE IMMEDIATE 'CREATE TABLE HOUSING_TEST AS SELECT * FROM HOUSING_ALL MINUS SELECT * FROM HOUSING_TRAIN';
    DBMS_OUTPUT.PUT_LINE ('Created HOUSING_TEST');
END;
/

For the prediction we are bringing out “the big guns”: we will use a neural network in our prediction model.

First, a model settings table has to be created and populated with parameters; these will later be used when creating the model – check (1) for more details.

We will set parameters for:

  • algorithm name used, 
  • auto data preparation, 
  • hidden layers and the number of nodes in each layer.

%script

--- Drop NN_Model_settings table if exists.

BEGIN
    EXECUTE IMMEDIATE 'DROP TABLE NN_Model_settings';
EXCEPTION
    WHEN OTHERS THEN NULL;
END;
/

-- create settings table for Neural network model

CREATE TABLE NN_Model_settings 
( setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000));
  
BEGIN
  INSERT INTO NN_Model_settings (setting_name, setting_value)
  values (dbms_data_mining.algo_name, 
          dbms_data_mining.algo_neural_network);

-- Auto data preparation is set to OFF by default; we turn it on here.

  INSERT INTO NN_Model_settings (setting_name, setting_value)
  VALUES (dbms_data_mining.prep_auto, dbms_data_mining.prep_auto_on);   

-- Let's define three hidden layers with the respective number of nodes. The default activation function is "NNET_ACTIVATIONS_LOG_SIG".
  
  INSERT INTO NN_Model_settings (setting_name, setting_value)
  VALUES (dbms_data_mining.nnet_nodes_per_layer, '20, 20, 10');
  
END;
/

Let’s verify the settings above: 
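A simple query over the settings table will do, for example:

%sql
-- Verify the parameters inserted into the settings table.
SELECT setting_name, setting_value FROM NN_Model_settings;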

We are now ready to create a new ML model which will use parameters from the table above.

%script
-- Create a neural network ML model
BEGIN
   DBMS_DATA_MINING.CREATE_MODEL(
      model_name          => 'NN_MODEL_HOUSING',
      mining_function     => dbms_data_mining.REGRESSION,
      data_table_name     => 'HOUSING_TRAIN',
      case_id_column_name => 'IDX',
      target_column_name  => 'MV',
      settings_table_name => 'NN_Model_settings');
END;
/

Once the model is created, you can check which settings have been used. Observe that besides the three settings we defined, there are also other default settings which can be amended and used in model creation.

%sql

select * from all_mining_model_settings where model_name='NN_MODEL_HOUSING';

We can now also explore what else has been generated, for example statistics. These are contained in the DM$VGNN_MODEL_HOUSING view.

%sql
SELECT * FROM DM$VGNN_MODEL_HOUSING;

Now it is time to test the prediction. Let’s run this SQL statement:

%sql
SELECT IDX, MV,
       prediction(NN_MODEL_HOUSING USING *)         AS PredictedValue,
       prediction_details(NN_MODEL_HOUSING USING *) AS PredictionDetails
  FROM HOUSING_TEST;

For each instance in the TEST dataset, we can check the predicted value and the prediction details explaining the prediction made. The MV attribute is also included for easier comparison.
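Beyond inspecting individual rows, the same in-database prediction function can be folded into simple error metrics. The following paragraph is a sketch (not part of the original exercise) computing RMSE and mean absolute error over the TEST dataset:

%sql
-- Sketch: overall prediction quality on HOUSING_TEST (RMSE and MAE).
SELECT ROUND(SQRT(AVG(POWER(MV - prediction(NN_MODEL_HOUSING USING *), 2))), 3) AS rmse,
       ROUND(AVG(ABS(MV - prediction(NN_MODEL_HOUSING USING *))), 3)            AS mae
  FROM HOUSING_TEST;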

We now have a new neural network machine learning model created in Oracle ADW. This model can now be registered with Oracle Analytics and used there.

Additional Information

(1) Brendan Tierney’s blog Understanding, Building and Using Neural Network Machine Learning Models using Oracle 18c (https://developer.oracle.com/databases/neural-network-machine-learning.html).

(2) Oracle Machine Learning for SQL User’s Guide (https://docs.oracle.com/en/database/oracle/oracle-database/19/dmprg/index.html).

(3) Oracle Machine Learning for SQL Concepts (https://docs.oracle.com/en/database/oracle/oracle-database/19/dmcon/index.html).