Oracle Data Mining is PL/SQL based set of libraries which supported several supervised and unsupervised data mining functions such as classification, regression, clustering, time series analysis and other functions. These functions were support by a number of data mining algorithms such as decision trees, k-means, random forest, xboost, neural networks, ... just to name few.
With the latest releases of Oracle Database, Advanced Analytics option has been renamed into Machine Learning and it is now free to all Oracle Database users.
In Oracle Autonomous Data Warehouse, Zeppelin notebooks have been added for the interactive development of ML models.
Oracle Analytics can now connect to Oracle ADW and run ML model within the database, which is very interesting option, because no data is required to move from and to database. All operations like predictions happen within the database.
In our short exercise, we will take a look at Oracle ADW Machine Learning and we will create a simple prediction model. We will implement regression in order to predict housing prices in Boston. I have been using this exact same example with Oracle Analytics some time ago. We will then register ML model from ADW with Oracle Analytics and run the prediction from Oracle Analytics.
Machine Learning in Oracle ADW
We will use Zeppelin notebooks to explore data, prepare data and create a machine learning model. If you haven’t use Zeppelin notebooks, you will find it easy to use even without any previous experience. However good knowledge of SQL is desired. SQL is the language of choice here, so Python or R won’t help here. Maybe R in some case. This is obviously good news for Oracle database developers who are skilled in SQL and/or PL/SQL. You might also find out that using Machine Learning in Oracle database is pretty straight forward, but general understanding of ML functions and algorithms is definitely needed.
So, let’s start from the beginning.
To run this “tutorial”, Autonomous database is required. Just as a note, “on-premises” database, for example Oracle Database 19c, supports Oracle Data Miner and all its functions, which cannot be used with Autonomous.
Setting Machine Learning users
To begin, you must navigate to Oracle Cloud console and open Service Console.
Navigate to Administration and click on Manage Oracle ML Users.
Click on Create User and create one.
When user is created, you can then login into ML notebooks.
Creating a new notebook
In Service Console navigate to Development and select Oracle Machine Learning Notebooks.
Use just created user’s credentials to sign in.
Machine Learning Workspace opens:
From here you can navigate to Notebooks and open your projects, you can explore Examples (there is one example for every function that you can implement) and some other options which you are more than welcome to explore in detail.
We will create a new notebook. So, click on Notebooks. The list of all notebooks is presented and of course we need to create a new one. Click Create.
Enter a new notebook’s name, add comments, leave Connection set to Global and click OK.
An empty notebook is now created:
Let’s check connections first. You can do it simply by clicking the(wheel) icon in top right corner.
If you don’t set these connections, then you will not able to connect to database and you will wonder why.
Exploratory Data Analysis and Data Preparation
Let’s check if our database table exist in my database schema. If you want to run a SQL statement in a notebook paragraph, type %sql before SQL statement you want to execute. If you are running a script, then type %script.
Run the following SQL statement: SELECT * FROM HOUSING; and observe the results.
We can run similar exploratory analysis as we did in my Housing Prediction exercise, which I prepared for Oracle Analytics and explore relationships between predictors and target attribute (column MV).
Then we can take a look at distribution for target attribute and explore correlations in dataset.
Mostly, finding are the same, as we don’t need to perform and addition data preparation. For example, there are no strong correlations which would be reasons for attribute removals.
Creating a machine learning model
In Oracle Machine Learning in a database everything is happening inside the database. This also means preparing and storing the models.
But before we do that, let’s just make a copy of our original dataset. The first script we will run is as follows:
In the next step we need to split the original data set into TRAIN and TEST dataset. Actually, the later will act as a new dataset on which we will apply the model for prediction and we will compare predicted with actuals target values.
-- Split the data into HOUSING_TRAIN and HOUSING_TEST. Split train vs. test ratio = 60%.
BEGIN
EXCEPTION
WHEN OTHERS THEN NULL;
END;
/
BEGIN
EXECUTE IMMEDIATE 'DROP TABLE HOUSING_TEST';
EXCEPTION
WHEN OTHERS THEN NULL;
END;
/
BEGIN
EXECUTE IMMEDIATE 'CREATE TABLE HOUSING_TRAIN AS SELECT * FROM HOUSING_ALL SAMPLE (60) SEED (1)';
DBMS_OUTPUT.PUT_LINE ('Created HOUSING_TRAIN');
EXECUTE IMMEDIATE 'CREATE TABLE HOUSING_TEST AS SELECT * FROM HOUSING_ALL MINUS SELECT * FROM HOUSING_TRAIN';
DBMS_OUTPUT.PUT_LINE ('Created HOUSING_TEST');
END;
/
For the prediction we are after “the big guns”, we will use Neural Networks in our prediction model.
First, model parameters table has to be created. These parameters will be later used for model creation.
As the first step a new settings table needs to be created and populated with parameters – check for more details in (1).
We will set parameters for:
- algorithm name used,
- auto data preparation,
- hidden layers and number of nodes in each of layer.
Let’s verify the settings above:
We are now ready to create a new ML model which will use parameters from the table above.
-- Create a neural network ML model
BEGIN
DBMS_DATA_MINING.CREATE_MODEL(
model_name => 'NN_MODEL_HOUSING',
mining_function => dbms_data_mining.REGRESSION,
data_table_name => 'HOUSING_TRAIN',
case_id_column_name => 'IDX',
target_column_name => 'MV',
settings_table_name => 'NN_Model_settings');
END;
Once model is created, you can check which settings have been use. Observe that beside three settings we defined, there are also other default settings which can be amended and used in model creation.
%sql
We can now also explore what else has been generated, statistics for example. These are contained in DM$VGNN_MODEL_HOUSING view.
SELECT * FROM DM$VGNN_MODEL_HOUSING;
Now it is time to test the prediction. Let’s run this SQL statement:
SELECT IDX, MV, prediction(NN_MODEL_HOUSING USING *) PredictedValue,
prediction_details(NN_MODEL_HOUSING USING *) PredictionDetails
FROM HOUSING;
For each instance in TEST dataset, we can check the predicted value and prediction details, explaining the prediction made. MV attribute is also presented for easier comparison.
We have now a new Neural Network machine learning model created in Oracle ADW. This model can now we registered with Oracle Analytics and used there.
Additional Information
(1) Brandan Tierney’s blog Understanding, Building and Using Neural Network Machine Leaning Models using Oracle 18c (https://developer.oracle.com/databases/neural-network-machine-learning.html),
(2) Oracle Machine Learning for SQL User’s Guide (https://docs.oracle.com/en/database/oracle/oracle-database/19/dmprg/index.html)
(3) Oracle Machine Learning for SQL Concepts (https://docs.oracle.com/en/database/oracle/oracle-database/19/dmcon/index.html).