Skip to main content

Oracle Data Science: Connecting to Oracle Autonomous Data Warehouse from a Notebook Session

The Oracle Accelerated Data Science (ADS) SDK is a Python library that is included as part of the Oracle Cloud Infrastructure Data Science service. ADS offers a friendly user interface, with objects and methods that cover all the steps involved in the lifecycle of machine learning models, from data acquisition to model evaluation and interpretation.

We will use ADS to create a connection to Oracle Autonomous Data Warehouse (ADW). Detailed explanation on how setup connection to ADW is also available in ADS documentation (click here for more: https://docs.cloud.oracle.com/en-us/iaas/tools/ads-sdk/latest/index.html.

If you haven't done so, you should look into getting-started.ipynb before you continue with this script.

Wallet: location & content

Wallet file which contains the credentials and other neccessary information can be downloaded from ADW instance. Wallet files are then uploaded to Data Science instance folder.
In [1]:
import os
import warnings
warnings.filterwarnings('ignore')
In our example, wallet files is located in "wallet" folder:
In [3]:
ls -l /home/datascience/block_storage/wallet
total 72
-rw-r--r--. 1 datascience users  6661 Mar 20 17:46 cwallet.sso
-rw-r--r--. 1 datascience users  6616 Mar 20 17:46 ewallet.p12
-rw-r--r--. 1 datascience users  3241 Mar 20 17:46 keystore.jks
-rw-r--r--. 1 datascience users   691 Mar 20 17:46 ojdbc.properties
-rw-r--r--. 1 datascience users   183 Mar 20 17:46 readme.md
-rw-r--r--. 1 datascience users   138 Mar 20 18:31 sqlnet.ora
-rw-r--r--. 1 datascience users  1160 Mar 20 17:46 tnsnames.ora
-rw-r--r--. 1 datascience users  3335 Mar 20 17:46 truststore.jks
drwxr-xr-x. 3 datascience users  4096 Mar 20 18:32 Wallet_OABOOTCAMPADW/
-rw-r--r--. 1 datascience users 25048 Mar 20 18:33 Wallet_OABOOTCAMPADW.zip
From all the files TNSNAMES.ORA contains the connection information which we will use:
In [4]:
cat /home/datascience/block_storage/wallet/tnsnames.ora
********_high = (description= (retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1522)(host=adb.eu-frankfurt-1.oraclecloud.com))(connect_data=(service_name=zepp9pq65uhav7e_********_high.adwc.oraclecloud.com))(security=(ssl_server_cert_dn="CN=adwc.eucom-central-1.oraclecloud.com,OU=Oracle BMCS FRANKFURT,O=Oracle Corporation,L=Redwood City,ST=California,C=US")))

********_low = (description= (retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1522)(host=adb.eu-frankfurt-1.oraclecloud.com))(connect_data=(service_name=zepp9pq65uhav7e_********_low.adwc.oraclecloud.com))(security=(ssl_server_cert_dn="CN=adwc.eucom-central-1.oraclecloud.com,OU=Oracle BMCS FRANKFURT,O=Oracle Corporation,L=Redwood City,ST=California,C=US")))

********_medium = (description= (retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1522)(host=adb.eu-frankfurt-1.oraclecloud.com))(connect_data=(service_name=zepp9pq65uhav7e_********_medium.adwc.oraclecloud.com))(security=(ssl_server_cert_dn="CN=adwc.eucom-central-1.oraclecloud.com,OU=Oracle BMCS FRANKFURT,O=Oracle Corporation,L=Redwood City,ST=California,C=US")))

We need to set 4 environment variables: TNS_ADMIN, ADW_SID, ADW_USER and ADW_PASSWORD.
In [11]:
%env TNS_ADMIN=/home/datascience/block_storage/wallet
%env ADW_SID=********_high
%env ADW_USER=********
%env ADW_PASSWORD="********"
env: TNS_ADMIN=/home/datascience/block_storage/wallet
env: ADW_SID=********_high
env: ADW_USER=********
env: ADW_PASSWORD="********"
We are now good to connect.
Let's import DatasetFactory library first. DatasetFactory allows datasets to be loaded into ADS.
In [12]:
from ads.dataset.factory import DatasetFactory
Construct uri as the connection source.
In [13]:
uri=f'oracle+cx_oracle://{os.environ["ADW_USER"]}:{os.environ["ADW_PASSWORD"]}@{os.environ["ADW_SID"]}'
Then specify the table and the target feature (let's assume that this is a classification example). Table can be specified as a SQL statement, but there are some other options too. Assumption here is that data resides in a single database table.
In [14]:
table = "BANK_DATA"
target = "y"
After you specify your query (in our case simply specify a table), then use ADS to query a table from ADW and load data to ADSDataset object using DatasetFactory.
In [7]:
if target != "BANK_DATA":
    ds = DatasetFactory.open(uri, format="sql", table=table, target=target).set_positive_class('yes')
else: 
    ds = DatasetFactory.open(uri, format="sql", table=table)

TIP

ADS now supports the ability to query based on a SQL expression.

For example,

ds = DatasetFactory.open(uri, format="sql", table="SELECT prod_id, cust_id, time_id FROM SH.sales")

Using entire dataset for graphing (4521 rows)
TIP:
  + Use show_in_notebook() to visualize the dataset.
  + Use get_recommendations() to view and apply recommendations for dataset optimization.

You can run the two functions: show_in_notebook() and get_recommendations() in order to review and analyze the data just read and perform some dataset transformations. Due to technical limitations (I can't export results of "show_in_notebook"), I am simply presenting the content of ADSDataset object called ds.
In [9]:
ds.head()
Out[9]:

age job marital education defaults balance housing loan contact day month duration campaign pdays previous poutcome y
0 45 management divorced tertiary no 0 no no unknown 1 jun 5 1 -1 0 unknown False
1 40 services married secondary no 0 yes no cellular 2 jun 297 1 28 3 failure False
2 28 blue-collar single primary no 0 yes yes unknown 3 jun 380 4 -1 0 unknown False
3 59 housemaid married secondary no 0 no no telephone 3 feb 722 1 99 4 failure False
4 28 services married secondary no 0 yes no cellular 5 may 335 2 281 8 failure False
This concludes this simple exercise. ds is a Dataset object which will be used later in forthcoming exercises/blog posts.
In [ ]: