Oracle Data Science: Connecting to Oracle Autonomous Data Warehouse from a Notebook Session

The Oracle Accelerated Data Science (ADS) SDK is a Python library that is included as part of the Oracle Cloud Infrastructure Data Science service. ADS offers a friendly user interface, with objects and methods that cover all the steps involved in the lifecycle of machine learning models, from data acquisition to model evaluation and interpretation.

We will use ADS to create a connection to Oracle Autonomous Data Warehouse (ADW). Detailed explanation on how setup connection to ADW is also available in ADS documentation (click here for more: https://docs.cloud.oracle.com/en-us/iaas/tools/ads-sdk/latest/index.html.

If you haven't done so, you should look into getting-started.ipynb before you continue with this script.

Wallet: location & content¶

Wallet file which contains the credentials and other neccessary information can be downloaded from ADW instance. Wallet files are then uploaded to Data Science instance folder.

In [1]:

import os
import warnings
warnings.filterwarnings('ignore')

In our example, wallet files is located in "wallet" folder:

In [3]:

ls -l /home/datascience/block_storage/wallet

total 72
-rw-r--r--. 1 datascience users  6661 Mar 20 17:46 cwallet.sso
-rw-r--r--. 1 datascience users  6616 Mar 20 17:46 ewallet.p12
-rw-r--r--. 1 datascience users  3241 Mar 20 17:46 keystore.jks
-rw-r--r--. 1 datascience users   691 Mar 20 17:46 ojdbc.properties
-rw-r--r--. 1 datascience users   183 Mar 20 17:46 readme.md
-rw-r--r--. 1 datascience users   138 Mar 20 18:31 sqlnet.ora
-rw-r--r--. 1 datascience users  1160 Mar 20 17:46 tnsnames.ora
-rw-r--r--. 1 datascience users  3335 Mar 20 17:46 truststore.jks
drwxr-xr-x. 3 datascience users  4096 Mar 20 18:32 Wallet_OABOOTCAMPADW/
-rw-r--r--. 1 datascience users 25048 Mar 20 18:33 Wallet_OABOOTCAMPADW.zip

From all the files TNSNAMES.ORA contains the connection information which we will use:

In [4]:

cat /home/datascience/block_storage/wallet/tnsnames.ora

********_high = (description= (retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1522)(host=adb.eu-frankfurt-1.oraclecloud.com))(connect_data=(service_name=zepp9pq65uhav7e_********_high.adwc.oraclecloud.com))(security=(ssl_server_cert_dn="CN=adwc.eucom-central-1.oraclecloud.com,OU=Oracle BMCS FRANKFURT,O=Oracle Corporation,L=Redwood City,ST=California,C=US")))

********_low = (description= (retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1522)(host=adb.eu-frankfurt-1.oraclecloud.com))(connect_data=(service_name=zepp9pq65uhav7e_********_low.adwc.oraclecloud.com))(security=(ssl_server_cert_dn="CN=adwc.eucom-central-1.oraclecloud.com,OU=Oracle BMCS FRANKFURT,O=Oracle Corporation,L=Redwood City,ST=California,C=US")))

********_medium = (description= (retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1522)(host=adb.eu-frankfurt-1.oraclecloud.com))(connect_data=(service_name=zepp9pq65uhav7e_********_medium.adwc.oraclecloud.com))(security=(ssl_server_cert_dn="CN=adwc.eucom-central-1.oraclecloud.com,OU=Oracle BMCS FRANKFURT,O=Oracle Corporation,L=Redwood City,ST=California,C=US")))

We need to set 4 environment variables: TNS_ADMIN, ADW_SID, ADW_USER and ADW_PASSWORD.

In [11]:

%env TNS_ADMIN=/home/datascience/block_storage/wallet
%env ADW_SID=********_high
%env ADW_USER=********
%env ADW_PASSWORD="********"

env: TNS_ADMIN=/home/datascience/block_storage/wallet
env: ADW_SID=********_high

env: ADW_USER=********
env: ADW_PASSWORD="********"

We are now good to connect.
Let's import DatasetFactory library first. DatasetFactory allows datasets to be loaded into ADS.

In [12]:

from ads.dataset.factory import DatasetFactory

Construct uri as the connection source.

In [13]:

uri=f'oracle+cx_oracle://{os.environ["ADW_USER"]}:{os.environ["ADW_PASSWORD"]}@{os.environ["ADW_SID"]}'

Then specify the table and the target feature (let's assume that this is a classification example). Table can be specified as a SQL statement, but there are some other options too. Assumption here is that data resides in a single database table.

In [14]:

table = "BANK_DATA"
target = "y"

After you specify your query (in our case simply specify a table), then use ADS to query a table from ADW and load data to ADSDataset object using DatasetFactory.

In [7]:

if target != "BANK_DATA":
    ds = DatasetFactory.open(uri, format="sql", table=table, target=target).set_positive_class('yes')
else: 
    ds = DatasetFactory.open(uri, format="sql", table=table)

TIP

ADS now supports the ability to query based on a SQL expression.

For example,

ds = DatasetFactory.open(uri, format="sql", table="SELECT prod_id, cust_id, time_id FROM SH.sales")

Using entire dataset for graphing (4521 rows)

TIP:
+ Use show_in_notebook() to visualize the dataset.
+ Use get_recommendations() to view and apply recommendations for dataset optimization.

You can run the two functions: show_in_notebook() and get_recommendations() in order to review and analyze the data just read and perform some dataset transformations. Due to technical limitations (I can't export results of "show_in_notebook"), I am simply presenting the content of ADSDataset object called ds.

In [9]:

ds.head()

Out[9]:

	age	job	marital	education	defaults	housing	loan	contact	day	month	duration	campaign	pdays	previous	poutcome	y
0	45	management	divorced	tertiary	no	no	no	unknown	1	jun	5	1	-1	0	unknown	False
1	40	services	married	secondary	no	yes	no	cellular	2	jun	297	1	28	3	failure	False
2	28	blue-collar	single	primary	no	yes	yes	unknown	3	jun	380	4	-1	0	unknown	False
3	59	housemaid	married	secondary	no	no	no	telephone	3	feb	722	1	99	4	failure	False
4	28	services	married	secondary	no	yes	no	cellular	5	may	335	2	281	8	failure	False

This concludes this simple exercise. ds is a Dataset object which will be used later in forthcoming exercises/blog posts.

In [ ]:

Žiga Vaupot's Blog

Search This Blog

Oracle Data Science: Connecting to Oracle Autonomous Data Warehouse from a Notebook Session

Wallet: location & content¶

TIP

For example,