The Oracle Accelerated Data Science (ADS) SDK is a Python library that is included as part of the Oracle Cloud Infrastructure Data Science service. ADS offers a friendly user interface, with objects and methods that cover all the steps involved in the lifecycle of machine learning models, from data acquisition to model evaluation and interpretation.
We will use ADS to create a connection to Oracle Autonomous Data Warehouse (ADW). A detailed explanation of how to set up a connection to ADW is also available in the ADS documentation: https://docs.cloud.oracle.com/en-us/iaas/tools/ads-sdk/latest/index.html
If you haven't done so, you should look into getting-started.ipynb before you continue with this script.
Wallet: location & content
The wallet, which contains the credentials and other necessary connection information, can be downloaded from the ADW instance. The wallet files are then uploaded to a folder on the Data Science instance.
In [1]:
import os
import warnings
warnings.filterwarnings('ignore')
In our example, the wallet files are located in the "wallet" folder:
In [3]:
ls -l /home/datascience/block_storage/wallet
Of all these files, tnsnames.ora contains the connection information we will use:
In [4]:
cat /home/datascience/block_storage/wallet/tnsnames.ora
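If the file is long, it can help to list only the service aliases it defines, since one of them is the value needed for ADW_SID. The following is a minimal sketch, not part of ADS: the helper name `list_tns_aliases` is hypothetical, and the sample text uses made-up alias and host names. It assumes each entry starts at the beginning of a line in the form `alias = (description=...)`.

```python
import re

def list_tns_aliases(tnsnames_text):
    """Return the aliases defined in the text of a tnsnames.ora file."""
    # An entry starts a line with: alias = ( ...
    # Nested keys such as (address=...) begin with "(" and are not matched.
    pattern = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*=\s*\(", re.MULTILINE)
    return pattern.findall(tnsnames_text)

# Made-up sample content, for illustration only
sample = """
mydb_high = (description= (address=(protocol=tcps)(port=1522)(host=example.invalid))(connect_data=(service_name=mydb_high.example)))
mydb_low = (description= (address=(protocol=tcps)(port=1522)(host=example.invalid))(connect_data=(service_name=mydb_low.example)))
"""

aliases = list_tns_aliases(sample)
print(aliases)
```

To use it on the real file, read the file contents with `open(path).read()` and pass the text to the function.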
We need to set 4 environment variables: TNS_ADMIN, ADW_SID, ADW_USER and ADW_PASSWORD.
In [11]:
%env TNS_ADMIN=/home/datascience/block_storage/wallet
%env ADW_SID=********_high
%env ADW_USER=********
%env ADW_PASSWORD="********"
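Before connecting, it may be worth confirming that all four variables are actually set, because a missing one only surfaces later as a KeyError or a connection failure. A small sketch under that assumption follows; the helper `missing_env_vars` is hypothetical and not part of ADS.

```python
import os

REQUIRED_VARS = ("TNS_ADMIN", "ADW_SID", "ADW_USER", "ADW_PASSWORD")

def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    if env is None:
        env = os.environ
    return [name for name in required if not env.get(name)]

# Illustration with a plain dict instead of the real environment
example_env = {"TNS_ADMIN": "/home/datascience/block_storage/wallet", "ADW_USER": ""}
print(missing_env_vars(example_env))
```

Calling `missing_env_vars()` with no arguments checks the real environment; an empty list means all four variables are present.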
We are now good to connect.
Let's import DatasetFactory first. DatasetFactory allows datasets to be loaded into ADS.
In [12]:
from ads.dataset.factory import DatasetFactory
Construct the URI that serves as the connection source.
In [13]:
uri=f'oracle+cx_oracle://{os.environ["ADW_USER"]}:{os.environ["ADW_PASSWORD"]}@{os.environ["ADW_SID"]}'
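One caveat with building the URI by plain string formatting: if the user name or password contains characters such as `@`, `/` or `:`, the URI can be parsed incorrectly. A common remedy is to URL-encode the credentials first. The sketch below illustrates the idea with the standard library; the helper name `build_adw_uri` and the sample credentials are made up for this example.

```python
from urllib.parse import quote_plus

def build_adw_uri(user, password, sid):
    """Build the connection URI with URL-encoded credentials."""
    # quote_plus escapes characters that would otherwise be read as URI delimiters
    return f"oracle+cx_oracle://{quote_plus(user)}:{quote_plus(password)}@{sid}"

# Made-up credentials, for illustration only
example_uri = build_adw_uri("admin", "p@ss/word", "mydb_high")
print(example_uri)
```

With the environment variables from above, the call would be `build_adw_uri(os.environ["ADW_USER"], os.environ["ADW_PASSWORD"], os.environ["ADW_SID"])`.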
Then specify the table and the target feature (let's assume that this is a classification example). The table can also be specified as a SQL statement, and there are some other options too. The assumption here is that the data resides in a single database table.
In [14]:
table = "BANK_DATA"
target = "y"
After you specify your query (in our case, simply a table name), use ADS to query the table in ADW and load the data into an ADSDataset object using DatasetFactory.
In [7]:
# Pass the target only when one is defined; otherwise load the table without a target
if target:
    ds = DatasetFactory.open(uri, format="sql", table=table, target=target).set_positive_class('yes')
else:
    ds = DatasetFactory.open(uri, format="sql", table=table)
You can run two functions, show_in_notebook() and get_recommendations(), to review and analyze the data just read and to perform some dataset transformations. Due to technical limitations (I can't export the results of show_in_notebook()), I am simply presenting the content of the ADSDataset object called ds.
In [9]:
ds.head()
Out[9]:
This concludes this simple exercise. ds is a dataset object which will be used in forthcoming exercises/blog posts.