Skip to main content

Talking to Oracle Database in plain English


My SQL beginnings

It was approx. 35 years ago (gosh, that long!?) when I first met SQL. Structured Query Language to those who might not be  from that era or are not 100% sure of what I am talking about. It was during the labs at faculty when we went to IBM's training center (we didn't have the capacity at faculty at the time) in Radovljica, Slovenia, to learn and play with this (at least for me) new programming language. The language of the 4th generation, so they said. With its roots in boolean algebra and simple structure, we were told that this was almost the same as natural language (english of course. Who would thought one day you would use Slovenian language to query databases too! With all those SELECTs, FROMs, WHEREs, GROUP BYs, HAVINGs, ORDER BYs, etc. it actually sounded really cool and easy.

My Select AI beginnings

The other day I finally found some time to invest into something else then just daily work, which is obviously something different than the rest of this blog post.

The one that caught my attention was one of the latest Oracle's blogs **Introducing Select AI - Natural Language to SQL Generation on Autonomous Database**, written by Mark Hornick of Oracle, which is discussing the use of generative AI with Large Language Models (LLMs), such as ChatCPT, Cohere or Azure OpenAI Service with Oracle Autonomous Database.

More details can also be found in Oracle Autonomous Database documentation: Use Select AI to Generate SQL from Natural Language Prompts.

Let's talk now

Enough said. Let's try the simplest query of all and let's ask my database what is the total revenue recorded:

select ai what is total revenue

And the response is?

Simple SELECT Ai Query

For experienced ChatGPT users this is probably nothing really spectacular. For me, it is. The response is obviously coming from underlying database table and is correct. Encouraging.

Let's dig a bit further and try to get back a bit more complex result.

In the next query I'm asking a question that will force several tables using a joins. For example this:

select ai what is the revenue by customer segments in the last year

A bit more complex SELECT AI query

In this case, the result is a bit more complex and it can be displayed in various formats, for example as a graph.

I believe that by now everybody had already figured it out, that the magic word is SELECT AI which is a secret ingredient that enables us to query data using natural language.

So what is happening. I'll borrow the following image from Mark's blog mentioned above:

SELECT AI - what's happening

SELECT AI natural language query is using metadata from the schema that is identified in the user's profile. This information is provided to the LLM (for example ChatGPT), to create a proper SQL query. The query is then executed within the database and results are fed back, if requested (ie. using option narrate or chat), through the LLM to generate the answer.

As already indicated, the same natural language query can return different types of results. This is dependent on the keyword that can be added to the SELECT AI syntax.

For example, if narrate keyword is used in the query above:

select ai narrate what is the revenue by customer segments in the last year

The result is formulated as the following narrative:

A bit more complex SELECT AI query using narrate keyword

If we want to show SQL that is generated, then showsql keyword can be used:

select ai showsql what is the revenue by customer segments in the last year

Query above results in the following SQL query:

A bit more complex SELECT AI query using showsql keyword

If you want simply to execute the query, then keyword is runsql:

select ai runsql what is the revenue by customer segments in the last year

The results returns the following response - it is the same as when on keyword was used, just click on different visualisation types:

A bit more complex SELECT AI query using runsql keyword

The last keyword option, chat is probably the most interesting as conversational AI response is generated:

select ai chat what is the revenue by customer segments in the last year

The results formulates the following - it is the same as when on keyword was used:

A bit more complex SELECT AI query using chat keyword

We can see that in this case the answer is a bit more explanatory. It seems that query went to LLM and returned back with prescription what needs to be done, not the result itself.

Quick idea: we can ask what SELECT statement actually does, remember the one generated just earlier using showsql keyword. I guess we are now more in ChatGPT area now.

select ai chat explain the following SQL
    SELECT
        C.CUST_SEGMENT,
        SUM(B.REVENUE) AS TOTAL_REVENUE
    FROM OABOOTCAMP.D_CUSTOMERS C INNER JOIN OABOOTCAMP.F_BILL_REV B ON C.CUST_NUMBER = B.CUST_NUMBER
    WHERE
        B.TIME_BILL_DT >= ADD_MONTHS(TRUNC(SYSDATE, 'YEAR'), -12)
    GROUP BY
        C.CUST_SEGMENT

The results formulates the following - it is the same as when on keyword was used:

A bit more complex SELECT AI query using chat keyword

We can do this slightly reversed. For example, what if we used ChatGPT to create a query instead of explaining it?

For example let's create a query for the exact same situation as above:

select ai chat 
create SQL query using tables F_BILL_REV, D_TIME and D_CUSTOMERS in OABOOTCAMP database 
schema to retrieve the total revenue for each customer segment in the past 12 months

The result is created SQL query:

Using SELECT AI to create SQL query using chat keyword

If this query was run in my database, I would get an ORA-01747: invalid user.table.column, table.column, or column specification error. Issue is with date attributes that couldn't have been recognised within F_BILL_REV table - there are two dates: TIME_BILL_DT and TIME_PAID_DT which model couldn't resolve. Most likely foreign keys would also help here - I don't have any.

This confirms my suspicion that we should use more user friendly table and column names that would not cause any ambiguity.

Let's have some more ...

... and check some more questions:

What about top 5 products by sales revenue by units sold and revenue?

Top 5 product by revenue

Great! What about this same question, jut compare two years in a graph. I can't use current year as there is no data for 2023 in database table. 😃. Hence I am comparing previous year with year before that. 

Top 5 product by revenue comparing previous years

And now, let' s take this to another level and add some calculations. Calculate growth year-over-year in absolute values and %s.

Compare years and show growth - relative

Or let's explore sales by customer segments down to customer level.

Customer distribution by segments

Well? Do you see where I'm getting to? Let's imagine a Google-like or a chat application used by CEO, asking business questions from his mobile phone? Or even better, just talking to it and getting business information from his or her information system or business analytics? Actually, it looks like that it can be done. We've got all components. For example, in Oracle Cloud Infrastructure.

To wrap it up

My first encounter with LLM in a database is quite positive. I've read that we should be careful for hallucinations and one should always verify generated results - see the results for query generation.

Anyway, Rome was not built in a day, right?

After all, I remember that we managed to successfully crash that poor thing of that IBM mainframe within the first 5 minutes and labs were finished for good. And again, we have proven once more that the 3rd generation programming languages still ruled the world.

And still, look at us know!