In my previous blog post, Talking to Oracle Database in plain English I am discussing how to query Oracle Database using natural language queries to retrieve data from an Oracle database.
From my quick exploration, I can conclude that AI has definitely entered databases and can be applied in very different scenarios. For example, using SELECT AI in chat application where data can be retrieved and presented from Oracle Database by applying natural language query. But that was in plain English.
But what about if my primary natural language isn't English. What if your (and, for this matter also mine) isn't English. What if it is Slovenian? Would SELECT AI still work?
Let's examine the very same examples that I used in my previous blog post.
Let's talk now ... "po slovensko" (Slovenian)
Enough said. Let's try the simplest query of all and let's ask my database what is the total revenue recorded. In Slovenian, the question what is total revenue translates into:
kakšni so skupni prihodki
And the response is?
The response we've got is correct. Interesting, so far so good!
Just out of curiosity, what would be the response if I used narrate option in select ai?
select ai narrate kakšni so skupni pridodki
Well, database replies in Slovenian. Of course, what else ...
In my next query I have asked database what is the revenue by customer segments in the last year. In Slovenian, the question is as follows:
select ai kakšen je prihodek po segmentih kupcev v zadnjem letu
And, I am getting used to it by now already, the question was understood and response given was the same as if I asked in plain English.
With showsql option, we can examine SQL that was generated:
SQL is correct:
Or for potentially documenting purposes, we could ask database to explain that same SQL:
This response as such is obviously generated by ChatGPT and has nothing to do with underlying database. Nevertheless, could be very useful in some cases.
Going beyond simple
In the next couple of examples I tried to challenge Oracle Database in Slovenian a bit further asking questions that would include more data, more tables.
The first query is adding additional measures, and the question in english is what are total revenues and units sold in the last year. In Slovenian, this is as follows:
select ai prikaži letne prihodke in prodane količine po segmentih kupcev v zadnjem letu
And the response is again correct:
Ok, let's drill down previous query by months of the last year (note: last year in my dataset is 2022):
select ai prikaži skupne prihodke in prodane količine po letih, mesecih in segmentih kupcev v zadnjem letu
Again, database has nailed it.
I was playing with questions and responses for some more time and here are some additional examples how to run efficient queries against Oracle database in Slovenian language. For comparison, the same questions in English, side by side:
The first example is to obtain top 5 products by revenue, for the current and previous year:
The 2nd example extend previous example plus calculates absolute and relative difference between the two years:
The example below explores customer's revenue distribution by customer segments:
And this last example demonstrates that also calculated measures can be created automatically. Database tables contain data about fixed and variable costs and ChatGPT assumed that both these costs add up into total costs. The query responded also correctly on the term blagovna znamka, which can be translated as a brand which is available in Products dimension table. So it is not just compiling terminology but it is also making sense of data provided.
It's all Greek to me
I saved this one for the end. Quite a bit complex question in English display revenue, total costs, profit, profit margin, fixed cost share in total costs, variable costs share in total costs by product brand, customer segments and regions for this year would sound in Slovenian something like this:
prikaži prihodke, skupne stroške,profit in profitne stopnje,
delež fiksnih stroškov v skupnih stroških, delež variabilnih stroškov v skupnih stroških,
po blagovni znamki, segmentih kupcev in regijah za letos
This request is really a complex one. It includes terminology that is not directly stored in database tables. For example, besides total costs it also knows how to calculate profit margin (there is no instruction provided anywhere). The query also returns information about %s of cost by cost types in total costs - all calculated measures.
The result of the query is absolutely correct.
The table from above can be simply converted into something more visual, scatter graph for example:
Let me get back to the basic question: Can Oracle database talk to its users in Slovenian?
The answer is yes. Absolutely. However, users have to be careful. Responses that I am providing above were not always generated instantly and as expected. It required a bit of question-fine-tuning to get what I wanted. Sometimes response in English is immediately correct, but in Slovenian not always. I must confess there are still some examples that I was not able to persuade database to get back with the correct answers. Maybe something got lost in translation. I don't know to be honest.
The next question that pops up to my mind is what about other languages? Well situation is pretty similar as with Slovenian.
For example, Serbia is not that far from Slovenia. In Serbia, they use Cyrillic letters, therefore it is quite interesting to see how Oracle database would react to that?
Here is the question in Serbian language:
Well it works! What if I move a few hundred miles south to Greece? Translation looks very Greek to me - not having the slightest idea what the question is about.
Question in Greek is:
Ok. In this case the answer is not 100% same as the ones above. But I assume in this case, something was indeed lost in translation. But well, it's returning valid values, so probably native speaking Greek could do some fine tuning on the question and response would be correct as expected.
We could go on, asking question in various language and writings, but most of them would return back correct response. Just like the question in traditional Chinese:
按產品品牌、客戶群和地區顯示今年的收入、總成本、利潤、利潤率、固定成本佔總成本的份額、可變成本佔總成本的份額
To wrap it up
This was my second encounter with LLM and their integration with Oracle database. And my impression is even more positive than a week ago. I found database that can actually speak Slovenian.
My feeling is that developers should pay attention to underlying data structures in order to make the most out of generative LLMs. This means providing as much as possible metadata to LLMs to support query generation. This is what I see quite interesting, because analytics tools such as Oracle Analytics base everything on such semantic model. Data catalogs might also play very important role here as well.
Too make this closing short. I began my previous blog post with my beginnings with SQL. Now, database can speak my own natural language. So, how cool is that?