Skip to main content

Talking to Oracle Database, this time in plain Slovenian

In my previous blog post, Talking to Oracle Database in plain English I am discussing how to query Oracle Database using natural language queries to retrieve data from an Oracle database.

Select AI

From my quick exploration, I can conclude that AI has definitely entered databases and can be applied in very different scenarios. For example, using SELECT AI in chat application where data can be retrieved and presented from Oracle Database by applying natural language query. But that was in plain English.

But what about if my primary natural language isn't English. What if your (and, for this matter also mine) isn't English. What if it is Slovenian? Would SELECT AI still work?

Let's examine the very same examples that I used in my previous blog post.

Let's talk now ... "po slovensko" (Slovenian)

Enough said. Let's try the simplest query of all and let's ask my database what is the total revenue recorded. In Slovenian, the question what is total revenue translates into:

kakšni so skupni prihodki

And the response is?

Simple SELECT Ai Query

The response we've got is correct. Interesting, so far so good!

Just out of curiosity, what would be the response if I used narrate option in select ai?

select ai narrate kakšni so skupni pridodki

Well, database replies in Slovenian. Of course, what else ...

Simple SELECT Ai Query with NARRATE

In my next query I have asked database what is the revenue by customer segments in the last year. In Slovenian, the question is as follows:

select ai kakšen je prihodek po segmentih kupcev v zadnjem letu

And, I am getting used to it by now already, the question was understood and response given was the same as if I asked in plain English.

A bit more complex SELECT AI query

With showsql option, we can examine SQL that was generated:

A bit more complex SELECT AI query using showsql

SQL is correct:

Running SQL generated by SELECT AI

Or for potentially documenting purposes, we could ask database to explain that same SQL:

Explain SQL generated by SELECT AI

This response as such is obviously generated by ChatGPT and has nothing to do with underlying database. Nevertheless, could be very useful in some cases.

Going beyond simple

In the next couple of examples I tried to challenge Oracle Database in Slovenian a bit further asking questions that would include more data, more tables.

The first query is adding additional measures, and the question in english is what are total revenues and units sold in the last year. In Slovenian, this is as follows:

select ai prikaži letne prihodke in prodane količine po segmentih kupcev v zadnjem letu

And the response is again correct:

Adding measures in the query

Ok, let's drill down previous query by months of the last year (note: last year in my dataset is 2022):

select ai prikaži skupne prihodke in prodane količine po letih, mesecih in segmentih kupcev v zadnjem letu

Drill down to get more details by months

Again, database has nailed it. 

I was playing with questions and responses for some more time and here are some additional examples how to run efficient queries against Oracle database in Slovenian language. For comparison, the same questions in English, side by side:

The first example is to obtain top 5 products by revenue, for the current and previous year:

Top 5 products by revenue

The 2nd example extend previous example plus calculates absolute and relative difference between the two years:

Comparing and calculating

The example below explores customer's revenue distribution by customer segments:

Customers by segments

And this last example demonstrates that also calculated measures can be created automatically. Database tables contain data about fixed and variable costs and ChatGPT assumed that both these costs add up into total costs. The query responded also correctly on the term blagovna znamka, which can be translated as a brand which is available in Products dimension table. So it is not just compiling terminology but it is also making sense of data provided.

Total costs by brand

It's all Greek to me

I saved this one for the end. Quite a bit complex question in English display revenue, total costs, profit, profit margin, fixed cost share in total costs, variable costs share in total costs by product brand, customer segments and regions for this year would sound in Slovenian something like this:

prikaži prihodke, skupne stroške,profit in profitne stopnje,
delež fiksnih stroškov v skupnih stroških, delež variabilnih stroškov v skupnih stroških,
po blagovni znamki, segmentih kupcev in regijah za letos

This request is really a complex one. It includes terminology that is not directly stored in database tables. For example, besides total costs it also knows how to calculate profit margin (there is no instruction provided anywhere). The query also returns information about %s of cost by cost types in total costs - all calculated measures. 

The result of the query is absolutely correct.

Really complex one

The table from above can be simply converted into something more visual, scatter graph for example:

Really complex one as scatter graph

Let me get back to the basic question: Can Oracle database talk to its users in Slovenian? 

The answer is yes. Absolutely. However, users have to be careful. Responses that I am providing above were not always generated instantly and as expected. It required a bit of question-fine-tuning to get what I wanted. Sometimes response in English is immediately correct, but in Slovenian  not always.  I must confess there are still some examples that I was not able to persuade database to get back with the correct answers. Maybe something got lost in translation. I don't know to be honest.

The next question that pops up to my mind is what about other languages? Well situation is pretty similar as with Slovenian.

For example, Serbia is not that far from Slovenia. In Serbia, they use Cyrillic letters, therefore it is quite interesting to see how Oracle database would react to that?

Here is the question in Serbian language:

приказати приход, укупне трошкове, профит, профитну маржу,
удео фиксних трошкова у укупним трошковима, удео варијабилних трошкова у укупним трошковима
према бренду производа, сегментима купаца и регионима за ову годину

Really complex one in Serbian Cyrillic

Well it works! What if I move a few hundred miles south to Greece? Translation looks very Greek to me - not having the slightest idea what the question is about.

Question in Greek is:

έσοδα προβολής, συνολικό κόστος, κέρδος, περιθώριο κέρδους,
μερίδιο σταθερού κόστους στο συνολικό κόστος, μεταβλητό μερίδιο κόστους στο
συνολικό κόστος ανά επωνυμία προϊόντος, τμήματα πελατών και περιοχές για φέτος

Really complex one in Greek

Ok. In this case the answer is not 100% same as the ones above. But I assume in this case, something was indeed lost in translation. But well, it's returning valid values, so probably native speaking Greek could do some fine tuning on the question and response would be correct as expected.

We could go on, asking question in various language and writings, but most of them would return back correct response. Just like the question in traditional Chinese:

按產品品牌、客戶群和地區顯示今年的收入、總成本、利潤、利潤率、固定成本佔總成本的份額、可變成本佔總成本的份額

Really complex one in Chinese

To wrap it up

This was my second encounter with LLM and their integration with Oracle database. And my impression is even more positive than a week ago. I found database that can actually speak Slovenian.

My feeling is that developers should pay attention to underlying data structures in order to make the most out of generative LLMs. This means providing as much as possible metadata to LLMs to support query generation. This is what I see quite interesting, because analytics tools such as Oracle Analytics base everything on such semantic model. Data catalogs might also play very important role here as well.

Too make this closing short. I began my previous blog post with my beginnings with SQL. Now, database can speak my own natural language. So, how cool is that?