
Another box of chocolates: Language AI and Oracle Analytics

OCI Language AI and Oracle Analytics

In previous blog posts I have been writing about the OCI Vision service and how it can be integrated and used with Oracle Analytics Cloud (OAC).

In the meantime, Oracle has shipped the Oracle Analytics January 2023 release, which comes with an option to integrate OAC with another OCI AI service, the OCI Language service (Language).

In this blog post I am focusing on the integration between OAC and Language. This means I am not covering the training of a custom model (a possible topic for a future blog post); instead, I am using a pre-trained model available within Language. Assuming the model already exists, connecting an OAC instance to Language involves the following steps:

  • Language requires storage for temporary results, so a staging bucket is created in Object Storage.
  • To access Language and the staging bucket, OAC users require a few policies to be set.
  • When all policies are set, a connection between OAC and the OCI resource, in this case Language, can be established.
  • If the connection is valid, register the Language model (in this case the generic one) with OAC.
  • A data flow is then used to apply the Language model to a textual dataset, for example to perform sentiment analysis. Running the data flow creates a new dataset with the prediction data.
  • Finally, the new dataset can be used in visualizations.

Before you begin using Language with OAC, you need a suitable dataset with textual data. I am using a subset of 15k rows from the Madrid_reviews.csv dataset, which is part of Six TripAdvisor Datasets for NLP Tasks on Kaggle.

But let's talk about Language first.

OCI Language

Language is one of the OCI AI services, accessible through the Console, REST API, SDK and CLI. Users can use pre-trained models as well as their own custom models to process unstructured textual data and extract insights. As with the OCI Vision service (Vision), this requires no machine learning experience or knowledge. It is practically self-service.

In this blog, I am focusing on pre-trained models, which are frequently retrained by Oracle. Using pre-trained models, users can access the following capabilities:

  • language detection,
  • text classification,
  • named entity recognition,
  • key phrase extraction,
  • sentiment analysis,
  • text translation.

OCI Language AI services

When working with any model (not just pre-trained, but custom as well), keep in mind that the best results are achieved with English text. There is no auto-correction for misspellings. There are also some other rules and constraints, which are described in more detail in the Language documentation.

Let’s take a look at an example. We would like to analyze the following restaurant review:

The menu of Yakuza is a bit of a lottery, some plates are really good (like most of the sushi rolls) and instead some others are terrible ( the pizza sushi and most of the fried starters). Taking this in consideration, it´s a great option if you feel like sushi and can avoid ordering from the rest of the menu. We even ordered for delivery more than once and the packaging they use is great., Dec 10, 2019.

If you navigate to text analytics in the Language console, simply copy the text above into the text box.

Analyze text

On Analyze, text analysis with the generic language model is performed automatically and the results are displayed.

Results of text analysis

Sentiment analysis is performed at the document, aspect (word) and sentence level. Sentiment (positive, neutral, mixed, negative) is easy to spot thanks to color coding.
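The same kind of analysis can also be requested outside the console, through the REST API or an SDK. Below is a minimal sketch that only builds a request body for a batch of reviews; it does not call the service. The field names (`documents`, `key`, `text`, `languageCode`) reflect my understanding of the batch sentiment request and should be verified against the Language documentation, and `build_sentiment_request` is a hypothetical helper, not part of any Oracle SDK.

```python
import json


def build_sentiment_request(reviews, language_code="en"):
    """Build a request body for batch sentiment detection.

    Assumption: the service expects a list of documents, each with a
    unique key, the text, and a language code. Check the field names
    against the official API reference before use.
    """
    documents = [
        {"key": str(i), "text": text, "languageCode": language_code}
        for i, text in enumerate(reviews, start=1)
    ]
    return {"documents": documents}


# Two sentences taken from the restaurant review above.
reviews = [
    "The menu of Yakuza is a bit of a lottery, some plates are really good.",
    "We even ordered for delivery more than once and the packaging they use is great.",
]

body = build_sentiment_request(reviews)
print(json.dumps(body, indent=2))
```

Each document carries its own key so that results returned for a batch can be matched back to the original review.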

Results of text analysis

In the last results section, personally identifiable information (PII) is displayed. For any PII, users can configure masking. The example text above does not contain any PII, so the text has not been affected.

Results of text analysis

In the example a little further on in this blog post, the generic language model is integrated with OAC, where bulk sentiment analysis is performed.

Bringing Language and Analytics together

Setting some prerequisites

Before users can use Language, some policies must be set for them. For example, the following policy needs to be set:

allow group <group-name> to use ai-service-language-family in tenancy

Policy to allow user group to use AI Language Service

And there is another prerequisite: Language requires a staging bucket for storing temporary results. This bucket needs to be created, and policies for managing it have to be granted to the user group which is using Language.

Staging bucket for AI Language Service

allow group <group_in_tenancy> to read objectstorage-namespaces in compartment <compartment>
allow group <group_in_tenancy> to read buckets in compartment <compartment>
allow group <group_in_tenancy> to manage objects in compartment <compartment> where target.bucket.name='<staging_bucket_name>'

Policy to allow user group to use and manage staging bucket
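When the same set of policies has to be created for several groups or environments, it can help to generate the statements from a template. The sketch below only formats the policy statements shown above as strings; `language_policies` is a hypothetical helper, and the group, compartment and bucket names are made-up placeholders.

```python
def language_policies(group, compartment, bucket):
    """Return the policy statements from this post with the placeholders filled in."""
    return [
        f"allow group {group} to use ai-service-language-family in tenancy",
        f"allow group {group} to read objectstorage-namespaces in compartment {compartment}",
        f"allow group {group} to read buckets in compartment {compartment}",
        f"allow group {group} to manage objects in compartment {compartment} "
        f"where target.bucket.name='{bucket}'",
    ]


# Hypothetical names, used for illustration only.
statements = language_policies("oac-users", "analytics", "language-staging")
for statement in statements:
    print(statement)
```

The generated statements can then be pasted into a policy in the OCI Console by an administrator.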

Establishing a connection to Language from Analytics

Connecting from OAC to Language is the same as for Vision. When defining a new connection, the OCI Resource connection type is available in OAC. What is needed are some parameters from the OCI environment and information about the user connecting to the OCI resource.

Creating connection to OCI Resource

Based on a valid connection, the generic language model can be registered with OAC.

Register Language Model

Users can then choose between several pre-trained models, for example sentiment analysis.

Select pre-trained sentiment analysis model

Deploying model for Sentiment Analysis

Once the model is registered with OAC, the pre-trained language model (sentiment analysis, in the case described) can be used in data flows.

The data flow itself is pretty straightforward. Like any other activity within OAC, it is designed for any slightly advanced user: it is easy to use, quite intuitive, and consists of only 3 steps:

  • Read the restaurant review dataset.
  • Apply the language model to the provided dataset.
  • Store the results in a new dataset.

Apply Language model for Sentiment Analysis

Based on the selection in the 2nd step, Apply AI Model, sentiment analysis is performed at different levels:

  • Aspect (individual word),
  • Sentence,
  • Both Aspect and Sentence.

Sentiment Analysis Levels.

Based on this selection, results of different granularity are generated. In any case, sentiment analysis at the document level (in our case, the restaurant review) is performed.

The results of the sentiment analysis are already visible as a preview in the data flow, where Sentiment is calculated from the Positive Score, Negative Score, Neutral Score and Mixed Score metrics: the sentiment is determined by whichever score is the highest.
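The "highest score wins" rule can be illustrated in a few lines. This is a sketch of the rule as described here, not Oracle's implementation; the score values are invented for illustration and `label_sentiment` is a hypothetical helper.

```python
def label_sentiment(scores):
    """Return the name of the sentiment whose score is the highest.

    On a tie, the sentiment listed first in the dictionary is returned;
    how the service itself breaks ties is not covered in this post.
    """
    return max(scores, key=scores.get)


# Invented scores for a single review, for illustration only.
scores = {"Positive": 0.12, "Negative": 0.08, "Neutral": 0.05, "Mixed": 0.75}
print(label_sentiment(scores))  # prints "Mixed"
```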

Visualizing Sentiment Analysis

Once a new sentiment analysis dataset is created, it can be instantly used in a new OAC workbook.

For example, the number of reviews by sentiment and restaurant can lead to a detailed analysis of each individual review.

Sentiment Analysis Workbook.
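The aggregation behind such a visualization is a simple count per restaurant and sentiment. OAC does this for you in the workbook; the sketch below only shows the idea in plain Python, using a handful of invented rows ("Restaurant B" and all sentiment values are placeholders, not results from the actual dataset).

```python
from collections import Counter

# Invented (restaurant, sentiment) pairs standing in for the output dataset.
rows = [
    ("Yakuza", "Mixed"),
    ("Yakuza", "Positive"),
    ("Yakuza", "Positive"),
    ("Restaurant B", "Negative"),
]

# Count how many reviews fall into each (restaurant, sentiment) combination.
counts = Counter(rows)

for (restaurant, sentiment), n in sorted(counts.items()):
    print(restaurant, sentiment, n)
```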

Conclusion

In this short example, we can see (again) how easy it is to use machine learning for analytics within Oracle Cloud Infrastructure. Just like Vision, Language is really easy to use as a stand-alone service. To analyze text, users such as business analysts don't have to be data scientists. All they need is a couple of clicks and, of course, a basic understanding of the AI service they are using.

On top of that, this short example also demonstrates how easy it is to connect OAC to the Language service. The only technical work is setting the policies and creating the connection between the two OCI services. Once these are done (usually by an OCI administrator), users can take it from there, run their data through the model and visualize it in OAC, just like any other dataset they might use in their analyses. Or, as the title suggests: just another box of chocolates.