Skip to main content

Market Basket Analysis with Oracle Machine Learning and Oracle Analytics

Introduction

Recommendations are usually always there to stimulate you to buy additional products or services regardless if you are shopping online or in a store. Sometimes these recommendations are just intuitions of sales manager, but more often these are prepared using some machine learning algorithm, for example Association Rules.

When Association Rules modeling is applied to transaction sales data, the model is called Market-Basket Analysis. Market-Basket Analysis can be used in various situations, from direct marketing, sales promotions, discovering business trends, but also for effectively managing store layouts, catalog design or for exposing cross-sell opportunities in a web store.

Since we I am not going to discuss theory about Market Basket Analysis and Association Rules, please find more information here: 

  • https://docs.oracle.com/en/database/oracle/oracle-database/19/dmcon/data-mining-basics.html,
  • https://docs.oracle.com/en/database/oracle/oracle-database/19/dmcon/association.html.

Key Rule Metrics: Support, Confidence and Lift

Support is a measure that tells us how many times an itemset appears in a dataset.

Confidence of a rule indicates the probability of both the antecedent and the consequent appearing in the same transaction.

Confidence is the ratio of the rule support to the number of transactions that include the antecedent.
Both support and confidence must be used to determine if a rule is valid. However, there are times when both of these measures may be high, and yet still produce a rule that is not useful. 

Lift indicates the strength of a rule over the random co-occurrence of the antecedent and the consequent, given their individual support. It provides information about the improvement, the increase in probability of the consequent given the antecedent.

Any rule with an improvement of less than 1 does not indicate a real cross-selling opportunity.

The data source: Transactions

The basis for Market-Basket Analysis are transactions. The goal of Market-Basket Analysis is to identify the products which are usually sold together. 

We can use the results of Market-Basket Analysis to make recommendations. For example, when shopping online, once a customer adds a product to her basket, then recommendation engine would suggest buying a product that is most often sold together with the product that has already been added in the basket.

Data that is required is expected at the lowest possible data level, for example POS Sales transaction. 

Data that I am using in this example can be downloaded from Dunnhumby source files for Carbo-Loading (https://www.dunnhumby.com/source-files/). Data model that is imported into Oracle Autonomous Data Warehouse is as follows:

For our Market-Basket Analysis exercise, CARBO_TRANSACTIONS database table is relevant, with the following two columns:

  • BASKET is simply basket ID and
  • UPC which contains product codes.

Each basket can have one or more rows in this table, for example:


Building Market Basket Analysis Model

We will use Oracle Autonomous Data Warehouse and Oracle Machine Learning for PL/SQL to create a model.

To get to this, just navigate to Service Console / Development in OCI console for your ADW. Once you select Oracle Machine Learning Notebooks and provide your credentials, Zeppelin notebook will open.


Start by creating a new Notebook. 

Provide your new notebook a name:


Once your new notebook is created, you can begin. First, let’s a view that we will use as a basis for our model:

We can create a model now.


First, we specify some parameters, that will be used for the model creation:

  • ALGO_NAME is algorithm name to be used in model creation. In our case, this is Apriori.
  • PREP_AUTO is a parameter which defines if there is additional automatic data preparation required. In most cases this should be set to ON. Algorithm will then perform necessary transformation before actual data model is created.
  • ASSO_MIN_SUPPORT defines minimum “cut-off” support. We can define this value to avoid running into too many samples with very rare occurance.
  • ASSO_MIN_CONFIDENCE defines minimum confidence in itemsets. It doesn’t make sense to set this value too high or too low as there could be only a couple of cases or too many. 
  • ASSO_MAX_RULE_LENGTH simply defines the size of the itemset. In our case it is set to 3, which means we will look into all itemsets with size 2 or 3.
  • ODMS_ITEM_ID_COLUMN_NAME is the column that contains information about (in our case) products.

When parameters are prepared, model can be built using DBMS_DATA_MINING.CREATE_MODEL2 function call. Parameters that are required to create association rules model are the following:

  • MODEL_NAME,
  • MINING_FUNCTION is ASSOCIATION
  • DATA QUERY which retrieves data, in our case this is a view we created in the beginning,
  • SET_LIST is a list of parameters defined above,
  • CASE_ID_COLUMN_NAME is basically basket ID,  in our case BASKET.

The Model

When creation of the model is completed, then we can start reviewing it and we can start our analysis:

When the model is created, there are several database tables and views created as well. We need to be familiar with these tables and as they will be the basis for our further analysis.

You can find more detailed explanation on Model Detail Views in Oracle Data Mining User’s Guide (https://docs.oracle.com/en/database/oracle/oracle-database/19/dmprg/model-detail-views.html), however I would like to emphasise the two which I find the most important:

Model detail view for Transactional Rule describes the transactional rule view and transactional itemsets view:

  • Itemsets: DM$VTmodel_name view provides the itemsets information in transactional format:

  • Transaction Rules: DM$VAmodel_name view provides information about Transactional Rules:

In our example, these two views are:

  • DM$VTAR_CARBO_TRANS and
  • DM$VAAR_CARBO_TRANS.

Let’s check which are the most Frequent Itemsets.

We can see that ITEMSET_ID, which consists of two ITEMS (product UPC codes), 360001369 and 9999985029, has a support 0.00169, meaning that this combination is represent 0,169% of all itemsets in our transaction data table. 

Let’s examine now what is Association Rule behind this Frequent Itemset:


We can see that RULE_CONFIDENCE is pretty high, and RULE_LIFT is high as well. So, we have clear candidate for cross-sale.

We can also take ANTECEDENT_PREDICATE (for example: product with UPC 3620001369) and find out which products are also sold very often with that product, which can be treated as clear recommendations for cross-sale.

Market-Basket Analysis visualization and analysis

At the end it would be great if we could bring the Market-Basket Analysis results into Oracle Analytics and visualize the whole thing. It is not to expect that regular “category managers” would be skilled enough to write SQL code to retrieve recommendations from model views.

In dashboard below, we can see Top 25 Itemsets by Itemset Support. By clicking on Itemset ID in the top left bar chart, detailed information can be displayed. 

We can see Itemset ID 8561 with two known products, PRIVATE LABEL FETTUCCINI (UPC: 9999985029) and RAGU\CHZ CREATION ALFRDO (UPC: 3620001369). 

At the bottom of the screen, we can examine correlation between the two products in terms of sales value, units sold and # of households buying these two products.

In the middle section, we can see Association Rules for the selected Itemset. You can observe that information provided is very similar to the one we saw in the model view, DM$VAAR_CARBO_TRANS,  we used in the previous step. 


To further analyze association rules in our model, we have two action links defined to navigate to Recommendations for Antecedent and Recommendations for Consequent canvases.

By activating action menu that takes us to Recommendations for Antecedent, we can analyze which products can be recommended to a buyer when he already picked RAGU\CHZ CREATION ALFRDO (UPC: 3620001369):


Sankey chart presents these recommendations really well. You can now see by darkness and width of the lines, which other products can be recommended for initially selected product. We can see that there is another product which has higher RULE_LIFT than the one we observed initially. This can happen as we initiated our analysis by looking at ITEMSET_SUPPORT. 

We can now continue with our analysis to explore which products are sold well together with that other product, RAGU CHS CREATN PARM MOZZ (product UPC: 2620000217). 

By activating another Data Action, Recommendations for Consequent, this time in the first line of the table on the right, we can get recommendations for RAGU CHS CREATN PARM MOZZ (product UPC: 2620000217):


We could go on now analysing all these relationships. But basically, the question is: how is this done in backend?

I created a data model using Data Modeler tool, which is part of Oracle Analytics and I came up with the following model:


In the model above, I have imported Frequent Itemset model view and Associations Rules model view as Fact tables. 

Association Rules is linked, on logical level, to Antecedent and Consequent Product dimension tables, which enables us navigation in both directions. 

For Frequent Itemsets fact table, additional Itemsets dimension table is created which is also dimension in Association Rules submodel. Function of this dimension is also linkage between Freqent Itemsets and Business Rules. 

Model has been created using Data Modeler tool, Model Administration Tool is used only for data model visualization.