Association Rules and Recommendations with Oracle Analytics
A few months ago, I was writing about Market Basket Analysis that you can perform in Oracle Autonomous Database and Frequent Itemsets that can also be created with Oracle Analytics.
When working with Oracle Analytics, one important part of Market Basket Analysis was missing - Association Rules and Recommendations. Oracle Analytics from version 6.0 supports this feature, therefore complete Market Basket Analysis can be done now.
Let's take a look at the data flow we created for creation of Frequent Itemset. It consists of three steps:
As usual, we begin a data flow by reading data. In our case we read database data table CARBO_TRANSACTIONS. This table has relatively simple structure:
For our analysis only BASKET and UPC columns are important.
In the second step, Database Analytics step with analytics operation Frequent Itemsets we define parameters for the algorithm.
In the Outputs, we define output columns. In our case we are looking only for combinations of the two product combination in a basket, which are specified as Item1 and Item2. Item3 and Item4 are not used.
In this same step we also define Parameters. And here is the difference from version 5.9.
- Transaction Id Column is basically transaction identifier, in our case this is BASKET number.
- Item Value Column is product identifier, in our case UPC.
- Maximum and Minimum Itemset Size define product combination sizes we are interested in. We are going to analyse only combinations of two items.
- Minimum Support Percent filters out all item combinations with SupportPercentage less than the value set by this parameter. We had to go quite low do to low SupportPercentage for our frequent itemsets.
- Top N Frequent Itemset specifies a number of top N itemsets ordered by SupportPercentage. In our case we left the default value of 50.
- Generate Association Rules parameter defines if association rules should be generated or no.
- Top N Association Rules parameter defines how many association rules are generated and stored. Please be aware that "top" is defined by Confidence and then Support.
- Association Confidence Percent defines the cut off limit for minimal confidence of the rule generated.
And here I hit on a problem. In case of large datasets with a lot of products and transactions, Support and Confidence Percents tend to be lower that 0.25 and 0.1 respectively. So in my case I didn't get any frequent itemsets and association rules generated at all. Which is quite annoying and tells me that I should go back to Autonomous Database and to my analysis there.
In order to reduce the number of rows I changed the data flow a bit and add a filter step to filter data based on a store. If we are looking at the best sales combinations and recommendation of two products, this has some results.
So here is data flow:
Original dataset is now filter using STORE value:
Data Analytics step stays unchanged and in the last step I am storing datasets into database:
There will be two tables created:
- CARBO_BASKET_ANALYSIS and
The first table, CARBO_BASKET_ANALYSIS, contains data about Frequent Itemsets
and the second table, CARBO_BASKET_ANALYSIS_AR, is holding information about association rules and therefore product recommendations:
Yo can see the difference between the two. In the Frequent Itemset table, we are looking at particular combination of two items (it can go up to combinations of 4). Metrics that are presented are number of items in a combination, number of occasions of the combination and total number of frequent itemsets. This information gives us Support for the particular combination of items.
In the Association Rules table we can see Antecedent Item(s), maximum 3, which are combined in an association rule with the Consequent Item. And there are also corresponding Support (for each of the items in a combination and for all items in a combination together), Confidence and Lift metrics.
An at the end, we can visualise the model:
This concludes my writing about Market Basket Analysis, Frequent Itemsets and Association Rules. You can access previous blogposts using these links: