Working with Graphs in Oracle Analytics

Working with Graphs in Oracle Analytics - Node Ranking

Node Rank

Graphs describe relationships between entities in a dataset. In a graph, nodes correspond to entities and edges represent the relationships between them. Node Ranking measures the importance of nodes in a graph.

For example, a social network consists of people. We represent the members of the network as nodes and the relationships between them, e.g. friendship, as links between two people. People with a lot of friends are often called influencers, as their decisions, opinions, etc. are exposed to more people than those of other members of the network. Their rank is higher. Marketing departments tend to focus on and work with these influencers in order to get the maximum reach within their networks. The measure that captures this phenomenon is called Node Rank.
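The "a lot of friends" intuition can be sketched in a few lines of Python. This is only a rough illustration that counts links per member (degree), which is an assumption on my side and not necessarily how Oracle Analytics computes Node Rank. The names in `friendships` and the helper `degree_ranking` are invented for the example.

```python
from collections import Counter

# Hypothetical friendship links; the names are invented for illustration.
friendships = [
    ("Ana", "Bor"),
    ("Ana", "Cene"),
    ("Ana", "Dana"),
    ("Bor", "Cene"),
    ("Dana", "Eva"),
]


def degree_ranking(links):
    """Count links per member and return (member, degree) pairs, highest first."""
    degree = Counter()
    for a, b in links:
        # Friendship is mutual, so each link counts for both members.
        degree[a] += 1
        degree[b] += 1
    # Sort by degree descending, then by name to keep the order stable.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


ranking = degree_ranking(friendships)
for member, degree in ranking:
    print(member, degree)
```

Here the member with the most links ends up at the top of the list, which is the "influencer" in this tiny network.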

We can find similar use cases practically everywhere. In transportation, for instance, some airports in the network are more important than others. Hubs are the airports with the most connecting flights to other airports, so they have a higher Node Rank than the rest.

Data

For this little demonstration, I have used the Dolphins dataset, which contains data about a social network of bottlenose dolphins. The dataset contains a list of all pairwise links, where a link represents frequent associations between two dolphins.

Node Ranking in Oracle Analytics

The Data Flow

Oracle Analytics allows users to perform Graph Analytics by using Data Flows. Data Flows is a “mini-ETL” tool inside Oracle Analytics, which is used to perform various data preparation tasks.

Each data flow consists of several sequential or parallel tasks, called Steps. In the case of Graph Analytics and Graph Operations, the Data Flow has three steps:

The first step reads the data. The Oracle Analytics data flow steps for Graph Analytics are all executed in an Oracle database, which is why the initial dataset also has to reside in that same Oracle database.

We simply read the Dolphins dataset and choose two attributes: the source and the destination node of a relationship between two dolphins.

In the next step, a Graph Operation is added to the data flow. There are four Graph Operations available: Node Ranking, Clustering, Shortest Path and Sub Graph.

In order to add a new Graph Operation step, click on Add Step and choose Graph Analytics.

Defining the Node Ranking step is pretty straightforward. In the Parameters section, simply define the Source Column and the Destination Column.

In our case, these are DOLPHIN1 and DOLPHIN2. In the Outputs section we need to define two attributes: Node_Vertex, which is called Dolphin and maps to DOLPHIN1 (we will use this definition later in data visualization), and Rank.

In the last step, a new dataset (database table) with the two attributes is created. For each node in the graph, its rank is calculated.
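To give a feeling for what "a rank for each node" means, here is a minimal sketch in Python. The post does not state which algorithm Oracle Analytics uses internally, so the PageRank-style power iteration below is an assumption used purely for illustration. The node names `D1` to `D5`, the function `pagerank` and its parameters are hypothetical and are not part of the product.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """PageRank-style power iteration on an undirected edge list (illustrative)."""
    # Build an adjacency map; every link is treated as mutual.
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    n = len(neighbours)
    # Start with the same rank for every node.
    rank = {node: 1.0 / n for node in neighbours}

    for _ in range(iterations):
        new_rank = {}
        for node in neighbours:
            # Each neighbour passes on an equal share of its current rank.
            incoming = sum(rank[m] / len(neighbours[m]) for m in neighbours[node])
            new_rank[node] = (1.0 - damping) / n + damping * incoming
        rank = new_rank
    return rank


# Hypothetical links standing in for (DOLPHIN1, DOLPHIN2) pairs.
links = [("D1", "D2"), ("D1", "D3"), ("D1", "D4"), ("D4", "D5")]
ranks = pagerank(links)
for dolphin, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(dolphin, round(value, 3))
```

The result has the same shape as the dataset produced by the data flow: one row per node with its rank. The actual values calculated by Oracle Analytics may differ.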

Visualization

We can begin the visualization exercise by visualizing the newly generated Dolphins Node Rank dataset.

We can immediately see which nodes are more important, i.e. which dolphins are influencers. However, we are talking about graphs, so a graph visualization is expected, isn’t it?

In order to present dolphin influencers on a graph, we need to join the ranks with the original dolphins social network. First, we need to add the original Dolphins dataset to this project.

The join between the two datasets is defined on the Source Column: DOLPHIN1 from the Dolphins dataset is matched with the Dolphin attribute from the Dolphins Node Rank dataset.
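In Oracle Analytics this join is defined in the user interface, but the logic behind it can be sketched in Python. The column names DOLPHIN1, DOLPHIN2, Dolphin and Rank come from the data flow above. The rows, the rank values and the helper `join_edges_with_rank` are invented for illustration only.

```python
# Hypothetical rows from the Dolphins dataset (pairwise links).
edges = [
    {"DOLPHIN1": "D1", "DOLPHIN2": "D2"},
    {"DOLPHIN1": "D1", "DOLPHIN2": "D3"},
    {"DOLPHIN1": "D4", "DOLPHIN2": "D5"},
]

# Hypothetical rows from the Dolphins Node Rank dataset; values are made up.
node_rank = [
    {"Dolphin": "D1", "Rank": 0.35},
    {"Dolphin": "D4", "Rank": 0.25},
]


def join_edges_with_rank(edges, node_rank):
    """Inner join: attach the rank of the source dolphin to every link."""
    rank_by_dolphin = {row["Dolphin"]: row["Rank"] for row in node_rank}
    return [
        {**edge, "Rank": rank_by_dolphin[edge["DOLPHIN1"]]}
        for edge in edges
        if edge["DOLPHIN1"] in rank_by_dolphin
    ]


joined = join_edges_with_rank(edges, node_rank)
for row in joined:
    print(row)
```

Each link now carries the rank of its source node, which is the information a network visualization needs to highlight the more important nodes.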

The next step is to return to the Visualize tab and add a new visualization object, Network 10K Plugin.

In the graph we can clearly see which nodes have a higher rank and are therefore more important.

Conclusion

Graph Analytics functions in Oracle Analytics are really easy to use. There is basically no coding required, and data flows are executed very efficiently. I did experience some performance issues when I visualized the network using Network 10K Plugin. With larger graphs (a couple of thousand nodes and edges), things get a bit slower and graph “loading” might take a bit more time. Still, these first out-of-the-box features, like Node Ranking, are really nice and easy to use, and I am convinced there are use cases which can be deployed in business analyses performed by business users and analysts.

Žiga Vaupot's Blog
