I had an opportunity to present the big data proof-of-concept project that we completed a couple of months ago for the Ministry of Interior of one of the countries in the Central and Eastern Europe region.
The project demonstrated potential use cases that could complement regular business intelligence and analytics solutions. Today I spoke about two use cases addressing business problems that the customer would like to resolve when faced with large (in most cases unstructured) data sets that contain valuable information. Of course, business problems of this type come with certain challenges, such as data availability. As we were engaged as an Oracle partner, the proof of concept we developed was based on the Oracle Big Data Management and Analytics platform.
There were two use cases we addressed:
- loading and processing unstructured data and
- predicting traffic accidents.
Looking back, I find this proof of concept really interesting, as it addressed both aspects of big data solutions: the Hadoop infrastructure and predictive analytics. The results of both use cases were later brought into Oracle Business Intelligence, where they were combined with the regular relational data and presented in dashboards using geospatial views.
Loading and processing of unstructured data focused on data retrieved from text files containing valuable information about traffic patrols. We are talking about thousands of documents that simply sat in file systems, untouched since they were stored there.
The solution is built around Hadoop, into which all files are imported. Using MapReduce, data is extracted and stored in Hive tables. There are two options from there: one can connect Oracle BI Server directly to Hive or, using Big Data SQL, access the data via Oracle Database. In either case, the new data sources can then easily be modelled in the BI Administration Tool and brought into the Oracle BI solution.
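To give a feel for the extraction step, here is a minimal sketch of the map and reduce logic in plain Python. It is not the code from the project: the report layout (`Date:`, `Location:`, `Violations:` lines), the sample documents, and all function names are assumptions made up for illustration, and on a real cluster the same logic would run as a MapReduce job over files in HDFS, with the output loaded into a Hive table.

```python
import re
from collections import defaultdict

# Hypothetical report layout; the real patrol documents are not described in the post.
FIELD_PATTERN = re.compile(r"^(Date|Location|Violations):\s*(.+)$", re.MULTILINE)

# Made-up sample documents, used only to exercise the functions below.
SAMPLE_REPORTS = [
    "Date: 2015-03-01\nLocation: Sector A\nViolations: 3\nNotes: routine patrol",
    "Date: 2015-03-02\nLocation: Sector B\nViolations: 1\n",
    "Date: 2015-03-05\nLocation: Sector A\nViolations: 2\n",
    "Free text with no recognizable fields",
]


def map_report(text):
    """Map step: parse one document and emit (location, violations) pairs."""
    fields = dict(FIELD_PATTERN.findall(text))
    # Documents that lack the expected fields emit nothing and are skipped.
    if "Location" in fields and "Violations" in fields:
        yield fields["Location"].strip(), int(fields["Violations"])


def reduce_by_location(pairs):
    """Reduce step: sum the emitted violation counts per location."""
    totals = defaultdict(int)
    for location, count in pairs:
        totals[location] += count
    return dict(totals)


def run_job(documents):
    """Run the map step over every document, then reduce all emitted pairs."""
    pairs = (pair for doc in documents for pair in map_report(doc))
    return reduce_by_location(pairs)


if __name__ == "__main__":
    for location, total in sorted(run_job(SAMPLE_REPORTS).items()):
        print(location, total)
```

Each row of the result (location, total) corresponds to what would become one row in a Hive table, which is then what the BI layer queries.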
Predicting traffic accidents is the other use case we prepared. For the predictions we used data from an existing road accidents database, which we combined with weather data we found on public weather portals. We were able to produce a model that performed quite well, but a model that could be used in practice would need more data. For a proof of concept, though, that was good enough.
For this we used Oracle R Enterprise, which is part of Oracle Advanced Analytics, an Oracle Database option. Oracle R Enterprise is Oracle's implementation of the powerful statistical language R, and it can use all resources available to Oracle Database, including database security.
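The general idea of the prediction use case, joining accident records with weather data and fitting a model on the result, can be sketched in a few lines. Note the assumptions: this is plain Python rather than Oracle R Enterprise, the model is a simple logistic regression chosen only for illustration (the post does not say which algorithm the project used), and the data, feature choices, and function names are all made up.

```python
import math

# Hypothetical toy data; the real accident and weather records are not in the post.
# date -> 1 if at least one accident was recorded that day, else 0
ACCIDENTS = {
    "2015-01-01": 1, "2015-01-02": 0, "2015-01-03": 1,
    "2015-01-04": 0, "2015-01-05": 1, "2015-01-06": 0,
}
# date -> (rainfall in mm, temperature in degrees Celsius)
WEATHER = {
    "2015-01-01": (12.0, -2.0), "2015-01-02": (0.0, 5.0),
    "2015-01-03": (8.0, 1.0), "2015-01-04": (0.5, 7.0),
    "2015-01-05": (15.0, -1.0), "2015-01-06": (0.0, 3.0),
    "2015-01-07": (3.0, 2.0),  # no accident record, dropped by the join
}


def features(rain_mm, temp_c):
    """Feature vector: bias term, scaled rainfall, freezing indicator."""
    return [1.0, rain_mm / 10.0, 1.0 if temp_c <= 0 else 0.0]


def join_on_date(accidents, weather):
    """Combine both sources, keeping only days present in both."""
    rows = []
    for day, label in sorted(accidents.items()):
        if day in weather:
            rows.append((features(*weather[day]), label))
    return rows


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit(rows, learning_rate=0.5, epochs=2000):
    """Fit logistic regression weights with batch gradient descent."""
    weights = [0.0] * len(rows[0][0])
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for x, y in rows:
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
            for i, xi in enumerate(x):
                gradient[i] += error * xi
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, gradient)]
    return weights


def predict(weights, rain_mm, temp_c):
    """Estimated probability of an accident day for the given weather."""
    x = features(rain_mm, temp_c)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


if __name__ == "__main__":
    model = fit(join_on_date(ACCIDENTS, WEATHER))
    print("wet and freezing:", round(predict(model, 14.0, -3.0), 3))
    print("dry and mild:", round(predict(model, 0.0, 6.0), 3))
```

With only six toy rows the numbers mean nothing in themselves; the point is the shape of the pipeline. It also shows why more data matters: a handful of days separates perfectly by rainfall, which says little about how the model would behave on real traffic.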