What is Big Data? Discussion of this topic, ever more prominent on the international stage, is inclined to stumble over the definition.
Critics’ implication is often that, given the relatively small size of most New Zealand businesses, we don’t encounter “big data” in its true sense.
Jim Davis, chief marketing officer at business analytics leader SAS, defines big data as “data that exceeds the processing capability of conventional database management systems”.
Data of this volume and complexity can stretch processing capacity even in a small company, he says, particularly when unstructured and semi-structured data such as text documents, emails and worldwide social networking input is thrown into the mix.
Even in New Zealand, the accepted way of providing “business intelligence” is to extract a sample of data and reformat it into a structure suitable for online analytical processing – the so-called OLAP cube. Creating such a cube is often an overnight process and this limits the number of times BI routines can be run on fresh data.
With its “high-performance analytics” platform, says Davis, SAS can work through “billions of [database table] rows in a few seconds”, working with the full raw data, not an extract. Practical demonstrations by SAS technical expert Justin Choy at a recent Wellington presentation appeared to bear this out.
Davis distinguishes between conventional business intelligence, which concentrates on specific standard reports and the occasional ad hoc report that takes time to produce, and true business analytics.
Business intelligence is reactive, he says, whereas business analytics can be used for proactive moves: a business can do detailed modelling and forecasting. Carefully planned analytics should be able to identify, for example, the customers who are thinking of changing suppliers, so they can be engaged with offers and discussions that improve the chances of keeping them on board.
“With business analytics we construct virtual cubes, defining our dimensions on-the-fly.”
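The idea of a cube whose dimensions are chosen at query time, rather than fixed in an overnight build, can be illustrated with a minimal Python sketch. This is not SAS's implementation; the rows, field names and the `virtual_cube` function are hypothetical, and a real platform would spread this work across many processors in memory.

```python
from collections import defaultdict

# Hypothetical raw transaction rows; the field names are illustrative only.
ROWS = [
    {"region": "Wellington", "product": "A", "quarter": "Q1", "sales": 120.0},
    {"region": "Wellington", "product": "B", "quarter": "Q1", "sales": 80.0},
    {"region": "Auckland", "product": "A", "quarter": "Q1", "sales": 200.0},
    {"region": "Auckland", "product": "A", "quarter": "Q2", "sales": 150.0},
    {"region": "Wellington", "product": "A", "quarter": "Q2", "sales": 50.0},
]


def virtual_cube(rows, dimensions, measure):
    """Sum `measure` over the raw rows, grouped by dimensions chosen at query time."""
    totals = defaultdict(float)
    for row in rows:
        # The grouping key is built from whichever dimensions the caller asked for.
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# The same raw rows answer two differently shaped questions with no rebuild.
by_region = virtual_cube(ROWS, ["region"], "sales")
by_region_quarter = virtual_cube(ROWS, ["region", "quarter"], "sales")
print(by_region)
print(by_region_quarter)
```

Because nothing is pre-aggregated, adding a new dimension to the question only means passing a different list; the cost is that every query scans the full raw data, which is why this approach depends on fast in-memory processing.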
SAS introduced “in-database processing” to avoid the time-lag of extracting data from the database for analytical routines to work on. Companies such as Teradata and Greenplum employ this to produce valuable business intelligence, but in the face of steadily increasing data volume even this will not be enough, says Davis. “We are seeing limitations in relational database management systems. We can no longer expect to run these models and do additional analytic processing in a relational database environment; it just won’t work.” Processes based on RDBMSs and their query language, SQL, will deliver business intelligence but not full business analytics.
You cannot do predictive modelling, optimisation, forecasting by statistical regression analysis and other business analytics tasks properly using SQL, he maintains: “It’s just impossible.”
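For readers unfamiliar with the term, forecasting by regression means fitting a trend to past figures and extending it forward. The sketch below fits a straight line by ordinary least squares in plain Python; it is a toy illustration, not one of SAS's routines, and the monthly sales figures and the `fit_line` helper are invented for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together around their means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly sales figures (month number, sales).
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept  # extend the fitted trend to month 6
print(slope, intercept, forecast)
```

Real business models involve many variables and far more rows than this, which is where the processing-time arguments in this article come in.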
SAS is now practising high-performance “in-memory” analytics, based on a server built of commodity hardware blades, each containing a powerful Intel processor and working in parallel. Data sufficient to build a BA model can be moved from storage to such a blade server in five to six seconds, he says. Data streaming can be set up, to provide continuously current data.
“We’re committed to this as a platform,” Davis says. “This is not just a new product; this is something very big for SAS and for our partners.”
All the analytics routines SAS has built up over decades will be implemented on the new high-performance platform.
SAS is not only running its generalised analytics routines on the high-performance platform; it has also produced vertical solutions.
“Over the last 10 years, we’ve realised it’s very important to focus in on problems that are specific to particular industries. So we’ve taken our [vertical] solutions and pushed those into this platform – we can do retail planning, revenue optimisation, marketing optimisation, stress testing, liquidity risk management, a very long list.
“We were looking at risk calculations for a very large financial organisation in Asia recently – it took 18 to 20 hours [using conventional database techniques]. With in-memory analytics you see the results in 10 to 15 minutes,” Davis claims.
A risk calculation does not necessarily involve “big data”, he says, “but it does take time”.
However, another argument against spending the money on high-performance analytics is “I don’t need results that quickly”. The question to ask, Davis says, is “what would you do with the hours saved?” One obvious answer is to run the routines again with adjusted parameters to do deeper analysis and come up with better, more fully considered answers.
A logical counterpart to efficient processing of large or complex data is visualisation, which renders a mass of data easily comprehensible. SAS has devised a new set of visualisation routines that allow analysis based on large amounts of data to be presented in readily appreciated ways and, equally important, to be distributed easily to staff with smartphones and tablets, who can suggest adjustments to parameters and new ways of looking at the results.
Many minds can then quickly be brought to bear on “exploring the data, turning it upside down” and coming up with even more potentially useful results, Choy says.