CIO Upfront: 'Big data is too important to leave to data scientists'
- 03 June, 2015 06:00
I'm just waiting for a probabilistic demonstration of the long-rumoured (but never before established!) correlation between sales of beer and nappies.
I talk at a lot of these conferences and I see a lot of this stuff. And while it’s true that most of the presenters have vastly more compelling graphics than me (not to mention hipper glasses and cooler t-shirts), I for one don’t care. And neither do my customers.
This isn't because the customers we work with don't have big data. We're a data company: all of our customers are managing data; many are grappling with big data issues, yet few of the people we sell to are card-carrying data scientists.
Put simply, our customers – and tens of thousands like them – are still dealing with the mundane problems that someone has to worry about. They're the people trying to put together consistent reports on sales – and this is before they start to tackle the problem of preparing their data for analysis: i.e., of identifying the right data, putting it into the right format, and bringing it together – in the right place, at the right time. These are the Hard Problems of data engineering that are a prelude to any analytical wizardry.
The big data initiatives I talk to customers about tend to be on the edge. They are experiments and projects and, with the exception of some startup online businesses, not the critical data of the organisation. If a company really needs to rely on the results of a query or analysis – for example, when corporate executives could end up imprisoned if the information they've provided is incorrect or incomplete – they're more concerned with issues of data accuracy and provenance; they're concerned with cleansed, vetted, standardised data that provides a baseline – an unshakeable ground – for day-to-day business decision-making. This isn't big data.
For this reason, the data scientists are not on the critical path.
And because they are not on the critical path, they are given a pass of sorts. Not a free ride, per se; just an easier ride: they are allowed to take shortcuts.
For example, instead of systematically cleansing source data (i.e., matching, de-duping, and standardising it so that it's reliably consistent), the data scientist can tailor a data set to suit his or her own (highly idiosyncratic) hypotheses.
The data scientists also enjoy that most exquisite of perquisites: their employers don't necessarily expect results. Unlike their counterparts in marketing or other operational contexts, the budget they are allocated isn't contingent on the “success” of their project. If the results of their analyses come up empty – e.g., if a correlation can't be established, if a critical insight can't be identified and operationalised – no biggie: it's only a science experiment, right?
Which brings me to my Big Reveal: big data is too important – too potentially transformative – to leave to the data scientists and their science “experiments”. Data science isn't in any sense synonymous with big data; data science is concerned with “big” and “small” data – and the data scientists ignore small data at their peril. The category of big data should neither be rarefied as a highly specialised domain nor ghettoised as a Babel of inconsistent or impossible-to-standardise data types.
That's where we as data people come in. All of us – data warehouse developers, business analysts, and marketers – need to embrace big data. Yes, it can be put to highly specialised uses, but it can also be used to augment or enrich day-to-day decision making. And to the extent it's used in conjunction with day-to-day decision making, it must be governed: standardised, cleansed, and controlled.
You know, the boring stuff they don't talk about at big data conferences.
We need to embrace big data as inevitable and valuable. First, because it's cheap: big data technologies are priced to move – like all gateway drugs! – with absurdly low initial costs of entry. (And as we all know, cheap is a great place for any disruptive technology to be.) It doesn't matter that the big data adoption model transfers costs from external vendors (that collectively deliver software stacks that work) to internal resources that are trying to replicate these existing solutions – in most cases, by building their big-data equivalents from scratch.
This is early adopter behaviour that will not last for ever. With our help, we can demonstrate that big data technologies deliver difference-making value and ROI when they're used to complement or extend the data warehouse and traditional databases – or (and this is where data scientists come in) to address a range of entirely new data management and analytic use cases for which those established technologies simply were not designed.
One aspect of this is a new ability to get value from data the industry has traditionally struggled with. Most data warehouses are optimised to deal with static, structured data. Big data technologies are ideal for semi-structured and streaming data. For some industries, the ability to store, manage, and analyse this data will result in valuable new insights. At this point, it doesn't matter that the big data emperor doesn't have much in the way of clothing – i.e., that lots of big data promises are hollow; this, too, is an aspect of early adoption. It's the way that markets work. It's how many of us in data management first wedged our own feet in the door 20 years ago.
Speaking of which, the data industry needs to embrace big data as an opportunity to deliver on its own forgotten promises, from “better access to data” to “better decision making” to “centralised data management” and the tantalising promise of ubiquitous access to data. Big data enables us to reset expectations; with the right mindset, the right approach, and the right technologies, it gives us a totally new context in which to deliver value from data.
If we don’t take the big data bull by the horns, we're all too familiar with the likely endgame. We know the plot to this story, because we've been through it before.
There was a time when the data warehouse promised to change the world. The industry mucked it up, but we're wiser for the experience. We have much to teach the big data parvenu – starting with the fact that big data is too damn important to be left to the data scientists.