Crystal clear

Structured data is potential profit: you can act on it because its meaning is clear. But there is also an ocean of unstructured data, so how do you get at it to unleash its potential?

Today's business is awash with raw, unstructured data. Every day a tidal wave that could have been turned into actionable and profitable information crashes onto the business' shoreline - email servers - and goes to waste. Unstructured data - typically anything that is not information relating to a specific business transaction - is often treated as the lowest level of data, unappreciated as the business enabler it really is.

Yet the ocean of unstructured data held in email files and, hopefully, enterprise storage systems could be as strategically valuable as anything retained on a customer relationship management (CRM) system - but ultimately only as long as it is properly managed.

Unfortunately, crucial business data is typically stored in email and routinely deleted - information that could contribute to future product sales or greater customer retention, or that might guard against various compliance risks.

Worse, it can simply be dumped onto a storage network and left unsifted, unsorted and unusable.

A research note released late last year by Kevin McIsaac of analysts Intelligent Business Research Services (IBRS), based on a survey of 81 Australian and New Zealand organisations, found 60 per cent of respondents believed email represented one of their top data-management problems.

McIsaac says that increasingly, the email server is being treated as an enormous database and IT is left to deal with the problem.

The most common practice for curbing email-storage growth was enforcing mailbox quotas; 45 per cent of respondents did so. Users then complain, and McIsaac says this has a flow-on effect that makes matters worse.

"What users then typically do is take their emails and archive them either on their local computer, networked or shared drive," McIsaac says.

"This is then completely unmanaged, the emails cannot be searched and the worst thing that happens is the message, which was originally sent to 10 people, is then stored 10 different times in 10 different locations all on the one shared resource."
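The duplication McIsaac describes is what single-instance storage is meant to avoid. The following is only a minimal sketch, assuming each message is available as raw bytes and that duplicate copies are byte-for-byte identical (real copies often differ in delivery headers, so an actual archiving tool would need to normalise them first). The function name `deduplicate` and the sample messages are hypothetical.

```python
import hashlib


def deduplicate(messages):
    """Keep one copy of each distinct message, keyed by its SHA-256 digest.

    `messages` is an iterable of raw message bytes. Returns a dict mapping
    digest -> message bytes, so identical copies occupy a single slot.
    """
    store = {}
    for raw in messages:
        digest = hashlib.sha256(raw).hexdigest()
        # setdefault stores the message only the first time a digest is seen.
        store.setdefault(digest, raw)
    return store


# Hypothetical data: one message archived by ten recipients, plus one other.
copies = [b"Subject: Q3 pricing\n\nSee attached."] * 10
copies.append(b"Subject: Lunch\n\nNoon?")
unique = deduplicate(copies)
print(len(copies), "stored copies ->", len(unique), "unique messages")
```

Keying on a content hash means the store grows with the number of distinct messages rather than the number of recipients who saved them.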

The most common means of managing unstructured data in the form of email, according to McIsaac, is implementing an email-archival tool that a central administrator can use to find and archive every email ever received and stored by employees, and which in turn creates an email inbox of unlimited size.

But savvy chief information officers are already asking questions about the use of the data; what about the idea of a CRM tool embedded in an email client?

If efforts to stop losing critical data in an email system, or to make better use of it, stall at the user level, perhaps the email server could pull its own weight and ease the load.

McIsaac says the flipside of that scenario is archiving email and then performing a name query on the email archive. But that is getting well ahead of today's problem.

There is a lot of talk about how to capture and manage unstructured data, but very little effective discussion on how to track such unstructured data.

Research today shows that IT has simply not done a good job of capturing transactional data.

"Imagine if a supplier and purchaser discussed a deal via email but signed the contract in person? Where is the initial pre-discussion and where is the system of record held? What happens when someone says they had a disagreement? All you have to back up the product purchase of intent is the invoice," McIsaac says.

McIsaac says individual business units should own the data and be charged for over-storage by the megabyte. For example, if email-storage costs exceed $500,000 annually for 30 sales staff, the CIO could talk about ways of reducing costs - thereby shifting the business towards the archiving approach.
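A per-megabyte charge of this kind is simple arithmetic. The sketch below is illustrative only: the quota, the rate and the usage figures are assumptions made up for the example, not numbers from the IBRS research, and amounts are kept in whole cents to avoid rounding surprises.

```python
def overage_charge(used_mb, quota_mb, rate_cents_per_mb):
    """Return the charge, in cents, for storage used beyond the quota.

    Usage at or below the quota costs nothing; the charge is never negative.
    """
    over_mb = max(0, used_mb - quota_mb)
    return over_mb * rate_cents_per_mb


# Hypothetical figures: a 2,000 MB quota per business unit, 5 cents per extra MB.
usage_by_unit = {"sales": 12_000, "finance": 1_500}
for unit, used_mb in usage_by_unit.items():
    cents = overage_charge(used_mb, quota_mb=2_000, rate_cents_per_mb=5)
    print(f"{unit}: ${cents / 100:.2f}")
```

Billing only the excess gives each business unit a visible cost for hoarding without penalising ordinary use.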

Australian superannuation administrator Superpartners sees effective information capture and business growth as symbiotic.

Superpartners has taken a process-based approach to managing its incoming data, following a three-stage information lifecycle management (ILM) project starting in late May 2007. The company first designed its internal data structures and then went to tender for more storage equipment to aid its data structures. The tender is still out.

Superpartners' infrastructure manager, Alex Siggins, says the project aimed to build a structured process for retaining information stored in different channels such as static documents and spreadsheets.

Superannuation organisations need to recall old documents, which could be anything such as letters to clients, or documents called up through basic image retrieval. Call centres also produce mountains of data that need to be captured and organised. Siggins says his main concern is knowing exactly where the data is.

Superpartners' executive manager, technology and innovation, Gary Evans, describes the entire project as being geared to changing the way data is used by the organisation.

"It is just opportune we are coming to the point to refresh our storage anyway. Our business needs have changed and we need to put in business intelligence software over our data, and cater for other data retrieval needs," Evans says.

"We implemented an information lifecycle management process in trials all over our systems. We looked at our data, where our snapshots were stored and how it was retrieved, how we use it as a business function, and then the integrator (HP) came back with a report on what changes need to happen.

"The whole project is around changing data and how we use it - one factor is data retrieval and how it is structured and the other factor is how it is stored to enable this."

Some in the consultancy game believe regulatory regimes such as Sarbanes-Oxley (a United States federal law enacted in 2002 as a response to a number of major corporate and accounting scandals) are focusing attention on the need to manage unstructured data. Some believe the rise of desktop categorisation, document and core network-search tools has helped in classifying data, whether it is lying dormant on a local storage drive as a Word document, a PDF or a spreadsheet.

Linking it all together through relevance is the most important part to get right, according to Altis Consulting director Gavin Cooke.

Altis Consulting was lead consultant in a recently completed agency-wide data-structure "re-jig" project for an Australian state police force. It implemented text-mining software and resulting processes as well as "groovy smarts" for contextual analysis of a wide range of data points. All the data smarts developed by Altis for the agency are oriented towards linking seemingly unrelated pieces of data to the "bigger picture", such as witness statements.

Cooke says the biggest users of unstructured data are security-oriented organisations such as the police - those who can benefit the most from extracting context and meaning behind an event, and establishing the identities which link event information.

A better internal data structure, in this instance, allows the police to pull out meanings of statements a witness gave and instantly build a scenario of context and meaning beyond the actual incident.

"Most law enforcement agencies have an incident-reporting system where someone reports their car stolen and the police officer on the other end of the phone keys in the individual fields on the system, but on the bottom of that electronic document is a free-format text-entry system," Cooke says.

"Once upon a time the poor clerk, or the poor constable, had to key in all this information. Now they just type all information into the free-format field to get an event number, then off they go."

Altis principal consultant Peter Hopwood was involved in similar introductions of data mining and data-process "re-engineering" for law enforcement in Britain. He says such considerations when managing unstructured data are directly transferable to enterprise firms the world over.

Hopwood believes the key challenge for law enforcement is dealing with slang and colloquialisms in data and that this is directly comparable to the challenges faced by corporates when dealing with customer names and languages in many dialects. "In the UK, I worked on developing and implementing a colloquialism translator that matches names with regions, regional dialects and spelling, which had a very similar project plan in ensuring finding relevant facts in embedded documents," he says.

Hopwood says that, to match their actual deeds to strategy, organisations are looking at structuring data from a variety of sources.

A lot of the unstructured data in this case comes from feedback from customer call centres and information buried in financial results, which are not easy to tie into progress and link back to original strategy without having a strict data structure.

Hopwood says organisations have typically applied some structure to internal data processes by purchasing KPI-styled benchmarking software that tags user performance to specific activities - such as keying in data correctly.

Event numbers and linked crime-incident databases are a world apart from the issues faced by Superpartners, which had to develop and work from a standard in order to get on top of incoming data and give it a relevant structure - hence the introduction of ILM processes.

Superpartners' Siggins says documented processes are a must, and an important part of the recent project was nutting out these processes before tendering for any future storage needs.

"As the information grows and business grows, data becomes an area that is very difficult to track ... The ILM strategy was developed because we have had phenomenal growth over the past three years as legislation changes in the industry - specifically around the introduction of choice of super funds - has seen substantial increases in the volume of transactions and activity," Siggins says.

Evans says an important part of the project is to replace storage systems with cheaper archiving alternatives and implement data-integrity checks to ensure all data is retrievable in a short time. (Superpartners uses a NAS and SAN array provided by EMC.)

"The project crosses over into looking at storage in a better way ... putting data that does not need quick recall onto cheaper storage mechanisms. We are looking at appropriate spend for appropriate use.

"The ILM project began May 2007 and is being done in three stages - first we designed the entire strategy, then designed the data set-ups then recently went to tender for the storage set-ups - next is the storage implementation."
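The idea of moving data that does not need quick recall onto cheaper storage can be expressed as a simple age-based rule. This is a minimal sketch of one possible policy, not Superpartners' actual process: the function name `choose_tier`, the tier labels and the 365-day threshold are all assumptions chosen for illustration.

```python
from datetime import date, timedelta


def choose_tier(last_accessed, today, archive_after_days=365):
    """Return 'archive' for data untouched for longer than the threshold,
    otherwise 'primary'. The 365-day default is an illustrative assumption.
    """
    age = today - last_accessed
    return "archive" if age > timedelta(days=archive_after_days) else "primary"


today = date(2008, 1, 1)  # hypothetical reference date
print(choose_tier(date(2005, 6, 30), today))
print(choose_tier(date(2007, 12, 1), today))
```

A real ILM policy would probably weigh more than last-access date, such as retention obligations and how quickly a document must be retrievable, but the threshold rule shows the basic trade of recall speed against storage cost.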

Control information overload

1. Reconsider established processes from the ground up.

2. Educate employees on the value of information in email.

3. Consider long-term storage costs, and archive to appropriate media.

4. Implement process-based procedures to aid data transfer.

5. Consider network load when storing on a shared drive.
