Where are all these files coming from?

You might be surprised to discover just how much of your business data is stored in widely accessible files on shared network storage. At a time when companies are focused on building data warehouses and squeezing as much business intelligence out of their databases as possible, files actually make up as much as 80% of business data, according to market analyst firm IDC. That is a staggering number, and enough to make you wonder: Where is all this unstructured data coming from, and is it relevant business data?

The answer is that file data is unquestionably relevant, as well as valuable. Not only is it valued by businesses and regulators, but it is prized by malicious insiders. For example, last July, a former Goldman Sachs worker was arrested for downloading software source code with the intention of taking it with him to a new employer. And, earlier in 2009, a Microsoft employee was accused of stealing documents he planned to use in a lawsuit against the company. More recently, in March 2010, the Canada Revenue Agency, Canada's equivalent to the US Internal Revenue Service, disclosed that CRA employees had been accessing hundreds of files and using the information for everything from financial gain to providing preferential treatment for friends and relatives.

And, as noted, regulators are also concerned with file-based data. Regulations that address data security--such as Sarbanes-Oxley (SOX), Payment Card Industry Data Security Standard (PCI DSS), Health Insurance Portability and Accountability Act (HIPAA) and others--do not limit their scope based on data format. They apply equally to files, databases, applications, etc. So, even if an organization uses financial applications and databases to manage its finances, when financial data that is governed by SOX is exported to a spreadsheet for manipulation, the handling of the spreadsheet must also comply.

To answer the question of where all this valuable file data is coming from, here's a quick checklist of sources to consider as you survey your own file data landscape, as well as thoughts on protecting these files.

Business applications and databases

Whether your business applications and databases are running in-house or in the cloud, mid-level managers are probably using them to export interesting data for analysis, reporting, presentations and other legitimate business activities. For example, six months of sales data from your account can be invaluable in assessing sales trends and diagnosing operational issues. But, when the resulting spreadsheets, documents and presentations containing exported information are stored on shared file systems for enhanced communications and collaboration, you have a data security risk that needs to be mitigated. And, if that data is financial in nature, includes credit card information or has customer details, you may also have a SOX, PCI or personally identifiable information (PII)-related compliance issue to address.

Intellectual works

Plenty of file data is never stored in a database or application, but goes straight from the mind of knowledge workers into a file. Software source code is an obvious example, as are legal documents, product roadmaps and strategic planning documents. These files often contain intellectual property and a wealth of information and rich detail about market opportunities, partnerships, business operations, future plans and strategic advantage. Sharing this information on file servers and network attached storage devices can be critical for mobilizing your company and uniting distributed project teams, but it's just as critical to ensure that the data is protected from intentional or even inadvertent harm.

Application communication and storage

When applications need to communicate with each other, but don't speak a common language, using intermediate files on a shared file system can serve as a form of enterprise application integration. For example, a bank with a legacy application running on a mainframe, and another banking application running on Microsoft servers, can use files on a shared file server or NAS device to exchange information between the disparate systems. While only the applications should have access to those shared files, it's highly likely that the file servers or NAS devices where the files are stored are accessible by many users. So, care has to be taken to safeguard access and prevent sensitive data from being compromised.

An even more basic, and more common, use of shared file systems by applications is when applications simply store their output or intermediate results in files. Business applications can generate a lot of file data, and once this application-generated file data exists on shared storage, it needs to be protected against excessive access.

Digital media

No, we're not talking about employees who store their movies and music on your enterprise file servers. Instead, think: digital recordings of calls to your customer service representatives and telesales team, video from security cameras, and even training and education materials such as podcasts and videos. Media files can be large, and when they are generated through ongoing business operations--like contact center recordings and surveillance videos--there can be a lot of them. If, for example, your business is processing pharmacy refills or purchases made with credit cards, your media files are governed by regulations such as HIPAA and PCI, and need to be protected. Similarly, you will want to make sure only those with a business need-to-know can access your surveillance video.

Informal business processes

Files are sometimes just more practical, functional or convenient than formal systems. For example, despite the widespread deployment of customer contact center software, your customer representatives may keep documents or spreadsheets to track "hot" cases, details that don't fit in standard forms, or other information they want to have readily at hand. These types of informal business process files are often stored on shared file systems so that teams can communicate across work shifts and geographies. While these files facilitate more efficient business, they can expose sensitive or regulated data to too many users, depending on the nature of your business.


Valuable file data on shared file systems is plentiful in most organizations and comes from a number of sources, including applications, databases, knowledge workers, digital media and ad-hoc business processes. It's got obvious value to your business, and regulators and auditors recognize its value too. Unfortunately, if you're harboring any malicious insiders, they're also coveting this data. That's reason enough for you to spend some time getting a handle on who has access to your file data, which users are actually using it, who owns it, and how to ensure that access is based on a business need-to-know.
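As a starting point for that kind of review, here is a minimal sketch in Python of one way to list files that are readable by every local account, along with who owns them. It is an illustration under stated assumptions, not a complete audit: the function name find_broadly_readable is hypothetical, it inspects only POSIX permission bits, and it does not evaluate Windows or NAS access control lists or tell you who is actually opening the files.

```python
import os
import stat
from pathlib import Path


def find_broadly_readable(root):
    """Return (path, owner_uid, mode) for regular files under root
    whose POSIX mode grants read access to "other" users."""
    findings = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.lstat()  # do not follow symbolic links
            except OSError:
                continue  # file disappeared or cannot be inspected; skip it
            if not stat.S_ISREG(info.st_mode):
                continue  # ignore links, sockets and other special files
            if info.st_mode & stat.S_IROTH:
                # The "other" read bit is set, so any local account can read it.
                findings.append(
                    (str(path), info.st_uid, stat.filemode(info.st_mode))
                )
    return sorted(findings)
```

Pointing the function at a mounted share (any path you choose, for example a hypothetical /srv/shared) returns a sorted list you can hand to the data owners for review. The owner is reported as a numeric user ID so the sketch stays portable; mapping it to a name is left to your directory service.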

Raphael Reich is Director of Product Marketing at Imperva. Prior to joining Imperva, he held senior positions in product management and product marketing at Varonis, Cisco, Check Point, Echelon and Network General. Additionally, Reich was a software engineer at Digital Equipment Corporation. He has over twenty years of business experience and holds a bachelor's degree in computer science from UC Santa Cruz and an MBA from UCLA.
