Orson Welles’ 1941 classic Citizen Kane, regarded by many as the best film of all time, features a party scene in which the hat worn by an extra disappears and reappears in subsequent shots. Films generally have storyboards, scripts and continuity managers. Businesses often aren’t so fortunate, and the consequences of a lack of business continuity planning (BCP) can be severe, even catastrophic. Perhaps the starkest example is Swedish mobile phone company Ericsson. In March 2000, lightning caused what appeared to be a relatively minor 10-minute fire at a New Mexico chip-making plant belonging to Dutch firm Philips. But in a chip-making environment that must be clinically clean and dust-free, that fire was catastrophic. After failing to find an alternative supplier, Ericsson reported a loss of over $2 billion in its mobile phone division for the year, and the Economist dismissed the former market leader as an “also-ran”.
Business continuity plans are also designed to help organisations protect themselves from the losses to infrastructure and resources caused by earthquakes, extreme weather, other natural disasters, pandemics and terrorism. It is important to distinguish between BCP and disaster recovery (DR). BCP is a plan that takes into account your resources, processes and technology in the event of downtime, a disaster or an emergency, whereas DR is the underlying technology component that determines how systems fail over.
For this special report, CIO sought BCP advice from the following CIOs, suppliers and IT experts:
• Ron Murray, business manager, ICT outsourcing, Gen-i
• David Reason, senior consultant, internal business continuity manager and business continuity management practice leader, Gen-i
• Ian Scott, IT manager, NZ Fire Service
• Nats Subramanian, group IT manager, Click Clack
• Russell Turner, chief information officer, NZ MetService
1. Investigate the risks you can insure against
Risk mitigation should be part of your business continuity plan; sometimes insurance really is the best policy.
Following the 11 September 2001 attacks, US companies received $9.5 billion in insurance payments for business interruption, the Economist reported in 2005. Bruce Schneier, chief technology officer of BT Counterpane, predicts the insurance industry will eventually subsume the computer security industry. “Not that insurance companies will start marketing security products, but rather that the kind of firewall you use, along with the kind of authentication scheme, the kind of operating system and the kind of network monitoring scheme you use, will be strongly influenced by the constraints of insurance.”
The MetService’s forecasting capabilities are most in demand when the business is most vulnerable. CIO Russell Turner agrees it makes business sense to separate insurable from non-insurable risks and to focus continuity on those that are not insurable. “There’s no point in us insuring against the risk of a potential loss of life because of a storm grounding a ship,” says Turner. “But for a purely financially driven business, one not driven by the protection of life and property, it might be reasonable to forgo complicated BCP setups and insure for the loss of business.”
Click Clack is among New Zealand’s top 50 manufacturing exporters. The company has established itself in the design and manufacture of airtight storage and beverageware. Group IT manager Nats Subramanian works on the principle that BCP itself is an insurance policy. “It may seem to attract unwarranted cost and effort, in addition to the day-to-day challenges and priorities,” he says. “However, an organisation must stand back and ask ‘what if?’ The next key question to ask is, ‘How much risk are we willing to accept?’”
Ian Scott, the New Zealand Fire Service’s IT manager, agrees that a continuity plan with a DR site is an insurance policy, particularly against a catastrophic failure of the production environment at the Service’s national headquarters.
David Reason is a senior consultant and Gen-i’s internal business continuity manager. He says organisations often fail to treat continuity management as an overarching business safeguard. “BCM has so many components to it: risk; security; business continuity planning; IT disaster recovery (ITDR) planning; business interruption; an insurance component; as well as crisis management planning. When I read reports, BCP and ITDR are often used interchangeably, which is a confusion of terminology.”
2. Make BCP central to the organisational strategic plan
Don’t let business continuity languish in the pages of a plan the rest of the business will never read.
The New Zealand Fire Service Commission sets the direction of the Service’s IT and this is reflected in its strategic plan. The Fire Service information systems strategic plan (ISSP) has an initiative relating specifically to BCP: “Develop and maintain disaster recovery and business continuity processes for critical Fire Service systems and data.” Scott says one of the challenges for organisations is sustaining a commitment to the continuity plan to ensure it’s kept operational through regular testing. “Once the euphoria of getting the DR system up and running is over, complacency can set in,” he warns.
Ron Murray is business manager of ICT outsourcing at Gen-i. His focus is the company’s largest customers, organisations such as ANZ, Fonterra and Westpac, as well as a number of government departments. A BCP has to sit within the overall business strategic plan, not be buried in the information systems strategic plan, says Murray. “We would recommend it sits at a very high level in the business plan, where it has the visibility of all the business owners and that they go through a robust business continuity management process that takes into account the four key areas: asset protection; business compliance and regulation; legal protection; and protecting yourself against financial loss.”
3. Make sure your CEO and executive committee grasp the significance of BCP
Capture the attention of the board with numbers that illustrate the effect of downtime, but keep it real.
As Reason of Gen-i says, “There’s nothing like disasters to get the attention of decision-makers.” Of course, it’s better to avoid the actual thing and grab their attention via a realistic test scenario.
Turner of the MetService says the best way to focus the executive on BCP is through risk analysis. “We went through that exercise of looking at it from a numerical point of view: what’s the probability of this disaster occurring and what would the financial losses be?” But don’t overestimate the dollar impact, he cautions. “Show what the probabilities of certain events are and focus on the most likely things, such as a chemical spill in the building that means you can’t work in it for 12 hours.”
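The numerical exercise Turner describes is often expressed as an expected annual loss: the estimated probability of an event in a given year multiplied by the loss if it happens. The sketch below is a minimal illustration under that assumption. The `Scenario` class, the `rank_by_expected_loss` function and every figure are hypothetical; only the one-in-50-year fire probability echoes an estimate quoted later in this article, and none of the dollar amounts are MetService data.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    annual_probability: float  # estimated chance the event occurs in a given year
    financial_loss: float      # estimated loss in dollars if it does occur

    @property
    def expected_annual_loss(self) -> float:
        # Expected annual loss = probability of occurrence x financial impact
        return self.annual_probability * self.financial_loss


def rank_by_expected_loss(scenarios: list[Scenario]) -> list[Scenario]:
    """Return scenarios ordered from highest to lowest expected annual loss."""
    return sorted(scenarios, key=lambda s: s.expected_annual_loss, reverse=True)


# Hypothetical figures for illustration only.
scenarios = [
    Scenario("Chemical spill, building closed for 12 hours", 0.5, 50_000),
    Scenario("Major fire, building evacuated for more than 24 hours", 0.02, 2_000_000),
    Scenario("Regional disaster", 0.002, 5_000_000),
]

ranked = rank_by_expected_loss(scenarios)
for s in ranked:
    print(f"{s.name}: ${s.expected_annual_loss:,.0f} per year")
```

A single expected-loss figure can understate rare but severe events, so a ranking like this is best treated as a starting point for the executive discussion rather than a final answer.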
Subramanian of Click Clack agrees the way to win executive buy-in is to approach them with figures showing downtime and how much money the organisation would lose. “Having presented the problem on a higher level, propose a couple of good options with a cost estimate, without flooding them with too many solutions,” he says.
Scott of the Fire Service points out that emotional blackmail also can be effective. “Paint the scenario of the Exchange server being down for an indefinite period. That will quickly convince everyone, particularly the executives, that they need a BCP.”
4. Make sure you have manual process contingencies
A lot of the information organisations need in a disaster is held in electronic form and may not be accessible.
Nowadays, even internal telephone directories may be on PCs with no paper backup, says Turner of the MetService, and the fact that such things are rarely thought about makes many organisations all the more vulnerable in an emergency. “It’s a stark reminder that there are few things that operate in most offices and businesses that don’t have some electronic infrastructure behind them.”
Murray of Gen-i says it’s worth weighing the cost of a fully redundant DR environment against reverting to manual processes where that would be relatively seamless. But in some cases, he acknowledges, organisations have no paper processes to fall back on. “I know of a business that was almost brought to its knees because its despatch depot had just one printer that broke down, and there were 20 or 30 trucks lined up. We very quickly implemented a manual workaround for that customer that at least keeps the trucks rolling.”
5. Assess risks to your assets and infrastructure
A disaster would affect not only your information systems; other assets might also be affected.
Turner says the MetService has consulted the New Zealand Fire Service to determine the probability of a fire in its headquarters, and discussed with seismic engineers the effect of an earthquake on the building. “We determined there was about a one-in-50-year chance of a fire of significant enough size that it would require evacuation of the building for more than 24 hours.” This would require MetService BCP processes to be invoked, because it cannot risk a 24-hour interruption to its forecasting operations.
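A one-in-50-year chance can also be expressed as a probability over a planning horizon. The worked sketch below assumes each year carries the same independent 2% chance of such a fire (a simplification) and uses an illustrative 10-year horizon chosen here, not one cited by the MetService.

```latex
\begin{aligned}
p &= \tfrac{1}{50} = 0.02 \\
P(\text{at least one such fire in } n \text{ years}) &= 1 - (1 - p)^{n} \\
P(\text{at least one such fire in 10 years}) &= 1 - 0.98^{10} \approx 1 - 0.817 = 0.183
\end{aligned}
```

On these assumptions, an event that sounds remote in any single year has roughly an 18% chance of occurring at least once in a decade.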
Subramanian says over the past few years BCP has been moving to the top of the list of Click Clack’s priorities. The company’s main business system is SAP and it has integrated EDI customers from the USA through SAP Exchange Infrastructure and Informatica’s Complex Data Exchange. “If systems become unavailable to users, every minute costs us dollars,” says Subramanian.
Scott says a key pressure for the Fire Service is the centralised nature of its IT infrastructure. A Citrix server farm with 440 network nodes, together with significant growth in the applications fire-fighters use (rosters, fire incident reports and email), meant the systems had become mission critical. “The impact on the business if significant outages occurred would compromise our ability to support the fire-fighters’ key operational role of ‘reducing the incidence and consequence of fire and to provide a professional response to other emergencies’.” After the Service decided a continuity plan was required, phase one of BCP ensured Exchange and Citrix would remain available in the event of a failure at the Wellington national headquarters; it was completed successfully in September 2004. Phase two, a fully operational DR site, was completed in December 2005. “The system is simple, relatively easy to maintain and allows the DR site to be worked on and tested without impacting on the production site,” says Scott.
6. Address your legacy challenges
Virtualisation of IT resources, through a single physical device that appears to function as multiple logical devices, can also play a part in BCP.
Waikato District Health Board is one of New Zealand’s largest district health boards, with six sites across the country. When it centralised its patient information system, iSoft, the DHB aimed to move away from each of its facilities running its own systems, an arrangement that had increased IT operating costs, made a DR strategy difficult to maintain and complicated compliance with health sector data security regulations. WDHB virtualised applications across its facilities using Citrix Presentation Server. The virtualised system’s business continuity features mean that, in the case of an unplanned interruption or disaster, WDHB’s employees can use any computing device over any connection to access their mission-critical applications and information. If WDHB’s data centre is affected, the system allows it to quickly deliver all critical applications from a backup site.
Click Clack relied on tape backups as its only means of disaster recovery and continuity while it undertook major company-wide IT infrastructure upgrades. “Three years ago, almost all our systems had reached the end of their lifecycle,” says Subramanian. “We’ve not only upgraded them, but also future-proofed them for the next few years.”
The company is considering a third-party data centre and has begun identifying various potential disaster scenarios and consequences. “We expect to have a fully-fledged business continuity system in place before the end of 2008. BCP will ensure that at least a minimally functional system is available to key users in times of a crisis.”
Three years ago, backup tapes stored offsite were also the Fire Service’s only continuity plan, says Scott. Today, its fully operational DR site supports 6500 users across 440 sites, including volunteer stations.
7. Measure the success of BCP through testing
Make your BCP exercises real, not “thought experiments”.
Willingness to simulate disasters requires confidence in your infrastructure and your capabilities. Your first metric should be whether your plan works, says Reason of Gen-i. “If the plan is well-written you’ll know, because plans have to be tested to see that the business can recover its critical business processes or IT services within the recovery time required by the business leaders.”
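Reason’s test of a plan, whether critical processes recover within the time business leaders require, can be recorded as a simple pass/fail check after each exercise. The sketch below is a hypothetical illustration: the `evaluate_recovery_test` function, the process names and the timings are invented for the example, and the targets stand in for what business continuity practice usually calls recovery time objectives.

```python
from datetime import timedelta


def evaluate_recovery_test(
    observed: dict[str, timedelta],
    objectives: dict[str, timedelta],
) -> dict[str, bool]:
    """Return, for each critical process, whether the test recovered it in time.

    A process that was never recovered during the test (missing from
    `observed`) counts as a failure.
    """
    return {
        process: process in observed and observed[process] <= target
        for process, target in objectives.items()
    }


# Hypothetical recovery targets set by business leaders.
objectives = {
    "email": timedelta(hours=4),
    "order processing": timedelta(hours=2),
    "payroll": timedelta(hours=24),
}
# Hypothetical recovery times measured during a live test.
observed = {
    "email": timedelta(hours=3),
    "order processing": timedelta(hours=5),
}

results = evaluate_recovery_test(observed, objectives)
plan_passed = all(results.values())
print(results)
print("Plan passed" if plan_passed else "Plan needs work")
```

Treating an unrecovered process as a failure, rather than skipping it, keeps gaps in the test from being mistaken for success.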
Turner of the MetService warns against plans based on less-likely occurrences. “People get carried away with regional disaster scenarios. You’re vastly more likely to face a security issue than you are any regional disaster. My advice would be to do real testing, but choose scenarios that are likely to happen. You’ll be surprised how often they come up showing the dollars you need to spend to mitigate that risk.”
Subramanian is happy with the way Click Clack’s BCP journey is progressing so far, even though it is too early to quantify its success. “Once implemented, the metrics I would be considering are system availability, response time in times of crisis and the savings to the company.”
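System availability, the first metric Subramanian mentions, is commonly reported as the share of a period during which a system was up. The following is a minimal sketch assuming outages are logged in minutes; the `availability_percent` function and the outage figures are hypothetical, not Click Clack data.

```python
def availability_percent(period_minutes: float, outage_minutes: list[float]) -> float:
    """Percentage of a reporting period during which a system was available."""
    if period_minutes <= 0:
        raise ValueError("period must be positive")
    downtime = sum(outage_minutes)
    if not 0 <= downtime <= period_minutes:
        raise ValueError("total outage must lie between zero and the period length")
    return 100.0 * (period_minutes - downtime) / period_minutes


# Hypothetical 30-day month (43,200 minutes) with two outages.
month_minutes = 30 * 24 * 60
print(f"{availability_percent(month_minutes, [30, 13.2]):.2f}%")
```

In this example, 43.2 minutes of downtime in a month corresponds to 99.9% availability, which shows how small the margin behind such headline figures can be.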
At the Fire Service, to ensure all DR systems and data are up to date and operational, daily checks are carried out. “Six-monthly live tests are run to ensure the whole country can connect and operate out of the DR site,” says Scott. “The live tests are carried out by a number of predefined and randomly selected users from around the country.”
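Combining predefined testers with randomly selected ones, as the Fire Service does, can be sketched in a few lines. This is a hypothetical illustration: the `select_test_panel` function and the station identifiers are invented, and a fixed seed is used only so that a given test’s panel can be reproduced later.

```python
import random
from typing import Optional


def select_test_panel(
    predefined: list[str],
    all_users: list[str],
    random_count: int,
    seed: Optional[int] = None,
) -> list[str]:
    """Return the predefined testers plus a random sample of the remaining users."""
    predefined_set = set(predefined)
    # Exclude predefined testers so nobody is selected twice.
    pool = [user for user in all_users if user not in predefined_set]
    if random_count > len(pool):
        raise ValueError("not enough remaining users to sample from")
    rng = random.Random(seed)
    return list(predefined) + rng.sample(pool, random_count)


# Hypothetical user identifiers, one per site.
all_users = [f"station-{n:02d}" for n in range(1, 41)]
predefined = ["station-01", "station-02"]
panel = select_test_panel(predefined, all_users, random_count=5, seed=2024)
print(panel)
```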
Murray of Gen-i cautions it is vital to test, rather than only model, a scenario — untested plans are useless. “I’ve been in organisations where they’ve had a plan and documented their manual processes, but no one’s been trained on them, no one has ever tested them and therefore it’s just ‘shelfware’. They’re fundamentally flawed.”
Orson Welles, the director of Citizen Kane, followed a shooting script to minimise continuity errors in his films. Mostly, he succeeded. When you write your continuity plan, you must not only ensure it isn’t lost, like the film extra’s hat, among 1000 other details, but also test it in a realistic environment. If you take your systems offline for 48 hours, consider the impact of publicising the outage’s duration among your staff. After all, if everyone knows essential and non-essential systems will be back online in two days, they won’t behave as they would in a real disaster.
Top 10 BCP challenges
Our interviewees say these business continuity planning challenges affect New Zealand organisations across all industries.
1. Ongoing commitment to ensure BCP is up to date, operational and regularly tested.
2. Winning the buy-in of the executive committee or board.
3. Lack of manual process documentation to support electronic processes.
4. Difficulty of realistically simulating disaster and downtime scenarios.
5. High cost of business continuity management offerings.
6. Legacy systems permeating the infrastructure and retarding disaster recovery plans.
7. Committing the money and resources to setting up new processes and systems.
8. Executing the business continuity plan that has been written and ensuring it is fit for purpose.
9. Dependence on key staff with skills to run systems and provide critical business functions.
10. Reluctance to maintain internal ownership of disaster recovery infrastructure.
© Fairfax Business Media