It’s a much better system than relying on tape back-ups, says chief information officer Chris Brunton, since even off-site storage doesn’t get the data into the safe zone immediately. “If your data centre burnt to the ground at 7am [after the nightly back-up], your tapes will still be in the drives and you’ll actually lose up to 36 hours of data since your previous back-up,” he says. “A fibre-based SAN provides the means to replicate in real-time, so what’s in the data centre is backed up to the millisecond if things go up in smoke. It makes a huge difference.”
Given the importance of continuous availability of data in the financial services industry, technological changes in both back-up and communications technology have opened an important new chapter in the story of business continuity planning (BCP) and disaster recovery (DR) strategies.
Whereas it was once enough to have any sort of formal BCP strategy in place, auditors, business executives, shareholders and customers now subject those plans to ever closer scrutiny. Y2K was the first major driver for revisiting BCP; September 11 was another; and the growing focus on corporate governance requirements is yet another.
In a recent analysis of the state of BCP, research house Meta Group predicted 80 per cent of IT groups would this year re-evaluate BCP plans to ensure they reflect business priorities and recovery windows.
“It’s becoming paramount,” says Brunton. “Even small businesses are questioning what sort of protection we have in place. The criticality around these sorts of things has increased exponentially.”
In many cases, the business continuity reality is being found severely wanting compared to what is expected – or believed – to be in place. One EMC survey of 274 American executives found only 14 per cent of business executives believed their business data was very vulnerable to loss in the event of a disaster; in comparison, 52 per cent of IT executives felt that to be the case.
Clearly, despite years of rhetoric about business continuity, something is still being lost in the communication between business and IT management. Whether business leaders are unduly confident in their IT managers’ abilities – or IT managers are simply more paranoid and planning for the worst – this gap underscores the ongoing need for greater alignment between reality and expectations.
Increasing external pressure on the business to provide comprehensive data back-up will drive an increasing number of companies to invest heavily in real-time data duplication strategies.
Dual data centres – once a luxury restricted to cashed-up, large enterprises – will become standard issue for BCP and DR strategies, with Meta Group expecting 40 per cent of Global 2000 companies will be running dual data centres by 2007.
This is distinct from conventional back-up strategies, which have generally been built around the off-site storage of back-up tapes. In many cases, changes to this strategy are being driven by allowable recovery time objectives (RTOs) – which have generally been set at 24 to 48 hours in the past – that are shrinking dramatically as businesses recognise even a half-hour of system downtime can be irretrievably damaging for a business.
The challenge: Shortening RTOs requires a significant investment and, potentially, a complete rethink of existing IT infrastructure. Nonetheless, overall increases in IT budgets mean now may well be the best time to begin the push towards real-time replication.
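The arithmetic behind shrinking recovery objectives is simple but stark: under a nightly tape regime, the recovery point is tied to the last tape that actually left the building, while synchronous replication collapses it to near zero. A minimal illustrative calculation (the dates and back-up times are hypothetical, chosen to reproduce the 36-hour exposure Brunton describes):

```python
from datetime import datetime, timedelta

def data_loss_window(last_offsite_backup: datetime, failure_time: datetime) -> timedelta:
    """Worst-case data loss (recovery point exposure): everything written
    since the last back-up that made it off-site is gone."""
    return failure_time - last_offsite_backup

# Hypothetical scenario echoing the article: the nightly back-up runs at 7pm,
# but last night's tapes are still in the drives when the data centre burns
# at 7am, so the newest *off-site* tape is from the night before that.
failure = datetime(2004, 6, 2, 7, 0)
last_offsite = datetime(2004, 5, 31, 19, 0)  # two nights ago

print(data_loss_window(last_offsite, failure))  # 1 day, 12:00:00 -> 36 hours

# With real-time replication, the recovery point is effectively the moment
# of failure itself, so the loss window shrinks to (near) zero.
print(data_loss_window(failure, failure))       # 0:00:00
```

The same calculation is why 24- to 48-hour RTOs that once looked acceptable now look like a day and a half of lost transactions.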
Increasing executive recognition of BCP’s importance has given many IT organisations the backing they need to implement solid DR capabilities with far more flexible budgets than was ever possible.
At Australia’s James Cook University (JCU) in Townsville, for example, a major server consolidation project has finally kicked off after several years in which tight budgets failed to provide enough funding for the complete overhaul required.
In JCU’s case, the current project came about after a Queensland Audit Office report questioned the robustness of its BCP processes, which included a tape-based back-up regime built around a number of distributed servers. This report provided a shot in the arm for IT budgets, with the technical group now implementing a 2Gbps fibre optic link between the JCU campus and a back-up data centre one kilometre away at Townsville Hospital.
In the new environment, data has been consolidated within a StorageTek BladeStore storage array that combines 4.2 terabytes of expensive Fibre Channel disk with 7.5TB of slower but much cheaper ATA-based disk. The Fibre Channel storage is used for production data, while the ATA disk provides a cost-effective back-up that provides for far faster recovery than the previous tape environment. Data is mirrored in real-time between the two sites.
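The real-time mirroring JCU describes can be sketched in miniature: a write is not acknowledged until both the production copy and the back-up copy have accepted it, so the secondary site is never behind the primary. This is a toy in-memory sketch of the synchronous-replication idea, not the actual SAN mechanics (the class and key names are invented for illustration):

```python
class MirroredStore:
    """Toy synchronous mirror: every write must land on both sites
    before it is acknowledged, so the secondary is never stale."""

    def __init__(self):
        self.primary = {}    # stands in for the Fibre Channel production tier
        self.secondary = {}  # stands in for the ATA back-up tier at the DR site

    def write(self, key, value):
        self.primary[key] = value
        self.secondary[key] = value  # replicate before acknowledging the write
        return "ack"

    def failover_read(self, key):
        # If the primary site is lost, the secondary already holds
        # an up-to-the-moment copy of every acknowledged write.
        return self.secondary[key]

store = MirroredStore()
store.write("student-42/thesis.doc", b"draft v3")
assert store.failover_read("student-42/thesis.doc") == b"draft v3"
```

The trade-off, of course, is that every write pays the latency of the inter-site link, which is why the short, fat 2Gbps fibre run between campus and hospital matters.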
Hosted data is currently focused on content such as student storage space and learning web pages, and high-performance computing requirements; corporate data is expected to move online later this year. But by consolidating its storage and providing built-in redundancy, the IT organisation is now in a position to persuade various university departments – which each manage their own storage through a variety of methods and technologies – to shift their storage on to the central array. This, in turn, would lower overall IT support costs and increase the cost-effectiveness of this latest BCP exercise.
And why would departments do this when they may already be happy with the current method? Lower cost is one factor, according to the university’s computer infrastructure manager, Blake Carney. Reliability is another.
BCP “was never funded to the extent that it is now, but management sees it as a priority now”, Carney explains. “It’s given us the ability to offer relatively cheap storage, based on Samba servers and LDAP groups, to schools and divisions within the university.”
The upgrade has also given the IT organisation new challenges, he adds: “Beyond the actual technology, it’s opened up a whole challenge organisationally. Complexity has really taken a quantum jump: Phasing, project planning, risk analysis and costings are quite complex. The scale of the environment now seems to infer a requirement to have much more rigorous processes to deal with it. You can’t do this in an ad hoc manner.”
The need for formalised review of BCP and DR processes has never been greater, and strong response rates suggest executives have come to accept the importance of effective continuity planning in meeting governance requirements.
Over time, pursuit of business continuity best practice will lead to the clear assignment of responsibilities to ensure changing requirements are identified and addressed.
Given the need for more rigorously defined and defended continuity processes, it’s an excellent time for a fundamental rethink of business continuity processes. This should be driven by the results of a complete business impact assessment (BIA), which provides an invaluable inventory of current business processes, allows the identification of vital functions, identifies dependencies that could compromise business continuity, and lends weight to BCP efforts by highlighting the potential impact from any disruption.
With the results of a BIA in hand, it should be easier than ever to get executives to entertain expensive continuity project proposals that would have previously been dismissed as overkill. Nonetheless, there is still a need to ensure the efficient usage of resources: Simply buying two of everything gets very expensive, very quickly – and that particularly includes data centres.
For this reason, sharing the cost of data centre upkeep is an attractive proposition compared to owning two purpose-built facilities. ADP solved this issue by outsourcing its data centre management to a third party for which running such facilities is a core part of its business.
Improved availability of fibre optic cabling has become a significant facilitator for improved business continuity, since real-time data replication minimises recovery times in line with BCP objectives. Yet if that link is left dead until a disaster occurs, its cost is being wasted. At ADP, this issue has been addressed by using the back-up site as a development environment while the primary site handles everyday processing.
“I don’t like spending millions of dollars on equipment and having it sit there idle,” says ADP’s Brunton. “September 11 created the need to go to the next level around processing functionality, continuity and management of external elements. We’ve entered the next phase of business continuity, and telecommunications is the thing that has allowed these changes. It all comes back to the fact that, irrespective of the business operation, the criticality [of business continuity] has increased exponentially.”
For TNT Express in New Zealand, business continuity planning is an ongoing task, and a challenge. Chris Arthur, IT and strategic projects manager of the global delivery services provider, declines to quantify the financial losses a systems failure would cause.
“The target is no down time,” he tells MIS from his office near Auckland International Airport. “So much of our business is based on the customers’ perception of our reliability. What we actually miss by not operating half a day would be a tip of the iceberg compared to the business we would lose in the long term because we had a failure that we weren’t able to cope with.”
He points out TNT Worldwide has a strong commitment to business continuity programs all the way to the board. “We have a global steering committee that looks at our main computing centres and key hub sites around the planet.”
While the current system is already meeting the needs of the enterprise in terms of supportability for the users and disaster recovery, TNT is revisiting its business continuity strategy.
TNT is in the “early stages” of exploring the possibility of outsourcing the server and business recovery facilities to a third party. Arthur says there is a “fairly significant cost” in maintaining standby hardware sites, and the idea was to have a third party provider for that service. “We have the most resilient infrastructure internally, but we have to balance the cost of having standby facilities against the option to outsource the technology and share the cost with other companies with similar requirements.”
This move is part of the company’s constant review of its BCP needs. “A lot of what we have, we built ourselves. That’s the Kiwi way,” says Arthur. Over the years, he says, the IT side of the business has grown from simply doing what the global computing centre prescribes to running its own systems and doing what is right for TNT NZ in the context of TNT global. As elements were added to the infrastructure locally, they had to be planned locally, made reliable, and their continuity ensured.
“It really comes down to having to understand our own infrastructure, plan accordingly to make sure it is there after whatever may come.”
TNT’s recent move to thin client technology came after such a review. “We have been in growth mode, technology wise, over the past five years,” says Arthur. When he started with the company in 1998, there were only two PCs with no network in between. Disks were carried across the office.
Through the years, additional capital was spent on desktops and servers and networking equipment. “And so, five years was really pushing the end of the life of the PCs. It was time to look at how we were going to replace those and what the platform we wanted going forward was.”
At the same time, the staff was using a number of Windows-based applications. “They ran extremely well when you were sitting in the same office as the server that they reside on and not so well when you were in the slower end of the data link in Wellington or Dunedin.”
When a computer went down in Wellington, the staff would courier the unit to the Auckland office for repair, and have it back after a day. “While all of our sites were able to run with one PC down, it’s not ideal. We were spending too much time rebuilding PCs when they fell over and while we’re doing that, someone is sitting without a PC.”
Going thin client
An evaluation of options prompted TNT to migrate to Wyse Winterm thin client technology. “It means all of your users who are connected to the terminal server have the same software, the same configuration, the same version.” This minimal support requirement was vital because, “While TNT has an awful lot of technology we don’t have technology experts throughout the business.”
“Simply having the different configuration options built into the thin client desktop device made it so much easier and made sure the disaster recovery is something that can happen without the technology team having to make it happen.”
Prior to the upgrade, the IT team had a “more limited ability” and needed more manual intervention during a system failure.
“Whereas now, because the configuration is in the terminals that are in all of the sites, if they are not able to connect to the terminal server here in Auckland then it goes back to the menu to choose another mainframe configuration to be connected to the UK. There is absolutely no reconfiguration required.”
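The failover behaviour Arthur describes – terminals that try the Auckland terminal server and, failing that, fall back to the UK mainframe configuration – amounts to an ordered connection list baked into the client. A hedged sketch of that logic (the host names and the connect function are invented for illustration, not TNT's actual configuration):

```python
def connect_with_failover(targets, try_connect):
    """Attempt each configured target in order; the first one that
    answers wins. Mirrors a thin client's built-in fallback menu."""
    for name, host in targets:
        try:
            return name, try_connect(host)
        except ConnectionError:
            continue  # site unreachable: fall through to the next entry
    raise ConnectionError("no configured site reachable")

# Hypothetical configuration echoing TNT's setup: the local terminal
# server first, then the UK mainframe configuration as the standby.
targets = [
    ("auckland", "ts.akl.example.net"),
    ("uk-mainframe", "mf.uk.example.net"),
]

def fake_connect(host):
    # Simulate the Auckland terminal server being down.
    if "akl" in host:
        raise ConnectionError
    return f"session:{host}"

print(connect_with_failover(targets, fake_connect))
# -> ('uk-mainframe', 'session:mf.uk.example.net')
```

Because the fallback order lives in every terminal, no one from the technology team has to intervene at the remote sites for the switch-over to happen – which is exactly the property Arthur values.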
Arthur explains most of TNT’s core systems are based in its computing centre in the UK with a full live secondary site 300 miles away. In New Zealand, the business critical applications are in the mainframe. “As long as we can provide the basic infrastructure to access that mainframe we are in pretty good shape. As long as we’ve got a site with terminal emulation and network connectivity we can continue business.”
Arthur reckons there will always be the aspect of ‘fire-fighting’ in IT management. But, he says, the terminal services thin client infrastructure has enabled him and his team to “get away from fixing PCs” and focus on other business continuity concerns.
His perspective on BCP, meanwhile, extends beyond the enterprise – to the customers themselves. Having started at TNT as a customer service person who was the “primary interface” with corporate clients, Arthur has first-hand experience of the potential impact of any breakdown in the core systems, or of delays caused by non-compliance with international regulations.
The myriad cases he has handled have included ensuring vouchers worth half a million dollars were delivered safely in another part of the world, and getting an aircraft part delivered on time to keep an international commercial flight in the air. A more recent incident was an urgent request to ship an incubator to a hospital in Japan: the hospital had only two incubators, and triplets were scheduled to be delivered the following day.
“Those things do happen,” he says. “You’ve got to remember we are not moving empty brown cardboard boxes. The lifeblood of the very company we deal with is in those boxes. They could be documents in a tender that would make the difference between a company continuing to exist or not; parts of a machine or a prototype that’s going to change the world we live in.”