“Three months before the quake, our disaster recovery facility was in bad shape. We had allocated the funds to upgrade it, but our decision to outsource our datacentre had put this on hold. We felt that if a disaster were to happen, such as the loss of a big server, then we would be able to get critical systems back up and running within a month — it was an acceptable risk. We didn’t anticipate the disaster would be an earthquake,” says Till.
In hindsight, he acknowledges that some of the arguments made to accept that risk would be different today — not so much about whether they would be prepared to take the risk again, but about what they would consider a ‘critical system’.
He says, “We didn’t consider that the building consent systems would be critical, as we were thinking more along the lines of developers building subdivisions, or people wanting to build a home extension. But after the quake, building consent officers were out there 24x7 inspecting buildings for damage, and that was a factor that no-one considered. It is one of the huge lessons we have learned.”
“Our biggest challenge was the provisioning of laptops and PCs. We didn’t have enough equipment for that scale of emergency, so we had to steal equipment from desks to provide essential IT services for civil defence and council staff. Re-associating the equipment afterwards was difficult. We plan to run a higher level of revolving stock of laptops to be better prepared next time.”
Another lesson learned was the importance of having a good IT and business continuity plan, with more structure around what and how they would provision IT in a disaster to avoid having to make decisions on the fly.
“We coped very well, and we got a lot of things right, but it could have been much easier. We’ve learned that our business continuity plan should be appropriate to the level of service that is at risk. City-critical functions such as water and waste are vital to the health and wellbeing of a city, as is street lighting.
Back office functions are also important, such as the Facility team, which needed to identify which council buildings were operational and to what level, and what other council and non-council facilities could be utilised.”
As a consequence of this disaster the council has a better understanding of what they need as an organisation and an IT department, along with the level of redundancy and resilience required.
“We are going to build a new civil defence emergency management centre, and a lot of things that were considered ‘gold plating’ are now deemed essential, such as the quality of cooling in the server room versus the redundant power supplies,” says Till.
“Deciding how much to invest in IT and data protection comes down to risk. The impact of an earthquake may be high, but what is the probability? You’ve got to be prepared to take a balanced view of what you are protecting yourself against.” Shelley Grell
Staying on track
Access Homehealth, a New Zealand-owned company that cares for elderly and disabled people in their homes, came through the Christchurch earthquake in September comparatively smoothly from the IT and telephony point of view, thanks to effective workload management and data replication across a WAN.
The company’s building in Christchurch was partly demolished in the earthquake, but disaster recovery procedures for both data and voice networks allowed the company to continue providing services to its clientele.
“Basically, one side of the building fell off,” says CIO Philip Hendry of the Christchurch office, based in a converted Victorian-era warehouse in Sydenham, not far outside the central city.
“However, Christchurch, like most of our offices, has quite minimal gear,” he says. “We run a private WAN and most of what those offices do is delivered via Terminal Services over the WAN.
“We have a datacentre in Wellington. In Tauranga there is a processing centre that has a datacentre function, and in Ashburton the same thing.” Those three sites and the links between them are the main backbone of the network. “At any one time one of those sites has all the data,” Hendry says. “It makes the system very resilient.
“What happened in the Christchurch quake was that the satellite offices just picked up the load automatically.”
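The failover behaviour Hendry describes — every datacentre holding a full copy of the data, so the surviving sites simply absorb the load when one is lost — can be sketched in a few lines. This is an illustrative model only, not Access Homehealth’s actual system; the site names come from the article, and the health-check and priority order are assumptions.

```python
# Illustrative sketch of full-replica failover: each site holds all the
# data, so a client can be served by any site that is still reachable.
# Priority order and the health map are assumptions for the example.

SITES = ["wellington", "tauranga", "ashburton"]  # each holds a full data copy

def pick_site(healthy):
    """Return the first reachable site in priority order."""
    for site in SITES:
        if healthy.get(site, False):
            return site
    raise RuntimeError("no datacentre reachable")

# Normal operation: the primary site answers.
print(pick_site({"wellington": True, "tauranga": True, "ashburton": True}))
# One site lost in a disaster: traffic shifts to the next full replica.
print(pick_site({"wellington": False, "tauranga": True, "ashburton": True}))
```

Because every site has all the data, failover needs no re-sync step — the remaining sites can serve reads and take over processing immediately, which is why the satellite offices could “pick up the load automatically”.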
The telephone network — the chief way the company’s caregivers and clients keep in touch with one another and organise schedules — was upgraded and integrated three years ago to a Shoretel unified communications system, which also funnels its traffic through the WAN.
“We have two primary call centres, Palmerston North for the North Island and Christchurch for the South. Hanging off those are smaller offices in, for example, Nelson, Timaru and Invercargill. During the weekend only the two main centres are running.”
When the quake hit, the system detected that calls weren’t being answered in Christchurch and they were automatically diverted to Palmerston North.
“Obviously we had to do something about staffing, but the staff there held the fort till we could get a few more bums on seats,” says Hendry. By about 7:00, two-and-a-half hours after the quake hit, “every client and staff member in the Christchurch area was being called out of Palmerston North.”
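The overflow rule at work here — divert a call to the other island’s centre when the primary stops answering — is a common call-routing pattern. As a hedged sketch (the centre names are from the article; the routing logic and answer check are assumptions, not the Shoretel configuration):

```python
# Hypothetical sketch of unanswered-call overflow between the two
# primary call centres. The answered() check stands in for whatever
# the real UC system uses to detect that a centre is not picking up.

PRIMARY = {"south": "christchurch", "north": "palmerston_north"}
FALLBACK = {"christchurch": "palmerston_north",
            "palmerston_north": "christchurch"}

def route_call(island, answered):
    """Send the call to the island's primary centre; divert on no answer."""
    centre = PRIMARY[island]
    if answered(centre):
        return centre
    return FALLBACK[centre]  # automatic diversion, as in the quake

# A South Island call while Christchurch cannot answer is diverted north.
print(route_call("south", lambda c: c != "christchurch"))
```

The key point from Hendry’s account is that the diversion itself required no manual intervention; the staffing at the receiving centre was the only thing that had to be scaled by hand.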
Before the UC system was introduced, each branch used to operate independently and was staffed appropriately. Outside peak hours in particular, small centres were overstaffed for the workload.
“A small site that gets 60 to 70 calls a day doesn’t necessarily need two full-time staff; but you can’t hire three-quarters of a person,” says Hendry. “With the phone system, all the people who provide call centre services essentially work on the same team. So the first thing we’ve done is to centralise all the after-hours work back to Christchurch and Palmerston North. Over a year, that’s saved hundreds of thousands of dollars.”
The replication philosophy extends to laptops in the field with the nurses. “Our visiting coordinators [nurses] will go out to assess a client’s needs in the field with a laptop that has a subset of the database,” Hendry says. A coordinator can complete the assessment electronically and efficiently even when unable to connect, such as in areas with poor mobile phone reception. The company has been looking at tablet computers to replace the laptops; a device based on the open Android operating system is more likely to be chosen than the proprietary iPad, Hendry says.
One significant lesson from the disaster was to improve paper instructions on business continuity procedures, he says: “The people on the ground are under a lot of stress and sometimes they stop thinking straight. It is really important to have clear contingency plans written in a book.” Stephen Bell