Code red

In a CIO roundtable sponsored by Gen-i, ICT leaders share their insights on ensuring business as usual for a range of scenarios and reveal how they coped in situations that tested their business continuity plans.

In the round:

  • Nigel Bailey, group operations manager, Fairfax Media
  • Nicola Clements, manager, risk management office, Westpac
  • Fiona Colgan, senior manager, technology operations, ASB Bank
  • Greg Nunes, sales manager, Gen-i
  • Chris Quin, CEO, Gen-i
  • Ofer Reshef, security and risk manager, Fonterra
  • Andy Shields, group IT manager, Beca Group
  • Darren Wiseman, IT operations manager, Ports of Auckland
  • Karl Wright, chief information officer, Yellow Pages
  • Divina Paredes, editor, CIO New Zealand
  • Rob O'Neill, editor, Computerworld New Zealand

The CIO roundtable on business continuity could not have occurred at a more apt time: the day the World Health Organisation raised the H1N1 alert to the highest level. Here is an edited transcript of the discussion:

The key challenges

Andy Shields: The key issue for us is having the ability and the time available for testing the plan and making sure it actually works … Other obvious challenges are how do you get the business to buy into it, how do you get the executive support, where do you find the budget for something which is essentially an insurance policy?

Ofer Reshef: The key challenges are in two areas. One is the linking of the BCP, the business side of planning, how to respond to all kinds of events, to what the technology side is doing. Because many times people have loss and recovery plans, so they know how to get systems back up, but the business have their own plan on what to do – those plans would not be in sync. The other challenge is how to make sure that if there is an event that happens, it could take out the business operation at the same time as IT. Because if IT is in the same building as the business and the building is not available … then who would the business call?

Fiona Colgan: The expectation is we’re there [24x7]. We’re there online, we’re there at an ATM, we’re there in a branch … Because the expectation is so high, the plan needs to be so tight and testing needs to happen in real time, perpetually. We don’t have a week that goes by that we don’t have systems tested and multiple systems tested for their DR. It’s top of mind all the time.

Nicola Clements: You can have all of the scientific evidence to suggest that at some point in time Rangitoto will blow [up], and obviously you want to have contingency plans in place to make sure that you can deal with that. But we really should be focusing on the more plausible things like a power failure or a snow storm in Christchurch or flooding or burst water pipes.

Who is responsible?

Nigel Bailey: IT must not own BCP. We’ve heard the distinction between BCP and DR and by definition it’s business level, so IT can’t get itself in the situation where it’s responsible for BCP … IT generally understands probably more about the business as a whole, than any other department. But IT just has to be very, very strong and push it back to the business unit, so at a business level they have to be owners of their own BCP.

Andy Shields: We push it up to the CEO level. It has got to be driven from the top.

Ofer Reshef: We have a group that deals with enterprises … One of the things they did [was] they provided a template for a business continuity plan. They go around the different business units and help them to develop their business continuity plan. The business units are responsible for their own business continuity planning. But you can’t expect every business unit to invent the wheel from scratch, so they actually provided a template which had quite a lot of different checklists for all the different things that you need to consider; logistics, moving people, reporting, all the stuff that you might need. But you need to know who the people are that need to be involved in managing, and who have the authority to do stuff.

A pragmatic approach

Andy Shields: A plan can be big or it can be small. People tend to focus on plans sometimes as world changing events, where you’ve got to have a plan that copes with the obliteration of Auckland or the obliteration of a city or something. It’s more likely that you’re going to find things like pandemic events or the power outages and various other infrastructure issues.

Karl Wright: You’ve got to be really, really clear about what level of risk you are insuring against, and what is the minimal service you will provide in response to that risk. Because, the reality is, I don’t think we can actually afford to do business as usual in a BCP situation.

Nicola Clements: Our focus is to just make business continuity part of our business as usual process. So rather than saying we need to have a plan for this and we need to have a plan for that, you build in your operations such that you have that resiliency and that redundancy across different sites. We have multiple locations for call centres for example, so that if you have a regional-wide event, you’ve got an alternate site that you can work from.

Chris Quin: You can end up finding 20 or 30 events that can cause you to have a comparable risk. But they come down to a couple of basic things. Either your systems are available or your people are available to work where the systems are available too … Your plan really needs to address how would I get my systems available, or how would I get my systems to my people should they not be able to work where they’re currently available?

Work with other business units

Nigel Bailey: Different business units will have different drivers, so it’s very important at the executive level that we understand what those drivers are and have a really mature conversation around that cost versus reality discussion.

Fiona Colgan: If the business units don’t know what they need, we can’t deliver it. So our business units have done a hell of a lot of good [work] working out what they do. How many people do you need and what are they doing? Will they be accessing cash systems from home? No they won’t, so actually the business problem is not, can we provide remote access to their home, because we can. It is, ‘do you want us to’? Often it’s not about what can technology deliver, it’s ‘what do you really want us to deliver’?

Nicola Clements: It’s the engagement with everyone across the business to clearly define or determine what the business impact assessment is. It’s understanding every single arm of the business. There’s no point having one person going, ‘Well I’m going to write up a plan to do X, Y and Z’ if they haven’t gone out there and talked to all of those different business parties … And then it’s the end to end view of your organisation, so you really understand your business requirements, and then go from a time perspective. What are we capable of recovering within a certain period of time and can we meet the business’s expectations? So it’s that alignment between BCP and DR.

Andy Shields: When running a global operation, and whether it’s in New Zealand or wherever it is, try and educate all your business units to understand the pain. Understand that when this happens, when something happens, here’s how it’s going to affect you. Here’s what it’s going to mean to you. It’s not someone else’s problem. You have to share the problem as a company. It’s a global operation and you can’t just say, ‘that’s a New Zealand problem if the New Zealand network goes down, we’ll be okay’, because you won’t. You have to participate in the business continuity plan.

Involve your external partners

Ofer Reshef: How do you know that the outsourcer actually has a real business continuity plan? True, they all tell you that they have it, signed in blood and all that sort of stuff … Part of our role for internal IS, is to make sure the outsourcers are actually aligned with each other … A word of advice: You have to trust them but you need to verify, even if it is in the way of just getting to test if it works.

Karl Wright: [Testing the plan] keeps your suppliers on their toes, because they then have to go and [say] ‘what was the plan, get it out, whip it out, what are we going to do’? And that I think really does help the business case and support the continuity of the business case. And that’s particularly relevant now when people have got their scissors out and they’re chopping costs everywhere.

Andy Shields: It’s interesting, when you talk about being a supplier to a business continuity plan. We feel the pain a little bit as well … We do a lot of the infrastructure of cities around New Zealand. And so, obviously, when the balloon goes up and the plan needs to come into play, a lot of [the] business continuity plans that are out there say, ‘call Beca’.

A strong case for BCP

Darren Wiseman: We had a non-critical system go down for an hour. And at the end of that hour there were stevedores standing in the room cross-armed, giving me the glare. And five minutes later they were going ballistic, and a stevedore going ballistic is not a pretty sight … That non-critical system was email, and it was at that point I could say, this non-critical system is now absolutely crucial to the business.

What would happen is the instruction would come from the shipping companies [via email]. And they would generally come in an hour or so before the ship arrived and they had to be analysed and so on. And so if they didn’t get those, a ship would turn up and we didn’t know what to do with it, and ergo the business stops. Our virtualisation environment and our DR environment happened, because I had that story to tell in a language that the executive understood.

Chris Quin: We look across all of the clients we’re talking to about BCP at the moment and there are really [two] things that are showing up. The first one is cost. It’s the insurance end. In times that are challenging for any business in terms of cost, then you’re going to make a trade-off all the time around cost.

And then secondly, [have] a relevant business case … The building up of benefits for the business case for BCP has actually got to go deeper than just ‘it’s the cheapest way of doing BCP’.

So if we weren’t able to publish a newspaper, if we weren’t able to publish a phone book, what would that be worth and how much of that are we prepared to bank against the risk?

Theory vis-à-vis practice

Karl Wright: The best thing you can do around a business continuity plan is to actually enact it for real. In a couple of instances we have had online go down for two or three days at a time, and it does do things. It sharpens your plan every time. Your plan gets better every time you execute it. It also keeps it top of mind of executives as being a real issue that you are insuring against, and in a perverse way it becomes a lot easier to demonstrate the value of having such a plan. No one ever wants to have to use it, but if you don’t use it, then no one sees the value. It becomes a red book on a shelf somewhere and you don’t do it.

Darren Wiseman: Every two years we do a simulation of an event and the last one, which was about 18 months ago, was that the main admin building had burnt down. There was potential loss of life and there were people onsite who were missing, possibly burnt in the fire, we didn’t know. And we simulated the whole thing amongst all the line managers and executives. It was a [one] day simulation and it was great. IT ended up being maybe 10 percent of the effort. We were talking to politicians, local government, staff, keeping them informed and the exercise was fantastic, in that it showed us how we can support the organisation in this sort of event.

Nicola Clements: Don’t make it a once a year event; and if people are kept aware of the policies and the procedures and the tools and everything, and it’s built into your every day business, [that] makes it part of the culture. And the way that you design your operations, not to have single points of failure, and to encourage cross-skilling and that resilience with your team, so that you take out the risk and integrate it as part of your operational risk management.

Practise, practise, practise. Use scenarios, tell the stories and practise it. Once you’ve practised it, then all those lessons that you’ve learnt from the practice, you need to feed that back in.

Chris Quin: It will be the smallest things that break down. Those things will pop up when you practise a plan, because generally when you plan BCP, you’ll get good advice. You’ll have good people who will deal with the big things, and then you practise them in operation. The last one we did in our building in Wyndham Street, we practised the whole, so you had to empty the floors and move people to another site, where we can run our service desk from. We discovered little issues like the stairwells had no lighting once the building power goes out, so suddenly you’re physically finding it difficult to move people down the building.

One of the classics we discovered, with power outages, [is that] most people use mobile devices and after four or five hours of talk time, how are you going to charge that device? Unless you run through it end to end, you’re not going to find those things. They are the best way of engaging managers, because you force them to be part of it rather than read the book.

Smart moves during tough times

Karl Wright: One of the big disconnects around BCP to me is that the business unit owners have an expectation that we can afford to continue business as usual in an emergency situation. That may not be possible and/or desirable. So, we adopt a policy, almost like sort of the accident and emergency triage policy, where you start to treat the most important things first quickly and make sure that the blood stays in the core of the body. And if you have to cut off arms and legs, well we’ll do it. And if that means we have to shut down an office or whatever it is, then we’ll just shut down an office. But we will start with the things that are most visible and hurt us the most first, and keep them up and running.

Depending on how long the problem is that you’ve got, you need to have a staged response; it has to be scalable. It can’t be a scenario-based one. It has to be based on a set of principles that you’ve all agreed, because when it does happen it’s going to be moving so fast. I don’t think as an executive team, you’re going to have a lot of time to be able to sit there and consult about whether you were doing plan version four or plan version eight. You’re going to have to make some instant decisions.

Chris Quin: The most important thing to do is to hang on to the customers you’ve got. So if you start that in your benefit case for any BCP planning and say, ‘what is it worth to us to invest in terms of time, money or resource, into ensuring we deliver service to the clients we have, which are the cash flow of this business’? We’ve seen a couple of organisations focus on that issue with good pragmatic success around doing the right things and making that happen. So that’s reminding people that retention of clients through just continuity of service and trust is more important than ever.

Nigel Bailey: As things change you need to adapt your BCPs accordingly. That’s definitely a challenge because there needs to be some effort around that and it’s not always front of mind. So it has to be built into the process. It has to be part of the process, not an end achievement.

Tap the technology tools

Andy Shields: I think everyone probably agrees, virtualisation has become a big part of business continuity and it certainly is for us. We’re also looking a bit further forward and saying how does cloud computing and these sorts of virtual technologies out there, how can they help us from a business fundamental standpoint?

The other aspect we’re also looking at is how we can enhance our networks globally, to have that ‘you’re always on’ perspective. You’re always on the network, you’re always on your corporate NT no matter where you are.

So I just want to know that if I’m on the internet I can get access to my applications, my financial applications, my timesheet systems, my customer sales channels and all those sort of things.

We’re doing a lot of work in this area to try and streamline how that might work, so that business continuity becomes again, what you said, business as usual. It just becomes part of your overall environment, as opposed to being something special that happens when a disaster happens. It’s just always there.

Ofer Reshef: The flip side is that the easier you make [it] for the good guys to get in and do their stuff, you also could make it easier for the bad guys to get in.

Nicola Clements: Another thing that’s really critical from a wider technology perspective is communications. In case of an event or if you’re doing a practice [of the BCP plan], the ability to communicate with each other, all the different business areas, communicating with IT, communicating with each other is really important. And you have that assumption it’ll be okay, the cellphones will work, or that conference call number will work or whatever it might be.

So you kind of always have to have that in the back of your mind, that what if this conference call doesn’t work, what are the other options that we’ve got? Making sure you’ve got basic things like cellphone chargers [being] available, because if you’re running around on your BlackBerry and you’re talking on the phone for six hours a day, well, that’s going to come unstuck.

Designing out the panic

Darren Wiseman: Throw away the manuals. They won’t get looked at. Design out the panic. And so everything that comes across the project steering committee [and] it’s not just IT projects. [We] design out the panic … Identify what the panic is going to be. That’s our mantra.

Nicola Clements: We identify the people that do the job now, but who else in your organisation can also do that job? Who might have done it two years ago but has now moved on and done something else, but their knowledge is still there? Who else have we got that’s maybe not in this team, but could be pulled in from somewhere else that also has that knowledge? And having that information available, you don’t have to think on the day, ‘Oh my goodness, who knows about Java or whatever it is’? You’ve already got that pre-determined, so it’s designing out the panic.

Fiona Colgan: Every member of the management team has to carry a white card, which has all the contacts and first five actions [to take]. It literally folds out to a sheet and it’s all the first numbers, all the critical processes, who’s responsible for what, who the backup is and contact numbers. And every quarter they get issued again and you fold them up again into your wallet card.

Andy Shields: A piece of advice when running a global operation, and whether it’s in New Zealand or wherever it is, try and educate all your business units to understand the pain. Understand that when this happens, when something happens, here’s how it’s going to affect you. Here’s what it’s going to mean to you. It’s not someone else’s problem. You have to share the problem as a company. It’s a global operation and you can’t just say that’s a New Zealand problem if the New Zealand network goes down, we’ll be okay, because you won’t. You have to participate in the business continuity plan, you have to participate from an investment perspective. Whether it’s investment of time, capital, whatever.

Greg Nunes: If you look at it holistically, ultimately the IT function is providing those IT functions to the business, [whether] it is business as usual or in a crisis situation.

Karl Wright: I think for me the key learning is at the end of the day, it [BCP] is insurance and you really do have to be really clear about how much are you going to buy, and where are you going to deploy it. The second thing is that in a world where it’s unlikely you can afford the full service, you really need to understand how you’re going to triage what money you’ve got and what plans you’ve got to areas as they happen dynamically.

Things can sneak up quite quickly. Just a pure change in the economy can make the focus change to something else. Which is why I think it’s probably more advantageous to have a good set of well understood principles about how you would react and which bits you would protect, as opposed to a prescriptive plan on ‘this is what I’m going to do in this scenario’. Because that scenario is unlikely to ever eventuate.


