CIOs and the challenge of data intensive science
- 15 January, 2014 07:00
NZGL is a government funded initiative to provide genomics research services for the NZ research community, as well as the clinical and medical fields. It is based at the University of Otago, one of three collaborating universities within NZGL, together with Auckland and Massey universities.
One of its roles is to provide bioinformatics support, delivered within a private cloud and run on a high performance computing cluster, to genomic researchers at the three universities as well as to commercial and non-commercial clients across New Zealand.
“In a nutshell, our challenge was to be able to transfer large files around the country in a predictable time frame, and make use of the bandwidth that was there,” says Phillip Lindsay. “We are doing new stuff now, where these large files need much cleaner network pathways to be efficient.”
“Technology is getting faster, bigger, better and producing more data,” he notes, and the sequencing machines they support produce high volumes of data, up to four terabytes.
Lindsay says that, for a variety of reasons, there has been underinvestment in campus WAN gateways, so trying to move massive sequencing data files through the enterprise-grade firewalls in place was not going to work.
All organisations have built their networks with the corporate environment as the lowest common denominator for security, says Lindsay. “That is all fine in a corporate environment. Even if you are moving a payroll file from A to B, there will be a few kilobytes, not a terabyte.”
“Our challenge was that we are not going to change the security policies of organisations, and it would not be appropriate to do that because the arrangements they have are appropriate for their environments,” he says.
NZGL asked Research and Education Advanced Network New Zealand (REANNZ), of which it is a member, to come up with a system that would create a high-capacity firewall bypass connecting NZGL to client sites. The system had to maintain security while eliminating the dropped packets that cause transmission rates to be throttled back.
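Why dropped packets matter so much can be illustrated with the widely cited Mathis approximation for TCP, which caps the sustained rate of a single flow at roughly (MSS / RTT) × (C / √p), where p is the packet loss rate and C is about √(3/2). The sketch below is illustrative only: the segment size, round-trip time and loss rates are assumed values, not measurements from NZGL or REANNZ, and the function name is hypothetical.

```python
import math


def mathis_throughput_bps(mss_bytes: float, rtt_s: float, loss_rate: float) -> float:
    """Approximate ceiling on steady-state TCP throughput, in bits per second.

    Uses the Mathis et al. approximation:
        rate <= (MSS / RTT) * (C / sqrt(p)),  with C = sqrt(3/2)
    """
    if mss_bytes <= 0 or rtt_s <= 0:
        raise ValueError("mss_bytes and rtt_s must be positive")
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in the range (0, 1]")
    c = math.sqrt(3 / 2)
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


if __name__ == "__main__":
    # Assumed, illustrative values: 1460-byte segments and a 20 ms round trip.
    for p in (1e-3, 1e-5, 1e-7):
        mbps = mathis_throughput_bps(1460, 0.020, p) / 1e6
        print(f"loss rate {p:.0e}: about {mbps:,.0f} Mbit/s per TCP flow")
```

Because the ceiling scales with 1/√p, a fourfold increase in loss halves the achievable rate, which is why even a small amount of loss on a long path can leave most of a fast link unused.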
“Traditional best-effort IP networks work well for most applications but the trend in science is to become ever more data-intensive,” says REANNZ CEO Steve Cotter.
“What we proposed to them was a concept that my team had developed when I was with the Lawrence Berkeley National Laboratory prior to coming to New Zealand.”
“We came up with a concept of the science DMZ which was using the same concept for a security DMZ where you move part of the network outside the firewall,” explains Cotter. “We were doing the same thing with the high performance infrastructure that they needed. So we created a data enclave outside the firewall that was still trusted and secure but provided a high-speed onramp to the backbone.
“Once it was on the backbone, it could move unfettered across the entire infrastructure to all their collaborators.”
“The hardest part was convincing the IT teams this is a secure way,” says Cotter. “CIOs are not going to get fired with a researcher saying ‘my data is moving slowly’. But they will get fired if they have a major compromise, a security breach.”
He says the science DMZ was a “pragmatic approach” as it gave the researchers the performance they needed without having to upgrade the entire campus infrastructure.
“It allows an institution to support high performance scientific applications by applying security policies appropriate to research data, not the complicated, performance sapping security measures required to protect business servers and desktop applications.”
REANNZ selected the Juniper Networks EX3300 Ethernet Switch as the endpoint for NZGL’s zero-loss firewall link. Cotter explains that the EX3300 is built to data centre operational standards, which are more demanding than those needed for access switches installed in wiring closets. “Essentially, it is a cost-effective device that can handle 10-Gigabit Ethernet connections at wire speed, without dropping packets.”
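To put the figures in the article side by side, the short sketch below estimates how long a four-terabyte sequencing data set would take to move at a given link rate. It is a back-of-envelope calculation under stated assumptions: decimal terabytes (10^12 bytes), a link used at a constant rate, and a hypothetical `efficiency` factor standing in for protocol overhead, which is not a figure from NZGL or REANNZ.

```python
def transfer_time_seconds(size_bytes: float, rate_bps: float, efficiency: float = 1.0) -> float:
    """Time to move size_bytes over a link running at rate_bps (bits per second).

    efficiency is an assumed fraction of the link rate actually achieved (0 < efficiency <= 1).
    """
    if size_bytes < 0 or rate_bps <= 0:
        raise ValueError("size_bytes must be non-negative and rate_bps positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    return size_bytes * 8 / (rate_bps * efficiency)


if __name__ == "__main__":
    four_terabytes = 4e12  # decimal terabytes, as an assumption
    for label, rate in (("10 Gbit/s", 10e9), ("1 Gbit/s", 1e9), ("100 Mbit/s", 100e6)):
        hours = transfer_time_seconds(four_terabytes, rate) / 3600
        print(f"{label}: about {hours:.1f} hours")
```

At a full 10 Gbit/s the transfer takes 3,200 seconds, a little under an hour. At 1 Gbit/s the same data set needs close to nine hours, and at an effective 100 Mbit/s it needs several days.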
“One of the things I have found, a lot of organisations, parts of their networks are not constructed out of high quality equipment,” says Lindsay. “We were very adamant with REANNZ that we want Tier 1 provider equipment.”
“We did not state we required Juniper but they came up with a Juniper solution and we were really happy with that,” says Lindsay. “Part of the attraction of the science DMZ is being able to do things around how network traffic is directed and so we’ve also embraced the concept of software-defined networking (SDN) as something we will leverage in the future.”
Having a vendor with a strong SDN roadmap was important, says Lindsay.
Lindsay says a partial science DMZ has been implemented, with genetic researchers at the University of Otago as its first end users, and NZGL is working with other research groups to connect them to the system in the coming months. “We want to have these arrangements in place with all of the major customers of NZGL,” says Lindsay.
Cotter says NZGL was the first organisation in New Zealand to give the science DMZ concept serious thought and apply it.
He says other organisations have asked REANNZ to look at their architecture to see if a similar system can be deployed. “It is definitely generating a lot of interest.
“We have to find ways to remove roadblocks to collaboration with our international partners,” he explains. Embracing data intensive science “is critical to our ability to be competitive as a nation”.
Phillip Lindsay reflects on how CIOs can help people working in big data science. “They can do a lot to assist researchers and make it far easier for them to move their data around and be able to do their science.
“The focus tends to be on the corporate environment and I think CIOs need to understand what the challenges are for researchers.
“Researchers are often left on their own trying to solve the technical issues. Many researchers are really happy to do that,” he states, “but from an organisational point of view it is not very efficient if every researcher is solving the same problem.”
The science DMZ, for instance, was not just about loading sequencing data sets into NZGL’s infrastructure as quickly as possible. Although this was important, he says, the project also gives flexibility for the varied kinds of work biological researchers do.
“They don’t really want to spend hours configuring arcane bits of software, talking with their IT people to get exceptions to firewall policies, and all the other issues, which is what they’d otherwise have to do. It doesn’t engender a particularly good customer experience.”
Related: The scientist as CIO
Phillip Lindsay is one of a handful of New Zealand CIOs with a PhD. His doctorate, gained in 1983, unusually combined chemistry and computing, though his early computer work was in electronics development.