Why machine learning could be the next frontier for data centre operations

Artificial intelligence is expected to transform a wide range of industries, as simple tasks are automated and carried out by machines. The IT sector is no different, with machine learning algorithms increasingly being targeted at automating and improving data centre operations.

A notable example has been Google, which recently revealed that it is using its own DeepMind technology to manage power consumption at its huge server farms, cutting the energy used for cooling by as much as 40 percent.

There is also potential for AI technology to automate functions carried out by IT operations teams. Machine learning offers a way to manage infrastructure and react quickly to faults without human intervention.

Speaking at VMworld Europe in Barcelona, Dr Wolfgang Krips, executive VP, Global Operations and GM, Amadeus IT Group – a technology vendor which supplies services to the airline industry – said the company is currently trialling the use of IBM’s Watson artificial intelligence platform to monitor its data centre infrastructure.

“We are doing work together with the Watson guys at IBM to try to see whether we can use their technology in working on operations parameters, previewing incidents, overseeing coming to root causes faster, because operating at scale requires a totally different arsenal of firepower.”

Amadeus runs a data centre with around 12,000 servers to support its business operations. Its infrastructure is small compared with that of the likes of Google, but like any organisation running a data centre, it faces ever greater demands from customers for new services while having to ensure little or no downtime.

“The problem space you have to deal with is becoming so complex and you are getting millions and millions of events per day. There is no human anymore that can look at it,” Krips said.

“You need to deal with those things in a fully automated fashion, which means you need machines that can make decisions. That can make decisions on whether that computer now has to be shut down or whether to do something different.

“What we are currently working on is trying to figure out that at least the first remedial actions are automatically initiated and only if the machine doesn’t go any further it is calling out for help.

“That is where the trend is heading and that is why we are looking into all of these areas.”
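Krips does not say how Amadeus implements this, but the pattern he describes (try first-line fixes automatically, and call a human only when they fail) can be sketched in a few lines. The following is a minimal, hypothetical illustration: the `Incident` record, the remediation steps, the health check and the `escalate` callback are all assumptions made for the example and do not reflect the Amadeus or IBM Watson systems.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Incident:
    # Hypothetical incident record; the fields are illustrative only.
    host: str
    symptom: str
    log: List[str] = field(default_factory=list)


# A remediation step acts on an incident; whether it worked is judged
# afterwards by a separate health check.
Remediation = Callable[[Incident], None]


def handle_incident(
    incident: Incident,
    remediations: List[Remediation],
    is_healthy: Callable[[Incident], bool],
    escalate: Callable[[Incident], None],
) -> bool:
    """Try first-line fixes in order; escalate to a human only if none work.

    Returns True if an automated step resolved the incident, False if it
    had to be escalated.
    """
    for step in remediations:
        step(incident)
        incident.log.append(f"tried {step.__name__}")
        if is_healthy(incident):
            incident.log.append("resolved automatically")
            return True
    # Every automated action has been exhausted: "call out for help".
    escalate(incident)
    incident.log.append("escalated to on-call engineer")
    return False


# Usage with a simulated service that recovers only after a restart.
state = {"restarted": False}

def clear_temp_files(inc: Incident) -> None:
    pass  # hypothetical first-line action that does not fix this fault

def restart_service(inc: Incident) -> None:
    state["restarted"] = True  # hypothetical action that does fix it

paged: List[Incident] = []
inc = Incident(host="web-42", symptom="high error rate")
ok = handle_incident(inc, [clear_temp_files, restart_service],
                     is_healthy=lambda i: state["restarted"],
                     escalate=paged.append)
# ok -> True; paged -> [] (no human was called)
```

Keeping the health check separate from the remediation steps means each automated action is verified before the next one is tried, so a human is paged only once the whole list has genuinely been exhausted.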

VMware’s chief technology officer of the EMEA region, Joe Baguley, said that the vendor – which provides software to manage data centre infrastructure – is also investigating how machine learning can benefit its customers.

“Our CTO of cloud management Mike Wookey is doing a lot of research on machine learning and AI, specifically for automation and management of platforms,” he explained.

“As [data centres] reach scale and speeds we have not reached before, we get to the point where graphs of alerts aren’t actually useful for humans to interpret in a fast enough time to maintain 100 percent availability.

“We need to surface-up AI-based responses to the infrastructure and applications to be able to take action on them. Not ‘so-and-so subsystem x’s disk seven is failing’, it should be ‘this is the impact on your system now’.”

He added: “You will probably see the management toolsets having much more of a machine learning, AI focus over the next four or five years.”

Baguley said that the aim is to predict when problems in infrastructure may arise, much as online retailer Amazon forecasts customer demand.

“It is the kind of thing Amazon are doing with their supply chain. Sometimes [the reason] you order things that are delivered the same day instead of the next day is that they know that in that area people tend to order that kind of thing around that time, so they ship stuff out. Why are we not doing that in data centres? Why are we not pre-shipping stuff out and spotting trends?”


Interest in machine learning is the next step in automating technology infrastructure.

Amadeus’ Krips added that a reduction in the need for manual data centre operations will mean a shift in the role of IT operations staff.

“In the future, my department or business unit, which is actively involved in day-to-day transactions and service configuring, will not do so,” he said.

“These guys will become automation engineers. Like in an automotive plant, the workers go away from the conveyor belt and they start programming the robots.

“If you want to go to these levels of stability and agility you have to change the whole way how you deliver the services. That is the big transformation that is happening.”