The financial sector may be one of the more cautious industries when it comes to adopting the cloud. But for HSBC, the ability to analyse large volumes of information and to access machine learning tools via APIs has served as a catalyst for its own cloud ambitions.
Banks are, by their nature, data-intensive organisations, and HSBC -- one of the world's largest banks, with 37 million customers and billions of dollars in assets -- is certainly no exception.
The 150-year-old lender holds around 100 petabytes of information across its organisation, and that figure is growing fast as customers change the way they bank, favouring digital interactions over traditional methods.
"We have more and more demands to do more with data, every day," says HSBC chief architect, David Knott. "And we also have more coming our way as well. Banks are data organisations at heart and we have a lot of data to manage, and increasing demand for deriving insight and value from that data."
While the growth of data has its benefits, allowing the bank to understand its business and customers better than ever, there are also challenges in managing and gaining usable insight from the vast amount of information. Machine learning tools will play a key role in meeting those challenges, says Knott, and that means adopting a "cloud-first" approach to the bank's analytics requirements.
"As we have seen, the power of machine learning and the ease of consumption through machine learning APIs, we realise that is going to be a huge part of our future," says Knott.
"We also realise that is something we would really struggle to service by trying to do it all on-premise," he continues. "We didn't have the native machine learning capability, we are not going to build a ground-up machine learning engineering capability - there are only so many people in the world that can do that. But people like Google have made it easy to consume from the cloud, so we can go there and that is going to make a huge difference to us."
Cloud proof of concepts
HSBC is now on its way to running data analytics and machine learning in the cloud, having completed a set of five proof of concept (PoC) projects in partnership with Google. CIO Darryl West revealed the pilot projects earlier this year at Google Cloud Next, with work centring on areas including anti-money laundering and risk simulations. West said the work would help the bank become a "simpler, better and faster organisation" and respond more quickly to customer demands.
Reporting to West, David Knott is responsible for technology design choices across the organisation, leading a team of chief architects across different lines of business. He says the bank is now in the process of moving terabytes of analytics data into Google Cloud.
"We did these PoCs very quickly, they were successful," Knott says, adding that the bank has a number of other projects lined up once the initial projects are fully up and running. "We are literally a few weeks away from going live with the five first set of use cases and then hot on the back of that we will say to all the other hundred people or so that have been waiting: 'you can now start deploying'."
From Hadoop to BigQuery and CloudML
The bank had previously run all of its analytics on-premise over the years, progressing from SQL to traditional data warehouses, before investing in Hadoop around 2011. "We had built what most people had built," says Knott: "a set of big data and analytics capabilities using various parts of the Hadoop ecosystem." This involved a mixture of open source and commercial technologies "which we had selected and then integrated together to basically build ourselves data lakes and analytics clusters and all that kind of stuff".
However, the Hadoop systems had limitations around scalability and flexibility.
"We got some value out of that but to be honest we found it hard to keep on top of, just hard to build skills at the pace required to integrate new technologies," Knott says.
"No matter how hard we ran there is always something new coming in that we wanted to get access to, but we couldn't get there quite fast enough to have really finished deploying what we were deploying previously.
"So it was hard to manage, hard to keep on top of, and also hard to scale. We had reasonable success but we were having these challenges."
The aim for the bank was to access machine learning capabilities without the need to run the systems on-premise.
On the Google Cloud side, HSBC is using a variety of tools: Bigtable and BigQuery for data analytics, Dataflow for data processing, Pub/Sub for event handling, and a range of Google's machine learning APIs, including the Data Loss Prevention API.
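To give a flavour of what analytics on BigQuery looks like in practice, here is a minimal, purely illustrative sketch — this is not HSBC's actual code, and the dataset, table and column names are invented. It builds an aggregation of the kind a bank might run over transaction data, and shows (without executing) how the query would be submitted via Google's official Python client.

```python
# Hypothetical example only: the dataset, table and column names below are
# invented for illustration and do not reflect HSBC's systems.
QUERY = """
SELECT account_id,
       COUNT(*)        AS txn_count,
       SUM(amount_gbp) AS total_gbp
FROM `my-project.payments.transactions`
WHERE txn_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY account_id
HAVING total_gbp > 250000
ORDER BY total_gbp DESC
"""

def run_query(sql):
    """Submit the query with Google's official BigQuery client.

    Requires the third-party google-cloud-bigquery package and GCP
    credentials, so it is defined but not called in this sketch.
    """
    from google.cloud import bigquery  # pip install google-cloud-bigquery
    client = bigquery.Client()
    return list(client.query(sql).result())

if __name__ == "__main__":
    print(QUERY.strip())
```

The appeal of this model, as Knott describes it, is that the heavy lifting — storage, scaling, and the machine learning engineering behind the APIs — sits with the cloud provider rather than with an in-house cluster team.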
"Around last year we started a conversation with all of the cloud providers to say 'show us what you have got'," Knott says, "and after some conversations we decided to work with Google on a series of PoCs, to answer three questions which were: if we bring some big data use cases to you, will they work, can we do the things we are trying to do? [Secondly] are they economic - can we do them at least the same price but hopefully a cheaper price than beforehand. Thirdly, is it easier, basically, which was really the big one."