Asia's Source for Enterprise Network Knowledge

Sunday, June 25th, 2017


Beyond HTTPS and Blacklisting: Malicious URL Detection With Behavioural Clustering

The comforting green padlock we see beside the URL in every modern browser gives us the assurance that the sites we visit are secure. Over the years, as we hand over sensitive personal information to the web, we have been conditioned to feel more secure if the websites we visit have Secure Sockets Layer, or SSL, certification.

Today, many sites use SSL certificates to encrypt their traffic, raising the barriers to cybercriminal activity, whether that is performing man-in-the-middle attacks or planting spyware when users log in to internet banking or email accounts. HTTPS-based communications have become widely used by both consumers and companies. As of June 2016, 45 percent of page loads on the Web used HTTPS. Yet only 55.4 percent of Alexa’s list of top websites were found to be implementing SSL securely.

However, modern malware has found ways to exploit HTTPS-based connections, using them to receive commands to start an attack, communicate directly with its peers and, essentially, receive updates from the rest of its network. This happens once a host computer has been compromised by sophisticated malware. The malware makes network connections to carry out malicious activities such as communicating with command-and-control (C&C) servers, exfiltrating data, and clicking ads for monetization.

Surge in malware, new websites and digitalisation

The number of malware types and their variants is steadily rising, with cybercriminals becoming more sophisticated in their methods. For instance, there was a 752 percent spike in the number of new ransomware families, fuelled by the availability of open-source ransomware and ransomware as a service (RaaS).

This trend looks unlikely to reverse, with both emerging and developed economies in Asia-Pacific embracing digitalisation and creating an environment attractive to cyber attackers. The thriving business climate, government support for going digital, and increasingly tech-savvy users make for a large group of potential victims in the eyes of cybercriminals.

As enterprises move towards a digital economy, they open themselves up to an unrelenting host of vulnerabilities. While technologies such as cloud and IoT devices have afforded more streamlined processes, they have also provided more avenues for attackers to exploit in more sophisticated ways.

In addition, the ease with which new sites can achieve SSL certification is increasing the possibility that harmful sites can slip through your browser’s defences before the URL is flagged and still appear secured with that comforting padlock.

Over the years, several solutions have surfaced to detect malicious HTTPS connections. Many are based on signature matching, which requires maintaining a large URL reputation database and updating it frequently. Some are machine-learning-based systems that have been trained on known malware connection data.

However, the exponential rise of malware and its variants makes detection a challenge, and a blacklist alone will not be sufficient to keep up with, detect, and defend against quickly proliferating and evolving malware. This points to a need for an efficient modelling method that can detect both existing malware and its unknown descendants, plugging the gap between signature updates.
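To make the blacklist gap concrete, here is a minimal Python sketch. It assumes the simplest possible form of signature matching, an exact lookup against a set of known-bad URLs. The URLs and the `is_blacklisted` helper are hypothetical and exist only for illustration; real reputation databases are far richer than this.

```python
# Hypothetical blacklist: one URL already seen in a past attack.
BLACKLIST = {
    "http://bad.example.test/gate.php?id=1234",
}

def is_blacklisted(url):
    """Exact-match lookup, the simplest form of signature matching."""
    return url in BLACKLIST

# The URL that was seen before is caught.
seen = "http://bad.example.test/gate.php?id=1234"

# A variant that moves to a new host with a new id slips through
# until someone adds it to the list.
variant = "http://bad2.example.test/gate.php?id=5678"

print(is_blacklisted(seen))     # known entry
print(is_blacklisted(variant))  # unknown variant
```

The variant keeps the same path and parameter layout, yet the lookup cannot see that resemblance. That missing notion of similarity is what a behavioural model is meant to supply.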

Boosting malicious URL detection with clustering

With cyber security threats and the challenges enterprises face constantly evolving, relying on SSL security alone or simply waiting for a signature database update is no longer effective.

Malicious URL attacks within the same family often occur together and are distinct in some way from other combinations. Clustering is a mathematical way of categorising recurring combinations of given parameters (i.e. combinations of URL parameters that are strongly related in malicious behaviour type) into discrete groups. In this case, clustering can aggregate incidents based on known malware attacks and compute whether an unknown attack is similar and strongly related to known ones.
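As a rough illustration of the last step, comparing an unknown URL against groups of known ones, the Python sketch below uses a nearest-centroid comparison. This is a deliberately simplified stand-in, not a full clustering algorithm and not any vendor's actual method. The four URL features, the family names, the sample URLs, and the distance threshold are all assumptions chosen for the example.

```python
from math import sqrt
from urllib.parse import urlparse, parse_qs

def url_features(url):
    """Turn a URL into a small numeric vector (hypothetical feature choice)."""
    parts = urlparse(url)
    segments = [s for s in parts.path.split("/") if s]
    digits = sum(ch.isdigit() for ch in parts.path + parts.query)
    return [
        float(len(segments)),               # path depth
        float(len(parse_qs(parts.query))),  # number of query parameters
        float(len(parts.query)),            # query string length
        float(digits),                      # digits in path and query
    ]

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def distance(a, b):
    """Euclidean distance between two vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_family(url, known, threshold):
    """Return (family, distance); family is None if nothing is close enough."""
    vec = url_features(url)
    centroids = {
        name: centroid([url_features(u) for u in urls])
        for name, urls in known.items()
    }
    family, dist = min(
        ((name, distance(vec, c)) for name, c in centroids.items()),
        key=lambda pair: pair[1],
    )
    return (family if dist <= threshold else None), dist

# Made-up URLs grouped by made-up family labels.
known = {
    "family_a": [
        "http://a1.example.test/gate.php?id=1234&key=ab12",
        "http://a2.example.test/gate.php?id=9876&key=cd34",
    ],
    "family_b": [
        "http://b1.example.test/img/2017/06/banner.jpg",
        "http://b2.example.test/img/2017/05/logo.jpg",
    ],
}

suspect = "http://new.example.test/gate.php?id=555&key=zz9"
unrelated = "https://shop.example.test/"

print(nearest_family(suspect, known, threshold=5.0))
print(nearest_family(unrelated, known, threshold=5.0))
```

The suspect URL has never been seen before, but its path depth and parameter layout sit close to the first group, so it is associated with that family. The unrelated URL is far from both groups and is left unassigned. A production system would use many more behavioural features and a proper clustering method to form the groups in the first place.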

While Portable Executable (PE) packing and binary obfuscation techniques are being adopted to evade traditional security tools, they do not hide some of the common network traits that clustering identifies. Wholesale reuse of malware code obtained in underground markets also makes behavioural clustering a potentially efficient malware detection tool. One example is Locky and Zepto, two ransomware types that share the same strain, consequences, and payment page. Clustering helped to unveil similarities among malware samples and to discover network-level relationships between the two malware families.

Clustering brings a dynamic edge to the scanning process as it can adapt independently, learning from past computations, to trawl for and identify hidden similarities between unknown malware variants and known malware samples. This may be especially effective considering the trend of code recycling, where common network traits in malware variants are not always hidden.


Francis Teo, South East Asia Regional Director at Hillstone Networks