IoT malware analysis at VARIOT
This blog entry describes the system we use to study the IoT malware ecosystem at large scale. It combines attacker-related features with binary analysis information, aiming to fill the gap in our understanding of that ecosystem. Malware binaries and scripts are executed and monitored in a controlled environment to record their behavior while transparently tracking every third-party connection. This lets us determine which types of external agents (third parties the malware connects to) are involved, what their function is, how they are connected to each other and to malware families and samples, and how they behave.
General overview of the System
The system is a framework for a global, large-scale, longitudinal study of the IoT malware landscape. It takes attacker-related information into account by incorporating all the involved agents into the analysis, and thereby builds a more comprehensive, dynamics-aware picture. The framework collects all available information about the external agents that the malware samples interact with.
The analysis pipeline was designed and implemented incrementally, by trial and error. Each analysis task was implemented as an independent module and then integrated into the overall framework, which distributes job execution among multiple parallel workers.
The tools and techniques used to implement the framework are briefly explained in the following sections.
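The idea of independent modules run by parallel workers can be sketched as follows. This is only an illustration under stated assumptions: the module functions, `analyze_sample` and `analyze_all` are hypothetical names, and a thread pool stands in for whatever job distribution the real framework uses.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical analysis modules: each takes the raw sample bytes and
# returns a dict with its own findings, independently of the others.
def file_size_module(data: bytes) -> dict:
    return {"size": len(data)}

def is_elf_module(data: bytes) -> dict:
    return {"is_elf": data[:4] == b"\x7fELF"}

MODULES = [file_size_module, is_elf_module]

def analyze_sample(data: bytes) -> dict:
    """Run every module on one sample and merge the results into one report."""
    report = {}
    for module in MODULES:
        report.update(module(data))
    return report

def analyze_all(samples: dict[str, bytes], workers: int = 4) -> dict[str, dict]:
    """Distribute the samples among parallel workers; returns name -> report."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        reports = pool.map(analyze_sample, samples.values())
        return dict(zip(samples.keys(), reports))
```

Adding a new analysis task then amounts to writing one more function and appending it to `MODULES`.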
Malware dataset
The system needs a fresh malware dataset as its starting point. For this purpose we use the Shadowserver honeypot, which collects all kinds of IoT malware every day. These samples are the first input of the analysis system.
File & Metadata analysis
The first step in the analysis pipeline is to retrieve metadata from the files. Certain fields in the ELF header, such as the architecture the file was compiled for, are required to set up the execution environment. We also retrieve additional information about each file, such as its size, name and type.
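As a rough illustration of this step, the architecture can be read directly from the first bytes of the ELF header. The sketch below uses only the Python standard library; the function name `elf_metadata` and the returned keys are assumptions, and the machine table is a small subset of the `e_machine` values defined by the ELF specification.

```python
import struct

# Subset of e_machine values from the ELF specification.
E_MACHINE = {2: "SPARC", 3: "i386", 4: "M68K", 8: "MIPS", 20: "PowerPC",
             40: "ARM", 42: "SH", 62: "x86_64", 183: "AARCH64"}

def elf_metadata(data: bytes) -> dict:
    """Return word size, endianness and target architecture of an ELF file."""
    if len(data) < 20 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}.get(data[4])            # EI_CLASS
    endian = {1: "little", 2: "big"}.get(data[5])  # EI_DATA
    if bits is None or endian is None:
        raise ValueError("invalid ELF identification bytes")
    # e_type and e_machine are two 16-bit fields right after e_ident (16 bytes).
    fmt = "<HH" if endian == "little" else ">HH"
    _e_type, e_machine = struct.unpack_from(fmt, data, 16)
    return {
        "bits": bits,
        "endianness": endian,
        "machine": E_MACHINE.get(e_machine, f"unknown({e_machine})"),
    }
```

The endianness has to be read before `e_machine`, because the multi-byte header fields are stored in the byte order of the target architecture.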
As the final part of this module, we retrieve the antivirus (AV) labels of each malware sample from the VirusTotal and VirusShare reports. If at least one antivirus engine in these services has labeled the sample, we feed the labels to the AVClass2 tool to obtain a normalized name for the malware family. AVClass2 is a state-of-the-art tool that normalizes AV labels, removes generic tokens and detects aliases among the set of labels assigned to a sample. Therefore, whenever a family is derived from a list of AV labels, it reflects a general consensus among different antivirus engines on the class (family) the malware belongs to.
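To give an intuition for what label normalization involves, here is a deliberately simplified stand-in. It is not AVClass2 and does not use its API: the token lists, the alias table and the `family_vote` function are illustrative assumptions, and the real tool is considerably more elaborate.

```python
import re
from collections import Counter
from typing import Optional

# Illustrative (assumed) lists; AVClass2 ships its own, much larger ones.
GENERIC = {"trojan", "linux", "backdoor", "malware", "generic", "ddos", "agent"}
ALIASES = {"bashlite": "gafgyt", "kaiten": "tsunami"}

def family_vote(labels: list[str], min_votes: int = 2) -> Optional[str]:
    """Pick the family token that the largest number of AV labels agree on."""
    votes = Counter()
    for label in labels:
        tokens = set()
        for tok in re.split(r"[^a-z0-9]+", label.lower()):
            # Drop short, numeric and generic tokens, then resolve aliases.
            if len(tok) < 4 or tok.isdigit() or tok in GENERIC:
                continue
            tokens.add(ALIASES.get(tok, tok))
        votes.update(tokens)  # each label votes at most once per token
    if not votes:
        return None
    family, count = votes.most_common(1)[0]
    return family if count >= min_votes else None
```

The `min_votes` threshold expresses the consensus idea: a family name is only reported when more than one engine supports it.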
Execution environment (LiSa++)
All executable files collected by the Shadowserver honeypot are analyzed statically and dynamically in a secure execution environment. This environment is based on LiSa, an open-source project that provides automated Linux malware analysis on various CPU architectures (x86_64, i386, ARM, MIPS and AARCH64). We have added new features: additional execution architectures (M68K, SPARC, SH4 and PowerPC), the ability to run bash and perl scripts, variable execution times that depend on the behavior of the malware, and a transparent proxy that filters attacks on third parties and prevents the malware from spreading. The system emulation with QEMU, the static analysis with radare2 and the dynamic analysis with SystemTap have not been modified from LiSa. In addition, the network traffic generated by the malware is saved to a PCAP file so that all connections with third parties can be extracted and analyzed.
Network Analysis Environment
From the PCAP file generated in the runtime environment we extract all connections with third parties, the messages exchanged, the ports and protocols used, etc. The agents extracted from the file are analyzed geographically, categorically and topologically, and checked against blacklists. If an agent is a web server, it is analyzed in more depth using a web crawler.
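The connection extraction can be sketched with the Python standard library alone. This is a minimal illustration, not the framework's implementation: the function name `third_party_endpoints` and the `local_ip` parameter are assumptions, and the parser only handles classic PCAP files with Ethernet framing, IPv4 and TCP/UDP (no VLAN tags, no IPv6, no pcapng).

```python
import socket
import struct

def third_party_endpoints(pcap: bytes, local_ip: str) -> set:
    """Return (remote_ip, remote_port, protocol) for packets sent by local_ip."""
    magic = pcap[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        order = "<"   # file written on a little-endian host
    elif magic == b"\xa1\xb2\xc3\xd4":
        order = ">"   # file written on a big-endian host
    else:
        raise ValueError("unsupported capture format")
    linktype = struct.unpack_from(order + "I", pcap, 20)[0]
    if linktype != 1:
        raise ValueError("only Ethernet captures are handled")

    endpoints = set()
    offset = 24  # size of the PCAP global header
    while offset + 16 <= len(pcap):
        # Record header: ts_sec, ts_usec, incl_len, orig_len (4 bytes each).
        incl_len = struct.unpack_from(order + "I", pcap, offset + 8)[0]
        frame = pcap[offset + 16 : offset + 16 + incl_len]
        offset += 16 + incl_len
        # Ethernet header is 14 bytes; EtherType 0x0800 means IPv4.
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue
        ip = frame[14:]
        ihl = (ip[0] & 0x0F) * 4                 # IPv4 header length in bytes
        proto = {6: "tcp", 17: "udp"}.get(ip[9])
        if proto is None or len(ip) < ihl + 4:
            continue
        src = socket.inet_ntoa(ip[12:16])
        dst = socket.inet_ntoa(ip[16:20])
        # TCP and UDP both store the destination port at bytes 2-3.
        dport = struct.unpack_from("!H", ip, ihl + 2)[0]
        if src == local_ip:
            endpoints.add((dst, dport, proto))
    return endpoints
```

Each resulting tuple identifies one candidate external agent, which can then be passed on to the geographic, categorical and blacklist checks.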
Monitoring system
A monitoring module has been developed to track the agents extracted in the network analysis. Every 3 hours it checks whether each agent is still active, whether its geographical location has changed and whether its domain name has changed.
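The comparison performed on each monitoring round can be sketched as a diff between two snapshots of the same agent. The `AgentSnapshot` fields and the `detect_changes` function are hypothetical names chosen for this example; how the liveness, geolocation and DNS data are actually obtained is outside the scope of the sketch.

```python
from dataclasses import dataclass
from typing import Optional

MONITOR_INTERVAL_SECONDS = 3 * 60 * 60  # agents are re-checked every 3 hours

@dataclass(frozen=True)
class AgentSnapshot:
    alive: bool                # did the agent respond in this round?
    country: Optional[str]     # geolocation of its IP address
    domain: Optional[str]      # domain name currently associated with it

def detect_changes(previous: AgentSnapshot, current: AgentSnapshot) -> list[str]:
    """Describe what changed between two consecutive monitoring rounds."""
    changes = []
    if previous.alive != current.alive:
        changes.append("went offline" if previous.alive else "came back online")
    if previous.country != current.country:
        changes.append(f"country: {previous.country} -> {current.country}")
    if previous.domain != current.domain:
        changes.append(f"domain: {previous.domain} -> {current.domain}")
    return changes
```

Storing only the changes, rather than every snapshot, keeps the longitudinal history of each agent compact.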
Results & Findings
Over two months (2022-03-28 to 2022-05-30), 2,925 executable malware files (binaries and scripts) and 453 non-executable malware files were collected. This table shows the distribution of the architectures found during that period. Almost half of the samples are MIPS and ARM; the other half are PowerPC, M68K, x86, SH, SPARC and various scripts.
Regarding the families encountered, three large families stand out: Mirai, Gafgyt and Tsunami. They behave similarly, and their functionality is divided into two modules:
- Propagation module. The malware tries to build as large a botnet as possible, so it uses different scanning and exploitation techniques to infect as many devices as possible.
- Attack module. The botnet performs a Distributed Denial-of-Service (DDoS) attack using different techniques such as SYN, UDP and HTTP floods.
Five types of agents related to the malware have been found: malware distributors, command and control (C&C) servers, DNS agents, certificate agents and simple communication agents. The two types with malicious behavior, malware distributors and C&C servers, have been analyzed in more depth.
Malware distributor
Malware campaigns usually rely on servers that distribute the malware payload; we call these servers malware distributors. Using a communication service (94% of them use HTTP), they distribute the same malware compiled for various IoT architectures.
Here is the distribution of the agents and binaries, with binaries shown in red and agents in black.
We found 221 malware distributors using HTTP and 7 using HTTPS. On average, each agent of this type is related to 12.5 binaries, because a malware distributor delivers the same malware for different architectures (the image shows one that distributes the same malware for 15 different architectures).
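An average such as "binaries per agent" can be computed from the list of observed agent-binary relations. The sketch below assumes a hypothetical edge list of `(agent, binary)` pairs; the function name and input format are not taken from the framework.

```python
from collections import defaultdict

def mean_binaries_per_agent(edges) -> float:
    """Average number of distinct binaries related to each agent."""
    by_agent = defaultdict(set)
    for agent, binary in edges:
        by_agent[agent].add(binary)  # a set ignores repeated observations
    if not by_agent:
        return 0.0
    return sum(len(b) for b in by_agent.values()) / len(by_agent)
```

Counting distinct binaries per agent matters here, because the same download can be observed many times during repeated executions.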
Command and Control
These agents are servers that send commands to infected devices and can trigger different malicious behaviors. The protocols found to carry commands are BitTorrent DHT (a P2P protocol), IRC and HTTP. In total, 678 such agents have been found, and each C&C controls 3.33 malware binaries on average.
Here is a list of commands from a C&C of a Tsunami binary sample.
It is also interesting to note that 30.3% of C&C agents act as malware distributors at the same time. In other words, most agents have a single function, but a smaller share have several.
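A share like this follows directly from the set of roles observed for each agent. In the sketch below, the role names `"cc"` and `"distributor"` and the function `dual_role_share` are assumptions made for the example.

```python
def dual_role_share(roles: dict[str, set[str]]) -> float:
    """Percentage of C&C agents that were also seen distributing malware."""
    cc_agents = [agent for agent, r in roles.items() if "cc" in r]
    if not cc_agents:
        return 0.0
    dual = sum(1 for agent in cc_agents if "distributor" in roles[agent])
    return 100.0 * dual / len(cc_agents)
```

Note that the denominator is the number of C&C agents, not the number of all agents, which matches how the percentage is stated in the text.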
Geographical Analysis
For both C&Cs and malware distributors, the country with the largest presence is the United States, which accounts for one-third of all C&C agents and malware distributors. The most frequent countries are the same in both cases: the United States, the Netherlands and Germany. Together, these three countries account for more than 60% of the total. This suggests that the different malware campaigns place their agents in the same countries.