In the last post, we looked at Locky for the first time and unpacked the main payload for analysis. This time, we will go into some detail on Locky’s behavior. I will not reverse every code path; instead, I will summarize the main functionality of each routine. If you have time, please dive in, as I think the experience is very rewarding.
3. Config stuffs
Even after unpacking, the executable in memory is still not as easy to read as we would like. First, let’s open the dumped PE file in IDA. IDA will probably complain about not being able to resolve some addresses. That is fine. We may not be able to extract an executable that runs properly, but we can certainly analyze Locky’s behavior with it.
Press Ctrl-E and go to “start” as the main entry. This should be the same function passed to CreateThread earlier. If your IDA shows EBP-XXX instead of EBP+var_2C, for example, IDA has not recognized that this function uses an EBP-based frame. Click on the name “start”, press Alt-P, and check “BP-based frame” in the right panel. IDA will then reanalyze the function and create the local variables and arguments.
In the “start” function, you should notice GetModuleHandleA and a few VirtualAlloc calls that look suspicious. This is where the main configuration gets decoded from inside the PE file. First, at 0x405174, eax is assigned edi, which holds the return value of the GetModuleHandleA call at 0x40515D. This is the very beginning of the image. Then, the malware scans forward for a memory location with the following property: if [eax] XOR 0x88BBDD8D == [eax + 4] AND [eax] XOR 0xDDBCA2B2 == [eax + 8], then EAX points at the beginning of the configuration block. From this we can derive that the configuration block is defined as:
typedef struct _configuration_block {
    DWORD dwMarker0;
    DWORD dwMarker1;     // dwMarker1 = dwMarker0 XOR 0x88BBDD8D
    DWORD dwMarker2;     // dwMarker2 = dwMarker0 XOR 0xDDBCA2B2
    DWORD dwEncodedKey;  // dwEncodedKey = dwKey XOR dwMarker0
    DWORD dwEncodedSize; // dwEncodedSize XOR dwMarker0 = size
    VOID* pEncodedConfig;
} ConfigBlock;
The encoded configuration block is copied to a new memory region allocated at 0x4051E3. Then, at 0x405231, the same configuration block is decoded into another memory region allocated at 0x40520E. We can easily walk through this block of code and re-implement the search and decode steps to pull the config out of any Locky sample. This part of the code seems pretty consistent across the few Locky samples that I have looked into.
Figure: Search and decode configuration
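The search step can be sketched in a few lines of Python. This is only a rough sketch under my own assumptions: the function name find_config_block is made up, the scan walks every byte offset of a dumped image (the sample may step differently), and it stops at recovering the key and size from the header. The actual decode loop is not reproduced here.

```python
import struct

MARKER1_XOR = 0x88BBDD8D  # dwMarker1 == dwMarker0 XOR this constant
MARKER2_XOR = 0xDDBCA2B2  # dwMarker2 == dwMarker0 XOR this constant

# dwMarker0, dwMarker1, dwMarker2, dwEncodedKey, dwEncodedSize (little endian)
HEADER = struct.Struct("<5I")


def find_config_block(image):
    """Return (offset, key, size) for the first header that matches, else None.

    Hypothetical helper: scans every byte offset, which is an assumption.
    """
    for off in range(len(image) - HEADER.size + 1):
        m0, m1, m2, enc_key, enc_size = HEADER.unpack_from(image, off)
        if m0 ^ MARKER1_XOR == m1 and m0 ^ MARKER2_XOR == m2:
            # Key and size are stored XORed with dwMarker0.
            return off, enc_key ^ m0, enc_size ^ m0
    return None
```

Feeding it the bytes of the dumped PE should give you the offset of the block, from which you can follow the decode at 0x405231 by hand.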
4. Fancy stuffs
The next interesting thing to look at is the function call at 0x40525F, which calls 0x406634. 0x406634 starts by calling 0x4064F3. This one looks up the address of NtQueryVirtualMemory in ntdll.dll. Then, it compares the first byte with 0xB8 and the next few bytes with 0x00. This piece of code is verifying that NtQueryVirtualMemory begins with a “mov eax” (opcode 0xB8), which, naturally, it does. NtQueryVirtualMemory loads EAX with its system call number before calling the system call dispatch stub to serve the request. In this case, if you disassemble NtQueryVirtualMemory, the system call number is 0x10B.
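To make the check concrete, here is a small Python sketch that does the same test on the first bytes of the stub. The helper name is made up, and exactly which bytes Locky compares with 0x00 is my assumption (I take them to be the two high bytes of the immediate).

```python
import struct


def read_syscall_number(stub):
    """Return the imm32 from a leading `mov eax, imm32`, or None if the
    stub does not look like an untouched syscall stub.

    Assumption: the zero check covers the two high bytes of the immediate.
    """
    if len(stub) < 5 or stub[0] != 0xB8:   # 0xB8 = mov eax, imm32
        return None
    if stub[3:5] != b"\x00\x00":           # syscall numbers are small
        return None
    return struct.unpack_from("<I", stub, 1)[0]
```

For the stub described above, the bytes B8 0B 01 00 00 decode to 0x10B, while a stub that already starts with a jmp (0xE9) is rejected.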
Going back to the malware, we see it checks NtQueryVirtualMemory to make sure it has not been changed. Then, we see a call to VirtualAlloc, a memcpy done with rep movsb, two VirtualProtect calls, and ultimately, at 0x40658F, the magic 0xE990 being stored at the beginning of NtQueryVirtualMemory. Written as a little-endian word, 0xE990 becomes the bytes 90 E9: a nop followed by the opcode of a relative jmp. But where does it jump to? It jumps to whatever is stored at location 0x406588, which, if you trace all the way back, is the memory region allocated at 0x40653C. This memory is filled with the data from 0x4147B0, which we will now call the Patch function.
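If you want to reproduce the patch bytes yourself, the arithmetic looks roughly like this. It is a sketch that assumes the usual nop + jmp rel32 layout (6 bytes in total); build_jmp_patch and both addresses in the usage note are made up for illustration.

```python
import struct


def build_jmp_patch(patch_addr, target):
    """Build `nop; jmp rel32` as it would be written at patch_addr.

    rel32 is relative to the end of the jmp instruction, which sits at
    patch_addr + 1 and is 5 bytes long, hence the + 6.
    """
    rel32 = (target - (patch_addr + 6)) & 0xFFFFFFFF
    # 0xE990 packed little endian gives the bytes 90 E9.
    return struct.pack("<HI", 0xE990, rel32)
```

For example, patching a function at 0x1000 to land at 0x2000 yields 90 E9 FA 0F 00 00.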
In summary, subroutine 0x4064F3 checks NtQueryVirtualMemory to make sure no one has tampered with it. If the function has not been patched, it allocates a memory region, copies over the code from subroutine 0x4147B0, then patches NtQueryVirtualMemory to jmp to that copy. The Patch function simply changes the memory type in the returned data. We will see the significance of this soon.
5. Fancy stuffs 2
After the malware patches NtQueryVirtualMemory, it allocates another 0x3000 bytes. This time, at 0x406676, it copies the entire image in memory over to the new location using rep movsb. Then it calls 0x4065A2, which looks like it fixes up a lot of relocations. Then Locky calls 0x406627 at 0x40668C. If you have been running things in a debugger (hopefully in a VM) and try to step over this function, you will notice some strange behavior from your debugger: the function does not seem to return. Jumping in, at 0x40662D, Locky loads the address of its own return address into ECX. The return address is then adjusted by the value in EAX, which is the difference between the address of the new region and the value of arg0. Taking a step back, arg0 is EBX, which is the address of this image in memory, and the new region is the one Locky allocated at 0x406664. Therefore, when the ret instruction executes, we inevitably jump to the same offset in the new memory region and continue executing there. Locky also wipes the current image in memory with zeros at 0x4066BA.
Figure: VirtualAlloc 0x3000 bytes, and copy the image over to the new memory region at 0x406676
Figure: This function returns to another memory region. Return address is patched at 0x40662A
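The return address trick boils down to one addition. Here is a tiny Python sketch of the arithmetic; the function name and the example addresses in the usage note are hypothetical, not values taken from the sample.

```python
def rebased_return(ret_addr, old_base, new_base):
    """Return address after moving execution from old_base to new_base.

    The offset inside the image is unchanged; only the base moves.
    All values are treated as 32-bit.
    """
    delta = (new_base - old_base) & 0xFFFFFFFF
    return (ret_addr + delta) & 0xFFFFFFFF
```

With an image at 0x400000 copied to a region at 0x2A0000, a return address of 0x406691 becomes 0x2A6691, so execution resumes at the same offset inside the copy.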
Now, if our OpSec engineer is examining this machine image, they will start noticing something strange. The executable is running from a PRIVATE memory region (since Locky allocated it and returned into it). That is definitely not normal behavior and would raise red flags. You know what would be normal? The new memory region being of type IMAGE, which indicates that the OS loaded an image at that region. That’s why Locky patched NtQueryVirtualMemory earlier: to make sure its behavior does not stand out too much.
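To picture what the hook does to the result, here is a rough Python model of rewriting the Type field of a 32-bit MEMORY_BASIC_INFORMATION. MEM_PRIVATE and MEM_IMAGE are the real Windows constants; the function name is made up, and I am assuming the hook only touches regions reported as private (I did not verify which regions the Patch function actually filters on).

```python
import struct

MEM_PRIVATE = 0x20000
MEM_IMAGE = 0x1000000

# 32-bit MEMORY_BASIC_INFORMATION: BaseAddress, AllocationBase,
# AllocationProtect, RegionSize, State, Protect, Type (7 DWORD-sized fields)
MBI32 = struct.Struct("<7I")


def spoof_region_type(mbi):
    """Return a copy of the structure with a PRIVATE type reported as IMAGE."""
    fields = list(MBI32.unpack(mbi))
    if fields[6] == MEM_PRIVATE:   # assumption: only private regions change
        fields[6] = MEM_IMAGE
    return MBI32.pack(*fields)
```

Any tool that trusts the answer of NtQueryVirtualMemory inside the infected process would then see an ordinary image-backed region.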
6. Main stuffs
Locky moves on to call 0x40B4E1. If you have done a lot of reversing, you will notice that this function looks somewhat familiar. There is __SEH_prolog and GetStartupInfoW. It calls HeapSetInformation and validates the MZ and PE magic signatures. It also calls GetCommandLineA and parses the command line into argc and argv. It is, indeed, CRTStartMain, which eventually calls the main() function at 0x4B489.
A quick note on Locky and its string handling. I believe Locky uses the Visual Studio XString class to handle all of its strings. One feature that you need to be aware of is that an XString has this layout:
typedef struct __x_string {
    union {
        TCHAR* pszStr;
        TCHAR szStr[0x10];
    };
    DWORD len;
} XString;
If len is greater than or equal to 0x10, the first four bytes at offset 0x0 are a pointer to the string. If len is less than 0x10, the string content is embedded inside the structure itself. The same applies to wchar strings, where len is checked against 0x08 instead of 0x10.
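When you dump these structures from memory, a small helper saves a lot of squinting. Below is a Python sketch for the narrow-char case that follows the simplified layout above (0x10-byte union, then len). read_xstring and the read_mem callback are names I made up; read_mem stands for whatever your debugger or dump reader uses to fetch bytes at an address.

```python
import struct


def read_xstring(obj, read_mem):
    """Decode a narrow-char XString from its raw structure bytes.

    obj      -- at least 0x14 bytes: the 0x10-byte union followed by len
    read_mem -- hypothetical callback read_mem(address, size) -> bytes
    """
    (length,) = struct.unpack_from("<I", obj, 0x10)
    if length >= 0x10:
        # Long string: the first DWORD is a pointer to the characters.
        (ptr,) = struct.unpack_from("<I", obj, 0)
        return read_mem(ptr, length)
    # Short string: the characters live inside the structure itself.
    return obj[:length]
```

For wide strings the same idea applies with a threshold of 0x08 and two bytes per character.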
Following Locky’s logic from this point is pretty straightforward. At 0x44BC9, Locky calculates a unique ID from the volume name of the infected system. The ID is calculated at 0x46BD9. First, Locky gets the volume name. It then extracts the value between ‘{’ and ‘}’, inclusive. Locky then calculates the MD5 of that string and takes the first 16 hex characters of the digest as the ID for the infected system (see gensystemid in the attached script). This ID is used in various places throughout, including communication with the C2 server and the generation of the various random-looking strings stored in the registry.
Locky then calculates a random-looking string to use as a key name under HKCU\Software\ . This string is derived from the system ID calculated earlier. All configuration is stored under this key, including the public key received from the C2 server, the ransom text, and the main flag indicating that the entire system has been encrypted. To protect yourself against this version of Locky, simply set this flag to YES, which will cause Locky to stop executing.
Locky then starts gathering system information at 0x44D35 and requests a public key from the C2 server. If you have a PCAP of the communication to and from the C2, you can use the attached script to decode all the traffic.
Now, you can look at the details of Locky’s operation. The main encryption code for C2 communication is between 0x47D5D and 0x47DB5. Using the properties of XOR, we can derive the matching decryption code; both are in the attached script as the client_encrypt and client_decrypt functions. The main decryption code for data received from the C2 is between 0x47FDB and 0x4802C. We can likewise derive the encryption logic; both are in the attached script as server_encrypt and server_decrypt.
All messages
between the sample and C2 use the following format:
[0x10 bytes of MD5 of plaintext][plain-text, variable length]
The entire
message is then encrypted using client_encrypt if it is from the
infected system, or server_encrypt if it comes from C2 server.
Knowing that, you can mock your own C2 server to play with the sample
as you see fit.
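If you do build a mock C2, the framing around the cipher is simple enough to sketch. The two helpers below are hypothetical names of mine; they take the cipher as a callable so you can plug in client_encrypt, server_decrypt and friends from the attached script. Checking the MD5 on receipt is my own addition for the mock, not something I traced in the sample.

```python
import hashlib


def frame_message(plaintext, encrypt):
    """Prepend the 0x10-byte MD5 of the plaintext, then encrypt everything."""
    return bytes(encrypt(hashlib.md5(plaintext).digest() + plaintext))


def unframe_message(blob, decrypt):
    """Decrypt, split off the 0x10-byte MD5 and verify it against the rest."""
    data = bytes(decrypt(blob))
    digest, plaintext = data[:0x10], data[0x10:]
    if hashlib.md5(plaintext).digest() != digest:
        raise ValueError("MD5 does not match the plaintext")
    return plaintext
```

To answer the sample from a mock server you would frame with server_encrypt, and to read what the sample sends you would unframe with client_decrypt.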
Locky then enumerates all logical volumes and creates a worker thread for each volume. The thread starts by searching for all interesting files, meaning files with any of the extensions listed in the sample. The list of included extensions is at 0x54224. This sample also skips Windows-specific directories and files, which are listed at 0x5CD8. The thread adds all the interesting files to a list, then goes on to encrypt each file in the list.
For each file, it generates 0x10 random bytes at 0x4256F and encrypts those 0x10 bytes using the C2 public key. The randomly generated 0x10 bytes are used as a session key to encrypt the file with the AES 128 or AES 192 algorithm. The thread also appends the following FileInfo structure to the end of each file:
typedef struct _File_Info {
    DWORD magic0; // 0x8956FE93
    BYTE SystemID[0x10];
    BYTE SessionKey[0x100];
    DWORD magic1; // 0xD41BA12A
    CHAR szOriginalFileName[MAX_PATH];
    _WIN32_FILE_ATTRIBUTE_DATA FileAttribute;
} FileInfo;
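Given that structure, pulling the trailer out of an encrypted file is a one-liner with struct. The sketch below assumes no padding between fields, MAX_PATH = 260 single-byte characters, and a 36-byte WIN32_FILE_ATTRIBUTE_DATA, which gives a 0x240-byte trailer; parse_file_info is a name I made up.

```python
import struct

MAGIC0 = 0x8956FE93
MAGIC1 = 0xD41BA12A

# magic0, SystemID, SessionKey, magic1, szOriginalFileName, FileAttribute
FILE_INFO = struct.Struct("<I16s256sI260s36s")


def parse_file_info(data):
    """Parse the FileInfo trailer from the end of an encrypted file.

    Returns a dict, or None if the data is too short or the magics differ.
    """
    if len(data) < FILE_INFO.size:
        return None
    magic0, system_id, session_key, magic1, name, attrs = FILE_INFO.unpack(
        data[-FILE_INFO.size:]
    )
    if magic0 != MAGIC0 or magic1 != MAGIC1:
        return None
    return {
        "system_id": system_id,
        "session_key": session_key,             # still encrypted with the C2 key
        "original_name": name.split(b"\x00", 1)[0],
        "attributes": attrs,                    # raw WIN32_FILE_ATTRIBUTE_DATA
    }
```

The two magic values make a handy sanity check when you are not sure whether a file was actually processed by this version of Locky.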
For each file, the malware also generates a new random name using the following format:
[0x10 bytes SystemID][0x10 bytes random hex string].locky
The filename generation happens between 0x422CD and 0x4240C. You can follow along in the debugger to see how the names are generated and used. After all the files are encrypted, the thread for each volume sends its statistics back to the C2 server.
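The name format described above is easy to recognize on disk. Here is a small Python sketch that splits such a name into its two halves; split_locky_name is a made-up helper, and accepting both upper and lower case hex digits is my assumption.

```python
import re

# [0x10 chars SystemID][0x10 chars random hex string].locky
LOCKY_NAME = re.compile(r"([0-9A-Fa-f]{16})([0-9A-Fa-f]{16})\.locky")


def split_locky_name(name):
    """Return (system_id, random_part) for a Locky-style name, else None."""
    match = LOCKY_NAME.fullmatch(name)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

Since the first half is the SystemID, every encrypted file on one machine should share the same 16-character prefix.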
The main thread waits for all worker threads to finish before setting the desktop wallpaper to the instruction text received from the C2 server. The text is customized based on the infected system’s default language.
#!/bin/env python
# htnhan aka khoai huynh[.]t[.]nhan[@]gmail_dot_com
# implements most of locky crypto stuffs:
#   client_encrypt: Encrypts data coming from malware to C2
#                   calculate and prepend MD5 yourself please.
#   client_decrypt: Decrypts stuffs encrypted with client_encrypt
#   server_encrypt: Encrypts data coming from C2 to malware.
#                   calculate and prepend MD5 yourself please.
#   server_decrypt: Decrypts stuffs encrypted with server_encrypt
#   gensystemid   : Get volume name and generate SystemID from it
#   genregkey     : Generate registry keys using SystemID to store
#       - Main config stuffs at HKCU\Software\<string0>
#       - C2 PUBLICKEYBLOB at HKCU\Software\<string0>\<string2>
#       - instructions text at HKCU\Software\<string0>\<string3>
#       - YES flag at HKCU\Software\<string0>\<string4>

import sys
import ctypes
import hashlib

K0 = 0xCD43EF19
K1 = 0xAFF49754

# rol, ror are stolen from somewhere on the internet....
# with some modification.
# maybe https://gist.github.com/c633/a7a5cde5ce1b679d3c0a
rol = lambda val, r_bits: \
    (val << r_bits%32) & (2**32-1) | \
    ((val & (2**32-1)) >> (32-(r_bits%32)))

ror = lambda val, r_bits: \
    ((val & (2**32-1)) >> r_bits%32) | \
    (val << (32-(r_bits%32)) & (2**32-1))


def client_encrypt(idata):
    '''encryption part for client'''
    key = K0
    plain = bytearray(idata)
    ctext = bytearray()
    for i, v in enumerate(plain):
        ctext.append(((ror(key, 0x05) - rol(i, 0x0D) & 0xFF) ^ v) & 0xFF)
        # the key schedule is fed the plaintext byte
        tmp = rol(v, (i & 0xFF) & 0x1F) + ror(key, 0x1)
        key = tmp ^ (ror(i, 0x17) + 0x53702f68) & 0xFFFFFFFF
    return ctext


def client_decrypt(idata):
    '''This one decrypts things encrypted by the infected system'''
    key = K0
    plain = bytearray(idata)
    ctext = bytearray()
    for i, v in enumerate(plain):
        n = ((ror(key, 0x05) - rol(i, 0x0D) & 0xFF) ^ v) & 0xFF
        ctext.append(n)
        tmp = rol(n, (i & 0xFF) & 0x1F) + ror(key, 0x1)
        key = tmp ^ (ror(i, 0x17) + 0x53702f68) & 0xFFFFFFFF
    return ctext


def server_encrypt(idata):
    '''This one encrypt data on C2 before sending to Locky'''
    key = K1
    ptext = bytearray(idata)
    ctext = bytearray()
    for i, v in enumerate(ptext):
        # inverse of server_decrypt: add instead of subtract, and feed the
        # plaintext byte to the key schedule
        num = (v + i + rol(key, 0x03)) & 0xFF
        ctext.append(num)
        key = (key+ror(v,0x0B)^rol(key,0x05)^i-0x47CB0D2F)&0xFFFFFFFF
    return ctext


def server_decrypt(idata):
    '''This one decrypts data received from C2.'''
    key = K1
    ctext = bytearray(idata)
    ptext = bytearray()
    for i, v in enumerate(ctext):
        num = (v - i - rol(key, 0x03)) & 0xFF
        ptext.append(num)
        key = (key+ror(num,0x0B)^rol(key,0x05)^i-0x47CB0D2F)&0xFFFFFFFF
    return ptext


def pprint(buf):
    for i, v in enumerate(buf):
        if i % 0x10 == 0:
            print ''
        print "%02X" % (v,),


def shrd(dst, src, cnt):
    return (((src << 32) + dst) >> cnt) & 0xFFFFFFFF


def shld(dst, src, cnt):
    out = ((src << 32) + dst) << cnt
    out |= (src >> 32-cnt)
    return out & 0xFFFFFFFF


def myadd(a, b):
    out = a + b
    c = out > (2**32-1)
    return 0xFFFFFFFF & (out), c


ROUND = 7

def mycrypt(h, l, idx):
    for i in xrange(ROUND):
        eax = shrd(l, h, 0x19) ^ (0xFFFFFFFF & (l << 7))
        ecx = shld(h, l, 0x07) ^ (h >> 0x19)
        esi, c = myadd(rol(i, 7), eax)
        edi = (ecx+c) & 0xFFFFFFFF
        esi, c = myadd(esi, 0xFFFFFFFF & (idx< [...]

[...] string
    0x02: value name to store C2 publickeyblob
    0x03: value name to store instructions text
    0x04: value name to mark encryption finished
    System ID can be generated with gensystemid.
    '''
    h, l = int(idstr[:8], 16), int(idstr[8:], 16)
    out = str()
    h, l = mycrypt(h, l, idx)
    size = 0x8 + (shrd(l, h, 0x5) & 0x7)
    for i in range(size):
        h, l = mycrypt(h, l, i)
        tmp = (l & 0xff) - 1
        h, l = mycrypt(h, l, i)
        value = l & 0xff
        if tmp % 3 == 0:
            ascii_code = (value % 26) + ord('A')
        elif tmp % 3 == 1:
            ascii_code = (value % 26) + ord('a')
        else:
            ascii_code = (value % 10) + ord('0')
        out += chr(ascii_code)
    return out


def getvolname():
    kernel32 = ctypes.windll.kernel32
    buf = ctypes.create_unicode_buffer(1024)
    kernel32.GetVolumeNameForVolumeMountPointW(
        ctypes.c_wchar_p("C:\\"),
        buf,
        ctypes.sizeof(buf)
    )
    return buf.value


def gensystemid():
    vname = getvolname()
    print vname
    n1, n2 = vname.index('{'), vname.index('}')
    vname = vname[n1:n2+1]
    print vname
    m = hashlib.md5()
    m.update(vname)
    sid = m.hexdigest()[:0x10].upper()
    return sid


if __name__ == '__main__':
    print 'Generating registry keys....'
    SID = gensystemid()
    for idx in [0, 2, 3, 4, 0xFFFFFFFB]:
        print '0x%08x - %s' % (idx, genregkey(SID, idx))