Sunday, April 24, 2016

Reversing locky -- The juicy parts

Last post, we looked at Locky for the first time and attempted to unpack the main payload for analysis. This time, we will go into some detail on Locky’s behavior. I will not reverse every code path, but rather give a summary of the main functionality of each routine. If you have time, please dive in, as I think the experience is very rewarding.

3. Config stuffs

Even after unpacking, the executable in memory is still not as easy to read as we would like. First, let’s open the dumped PE file in IDA. IDA will probably complain about not being able to resolve some addresses. That is fine. We may not be able to extract an executable that runs properly, but we can surely analyze Locky’s behavior with it.
Press Ctrl-E and go to “start” as the main entry. This should be the same function used for CreateThread earlier. If your IDA shows EBP-XXX instead of EBP+var_2C, for example, that means IDA does not recognize that this function uses an EBP-based frame. Click on the name of “start”, press Alt-P, and check “BP-based frame” on the right panel. IDA will then re-analyze the function and create all kinds of local variables and arguments.
In the “start” function, you should notice GetModuleHandleA and a few VirtualAlloc calls that look suspicious. This is where the main configuration gets decoded from inside the PE file. First, at 0x405174, eax is assigned edi, which holds the output of the GetModuleHandleA call at 0x40515D. This is the very beginning of the image. Then, the malware starts to search for a memory location with the following property: if [eax] XOR 0x88BBDD8D == [eax + 4] AND [eax] XOR 0xDDBCA2B2 == [eax + 8], then EAX points at the beginning of the configuration block. We can derive that the configuration block is defined as:
typedef struct _configuration_block {
    DWORD dwMarker0;
    DWORD dwMarker1; // dwMarker1= dwMarker0 XOR 0x88BBDD8D
    DWORD dwMarker2; // dwMarker2 = dwMarker0 XOR 0xDDBCA2B2
    DWORD dwEncodedKey; //dwEncodedKey = dwKey XOR dwMarker0
    DWORD dwEncodedSize; //dwEncodedSize^dwMarker0= size
    VOID* pEncodedConfig;
} ConfigBlock;
The encoded configuration block is copied to a new memory region allocated at 0x4051E3. Then, at 0x405231, the same configuration block is decoded into a new memory region allocated at 0x40520E. We can easily walk through this block of code and re-implement the search and decode logic to grab the config out of any Locky sample. This part of the code seems pretty consistent across the few Locky samples that I have looked into.
[Figure: Search and decode configuration]
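The marker test described above is easy to try against a dumped image. Below is a minimal sketch that only locates the header and recovers the key and size from it. Scanning at every byte offset is an assumption (the sample may step differently), the function name is made up for illustration, and the decoding of the configuration body itself is not covered here.

```python
import struct

MARKER1_XOR = 0x88BBDD8D  # dwMarker1 == dwMarker0 ^ MARKER1_XOR
MARKER2_XOR = 0xDDBCA2B2  # dwMarker2 == dwMarker0 ^ MARKER2_XOR
# dwMarker0, dwMarker1, dwMarker2, dwEncodedKey, dwEncodedSize
HEADER = struct.Struct("<5I")


def find_config_block(image):
    """Return (offset, key, size) for the first matching header, or None.

    Assumption: every byte offset is a candidate position.
    """
    data = bytes(image)
    for off in range(len(data) - HEADER.size + 1):
        m0, m1, m2, enc_key, enc_size = HEADER.unpack_from(data, off)
        if m0 ^ MARKER1_XOR == m1 and m0 ^ MARKER2_XOR == m2:
            # Key and size are both stored XORed with dwMarker0.
            return off, enc_key ^ m0, enc_size ^ m0
    return None


if __name__ == "__main__":
    # Synthetic header with made-up values, preceded by 7 padding bytes.
    m0 = 0x11223344
    demo = b"\x00" * 7 + HEADER.pack(
        m0, m0 ^ MARKER1_XOR, m0 ^ MARKER2_XOR, 0xCAFEBABE ^ m0, 0x200 ^ m0)
    print(find_config_block(demo))
```

To use it on a real dump, read the file in binary mode and pass the bytes in; the returned offset is relative to the start of the file, not to the image base.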


4. Fancy stuffs

The next interesting thing to look at is the function call at 0x40525F, which calls 0x406634. 0x406634 starts by calling 0x4064F3. This one looks up the address of NtQueryVirtualMemory in ntdll.dll. Then, it compares the first byte with 0xB8, and compares the next few bytes with 0x00. This piece of code is verifying that NtQueryVirtualMemory starts with a “mov eax, imm32” (opcode 0xB8) which, naturally, it does. NtQueryVirtualMemory loads EAX with its system call number before calling the system call dispatcher to serve the request. In this case, if you disassemble NtQueryVirtualMemory, the system call number is 0x10B.
Going back to the malware, we see it checks NtQueryVirtualMemory to make sure it has not been changed. Then, we see a call to VirtualAlloc, a memcpy with rep movsb, two VirtualProtect calls, and ultimately the magic 0xE990 being stored at the beginning of NtQueryVirtualMemory at 0x40658F. Stored little-endian, 0xE990 becomes the bytes 90 E9: a nop followed by the opcode of a relative jmp. But, where does it jump to? It jumps to whatever is stored at location 0x406588, which, if you trace all the way back, is the memory region allocated at 0x40653C. This memory is filled with the code from 0x4147B0, which we will now call the Patch function.
In summary, subroutine 0x4064F3 checks NtQueryVirtualMemory to make sure no one has played with it. If no one has patched the function, it allocates a memory region, copies over the code from subroutine 0x4147B0, then patches NtQueryVirtualMemory to jmp to that copy. The Patch function simply changes the memory type in the returned data. We will see the significance of this soon.
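To make the byte-level picture concrete, here is a small sketch of the two pieces: the sanity check on the function prologue, and the 6 bytes that end up at the start of NtQueryVirtualMemory. The function names are made up, and which of “the next few bytes” are compared with 0x00 is an assumption (here: the two high bytes of the immediate). The rel32 arithmetic is the standard encoding of a near relative jmp.

```python
import struct


def looks_like_syscall_stub(prologue):
    """True if the bytes look like `mov eax, imm32` with a small immediate.

    Assumption: the bytes compared with 0x00 are the two high bytes of the
    immediate, which are zero for any syscall number below 0x10000.
    """
    p = bytearray(prologue[:5])
    return len(p) == 5 and p[0] == 0xB8 and p[3] == 0 and p[4] == 0


def build_patch(func_addr, target_addr):
    """Return the 6 bytes 90 E9 <rel32> that send func_addr to target_addr."""
    # The nop sits at func_addr and the jmp opcode at func_addr + 1, so the
    # jmp instruction ends at func_addr + 6; rel32 is measured from there.
    rel32 = (target_addr - (func_addr + 6)) & 0xFFFFFFFF
    return struct.pack("<HI", 0xE990, rel32)


if __name__ == "__main__":
    stub = b"\xB8\x0B\x01\x00\x00"          # mov eax, 0x10B
    print(looks_like_syscall_stub(stub))
    # Both addresses below are made up for the demo.
    print(" ".join("%02X" % b for b in bytearray(build_patch(0x77001000, 0x350000))))
```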





5. Fancy stuffs 2

After the malware patches NtQueryVirtualMemory, it allocates another 0x3000 bytes. This time, it copies the entire image in memory to the new location at 0x406676 using rep movsb. Then it calls 0x4065A2, which looks like it fixes up a lot of relocations. Then Locky calls 0x406627 at 0x40668C. If you have been running things in a debugger (and hopefully in a VM) and try to step over this function, you will notice some strange behavior from your debugger: the function does not seem to return. Jumping in, at 0x40662D, Locky stores the address of the return address into ECX. The return address is then adjusted using the value in EAX, which holds the difference between the newly allocated region and arg0. Taking a step back, arg0 is EBX, which is the address of this image in memory, and the new region is the one Locky allocated at 0x406664. Therefore, when the ret instruction executes, we inevitably jump to the same offset in the new memory region and continue executing there. Locky also wipes the original image in memory with zeros at 0x4066BA.
[Figure: VirtualAlloc 0x3000 bytes, and copy the image over to the new memory region at 0x406676]


[Figure: This function returns to another memory region. Return address is patched at 0x40662A]
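The arithmetic behind this trick is just a rebase of one pointer. A tiny sketch follows, with illustrative numbers only: the old base 0x400000, the new region 0x2A0000, and the return address 0x406691 (which assumes a 5-byte call at 0x40668C) are all assumptions, and the function name is made up.

```python
def rebased_return(ret_addr, old_base, new_base):
    """Return address after adding the delta between the two image copies."""
    delta = (new_base - old_base) & 0xFFFFFFFF   # 32-bit wraparound
    return (ret_addr + delta) & 0xFFFFFFFF


# Same offset (0x6691) into the image, but inside the new region.
print("%#x" % rebased_return(0x406691, 0x400000, 0x2A0000))  # 0x2a6691
```

This is also why a debugger’s “step over” gets lost: the temporary breakpoint it places after the call is never reached, because execution resumes in the copy.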

Now, if our OpSec engineer is checking this machine image, they will start noticing strange things. The executable is running from a PRIVATE memory region (since Locky allocated it and returned into it). That is definitely not normal behavior and would raise red flags. You know what would be normal? If the new memory region were of type IMAGE, which indicates that the OS loaded an image at that region. That’s why Locky patched NtQueryVirtualMemory earlier: to make sure its behavior does not stand out too much.
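As a toy model of what the hook achieves (not a reconstruction of the Patch function), the sketch below wraps a query callable and rewrites the reported region type. MEM_PRIVATE and MEM_IMAGE are the values from winnt.h. Everything else, including the dict-based result, the names, and the choice to rewrite every private region rather than only Locky’s own, is an assumption for illustration.

```python
MEM_PRIVATE = 0x20000      # winnt.h
MEM_IMAGE = 0x1000000      # winnt.h


def spoof_memory_type(query):
    """Wrap `query` so MEM_PRIVATE regions are reported as MEM_IMAGE."""
    def patched(address):
        info = dict(query(address))      # copy, leave the real result alone
        if info.get("Type") == MEM_PRIVATE:
            info["Type"] = MEM_IMAGE
        return info
    return patched


if __name__ == "__main__":
    # Hypothetical stand-in for the real query: one private region.
    real_query = lambda addr: {"BaseAddress": addr, "Type": MEM_PRIVATE}
    hooked = spoof_memory_type(real_query)
    print("%#x" % hooked(0x2A0000)["Type"])
```

Any tool inside the process that relies on this API for the region type gets the spoofed answer.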


6. Main stuffs

Locky moves on to call 0x40B4E1. If you have done a lot of reversing, you will notice that this function looks somewhat familiar. There is __SEH_prolog and GetStartupInfoW. It calls HeapSetInformation and validates the MZ and PE magic signatures. It also calls GetCommandLineA and parses the command line into argc and argv. It is, indeed, CRTStartMain, which eventually calls the main() function at 0x4B489.
A quick note on Locky and its string handling. I believe Locky is using the Visual Studio XString class to handle all of its strings. One feature that you need to be aware of is that XString has this layout:
typedef struct __x_string {
    union {
        TCHAR* pszStr;
        TCHAR szStr[0x10];
    };
    DWORD len;
} XString;


If len is greater than or equal to 0x10, the first four bytes at offset 0x0 are a pointer to the string. If the length is less than 0x10, the string content is embedded inside the structure itself. The same thing applies to wchar strings, where the length is checked against 0x08 instead of 0x10.
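When annotating a dump, it helps to have a helper that applies this rule to the raw structure bytes. The sketch below handles the char case only and follows the layout exactly as described above. The read_mem callback is a hypothetical hook into whatever gives you the debuggee’s memory (a debugger API, a dump file, and so on).

```python
import struct

INLINE_LIMIT = 0x10   # threshold for char strings, per the rule above


def read_xstring(raw, read_mem):
    """Decode a char XString from its raw bytes.

    raw      -- structure bytes: 0x10-byte union followed by DWORD len
    read_mem -- hypothetical callback (address, size) -> bytes, only used
                when the string lives outside the structure
    """
    raw = bytes(raw)
    (length,) = struct.unpack_from("<I", raw, 0x10)
    if length >= INLINE_LIMIT:
        (ptr,) = struct.unpack_from("<I", raw, 0)   # first four bytes
        return read_mem(ptr, length)
    return raw[:length]                              # embedded content


if __name__ == "__main__":
    short = b"locky".ljust(0x10, b"\0") + struct.pack("<I", 5)
    print(read_xstring(short, None))
```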
Following Locky’s logic from this point looks pretty straightforward. At 0x44BC9, Locky calculates a unique ID using the volume name of the infected system. The ID is calculated at 0x46BD9. First, Locky gets the volume name. It then extracts the text between ‘{‘ and ‘}’, inclusive. Locky then calculates the MD5 of that string and takes the first 16 characters of the hex digest, in upper case, as the ID for the infected system (see gensystemid in the script). This ID is used in various places throughout, including communicating with the C2 server and calculating various random strings to store in the registry.
Locky then calculates a random-looking string to use as a key name under HKCU\Software\. This string is derived from the system ID calculated earlier. All configuration is stored under this key, including the public key received from the C2 server, the ransom text, and the main flag indicating that the entire system has been encrypted. To protect yourself against this version of Locky, simply setting this flag to YES will cause Locky to stop executing.
Locky then starts to gather system information at 0x44D35 and requests a public key from the C2 server. If you have a PCAP of the communication to and from the C2, you can use the script provided to decode all traffic.
Now, you can look at the details of Locky’s operation. The main encryption code for C2 communication is between 0x47D5D and 0x47DB5. Using the properties of XOR, we can derive the decryption code for it; both directions are in the attached script as the client_encrypt and client_decrypt functions. The main decryption code for data received from the C2 is between 0x47FDB and 0x4802C. We can likewise derive the encryption logic; see server_encrypt and server_decrypt in the attached script.
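If you only have the volume name (from a triage report, say) and not the machine, the ID can be computed offline. This is a small sketch that mirrors gensystemid from the script at the end of the post. Hashing the ASCII bytes of the GUID text is an assumption that matches how that script behaves, the function name is made up, and the GUID in the demo is invented.

```python
import hashlib


def system_id(volume_name):
    """Compute the SystemID from a volume GUID path."""
    start, end = volume_name.index("{"), volume_name.index("}")
    guid = volume_name[start:end + 1]            # braces included
    digest = hashlib.md5(guid.encode("ascii")).hexdigest()
    return digest[:0x10].upper()                 # first 16 hex characters


if __name__ == "__main__":
    # Made-up volume name in the usual \\?\Volume{GUID}\ form.
    print(system_id("\\\\?\\Volume{01234567-89ab-cdef-0123-456789abcdef}\\"))
```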
All messages between the sample and C2 use the following format:
[0x10 bytes of MD5 of plaintext][plain-text, variable length]
The entire message is then encrypted using client_encrypt if it is from the infected system, or server_encrypt if it comes from C2 server. Knowing that, you can mock your own C2 server to play with the sample as you see fit.
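For a mock C2, the framing layer is the easy part. Here is a minimal sketch of wrapping and checking the [MD5][plaintext] layout. It works on messages before encryption and after decryption, so the cipher itself is left to the functions in the attached script. The names frame_message and unframe_message are made up for this example.

```python
import hashlib


def frame_message(plaintext):
    """Prepend the 0x10-byte MD5 of the plaintext."""
    plaintext = bytes(plaintext)
    return hashlib.md5(plaintext).digest() + plaintext


def unframe_message(blob):
    """Split a decrypted message and verify its MD5 prefix."""
    blob = bytes(blob)
    digest, plaintext = blob[:0x10], blob[0x10:]
    if hashlib.md5(plaintext).digest() != digest:
        raise ValueError("MD5 prefix does not match the payload")
    return plaintext


if __name__ == "__main__":
    # Placeholder payload, not a real Locky request.
    framed = frame_message(b"example payload")
    print(unframe_message(framed))
```

A failed MD5 check after decryption is a quick sign that the wrong direction’s cipher was applied, or that the capture is truncated.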
Locky then enumerates all logical volumes and creates a worker thread for each volume. Each thread starts out searching for all interesting files, based on the extensions included in the sample. The list of included extensions is at 0x54224. This sample also skips Windows-specific directories and files, which are listed at 0x5CD8. The thread then adds all the interesting files to a list and goes on to encrypt each file in the list.
For each file, it generates 0x10 random bytes at 0x4256F and encrypts those 0x10 bytes using the C2 public key. The randomly generated 0x10 bytes are used as a session key to encrypt the file using the AES 128 or AES 192 algorithm. The thread also appends the following FileInfo structure to the end of each file:

typedef struct _File_Info {
    DWORD magic0; // 0x8956FE93
    BYTE SystemID[0x10];
    BYTE SessionKey[0x100];
    DWORD magic1; //0xD41BA12A
    CHAR szOriginalFileName[MAX_PATH];
    _WIN32_FILE_ATTRIBUTE_DATA FileAttribute;
} FileInfo;
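A structure like this is handy for triage, since the two magic values make encrypted files easy to recognize. The sketch below parses it from the tail of a file’s bytes. It assumes the fields are stored back to back and unencrypted exactly as declared above, with MAX_PATH = 260 and the standard 36-byte WIN32_FILE_ATTRIBUTE_DATA layout; the function and key names are made up.

```python
import struct

MAX_PATH = 260
# magic0, SystemID, SessionKey, magic1, szOriginalFileName, then
# WIN32_FILE_ATTRIBUTE_DATA: attributes, 3 x FILETIME, size high, size low
FILE_INFO = struct.Struct("<I16s256sI%dsI3QII" % MAX_PATH)
MAGIC0, MAGIC1 = 0x8956FE93, 0xD41BA12A


def parse_file_info(data):
    """Parse the trailer from the end of `data`; None if the magics differ."""
    data = bytes(data)
    if len(data) < FILE_INFO.size:
        return None
    fields = FILE_INFO.unpack(data[-FILE_INFO.size:])
    magic0, system_id, session_key, magic1, name = fields[:5]
    if magic0 != MAGIC0 or magic1 != MAGIC1:
        return None
    return {
        "system_id": system_id,
        "session_key": session_key,               # still RSA-encrypted
        "original_name": name.split(b"\0", 1)[0],  # cut at the terminator
        "attributes": fields[5],
        "file_size": (fields[9] << 32) | fields[10],
    }
```

With these assumptions the trailer is 0x240 bytes, so a caller only needs to read that much from the end of each candidate file.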
For each file, the malware also generates a new random name using the following format:
[0x10 bytes SystemID][0x10 bytes random hex string].locky
The filename generation happens between 0x422CD and 0x4240C. You can follow along in the debugger to see how the names are generated and used. After all the files are encrypted, the thread for each volume sends its statistics back to the C2 server.
The main thread waits for all worker threads to finish before setting the desktop wallpaper to the instruction text received from the C2 server. The text is customized based on the infected system’s default language.
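The naming scheme also gives a cheap way to group encrypted files by victim. Below is a short sketch that splits a name into its two halves. Reading “0x10 bytes” as 16 hex characters each and accepting either letter case are assumptions; the function name is made up.

```python
import re

# [16 hex chars SystemID][16 hex chars random].locky
LOCKY_NAME = re.compile(r"^([0-9A-F]{16})([0-9A-F]{16})\.locky$", re.IGNORECASE)


def split_locky_name(filename):
    """Return (system_id, random_part), or None if the name does not match."""
    match = LOCKY_NAME.match(filename)
    return match.groups() if match else None


if __name__ == "__main__":
    # Made-up name for the demo.
    print(split_locky_name("0123456789ABCDEFFEDCBA9876543210.locky"))
```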

At this point, I believe we have a fairly good understanding of Locky. There is a lot of code to cover, and lots of optimization and inlined code that make analysis a pain in the butt. But with a debugger attached as we walk through the code, we can identify each function’s behavior without fully going into the details of the STL. Walk along with the code and annotate IDA as you go; it will greatly help clear things up.


#!/bin/env python

# htnhan aka khoai huynh[.]t[.]nhan[@]gmail_dot_com
# implements most of locky crypto stuffs:
# client_encrypt: Encrypts data coming from malware to C2
#                 calculate and prepend MD5 yourself please.
# client_decrypt: Decrypts stuffs encrypted with client_encrypt
# server_encrypt: Encrypts data coming from C2 to malware.
#                 calculate and prepend MD5 yourself please.
# server_decrypt: Decrypts stuffs encrypted with server_encrypt
# gensystemid   : Get volume name and generate SystemID from it
# genregkey     : Generate registry keys using SystemID to store
#        - Main config stuffs at HKCU\Software\<string0>
#        - C2 PUBLICKEYBLOB   at HKCU\Software\<string0>\<string2>
#        - instructions text  at HKCU\Software\<string0>\<string3>
#        - YES flag           at HKCU\Software\<string0>\<string4>

import sys
import ctypes
import hashlib


K0 = 0xCD43EF19
K1 = 0xAFF49754


# rol, ror are stolen from somewhere on the internet....
# with some modification.
# maybe https://gist.github.com/c633/a7a5cde5ce1b679d3c0a
rol = lambda val, r_bits:  \
    (val << r_bits%32) & (2**32-1) | \
    ((val & (2**32-1)) >> (32-(r_bits%32)))


ror = lambda val, r_bits:  \
    ((val & (2**32-1)) >> r_bits%32) | \
    (val << (32-(r_bits%32)) & (2**32-1))


def client_encrypt(idata):
    '''encryption part for client'''
    key = K0
    plain = bytearray(idata)
    ctext = bytearray()
    for i, v in enumerate(plain):
        ctext.append(((ror(key, 0x05) - rol(i, 0x0D) & 0xFF) ^ v) & 0xFF)
        tmp = rol(v, (i & 0xFF) & 0x1F) + ror(key, 0x1)
        # mask the whole result so key stays a 32-bit value
        key = (tmp ^ (ror(i, 0x17) + 0x53702f68)) & 0xFFFFFFFF
    return ctext


def client_decrypt(idata):
    '''This one decrypts things encrypted by the infected system'''
    key = K0
    plain = bytearray(idata)
    ctext = bytearray()
    for i, v in enumerate(plain):
        n = ((ror(key, 0x05) - rol(i, 0x0D) & 0xFF) ^ v) & 0xFF
        ctext.append(n)
        tmp = rol(n, (i & 0xFF) & 0x1F) + ror(key, 0x1)
        # mask the whole result so key stays a 32-bit value
        key = (tmp ^ (ror(i, 0x17) + 0x53702f68)) & 0xFFFFFFFF
    return ctext



def server_encrypt(idata):
    '''This one encrypts data on C2 before sending to Locky'''
    key = K1
    ptext = bytearray(idata)
    ctext = bytearray()
    for i, v in enumerate(ptext):
        # Inverse of server_decrypt: add back what it subtracts, and feed
        # the plaintext byte into the key update, exactly as it does.
        ctext.append((v + i + rol(key, 0x03)) & 0xFF)
        key = (key+ror(v,0x0B)^rol(key,0x05)^i-0x47CB0D2F)&0xFFFFFFFF
    return ctext


def server_decrypt(idata):
    '''This one decrypts data received from C2.'''
    key = K1
    ctext = bytearray(idata)
    ptext = bytearray()
    for i, v in enumerate(ctext):
        num = (v - i - rol(key, 0x03)) & 0xFF
        ptext.append(num)
        key = (key+ror(num,0x0B)^rol(key,0x05)^i-0x47CB0D2F)&0xFFFFFFFF
    return ptext


def pprint(buf):
    for i, v in enumerate(buf):
        if i % 0x10 == 0: print ''
        print "%02X" % (v,),


def shrd(dst, src, cnt):
    return (((src << 32) + dst) >> cnt) & 0xFFFFFFFF


def shld(dst, src, cnt):
    out = ((src << 32) + dst) << cnt
    out |= (src >> 32-cnt)
    return out & 0xFFFFFFFF


def myadd(a, b):
    out = a + b
    c = out > (2**32-1)
    return 0xFFFFFFFF & (out), c


ROUND = 7
def mycrypt(h, l, idx):
    for i in xrange(ROUND):
        eax = shrd(l, h, 0x19) ^ (0xFFFFFFFF & (l << 7))
        ecx = shld(h, l, 0x07) ^ (h >> 0x19)

        esi, c = myadd(rol(i, 7), eax)
        edi = (ecx+c) & 0xFFFFFFFF

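        esi, c = myadd(esi, 0xFFFFFFFF & (idx<
        # [source truncated here: the rest of mycrypt() is missing]


def genregkey(idstr, idx):
    '''[start of docstring missing] string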
        0x02:   value name to store C2 publickeyblob
        0x03:   value name to store instructions text
        0x04:   value name to mark encryption finished
      System ID can be generated with gensystemid.
    '''
    h, l = int(idstr[:8], 16), int(idstr[8:], 16)
    out = str()
    h, l = mycrypt(h, l, idx)
    size = 0x8 + (shrd(l, h, 0x5) & 0x7)


    for i in range(size):
        h, l = mycrypt(h, l, i)
        tmp = (l & 0xff) - 1

        h, l = mycrypt(h, l, i)
        value = l & 0xff

        if tmp % 3 == 0:
            ascii_code = (value % 26) + ord('A')
        elif tmp % 3 == 1:
            ascii_code = (value % 26) + ord('a')
        else:
            ascii_code = (value % 10) + ord('0')
        out += chr(ascii_code)
    return out


def getvolname():
    kernel32 = ctypes.windll.kernel32
    buf = ctypes.create_unicode_buffer(1024)

    kernel32.GetVolumeNameForVolumeMountPointW(
        ctypes.c_wchar_p("C:\\"),
        buf,
        len(buf)  # the API expects the length in WCHARs, not bytes
    )
    return buf.value


def gensystemid():
    vname=getvolname()
    print vname
    n1, n2 = vname.index('{'), vname.index('}')
    vname = vname[n1:n2+1]
    print vname
    m = hashlib.md5()
    m.update(vname)
    sid = m.hexdigest()[:0x10].upper()
    return sid



if __name__ == '__main__':
    print 'Generating registry keys....'
    SID = gensystemid()
    for idx in [0, 2, 3, 4, 0xFFFFFFFB]:
        print '0x%08x - %s' % (idx, genregkey(SID, idx))
