Showing posts with label locky. Show all posts

Sunday, April 24, 2016

Reversing locky -- The juicy parts

Last post, we looked at Locky for the first time and attempted to unpack the main payload for analysis. This time, we will go into some detail on Locky's behavior. I will not reverse every code path, but rather give a summary of the main functionality of each routine. If you have time, please dive in, as I think the experience is very rewarding.

3. Config stuffs

Even after unpacking, the executable in memory is still not as easy to read as we would like. First, let’s open the dumped PE file in IDA. IDA will probably complain about not being able to resolve some addresses. That is fine. We may not be able to extract an executable that runs properly, but we can surely analyze Locky's behavior with it.
Press Ctrl-E and go to “start” as the main entry. This should be the same function passed to CreateThread earlier. If your IDA shows EBP-XXX instead of EBP+var_2C, for example, that means IDA does not recognize that this function uses an EBP-based frame. Click on the name “start”, press Alt-P, and check BP-based frame in the right panel. IDA will then re-analyze the function and create all kinds of local variables and arguments.
In the “start” function, you should notice GetModuleHandle and a few VirtualAlloc calls that look suspicious. This is where the main configuration gets decoded from inside the PE file. First, at 0x405174, eax is assigned edi, which holds the output of the GetModuleHandleA call at 0x40515D. This is the very beginning of the image. Then, the malware searches for a memory location with the following property: if [eax] XOR 0x88BBDD8D == [eax + 4] AND [eax] XOR 0xDDBCA2B2 == [eax + 8], then EAX points at the beginning of the configuration block. We can derive that the configuration block is defined as:
typedef struct _configuration_block {
    DWORD dwMarker0;
    DWORD dwMarker1; // dwMarker1= dwMarker0 XOR 0x88BBDD8D
    DWORD dwMarker2; // dwMarker2 = dwMarker0 XOR 0xDDBCA2B2
    DWORD dwEncodedKey; //dwEncodedKey = dwKey XOR dwMarker0
    DWORD dwEncodedSize; //dwEncodedSize^dwMarker0= size
    VOID* pEncodedConfig;
} ConfigBlock;
The encoded configuration block is copied to a new memory region allocated at 0x4051E3. Then, at 0x405231, the same configuration block is decoded into a new memory region allocated at 0x40520E. We can easily walk through this block of code to re-implement the search and decode steps and grab the config out of any Locky sample. This part of the code seems pretty consistent across the few Locky samples that I have looked into.
Search and decode configuration
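The search half of that routine can be sketched in a few lines of Python 3. This is only an illustration of the marker test described above, not the malware's exact loop: the function name find_config_block, the byte-by-byte scan step and the little-endian DWORD layout are my assumptions, and the decoding of the config body is left out because it has to be read from the sample itself.

```python
import struct

MARKER1_XOR = 0x88BBDD8D  # dwMarker1 == dwMarker0 ^ MARKER1_XOR
MARKER2_XOR = 0xDDBCA2B2  # dwMarker2 == dwMarker0 ^ MARKER2_XOR
HEADER_SIZE = 5 * 4       # the five DWORDs in front of the encoded config


def find_config_block(image, step=1):
    """Scan a dumped image for the config header.

    Returns (offset, key, size) for the first match, or None.
    Hypothetical helper: the name and the scan step are assumptions.
    """
    for off in range(0, len(image) - HEADER_SIZE + 1, step):
        m0, m1, m2, enc_key, enc_size = struct.unpack_from('<5I', image, off)
        if m0 ^ MARKER1_XOR == m1 and m0 ^ MARKER2_XOR == m2:
            # key and size are stored XORed with dwMarker0
            return off, enc_key ^ m0, enc_size ^ m0
    return None
```

Feed it the raw bytes of the dumped PE; the returned offset tells you where to start following the decode loop at 0x405231.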


4. Fancy stuffs

The next interesting thing to look at is the function call at 0x40525F, which calls 0x406634. 0x406634 starts by calling 0x4064F3. This one looks up the address of NtQueryVirtualMemory in ntdll.dll. Then, it compares the first byte with 0xB8, and compares the next few bytes with 0x00. This piece of code is verifying that NtQueryVirtualMemory starts with a “mov eax” which, naturally, it does. NtQueryVirtualMemory loads EAX with its system call number before calling the system call dispatch stub to serve the request. In this case, if you disassemble NtQueryVirtualMemory, the system call number is 0x10B.
Going back to the malware, we see it checks NtQueryVirtualMemory to make sure it has not been changed. Then, we see a call to VirtualAlloc, a memcpy with rep movsb, two VirtualProtect calls, and ultimately the magic 0xE990 being stored at the beginning of NtQueryVirtualMemory at 0x40658F. Stored little-endian, 0xE990 becomes the bytes 90 E9: a nop followed by the opcode of a relative jmp. But, where does it jump to? It jumps to whatever is stored at location 0x406588, which, if you trace all the way back, is the memory region allocated at 0x40653C. This memory is filled with the data from 0x4147B0, which we now call the Patch function.
In summary, subroutine 0x4064F3 checks NtQueryVirtualMemory to make sure no one has tampered with it. If the function is unpatched, it allocates a memory region, copies over the code from subroutine 0x4147B0, then patches NtQueryVirtualMemory to jmp to that copy. The Patch function simply changes the memory type of the returned data. We will see the significance of it soon.
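To make the byte-level picture concrete, here is a small Python 3 sketch of the two pieces: the prologue check and the hook bytes. The helper names syscall_number and build_hook are mine, and I assume the hook is laid out as nop followed by jmp rel32, with the displacement measured from the end of the jmp; verify that against the code around 0x40658F in your own sample.

```python
import struct


def syscall_number(stub):
    """Return the imm32 of a leading `mov eax, imm32` (opcode 0xB8), else None."""
    if len(stub) < 5 or stub[0] != 0xB8:
        return None
    return struct.unpack_from('<I', stub, 1)[0]


def build_hook(func_addr, target_addr):
    """Bytes for `nop; jmp target` placed at func_addr (hypothetical helper).

    The jmp opcode sits at func_addr + 1 and is 5 bytes long, so the
    displacement is relative to func_addr + 6.
    """
    rel = (target_addr - (func_addr + 6)) & 0xFFFFFFFF
    return b'\x90\xE9' + struct.pack('<I', rel)
```

With the stub bytes B8 0B 01 00 00 from the disassembly above, syscall_number returns 0x10B.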





5. Fancy stuffs 2

After the malware patches NtQueryVirtualMemory, it allocates another 0x3000 bytes. This time, it copies the entire image in memory to the new location at 0x406676 using rep movsb. Then it calls 0x4065A2, which looks like it fixes up a lot of relocations. Then Locky calls 0x406627 at 0x40668C. If you have been running things in a debugger (hopefully in a VM) and try to step over this function, you will notice some strange behavior from your debugger: this function does not seem to return. Jumping in, at 0x40662D, Locky loads the address of its return address into ECX. The return address is then adjusted by the difference between EAX and the value of arg0. Taking a step back, arg0 is EBX, which is the address of this image in memory, and EAX is the new region which Locky allocated at 0x406664. Therefore, when the ret instruction executes, we inevitably jump to the same offset in the new memory region and continue executing there. Locky also wipes the original image in memory with zeros at 0x4066BA.
VirtualAlloc 0x3000 bytes, and copy the image over to the new memory region at 0x406676


This function returns to another memory region. Return address is patched at 0x40662A
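The arithmetic behind that trick is tiny, so here it is as a Python 3 one-liner. The function name relocated_return is hypothetical, and the addresses you plug in are whatever your debugger shows for the old image base, the new region, and the saved return address.

```python
def relocated_return(ret_addr, old_base, new_base):
    """Same offset, new region: shift the return address by the base delta."""
    return (ret_addr + (new_base - old_base)) & 0xFFFFFFFF
```

In WinDbg you can use the result directly as a breakpoint address, so that you regain control once the ret lands in the new copy.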

Now, if our OpSec engineer is inspecting this machine's memory, they will start noticing strange things. The executable is running from a PRIVATE memory region (since the malware allocated it and returned into it). That is definitely not normal behavior and would raise red flags. You know what would be normal? If the new memory region were of type IMAGE, which indicates that the OS loaded an image at that region. That’s why Locky patched NtQueryVirtualMemory earlier: to make sure its behavior does not stand out too much.
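Here is a rough Python 3 model of what such a hook has to do to the data a caller gets back. The MEM_PRIVATE and MEM_IMAGE values and the seven-field 32-bit MEMORY_BASIC_INFORMATION layout come from the Windows headers; the function name disguise_region and the exact condition (rewrite every PRIVATE result) are my assumptions, since the real filter lives in the Patch function at 0x4147B0.

```python
import struct

MEM_PRIVATE = 0x20000    # winnt.h
MEM_IMAGE = 0x1000000    # winnt.h
MBI_FORMAT = '<7I'       # 32-bit MEMORY_BASIC_INFORMATION: seven DWORD-sized fields
TYPE_INDEX = 6           # Type is the last field


def disguise_region(mbi_bytes):
    """Rewrite Type from MEM_PRIVATE to MEM_IMAGE in a raw 32-bit MBI.

    Hypothetical model of the hook's effect, not a copy of its code.
    """
    fields = list(struct.unpack(MBI_FORMAT, mbi_bytes))
    if fields[TYPE_INDEX] == MEM_PRIVATE:
        fields[TYPE_INDEX] = MEM_IMAGE
    return struct.pack(MBI_FORMAT, *fields)
```

The takeaway for analysts: on an infected process, do not trust the Type column of a tool that queries memory from inside that process.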


6. Main stuffs

Locky moves on to call 0x40B4E1. If you have done a lot of reversing, you will notice that this function looks somewhat familiar. There is __SEH_prolog, and GetStartupInfoW. It calls HeapSetInformation, and validates the MZ and PE magic signatures. It also calls GetCommandLineA, and parses the command line into argc and argv. It is, indeed, CRTStartMain, which eventually calls the main() function at 0x4B489.
A quick note on Locky and its string handling. I believe Locky is using the Visual Studio XString class to handle all of its strings. One feature that you need to be aware of is that XString has this layout:
typedef struct __x_string {
    union {
        TCHAR* pszStr;
        TCHAR szStr[0x10];
    };
    DWORD len;
} XString;


If len is greater than or equal to 0x10, the first four bytes at offset 0x0 are a string pointer. If the length is less than 0x10, the string content is embedded inside the structure itself. The same thing applies to wchar strings, where the length is checked against 0x08 instead of 0x10.
Following Locky's logic from this point looks pretty straightforward. At 0x44BC9, Locky calculates a unique ID from the volume name of the infected system. The ID is calculated at 0x46BD9. First, Locky gets the volume name. It then takes the substring between ‘{‘ and ‘}’, inclusive. Locky then calculates the MD5 of that string, and takes the first 16 hex characters of the digest, uppercased, as the ID for the infected system. This ID is used in various places throughout, including communication with the C2 server and deriving the various random-looking strings stored in the registry.
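If you want to compute the ID away from the infected machine, a portable Python 3 variant of the gensystemid function from the script is sketched below. It takes the volume name as a string instead of calling the Windows API. Following the script, it keeps the first 16 hex characters of the digest; the ASCII encoding of the input and the function name system_id_from_volume_name are my assumptions.

```python
import hashlib


def system_id_from_volume_name(volume_name):
    """Derive the 16-character SystemID from a volume GUID path.

    Expects something shaped like \\\\?\\Volume{...}\\ and hashes the
    '{...}' part, braces included.
    """
    start, end = volume_name.index('{'), volume_name.index('}')
    guid = volume_name[start:end + 1]
    return hashlib.md5(guid.encode('ascii')).hexdigest()[:0x10].upper()
```

You can get the volume name for a given machine with the mountvol command, then feed the line for C:\ into this function.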
Locky then calculates a random-looking string to use as a key name under HKCU\Software\. This string is derived from the system ID calculated earlier. All configuration is stored under this key, including the public key received from the C2 server, the ransom text, and the main flag indicating that the entire system has been encrypted. To protect yourself against this version of Locky, simply setting this flag to YES will cause Locky to stop executing.
Locky then starts to gather system information at 0x44D35 and requests a public key from the C2 server. If you have a PCAP of the communication to and from the C2, you can use the script provided to decode all traffic.
Now, you can look at the details of Locky's operation. The main encryption code for C2 communication is between 0x47D5D and 0x47DB5. Using the properties of XOR, we can derive the decryption code for it; see the client_encrypt and client_decrypt functions in the attached script. The main decryption code for data received from the C2 is between 0x47FDB and 0x4802C. We can also derive the encryption logic; see server_encrypt and server_decrypt in the attached script.
All messages between the sample and C2 use the following format:
[0x10 bytes of MD5 of plaintext][plain-text, variable length]
The entire message is then encrypted using client_encrypt if it is from the infected system, or server_encrypt if it comes from C2 server. Knowing that, you can mock your own C2 server to play with the sample as you see fit.
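The framing is easy to wrap around the cipher routines. Below is a Python 3 sketch; pack_message and unpack_message are hypothetical helper names, and the cipher is passed in as a callable so you can plug in a port of client_encrypt, server_decrypt and friends from the script.

```python
import hashlib

DIGEST_SIZE = 0x10  # MD5 digest length in bytes


def pack_message(plaintext, encrypt):
    """Prefix MD5(plaintext), then encrypt the whole message."""
    return bytes(encrypt(hashlib.md5(plaintext).digest() + plaintext))


def unpack_message(blob, decrypt):
    """Decrypt, split off the digest, and verify it before returning the body."""
    data = bytes(decrypt(blob))
    digest, plaintext = data[:DIGEST_SIZE], data[DIGEST_SIZE:]
    if hashlib.md5(plaintext).digest() != digest:
        raise ValueError('MD5 mismatch: wrong cipher direction or corrupt data')
    return plaintext
```

The digest check is also a handy sanity test when decoding a PCAP: if it fails, you most likely picked the wrong direction.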
Locky then enumerates all logical volumes, and creates a worker thread for each volume. The thread starts out searching for all interesting files, matching the extensions included in the sample. The list of included extensions is at 0x54224. This sample also skips Windows-specific directories and files, which are listed at 0x5CD8. The thread adds all the interesting files to a list, then goes on to encrypt each file in the list.
For each file, it generates 0x10 random bytes at 0x4256F, and encrypts those 0x10 bytes using the C2 public key. The randomly generated 0x10 bytes are used as a session key to encrypt the file using the AES 128 or AES 192 algorithm. The thread also appends the following FileInfo structure to the end of each file:

typedef struct _File_Info {
    DWORD magic0; // 0x8956FE93
    BYTE SystemID[0x10];
    BYTE SessionKey[0x100];
    DWORD magic1; //0xD41BA12A
    CHAR szOriginalFileName[MAX_PATH];
    _WIN32_FILE_ATTRIBUTE_DATA FileAttribute;
} FileInfo;
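To check whether a file carries this trailer, you can parse its last bytes. The Python 3 sketch below assumes the structure is laid out without padding, little-endian, with MAX_PATH = 260 and a 36-byte WIN32_FILE_ATTRIBUTE_DATA, which gives 576 bytes in total; parse_fileinfo is a hypothetical helper, and any further encoding of the fields after the magic values is not handled.

```python
import struct

# magic0, SystemID[0x10], SessionKey[0x100], magic1,
# szOriginalFileName[MAX_PATH], WIN32_FILE_ATTRIBUTE_DATA (36 bytes)
FILEINFO_FORMAT = '<I16s256sI260s36s'
FILEINFO_SIZE = struct.calcsize(FILEINFO_FORMAT)  # 576 bytes
MAGIC0 = 0x8956FE93
MAGIC1 = 0xD41BA12A


def parse_fileinfo(data):
    """Parse the FileInfo trailer from the tail of a file's bytes, or return None."""
    if len(data) < FILEINFO_SIZE:
        return None
    magic0, system_id, session_key, magic1, name, attrs = struct.unpack(
        FILEINFO_FORMAT, data[-FILEINFO_SIZE:])
    if magic0 != MAGIC0 or magic1 != MAGIC1:
        return None
    return {
        'system_id': system_id,
        'session_key': session_key,              # encrypted with the C2 public key
        'original_name': name.split(b'\x00', 1)[0],
        'attributes': attrs,
    }
```

Reading only the last 576 bytes of each suspicious file is enough, so this scales to a whole volume.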
For each file, the malware also generates a new random name using the following format:
[0x10 bytes SystemID][0x10 bytes random hex string].locky
The filename generation happens between 0x422CD and 0x4240C. You can follow along in the debugger to see how the names are generated and used. After all the files are encrypted, the thread for each volume sends its statistics back to the C2 server.
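That naming scheme makes encrypted files easy to spot and to tie back to a machine. Here is a small Python 3 sketch, assuming both halves of the name are 16 hex characters (I match case-insensitively to be safe); split_locky_name is a hypothetical helper.

```python
import re

# [16 hex chars SystemID][16 hex chars random].locky
LOCKY_NAME = re.compile(r'^([0-9A-F]{16})([0-9A-F]{16})\.locky$', re.IGNORECASE)


def split_locky_name(filename):
    """Return (system_id, random_part) for a Locky-style name, else None."""
    match = LOCKY_NAME.match(filename)
    return (match.group(1), match.group(2)) if match else None
```

Comparing the first half against the SystemID computed for a host tells you which infected system produced a given file on a shared drive.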
The main thread waits for all worker threads to finish before setting the desktop wallpaper to the instruction text received from the C2 server. The text is customized based on the infected system's default language.

At this point, I believe we have a fairly good understanding of Locky. There is a lot of code to cover, and lots of optimization and inlined code that make analysis a pain in the butt. But with a debugger attached as we walk through the code, we can identify each function’s behavior without fully going into the details of the STL. Walk along with the code and annotate IDA as you go; it will greatly help clear things up.


#!/usr/bin/env python

# htnhan aka khoai huynh[.]t[.]nhan[@]gmail_dot_com
# implements most of locky crypto stuffs:
# client_encrypt: Encrypts data coming from malware to C2
#                 calculate and prepend MD5 yourself please.
# client_decrypt: Decrypts stuffs encrypted with client_encrypt
# server_encrypt: Encrypts data coming from C2 to malware.
#                 calculate and prepend MD5 yourself please.
# server_decrypt: Decrypts stuffs encrypted with server_encrypt
# gensystemid   : Get volume name and generate SystemID from it
# genregkey     : Generate registry keys using SystemID to store
#        - Main config stuffs at HKCU\Software\<string0>
#        - C2 PUBLICKEYBLOB   at HKCU\Software\<string0>\<string2>
#        - instructions text  at HKCU\Software\<string0>\<string3>
#        - YES flag           at HKCU\Software\<string0>\<string4>

import sys
import ctypes
import hashlib


K0 = 0xCD43EF19
K1 = 0xAFF49754


# rol, ror are stolen from somewhere on the internet....
# with some modification.
# maybe https://gist.github.com/c633/a7a5cde5ce1b679d3c0a
rol = lambda val, r_bits:  \
    (val << r_bits%32) & (2**32-1) | \
    ((val & (2**32-1)) >> (32-(r_bits%32)))


ror = lambda val, r_bits:  \
    ((val & (2**32-1)) >> r_bits%32) | \
    (val << (32-(r_bits%32)) & (2**32-1))


def client_encrypt(idata):
    '''encryption part for client'''
    key = K0
    plain = bytearray(idata)
    ctext = bytearray()
    for i, v in enumerate(plain):
        ctext.append(((ror(key, 0x05) - rol(i, 0x0D) & 0xFF) ^ v) & 0xFF)
        tmp = rol(v, (i & 0xFF) & 0x1F) + ror(key, 0x1)
        key = tmp ^ (ror(i, 0x17) + 0x53702f68) & 0xFFFFFFFF
    return ctext


def client_decrypt(idata):
    '''This one decrypts things encrypted by the infected system'''
    key = K0
    plain = bytearray(idata)
    ctext = bytearray()
    for i, v in enumerate(plain):
        n = ((ror(key, 0x05) - rol(i, 0x0D) & 0xFF) ^ v) & 0xFF
        ctext.append(n)
        tmp = rol(n, (i & 0xFF) & 0x1F) + ror(key, 0x1)
        key = tmp ^ (ror(i, 0x17) + 0x53702f68) & 0xFFFFFFFF
    return ctext



def server_encrypt(idata):
    '''This one encrypts data on C2 before sending to Locky'''
    key = K1
    ptext = bytearray(idata)
    ctext = bytearray()
    for i, v in enumerate(ptext):
        # inverse of server_decrypt: add back what the decryptor subtracts
        ctext.append((v + i + rol(key, 0x03)) & 0xFF)
        # the key schedule is driven by the plaintext byte on both sides
        key = (key+ror(v,0x0B)^rol(key,0x05)^i-0x47CB0D2F)&0xFFFFFFFF
    return ctext


def server_decrypt(idata):
    '''This one decrypts data received from C2.'''
    key = K1
    ctext = bytearray(idata)
    ptext = bytearray()
    for i, v in enumerate(ctext):
        num = (v - i - rol(key, 0x03)) & 0xFF
        ptext.append(num)
        key = (key+ror(num,0x0B)^rol(key,0x05)^i-0x47CB0D2F)&0xFFFFFFFF
    return ptext


def pprint(buf):
    for i, v in enumerate(buf):
        if i % 0x10 == 0: print ''
        print "%02X" % (v,),


def shrd(dst, src, cnt):
    return (((src << 32) + dst) >> cnt) & 0xFFFFFFFF


def shld(dst, src, cnt):
    out = ((src << 32) + dst) << cnt
    out |= (src >> 32-cnt)
    return out & 0xFFFFFFFF


def myadd(a, b):
    out = a + b
    c = out > (2**32-1)
    return 0xFFFFFFFF & (out), c


ROUND = 7
def mycrypt(h, l, idx):
    for i in xrange(ROUND):
        eax = shrd(l, h, 0x19) ^ (0xFFFFFFFF & (l << 7))
        ecx = shld(h, l, 0x07) ^ (h >> 0x19)

        esi, c = myadd(rol(i, 7), eax)
        edi = (ecx+c) & 0xFFFFFFFF

        esi, c = myadd(esi, 0xFFFFFFFF & (idx< string
        0x02:   value name to store C2 publickeyblob
        0x03:   value name to store instructions text
        0x04:   value name to mark encryption finished
      System ID can be generated with gensystemid.
    '''
    h, l = int(idstr[:8], 16), int(idstr[8:], 16)
    out = str()
    h, l = mycrypt(h, l, idx)
    size = 0x8 + (shrd(l, h, 0x5) & 0x7)


    for i in range(size):
        h, l = mycrypt(h, l, i)
        tmp = (l & 0xff) - 1

        h, l = mycrypt(h, l, i)
        value = l & 0xff

        if tmp % 3 == 0:
            ascii_code = (value % 26) + ord('A')
        elif tmp % 3 == 1:
            ascii_code = (value % 26) + ord('a')
        else:
            ascii_code = (value % 10) + ord('0')
        out += chr(ascii_code)
    return out


def getvolname():
    kernel32 = ctypes.windll.kernel32
    buf = ctypes.create_unicode_buffer(1024)

    # the third argument is the buffer length in characters, not bytes
    kernel32.GetVolumeNameForVolumeMountPointW(
        ctypes.c_wchar_p(u"C:\\"),
        buf,
        len(buf)
    )
    return buf.value


def gensystemid():
    vname=getvolname()
    print vname
    n1, n2 = vname.index('{'), vname.index('}')
    vname = vname[n1:n2+1]
    print vname
    m = hashlib.md5()
    m.update(vname)
    sid = m.hexdigest()[:0x10].upper()
    return sid



if __name__ == '__main__':
    print 'Generating registry keys....'
    SID = gensystemid()
    for idx in [0, 2, 3, 4, 0xFFFFFFFB]:
        print '0x%08x - %s' % (idx, genregkey(SID, idx))

Reversing locky -- First look

Ransomware has been the main focus of many blogs, talks and complaints recently. It comes in all shapes and sizes. One day last week, I had a chance to look at a variant of Locky. You can find many articles on Locky ransomware. However, the majority of them focus on behavior, delivery methods and features. Instead, I am interested in the bits and bytes of the main payload. I want to know how Locky is designed, how it works, and how it may fail. Hence, this post!
I would like to provide a walkthrough of the main reversing process. I try not to make any "magic" decisions like "set a breakpoint at 0x12345 and see that it is unpacked". Rather, I want you to travel with me through the decision-making process of why we look at function XYZ and why we break at 0x12345. Feel free to play along, or just read for your own entertainment.

0. Pre-Req:

I assume you are comfortable with reversing and debugging malware. I assume you use a virtual machine technology of some kind. Please do not infect yourself. If you are looking for a commodity sample, malwr[11] and virustotal[12] are both great sources. You also need the following toys to play along:
  • A virtual machine software. There are many choices: VMware [1] and VirtualBox [2] are probably the most common. Pick one that suits your preference.
  • A Microsoft Windows virtual machine. I use Windows 7 x86. XP works as well. I have not tested on Windows 8, 8.1 or 10, but I assume they work just fine.
  • A debugger. I use Microsoft WinDbg[3]. OllyDbg[4] and ImmunityDebugger[5] are two other common options.
  • A disassembler. I use IDA[6]. The freeware version works just fine. If you have another alternative to IDA, please, please, please let me know.
  • PEiD[7] is still a very nice tool for generic PE information. The KANAL plugin is very useful in many cases.
  • CFF explorer[8] is a nice free tool for viewing the PE file info as well.
  • Tools for searching for strings. I use Linux strings command. Make sure you also check strings -el for windows Unicode strings in the sample.
  • Process Hacker[9] is a great tool to monitor a process behavior.
  • Process Monitor[10] from the SysInternals suite is also great for recording the many events that happen over time. It does, however, generate a lot of noise, so it’s best to filter out unnecessary operations or processes before running your sample.

1. First look:

The first look turns out to be great. PEiD reports that the file is not packed. CFF explorer shows a complete import table, as well as plenty of proper sections. Running strings on the sample reveals many, many strings, which is always a good sign that the sample is not packed. All these are great signs, as packed malware is a lot tougher to handle. Let’s open it up in IDA and see what it does.

PEiD shows that file is not packed.

CFF explorer shows full import table

And "proper" sections


IDA shows WinMain to be pretty simple. However, poking around a little shows that the sample is either encrypted or packed with an unknown packer. We see there is no real structure to the code. There are many global variables and constants being used. All of this makes analysis impossible.
This is a prime example of why we should not trust our tools completely. It is always best to verify with a disassembler and check the code execution.

2. Unpack

Since this sample is not packed by a standard packer, we have to unpack it manually. The most common approach is to look for the end of the decryption/unpacking stub and set a breakpoint there. However, after many attempts, I have not found a good location for that just by looking at the assembly code.
Process Hacker shows a RWX memory region committed.

Instead, I turn to dynamic analysis tools. I let the malware run, and capture its events. As you can see, Process Hacker shows a few memory regions with RWX permission. That is a good indication that such a region was allocated and filled in by the unpacking stub before execution resumes there. We also see a new thread created and executing in a different memory region. Therefore, I set a breakpoint at the CreateThread API, hoping to catch its execution. Make sure you have debugging symbols loaded for the Windows DLLs. Using WinDbg, set your breakpoint using
bp kernel32!CreateThreadStub
If you use OllyDebugger or ImmunityDebugger, within the assembly panel, press Ctrl-G and type in “CreateThread”. Olly will take you to the CreateThread entry point. There, you can set a breakpoint using F2.

A new thread is created at 0x5152

A CreateThread event in Process Monitor.


Now continue execution until CreateThread is called. You will be able to see the address of the new thread.

WinDbg breakpoint at kernel32!CreateThreadStub
And the breakpoint at the ThreadFunction at 0x405152 hits

Next, we need to dump the image in memory and fix up its entry point. If you are using Olly, there is a plugin called OllyDump that can do this easily. For WinDbg, I use Scylla to achieve the same task. Select the process, and click “File”->”Dump”. Then, you can fix the original entry point (OEP) and use IAT auto search to search for the import address table (IAT). Click GetImports to verify all the imports are fixed properly. You can also use ImportREC, which is shown in the screenshot below.

ImportREC shows all imports resolve properly.

Now that we have successfully unpacked Locky, we can look into its real behavior in the next post.

Links:
[1] - https://my.vmware.com/
[2] - https://www.virtualbox.org/wiki/Downloads
[3] - https://msdn.microsoft.com/en-us/windows/hardware/hh852365.aspx
[4] - http://www.ollydbg.de/
[5] - https://www.immunityinc.com/products/debugger/
[6] - https://www.hex-rays.com/products/ida/support/download.shtml
[7] - https://www.aldeid.com/wiki/PEiD
[8] - http://www.ntcore.com/exsuite.php
[9] - http://processhacker.sourceforge.net/
[10] - https://technet.microsoft.com/en-us/sysinternals/processmonitor.aspx
[11] - https://malwr.com/
[12] - https://www.virustotal.com/