“You can download it here”
Before diving deep into reversing any executable, it is always a good idea to take a static look at the binary and gather information about compilers, packers, strings, and anything else that might come in handy.
In this case the statically gathered information felt like winning the lottery and saved me from having to debug the executable. String analysis quickly revealed that the executable is related to Python.
Figure 1: Strings related to python
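For readers without a strings tool at hand, the same kind of scan can be sketched in a few lines of Python. This is only an illustration: the helper names and the sample bytes are made up, and the real executable will of course contain far more strings.

```python
import re

def extract_strings(data: bytes, min_len: int = 4):
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

def python_hints(data: bytes):
    """Keep only the strings that mention Python (case-insensitive)."""
    return [s for s in extract_strings(data) if "python" in s.lower()]

# Made-up sample standing in for the bytes of the executable
sample = b"\x00\x01python27.dll\x00\xff\x10PyInstaller\x00ab\x00Py_SetPythonHome\x00"
print(python_hints(sample))  # ['python27.dll', 'Py_SetPythonHome']
```

On a real file you would pass `open(path, "rb").read()` instead of the sample.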
Even with very little experience in this field, one can deduce that the Python scripts here were converted to an EXE, the same way it is done for other scripting languages such as AU3 or VBS. PyInstaller is one utility that does this. A quick Google search for unpacking PyInstaller EXEs led me to a simple but effective unpacker on GitHub that extracts the Python modules from the executable.
After extraction there are around 440 files, with no hint about which one is the main file (entry point) to start with. Here a string search across the directory, using a command like FIND or GREP (I used FAR Manager), for the message displayed in the CrackMe console was again very useful: it revealed a file named “another”, with no extension, that appeared to contain all of the required logic.
Figure 2: Main python module
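If you prefer scripting the search over FIND, GREP or FAR Manager, a minimal sketch could look like the following. The needle `b"login:"` and the demo files are placeholders I made up; use whatever message the CrackMe console actually prints.

```python
import os
import tempfile

def find_files_containing(root: str, needle: bytes):
    """Walk root and return relative paths of files whose raw bytes contain needle."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                if needle in fh.read():
                    hits.append(os.path.relpath(path, root))
    return sorted(hits)

# Demo on a throwaway directory with two made-up files
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "another"), "wb") as fh:
        fh.write(b"\x63\x00\x00login:\x00")
    with open(os.path.join(root, "other.pyd"), "wb") as fh:
        fh.write(b"\x00\x01\x02")
    demo_hits = find_files_containing(root, b"login:")

print(demo_hits)  # ['another']
```

Reading the files as bytes matters here, because the compiled module is binary and would not decode cleanly as text.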
After some more static analysis (string analysis) of this file, it became certain that it is a Python module, but a compiled one (a *.PYC file), as I saw many Python function names and strings mingled with binary code in it. Something is still missing, though: the binary header is not valid, so a decompiler throws an error when asked to recover the actual .PY file from the .PYC.
It wasn’t too complicated to fix: I inserted the missing header, passed the file to the decompiler again, and got the complete Python source code in plain text.
Figure 3: Header comparison
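The header fix can also be scripted. The sketch below assumes the module was built with CPython 2.7, whose .pyc header is a 4-byte magic number followed by a 4-byte timestamp, and that the whole 8-byte header is missing. Both points are assumptions on my side, so compare against a known-good .pyc from the same Python version (as in Figure 3) before relying on it.

```python
import struct

# Assumption: CPython 2.7 magic number; other versions use different values
PY27_MAGIC = b"\x03\xf3\r\n"

def add_pyc_header(body: bytes, magic: bytes = PY27_MAGIC, mtime: int = 0) -> bytes:
    """Prepend magic + little-endian timestamp unless the magic is already there."""
    if body.startswith(magic):
        return body
    return magic + struct.pack("<I", mtime) + body

# Placeholder bytes standing in for the header-less module
stripped = b"c\x00\x00\x00\x00"
fixed = add_pyc_header(stripped)
print(fixed[:8].hex())  # 03f30d0a00000000
```

The timestamp value is not important for decompilation in this sketch, so it is simply set to zero.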
Once we have the main Python file, the steps to clear stage 1 are quite straightforward.
Figure 4: Stage 1 code
Stage 1 takes three inputs from the user: a login, a password, and a PIN.
The login string is compared to the plain text string “hackerman”, which gives us the login name to use. The password is converted to an MD5 hex string and compared with ‘42f749ade7f9e195bf475f37a44cafcb’, which is the MD5 hash of “Password123”. This can be figured out quickly using online services that store the hashes and corresponding plain text of popular strings and passwords. One such service is https://www.md5online.org/
To validate the PIN, a key is derived using a random number generator, with the user-supplied PIN passed as the seed to the RNG. The derived key is converted to an MD5 hex string and compared against the string ‘fb4b322c518e9f6a52af906e32aee955’. This hash is also well known, and its plain text is ‘95104475352405197696005814181948’. To recover the PIN, I added a small brute-force loop to the code.
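The login and password checks can be sketched as follows. The function names are mine, not the ones from the decompiled script, and the small word list only stands in for what an online lookup service does at a much larger scale.

```python
import hashlib

EXPECTED_LOGIN = "hackerman"
EXPECTED_PASSWORD_MD5 = "42f749ade7f9e195bf475f37a44cafcb"

def md5_hex(text: str) -> str:
    """MD5 of the UTF-8 encoded text as a lowercase hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def check_credentials(login: str, password: str) -> bool:
    """Plain comparison for the login, hash comparison for the password."""
    return login == EXPECTED_LOGIN and md5_hex(password) == EXPECTED_PASSWORD_MD5

def dictionary_lookup(target_hash: str, candidates):
    """Return the first candidate whose MD5 matches target_hash, else None."""
    for word in candidates:
        if md5_hex(word) == target_hash:
            return word
    return None

# Hypothetical word list; a real one would be far longer
words = ["letmein", "qwerty", "Password123", "admin"]
print(dictionary_lookup(EXPECTED_PASSWORD_MD5, words))
print(check_credentials("hackerman", "Password123"))
```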
Figure 5: brute forcing the PIN
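The idea behind the brute-force loop can be sketched like this. The key derivation shown (seed the RNG with the PIN, then draw 32 decimal digits) and the four-digit search range are my assumptions about the script, not a copy of it. Note also that Python 3's `randint` does not produce the same sequence as Python 2's for a given seed, so this sketch will not reproduce the original key; the demo therefore recovers a PIN against a target it generates itself.

```python
import hashlib
import random

def derive_key(pin: int, length: int = 32) -> str:
    """Assumed derivation: seed the RNG with the PIN, then draw `length` digits."""
    rng = random.Random(pin)
    return "".join(str(rng.randint(0, 9)) for _ in range(length))

def brute_force_pin(target_md5: str, max_pin: int = 10000):
    """Try every PIN below max_pin; return (pin, key) on a hash match, else None."""
    for pin in range(max_pin):
        key = derive_key(pin)
        if hashlib.md5(key.encode("ascii")).hexdigest() == target_md5:
            return pin, key
    return None

# Self-generated demo target (not the hash from the CrackMe)
demo_target = hashlib.md5(derive_key(1234).encode("ascii")).hexdigest()
demo_pin, demo_key = brute_force_pin(demo_target)
print(demo_pin, demo_key)
```

To reproduce the actual result, the loop has to run under the same Python version as the CrackMe and compare against the hash given above.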
The PIN turned out to be “9667”. Now we have all the information needed to clear stage 1:
- Login: hackerman
- Password: Password123
- PIN: 9667
Stage 2 requires an internet connection, because it begins by downloading some content from the web. The URL of this content is stored in the script in AES-encrypted form. The key derived from the PIN in stage 1 (95104475352405197696005814181948) is used to decrypt this URL.
Figure 6: Encrypted URL
After decryption, the URL turns out to be https://i.imgur.com/dTHXed7.png
Figure 7: Image containing the stage 2 payload
The image file is then decoded with the following routine into the next-stage payload, which is a DLL file.
Figure 8: Decoding image payload to actual DLL file
After decoding the image content, we get a DLL with the following PE header description.
The interesting technique to note here is the way the DLL gets loaded into memory: it uses the reflective DLL loading technique.
The function below, from the main Python module, is responsible for initiating the reflective loading of this DLL.
Figure 9: Initiate the DLL reflective loading
The above function simply makes a Windows function call to base-memory + 2, where base-memory holds the DLL decoded from the image.
Interestingly, at base-memory + 2 there is a routine whose starting bytes (E8 00 00 00 00) look very much like the start of shellcode, followed by a PUSH-RET, as shown in the disassembly below.
This routine takes us to (0x10000007 + 0x6D9), which is 0x100006E0. At this address there is a big routine whose code resembles https://github.com/stephenfewer/ReflectiveDLLInjection, so we can now be sure that this DLL is loaded into the process’s memory, and to clear stage 2 we have to dig into the code of the DLL.
Figure 10: Reflective loader start at 0x6E0
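The address arithmetic behind that jump can be written out as a small calculation. The base of 0x10000000 is inferred from the addresses quoted above, and the exact instructions between the CALL and the PUSH-RET are not reproduced here, so treat the comments as my reading of the disassembly.

```python
IMAGE_BASE = 0x10000000   # base address implied by the addresses in the text
CALL_OFFSET = 2           # the CALL instruction sits at base + 2
CALL_LENGTH = 5           # E8 opcode + 32-bit relative displacement
LOADER_DELTA = 0x6D9      # constant applied before the PUSH-RET

def push_ret_target(base: int) -> int:
    """E8 00 00 00 00 is 'call $+5': it pushes the address of the next instruction."""
    return_address = base + CALL_OFFSET + CALL_LENGTH   # base + 7
    return return_address + LOADER_DELTA

print(hex(push_ret_target(IMAGE_BASE)))                # 0x100006e0
print(hex(push_ret_target(IMAGE_BASE) - IMAGE_BASE))   # 0x6e0
```

Because everything is computed relative to the pushed return address, the same code works wherever the buffer happens to be allocated, which is exactly what a reflective loader stub needs.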
The DLL has no exports, so all the code originates from DllEntryPoint, and we begin our analysis directly from there.
Figure 11: Important functions and path to reach them
In the above graph I have highlighted the main routines to discuss.
AddExceptionHandler has the following code of interest:
Figure 12: code snippet from AddExceptionHandler
In the above code, the handler function “VerifyEnvVariable” is added to the vectored exception handler table, followed by an instruction that raises a software breakpoint interrupt (INT 0x3). If a debugger is present it can handle the breakpoint; otherwise an exception is generated and the exception handler routine is called.
In this case the exception handler routine “VerifyEnvVariable” is called.
This routine contains an interesting piece of code: it gets the EIP at which the exception occurred and, depending on whether the environment variable “mb_chall” is present (if the DLL is loaded separately, i.e. not from the actual executable, this variable won’t be set), sets the EIP (instruction pointer) to either Exception-EIP + 0x1 or Exception-EIP + 0x6, as shown in the snippet below.
Figure 13: Exception handler routine redirecting to FailNode or MainNode based on the Env variable “mb_chall”
Exception-EIP + 0x1 = FailNode (Will display a message box saying “Sorry you, failed!”)
Figure 14: Failure message
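The decision made by the handler can be modelled in a few lines. This is only a simulation of the control flow described above, with a plain dictionary standing in for the process environment; the function name and the sample EIP are made up.

```python
FAIL_OFFSET = 0x1   # Exception-EIP + 0x1 -> FailNode
MAIN_OFFSET = 0x6   # Exception-EIP + 0x6 -> MainNode

def resume_address(exception_eip: int, env: dict) -> int:
    """Pick where execution resumes, based on the presence of 'mb_chall'."""
    if "mb_chall" in env:
        return exception_eip + MAIN_OFFSET
    return exception_eip + FAIL_OFFSET

eip = 0x10001000  # hypothetical address of the INT 3 instruction
print(hex(resume_address(eip, {"mb_chall": "1"})))  # 0x10001006
print(hex(resume_address(eip, {})))                 # 0x10001001
```

The practical consequence is that running the DLL on its own, without the environment prepared by the main executable, always lands in FailNode.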
So far we have only analyzed code in this DLL that does not directly help to clear this stage, but it was necessary to understand the flow. Now we will look at the code in MainNode, which is at Exception-EIP + 0x6 and is the code that actually runs in stage 2.
This code creates a new thread that enumerates all available windows on the system in a continuous loop, pausing for 1000 milliseconds between enumerations.
For each window it finds, a registered WNDENUMPROC function is called.
Figure 15: Enumerating windows
Here the WNDENUMPROC function is sub_10005750. This is the subroutine that verifies the presence of the required secret console, which can then start accepting commands to ultimately clear this stage. Let’s look at sub_10005750 to find out what further steps are required.
Figure 16: Important code snippet from WNDENUMPROC
In the above code snippet, I have marked 4 important lines that we need to look at.
#1: Gets the title of the window and stores it in lParam (it is then copied to variable v10 before the comparison).
#2: Checks whether the window title contains both of the strings “Notepad” and “secret_console”.
#3: Changes the title of the window to “Secret Console is waiting for the commands…”.
#4: Gets the child window (the editor) of this window and calls EnumFunc on it.
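Steps #2 and #3 boil down to a simple substring test on the window title. The sketch below models just that logic in Python, without any Windows API calls; the function names are mine, and I am assuming the comparison is case-sensitive.

```python
NEW_TITLE = "Secret Console is waiting for the commands…"

def is_secret_console(title: str) -> bool:
    """Step #2: the title must contain both markers (assumed case-sensitive)."""
    return "Notepad" in title and "secret_console" in title

def process_window(title: str) -> str:
    """Step #3: matching windows get the new title, others are left alone."""
    return NEW_TITLE if is_secret_console(title) else title

for title in ["Untitled - Notepad", "secret_console.txt - Notepad", "Calculator"]:
    print(repr(title), "->", repr(process_window(title)))
```

Only the second title in the demo matches, which is why a Notepad window showing a file called secret_console.txt is enough to satisfy the check.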
Now we know that it is looking for a window whose title contains both “Notepad” and “secret_console”, so we can go ahead and create an empty text file named secret_console.txt and open it in Notepad.
Figure 17: our secret console
You will immediately see messages appearing in the main executable saying “waiting for the command”, and the title of our Notepad window will change as described in #3 above.
Figure 18: secret console ready to accept commands
We now know it’s working, so the last thing we need is the command to enter in this secret console. To find it, we have to look at point #4, where it enumerates the child windows and calls the function EnumFunc on them.
Below is the code snippet from EnumFunc:
Figure 19: Code from EnumFunc
As we can see, the only string comparison in this code is against the string “Dump_the_key”, after which it does some substantial new work with a library and memory (**remember the DLL name actxprxy.dll). It would make little sense to do all of that, rather than simply exit, if this were not the required command. So, just try the command Dump_the_key.
And there we go: we get a congratulations message saying the stage is clear. Cool!
Now on to stage 3. I head back to the same old Python script and start analyzing. The following code snippet in main() is responsible for verifying level 3.
Figure 20: Level 3 code in python script main()
We see it calling decode_pasted() and comparing the result with the Boolean True. Let’s have a look at decode_pasted().
Figure 21: code from function decode_pasted()
In the above code our main focus should be on the highlighted part, as these are the statements that will finally reveal our flag. The remaining part in the middle basically verifies the correctness of the required content.
As you may remember from Figure 19 in stage 2, LoadLibraryA was used to load this same DLL; here the script takes the base address of this DLL as the start of the array from which it constructs a string.
This string is base64-decoded and decompressed with zlib, XOR-decrypted using a user-supplied three-byte key (the R, G and B codes), and then executed as Python code. The original content, before decoding and decryption, looks like this.
If you want to know how the content got there, look at Figure 19 again: the function called after VirtualProtect() (VirtualProtect is used to change the page permissions to make the page writable), sub_10009420, puts it there.
It is indeed a base64-encoded string. Once it is decoded and decompressed, we find the following content.
Figure 22: decompressed content
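The decode-then-decompress step is short enough to sketch. The helper names are mine, and the demo blob is generated on the spot rather than taken from the DLL, so it only illustrates the order of operations: base64 first, zlib second.

```python
import base64
import zlib

def unpack_blob(encoded: bytes) -> bytes:
    """Base64-decode the blob, then zlib-decompress the result."""
    return zlib.decompress(base64.b64decode(encoded))

def pack_blob(raw: bytes) -> bytes:
    """Inverse helper, used here only to build a demo input."""
    return base64.b64encode(zlib.compress(raw))

demo_raw = bytes([0x90, ord("e"), 0xA5]) * 4   # made-up, still XOR-encrypted bytes
demo_blob = pack_blob(demo_raw)
print(demo_blob)
print(unpack_blob(demo_blob) == demo_raw)      # True
```

At this point the content is still XOR-encrypted, which is why it shows the mix of readable and unreadable characters discussed next.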
There is an obvious repeating pattern, which is as follows.
<non_readable string_character> <string_character> <non_readable string_character>
This pattern of a non-string character, a string character, and then a non-string character is repeated almost to the end of the file.
Also note that the final input we need to guess is three integers, one value each for R, G and B. Now it is becoming clear: this three-byte XOR key decrypts the current content so that it becomes executable Python code.
The next thing to note is that Python’s exec statement takes a string or a code object. So, after decryption, the complete content should become a readable string.
To achieve this readability, we only need to decrypt the non-string binary characters so that they fall into the readable range; the middle string character can be kept as it is.
This gives us one byte (the middle one) of the three-byte XOR key: it should be 0x00, since the middle character needs no modification. The key will look something like [0x__ 0x00 0x__].
It is also reasonable to guess that the first word of this content is def (the start of a function definition in Python), given the hint ‘e’ in the second position. However, I discarded the idea of guessing the word and simply used brute force again, since fixing G at the constant 0x00 in advance significantly reduces the number of iterations required to find the correct values of R, G and B. Here is a small calculation:
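The reasoning about the middle byte is easy to check with a toy example. The key below is made up for the demonstration (it is not the real R, G, B answer), and `xor_rgb` is my own helper name.

```python
from itertools import cycle

def xor_rgb(data: bytes, key) -> bytes:
    """XOR data with a repeating three-byte (R, G, B) key."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

plain = b"def f(): return 1"
toy_key = (0x80, 0x00, 0x81)          # made-up key with G = 0x00
hidden = xor_rgb(plain, toy_key)

# Every byte at position 1, 4, 7, ... is XORed with 0x00 and so stays readable
middle_unchanged = hidden[1::3] == plain[1::3]
print(middle_unchanged)                  # True
print(xor_rgb(hidden, toy_key) == plain) # True
```

XOR with zero leaves a byte untouched, so a readable character in every middle position of the encrypted content is exactly what a key with G = 0x00 would produce.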
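- Total number of iterations to brute force without knowing any of the three values = 255 * 255 * 255 = 16,581,375 iterations.
- Total number of iterations to brute force knowing one byte out of three = 255 * 255 = 65,025 iterations.
These counts assume 255 candidate values per byte; covering every byte value from 0 to 255 gives 256 * 256 * 256 = 16,777,216 and 256 * 256 = 65,536 respectively. Either way, the reduction from roughly 16.6 million iterations to about 65 thousand is significant, so I decided to go for it.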
Figure 23: Brute forcing final key
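A reduced brute force of this kind can be sketched as follows. It is not the loop from Figure 23: the "all bytes printable" test is my stand-in for whatever check that loop uses, the demo ciphertext is generated with a made-up key, and the final `def` filter simply applies the guess mentioned earlier.

```python
from itertools import cycle, product
import string

PRINTABLE = set(string.printable.encode("ascii"))

def xor_rgb(data: bytes, key) -> bytes:
    """XOR data with a repeating three-byte (R, G, B) key."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

def looks_readable(cipher: bytes, key) -> bool:
    """True if every decrypted byte is printable ASCII (stops at the first miss)."""
    return all((c ^ k) in PRINTABLE for c, k in zip(cipher, cycle(key)))

def brute_force_rb(cipher: bytes, g: int = 0x00):
    """Try every R and B with G fixed; return the keys that give readable output."""
    return [(r, g, b) for r, b in product(range(256), repeat=2)
            if looks_readable(cipher, (r, g, b))]

# Demo ciphertext built with a made-up key, not the real one
demo_key = (0x90, 0x00, 0xA5)
demo_cipher = xor_rgb(b"def flag(): return 42", demo_key)

candidates = brute_force_rb(demo_cipher)
narrowed = [k for k in candidates if xor_rgb(demo_cipher, k).startswith(b"def")]
print(len(candidates), narrowed)
```

The readability test usually leaves more than one candidate, so a second filter (here the `def` prefix) or a quick look at the decrypted text is still needed to pick the right key.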
And the final flag was revealed at just the 16,897th iteration. The values are as follows:
Figure 24: Successful retrieval of flag using brute force
Figure 25: The flag