Lab 10 - Document Malware
In this lab, you will explore malware that is distributed using PDF and Microsoft Office document files.
Deliverables
Upload the following items to the Lab 10 Assignment on Canvas when finished.
- The PDF file containing the required documentation
Part 1 - PDF Malware #1
Launch your REMnux virtual machine and copy over the 1-xxxx.zip malware. Unzip the malware and open it in a text editor (SciTE works).
Question | Answer |
---|---|
What is the object number and version number of the first object to be processed after opening the file? (Tip: Look at the /Root line in the trailer.) |
|
What are the complete contents of that object? (from obj to endobj) |
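If you would rather script the lookup than scroll through the file, the following is a minimal sketch of the same two steps: find the /Root reference, then pull out the object it points to. It assumes a simple, uncompressed PDF where the trailer and objects are visible as plain text. The function names and the sample bytes are made up for illustration. Note that the second number in a reference, called the version number above, is named the generation number in the PDF specification.

```python
import re

# Synthetic, harmless sample; it is not the lab malware.
SAMPLE = b"""%PDF-1.4
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages /Kids [] /Count 0 >>
endobj
trailer
<< /Size 3 /Root 1 0 R >>
%%EOF
"""


def find_root_reference(pdf_bytes):
    """Return (object number, generation number) of the last /Root entry, or None."""
    matches = re.findall(rb"/Root\s+(\d+)\s+(\d+)\s+R", pdf_bytes)
    if not matches:
        return None
    obj_num, gen_num = matches[-1]
    return int(obj_num), int(gen_num)


def extract_object(pdf_bytes, obj_num, gen_num):
    """Return the raw bytes from 'N G obj' through 'endobj', or None if absent."""
    pattern = re.compile(
        rf"(?<!\d){obj_num}\s+{gen_num}\s+obj\b.*?endobj".encode("ascii"),
        re.DOTALL,
    )
    match = pattern.search(pdf_bytes)
    return match.group(0) if match else None


if __name__ == "__main__":
    root = find_root_reference(SAMPLE)
    print("Root reference:", root)
    print(extract_object(SAMPLE, *root).decode("latin-1"))
```

To try it on the lab file, read the file in binary mode (`open(path, "rb").read()`) and pass the bytes in place of `SAMPLE`.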
Even knowing nothing about PDFs, that object looks suspicious. What is a PDF file doing invoking PowerShell? Dig into the specifics of the command using the documentation on the PDF standard and answer the following questions. In particular, refer to Section 8.5.3 of the specification (“Action Types”).
Question | Answer |
---|---|
What does /S /Launch/Win mean? |
|
What does /F (….) mean? |
|
What does /P (….) mean? |
Let's explore the details about how PowerShell is being invoked.
Question | Answer |
---|---|
What does the -EncodedCommand argument do to PowerShell? |
|
What is the decoded command? Tip: Use https://www.base64decode.org/ in "Auto Detect" character set mode to decode |
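The decoding step can also be done offline in Python instead of on a website. This is a minimal sketch, assuming the argument is Base64 text that wraps a UTF-16LE (little-endian) string, which is the form PowerShell expects for -EncodedCommand. The function name and the harmless sample command are made up for illustration.

```python
import base64


def decode_encoded_command(b64_text):
    """Decode a PowerShell -EncodedCommand argument (Base64 over UTF-16LE text)."""
    raw = base64.b64decode(b64_text)
    return raw.decode("utf-16-le")


# Build a harmless sample the same way an attacker would encode a command.
sample = base64.b64encode("Write-Output 'hello'".encode("utf-16-le")).decode("ascii")

if __name__ == "__main__":
    print(sample)
    print(decode_encoded_command(sample))
```

Paste the string from the PDF in place of `sample` to decode it. If you decode the Base64 but skip the UTF-16LE step, every other byte will be a null character, which is why the website needs the right character set.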
The base64dump.py utility can be a useful way to extract encoded strings from files, rather than decoding them manually.
Bugfix Spring 2022: The base64dump.py program must be run with Python 2, but earlier REMnux releases invoked Python 3 by mistake. If the program crashes, run it explicitly under Python 2 like this:
$ python2 /usr/local/bin/base64dump.py .......
Question | Answer |
---|---|
Run base64dump.py <file.pdf> to see a summary of all potential encoded strings. What is the output? |
|
Run base64dump.py <file.pdf> -s <stream ID> -S to print the decoded text for a particular stream ID. What is the decoded text? |
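To get a feel for what a tool like this does, here is a rough sketch of the general idea: scan the raw bytes for long runs of Base64 characters and try to decode each one. This is not how base64dump.py is actually implemented; the function name, the 16-character minimum, and the sample data are assumptions chosen for illustration.

```python
import base64
import re

# A run of at least 16 Base64 alphabet characters, plus optional padding.
B64_RUN = re.compile(rb"[A-Za-z0-9+/]{16,}={0,2}")


def find_base64_candidates(data):
    """Return a list of (offset, encoded bytes, decoded bytes) for decodable runs."""
    results = []
    for match in B64_RUN.finditer(data):
        text = match.group(0)
        # Valid Base64 has a length that is a multiple of 4. This sketch
        # simply skips other runs rather than trying to trim them.
        if len(text) % 4 != 0:
            continue
        try:
            decoded = base64.b64decode(text, validate=True)
        except ValueError:
            continue
        results.append((match.start(), text, decoded))
    return results


if __name__ == "__main__":
    prefix = b"/P (-EncodedCommand "
    blob = prefix + base64.b64encode(b"hello world, this is a test") + b")"
    for offset, encoded, decoded in find_base64_candidates(blob):
        print(offset, len(encoded), decoded)
```

Expect false positives on real files: any long run of letters and digits can happen to decode, so each hit still needs a human look.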
Part 2 - PDF Malware #2
Launch your REMnux virtual machine and copy over the 2-xxxx.zip malware. Unzip the malware, and use pdfid.py to do a quick “triage” of the PDF file.
Usage: pdfid.py <file.pdf>
Question | Answer |
---|---|
What object types might be suspicious to a malware analyst, why are they interesting, and how many objects of each type were found? |
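Triage of this kind boils down to counting how often certain keywords appear in the raw file. The sketch below shows that idea only; it is not pdfid.py's implementation. The function name is made up, the keyword list in the demo is deliberately neutral (deciding which keywords matter is your job), and the sketch does not decode names written with #xx hex escapes.

```python
import re

# Synthetic, harmless sample; it is not the lab malware.
SAMPLE = b"""%PDF-1.4
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages /Kids [] /Count 0 >>
endobj
trailer
<< /Size 3 /Root 1 0 R >>
%%EOF
"""


def count_keywords(pdf_bytes, keywords):
    """Count whole-token occurrences of each keyword in the raw PDF bytes."""
    counts = {}
    for keyword in keywords:
        # Bare words such as 'obj' must not match inside 'endobj'.
        lookbehind = b"" if keyword.startswith("/") else rb"(?<![A-Za-z0-9])"
        pattern = re.compile(
            lookbehind + re.escape(keyword.encode("ascii")) + rb"(?![A-Za-z0-9])"
        )
        counts[keyword] = len(pattern.findall(pdf_bytes))
    return counts


if __name__ == "__main__":
    for name, count in count_keywords(SAMPLE, ["obj", "endobj", "/Page", "/Pages"]).items():
        print(f"{name:10} {count}")
```

Counts taken this way only see keywords that are visible in the raw bytes. Anything hidden inside a compressed stream will not be counted.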
Although opening up a large PDF file in a text editor and scrolling up and down can be fun, there are other tools that will allow you to jump straight to portions of interest in the file. Use pdf-parser.py to search for the objects you previously identified with pdfid.py.
Usage: pdf-parser.py <file.pdf> --search <ObjectType>
Question | Answer |
---|---|
What is the name of the JavaScript function that the PDF invokes upon opening? Tip: The name is this.XXXXXX |
|
In what object is this JavaScript function defined? |
You can use pdf-parser.py to view particular sections.
Usage: pdf-parser.py <file.pdf> --object <objectNum>
Question | Answer |
---|---|
What section does the previous object (that defined the JavaScript function) reference? | |
Is there any code in that referenced section, or does it reference yet another section? | |
What object in the PDF file actually contains JavaScript code? |
You can use pdf-parser.py to decode and dump the JavaScript code from a stream.
Usage: pdf-parser.py <file.pdf> --object <objectNum> --filter --raw -d output.txt
- --filter : Pass the stream through its decoding filters (for example, FlateDecode) before output
- --raw : Output without escaping special characters
- -d : Save the output to disk instead of standard output
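Stream contents in a PDF are often compressed, which is why the stream has to be decoded before the JavaScript is readable. The sketch below shows what that decoding amounts to for the common FlateDecode filter, which is zlib compression. It assumes you already have the raw bytes of a single object and that FlateDecode is the only filter applied. The function name and the sample object are made up for illustration.

```python
import re
import zlib


def decode_flate_stream(object_bytes):
    """Inflate the first stream in a PDF object, assuming a lone FlateDecode filter."""
    match = re.search(rb"stream\r?\n(.*?)endstream", object_bytes, re.DOTALL)
    if match is None:
        raise ValueError("no stream found in object")
    # decompressobj() ignores the newline that sits between the compressed
    # data and the 'endstream' keyword.
    return zlib.decompressobj().decompress(match.group(1))


def build_demo_object(payload):
    """Wrap a payload in a synthetic PDF object with a FlateDecode stream."""
    compressed = zlib.compress(payload)
    header = b"5 0 obj\n<< /Filter /FlateDecode /Length %d >>\nstream\n" % len(compressed)
    return header + compressed + b"\nendstream\nendobj\n"


if __name__ == "__main__":
    demo = build_demo_object(b"function demo() { return 1; }")
    print(decode_flate_stream(demo).decode("ascii"))
```

Streams that chain several filters, or use a different one, need each filter undone in order, so a dedicated tool is the safer choice for real samples.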
Question | Answer |
---|---|
Provide a piece of the resulting JavaScript code. It’s obfuscated, so I don’t need it all. |
We could continue here, but JavaScript deobfuscation is a topic for another time...
Use peepdf.py to parse the file for an alternate view of potentially suspicious sections.
Usage: peepdf <options> <file.pdf>
Common options:
- -f : Force parsing mode (continue even if errors are encountered)
- -l : Loose parsing mode (report any malformed objects - malware could be abusing the PDF structure)
- -i : Interactive mode
In interactive mode, you can jump between sections. Try “object 13” or “object 10”. Type “?” for help to see other commands, or “quit” to exit.
Question | Answer |
---|---|
What is the output of peepdf.py, under “Objects with JS Code” and “Suspicious Elements”? |
Part 3 - MS Office Malware #1
Launch your REMnux virtual machine and copy over the 3-xxxx.zip malware. Unzip the malware, and use olevba.py to examine the file.
Usage: olevba <file>
Question | Answer |
---|---|
Examine the summary table produced. What IOCs are identified from the VBA code? | |
Examine the summary table produced. What suspicious keywords are identified from the code? | |
Locate the AutoOpen() function in the VBA code, which was identified by olevba.py as potentially suspicious. What function does AutoOpen() call? | |
... And what function does that function call? | |
... Inspect the code for that function. In a few sentences, what does it accomplish? (Your answer should explain whether this is a dropper or downloader and support that with evidence from the code). |
Unzip the Office XML file to view its inner structure and components.
Usage: unzip <file> -d <directory>
where file is the Office document and directory is the name of a new directory where you want all of the components to be placed.
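Because an Office XML document is an ordinary zip archive, you can also list its components from Python without extracting anything to disk. This is a minimal sketch using the standard zipfile module. The function names are made up, and the demo archive is a synthetic stand-in with invented member contents, not the lab document.

```python
import io
import zipfile


def list_members(source):
    """Return (name, uncompressed size) for every member of a zip-based document."""
    with zipfile.ZipFile(source) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


def build_demo_archive():
    """Create a tiny in-memory zip that mimics the layout of an Office XML file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("word/document.xml", "<document/>")
        archive.writestr("word/media/image1.png", b"\x89PNG")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name, size in list_members(build_demo_archive()):
        print(f"{size:8} {name}")
```

`list_members` accepts a file path as well as a file-like object, so you can pass the path of the document directly.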
Question | Answer |
---|---|
Two image files were found in the Office document. Copy and paste them into your report. |
Use oledump.py to obtain a view of the overall structure of the file.
Question | Answer |
---|---|
What stream(s) contain Macro code? Tip: A column with an M label means "Macro attributes AND code", which is what you are looking for. A column with an m label is macro attributes only and without code. |
|
Use oledump.py to obtain the contents of that stream and view the code. What is the full command to do so? Tip: Use the -h argument to access the documentation. -s will specify a stream, and -v will decompress the macro. |
|
There’s another HTTP-related string in the macro file. What is it? Tip: Use the strings program with the --encoding=l (lowercase-L) argument to search the macro file for unicode-encoded strings. Note that here you need to specify the VBA .bin file inside the .zip archive, not the original .docm container file, because the strings program does not understand the zip compression. |
|
What might be the significance of this HTTP-related string? (Speculate, there's no particular right answer here) |
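If you are curious what the 16-bit little-endian string search is doing, the sketch below imitates it: it looks for runs of printable ASCII characters that are each followed by a zero byte, which is how plain English text looks when stored as UTF-16LE. It is a simplified stand-in for the strings program, not a replacement. The function name, the 4-character minimum, and the sample bytes are assumptions for illustration.

```python
import re


def utf16le_strings(data, min_chars=4):
    """Return printable-ASCII strings stored as UTF-16LE inside binary data."""
    # Each character is one printable byte followed by a 0x00 byte.
    pattern = re.compile(
        rb"(?:[\x20-\x7e]\x00){" + str(min_chars).encode("ascii") + rb",}"
    )
    return [match.group(0).decode("utf-16-le") for match in pattern.finditer(data)]


if __name__ == "__main__":
    blob = (
        b"\x01\x02"
        + "http://example.test/a".encode("utf-16-le")
        + b"\xff\xffabc"
        + "ok".encode("utf-16-le")
    )
    for text in utf16le_strings(blob):
        print(text)
```

A search for single-byte strings would miss this text entirely, because the zero bytes break every run after one character.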
Part 4 - MS Office Malware #2
Launch your REMnux virtual machine and copy over the 4-xxxx.zip malware. Unzip the malware, and use oledump.py to examine the file.
Question | Answer |
---|---|
Locate a section of the file with VBA macros. What is the stream (section) number, size in bytes, and name? | |
Obtain the contents of that stream. What is the name of the first function that is declared in the VBA macro? Tip: Look for the Public Function keyword |
|
The function is obfuscated, with names that are random gibberish, unnecessary loops, and extra variables. However, based partially on the name of the function, but more heavily based on the way the function is used, its purpose will become clear. Search the macro (via grep, text editor search, etc…) for invocations of the function. How many times is it used in the macro? |
Create a table documenting each invocation of this function. Show the arguments passed to the function, the un-obfuscated value, and the human-readable results that are potential indicators of compromise. Hint: The following two websites may be useful:
- http://tomeko.net/online_tools/xor.php?lang=en (Tip: unchecking the "output: use 0x and comma as separator (C-like)" option will produce results that can be more directly copy-and-pasted)
- https://www.rapidtables.com/convert/number/hex-to-ascii.html
Call | Arg1 | Arg2 | Un-Obfuscated Result | Human-Readable Result |
---|---|---|---|---|
Call 1 | | | | |
Call 2 | | | | |
... | | | | |
... | | | | |
Call n | | | | |
Aside from un-obfuscating the contents manually, you could also write a Python program to accomplish it, or use the VBA debugger provided with Microsoft Office to attempt to capture the un-obfuscated data after the macro processes it.
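As a starting point for the Python route, here is a minimal sketch of one common scheme: a hex-encoded string XORed byte by byte against a repeating key. That scheme is an assumption based on the tools hinted at above, not a statement of what the macro actually does, so check it against the function's code and adjust as needed. The function names, the key, and the sample text are all made up for illustration.

```python
from itertools import cycle


def xor_bytes(data, key):
    """XOR data against a repeating key. Applying it twice restores the input."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def deobfuscate(hex_text, key_text):
    """Hypothetical decoder: hex string in, XOR with key, readable text out."""
    plain = xor_bytes(bytes.fromhex(hex_text), key_text.encode("latin-1"))
    return plain.decode("latin-1")


if __name__ == "__main__":
    # Obfuscate a harmless string first so there is something to decode.
    hidden = xor_bytes(b"http://example.test", b"k3y").hex()
    print(hidden)
    print(deobfuscate(hidden, "k3y"))
```

If the two arguments turn out to play different roles, for example if both are hex strings, only `deobfuscate` needs to change; `xor_bytes` stays the same.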