Reverse engineering device firmware, using a blinking "rhino" as an example. Part 1
On April 2? 2018, the company INFORION held the SMARTRHINO-2018 conference for students of Bauman MSTU. A small device based on the STM32F042 microcontroller was prepared specifically for the conference.
This rhinoceros became
The first part of the article is based on the master class that was held and is intended for beginners: it covers basic approaches to reversing firmware and the specifics of working with the IDA disassembler. The second part is a little more complicated and focuses on the specifics of devices built on real-time operating systems.
Caution: below the cut are a blinking rhinoceros and its firmware!
The firmware is available for download for self-study.
The master class was interactive: participants could ask questions and propose their own solutions. Four working "rhinos" were available to the participants.
Briefly: at this stage the device is inspected externally to find markings and accessible connectors.
At the beginning of the seminar, the emphasis was on inspecting the device externally first and only then proceeding to reverse the firmware.
The microcontroller is of primary interest, followed by the peripherals and connectors.
An external inspection of the device established the following:
The STM32F042 microcontroller: at this point you should immediately consult the microcontroller documentation (if it exists), where you can learn the architecture, the bit width, and much else that is useful (in our case, a 32-bit microcontroller on the ARM architecture);
On the back there is an unlabeled connector; those who have worked with microcontrollers can correctly assume that it is the connector for flashing the device (first, it is unlabeled; second, it has 5 pins, which matches the number of pins needed for a microcontroller programming interface);
GND and TX pins;
A USB connector for powering the device (this is stated in the "Instruction");
An unknown XP2 connector on the front of the device;
A mysterious yellow blob on the rhino's leg, probably a touch button.
The quickest participants immediately powered up the devices and saw the following:
It was also discovered that Bluetooth devices named RHINOCEROS-220x were available; connecting to one of them creates a virtual COM port in the system. It proved convenient to connect to the device over Bluetooth from a smartphone and interact with it through the "Serial Bluetooth Terminal" mobile app or a similar one.
It was found that when sending arbitrary text to the COM port, the device returns the answer
Initial investigation of firmware
Briefly: at this stage a preliminary analysis of the firmware is performed: viewing the strings and loading the firmware into IDA Pro.
Before analyzing the firmware code, it makes sense to check whether the code is packed. Approaches vary; in the simple case it is enough to run the strings utility to list the strings of the binary file (shown abridged):
Hardware init done Starting FreeRTOS
sendMsg error %s
SET AUTH %d
AT+AB ShowConnection
ERROR: Wrong header length
led idx %d hue %d sat %d val %d
addr=%x, size=%x
User auth pass %s
Wrong will not give up!
ERROR: Unk cmd
I've got a super power and now I'm seeing invisible tactical combatant nano-ants everywhere
Many strings were found, so we can assume that the firmware is neither compressed nor encrypted. Already at this stage some strings are worth noting, for example format strings, error messages, and strings that name the operating system (did you spot them?). The presence of meaningful strings can, by and large, be considered half of a successful reverse.
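The check described above can also be done without the strings utility. Below is a minimal Python sketch of the same idea: it collects runs of printable ASCII from a byte buffer. The function name extract_strings, the minimum length of 4, and the sample bytes are assumptions made for illustration; they do not come from the firmware.

```python
import re

def extract_strings(data: bytes, min_len: int = 4):
    """Return (offset, text) pairs for runs of printable ASCII, similar to `strings`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]

# Hypothetical sample standing in for a firmware image.
sample = b"\x00\x01Hardware init done\r\n\x00\xff\xfeabc\x00ERROR: Unk cmd\x00"

for offset, text in extract_strings(sample):
    print(f"{offset:#06x}  {text}")
```

For a real image, read the file with open(path, "rb").read() and pass the bytes in. A packed or encrypted image would typically yield very few meaningful strings.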
Well, let's load the firmware into the most popular disassembler. We will use IDA version 6.9 for 32-bit code (since the microcontroller is 32-bit).
When you open the firmware file, IDA cannot determine the architecture and the entry point automatically, so you need to help it.
At this point you need to consult the microcontroller documentation again: the STM32F042x4 STM32F042x6 datasheet, section 5 "Memory mapping":
As the Processor Type choose ARM Little endian, tick the Manual load checkbox, and click OK:
In the "Do you want to change the processor type" window, click Yes. IDA then offers to create RAM and ROM segments; tick the ROM checkbox.
Now we need to specify the start address of the ROM. In the memory map, look at the Flash section: these are the addresses 0x08000000 - 0x08008000. We also specify that the firmware file is to be loaded at the same address: Loading address = 0x08000000.
Click "OK" in the "ARM and Thumb mode switching instructions" window.
Next, IDA says that it knows nothing about arbitrary binary files and that the entry point must be defined manually. Click OK.
Loading is complete. You can start studying the firmware.
Let's open the strings window (Shift + F12). Note that not all strings match the output of the strings utility: unfortunately, IDA did not recognize everything. A little later we will help it.
Note for beginners
- Any program or firmware is a set of binary data. IDA Pro can interpret the data of the source file in different ways (as instructions or as data in a particular format). There is no "Back" button (Ctrl + Z) to undo the chosen representation, so you need to know how to switch between display modes (see the IDA Pro hotkey cheat sheet).
- A reverse engineer restores logic, structure, and readability from the apparent chaos of binary data.
- Strings are important information when reversing, since of all the binary data they are the easiest and quickest for a person to take in. Strings let you draw conclusions about the purpose of functions, variables, and blocks of code.
- Name the functions you have examined! By default, IDA names functions after their start addresses. Keeping these addresses in mind during analysis is very hard; meaningful names are much easier to use. Even a cursory look is enough to give a function a name, and that name will already be a real help in further analysis.
- Name the recognized variables! To analyze blocks of code and functions more effectively, it makes sense to name the variables that IDA recognized according to their purpose (just as in good programming practice).
- Leave comments so as not to forget what is important. As in programming, comments written while reversing let you explain the logic of the program or its parts.
- If possible, create structures! IDA has tools for working with structures; it makes sense to master them and apply them when needed. With structures in place, the code under study becomes even easier to read.
Analysis of the strings
Briefly: analyzing the strings helps to draw up a rough plan for exploring a binary file.
Hardware init done Starting FreeRTOS
sendMsg error %s
AT+AB SPPDisconnect
AT+AB DefaultLocalName RHINOCEROS-2205
A lot of information can be obtained from the strings alone:
- The operating system is FreeRTOS;
- Format strings are present: printf-like functions are most likely used, which will help establish the purpose of registers and variables;
- Task names: they suggest the purpose of the tasks and of the related functions;
- AT commands are used: presumably this is how the microcontroller interacts with the Bluetooth module.
Things are far from always this rosy in firmware analysis: strings and debug information may be missing entirely or be uninformative. When creating this firmware, however, we deliberately did not complicate reverse engineering.
Identification of standard functions
Briefly: at this stage you need to make sure that the strings are actually recognized, and then identify some standard C functions.
After the firmware was loaded and auto-analyzed, IDA recognized function bodies (not all of them, by the way), but none of the function names is "normal" (there are only IDA's automatic names), which is a small complication compared with reversing an ELF or PE file.
Thus, during the research you need to determine the purpose not only of functions specific to this firmware but also to identify standard C functions. A reasonable question arises: what guarantees that such functions are in the firmware and that they are standard? Usually, when creating software (firmware included), in 9 cases out of 10 developers do not bother writing their own unique libc but use what has already been written and tested by time. That is why in most cases it is reasonable to assume that standard C functions are present.
Since the Hex-Rays Decompiler can convert ARM assembly into C code, we will take advantage of this pleasant opportunity. Note that a decompiled listing does not remove the need to understand assembly; moreover, a decompiler does not exist for every platform.
Open the strings window in IDA (Shift + F12).
Select the string "sendMsg error %s" and open its cross-references (X key, Xrefs). IDA recognized the references to the string, which is good:
However, among the strings highlighted in green in the disassembly there are plain bytes marked in red, so some strings were not recognized. For example, if you place the cursor on the address 0x080074E6 and press the A key (then agree to "Directly convert to string?"), the string "No device connected" is displayed. In the same way you can go through all the string-like data and turn it into strings (or, for example, write a Python script that walks the specified address range and creates the strings).
The next obstacle you may meet is unrecognized references to strings (even when the string itself was recognized). Try going through the strings and pressing the X key. In my case, for example, there is no reference to the string "recvMsg error". A reference to an object may be missing for two obvious reasons:
- there is no code that references the object;
- IDA did not recognize the reference.
We will try to rule out the first by running a binary search over the firmware. Open the binary search window (Alt + B), enter the address of the string, and do not forget to tick "Find all occurrences":
One occurrence was found:
Let's go to it (the address
We turn the DWORD into an offset by pressing the O key. A reference to the string appears:
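The binary search for an address can be reproduced outside IDA. The sketch below assumes the image is loaded at 0x08000000, as chosen earlier, and that addresses are stored as little-endian 32-bit words; the function name find_pointer_refs and the tiny sample image are hypothetical.

```python
import struct

FLASH_BASE = 0x08000000  # load address chosen in the text

def find_pointer_refs(image: bytes, target_addr: int, base: int = FLASH_BASE):
    """Return addresses where target_addr is stored as a little-endian 32-bit word."""
    needle = struct.pack("<I", target_addr)
    hits = []
    pos = image.find(needle)
    while pos != -1:
        hits.append(base + pos)
        pos = image.find(needle, pos + 1)
    return hits

# Hypothetical 16-byte image: a pointer to 0x080074E6 stored at offset 8.
image = bytes(8) + struct.pack("<I", 0x080074E6) + bytes(4)
print([hex(addr) for addr in find_pointer_refs(image, 0x080074E6)])
```

Each hit is a candidate location of a stored pointer; in IDA you would then go to that address and press O to turn the DWORD into an offset.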
Why are double references to strings created?
This is a peculiarity of the ARM architecture: instructions have a fixed length (32 bits in ARM mode; the Cortex-M0 core of the STM32F042 executes Thumb code, where most instructions are 16 bits), so an instruction cannot carry the full 32-bit address of an object. Instead, the code uses a short PC-relative offset to a location next to the function, where the full 32-bit address of the object is stored.
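To make the "short offset" concrete, here is a small sketch that decodes the 16-bit Thumb instruction LDR Rt, [PC, #imm] and computes where the full 32-bit address is stored. The function name and the example instruction word and address are illustrative assumptions, not values taken from the firmware.

```python
def decode_thumb_ldr_literal(halfword: int, instr_addr: int):
    """Decode a 16-bit Thumb `LDR Rt, [PC, #imm]`.

    Returns (rt, literal_addr), or None if the halfword is another instruction.
    """
    if (halfword & 0xF800) != 0x4800:   # top five bits 01001 mark LDR (literal)
        return None
    rt = (halfword >> 8) & 0x7          # destination register R0..R7
    imm = (halfword & 0xFF) * 4         # imm8 is scaled by 4
    pc = (instr_addr + 4) & ~0x3        # PC reads as instruction address + 4, word-aligned
    return rt, pc + imm

# Hypothetical example: 0x4803 located at 0x08005070.
rt, literal_addr = decode_thumb_ldr_literal(0x4803, 0x08005070)
print(f"LDR R{rt}, =[{literal_addr:#010x}]")
```

The word stored at literal_addr is the real address of the object, which is why a string ends up with two references: one from the instruction to the stored word, and one from the stored word to the string.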
Place the cursor a little higher, inside the function sub_8005070 (range 0x08005070-0x08005092). Switch to the decompiled listing by pressing Tab:
Note the function sub_8006690. If you go back to the string "sendMsg error %s", you can see that it is also passed to sub_8006690. Strings with format specifiers suggest that sub_8006690 is the standard printf. Let's assume for now that it is printf (even if the assumption is wrong, it will still let us move forward).
Put the cursor on the name sub_800669? press the N key, and enter the new name x_printf. The prefix "x_" is added for convenience (from the word "eXecutable"), so that functions we renamed can be told apart from functions that IDA named automatically.
The preparatory part can be considered complete; now we turn to the task responsible for handling the Bluetooth connection. You can again reach it through the strings. Many IDA windows support search with Ctrl + F, so you can immediately select the strings containing the word "bluetooth":
What is a task?
A task is a concept from the world of real-time operating systems (RTOS). Put simply, a task can be thought of as a separate process. More can be found in a series of articles on FreeRTOS.
Briefly: identify and analyze the function that processes commands sent over Bluetooth. An additional memory segment will need to be created in IDA.
The string "Bluetooth task\r\n" has no cross-references, so we use binary search again and get the address where it is used, 0x080058A0. Go there and see a list of partially recognized references:
Create full references from them (by pressing O, or by writing a Python script for IDA).
Possibly not all references will be created (addresses highlighted in green):
Following the references highlighted in green, we see that there are no strings there. We fix this by helping IDA.
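Unrecognized references of this kind can also be hunted for in bulk. The sketch below is a standalone approximation, not an IDA script: it scans the image for aligned 32-bit words whose value falls inside the Flash range quoted earlier. The function name and the four sample words are assumptions for illustration.

```python
import struct

FLASH_START, FLASH_END = 0x08000000, 0x08008000  # Flash range quoted in the text

def candidate_pointers(image: bytes, base: int = FLASH_START):
    """Yield (address, value) for aligned 32-bit words that point into Flash."""
    for off in range(0, len(image) - 3, 4):
        (value,) = struct.unpack_from("<I", image, off)
        if FLASH_START <= value < FLASH_END:
            yield base + off, value

# Hypothetical fragment of a literal pool.
image = struct.pack("<4I", 0x20000624, 0x08007651, 0xFFFFFFFF, 0x080058A0)

for addr, value in candidate_pointers(image):
    print(f"{addr:#010x} -> {value:#010x}")
```

Expect false positives, since instruction bytes can happen to look like addresses; every candidate still has to be checked by eye before it is turned into an offset.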
Let's return to the string "Bluetooth task\r\n". Now the code at 0x08005556 has a reference to it:
Here we see that this string is passed as an argument to the already familiar x_printf. Do not forget to give the current function, sub_8005554, a meaningful name as well, for example x_bluetooth_task.
Switch to the decompiler and view the whole function. Note line 13? where some number is passed to x_printf. If you change the display of the number from decimal to hexadecimal (H key), you will see 0x8007651, which looks very much like an address.
A familiar situation: IDA did not recognize the reference. We help it, but for that we need to switch from the decompiler to the disassembler (Tab key): make the offset, follow it, create the string. Then go back to the decompiler and press F5 (refresh).
We enjoy the improved code:
Again, note line 132. Obviously, besides the format string, x_printf should also receive a variable-length argument list (va_list), and IDA did not recognize this. Well, you get the idea: we will help it.
Place the cursor on the name x_printf and press Y; the prototype editing window opens. Enter a printf-style prototype:
void x_printf(const char *format, ...)
Um, sorry, you have a bug in your printf prototype.
Agreed, the correct form is
int x_printf(const char *format, ...)
and a little later we will fix it.
IDA now displays the arguments for the format string:
It is time to establish the purpose (and names) of the variables (again, the strings help us):
x_printf("recv %s state %d\r\n", v? v25); ->
x_printf("recv %s state %d\r\n", recv_data, state);
x_printf("cmd: %s\r\n", v24); ->
x_printf("cmd: %s\r\n", cmd);
x_printf("addr=%x, size=%x\r\n", v1? v15); ->
x_printf("addr=%x, size=%x\r\n", addr, size);
Other names are less obvious, but not too hard to work out.
For example, let's look at this code fragment:
Variable v3 is compared with the number ? then the message about the wrong header length appears. It is logical to rename:
- variable v3 to header_len;
- function sub_80006C8 to x_strlen (you can go into this function and check the assumption).
Next, note the following block of code:
Function sub_80006B4 is used several times. Inside, it looks like this:
Did you recognize it?
strcmp. Rename it. Out of chaos and disorder we create harmonious, readable code.
Now let's look at the variables v2000062? v2000034? v20000348. IDA highlighted them in red because they refer to addresses that are not in the current disassembler database. The microcontroller documentation shows that the address range 0x20000000-0x20001800 belongs to RAM.
Why 0x20001800?
0x1800 bytes is the 6 KB of RAM stated in the documentation.
If a variable refers to a non-existent memory area, xrefs are not available for it, which makes the analysis uncomfortable. For convenience and efficiency it makes sense to create an additional memory segment. Open the segments window (Shift + F7) and add a RAM segment:
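The memory layout used so far can be summarized in a few lines of Python. This is only a sketch built from the two ranges mentioned in the text (Flash and 6 KB of RAM); the names REGIONS and region_of are hypothetical, and peripheral address ranges are deliberately left out.

```python
REGIONS = {
    "ROM": (0x08000000, 0x08008000),  # Flash range from the text
    "RAM": (0x20000000, 0x20001800),  # 6 KB of RAM from the text
}

def region_of(addr: int):
    """Return the name of the region containing addr, or None if it is unmapped here."""
    for name, (start, end) in REGIONS.items():
        if start <= addr < end:
            return name
    return None

print(hex(6 * 1024))  # size of the RAM region in bytes
print(region_of(0x20000624), region_of(0x08005556), region_of(0x40000000))
```

An address that maps to None in such a table is exactly the kind IDA paints red until a segment covering it is created.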
Refresh the decompiled listing. Note the variable unk_20000344:
It looks like some kind of auth_flag (authorization flag), so that is what we name it. In my case no cross-references were found, so we use binary search and create the references.
Checking on the device
Briefly: let's check some assumptions on the working device.
Static analysis is a great thing, but it is even better if you can study the code dynamically. There is room for creativity here too, but the simplest approach is to connect to the device over Bluetooth, send a command, and look at the result.
For example, sending the string "ZZZ" makes the device respond with ERROR: Wrong header length\r\n, sending "MEOW" (this string is in the code under analysis and is passed to strcmp) yields mur-mur (> ._. <)\r\n, and sending "ZZZZ" yields ERROR: Unk cmd.
Thus, the function sub_8005234 can be renamed to x_bluetooth_send.
I will compile a list of commands that the device may support and test them right away. Here is what happened:
"ECH1" - returns "OK" and turns on echo mode: the command is duplicated back to the sender;
"ECH0" - turns off echo mode;
"MEOW" - returns "mur-mur (> ._. <)\r\n": either an Easter egg or a debug command;
"LED" - turns off one of the bright LEDs;
"UART" - returns "OK";
"BLE" - the red LED flashes once;
"READ" - returns "ERROR: Not auth!"
"WRIT" - returns "ERROR: Not auth!"
"AUTP" - returns "ERROR: auth error!"
"SETP" - returns "ERROR: Not auth!"
"VIP" - returns "Wrong will not give up!"
Interim conclusions about the protocol:
- a command consists of at least 4 characters;
- there are some rather strange commands related to authorization (why would a lighting device need authorization?).
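The observed behaviour can be written down as a toy model. The sketch below is not the firmware's code: it is a hypothetical Python dispatcher that reproduces only the three responses seen on the device for "ZZZ", "ZZZZ", and "MEOW". The 4-character header length and the handler table are assumptions, and how three-letter commands are padded is not modelled.

```python
HEADER_LEN = 4  # assumption based on "at least 4 characters" observed above

# Only a response actually observed on the device is modelled here.
HANDLERS = {
    "MEOW": lambda args: "mur-mur (> ._. <)",
}

def handle_command(line: str) -> str:
    """Toy model of the command dispatcher; not the firmware's real logic."""
    if len(line) < HEADER_LEN:
        return "ERROR: Wrong header length"
    header, args = line[:HEADER_LEN], line[HEADER_LEN:]
    handler = HANDLERS.get(header)
    if handler is None:
        return "ERROR: Unk cmd"
    return handler(args)

for text in ("ZZZ", "ZZZZ", "MEOW"):
    print(repr(text), "->", handle_command(text))
```

Writing the hypothesis down this way makes it easy to compare against the decompiled x_bluetooth_task: a length check first, then a chain of strcmp calls on the header.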
Improving the code. Creating a structure
Briefly: where possible, it makes sense to create data structures; they are a great help for analysis.
Let's move on. Our minimum goal is to learn how to control the LEDs.
The experiment showed that the "LED" command is tied to the large LEDs: at the very least, it switched off one of the four large LEDs. Let's see what this branch contains:
Here we could simply rename the variables; the only confusing parts are constructs like
*(_WORD *)(v6 + 4) = sub_8005338(v4);
In most such cases the variable (here v6) is a pointer to a structure. For convenience, let's create this structure: in the context menu for v6, select "Create new struct type".
IDA proposes the following definition for the structure:
Here we trust the automation on the field types but set readable names based on the format string:
After creating the structure, the code looks even nicer:
Along the way, variable v6 was renamed to led. The extra variables v7 and v8 were also renamed for convenience. Do not be confused by the extra variables; the compiler knows best.
From the format string we can conclude that the color is set in HSV format (Hue, Saturation, Value). To convert a color from RGB, you can use a table.
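Instead of a table, the conversion can be computed with Python's standard colorsys module. The scaling in this sketch is an assumption: hue is expressed in degrees (0-359) and saturation and value as 0-255, which the text does not confirm for the device. The function name rgb_to_led_hsv is hypothetical.

```python
import colorsys

def rgb_to_led_hsv(r: int, g: int, b: int):
    """Convert 8-bit RGB to (hue_degrees, sat_0_255, val_0_255).

    The output scaling is an assumption, not something taken from the firmware.
    """
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return round(h * 360) % 360, round(s * 255), round(v * 255)

print(rgb_to_led_hsv(255, 0, 0))  # pure red
print(rgb_to_led_hsv(0, 255, 0))  # pure green
```

Pure red gives hue 0 with saturation and value at maximum, which matches the values used for the red LED test later in the article.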
It is still hard to say anything definite about variable v4, except that it is a structure created in the function sub_8005298:
We can assume that v4 holds the arguments of the command received over Bluetooth. Let's name things accordingly:
- v4 - bt_args
- sub_8005298 - x_get_bt_args
Previously recognized information can be lost in the decompiler.
When you manipulate names and data types in the decompiled listing, function arguments may disappear or appear. In that case, explicitly specify the prototype of such functions (Y key on the function header). Because on ARM the first 4 arguments are passed in registers, IDA and the decompiler can "lose" them, and then we come to IDA's aid. If the decompiler does not understand which arguments are passed to a function, go to the disassembly listing and look at registers R0-R3: are any values placed in them before the call to the function of interest? If so, in 90% of cases these are the function's arguments, and they should be written into the prototype.
The "LED" command
Briefly: we study the LED command and continue renaming functions and variables.
Let's do a few more renames for readability:
- sub_8003B6E - x_create_struct
- sub_800532C - x_get_value_1
- sub_8005338 - x_get_value_2
Let's go into the function x_get_value_1:
Let's rename sub_800530C to x_get_value_3. Now compare the functions x_get_value_1 and x_get_value_2:
They use the same function x_get_value_? but with a different second argument (2 and 4). x_get_value_1 returns a 1-byte number, and x_get_value_2 a 2-byte number.
Let's analyze how x_get_value_3 is used:
- it works on the string bt_args (or on a structure containing the string);
- when the input number is ? the output is a 1-byte number;
- when the input number is ? the output is a 2-byte number.
Putting these facts together, we can hypothesize that x_get_value_3 builds a number of the specified size from a hex string.
Perform the renaming:
- x_get_value_1 - x_get_byte;
- x_get_value_2 - x_get_word;
- x_get_value_3 - x_unhexlify.
Let's see whether x_unhexlify is used anywhere else.
It is. Function sub_8005344 looks like this:
You can rename it to x_get_dword.
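The hypothesis about x_unhexlify and its three wrappers can be modelled in a few lines. The class below is a sketch, not a decompilation: the names mirror the renames above, the digit counts 2 and 4 come from the observed second argument, while the count of 8 for the dword and the most-significant-digit-first order are assumptions.

```python
class HexReader:
    """Consumes fixed-width hex fields from a string, left to right."""

    def __init__(self, text: str):
        self.text = text
        self.pos = 0

    def unhexlify(self, digits: int) -> int:
        # Take the next `digits` hex characters and advance the read position.
        chunk = self.text[self.pos:self.pos + digits]
        if len(chunk) != digits:
            raise ValueError("not enough hex digits")
        self.pos += digits
        return int(chunk, 16)

    def get_byte(self) -> int:
        return self.unhexlify(2)

    def get_word(self) -> int:
        return self.unhexlify(4)

    def get_dword(self) -> int:
        return self.unhexlify(8)  # assumption: the text does not show this argument

reader = HexReader("01ABCD")
print(reader.get_byte(), hex(reader.get_word()))
```

The important property is that each call continues where the previous one stopped, so successive fields are read from one shared buffer.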
The interested reader can dive into a static analysis of the x_unhexlify function and the bt_args structure; it is sure to be fascinating.
At this point we can put together a command for controlling the LEDs:
The question remains: are delimiters needed between the individual fields?
Taking advantage of having the device at hand, I will test 2 options:
- spaces as delimiters;
- no delimiters.
To turn LED zero red (following the conversion table), the following values must be set:
- LED index (idx) = 0x00;
- Hue = 0x00;
- Saturation = 0xFF;
- Value = 0xFF.
Command with spaces:
"LED ??? FF FF" - the LED lights up bright light blue.
Command without spaces:
"LED 000000FFFF" (the space after "LED" is required by the command format) - the LED lights up red.
Thus, we can conclude that the command parameters must be passed without spaces. We can also assume (and those who performed a full static analysis of x_unhexlify can confirm it) that x_unhexlify reads data of the given size sequentially from some underlying buffer.
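The working command can be assembled programmatically. The field layout in this sketch is an assumption inferred from the ten hex digits of "LED 000000FFFF": a 1-byte index, a 2-byte hue, and 1-byte saturation and value. The function name build_led_command is hypothetical, and the hue scale for colors other than red is not established in the text.

```python
def build_led_command(idx: int, hue: int, sat: int, val: int) -> str:
    """Build an LED command string without separators between the fields.

    Assumed layout: idx (1 byte), hue (2 bytes), sat (1 byte), val (1 byte).
    """
    for name, value, limit in (("idx", idx, 0xFF), ("hue", hue, 0xFFFF),
                               ("sat", sat, 0xFF), ("val", val, 0xFF)):
        if not 0 <= value <= limit:
            raise ValueError(f"{name} out of range: {value}")
    return "LED " + f"{idx:02X}{hue:04X}{sat:02X}{val:02X}"

print(build_led_command(0, 0, 0xFF, 0xFF))
```

With the values listed above for the red LED, the function reproduces the command that worked on the device. Other colors require knowing how the firmware scales the hue field, so they should be checked experimentally.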
- Enable the first LED in green:
- Turn on the second LED in blue:
- Turn on the third LED purple:
One function in the LED branch remains unexplored: sub_8003B7C. It takes the variable dword_20000624 as input. Let's see where this variable is used; just in case, we go straight to binary search (Alt + B):
Note the addresses 0x08004FF? 0x08005D40. They tried to hide! We help IDA by creating the references.
Now let's see where the references lead:
function sub_8004D84 - evidently the firmware's initial function, because it uses the string "\r\nHardware init done Starting FreeRTOS\r\n"; we rename it to x_main;
function sub_8005A08 - uses the string "LED task\r\n" at the very beginning; we rename it to x_leds_task.
Thus, the variable dword_20000624 is used:
closer to the end of the main function;
after receiving data over Bluetooth in x_bluetooth_task;
at the beginning of the loop in x_leds_task.
Those who have programmed threads in a regular OS or worked with RTOS tasks will recognize this variable as a pointer to a queue for exchanging data between tasks, and they will be right. Let's do some more renaming:
dword_20000624 - leds_queue;
sub_8003BD0 - x_queue_recv;
sub_8003B7C - x_queue_send.
In addition, you can verify the names by looking at the places where these functions are used:
sub_800501C - x_sendMsg;
sub_8005044 - x_recvMsg.
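For readers who have not met queues between tasks before, here is an analogy in Python using threads and queue.Queue. It is not FreeRTOS code and not the firmware's logic; the message contents and the None sentinel that stops the demo are assumptions made for illustration.

```python
import queue
import threading

leds_queue = queue.Queue()  # stands in for the firmware's leds_queue
applied = []                # records what the "LED task" received

def bluetooth_task():
    # Producer: parsed command parameters are put into the queue.
    leds_queue.put({"idx": 0, "hue": 0, "sat": 0xFF, "val": 0xFF})
    leds_queue.put(None)    # sentinel so that the demo can stop

def leds_task():
    # Consumer: block until a message arrives, then act on it.
    while True:
        msg = leds_queue.get()
        if msg is None:
            break
        applied.append(msg)

consumer = threading.Thread(target=leds_task)
consumer.start()
bluetooth_task()
consumer.join()
print(applied)
```

The pattern matches what was seen in the listing: one task calls the send function after parsing a command, and another task blocks on the receive function at the top of its loop.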
Now, to make sure that we can fully control the LEDs, we study the function x_leds_task.
Here we will pause for a bit, drink tea with chocolate, and continue in the second part of the article.
Results of the first stage:
An external inspection of the device was carried out.
The firmware was loaded into the disassembler.
Strings useful for the research were found.
It was established that the "rhino" is controlled over Bluetooth using a simple text protocol.
The task that processes the commands of the Bluetooth exchange protocol was partially studied.
The second part offers a complete analysis of the blinking rhinoceros, a search for non-obvious functionality, and a small homework assignment.