Developing a NIOS II processor module for IDA Pro
Screenshot of the interface of the disassembler IDA Pro
IDA Pro is a well-known disassembler that information security researchers around the world have used for many years. We at Positive Technologies use it too. Moreover, we developed our own processor module for the NIOS II microprocessor architecture, which makes code analysis faster and more convenient.
Today I will tell the story of this project and show the result. The first version of the module was based on the Nios II Classic Processor Reference Guide, which was then the most current manual. In total, this work took about two weeks.
The processor module was developed for IDA version 6.9. IDA Python was chosen for speed of development. The procs subdirectory of the IDA Pro installation directory, where processor modules reside, contains three modules written in Python: msp430, ebc, and spu. They show how a module is organized and how the basic disassembly functionality can be implemented:
parsing instructions and operands,
simplifying them and printing them to the screen,
creating offsets, cross-references, and the code and data they refer to,
processing switch constructs,
tracking stack pointer changes and creating stack variables.
Roughly this functionality was implemented at that time. Fortunately, the tool proved useful in another task a year later, during which it was actively used and refined.
I decided to share the experience of creating a processor module at the PHDays 8 conference. The talk attracted interest (a video of the talk is available on the PHDays website), and even IDA Pro creator Ilfak Guilfanov was in the audience. One of his questions was whether IDA Pro version 7 was supported. At that time it was not, but after the talk I promised to release a version of the module that supports it. That is where the most interesting part began.
By then the most recent manual was the one from Intel, which I used to verify the module and check it for errors. I significantly reworked the module and added a number of new features, including solutions to problems I previously could not crack. And, of course, I added support for IDA Pro version 7. Here is what came out of it.
The NIOS II programming model
NIOS II is a soft processor developed for FPGAs by Altera (now part of Intel). From the programmer's point of view, it has the following features: little-endian byte order, a 32-bit address space, a 32-bit instruction set (each command is encoded in 4 bytes), and 32 general-purpose and 32 special-purpose registers.
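To make the fixed 4-byte encoding concrete, here is a minimal sketch of splitting an instruction word into its bit fields. It is not the module's code. The field positions follow the I-type and R-type layouts of the instruction set reference as I understand them, the function names are invented for illustration, and the J-type format is deliberately left out.

```python
OP_RTYPE = 0x3A  # primary opcode shared by R-type instructions


def decode_fields(word):
    """Split a 32-bit NIOS II instruction word into raw bit fields."""
    op = word & 0x3F
    a = (word >> 27) & 0x1F
    b = (word >> 22) & 0x1F
    if op == OP_RTYPE:
        return {"type": "R", "a": a, "b": b,
                "c": (word >> 17) & 0x1F,
                "opx": (word >> 11) & 0x3F,
                "imm5": (word >> 6) & 0x1F,
                "op": op}
    # Simplification: every other opcode is treated as I-type here.
    return {"type": "I", "a": a, "b": b,
            "imm16": (word >> 6) & 0xFFFF, "op": op}


def words(data):
    """Yield little-endian 32-bit words from a firmware byte string."""
    for offset in range(0, len(data) - 3, 4):
        yield int.from_bytes(data[offset:offset + 4], "little")


# Two sample words: an R-type and an I-type instruction.
for word in words(bytes.fromhex("3a0880ef44008000")):
    print(hex(word), decode_fields(word))
```

A real decoder would go on to map the op and opx values to mnemonics; this sketch stops at the raw fields.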
Disassembling and code references
So, we open a new file in IDA Pro containing firmware for the NIOS II processor. Once the module is installed, it appears in the list of IDA Pro processors. The choice of the processor is shown in the figure.
Suppose that the module does not yet implement even basic command analysis. Since each command takes 4 bytes, we group the bytes in fours, and everything looks something like this.
After implementing the basic functionality (decoding instructions and operands, displaying them on the screen, and analyzing control transfer instructions), the set of bytes from the example above turns into the following code.
As the example shows, cross-references from control transfer commands are also generated (here you can see a conditional branch and a procedure call).
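To create such cross-references, the module has to compute the target of each control transfer. The sketch below shows the arithmetic for a relative branch and for a direct call, as I understand it from the instruction set reference. The function names are invented, and the actual creation of the cross-reference through the IDA API is omitted.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, imm16):
    """Relative branch: the offset is counted from the next instruction."""
    return (pc + 4 + sign_extend16(imm16)) & 0xFFFFFFFF


def call_target(pc, imm26):
    """Direct call: a 26-bit word index within the current 256 MB region."""
    return (pc & 0xF0000000) | ((imm26 & 0x3FFFFFF) << 2)


print(hex(branch_target(0x1000, 0xFFF8)))   # backward branch
print(hex(call_target(0x10001234, 0x100)))  # call within the same region
```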
One useful feature that a processor module can implement is comments for commands. If you disable the display of byte values and enable comments, the same section of code looks like this.
If you are meeting the assembly code of an unfamiliar architecture for the first time, the comments help you understand what is happening. The code samples that follow are shown in the same form, with comments, so that you can understand each example immediately without consulting the NIOS II manual.
Pseudo-instructions and simplification of commands
Some NIOS II commands are pseudo-instructions. Such commands have no separate opcodes; they are modeled as special cases of other commands. During disassembly, instructions are simplified by replacing certain combinations with pseudo-instructions. Pseudo-instructions in NIOS II fall into four types:
when one of the sources is zero (r0) and can be dropped,
when the command has a negative immediate value and is replaced by the opposite command,
when the condition is replaced by the opposite one,
when a 32-bit offset is loaded in parts (low and high) by two commands, and this is replaced by one command.
The first two types were implemented, since replacing the condition gains little, and 32-bit offsets come in more variants than the manual presents.
As an example of the first type, consider the following code.
The zero register appears often in the computations here. If you look closely at this example, you will notice that every command except the control transfers is just a way of loading a value into a register.
After implementing pseudo-instruction processing, the same piece of code is more readable: instead of variations of the or and add commands, we get variations of the mov command.
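The first two types of simplification can be sketched as a small rewriting function. This is an illustration and not the module's implementation. Instructions are represented as hypothetical (mnemonic, operands) tuples, with operands ordered (rC, rA, rB) for register forms and (rB, rA, imm) for immediate forms, and only a handful of mnemonics are covered.

```python
def simplify(mnem, ops):
    """Rewrite an instruction as a pseudo-instruction where possible."""
    if mnem in ("add", "or"):
        rc, ra, rb = ops
        if mnem == "add" and rc == ra == rb == "r0":
            return "nop", ()
        if rb == "r0":           # type 1: a zero source is dropped
            return "mov", (rc, ra)
        if ra == "r0":
            return "mov", (rc, rb)
    elif mnem in ("addi", "ori", "orhi"):
        rb, ra, imm = ops
        if ra == "r0":           # type 1 again, immediate forms
            pseudo = {"addi": "movi", "ori": "movui", "orhi": "movhi"}[mnem]
            return pseudo, (rb, imm)
        if mnem == "addi" and imm < 0:   # type 2: negative immediate
            return "subi", (rb, ra, -imm)
    return mnem, ops


print(simplify("or", ("r5", "r0", "r3")))
print(simplify("addi", ("sp", "sp", -8)))
```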
The NIOS II architecture supports a stack. Besides the stack pointer sp, there is also a frame pointer fp. Consider an example of a small procedure that uses the stack.
Evidently, space for local variables is reserved on the stack. We can assume that the ra register is saved to a stack variable and later restored from it.
After adding functionality to the module that tracks changes to the stack pointer and creates stack variables, the same example looks like this.
Now the code is a bit clearer, and you can name the stack variables and work out their purpose by following cross-references. The function in the example is of type __fastcall, and its arguments in registers r4 and r5 are pushed onto the stack to call a subprocedure of type __stdcall.
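The idea of stack tracking can be sketched without IDA: accumulate the changes to sp, then express each sp-relative access as an offset from the value sp had on entry to the function. The instruction tuples and function names below are hypothetical, and a real module would record the changes through the IDA API instead of returning a list.

```python
def track_sp(insns):
    """Return the sp delta (relative to function entry) before each instruction."""
    delta = 0
    deltas = []
    for mnem, ops in insns:
        deltas.append(delta)
        if mnem in ("addi", "subi") and ops[0] == "sp" and ops[1] == "sp":
            delta += ops[2] if mnem == "addi" else -ops[2]
    return deltas


def frame_offset(delta, disp):
    """Offset of an sp-relative access from the entry value of sp."""
    return delta + disp


code = [
    ("addi", ("sp", "sp", -8)),   # reserve 8 bytes
    ("stw", ("ra", 4, "sp")),     # save ra at 4(sp)
    ("ldw", ("ra", 4, "sp")),     # restore ra
    ("addi", ("sp", "sp", 8)),    # release the frame
    ("ret", ()),
]
deltas = track_sp(code)
print(deltas, frame_offset(deltas[1], 4))
```

Both accesses to 4(sp) map to the same frame offset, which is what allows them to be shown as one named stack variable.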
32-bit numbers and offsets
A peculiarity of NIOS II is that a single command can load an immediate value of at most 2 bytes (16 bits) into a register. On the other hand, the processor registers and the address space are 32-bit, so 4 bytes are needed to hold an address in a register.
To solve this problem, offsets consisting of two parts are used. A similar mechanism is used in PowerPC processors: the offset consists of a high part and a low part and is written into the register by two commands. In PowerPC, it looks like this.
With this approach, cross-references are created from both commands, although the address is actually completed only by the second command. This can be inconvenient when counting cross-references.
In the offset properties, the high part uses the non-standard type HIGHA16 (sometimes HIGH16), and the low part uses LOW16.
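The difference between the two high-part types can be shown with a little arithmetic. The sketch assumes that HIGHA16 means the "adjusted" high half, which compensates for the sign extension of the low half when the parts are added, while HIGH16 is the plain upper half used when the parts are combined with OR. This reading and the function names are my assumptions.

```python
def split_high_low(value):
    """Plain split: HIGH16 and LOW16, recombined with a logical OR."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF


def split_higha_low(value):
    """Adjusted split: HIGHA16 and LOW16, recombined with a signed add."""
    low = value & 0xFFFF
    high = ((value + 0x8000) >> 16) & 0xFFFF
    return high, low


def join_signed(high, low):
    """Add the sign-extended low half to the shifted high half."""
    signed_low = low - 0x10000 if low & 0x8000 else low
    return ((high << 16) + signed_low) & 0xFFFFFFFF


address = 0x0001ABCD
print([hex(part) for part in split_high_low(address)])
print([hex(part) for part in split_higha_low(address)])
print(hex(join_signed(*split_higha_low(address))))
```

When bit 15 of the low half is set, the adjusted high half is one greater than the plain one, so the high part alone does not show the upper half of the final address.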
Computing a 32-bit number from two parts is not complicated in itself. The difficulty arises when operands have to be formed as offsets for two separate commands. All this processing falls on the processor module, and the IDA SDK has no examples of how to implement it (especially in Python).
In the PHDays talk, offsets were an unresolved problem. To solve it, we cheated: the 32-bit offset is formed only from the low part, relative to a base. The base is calculated as the high part shifted left by 16 bits.
With this approach, a cross-reference is created only from the instruction that loads the low part of the 32-bit offset.
The offset properties show the base, and an option is set to treat it as a number, so that a large number of cross-references to the base address itself is not created.
NIOS II code uses the following mechanism to load 32-bit numbers into a register. First, the high part of the offset is loaded into the register with the movhi command. Then the low part is combined with it. This can be done in three ways (commands): addition (addi), subtraction (subi), or logical OR (ori).
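The three ways of attaching the low part can be sketched as one function. The base is the high part shifted left by 16 bits, as described above. The function name is invented, and I assume that the addi immediate is sign-extended, that subi carries the magnitude to subtract, and that ori uses the immediate as an unsigned value.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def combine(high16, mnem, imm):
    """Combine the movhi part with the low part loaded by addi, subi or ori."""
    base = (high16 & 0xFFFF) << 16
    if mnem == "addi":
        value = base + sign_extend16(imm)
    elif mnem == "subi":
        value = base - sign_extend16(imm)
    elif mnem == "ori":
        value = base | (imm & 0xFFFF)
    else:
        raise ValueError("unexpected command: " + mnem)
    return value & 0xFFFFFFFF


# Three different command pairs that produce the same 32-bit value.
print(hex(combine(0x0002, "addi", 0xABCD)))
print(hex(combine(0x0002, "subi", 0x5433)))
print(hex(combine(0x0001, "ori", 0xABCD)))
```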
For example, in the following section of code, registers are loaded with 32-bit numbers, which are then moved into the argument registers before a function call.
After adding the offset calculation, we get the following representation of this block of code.
The resulting 32-bit value is displayed next to the command that loads its low part. This example is quite clear, and we could easily compute all the 32-bit numbers in our heads by simply adding the low and high parts. Judging by the values, they are most likely not offsets.
Consider the case when subtraction is used to load the low part. In this example, you cannot determine the final 32-bit numbers (offsets) at a glance.
After applying the calculation of 32-bit numbers, we get the following form.
Here we see that when the value falls within the address space, an offset to it is created, and the value obtained by combining the low and high parts is no longer displayed alongside. Here we have an offset to the string "10/22/08". To make the remaining offsets point to valid addresses, we enlarge the segment a little.
After enlarging the segment, all the computed 32-bit numbers are offsets and point to valid addresses.
As mentioned above, there is another way of computing offsets, using the logical OR command. Here is an example of code where two offsets are calculated this way.
The one computed in register r8 is then stored on the stack.
After the conversion, you can see that the registers are loaded with the start addresses of procedures; that is, a procedure address is written to the stack.
Reading and writing relative to a base
So far we have considered cases where a 32-bit number loaded by two commands could be either a plain number or an offset. In the following example, a base is loaded into the high part of the register, and then a read or write is performed relative to it.
After processing such situations, we get offsets to variables directly from the read and write commands. The size of the variable is set according to the size of the operation.
Switch constructs found in binary files can facilitate analysis. For example, the number of cases in a switch construct can help you locate the switch responsible for processing some protocol or command set. Hence the task of recognizing the switch itself and its parameters. Consider the following section of code.
The execution flow stops at the register jump jmp r2. Next come blocks of code that are referenced from data, and at the end of each block there is a jump to the same label. Evidently, this is a switch construct, and the individual blocks handle its specific cases. Above, you can also see the check on the number of cases and the default jump.
After adding the switch processing, this code will look like this.
Now the jump itself is annotated, along with the address of the table of offsets, the number of cases, and each case with its number.
The table of offsets to the cases looks like this. To save space, only the first five elements are shown.
In essence, processing a switch means walking backward through the code and searching for all of its components. That is, the module describes a certain scheme of how a switch is organized. Sometimes schemes have exceptions. This can be why seemingly obvious switches are not recognized by existing processor modules: the real switch simply does not fit the scheme defined inside the processor module. There are also cases where the scheme seems to be present, but it contains other commands that are not part of the scheme, or the main commands are reordered, or it is broken up by jumps.
The NIOS II processor module recognizes switches with such "extraneous" instructions between the main commands, as well as with reordered main commands and with jumps that break up the scheme. It makes a backward pass along the execution path, taking into account possible jumps that break up the scheme, and sets internal variables that signal the various states of the recognizer. As a result, about 10 different variants of switch organization found in the firmware are recognized.
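The backward pass with state variables can be illustrated by a heavily simplified sketch. It handles a single hypothetical scheme (a bound check with cmpgeui, a table address built by movhi and addi, a load with ldw, then jmp), uses made-up instruction tuples, does not track which registers carry the values, and does not follow jumps, all of which the real module has to do.

```python
def find_switch(insns, jmp_index, max_back=16):
    """Walk backward from a register jump and collect switch parameters."""
    if insns[jmp_index][0] != "jmp":
        return None
    info = {"table": None, "ncases": None}
    seen_load = False   # state: the table read has been found
    low = None          # state: low half of the table address has been found
    stop = max(jmp_index - max_back, 0) - 1
    for index in range(jmp_index - 1, stop, -1):
        mnem, ops = insns[index]
        if mnem == "ldw" and not seen_load:
            seen_load = True
        elif mnem == "addi" and seen_load and low is None:
            low = ops[2]
        elif mnem == "movhi" and low is not None and info["table"] is None:
            info["table"] = ((ops[1] << 16) + low) & 0xFFFFFFFF
        elif mnem == "cmpgeui" and info["ncases"] is None:
            info["ncases"] = ops[2]
        # Any other instruction is "extraneous" to the scheme and is skipped.
        if info["table"] is not None and info["ncases"] is not None:
            return info
    return None


code = [
    ("cmpgeui", ("r2", "r4", 5)),
    ("bne", ("r2", "r0", "default")),
    ("slli", ("r3", "r4", 2)),
    ("movhi", ("r2", 0x0001)),
    ("mov", ("r16", "r5")),          # extraneous instruction
    ("addi", ("r2", "r2", 0x2340)),
    ("add", ("r2", "r3", "r2")),
    ("ldw", ("r2", 0, "r2")),
    ("jmp", ("r2",)),
]
print(find_switch(code, 8))
```

Each additional variant of switch organization then becomes another set of states and transitions in the same backward walk.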
The NIOS II architecture has an interesting feature: the custom instruction. It gives access to 256 user-defined instructions. In addition to the general-purpose registers, the custom instruction can access a special set of 32 custom registers. After implementing the logic for parsing the custom command, we get the following form.
You can see that the last two instructions have the same instruction number and seem to perform the same actions.
There is a separate manual for custom instructions. According to it, one of the most complete and up-to-date custom instruction sets is the floating-point set, NIOS II Floating Point Hardware 2 Component (FPH2). After implementing parsing of the FPH2 commands, the example looks like this.
The mnemonics of the last two commands confirm that they do perform the same action: the fadds command.
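Parsing the custom command can be sketched as follows. The field layout (an 8-bit instruction number plus three flags that select general-purpose or custom registers) is my reading of the instruction set reference and should be checked against it. The names table is a hypothetical parameter through which a set such as FPH2 could supply mnemonics; no real FPH2 numbers are listed here.

```python
OP_CUSTOM = 0x32  # primary opcode of the custom instruction


def decode_custom(word, names=None):
    """Decode a custom instruction into (mnemonic, dst, src_a, src_b)."""
    if word & 0x3F != OP_CUSTOM:
        return None
    number = (word >> 6) & 0xFF          # one of 256 user-defined instructions

    def reg(index, general):
        # "r" is a general-purpose register, "c" a custom register.
        return ("r" if general else "c") + str(index)

    dst = reg((word >> 17) & 0x1F, (word >> 14) & 1)
    src_a = reg((word >> 27) & 0x1F, (word >> 16) & 1)
    src_b = reg((word >> 22) & 0x1F, (word >> 15) & 1)
    mnem = (names or {}).get(number, "custom %d" % number)
    return mnem, dst, src_a, src_b


# A hand-built word: instruction number 7 with all operands in r registers.
word = (2 << 27) | (3 << 22) | (4 << 17) | (7 << 14) | (7 << 6) | OP_CUSTOM
print(decode_custom(word))
print(decode_custom(word, names={7: "my_op"}))  # hypothetical mnemonic
```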
Jumps by register value
In the firmware under study, a jump by register value often occurs where a 32-bit offset that determines the jump target has been loaded into the register beforehand.
Consider the code section.
The last line contains a jump by register value, and it is clear that the register was previously loaded with the address of the procedure that begins in the first line of the example. So the jump evidently goes to its beginning.
After adding jump recognition, the following form is obtained.
Next to the jmp r8 command, the jump target address is displayed, if it could be computed. A cross-reference is also created between the command and the target address. In this case, the reference is visible in the first line, and the jump itself is made from the last line.
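One way to picture this recognition is a small tracker of known register values over a straight-line run of instructions. This is a sketch under simplifying assumptions and not the module's algorithm: the instruction tuples are hypothetical, control flow is ignored, and any unrecognized instruction conservatively forgets the register named in its first operand.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def resolve_register_jumps(insns):
    """Map the index of each resolvable 'jmp reg' to its target address."""
    regs = {}      # register name -> known 32-bit value
    targets = {}
    for index, (mnem, ops) in enumerate(insns):
        if mnem == "movhi":
            regs[ops[0]] = (ops[1] & 0xFFFF) << 16
        elif mnem in ("addi", "ori"):
            dst, src, imm = ops
            if src in regs:
                if mnem == "addi":
                    regs[dst] = (regs[src] + sign_extend16(imm)) & 0xFFFFFFFF
                else:
                    regs[dst] = regs[src] | (imm & 0xFFFF)
            else:
                regs.pop(dst, None)
        elif mnem == "jmp":
            if ops[0] in regs:
                targets[index] = regs[ops[0]]
        elif ops:
            regs.pop(ops[0], None)   # unknown effect: forget the register
    return targets


code = [
    ("movhi", ("r8", 0x0002)),
    ("addi", ("r8", "r8", 0xABCD)),
    ("jmp", ("r8",)),
]
print({index: hex(target) for index, target in resolve_register_jumps(code).items()})
```

If the target cannot be computed, the jump simply stays unannotated, which matches the "if it could be computed" behavior described above.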
The value of the gp (global pointer) register: saving and loading
It is common to use a global pointer that is set to some address, with variables addressed relative to it. In NIOS II, the gp (global pointer) register stores this pointer. At some point, usually in the firmware initialization procedures, the address is written to the gp register. The processor module handles this situation; to illustrate, the following examples show code together with the IDA Pro output window with the processor module's debugging messages enabled.
In this example, the processor module finds and calculates the value of the gp register in a new database. When the idb database is closed, the value of gp is saved in the database.
If you load an existing idb database in which the value of gp has already been found, it is loaded from the database, as the debug message in the following example shows.
Reading and writing relative to gp
Reads and writes at an offset relative to the gp register are common operations. The following example contains three reads and one write of this type.
Since we have already obtained the address stored in the gp register, we can resolve reads and writes of this kind.
After adding handling of reads and writes relative to the gp register, we get a more convenient picture.
Here you can see which variables are being accessed, track their usage and identify their purpose.
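Once gp is known, resolving such an access is a single addition with a sign-extended 16-bit displacement. In the sketch below the gp value is made up for illustration and the function name is hypothetical.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def gp_relative(gp, imm16):
    """Address of a variable accessed as imm16(gp)."""
    return (gp + sign_extend16(imm16)) & 0xFFFFFFFF


GP = 0x00808000  # hypothetical value found in the initialization code

print(hex(gp_relative(GP, 0x8004)))  # negative displacement
print(hex(gp_relative(GP, 0x0010)))  # positive displacement
```

Because the displacement is signed, one gp value reaches variables up to 32 KB on either side of it.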
Addressing with respect to gp
There is another use of the gp register for addressing variables.
For example, here we see registers being set, relative to the gp register, to point to some variables or data areas.
After adding functionality that recognizes such situations, converts the values into offsets, and adds cross-references, we get the following form.
Here you can see which areas relative to gp the registers point to, and it becomes clearer what is happening.
Addressing with respect to sp
Similarly, in the following example, registers are set to point to some memory areas, this time relative to the stack pointer register sp.
Evidently, the registers are set to point to local variables. Such situations, where arguments are set to local buffers before procedure calls, are quite common.
After adding the processing (converting the immediate values into offsets), we get the following form.
Now it becomes clear that after the procedure is called, the values are loaded from those variables whose addresses were passed as parameters before calling the function.
Cross references from the code to the fields of the structures
Setting up structures and using them in IDA Pro can facilitate code analysis.
Looking at this piece of code, you can see that the field field_8 is incremented and is possibly a counter of some event. If the reads and writes of a field are far apart in the code, cross-references can help in the analysis.
Let us consider the structure itself.
Although the code refers to the structure fields, no cross-references from the code to the structure members were created, as you can see.
After such situations are processed, our case looks like this.
Now there are cross-references to the structure fields from the specific commands that work with them. Both forward and backward cross-references are created, so you can trace through different procedures where the values of the structure fields are read and where they are written.
The inconsistencies between the manual and reality
According to the manual, when some commands are decoded, certain bits must have strictly defined values. For example, for eret, the return-from-exception command, bits 22-26 must be 0x1E.
Here is an example of this command from one firmware.
Opening another firmware at a place with a similar context, we see a different situation.
These bytes are not automatically converted to a command, although all the surrounding commands are processed. Judging by the context, and even by the similar address, it should be the same command. Let's look closely at the bytes. This is the same eret command, except that bits 22-26 are zero rather than 0x1E.
We have to adjust the parsing of this command a little. Now it does not quite match the manual, but it matches reality.
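Such a relaxed check might look like the sketch below. The reference encoding of eret is taken from the instruction set reference as I understand it, and accepting exactly the two values 0x1E and 0 in bits 22-26 is my assumption about how the adjustment could be made.

```python
B_FIELD_SHIFT = 22
B_FIELD_MASK = 0x1F << B_FIELD_SHIFT   # bits 22-26
ERET_WITHOUT_B = 0xE800083A            # eret encoding with bits 22-26 cleared


def is_eret(word):
    """Accept eret with bits 22-26 equal to 0x1E (manual) or 0 (firmware)."""
    b_field = (word & B_FIELD_MASK) >> B_FIELD_SHIFT
    rest = word & ~B_FIELD_MASK & 0xFFFFFFFF
    return rest == ERET_WITHOUT_B and b_field in (0x1E, 0x00)


print(is_eret(0xEF80083A))  # encoding from the manual
print(is_eret(0xE800083A))  # variant seen in the second firmware
```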
IDA 7 support
Since IDA version 7.0, the API that IDA Python provides for regular scripts has changed quite a lot. For processor modules, the changes are enormous. Despite this, the NIOS II processor module was ported to version 7, and it works successfully there.
The only unclear point: when you load a new NIOS II binary file in IDA 7, there is no initial automatic analysis, which is present in IDA 6.9.
In addition to the basic disassembly functionality, examples of which are in the SDK, the processor module has many features that make the code researcher's work easier. Of course, all of this can be done by hand, but when a firmware binary a couple of megabytes in size contains thousands or tens of thousands of offsets of various kinds, why waste the time? Let the processor module do it for us. After all, fast navigation through the code under study using cross-references is such a pleasure! It is what makes IDA the convenient and enjoyable tool we know.
Author : Anton Dorfman, Positive Technologies