Disassembling with IDA Pro
Posted: 13 Dec 2019, 07:21
				
I am starting this topic to provide and collect information about how to use IDA Pro for reverse engineering binary images of foreign systems.
First of all, IDA Pro is arguably the best tool you can get for this job, but it can only aid you in the process. Overall, it's a very time-consuming task to work your way into someone else's code. Also keep in mind that IDA Pro in general only translates the binary data into the CPU's assembly language, written as so-called "mnemonics", which are just a readable representation of the instructions directly executable by the CPU of the target system.
What you need to begin with is, of course, the binary image of whatever system you're investigating. I won't cover how to obtain it here.
Next you need to gather some information about the target. Most of all you need to know the type of CPU executing the image. This can mostly be done by searching the web for the part marking you find on the hardware. Car modules often use microcontrollers, which are nothing more than a set of components in one package. A so-called MCU/MPU always contains some type of CPU, some peripherals for I/O and a small base RAM for CPU execution tasks (stack and so on). Some have additional flash included directly, some have it externally, and there may be a co-processor, watchdogs, and so on. All this information can be found in the datasheet of the part, which is the first thing you should get.
Once you know the type of CPU (e.g. "ARM9") and the dialect (e.g. "ARMv5"), you should also look up the endianness (big or little) and whether it is an 8, 16, 32 or 64 bit architecture. Then you can try to load your image. IDA Pro can try to determine this information itself, but it is much more promising to set the right values from the start.
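To illustrate why the endianness setting matters, here is a minimal Python sketch that decodes the same four bytes in both byte orders. The helper name `decode_word` and the sample bytes are made up for illustration and have nothing to do with IDA itself.

```python
import struct

def decode_word(raw: bytes, little_endian: bool) -> int:
    """Interpret 4 raw bytes as one unsigned 32-bit word."""
    fmt = "<I" if little_endian else ">I"
    return struct.unpack(fmt, raw)[0]

# Hypothetical sample: the same 4 bytes as they might appear in an image.
sample = bytes([0x00, 0x00, 0x01, 0x20])

print(hex(decode_word(sample, little_endian=True)))   # 0x20010000
print(hex(decode_word(sample, little_endian=False)))  # 0x120
```

With the wrong byte order, every address and constant in the image is read as a completely different value, so the disassembly will not make sense.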
The first challenge of disassembling a binary image is to distinguish data from code. It is in the nature of an image that those things are mixed all over. The compiler/linker which produced the image decides what to put where in memory. The compiler usually puts symbols into the image to aid debugging, but binary images, especially on car modules which have limited memory capacity, are most often "stripped", which means that all kinds of "symbols" have been removed from the image. So you will not have any self-explanatory locations, variables, routine names, nothing. It's just a big mess of executable code, and you need to understand and name the parts you decode one after the other.
Now, after loading, IDA starts to find code in the image by itself. It starts from the beginning and has some techniques to tell which parts could be data and which could be code. This process is very good but not error-free, so you may end up with an image shown only as data.
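One simple way to spot regions that are very likely data is to look for runs of printable ASCII text. The following is only a rough sketch of that idea in plain Python, not what IDA does internally. The function name `find_ascii_runs` and the minimum length of 6 are my own assumptions.

```python
import re

def find_ascii_runs(image: bytes, min_len: int = 6):
    """Return (offset, text) for each run of printable ASCII
    that is at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii"))
            for m in pattern.finditer(image)]

# Hypothetical image fragment: some binary bytes around a text string.
fragment = b"\x00\x01\xffHello ECU\x00\x02abc\x00"
print(find_ascii_runs(fragment))  # [(3, 'Hello ECU')]
```

Offsets found this way are good candidates to mark as data by hand if the automatic analysis turned them into nonsense code.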
So the second challenge is to find the starting point of execution (the "entry point"). Here the datasheet can be a great help! It will describe how the CPU expects the image to be laid out. Most CPUs have a reset vector table in the first few bytes. Vector 0 is usually the main reset and should lead to the entry point of the code. A vector is nothing more than a pointer to (or a jump to) a memory location holding executable code. An interrupt causes the CPU to set the program counter (PC) to the memory location given by a vector and to execute from there.
If you don't have any information on how the CPU does this, assume that it starts at 0x0000, which is usually the best bet.
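As a sketch of how to peek at a vector table outside of IDA, the following Python snippet reads the first few 32-bit words of an image as addresses. It assumes the table holds plain addresses and starts at offset 0. On some CPUs (classic ARM cores, for example) the table entries are instructions instead, so check the datasheet first. The function name and the sample values are hypothetical.

```python
import struct

def read_vector_table(image: bytes, count: int,
                      little_endian: bool = True, base: int = 0):
    """Read `count` 32-bit vectors starting at offset `base`."""
    fmt = ("<" if little_endian else ">") + "I" * count
    return list(struct.unpack_from(fmt, image, base))

# Hypothetical big-endian image with three vectors followed by padding.
image = struct.pack(">III", 0x100, 0x180, 0x1C0) + b"\x00" * 16

for index, address in enumerate(read_vector_table(image, 3, little_endian=False)):
    print(f"vector {index}: 0x{address:08X}")
```

If the values point to sensible locations inside the image, they are good places to start disassembling from.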
Now the code should be in front of you and you will see bunches of machine instructions. Like in every program, the first part will be some base initialization of the whole system, e.g. presetting the peripheral I/O ports, initializing memory, whatever. Other components of an MCU are often accessed via "memory-mapped I/O". Each peripheral has its registers mapped to a unique memory location in the addressable space of the CPU, normally at the upper end of it. Here the datasheet is a great help as well. You should look for instructions loading those addresses and rename the memory locations to the function labels from the datasheet, and the code will make much more sense. You are then also able to picture the side effects of instructions on peripherals or I/O ports and may be able to find routines handling specific I/O pins of the MCU, which may be connected to interesting parts of the module.
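To get a first list of places that touch peripheral registers, you can scan the image for constants that match register addresses from the datasheet. This is only a rough sketch: the register names and addresses below are invented, and it assumes the addresses are stored as aligned 32-bit words in the image. Many CPUs build addresses from several instructions instead, which this simple scan will not catch.

```python
import struct

# Hypothetical register map; real addresses and names come from the datasheet.
REGISTERS = {
    0xFFFF0000: "PORTA_DATA",
    0xFFFF0004: "PORTA_DIR",
}

def find_register_refs(image: bytes, registers: dict, little_endian: bool = True):
    """Return (offset, name) for every aligned 32-bit word in the image
    whose value equals a known register address."""
    fmt = "<I" if little_endian else ">I"
    hits = []
    for offset in range(0, len(image) - 3, 4):
        (value,) = struct.unpack_from(fmt, image, offset)
        if value in registers:
            hits.append((offset, registers[value]))
    return hits

# Hypothetical image fragment containing one register address.
fragment = b"\x00" * 4 + struct.pack("<I", 0xFFFF0004) + b"\x00" * 4
print(find_register_refs(fragment, REGISTERS))  # [(4, 'PORTA_DIR')]
```

Each hit is a location worth visiting in IDA, where you can then rename the referenced address to the label from the datasheet.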

