Introduction to asm
What is assembly language?⌗
A processor understands only machine-language instructions, which are strings of 1s and 0s. Machine language is too obscure and complex to read or write by hand, so a low-level assembly language is designed for each family of processors. Assembly language is basically a thin, human-readable wrapper on top of machine code.
Each family of processors has its own set of instructions for handling various operations. This set of instructions is called the machine-language instruction set.
x86 Registers!⌗
x86 processors have eight 32-bit general-purpose registers: EAX, EBX, ECX, EDX, ESI, EDI, ESP, and EBP.
EAX is called the accumulator because many arithmetic operations use it. In division and multiplication, one of the operands must be in EAX (32-bit), AX (16-bit), or AL (8-bit), depending on the operand size.
AX - primary accumulator => AH, AL
EBX is called the base register. It typically holds the base address of a block of data stored contiguously (e.g. an array), and the required element is located relative to that address.
BX - base register => BH, BL
ECX is known as the counter because it is used to hold a loop index. For shift and rotate instructions, the count is held in CL.
CX - counter register => CH, CL
EDX is the data register. It is used in I/O operations and is paired with EAX in multiplication and division of large values, where it holds the high half of the product or dividend.
DX - data register => DH, DL
ESI - used for string operations as source index
SI - source index
EDI - used for string operations as destination index
DI - destination index
The registers EAX, EBX, ECX, and EDX can also be accessed in smaller parts. Taking EAX as the example: its low 16 bits form the register AX, which is in turn split into two 8-bit registers, AH (the high byte) and AL (the low byte).
32-bit register [EAX, EBX, ECX, EDX, ESI, EDI, ESP, or EBP]
16-bit register [AX, BX, CX, DX, SI, DI, SP, or BP]
8-bit register [AH, BH, CH, DH, AL, BL, CL, or DL]
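To make the sub-register layout concrete, here is a minimal sketch. It assumes NASM syntax and a 32-bit Linux target (for example `nasm -f elf32` followed by `ld -m elf_i386`); the constants are arbitrary values chosen for illustration.

```nasm
; Sketch: how AL, AH and AX overlap with EAX (assumes 32-bit Linux, NASM).
section .text
global _start

_start:
    mov eax, 0x12345678   ; EAX = 0x12345678, so AX = 0x5678, AH = 0x56, AL = 0x78
    mov al, 0xFF          ; only bits 0-7 change:   EAX = 0x123456FF
    mov ah, 0x00          ; only bits 8-15 change:  EAX = 0x123400FF
    mov ax, 0xBEEF        ; only bits 0-15 change:  EAX = 0x1234BEEF

    ; exit(0) through the 32-bit Linux int 0x80 interface
    mov eax, 1            ; syscall number 1 = exit
    xor ebx, ebx          ; exit status 0
    int 0x80
```

Writing to AL, AH, or AX leaves the remaining bits of EAX untouched, which is why the upper half stays 0x1234 throughout.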
An assembly program can be split into three sections:
- The data section
- The bss section
- The text section
The data section⌗
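The data section is used for declaring initialized data such as constant values, file names, or buffer sizes. Its size and initial contents are fixed when the program is assembled. (Strictly speaking, values in .data can be overwritten at runtime; true read-only constants usually go in a separate .rodata section.)
section .data ;(syntax for declaration)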
The bss section⌗
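The bss section is used for declaring variables that are not given an initial value. It only reserves space, which is zero-filled when the program starts.
section .bss ;(syntax for declaration)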
The text section⌗
The text section contains the actual code. It begins with the declaration global _start, which exports the _start symbol so that the linker knows where execution of the program begins.
section .text
global _start ;(syntax for declaration)
_start:
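Putting the three sections together, a small complete program might look like the sketch below. It assumes NASM syntax and 32-bit Linux system calls through `int 0x80` (write is call 4, exit is call 1); the names `msg`, `msg_len`, and `buf` are made up for this example.

```nasm
; Sketch: one program using .data, .bss and .text (assumes 32-bit Linux, NASM).
section .data
    msg     db  "hello", 10     ; initialized bytes: the text plus a newline
    msg_len equ $ - msg         ; length of msg, computed at assembly time

section .bss
    buf     resb 64             ; 64 bytes reserved, no initial value given

section .text
global _start

_start:
    mov al, [msg]               ; read the first byte of the initialized data
    mov [buf], al               ; store it in the reserved buffer

    ; write(1, msg, msg_len)
    mov eax, 4                  ; syscall number 4 = write
    mov ebx, 1                  ; file descriptor 1 = stdout
    mov ecx, msg                ; address of the bytes to write
    mov edx, msg_len            ; number of bytes
    int 0x80

    ; exit(0)
    mov eax, 1                  ; syscall number 1 = exit
    xor ebx, ebx                ; exit status 0
    int 0x80
```

Note the space between `section` and the section name; NASM treats the name as an argument to the `section` directive.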
STACK⌗
The stack is an abstract data structure that holds information in Last In, First Out (LIFO) order. You put arbitrary objects onto the stack and then take them off again, much like an in/out tray: you always add to the top, and the top item is always the one that is taken off.
It generally has a fixed size per program and is frequently used to store function parameters. You push the parameters onto the stack when you call a function, and the function either addresses the stack directly or pops the values off the stack.
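The LIFO behaviour can be seen with a few push and pop instructions. This is a minimal sketch assuming NASM syntax and 32-bit Linux, where each push moves esp down by 4 bytes and each pop moves it back up.

```nasm
; Sketch: values come off the stack in the reverse order they went on.
section .text
global _start

_start:
    push 1          ; stack, top first: 1
    push 2          ; stack, top first: 2, 1
    push 3          ; stack, top first: 3, 2, 1

    pop eax         ; eax = 3 (the last value pushed)
    pop ebx         ; ebx = 2
    pop ecx         ; ecx = 1 (the first value pushed)

    ; exit(eax), so the exit status is 3
    mov ebx, eax    ; exit status = value popped first
    mov eax, 1      ; syscall number 1 = exit
    int 0x80
```

Running the program and then `echo $?` in the shell should show 3, the value that was pushed last and popped first.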
Function prologue and epilogue⌗
The prologue is what happens at the beginning of a function. Its responsibility is to set up the stack frame of the called function. The epilogue is the exact opposite: it is what happens last in a function, and its purpose is to restore the stack frame of the calling (parent) function.
In x86, the ebp register is used by convention to keep track of the function's stack frame. The esp register is used by the processor to point to the most recent addition (the top value) on the stack.
The call instruction does two things: First it pushes the return address onto the stack, then it jumps to the function being called. Immediately after the call, esp points to the return address on the stack.
Then the prologue is executed:
push ebp ; Save the stack-frame base pointer (of the calling function).
mov ebp, esp ; Set the stack-frame base pointer to be the current
; location on the stack.
sub esp, N ; Grow the stack by N bytes to reserve space for local variables
At this point, we have:
ebp + 4: Return address
ebp + 0: Calling function's old ebp value
ebp - 4: (local variables)
The epilogue:
mov esp, ebp ; Put the stack pointer back where it was when this function was called
pop ebp ; Restore the calling function's stack frame.
ret ; Return to the calling function.
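The prologue and epilogue can be seen working together in a complete function. The sketch below assumes NASM syntax, 32-bit Linux, and a cdecl-style convention in which the caller pushes the arguments right to left and removes them afterwards; `add_two` is a hypothetical helper written for this example.

```nasm
; Sketch: a function with a prologue, one local variable and an epilogue.
section .text
global _start

; add_two(a, b): returns a + b in eax (hypothetical example function)
add_two:
    push ebp                ; prologue: save the caller's frame pointer
    mov  ebp, esp           ; prologue: start this function's frame
    sub  esp, 4             ; prologue: reserve one 4-byte local at [ebp - 4]

    mov  eax, [ebp + 8]     ; first argument (just above the return address)
    add  eax, [ebp + 12]    ; add the second argument
    mov  [ebp - 4], eax     ; keep the sum in the local variable
    mov  eax, [ebp - 4]     ; load it back as the return value

    mov  esp, ebp           ; epilogue: discard the locals
    pop  ebp                ; epilogue: restore the caller's frame pointer
    ret                     ; pop the return address and jump to it

_start:
    push 5                  ; second argument, pushed first
    push 7                  ; first argument, pushed last
    call add_two            ; pushes the return address, then jumps
    add  esp, 8             ; caller removes the two arguments

    ; exit(eax), so the exit status is 7 + 5 = 12
    mov  ebx, eax
    mov  eax, 1             ; syscall number 1 = exit
    int  0x80
```

Inside `add_two` the arguments start at ebp + 8 because ebp + 0 holds the saved ebp and ebp + 4 holds the return address.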
Instructions⌗
There are some basic instructions in assembly language, such as add, sub, mul, and div.
ADD⌗
The add instruction adds two numbers together. The destination, which receives the sum, can be a register or a memory location.
Eg: mov rax, 0x1 ;; rax = 0x1
mov rbx, 0x2 ;; rbx = 0x2
add rbx, rax ;; rbx = rbx + rax = 0x3
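Since add can also target memory, here is a minimal sketch of both forms. It assumes NASM syntax and 64-bit Linux (for example `nasm -f elf64` followed by `ld`), where exit is system call 60; the variable `counter` is a made-up name for this example.

```nasm
; Sketch: add with a memory destination and with a memory source.
section .data
    counter dq 5                    ; 64-bit variable, initial value 5

section .text
global _start

_start:
    add qword [rel counter], 3      ; memory destination: counter = 5 + 3 = 8
    mov rax, 10                     ; rax = 10
    add rax, [rel counter]          ; memory source: rax = 10 + 8 = 18

    ; exit(rax), so the exit status is 18
    mov rdi, rax                    ; first syscall argument = exit status
    mov rax, 60                     ; syscall number 60 = exit on x86-64 Linux
    syscall
```

The `qword` size keyword is needed in the first add because neither operand is a register, so the assembler cannot infer the operand size on its own.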
SUB⌗
The sub instruction is the opposite of add: as you might have guessed, it is used to subtract.
Eg: mov rax, 0x1 ;; rax = 0x1
mov rbx, 0x2 ;; rbx = 0x2
sub rbx, rax ;; rbx -= rax => rbx = 0x2 - 0x1 = 0x1
MUL⌗
The mul instruction performs unsigned multiplication. One operand is implicitly rax, and the result is stored implicitly in rdx:rax (the low 64 bits in rax, the high 64 bits in rdx).
Eg: mov rax, 0x2
mov rbx, 0x2
mul rbx ;; rax = rax * rbx = 0x2 * 0x2 = 0x4
We can also multiply rax with itself:
mov rax, 0x2
mul rax ;; equals rax * rax = 0x4
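On x86-64 the full product of a 64-bit mul is 128 bits wide, with the high half written to rdx and the low half to rax. The sketch below shows a product that does not fit in rax alone; it assumes NASM syntax and 64-bit Linux, and the operands are chosen purely for illustration.

```nasm
; Sketch: a mul whose result spills over into rdx.
section .text
global _start

_start:
    mov rax, 0xFFFFFFFFFFFFFFFF   ; largest unsigned 64-bit value
    mov rbx, 2
    mul rbx                       ; rdx:rax = 0x1_FFFFFFFFFFFFFFFE
                                  ; rdx = 0x1, rax = 0xFFFFFFFFFFFFFFFE

    ; exit(rdx), so the exit status is 1 (the high half of the product)
    mov rdi, rdx
    mov rax, 60                   ; syscall number 60 = exit on x86-64 Linux
    syscall
```

Because mul always overwrites rdx, any value kept there must be saved before multiplying, even when the product is small.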
DIV⌗
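The div instruction performs unsigned division. With a 32-bit operand it divides the 64-bit value in EDX:EAX by that operand, leaving the quotient in EAX and the remainder in EDX.
mov edx, 0 ; clear the upper half of the dividend (EDX:EAX)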
mov eax, 0x8003 ; dividend
mov ecx, 0x100 ; divisor
div ecx ; EAX = 0x80, EDX = 0x3