What is assembly language?

A processor understands only machine language instructions, which are strings of 1s and 0s. Machine language is too obscure and complex to read and write directly, so a low-level assembly language is designed for each specific family of processors. Assembly language is essentially a thin, human-readable wrapper on top of machine code.

Each family of processors has its own set of instructions for handling various operations. These instructions are called machine language instructions.

x86 Registers!

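x86 processors have eight 32-bit general-purpose registers.

EAX is called the accumulator because it is used in a number of arithmetic operations. In multiplication and division, one of the operands is implicitly taken from the accumulator: the 32-bit forms use EAX, the 16-bit forms use AX, and the 8-bit forms use AL (multiplication) or AX (division).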

AX - primary accumulator  => AH, AL

EBX is called the base register. It typically holds the base address of a block of data stored contiguously in memory (for example, an array), and the required element is then located relative to that base address.

BX - base register  => BH, BL

ECX is known as the counter because it is commonly used to hold a loop count. In shift and rotate instructions, the count is taken from CL.

CX - counter register => CH, CL

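EDX is the data register. It is used in I/O operations (DX holds the port number), and in multiplication and division of large values it works together with EAX: it holds the upper half of the product or dividend, and the remainder after a division.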

DX - data register  => DH, DL

ESI - used for string operations as source index

SI - source index 

EDI - used for string operations as destination index

DI - destination index

The registers EAX, EBX, ECX and EDX can also be accessed in smaller parts. For example, the low 16 bits of EAX form the register AX, which is in turn split into two 8-bit registers: AH (the high byte) and AL (the low byte).

32-bit register [EAX, EBX, ECX, EDX, ESI, EDI, ESP, or EBP]
16-bit register [AX, BX, CX, DX, SI, DI, SP, or BP]
8-bit register [AH, BH, CH, DH, AL, BL, CL, or DL] 
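
The relationship between EAX, AX, AH and AL can be modeled with plain bit masks. The following is only an illustrative C sketch, not real register access: the function names are hypothetical, and a 32-bit integer stands in for the register.

```c
#include <stdint.h>

/* AX is the low 16 bits of EAX. */
uint16_t get_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }

/* AH is bits 8..15 of EAX (the high byte of AX). */
uint8_t get_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }

/* AL is bits 0..7 of EAX (the low byte of AX). */
uint8_t get_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }

/* Writing AL replaces only the low byte; the other 24 bits are kept. */
uint32_t set_al(uint32_t eax, uint8_t value) {
    return (eax & ~(uint32_t)0xFFu) | value;
}

/* Writing AH replaces only bits 8..15; the other bits are kept. */
uint32_t set_ah(uint32_t eax, uint8_t value) {
    return (eax & ~(uint32_t)0xFF00u) | ((uint32_t)value << 8);
}
```

With EAX = 0x12345678, this model gives AX = 0x5678, AH = 0x56 and AL = 0x78, and writing 0xFF to AL yields EAX = 0x123456FF.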

An assembly program can be split into three sections

  • The data section
  • The bss section
  • The text section

The data section

The data section is used for declaring initialized data and constant values, such as file names or buffer sizes. The initial values are fixed when the program is assembled.

section .data ;(syntax for declaration)

The bss section

The bss section is used for declaring uninitialized variables, that is, space that is reserved but given no initial value in the program file.

section .bss ;(syntax for declaration)

The text section

The text section contains the actual code. It begins with the declaration global _start, which exports the _start symbol so that the linker can make it the program's entry point, where execution begins.

section .text ;(syntax for declaration)
   global _start
_start:
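
For comparison, a C program ends up in the same three sections. The sketch below assumes a typical ELF toolchain (placement is decided by the compiler and linker, not by the C language), and the variable and function names are hypothetical.

```c
#include <stddef.h>

/* Initialized data: a typical toolchain places this in .data. */
int retry_limit = 3;

/* Uninitialized data: typically placed in .bss and zero-filled at load. */
char input_buffer[64];

/* Code: function bodies are placed in .text. */
size_t count_zero_bytes(void) {
    size_t zeros = 0;
    for (size_t i = 0; i < sizeof input_buffer; i++) {
        if (input_buffer[i] == 0) {
            zeros++;
        }
    }
    return zeros;
}
```

Before anything writes to input_buffer, count_zero_bytes() returns 64, because objects with static storage that have no initializer start out as zero.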

STACK

A stack is an abstract data structure that holds items in Last In, First Out (LIFO) order. You put arbitrary objects onto the stack and then take them off again, much like an in/out tray: you always add to the top, and the top item is always the one that is taken off.

It generally has a fixed size for each program and is frequently used to store function parameters. You push the parameters onto the stack when you call a function, and the function either addresses the stack directly or pops the values off the stack.
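
The LIFO behaviour can be sketched in C with a fixed-size array. This is a minimal model of the data structure only, not of the hardware stack; the capacity and all names are hypothetical.

```c
#include <stdint.h>

#define STACK_WORDS 16   /* hypothetical fixed capacity */

typedef struct {
    uint32_t slots[STACK_WORDS];
    int top;             /* number of items currently on the stack */
} Stack;

void stack_init(Stack *s) { s->top = 0; }

/* Put a value on top of the stack. Returns 0 on success, -1 if full. */
int stack_push(Stack *s, uint32_t value) {
    if (s->top == STACK_WORDS) {
        return -1;
    }
    s->slots[s->top++] = value;
    return 0;
}

/* Take the top value off the stack. Returns 0 on success, -1 if empty. */
int stack_pop(Stack *s, uint32_t *out) {
    if (s->top == 0) {
        return -1;
    }
    *out = s->slots[--s->top];
    return 0;
}
```

Pushing 10, 20 and 30 and then popping three times returns 30, 20 and 10, in that order.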

Function prologue and epilogue

The prologue is what happens at the beginning of a function. Its responsibility is to set up the stack frame of the called function. The epilogue is the exact opposite: it is what happens last in a function, and its purpose is to restore the stack frame of the calling (parent) function.

In x86, the ebp register is used by convention (for example, by compiled code) to keep track of the current function's stack frame. The esp register is used by the processor to point to the most recent addition (the top value) on the stack.

The call instruction does two things: First it pushes the return address onto the stack, then it jumps to the function being called. Immediately after the call, esp points to the return address on the stack.

Then the prologue is executed:

push ebp        ; Save the stack-frame base pointer (of the calling function).
mov ebp, esp    ; Set the stack-frame base pointer to the current location on the stack.
sub esp, N      ; Grow the stack by N bytes to reserve space for local variables.

At this point, we have:


ebp + 4:    Return address
ebp + 0:    Calling function's old ebp value
ebp - 4:    (local variables)

The epilogue:

mov esp, ebp    ; Put the stack pointer back where it was right after the prologue's push ebp.
pop ebp         ; Restore the calling function's stack frame.
ret             ; Return to the calling function.
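
To check the bookkeeping, the call, prologue and epilogue sequence can be simulated in C. This is a rough sketch under stated assumptions: the Cpu struct, the memory size and all function names are hypothetical, the stack grows downward in 4-byte steps, the jumps themselves are not modeled, and there are no bounds checks.

```c
#include <stdint.h>
#include <string.h>

#define MEM_WORDS 64   /* hypothetical size of the simulated stack memory */

typedef struct {
    uint32_t mem[MEM_WORDS];  /* stack memory, one 32-bit word per slot */
    uint32_t esp;             /* byte offset of the top of the stack */
    uint32_t ebp;             /* byte offset of the current frame base */
} Cpu;

void cpu_init(Cpu *c) {
    memset(c->mem, 0, sizeof c->mem);
    c->esp = MEM_WORDS * 4;   /* empty stack: esp is just past the last slot */
    c->ebp = MEM_WORDS * 4;
}

/* push: move esp down by 4 bytes, then store the value there. */
void push32(Cpu *c, uint32_t v) { c->esp -= 4; c->mem[c->esp / 4] = v; }

/* pop: load the value at esp, then move esp up by 4 bytes. */
uint32_t pop32(Cpu *c) { uint32_t v = c->mem[c->esp / 4]; c->esp += 4; return v; }

/* Read the 32-bit word at a byte offset such as ebp + 4. */
uint32_t load32(const Cpu *c, uint32_t addr) { return c->mem[addr / 4]; }

/* call: push the return address. */
void do_call(Cpu *c, uint32_t return_address) { push32(c, return_address); }

/* push ebp / mov ebp, esp / sub esp, N */
void prologue(Cpu *c, uint32_t local_bytes) {
    push32(c, c->ebp);
    c->ebp = c->esp;
    c->esp -= local_bytes;
}

/* mov esp, ebp / pop ebp / ret. Returns the address ret would jump to. */
uint32_t epilogue(Cpu *c) {
    c->esp = c->ebp;
    c->ebp = pop32(c);
    return pop32(c);
}
```

After do_call followed by prologue, load32(&c, c.ebp + 4) is the return address and load32(&c, c.ebp) is the caller's old ebp. After epilogue, esp and ebp are back to the values they had before the call.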

Instructions

There are some basic instructions in assembly language, such as add, sub, mul and div.

ADD

The add instruction adds two numbers together and stores the result in the first operand. You can use it to add a value directly to memory or to a register.

Eg: mov rax, 0x1 ;; rax equals to 0x1

mov rbx, 0x2 ;; rbx equals to 0x2

add rbx, rax ;; rbx += rax => rbx = 0x2 + 0x1 = 0x3

SUB

The sub instruction is the opposite of add: as you might have guessed, it is used to subtract.

Eg: mov rax, 0x1 ;; rax equals to 0x1

mov rbx, 0x2 ;; rbx equals to 0x2

sub rbx, rax ;; rbx -= rax => rbx = 0x2 - 0x1 = 0x1

MUL

The mul instruction performs unsigned multiplication. One operand is implicitly rax, and the result is implicitly stored in rax, with the upper half of the product in rdx.

Eg: mov rax, 0x2

mov rbx, 0x2

mul rbx  ;; rax = rax * rbx = 0x2 * 0x2 = 0x4

We can also multiply rax by itself:

mov rax, 0x2

mul rax ;; rax = rax * rax = 0x4
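
The reason mul writes to two registers is that a product can need twice as many bits as its operands. The C sketch below models the 32-bit form (mul with a 32-bit operand, product in EDX:EAX) so it can use a standard 64-bit integer; the 64-bit form works the same way with rdx:rax. The type and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t eax;  /* low 32 bits of the product */
    uint32_t edx;  /* high 32 bits of the product */
} MulResult;

/* Model of `mul src` with a 32-bit operand: EDX:EAX = EAX * src (unsigned). */
MulResult mul32(uint32_t eax, uint32_t src) {
    uint64_t product = (uint64_t)eax * (uint64_t)src;
    MulResult r;
    r.eax = (uint32_t)(product & 0xFFFFFFFFu);
    r.edx = (uint32_t)(product >> 32);
    return r;
}
```

For small values such as 2 * 2 the high half is zero, but 0xFFFFFFFF * 2 gives EAX = 0xFFFFFFFE and EDX = 0x1.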

DIV

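The div instruction performs unsigned division. With a 32-bit operand, it divides the 64-bit value in EDX:EAX by the operand, leaving the quotient in EAX and the remainder in EDX.

mov edx, 0 ; clear the upper half of the dividend (EDX:EAX)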

mov eax, 0x8003 ; dividend

mov ecx, 0x100 ; divisor

div ecx ; EAX = 0x80, EDX = 0x3
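
The same arithmetic can be checked with a small C model of the 32-bit div. This is a sketch, not processor behaviour in every detail: the names are hypothetical, and the cases where a real div raises a divide error (a zero divisor, or a quotient that does not fit in EAX) are reported here as a return value of -1.

```c
#include <stdint.h>

typedef struct {
    uint32_t eax;  /* quotient */
    uint32_t edx;  /* remainder */
} DivResult;

/* Model of `div divisor` with a 32-bit operand: divides EDX:EAX by divisor.
   Returns 0 on success, -1 where the real instruction would fault. */
int div32(uint32_t edx, uint32_t eax, uint32_t divisor, DivResult *out) {
    if (divisor == 0) {
        return -1;
    }
    uint64_t dividend = ((uint64_t)edx << 32) | eax;
    uint64_t quotient = dividend / divisor;
    if (quotient > 0xFFFFFFFFu) {
        return -1;  /* quotient does not fit in 32 bits */
    }
    out->eax = (uint32_t)quotient;
    out->edx = (uint32_t)(dividend % divisor);
    return 0;
}
```

Calling div32(0, 0x8003, 0x100, &r) reproduces the example above: r.eax is 0x80 and r.edx is 0x3. It also shows why EDX is cleared first: any leftover value in EDX becomes the upper half of the dividend.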