
How to Learn Systems Programming from Scratch

Published: at 02:05 AM

1. Basic Assembly Programming

2. Simple Assembler

3. Two-Pass Assembler

4. Macros in Assembly

5. Lexical Analysis

6. Compiler Basics

7. Cross-Compilers

Learning Path Summary:

  1. Simple assembly programs
  2. One-pass assembler
  3. Two-pass assembler
  4. Macro processor + macro assembler
  5. Lexical analyzer
  6. Basic interpreter/compiler
  7. Cross-compiler exploration

With this approach, you’ll build your understanding gradually while directly applying what you learn through hands-on projects.

#1 Basic assembly programming

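Assembly language is a low-level programming language that gives you direct control over a computer’s hardware. It is specific to a processor architecture (such as x86 or ARM), and each instruction you write normally translates to exactly one machine instruction. Assembler directives (such as section or dw) and macros are the exception: they guide the assembler rather than encode a single instruction.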
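To make that one-to-one mapping concrete, here is a small C sketch of how an assembler could encode a single instruction form, MOV r32, imm32. Intel’s instruction reference encodes it as opcode B8+rd (the register number is added to the opcode byte) followed by a 4-byte little-endian immediate. The enum and function names are hypothetical, and only this one form is handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Register numbers as used in x86 instruction encodings (32-bit names). */
enum reg32 { EAX = 0, ECX = 1, EDX = 2, EBX = 3, ESP = 4, EBP = 5, ESI = 6, EDI = 7 };

/* Encode "MOV r32, imm32" into buf: opcode byte 0xB8 + register number,
 * then the immediate, least significant byte first.
 * Returns the number of bytes written (always 5 for this form). */
size_t encode_mov_r32_imm32(uint8_t *buf, enum reg32 reg, uint32_t imm)
{
    buf[0] = (uint8_t)(0xB8 + reg);             /* register lives in the opcode */
    for (int i = 0; i < 4; i++)
        buf[1 + i] = (uint8_t)(imm >> (8 * i)); /* little-endian immediate */
    return 5;
}
```

For example, MOV EAX, 60 encodes to the five bytes B8 3C 00 00 00.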

1. Registers

Registers are small storage locations within the CPU, used to hold data temporarily for quick access. Common general-purpose registers in x86 assembly include AX (accumulator), BX (base), CX (counter, used by LOOP), and DX (data).

Each register can be accessed at different widths: 16-bit (AX), 32-bit (EAX), and, on x86-64, 64-bit (RAX).
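AX, EAX, and RAX are not separate registers but different-width views of the same storage: AX is the low 16 bits of EAX, which is the low 32 bits of RAX (AL and AH are the low and high bytes of AX). This C sketch models those overlapping views with casts and shifts; the function names are made up for illustration.

```c
#include <stdint.h>

/* Read-only views of one 64-bit register value, mirroring how
 * RAX, EAX, AX, AL and AH overlap on x86-64. */
uint32_t view_eax(uint64_t rax) { return (uint32_t)rax; }       /* bits 0-31 */
uint16_t view_ax(uint64_t rax)  { return (uint16_t)rax; }       /* bits 0-15 */
uint8_t  view_al(uint64_t rax)  { return (uint8_t)rax; }        /* bits 0-7  */
uint8_t  view_ah(uint64_t rax)  { return (uint8_t)(rax >> 8); } /* bits 8-15 */
```

With RAX = 0x1122334455667788, EAX reads 0x55667788, AX reads 0x7788, AL reads 0x88, and AH reads 0x77.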

2. Memory Access

Assembly language gives you explicit control over memory, using registers to interact with memory locations. You can move data between registers and memory, and access memory addresses directly:

Example:

MOV AX, 10   ; Move 10 into the AX register
MOV [var], AX ; Move the value of AX into memory location 'var'
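One detail the example hides: x86 is little-endian, so MOV [var], AX writes the low byte of AX at var and the high byte at var + 1. The C sketch below models memory as a plain byte array (a toy with no bounds checking); store16 and load16 are hypothetical names standing in for the two MOV directions.

```c
#include <stdint.h>

/* A toy byte-addressed memory (no bounds checks: keep addr + 1 < 64). */
static uint8_t memory[64];

/* Model of MOV [addr], AX: low byte first, then high byte. */
void store16(uint16_t addr, uint16_t ax)
{
    memory[addr]     = (uint8_t)(ax & 0xFF);
    memory[addr + 1] = (uint8_t)(ax >> 8);
}

/* Model of MOV AX, [addr]: reassemble the two bytes. */
uint16_t load16(uint16_t addr)
{
    return (uint16_t)(memory[addr] | (memory[addr + 1] << 8));
}
```

Storing 0x1234 at address 10 leaves 0x34 at address 10 and 0x12 at address 11.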

3. Basic Arithmetic Instructions

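Common arithmetic instructions include ADD (addition), SUB (subtraction), INC and DEC (add or subtract 1), and MUL and DIV (unsigned multiplication and division).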

Example:

MOV AX, 5   ; Load 5 into AX
MOV BX, 3   ; Load 3 into BX
ADD AX, BX  ; AX = AX + BX (AX now contains 8)
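Besides producing a result, ADD updates status flags that later conditional jumps test. This C sketch models a 16-bit ADD and two of those flags: the carry flag (CF, unsigned overflow out of bit 15) and the zero flag (ZF). The real instruction also sets OF, SF, AF, and PF, which are left out here; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* A subset of the x86 flags that ADD updates. */
struct flags { bool cf; bool zf; };

/* Model of ADD AX, BX on 16-bit registers: the sum wraps modulo 2^16,
 * CF records a carry out of bit 15, ZF is set when the result is 0. */
uint16_t add16(uint16_t ax, uint16_t bx, struct flags *f)
{
    uint32_t wide = (uint32_t)ax + bx;   /* compute in 32 bits to see the carry */
    uint16_t result = (uint16_t)wide;
    f->cf = wide > 0xFFFF;
    f->zf = result == 0;
    return result;
}
```

So 5 + 3 gives 8 with both flags clear, while 0xFFFF + 1 wraps to 0 with CF and ZF both set.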

4. Basic Control Structures

Loops and jumps in assembly are done using labels and jump instructions:

Example (loop):

MOV CX, 5     ; Set loop counter to 5
start_loop:
  ; Your code here
  LOOP start_loop  ; Decrement CX, jump to start_loop if CX != 0 (uses RCX in 64-bit mode)
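LOOP decrements the counter first and tests it afterwards. The C sketch below models that order with a 16-bit counter; run_loop is a hypothetical name. It shows that CX = 5 runs the body exactly five times, and also a classic pitfall: starting with CX = 0 wraps the counter to 0xFFFF, so the body runs 65,536 times unless you check for zero before entering the loop.

```c
#include <stdint.h>

/* Model of a CX-based LOOP: the body runs, then LOOP decrements CX
 * and jumps back if CX is not zero. Returns how many times the body ran. */
uint32_t run_loop(uint16_t cx)
{
    uint32_t iterations = 0;
    do {
        iterations++;      /* the loop body executes */
        cx--;              /* LOOP: CX = CX - 1 (0 wraps to 0xFFFF) */
    } while (cx != 0);     /* LOOP: jump back if CX != 0 */
    return iterations;
}
```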

Project: Simple Assembly Programs

1. Add Two Numbers

section .data
  num1 dw 5
  num2 dw 10
  result dw 0

section .text
  global _start
_start:
  MOV AX, [num1]     ; Load num1 into AX
  ADD AX, [num2]     ; Add num2 to AX
  MOV [result], AX   ; Store result in memory

  ; Exit (64-bit Linux system call)
  MOV EAX, 60        ; Syscall number for exit on x86-64 Linux
  XOR EDI, EDI       ; Exit status 0 (first argument goes in RDI)
  SYSCALL

2. Basic Loop

section .bss
  count resb 1

section .text
  global _start
_start:
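  MOV ECX, 10       ; Set loop counter to 10 (LOOP uses RCX in 64-bit mode; writing ECX clears its upper half)
loop_start:
  INC BYTE [count]  ; Loop body: count the iterations
  LOOP loop_start   ; Decrement RCX and loop if it is not zero

  ; Exit (64-bit Linux system call)
  MOV EAX, 60       ; Syscall number for exit
  XOR EDI, EDI      ; Exit status 0
  SYSCALL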

Tools to Use

NASM (Netwide Assembler)

To assemble and run:

nasm -f elf64 your_program.asm   # Assemble
ld -o your_program your_program.o  # Link
./your_program                   # Run

Next Steps

  1. Write simple programs: Start with adding two numbers and loops, and get comfortable with registers and memory access.
  2. Explore advanced instructions: Look into string operations, system calls, and conditional jumps.
  3. Move on to building a basic assembler: once you’re comfortable writing and reading assembly programs, you’ll be ready to implement your own assembler.

This will give you enough experience to move forward with creating a basic assembler!

Extra: Assembly Programming Overview

1. Registers Overview

Registers are small, very fast storage locations in the CPU used to hold data for quick access. You have already met general-purpose registers like AX and BX, but there are also specialized registers, each with its own role.

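SI (Source Index) and DI (Destination Index)

Pointer registers used by string instructions such as MOVSB: SI points to the source data and DI to the destination.

SP (Stack Pointer)

Holds the address of the top of the stack. PUSH decreases SP and then stores a value; POP reads the value and then increases SP.

BP (Base Pointer)

Commonly used as a fixed reference point (frame pointer) for reaching a function’s parameters and local variables on the stack.

IP (Instruction Pointer)

Holds the address of the next instruction to execute. You cannot MOV into it directly; jumps, calls, and returns change it.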
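The x86 stack grows downward: PUSH first moves SP toward lower addresses and then stores, POP loads and then moves SP back up. This C sketch models that order with an array of 16-bit slots. It is a simplification: here sp counts slots, while the real SP holds a byte address and moves by 2, 4, or 8 bytes, and there is no overflow check. The names are hypothetical.

```c
#include <stdint.h>

/* Toy downward-growing stack with 16-bit slots. */
static uint16_t stack_mem[16];
static unsigned sp = 16;   /* one past the last slot: the stack starts empty */

/* PUSH: move SP down first, then store at the new top. */
void push16(uint16_t value) { stack_mem[--sp] = value; }

/* POP: read the value at the top, then move SP back up. */
uint16_t pop16(void) { return stack_mem[sp++]; }
```

Pushing 1 and then 2 leaves sp at 14; the pops return 2 and then 1 (last in, first out).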

2. Memory Access

Memory is where your program’s data is stored. Assembly gives you direct control over memory, and the two simplest ways to address it are:

  1. Direct Addressing: You directly access a specific memory address.
  2. Register Indirect Addressing: You use registers (like SI or DI) to point to memory locations.
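The difference between the two modes is where the address comes from: fixed inside the instruction, or read from a register at run time. This C sketch models both against a toy byte array. load_direct corresponds to something like MOV AL, [4], and sum_indirect to a loop using MOV AL, [SI] plus INC SI in 16-bit code ([RSI] in 64-bit code). All names and addresses are hypothetical.

```c
#include <stdint.h>

/* Toy memory: four data bytes starting at address 4. */
static uint8_t mem[16] = {0, 0, 0, 0, 10, 20, 30, 40};

/* Direct addressing: the address (4) is fixed in the instruction. */
uint8_t load_direct(void) { return mem[4]; }

/* Register-indirect addressing: the address lives in SI,
 * so incrementing SI walks through consecutive bytes. */
unsigned sum_indirect(uint16_t si, uint16_t count)
{
    unsigned total = 0;
    while (count--) {
        total += mem[si];   /* read the byte SI points at */
        si++;               /* advance the pointer */
    }
    return total;
}
```

Starting at SI = 4 with a count of 4 sums 10 + 20 + 30 + 40 = 100.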

3. Sections in an Assembly Program

Now let’s discuss the different sections in an assembly program like .data, .bss, and .text. These sections organize your code and data:

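.data Section

Holds initialized data: variables with starting values, such as num1 dw 5.

.bss Section

Holds uninitialized data. You only reserve space (for example count resb 1); the loader fills it with zeros, and it takes no space in the executable file.

.text Section

Holds the executable instructions, starting at the _start label.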
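Compiled C programs use the same three sections. The sketch below shows where a typical GCC or Clang build on Linux places each definition; exact placement depends on the compiler and its flags, so verify with nm or objdump -h on your own build. The variable and function names are illustrative.

```c
/* Typical placement with GCC or Clang on Linux; verify with nm or objdump -h. */
int initialized_counter = 42;      /* nonzero initial value: .data */
static int zeroed_buffer[256];     /* no initializer: .bss, zero-filled at load */

/* Compiled function bodies (machine code): .text */
int read_counter(void)
{
    return initialized_counter;
}

int buffer_first(void)
{
    return zeroed_buffer[0];       /* 0 until something writes to the buffer */
}
```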

Why Not Write Instructions Directly?

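The sections (.data, .bss, .text) provide structure to the program: they keep code separate from data, let the operating system map code as read-only and executable and data as writable, and keep uninitialized data out of the executable file.

global _start

This directive exports the _start label so the linker (ld) can find it and use it as the program’s entry point. Without it, ld warns that it cannot find the entry symbol _start.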

Simplified Program Breakdown

Let’s revisit the program to understand it better:

section .data
  num1 dw 5        ; Define num1, a 16-bit word with the value 5
  num2 dw 10       ; Define num2, a 16-bit word with the value 10
  result dw 0      ; Define result, initialized to 0

section .text
  global _start    ; Export _start so the linker can use it as the entry point
_start:            ; Execution begins here
  MOV AX, [num1]   ; Load the value of num1 (5) into register AX
  ADD AX, [num2]   ; Add the value of num2 (10) to AX (AX now contains 15)
  MOV [result], AX ; Store the value of AX (15) into result

  ; Exit cleanly (x86-64 Linux exit system call)
  MOV EAX, 60
  XOR EDI, EDI
  SYSCALL

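Explanation: the .data section reserves three 16-bit words. In .text, the first MOV copies num1 from memory into AX, ADD adds num2 (read directly from memory) to AX, and the last MOV writes AX back to result, so result ends up holding 15.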

Final Notes

With this understanding, you can explore writing simple programs and then move on to building your own assembler!

Appendix: Registers in x86 Assembly

In x86-64, each general-purpose register can be accessed at several widths (64-bit, 32-bit, 16-bit, and 8-bit). Let’s go through them from widest to narrowest:

1. 64-bit Registers (64-bit Mode)

These are the full 64-bit general-purpose registers available in 64-bit mode: RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP, and R8 through R15.

2. 32-bit Registers (Lower 32 bits of 64-bit Registers)

These are the lower 32 bits of the corresponding 64-bit registers (EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP, and R8D through R15D). When you write to one of these registers, the upper 32 bits of the corresponding 64-bit register are cleared (set to zero).

3. 16-bit Registers (Lower 16 bits of 32-bit Registers)

These are the lower 16 bits of the corresponding 32-bit registers (AX, BX, CX, DX, SI, DI, BP, SP, and R8W through R15W). Writing to one of these leaves the upper 48 bits of the 64-bit register unchanged.
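The asymmetry between 32-bit and narrower writes is easy to get wrong, so here is a C sketch that models what each MOV does to the full 64-bit RAX value. The function names are hypothetical; each takes the old RAX and returns the new one.

```c
#include <stdint.h>

/* MOV EAX, v: a 32-bit write zero-extends, clearing bits 32-63. */
uint64_t write_eax(uint64_t rax, uint32_t v) { (void)rax; return (uint64_t)v; }

/* MOV AX, v: a 16-bit write replaces bits 0-15 and keeps bits 16-63. */
uint64_t write_ax(uint64_t rax, uint16_t v) { return (rax & ~0xFFFFULL) | v; }

/* MOV AL, v: an 8-bit write replaces bits 0-7 and keeps bits 8-63. */
uint64_t write_al(uint64_t rax, uint8_t v) { return (rax & ~0xFFULL) | v; }
```

Starting from RAX = 0xFFFFFFFFFFFFFFFF, writing 1 to EAX leaves RAX = 1, while writing 1 to AX leaves RAX = 0xFFFFFFFFFFFF0001. This is also why XOR EAX, EAX is enough to clear all of RAX.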

4. 8-bit Registers (Low or High Byte of 16-bit Registers)

The low-byte registers access bits 0-7 of the corresponding 16-bit registers: AL, BL, CL, and DL. Like 16-bit writes, 8-bit writes leave the rest of the 64-bit register unchanged.

There are also the “high byte” registers AH, BH, CH, and DH, which refer to bits 8-15 of AX, BX, CX, and DX. No other registers have a high-byte form.

x86-64 adds low-byte registers for the remaining registers: SIL, DIL, BPL, SPL, and R8B through R15B. An instruction that uses one of these new byte registers cannot also use AH, BH, CH, or DH.
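The restriction on AH through DH comes from the instruction encoding. An 8-bit register operand is a number from 0 to 15; without a REX prefix, numbers 4-7 select AH, CH, DH, and BH, but once a REX prefix is present the same numbers select SPL, BPL, SIL, and DIL, and 8-15 (R8B to R15B) always need REX. This C sketch, with a hypothetical function name, captures that mapping as an assembler or disassembler might.

```c
#include <stdbool.h>
#include <stddef.h>

/* Name of the 8-bit register selected by register number n (0-15).
 * Returns NULL when the combination cannot be encoded. */
const char *reg8_name(unsigned n, bool has_rex)
{
    static const char *const legacy[8] = {
        "al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"};
    static const char *const rex[16] = {
        "al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil",
        "r8b", "r9b", "r10b", "r11b", "r12b", "r13b", "r14b", "r15b"};

    if (n > 15 || (n > 7 && !has_rex))
        return NULL;               /* r8b-r15b require a REX prefix */
    return has_rex ? rex[n] : legacy[n];
}
```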

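Summary Table

| 64-bit | 32-bit | 16-bit | 8-bit (low) | 8-bit (high) |
|--------|--------|--------|-------------|--------------|
| rax    | eax    | ax     | al          | ah           |
| rbx    | ebx    | bx     | bl          | bh           |
| rcx    | ecx    | cx     | cl          | ch           |
| rdx    | edx    | dx     | dl          | dh           |
| rsi    | esi    | si     | sil         | -            |
| rdi    | edi    | di     | dil         | -            |
| rbp    | ebp    | bp     | bpl         | -            |
| rsp    | esp    | sp     | spl         | -            |
| r8     | r8d    | r8w    | r8b         | -            |
| r9     | r9d    | r9w    | r9b         | -            |
| r10    | r10d   | r10w   | r10b        | -            |
| r11    | r11d   | r11w   | r11b        | -            |
| r12    | r12d   | r12w   | r12b        | -            |
| r13    | r13d   | r13w   | r13b        | -            |
| r14    | r14d   | r14w   | r14b        | -            |
| r15    | r15d   | r15w   | r15b        | -            |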
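An assembler needs the summary table in reverse: given a register name from the source text, find its register number and width. This C sketch is one hypothetical way to do it. Note that it stores names in encoding order (rax, rcx, rdx, rbx, rsp, rbp, rsi, rdi, r8 to r15), which differs from the table’s row order, and it leaves out the high-byte names for simplicity.

```c
#include <stdbool.h>
#include <string.h>

struct reg_info { int number; int bits; };

/* Register names indexed by [width][encoding number]. */
static const char *const names[4][16] = {
    {"rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
     "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"},
    {"eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi",
     "r8d", "r9d", "r10d", "r11d", "r12d", "r13d", "r14d", "r15d"},
    {"ax", "cx", "dx", "bx", "sp", "bp", "si", "di",
     "r8w", "r9w", "r10w", "r11w", "r12w", "r13w", "r14w", "r15w"},
    {"al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil",
     "r8b", "r9b", "r10b", "r11b", "r12b", "r13b", "r14b", "r15b"},
};
static const int widths[4] = {64, 32, 16, 8};

/* Look up a lowercase register name; returns false if it is unknown.
 * High-byte names (ah, bh, ch, dh) are deliberately not handled here. */
bool lookup_register(const char *name, struct reg_info *out)
{
    for (int w = 0; w < 4; w++)
        for (int n = 0; n < 16; n++)
            if (strcmp(name, names[w][n]) == 0) {
                out->number = n;
                out->bits = widths[w];
                return true;
            }
    return false;
}
```

For example, "r10d" resolves to register 10 at 32 bits, and "bx" to register 3 at 16 bits.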

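Notes:

This structure lets you operate on 8-, 16-, 32-, or 64-bit data with the same set of registers, so the register width can match the size of the data you load from or store to memory (bytes, words, doublewords, or quadwords).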


NOTE: This is a draft post. The final version will include more detailed explanations and examples for each step. Stay tuned for updates!