Beyond all those OS abstractions, your computer is just sand. Well, a large dune of sand compacted into a chip fractions of a millimeter thick, and you call it the CPU. In the PC world, there are two major companies producing these CPUs: Intel and AMD. Intel has long led the market, although AMD has been gaining ground.
At this silicon level, programming varies with the architecture and its instruction set. Popular on PCs are x86 and x86_64, the 32 bit and 64 bit architectures respectively. They are Complex Instruction Set Computers (CISC), where a single instruction may carry out several low level operations, while the smartphone market has been taken by ARM’s Reduced Instruction Set Computer (RISC) processors, where each instruction performs one simple low level operation, ideally in a single clock cycle. You must work with these multiple architectures if you want your software to reach more users. Fortunately (and unfortunately for low level knowledge), our OSes encapsulate most of the chip level instructions and expose general functions for us to program with relative ease. To make things worse, high level programming languages further encapsulate these OS level functions and give us English-like sentences to write software with.
We can learn a lot of low level stuff just by sticking to native OS level programming using C++, C or Assembly (as done generally in). You play with memory segments, storage access and more of the functions that your favorite Windows, Mac or Linux exposes. But for today, let’s bid our OSes a goodbye, forget our favorite Java or Python and let’s go assembly deep into bare-metal programming of the silicon chip itself.
The Boot Process
Before starting our bare metal programming, we must understand how computers boot up. Generally we work under Operating Systems for developing software and don’t need to really know anything below it. But as we have no OS now, we need a way to start our computer before we go on to writing software.
So there is a small piece of unalterable (or at least not easily alterable) code residing in the ROM section, known as BIOS.
- After the power is turned on, the CPU starts executing at a fixed address in ROM (the reset vector), from where it jumps into the BIOS code.
- When the BIOS code starts, it performs the Power-On Self-Test (POST), in which it checks the status of devices and buses, detects memory and runs other tests.
- Next, the BIOS scans through the connected devices/boot options to select a bootable device to start the system from. A bootable device is one whose first readable sector holds 512 bytes ending with the bytes 0x55 and 0xAA at byte offsets 510 and 511 respectively (counting from 0). The code in these first 512 bytes is generally known as a Bootloader. On hard disks, the sector is called the Master Boot Record (MBR).
- The MBR also stores a partition table describing the partitions we create on the hard disk during configuration. The table has 4 entries, so a disk can hold up to 4 primary partitions, or up to 3 primary partitions plus one extended partition, which in turn can hold many logical partitions. The first 512 bytes of each of these partitions are called the Volume Boot Record (VBR). One of these partitions must be marked as active; that is where the second stage bootloader and OS kernel are stored.
- After a bootable device is detected, the BIOS loads its Bootloader code to memory address 0x7C00. (Note that the address is a 16 bit value because the BIOS runs by default in 16 bit mode, otherwise known as the Real Mode, regardless of whether the processor is 32 bit or 64 bit.)
- When the BIOS finds an actual bootable device and starts booting from it, the device’s drive number is stored in the DL register.
- Next, the execution jumps to the copied bootloader code at 0x7C00.
- Now, because the bootloader must fit within 512 bytes, the general way things are done is by implementing a second stage bootloader which in turn loads the actual OS kernel into memory.
- A second stage bootloader can be placed at a fixed location on the hard disk, or anywhere inside a filesystem (which requires the first stage to include enough filesystem code to locate and read it). Each partition on the hard disk has its own filesystem.
- As the system is still in Real Mode, we can use BIOS interrupts to read the desired sectors from the hard disk. The BIOS sets up a table of interrupt handlers starting at memory address 0x0000. It is known as the Interrupt Vector Table (IVT). Interrupts here are nothing but low level functions. To say a little more, the table has one 4 byte entry per interrupt number, and each entry holds the segment:offset address where the code of that interrupt starts. So when an interrupt is raised, the CPU uses the interrupt number as an index into the table, jumps to the address stored there and starts executing the interrupt code.
- Note also that these BIOS interrupts are available only in the Real Mode. Modern OSes switch to 32 bit operating mode, also known as the Protected Mode, for accessing more memory, security and advanced process management. BIOS interrupts are not available in this mode and we must write our own implementation of those interrupts.
- Coming back, when the second stage bootloader is loaded into memory, several things are done by modern OSes. First thing is to switch to the Protected mode. But before that, the CPU requires setting up a few descriptor tables (not in the scope of this article). After the switch is made, OS kernel is loaded and you finally log in to the desktop of your operating system.
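Two of the steps above are easy to check by hand. The sketch below uses Python purely for illustration (the function names are my own and not part of any BIOS API). It tests a 512 byte sector for the boot signature and works out where the IVT entry for a given interrupt number lives, assuming the usual Real Mode layout of 4 bytes per entry with the offset stored before the segment.

```python
from typing import Tuple

SECTOR_SIZE = 512

def is_bootable(sector: bytes) -> bool:
    """True if the sector is 512 bytes long and ends with 0x55, 0xAA."""
    return (len(sector) == SECTOR_SIZE
            and sector[510] == 0x55
            and sector[511] == 0xAA)

def ivt_entry_address(interrupt_number: int) -> int:
    """Address of an IVT entry: 4 bytes per interrupt, table starts at 0x0000."""
    if not 0 <= interrupt_number <= 0xFF:
        raise ValueError("interrupt number must fit in one byte")
    return interrupt_number * 4

def read_vector(memory: bytes, interrupt_number: int) -> Tuple[int, int]:
    """Return the (segment, offset) of the handler stored in the IVT."""
    base = ivt_entry_address(interrupt_number)
    offset = int.from_bytes(memory[base:base + 2], "little")
    segment = int.from_bytes(memory[base + 2:base + 4], "little")
    return segment, offset

blank = bytes(SECTOR_SIZE)
signed = bytes(510) + b"\x55\xAA"
print(is_bootable(blank), is_bootable(signed))  # False True
print(hex(ivt_entry_address(0x10)))             # 0x40
```

So the entry for INT 0x10, which we use later for printing, sits at address 0x40 in this model.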
Cutting off some steps
Since we are neither developing our own OS nor writing large, complex software, we can keep everything under 512 bytes and ignore the rest: second stage bootloader, protected mode, kernel and so on. We are also not building anything fancy enough to need to leave Real Mode, so for simplicity we’ll rely on the BIOS interrupts wherever needed. So, let’s take advantage of the BIOS for this article.
All we actually need is the first bootloader code. We’ll write a simple number guessing game which fits in 512 bytes easily. Also we don’t need to include partition table in the bootloader for this purpose. In a way, we’re not writing a bootloader, but a simple game that first starts the PC.
Memory Access in Real Mode
In 16 bit Real Mode, we access memory using a logical Segment:Offset pair, which is mapped to a physical memory address by the formula,
Physical Memory Address = 16 * Segment + Offset
For example, the physical memory address 0x7C00 where the bootloader loads, can be accessed by the following segment : offset pair.
Segment = 0x0000 and Offset = 0x7c00 OR Segment = 0x07c0 and Offset = 0x0000
There are two solutions because any of the two leads to the same physical memory. i.e.
a) 16 * 0x0000 + 0x7C00 = 0x00000 + 0x7C00 = 0x07C00
b) 16 * 0x07C0 + 0x0000 = 0x07C00 + 0x0000 = 0x07C00
Note: multiplying by 16 (0x10) in hex works like multiplying by 10 in decimal. To put it plainly, it just means appending a 0 to the hex number. ☺
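If you would like to check such pairs yourself, here is a minimal Python sketch of the formula (the function name is my own). Multiplying by 16 is written as a left shift by 4 bits, which is the same thing.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real Mode translation: 16 * segment + offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are 16 bit values")
    return (segment << 4) + offset

print(hex(physical_address(0x0000, 0x7C00)))  # 0x7c00
print(hex(physical_address(0x07C0, 0x0000)))  # 0x7c00
```

Both pairs land on the same physical address, which is why either can be used to reach the bootloader.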
Prerequisites
- A good knowledge of x86 assembly is highly recommended. I’ll try to explain the instructions as best I can, though.
- NASM assembler for writing assembly code. We will use the Intel Syntax to write assembly.
- QEMU emulator for testing the program.
- We’ll use the Intel x86 architecture for this project.
About x86 processor registers
A good read: https://en.m.wikipedia.org/wiki/X86
The x86, specifically the Intel 80386 processor, has 16 registers, which are 32 bits in size (with the exception of the segment registers).
Four general purpose registers (that can be used in any way) are:
- Accumulator Register (EAX) is an Extension of its 16 bit AX register which in turn is the combination of low 8 bit (AL) and high 8 bit (AH) registers. Initially intended for storing any intermediate value during arithmetic and logic operations, this register can, however, be used in any way by a system programmer.
- Base Register (EBX) has a similar child tree to the register above, i.e. BX, BH and BL (and so do ECX and EDX, FYI). The original intended purpose of this register is to hold the base address of an array or other data in memory.
- Counter Register (ECX) was intended to be used as a counter in loop instruction and string iteration. But it can be used for any other purpose.
- Data Register (EDX) extends EAX. To clarify with an example scenario, suppose you multiply a 32 bit integer by some value and the result is, say, a 48 bit number. The low 32 bits of the result are stored in EAX, while the remaining high bits are stored in EDX, so the pair EDX:EAX holds the full result.
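To make the EDX and EAX split concrete, here is a small Python model of an unsigned 32 bit multiplication (the function name is my own, and this only mimics the arithmetic, not the instruction itself).

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF

def mul32(a: int, b: int) -> Tuple[int, int]:
    """Unsigned 32 x 32 multiply; returns (edx, eax) = (high half, low half)."""
    product = (a & MASK32) * (b & MASK32)
    return (product >> 32) & MASK32, product & MASK32

edx, eax = mul32(0x12345678, 0x10000)
print(hex(edx), hex(eax))  # 0x1234 0x56780000
```

The full product here is 0x123456780000, a 48 bit number: its low 32 bits end up in EAX and the rest in EDX.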
Two Stack Related Pointer Registers are:
- Stack Pointer Register (ESP) points to the top of the stack. Note that the stack grows downwards on x86 (and on most architectures). In 32 bit mode, each stack entry is exactly 32 bits wide. Thus, every time something is pushed onto the stack, ESP moves down by 32 bits (4 bytes) towards lower memory addresses and points at the new entry. Every pop moves ESP back up by 32 bits. (In 16 bit Real Mode, pushes and pops work on 16 bit entries by default.)
- Stack Base Pointer Register (EBP) points to the base of the current stack frame. Again, as the stack grows downwards, EBP points to a higher memory address than ESP.
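A toy Python model may help picture the downward growth. The class and its starting address are my own invention for illustration, and it assumes 32 bit (4 byte) entries.

```python
class Stack32:
    """Toy model of a downward-growing stack with 4 byte entries."""

    def __init__(self, top: int):
        self.esp = top      # stack pointer starts at the highest address
        self.memory = {}    # address -> 32 bit value

    def push(self, value: int) -> None:
        self.esp -= 4       # move down first, then store
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self) -> int:
        value = self.memory.pop(self.esp)
        self.esp += 4       # move back up towards high memory
        return value

stack = Stack32(top=0x7C00)
stack.push(0xDEADBEEF)
print(hex(stack.esp))    # 0x7bfc
print(hex(stack.pop()))  # 0xdeadbeef
print(hex(stack.esp))    # 0x7c00
```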
Two string index registers are:
- Source Index Register (ESI) has its 16 bit equivalent as SI but no further division of 8 bits. This is used to store a source string’s index 0 address (because a string is an array of characters, rather say ASCII bytes, with index starting from 0).
- Destination Index Register (EDI) has its 16 bit equivalent as DI and no further division of 8 bits. It is used to point to the destination memory address where the string pointed to by ESI is copied during the related string operations.
Six 16 bit segment registers are:
- Code Segment Register (CS) points to the start of Code segment.
- Data Segment Register (DS) points to the start of Data segment.
- Stack Segment Register (SS) points to the start of Stack segment.
- Extra Segment Register (ES) is used for extra data segment.
- Extra Segment #2 (FS)
- Extra Segment #3 (GS)
One register is special:
- Instruction Pointer Register (EIP) has its 16 bit equivalent IP, which holds the address of the next instruction to be executed. It is also known as a Program Counter. It is also special in that it is the only register a program can’t read or write directly.
The 16 bit FLAGS register includes these flags:
- Overflow Flag
- Direction Flag
- Interrupt Flag
- Trap Flag
- Sign Flag
- Zero Flag
- Auxiliary Carry Flag
- Parity Flag
- Carry Flag
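For reference, each of these flags occupies a fixed bit in the FLAGS register. The Python sketch below (names are my own) lists those bit positions and decodes a raw FLAGS value into the flags that are set.

```python
from typing import List

# Bit positions of the flags listed above inside the 16 bit FLAGS register.
FLAG_BITS = {
    "CF": 0,   # Carry
    "PF": 2,   # Parity
    "AF": 4,   # Auxiliary Carry
    "ZF": 6,   # Zero
    "SF": 7,   # Sign
    "TF": 8,   # Trap
    "IF": 9,   # Interrupt
    "DF": 10,  # Direction
    "OF": 11,  # Overflow
}

def decode_flags(value: int) -> List[str]:
    """Names of the flags whose bits are set in a raw FLAGS value."""
    return [name for name, bit in FLAG_BITS.items() if value & (1 << bit)]

print(decode_flags(0x0246))  # ['PF', 'ZF', 'IF']
```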
Time for some assembly
The program below is about the smallest possible bootloader code.
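To see what such a minimal boot sector amounts to byte by byte, here is a hedged Python sketch (not an assembly listing, and the function name is my own) that builds one. It assumes the code is just the two bytes EB FE, the x86 encoding of a short jump to itself, followed by zero padding and the 0x55, 0xAA signature.

```python
def build_boot_sector(code: bytes = b"\xEB\xFE") -> bytes:
    """Pad the code with zeros to 510 bytes and append the boot signature."""
    if len(code) > 510:
        raise ValueError("code must leave room for the 2 byte signature")
    return code + bytes(510 - len(code)) + b"\x55\xAA"

image = build_boot_sector()
print(len(image), image[:2].hex(), image[-2:].hex())  # 512 ebfe 55aa
```

Everything between the jump and the signature is padding, so a sector that only hangs really needs just four meaningful bytes.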
What use is code that just hangs? Let’s make a text equivalent of a game splash screen. The program below at least displays “Welcome to a number guessing game” before it hangs. We’ll introduce some new instructions here.
The XOR of AX with itself sets its value to 0, which we then store in DS. That means the data segment starts at address 0x0000. Next, SI is pointed at the start of the welcome_message string; that is, SI holds the offset of the string from the beginning of this segment. We set DS and SI because the LODSB instruction reads a byte from the address DS:SI (remember Segment:Offset?), moves that byte into the AL register and advances SI to the next byte. We then OR AL with itself, which yields 0 (and sets the Zero Flag) only if AL is 0, meaning the end of the string has been reached. If so, we jump to the done_printing label and stay there in an infinite loop. Otherwise we call BIOS interrupt 0x10. Each BIOS interrupt is a family of sub functions. Here, we select the teletype printing sub function by setting AH to 0x0E. Setting BH to 0 selects the first display page for our message byte. You can find more about BIOS INT 10H on Wikipedia. CLI and STI clear and set the Interrupt Flag (IF) respectively, so that maskable hardware interrupts are ignored while the printing operation runs and processed again afterwards.
Before continuing with our game, let’s step away from it for a moment and write another simple program that shows how to print numbers. Because console input and output are always ASCII here, we must do some conversion between numbers and their ASCII codes.
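The loop described above can be mimicked in a few lines of Python. This is only a model of the control flow, not the assembly itself: memory is a plain byte array, the address 0x7C10 where the string is placed is an arbitrary choice of mine, and appending to a list stands in for the INT 0x10 call.

```python
def print_string(memory: bytes, ds: int, si: int) -> str:
    """Model of the loop: LODSB, test AL for zero, otherwise 'print' AL."""
    shown = []
    while True:
        al = memory[(ds << 4) + si]  # LODSB: load the byte at DS:SI into AL
        si += 1                      # LODSB also advances SI
        if al == 0:                  # OR AL, AL sets the Zero Flag when AL is 0
            break                    # corresponds to jumping to done_printing
        shown.append(chr(al))        # stands in for INT 0x10 with AH = 0x0E
    return "".join(shown)

memory = bytearray(0x8000)
message = b"Welcome to a number guessing game\x00"
memory[0x7C10:0x7C10 + len(message)] = message
print(print_string(bytes(memory), ds=0x0000, si=0x7C10))
```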
I’ve included ASCII Chart below for reference.
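The conversion itself is simple arithmetic: the digits 0 to 9 have the ASCII codes 0x30 to 0x39, so adding 0x30 turns a single digit into its printable character. A Python sketch of the idea follows (the function name is my own). For numbers with several digits it peels them off with repeated division by 10.

```python
def number_to_ascii(value: int) -> bytes:
    """Convert a non-negative integer to its ASCII digit bytes."""
    if value < 0:
        raise ValueError("only non-negative numbers are handled here")
    digits = []
    while True:
        value, remainder = divmod(value, 10)
        digits.append(0x30 + remainder)  # 0x30 is the ASCII code of '0'
        if value == 0:
            break
    return bytes(reversed(digits))       # digits come out last-first

print(number_to_ascii(7))     # b'7'
print(number_to_ascii(2024))  # b'2024'
```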
Let’s do two more examples before adding code to our target project. So far, we’ve printed text and numbers on screen. Let’s write a program to read user input, again using a BIOS interrupt.
The last example is the same “reading user input”, but for a number. Again, we can’t read a number directly, but we can convert an ASCII character into one.
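Going from ASCII to a number is the reverse: subtract 0x30, after checking that the byte really is a digit. A small Python sketch (the function name is my own, and returning None for non-digits is just one possible choice):

```python
from typing import Optional

def ascii_to_digit(key: int) -> Optional[int]:
    """Map an ASCII code in 0x30..0x39 to 0..9, or None for anything else."""
    if 0x30 <= key <= 0x39:
        return key - 0x30
    return None

print(ascii_to_digit(ord("9")))  # 9
print(ascii_to_digit(ord("a")))  # None
```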
Finally, let’s continue our game with one of the two program-flow plans written below.
Plan #1
- User A enters a number between 0 and 9 when User B is not around to see.
- User B comes to guess what User A has entered.
- User B guesses with a number and the program converts ASCII to integer value.
- If User B guesses with an out-of-range number, they get an error and try again.
Plan #2
- The ASCII hex of a fixed number (0–9) is hard-coded in the program, i.e. store 0x39 in the program if the actual number thought of is 9.
- User A guesses with a number; the program doesn’t convert ASCII to integer but directly compares the ASCII hex of the input byte with the program’s byte.
- If they match, it is a success, which prints the same number on screen. Otherwise, just try again without even showing an error.
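The simpler flow, with the hard-coded byte, boils down to comparing raw ASCII bytes. Here is a Python sketch of that logic. The secret value 0x39 is taken from the example above, while the function name and the idea of feeding key presses in as a byte string are my own stand-ins for reading the keyboard.

```python
from typing import Optional

SECRET = 0x39  # hard-coded ASCII code of '9'

def play(keys: bytes, secret: int = SECRET) -> Optional[int]:
    """Try each key press in turn; return the attempt number that matched."""
    for attempt, key in enumerate(keys, start=1):
        if key == secret:  # direct ASCII comparison, no integer conversion
            return attempt
        # on a mismatch we silently move on to the next guess
    return None

print(play(b"3719"))  # 4
print(play(b"12"))    # None
```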
And I am going with this simple Plan #2 for this game implementation because:
- I am feeling too lazy coming towards the end.
- I hope you’ll try to practice writing the game with Plan #1. Do your research. It helps a lot and is good for your … health, I guess!
The extreme FINAL part! Assemble the project with NASM and either run it on the QEMU emulator or make a bootable USB drive.
Build and Test the raw binary file
Type the following command at the command prompt in the same directory where you saved the above “bootloaded_game.asm” file.
nasm bootloaded_game.asm -f bin -o bootloaded_game.bin
To run on QEMU emulator, run the following command,
qemu-system-i386 -hda bootloaded_game.bin
If you want to make a bootable USB drive from the binary file, use the following commands.
On Linux (Note: replace sda with your USB device name, and double-check it, because dd overwrites the first sector of whatever device you name)
dd if=bootloaded_game.bin of=/dev/sda bs=512 count=1
On Windows, you can install a dd port and use it the same way (just replace Z: with the drive letter Windows assigned to your USB drive)
dd if=bootloaded_game.bin of=\\.\Z: bs=512 count=1
Now boot your computer from the USB drive the next time you power on your PC.