Dozens of minimal operating systems to learn x86 system programming. Tested on Ubuntu 17.10 host in QEMU 2.10 and <<test-hardware,real hardware>>. Userland cheat at: https://github.com/cirosantilli/linux-kernel-module-cheat#userland-assembly ARM baremetal setup at: https://github.com/cirosantilli/linux-kernel-module-cheat#baremetal-setup
To overcome the lack of GUI, we can use QEMU's VNC implementation instead of the default SDL, which is visible on the host due to `--net=host`:
....
qemu-system-i386 -hda main.img -vnc :0
....
and then on host:
....
sudo apt-get install vinagre
vinagre localhost:5900
....
=== GDB step debug
TODO get it working nicely:
....
./run bios_hello_world debug
....
This only covers the specifics: you need to know GDB debugging already.
How to have debug symbols: https://stackoverflow.com/questions/32955887/how-to-disassemble-16-bit-x86-boot-sector-code-in-gdb-with-x-i-pc-it-gets-tr/32960272#32960272 TODO implement here. Needs to point GDB to an ELF file in addition to the remote listen.
How to step over `int` calls: http://stackoverflow.com/questions/24491516/how-to-step-over-interrupt-calls-when-debugging-a-bootloader-bios-with-gdb-and-q
Single stepping until a given opcode can be helpful sometimes: https://stackoverflow.com/questions/14031930/break-on-instruction-with-specific-opcode-in-gdb/31249378#31249378
TODO: detect if we are on 16 or 32 bit automatically from control registers. Now I'm using 2 functions `16` and `32` to switch manually, but that sucks. The problem is that it's not possible to read them directly: http://stackoverflow.com/a/31340294/895245 If we had `cr0`, it would be easy to do with an `if cr0 & 1` inside a hook-stop.
TODO: Take segmentation offsets into account: http://stackoverflow.com/questions/10354063/how-to-use-a-logical-address-in-gdb
This critical file determines the memory layout of our assembly. Take some time to read the comments in that file and familiarize yourself with it.
The Linux kernel also uses linker scripts to set up its image memory layout, see for example: https://github.com/torvalds/linux/blob/v4.2/arch/x86/boot/setup.ld
* single stage, so still limited to 512 bytes of code + data! TODO: it should be easy to solve that with <<bios-disk-load>>, send a pull request :-) Here is a full example that we could also adapt: http://3zanders.co.uk/2017/10/18/writing-a-bootloader3
* use GCC's `-m` which does not produce "real" 16 bit code, but rather 32-bit code with `0x66` and `0x67` prefixes: https://wiki.osdev.org/X86-64_Instruction_Encoding#Legacy_Prefixes
* setting up the initial state and the linker script is much harder and more error-prone than with assembly
Therefore, for most applications, you will just want to use <<multiboot>> instead, which overcomes all of those problems.
This is important in particular so that you can start your stack there when you enter <<protected-mode>>, since the stack grows down.
In 16-bit mode, it does not matter much, since most modern machines have all addressable memory there, but in 32-bit protected mode it does, as our emulator usually does not have the full 4 GiB. And of course, the 64-bit address space is currently larger than the total RAM in the world.
`int 15h` returns a list: each time you call it a new memory region is returned.
The format is not too complicated, and documented at: http://wiki.osdev.org/Detecting_Memory_%28x86%29#Detecting_Upper_Memory
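As a rough illustration of how such a list could be consumed, here is a minimal sketch. It assumes the common entry layout described on the OSDev page above (64-bit base, 64-bit length, 32-bit type, with type 1 meaning usable RAM) and that the entries were already copied out of the BIOS buffer into an array. The names `e820_entry` and `e820_usable_bytes` are made up for this example and are not part of this repository.

```c
#include <stddef.h>
#include <stdint.h>

/* Parsed form of one memory map entry (hypothetical name).
 * This struct is not packed, so it does not match the raw BIOS buffer byte for byte. */
struct e820_entry {
    uint64_t base;   /* Physical start address of the region. */
    uint64_t length; /* Size of the region in bytes. */
    uint32_t type;   /* Assumption: 1 means usable RAM. */
};

#define E820_TYPE_USABLE 1u

/* Sum the sizes of all usable regions in a list of entries. */
uint64_t e820_usable_bytes(const struct e820_entry *entries, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (entries[i].type == E820_TYPE_USABLE)
            total += entries[i].length;
    }
    return total;
}
```

A real loader would also have to deal with overlapping or unsorted regions, which this sketch ignores.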
`dx` seems to be the only interesting regular register: the firmware stores the value of the current disk number there to help with `int 13h`. Thus it usually contains `0x80`.
with red foreground and blue background shows on the top left of the cleared screen.
This example uses the fact that BIOS maps video memory to address 0xB8000.
We can then move 0xB800 to a segment register and use segment:offset addressing to access this memory.
Then we can show characters by treating `0xB800:0000` as a `uint16_t` array, where the low 8 bits are the ASCII character, and the high 8 bits are the color attribute of this character.
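The arithmetic behind this can be sketched in C. This is only a model that runs on the host, not code from this repository: the function names are hypothetical, the attribute layout (background in the high nibble, foreground in the low nibble) and the color numbers (red is 4, blue is 1) are the usual VGA text mode conventions, and the 80 column width assumes mode `0x03`.

```c
#include <stdint.h>

/* Real mode address translation: physical = segment * 16 + offset. */
uint32_t real_mode_physical(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Build one 16-bit text cell: character in the low byte, attribute in the high byte.
 * Assumption: attribute = background nibble followed by foreground nibble. */
uint16_t vga_cell(unsigned char ch, uint8_t fg, uint8_t bg)
{
    uint8_t attribute = (uint8_t)(((bg & 0x0F) << 4) | (fg & 0x0F));
    return (uint16_t)((attribute << 8) | ch);
}

/* Index of the cell at (row, col), assuming an 80 column wide text screen. */
unsigned vga_index(unsigned row, unsigned col)
{
    return row * 80u + col;
}
```

For example, `real_mode_physical(0xB800, 0)` gives `0xB8000`, and a red on blue `A` is the value `0x1441`.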
The x86 processor has a few modes, which have a huge impact on how the processor works.
Covered in the <<intel-manual>> Volume 3. Especially useful is the "Figure 2-3. Transitions Among the Processor’s Operating Modes" diagram.
The modes are:
* Real-address, usually known just as "real mode"
* Protected
* System management
* IA-32e. Has two sub modes:
** Compatibility
** 64-bit
* Virtual-8086 Mode
Transition tables:
....
(all modes)
|
| Reset
|
v
+---------------------+
| Real address (PE=0) |
+---------------------+
^
|
| PE
|
v
+------------------------+
| Protected (PE=1, VM=0) |
+------------------------+
^ ^
| |
| | VM
| |
v v
+--------------+ +---------------------+
| IA-32e | | Virtual-8086 (VM=1) |
+--------------+ +---------------------+
....
and:
....
+------------------------+
| System management mode |
+------------------------+
| ^
| |
| RSM | SMI#
| |
v |
(All other modes)
....
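The first diagram can be read as a function of a few flag bits. The sketch below is only a host-side model with hypothetical names: it takes the register values as plain integers, since reading them needs privileged instructions, and it ignores system management mode, which is not visible through these bits.

```c
#include <stdint.h>

enum cpu_mode { MODE_REAL, MODE_PROTECTED, MODE_VIRTUAL_8086, MODE_IA32E };

#define CR0_PE    (1u << 0)    /* Protection Enable, bit 0 of CR0. */
#define EFLAGS_VM (1u << 17)   /* Virtual-8086 mode, bit 17 of EFLAGS. */
#define EFER_LMA  (1ull << 10) /* IA-32e mode active, bit 10 of IA32_EFER. */

/* Classify the operating mode from already-read register values. */
enum cpu_mode classify_mode(uint32_t cr0, uint32_t eflags, uint64_t efer)
{
    if (!(cr0 & CR0_PE))
        return MODE_REAL;
    if (efer & EFER_LMA)
        return MODE_IA32E;
    if (eflags & EFLAGS_VM)
        return MODE_VIRTUAL_8086;
    return MODE_PROTECTED;
}
```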
The IA-32e transition is trickier, but clearly described on the <<intel-manual>> Volume 3 - 9.8.5 "Initializing IA-32e Mode":
____
Operating systems should follow this sequence to initialize IA-32e mode:
1. Starting from protected mode, disable paging by setting `CR0.PG = 0`. Use the `MOV CR0` instruction to disable paging (the instruction must be located in an identity-mapped page).
2. Enable physical-address extensions (PAE) by setting `CR4.PAE = 1`. Failure to enable PAE will result in a `#GP` fault when an attempt is made to initialize IA-32e mode.
3. Load `CR3` with the physical base address of the Level 4 page map table (PML4).
4. Enable IA-32e mode by setting `IA32_EFER.LME = 1`.
5. Enable paging by setting `CR0.PG = 1`. This causes the processor to set the `IA32_EFER.LMA` bit to 1. The `MOV CR0` instruction that enables paging and the following instructions must be located in an identity-mapped page (until such time that a branch to non-identity mapped pages can be effected).
____
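To make the order of the quoted steps easier to follow, here is a small software model of the registers involved. It is not bare metal code and not taken from this repository: `ctrl_regs`, `model_set_paging` and `model_enter_ia32e` are hypothetical names, and the registers are plain struct fields instead of real `mov cr0` and `wrmsr` accesses.

```c
#include <stdint.h>

#define CR0_PG   (1u << 31)   /* Paging, bit 31 of CR0. */
#define CR4_PAE  (1u << 5)    /* Physical-address extensions, bit 5 of CR4. */
#define EFER_LME (1ull << 8)  /* IA-32e mode enable, bit 8 of IA32_EFER. */
#define EFER_LMA (1ull << 10) /* IA-32e mode active, bit 10 of IA32_EFER. */

/* Modeled register state (hypothetical struct). */
struct ctrl_regs {
    uint32_t cr0;
    uint32_t cr4;
    uint64_t cr3;
    uint64_t efer;
};

/* Model of changing CR0.PG. Returns -1 where the quoted text says #GP is raised. */
int model_set_paging(struct ctrl_regs *r, int enable)
{
    if (!enable) {
        r->cr0 &= ~CR0_PG;
        r->efer &= ~EFER_LMA;
        return 0;
    }
    if ((r->efer & EFER_LME) && !(r->cr4 & CR4_PAE))
        return -1;
    r->cr0 |= CR0_PG;
    if (r->efer & EFER_LME)
        r->efer |= EFER_LMA; /* The processor sets LMA by itself. */
    return 0;
}

/* Follow the five quoted steps in order. */
int model_enter_ia32e(struct ctrl_regs *r, uint64_t pml4_phys)
{
    model_set_paging(r, 0);        /* 1. Disable paging. */
    r->cr4 |= CR4_PAE;             /* 2. Enable PAE. */
    r->cr3 = pml4_phys;            /* 3. Point CR3 at the PML4. */
    r->efer |= EFER_LME;           /* 4. Enable IA-32e mode. */
    return model_set_paging(r, 1); /* 5. Enable paging, which sets LMA. */
}
```

The model leaves out everything about page table contents and identity mapping, which is where most of the real work is.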
=== Legacy modes
The term is defined in the <<intel-manual>> Volume 3 - CHAPTER 2 "SYSTEM ARCHITECTURE OVERVIEW":
____
Real mode, protected mode, virtual 8086 mode, and system management mode. These are sometimes referred to as legacy modes.
____
In other words: anything except IA-32e.
This further suggests that real, protected and virtual 8086 mode are not the main intended modes of operation.
=== Real mode
http://wiki.osdev.org/Real_Mode
The CPU starts in this mode after power up.
All our <<bios>> examples are in real mode.
It is possible to use 32-bit registers in this mode with the "Operand Size Override Prefix" `0x66`.
TODO is it possible to access memory above 1M like this:
* https://thiscouldbebetter.wordpress.com/2011/03/17/entering-protected-mode-from-assembly/ FASM based. Did not work on first try, but looks real clean.
* http://skelix.net/skelixos/tutorial02_en.html
* Linux kernel v4.12 `arch/x86/include/asm/segment.h`
Things get much more involved than in real mode: http://stackoverflow.com/questions/14419088/how-to-draw-a-pixel-on-the-screen-in-protected-mode-in-x86-assembly
First read the paging tutorial, and in particular: http://www.cirosantilli.com/x86-paging/#segmentation to get a feel for the type of register and data structure manipulation required to configure the CPU, and how segmentation compares to paging.
Segmentation modifies every memory access of a given segment by:
* adding an offset to it
* limiting how big the segment is
If an access is made at an offset larger than allowed, an exception happens, which is like an interrupt, and gets handled by a previously registered handler.
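Those two effects can be modeled in a few lines of C. This is a simplified sketch with hypothetical names: it treats the limit as the largest valid offset, ignores the access size and expand-down segments, and returns `false` where real hardware would raise an exception.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified segment: a base that is added and a limit that is checked. */
struct segment {
    uint32_t base;  /* Added to every offset. */
    uint32_t limit; /* Largest valid offset in this model. */
};

/* Translate an offset into a linear address.
 * Returns false where the hardware would raise an exception. */
bool segment_translate(const struct segment *seg, uint32_t offset, uint32_t *linear)
{
    if (offset > seg->limit)
        return false;
    *linear = seg->base + offset;
    return true;
}
```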
Segmentation could be used to implement virtual memory by assigning one segment per program:
....
+-----------+--------+--------------------------+
| Program 1 | Unused | Program 2 |
+-----------+--------+--------------------------+
^ ^ ^ ^
| | | |
Start1 End1 Start2 End2
....
Besides address translation, the segmentation system also manages other features such as <<protection-rings>>. TODO: how are those done in 64-bit mode?
In Linux 32-bit for example, only two segments are used at all times: one at ring 0 for the kernel, and another at ring 3 for all user processes.
===== Segment selector
In protected mode, the segment registers `CS`, `DS`, `SS`, `ES`, `FS` and `GS` contain a data structure more complex than in real mode, where they contain a single number.
This 2 byte data structure is called a _segment selector_:
[options="header"]
|===
|Position (bits) |Size (bits) |Name |Description
|0
|2
|Requested Privilege Level (RPL)
|Protection ring level, from 0 to 3.
|2
|1
|Table Indicator (TI)
a|
* 0: global descriptor table
* 1: local descriptor table
|3
|13
|Index
a|Index of the <<segment-descriptor>> to be used from the descriptor table.
|===
Like in real mode, this data structure is loaded into the registers with a regular `mov` instruction.
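The bit layout from the table above can be packed and unpacked as follows. This is a host-side sketch and the names `selector_fields`, `selector_decode` and `selector_encode` are hypothetical.

```c
#include <stdint.h>

/* The three fields of a segment selector. */
struct selector_fields {
    unsigned rpl;   /* Bits 0-1: privilege level, 0 to 3. */
    unsigned ti;    /* Bit 2: 0 for the GDT, 1 for the LDT. */
    unsigned index; /* Bits 3-15: index into the descriptor table. */
};

/* Split a 16-bit selector into its fields. */
struct selector_fields selector_decode(uint16_t selector)
{
    struct selector_fields f;
    f.rpl = selector & 0x3u;
    f.ti = (selector >> 2) & 0x1u;
    f.index = (unsigned)selector >> 3;
    return f;
}

/* Build a 16-bit selector from its fields. */
uint16_t selector_encode(unsigned index, unsigned ti, unsigned rpl)
{
    return (uint16_t)((index << 3) | ((ti & 0x1u) << 2) | (rpl & 0x3u));
}
```

For example, GDT entry 1 at ring 0 is `0x08` and GDT entry 2 at ring 0 is `0x10`, which is why those two constants show up so often in protected mode tutorials.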
<<intel-manual>> Volume 3 - 3.4.2 "Segment Selectors" says that we can't use the first entry of the GDT:
____
The first entry of the GDT is not used by the processor. A segment selector that points to this entry of the GDT (that is, a segment selector with an index of 0 and the TI flag set to 0) is used as a “null segment selector.” The processor does not generate an exception when a segment register (other than the CS or SS registers) is loaded with a null selector. It does, however, generate an exception when a segment register holding a null selector is used to access memory. A null selector can be used to initialize unused segment registers. Loading the CS or SS register with a null segment selector causes a general-protection exception (#GP) to be generated.
____
===== Segment descriptor
A data structure that is stored in the <<gdt>>.
Clearly described on the <<intel-manual>> Volume 3 - 3.4.5 "Segment Descriptors" and in particular Figure 3-8 "Segment Descriptor".
The Linux kernel v4.2 encodes it at: `arch/x86/include/asm/desc_defs.h` in `struct desc_struct`
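The main difficulty with this structure is that the base and limit are scattered across the 8 bytes. The following sketch packs a descriptor from its logical fields. It is my own reading of the Intel figure and not code from this repository or from Linux: `gdt_descriptor` is a hypothetical name, `access` stands for the byte at bits 40 to 47 (type, S, DPL, P), and `flags` for the nibble at bits 52 to 55 (AVL, L, D/B, G from low to high).

```c
#include <stdint.h>

/* Pack a 64-bit segment descriptor from its logical fields. */
uint64_t gdt_descriptor(uint32_t base, uint32_t limit, uint8_t access, uint8_t flags)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);            /* Limit 15:0 at bits 0-15. */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;     /* Base 23:0 at bits 16-39. */
    d |= (uint64_t)access << 40;                 /* Access byte at bits 40-47. */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48; /* Limit 19:16 at bits 48-51. */
    d |= (uint64_t)(flags & 0xFu) << 52;         /* AVL, L, D/B, G at bits 52-55. */
    d |= (uint64_t)(base >> 24) << 56;           /* Base 31:24 at bits 56-63. */
    return d;
}
```

With this, a flat 4 GiB ring 0 code segment is `gdt_descriptor(0, 0xFFFFF, 0x9A, 0xC)`, and the matching data segment uses the access byte `0x92`.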
The first 32 handlers are reserved by the processor and have predefined meanings, as specified in the <<intel-manual>> Volume 3 Table 3-3. "Intel 64 and IA-32 General Exceptions".
In the Linux kernel, https://github.com/torvalds/linux/blob/v4.2/arch/x86/entry/entry_64.S sets them all up: each `idtentry divide_error` call sets up a new one.
This is printed from a page fault handler that we set up and triggered by writing to an unmapped address.
=== IA-32e mode
Wikipedia seems to call it long mode: https://en.wikipedia.org/wiki/Long_mode
Contains two sub-modes: <<64-bit-mode>> and <<compatibility-mode>>.
This is controlled by the `CS.L` bit of the segment descriptor.
It appears that it is possible for user programs to modify that during execution from userland: http://stackoverflow.com/questions/12716419/can-you-enter-x64-32-bit-long-compatibility-sub-mode-outside-of-kernel-mode
TODO vs <<protected-mode>>.
=== 64-bit mode
64-bit is the major mode of operation, and enables the full 64 bit instructions.
Compatibility vs protected: https://stackoverflow.com/questions/20848412/modes-of-intel-64-cpu
== in and out instructions
x86 has dedicated instructions for certain IO operations: `in` and `out`.
These instructions take an IO address which identifies which hardware they will communicate to.
The IO ports don't seem to be standardized, like everything else: http://stackoverflow.com/questions/14194798/is-there-a-specification-of-x86-i-o-port-assignment
The Linux kernel wraps those instructions with the `inb` and `outb` family of functions:
....
man inb
man outb
....
=== Memory mapped vs port mapped IO
Not all instruction sets have dedicated instructions such as `in` and `out` for IO.
In ARM for example, everything is done by writing to magic memory addresses.
The dedicated `in` and `out` approach is called "port mapped IO", and the approach of the magic addresses "memory mapped IO".
From an interface point of view, I feel that memory mapped is more elegant: port IO simply creates a second address space.
TODO: are there performance considerations when designing CPUs?
TODO I think this counts down from the value in channel 0, and therefore allows scheduling a single event in the future.
The PIT can generate periodic interrupts (or <<pc-speaker,sound>>!) with a given frequency to `IRQ0`, which in real mode maps to interrupt 8 by default.
Major application: interrupt the running process to allow the OS to schedule processes.
The PIT has 3 channels that can generate 3 independent signals:
* channel 0 at port `40h`: generates interrupts
* channel 1 at port `41h`: not to be used for some reason
* channel 2 at port `42h`: linked to the speaker to generate sounds
Port `43h` is used to control signal properties except frequency, which goes in the channel ports, for the 3 channels.
We don't control the frequency of the PIT directly, which is fixed at `0x1234DD` Hz.
Instead, we control a frequency divisor. This is a classic type of discrete electronic circuit: https://en.wikipedia.org/wiki/Frequency_divider
The magic frequency comes from historical reasons to reuse television hardware according to link:https://wiki.osdev.org/Programmable_Interval_Timer[], which in turn is likely influenced by some physical properties of crystal oscillators.
The constant `1193181 == 0x1234DD` has 2 occurrences on Linux 4.16.
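The divisor arithmetic can be sketched as below. This is host-side C with hypothetical function names, not code from this repository. It assumes the usual 16-bit reload register, where the value 0 stands for a divisor of 65536, so the slowest rate is about 18.2 Hz.

```c
#include <stdint.h>

#define PIT_BASE_HZ 1193181u /* 0x1234DD */

/* Divisor that gets closest to target_hz from above, clamped to [1, 65536]. */
uint32_t pit_divisor(uint32_t target_hz)
{
    if (target_hz == 0)
        return 65536u;
    uint32_t divisor = PIT_BASE_HZ / target_hz;
    if (divisor < 1u)
        divisor = 1u;
    if (divisor > 65536u)
        divisor = 65536u;
    return divisor;
}

/* 16-bit value to write to the channel port: 65536 is encoded as 0. */
uint16_t pit_reload_value(uint32_t divisor)
{
    return (uint16_t)(divisor & 0xFFFFu);
}
```

For example, a target of 1000 Hz gives a divisor of 1193, so the real rate is slightly above 1000 Hz because of the integer division.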
Modes determine what interrupt functions can be used.
There are 2 main types of modes:
* text, where we operate character-wise
* video, where we operate byte-wise
Modes can be set with `int 0x10` and `AH = 0x00`, and read with `AH = 0x0F`.
The most common modes seem to be:
* 0x01: 40x25 Text, 16 colors, 8 pages
* 0x03: 80x25 Text, 16 colors, 8 pages
* 0x13: 320x200 Graphics, 256 colors, 1 page
You can add 128 to the modes to prevent them from clearing the screen.
Taken from: https://courses.engr.illinois.edu/ece390/books/labmanual/graphics-int10h.html
A larger list: http://www.columbia.edu/~em36/wpdos/videomodes.txt
See also: http://wiki.osdev.org/How_do_I_set_a_graphics_mode
=== Video mode 13h
https://en.wikipedia.org/wiki/Mode_13h
Example at: <<bios-draw-pixel>>
Video Mode `13h` has: 320 x 200 Graphics, 256 colors, 1 page.
The color encoding is just an arbitrary palette that fits 1 byte, it is not split colors like R R R G G G B B or anything mentioned at: https://en.wikipedia.org/wiki/8-bit_color. Related: http://stackoverflow.com/questions/14233437/convert-normal-256-color-to-mode-13h-version-color
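Since each pixel is one palette byte, locating a pixel is a single multiply and add. The sketch below runs on the host against an ordinary array standing in for the framebuffer. The function names are hypothetical, and it assumes the usual linear row-major layout of this mode.

```c
#include <stddef.h>
#include <stdint.h>

#define MODE13H_WIDTH  320u
#define MODE13H_HEIGHT 200u

/* Byte offset of pixel (x, y) from the start of the framebuffer. */
size_t mode13h_offset(unsigned x, unsigned y)
{
    return (size_t)y * MODE13H_WIDTH + x;
}

/* Store a palette index at (x, y). Returns 0 on success, -1 if out of bounds. */
int mode13h_put_pixel(uint8_t *framebuffer, unsigned x, unsigned y, uint8_t color)
{
    if (x >= MODE13H_WIDTH || y >= MODE13H_HEIGHT)
        return -1;
    framebuffer[mode13h_offset(x, y)] = color;
    return 0;
}
```

The whole screen is 320 * 200 = 64000 bytes, which fits in a single 64 KiB real mode segment.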
By Microsoft in 1995. Spec seems to be in RTF format...
Can't find the canonical URL. The Google result pointed to: http://download.microsoft.com/download/1/6/1/161ba512-40e2-4cc9-843a-923143f3456c/APMV12.rtf
All <<test-hardware,laptops I tested BIOS with>> had UEFI, so UEFI must have a BIOS emulation mode for backwards compatibility: https://www.howtogeek.com/56958/htg-explains-how-uefi-will-replace-the-bios/
Made by Intel, mostly MIT open source, which likely implies that vendors will hack away closed source versions.
link:https://mjg59.dreamwidth.org/10014.html[Matthew Garrett says] it is huge: larger than Linux without drivers.
Since it is huge, it inevitably contains bugs. Garrett says that Intel sometimes does not feel like updating the firmware with bugfixes.
UEFI offers a large API comparable to what most people would call an operating system:
* https://software.intel.com/en-us/articles/uefi-application mentions a POSIX C library port
* https://lwn.net/Articles/641244/ mentions a Python interpreter port!
ARM is considering an implementation https://wiki.linaro.org/ARM/UEFI
=== UEFI example
....
make -C uefi run
....
TODO get a hello world program working:
* http://www.rodsbooks.com/efi-programming/hello.html Best source so far: allowed me to compile the hello world! TODO: how to run it now on QEMU and real hardware?
Running without image gives the UEFI shell, and a Linux kernel image booted fine with it: link:http://unix.stackexchange.com/a/228053/32558[], so we just need to generate the image.
The blob `uefi/ovmf.fd` IA32 r15214 was downloaded from: https://sourceforge.net/projects/edk2/files/OVMF/OVMF-IA32-r15214.zip/download TODO: automate building it from source instead, get rid of the blob, and force push it away from history. Working build setup sketch: https://github.com/cirosantilli/linux-cheat/blob/b1c3740519eff18a7707de981ee3afea2051ba10/ovmf.sh
It seems that they have moved to GitHub at last: https://github.com/tianocore/tianocore.github.io/wiki/How-to-build-OVMF/e372aa54750838a7165b08bb02b105148e2c4190
Open source hippie freedom loving cross platform firmware that attempts to replace BIOS and UEFI for the greater good of mankind.
== GRUB
link:grub/README.adoc[] TODO cleanup and exemplify everything in that file. Some hosty stuff needs to go out maybe.
=== GRUB chainloader
....
make -C grub/chainloader run
....
Outcome: you are left in an interactive GRUB menu with two choices:
* `hello-world`: go into a hello world OS
* `self +1`: reload ourselves, and almost immediately reload GRUB and fall on the same menu as before
This example illustrates the `chainloader` GRUB command, which just loads a boot sector and runs it: https://www.gnu.org/software/grub/manual/grub/html_node/chainloader.html
This is what you need to boot systems like Windows which GRUB does not know anything about: just point to their partition and let them do the job.
Both of the menu options are implemented with `chainloader`:
* `hello-world`:
+
Loads a given image file within the partition.
+
After build, `grub-mkrescue` creates a few filesystems, and `grub/chainloader/iso/boot/main.img` is placed inside one of those filesystems.
+
This illustrates GRUB's awesome ability to understand certain filesystem formats, and fetch files from them, thus allowing us to pick between multiple operating systems with a single filesystem.
+
It is educational to open up the generated `grub/chainloader/main.img` with the techniques described at https://askubuntu.com/questions/69363/mount-single-partition-from-image-of-entire-disk-device/673257#673257 to observe that the third partition of the image is a VFAT filesystem, and that it contains the `boot/main.img` image as a regular file.
* `self +1`: uses the syntax:
+
....
chainloader +1
....
+
which reloads the first sector of the current partition, and therefore ourselves.
TODO: why does it fail for hybrid ISO images? http://superuser.com/questions/154134/grub-how-to-boot-into-iso-partition#comment1337357_154271
=== GRUB linux
TODO get working.
OK, let's have some fun and do the real thing!
....
make -C grub/linux run
....
Expected outcome: GRUB menu with a single `Buildroot` entry. When you select it, a tiny pre-built Linux image boots from: https://github.com/cirosantilli/linux-kernel-module-cheat
Actual outcome: after selecting the entry, nothing shows on the screen. Even if we fix this, we will then also need to provide a rootfs somehow: the `initrd` GRUB command would be a simple method, that repo can also generate initrd images: https://github.com/cirosantilli/linux-kernel-module-cheat/tree/c06476bfc821659a4731d49e808f45e8c509c5e1#initrd Maybe have a look under Buildroot `boot/grub2` and copy what they are doing there.
The GRUB command is of the form:
....
linux /boot/bzImage root=/dev/sda1 console=tty1
....
so we see that the kernel boot parameters are passed right there, for example try to change the value of the `printk.time` parameter:
....
printk.time=y
....
and see how the dmesg timestamps get printed or not: `y` turns them on and `n` turns them off.
Multiboot files are an extension of ELF files with a special header.
Advantages: GRUB does housekeeping magic for you:
* you can store the OS as a regular file inside a filesystem
* your program starts in 32-bit mode already, not 16 bit real mode
* it gets the available memory ranges for you
Disadvantages:
* more boilerplate
GRUB leaves the application in a well-defined starting state.
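The special header mentioned above starts with three 32-bit fields that must add up to zero. The sketch below assumes the Multiboot 1 specification (magic `0x1BADB002`), which may or may not be the version used by the examples here, and the struct and function names are hypothetical.

```c
#include <stdint.h>

#define MULTIBOOT_HEADER_MAGIC 0x1BADB002u /* Assumption: Multiboot 1. */

/* First three fields of the header (hypothetical struct name). */
struct multiboot_header {
    uint32_t magic;
    uint32_t flags;
    uint32_t checksum;
};

/* Build a header whose three fields sum to 0 modulo 2^32. */
struct multiboot_header multiboot_make_header(uint32_t flags)
{
    struct multiboot_header h;
    h.magic = MULTIBOOT_HEADER_MAGIC;
    h.flags = flags;
    h.checksum = 0u - (h.magic + flags);
    return h;
}

/* Check the magic and the checksum the way a bootloader could. */
int multiboot_header_valid(const struct multiboot_header *h)
{
    return h->magic == MULTIBOOT_HEADER_MAGIC
        && (uint32_t)(h->magic + h->flags + h->checksum) == 0u;
}
```

In an actual image the same three values would be emitted from the assembly source or linker script, and the C version is only meant to show the arithmetic.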
It seems that Linux does not implement Multiboot natively, but GRUB supports it as an exception: http://stackoverflow.com/questions/17909429/booting-a-non-multiboot-kernel-with-grub2
Then, when I moved to a new ThinkPad, I tested some of the examples on the link:https://www.cnet.com/products/lenovo-thinkpad-t400/specs/[Lenovo ThinkPad T430] I originally used to write this :-)
This repository covers only things that can only be done from ring 0 (system) and not ring 3 (userland).
Ring 3 is covered at: https://github.com/cirosantilli/x86-assembly-cheat
An overview of rings 0 and 3 can be found at: https://stackoverflow.com/questions/18717016/what-are-ring-0-and-ring-3-in-the-context-of-operating-systems/44483439#44483439
There are a few tutorials that explain how to make an operating system and give examples of increasing complexity with more and more functionality added: <<progressive-tutorials>>.
The goal of this repository is to use the minimal setup possible to be able to observe _a single_ low-level programming concept for each minimal operating system we create.
This is not meant to provide a template from which you can write a real OS, but instead to illustrate how those low-level concepts work in isolation, so that you can use that knowledge to implement operating systems or drivers.
Minimal examples are useful because they make it easier to see what a given concept requires in order to be observable.
Another advantage is that it is easier to DRY up minimal examples with macros or functions, which is much harder on progressive OS template tutorials, which tend to repeat big chunks of code between the examples.
But I have since changed my mind, and if I ever touch this again seriously, I would rewrite it in C based on <<c-hello-world>> and Newlib: https://electronics.stackexchange.com/questions/223929/c-standard-libraries-on-bare-metal/400077#400077
If this is done, this repo should then be merged into: https://github.com/cirosantilli/linux-kernel-module-cheat/tree/87e846fc1f9c57840e143513ebd69c638bd37aa8#baremetal-setup together with the ARM Newlib baremetal setups present there.
Using macros for now on link:common.h[] instead of functions because it simplifies the linker script.
But the downsides are severe:
* no symbols to help debugging. TODO: I think there are assembly constructs for that.
* impossible to step over method calls: you have to step into everything. TODO: `until`?
* larger output, supposing I can get linker gc for unused functions working, see `--gc-sections`, which is for now uncertain.
+
If I can get this working, I'll definitely move to function calls.
+
The problem is that if I don't, every image will need a stage 2 loader. That is not too serious though, it could be added to the `BEGIN`.
+
It seems that `ld` can only remove sections, not individual symbols: http://stackoverflow.com/questions/6687630/c-c-gcc-ld-remove-unused-symbols With GCC we can use `-ffunction-sections -fdata-sections` to quickly generate a ton of sections, but I don't think GAS supports that...
While NASM is a bit more convenient than GAS to write a boot sector, I think it is just not worth it.
When writing an OS in C, we are going to use GCC, which already uses GAS. So it's better to reduce the number of assemblers to one and stick to GAS only.
Right now, this directory is not very DRY since NASM is secondary to me, so it contains mostly some copy / paste examples.
On top of that, GAS also supports other architectures besides x86, so learning it is more useful in that sense.
Always try looking into the Linux kernel to find how those CPU capabilities are used in a "real" OS.
=== Pre-requisites
OS dev is one of the most insanely hard programming tasks a person can undertake, and will push your knowledge of several domains to the limit.
Knowing the following will help a lot:
* userland x86 assembly: https://github.com/cirosantilli/assembly-cheat
* compilation, linking and ELF format basics
* GDB debugging
While it is possible to learn those topics as you go along, and it is almost certain that you will end up learning more about them, we will not explain them here in detail.
* gem5 benchmarking and exploration, currently blocked on https://stackoverflow.com/questions/50364863/how-to-get-graphical-gui-output-and-user-touch-keyboard-mouse-input-in-a-ful/50364864#50364864
* automated unit tests. Ha, like I'm gonna be that diligent!
We are interested mostly in the "Intel Manual Volume 3 System Programming Guide", where system programming basically means "OS stuff" or "bare metal" as opposed to userland present in the other manuals.
This repository quotes by default the following revision: 325384-056US September 2015 https://web.archive.org/web/20151025081259/http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-system-programming-manual-325384.pdf
Has one big source tree that goes up to multitasking and a stdlib. Kernel written in C++ and stdlib in C. TODO check: 64-bit, ring 0 vs ring 3? `git grep rax` has no hits, so I'm guessing no 64-bit.
A list of ARM bare metal resources can be found at: https://github.com/cirosantilli/arm-assembly-cheat/tree/117f5d7d3458c028275ce112725f2e36f594f13c#bare-metal
https://www.gnu.org/licenses/gpl-3.0.txt[GPL v3] for executable computer program usage.
https://creativecommons.org/licenses/by-sa/4.0/[CC BY-SA v4] for human consumption usage in learning material, e.g. `.md` files, source code comments, using source code excerpts in tutorials. Recommended attribution:
* Single file adaptations:
+
....
Based on https://github.com/cirosantilli/x86-bare-metal-examples/blob/<commit-id>/path/to/file.md under CC BY-SA v4
....
* Multi-file adaptations:
+
....
Based on https://github.com/cirosantilli/x86-bare-metal-examples/tree/<commit-id> under CC BY-SA v4
....
If you want to use this work under a different license, contact the copyright owner, and he might make a good price.