PennOS Hardware Wrapper
OS Class
The OS class at my university has assignments and projects that run as applications under Linux, unlike OS classes at some other universities, where the code runs on baremetal and is immensely more difficult to debug.
The class was difficult, and I learned a lot about how Unix works (shells, pipes, redirection, signals, …) and about OSes in general (virtual memory, scheduling, caching, threads, …).
For the final project, the goal was to create a file system, scheduler and shell, which took a lot of effort and was an awesome learning experience.
So, what’s wrong?
I wasn’t satisfied with not learning the more hardware-y side of things, though. Abstractions are nice, but I wanted to dive a bit deeper. I had only five days before the deadline.
Let’s execute this bad boi on x86 baremetal.
Wait, how?
I used QEMU, an open source machine emulator that can emulate an x86 machine, to test and run my code, and a cross-compilation toolchain to compile it, so I didn’t have to link with any libraries (that’s why it’s baremetal :))
I also used GRUB, so I didn’t have to deal with nasty, super specific hardware details. GRUB also provides a lot of information about the machine, like the memory range.
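As a rough illustration of the kind of information GRUB hands over, here is a minimal sketch that assumes the kernel is booted through the Multiboot 1 protocol (this post doesn't say which protocol is used). Only the first three fields of the information structure are modeled, and the names multiboot_info_head and upper_memory_end are made up for this example.

```c
#include <stdint.h>

/* First three fields of the Multiboot 1 information structure.
   Assumption: the loader follows Multiboot 1; the struct name is illustrative. */
struct multiboot_info_head {
    uint32_t flags;     /* bit 0 set => mem_lower and mem_upper are valid */
    uint32_t mem_lower; /* KiB of memory starting at address 0 */
    uint32_t mem_upper; /* KiB of memory starting at 1 MiB */
};

/* Returns the exclusive end address of the upper memory region,
   or 0 if the boot loader did not report memory sizes. */
uint64_t upper_memory_end(const struct multiboot_info_head *info)
{
    if ((info->flags & 1u) == 0)
        return 0;
    return 0x100000ull + (uint64_t)info->mem_upper * 1024u;
}
```

On a real boot, the pointer to this structure is passed to the kernel entry point by the boot loader; here it is just an ordinary pointer so the logic can be tried out on any machine.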
To run on an actual machine, I used Rufus to write my image to a USB drive, since I’m on Windows, but you can also prepare the USB drive from the Linux CLI and boot from it like any other OS.
Please gimme details.
The final project had several dependencies.
Libraries
Remember, when booting on baremetal, linking with any library that is specific to an OS is a no-no.
No… I’m not kidding. The GNU C Library wraps system calls that are specific to the kernel.
Basically, I tried to implement every function I used by myself. So, for example, printf, malloc, and other standard library functions had to be implemented (as very simple implementations).
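To give a feel for how small such a replacement can be, here is a minimal sketch of a bump allocator. It is not the allocator from this project: the name simple_malloc, the fixed 4 KiB arena, and the lack of a free function are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define HEAP_SIZE 4096

/* Fixed arena standing in for a physical memory range (assumption:
   on real hardware the range would come from the boot loader). */
static _Alignas(8) uint8_t heap[HEAP_SIZE];
static size_t heap_used;

/* Hypothetical malloc replacement: hands out 8-byte aligned chunks
   from the arena and never frees them. */
void *simple_malloc(size_t size)
{
    size_t aligned = (size + 7u) & ~(size_t)7u; /* round up to 8 bytes */

    if (size == 0 || aligned < size || aligned > HEAP_SIZE - heap_used)
        return NULL; /* empty request, overflow, or arena exhausted */

    void *chunk = &heap[heap_used];
    heap_used += aligned;
    return chunk;
}
```

A bump allocator like this trades memory reuse for simplicity, which can be a reasonable deal when the deadline is days away.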
CPU specific stuff
Unfortunately, I won’t go into this much, as it would make this post 10x longer, but it takes a lot of CPU specific setup to get interrupts to work.
But, here’s the x86 OS dev link that provides a ton of information about everything OS dev related.
BTW, I only used physical memory and did not set up virtual memory.
Threading Library
Our “threading” (I guess you can call it that) library for the project was a combination of ucontext, even though it is deprecated, and the alarm signal.
ucontext is a user level thread context.
The alarm signal would be sent every x milliseconds, and in the signal handler, ucontext would fetch the correct context (containing the saved registers, the instruction being executed, and flags) and switch to that context. Think of this context as a very simplified version of a kernel thread structure such as the ETHREAD structure in the Windows kernel.
Here’s a really detailed explanation with cool pictures and everything, albeit in Chinese and explaining the x64 version.
It’s actually quite awesome how it’s implemented.
I made my version a bit simpler (it does not preserve signals, etc.).
In order to create a context, one needs to store the current CPU state (such as the register values) into a ucontext_t structure and allocate memory for the new stack. If this is confusing, I’ll provide an example below and details about the ucontext_t structure.
Everything that is required for the simplest ucontext_t structure (with help from the glibc source):
As you can see, it’s mostly about containing the register values and the stack.
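For reference, a structure along these lines could look like the sketch below. It is a guess based on the fields the assembly later in this post touches (ebx, ecx, edx, edi, esi, ebp, eip, cs, eflags); the field order, the extra eax and esp slots, the stack fields, and the name mini_ucontext_t are assumptions, not the original definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Saved 32-bit x86 register state (field order is an assumption). */
typedef struct {
    uint32_t eax, ebx, ecx, edx;
    uint32_t edi, esi, ebp, esp;
    uint32_t eip;    /* where execution continues */
    uint32_t cs;     /* code segment selector */
    uint32_t eflags; /* saved flags register */
} cpu_registers;

/* Hypothetical minimal context: registers, a stack, and a successor. */
typedef struct mini_ucontext {
    cpu_registers regs;
    void *stack_base;              /* lowest address of the context's stack */
    size_t stack_size;             /* size of that stack in bytes */
    struct mini_ucontext *uc_link; /* context to resume when this one ends */
} mini_ucontext_t;
```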
Example of making a context
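A minimal sketch of making a context, written against the glibc ucontext API on Linux rather than my baremetal version. The arguments 3 and 4, the 64 KiB stack, and the wrapper name run_context_demo are assumptions for illustration.

```c
#include <ucontext.h>

static int result;

/* The function the new context runs. */
static void do_something(int a, int b)
{
    result = a + b;
}

/* Runs do_something(3, 4) on its own stack and returns its result. */
int run_context_demo(void)
{
    static char stack[64 * 1024];
    ucontext_t main_ctx, ctx;
    volatile int resumed = 0;

    /* Execution comes back to this point (through uc_link) once
       do_something returns, so getcontext effectively returns twice. */
    getcontext(&main_ctx);
    if (resumed)
        return result;
    resumed = 1;

    getcontext(&ctx);                  /* fill ctx with the current state */
    ctx.uc_stack.ss_sp = stack;        /* give the context its own stack */
    ctx.uc_stack.ss_size = sizeof stack;
    ctx.uc_link = &main_ctx;           /* where to go when the function ends */
    makecontext(&ctx, (void (*)(void))do_something, 2, 3, 4);

    setcontext(&ctx);                  /* switch; does not return on success */
    return -1;                         /* reached only if setcontext failed */
}
```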
In the example above, we call getcontext(...) to store information such as the current register values into the ucontext_t struct.
Then, we call makecontext(...) to set up a function call to do_something(...) on the allocated stack.
setcontext(...) actually switches to the other context.
WARNING: x86 assembly below
global getcontext
getcontext:
;get parameter (ucontext_t* ucp)
mov eax, [esp + 4]
;eax not preserved
;save general purpose registers
;mov [eax + ucontext_t.eax], 0
mov [eax + ucontext_t.ebx], ebx
mov [eax + ucontext_t.ecx], ecx
mov [eax + ucontext_t.edx], edx
mov [eax + ucontext_t.edi], edi
mov [eax + ucontext_t.esi], esi
mov [eax + ucontext_t.ebp], ebp
...
ret
To summarize, all this does is save the register values into the cpu_registers struct.
Now for makecontext(...)
global makecontext
makecontext:
;get parameter (ucontext_t* ucp)
mov eax, [esp + 4]
;get parameter (void (*function)())
mov ecx, [esp + 8]
;store the function of what we want to run
mov [eax + ucontext_t.eip], ecx
;get parameter (argc)
mov edx, [esp + 12]
...
;get each parameter and populate stack
;jump if ecx == 0
jecxz .done_args
.more_args:
;get from rightmost to leftmost
mov eax, [esp + ecx * 4 + 12]
mov [edx + ecx * 4], eax
dec ecx
jnz .more_args
...
ret
To summarize again, makecontext(...) pushes everything onto the new stack, so it’s ready to swap.
Here’s the stack when makecontext is done. As you can see, it contains the parameters and return address. (The function address is stored in ucontext_t.eip)
;void makecontext(ucontext_t *ucp, void (*func)(), int argc, ...);
;https://code.woboq.org/userspace/glibc/sysdeps/unix/sysv/linux/i386/makecontext.S.html
;top of stack
;-------------
;ucontext_termination (return address)
;-------------
;parameters to func
;-------------
;termination function
;-------------
;uc_link
;-------------
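The same layout can be mimicked in plain C by writing 32-bit words into an array, which makes the ordering easy to check without booting anything. This is only a model of the diagram above: the function name build_initial_stack, its parameters, and the reading of each slot are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Fills the top of `stack` (stack_words 32-bit slots, index 0 = lowest
   address) following the diagram above and returns the index of the slot
   esp would point at. Assumes stack_words >= argc + 3. */
size_t build_initial_stack(uint32_t *stack, size_t stack_words,
                           uint32_t return_addr, uint32_t termination_fn,
                           uint32_t uc_link,
                           const uint32_t *args, size_t argc)
{
    size_t top = stack_words;         /* one past the highest slot */

    stack[--top] = uc_link;           /* highest address: successor context */
    stack[--top] = termination_fn;    /* termination function slot */
    for (size_t i = argc; i > 0; i--) /* rightmost parameter first */
        stack[--top] = args[i - 1];
    stack[--top] = return_addr;       /* top of stack: return address */
    return top;
}
```

With two parameters on an eight-slot stack, the return address lands in slot 3, followed by the parameters from left to right, which matches how a cdecl function expects to find them right after its return address.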
setcontext(...) actually switches to the other context.
global setcontext
setcontext:
...
;push return address context's eip
mov ecx, [eax + ucontext_t.eip]
push ecx
mov ecx, [eax + ucontext_t.cs]
;mov cs, cx ;doing so results in a general protection fault?
;restore eflags
mov ecx, [eax + ucontext_t.eflags]
push ecx
popf
;eax not preserved
;restore general purpose registers
;mov [eax + ucontext_t.eax], 0
mov ebx, [eax + ucontext_t.ebx]
mov ecx, [eax + ucontext_t.ecx]
mov edx, [eax + ucontext_t.edx]
mov edi, [eax + ucontext_t.edi]
mov esi, [eax + ucontext_t.esi]
mov ebp, [eax + ucontext_t.ebp]
;return to the ucontext's eip we pushed earlier
ret
To summarize again, it restores all the register values and uses a push/ret technique: it pushes the function address onto the stack and then uses ret to return to that address, which is pretty neat.
Here is the state of the stack before ret is executed.
;-------------
;function to execute (from ucontext's eip, which is set by makecontext)
;------------- (assume everything below here is set by makecontext)
;ucontext_termination (return address)
;-------------
;parameters to func
;-------------
;termination function
;-------------
;uc_link
;-------------
I learned a lot from doing this project on my own and it was pretty cool to see my own application run on a machine by itself! :)