Lab 5: Getting Familiar with WeensyOS

Due March 10, 2020, at 8:00PM

Introduction

In this lab, we will introduce you to kernel programming in WeensyOS. We will the give you a high-level overview of virtual memory, and walk you through “memory iterators”, which you’ll use in Project 3. Your tasks will involve modifying the WeensyOS kernel to start some processes on bootup, making those processes allocate memory on the heap, and then making a copy of the kernel memory mappings. This stencil will look very similar to the stencil you’ll use for Project 3.

Assignment

You must run this assignment on a Linux machine, such as your course VM. Running on Mac, a Windows box without a VM, or on the department machine is unlikely to work!

Assignment installation

First, ensure that your repository has a handout remote. Type:

$ git remote show handout

If this reports an error, run:

$ git remote add handout https://github.com/csci1310/cs131-s20-labs.git

Then run:

$ git pull
$ git pull handout master

This will merge our Lab 5 stencil code with your previous work. If you have any “conflicts” from Lab 4, resolve them before continuing further. Run git push to save your work back to your personal repository.

Starting up WeensyOS

WeensyOS can, in principle, run on any computer with an x86-64 CPU architecture (like yours, most likely). For this lab, however, you will run the OS in QEMU, a CPU emulator. QEMU makes it possible to easily restart and debug WeensyOS, and you will use it in Project 3 as well.

Task: Run WeensyOS! To do so, run:

make run in the lab5 directory of your lab repository.
OR make run-console if QEMU’s default display causes accessibility problems.

When running these commands, you may receive an error telling you to install a package called qemu-system-x86 in your course VM. If so, go ahead and install it using the command quoted in the error message (sudo apt-get -y install qemu-system-x86).

You should see a screen similar to the one below, which indicates that the WeensyOS kernel has booted on the computer simulated by QEMU.

Nothing much is happening here! To exit, press Ctrl-C.

Now, let’s look at some code. Since WeensyOS is a complete operating system, the stencil code for this lab is quite large – but rest assured, you will only need to look at a few files for now.

Open kernel.cc in your editor. This file contains the implementation of the WeensyOS kernel. Find the function void kernel(const char* command). This function is what runs once the bootloader hands over control to the WeensyOS kernel. At the bottom of the function, you will find the message you saw on the screen.

Optional: How does kernel() get called?

If you’re curious how the boot process works, take a look at boot.cc and read the comment at the top of the file. You don’t have to understand the details of the bootup process for either this lab or the WeensyOS project.

In a nutshell:

When the computer turns on, it runs the code in bootentry.S, which eventually calls the void boot() function in boot.cc.

The bootloader reads the kernel binary (in Executable and Linkable (ELF) format) from disk into memory.

It then jumps to the kernel_entry label in the assembly code in k-exception.S.

At the end of that short bit of assembly code, the jmp _Z6kernelPKc instruciton jumps to the location at which the compiler placed the kernel function (_Z6kernelPKc is the mangled symbol name for kernel).

Starting Processes in WeensyOS

Now let’s start some processes! Unlike the OS on your computer, the basic WeensyOS we’re using for this lab does not allow the user to specify what program they would like to run. Instead, it will run a fixed set of programs at bootup.

Take a look at process_setup in kernel.cc:

void process_setup(pid_t pid, const char* program_name)

This function is responsible for setting up kernel state for a new userspace process: for example, it loads the program (via the program_loader, an ELF binary loader we’re providing you with). It then sets the instruction pointer and stack pointers, and finally marks the new process as runnable. Read through it now; you will need to understand and modify this function in Project 3.

Task: In the kernel function in kernel.cc, we’ll call process_setup to set up a new process:

Remove the console_printf call in the kernel function, and instead add a call to process_setup with these paramters:
- pid_t pid: You will need to pass a process ID (PID). PIDs for user processes will start at 1, as the kernel is PID 0.
- const char* program_name: Use "allocator" as your program_name. The WeensyOS programmer loader recognizes a specific set of program names. For example, when asked to load "allocator" into memory, the program loader will load compiled source code based on "p-allocator.cc" into a specific address of physical memory.
Finally, you need to tell the kernel to actually run a process and jump to user mode. For that, call run(&ptable[1]), which starts PID 1.

Once you complete this step and run your WeesyOS, you should see a screen similar to the one below.

Whoa ! WeensyOS is displaying some helpful information to you here. Let’s look at what we’ve got. The output visualizes the contents of the machine’s physical memory (i.e., RAM).

The addresses listed on the left are the physical memory addresses of the leftmost character on each line.
Each character represents a 4 KB region of memory (a “page” of physical memory), and its color and letter indicate what the memory is used for.

The picture below explains the meaning of different characters. Your current WeensyOS will only have characters for the (red) process 1.

Looking more closely at the regions of memory that are in use here, the second picture explains what they’re for. You’ll see that each process has code and data pages, which the program loader puts in memory, and a stack (at the far end of its row), which process_setup put in place

Now that you understand the output, let’s have some more processes.

Task: Create three more processes in the kernel function. Use program_names "allocator2", "allocator3", and "allocator4" when you start them. This makes the program loader load the same compiled source code, p-allocator.cc, but into different regions of memory.

Note: you can leave the call to run(&ptable[1]) unmodified. It just asks the kernel to enter the kernel process scheduler and try to schedule process 1. If process 1 isn’t runnable or yields back, the kernel will automatically pick another process to run.

Once you’ve completed this step, you should see the following output on the screen.

It looks like the processes are running, but they’re not actively using any heap memory. The code for your processes is in p-allocator.cc:

p-allocator.cc

This code runs in user space (i.e., outside the kernel). Right now, each process figures out where its heap and stack are, and then loops doing nothing. We can make the processes allocate some memory using the sys_page_alloc system call.

The signature and specification of sys_page_alloc is in u-lib.hh. Go and read it.

int sys_page_alloc(void* addr)

The implementation simply invokes make_syscall, which executes the inline assembly code in the same file. You don’t need to understand this code in detail, but note that it invokes the syscall x86-64 assembly instruction. This instruction transfers control into the kernel.

Optional: How does the WeensyOS kernel handle the system call?

The syscall instruction raises a trap, which is ultimately handled by uintptr_t syscall(regstate* regs) in kernel.cc. The switch statement in that function figures out what system call was made (you can see several others there!), and then calls syscall_page_alloc.

syscall_page_alloc is implemented in kernel.cc, and it’s different from the user-space function sys_page_alloc (which we saw in u-lib.hh). syscall_page_alloc has a very basic implementation in this lab, which you’ll modify in the project!

Task: Change p-allocator.cc to loop allocating successive heap pages via sys_page_alloc, as specified in process_main.

If you run WeensyOS after completing this change, you should see your four processes rapidly fill up the memory from left to right, with the end result shown below.

If you instead see WeensyOS crash with a scary error similar to the picture below, think about when the loop allocating pages needs to stop, and what could go wrong if it does not!

Now, once you’ve got this right, let’s figure out why some numbers are white. The processes asked for memory and the kernel allocated it, but they haven’t actually done anything with that memory. The white numbers indicate that there is no data in the first four bytes of the page (they are all zeros, as the kernel zeros the page when allocating it).

Task: Make your p-allocator.cc write the PID of the current process into the first four bytes of each page it allocates.

Hint:

Any physical page is directly accessible to userspace in this version of WeensyOS. Consider dereferencing a pointer to write to a page.

Now your output should be all colored:

So far, the console is only showing you the physical memory. In the lab version of WeensyOS – as well as in the initial stencil for Project 3 – there is no isolation between processes. In other words, process 1 could totally trample all over the memory of the other processes! In Project 3, you’ll implement virtual memory to support better process isolation.

Virtual Memory

Virtual memory is the hardware mechanism that allows multiple processes on a computer to safely share physical memory. “Safely” here means that processes should not be able to write all over other processes’ memory – and that includes the kernel’s memory, as the kernel is also a process (with PID 0 in WeensyOS). You will have learned (or are soon about to learn) about virtual memory in lectures!

Need a refresher on virtual memory? Expand to read the details.

The kernel — which needs complete privilege over all operations of the machine, and must be protected at all costs — must run without inappropriate interference from user-level processes, which have no privilege and might have bugs.

The kernel’s instructions and data are stored in the computer’s physical memory. We need some way to partition the physical memory so that some parts of it are inaccessible to unprivileged processes. Different operating systems have different memory layouts, but for concreteness, let’s look at a specific layout based on the one from WeensyOS.

Here’s how “physical memory” might be laid out:

+----------------------------------------------------------------------------+
|          |   Kernel data   |               | Unprivileged data |           |
+----------------------------------------------------------------------------+
           ^                                 ^
        0x40000                          0x100000

How can we protect the kernel’s data from access by unprivileged processes? One natural solution is segmentation. That is, the kernel program can access any kind of address, whereas user programs can only access a specific range of address. If the user program violates this rule, then it needs to give control back to the kernel using fault (this is where segmentation fault comes from!).

However, the problem with segmentation is its inflexibility. For example, suppose that process P1 wants to allocate more heap memory from the operating system. If this is the current memory map:

+----------------------------------------------------------------------------+
|..........|      K      |...............|     P1     |......................|
+----------------------------------------------------------------------------+
           ^                             ^
        0x40000                       0x100000

where the dots are free, then it’s safe to allocate more memory for P1:

+----------------------------------------------------------------------------+
|..........|      K      |...............|     P1          |.................|
+----------------------------------------------------------------------------+
           ^                             ^
        0x40000                       0x100000

But what if another program, P2, was allocated close by?

+----------------------------------------------------------------------------+
|..........|      K      |...............|     P1     |..|    P2    |........|
+----------------------------------------------------------------------------+
           ^                             ^
        0x40000                       0x100000

Of course, P2’s data should be inaccessible to P1, and vice versa. Unfortunately, if we want to give each process a contiguous block of memory, this means that now P1’s ability to grow is limited by the nearby placement of P2.

We therefore need a new layer of indirection, and this layer is called virtual memory. The purpose of the layer is to divorce the addresses visible to process instructions from the addresses in physical memory. We want a flexible mapping between the addresses visible to processes, which we call virtual addresses, and the addresses corresponding to physical memory chips. Since different processes have different permissions to access memory, they must have different virtual address spaces.

For example, here’s a view of memory with two processes P1 and P2 (where P1 contains some data “AB”):

                      0x90000
                         v
                +--------------------------------------+
                |........| P1  "AB" |..................|
                +--------------------------------------+
                          \        
                           \                           0x90000
                            \                             v
                             \                    +-----------------------+
                              \                   |.......| P2 |..........|
                               \                  +-----------------------+
                                \                        /
                                 \                      /
                                  \                    /
                                   \                  /
     +-------------------------------------------------------------------+
     |....|  K   |..................| P1  "AB" |....| P2 |...............|
     +-------------------------------------------------------------------+
                                    ^               ^
                                0x100000        0x140000

Note that the addresses in these different address spaces can differ from physical addresses, and in fact, the same virtual address, when accessed in different processes, can refer to different physical memory. Now, if P1 wants to borrow more memory for some new data “CD”, it doesn’t matter that P2 is restricting its growth in contiguous physical memory. The kernel can allocate P1 new physical memory from anywhere, but make it appear contiguous to P1 by arranging mappings carefully:

                      0x90000    0x120000
                         v          v
                +--------------------------------------+
                |........| P1  "AB" "CD" |.............|
                +--------------------------------------+
                          \        /
                           \      /                    0x90000
                            \    /                        v
                             \  /                 +-----------------------+
                              \/                  |.......| P2 |..........|
                              /\                  +-----------------------+
                             /  \                        /
                            /    \                      /
                           /      \                    /
                          /        \                  /
     +-------------------------------------------------------------------+
     |....|  K   |.......|"CD"|......| P1  "AB" |....| P2 |..............|
     +-------------------------------------------------------------------+
                         ^            ^               ^
                     0x70000      0x100000        0x140000

Let’s enable displaying virtual memory on the console.

Task: In kernel.cc, edit the memshow function and change the value of show_virtual to 1 to display the virtual address space. Then run your WeensyOS.

You’ll now see a picture like the following (though your processes might allocate memory faster).

Our current WeensyOS has no virtual memory at all. The bottom part of the console output cycles through the virtual memory view for your four processes in turn. Evidently, all processes see identical views of memory, and that view is exactly what’s in physical memory. Moreover, the characters in the virtual memory display are inverted, which indicates that the process is allowed to access the corresponding address. This implies that memory is not properly isolated: for example, all four processes can access the kernel memory (K), and process 1 can access process 2’s data (2)!

In Project 3, you will add memory isolation via virtual memory. To do so, you will give processes their own page tables, which translate virtual addresses (which the user-space processes use) into physical addresses (in RAM).

Page Tables

We know how important and useful virtual memory is — but how do we actually translate a given virtual address to the underlying physical address? This is where page tables come in. A page table is a tree-like data structure consisting of pages that store mappings between virtual addresses and physical addresses. When a processes tries to access a virtual address, the CPU looks up the real, physical address in the page table and translates the address that actually gets accessed.

Many processors use multi-level page tables to perform translations. Specifically, x86-64 and WeensyOS have a four-level page table. Recall that each physical page is 2¹² (4096) bytes in size (= 4KB), and since each address is 64 bits (= 2³), each page can contain 2⁹ addresses.

In a four-level page table, the top-level (level 4) page table contains addresses of level 3 page tables. Each level 3 page table contains addresses of level 2 page tables, each level 2 page table contains addresses of level 1 page tables, and lastly, each level 1 page table contains actual physical addresses.

We won’t go into the specifics of how exactly virtual address translation is performed here, and you do not need to know them for this lab and the project. But if you’re curious, definitely refer to lecture materials, research on your own, or ask the TAs!

Memory Iterators

WeensyOS abstracts the details of x86-64 page tables away from you, and instead gives you high-level interfaces for dealing with virtual memory and processing page tables: vmiter and ptiter, which are C++ iterators that allow you to conveniently loop over virtual memory mappings and page tables. The vmiter helps to iterate through pages in virtual memory, and lets you check the underlying, destination physical pages for properties like permissions and physical addresses. On the other hand, the ptiter helps you iterate through page table pages (yes, page tables are themselves stored in pages – it’s pages all the way down!).

We picked out some useful methods for vmiter to walk through together below (to see the full declaration and implementation of the interfaces, see k-vmiter.hh and k-vmiter.cc). There’s a couple of methods not mentioned here, but feel free to read about them on your own and use them wherever you see fit.

`vmiter`

class vmiter {
  public:
    // initialize a `vmiter` for page table `pt`,
    // pointing at virtual address `va`
    inline vmiter(x86_64_pagetable* pt, uintptr_t va = 0);
    // initialize a `vmiter` for process `p`'s page table,
    // pointing at virtual address `va`
    inline vmiter(const proc* p, uintptr_t va = 0);

    inline uintptr_t va() const;      // current virtual address
    inline uint64_t pa() const;       // current physical address

    inline uint64_t perm() const;     // current permissions
    inline bool present() const;      // is va present?
    inline bool writable() const;     // is va writable?
    inline bool user() const;         // is va user-accessible (unprivileged)?

    inline vmiter& operator+=(intptr_t delta);  // advance `va` by `delta`

    // map current va to `pa` with permissions `perm`
    // Current va must be page-aligned. Calls kalloc() to allocate
    // page table pages if necessary. Panics on failure.
    void map(uintptr_t pa, int perm);

    // like `map`, but returns 0 on success and -1 on failure
    int try_map(uintptr_t pa, int perm);
};

You can initialize a vmiter object by passing in either a process (in which case it will figure out the page table for that process), or an x86-64 page table. Here is an example that checks if the address 0x040000 is mapped in the kernel page tables (you can try this example somewhere in your kernel.cc!):

// `kernel_pagetable` is in kernel.hh
x86_64_pagetable* pt = kernel_pagetable;
uintptr_t pa = vmiter(pt, (uintptr_t)0x040000).pa(); 
// returns uintptr_t(-1) if unmapped
assert(pa != (uintptr_t)-1);

The vmiter(pt, (uintptr_t)0x040000) part creates a vmiter object that examines the virtual address 0x040000 in the page table pt. The pa() method on this object returns the corresponding physical address pa.

The vmiter can also find out the permission associated with the current page it is looking at. In x86-64.h, the following permission flags (bits) are defined:

PTE_P should be set if a page is present in a page table
PTE_W should be set if a page is writable
PTE_U should be set if a page is user-accessible

The vmiter can then use the following functions to determine the permissions of the current page:

present() returns if the current virtual address is present (i.e., if the PTE_P flag is set)
writable() returns if the current virtual address is present and writable (i.e., if the PTE_P | PTE_W flag is set)
user() returns if the current virtual address is present and user-accessible (i.e., if the PTE_P | PTE_U flag is set)

For example, this snippet of code checks if the va is present and writable (PTE_P | PTE_W) in pt:

if (vmiter(pt, va).writable()) {
    ...
}

You can also use vmiter with a for loop, calling methods that query its state and methods that change its current virtual address. For example, this loop prints all present mappings in the lower 64KiB of memory:

for (vmiter it(pt, 0); it.va() < 0x10000; it += PAGESIZE) {
    if (it.present()) {
        log_printf("%p maps to %p\n", it.va(), it.pa());
    }
}

Note that this loop iterates one page at a time: the it += PAGESIZE expression increases it.va() by PAGESIZE.

The map() and try_map() methods are used to add mappings to a page table. They also allocate actual page table pages on demand (e.g., the second and third level page tables mentioned above). The key difference between try_map and map is that try_map returns -1 on failure, while map panics on failure. In Project 3, it’s up to you to decide which is appropriate to call! The following snippet maps virtual address 0x2000 to the physical address 0x3000 with all permissions in the page table pointed to by pt:

int r = vmiter(pt, 0x2000).try_map(0x3000, PTE_P | PTE_W | PTE_U);
// r == 0 on success, r < 0 on failure

Task:
Now that you’re more familiar with how vmiter works, use vmiter to implement a function that copies virtual memory mappings from a source page table to a destination page table.

Write this function inside kernel.cc in your WeensyOS codebase, and use the global kernel_pagetable as your source, and global kernel_pagetable_copy as your destination. Call copy_mappings somewhere in the kernel function after the kernel_pagetable is initialized.

void copy_mappings(x86_64_pagetable* dst, x86_64_pagetable* src) {
    // Copy all virtual memory mappings from `src` into `dst`
    // for addresses in the range [0, MEMSIZE_VIRTUAL).
    // You may assume that `dst` starts out empty (has no mappings).
    
    // For our grading purposes, use the following line to print out
    // the physical and virtual addresses you're mapping, as well as the 
    // present, writable, and user-accessible permission bits (in that order): 
    // log_printf("VA %p maps to PA %p with PERMS %p, %p, %p\n", ...);
    //
    // Make sure to use the exact same format string above, but fill
    // out the rest of the line.

    // After this function completes, for any virtual address `va` with
    // 0 <= va < MEMSIZE_VIRTUAL, `dst` and `src` should map that va
    // to the same physical address with the same permissions.
}

Hint:

In order to access the specific bits of the permissions, recall the PTE_P, PTE_W, and PTE_U flags

When is copying mappings from one page table to another useful?

When you start a new process with separate page tables for memory isolation (you’ll do this in Project 3!), you will want to give it a copy of the kernel’s or parent process’s page table by copying that page table into a new one. This is also important when you implement the fork system call in Project 3 (however, the copying process will be more involved than the above function)!

Handin instructions

Turn in your code by pushing your git repository to github.com/csci1310/cs131-s20-labs-YOURNAME.git.

Then, head to the grading server. On the “Labs” page, use the “Lab 5 checkoff” button to check off your lab.

Note that the checkoff for this lab runs for almost a minute without producing output, as it runs your WeensyOS in the background. Don’t be alarmed if the checkoff script seems “stuck” – it will complete after about a minute and show you the results.

Acknowledgements: WeensyOS was originally developed for Harvard’s CS 61 course. This lab was developed for CS 131.