Project Eight Solution

Purpose

In this project you will figure out how to turn on ARM’s virtual memory system and run at least two threads in separate virtual address spaces, so that the “same” virtual addresses map to completely different physical locations. Virtual memory underlies many of computing’s most important facilities, including process protection, shared memory, multitasking, the kernel’s privileged mode, the familiar virtual-machine programming model, and more. It is essential to most operating systems, especially general-purpose operating systems. Your implementation will be very simple but will have all of the essentials, including shared pages (two different virtual pages mapping to the same physical page), different mapping characteristics for different pages, etc. This is as real as it gets. With this, you will have built all of the primary functions one finds in a modern operating system.

You will read in application binaries from the SD card to start threads, and you will do this both for the startup thread (the shell) and in response to “RUN” commands executed in the shell, which will start up either or both of the “app1” and “app2” binaries. The difference between this project and the previous one is that, whereas in the previous project each of the applications was hard-coded at build time to run in a predefined memory location (something that is not really practical in a general-purpose machine), in this project each application has its code and data start at location 0x00100000 and its stack start at location 0x7FFFFFF0. Thus, to run two different user-level threads, you need to have a separate page table for each process and to figure out how to tell the ARM processor about two different ASIDs (Address Space Identifiers).

Working Example

You have been given a working binary file to experiment with. The following is its boot sequence.

[c0|00:01.957] ...
[c0|00:01.959] System is booting, kernel cpuid = 00000000
[c0|00:01.964] Kernel version [p8-solution, Mon Apr 22 20:45:29 EDT 2019]
[c0|00:01.971] Initializing SD Card ...
[c0|00:01.975] EMMC: reset card.
[c0|00:01.978] EMMC: setting clock speed to    00061A80
[c0|00:01.983] GO_IDLE_STATE 00000000
[c0|00:01.986] SEND_IF_COND 000001AA
[c0|00:01.989] APP_CMD 00000000
[c0|00:01.992] SD_SENDOPCOND 50FF8000
[c0|00:02.396] APP_CMD 00000000
[c0|00:02.399] SD_SENDOPCOND 50FF8000
[c0|00:02.403] ALL_SEND_CID 00000000
[c0|00:02.406] SEND_REL_ADDR 00000000
[c0|00:02.409] SEND_CSD AAAA0000
[c0|00:02.412] EMMC: setting clock speed to    017D7840
[c0|00:02.417] CARD_SELECT AAAA0000
[c0|00:02.420] APP_CMD AAAA0000
[c0|00:02.423] SEND_SCR 00000000
[c0|00:02.429] SET_BLOCKLEN 00000200
sdTransferBlocks read blk 00000000 len 00000001 addr 0002BD80
[c0|00:02.437] READ_SINGLE 00000000
sdTransferBlocks read blk 00002000 len 00000001 addr 0002BD80
[c0|00:02.450] READ_SINGLE 00002000
[c0|00:02.464] ... SD Card working.
[c0|00:02.467] Starting virtual memory ...
[c0|00:02.471] TTBCR before = 00000000

© Copyright 2020 Bruce Jacob, All Rights Reserved
ENEE 447: Operating Systems — Project Eight
[c0|00:02.475] Initialize DACR
[c0|00:02.478] Initialize SCTLR.AFE
[c0|00:02.481] SCTLR before AFE = 00C51838
[c0|00:02.485] Setting page table to 00030000
[c0|00:02.489] PTE[0] = 00026C0A
[c0|00:02.492] PTE[1] = 00126C0A
[c0|00:02.495] SCTLR before = 00C51838
[c0|00:02.498] SCTLR after = 00C5183D
[c0|00:02.502] ... VM up and running
[c0|00:02.505] Calling create_thread
[c0|00:02.509] NULL thread 00000000
[c0|00:02.512] tcb = 00013DF4
[c0|00:02.515] stack = 0001FFFC
[c0|00:02.518] start = 00000040
[c0|00:02.521] ttbr0 = 0003004A
[c0|00:02.524] asid = 00000000
[c0|00:02.527] PTE[0] = 00026C0A
[c0|00:02.530] PTE[1] = 00126C0A
[c0|00:02.533] PTE[2] = 00226C0A
[c0|00:02.536] Calling create_thread
sdTransferBlocks read blk 00003DCA len 00000001 addr 000077A8
[c0|00:02.545] READ_SINGLE 00003DCA
LocateFATEntry: [shell.bin]
sdTransferBlocks read blk 00003DCB len 00000001 addr 000077A8
[c0|00:02.559] READ_SINGLE 00003DCB
sdTransferBlocks read blk 00027A0A len 00000001 addr 000077A8
[c0|00:02.570] READ_SINGLE 00027A0A
[c0|00:02.576] create success 00000001
sdTransferBlocks read blk 00027A0B len 00000001 addr 000077A8
[c0|00:02.585] READ_SINGLE 00027A0B
sdTransferBlocks read blk 00027A0C len 00000001 addr 000077A8
[c0|00:02.596] READ_SINGLE 00027A0C
sdTransferBlocks read blk 00027A0D len 00000001 addr 000077A8
[c0|00:02.608] READ_SINGLE 00027A0D
sdTransferBlocks read blk 00027A0E len 00000001 addr 000077A8
[c0|00:02.619] READ_SINGLE 00027A0E
sdTransferBlocks read blk 00027A0F len 00000001 addr 000077A8
[c0|00:02.631] READ_SINGLE 00027A0F
[c0|00:02.637] create_thread - successful file read into 00200000
[c0|00:02.642] new thread from disk:
[c0|00:02.646] shell.bin 00200000
[c0|00:02.649] shell 00000001
[c0|00:02.651] tcb = 00013E5C
[c0|00:02.654] stack = 7FFFFFF0
[c0|00:02.657] start = 00100000
[c0|00:02.660] ttbr0 = 0003404A
[c0|00:02.663] asid = 00000001
[c0|00:02.666] PTE[0] = 00000000
[c0|00:02.669] PTE[1] = 00226C0A
[c0|00:02.673] PTE[2] = 00000000
[c0|00:02.676] ...
[c0|00:02.677] Init complete. Please hit any key to continue.

<hit enter>



Running the eggshell on core 0.
Available commands:
    RUN  = 004E5552
    PS   = 00005350
    TIME = 454D4954
    LED  = 0044454C
    LOG  = 00474F4C
    EXIT = 54495845
    DUMP = 504D5544
Please enter a command.
c0> PS






CMD_PS
Active processes ...
[c0|00:29.279]
[c0|00:29.282] Dumping TCB for thread 00000001
[c0|00:29.286] shell 00000001
[c0|00:29.289] tcb @ 00013E5C
[c0|00:29.291] r0   00000001
[c0|00:29.294] r1   0000000A
[c0|00:29.297] r2   00005350
[c0|00:29.300] r3   00005350
[c0|00:29.303] r4   7FFFFFB8
[c0|00:29.305] r5   00000000
[c0|00:29.308] r6   00000000
[c0|00:29.311] r7   00000009
[c0|00:29.314] r8   00000000
[c0|00:29.317] r9   00100BA8
[c0|00:29.319] r10  00000000
[c0|00:29.322] r11  504D5544
[c0|00:29.325] r12  7FFFFFB2
[c0|00:29.328] sp   7FFFFF94
[c0|00:29.331] lr   001008E8
[c0|00:29.333] pc   00100270
[c0|00:29.336] spsr 60000150
[c0|00:29.339] ttbr 0003404A
[c0|00:29.342] asid 00000001
Please enter a command.
c0>



A few things to note from this. The following lines show that the bottom two bits of the kernel’s PTEs are 0b10, which indicates that the pages are mapped at a “section” level, meaning 1MB pages (this simplifies the mapping scheme tremendously). They also show the setting of the “not-global” (nG) bit, which is bit 17 (the bit at 0x00020000) and controls whether a mapping is shared across all address spaces or tagged with a particular ASID. It is set to 1 in these entries; note that in the ARM Architecture Reference Manual, nG = 0 marks a mapping as global (shared across all code) and nG = 1 marks it as specific to the current ASID.

[c0|00:02.489] PTE[0] = 00026C0A

[c0|00:02.492] PTE[1] = 00126C0A
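As a rough illustration of how these values break down, the sketch below decodes a section entry into its fields, following the ARMv7 short-descriptor layout. The struct and function names are invented for this example and are not part of the project code.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a first-level "section" descriptor (ARMv7 short-descriptor format).
 * The struct and function names are hypothetical, for illustration only. */
typedef struct {
    uint32_t type;      /* bits [1:0]: 0b10 means a section-type entry */
    uint32_t b;         /* bit 2: bufferable */
    uint32_t c;         /* bit 3: cacheable */
    uint32_t ap;        /* bits [11:10]: access permissions AP[1:0] */
    uint32_t tex;       /* bits [14:12]: memory-type extension */
    uint32_t ng;        /* bit 17: not-global */
    uint32_t phys_base; /* bits [31:20]: physical base of the 1MB section */
} section_pte_fields;

static section_pte_fields decode_section_pte(uint32_t pte)
{
    section_pte_fields f;

    /* This sketch only handles entries whose bottom two bits are 0b10. */
    assert((pte & 0x3u) == 0x2u);

    f.type      = pte & 0x3u;
    f.b         = (pte >> 2) & 0x1u;
    f.c         = (pte >> 3) & 0x1u;
    f.ap        = (pte >> 10) & 0x3u;
    f.tex       = (pte >> 12) & 0x7u;
    f.ng        = (pte >> 17) & 0x1u;
    f.phys_base = pte & 0xFFF00000u;
    return f;
}
```

Decoding 0x00026C0A this way gives type 0b10, C = 1, B = 0, AP[1:0] = 0b11, TEX = 0b110, nG = 1, and a physical base of 0x00000000. Decoding 0x00126C0A gives the same attributes with a base of 0x00100000.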

The following line shows that the data is read into physical page 0x002 (address 0x00200000):

[c0|00:02.637] create_thread - successful file read into 00200000

The kernel uses de facto physical addresses, because ARM’s virtual memory mechanism does not have any easy way to allow the kernel to use physical addresses while user applications use virtual ones. When the MMU is turned on, all addresses will be translated, so we have the kernel do a 1:1 mapping.

You will also notice that the earlier part of the log shows the start address of the newly created thread, the shell, as 0x00100000, and its stack address as 0x7FFFFFF0. Later, when the PS command is run, the shell has been executing for a short while, and its PC and SP registers confirm that it does indeed execute starting at 0x00100000, and that its stack starts just below 0x80000000 and works its way downward.

One of the difficult aspects of moving data back and forth between the user code and kernel code is the transfer of data through pointers. Character-based I/O is relatively simple (e.g., reading and writing to the console), but more complex data requires bulk transfer through pointers. The problem is that pointers do not work across address spaces, as we have discussed in class. The solution that most operating systems adopt is to use physical addresses, or de facto physical addresses as mentioned above, to “copy in” or “copy out” data between the kernel space and the user’s space. This requires a manual translation between the user’s virtual address (what is sent in through a system call), and its physical location. An example of this in action is the transfer of a character string from user space to kernel space in the LOG system call:

Please enter a command.
c0> LOG "FOO BAR"

CMD_LOG [FOO BAR]
[c0|01:05.075] FOO BAR
Please enter a command.
c0>

The string “FOO BAR” is read in a character at a time from the console, and then it is sent as a string to the kernel-log device. If the translation is not done correctly, this will either produce garbage, or it will cause a non-recoverable address fault, at which point the OS comes to a grinding halt.
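The manual translation described above can be sketched as follows. This is only a minimal illustration, assuming a single-level table of 1MB sections; the function names, and the phys_mem pointer that stands in for physical memory, are hypothetical. In a kernel that maps itself 1:1, phys_mem would simply be address 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTION_SHIFT 20u
#define SECTION_MASK  0x000FFFFFu

/* Translate a user virtual address by hand, using that thread's section table.
 * Returns 1 and stores the physical address on success, 0 if the page is unmapped. */
static int user_va_to_pa(const uint32_t *page_table, uint32_t va, uint32_t *pa)
{
    uint32_t pte = page_table[va >> SECTION_SHIFT];

    if ((pte & 0x3u) != 0x2u)
        return 0;                       /* not a valid section entry */
    *pa = (pte & ~SECTION_MASK) | (va & SECTION_MASK);
    return 1;
}

/* Copy a NUL-terminated string from user space into a kernel buffer.
 * Returns the string length, or -1 if any byte lies in an unmapped page.
 * Each byte is translated separately so a string may cross a page boundary. */
static int copy_in_string(const uint32_t *page_table, const uint8_t *phys_mem,
                          uint32_t user_va, char *dst, size_t dst_len)
{
    assert(dst_len > 0);
    for (size_t i = 0; i < dst_len; i++) {
        uint32_t pa;

        if (!user_va_to_pa(page_table, user_va + (uint32_t)i, &pa))
            return -1;
        dst[i] = (char)phys_mem[pa];
        if (dst[i] == '\0')
            return (int)i;
    }
    dst[dst_len - 1] = '\0';            /* truncate strings that do not fit */
    return (int)(dst_len - 1);
}
```

Returning an error for an unmapped page, rather than dereferencing the pointer blindly, is what keeps a bad user pointer from turning into the non-recoverable address fault mentioned above.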

Strings are also transferred to start up applications. Note that the trap handler recognizes both file names and the simple integers “1” and “2” as input (indicating “app1.bin” and “app2.bin”, respectively). This will allow you to test your code even if the string transfer is not working correctly.

Please enter a command.
c0> RUN BLK "APP1.BIN"

CMD_RUN [BLK, 00100BB1]
[c0|01:27.345] SYSCALL_START_THREAD name = 004B4C42
[c0|01:27.350] SYSCALL_START_THREAD file = 00100BB1
[c0|01:27.354] BLK
[c0|01:27.356] Calling create_thread
sdTransferBlocks read blk 00003DCA len 00000001 addr 000077A8
[c0|01:27.365] READ_SINGLE 00003DCA
LocateFATEntry: [APP1.BIN]
sdTransferBlocks read blk 00003DCB len 00000001 addr 000077A8
[c0|01:27.379] READ_SINGLE 00003DCB
sdTransferBlocks read blk 00027A4A len 00000001 addr 000077A8
[c0|01:27.390] READ_SINGLE 00027A4A
[c0|01:27.396] create success 00000001
sdTransferBlocks read blk 00027A4B len 00000001 addr 000077A8
[c0|01:27.405] READ_SINGLE 00027A4B
[c0|01:27.411] create_thread - successful file read into 00400000
[c0|01:27.416] new thread from disk:
[c0|01:27.420] APP1.BIN 00400000
[c0|01:27.423] BLK 00000002
[c0|01:27.425] tcb = 00013EC4
[c0|01:27.428] stack = 7FFFFFF0
[c0|01:27.431] start = 00100000
[c0|01:27.434] ttbr0 = 0003804A
[c0|01:27.437] asid = 00000002
[c0|01:27.440] PTE[0] = 00000000
[c0|01:27.443] PTE[1] = 00426C0A
[c0|01:27.446] PTE[2] = 00000000
Please enter a command.
c0>



At this point, the LED starts blinking in a 1/2/3/4/1/2/3 … pattern, and the shell is responsive.

A few things to note from the output above. First, the string transfer works as described above. Second, the data is copied into physical page 0x004 (physical address 0x00400000), just as the previous application binary (the shell) went into page 0x002. Every application starts out with two 1MB pages: one to hold code and data, the other to hold the stack.





If the PS command were run repeatedly at this point, we would see the threads’ register values (PC, SP, and so on) changing over time as the code executes and moves up and down the stack:

Please enter a command.
c0> PS

CMD_PS
Active processes ...
[c0|01:39.441]
[c0|01:39.444] Dumping TCB for thread 00000001
[c0|01:39.448] shell 00000001
[c0|01:39.451] tcb @ 00013E5C
[c0|01:39.454] r0   00000001
[c0|01:39.457] r1   0000000A
[c0|01:39.460] r2   00005350
[c0|01:39.462] r3   00005350
[c0|01:39.465] r4   7FFFFFB8
[c0|01:39.468] r5   00000000
[c0|01:39.471] r6   00000000
[c0|01:39.474] r7   00000009
[c0|01:39.476] r8   00000000
[c0|01:39.479] r9   00100BA8
[c0|01:39.482] r10  00000000
[c0|01:39.485] r11  504D5544
[c0|01:39.488] r12  7FFFFFB2
[c0|01:39.491] sp   7FFFFF94
[c0|01:39.493] lr   001008E8
[c0|01:39.496] pc   00100270
[c0|01:39.499] spsr 60000150
[c0|01:39.502] ttbr 0003404A
[c0|01:39.505] asid 00000001
[c0|01:39.507] Dumping TCB for thread 00000002
[c0|01:39.512] BLK 00000002
[c0|01:39.514] tcb @ 00013EC4
[c0|01:39.517] r0   00000003
[c0|01:39.520] r1   7FFFFFD0
[c0|01:39.523] r2   00000008
[c0|01:39.525] r3   00000000
[c0|01:39.528] r4   00000000
[c0|01:39.531] r5   000AAE60
[c0|01:39.534] r6   05EE2A63
[c0|01:39.537] r7   00000004
[c0|01:39.539] r8   00000000
[c0|01:39.542] r9   00000000
[c0|01:39.545] r10  00000000
[c0|01:39.548] r11  00000000
[c0|01:39.551] r12  00000000
[c0|01:39.553] sp   7FFFFFCC
[c0|01:39.556] lr   00100220
[c0|01:39.559] pc   00100050
[c0|01:39.562] spsr 80000150
[c0|01:39.565] ttbr 0003804A
[c0|01:39.568] asid 00000002
Please enter a command.
c0>



As said before, this represents all of the main points of an operating system: we have multiple threads running in user space, each using the same virtual addresses (which simplifies the job of the compiler and linker), but each operating out of a different physical space. This is what virtual memory is all about, and with this project, you have encountered the heart of the OS.






Virtual Memory and ARM/Raspberry Pi

Address translation is the mechanism through which the operating system provides virtual address spaces to user-level applications. The operating system maintains a set of mappings that translate references within the per-process virtual spaces to the system’s physical space. Addresses are usually mapped at a page granularity—typically several kilobytes. The mappings are organized in a page table, and for performance reasons most hardware systems provide a translation lookaside buffer (TLB) that caches those PTEs (page-table entries; i.e. mappings) that have been needed recently. When a process performs a load or store to a virtual address, the hardware translates this to a physical address using the mapping information in the TLB. If the mapping is not found in the TLB, it must be retrieved from the page table and loaded into the TLB before processing can continue. ARM has a TLB, and its hardware can automatically walk the page tables and load the TLB with the required information, when it finds it in the page table.

ARM’s page table looks like this:



























Note that there is one 4096-entry page in the first-level table and potentially thousands of pages making up the second-level table. However, if the PTE at the first level indicates that it maps a large area, like a 1MB “section” or a 16MB “supersection,” then there need be no second-level table at all. That is what we will do: have one simple 4096-entry table per process (and one for the kernel as well), with each entry mapping a 1MB “section” of memory.

The format of the ARM PTE (page-table entry) looks like this:













Putting 0b10 in the bottom two bits indicates that the PTE is for a 1MB section. That is what we will do.

Your First-Ever VM Implementation

We will implement the simplest of facilities: a single level page table (just an array, really) of page-table entries (PTEs) indexed by the virtual page number. Our page sizes will be the 1MB sections, so the page table need only hold 4K entries to map the entire 4GB space. Using large pages allows the table to be relatively small: 16KB per page table.

Note that, if the page size is 1MB, then the bottom 20 bits of an address are page-offset bits, and the topmost 12 bits form the virtual page number. Thus, an address looks like the following in hex:

0xVVVOOOOO

Where the “V” bits make up the virtual page number, and the “O” bits make up the page offset.
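The split can be written down directly. The following is a small sketch with made-up helper names; it also works out the table size quoted above (4096 entries of 4 bytes each).

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT   20u                         /* 1MB sections */
#define OFFSET_MASK  ((1u << PAGE_SHIFT) - 1u)   /* 0x000FFFFF */
#define NUM_PTES     (1u << (32u - PAGE_SHIFT))  /* 4096 entries */

/* The "V" digits: top 12 bits of the address, used as the page-table index. */
static uint32_t vpn_of(uint32_t va)    { return va >> PAGE_SHIFT; }

/* The "O" digits: bottom 20 bits, passed through translation unchanged. */
static uint32_t offset_of(uint32_t va) { return va & OFFSET_MASK; }

/* Reassemble an address from a page number and an offset. */
static uint32_t make_va(uint32_t vpn, uint32_t offset)
{
    assert(vpn < NUM_PTES && offset <= OFFSET_MASK);
    return (vpn << PAGE_SHIFT) | offset;
}

/* Size in bytes of one single-level table. */
static uint32_t page_table_bytes(void)
{
    return NUM_PTES * (uint32_t)sizeof(uint32_t);
}
```

For the stack address 0x7FFFFFF0, vpn_of gives 0x7FF and offset_of gives 0xFFFF0; for the code address 0x00100000, the page number is 0x001 and the offset is 0.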

The kernel code on core0 at the outset initializes the user page tables to 0s … in other words, all PTEs are invalid at startup. Thus, the enable_vm() routine needs only to set a handful of PTEs and then turn the correct switches to get the TLB operational. There are only a handful of distinct pages being used by your code at the moment the enable_vm() function is called:

    • 0x3F0xxxxx — GPIO addresses

    • 0x3F1xxxxx — GPIO addresses

    • 0x3F2xxxxx — GPIO addresses

    • 0x3F3xxxxx — GPIO addresses


    • 0x400xxxxx — timer/clock device-register addresses

    • 0x000xxxxx — where nearly all your code and data lies

You will also want to use the following for user code, data, and stack data:

    • 0x001xxxxx–0x010xxxxx — for thread code, data, stacks (can be as big a region as you want)

You will want to create a mapping for each. The general code and data should be mapped as normal data, but the I/O addresses (0x3Fxxxxxx and 0x40xxxxxx) should be marked as non-cacheable so that they are handled correctly. This is controlled by the TEX field (bits 14:12 of the PTE) together with the C and B bits (bits 3 and 2).
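A possible shape for this setup step is sketched below. The helper names are invented. The normal-memory attribute value is copied from the kernel PTEs in the boot log; the device attribute value is an assumption (TEX = 0b000, C = 0, B = 1, AP[1:0] = 0b11), so check it against the Architecture Reference Manual before relying on it.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PTES     4096u
#define PTE_SECTION  0x00000002u  /* bits [1:0] = 0b10: 1MB section */
#define ATTR_NORMAL  0x00026C08u  /* attribute bits of the logged kernel PTEs */
#define ATTR_DEVICE  0x00000C04u  /* assumed: TEX=000, C=0, B=1, AP[1:0]=11 */

static uint32_t make_section_pte(uint32_t phys_addr, uint32_t attrs)
{
    assert((phys_addr & 0x000FFFFFu) == 0);  /* must be 1MB aligned */
    return phys_addr | attrs | PTE_SECTION;
}

/* Map sections first..last (section numbers) 1:1, virtual == physical. */
static void map_identity(uint32_t *table, uint32_t first, uint32_t last,
                         uint32_t attrs)
{
    assert(first <= last && last < NUM_PTES);
    for (uint32_t s = first; s <= last; s++)
        table[s] = make_section_pte(s << 20, attrs);
}

/* Fill in the handful of mappings listed above; everything else stays invalid. */
static void setup_kernel_mappings(uint32_t *table)
{
    for (uint32_t i = 0; i < NUM_PTES; i++)
        table[i] = 0;
    map_identity(table, 0x000u, 0x010u, ATTR_NORMAL);  /* kernel + thread region */
    map_identity(table, 0x3F0u, 0x3F3u, ATTR_DEVICE);  /* GPIO */
    map_identity(table, 0x400u, 0x400u, ATTR_DEVICE);  /* timer/clock registers */
}
```

With these values the first three entries come out as 0x00026C0A, 0x00126C0A, and 0x00226C0A, matching the PTE[0] through PTE[2] lines printed for the NULL thread in the boot log.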

ARM Documentation

You will find the ARM Architecture Reference Manual to be invaluable. I will point out some of the most important pages, but you need to explore this document yourself, because the information that you need is spread out all over the document. This is one of those (perhaps many) instances in which you curse ARM, because the name really is a misnomer: ARM originally stood for Acorn RISC Machine, and RISC means Reduced Instruction-Set Computer … any computer architecture that requires tens of thousands of pages of documentation cannot possibly, in any way, shape, or form, be considered “reduced” …













































Above is a picture of the format of the PTE … each of the bits has meaning, and the pages appearing after this one in the Architecture Reference Manual go into detail (and some are described much later in the document).

Note: in this project we are re-routing I/O addresses through the TLB. I suspect this is unusual, except for hypervisor/guest-operating-system configurations, because the OS on other architectures often runs in physical mode and is the only one allowed to touch the devices.


Shown above is the TTBCR, the register that determines, via the N bits, how the address space is divided between page tables and whether there is one page table or two. We will use just one, the TTBR0 table, and we will disable the TTBR1 table by setting the N bits in the TTBCR register to 0.



Shown above is the TTBR0 register. This contains the address of the page table for the currently executing process. When you context switch to another running process (which has a different address space, as opposed to switching to another thread, which doesn’t), you need to give the hardware the pointer to the new process’s address space.


Shown above is the page-table organization, again (this is reproduced to give you the page number). The first level entries point to second-level entries, which point to the actual page data. When the first-level entries identify themselves as “sections” they instead point directly to page data.



The discussion in the page above (and pages following it in the documentation) indicates how the system behaves w.r.t. multiple simultaneous mappings (e.g. split between two different guest operating systems). One is mapped through the TTBR0 page table, and the other is mapped through the TTBR1 page table, and the amount of memory assigned to each is variable. We will only use the TTBR0 page table and register.

Shown above are the values that indicate how much space goes to the TTBR0 address space, and how much goes to the TTBR1 address space.
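The rule behind the split can be sketched in a few lines. This is an illustration of the selection logic as the Architecture Reference Manual describes it for the short-descriptor format, with invented function names; it is not code from the project.

```c
#include <assert.h>
#include <stdint.h>

/* Which table does the hardware walk for this address?
 * Returns 0 for TTBR0 and 1 for TTBR1, given the value of TTBCR.N. */
static int ttbr_select(uint32_t va, unsigned n)
{
    assert(n <= 7);                  /* N is a 3-bit field */
    if (n == 0)
        return 0;                    /* N = 0: TTBR0 covers the whole 4GB space */
    return (va >> (32u - n)) != 0;   /* any of the top N address bits set: TTBR1 */
}

/* Size in bytes of the region translated through TTBR0 (4GB when N = 0). */
static uint64_t ttbr0_region_bytes(unsigned n)
{
    assert(n <= 7);
    return 1ull << (32u - n);
}
```

With N = 0, which is the value the boot log shows in the TTBCR, every address, including the stack at 0x7FFFFFF0 and the I/O addresses, goes through the TTBR0 table.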




Shown above is a (partial) list of the various control registers that we use. Nice to have it in one place.

The mmu.s file has a bunch of functions that read and write many of these registers.




Shown above is the System Control Register, which has the all-important M bit in it, which turns on/off the MMU (i.e., virtual memory).
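The boot log earlier shows the register going from 00C51838 to 00C5183D, a change in bits 0 and 2. The sketch below reproduces that arithmetic, on the assumption that the solution sets the M bit and the C (cache enable) bit together; the function names are made up, and the actual register write would be done by the routines in mmu.s.

```c
#include <assert.h>
#include <stdint.h>

#define SCTLR_M (1u << 0)  /* MMU enable */
#define SCTLR_C (1u << 2)  /* data/unified cache enable */

/* Compute the SCTLR value to write back in order to turn on virtual memory. */
static uint32_t sctlr_enable_vm(uint32_t sctlr)
{
    assert((sctlr & SCTLR_M) == 0);  /* this sketch assumes the MMU starts off */
    return sctlr | SCTLR_M | SCTLR_C;
}

static int mmu_is_enabled(uint32_t sctlr)
{
    return (sctlr & SCTLR_M) != 0;
}
```

Applied to the logged starting value 0x00C51838, sctlr_enable_vm returns 0x00C5183D, the value printed as “SCTLR after”.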




When threads from multiple address spaces run, the hardware needs to be able to distinguish them. Shown above is the register that does so. It tells the hardware “any PTE you load while running, attach this ASID to it when you put it into the TLB.” That way, when that process is swapped out and then is swapped back in later, it can still use its old mappings if they are still in the TLB.
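To make the tagging concrete, here is a toy software model of an ASID-tagged TLB. It is not how the hardware is built, and all of the names, the eight-entry size, and the round-robin replacement are invented for the example; it only shows why two threads can both hold a mapping for the same virtual page without a flush on every switch.

```c
#include <assert.h>
#include <stdint.h>

#define TLB_ENTRIES 8u

typedef struct {
    int      valid;
    int      global;  /* nG bit of the PTE is 0 */
    uint8_t  asid;    /* ASID that was current when the entry was loaded */
    uint32_t vpn;     /* virtual section number */
    uint32_t pte;
} tlb_entry;

typedef struct {
    tlb_entry e[TLB_ENTRIES];
    unsigned  next;   /* round-robin replacement index */
} tlb;

/* Load a section PTE into the TLB, tagging it with the current ASID. */
static void tlb_fill(tlb *t, uint8_t asid, uint32_t vpn, uint32_t pte)
{
    tlb_entry *slot = &t->e[t->next];

    assert((pte & 0x3u) == 0x2u);           /* section entries only */
    t->next = (t->next + 1u) % TLB_ENTRIES;
    slot->valid  = 1;
    slot->global = ((pte >> 17) & 1u) == 0;
    slot->asid   = asid;
    slot->vpn    = vpn;
    slot->pte    = pte;
}

/* Returns 1 and stores the PTE on a hit. A non-global entry only hits
 * when the current ASID matches the tag it was loaded with. */
static int tlb_lookup(const tlb *t, uint8_t asid, uint32_t vpn, uint32_t *pte)
{
    for (unsigned i = 0; i < TLB_ENTRIES; i++) {
        const tlb_entry *x = &t->e[i];

        if (x->valid && x->vpn == vpn && (x->global || x->asid == asid)) {
            *pte = x->pte;
            return 1;
        }
    }
    return 0;
}
```

In this model the shell (ASID 1) and the second thread (ASID 2) can each have an entry for virtual page 0x001, pointing at 0x002xxxxx and 0x004xxxxx respectively, and each lookup returns the entry belonging to the thread that is currently running.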


Note that handling the various registers is extremely difficult to do, and so the changeover at process-switch time has been done for you. Otherwise, you would easily spend weeks trying to get it right. Remember, the important thing you are to learn in this project is the concept of mapping … learning the low-level details of how to interact with the ARM hardware is not the main goal. Thus, the interrupt vectors have been provided.

Where Things Go

As discussed in the previous project, we know how big the kernel is, and so we know where we can put things in physical memory. The following diagram indicates the major components for this project:













































The main difference between this and the previous project is that the thread stacks have been moved elsewhere, since they are virtual pages and not physically assigned. Instead, starting at location 0x00030000 we have the page tables, indexed by the thread ID (ASID) number. You only need a handful of these, because you only need to run two threads (and we only have three application binaries at any rate …).
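The indexing can be sketched as below. The base address and the 0x4A low bits are taken from the sample log (ttbr0 = 0003004A, 0003404A, 0003804A for ASIDs 0, 1, 2); reading the low bits as table-walk attribute flags is an assumption, and the function names and the MAX_ASID limit are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_TABLE_BASE   0x00030000u  /* "Setting page table to 00030000" */
#define PAGE_TABLE_BYTES  0x4000u      /* 4096 entries x 4 bytes = 16KB */
#define TTBR0_FLAGS       0x4Au        /* low bits seen in the log (assumed walk attributes) */
#define MAX_ASID          3u           /* hypothetical limit for this sketch */

/* Physical address of the page table belonging to a given ASID. */
static uint32_t page_table_addr(uint32_t asid)
{
    assert(asid <= MAX_ASID);
    return PAGE_TABLE_BASE + asid * PAGE_TABLE_BYTES;
}

/* Value to load into TTBR0 when switching to that address space. */
static uint32_t ttbr0_value(uint32_t asid)
{
    return page_table_addr(asid) | TTBR0_FLAGS;
}
```

For ASIDs 0, 1, and 2 this yields 0x0003004A, 0x0003404A, and 0x0003804A, the ttbr0 values printed for the NULL thread, the shell, and the BLK thread.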


The kernel’s physical page (page 0) ends at the 1MB boundary: address 0x00100000. At that point we start using space for the application binaries. This is shown in the following figure:


























Everything in the previous figure is in the “page 0 kernel” box at the bottom of the stack above. The system’s physical memory is divided into 1MB chunks, called “sections” in the ARM documentation, and there are 4096 of them in the system, so we have 4096-entry page tables to map the space. The easiest allocation scheme is to start at location 0x00100000 and increment it every time you create a new task: once for the code and data, and once for the stack. The code and data starting at 0x00100000 is hard-coded into the linker files (memmap files) in the application directories.
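One way to picture the allocation scheme is the sketch below: a bump allocator hands out consecutive 1MB sections, and each new thread gets one for code and data and one for its stack. All names are hypothetical, the attribute bits are copied from the user PTEs in the sample log, and the starting address is left as a parameter, since the sample log shows the shell’s binary landing at 0x00200000.

```c
#include <assert.h>
#include <stdint.h>

#define SECTION_BYTES  0x00100000u
#define USER_ATTRS     0x00026C0Au  /* attribute + type bits of the logged user PTEs */
#define CODE_VA        0x00100000u  /* where every application is linked to run */
#define STACK_TOP_VA   0x7FFFFFF0u  /* initial stack pointer of every application */

/* Hypothetical bump allocator: hands out consecutive 1MB physical sections. */
typedef struct {
    uint32_t next_free;
} section_allocator;

static uint32_t alloc_section(section_allocator *a)
{
    uint32_t pa = a->next_free;

    assert((pa & (SECTION_BYTES - 1u)) == 0);  /* must stay 1MB aligned */
    a->next_free += SECTION_BYTES;
    return pa;
}

/* Give a new thread two sections and map them at the fixed virtual addresses.
 * Returns the physical address the application binary should be read into. */
static uint32_t build_user_space(uint32_t *table, section_allocator *a)
{
    uint32_t code_pa  = alloc_section(a);
    uint32_t stack_pa = alloc_section(a);

    table[CODE_VA >> 20]      = code_pa  | USER_ATTRS;
    table[STACK_TOP_VA >> 20] = stack_pa | USER_ATTRS;
    return code_pa;
}
```

Starting the allocator at 0x00200000 and building two address spaces in a row gives PTE[1] = 0x00226C0A for the first and PTE[1] = 0x00426C0A for the second, with PTE[0] and PTE[2] left at zero, which is the pattern in the sample log.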

Other Changes

There are some other changes you might notice. To simplify things, the kernel.c module launches into the idle task first, and then it simply puts the shell on the runq. The shell is started when the timer interrupt causes the IRQ interrupt handler to run, at which point it finds the shell on the runq and makes the thread active. Thus, there are only two places where user-thread contexts can be swapped (the two interrupt handlers), and there is only one place where a newly-created user thread can start running (the IRQ interrupt handler). The idle thread is actually a kernel thread.

Build It, Load It, Run It

Once you have it working, show us.















