Part of the OSTEP notes

Memory

The unit of memory is the byte. The CPU can't really talk about anything smaller than a byte. (I think there are even larger granularities in practice, like the size of a cache line.)

The CPU runs code from memory too, so programs have to get into memory before they can be run.

Haters will say a kilobyte is 1000 bytes. Umm actually it’s 1024.

2^32 bytes is 4 GB, which is the maximum addressable memory on 32-bit machines.

how is memory used

How it used to be: No translation in effect at all. Accessing address 0xWhatever accessed byte 0xWhatever on the chip. Kernel usually at a known location in memory. Addresses for loads, stores, and jumps can be computed at compile time. The slide introducing this is called “Good old days”

This has problems. If we want to run multiple programs at the same time, we can't just hope they use disjoint memory addresses; we need the ability to relocate them. Processes can also scribble over each other's memory (which can crash them, and these days we also worry about the security implications).

Programs might not even use all the memory they are allocated. Often programs have a stack and a heap with a big ol gap between them. It would be a waste to allocate the same amount of physical memory "square footage" to one program that uses 5 KB and another program that uses 5 GB of memory.

virtual addressing

Let's keep it simple on the process side. Processes think they access memory from 0 up to some maximum in one big contiguous block. Between the process and the chip, though, we have a translation layer from that "virtual address" to a physical address on the chip.

Physical addresses don’t necessarily have to be contiguous. They also don’t even need to actually be in memory (swapping).

“OS and CPU collaborate to translate a virtual address to a physical address”.

pages

Generally we work in "pages", i.e. shift the memory address right a few bits. No need to micromanage memory at the level of bytes; dividing memory into chunks of, say, 4 KB is a little more reasonable.
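For example, with 4 KB pages the low 12 bits of an address are the offset within the page and everything above them is the page number. A minimal sketch:

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12                 // 4 KB pages: 2^12 bytes
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define PAGE_MASK  (PAGE_SIZE - 1)

int main(void) {
    uintptr_t addr = 0x12345;
    printf("page %#lx, offset %#lx\n",
           (unsigned long)(addr >> PAGE_SHIFT), // page number
           (unsigned long)(addr & PAGE_MASK));  // offset within the page
    return 0;                                   // prints: page 0x12, offset 0x345
}
```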

One page belongs to one process. "Huge pages" exist on some Linux systems, as large as 2 MB, but the page size is Planck's constant for memory usage: you can't allocate less than a page without getting the whole page. Other attributes are also attached to the page, such as whether the page is swapped out and memory protection info (rwx), so it doesn't make sense to grab less than a page.
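The kind of per-page bookkeeping involved, as a made-up C struct (real page table entries pack these flags into the bits of a single word; this layout is purely illustrative):

```c
#include <stdint.h>

// Hypothetical per-page metadata, for illustration only.
struct page_entry {
    uint64_t frame   : 52; // physical frame the page maps to
    uint64_t present : 1;  // 0 if the page is swapped out
    uint64_t read    : 1;  // the rwx protection bits
    uint64_t write   : 1;
    uint64_t exec    : 1;
    uint64_t dirty   : 1;  // written since it was loaded?
};
```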

address translation

Turning a virtual address into a physical address.

You can imagine a simple scheme (without any pages, assuming contiguous memory) as a simple offset. If a process's base is 0x1234, then accessing virtual address 0 accesses physical address 0x1234, accessing virtual address 1 accesses 0x1235, etc.

Implement memory protection by adding a "bounds" to each process. If the virtual address is greater than or equal to the bounds, trigger some kind of fault and kill the process instead of actually accessing the memory. Now processes can't scribble over each other.
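A sketch of what happens on every access under this base-and-bounds scheme (names made up; in real hardware this check is circuitry in the MMU, not C):

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

// Per-process relocation info, saved/restored on context switch.
struct mmu_regs {
    uintptr_t base;   // where the process's memory starts physically
    uintptr_t bounds; // size of the process's address space
};

// What the hardware does on every load/store/fetch.
uintptr_t translate(struct mmu_regs *mmu, uintptr_t vaddr) {
    if (vaddr >= mmu->bounds) {
        // Out of range: raise a fault; the OS typically kills the process.
        fprintf(stderr, "fault at %#lx\n", (unsigned long)vaddr);
        exit(1);
    }
    return mmu->base + vaddr;
}

int main(void) {
    struct mmu_regs p = { .base = 0x1234, .bounds = 0x1000 };
    printf("%#lx\n", (unsigned long)translate(&p, 0)); // 0x1234
    printf("%#lx\n", (unsigned long)translate(&p, 1)); // 0x1235
    translate(&p, 0x2000);                             // faults
    return 0;
}
```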

The base and bounds registers become more things that are saved and restored across context switches. They have to cooperate closely with the CPU and memory management unit, so they are kind of special: changing these registers is a privileged operation.

problem with that simple scheme

The problem: base-and-bounds maps the whole address space as one contiguous chunk, including that big unused gap between stack and heap. Could try segmentation: code, heap, and stack all get their own segments, each with its own base and size. At least the chunks are smaller and can be appropriately sized for each program, but fragmentation is still an issue.

Btw, if you use the high bits of the address to encode which segment it's in (aka "placing the segments REALLY far apart"), going from address -> segment is basically free.
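A sketch assuming OSTEP's toy setup of 14-bit virtual addresses where the top 2 bits select the segment (the numbers are illustrative; the idea generalizes):

```c
#include <stdint.h>

// 14-bit addresses: top 2 bits = segment (00 code, 01 heap, 11 stack),
// low 12 bits = offset within the segment.
#define SEG_SHIFT 12
#define SEG_MASK  0x3000
#define OFF_MASK  0x0FFF

// Two cheap bit operations; no lookup needed.
unsigned segment_of(uint16_t vaddr) { return (vaddr & SEG_MASK) >> SEG_SHIFT; }
unsigned offset_of(uint16_t vaddr)  { return vaddr & OFF_MASK; }
```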

how programs use memory

Due to the "elegance" and "beauty" of the C memory allocator API 😒 malloc needs lots of bookkeeping. Specifically, free() only gets a pointer, so it has no way of telling how big the allocation was.

This is also a fully general API, and it's expected to do enough bookkeeping to handle any sequence of mallocs and frees. It needs to know where to put stuff. Very difficult algorithms are behind this; active area of research…

free lists

A linked list data structure tracking the free chunks of memory. The nodes live inside the free memory itself.

Because of the shitty malloc/free API, malloc keeps a "header" which contains the size of the block (and other random crap). malloc(size) actually allocates size + header bytes and returns a pointer just past the header.

Free-list nodes and malloc headers occupy the same space; one is overwritten with the other as memory gets mallocated and freed.

You can put a “magic number” in the malloc header. When you free you can check that the magic number is still the same and a stupid C programmer hasn’t clobbered it.
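A minimal sketch in the spirit of OSTEP's example allocator (my_free and MAGIC are made-up names):

```c
#include <assert.h>
#include <stddef.h>

#define MAGIC 0x1234567

// Sits immediately before every pointer malloc hands out.
typedef struct {
    size_t size;  // how big the block is, since free() isn't told
    size_t magic; // sanity check against header corruption
} header_t;

// A free chunk; lives in the free memory itself, overlapping
// where a header_t sits while the chunk is allocated.
typedef struct node {
    size_t size;
    struct node *next;
} node_t;

void my_free(void *ptr) {
    header_t *h = (header_t *)ptr - 1; // step back over the header
    assert(h->magic == MAGIC);         // catch a clobbered header
    // ...link the chunk (header included) back into the free list...
}
```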

This is all academic I think real malloc implementations are fancier but u get the idea

policies

The user calls malloc(100) and there are 5 big-enough slots in the free list. Which one to return? That's the allocator's policy: first fit, best fit, worst fit, etc.
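A first-fit sketch, using the same node_t shape as the free-list sketch above (real allocators would also split off the leftover space):

```c
#include <stddef.h>

typedef struct node {
    size_t size;
    struct node *next;
} node_t;

// First fit: take the first slot that's big enough.
// (Best fit would keep scanning for the smallest slot that works.)
node_t *first_fit(node_t **head, size_t size) {
    for (node_t **prev = head; *prev != NULL; prev = &(*prev)->next) {
        node_t *n = *prev;
        if (n->size >= size) {
            *prev = n->next; // unlink the chosen slot
            return n;
        }
    }
    return NULL; // nothing big enough: grow the heap or fail
}
```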

Thought balloon: How do freelists interact with parallelism?

fragmentation

Malloc 3 KB, malloc 1 KB, malloc 3 KB, malloc 1 KB, repeat. Free all the 3 KB allocations. Then malloc 4 KB.

Assume all the 3 KB/1 KB allocations were placed back-to-back. Now we're in a situation where we have more than 4 KB available, but no contiguous 4 KB region to put this 4 KB allocation in. This is external fragmentation.
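The same trace as C, assuming a simple allocator that places everything back-to-back and never grabs new memory from the OS (real mallocs will just get more address space, but the holes are the point):

```c
#include <stdlib.h>

int main(void) {
    void *a = malloc(3 * 1024); // [a:3K]
    void *b = malloc(1 * 1024); // [a:3K][b:1K]
    void *c = malloc(3 * 1024); // [a:3K][b:1K][c:3K]
    void *d = malloc(1 * 1024); // [a:3K][b:1K][c:3K][d:1K]

    free(a);                    // [hole:3K][b:1K][hole:3K][d:1K]
    free(c);

    // 6 KB free in total, but the largest contiguous hole is 3 KB,
    // so under our assumed allocator this can't come from the holes.
    void *e = malloc(4 * 1024);

    free(b); free(d); free(e);
    return 0;
}
```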