6.2 Uses for Virtual Memory

Flexible Memory Allocation

  • Virtual memory allows physical memory to be allocated and deallocated more freely
  • A single process can be allocated pages from all over physical memory
  • These pages appear as a single unified (and often contiguous) address space
Figure: fragmentation and its solution.

Sparse Address Spaces

  • Processes need not have all addresses mapped
  • This can allow data structure space to grow without wasting memory
Figure: a sparse address space.

Persistence

  • Memory addresses don’t have to correspond to physical memory
  • The OS may provide the ability to map a persistent storage medium to a process address space
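
As a concrete illustration, POSIX systems expose this capability through mmap(). A minimal sketch, assuming a file named data.bin exists and is at least one page long:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        /* Open a file on persistent storage ("data.bin" is a placeholder). */
        int fd = open("data.bin", O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        /* Map one page of the file into this process's address space.
           Loads and stores through p now reach the file's contents. */
        char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        p[0] = 'x';        /* an ordinary store, persisted to the file */
        munmap(p, 4096);
        close(fd);
        return 0;
    }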

Demand Driven Loading

  • Many programs are large
  • Conceptually, these programs need to be loaded before being run
  • Virtual memory can be used to load portions of these programs as needed

Efficient Zero-filling

  • Memory allocated to a process should be zeroed before use
  • This task takes time
  • An OS can avoid it by initially mapping each new page to a single shared, read-only zeroed page
  • On the first write, the fault handler installs a private, freshly zeroed frame (copy-on-write)
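
The effect is visible from user space. A sketch using an anonymous mapping (the shared zero page itself is internal to the kernel):

    #include <assert.h>
    #include <sys/mman.h>

    int main(void) {
        /* Anonymous memory: the kernel can back every page with one
           shared, read-only zero page until the process writes. */
        unsigned char *p = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) return 1;

        assert(p[12345] == 0); /* read: satisfied by the shared zero page */
        p[12345] = 7;          /* write: faults; the kernel installs a
                                  private, freshly zeroed frame */
        return 0;
    }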

Substituting Disk Storage for RAM

  • Persistent storage is typically cheaper than RAM
  • Virtual memory provides the tools to move rarely used memory pages to persistent storage

6.3 Virtual Memory Mechanisms

Mapping

  • Should be configurable
  • Should be efficient
  • The mapping function operates on pages of addresses, not on single addresses
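
For example, with 4 KiB pages (an assumed size, for illustration), an address splits into a virtual page number, which is translated, and a page offset, which passes through unchanged:

    #include <stdint.h>

    #define PAGE_SHIFT 12                  /* 4 KiB pages: 2^12 bytes */
    #define PAGE_SIZE  (1UL << PAGE_SHIFT)

    /* Only the page number goes through the mapping function; the
       offset within the page is copied into the physical address. */
    uint64_t vpn(uint64_t vaddr)    { return vaddr >> PAGE_SHIFT; }
    uint64_t offset(uint64_t vaddr) { return vaddr & (PAGE_SIZE - 1); }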

Null mappings

  • Virtual address pages may map to physical page frames
  • They may also map to nothing at all
Figure: page mapping.

Hardware

  • Dedicated hardware (MMU) implements mapping for performance reasons
  • OS configures MMU with appropriate mappings

Page table storage

  • The page table data structure may be stored in memory
  • Using the structure from memory at least doubles the number of memory accesses. Why? Every load or store must first fetch the relevant page table entry from memory before the data itself

Locality

  • Memory accesses exhibit spatial and temporal locality
  • This can be exploited to make memory access far more efficient

Translation Lookaside Buffer

  • The MMU stores (caches) recently used translations
  • This cache, the translation lookaside buffer (TLB), improves translation performance
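
Conceptually (a sketch of the idea, not any particular MMU's design), translation consults the cached entries first and walks the page table only on a miss:

    #include <stdbool.h>
    #include <stdint.h>

    #define TLB_ENTRIES 64 /* assumed size, for illustration */

    struct tlb_entry { uint64_t vpn, pfn; bool valid; };
    static struct tlb_entry tlb[TLB_ENTRIES];

    /* Direct-mapped lookup: a hit returns the cached translation; a
       miss falls back to the (much slower) page table walk. */
    bool tlb_lookup(uint64_t vpn, uint64_t *pfn) {
        struct tlb_entry *e = &tlb[vpn % TLB_ENTRIES];
        if (e->valid && e->vpn == vpn) { *pfn = e->pfn; return true; }
        return false;
    }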

Caches

  • Used to reduce the latency of accessing data
  • Generally have an inverse relationship between latency and size
  • That is, larger caches are necessarily slower

TLB

  • Should be fast
  • Should be large
  • Can’t be both

Splitting TLB

  • Using a separate TLB for instruction fetches and data loads improves performance
  • Locality is improved
  • Lookups can happen in parallel
  • Caches can be smaller and therefore faster

TLB Hierarchy

  • Just like other caches, architects can include multiple levels of TLB cache
  • A small, fast L1 TLB keeps the common case extremely fast
  • A larger L2 TLB improves performance when the L1 misses and avoids some memory accesses

Combining Entries

  • Often, systems will perform large allocations of many contiguous addresses
  • We can reduce the number of entries needed in the TLB by allowing a single entry to describe multiple contiguous mappings

Page Size

  • Page size will have a strong impact on MMU and TLB latency
  • Smaller pages require more entries
  • Larger pages make less efficient use of space
  • Variable size pages may provide benefits, but increase complexity

Performance Implications

  • Programs that access memory sequentially will benefit from both TLB hits and data cache hits
  • Dense data structures will perform much better than sparse ones
  • Shorter programs and shorter jumps will see better TLB performance
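
The classic demonstration is matrix traversal order. In C's row-major layout, walking along rows is sequential, while walking along columns strides across pages and cache lines. A sketch:

    #include <stddef.h>

    #define N 1024
    static double a[N][N]; /* row-major: a[i][j] and a[i][j+1] are adjacent */

    /* Sequential: consecutive accesses fall in the same page and cache
       line, so TLB and data cache hit rates are high. */
    double sum_rows(void) {
        double s = 0;
        for (size_t i = 0; i < N; i++)
            for (size_t j = 0; j < N; j++)
                s += a[i][j];
        return s;
    }

    /* Strided: consecutive accesses are N * sizeof(double) bytes apart,
       touching a different page (and cache line) nearly every time. */
    double sum_cols(void) {
        double s = 0;
        for (size_t j = 0; j < N; j++)
            for (size_t i = 0; i < N; i++)
                s += a[i][j];
        return s;
    }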

Hardware Support

  • Common for MMU to expect page tables in a fixed format
  • A register tells the MMU where to look for the page table
  • The OS stores the page tables in memory and sets this register
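
On x86-64, for example, that register is CR3. A minimal sketch of the OS side (kernel-mode C with GCC inline assembly):

    #include <stdint.h>

    /* Point the MMU at a new top-level page table. root must be the
       physical address of a page-aligned table in the format the MMU
       expects; writing CR3 also flushes non-global TLB entries. */
    static inline void write_cr3(uint64_t root) {
        __asm__ volatile("mov %0, %%cr3" :: "r"(root) : "memory");
    }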

Software-only

  • MMU may simply send control to the OS on TLB miss
  • OS returns memory mapping to be used and stored in TLB
  • Provides slower performance on miss but enhanced flexibility for mapping storage and lookup
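
A sketch of such a handler, where pt_lookup(), tlb_insert(), and page_fault() are hypothetical helpers rather than any real architecture's API:

    #include <stdint.h>

    /* Hypothetical helpers: pt_lookup() walks whatever structure the
       OS chose (tree, hash, ...); tlb_insert() writes one hardware TLB
       entry; page_fault() handles an address with no mapping at all. */
    extern int  pt_lookup(uint64_t vpn, uint64_t *pfn); /* 0 on success */
    extern void tlb_insert(uint64_t vpn, uint64_t pfn);
    extern void page_fault(uint64_t vaddr);

    /* Invoked by the CPU on a TLB miss: find the mapping in software,
       cache it in the TLB, then retry the faulting instruction. */
    void tlb_miss_handler(uint64_t vaddr) {
        uint64_t pfn, vpn = vaddr >> 12; /* assuming 4 KiB pages */
        if (pt_lookup(vpn, &pfn) == 0)
            tlb_insert(vpn, pfn);
        else
            page_fault(vaddr);
    }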

Context Switching

  • Most processes need their own memory mapping and page table
  • Context switches therefore put pressure on the MMU
  • TLB needs to be flushed on switch, or entries need to be tagged with a process ID

Linear Page Tables

  • Store page mapping in a simple array
  • Easy to access and reason about
  • Wastes large amounts of space for largely sparse mappings (see the table and lookup sketch below)
    Valid   Page Frame
      1         1
      1         0
      0         X
      0         X
      0         X
      0         X
      1         3
      0         X
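
With a linear table, translation is a single array index. A sketch matching the table above (a valid bit plus a page frame number per entry, 4 KiB pages assumed):

    #include <stdbool.h>
    #include <stdint.h>

    struct pte { bool valid; uint64_t pfn; };

    /* One entry per virtual page: the VPN indexes the array directly. */
    bool linear_translate(const struct pte *table, uint64_t vaddr,
                          uint64_t *paddr) {
        uint64_t vpn = vaddr >> 12, off = vaddr & 0xfff;
        if (!table[vpn].valid) return false; /* null mapping */
        *paddr = (table[vpn].pfn << 12) | off;
        return true;
    }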

Sparse Addresses

  • Most virtual addresses aren’t valid mappings
  • Modern systems use 64-bit addresses
  • Even with absurdly large pages in the GiB range, a linear table still needs multiple GiB per process: with 1 GiB pages, 2^64 / 2^30 = 2^34 entries, which at 8 bytes each is 128 GiB

Breaking the Page Table into Pages

  • A large linear table can itself be broken into pages
  • Only pages that include valid entries need to be stored
Figure: address translation.

Two Level Page Table

  • Explicitly store a directory of page-table pages as a single page
  • Store individual page tables as needed, referenced from the directory (a lookup sketch follows the figures below)
  • Used by 32-bit Intel CPUs
Figures: two-level page table; two-level translation; the x86 two-level page table.
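
A sketch of the two-level walk. The 10/10/12 bit split matches the classic 32-bit x86 layout; the entry format here is illustrative, not the exact hardware encoding:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    struct pte { uint32_t valid : 1, pfn : 20; };

    /* dir holds 1024 pointers, each to a 1024-entry page table (or
       NULL for an entirely unmapped 4 MiB region). */
    bool two_level_translate(struct pte **dir, uint32_t vaddr,
                             uint32_t *paddr) {
        uint32_t dir_idx = vaddr >> 22;           /* top 10 bits    */
        uint32_t tbl_idx = (vaddr >> 12) & 0x3ff; /* middle 10 bits */
        struct pte *table = dir[dir_idx];
        if (table == NULL || !table[tbl_idx].valid) return false;
        *paddr = ((uint32_t)table[tbl_idx].pfn << 12) | (vaddr & 0xfff);
        return true;
    }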

Large Pages

  • x86 implementation provides for skipping the page table entirely
  • Directory entries include a size bit which, if set, indicates that the entry points directly to a single large (4 MiB) page frame
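
Continuing the same sketch, the size bit short-circuits the walk; a 4 MiB page leaves 22 offset bits (again, the field layout is illustrative rather than the exact x86 encoding):

    #include <stdbool.h>
    #include <stdint.h>

    struct pde { uint32_t valid : 1, big : 1, frame : 10; };

    /* If the size bit is set, the directory entry maps a whole 4 MiB
       frame and the low 22 bits of the address become the offset. */
    bool large_page_translate(const struct pde *dir, uint32_t vaddr,
                              uint32_t *paddr) {
        const struct pde *e = &dir[vaddr >> 22];
        if (!e->valid || !e->big) return false; /* use the 2-level walk */
        *paddr = ((uint32_t)e->frame << 22) | (vaddr & 0x3fffff);
        return true;
    }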

Multilevel page tables

  • We can support larger addresses using more levels
  • Modern 64-bit systems often use four-level page tables
Figure: the x86 PAE three-level page table.

Hashed Page Tables

  • Alternative to multilevel page tables
  • Use a hash data structure in place of a tree
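
A sketch using chained hashing (bucket count and entry layout are illustrative). The key property: space scales with the number of mapped pages, not with the size of the virtual address space:

    #include <stdbool.h>
    #include <stdint.h>

    #define BUCKETS 1024 /* illustrative table size */

    struct hpte { uint64_t vpn, pfn; struct hpte *next; };
    static struct hpte *buckets[BUCKETS];

    /* Hash the VPN, then walk that bucket's chain for a match. */
    bool hashed_translate(uint64_t vaddr, uint64_t *paddr) {
        uint64_t vpn = vaddr >> 12; /* assuming 4 KiB pages */
        for (struct hpte *e = buckets[vpn % BUCKETS]; e; e = e->next)
            if (e->vpn == vpn) {
                *paddr = (e->pfn << 12) | (vaddr & 0xfff);
                return true;
            }
        return false;
    }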