6.2 Uses for Virtual Memory
Flexible Memory Allocation
- Virtual memory allows physical memory to be allocated and
deallocated more freely
- A single process can be allocated pages from all over physical
memory
- These pages appear as a single unified (and often contiguous)
address space
(Figure: fragmentation)
(Figure: solution)
Sparse Address Spaces
- Processes need not have all addresses mapped
- This can allow data structure space to grow without wasting
memory
(Figure: sparse address space)
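A minimal user-space sketch of the idea on a POSIX system (sizes are
illustrative): reserve a large range of addresses with PROT_NONE so no
usable memory is committed, then enable pieces as the structure grows.

```c
/* Sketch: reserving a sparse region and committing pieces on demand. */
#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    size_t reserve = 1UL << 30;               /* reserve 1 GiB of addresses */
    /* PROT_NONE: the addresses exist, but no accessible mapping yet */
    char *base = mmap(NULL, reserve, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    /* Later, commit just the first 4 KiB page as the structure grows */
    if (mprotect(base, 4096, PROT_READ | PROT_WRITE) != 0) {
        perror("mprotect"); return 1;
    }
    base[0] = 42;                              /* touching it is now legal */
    printf("first byte: %d\n", base[0]);
    munmap(base, reserve);
    return 0;
}
```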
Persistence
- Memory addresses don’t have to correspond to physical memory
- The OS may provide the ability to map a persistent storage medium into
a process's address space
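One concrete form of this is POSIX mmap with a file descriptor: the
file's contents become directly addressable memory. The file name here
is illustrative.

```c
/* Sketch: mapping a file into the address space (POSIX mmap). */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("data.bin", O_RDONLY);       /* illustrative file name */
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

    /* Loads through this pointer are served from the file's pages */
    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    if (st.st_size > 0)
        printf("first byte: 0x%02x\n", (unsigned char)p[0]);

    munmap(p, st.st_size);
    close(fd);
    return 0;
}
```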
Demand Driven Loading
- Many programs are large
- Conceptually, these programs need to be loaded before being run
- Virtual memory can be used to load portions of these programs as
needed
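A user-space simulation of the idea (sizes and names are illustrative):
pages of a stand-in "program image" are copied in only on first access,
much as an OS faults them in on demand.

```c
/* Simulation of demand loading: load a page only when first touched. */
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 4096
#define NPAGES    8

static char disk_image[NPAGES][PAGE_SIZE];   /* stand-in for the executable */
static char memory[NPAGES][PAGE_SIZE];       /* stand-in for RAM */
static int  resident[NPAGES];                /* which pages are loaded */

static char read_byte(size_t addr) {
    size_t page = addr / PAGE_SIZE, off = addr % PAGE_SIZE;
    if (!resident[page]) {                   /* "page fault": load on demand */
        memcpy(memory[page], disk_image[page], PAGE_SIZE);
        resident[page] = 1;
        printf("loaded page %zu\n", page);
    }
    return memory[page][off];
}

int main(void) {
    strcpy(disk_image[3], "hello");
    printf("%c\n", read_byte(3 * PAGE_SIZE)); /* only page 3 gets loaded */
    return 0;
}
```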
Efficient Zero-filling
- Memory allocated to a process should be zeroed before use
- This task takes time
- An OS can avoid this by mapping a single read-only, pre-zeroed page into
each process
- When a process writes to such a page, the OS transparently substitutes a
private zeroed copy (copy-on-write), as simulated below
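A user-space simulation of the shared zero-page trick, with illustrative
sizes; a real OS does this with page table permissions and fault handling
rather than explicit pointers.

```c
/* Simulation: every fresh page aliases one read-only zeroed frame;
 * a private copy is made only when the page is written. */
#include <stdio.h>
#include <stdlib.h>

#define PAGE_SIZE 4096
#define NPAGES    4

static char zero_frame[PAGE_SIZE];            /* the single shared zero page */
static char *pages[NPAGES];                   /* each starts at zero_frame */

static void init(void) {
    for (int i = 0; i < NPAGES; i++)
        pages[i] = zero_frame;                /* no per-page zeroing needed */
}

static void write_byte(int page, int off, char v) {
    if (pages[page] == zero_frame) {          /* first write: copy-on-write */
        pages[page] = calloc(1, PAGE_SIZE);   /* private zeroed frame */
        printf("copied page %d\n", page);
    }
    pages[page][off] = v;
}

int main(void) {
    init();
    printf("%d\n", pages[2][100]);            /* reads see zero for free */
    write_byte(2, 100, 7);                    /* page 2 gets its own frame */
    printf("%d\n", pages[2][100]);
    return 0;
}
```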
Substituting Disk Storage for RAM
- Persistent storage is typically cheaper than RAM
- Virtual memory provides the tools to move rarely used memory pages
to persistent storage
6.3 Virtual Memory Mechanisms
Mapping
- Should be configurable
- Should be efficient
- Mapping function for pages of addresses, not single addresses
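A small sketch of the page-granular idea, assuming 4 KiB pages: only the
virtual page number is translated, while the offset within the page
passes through to the physical address unchanged.

```c
/* Splitting a virtual address into page number and offset (4 KiB pages). */
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1u << PAGE_SHIFT)

int main(void) {
    uint64_t vaddr  = 0x7f3a12345678;
    uint64_t vpn    = vaddr >> PAGE_SHIFT;      /* what the MMU translates */
    uint64_t offset = vaddr & (PAGE_SIZE - 1);  /* copied through unchanged */
    printf("vpn=0x%llx offset=0x%llx\n",
           (unsigned long long)vpn, (unsigned long long)offset);
    return 0;
}
```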
Null mappings
- Virtual address pages may map to physical page frames
- They may also map to nothing at all
(Figure: page mapping)
Hardware
- Dedicated hardware (MMU) implements mapping for performance
reasons
- OS configures MMU with appropriate mappings
Page table storage
- Data structure may be stored in memory
- Using the structure from memory at least doubles the number of memory
accesses: each program access first requires fetching the relevant page
table entry from memory
Locality
- Memory accesses exhibit spatial and temporal locality
- This can be exploited to create far more efficient memory
access
Translation Lookaside Buffer
- MMU stores (caches) recently used translations
- Translation lookaside buffer (TLB) is used to improve
performance
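A sketch of a tiny direct-mapped TLB (sizes illustrative); real TLBs are
typically set-associative, but the hit/miss logic has the same shape.

```c
/* Sketch: caching recent VPN->PFN translations in a small table. */
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 64

struct tlb_entry { uint64_t vpn, pfn; bool valid; };

static struct tlb_entry tlb[TLB_ENTRIES];

/* On a hit, fills *pfn and avoids touching the page table in memory. */
static bool tlb_lookup(uint64_t vpn, uint64_t *pfn) {
    struct tlb_entry *e = &tlb[vpn % TLB_ENTRIES];  /* index by low VPN bits */
    if (e->valid && e->vpn == vpn) { *pfn = e->pfn; return true; }
    return false;                                   /* miss: walk the page table */
}

static void tlb_insert(uint64_t vpn, uint64_t pfn) {
    struct tlb_entry *e = &tlb[vpn % TLB_ENTRIES];
    e->vpn = vpn; e->pfn = pfn; e->valid = true;    /* evicts the old entry */
}

int main(void) {
    uint64_t pfn;
    tlb_insert(0x1234, 0x88);
    return tlb_lookup(0x1234, &pfn) ? 0 : 1;
}
```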
Caches
- Used to improve latency to access data
- Generally have an inverse relationship between latency and size
- That is, larger caches tend to be slower
TLB
- Should be fast
- Should be large
- Can’t be both
Splitting TLB
- Using a separate TLB for instruction fetches and data loads improves
performance
- Locality is improved
- Lookups can happen in parallel
- Caches can be smaller and therefore faster
TLB Hierarchy
- Just like other caches, architects can include multiple levels of
TLB cache
- A small first-level TLB keeps the common case extremely fast
- A larger second level improves performance when the L1 misses and avoids
some page table accesses in memory
Combining Entries
- Often, systems will perform large allocations of many contiguous
addresses
- We can reduce the number of entries needed in the TLB by allowing a
single entry to describe multiple contiguous mappings
Page Size
- Page size will have a strong impact on MMU and TLB latency
- Smaller pages require more entries
- Larger pages make less efficient use of space
- Variable size pages may provide benefits, but increase
complexity
Hardware Support
- Common for MMU to expect page tables in a fixed format
- Register is used to tell the MMU where to look for the page
table
- OS stores the page tables to memory and sets this register
Software-only
- MMU may simply send control to the OS on TLB miss
- OS returns memory mapping to be used and stored in TLB
- Provides slower performance on miss but enhanced flexibility for
mapping storage and lookup
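A user-space simulation of that flow, with illustrative sizes: on a miss,
"the OS" consults its own mapping structure (here a plain array, though
any structure works) and refills the TLB before retrying.

```c
/* Simulation of a software-managed TLB miss and refill. */
#include <stdint.h>
#include <stdio.h>

#define NPAGES      16
#define TLB_ENTRIES 4

static uint64_t page_table[NPAGES];            /* the OS's own structure */
static struct { uint64_t vpn, pfn; int valid; } tlb[TLB_ENTRIES];
static int misses;

static uint64_t translate(uint64_t vpn) {
    int i = vpn % TLB_ENTRIES;
    if (tlb[i].valid && tlb[i].vpn == vpn)
        return tlb[i].pfn;                     /* fast path: TLB hit */
    misses++;                                  /* trap to "the OS" */
    uint64_t pfn = page_table[vpn];            /* any lookup structure works */
    tlb[i].vpn = vpn; tlb[i].pfn = pfn; tlb[i].valid = 1;
    return pfn;                                /* hardware retries and hits */
}

int main(void) {
    for (uint64_t v = 0; v < NPAGES; v++) page_table[v] = v + 100;
    translate(3); translate(3); translate(7);
    printf("misses: %d\n", misses);            /* 2: second access to 3 hits */
    return 0;
}
```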
Context Switching
- Most processes need their own memory mapping and page table
- Context switches therefore put pressure on the MMU
- TLB needs to be flushed on switch, or entries need to be tagged with
a process ID
Linear Page Tables
- Store page mapping in a simple array
- Easy to access and reason about
- Wastes large amounts of space when the mapping is sparse
Page | Valid | Frame
-----+-------+------
  0  |   1   |   1
  1  |   1   |   0
  2  |   0   |   X
  3  |   0   |   X
  4  |   0   |   X
  5  |   0   |   X
  6  |   1   |   3
  7  |   0   |   X
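A sketch of a lookup in a linear table like the one above: the virtual
page number indexes directly into an array of entries.

```c
/* Linear page table lookup: the VPN is an array index. */
#include <stdio.h>

struct pte { unsigned valid : 1; unsigned frame : 31; };

static struct pte table[8] = {
    {1, 1}, {1, 0}, {0, 0}, {0, 0}, {0, 0}, {0, 0}, {1, 3}, {0, 0},
};

/* Returns the frame number, or -1 for an invalid (unmapped) page. */
static int lookup(unsigned vpn) {
    if (vpn >= 8 || !table[vpn].valid)
        return -1;                      /* would raise a page fault */
    return table[vpn].frame;
}

int main(void) {
    printf("page 6 -> frame %d\n", lookup(6));   /* 3 */
    printf("page 2 -> frame %d\n", lookup(2));   /* -1: unmapped */
    return 0;
}
```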
Sparse Addresses
- Most virtual addresses aren’t valid mappings
- Modern systems use 64-bit addresses
- Even with absurdly large page sizes in the GiB range, a linear page
table would still need many gigabytes per process (see the calculation
below)
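A back-of-the-envelope calculation, assuming 8-byte entries: even with
1 GiB pages, a full 64-bit address space has 2^34 pages, so a single
linear table would cost 128 GiB per process.

```c
/* Size of a linear page table covering a full 64-bit address space. */
#include <stdio.h>

int main(void) {
    unsigned long long va_bits = 64;
    unsigned long long page_bits = 30;     /* 1 GiB pages: already absurd */
    unsigned long long entries = 1ULL << (va_bits - page_bits);  /* 2^34 */
    unsigned long long bytes = entries * 8;     /* 8-byte entries assumed */
    printf("%llu GiB\n", bytes >> 30);          /* 128 GiB per process */
    return 0;
}
```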
Breaking the Page Table into Pages
- A large linear table can itself be broken into pages
- Only pages that include valid entries need to be stored
(Figure: address translation)
Two Level Page Table
- Explicitly store a directory of page tables as a single page
- Store second-level page tables as needed, referenced from the directory
- Used by 32-bit Intel CPUs (a walk is sketched after the figures below)
(Figure: two-level page table)
(Figure: two-level translation)
(Figure: x86 two-level page table)
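A simplified sketch of the two-level walk, using the 32-bit x86 address
split (10 directory bits, 10 table bits, 12 offset bits). Field layouts
are invented for clarity, not the real x86 encoding; it also shows the
size-bit shortcut described under Large Pages next.

```c
/* Two-level page table walk with a large-page shortcut. */
#include <stdint.h>

struct pte { uint32_t valid : 1; uint32_t frame : 20; };
struct pde {
    uint32_t valid;           /* entry maps something */
    uint32_t big;             /* size bit: maps a 4 MiB page directly */
    uint32_t frame;           /* 4 MiB frame number when big is set */
    struct pte *table;        /* otherwise, the second-level table */
};

static struct pde directory[1024];

/* Returns the physical address, or 0 for an unmapped page. */
static uint32_t walk(uint32_t vaddr) {
    struct pde *d = &directory[vaddr >> 22];           /* top 10 bits */
    if (!d->valid) return 0;                           /* page fault */
    if (d->big)                                        /* skip level two */
        return (d->frame << 22) | (vaddr & 0x3fffff);
    struct pte *t = &d->table[(vaddr >> 12) & 0x3ff];  /* middle 10 bits */
    if (!t->valid) return 0;                           /* page fault */
    return (t->frame << 12) | (vaddr & 0xfff);         /* offset unchanged */
}

int main(void) {
    static struct pte table0[1024];
    table0[5] = (struct pte){ .valid = 1, .frame = 7 };
    directory[0] = (struct pde){ .valid = 1, .table = table0 };
    /* virtual 0x5123 (dir 0, table 5, offset 0x123) -> physical 0x7123 */
    return walk(0x5123) == 0x7123 ? 0 : 1;
}
```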
Large Pages
- The x86 implementation provides for skipping the second-level page
table entirely
- Directory entries include a size bit which, if set, indicates that the
entry points directly to a single large (4 MiB) page frame
Multilevel page tables
- We can support larger address spaces using more levels
- Modern 64-bit systems often use four-level page tables
(Figure: x86 PAE three-level page table)
Hashed Page Tables
- Alternative to multilevel page tables
- Use a hash data structure in place of a tree
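A sketch of the hashed approach (hash function and sizes illustrative):
entries live in buckets keyed by the VPN, with chaining for collisions,
instead of in a multilevel tree.

```c
/* Hashed page table: VPN-keyed hash buckets with collision chains. */
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 1024

struct hpte {
    uint64_t vpn, pfn;
    struct hpte *next;        /* collision chain */
};

static struct hpte *buckets[NBUCKETS];

static size_t hash(uint64_t vpn) { return (vpn * 2654435761u) % NBUCKETS; }

static void map(uint64_t vpn, uint64_t pfn) {
    struct hpte *e = malloc(sizeof *e);
    e->vpn = vpn; e->pfn = pfn;
    e->next = buckets[hash(vpn)];           /* push onto the chain */
    buckets[hash(vpn)] = e;
}

/* Returns the frame, or -1 for an unmapped page. */
static int64_t lookup(uint64_t vpn) {
    for (struct hpte *e = buckets[hash(vpn)]; e; e = e->next)
        if (e->vpn == vpn) return (int64_t)e->pfn;
    return -1;
}

int main(void) {
    map(0x12345, 0x88);
    return lookup(0x12345) == 0x88 ? 0 : 1;
}
```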