Virtual Memory in Linux

Last modified : 3 August, 2017

The Virtual Memory subsystem in Linux is extremely interesting. Here’s my understanding of it. I’m sure I’m still missing plenty, so if you spot something, please leave me a comment. I’d greatly appreciate it.

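Let’s start with the view of the memory address space that every process has. On a 64-bit OS, a process thinks it can address memory from 0x0000000000000000 to 0xFFFFFFFFFFFFFFFF. (In practice, x86-64 Linux with 4-level page tables gives user space only the lower 47 bits, which is why every row in the pmap output below, apart from [vsyscall], sits under 0x00007FFFFFFFFFFF.)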
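To put some numbers on that range, here is a quick bit of arithmetic in Python. The 47-bit figure is my assumption about x86-64 Linux with 4-level page tables; other architectures (and 5-level paging) will differ.

```python
# Size of a full 64-bit address space versus the user-space part.
# Assumption: user space gets 47 bits (x86-64 Linux, 4-level paging).
FULL_BITS = 64
USER_BITS = 47


def human(nbytes):
    """Format a byte count, treating 1 KB as 1024 bytes."""
    units = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]
    i = 0
    while nbytes >= 1024 and i < len(units) - 1:
        nbytes //= 1024
        i += 1
    return f"{nbytes} {units[i]}"


full_size = 2 ** FULL_BITS
user_size = 2 ** USER_BITS
user_top = user_size - 1  # highest user-space address under the assumption

print(human(full_size))  # 16 EB
print(human(user_size))  # 128 TB
print(hex(user_top))     # 0x7fffffffffff
```

Either way, it is far more than any machine has in RAM, which is the whole point of the address space being virtual.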

Process Virtual Memory

Let’s not worry about what the process does with this address space, except briefly. Some applications will choose to “grow the stack downwards and heap upwards.” (Interestingly, I recently became aware of the Stack Clash vulnerability.) I suspect (although I’m not sure) that different file formats (e.g. ELF) cause the program to be loaded differently. In the end, the operating system points the instruction pointer (program counter) at the program’s entry point and hands off control. What the instructions do after that point is totally up to the application. So long as it uses the OS system calls properly and doesn’t try to run privileged instructions, it can go on its merry way. A JVM process, for example, may partition the address space into different regions for the permanent generation, heap, stack, etc.

So what happened when Linux so generously allocated this 64-bit address space to the process? Very little, actually. The kernel created a new process and assigned it a new page table. You can see the mappings the process currently has using pmap. For example, the following is the pmap output for a top process on my machine. The -XX option tabulates everything the kernel reports in the /proc/<pid>/smaps file.

$ pmap -XX 49910
49910:   top
         Address Perm   Offset Device   Inode   Size  Rss  Pss Shared_Clean Shared_Dirty Private_Clean Private_Dirty Referenced Anonymous AnonHugePages ShmemPmdMapped Shared_Hugetlb Private_Hugetlb Swap SwapPss KernelPageSize MMUPageSize Locked                   VmFlagsMapping
    55cde8d1a000 r-xp 00000000  fd:01 5899811     96   96   96            0            0            96             0         96         0             0              0              0               0    0       0              4           4      0    rd ex mr mw me dw sd  top
    55cde8f32000 r--p 00018000  fd:01 5899811      4    4    4            0            0             0             4          4         4             0              0              0               0    0       0              4           4      0    rd mr mw me dw ac sd  top
    55cde8f33000 rw-p 00019000  fd:01 5899811      4    4    4            0            0             0             4          4         4             0              0              0               0    0       0              4           4      0 rd wr mr mw me dw ac sd  top
    55cde8f34000 rw-p 00000000  00:00       0    160   36   36            0            0             0            36         36        36             0              0              0               0    0       0              4           4      0    rd wr mr mw me ac sd  
    55cdeae5e000 rw-p 00000000  00:00       0   1096  984  984            0            0             0           984        984       984             0              0              0               0    0       0              4           4      0    rd wr mr mw me ac sd  [heap]
    7fd7b2efd000 r-xp 00000000  fd:01 5904102     44   44    0           44            0             0             0         44         0             0              0              0               0    0       0              4           4      0       rd ex mr mw me sd  libnss_files-2.25.so
    7fd7b2f08000 ---p 0000b000  fd:01 5904102   2044    0    0            0            0             0             0          0         0             0              0              0               0    0       0              4           4      0             mr mw me sd  libnss_files-2.25.so
    7fd7b3107000 r--p 0000a000  fd:01 5904102      4    4    4            0            0             0             4          4         4             0              0              0               0    0       0              4           4      0       rd mr mw me ac sd  libnss_files-2.25.so
    7fd7baf82000 ---p 00012000  fd:01 5906405   2048    0    0            0            0             0             0          0         0             0              0              0               0    0       0              4           4      0             mr mw me sd  libgpg-error.so.0.20.0
    7fd7bb183000 rw-p 00013000  fd:01 5906405      4    4    4            0            0             0             4          4         4             0              0              0               0    0       0              4           4      0    rd wr mr mw me ac sd  libgpg-error.so.0.20.0
    7fd7bb48b000 rw-p 00107000  fd:01 5905989     24   24   24            0            0             0            24         24        24             0              0              0               0    0       0              4           4      0    rd wr mr mw me ac sd  libgcrypt.so.20.1.8
    7fd7bb6a4000 r--p 00012000  fd:01 5905318      4    4    4            0            0             0             4          4         4             0              0              0               0    0       0              4           4      0       rd mr mw me ac sd  liblz4.so.1.7.5
    7fd7bbf01000 r-xp 00000000  fd:01 5904106     88   60    0           60            0             0             0         60         0             0              0              0               0    0       0              4           4      0       rd ex mr mw me sd  libresolv-2.25.so
    7fd7bbf17000 ---p 00016000  fd:01 5904106   2044    0    0            0            0             0             0          0         0             0              0              0               0    0       0              4           4      0             mr mw me sd  libresolv-2.25.so
....
.....
.......
........
    7fd7bcd6a000 r-xp 00000000  fd:01 5904081    156  152    0          152            0             0             0        152         0             0              0              0               0    0       0              4           4      0    rd ex mr mw me dw sd  ld-2.25.so
    7fd7bce78000 rw-p 00000000  00:00       0    448  188  188            0            0             0           188        188       188             0              0              0               0    0       0              4           4      0    rd wr mr mw me ac sd  
    7fd7bcee8000 r-xp 00000000  fd:01 5902146    532   64    0           64            0             0             0         64         0             0              0              0               0    0       0              4           4      0       rd ex mr mw me sd  libsystemd.so.0.18.0
    7fd7bcf6d000 ---p 00085000  fd:01 5902146      4    0    0            0            0             0             0          0         0             0              0              0               0    0       0              4           4      0             mr mw me sd  libsystemd.so.0.18.0
    7fd7bcf72000 rw-p 00000000  00:00       0      4    0    0            0            0             0             0          0         0             0              0              0               0    0       0              4           4      0    rd wr mr mw me ac sd  
    7fd7bcf8e000 rw-p 00000000  00:00       0      8    8    8            0            0             0             8          8         8             0              0              0               0    0       0              4           4      0    rd wr mr mw me ac sd  
    7fd7bcf90000 r--p 00026000  fd:01 5904081      4    4    4            0            0             0             4          4         4             0              0              0               0    0       0              4           4      0    rd mr mw me dw ac sd  ld-2.25.so
    7fd7bcf91000 rw-p 00027000  fd:01 5904081      8    8    8            0            0             0             8          8         8             0              0              0               0    0       0              4           4      0 rd wr mr mw me dw ac sd  ld-2.25.so
    7ffd2004d000 rw-p 00000000  00:00       0    136   28   28            0            0             0            28         28        28             0              0              0               0    0       0              4           4      0    rd wr mr mw me gd ac  [stack]
    7ffd20134000 r--p 00000000  00:00       0      8    0    0            0            0             0             0          0         0             0              0              0               0    0       0              4           4      0    rd mr pf io de dd sd  [vvar]
    7ffd20136000 r-xp 00000000  00:00       0      8    4    0            4            0             0             0          4         0             0              0              0               0    0       0              4           4      0    rd ex mr mw me de sd  [vdso]
ffffffffff600000 r-xp 00000000  00:00       0      4    0    0            0            0             0             0          0         0             0              0              0               0    0       0              4           4      0                   rd ex  [vsyscall]
                                              ====== ==== ==== ============ ============ ============= ============= ========== ========= ============= ============== ============== =============== ==== ======= ============== =========== ====== 
                                              164924 5340 1782         3656            0           160          1524       5340      1524             0              0              0               0    0       0            416         416      0 KB 
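pmap gets its data from /proc/<pid>/smaps. The simpler sibling file /proc/<pid>/maps has one line per mapping, and a rough sketch of parsing it in Python looks like the following. The sample line is reconstructed from the first row of the output above: the end address (start + 96 KB) and the path /usr/bin/top are my assumptions, since pmap only prints the start address and the file name.

```python
from collections import namedtuple

Mapping = namedtuple("Mapping", "start end perms offset device inode path")


def parse_maps_line(line):
    """Parse one line of /proc/<pid>/maps into a Mapping."""
    fields = line.split(None, 5)  # the path, if present, may contain spaces
    addr, perms, offset, device, inode = fields[:5]
    path = fields[5].strip() if len(fields) > 5 else ""
    start, end = (int(x, 16) for x in addr.split("-"))
    return Mapping(start, end, perms, int(offset, 16), device, int(inode), path)


def size_kb(mapping):
    """Size of the mapped address range in KB (1 KB = 1024 bytes)."""
    return (mapping.end - mapping.start) // 1024


# Assumed sample line, see the note above.
SAMPLE = "55cde8d1a000-55cde8d32000 r-xp 00000000 fd:01 5899811    /usr/bin/top"

m = parse_maps_line(SAMPLE)
print(m.perms, size_kb(m), m.path)

# On a Linux box you could feed it the live file instead:
#   with open("/proc/self/maps") as f:
#       for line in f:
#           print(parse_maps_line(line))
```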

Here is my understanding of the different columns:

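  1. Address : The virtual address at which the mapping starts.
  2. Perm : Whether this process is allowed to read (r) / write (w) / execute (x) instructions from these pages. The last letter is p if changes are private to the process (copy-on-write) and s if they are shared.
  3. Offset : For a file-backed mapping, the offset into the file at which the mapping begins.
  4. Device : The major:minor number of the device holding the mapped file (00:00 for anonymous mappings).
  5. Inode : The inode number of the mapped file (0 for anonymous mappings).
  6. Size : The size of the mapping in Kilobytes (no I am not going to call 1024 bytes Kibibytes)
  7. Rss : Resident set size. The amount of this mapping that is actually in RAM.
  8. Pss : Proportional set size. Like Rss, but each shared page is divided by the number of processes sharing it.
  9. Shared / Private : How much of the resident memory is shared with other processes and how much is private to this one.
  10. Clean/Dirty : Whether those pages have been modified since they were mapped.
  11. HugePages : https://wiki.debian.org/Hugepages . Pages much bigger than 4kb. (To reduce the size of the page table)
  12. Swap : How much of the mapping has been swapped out to disk.
  13. Anonymous : How much of the mapping is anonymous memory, i.e. not backed by a file.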
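The difference between Rss and Pss is easiest to see with a little arithmetic. This is only a simplified sketch (the kernel does the accounting page by page), and the numbers are made up for illustration.

```python
def rss_kb(private_kb, shared):
    """Resident set size: every resident page counts in full.

    shared is a list of (size_kb, number_of_processes_sharing_it).
    """
    return private_kb + sum(size for size, _ in shared)


def pss_kb(private_kb, shared):
    """Proportional set size: shared memory is split evenly among its sharers."""
    return private_kb + sum(size / n for size, n in shared)


# Hypothetical process: 100 KB of private pages plus a 44 KB library
# that is also mapped by three other processes (4 sharers in total).
shared = [(44, 4)]
print(rss_kb(100, shared))  # 144
print(pss_kb(100, shared))  # 111.0
```

This is why adding up Rss across processes overstates memory use, while adding up Pss does not double-count the shared libraries.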

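I’m not sure I understand how to interpret the table fully, but thankfully I’ve never needed to. As far as I can tell, each row describes one mapping, i.e. a group of contiguous pages with the same permissions and backing file, rather than a single page. [heap] and [stack] are the kernel’s labels for the mappings that hold the process’s heap and its main stack. Toodly-do.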

Logical View

Let’s take a step back and try to understand what the different kinds of pages may be.

[Figure: Types of Pages]

Here PageA is a shared page. Shared pages, although mapped into the virtual memory of several processes, take up only 1 page of RAM. Linux also has Copy-On-Write, so a page that is never modified is never copied.

PageB, although allocated by the Linux Kernel in the address space of the process, has never been used. So no RAM is actually being used for it.

PageC is a private page and is allocated in RAM.
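The PageB case can be observed from user space. The sketch below maps 32 MB of anonymous memory and reads VmRSS from /proc/self/status before and after writing to it. It assumes Linux; on anything without /proc the helper just returns None. I’m deliberately not quoting exact numbers because they vary from run to run.

```python
import mmap

SIZE = 32 * 1024 * 1024  # 32 MB of address space


def rss_kb():
    """Return this process's resident set size in KB, or None without /proc."""
    try:
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except OSError:
        pass
    return None


def measure():
    before = rss_kb()
    # fileno -1 asks for an anonymous (not file-backed) mapping.
    region = mmap.mmap(-1, SIZE, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    mapped = rss_kb()  # address space handed out, nothing touched yet
    for off in range(0, SIZE, mmap.PAGESIZE):
        region[off] = 1  # first write to each page makes the kernel back it
    touched = rss_kb()
    region.close()
    return before, mapped, touched


before, mapped, touched = measure()
print("before mmap:", before, "KB")
print("after mmap: ", mapped, "KB")
print("after touch:", touched, "KB")
```

On my understanding, the second number should be about the same as the first, and only the third should jump by roughly the size of the mapping.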

Swapping

Now let’s say you turn on swap. At this point Linux has the luxury of mapping some pages in the process’s address space to disk instead of physical memory. It may choose to do this based on several criteria. Maybe there isn’t a whole lot of RAM left. Maybe PageC hasn’t been used in a while. Then,

[Figure: Swapped Page]

The kernel has moved PageC to disk. Of course, if the process tries to use this page (read or write), this causes a page fault and makes the Linux Kernel move PageC back into RAM (since RAM is the only memory directly addressable by the CPU).
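You can watch page faults being counted with getrusage. As I understand it, bringing a page back from swap shows up as a major fault, while the first touch of a fresh anonymous page (what the sketch below does) shows up as a minor fault because no disk access is needed. The 16 MB size is arbitrary, and the counts depend on the kernel and on whether huge pages are in play.

```python
import mmap
import resource


def fault_counts():
    """Return (minor, major) page fault counts for this process so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt, usage.ru_majflt


def faults_from_touching(size=16 * 1024 * 1024):
    """Touch every page of a fresh anonymous mapping and report new faults."""
    region = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    minor0, major0 = fault_counts()
    for off in range(0, size, mmap.PAGESIZE):
        region[off] = 1  # each first write may trigger a page fault
    minor1, major1 = fault_counts()
    region.close()
    return minor1 - minor0, major1 - major0


minor, major = faults_from_touching()
print("minor faults:", minor, "major faults:", major)
```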

No Swapping (or swap is full)

If you turn swap off with swapoff, a process can still be handed virtual pages, but as soon as it writes to one, space must be found in RAM. Once the pages in use across all processes outgrow RAM and the kernel cannot reclaim anything more, the Linux OOM Killer kicks in. It picks its victim using each process’s oom_score. I know this can lead to some pretty important processes getting shot (including sshd). I’m not sure what would happen if a single process tried to use more of its virtual memory than there is RAM with swap off.
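The scores the OOM Killer looks at are exposed under /proc. Here is a small sketch that reads them for a process. The function name and the proc_root parameter are my own inventions for illustration; the files /proc/<pid>/oom_score and /proc/<pid>/oom_score_adj are the real thing on Linux.

```python
import os


def read_oom_info(pid="self", proc_root="/proc"):
    """Return oom_score and oom_score_adj for a pid (None where unreadable)."""
    info = {}
    for name in ("oom_score", "oom_score_adj"):
        path = os.path.join(proc_root, str(pid), name)
        try:
            with open(path) as f:
                info[name] = int(f.read().strip())
        except (OSError, ValueError):
            info[name] = None  # not Linux, no such pid, or no permission
    return info


print(read_oom_info())
```

A higher oom_score makes a process a more likely victim. oom_score_adj ranges from -1000 to 1000, and setting it to -1000 exempts the process from the OOM Killer altogether.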

You can stop a process from asking for too much virtual memory by setting ulimits:

$ ulimit -v
unlimited
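ulimit -v corresponds to the RLIMIT_AS resource limit, which a process can also set on itself. The sketch below does that in a child Python process so the cap doesn’t affect the parent. The 1 GB cap and the 2 GB request are arbitrary numbers I picked for illustration.

```python
import subprocess
import sys

# Code run in the child: cap the address space, then ask for more than the cap.
CHILD = r"""
import mmap, resource
limit = 1 << 30                                   # 1 GB cap on address space
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
if hard != resource.RLIM_INFINITY:
    limit = min(limit, hard)                      # soft limit cannot exceed hard
resource.setrlimit(resource.RLIMIT_AS, (limit, hard))
try:
    mmap.mmap(-1, 2 << 30)                        # ask for 2 GB, never touch it
    print("granted")
except (OSError, MemoryError):
    print("refused")
"""


def try_capped_allocation():
    """Run the child and return what it printed ('granted' or 'refused')."""
    out = subprocess.run([sys.executable, "-c", CHILD],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()


result = try_capped_allocation()
print(result)
```

On Linux I would expect the request to be refused even though no page is ever written to, because the limit applies to address space rather than to RAM actually used.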

In addition, there are a bunch of things one can control using the proc file system (although I try not to because I don’t know enough). I’ve used drop_caches and swappiness.

$ ls /proc/sys/vm
admin_reserve_kbytes         dirty_expire_centisecs      hugetlb_shm_group          min_free_kbytes       nr_hugepages_mempolicy    overcommit_memory         swappiness
block_dump                   dirty_ratio                 laptop_mode                min_slab_ratio        nr_overcommit_hugepages   overcommit_ratio          user_reserve_kbytes
compact_memory               dirtytime_expire_seconds    legacy_va_layout           min_unmapped_ratio    nr_pdflush_threads        page-cluster              vfs_cache_pressure
compact_unevictable_allowed  dirty_writeback_centisecs   lowmem_reserve_ratio       mmap_min_addr         numa_zonelist_order       panic_on_oom              watermark_scale_factor
dirty_background_bytes       drop_caches                 max_map_count              mmap_rnd_bits         oom_dump_tasks            percpu_pagelist_fraction  zone_reclaim_mode
dirty_background_ratio       extfrag_threshold           memory_failure_early_kill  mmap_rnd_compat_bits  oom_kill_allocating_task  stat_interval
dirty_bytes                  hugepages_treat_as_movable  memory_failure_recovery    nr_hugepages          overcommit_kbytes         stat_refresh
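Each of these entries is a small text file, so reading one is trivial. The helper below is a sketch with a made-up name (read_vm_tunable); it only reads, since writing generally needs root and I’d rather not encourage that here.

```python
import os


def read_vm_tunable(name, root="/proc/sys/vm"):
    """Return the current value of a vm tunable as a string, or None."""
    try:
        with open(os.path.join(root, name)) as f:
            return f.read().strip()
    except OSError:
        return None  # not Linux, unknown tunable, or not readable


for name in ("swappiness", "overcommit_memory", "overcommit_ratio"):
    print(name, "=", read_vm_tunable(name))
```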

This is all in addition to memory-mapped files and caches, which are incredible pieces of engineering deserving their own post.

Resources I found useful:

  1. Kernel book
  2. TLDP

All content on this website is licensed as Creative Commons-Attribution-ShareAlike 4.0 License. Opinions expressed are solely my own.