What happens when a JAVA program writes a file

Last modified : 9 October, 2017

Let’s try to find out to the greatest detail, what happens when a file is written. Technically the end product is a bunch of rust particles magnetized in some orientation on a platter. We’ll see how far this rabbit hole we can go.

Here’s the program. I type it into my favorite IDE, and store it on disk (inception much?) encoded as 7-bit ASCII :

$ cat FileWrite.java
import java.io.*;

class FileWrite {
	public static void main(String args[]) throws IOException {
		File f = new File("/tmp/file.txt");
		FileWriter fw = new FileWriter(f);
		fw.write("Hello World!");
		fw.close();
	}
}
$ enca -L none FileWrite.java
7bit ASCII characters
$ javac FileWrite.java

Here’s what the disassembly shows:

$ javap -c -verbose FileWrite.class
Classfile /tmp/FileWrite.class
  Last modified Oct 9, 2017; size 535 bytes
  MD5 checksum 0797c47f70a0af662d45249c20e5b75b
  Compiled from "FileWrite.java"
class FileWrite
  minor version: 0
  major version: 52
  flags: ACC_SUPER
Constant pool:
   #1 = Methodref          #11.#22        // java/lang/Object."<init>":()V
   #2 = Class              #23            // java/io/File
   #3 = String             #24            // /tmp/file.txt
   #4 = Methodref          #2.#25         // java/io/File."<init>":(Ljava/lang/String;)V
   #5 = Class              #26            // java/io/FileWriter
   #6 = Methodref          #5.#27         // java/io/FileWriter."<init>":(Ljava/io/File;)V
   #7 = String             #28            // Hello World!
   #8 = Methodref          #5.#29         // java/io/FileWriter.write:(Ljava/lang/String;)V
   #9 = Methodref          #5.#30         // java/io/FileWriter.close:()V
  #10 = Class              #31            // FileWrite
  #11 = Class              #32            // java/lang/Object
  #12 = Utf8               <init>
  #13 = Utf8               ()V
  #14 = Utf8               Code
  #15 = Utf8               LineNumberTable
  #16 = Utf8               main
  #17 = Utf8               ([Ljava/lang/String;)V
  #18 = Utf8               Exceptions
  #19 = Class              #33            // java/io/IOException
  #20 = Utf8               SourceFile
  #21 = Utf8               FileWrite.java
  #22 = NameAndType        #12:#13        // "<init>":()V
  #23 = Utf8               java/io/File
  #24 = Utf8               /tmp/file.txt
  #25 = NameAndType        #12:#34        // "<init>":(Ljava/lang/String;)V
  #26 = Utf8               java/io/FileWriter
  #27 = NameAndType        #12:#35        // "<init>":(Ljava/io/File;)V
  #28 = Utf8               Hello World!
  #29 = NameAndType        #36:#34        // write:(Ljava/lang/String;)V
  #30 = NameAndType        #37:#13        // close:()V
  #31 = Utf8               FileWrite
  #32 = Utf8               java/lang/Object
  #33 = Utf8               java/io/IOException
  #34 = Utf8               (Ljava/lang/String;)V
  #35 = Utf8               (Ljava/io/File;)V
  #36 = Utf8               write
  #37 = Utf8               close
{
  FileWrite();
    descriptor: ()V
    flags:
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1                  // Method java/lang/Object."<init>":()V
         4: return
      LineNumberTable:
        line 3: 0

  public static void main(java.lang.String[]) throws java.io.IOException;
    descriptor: ([Ljava/lang/String;)V
    flags: ACC_PUBLIC, ACC_STATIC
    Code:
      stack=3, locals=3, args_size=1
         0: new           #2                  // class java/io/File
         3: dup
         4: ldc           #3                  // String /tmp/file.txt
         6: invokespecial #4                  // Method java/io/File."<init>":(Ljava/lang/String;)V
         9: astore_1
        10: new           #5                  // class java/io/FileWriter
        13: dup
        14: aload_1
        15: invokespecial #6                  // Method java/io/FileWriter."<init>":(Ljava/io/File;)V
        18: astore_2
        19: aload_2
        20: ldc           #7                  // String Hello World!
        22: invokevirtual #8                  // Method java/io/FileWriter.write:(Ljava/lang/String;)V
        25: aload_2
        26: invokevirtual #9                  // Method java/io/FileWriter.close:()V
        29: return
      LineNumberTable:
        line 5: 0
        line 6: 10
        line 7: 19
        line 8: 25
        line 9: 29
    Exceptions:
      throws java.io.IOException
}
SourceFile: "FileWrite.java"

Understandably, the disassembled code in the main() method is quite simple since the java.io.* classes are doing most of the work. The JAVA Virtual Machine Specification details all the instructions. e.g.

         0: new           #2                  // class java/io/File

new: “Create new object”. #2 refers to the Constant pool, which the disassembler helpfully looks up to be a class java/io/File

         4: ldc           #3                  // String /tmp/file.txt

ldc: “Push item from run-time constant pool” .

         6: invokespecial #4                  // Method java/io/File."<init>":(Ljava/lang/String;)V

invokespecial: “Invoke instance method; special handling for superclass, private, and instance initialization method invocations” .

To understand what each of these instructions actually do, we’ll have to dig through JVM source. e.g. for the invokespecial bytecode, I’m guessing the source for the JVM interpreter is here or here. I tried installing java with debug symbols and then running jdb inside of gdb (inception is an awesome movie) but I ran out of patience to learn both CLI tools. I’ll punt on that for another blog post.

$ dnf debuginfo-install java-1.8.0-openjdk-devel-debug-1.8.0.144-5.b01.fc26.x86_64 
$ dnf debuginfo-install libgcc-7.2.1-2.fc26.x86_64 libstdc++-7.2.1-2.fc26.x86_64 sssd-client-1.15.3-4.fc26.x86_64 zlib-1.2.11-2.fc26.x86_64

If I insert a Thread.sleep() just before the fw.close(), I’m able to run pmap on the process. Excerpted:

$ pmap -XX 31191
31191:   java FileWrite
         Address Perm   Offset Device    Inode    Size   Rss   Pss Shared_Clean Shared_Dirty Private_Clean Private_Dirty Referenced Anonymous LazyFree AnonHugePages ShmemPmdMapped Shared_Hugetlb Private_Hugetlb Swap SwapPss KernelPageSize MMUPageSize Locked                   VmFlagsMapping
        83400000 rw-p 00000000  00:00        0   86016     0     0            0            0             0             0          0         0        0             0              0              0               0    0       0              4           4      0    rd wr mr mw me ac sd
        88800000 ---p 00000000  00:00        0 1276928     0     0            0            0             0             0          0         0        0             0              0              0               0    0       0              4           4      0          mr mw me nr sd
        d6700000 rw-p 00000000  00:00        0   43008   408   408            0            0             0           408        408       408        0             0              0              0               0    0       0              4           4      0    rd wr mr mw me ac sd
        d9100000 ---p 00000000  00:00        0  637952     0     0            0            0             0             0          0         0        0             0              0              0               0    0       0              4           4      0          mr mw me nr sd
       100000000 rw-p 00000000  00:00        0     512   264   264            0            0             0           264        264       264        0             0              0              0               0    0       0              4           4      0    rd wr mr mw me ac sd
       100080000 ---p 00000000  00:00        0 1048064     0     0            0            0             0             0          0         0        0             0              0              0               0    0       0              4           4      0          mr mw me nr sd
    55598aa39000 r-xp 00000000  fd:01    12216       4     4     4            0            0             4             0          4         0        0             0              0              0               0    0       0              4           4      0    rd ex mr mw me dw sd  java
    55598ac39000 r--p 00000000  fd:01    12216       4     4     4            0            0             0             4          4         4        0             0              0              0               0    0       0              4           4      0    rd mr mw me dw ac sd  java
    55598ac3a000 rw-p 00001000  fd:01    12216       4     4     4            0            0             0             4          4         4        0             0              0              0               0    0       0              4           4      0 rd wr mr mw me dw ac sd  java
    55598b9d7000 rw-p 00000000  00:00        0     132    20    20            0            0             0            20         20        20        0             0              0              0               0    0       0              4           4      0    rd wr mr mw me ac sd  [heap]
    7f401c000000 rw-p 00000000  00:00        0     132     4     4            0            0             0             4          4         4        0             0              0              0               0    0       0              4           4      0    rd wr mr mw me nr sd
    7f401c021000 ---p 00000000  00:00        0   65404     0     0            0            0             0             0          0         0        0             0              0              0               0    0       0              4           4      0          mr mw me nr sd
    7f4024000000 rw-p 00000000  00:00        0     132    52    52            0            0             0            52         52        52        0             0              0              0               0    0       0              4           4      0    rd wr mr mw me nr sd
..........
..........
    7f40397d4000 r--p 00000000  fd:01  3024762  106672   440    42          440            0             0             0        440         0        0             0              0              0               0    0       0              4           4      0          rd mr mw me sd  locale-archive
    7f4040000000 rw-p 00000000  00:00        0     132    28    28            0            0             0            28         28        28        0             0              0              0               0    0       0              4           4      0    rd wr mr mw me nr sd
    7f4040021000 ---p 00000000  00:00        0   65404     0     0            0            0             0             0          0         0        0             0              0              0               0    0       0              4           4      0          mr mw me nr sd
    7f4044000000 rw-p 00000000  00:00        0     132     4     4            0            0             0             4          4         4        0             0              0              0               0    0       0              4           4      0    rd wr mr mw me nr sd
    7f408801b000 rw-p 00000000  00:00        0   11120   368   368            0            0             0           368        368       368        0             0              0              0               0    0       0              4           4      0    rd wr mr mw me ac sd
.........
.........
    7f4088af7000 r--s 043f2000  fd:01    12302    1852  1852  1852            0            0          1852             0       1852         0        0             0              0              0               0    0       0              4           4      0          rd mr me ms sd  rt.jar
    7f4088cc6000 rw-p 00000000  00:00        0    4824  2928  2928            0            0             0          2928       2928      2928        0             0              0              0               0    0       0              4           4      0    rd wr mr mw me ac sd
    7f408917c000 ---p 00000000  00:00        0    3840     0     0            0            0             0             0          0         0        0             0              0              0               0    0       0              4           4      0          mr mw me nr sd
    7f408953c000 rw-p 00000000  00:00        0    3992     0     0            0            0             0             0          0         0        0             0              0              0               0    0       0              4           4      0    rd wr mr mw me ac sd
    7f408a5b3000 ---p 00000000  00:00        0    3800     0     0            0            0             0             0          0         0        0             0              0              0               0    0       0              4           4      0          mr mw me nr sd
.........
    7f408a969000 r-xp 00000000  fd:01    12261      32    32    11           32            0             0             0         32         0        0             0              0              0               0    0       0              4           4      0       rd ex mr mw me sd  libzip.so
.........
    7f408ad7d000 rw-p 0000b000  fd:01  3026726       4     4     4            0            0             0             4          4         4        0             0              0              0               0    0       0              4           4      0    rd wr mr mw me ac sd  libnss_files-2.25.so
.........
    7f408afb2000 rw-p 0002e000  fd:01    12243       4     4     4            0            0             0             4          4         4        0             0              0              0               0    0       0              4           4      0    rd wr mr mw me ac sd  libjava.so
.........
    7f408b1c0000 r--p 0000c000  fd:01    12260       8     8     8            0            0             0             8          8         8        0             0              0              0               0    0       0              4           4      0       rd mr mw me ac sd  libverify.so
.........
    7f408b3ca000 rw-p 00007000  fd:01  3026732       4     4     4            0            0             0             4          4         4        0             0              0              0               0    0       0              4           4      0    rd wr mr mw me ac sd  librt-2.25.so
.........
    7f408b3cb000 r-xp 00000000  fd:01  3015258      88    80     8           80            0             0             0         80         0        0             0              0              0               0    0       0              4           4      0       rd ex mr mw me sd  libgcc_s-7-20170622.so.1
.........
    7f408b5e2000 r-xp 00000000  fd:01  3026718    1108   424    48          424            0             0             0        424         0        0             0              0              0               0    0       0              4           4      0       rd ex mr mw me sd  libm-2.25.so
.........
    7f408bc7b000 rw-p 00183000  fd:01  3015480       8     8     8            0            0             0             8          8         8        0             0              0              0               0    0       0              4           4      0    rd wr mr mw me ac sd  libstdc++.so.6.0.24
.........
    7f408cc7e000 rw-p 00dfe000  fd:01    12265     164   164   164            0            0             0           164        164       164        0             0              0              0               0    0       0              4           4      0    rd wr mr mw me ac sd  libjvm.so
.........
    7f408d0a2000 rw-p 001cb000  fd:01  3024787       8     8     8            0            0             0             8          8         8        0             0              0              0               0    0       0              4           4      0    rd wr mr mw me ac sd  libc-2.25.so
.........
    7f408d2ab000 rw-p 00003000  fd:01  3026716       4     4     4            0            0             0             4          4         4        0             0              0              0               0    0       0              4           4      0    rd wr mr mw me ac sd  libdl-2.25.so
.........
    7f408d2ac000 r-xp 00000000  fd:01    12229      56    56    56            0            0            56             0         56         0        0             0              0              0               0    0       0              4           4      0       rd ex mr mw me sd  libjli.so
.........
    7f408d6d0000 r--p 00015000  fd:01  3027525       4     4     4            0            0             0             4          4         4        0             0              0              0               0    0       0              4           4      0       rd mr mw me ac sd  libz.so.1.2.11
.........
    7f408d8ec000 rw-p 0001a000  fd:01  3026728       4     4     4            0            0             0             4          4         4        0             0              0              0               0    0       0              4           4      0    rd wr mr mw me ac sd  libpthread-2.25.so
.........
    7f408d8f1000 r-xp 00000000  fd:01  3026695     156   156     4          156            0             0             0        156         0        0             0              0              0               0    0       0              4           4      0    rd ex mr mw me dw sd  ld-2.25.so
.........
    7f408db0b000 rw-s 00000000  00:29 10797082      32    32    32            0            0             0            32         32         0        0             0              0              0               0    0       0              4           4      0 rd wr sh mr mw me ms sd  31191
.........
    7f408db13000 rw-p 00000000  00:00        0       4     4     4            0            0             0             4          4         4        0             0              0              0               0    0       0              4           4      0    rd wr mr mw me ac sd
    7f408db14000 r--p 00000000  00:00        0       4     0     0            0            0             0             0          0         0        0             0              0              0               0    0       0              4           4      0          rd mr mw me sd
    7f408db15000 rw-p 00000000  00:00        0       8     8     8            0            0             0             8          8         8        0             0              0              0               0    0       0              4           4      0    rd wr mr mw me ac sd
    7f408db17000 r--p 00026000  fd:01  3026695       4     4     4            0            0             0             4          4         4        0             0              0              0               0    0       0              4           4      0    rd mr mw me dw ac sd  ld-2.25.so
    7f408db18000 rw-p 00027000  fd:01  3026695       8     8     8            0            0             0             8          8         8        0             0              0              0               0    0       0              4           4      0 rd wr mr mw me dw ac sd  ld-2.25.so
    7ffc10fa6000 rw-p 00000000  00:00        0     136    36    36            0            0             0            36         36        36        0             0              0              0               0    0       0              4           4      0    rd wr mr mw me gd ac  [stack]
    7ffc10fdf000 r--p 00000000  00:00        0      12     0     0            0            0             0             0          0         0        0             0              0              0               0    0       0              4           4      0    rd mr pf io de dd sd  [vvar]
    7ffc10fe2000 r-xp 00000000  00:00        0       8     4     0            4            0             0             0          4         0        0             0              0              0               0    0       0              4           4      0    rd ex mr mw me de sd  [vdso]
ffffffffff600000 r-xp 00000000  00:00        0       4     0     0            0            0             0             0          0         0        0             0              0              0               0    0       0              4           4      0                   rd ex  [vsyscall]
                                               ======= ===== ===== ============ ============ ============= ============= ========== ========= ======== ============= ============== ============== =============== ==== ======= ============== =========== ======
                                               4671532 24612 16349        10856            0          5288          8468      24612      8436        0             0              0              0               0    0       0            656         656      0 KB

I found this interesting:

    55598b9d7000 rw-p 00000000  00:00        0     132    20    20            0            0             0            20         20        20        0             0              0              0               0    0       0              4           4      0    rd wr mr mw me ac sd  [heap]
    7f401c000000 rw-p 00000000  00:00        0     132     4     4            0            0             0             4          4         4        0             0              0              0               0    0       0              4           4      0    rd wr mr mw me nr sd  

I wonder if the huge gap of addresses between the heap and the next page is intentional since the process may need to grow the heap. Its important to note that this is not the JVM heap. This annotation is present on pmaps of non-JVM processes just as well.

Let’s also look at the system calls made by the process. Here’s the trace I dumped to a file. Excerpted:

$ strace -f java FileWrite
....... Bunch of syscalls setting up the JVM
....... Lots of syscalls trying to open .so files which don't exist
....... Some mprotect() calls
....... A few clone() calls which I imagine are spawning different JVM system threads.
....... A stat for ".hotspotrc"
....... Reads /proc/self/maps 
....... A bunch of sched_getaffinity()
....... 
....... 
....... 
....... Only in the last 94% of the trace does the FileWrite.class file actually get loaded
....... In the last 99% of the trace (I kid you not)
.......
[pid 31727] open("/tmp/file.txt", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4
[pid 31727] fstat(4, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
[pid 31727] write(4, "Hello World!", 12) = 12
[pid 31727] lseek(3, 69527180, SEEK_SET) = 69527180
[pid 31727] read(3, "PK\3\4\n\0\0\10\0\0\310E\34K\26\310\316\254\226\2\0\0\226\2\0\0 \0\0\0", 30) = 30
[pid 31727] lseek(3, 69527242, SEEK_SET) = 69527242
[pid 31727] read(3, "\312\376\272\276\0\0\0004\0#\t\0\4\0\32\n\0\5\0\33\n\0\30\0\34\7\0\35\7\0\36\7"..., 662) = 662
[pid 31727] close(4)                    = 0
........
........ a few more system calls, and then everyone goes home.

I would like to descend into the system call code, but that’s obviously not possible using gdb or user-land tools. I’m going to punt on this too for another blog post.

Here’s what I already know about some of the system calls.

  1. The open() syscall is given a path. To resolve the path, Linux calls into the VFS (Virtual File System). The VFS maintains a dentry cache, but in case the inode is not present in the cache, the kernel delegates to the file system (which may be ext4, zfs, xfs, btrfs or fat etc.) to which the sub-path is mapped (usually from /etc/fstab).
  2. The file system in my case is usually ext4, which resolves the path by spinning the disk and reading metadata from wherever it was stored.
  3. Once an inode is found for the directory, for a create, a new inode must be created. Note that this inode may not be persistently written to disk immediately based on options which were used while loading the file system. ext4 writes to a journal.
  4. The write() syscall probably writes to the buffer cache. We have had a fair amount of fun with performance benchmarks because of this feature. At some point in the future, the ext4 filesystem promises to flush the buffer cache to the disk. An fsync() syscall can force the flushing of data for a file handle. fincore is a command that will tell you what files’ pages have been buffered. You can always ask Linux to flush to persistent storage using the command sync. Note that this is different from dropping page cache / dentry cache which is achieved by writing 1/2/3 to /proc/sys/vm/drop_caches .
  5. When the filesystem (ext4 in my case) wants to write to disk, it must do so in blocks using LBA. You can get the block device size using

    $ sudo blockdev --getbsz /dev/sda
    4096
    
  6. Disks are usually connected using SATA which allows the disk controllers to communicate with the disks. Disks themselves usually run ARM cores with millions of lines of code which cleverly schedule the block accesses, accumulate SMART statistics, do error detection and correction etc. In the case of SSDs, they even try wear leveling and other fancy stuff which is usually proprietary to disk manufacturers. Talk about dark rabbit holes ;-)

All content on this website is licensed as Creative Commons-Attribution-ShareAlike 4.0 License. Opinions expressed are solely my own and do not express the views or opinions of my employer.