What happens when a JAVA program writes a file
Last modified : 9 October, 2017
Let’s try to find out to the greatest detail, what happens when a file is written. Technically the end product is a bunch of rust particles magnetized in some orientation on a platter. We’ll see how far this rabbit hole we can go.
Here’s the program. I type it into my favorite IDE, and store it on disk (inception much?) encoded as 7-bit ASCII :
$ cat FileWrite.java
import java.io.*;
class FileWrite {
public static void main(String args[]) throws IOException {
File f = new File("/tmp/file.txt");
FileWriter fw = new FileWriter(f);
fw.write("Hello World!");
fw.close();
}
}
$ enca -L none FileWrite.java
7bit ASCII characters
$ javac FileWrite.java
Here’s what the disassembly shows:
$ javap -c -verbose FileWrite.class
Classfile /tmp/FileWrite.class
Last modified Oct 9, 2017; size 535 bytes
MD5 checksum 0797c47f70a0af662d45249c20e5b75b
Compiled from "FileWrite.java"
class FileWrite
minor version: 0
major version: 52
flags: ACC_SUPER
Constant pool:
#1 = Methodref #11.#22 // java/lang/Object."<init>":()V
#2 = Class #23 // java/io/File
#3 = String #24 // /tmp/file.txt
#4 = Methodref #2.#25 // java/io/File."<init>":(Ljava/lang/String;)V
#5 = Class #26 // java/io/FileWriter
#6 = Methodref #5.#27 // java/io/FileWriter."<init>":(Ljava/io/File;)V
#7 = String #28 // Hello World!
#8 = Methodref #5.#29 // java/io/FileWriter.write:(Ljava/lang/String;)V
#9 = Methodref #5.#30 // java/io/FileWriter.close:()V
#10 = Class #31 // FileWrite
#11 = Class #32 // java/lang/Object
#12 = Utf8 <init>
#13 = Utf8 ()V
#14 = Utf8 Code
#15 = Utf8 LineNumberTable
#16 = Utf8 main
#17 = Utf8 ([Ljava/lang/String;)V
#18 = Utf8 Exceptions
#19 = Class #33 // java/io/IOException
#20 = Utf8 SourceFile
#21 = Utf8 FileWrite.java
#22 = NameAndType #12:#13 // "<init>":()V
#23 = Utf8 java/io/File
#24 = Utf8 /tmp/file.txt
#25 = NameAndType #12:#34 // "<init>":(Ljava/lang/String;)V
#26 = Utf8 java/io/FileWriter
#27 = NameAndType #12:#35 // "<init>":(Ljava/io/File;)V
#28 = Utf8 Hello World!
#29 = NameAndType #36:#34 // write:(Ljava/lang/String;)V
#30 = NameAndType #37:#13 // close:()V
#31 = Utf8 FileWrite
#32 = Utf8 java/lang/Object
#33 = Utf8 java/io/IOException
#34 = Utf8 (Ljava/lang/String;)V
#35 = Utf8 (Ljava/io/File;)V
#36 = Utf8 write
#37 = Utf8 close
{
FileWrite();
descriptor: ()V
flags:
Code:
stack=1, locals=1, args_size=1
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
LineNumberTable:
line 3: 0
public static void main(java.lang.String[]) throws java.io.IOException;
descriptor: ([Ljava/lang/String;)V
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=3, locals=3, args_size=1
0: new #2 // class java/io/File
3: dup
4: ldc #3 // String /tmp/file.txt
6: invokespecial #4 // Method java/io/File."<init>":(Ljava/lang/String;)V
9: astore_1
10: new #5 // class java/io/FileWriter
13: dup
14: aload_1
15: invokespecial #6 // Method java/io/FileWriter."<init>":(Ljava/io/File;)V
18: astore_2
19: aload_2
20: ldc #7 // String Hello World!
22: invokevirtual #8 // Method java/io/FileWriter.write:(Ljava/lang/String;)V
25: aload_2
26: invokevirtual #9 // Method java/io/FileWriter.close:()V
29: return
LineNumberTable:
line 5: 0
line 6: 10
line 7: 19
line 8: 25
line 9: 29
Exceptions:
throws java.io.IOException
}
SourceFile: "FileWrite.java"
Understandably, the disassembled code in the main() method is quite simple since the java.io.* classes are doing most of the work. The JAVA Virtual Machine Specification details all the instructions. e.g.
0: new #2 // class java/io/File
new: “Create new object”. #2 refers to the Constant pool, which the disassembler helpfully looks up to be a class java/io/File
4: ldc #3 // String /tmp/file.txt
ldc: “Push item from run-time constant pool” .
6: invokespecial #4 // Method java/io/File."<init>":(Ljava/lang/String;)V
invokespecial: “Invoke instance method; special handling for superclass, private, and instance initialization method invocations” .
To understand what each of these instructions actually do, we’ll have to dig through JVM source. e.g. for the invokespecial bytecode, I’m guessing the source for the JVM interpreter is here or here. I tried installing java with debug symbols and then running jdb inside of gdb (inception is an awesome movie) but I ran out of patience to learn both CLI tools. I’ll punt on that for another blog post.
$ dnf debuginfo-install java-1.8.0-openjdk-devel-debug-1.8.0.144-5.b01.fc26.x86_64
$ dnf debuginfo-install libgcc-7.2.1-2.fc26.x86_64 libstdc++-7.2.1-2.fc26.x86_64 sssd-client-1.15.3-4.fc26.x86_64 zlib-1.2.11-2.fc26.x86_64
If I insert a Thread.sleep() just before the fw.close(), I’m able to run pmap on the process. Excerpted:
$ pmap -XX 31191
31191: java FileWrite
Address Perm Offset Device Inode Size Rss Pss Shared_Clean Shared_Dirty Private_Clean Private_Dirty Referenced Anonymous LazyFree AnonHugePages ShmemPmdMapped Shared_Hugetlb Private_Hugetlb Swap SwapPss KernelPageSize MMUPageSize Locked VmFlagsMapping
83400000 rw-p 00000000 00:00 0 86016 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 4 0 rd wr mr mw me ac sd
88800000 ---p 00000000 00:00 0 1276928 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 4 0 mr mw me nr sd
d6700000 rw-p 00000000 00:00 0 43008 408 408 0 0 0 408 408 408 0 0 0 0 0 0 0 4 4 0 rd wr mr mw me ac sd
d9100000 ---p 00000000 00:00 0 637952 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 4 0 mr mw me nr sd
100000000 rw-p 00000000 00:00 0 512 264 264 0 0 0 264 264 264 0 0 0 0 0 0 0 4 4 0 rd wr mr mw me ac sd
100080000 ---p 00000000 00:00 0 1048064 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 4 0 mr mw me nr sd
55598aa39000 r-xp 00000000 fd:01 12216 4 4 4 0 0 4 0 4 0 0 0 0 0 0 0 0 4 4 0 rd ex mr mw me dw sd java
55598ac39000 r--p 00000000 fd:01 12216 4 4 4 0 0 0 4 4 4 0 0 0 0 0 0 0 4 4 0 rd mr mw me dw ac sd java
55598ac3a000 rw-p 00001000 fd:01 12216 4 4 4 0 0 0 4 4 4 0 0 0 0 0 0 0 4 4 0 rd wr mr mw me dw ac sd java
55598b9d7000 rw-p 00000000 00:00 0 132 20 20 0 0 0 20 20 20 0 0 0 0 0 0 0 4 4 0 rd wr mr mw me ac sd [heap]
7f401c000000 rw-p 00000000 00:00 0 132 4 4 0 0 0 4 4 4 0 0 0 0 0 0 0 4 4 0 rd wr mr mw me nr sd
7f401c021000 ---p 00000000 00:00 0 65404 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 4 0 mr mw me nr sd
7f4024000000 rw-p 00000000 00:00 0 132 52 52 0 0 0 52 52 52 0 0 0 0 0 0 0 4 4 0 rd wr mr mw me nr sd
..........
..........
7f40397d4000 r--p 00000000 fd:01 3024762 106672 440 42 440 0 0 0 440 0 0 0 0 0 0 0 0 4 4 0 rd mr mw me sd locale-archive
7f4040000000 rw-p 00000000 00:00 0 132 28 28 0 0 0 28 28 28 0 0 0 0 0 0 0 4 4 0 rd wr mr mw me nr sd
7f4040021000 ---p 00000000 00:00 0 65404 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 4 0 mr mw me nr sd
7f4044000000 rw-p 00000000 00:00 0 132 4 4 0 0 0 4 4 4 0 0 0 0 0 0 0 4 4 0 rd wr mr mw me nr sd
7f408801b000 rw-p 00000000 00:00 0 11120 368 368 0 0 0 368 368 368 0 0 0 0 0 0 0 4 4 0 rd wr mr mw me ac sd
.........
.........
7f4088af7000 r--s 043f2000 fd:01 12302 1852 1852 1852 0 0 1852 0 1852 0 0 0 0 0 0 0 0 4 4 0 rd mr me ms sd rt.jar
7f4088cc6000 rw-p 00000000 00:00 0 4824 2928 2928 0 0 0 2928 2928 2928 0 0 0 0 0 0 0 4 4 0 rd wr mr mw me ac sd
7f408917c000 ---p 00000000 00:00 0 3840 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 4 0 mr mw me nr sd
7f408953c000 rw-p 00000000 00:00 0 3992 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 4 0 rd wr mr mw me ac sd
7f408a5b3000 ---p 00000000 00:00 0 3800 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 4 0 mr mw me nr sd
.........
7f408a969000 r-xp 00000000 fd:01 12261 32 32 11 32 0 0 0 32 0 0 0 0 0 0 0 0 4 4 0 rd ex mr mw me sd libzip.so
.........
7f408ad7d000 rw-p 0000b000 fd:01 3026726 4 4 4 0 0 0 4 4 4 0 0 0 0 0 0 0 4 4 0 rd wr mr mw me ac sd libnss_files-2.25.so
.........
7f408afb2000 rw-p 0002e000 fd:01 12243 4 4 4 0 0 0 4 4 4 0 0 0 0 0 0 0 4 4 0 rd wr mr mw me ac sd libjava.so
.........
7f408b1c0000 r--p 0000c000 fd:01 12260 8 8 8 0 0 0 8 8 8 0 0 0 0 0 0 0 4 4 0 rd mr mw me ac sd libverify.so
.........
7f408b3ca000 rw-p 00007000 fd:01 3026732 4 4 4 0 0 0 4 4 4 0 0 0 0 0 0 0 4 4 0 rd wr mr mw me ac sd librt-2.25.so
.........
7f408b3cb000 r-xp 00000000 fd:01 3015258 88 80 8 80 0 0 0 80 0 0 0 0 0 0 0 0 4 4 0 rd ex mr mw me sd libgcc_s-7-20170622.so.1
.........
7f408b5e2000 r-xp 00000000 fd:01 3026718 1108 424 48 424 0 0 0 424 0 0 0 0 0 0 0 0 4 4 0 rd ex mr mw me sd libm-2.25.so
.........
7f408bc7b000 rw-p 00183000 fd:01 3015480 8 8 8 0 0 0 8 8 8 0 0 0 0 0 0 0 4 4 0 rd wr mr mw me ac sd libstdc++.so.6.0.24
.........
7f408cc7e000 rw-p 00dfe000 fd:01 12265 164 164 164 0 0 0 164 164 164 0 0 0 0 0 0 0 4 4 0 rd wr mr mw me ac sd libjvm.so
.........
7f408d0a2000 rw-p 001cb000 fd:01 3024787 8 8 8 0 0 0 8 8 8 0 0 0 0 0 0 0 4 4 0 rd wr mr mw me ac sd libc-2.25.so
.........
7f408d2ab000 rw-p 00003000 fd:01 3026716 4 4 4 0 0 0 4 4 4 0 0 0 0 0 0 0 4 4 0 rd wr mr mw me ac sd libdl-2.25.so
.........
7f408d2ac000 r-xp 00000000 fd:01 12229 56 56 56 0 0 56 0 56 0 0 0 0 0 0 0 0 4 4 0 rd ex mr mw me sd libjli.so
.........
7f408d6d0000 r--p 00015000 fd:01 3027525 4 4 4 0 0 0 4 4 4 0 0 0 0 0 0 0 4 4 0 rd mr mw me ac sd libz.so.1.2.11
.........
7f408d8ec000 rw-p 0001a000 fd:01 3026728 4 4 4 0 0 0 4 4 4 0 0 0 0 0 0 0 4 4 0 rd wr mr mw me ac sd libpthread-2.25.so
.........
7f408d8f1000 r-xp 00000000 fd:01 3026695 156 156 4 156 0 0 0 156 0 0 0 0 0 0 0 0 4 4 0 rd ex mr mw me dw sd ld-2.25.so
.........
7f408db0b000 rw-s 00000000 00:29 10797082 32 32 32 0 0 0 32 32 0 0 0 0 0 0 0 0 4 4 0 rd wr sh mr mw me ms sd 31191
.........
7f408db13000 rw-p 00000000 00:00 0 4 4 4 0 0 0 4 4 4 0 0 0 0 0 0 0 4 4 0 rd wr mr mw me ac sd
7f408db14000 r--p 00000000 00:00 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 4 0 rd mr mw me sd
7f408db15000 rw-p 00000000 00:00 0 8 8 8 0 0 0 8 8 8 0 0 0 0 0 0 0 4 4 0 rd wr mr mw me ac sd
7f408db17000 r--p 00026000 fd:01 3026695 4 4 4 0 0 0 4 4 4 0 0 0 0 0 0 0 4 4 0 rd mr mw me dw ac sd ld-2.25.so
7f408db18000 rw-p 00027000 fd:01 3026695 8 8 8 0 0 0 8 8 8 0 0 0 0 0 0 0 4 4 0 rd wr mr mw me dw ac sd ld-2.25.so
7ffc10fa6000 rw-p 00000000 00:00 0 136 36 36 0 0 0 36 36 36 0 0 0 0 0 0 0 4 4 0 rd wr mr mw me gd ac [stack]
7ffc10fdf000 r--p 00000000 00:00 0 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 4 0 rd mr pf io de dd sd [vvar]
7ffc10fe2000 r-xp 00000000 00:00 0 8 4 0 4 0 0 0 4 0 0 0 0 0 0 0 0 4 4 0 rd ex mr mw me de sd [vdso]
ffffffffff600000 r-xp 00000000 00:00 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 4 0 rd ex [vsyscall]
======= ===== ===== ============ ============ ============= ============= ========== ========= ======== ============= ============== ============== =============== ==== ======= ============== =========== ======
4671532 24612 16349 10856 0 5288 8468 24612 8436 0 0 0 0 0 0 0 656 656 0 KB
I found this interesting:
55598b9d7000 rw-p 00000000 00:00 0 132 20 20 0 0 0 20 20 20 0 0 0 0 0 0 0 4 4 0 rd wr mr mw me ac sd [heap]
7f401c000000 rw-p 00000000 00:00 0 132 4 4 0 0 0 4 4 4 0 0 0 0 0 0 0 4 4 0 rd wr mr mw me nr sd
I wonder if the huge gap of addresses between the heap and the next page is intentional since the process may need to grow the heap. Its important to note that this is not the JVM heap. This annotation is present on pmaps of non-JVM processes just as well.
Let’s also look at the system calls made by the process. Here’s the trace I dumped to a file. Excerpted:
$ strace -f java FileWrite
....... Bunch of syscalls setting up the JVM
....... Lots of syscalls trying to open .so files which don't exist
....... Some mprotect() calls
....... A few clone() calls which I imagine are spawning different JVM system threads.
....... A stat for ".hotspotrc"
....... Reads /proc/self/maps
....... A bunch of sched_getaffinity()
.......
.......
.......
....... Only in the last 94% of the trace does the FileWrite.class file actually get loaded
....... In the last 99% of the trace (I kid you not)
.......
[pid 31727] open("/tmp/file.txt", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4
[pid 31727] fstat(4, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
[pid 31727] write(4, "Hello World!", 12) = 12
[pid 31727] lseek(3, 69527180, SEEK_SET) = 69527180
[pid 31727] read(3, "PK\3\4\n\0\0\10\0\0\310E\34K\26\310\316\254\226\2\0\0\226\2\0\0 \0\0\0", 30) = 30
[pid 31727] lseek(3, 69527242, SEEK_SET) = 69527242
[pid 31727] read(3, "\312\376\272\276\0\0\0004\0#\t\0\4\0\32\n\0\5\0\33\n\0\30\0\34\7\0\35\7\0\36\7"..., 662) = 662
[pid 31727] close(4) = 0
........
........ a few more system calls, and then everyone goes home.
I would like to descend into the system call code, but that’s obviously not possible using gdb or user-land tools. I’m going to punt on this too for another blog post.
Here’s what I already know about some of the system calls.
- The open() syscall is given a path. To resolve the path, Linux calls into the VFS (Virtual File System). The VFS maintains a dentry cache, but in case the inode is not present in the cache, the kernel delegates to the file system (which may be ext4, zfs, xfs, btrfs or fat etc.) to which the sub-path is mapped (usually from /etc/fstab).
- The file system in my case is usually ext4, which resolves the path by spinning the disk and reading metadata from wherever it was stored.
- Once an inode is found for the directory, for a create, a new inode must be created. Note that this inode may not be persistently written to disk immediately based on options which were used while loading the file system. ext4 writes to a journal.
- The write() syscall probably writes to the buffer cache. We have had a fair amount of fun with performance benchmarks because of this feature. At some point in the future, the ext4 filesystem promises to flush the buffer cache to the disk. An fsync() syscall can force the flushing of data for a file handle. fincore is a command that will tell you what files’ pages have been buffered. You can always ask Linux to flush to persistent storage using the command sync. Note that this is different from dropping page cache / dentry cache which is achieved by writing 1/2/3 to /proc/sys/vm/drop_caches .
-
When the filesystem (ext4 in my case) wants to write to disk, it must do so in blocks using LBA. You can get the block device size using
$ sudo blockdev --getbsz /dev/sda 4096
- Disks are usually connected using SATA which allows the disk controllers to communicate with the disks. Disks themselves usually run ARM cores with millions of lines of code which cleverly schedule the block accesses, accumulate SMART statistics, do error detection and correction etc. In the case of SSDs, they even try wear leveling and other fancy stuff which is usually proprietary to disk manufacturers. Talk about dark rabbit holes ;-)
All content on this website is licensed as Creative Commons-Attribution-ShareAlike 4.0 License. Opinions expressed are solely my own.