Thursday, October 31, 2013

How To Write A File

You probably think that you know how to write to a file. After all, it's pretty simple and you do it many times per day. You open the file, write some data to it, close the file, and you're done. Wrong. This is how you write a file:


  • Create and open a temporary file in the directory you are writing your new file to.
  • Write your data to the temporary file.
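  • fsync the temporary file. (fdatasync is not a safe substitute everywhere: it may skip any metadata that isn't needed to read the data back. Linux documents that a changed file size is still flushed, but fsync makes no such distinction.)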
  • Close the temporary file.
  • If any errors happen before this point, delete the temporary file and handle the error as appropriate.
  • Rename your temporary file to the name you want your new file to have.
  • Open the directory containing both files.
  • fsync the directory's file descriptor.
  • Close the directory's file descriptor.
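
The steps above can be sketched in Python roughly as follows. This is a minimal sketch for POSIX systems, not a definitive implementation: the function name write_file_atomically and the ".tmp-" prefix are made up for illustration, and it assumes the data fits in memory as bytes. Note that tempfile.mkstemp creates the file with mode 0600, so a real program may also want to set permissions before the rename.

```python
import os
import tempfile


def write_file_atomically(path, data):
    """Replace `path` with `data` (bytes) using temp file + fsync + rename."""
    directory = os.path.dirname(os.path.abspath(path))

    # Create and open a temporary file in the destination directory, so the
    # later rename stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()               # push Python's own buffer to the kernel
            os.fsync(tmp.fileno())    # ask the kernel to push it to the disk
        # Leaving the with block closed the temporary file.
        os.replace(tmp_path, path)    # rename over the final name
    except BaseException:
        # Something failed at or before the rename: remove the temporary
        # file and let the caller handle the error.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise

    # Open the containing directory and fsync it so the rename is durable.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A call such as write_file_atomically("settings.json", b"{}") then either leaves the old settings.json untouched or replaces it with the complete new contents.
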
If you open the file that you are writing to directly, several problems happen. If your program crashes before you've finished writing your data, the file on disk is left half-written. Worse yet, if you overwrote the contents of a previous file with the same name, you have lost that file and you are left with only part of your new file. Any other program that had the old file open will get weird errors due to the unexpected truncation to zero length. By creating a temporary file, you give your new file its own inode and leave your old file alone.
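
To make the difference concrete, here is a small Python experiment, assuming a POSIX system. The file name config.txt and the variable names are invented for the demonstration, and it skips the fsync calls because it only illustrates what other programs see, not durability.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "config.txt")

with open(path, "w") as f:
    f.write("old contents")

# Case 1: open the real file directly. Mode "w" truncates it right away.
reader = open(path)                        # stands in for another program
writer = open(path, "w")                   # old contents are gone from here on
size_during_write = os.path.getsize(path)
seen_by_reader = reader.read()             # the reader finds an empty file
writer.write("new contents")
writer.close()
reader.close()

# Case 2: write a temporary file in the same directory, then rename it.
with open(path, "w") as f:
    f.write("old contents")
old_inode = os.stat(path).st_ino
reader = open(path)                        # still attached to the old inode
tmp_path = path + ".tmp"
with open(tmp_path, "w") as f:
    f.write("new contents")
os.replace(tmp_path, path)
new_inode = os.stat(path).st_ino
old_view = reader.read()                   # the old file is still intact
reader.close()
with open(path) as f:
    new_view = f.read()

print(size_during_write, repr(seen_by_reader))
print(old_inode != new_inode, repr(old_view), repr(new_view))
```

In the first case the reader is left holding an empty file. In the second it keeps reading the complete old version while new opens of the same name get the complete new one.
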

If you don't fsync the file before you rename it, you can still get into inconsistent states. Many filesystems journal metadata operations but not data operations (the details depend on the filesystem and its mount options). That means that if you lose power after the rename has been issued, the rename may be replayed from the journal even though the data itself was never written to disk, so it's gone forever, along with your old file of the same name.

If you write to a temporary file, fsync, and rename, you know that:
  • Writing to your temporary file leaves the old version of the file alone (if it exists).
  • After the fsync is complete, your data has safely been written all the way through to your disk (unless you have one of those crappy disks that lie about sync, in which case you should throw it away and buy a new disk).
  • At any point immediately before, during, or after the rename operation, you have exactly one version of your file and it is complete, because rename replaces the destination atomically.
The only remaining problem is what happens if the system crashes before the directory entry edits from the rename get sync'ed to disk. To guarantee that the directory contents are sync'ed you must open the directory and call fsync on the returned file descriptor.

Now you know how to write a file. Go tell all your friends. I'm tired of using programs that get this wrong.