RFC 1252: open-options

libs (platform | file)

Summary

Document and expand the open options.

Motivation

The options that can be passed to the os when opening a file vary between systems. And even if the options seem the same or similar, there may be unexpected corner cases.

This RFC attempts to

Detailed design

Access modes

Read-only

Open a file for reading only.

Write-only

Open a file for writing only.

If the file already exists, its contents are overwritten in place, but the file is not truncated. Example:

// contents of file before: "aaaaaaaa"
file.write(b"bbbb")
// contents of file after: "bbbbaaaa"
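The fragment above can be turned into a small runnable sketch. It uses std::fs::OpenOptions and the ? operator from current Rust rather than try!; the function name overwrite_start and the choice to prepare the file with fs::write are illustrative assumptions, not part of this RFC.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Overwrites the start of an existing file without truncating it and
/// returns the resulting contents. (Hypothetical helper for illustration.)
fn overwrite_start(path: &Path) -> io::Result<Vec<u8>> {
    // Prepare a file holding "aaaaaaaa".
    fs::write(path, b"aaaaaaaa")?;

    // Write-only, no truncate: writing starts at offset 0 and the
    // bytes beyond the written range are left alone.
    let mut file = OpenOptions::new().write(true).open(path)?;
    file.write_all(b"bbbb")?;
    drop(file);

    fs::read(path)
}
```

Calling overwrite_start on a scratch path should yield the bytes of "bbbbaaaa", matching the comments in the fragment.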

Read-write

This is simply the combination of read-only and write-only.

Append-mode

Append-mode is similar to write-only, but all writes always happen at the end of the file. This mode is especially useful if multiple processes or threads write to a single file, like a log file. The operating system guarantees all writes are atomic: no writes get mangled because another process writes at the same time. No guarantees are made about the order writes end up in the file though.

Note: sadly append-mode is not atomic on NFS filesystems.

One perhaps obvious note when using append-mode: make sure that all data that belongs together is written to the file in one operation. This can be done by concatenating strings before passing them to write(), or by using a buffered writer (with a more than adequately sized buffer) and calling flush() when the message is complete.

Implementation detail: On Windows opening a file in append-mode has one flag less, the right to change existing data is removed. On Unix opening a file in append-mode has one flag extra, that sets the status of the file descriptor to append-mode. You could say that on Windows write is a superset of append, while on Unix append is a superset of write.

Because of this, append is treated as a separate access mode in Rust, and if .append(true) is specified then .write() is ignored.
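The advice above about writing everything that belongs together in one operation could look roughly like the following sketch. The helper name append_record, the 64 KiB buffer size and the record layout are assumptions made for illustration; the only claim is that nothing reaches the file before flush() as long as the record fits in the buffer.

```rust
use std::fs::OpenOptions;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Appends one space-separated record to `path`.
/// (Hypothetical helper; buffer size and record layout are illustrative.)
fn append_record(path: &Path, fields: &[&str]) -> io::Result<()> {
    let file = OpenOptions::new().append(true).create(true).open(path)?;

    // The buffer is far larger than any record we expect, so the pieces
    // are collected in memory and handed to the file only on flush().
    let mut writer = BufWriter::with_capacity(64 * 1024, file);
    for (i, field) in fields.iter().enumerate() {
        if i > 0 {
            writer.write_all(b" ")?;
        }
        writer.write_all(field.as_bytes())?;
    }
    writer.write_all(b"\n")?;
    writer.flush()
}
```

Two calls such as append_record(path, &["first", "entry"]) and append_record(path, &["second", "entry"]) leave two complete lines in the file, each added at the end.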

Read-append

Writing to the file works exactly the same as in append-mode.

Reading is more difficult, and may involve a lot of seeking. When the file is opened, the position for reading may be set at the end of the file, so you should first seek to the beginning. Also after every write the position is set to the end of the file. So before writing you should save the current position, and restore it after the write.

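// Rewind first: the initial position may be at the end of the file.
try!(file.seek(SeekFrom::Start(0)));
try!(file.read(&mut buffer));
// Save the read position, because the write moves it to the end of the file.
let pos = try!(file.seek(SeekFrom::Current(0)));
try!(file.write(b"foo"));
// Restore the saved position before reading on.
try!(file.seek(SeekFrom::Start(pos)));
try!(file.read(&mut buffer));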

No access mode set

Even if you don't have read or write permission to a file, it is possible to open it on some systems by opening it with no access mode set (or the equivalent thereof). This is true for Windows, Linux (with the flag O_PATH) and GNU/Hurd.

What can be done with a file opened this way is system-specific and niche. Since Linux version 2.6.39 all three operating systems support reading metadata such as the file size and timestamps.

On practically all variants of Unix, opening a file without specifying the access mode falls back to opening the file read-only. This is because of the way the access flags were traditionally defined: O_RDONLY = 0, O_WRONLY = 1 and O_RDWR = 2. When no flags are set, the access mode is 0: read-only. But code that relies on this is considered buggy and not portable.

What should Rust do when no access mode is specified? Fall back to read-only, open with the most similar system-specific mode, or always fail to open? This RFC proposes to always fail. This is the conservative choice, and can be changed to open in a system-specific mode if a clear use case arises. Implementing a fallback is not worth it: it is no great effort to set the access mode explicitly.
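The proposed behaviour can be observed with a sketch like the one below. The helper name open_without_access_mode is hypothetical, and the specific error kind (InvalidInput) is what the current standard library is assumed to report; the RFC text itself only says the open should fail.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::Path;

/// Tries to open an existing file without setting any access mode and
/// returns the kind of error produced, if any.
/// (Hypothetical helper for illustration.)
fn open_without_access_mode(path: &Path) -> io::Result<Option<io::ErrorKind>> {
    fs::write(path, b"data")?;

    // No .read(), .write() or .append(): no access mode is set.
    let result = OpenOptions::new().open(path);

    fs::remove_file(path)?;
    Ok(result.err().map(|e| e.kind()))
}
```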

Windows-specific

.access_mode(FILE_READ_DATA)

On Windows you can specify in detail whether you want read and/or write access to the file's data, attributes and/or extended attributes. Managing permissions in such detail has proven too difficult, and generally not worth it.

In Rust, .read(true) gives you read access to the data, attributes and extended attributes. Similarly, .write(true) gives write access to those three, and the right to append data beyond the current end of the file.

But if you want fine-grained control, with access_mode you have it.

.access_mode() overrides the access mode set with Rust's cross-platform options. Reasons to do so:

As a reference, these are the flags set by Rust's access modes:

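| bit | flag                  | read    | write   | read-write | append | read-append |
|-----|-----------------------|---------|---------|------------|--------|-------------|
|     | generic rights        |         |         |            |        |             |
| 31  | GENERIC_READ          | set     |         | set        |        | set         |
| 30  | GENERIC_WRITE         |         | set     | set        |        |             |
| 29  | GENERIC_EXECUTE       |         |         |            |        |             |
| 28  | GENERIC_ALL           |         |         |            |        |             |
|     | specific rights       |         |         |            |        |             |
| 0   | FILE_READ_DATA        | implied |         | implied    |        | implied     |
| 1   | FILE_WRITE_DATA       |         | implied | implied    |        |             |
| 2   | FILE_APPEND_DATA      |         | implied | implied    | set    | set         |
| 3   | FILE_READ_EA          | implied |         | implied    |        | implied     |
| 4   | FILE_WRITE_EA         |         | implied | implied    | set    | set         |
| 6   | FILE_EXECUTE          |         |         |            |        |             |
| 7   | FILE_READ_ATTRIBUTES  | implied |         | implied    |        | implied     |
| 8   | FILE_WRITE_ATTRIBUTES |         | implied | implied    | set    | set         |
|     | standard rights       |         |         |            |        |             |
| 16  | DELETE                |         |         |            |        |             |
| 17  | READ_CONTROL          | implied | implied | implied    | set    | set+implied |
| 18  | WRITE_DAC             |         |         |            |        |             |
| 19  | WRITE_OWNER           |         |         |            |        |             |
| 20  | SYNCHRONIZE           | implied | implied | implied    | set    | set+implied |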

The implied flags can be specified explicitly with the constants FILE_GENERIC_READ and FILE_GENERIC_WRITE.
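To make the relation between the individual bits and the two constants concrete, here is a small arithmetic sketch. The bit positions are taken from the table above. The way the constants are composed follows the Windows SDK definitions as far as I know, and the function append_rights is a hypothetical name for the "write minus the right to change existing data" rule described for append-mode.

```rust
// Access-right bits, numbered as in the table above.
const FILE_READ_DATA: u32 = 1 << 0;
const FILE_WRITE_DATA: u32 = 1 << 1;
const FILE_APPEND_DATA: u32 = 1 << 2;
const FILE_READ_EA: u32 = 1 << 3;
const FILE_WRITE_EA: u32 = 1 << 4;
const FILE_READ_ATTRIBUTES: u32 = 1 << 7;
const FILE_WRITE_ATTRIBUTES: u32 = 1 << 8;
const READ_CONTROL: u32 = 1 << 17;
const SYNCHRONIZE: u32 = 1 << 20;

// The flags marked "implied" in the read column.
const FILE_GENERIC_READ: u32 =
    FILE_READ_DATA | FILE_READ_EA | FILE_READ_ATTRIBUTES | READ_CONTROL | SYNCHRONIZE;

// The flags marked "implied" in the write column.
const FILE_GENERIC_WRITE: u32 = FILE_WRITE_DATA
    | FILE_APPEND_DATA
    | FILE_WRITE_EA
    | FILE_WRITE_ATTRIBUTES
    | READ_CONTROL
    | SYNCHRONIZE;

/// Rights for append-mode: everything write implies, except the right
/// to change existing data. (Hypothetical helper for illustration.)
fn append_rights() -> u32 {
    FILE_GENERIC_WRITE & !FILE_WRITE_DATA
}
```

The result of append_rights() contains exactly the flags marked "set" in the append column.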

Creation modes

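| creation mode                | file exists | file does not exist | Unix              | Windows                                   |
|------------------------------|-------------|---------------------|-------------------|-------------------------------------------|
| not set (open existing)      | open        | fail                |                   | OPEN_EXISTING                             |
| .create(true)                | open        | create              | O_CREAT           | OPEN_ALWAYS                               |
| .truncate(true)              | truncate    | fail                | O_TRUNC           | TRUNCATE_EXISTING                         |
| .create(true).truncate(true) | truncate    | create              | O_CREAT + O_TRUNC | CREATE_ALWAYS                             |
| .create_new(true)            | fail        | create              | O_CREAT + O_EXCL  | CREATE_NEW + FILE_FLAG_OPEN_REPARSE_POINT |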

Not set (open existing)

Open an existing file. Fails if the file does not exist.

Create

.create(true)

Open an existing file, or create a new file if it does not already exist.

Truncate

.truncate(true)

Open an existing file, and truncate it to zero length. Fails if the file does not exist. Attributes and permissions of the truncated file are preserved.

Note when using the Windows-specific .access_mode(): truncating will only work if the GENERIC_WRITE flag is set. Setting the equivalent individual flags is not enough.

Create and truncate

.create(true).truncate(true)

Open an existing file and truncate it to zero length, or create a new file if it does not already exist.

Note when using the Windows-specific .access_mode(): Contrary to only .truncate(true), with .create(true).truncate(true) Windows can truncate an existing file without requiring any flags to be set.

On Windows the attributes of an existing file can cause .open() to fail. If the existing file has the attribute hidden set, it is necessary to open with FILE_ATTRIBUTE_HIDDEN. Similarly if the existing file has the attribute system set, it is necessary to open with FILE_ATTRIBUTE_SYSTEM. See the Windows-specific .attributes() below on how to set these.
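The difference between .truncate(true) on its own and .create(true).truncate(true) can be sketched as follows. The helper name truncate_demo is hypothetical, and the error kind NotFound is what the current standard library is assumed to report for a missing file.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::Path;

/// Returns the error kind of truncating a missing file, and the length
/// of an existing file after opening it with create + truncate.
/// (Hypothetical helper for illustration.)
fn truncate_demo(path: &Path) -> io::Result<(io::ErrorKind, u64)> {
    // Make sure the file does not exist yet.
    let _ = fs::remove_file(path);

    // .truncate(true) alone never creates a file.
    let kind = match OpenOptions::new().write(true).truncate(true).open(path) {
        Err(e) => e.kind(),
        Ok(_) => return Err(io::Error::new(io::ErrorKind::Other, "unexpected success")),
    };

    // .create(true).truncate(true) empties an existing file
    // (and would create it if it were missing).
    fs::write(path, b"old contents")?;
    let file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .open(path)?;
    let len = file.metadata()?.len();
    drop(file);

    fs::remove_file(path)?;
    Ok((kind, len))
}
```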

Create_new

.create_new(true)

Create a new file, and fail if it already exists.

On Unix this option started its life as a security measure. If you first check that a file does not exist with exists() and then call open(), some other process may have created it in the meantime. .create_new() is an atomic operation that will fail if a file already exists at the location.

.create_new() has a special rule on Unix for dealing with symlinks. If there is a symlink at the final element of its path (e.g. the filename), open will fail. This is to prevent a vulnerability where an unprivileged process could trick a privileged process into following a symlink and overwriting a file the unprivileged process has no access to. See Exploiting symlinks and tmpfiles. On Windows this behaviour is imitated by specifying not only CREATE_NEW but also FILE_FLAG_OPEN_REPARSE_POINT.

Simply put: nothing is allowed to exist at the target location, not even a (dangling) symlink.

If .create_new(true) is set, .create() and .truncate() are ignored.
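A minimal sketch of the atomic "create only if absent" behaviour: the second attempt to create the same file has to fail. The helper name create_new_twice is hypothetical; the symlink rule is not exercised here.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::Path;

/// Creates `path` with .create_new(true), then tries again and returns
/// the error kind of the second attempt.
/// (Hypothetical helper for illustration.)
fn create_new_twice(path: &Path) -> io::Result<io::ErrorKind> {
    // Start from a clean slate.
    let _ = fs::remove_file(path);

    // First attempt: nothing exists at the location, so this succeeds.
    let first = OpenOptions::new().write(true).create_new(true).open(path)?;
    drop(first);

    // Second attempt: the file now exists, so the open must fail
    // instead of silently reusing it.
    let second = OpenOptions::new().write(true).create_new(true).open(path);
    fs::remove_file(path)?;

    match second {
        Err(e) => Ok(e.kind()),
        Ok(_) => Err(io::Error::new(
            io::ErrorKind::Other,
            "second create_new unexpectedly succeeded",
        )),
    }
}
```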

Unix-specific: Mode

.mode(0o666)

On Unix the new file is created by default with permissions 0o666 minus the system's umask (see Wikipedia). It is possible to set another mode with this option.

If the file already exists, or if neither .create(true) nor .create_new(true) is specified, .mode() is ignored.

Rust currently does not expose a way to modify the umask.
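A Unix-only sketch of .mode(), using the OpenOptionsExt and PermissionsExt extension traits from std::os::unix::fs. The helper name create_with_mode is hypothetical. Because the umask is applied by the system, the sketch reports the permissions the file actually received rather than assuming they equal the requested mode.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::os::unix::fs::{OpenOptionsExt, PermissionsExt};
use std::path::Path;

/// Creates `path` with the requested mode and returns the permission
/// bits the file ended up with after the umask was applied.
/// (Hypothetical helper for illustration; Unix only.)
fn create_with_mode(path: &Path, mode: u32) -> io::Result<u32> {
    let _ = fs::remove_file(path);

    // .mode() only has an effect because the file is newly created here.
    let file = OpenOptions::new()
        .write(true)
        .create_new(true)
        .mode(mode)
        .open(path)?;
    let actual = file.metadata()?.permissions().mode() & 0o777;
    drop(file);

    fs::remove_file(path)?;
    Ok(actual)
}
```

With a typical umask such as 0o022, create_with_mode(path, 0o600) is expected to report 0o600; in any case the umask can only remove bits from the requested mode, never add them.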

Windows-specific: Attributes

.attributes(FILE_ATTRIBUTE_READONLY | FILE_ATTRIBUTE_HIDDEN | FILE_ATTRIBUTE_SYSTEM)

Files on Windows can have several attributes, most commonly one or more of the following four: readonly, hidden, system and archive. Most others are properties set by the file system. Of the others only FILE_ATTRIBUTE_ENCRYPTED, FILE_ATTRIBUTE_TEMPORARY and FILE_ATTRIBUTE_OFFLINE can be set when creating a new file. All others are silently ignored.

It is no use to set the archive attribute, as Windows sets it automatically when the file is newly created or modified. This flag may then be used by backup applications as an indication of which files have changed.

If a new file is created because it does not yet exist and .create(true) or .create_new(true) is specified, the new file is given the attributes declared with .attributes().

If an existing file is opened with .create(true).truncate(true), its existing attributes are preserved and combined with the ones declared with .attributes().

In all other cases the attributes get ignored.

Combination of access modes and creation modes

Some combinations of creation modes and access modes do not make sense.

For example: .create(true) when opening read-only. If the file does not already exist, it is created and you start reading from an empty file. And it is questionable whether you have permission to create a new file if you don't have write access. A new file is created on all systems I have tested, but there is no documentation that explicitly guarantees this behaviour.

The same is true for .truncate(true) with read and/or append mode. Should an existing file be modified if you don't have write permission? On Unix it is undefined (see some comments on the OpenBSD mailing list). The behaviour on Windows is inconsistent and depends on whether .create(true) is set.

To give guarantees about cross-platform (and sane) behaviour, Rust should allow only the following combinations of access modes and creation modes:

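| creation mode           | read | write | read-write | append | read-append |
|-------------------------|------|-------|------------|--------|-------------|
| not set (open existing) | X    | X     | X          | X      | X           |
| create                  |      | X     | X          | X      | X           |
| truncate                |      | X     | X          |        |             |
| create and truncate     |      | X     | X          |        |             |
| create_new              |      | X     | X          | X      | X           |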

It is possible to bypass these restrictions by using system-specific options (as in this case you already have to take care of cross-platform support yourself). On Unix this is done by setting the creation mode using .custom_flags() with O_CREAT, O_TRUNC and/or O_EXCL. On Windows this can be done by manually specifying .access_mode() (see above).
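The restrictions on combinations can be probed with a sketch like the following, which tries two disallowed combinations and one allowed combination against an existing file. The helper names are hypothetical, and the InvalidInput error kind is what the current standard library is assumed to return for a rejected combination.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::Path;

/// Returns the error kind if opening with `options` is rejected.
/// (Hypothetical helper for illustration.)
fn rejected_kind(options: &OpenOptions, path: &Path) -> Option<io::ErrorKind> {
    options.open(path).err().map(|e| e.kind())
}

/// Tries three combinations of access and creation modes.
fn check_combinations(path: &Path) -> io::Result<Vec<Option<io::ErrorKind>>> {
    fs::write(path, b"data")?;
    let results = vec![
        // read + create: not in the list of allowed combinations.
        rejected_kind(OpenOptions::new().read(true).create(true), path),
        // append + truncate: not allowed either.
        rejected_kind(OpenOptions::new().append(true).truncate(true), path),
        // append + create: allowed, so no error is expected.
        rejected_kind(OpenOptions::new().append(true).create(true), path),
    ];
    fs::remove_file(path)?;
    Ok(results)
}
```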

Asynchronous IO

Out of scope.

Other options

Inheritance of file descriptors

Leaking file descriptors to child processes can cause problems and can be a security vulnerability. See this report by Python.

On Windows, child processes do not inherit file descriptors by default (but this can be changed). On Unix they always inherit, unless the close-on-exec flag is set.

The close-on-exec flag can be set atomically when opening the file, or later with fcntl. The O_CLOEXEC flag is part of the relatively new POSIX-2008 standard, and all modern versions of Unix support it. The following table lists the operating systems on which we can rely on the flag being supported.

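| os            | since version | oldest supported version                |
|---------------|---------------|-----------------------------------------|
| OS X          | 10.6          | 10.7?                                   |
| Linux         | 2.6.23        | 2.6.32 (supported by Rust)              |
| FreeBSD       | 8.3           | 8.4                                     |
| OpenBSD       | 5.0           | 5.7                                     |
| NetBSD        | 6.0           | 5.0                                     |
| Dragonfly BSD | 3.2           | ? (3.2 is not updated since 2012-12-14) |
| Solaris       | 11            | 10                                      |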

This means we can always set the flag O_CLOEXEC, and do an additional fcntl if the os is NetBSD or Solaris.

Custom flags

.custom_flags()

Windows and the various flavours of Unix support flags that are not cross-platform, but that can be useful in some circumstances. On Unix they will be passed as the variable flags to open, on Windows as the dwFlagsAndAttributes parameter.

The cross-platform options of Rust can do magic: they can set any flag necessary to ensure it works as expected. For example, .append(true) on Unix not only sets the flag O_APPEND, but also automatically O_WRONLY or O_RDWR. This special treatment is not available for the custom flags.

Custom flags can only set flags, not remove flags set by Rust's options.

For the custom flags on Unix, the bits that define the access mode are masked out with O_ACCMODE, to ensure they do not interfere with the access mode set by Rust's options.
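The masking of the access-mode bits can be observed with a Unix-only sketch: passing O_WRONLY through .custom_flags() should not turn a read-only open into a writable one. The helper name is hypothetical, and the value 1 for O_WRONLY is the traditional definition quoted earlier in this document, assumed to hold on the target system.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Read, Write};
use std::os::unix::fs::OpenOptionsExt;
use std::path::Path;

// Traditional value of O_WRONLY, as quoted earlier (an assumption
// about the target system, not taken from a system header).
const O_WRONLY: i32 = 1;

/// Opens `path` read-only while passing O_WRONLY as a custom flag.
/// Returns whether a write failed, and what could still be read.
/// (Hypothetical helper for illustration; Unix only.)
fn custom_flags_cannot_add_write(path: &Path) -> io::Result<(bool, String)> {
    fs::write(path, b"data")?;
    let mut file = OpenOptions::new()
        .read(true)
        .custom_flags(O_WRONLY)
        .open(path)?;

    // The access-mode bits of the custom flags are masked out, so the
    // file is still read-only: writing fails, reading works.
    let write_failed = file.write(b"x").is_err();
    let mut contents = String::new();
    file.read_to_string(&mut contents)?;
    drop(file);

    fs::remove_file(path)?;
    Ok((write_failed, contents))
}
```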

Windows:

| bit | flag                            |
|-----|---------------------------------|
| 31  | FILE_FLAG_WRITE_THROUGH         |
| 30  | FILE_FLAG_OVERLAPPED            |
| 29  | FILE_FLAG_NO_BUFFERING          |
| 28  | FILE_FLAG_RANDOM_ACCESS         |
| 27  | FILE_FLAG_SEQUENTIAL_SCAN       |
| 26  | FILE_FLAG_DELETE_ON_CLOSE       |
| 25  | FILE_FLAG_BACKUP_SEMANTICS      |
| 24  | FILE_FLAG_POSIX_SEMANTICS       |
| 23  | FILE_FLAG_SESSION_AWARE         |
| 21  | FILE_FLAG_OPEN_REPARSE_POINT    |
| 20  | FILE_FLAG_OPEN_NO_RECALL        |
| 19  | FILE_FLAG_FIRST_PIPE_INSTANCE   |
| 18  | FILE_FLAG_OPEN_REQUIRING_OPLOCK |

Unix:

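| POSIX       | Linux       | OS X        | FreeBSD     | OpenBSD     | NetBSD      | Dragonfly BSD | Solaris     |
|-------------|-------------|-------------|-------------|-------------|-------------|---------------|-------------|
| O_TRUNC     | O_TRUNC     | O_TRUNC     | O_TRUNC     | O_TRUNC     | O_TRUNC     | O_TRUNC       | O_TRUNC     |
| O_CREAT     | O_CREAT     | O_CREAT     | O_CREAT     | O_CREAT     | O_CREAT     | O_CREAT       | O_CREAT     |
| O_EXCL      | O_EXCL      | O_EXCL      | O_EXCL      | O_EXCL      | O_EXCL      | O_EXCL        | O_EXCL      |
| O_APPEND    | O_APPEND    | O_APPEND    | O_APPEND    | O_APPEND    | O_APPEND    | O_APPEND      | O_APPEND    |
| O_CLOEXEC   | O_CLOEXEC   | O_CLOEXEC   | O_CLOEXEC   | O_CLOEXEC   | O_CLOEXEC   | O_CLOEXEC     | O_CLOEXEC   |
| O_DIRECTORY | O_DIRECTORY | O_DIRECTORY | O_DIRECTORY | O_DIRECTORY | O_DIRECTORY | O_DIRECTORY   | O_DIRECTORY |
| O_NOCTTY    | O_NOCTTY    | O_NOCTTY    | O_NOCTTY    |             | O_NOCTTY    |               | O_NOCTTY    |
| O_NOFOLLOW  | O_NOFOLLOW  | O_NOFOLLOW  | O_NOFOLLOW  | O_NOFOLLOW  | O_NOFOLLOW  | O_NOFOLLOW    | O_NOFOLLOW  |
| O_NONBLOCK  | O_NONBLOCK  | O_NONBLOCK  | O_NONBLOCK  | O_NONBLOCK  | O_NONBLOCK  | O_NONBLOCK    | O_NONBLOCK  |
| O_SYNC      | O_SYNC      | O_SYNC      | O_SYNC      | O_SYNC      | O_SYNC      | O_FSYNC       | O_SYNC      |
| O_DSYNC     | O_DSYNC     | O_DSYNC     |             |             | O_DSYNC     |               | O_DSYNC     |
| O_RSYNC     |             |             |             |             | O_RSYNC     |               | O_RSYNC     |
|             | O_DIRECT    |             | O_DIRECT    |             | O_DIRECT    | O_DIRECT      |             |
|             | O_ASYNC     |             |             |             | O_ASYNC     |               |             |
|             | O_NOATIME   |             |             |             |             |               |             |
|             | O_PATH      |             |             |             |             |               |             |
|             | O_TMPFILE   |             |             |             |             |               |             |
|             |             | O_SHLOCK    | O_SHLOCK    | O_SHLOCK    | O_SHLOCK    | O_SHLOCK      |             |
|             |             | O_EXLOCK    | O_EXLOCK    | O_EXLOCK    | O_EXLOCK    | O_EXLOCK      |             |
|             |             | O_SYMLINK   |             |             |             |               |             |
|             |             | O_EVTONLY   |             |             |             |               |             |
|             |             |             |             |             | O_NOSIGPIPE |               |             |
|             |             |             |             |             | O_ALT_IO    |               |             |
|             |             |             |             |             |             |               | O_NOLINKS   |
|             |             |             |             |             |             |               | O_XATTR     |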

Windows-specific flags and attributes

The following variables for CreateFile2 currently have no equivalent functions in Rust to set them:

DWORD                 dwSecurityQosFlags;
LPSECURITY_ATTRIBUTES lpSecurityAttributes;
HANDLE                hTemplateFile;

Changes from current

Access mode

Creation mode

Other options

Drawbacks

This adds a thin layer on top of the raw operating system calls. In this pull request the conclusion was: this seems like a good idea for a "high level" abstraction like OpenOptions.

This adds extra options that many applications can do without (otherwise they would already have been implemented).

Also this RFC is in line with the vision for IO in the IO-OS-redesign:

Alternatives

The first version of this RFC contained a proposal for options that control caching and file locking. They are out of scope for now, but included here for reference.

Sharing / locking

On Unix it is possible for multiple processes to read and write to the same file at the same time.

When you open a file on Windows, the system by default denies other processes the right to read, write or delete the file. By setting the sharing mode, it is possible to allow other processes read, write and/or delete access. For cross-platform consistency, Rust imitates Unix by setting all sharing flags.

Unix has no equivalent to the kind of file locking that Windows has. It has two types of advisory locking, POSIX and BSD-style. Advisory means any process that does not use locking itself can happily ignore the locking of another process. As if that is not bad enough, they both have problems that make them close to unusable for modern multi-threaded programs. Linux may in some very rare cases support mandatory file locking, but it is just as broken as advisory.

Windows-specific: Share mode

.share_mode(FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE)

It is possible to set the individual share permissions with .share_mode().

The current philosophy of this function is that others should have no rights, unless explicitly granted. I think a better fit for Rust would be to give all others all rights, unless explicitly denied, e.g.: .share_mode(DENY_READ | DENY_WRITE | DENY_DELETE).

Controlling caching

When dealing with file systems and hard disks, there are several kinds of caches. Giving hints or controlling them may improve performance or data consistency.

  1. read-ahead (performance of reads and overwrites) Instead of requesting only the data necessary for a single read() call from a storage device, an operating system may request more data than necessary to have it already available for the next read.
  2. os cache (performance of reads and overwrites) The os may keep the data of previous reads and writes in memory to increase the performance of future reads and possibly writes.
  3. os staging area (convenience/performance of reads and writes) The size and alignment of data reads and writes to a disk should correspond to sectors on the storage device, usually 512 or 4096 bytes. The os makes sure a regular write() or read() doesn't have to care about this. For example a small write (say 100 bytes) has to rewrite a whole sector. The os often has the surrounding data in its cache and can efficiently combine it to write the whole sector.
  4. delayed writing (performance/correctness of writes) The os may delay writes to improve performance, for example by batching consecutive writes, and scheduling with reads to minimize seeking.
  5. on-disk write cache (performance/correctness of writes) Most hard disks and other storage devices have a small RAM cache. It can speed up reads, and writes can return as soon as the data is written to the device's cache.

Read-ahead hint

.read_ahead_hint(enum ReadAheadHint)

enum ReadAheadHint {
    Default,
    Sequential,
    Random,
}

If you read a file sequentially the read-ahead is beneficial, for completely random access it can become a penalty.

This option is treated as a hint. It is ignored if the os does not support it, or if the behaviour of the application proves it is set wrong.

Open flags / system calls:

OS cache

.used_once(true)

When reading many gigabytes of data a process may push useful data from other processes out of the os cache. To keep the performance of the whole system up, a process could indicate to the os whether data is only needed once, or not needed anymore. On Linux, FreeBSD and NetBSD this is possible with posix_fadvise POSIX_FADV_DONTNEED after a read or write with sync (or before close). On FreeBSD and NetBSD it is also possible to specify this up-front with posix_fadvise POSIX_FADV_NOREUSE, and on OS X with fcntl F_NOCACHE. Windows does not seem to provide an option for this.

This option may negatively affect the performance of writes smaller than the sector size, as cached data may not be available to the os staging area.

This control over the os cache is the main reason some applications use direct io, despite it being less convenient and disabling other useful caches.

Delayed writing and on-disk write cache

.sync_data(true) and .sync_all(true)

There can be two delays (by the os and by the disk cache) between when an application performs a write, and when the data is written to persistent storage. They increase performance, but also increase the risk of data loss in case of a system crash or power outage.

When dealing with critical data, it may be useful to control these caches to make the chance of data loss smaller. The application should normally do so by calling Rust's stand-alone functions sync_data() or sync_all() at meaningful points (e.g. when the file is in a consistent state, or a state it can recover from).
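Calling the stand-alone functions at meaningful points might look like the sketch below, which uses File::sync_data() and File::sync_all() from the standard library. The helper name write_records and the idea that one complete line is a "consistent state" are assumptions made for illustration.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Writes each record on its own line and syncs after every complete
/// record. (Hypothetical helper for illustration.)
fn write_records(path: &Path, records: &[&str]) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .open(path)?;

    for record in records {
        file.write_all(record.as_bytes())?;
        file.write_all(b"\n")?;
        // A complete record has been written: sync the data, but not
        // necessarily metadata such as timestamps.
        file.sync_data()?;
    }

    // Before finishing, sync both data and metadata once.
    file.sync_all()
}
```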

However, .sync_data() and .sync_all() may also be given as an open option. This guarantees every write will not return before the data is written to disk. These options improve reliability, as you can never accidentally forget a sync.

Whether performance with these options is worse than with the stand-alone functions is hard to say. With these options the data may have to be synchronised more often. But the stand-alone functions often sync outstanding writes to all files, while the options possibly sync only the current file.

The difference between .sync_all() and .sync_data(true) is that .sync_data(true) does not update the less critical metadata such as the last modified timestamp (although it will be written eventually).

Open flags:

If a system does not support syncing only data, this option will fall back to syncing both data and metadata. If .sync_all(true) is specified, .sync_data() is ignored.

Direct access / no caching

Most operating systems offer a mode that reads data straight from disk to an application buffer, or that writes straight from a buffer to disk. This avoids the small cost of a memory copy. It has the side effect that the data is not available to the os to provide caching. Also, because this does not use the os staging area, all reads and writes have to take care of data sizes and alignment themselves.

Overview:

Open flags / system calls:

The other options offer a more fine-grained control over caching, and usually offer better performance or correctness guarantees. This option is sometimes used by applications as a crude way to control (disable) the os cache.

Rust should not currently expose this as an open option, because it should be used with an abstraction / external crate that handles the data size and alignment requirements. If it should be used at all.

Unresolved questions

None.