Async IO with completion-model IO systems
Completion-model IO systems don't work naturally with the Read trait. In this post I want to explore why that is, and look at some alternatives. Many of the issues are related to cancellation, and there is ongoing discussion about whether we can change or amend the cancellation mechanisms in async Rust; here, however, I'll mostly assume we only have the current style. I'll also propose a design for an async BufRead trait, and discuss some of the design constraints and options for implementers.
Previously: async IO fundamentals where I explain what completion-based IO is, and a proposal for async read and write traits in Rust.
Note: there are IO traits for reading and writing, but I'll just discuss reading in this post. Similar things apply to writing, but reading tends to be trickier.
Completion model reads
At a very high level, reads in completion systems work by initiating the IO by passing a buffer to the operating system, then the task waits until the IO is complete and can read data from the buffer. The buffer must be kept alive and must not be modified until the IO is complete. In Rust terms, ownership of the buffer is transferred to the OS.
This works pretty well with an async read function, except for the wrinkle of cancellation. Here's some pseudo code:
```rust
async fn foo(mut stream: impl async::Read) -> Result<()> {
    let mut buf = [0; 1024];
    stream.read(&mut buf).await?;
    // Use buf.
    Ok(())
}
```
If the future is dropped then buf will also be dropped. Unfortunately, that does not cancel the IO from the operating system's point of view, and when the OS writes into buf there will be a use-after-free error. From a type-system point of view, the error here is that the future foo has ownership of buf and the OS mutably borrows it, whereas what should happen is that ownership is transferred to the OS and transferred back to foo when the read completes.
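The ownership-transfer model can be sketched without any real completion IO. In this toy model (all names here, such as start_read, are made up for illustration), the "OS" is just a thread which takes ownership of the buffer and sends it back on completion; cancelling simply drops the receiver, which is safe because the caller never held a borrow of the in-flight buffer:

```rust
use std::sync::mpsc;
use std::thread;

// Toy model: the "OS" is a thread which takes ownership of the buffer.
// `start_read` is a made-up name, not a real IO API.
fn start_read(mut buf: Vec<u8>) -> mpsc::Receiver<Vec<u8>> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The "OS" writes into the buffer it now owns.
        buf[..5].copy_from_slice(b"hello");
        // Send ownership back; if the receiver was dropped (the read was
        // cancelled), the buffer is simply dropped here, never freed early.
        let _ = tx.send(buf);
    });
    rx
}

fn main() {
    // Normal completion: ownership comes back to the caller.
    let buf = start_read(vec![0; 1024]).recv().unwrap();
    assert_eq!(&buf[..5], b"hello");

    // "Cancellation": dropping the receiver is safe, because the buffer
    // is owned by the "OS" thread rather than borrowed from this function.
    drop(start_read(vec![0; 1024]));
    println!("ok");
}
```

With a real kernel interface there is no helper thread to take ownership, which is exactly why the designs below are needed.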
I see two ways to accomplish that: either the user passes an owned buffer to read, or the buffer is managed by the reader and reading is done through a BufRead trait. BufRead assumes the reader owns the buffer, which lets us handle buffer ownership independently of reading.
Reads with owned buffers
Calling a function using an owned buffer would look like:
```rust
async fn foo(mut stream: impl async::Read) -> Result<()> {
    let buf = vec![0; 1024];
    let buf = stream.read(buf).await?;
    // Use buf.
    Ok(())
}
```
This assumes read has a signature like async fn read(&mut self, buf: Vec<u8>) -> Result<Vec<u8>>. That could be in a new trait, or it could be added to async::Read. The downside of this approach is that it ties us to a single buffer implementation. Using a trait object requires dynamic dispatch, and using generics prevents using the reader as a trait object.
The reader must keep the buffer alive for the duration of the IO, even if it is cancelled.
As a strawman design, I might expect a new trait for reading and a new trait to abstract an owned buffer, something like:
```rust
trait OwnedRead {
    async fn read<B: OwnedReadBuf>(&mut self, buffer: B) -> Result<B>;
}

trait OwnedReadBuf {
    fn as_mut_slice(&mut self) -> *mut [u8];
    unsafe fn assume_init(&mut self, size: usize);
}

// An implementation using the initialised part of a Vec.
impl OwnedReadBuf for Vec<u8> {
    fn as_mut_slice(&mut self) -> *mut [u8] {
        &mut **self
    }

    unsafe fn assume_init(&mut self, size: usize) {
        self.truncate(size);
    }
}
```
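To check that this shape works, here is the Vec impl exercised synchronously, with the "OS write" simulated through the raw slice (simulated_read is a made-up helper standing in for the real async read; the trait-method call is fully qualified because Vec has an inherent as_mut_slice that would otherwise shadow it):

```rust
trait OwnedReadBuf {
    fn as_mut_slice(&mut self) -> *mut [u8];
    unsafe fn assume_init(&mut self, size: usize);
}

impl OwnedReadBuf for Vec<u8> {
    fn as_mut_slice(&mut self) -> *mut [u8] {
        &mut **self
    }

    unsafe fn assume_init(&mut self, size: usize) {
        self.truncate(size);
    }
}

// Simulate a read: the "OS" writes through the raw slice, then the
// reader reports how many bytes were written via `assume_init`.
fn simulated_read(mut buf: Vec<u8>) -> Vec<u8> {
    // Fully qualified call: Vec's inherent `as_mut_slice` would otherwise win.
    let ptr = OwnedReadBuf::as_mut_slice(&mut buf);
    unsafe {
        (*ptr)[..5].copy_from_slice(b"hello");
        OwnedReadBuf::assume_init(&mut buf, 5);
    }
    buf
}

fn main() {
    assert_eq!(simulated_read(vec![0; 1024]), b"hello");
    println!("ok");
}
```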
As mentioned above, there is a trade-off in how to specify the type of the buffer argument to read, and this might not be the right choice. OwnedReadBuf is designed to permit reading into initialised or uninitialised memory. I'm sure the API can be improved; I just wanted to give some idea of the shape of the trait.
Adding another trait has significant downsides in terms of the complexity of the API (note that it would be part of the async IO traits, but not the sync ones). It also makes writing generic code more difficult (do you require Read, BufRead, or OwnedRead?). An alternative would be to add OwnedRead::read to async::Read as read_owned. That would make async::Read further from Read, and make it a more complex trait. It would be easy to provide a default implementation, at least.
Reads with BufRead
More pseudo code:
```rust
async fn foo(stream: impl async::Read) -> Result<()> {
    let buf = vec![0; 1024];
    let mut reader = Reader::new(stream, buf);
    reader.read().await?;
    let buf = reader.buffer();
    // Use buf.
    Ok(())
}
```
Note here that the buffer is not passed in to read but is owned by the Reader. In this code, we get the buffer back out from the concrete type. We might also use utility functions on BufRead.
This is a good fit for the completion model: Reader will keep buf alive for as long as it needs to, and is responsible for releasing it if the read is cancelled. Note that there is still some hard work for the implementation of Reader to do: if reader is dropped before the IO completes, it must still keep buf alive for long enough. But at least the problem is now possible to solve (and it is the library's problem rather than the user's).
async::BufRead
An async BufRead trait design is pretty straightforward, following the sync version of BufRead:
```rust
pub trait BufRead: Read {
    async fn fill_buf(&mut self) -> Result<&[u8]>;
    fn consume(&mut self, amt: usize);

    async fn read_until(&mut self, byte: u8, buf: &mut Vec<u8>) -> Result<usize> { ... }
    async fn read_line(&mut self, buf: &mut String) -> Result<usize> { ... }

    #[unstable]
    async fn has_data_left(&mut self) -> Result<bool> { ... }
}
```
Notes:

- fill_buf and consume are required methods.
- consume does not need to be async since it is just about buffer management; no IO will take place.
- has_data_left must be async since it might fill the buffer to answer the question.
- I've elided the split and lines methods, since these are async iterators and there are still open questions there. I assume we will add them later. Besides the questions about async iterators, I don't think there is anything too interesting about these methods.
async::BufReader?
BufReader is a concrete type: a utility type for converting objects which implement Read into objects which implement BufRead. I.e., it encapsulates a reader together with a buffer to make a reader with an internal buffer. BufReader provides its own buffer and does not let the user customise it. I believe we'll need types similar to BufReader which offer the user more control over the buffer management strategy (I'll discuss that below). I'm not sure at the moment whether these more sophisticated readers should be runtime-independent or tied closely to the runtime. On one hand, they need to understand the runtime's constraints on buffer lifetimes; on the other, they do not need to directly interact with the OS or the executor, as far as I can tell. In any case, this area needs further investigation.
As for BufReader itself, I think we don't need a separate async::BufReader type; rather, we need to duplicate the impl<R: Read> BufReader<R> impl for R: async::Read, and to implement async::BufRead where R: async::Read (this might be an area where async overloading is useful). There's also the question of seek_relative, but I haven't looked at async::Seek yet, so I'll leave that for later.
async::BufWrite
There is no synchronous BufWrite, so there is nothing to copy here. Still, I think async::BufWrite is a useful trait to have for completion-based IO: for these systems the buffer's lifetime is important to manage, so we want internal buffering for that purpose, rather than for a richer API or for performance. I'll leave the design for later, though, since it is a more open question.
Design constraints for completion-based IO implementations
I would expect that std will only include an abstract interface for IO, such as one or more of the above traits and supporting types. I don't expect we will have any concrete implementations in std, leaving that up to async runtimes and/or other libraries. I think it is worth exploring in more depth how such systems might work, in order to validate the abstract designs and the assumption that we don't need more in std. Let's start with the requirements (in rough priority order).
Soundness and safety
It is of the utmost importance that any API is sound. An unsound API makes it impossible to guarantee that code using it is free of bugs, and is a non-starter. Soundness means adhering to Rust's ownership discipline, and in the case of completion IO systems, the important aspect of that is ensuring that buffers which are 'owned' by the kernel remain alive for long enough.
APIs should minimise use of unsafe code. Most APIs should be entirely safe to use for most users. It is possible that an API might offer some unsafe API for advanced (or low level) use cases.
Memory leaks
We should avoid memory leaks, even in the face of exceptional behaviour. We also want to avoid 'weak' memory leaks, where memory is kept alive for longer than necessary even if it is eventually released. However, we do not need to guarantee the absence of memory leaks in all circumstances (this is generally impossible; consider making a cycle of Rcs).
Destructors
Destructors in Rust are not guaranteed to run. That means we cannot rely on destructors running to guarantee the soundness of the system. However, we may expect memory to be leaked if a destructor is not run.
Blocking
We cannot have a system which blocks a thread, even for exceptional behaviour. This would break the 'async-ness' of a program.
Zero-copy reads
One of the major advantages of completion based IO is that reads can be zero-copy: the user passes a buffer to the OS, the OS writes data directly into that buffer, and notifies the user. Any IO library should strive to preserve this advantage. If the library has to copy the buffer at any point, then we've lost a major advantage of using completion IO.
Flexibility of buffer management
Somewhat of a corollary to the zero-copy requirement is the requirement for flexible buffer management. If the user has a data structure they wish to use for IO, then they should be able to pass a pointer into that data structure to the IO library and onwards to the OS, rather than being constrained in the kind of buffers which can be used for IO, which might force the user to copy data from the IO buffer into the data structure.
High-level design of buffer management
In the next few sections I'll discuss some possible designs for buffer management for completion IO systems. These designs are for concrete implementations of async IO (parts of libraries or runtimes), rather than the abstract interface that we might add to std. The designs are related to the proposals for IO traits though.
When considering Rust libraries for completion-based IO, we often talk about transferring ownership of a buffer to the OS. It is important to note that this is impossible within the Rust type system. In fact it should be considered more as a metaphor for how buffers are managed, rather than a technical constraint or goal. There must be some code in any library which has ownership of the buffer and thus responsibility for releasing its memory. This owner must keep the memory alive for long enough to satisfy the contract of the OS's IO facility.
There's obviously more to designing a library for completion IO, but I believe that managing buffers in the face of cancellation is the issue which is most fundamental from the perspective of evolving Rust's core async programming facilities.
Non-solutions
The following are either unsound or impractical.
Blocking drop
In this scheme, the future (or an object owned by the future) owns the buffer. When the future is cancelled, the drop function blocks until the IO is complete before releasing the buffer. This is unsound because Rust does not guarantee that drop will always be called. Even where drop is called, it would block the whole thread, killing asynchrony.
Async drop
Similar to the above, but we add support for async drop to the language, and rather than blocking the thread waiting for IO to complete, the destructor asynchronously awaits the IO completing before dropping the buffer. This is an improvement because it doesn't block the entire thread, but it is still unsound because async drop would not be guaranteed to run either.
Cancelling IO with the OS
It is sometimes suggested that when the future is cancelled, the IO can be cancelled with the OS. This is true, but it is more of an optimisation than a solution to the buffer ownership problem. Cancellation is also an async operation, it does not happen immediately, and the OS requires that the buffer is kept alive until either the original IO or the cancellation completes. So, you still need to find a way to release a buffer once the IO completes, but that completion should be quicker. You also need a reliable way to trigger cancellation (remember that we can't always rely on destructors running), however, since cancellation can only ever be an optimisation (we don't rely on it for soundness), that is not as important as relying on destructors to keep memory alive.
Leak the buffer
In this design, if the future is cancelled we simply leak the buffer, trusting that this won't happen often enough to OOM before the process terminates. Although this solution is clearly not acceptable, at least it is sound!
Specialist solution
The following is not a general solution, but might be useful in a restricted set of use cases.
Programmer is responsible for keeping the buffer alive
We use unsafe code and leave it up to the programmer to keep buffers around. This is a C-like solution and may be acceptable in very low-level libraries but is not an acceptable solution for a general purpose library.
Possible future solutions
The following seem like they could be solutions in the future, but would need significant changes to either the language or fundamentals of the futures model to work, so are infeasible in the near term.
Non-cancellable futures
If a future is guaranteed to complete (i.e., cannot be cancelled by dropping), or if there is a pre-cancellation hook which is guaranteed to be run (c.f., drop), then managing the buffer becomes much easier: we can rely on the buffer outliving the IO call. However, this requires significant changes to Rust's async fundamentals.
Structured (async) concurrency
Structured concurrency should work well with completion IO. The nursery abstraction (or task group or whatever you want to call it) is guaranteed to live longer than its child tasks. So, if buffers are owned by the nursery, then even if the IO task is cancelled, the buffer will still live long enough (assuming the parent task is not cancelled, but I think that it is in most cases ok for the parent task to await the IO completing that the child started). However, supporting structured concurrency in Rust (at least the flavour required for correctness here, which requires guarantees about behaviour in the presence of cancellation) requires futures to be cancel-proof, so this has the same problems as that design (and structured concurrency may not be a suitable architecture for all applications).
Possible solutions
And finally, here are some solutions which should work. I don't think there is one best solution; different libraries might end up using different approaches depending on their priorities.
Pass owned buffers
In this design, the read functions accept an owned rather than a borrowed buffer, and return the buffer when the IO completes. The buffer type could be a concrete type (like Vec<u8> or a special-purpose type similar to ReadBuf) or a trait. The former limits buffers to one specific type, and if the user doesn't want to use that type, they must copy the data. The latter is more flexible, but it still requires that the user's buffer type can implement the trait, and assumes that the buffer can be moved.
If the IO is cancelled, instead of the buffer being returned, it is released.
This design requires adding an OwnedRead trait or adding a read_owned method to Read. As well as requiring more additions to the IO traits (additions without equivalents in the sync world), the other disadvantages of this approach are that we must choose either a generic or a trait object approach, both of which have downsides (described above), and that a user's buffers must be movable to work with the design.
The advantage of this approach is that it is simple to use: reading is done directly with a read function and there is no setup involved, nor do users have to use BufRead.
Fully managed buffer pool
In this design, the IO library maintains a pool of buffers used for IO. The library owns the buffers (i.e., it releases or recycles them when IO is complete). Either users can borrow or take ownership of the buffers, in which case they take on some responsibility for buffer management or only use the buffers temporarily (this is not flexible enough for all scenarios, and in some cases the user will have to copy data out of the buffer); or the buffers are an implementation detail, and the library must always copy the data into the user's buffer. Such a system is likely to be acceptable in many cases, but in some cases those costs will be unacceptable.
This design uses the BufRead trait described above. As long as using BufRead is expected, this design is easy to use and efficient. The downside is the lack of flexibility described above.
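The pool itself can be sketched in a few lines (BufferPool, check_out, and check_in are hypothetical names; a real implementation would hand checked-out buffers to the OS and check them back in only when the IO, or its cancellation, completes):

```rust
use std::sync::Mutex;

// Hypothetical managed buffer pool: the library owns all the buffers
// and recycles them when IO completes.
struct BufferPool {
    free: Mutex<Vec<Vec<u8>>>,
    buf_size: usize,
}

impl BufferPool {
    fn new(count: usize, buf_size: usize) -> Self {
        BufferPool {
            free: Mutex::new((0..count).map(|_| vec![0; buf_size]).collect()),
            buf_size,
        }
    }

    // Take a buffer for an IO operation (allocate a fresh one if empty).
    fn check_out(&self) -> Vec<u8> {
        self.free
            .lock()
            .unwrap()
            .pop()
            .unwrap_or_else(|| vec![0; self.buf_size])
    }

    // Return a buffer once the IO (or its cancellation) has completed.
    fn check_in(&self, buf: Vec<u8>) {
        self.free.lock().unwrap().push(buf);
    }
}

fn main() {
    let pool = BufferPool::new(2, 1024);
    let a = pool.check_out();
    let b = pool.check_out();
    let c = pool.check_out(); // pool empty: freshly allocated
    pool.check_in(a);
    pool.check_in(b);
    pool.check_in(c);
    assert_eq!(pool.free.lock().unwrap().len(), 3);
    println!("ok");
}
```

Because the pool, not the user's future, owns every buffer, cancelling a future never drops a buffer the kernel is still writing into; the cost is that data must be copied out if the user wants it somewhere else.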
Graveyard
Here, the IO library is not responsible for managing buffers entirely, but offers a way to take ownership of a buffer if the IO is cancelled. For this to work, the buffer must be movable, reference counted, or similar. When a future is cancelled, instead of dropping the buffer, it is moved to the IO library's graveyard. The graveyard can drop it once the IO completes (or cancellation with the OS completes). The implementation of this is difficult. We cannot rely on destructors to move the buffer to the graveyard, since destructors are not guaranteed to run. We must have a mechanism for reliably moving the buffer before cancellation (which is not currently possible, I believe), or a way to register the buffer in advance.
If the buffer can be 'registered' in some way with the IO library when IO starts and unregistered when it completes, then the IO library can release the buffer if it is still registered after the IO completes.
This design is more flexible than a managed buffer pool while keeping many of its benefits. Again, it uses the BufRead trait. In this case the buffer must be passed to (or registered with) the reader before reading begins (using methods on the concrete type, before reading via the trait methods), which makes this design a bit less ergonomic than the others, and that pattern may not fit all use cases.
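The bookkeeping side of a graveyard is straightforward to sketch (Graveyard, park, and on_complete are made-up names; the genuinely hard part, reliably getting park called on cancellation, is exactly the open problem described above):

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical graveyard: if a future is cancelled while its IO is in
// flight, the buffer is parked here instead of being freed early, and
// is only dropped once the kernel reports the IO complete.
struct Graveyard {
    // Map from in-flight IO operation id to its parked buffer.
    parked: Mutex<HashMap<u64, Vec<u8>>>,
}

impl Graveyard {
    fn new() -> Self {
        Graveyard { parked: Mutex::new(HashMap::new()) }
    }

    // Called on cancellation: take ownership of the buffer.
    fn park(&self, io_id: u64, buf: Vec<u8>) {
        self.parked.lock().unwrap().insert(io_id, buf);
    }

    // Called when the OS reports the IO (or its cancellation) complete:
    // removing the entry drops the buffer, which is now safe.
    fn on_complete(&self, io_id: u64) {
        self.parked.lock().unwrap().remove(&io_id);
    }
}

fn main() {
    let graveyard = Graveyard::new();
    // A future is cancelled while IO 7 is in flight: park its buffer.
    graveyard.park(7, vec![0; 1024]);
    assert_eq!(graveyard.parked.lock().unwrap().len(), 1);
    // Later, the kernel reports completion: the buffer can be released.
    graveyard.on_complete(7);
    assert!(graveyard.parked.lock().unwrap().is_empty());
    println!("ok");
}
```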
Existing solutions
Glommio currently follows the 'managed buffer pool' design: it copies buffers to implement the existing AsyncRead trait, and uses a BufRead-like approach for some file IO via a custom read_at method. In the future, the authors would also like to support the 'pass owned buffers' design, and will experiment with an approach where the user supplies a buffer pool for IO, rather than individual buffers (not discussed above, but worth looking into).
Ringbahn offers two different strategies at different levels of abstraction. At the higher level, the Ring type uses its own buffers for IO (the 'managed buffer pool' design). At the lower level, the Event type offers an unsafe method and a cancellation callback (the 'programmer is responsible' design).
Tokio-uring uses an owned buffer trait (IoBuf). The buffer is passed to the IO library and then back to the user: an instance of the 'pass owned buffers' design.
Monoio also uses an owned buffer trait (IoBufMut), following the 'pass owned buffers' design.
Acknowledgements
Thank you to Yosh for general feedback, and the Glommio authors for help figuring out their buffer management strategy.