Async IO with completion-model IO systems
Completion-model IO systems don't work naturally with the Read trait. In this post I want to explore why that is, and look at some alternatives. Many of the issues are related to cancellation, and there is ongoing discussion about whether we can change or amend the cancellation mechanisms in async Rust; here, however, I'll mostly assume we only have the current style. I'll also propose a design for an async BufRead trait, and discuss some of the design constraints and options for implementers.
Previously: async IO fundamentals where I explain what completion-based IO is, and a proposal for async read and write traits in Rust.
Note: there are IO traits for reading and writing, but I'll just discuss reading in this post. Similar things apply to writing, but reading tends to be trickier.
Completion model reads
At a very high level, reads in completion systems work by initiating the IO by passing a buffer to the operating system, then the task waits until the IO is complete and can read data from the buffer. The buffer must be kept alive and must not be modified until the IO is complete. In Rust terms, ownership of the buffer is transferred to the OS.
This works pretty well with an async read function, except for the wrinkle of cancellation. Here's some pseudo code:
```rust
async fn foo(mut stream: impl async::Read) -> Result<()> {
    let mut buf = [0; 1024];
    stream.read(&mut buf).await?;
    // Use buf.
    Ok(())
}
```
If the future is dropped then buf will also be dropped. Unfortunately, that does not cancel the IO from the operating system's point of view, and when the OS writes into buf there will be a use-after-free error. From a type-system point of view, the error here is that the future foo has ownership of buf and the OS mutably borrows it, whereas what should happen is that ownership is transferred to the OS and transferred back to foo when the read completes.
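The ownership-transfer model can be sketched without any real completion IO. In this toy model (all names here, such as start_read, are made up for illustration), the "OS" is just a thread which takes ownership of the buffer and sends it back on completion; cancelling simply drops the receiver, which is safe because the caller never held a borrow of the in-flight buffer:

```rust
use std::sync::mpsc;
use std::thread;

// Toy model: the "OS" is a thread which takes ownership of the buffer.
// `start_read` is a made-up name, not a real IO API.
fn start_read(mut buf: Vec<u8>) -> mpsc::Receiver<Vec<u8>> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The "OS" writes into the buffer it now owns.
        buf[..5].copy_from_slice(b"hello");
        // Send ownership back; if the receiver was dropped (the read was
        // cancelled), the buffer is simply dropped here, never freed early.
        let _ = tx.send(buf);
    });
    rx
}

fn main() {
    // Normal completion: ownership comes back to the caller.
    let buf = start_read(vec![0; 1024]).recv().unwrap();
    assert_eq!(&buf[..5], b"hello");

    // "Cancellation": dropping the receiver is safe, because the buffer
    // is owned by the "OS" thread rather than borrowed from this function.
    drop(start_read(vec![0; 1024]));
    println!("ok");
}
```

With a real kernel interface there is no helper thread to take ownership, which is exactly why the designs below are needed.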
I see two ways to accomplish that: either the user passes an owned buffer to read, or the buffer is managed by the reader and reading is done through a BufRead trait. BufRead assumes the reader owns the buffer, which lets us handle buffer ownership independently of reading.
Reads with owned buffers
Calling a function using an owned buffer would look like:
```rust
async fn foo(mut stream: impl async::Read) -> Result<()> {
    let buf = vec![0; 1024];
    let buf = stream.read(buf).await?;
    // Use buf.
    Ok(())
}
```
This assumes read has a signature like async fn read(&mut self, buf: Vec<u8>) -> Result<Vec<u8>>. That could be in a new trait, or it could be added to async::Read. The downside of this approach is that it ties us to a single buffer implementation. Using a trait object requires dynamic dispatch, and using generics prevents using the reader as a trait object.
The reader must keep the buffer alive for the duration of the IO, even if it is cancelled.
As a strawman design, I might expect a new trait for reading and a new trait to abstract an owned buffer, something like:
```rust
trait OwnedRead {
    async fn read<B: OwnedReadBuf>(&mut self, buffer: B) -> Result<B>;
}

trait OwnedReadBuf {
    fn as_mut_slice(&mut self) -> *mut [u8];
    unsafe fn assume_init(&mut self, size: usize);
}

// An implementation using the initialised part of a Vec.
impl OwnedReadBuf for Vec<u8> {
    fn as_mut_slice(&mut self) -> *mut [u8] {
        &mut **self
    }

    unsafe fn assume_init(&mut self, size: usize) {
        self.truncate(size);
    }
}
```
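To check that this shape works, here is the Vec impl exercised synchronously, with the "OS write" simulated through the raw slice (simulated_read is a made-up helper standing in for the real async read; the trait-method call is fully qualified because Vec has an inherent as_mut_slice that would otherwise shadow it):

```rust
trait OwnedReadBuf {
    fn as_mut_slice(&mut self) -> *mut [u8];
    unsafe fn assume_init(&mut self, size: usize);
}

impl OwnedReadBuf for Vec<u8> {
    fn as_mut_slice(&mut self) -> *mut [u8] {
        &mut **self
    }

    unsafe fn assume_init(&mut self, size: usize) {
        self.truncate(size);
    }
}

// Simulate a read: the "OS" writes through the raw slice, then the
// reader reports how many bytes were written via `assume_init`.
fn simulated_read(mut buf: Vec<u8>) -> Vec<u8> {
    // Fully qualified call: Vec's inherent `as_mut_slice` would otherwise win.
    let ptr = OwnedReadBuf::as_mut_slice(&mut buf);
    unsafe {
        (*ptr)[..5].copy_from_slice(b"hello");
        OwnedReadBuf::assume_init(&mut buf, 5);
    }
    buf
}

fn main() {
    assert_eq!(simulated_read(vec![0; 1024]), b"hello");
    println!("ok");
}
```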
As mentioned above, there is a trade-off in how to specify the type of the buffer argument to read, and this might not be the right choice. OwnedReadBuf is designed to permit reading into initialised or uninitialised memory. I'm sure the API can be improved; I just wanted to give some idea of the shape of the trait.
Adding another trait has significant downsides in terms of the complexity of the API (note that it would be part of the async IO traits, but not the sync ones). It also makes writing generic code more difficult (do you require Read, BufRead, or OwnedRead?). An alternative would be to add OwnedRead::read to async::Read as read_owned. That would make async::Read further from Read, and make it a more complex trait. It would be easy to provide a default implementation, at least.
Reads with BufRead
More pseudo code:
```rust
async fn foo(stream: impl async::Read) -> Result<()> {
    let buf = vec![0; 1024];
    let mut reader = Reader::new(stream, buf);
    reader.read().await?;
    let buf = reader.buffer();
    // Use buf.
    Ok(())
}
```
Note here that the buffer is not passed in to read but is owned by the Reader. In this code, we get the buffer back out from the concrete type. We might also use utility functions on BufRead.
This is a good fit for the completion model: Reader will keep buf alive for as long as it needs to, and is responsible for releasing it if the read is cancelled. Note that there is still some hard work for the implementation of Reader to do: if reader is dropped before the IO completes, it must still keep buf alive for long enough. But at least the problem is now possible to solve (and it is the library's problem rather than the user's).
async::BufRead
An async BufRead trait design is pretty straightforward, following the sync version of BufRead:
```rust
pub trait BufRead: Read {
    async fn fill_buf(&mut self) -> Result<&[u8]>;
    fn consume(&mut self, amt: usize);

    async fn read_until(&mut self, byte: u8, buf: &mut Vec<u8>) -> Result<usize> { ... }
    async fn read_line(&mut self, buf: &mut String) -> Result<usize> { ... }

    #[unstable]
    async fn has_data_left(&mut self) -> Result<bool> { ... }
}
```
Notes:

- fill_buf and consume are required methods.
- consume does not need to be async since it is just about buffer management; no IO will take place.
- has_data_left must be async since it might fill the buffer to answer the question.
- I've elided the split and lines methods, since these are async iterators and there are still open questions there. I assume we will add them later. Besides the questions about async iterators, I don't think there is anything too interesting about these methods.
async::BufReader?
BufReader is a concrete type: a utility type for converting objects which implement Read into objects which implement BufRead. I.e., it encapsulates a reader together with a buffer to make a reader with an internal buffer. BufReader provides its own buffer and does not let the user customise it. I believe we'll need types similar to BufReader which offer the user more control over the buffer management strategy (I'll discuss that below). I'm not sure at the moment whether these more sophisticated readers should be runtime-independent or tied closely to the runtime. On one hand, they need to understand the runtime's constraints on buffer lifetimes; on the other, they do not need to directly interact with the OS or the executor, as far as I can tell. In any case, this area needs further investigation.
As for BufReader itself, I think we don't need a separate async::BufReader type; rather, we need to duplicate the impl<R: Read> BufReader<R> impl for R: async::Read, and to implement async::BufRead where R: async::Read (this might be an area where async overloading is useful). There's also the question of seek_relative, but I haven't looked at async::Seek yet, so I'll leave that for later.
async::BufWrite
There is no synchronous BufWrite, so there is nothing to copy here. Still, I think async::BufWrite is a useful trait to have for completion-based IO: for these systems the buffer's lifetime is important to manage, so we want internal buffering for that purpose, rather than for a richer API or for performance. I'll leave the design for later, though, since it is a more open question.
Design constraints for completion-based IO implementations
I would expect that std will only include an abstract interface for IO, such as one or more of the above traits and supporting types. I don't expect we will have any concrete implementations in std, leaving that up to async runtimes and/or other libraries. I think it is worth exploring in more depth how such systems might work, in order to validate the abstract designs and the assumption that we don't need more in std. Let's start with the requirements (in rough priority order).
Soundness and safety
It is of the utmost importance that any API is sound. An unsound API makes it impossible to guarantee that code using it is free of bugs, and is a non-starter. Soundness means adhering to Rust's ownership discipline, and in the case of completion IO systems, the important aspect of that is ensuring that buffers which are 'owned' by the kernel remain alive for long enough.
APIs should minimise use of unsafe code. Most APIs should be entirely safe to use for most users. It is possible that an API might offer some unsafe API for advanced (or low level) use cases.
Memory leaks
We should avoid memory leaks, even in the face of exceptional behaviour. We also want to avoid 'weak' memory leaks, where memory is kept alive for longer than necessary even if it is eventually released. However, we do not need to guarantee the absence of memory leaks in all circumstances (this is generally impossible; consider making a cycle of Rcs).
Destructors
Destructors in Rust are not guaranteed to run. That means we cannot rely on destructors running to guarantee the soundness of the system. However, we may expect memory to be leaked if a destructor is not run.
Blocking
We cannot have a system which blocks a thread, even for exceptional behaviour. This would break the 'async-ness' of a program.
Zero-copy reads
One of the major advantages of completion based IO is that reads can be zero-copy: the user passes a buffer to the OS, the OS writes data directly into that buffer, and notifies the user. Any IO library should strive to preserve this advantage. If the library has to copy the buffer at any point, then we've lost a major advantage of using completion IO.
Flexibility of buffer management
Somewhat of a corollary to the zero-copy requirement is the requirement for flexible buffer management. If the user has a data structure they wish to use for IO, then they should be able to pass a pointer into that data structure to the IO library and onwards to the OS, rather than being constrained in the kind of buffers which can be used for IO, which might force the user to copy data from the IO buffer into the data structure.
High-level design of buffer management
In the next few sections I'll discuss some possible designs for buffer management for completion IO systems. These designs are for concrete implementations of async IO (parts of libraries or runtimes), rather than the abstract interface that we might add to std. The designs are related to the proposals for IO traits though.
When considering Rust libraries for completion-based IO, we often talk about transferring ownership of a buffer to the OS. It is important to note that this is impossible within the Rust type system. In fact it should be considered more as a metaphor for how buffers are managed, rather than a technical constraint or goal. There must be some code in any library which has ownership of the buffer and thus responsibility for releasing its memory. This owner must keep the memory alive for long enough to satisfy the contract of the OS's IO facility.
There's obviously more to designing a library for completion IO, but I believe that managing buffers in the face of cancellation is the issue which is most fundamental from the perspective of evolving Rust's core async programming facilities.
Non-solutions
The following are either unsound or impractical.
Blocking drop
In this scheme, the future (or an object owned by the future) owns the buffer. When the future is cancelled, the drop function blocks until the IO is complete before releasing the buffer. This is unsound because Rust does not guarantee that drop will always be called. Even where drop is called, it would block the whole thread, killing asynchrony.
Async drop
Similar to the above, but we add support for async drop to the language, and rather than blocking the thread waiting for IO to complete, the destructor asynchronously awaits the IO completing before dropping the buffer. This is an improvement because it doesn't block the entire thread, but it is still unsound because async drop would not be guaranteed to run either.
Cancelling IO with the OS
It is sometimes suggested that when the future is cancelled, the IO can be cancelled with the OS. This is true, but it is more of an optimisation than a solution to the buffer ownership problem. Cancellation is also an async operation, it does not happen immediately, and the OS requires that the buffer is kept alive until either the original IO or the cancellation completes. So, you still need to find a way to release a buffer once the IO completes, but that completion should be quicker. You also need a reliable way to trigger cancellation (remember that we can't always rely on destructors running), however, since cancellation can only ever be an optimisation (we don't rely on it for soundness), that is not as important as relying on destructors to keep memory alive.
Leak the buffer
In this design, if the future is cancelled we simply leak the buffer, trusting that this won't happen often enough to OOM before the process terminates. Although this solution is clearly not acceptable, at least it is sound!
Specialist solution
The following is not a general solution, but might be useful in a restricted set of use cases.
Programmer is responsible for keeping the buffer alive
We use unsafe code and leave it up to the programmer to keep buffers around. This is a C-like solution and may be acceptable in very low-level libraries but is not an acceptable solution for a general purpose library.
Possible future solutions
The following seem like they could be solutions in the future, but would need significant changes to either the language or fundamentals of the futures model to work, so are infeasible in the near term.
Non-cancellable futures
If a future is guaranteed to complete (i.e., cannot be cancelled by dropping), or if there is a pre-cancellation hook which is guaranteed to be run (c.f., drop), then managing the buffer becomes much easier: we can rely on the buffer outliving the IO call. However, this requires significant changes to Rust's async fundamentals.
Structured (async) concurrency
Structured concurrency should work well with completion IO. The nursery abstraction (or task group or whatever you want to call it) is guaranteed to live longer than its child tasks. So, if buffers are owned by the nursery, then even if the IO task is cancelled, the buffer will still live long enough (assuming the parent task is not cancelled, but I think that it is in most cases ok for the parent task to await the IO completing that the child started). However, supporting structured concurrency in Rust (at least the flavour required for correctness here, which requires guarantees about behaviour in the presence of cancellation) requires futures to be cancel-proof, so this has the same problems as that design (and structured concurrency may not be a suitable architecture for all applications).
Possible solutions
And finally, here are some solutions which should work. I don't think there is one best solution; different libraries might end up using different approaches depending on their priorities.
Pass owned buffers
In this design, the read functions accept an owned rather than a borrowed buffer, and return the buffer when the IO completes. The buffer type could be a concrete type (like Vec<u8> or a special-purpose type similar to ReadBuf) or a trait. The former limits buffers to one specific type, and if the user doesn't want to use that type, they must copy the data. The latter is more flexible, but it still requires that the user's buffer type can implement the trait, and assumes that the buffer can be moved.
If the IO is cancelled, instead of the buffer being returned, it is released.
This design requires adding an OwnedRead trait or adding a read_owned method to Read. As well as requiring more additions to the IO traits (additions without equivalents in the sync world), the other disadvantages of this approach are that we must choose either a generic or a trait object approach, both of which have downsides (described above), and that a user's buffers must be movable to work with the design.
The advantage of this approach is that it is simple to use: reading is done directly with a read function and there is no setup involved, nor do users have to use BufRead.
Fully managed buffer pool
In this design, the IO library maintains a pool of buffers used for IO. The library owns the buffers (i.e., it releases or recycles them when IO is complete). Either users can borrow or take ownership of the buffers, in which case they take on some responsibility for buffer management or only use the buffers temporarily (this is not flexible enough for all scenarios, and in some cases the user will have to copy data out of the buffer); or the buffers are an implementation detail, and the library must always copy the data into the user's buffer. Such a system is likely to be acceptable in many cases, but in some cases those costs will be unacceptable.
This design uses the BufRead trait described above. As long as using BufRead is expected, this design is easy to use and efficient. The downside is the lack of flexibility described above.
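The pool itself can be sketched in a few lines (BufferPool, check_out, and check_in are hypothetical names; a real implementation would hand checked-out buffers to the OS and check them back in only when the IO, or its cancellation, completes):

```rust
use std::sync::Mutex;

// Hypothetical managed buffer pool: the library owns all the buffers
// and recycles them when IO completes.
struct BufferPool {
    free: Mutex<Vec<Vec<u8>>>,
    buf_size: usize,
}

impl BufferPool {
    fn new(count: usize, buf_size: usize) -> Self {
        BufferPool {
            free: Mutex::new((0..count).map(|_| vec![0; buf_size]).collect()),
            buf_size,
        }
    }

    // Take a buffer for an IO operation (allocate a fresh one if empty).
    fn check_out(&self) -> Vec<u8> {
        self.free
            .lock()
            .unwrap()
            .pop()
            .unwrap_or_else(|| vec![0; self.buf_size])
    }

    // Return a buffer once the IO (or its cancellation) has completed.
    fn check_in(&self, buf: Vec<u8>) {
        self.free.lock().unwrap().push(buf);
    }
}

fn main() {
    let pool = BufferPool::new(2, 1024);
    let a = pool.check_out();
    let b = pool.check_out();
    let c = pool.check_out(); // pool empty: freshly allocated
    pool.check_in(a);
    pool.check_in(b);
    pool.check_in(c);
    assert_eq!(pool.free.lock().unwrap().len(), 3);
    println!("ok");
}
```

Because the pool, not the user's future, owns every buffer, cancelling a future never drops a buffer the kernel is still writing into; the cost is that data must be copied out if the user wants it somewhere else.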
Graveyard
Here, the IO library is not responsible for managing buffers entirely, but offers a way to take ownership of a buffer if the IO is cancelled. For this to work, the buffer must be movable, reference counted, or similar. When a future is cancelled, instead of dropping the buffer, it is moved to the IO library's graveyard. The graveyard can drop it once the IO completes (or cancellation with the OS completes). The implementation of this is difficult. We cannot rely on destructors to move the buffer to the graveyard, since destructors are not guaranteed to run. We must have a mechanism for reliably moving the buffer before cancellation (which is not currently possible, I believe), or a way to register the buffer in advance.
If the buffer can be 'registered' in some way with the IO library when IO starts and unregistered when it completes, then the IO library can release the buffer if it is still registered after the IO completes.
This design is more flexible than a managed buffer pool while keeping many of its benefits. Again, it uses the BufRead trait. In this case the buffer must be passed to (or registered with) the reader before reading begins (using methods on the concrete type, before reading via the trait methods), which makes this design a bit less ergonomic than the others, and that pattern may not fit all use cases.
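The bookkeeping side of a graveyard is straightforward to sketch (Graveyard, park, and on_complete are made-up names; the genuinely hard part, reliably getting park called on cancellation, is exactly the open problem described above):

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical graveyard: if a future is cancelled while its IO is in
// flight, the buffer is parked here instead of being freed early, and
// is only dropped once the kernel reports the IO complete.
struct Graveyard {
    // Map from in-flight IO operation id to its parked buffer.
    parked: Mutex<HashMap<u64, Vec<u8>>>,
}

impl Graveyard {
    fn new() -> Self {
        Graveyard { parked: Mutex::new(HashMap::new()) }
    }

    // Called on cancellation: take ownership of the buffer.
    fn park(&self, io_id: u64, buf: Vec<u8>) {
        self.parked.lock().unwrap().insert(io_id, buf);
    }

    // Called when the OS reports the IO (or its cancellation) complete:
    // removing the entry drops the buffer, which is now safe.
    fn on_complete(&self, io_id: u64) {
        self.parked.lock().unwrap().remove(&io_id);
    }
}

fn main() {
    let graveyard = Graveyard::new();
    // A future is cancelled while IO 7 is in flight: park its buffer.
    graveyard.park(7, vec![0; 1024]);
    assert_eq!(graveyard.parked.lock().unwrap().len(), 1);
    // Later, the kernel reports completion: the buffer can be released.
    graveyard.on_complete(7);
    assert!(graveyard.parked.lock().unwrap().is_empty());
    println!("ok");
}
```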
Existing solutions
Glommio currently follows the 'managed buffer pool' design: it copies buffers to implement the existing AsyncRead trait, and uses a BufRead-like approach for some file IO via a custom read_at method. In the future, the authors would also like to support the 'pass owned buffers' design, and will experiment with an approach where the user supplies a buffer pool for IO, rather than individual buffers (not discussed above, but worth looking into).
Ringbahn offers two different strategies at different levels of abstraction. At the higher level, the Ring type uses its own buffers for IO (the 'managed buffer pool' design). At the lower level, the Event type offers an unsafe method and a cancellation callback (the 'programmer is responsible' design).
Tokio-uring uses an owned buffer trait (IoBuf). The buffer is passed to the IO library and then back to the user: an instance of the 'pass owned buffers' design.
Monoio also uses an owned buffer trait (IoBufMut), following the 'pass owned buffers' design.
Acknowledgements
Thank you to Yosh for general feedback, and the Glommio authors for help figuring out their buffer management strategy.