RFC 1845: shared-from-slice

libs (slice | collections | dst)

Summary

This is an RFC to add the APIs: From<&[T]> for Rc<[T]> where T: Clone or T: Copy as well as From<&str> for Rc<str>. In addition: From<Vec<T>> for Rc<[T]> and From<Box<T: ?Sized>> for Rc<T> will be added.

Identical APIs will also be added for Arc.

Motivation

Caching and string interning

These, and especially the latter - i.e: From<&str>, trait implementations of From are useful when dealing with any form of caching of slices.

This especially applies to controllable string interning, where you can cheaply cache strings with a construct such as putting Rcs into HashSets, i.e: HashSet<Rc<str>>.

An example of string interning:

#![feature(ptr_eq)]
#![feature(shared_from_slice)]
use std::rc::Rc;
use std::collections::HashSet;
use std::mem::drop;

fn cache_str(cache: &mut HashSet<Rc<str>>, input: &str) -> Rc<str> {
     // If the input hasn't been cached, do it:
     if !cache.contains(input) {
         cache.insert(input.into());
     }

    // Retrieve the cached element.
    cache.get(input).unwrap().clone()
}

let first   = "hello world!";
let second  = "goodbye!";
let mut set = HashSet::new();

// Cache the slices:
let rc_first  = cache_str(&mut set, first);
let rc_second = cache_str(&mut set, second);
let rc_third  = cache_str(&mut set, second);

// The contents match:
assert_eq!(rc_first.as_ref(),  first);
assert_eq!(rc_second.as_ref(), second);
assert_eq!(rc_third.as_ref(),  rc_second.as_ref());

// It was cached:
assert_eq!(set.len(), 2);
drop(set);
assert_eq!(Rc::strong_count(&rc_first),  1);
assert_eq!(Rc::strong_count(&rc_second), 2);
assert_eq!(Rc::strong_count(&rc_third),  2);
assert!(Rc::ptr_eq(&rc_second, &rc_third));

One could imagine a scenario where you have an AST with string literals that gets repeated a lot in it. For example, namespaces in XML documents tends to be repeated many times.

The tendril crate does one form of interning:

Buffer sharing is accomplished through thread-local (non-atomic) reference counting

It is useful to provide an implementation of From<&[T]> as well, and not just for &str, because one might deal with non-utf8 strings, i.e: &[u8]. One could potentially reuse this for Path, OsStr.

Safe abstraction for unsafe code.

Providing these implementations in the current state of Rust requires substantial amount of unsafe code. Therefore, for the sake of confidence in that the implementations are safe - it is best done in the standard library.

RcBox is not public

Furthermore, since RcBox is not exposed publically from std::rc, one can't make an implementation outside of the standard library for this without making assumptions about the internal layout of Rc. The alternative is to roll your own implementation of Rc in its entirity - but this in turn requires using a lot of feature gates, which makes using this on stable Rust in the near future unfeasible.

For Arc

For Arc the synchronization overhead of doing .clone() is probably greater than the overhead of doing Arc<Box<str>>. But once the clones have been made, Arc<str> would probably be cheaper to dereference due to locality.

Most of the motivations for Rc applies to Arc as well, but the use cases might be fewer. Therefore, the case for adding the same API for Arc is less clear. One could perhaps use it for multi threaded interning with a type such as: Arc<Mutex<HashSet<Arc<str>>>>.

Because of the similarities between the layout of Rc and Arc, almost identical implementations could be added for From<&[T]> for Arc<[T]> and From<&str> for Arc<str>. It would also be consistent to do so.

Taking all of this into account, adding the APIs for Arc is warranted.

Detailed design

There's already an implementation

There is already an implementation of sorts alloc::rc for this. But it is hidden under the feature gate rustc_private, which, to the authors knowledge, will never be stabilized. The implementation is, on this day, as follows:

impl Rc<str> {
    /// Constructs a new `Rc<str>` from a string slice.
    #[doc(hidden)]
    #[unstable(feature = "rustc_private",
               reason = "for internal use in rustc",
               issue = "0")]
    pub fn __from_str(value: &str) -> Rc<str> {
        unsafe {
            // Allocate enough space for `RcBox<str>`.
            let aligned_len = 2 + (value.len() + size_of::<usize>() - 1) / size_of::<usize>();
            let vec = RawVec::<usize>::with_capacity(aligned_len);
            let ptr = vec.ptr();
            forget(vec);
            // Initialize fields of `RcBox<str>`.
            *ptr.offset(0) = 1; // strong: Cell::new(1)
            *ptr.offset(1) = 1; // weak: Cell::new(1)
            ptr::copy_nonoverlapping(value.as_ptr(), ptr.offset(2) as *mut u8, value.len());
            // Combine the allocation address and the string length into a fat pointer to `RcBox`.
            let rcbox_ptr: *mut RcBox<str> = mem::transmute([ptr as usize, value.len()]);
            assert!(aligned_len * size_of::<usize>() == size_of_val(&*rcbox_ptr));
            Rc { ptr: Shared::new(rcbox_ptr) }
        }
    }
}

The idea is to use the bulk of the implementation of that, generalize it to Vecs and slices, specialize it for &str, provide documentation for both.

Copy and Clone

For the implementation of From<&[T]> for Rc<[T]>, T must be Copy if ptr::copy_nonoverlapping is used because this relies on it being memory safe to simply copy the bits over. If instead, T::clone() is used in a loop, then T can simply be Clone instead. This is however slower than using ptr::copy_nonoverlapping.

Vec and Box

For the implementation of From<Vec<T>> for Rc<[T]>, T need not be Copy, nor Clone. The input vector already owns valid Ts, and these elements are simply copied over bit for bit. After copying all elements, they are no longer owned in the vector, which is then deallocated. Unfortunately, at this stage, the memory used by the vector can not be reused - this could potentially be changed in the future.

This is similar for Box.

Suggested implementation

The actual implementations could / will look something like:

For Rc

#[inline(always)]
unsafe fn slice_to_rc<'a, T, U, W, C>(src: &'a [T], cast: C, write_elems: W)
   -> Rc<U>
where U: ?Sized,
      W: FnOnce(&mut [T], &[T]),
      C: FnOnce(*mut RcBox<[T]>) -> *mut RcBox<U> {
    // Compute space to allocate for `RcBox<U>`.
    let susize = mem::size_of::<usize>();
    let aligned_len = 2 + (mem::size_of_val(src) + susize - 1) / susize;

    // Allocate enough space for `RcBox<U>`.
    let vec = RawVec::<usize>::with_capacity(aligned_len);
    let ptr = vec.ptr();
    forget(vec);

    // Combine the allocation address and the slice length into a
    // fat pointer to RcBox<[T]>.
    let rbp = slice::from_raw_parts_mut(ptr as *mut T, src.len())
                as *mut [T] as *mut RcBox<[T]>;

    // Initialize fields of RcBox<[T]>.
    (*rbp).strong.set(1);
    (*rbp).weak.set(1);
    write_elems(&mut (*rbp).value, src);

    // Recast to RcBox<U> and yield the Rc:
    let rcbox_ptr = cast(rbp);
    assert_eq!(aligned_len * susize, mem::size_of_val(&*rcbox_ptr));
    Rc { ptr: Shared::new(rcbox_ptr) }
}

#[unstable(feature = "shared_from_slice",
           reason = "TODO",
           issue = "TODO")]
impl<T> From<Vec<T>> for Rc<[T]> {
    /// Constructs a new `Rc<[T]>` from a `Vec<T>`.
    /// The allocated space of the `Vec<T>` is not reused,
    /// but new space is allocated and the old is deallocated.
    /// This happens due to the internal layout of `Rc`.
    ///
    /// # Examples
    ///
    /// ```
    /// #![feature(shared_from_slice)]
    /// use std::rc::Rc;
    ///
    /// let arr = [1, 2, 3];
    /// let vec = vec![Box::new(1), Box::new(2), Box::new(3)];
    /// let rc: Rc<[Box<usize>]> = Rc::from(vec);
    /// assert_eq!(rc.len(), arr.len());
    /// for (x, y) in rc.iter().zip(&arr) {
    ///     assert_eq!(**x, *y);
    /// }
    /// ```
    #[inline]
    fn from(mut vec: Vec<T>) -> Self {
        unsafe {
            let rc = slice_to_rc(vec.as_slice(), |p| p, |dst, src|
                ptr::copy_nonoverlapping(
                    src.as_ptr(), dst.as_mut_ptr(), src.len())
            );
            // Prevent vec from trying to drop the elements:
            vec.set_len(0);
            rc
        }
    }
}

#[unstable(feature = "shared_from_slice",
           reason = "TODO",
           issue = "TODO")]
impl<'a, T: Clone> From<&'a [T]> for Rc<[T]> {
    /// Constructs a new `Rc<[T]>` by cloning all elements from the shared slice
    /// [`&[T]`][slice]. The length of the reference counted slice will be exactly
    /// the given [slice].
    ///
    /// # Examples
    ///
    /// ```
    /// #![feature(shared_from_slice)]
    /// use std::rc::Rc;
    ///
    /// #[derive(PartialEq, Clone, Debug)]
    /// struct Wrap(u8);
    ///
    /// let arr = [Wrap(1), Wrap(2), Wrap(3)];
    /// let rc: Rc<[Wrap]> = Rc::from(arr.as_ref());
    /// assert_eq!(rc.as_ref(), &arr);   // The elements match.
    /// assert_eq!(rc.len(), arr.len()); // The lengths match.
    /// ```
    ///
    /// Using the [`Into`][Into] trait:
    ///
    /// ```
    /// #![feature(shared_from_slice)]
    /// use std::rc::Rc;
    ///
    /// #[derive(PartialEq, Clone, Debug)]
    /// struct Wrap(u8);
    ///
    /// let rc: Rc<[Wrap]> = arr.as_ref().into();
    /// assert_eq!(rc.as_ref(), &arr);   // The elements match.
    /// assert_eq!(rc.len(), arr.len()); // The lengths match.
    /// ```
    ///
    /// [Into]: https://doc.rust-lang.org/std/convert/trait.Into.html
    /// [slice]: https://doc.rust-lang.org/std/primitive.slice.html
    #[inline]
    default fn from(slice: &'a [T]) -> Self {
        unsafe {
            slice_to_rc(slice, |p| p, |dst, src| {
                for (d, s) in dst.iter_mut().zip(src) {
                    ptr::write(d, s.clone())
                }
            })
        }
    }
}

#[unstable(feature = "shared_from_slice",
           reason = "TODO",
           issue = "TODO")]
impl<'a, T: Copy> From<&'a [T]> for Rc<[T]> {
    /// Constructs a new `Rc<[T]>` from a shared slice [`&[T]`][slice].
    /// All elements in the slice are copied and the length is exactly that of
    /// the given [slice]. In this case, `T` must be `Copy`.
    ///
    /// # Examples
    ///
    /// ```
    /// #![feature(shared_from_slice)]
    /// use std::rc::Rc;
    ///
    /// let arr = [1, 2, 3];
    /// let rc  = Rc::from(arr);
    /// assert_eq!(rc.as_ref(), &arr);   // The elements match.
    /// assert_eq!(rc.len(), arr.len()); // The length is the same.
    /// ```
    ///
    /// Using the [`Into`][Into] trait:
    ///
    /// ```
    /// #![feature(shared_from_slice)]
    /// use std::rc::Rc;
    ///
    /// let arr          = [1, 2, 3];
    /// let rc: Rc<[u8]> = arr.as_ref().into();
    /// assert_eq!(rc.as_ref(), &arr);   // The elements match.
    /// assert_eq!(rc.len(), arr.len()); // The length is the same.
    /// ```
    ///
    /// [Into]: ../../std/convert/trait.Into.html
    /// [slice]: ../../std/primitive.slice.html
    #[inline]
    fn from(slice: &'a [T]) -> Self {
        unsafe {
            slice_to_rc(slice, |p| p, <[T]>::copy_from_slice)
        }
    }
}

#[unstable(feature = "shared_from_slice",
           reason = "TODO",
           issue = "TODO")]
impl<'a> From<&'a str> for Rc<str> {
    /// Constructs a new `Rc<str>` from a [string slice].
    /// The underlying bytes are copied from it.
    ///
    /// # Examples
    ///
    /// ```
    /// #![feature(shared_from_slice)]
    /// use std::rc::Rc;
    ///
    /// let slice = "hello world!";
    /// let rc: Rc<str> = Rc::from(slice);
    /// assert_eq!(rc.as_ref(), slice);    // The elements match.
    /// assert_eq!(rc.len(), slice.len()); // The length is the same.
    /// ```
    ///
    /// Using the [`Into`][Into] trait:
    ///
    /// ```
    /// #![feature(shared_from_slice)]
    /// use std::rc::Rc;
    ///
    /// let slice = "hello world!";
    /// let rc: Rc<str> = slice.into();
    /// assert_eq!(rc.as_ref(), slice);    // The elements match.
    /// assert_eq!(rc.len(), slice.len()); // The length is the same.
    /// ```
    ///
    /// This can be useful in doing [string interning], and caching your strings.
    ///
    /// ```
    /// // For Rc::ptr_eq
    /// #![feature(ptr_eq)]
    ///
    /// #![feature(shared_from_slice)]
    /// use std::rc::Rc;
    /// use std::collections::HashSet;
    /// use std::mem::drop;
    ///
    /// fn cache_str(cache: &mut HashSet<Rc<str>>, input: &str) -> Rc<str> {
    ///     // If the input hasn't been cached, do it:
    ///     if !cache.contains(input) {
    ///         cache.insert(input.into());
    ///     }
    ///
    ///     // Retrieve the cached element.
    ///     cache.get(input).unwrap().clone()
    /// }
    ///
    /// let first   = "hello world!";
    /// let second  = "goodbye!";
    /// let mut set = HashSet::new();
    ///
    /// // Cache the slices:
    /// let rc_first  = cache_str(&mut set, first);
    /// let rc_second = cache_str(&mut set, second);
    /// let rc_third  = cache_str(&mut set, second);
    ///
    /// // The contents match:
    /// assert_eq!(rc_first.as_ref(),  first);
    /// assert_eq!(rc_second.as_ref(), second);
    /// assert_eq!(rc_third.as_ref(),  rc_second.as_ref());
    ///
    /// // It was cached:
    /// assert_eq!(set.len(), 2);
    /// drop(set);
    /// assert_eq!(Rc::strong_count(&rc_first),  1);
    /// assert_eq!(Rc::strong_count(&rc_second), 2);
    /// assert_eq!(Rc::strong_count(&rc_third),  2);
    /// assert!(Rc::ptr_eq(&rc_second, &rc_third));
    ///
    /// [string interning]: https://en.wikipedia.org/wiki/String_interning
    fn from(slice: &'a str) -> Self {
        // This is safe since the input was valid utf8 to begin with, and thus
        // the invariants hold.
        unsafe {
            let bytes = slice.as_bytes();
            slice_to_rc(bytes, |p| p as *mut RcBox<str>, <[u8]>::copy_from_slice)
        }
    }
}

#[unstable(feature = "shared_from_slice",
           reason = "TODO",
           issue = "TODO")]
impl<T: ?Sized> From<Box<T>> for Rc<T> {
    /// Constructs a new `Rc<T>` from a `Box<T>` where `T` can be unsized.
    /// The allocated space of the `Box<T>` is not reused,
    /// but new space is allocated and the old is deallocated.
    /// This happens due to the internal layout of `Rc`.
    ///
    /// # Examples
    ///
    /// ```
    /// #![feature(shared_from_slice)]
    /// use std::rc::Rc;
    ///
    /// let arr = [1, 2, 3];
    /// let vec = vec![Box::new(1), Box::new(2), Box::new(3)].into_boxed_slice();
    /// let rc: Rc<[Box<usize>]> = Rc::from(vec);
    /// assert_eq!(rc.len(), arr.len());
    /// for (x, y) in rc.iter().zip(&arr) {
    ///     assert_eq!(**x, *y);
    /// }
    /// ```
    #[inline]
    fn from(boxed: Box<T>) -> Self {
        unsafe {
            // Compute space to allocate + alignment for `RcBox<T>`.
            let sizeb  = mem::size_of_val(&*boxed);
            let alignb = mem::align_of_val(&*boxed);
            let align  = cmp::max(alignb, mem::align_of::<usize>());
            let size   = offset_of_unsafe!(RcBox<T>, value) + sizeb;

            // Allocate the space.
            let alloc  = heap::allocate(size, align);

            // Cast to fat pointer: *mut RcBox<T>.
            let bptr      = Box::into_raw(boxed);
            let rcbox_ptr = {
                let mut tmp = bptr;
                ptr::write(&mut tmp as *mut _ as *mut * mut u8, alloc);
                tmp as *mut RcBox<T>
            };

            // Initialize fields of RcBox<T>.
            (*rcbox_ptr).strong.set(1);
            (*rcbox_ptr).weak.set(1);
            ptr::copy_nonoverlapping(
                bptr as *const u8,
                (&mut (*rcbox_ptr).value) as *mut T as *mut u8,
                sizeb);

            // Deallocate box, we've already forgotten it.
            heap::deallocate(bptr as *mut u8, sizeb, alignb);

            // Yield the Rc:
            assert_eq!(size, mem::size_of_val(&*rcbox_ptr));
            Rc { ptr: Shared::new(rcbox_ptr) }
        }
    }
}

These work on zero sized slices and vectors as well.

With more safe abstractions in the future, this can perhaps be rewritten with less unsafe code. But this should not change the API itself and thus will never cause a breaking change.

For Arc

For the sake of brevity, just use the implementation above, and replace: + slice_to_rc with slice_to_arc, + RcBox with ArcInner, + rcbox_ptr with arcinner_ptr, + Rc with Arc.

How We Teach This

The documentation provided in the impls should be enough.

Drawbacks

The main drawback would be increasing the size of the standard library.

Alternatives

  1. Only implement this for T: Copy and skip T: Clone.
  2. Let other libraries do this. This has the problems explained in the motivation section above regarding RcBox not being publically exposed as well as the amount of feature gates needed to roll ones own Rc alternative - for little gain.
  3. Only implement this for Rc and skip it for Arc.
  4. Skip this for Vec.
  5. Only implement this for Vec.
  6. Skip this for Box.
  7. Use AsRef. For example: impl<'a> From<&'a str> for Rc<str> becomes impl From<AsRef<str>> for Rc<str>. It could potentially make the API a bit more ergonomic to use. However, it could run afoul of coherence issues, preventing other wanted impls. This RFC currently leans towards not using it.
  8. Add these trait implementations of From as functions on &str like .into_rc_str() and on &[T] like .into_rc_slice(). This RFC currently leans towards using From implementations for the sake of uniformity and ergonomics. It also has the added benefit of letting you remember one method name instead of many. One could also consider String::into_boxed_str and Vec::into_boxed_slice, since these are similar with the difference being that this version uses the From trait, and is converted into a shared smart pointer instead.
  9. Also add these APIs as associated functions on Rc and Arc as follows:
impl<T: Clone> Rc<[T]> {
    fn from_slice(slice: &[T]) -> Self;
}

impl Rc<str> {
  fn from_str(slice: &str) -> Self;
}

impl<T: Clone> Arc<[T]> {
    fn from_slice(slice: &[T]) -> Self;
}

impl Arc<str> {
  fn from_str(slice: &str) -> Self;
}

Unresolved questions

impl<T> Rc<[T]> where T: Clone {
    fn make_mut_slice(this: &mut Rc<[T]>) -> &mut [T]
}