lang (ffi | dst)
Make CString
dereference to a token type CStr
, which designates
null-terminated string data.
// Type-checked to only accept C strings
fn safe_puts(s: &CStr) {
unsafe { libc::puts(s.as_ptr()) };
}
fn main() {
let s = CString::from_slice("A Rust string");
safe_puts(s);
}
The type std::ffi::CString
is used to prepare string data for passing
as null-terminated strings to FFI functions. This type dereferences to a
DST, [libc::c_char]
. The slice type as it is, however, is a poor choice
for representing borrowed C string data, since:
CString
internally (in this case plain Rust slices
must be passed with no interior NULs).CString
buffer is not the only desired source for
borrowed C string data. Specifically, it should be possible to interpret
a raw pointer, unsafely and at zero overhead, as a reference to a
null-terminated string, so that the reference can then be used safely.
However, in order to construct a slice (or a dynamically sized newtype
wrapping a slice), its length has to be determined, which is unnecessary
for the consuming FFI function that will only receive a thin pointer.
Another likely data source are string and byte string literals: provided
that a static string is null-terminated, there should be a way to pass it
to FFI functions without an intermediate allocation in CString
.As a pattern of owned/borrowed type pairs has been established
thoughout other modules (see e.g.
path reform),
it makes sense that CString
gets its own borrowed counterpart.
This proposal introduces CStr
, a type to designate a null-terminated
string. This type does not implement Sized
, Copy
, or Clone
.
References to CStr
are only safely obtained by dereferencing CString
and a few other helper methods, described below. A CStr
value should provide
no size information, as there is intent to turn CStr
into an
unsized type,
pending resolution on that proposal.
As current Rust does not have unsized types that are not DSTs, at this stage
CStr
is defined as a newtype over a character slice:
#[repr(C)]
pub struct CStr {
chars: [libc::c_char]
}
impl CStr {
pub fn as_ptr(&self) -> *const libc::c_char {
self.chars.as_ptr()
}
}
CString
is changed to dereference to CStr
:
impl Deref for CString {
type Target = CStr;
fn deref(&self) -> &CStr { ... }
}
In implementation, the CStr
value needs a length for the internal slice.
This RFC provides no guarantees that the length will be equal to the length
of the string, or be any particular value suitable for safe use.
If unsized types are enabled later one way of another, the definition
of CStr
would change to an unsized type with statically sized contents.
The authors of this RFC believe this would constitute no breakage to code
using CStr
safely. With a view towards this future change, it's recommended
to avoid any unsafe code depending on the internal representation of CStr
.
In cases when an FFI function returns a pointer to a non-owned C string,
it might be preferable to wrap the returned string safely as a 'thin'
&CStr
rather than scan it into a slice up front. To facilitate this,
conversion from a raw pointer should be added (with an inferred lifetime
as per the established convention):
impl CStr {
pub unsafe fn from_ptr<'a>(ptr: *const libc::c_char) -> &'a CStr {
...
}
}
For getting a slice out of a CStr
reference, method to_bytes
is
provided. The name is preferred over as_bytes
to reflect the linear cost
of calculating the length.
impl CStr {
pub fn to_bytes(&self) -> &[u8] { ... }
pub fn to_bytes_with_nul(&self) -> &[u8] { ... }
}
An odd consequence is that it is valid, if wasteful, to call to_bytes
on
a CString
via auto-dereferencing.
The functions c_str_to_bytes
and c_str_to_bytes_with_nul
, with their
problematic lifetime semantics, are deprecated and eventually removed
in favor of composition of the functions described above:
c_str_to_bytes(&ptr)
becomes CStr::from_ptr(ptr).to_bytes()
.
The described interface changes are implemented in crate c_string.
The change of the deref target type is another breaking change to CString
.
In practice the main purpose of borrowing from CString
is to obtain a
raw pointer with .as_ptr()
; for code which only does this and does not
expose the slice in type annotations, parameter signatures and so on,
the change should not be breaking since CStr
also provides
this method.
Making the deref target unsized throws away the length information
intrinsic to CString
and makes it less useful as a container for bytes.
This is countered by the fact that there are general purpose byte containers
in the core libraries, whereas CString
addresses the specific need to
convey string data from Rust to C-style APIs.
If the proposed enhancements or other equivalent facilities are not adopted, users of Rust can turn to third-party libraries for better convenience and safety when working with C strings. This may result in proliferation of incompatible helper types in public APIs until a dominant de-facto solution is established.
Need a Cow
?