RFC 0544: Rename int/uint to isize/usize

libs (primitive)

Summary

This RFC proposes that we rename the pointer-sized integer types int/uint, so as to avoid misconceptions and misuses. After extensive community discussion and several revisions of this RFC, the names finally chosen are isize/usize.

Motivation

Currently, Rust defines two machine-dependent integer types int/uint that have the same number of bits as the target platform's pointer type. These two types are used for many purposes: indices, counts, sizes, offsets, etc.
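The pointer-width property can be checked directly. The sketch below is illustrative only: it uses the isize/usize names this RFC settles on, because int/uint no longer exist in current Rust, and the helper name pointer_sized_int_width is hypothetical.

```rust
use std::mem::size_of;

/// Hypothetical helper: width in bytes of the pointer-sized unsigned integer.
fn pointer_sized_int_width() -> usize {
    size_of::<usize>()
}

fn main() {
    // Both pointer-sized integer types match the width of a thin raw pointer.
    assert_eq!(size_of::<usize>(), size_of::<*const u8>());
    assert_eq!(size_of::<isize>(), size_of::<*const u8>());
    println!(
        "pointer-sized integers are {} bytes wide on this target",
        pointer_sized_int_width()
    );
}
```

The printed width depends on the target platform, which is exactly why these types make poor general-purpose defaults.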

The problem is, int/uint look like default integer types, but pointer-sized integers are not good defaults, and it is desirable to discourage people from overusing them.

A popular opinion is that the best way to discourage their use is to rename them.

The most recent renaming attempt, RFC PR 464, was rejected. (Some parts of this RFC are based on that RFC.) A tale of two's complement states the following reasons:

However:

Rust was and is undergoing quite a lot of breaking changes. Even though the int/uint renaming will "break the world", it is not unheard of, and it is mainly a "search & replace". Also, a transition period can be provided, during which int/uint can be deprecated, while the new names can take time to replace them. So "to avoid breaking the world" shouldn't stop the renaming.

int/uint have a long tradition of being the default integer type names, so programmers, even experienced ones, will be tempted to use them in Rust no matter what the documentation says. The semantics of int/uint in Rust are quite different from those in many other mainstream languages. Worse, the Swift programming language, which is heavily influenced by Rust, has the types Int/UInt with almost the same semantics as Rust's int/uint, but it actively encourages programmers to use Int as much as possible. From the Swift Programming Language:

Swift provides an additional integer type, Int, which has the same size as the current platform’s native word size: ...

Swift also provides an unsigned integer type, UInt, which has the same size as the current platform’s native word size: ...

Unless you need to work with a specific size of integer, always use Int for integer values in your code. This aids code consistency and interoperability.

Use UInt only when you specifically need an unsigned integer type with the same size as the platform’s native word size. If this is not the case, Int is preferred, even when the values to be stored are known to be non-negative.

Thus, it is very likely that newcomers will come to Rust, expecting int/uint to be the preferred integer types, even if they know that they are pointer-sized.

Not renaming int/uint violates the principle of least surprise, and is not newcomer friendly.

Before the rejection of RFC PR 464, the community largely settled on two pairs of candidates: imem/umem and iptr/uptr. As stated in previous discussions, the names have some drawbacks that may be unbearable. (Please refer to A tale of two's complement and related discussions for details.)

This RFC originally proposed a new pair of alternatives intx/uintx.

However, given the discussions about the previous revisions of this RFC, and the discussions in Restarting the int/uint Discussion, the author of this RFC (@CloudiDust) now believes that intx/uintx are not ideal. Instead, one of the other pairs of alternatives should be chosen. The names finally chosen are isize/usize.

Detailed Design

usize in action:

fn slice_or_fail<'b>(&'b self, from: &usize, to: &usize) -> &'b [T]
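As a rough, runnable illustration of the signature above, the sketch below turns it into a free function over a slice. The free-function form and the data parameter are assumptions made for the example; the RFC itself only shows the method signature.

```rust
/// Hypothetical free-function version of the signature above: returns the
/// sub-slice `[from, to)` and panics if the range is invalid.
fn slice_or_fail<'b, T>(data: &'b [T], from: &usize, to: &usize) -> &'b [T] {
    // Range indexing with `usize` bounds panics on out-of-bounds or inverted ranges.
    &data[*from..*to]
}

fn main() {
    let values = [10, 20, 30, 40, 50];

    // Indices are plain `usize` values.
    let middle = slice_or_fail(&values, &1, &4);
    assert_eq!(middle, [20, 30, 40]);

    // `len()` also returns `usize`, so sizes and indices share one type.
    let len: usize = values.len();
    assert_eq!(slice_or_fail(&values, &0, &len), &values[..]);
}
```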

There are different opinions about which literal suffixes to use. The following section discusses the alternatives.

Choosing literal suffixes:

isize/usize:

is/us:

Note: No matter which suffixes get chosen, it can be beneficial to reserve is as a keyword, but this is outside the scope of this RFC.

iz/uz:

i/u:

isz/usz:

After community discussions, it is deemed that using isize/usize directly as suffixes is a fine choice and there is no need to introduce other suffixes.
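A minimal sketch of the chosen suffixes in use follows. The function names total_len and offset_index are hypothetical and exist only to give the literals some context.

```rust
/// Hypothetical helper: sums lengths; the `0usize` literal fixes the accumulator type.
fn total_len(lens: &[usize]) -> usize {
    lens.iter().fold(0usize, |acc, n| acc + n)
}

/// Hypothetical helper: applies a signed offset to an index,
/// returning `None` if the result would underflow or overflow.
fn offset_index(index: usize, delta: isize) -> Option<usize> {
    index.checked_add_signed(delta)
}

fn main() {
    // The type names double as literal suffixes.
    let index = 3usize;
    let back_two = -2isize;

    assert_eq!(offset_index(index, back_two), Some(1usize));
    assert_eq!(offset_index(0usize, -1isize), None);
    assert_eq!(total_len(&[1usize, 2, 3]), 6usize);
}
```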

Advantages of isize/usize:

See Alternatives B to L for the alternatives to isize/usize that have been rejected.

Drawbacks

Drawbacks of the renaming in general:

Drawbacks of isize/usize:

Familiarity is a double-edged sword here. isize/usize are chosen not because they are perfect, but because they represent a good compromise between semantic accuracy, familiarity and code readability. Given good documentation, the drawbacks listed here may not matter much in practice, and the combined familiarity and readability advantage outweighs them all.

Alternatives

A. Keep the status quo:

This may hurt in the long run, especially when at least one (would-be?) high-profile, Rust-inspired language takes the opposite stance to Rust.

The following alternatives make different trade-offs, and choosing one would be quite a subjective matter. But they are all better than the status quo.

B. iptr/uptr:

In the following snippet:

fn slice_or_fail<'b>(&'b self, from: &uptr, to: &uptr) -> &'b [T]

It feels like working with pointers, not integers.

C. imem/umem:

When originally proposed, mem/m are interpreted as "memory numbers" (See @1fish2's comment in RFC PR 464):

imem/umem are "memory numbers." They're good for indexes, counts, offsets, sizes, etc. As memory numbers, it makes sense that they're sized by the address space.

However, this interpretation seems vague and not quite convincing, especially when all other integer types in Rust are named precisely in the "i/u + {size}" pattern, with no "indirection" involved. What is "memory-sized" anyway? But actually, they can be interpreted as _mem_ory-pointer-sized, and be a precise size specifier just like ptr.

Also, for some, imem/umem just don't feel like integers no matter how they are interpreted, especially under certain circumstances. In the following snippet:

fn slice_or_fail<'b>(&'b self, from: &umem, to: &umem) -> &'b [T]

umem still feels like a pointer-like construct here (from "some memory" to "some other memory"), even though it doesn't have ptr in its name.

D. intp/uintp and intm/uintm:

Variants of Alternatives B and C. Instead of stressing the ptr or mem part, they stress the int or uint part.

They are more integer-like than iptr/uptr or imem/umem if one knows where to split the words.

The problem here is that they don't strictly follow the i/u + {size} pattern, are of different lengths, and the more frequently used type uintp(uintm) has a longer name. Granted, this problem already exists with int/uint, but those two are names that everyone is familiar with.

So they may not be as pretty as iptr/uptr or imem/umem.

fn slice_or_fail<'b>(&'b self, from: &uintm, to: &uintm) -> &'b [T]
fn slice_or_fail<'b>(&'b self, from: &uintp, to: &uintp) -> &'b [T]

E. intx/uintx:

The original proposed names of this RFC, where x means "unknown/variable/platform-dependent".

They share the same problems with intp/uintp and intm/uintm, while in addition failing to be specific enough. There are other kinds of platform-dependent integer types after all (like register-sized ones), so which ones are intx/uintx?

F. idiff/usize:

There is a problem with isize: it most likely will remind people of C/C++ ssize_t. But ssize_t is in the POSIX standard, not the C/C++ ones, and is not for index offsets according to POSIX. The correct type for index offsets in C99 is ptrdiff_t, so for a type representing offsets, idiff may be a better name.

However, isize/usize have the advantage of being symmetrical, and ultimately, even with a name like idiff, some semantic mismatch between idiff and ptrdiff_t would still exist. Also, for holding a cast pointer value, a type named isize is better than one named idiff. (Though both would lose to iptr.)

G. iptr/uptr and idiff/usize:

Rename int/uint to iptr/uptr, with idiff/usize being aliases and used in container method signatures.

This addresses the "not enough use cases covered" problem. It looks like the best of both worlds at first glance.

iptr/uptr will be used for storing cast pointer values, while idiff/usize will be used for offsets and sizes/indices, respectively.

iptr/uptr and idiff/usize may even be treated as different types to prevent people from accidentally mixing their usage.

This will bring the Rust type names quite in line with the standard C99 type names, which may be a plus from the familiarity point of view.

However, this setup brings two sets of types that share the same underlying representations. C distinguishes between size_t/uintptr_t/intptr_t/ptrdiff_t not only because they are used under different circumstances, but also because the four may have representations that are potentially different from each other on some architectures. Rust assumes a flat memory address space and its int/uint types don't exactly share semantics with any of the C types if the C standard is strictly followed.

Thus, even introducing four names would not fix the "failing to express the precise semantics of the types" problem. Rust just doesn't need to, and shouldn't, distinguish between iptr/idiff and uptr/usize; doing so would bring much confusion for very questionable gain.

H. isiz/usiz:

A pair of variants of isize/usize. This author believes that the missing e may be enough to warn people that these are not ssize_t/size_t with "Rustfied" names. But at the same time, isiz/usiz mostly retain the familiarity of isize/usize.

However, isiz/usiz still hide the actual semantics of the types, and omitting but a single letter from a word does feel too hack-ish.

fn slice_or_fail<'b>(&'b self, from: &usiz, to: &usiz) -> &'b [T]

I. iptr_size/uptr_size:

The names are very clear about the semantics, but are also irregular, too long and feel out of place.

fn slice_or_fail<'b>(&'b self, from: &uptr_size, to: &uptr_size) -> &'b [T]

J. iptrsz/uptrsz:

Clear semantics, but still a bit too long (though better than iptr_size/uptr_size), and the ptr parts are still a bit concerning (though to a much lesser extent than iptr/uptr). On the other hand, being "a bit too long" may not be a disadvantage here.

fn slice_or_fail<'b>(&'b self, from: &uptrsz, to: &uptrsz) -> &'b [T]

K. ipsz/upsz:

Now (and only now, which is the problem) it is clear where this pair of alternatives comes from.

By shortening ptr to p, ipsz/upsz no longer stress the "pointer" parts in any way. Instead, the sz or "size" parts are (comparatively) stressed. Interestingly, ipsz/upsz look similar to isiz/usiz.

So this pair of names actually reflects both the precise semantics of "pointer-sized integers" and the fact that they are commonly used for "sizes". However,

fn slice_or_fail<'b>(&'b self, from: &upsz, to: &upsz) -> &'b [T]

ipsz/upsz have gone too far. They are completely incomprehensible without the documentation. Many rightfully do not like letter soup. The only advantage here is that no one is likely to think they are dealing with pointers. iptrsz/uptrsz are better in the comprehensibility aspect.

L. Others:

There are other alternatives not covered in this RFC. Please refer to this RFC's comments and RFC PR 464 for more.

Unresolved questions

None. Necessary decisions about Rust's general integer type policies have been made in Restarting the int/uint Discussion.

History

Amended by RFC 573 to change the suffixes from is and us to isize and usize. The tracking issue for this amendment is rust-lang/rust#22496.