RFC 1317: IDE tools (language-server)

Summary

This RFC describes the Rust Language Server (RLS). This is a program designed to service IDEs and other tools. It offers a new access point to compilation and APIs for getting information about a program. The RLS can be thought of as an alternate compiler, but internally will use the existing compiler.

Using the RLS offers very low latency compilation. This allows for an IDE to present information based on compilation to the user as quickly as possible.

Requirements

To be concrete about the requirements for the RLS, it should enable the following actions:

* show compilation errors and warnings, updated as the user types
* code completion as the user types
* highlight all references to an item
* find all references to an item
* jump to definition

These requirements will be covered in more detail in later sections.

History note

This RFC started as a more wide-ranging RFC. Some of the details have been scaled back to allow for more focused and incremental development.

Parts of the RFC dealing with robust compilation have been removed - work here is ongoing and mostly doesn't require an RFC.

The RLS was earlier referred to as the oracle.

Motivation

Modern IDEs are large and complex pieces of software; creating a new one from scratch for Rust would be impractical. Therefore we need to work with existing IDEs (such as Eclipse, IntelliJ, and Visual Studio) to provide functionality. These IDEs provide excellent editor and project management support out of the box, but know nothing about the Rust language. This information must come from the compiler.

An important aspect of IDE support is that response times must be extremely quick. Users expect some feedback as they type. Running normal compilation of an entire project is far too slow. Furthermore, as the user is typing, the program will not be a valid, complete Rust program.

We expect that an IDE may have its own lexer and parser. This is necessary for the IDE to quickly give parse errors as the user types. Editors are free to rely on the compiler's parsing if they prefer (the compiler will do its own parsing in any case). Further information (name resolution, type information, etc.) will be provided by the RLS.

Requirements

We stated some requirements in the summary; here we cover them in more detail and describe the workflow between the IDE and the RLS.

The RLS should be safe to use in the face of concurrent actions. For example, multiple requests for compilation could occur, with later requests arriving before earlier requests have finished. There could be multiple clients making requests to the RLS, some of which may mutate its data. The RLS should provide reliable and consistent responses. However, clients are not expected to be totally isolated: for example, if client 1 updates the program and client 2 then requests information about the program, client 2's response will reflect the changes made by client 1, even if client 2 is not otherwise aware of them.

Show compilation errors and warnings, updated as the user types

The IDE will request compilation of the in-memory program. The RLS will compile the program and asynchronously supply the IDE with errors and warnings.
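
To make this concrete, the data pushed back to the IDE might look roughly like the sketch below; the exact wire format is not specified by this RFC and all names here are illustrative.

```rust
// Illustrative sketch only - the real format is not part of this RFC.
pub enum Severity {
    Error,
    Warning,
}

// A single error or warning pushed asynchronously to the IDE once (part of)
// a compile has finished.
pub struct Diagnostic {
    pub severity: Severity,
    pub message: String,
    pub file: String,
    pub line: u32,   // 1-based row
    pub column: u32, // 1-based column
}
```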

Code completion as the user types

The IDE will request compilation of the in-memory program and request code completion options for the cursor position. The RLS will compile the program. As soon as it has enough information for code completion it will return options to the IDE.
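
As a rough sketch (the names and exact shape are hypothetical, not fixed by this RFC), a completion exchange might look like this:

```rust
// Hypothetical completion request and response types.
pub struct CompletionRequest {
    pub file: String,
    pub line: u32,
    pub column: u32, // cursor position
}

pub struct CompletionOption {
    pub label: String,  // text to insert, e.g. a field or method name
    pub detail: String, // extra information, e.g. a type or signature
}

// The RLS answers with a list of options as soon as it has enough type
// information, possibly before the full compile has finished.
pub type CompletionResponse = Vec<CompletionOption>;
```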

Highlight all references to an item

The IDE requests all references in the same file based on a position in the file. The RLS returns a list of spans.

Find all references to an item

The IDE requests all references based on a position in the file. The RLS returns a list of spans.

Jump to definition

The IDE requests the definition of an item based on a position in a file. The RLS returns a list of spans (a list is necessary since, for example, a dynamically dispatched trait method could be defined in multiple places).
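
All three of these queries answer with a list of spans. A span might look something like the following sketch (the names are hypothetical); note that it carries both byte offsets and row/column positions, which matters for the UTF-16 issue discussed under unresolved questions.

```rust
// Hypothetical span type shared by the reference and definition queries.
pub struct Span {
    pub file: String,
    pub byte_start: u32,
    pub byte_end: u32,
    pub line_start: u32,
    pub col_start: u32,
    pub line_end: u32,
    pub col_end: u32,
}

// "Highlight all references", "find all references", and "jump to
// definition" each return a list of spans.
pub type Spans = Vec<Span>;
```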

Detailed design

Architecture

The basic requirements for the architecture of the RLS are that it should be:

The RLS will be a long-running daemon process. Communication between the RLS and an IDE will be via IPC calls (tools such as Racer will also be able to use the RLS as an in-process library). The RLS will include the compiler as a library.

The RLS has three main components - the compiler, a database, and a work queue.

The RLS accepts two kinds of requests - compilation requests and queries. It will also push data to registered programs (generally triggered by compilation completing). Essentially, all communication with the RLS is asynchronous (when used as an in-process library, the client will be able to use synchronous function calls too).

The work queue is used to sequentialise requests and ensure consistency of responses. Both compilation requests and queries are stored in the queue. Some compilation requests can cause earlier compilation requests to be canceled. Queries blocked on the earlier compilation then become blocked on the new request.
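
A minimal sketch of how the queue might handle superseded compile requests; the types and method names are made up for illustration and are not part of this RFC.

```rust
use std::collections::VecDeque;

// Hypothetical request kinds held in the work queue.
enum Request {
    Compile { crate_name: String, edits: Vec<String> },
    Query { id: u64 },
}

struct WorkQueue {
    queue: VecDeque<Request>,
}

impl WorkQueue {
    // If a compile of the same crate is still queued, the new request cancels
    // and replaces it in place, so queries queued behind the old compile are
    // now blocked on the new one. Otherwise the compile goes to the back.
    fn push_compile(&mut self, crate_name: String, edits: Vec<String>) {
        for r in self.queue.iter_mut() {
            if let Request::Compile { crate_name: queued, .. } = r {
                if *queued == crate_name {
                    *r = Request::Compile { crate_name, edits };
                    return;
                }
            }
        }
        self.queue.push_back(Request::Compile { crate_name, edits });
    }

    fn push_query(&mut self, id: u64) {
        self.queue.push_back(Request::Query { id });
    }
}
```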

In the future, we should move queries ahead of compilation requests where possible.

When compilation completes, the database is updated (see below for more details). All queries are answered from the database. The database has data for the whole project, not just one crate. This also means we don't need to keep the compiler's data in memory.

Compilation

The RLS is somewhat parametric in its compilation model. Theoretically, it could run a full compile on the requested crate, however this would be too slow in practice.

The general procedure is that the IDE (or other client) requests that the RLS compile a crate. It is up to the IDE to interact with Cargo (or some other build system) in order to produce the correct build command and to ensure that any dependencies are built.

Initially, the RLS will do a standard incremental compile on the specified crate. See RFC PR 1298 for more details on incremental compilation.

The crate being compiled should include any modifications made in the client and not yet committed to a file (e.g., changes the IDE has in memory). The client should pass such changes to the RLS along with the compilation request.
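
For illustration only (none of these names are fixed by this RFC), a compile request might carry something like:

```rust
// Hypothetical compile request sent by the IDE to the RLS. The IDE is
// responsible for asking Cargo (or another build system) for the build
// command and for making sure dependencies are already built.
pub struct CompileRequest {
    // The build command the IDE obtained from the build system.
    pub build_command: Vec<String>,
    // In-memory changes not yet saved to disk: (file path, current contents).
    pub unsaved_files: Vec<(String, String)>,
}
```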

I see two ways to improve compilation times: lazy compilation and keeping the compiler in memory. We might also experiment with having the IDE specify which parts of the program have changed, rather than having the compiler compute this.

Lazy compilation

With lazy compilation the IDE requests that a specific item is compiled, rather than the whole program. The compiler compiles this item, compiling other items only as necessary to compile the requested item.

Lazy compilation should also be incremental - an item is only compiled if required and if it has changed.

Obviously, we could miss some errors with pure lazy compilation. To address this the RLS schedules both a lazy and a full (but still incremental) compilation. The advantage of this approach is that many queries scheduled after compilation can be performed after the lazy compilation, but before the full compilation.

Keeping the compiler in memory

There are still overheads with the incremental compilation approach. We must start up the compiler and initialise its data structures, we must parse the whole crate, and we must read the incremental compilation data and metadata from disk.

If we can keep the compiler in memory, we avoid these costs.

However, this would require some significant refactoring of the compiler. There is currently no way to invalidate data the compiler has already computed. It also becomes difficult to cancel compilation: if we receive two compile requests in rapid succession, we may wish to cancel the first compilation before it finishes, since it will be wasted work. This is currently easy - the compilation process is killed and all data released. However, if we want to keep the compiler in memory we must invalidate some data and ensure the compiler is in a consistent state.

Compilation output

Once compilation is finished, the RLS's database must be updated. Errors and warnings produced by the compiler are stored in the database. Information from name resolution and type checking is stored in the database (exactly which information will grow with time). The analysis information will be provided by the save-analysis API.

The compiler will also provide data on which (old) code has been invalidated. Any information (including errors) in the database concerning this code is removed before the new data is inserted.
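
A minimal sketch of that update step, with made-up types and method names standing in for the real database and save-analysis output:

```rust
// Illustrative placeholders - the real save-analysis data is much richer.
struct Diagnostic;
struct AnalysisData;
struct Database;

impl Database {
    fn remove_data_for(&mut self, _invalidated: &[String]) {
        // Delete errors, spans and analysis information recorded for items
        // that this compile has invalidated.
    }

    fn insert(&mut self, _errors: Vec<Diagnostic>, _analysis: AnalysisData) {
        // Store the new errors/warnings plus the name resolution and type
        // information produced via the save-analysis API.
    }
}

// Old data for invalidated code is removed before the new data is inserted.
fn update_after_compile(
    db: &mut Database,
    invalidated: &[String],
    errors: Vec<Diagnostic>,
    analysis: AnalysisData,
) {
    db.remove_data_for(invalidated);
    db.insert(errors, analysis);
}
```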

Multiple crates

The RLS does not track dependencies, nor much crate information. However, it will be asked to compile many crates and it will keep track of which crate each piece of data belongs to. It will also keep track of which crates belong to a single program and will not share data between programs, even if they have a crate in common. This helps avoid versioning issues.
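
One way to picture this (purely a sketch, not a committed design) is keying all stored data by program as well as by crate:

```rust
use std::collections::HashMap;

// Hypothetical keying scheme: data is stored per (program, crate), so two
// programs that happen to depend on the same crate never share entries,
// which avoids problems when they use different versions of that crate.
type ProgramId = u32;
type CrateName = String;

struct CrateData; // placeholder for the analysis data for one crate

struct Store {
    data: HashMap<(ProgramId, CrateName), CrateData>,
}
```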

Versioning

The RLS will be released using the same train model as Rust. A version of the RLS is pinned to a specific version of Rust. If users want to operate with multiple versions, they will need multiple versions of the RLS (I hope we can extend multirust/rustup.rs to handle the RLS as well as Rust).

Drawbacks

It's a lot of work. But it is better that we do it once than that each IDE does it itself, or that Rust ends up with sub-standard IDE support.

Alternatives

The big design choice here is using a database rather than the compiler's data structures. The primary motivation for this is the 'find all references' requirement. References could be in multiple crates, so we would need to reload incremental compilation data (which must include the serialised MIR, or something equivalent) for all crates, then search this data for matching identifiers. Assuming the serialisation format is not too complex, this should be possible in a reasonable amount of time. Since identifiers might be in function bodies, we can't rely on metadata.

This is a reasonable alternative, and may be simpler than the database approach. However, it is not planned to output this data in the near future (the initial plan for incremental compilation is to not store the information required to re-check function bodies). Furthermore, this approach might be too slow for very large projects; we might wish to do searches in the future that cannot be answered without doing the equivalent of a database join; and the database simplifies questions about concurrent accesses.

We could provide the RLS only as a library, rather than providing an API via IPC. An IPC interface allows a single instance of the RLS to service multiple programs, is language-agnostic, and allows for easy asynchronicity between the RLS and its clients. It also provides isolation - a panic in the RLS will not cause the IDE to crash, nor can a long-running operation delay the IDE. Most of these advantages could be captured using threads. However, the cost of implementing an IPC interface is fairly low and means less effort for clients, so it seems worthwhile to provide.

Extending this idea, we could do less than the RLS - provide a high-level library API for the Rust compiler and let other projects do the rest. In particular, Racer does an excellent job at providing the information the RLS would provide without much information from the compiler. This is certainly less work for the compiler team and more flexible for clients. On the other hand, it means more work for clients and possible fragmentation. Duplicated effort means that different clients will not benefit from each other's innovations.

The RLS could do more - actually perform some of the processing tasks usually done by IDEs (such as editing source code) or other tools (refactoring, reformatting, etc.).

Unresolved questions

A problem is that Visual Studio uses UTF-16 while Rust uses UTF-8, and (as I understand it) there is no efficient way to convert byte counts between these two encodings. I'm not sure how to address this. It might require the RLS to be able to operate in a UTF-16 mode. This is only a problem for the byte offsets in spans, not for the row/column data (the RLS will supply both). It may be possible for Visual Studio to just use the row/column data, or to convert inefficiently to UTF-16. I guess the question comes down to whether this conversion should be done in the RLS or the client. I think we should start by assuming the client, and perhaps adjust course later.
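
For reference, the conversion the client (or an RLS UTF-16 mode) would need is roughly the following; it has to walk the line character by character, which is why there is no cheap way to translate byte offsets directly. This is only a sketch of the idea.

```rust
// Convert a UTF-8 byte offset within a line into a UTF-16 code unit offset.
// This is O(length of line), which is what makes per-span conversion costly.
fn utf8_offset_to_utf16(line: &str, byte_offset: usize) -> usize {
    line[..byte_offset].chars().map(|c| c.len_utf16()).sum()
}

fn main() {
    // 'é' is 2 bytes in UTF-8 but only 1 code unit in UTF-16.
    let line = "é = 1;";
    assert_eq!(utf8_offset_to_utf16(line, 2), 1);
}
```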

What kind of IPC protocol should we use? HTTP is popular and simple to deal with. It's platform-independent and used in many similar pieces of software. On the other hand it is heavyweight, requires pulling in large libraries, and requires some attention to security issues. Alternatives are some kind of custom protocol, or using a solution like Thrift. My preference is for HTTP, since it has been proven in similar situations.