Introduction

Interoperation with other languages is much easier in Rust than in most languages. However, it is often still the most difficult part of adopting Rust (especially if you need to interoperate with C++ rather than just C). Cross-language interop is often generically termed FFI (which literally means foreign function interface, but is used to mean all aspects of interop).

The good news is that interop is extremely low cost (interop with C is cheap and interop with other languages is no more expensive than interop with C) and the fundamentals (ABI compatibility of many datatypes and functions, extern declarations, etc.) are built in to Rust. In many cases, there is no need for data marshalling or serialization, or adaptation for calling conventions, etc. Calling a Rust function from C or vice versa is no more expensive than calling a C function in a different library (and since LTO works across the language boundary, it can even be as cheap as a call within the same file).

Generally speaking, Rust is ABI-compatible with C. That means that Rust can interoperate with any language which can interoperate with C (though FFI with languages other than C is likely to be more complex and to have some runtime overhead). There is community support for interop with C++, Ruby, JavaScript, and Python. Interop with .NET and Java is supported via P/Invoke and JNI respectively.

There are three challenges with Rust and foreign language interop: building the project, ensuring mechanical compatibility between types and functions, and ensuring that Rust's safety invariants are upheld. All FFI calls are unsafe in Rust. To ensure correctness, invariants around ownership and uniqueness, thread safety, and panics must be ensured at the FFI boundary.

Architecture

The more well-defined the boundary between Rust and foreign code, the easier things will be. At the limit, if your Rust and foreign code can live in different processes (i.e., are different programs compiled separately) and can communicate via some form of IPC then you won't have to worry about a lot of the issues with interop at all! Rust has great support for serialization/deserialization, gRPC, and other IPC/RPC technologies which can facilitate this.

If you need Rust and foreign code in the same process, then they should be used in separate 'components' of your design. Do not attempt to have Rust and foreign code interoperate in a fine-grained way within a single component. If you are migrating from another language to Rust, plan the migration on a per-component rather than a per-file basis. It is worth putting some up-front effort into designing the API of these components and the language boundary. As well as the usual API design issues, making the API coarse-grained (i.e., avoiding many calls), using simple datatypes (the more C-like, the better) with simple invariants, and avoiding bidirectional interaction will make FFI issues simpler.

Using a generic FFI option, such as COM/WinRT, is a good option if components can be separate to this extent. You will still have to consider safety issues, but the mechanical issues of corresponding types, etc., are much easier. The windows-rs crate offers good support for COM and WinRT.

In terms of dependencies, Rust code can be either upstream (e.g., R -> C) or downstream (C -> R) of foreign code (it is possible to have many layers of dependencies, e.g., C -> R -> C but each dependency can be considered separately). It is possible to have Rust code embedded in a foreign library and thus have a bidirectional dependency, however, you should avoid this! It is difficult to manage and build the code, and makes interop error-prone.

In other words, you can think of interoperating code as either exposing a Rust API to C code (or other languages which interoperate with a C ABI), or as exposing a C API to Rust code. The former is usually encountered when writing a Rust component which can be used from other languages, the latter when new Rust code must interoperate with legacy code.

Using Rust code from C

When designing a Rust library to be used from other languages, the design depends on whether the library is only designed to be used from other languages or if it is meant to be used from Rust code too (and in this case, whether the usage from Rust or from other languages is primary). If the Rust code will only be used from other languages, then design a crate with no public items other than extern ones which are C-compatible. If the Rust code must be used from both Rust and other languages, then it is usually better to have a pure Rust crate and a second wrapper crate which provides the C API. If the primary consumer will be Rust code, then design the Rust crate to have a Rust-idiomatic API; the wrapper crate may need to do considerable work to project a C API. If the primary consumer will be other languages, then design the API of the Rust crate to be C-idiomatic (but expressed in Rust), and the wrapper crate can be a thin wrapper (perhaps entirely auto-generated by cbindgen).

Using C code from Rust

When wrapping a foreign library for use in Rust, consider writing a first layer in C (especially if the legacy code is C++) with an API better suited for interacting with Rust. Then have a crate which is only bindings of C code into Rust (either hand-written or auto-generated). The next layer is a crate which only has the functionality of the foreign library (i.e., no client logic), but presented in a Rust-idiomatic way. The bindings crate will be all unsafe, the idiomatic crate should aim to have a 100% safe API. Clients should only use the idiomatic crate and never use the bindings crate (some advanced usages may require using the bindings in unanticipated ways, however these clients should create safe abstractions of their own rather than use the bindings directly). If following this pattern, it is common to give the idiomatic Rust crate the same name as the foreign library, and the bindings library the same name with the -sys suffix, e.g., foo and foo-sys. (On the topic of naming, it is idiomatic to always avoid using an -rs suffix on any Rust crate: it is nearly always obvious from context that the crate is a Rust library, so -rs usually adds nothing).

------------------------
     C/C++ library          libfoo
------------------------
       C wrapper            libfoo-ffi
------------------------
 Rust bindings (unsafe)     foo-sys
------------------------
Rust wrapper (idiomatic)    foo
------------------------
      Rust users
------------------------

Building

If you have a mostly Rust project with some foreign libraries, you should use Cargo. If you have a project with only a small amount of Rust, then you will probably want to use the existing build system and will need to find a way to integrate the Rust build into it. Integrating Cargo and rustc with other build systems is a big topic, and this section will only be a brief summary.

To build foreign libraries inside a Cargo project, the usual approach is to orchestrate the foreign builds from build.rs. The cc crate is often used to build C/C++/ASM from a build script.

To build Rust code from a different build system you have several options, depending on your project's constraints. The simplest approach is to have the build system just call cargo build, however this means the build system treats the whole Rust build as a black box, that Cargo will need network access (or you can vendor the crates, see below), and if you have multiple Rust crates they will not share dependencies (unless they can all be built with a single Cargo invocation).

Another approach is to use cargo vendor to compute and download dependencies and keep these checked-in to version control ('vendored'). Building the Rust sub-project can be handled by the build system which will call rustc directly.

There are also more sophisticated, part-automated approaches available for some build systems. E.g., cargo-raze for Bazel or reindeer for Buck.

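You will probably want to build a cdylib (or a staticlib, if you link statically) rather than the default rlib or a dylib. A cdylib is intended to be loaded by foreign code: it exports only the symbols you have explicitly exposed with a C-compatible ABI, whereas rlib and dylib are meant to be consumed by other Rust code and depend on Rust's unstable ABI.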

It is common to end up with multiple disjoint components in each language within a single project. You probably don't want to 'split' the project by language (e.g., having a single Cargo project for all Rust code or having a high-level 'rust' directory). It is usually better to have independent builds for each component (i.e., one Cargo project for each Rust component and separate sub-projects for non-Rust components), and the main library/application build system composes the output of each sub-project.

As well as promoting more componentized design, this has practical benefits for Cargo feature propagation, dependency versioning, etc. However, it might make builds slower because there is less sharing of artefacts.

Bindings and types

Bindings between Rust and foreign functions can either be hand-written or auto-generated. To generate bindings for C/C++ functions in Rust, use bindgen. To generate bindings for Rust functions in C, use cbindgen. These tools can either be called from build.rs to create bindings on the fly, or used from the command line to generate bindings which can be adjusted and checked in to version control (the latter being a good compromise between generated and hand-written bindings).

Choosing hand-written or generated bindings is a trade-off. Automatically generated bindings are less work, stay up to date if the foreign code changes, and are more likely to be bug-free. Getting all the types right in bindings is sometimes subtle and tricky, and is not checked at compile time. Furthermore, some bindings can be target-dependent, so any approach which does not generate bindings with knowledge of the target platform has an increased likelihood of bugs.

On the other hand, hand-written bindings can sometimes be higher quality since the programmer has more knowledge of how the code is used, and binding-generating tools have limitations including around modularity (bindgen does not expect to run multiple times in a single project and therefore types which are logically the same will have multiple definitions which can lead to incompatibility).

We recommend using auto-generated bindings where possible. In particular, wrapping generated bindings with idiomatic Rust code is less fragile in the face of change or consistency issues than trying to write better bindings by hand.

Another approach, if you really need custom bindings but have a significant amount of code (or target-dependence), is to write your own bindings generator, either from scratch or by forking bindgen. This is more reasonable if you have some source of truth for the generated bindings other than C headers.

Whether bindings are hand-written or auto-generated, they must follow the same rules and idioms.

To call a foreign function from Rust, it must be redeclared in a Rust module inside an extern block, e.g.:

#[link(name = "some_c_library")]
extern "C" {
    // Declared here, defined in the foreign library.
    fn callable_from_rust();
}
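As a concrete illustration of the same mechanism, here is a minimal sketch which declares abs from the C standard library and wraps it in a safe function. It assumes the C library is already linked (as it is for ordinary programs using std), so no #[link] attribute is needed; the wrapper name c_abs is made up for this example. The block is written as unsafe extern, a spelling which is accepted from Rust 1.82 and required in the 2024 edition.

```rust
use std::os::raw::c_int;

// Redeclaration of C's `int abs(int)`. The compiler cannot check this
// signature against the C header, so it must be kept in sync by hand.
unsafe extern "C" {
    fn abs(input: c_int) -> c_int;
}

/// Safe wrapper (hypothetical name). `abs(INT_MIN)` is undefined in C,
/// so that one input is rejected before crossing the boundary.
pub fn c_abs(input: c_int) -> Option<c_int> {
    if input == c_int::MIN {
        return None;
    }
    // SAFETY: the declaration matches the C prototype and the input is
    // not INT_MIN, so the call has defined behaviour.
    Some(unsafe { abs(input) })
}
```

Callers only ever see c_abs; the unsafe call and its justification stay in one place.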

To expose a Rust function to C code, declare it using the extern keyword in its signature. Any extern function should use the #[no_mangle] attribute to prevent name mangling, e.g.:

#[no_mangle]
pub extern "C" fn callable_from_c() {
    // ...
}

For primitive types (e.g., long, double) in these bindings, it is recommended to use the type aliases in the libc crate which match with C types. Libc also provides Rust versions of non-primitive types used in the C standard library; windows-rs provides similar Windows-specific types.

Rust integers, floats, and booleans correspond with C equivalents and no conversion is required (see the aliases in libc for the correspondence between Rust and C types). Note that a Rust bool must be either 0 or 1. Technically this is true of bool in C/C++ too; however, it is common there to use integers as booleans and to treat any non-zero value as true. You must ensure that a value is 0 or 1 before treating it as a Rust bool.
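One way to follow this advice is to never declare a foreign parameter as bool when the foreign side might pass an arbitrary integer, and instead accept a c_int and convert explicitly. The helper names below are hypothetical; this is a sketch, not a prescribed API.

```rust
use std::os::raw::c_int;

/// Converts a C-style truth value (any non-zero integer means true)
/// into a Rust bool. The comparison always yields a valid bool.
pub fn bool_from_c(value: c_int) -> bool {
    value != 0
}

/// Converts a Rust bool into the canonical C encoding, 0 or 1.
pub fn bool_to_c(value: bool) -> c_int {
    value as c_int
}
```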

Rust raw pointers can correspond with C pointers. Use std::ffi::c_void for void pointers. 'Opaque pointers' (where the pointee is only used in one language) can be handled trivially. If the pointee is to be accessed from multiple languages, then you must consider the pointee type for compatibility.

Treating objects as opaque is a common idiom for interop (and in C++). For foreign types which should be opaque in Rust, you can use a struct with a single private field which is a zero-sized array (there used to be advice to use a zero-variant enum for opaque types but that is no longer recommended because it can lead to UB in some circumstances (because the compiler might assume a zero variant enum can never be created)). If you must pass an opaque struct by value, then you can make it the correct size (though this is obviously fragile). For Rust types which should be opaque in C, you can declare but not define the type. Both bindgen and cxx have built-in support for such opaque types.
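A minimal sketch of an opaque foreign type follows. FooHandle is a made-up name standing for some foreign type. The zero-sized array is the field described above; the extra marker field is an addition following the pattern suggested in the Nomicon, so that the type is not automatically Send, Sync, or Unpin.

```rust
use std::marker::{PhantomData, PhantomPinned};

/// Hypothetical foreign type. Rust code never constructs or inspects
/// it and only handles `*mut FooHandle` or `*const FooHandle`.
#[repr(C)]
pub struct FooHandle {
    // Zero-sized private field: the type cannot be built outside this module.
    _data: [u8; 0],
    // Opts out of Send, Sync and Unpin, because Rust knows nothing
    // about what the foreign code does with the pointee.
    _marker: PhantomData<(*mut u8, PhantomPinned)>,
}
```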

Slices in Rust combine a pointer to data with the length of the slice into a wide pointer. These components can be passed to C for use as an array without any deep conversion. The slice must be disassembled when passed to C, and if an array is passed to Rust, then it can be re-assembled (see the FFI omnibus for details).
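The sketch below shows both directions. sum_i32 stands in for a foreign function taking a pointer and a length (it is written in Rust here only so that the example is self-contained, and its name is hypothetical); it re-assembles a slice from the two components. sum_slice disassembles a Rust slice to call it.

```rust
/// Stand-in for a foreign function with the C signature
/// `int64_t sum_i32(const int32_t *data, size_t len)`.
///
/// # Safety
/// `data` must point to `len` initialised, aligned i32 values which are
/// not mutated for the duration of the call (or `len` must be 0).
pub unsafe extern "C" fn sum_i32(data: *const i32, len: usize) -> i64 {
    if data.is_null() || len == 0 {
        return 0;
    }
    // Re-assemble the wide pointer from its two components.
    // SAFETY: guaranteed by the caller, see the safety section above.
    let values = unsafe { std::slice::from_raw_parts(data, len) };
    values.iter().map(|&v| i64::from(v)).sum()
}

/// Disassembles a slice into pointer and length to cross the boundary.
pub fn sum_slice(values: &[i32]) -> i64 {
    // SAFETY: a slice always provides a valid pointer/length pair, and
    // the borrow keeps the data alive and unmodified during the call.
    unsafe { sum_i32(values.as_ptr(), values.len()) }
}
```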

User-defined Rust types (structs, unions, and some enums) can be passed to foreign code and accessed there. You will need to declare structs and unions using #[repr(C)] (or rarely #[repr(packed)]), and ensure that all field types are C-compatible.
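For example, a struct which mirrors a simple C struct might look like the following sketch. Point and point_offset are hypothetical names; the assumed C declaration is shown in the comment.

```rust
/// Mirrors the (assumed) C declaration
///     struct Point { int32_t x; int32_t y; };
/// #[repr(C)] fixes field order and padding to match the C layout.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Point {
    pub x: i32,
    pub y: i32,
}

/// Takes and returns the struct by value, which is fine because every
/// field is C-compatible and the type is Copy.
pub extern "C" fn point_offset(p: Point, dx: i32, dy: i32) -> Point {
    // Wrapping arithmetic, so an overflow cannot panic at the boundary.
    Point {
        x: p.x.wrapping_add(dx),
        y: p.y.wrapping_add(dy),
    }
}
```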

Only enums with no fields are C-compatible. You must specify the type of the discriminant (e.g., with #[repr(u32)]) and may want to specify the values of variants.
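A sketch of such an enum is below. The names and values are invented for illustration. Because a Rust enum holding a value which matches no variant is undefined behaviour, one cautious approach is to receive the value from foreign code as a plain integer and validate it, rather than declaring the foreign parameter with the enum type.

```rust
/// Field-less enum with an explicit discriminant type and values.
#[repr(u32)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Status {
    Ok = 0,
    NotFound = 1,
    Denied = 2,
}

/// Validates an integer received from foreign code before it becomes
/// a `Status`; unknown values are rejected instead of being transmuted.
pub fn status_from_raw(raw: u32) -> Option<Status> {
    match raw {
        0 => Some(Status::Ok),
        1 => Some(Status::NotFound),
        2 => Some(Status::Denied),
        _ => None,
    }
}
```

Going the other way needs no check: Status::Denied as u32 is always a value the C side knows about.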

Other Rust types should not be passed to C, unless they will be treated completely opaquely. This includes zero-sized types, trait objects, other dynamically sized types (such as slices and strings without being adapted), tuples, and enums with fields (technically it is possible to share enums with fields which are #[repr(C)] but the correspondence between C and Rust types is complicated and we advise against it).

Consider the traits implemented for types which will cross the FFI boundary (e.g., Send, Sync, Copy, Clone, Default, Debug), whether derived or automatically implemented. These can affect the semantics of the types in Rust (e.g., Copy), can affect how tools generate bindings, and/or affect the ways in which types must be handled in foreign code (e.g., if a type does not implement Send then it must not be moved between threads even in foreign code where this is not enforced by the compiler). If you're using a tool to generate bindings, the documentation for that tool should have more details.

Error handling

It is never OK to unwind across the FFI boundary, therefore neither Rust panics nor C++ exceptions can be used. Rust's Result type is an enum with fields and therefore cannot cross the FFI boundary. This all makes error handling somewhat challenging. I don't think there is a general solution; you basically just have to do whatever fits best with the C code and convert that error handling to idiomatic Rust error handling as part of the wrapping of the FFI bindings into idiomatic Rust (e.g., implement a set of functions and macros to convert a C error code into a Rust Result).
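As a rough sketch of that last suggestion, assume a C library which returns 0 on success and small positive integers on failure. The specific codes, the FooError type, and the check function are all invented for this example; a real wrapper would follow whatever convention the C library documents.

```rust
use std::fmt;
use std::os::raw::c_int;

/// Rust-side view of the (hypothetical) library's error codes.
#[derive(Debug, PartialEq)]
pub enum FooError {
    InvalidArgument,
    OutOfMemory,
    /// Any code this wrapper does not know about is preserved.
    Unknown(c_int),
}

impl fmt::Display for FooError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FooError::InvalidArgument => write!(f, "invalid argument"),
            FooError::OutOfMemory => write!(f, "out of memory"),
            FooError::Unknown(code) => write!(f, "unknown error code {}", code),
        }
    }
}

impl std::error::Error for FooError {}

/// Converts a raw return code into a Result, so that the idiomatic
/// wrapper can use `?` after each foreign call.
pub fn check(code: c_int) -> Result<(), FooError> {
    match code {
        0 => Ok(()),
        1 => Err(FooError::InvalidArgument),
        2 => Err(FooError::OutOfMemory),
        other => Err(FooError::Unknown(other)),
    }
}
```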

Safety

All foreign code is considered unsafe by Rust. Therefore, working with foreign code is intimately related to working with unsafe code. If you are writing code which involves FFI you should have a good understanding of unsafe code in Rust. That is a big topic! Too big to cover in depth here, but I'll try and cover some of the basics and some of the interop-specific parts. See the resources below - the Nomicon is probably the best place to start.

Unsafe code does not give the programmer permission to violate Rust's safety invariants. Unsafe code requires the programmer to uphold those invariants rather than relying on the compiler to check them. Safety is not a local property: it is possible to do things in unsafe code which cause runtime errors in safe code. Safety is often subtle and unintuitive to reason about; see this blog post for some examples. The programmer must therefore carefully consider safety for any data which passes the FFI boundary, including how it is accessed in foreign code.

When a function is marked unsafe then its whole body is treated as unsafe code. However, there is a big difference between an unsafe function and a safe function with an unsafe block: the former is unsafe to call, the latter is safe to call. You should make a function unsafe if the caller must help maintain safety invariants in any way. Making a function safe (with or without internal unsafety) indicates that the library and compiler will ensure safety with no requirements on the caller.
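The difference can be seen in a small sketch (both function names are made up). The first function pushes an obligation onto its caller, so it is declared unsafe; the second discharges the same obligation itself, using the guarantees of a reference type, so it can be safe.

```rust
/// Unsafe to call.
///
/// # Safety
/// `ptr` must be non-null, aligned, and point to an initialised i32.
pub unsafe fn read_unchecked(ptr: *const i32) -> i32 {
    // SAFETY: guaranteed by the caller, as documented above.
    unsafe { *ptr }
}

/// Safe to call: nothing is required of the caller beyond what the
/// type system already enforces.
pub fn read_checked(value: &i32) -> i32 {
    // SAFETY: a reference is always non-null, aligned, and points to
    // initialised data, which is exactly what read_unchecked needs.
    unsafe { read_unchecked(value) }
}
```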

Safety invariants must be enforced at the boundary between safe and unsafe code. When interoperating with foreign code that means that safety invariants must be established as part of the FFI boundary. There are several techniques for helping to ensure safety at the boundary:

  • runtime assertions (e.g., asserting that a pointer is non-null),
  • types (both Rust and foreign types can encode information which can help ensure invariants),
  • documentation (clearly documenting safety invariants makes them easier to understand and maintain).

Ultimately, we rely on invariants being upheld in foreign code which the Rust compiler cannot check. This is mostly up to the programmer, but can be helped with the above techniques.

Safety in the context of unsafe Rust specifically means memory safety. This can be divided into a few areas which might feel disjoint:

Uniqueness and mutability invariants around pointers

Rust's key invariant for ensuring memory safety is that all values must be immutable or unique. This property can be ensured statically or dynamically, but must always be upheld. Even in foreign code, this invariant must be respected, at least as far as it is observable to Rust code. I.e., if Rust code has a reference to a value, then foreign code must not mutate that value unless it can be guaranteed that the value cannot be read by the Rust code.

Pointer validity invariants

If a raw pointer may be dereferenced in Rust code or converted to a safe reference, then it must be valid. Since it is usually too late to ensure validity at the point of dereference/conversion, the validity requirement must be well-documented at all points where the pointer is passed, in particular at any FFI boundary. Some aspects of validity can be checked with assertions and the FFI boundary is usually the right place to do that.

Pointer validity includes:

  • pointers must be non-null,
  • pointers must point to initialised data which has not been deallocated,
  • pointers must point to well-aligned data,
  • if the size of a value derived from the pointer's type (including any padding) is n bytes, then the pointer must point to at least n bytes from a single allocated object.
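Only some of these conditions can be tested at run time. The sketch below (ref_from_foreign is a hypothetical helper) checks the two that can, nullness and alignment, at the point where a foreign pointer becomes a reference; the remaining conditions stay as documented obligations on the caller.

```rust
/// Converts a pointer received from foreign code into a reference.
///
/// # Safety
/// If `ptr` is non-null and aligned, it must point to an initialised,
/// live `T` which is not mutated while the returned reference exists.
/// These conditions cannot be checked here.
pub unsafe fn ref_from_foreign<'a, T>(ptr: *const T) -> Option<&'a T> {
    // The checkable part of validity: non-null and well-aligned.
    if ptr.is_null() || (ptr as usize) % std::mem::align_of::<T>() != 0 {
        return None;
    }
    // SAFETY: null and alignment were checked above; the rest is
    // guaranteed by the caller.
    Some(unsafe { &*ptr })
}
```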

Thread safety invariants

You must ensure that data which is not Send is not passed between threads and data which is not Sync is not shared between threads, even in foreign code. Furthermore, if dealing with multi-threaded code, the uniqueness and mutability invariant will be especially difficult to uphold. Therefore, it is easiest if Rust data is always kept on a single thread in foreign code.

Panics

Stack unwinding due to Rust panics, C++ exceptions, or any other cause, must never cross the FFI boundary. On the Rust side, you can use catch_unwind to help with this. Note that when catching panics, exceptions etc., you must ensure that no data is left in an inconsistent state. That is often impossible to achieve and aborting the thread or process is the only reasonable behaviour.
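A minimal sketch of using catch_unwind at the boundary follows. The function names and the 0 / -1 return convention are assumptions made for the example. It only works when panics unwind; if the crate is built with panic = "abort" there is nothing to catch and the process aborts instead.

```rust
use std::os::raw::c_int;
use std::panic;

/// Ordinary Rust logic which may panic (made up for illustration).
fn double_positive(input: c_int) -> c_int {
    if input < 0 {
        panic!("negative input");
    }
    input * 2
}

/// Boundary function: returns 0 and writes the result to `*out` on
/// success, or returns -1 if `out` is null or the Rust code panicked.
///
/// # Safety
/// `out` must be null or valid for writing a c_int.
pub unsafe extern "C" fn demo_double(input: c_int, out: *mut c_int) -> c_int {
    if out.is_null() {
        return -1;
    }
    // The closure only captures an integer, so no shared state can be
    // left half-updated if it panics.
    match panic::catch_unwind(|| double_positive(input)) {
        Ok(value) => {
            // SAFETY: `out` is non-null and valid per the contract above.
            unsafe { *out = value };
            0
        }
        Err(_) => -1,
    }
}
```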

Derived safety invariants

Many types have their own invariants required in order to preserve safety. These are usually not exposed to the user, except in unsafe functions where some requirements on the caller should be documented. All such requirements must be satisfied even if the function is called from foreign code. In addition foreign code may be able to create objects in ways which are impossible in Rust (e.g., by deserialization or casting from raw bytes). In these cases, you must ensure all invariants are properly established (this can be difficult since if these invariants are not user-facing in Rust code they may be poorly documented).

A good example of a 'derived' invariant is UTF-8 validity. Rust strings must always be valid UTF-8 and this is relied upon to ensure memory safety, even though UTF-8 validity is not directly a memory safety issue. Whenever you create a Rust string, you must ensure it is valid UTF-8 (see the String docs for details).
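For strings arriving from C, the standard library's CStr does the validation. A small sketch, where string_from_c is a hypothetical helper name:

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Copies a C string into a Rust String, returning None if the pointer
/// is null or the bytes are not valid UTF-8.
///
/// # Safety
/// If non-null, `ptr` must point to a NUL-terminated string which stays
/// valid and unmodified for the duration of the call.
pub unsafe fn string_from_c(ptr: *const c_char) -> Option<String> {
    if ptr.is_null() {
        return None;
    }
    // SAFETY: guaranteed by the caller, see above.
    let c_str = unsafe { CStr::from_ptr(ptr) };
    // to_str performs the UTF-8 check; no String is created without it.
    c_str.to_str().ok().map(|s| s.to_owned())
}
```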

Memory management

There are several aspects of the object lifecycle to consider: deallocating memory, calling destructors, and ensuring expected lifetimes of objects. In Rust the object lifecycle is closely tied to ownership, so we discuss these aspects in terms of ownership. The tl;dr is that keeping ownership of an object (in terms of program design, not necessarily Rust types) in the language in which it was created is usually the best strategy.

Independent of FFI, memory must usually be deallocated by the same allocator which allocated it. Without some rather specialist effort, the allocators used from different languages will be different. Therefore, you must deallocate memory in the same language where it was allocated. If objects are passed across the FFI boundary by pointer, and that pointer is morally borrowed, then there is no tidying-up required. If ownership is transferred, then the programmer must keep around a callback to the creating language to deallocate the memory, or pass the object back for destruction.

Note that destructors will not be called automatically in the foreign language. So these must be called explicitly when the object is destroyed.

A common pattern for this is that the foreign language has a wrapper type whose destructor handles calling the creating language's destructor explicitly and calls back into that language to deallocate memory (this pattern works to or from Rust).
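The sketch below illustrates the pattern with everything written in Rust so that it is self-contained. demo_counter_new and demo_counter_free stand in for the creating side's constructor and destructor (in practice they might live in a C library, or be Rust functions exported to C); OwnedCounter is the wrapper on the consuming side. All names, and the live-object counter used to make the behaviour observable, are invented for the example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Number of live Counter objects; present only so the sketch can be checked.
static LIVE: AtomicUsize = AtomicUsize::new(0);

pub fn live_count() -> usize {
    LIVE.load(Ordering::SeqCst)
}

pub struct Counter {
    value: u64,
}

/// Creating side: allocates and hands ownership to the caller.
pub extern "C" fn demo_counter_new() -> *mut Counter {
    LIVE.fetch_add(1, Ordering::SeqCst);
    Box::into_raw(Box::new(Counter { value: 0 }))
}

/// Creating side: runs the destructor and deallocates with the same
/// allocator that allocated.
///
/// # Safety
/// `ptr` must be null or come from demo_counter_new, and must not be
/// used again afterwards.
pub unsafe extern "C" fn demo_counter_free(ptr: *mut Counter) {
    if ptr.is_null() {
        return;
    }
    // SAFETY: per the contract, `ptr` came from Box::into_raw.
    drop(unsafe { Box::from_raw(ptr) });
    LIVE.fetch_sub(1, Ordering::SeqCst);
}

/// Consuming side: owns the pointer and frees it via the creating side.
pub struct OwnedCounter {
    ptr: *mut Counter,
}

impl OwnedCounter {
    pub fn new() -> Self {
        OwnedCounter { ptr: demo_counter_new() }
    }

    pub fn increment(&mut self) -> u64 {
        // SAFETY: `ptr` came from demo_counter_new and is uniquely owned.
        let counter = unsafe { &mut *self.ptr };
        counter.value += 1;
        counter.value
    }
}

impl Drop for OwnedCounter {
    fn drop(&mut self) {
        // SAFETY: `ptr` is owned by self and is never used after this.
        unsafe { demo_counter_free(self.ptr) };
    }
}
```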

If objects are passed by value rather than by pointer, then they must implement Copy in Rust. Otherwise they will be copied in the foreign language where Rust assumes they will be moved. Note that objects cannot implement both Drop and Copy so you will not need to worry about calling drop in this case.

Any object accessed from Rust (whether the object was created in Rust code or not) must abide by Rust's ownership and borrowing discipline. With regards to lifecycle events, this means that destroying a borrowed pointer must not destroy the underlying object, that an owned object must not be destroyed if there are any borrowed pointers to it (or owning pointers to it if there is multiple ownership, e.g., via Rc), and that an object should be fully destroyed when it goes out of scope if held by value, or when all owning pointers are destroyed if it is held by pointer. Regarding FFI, this generally requires that documentation is clear about whether raw pointers/C pointers are morally owning or borrowed (and that this is tracked through foreign code), and that the FFI boundary should not transfer ownership when there are extant borrowed pointers to the object.

C++

Interoperating with C++ is much more complicated than interoperating with C. If you follow the advice above to only interoperate at component boundaries and you design your component APIs in a conservative, C-like way (possibly by having a C-like library wrap the C++ one), then Rust/C++ interop can be fine - it is even quite well supported by bindgen. If you must have more fine-grained interop, then things get interesting.

If you can (and plain bindgen is not enough), we recommend using cxx to generate a bridge layer and bindings between Rust and C++ code. autocxx is an extension if you prefer auto-generated bindings.

Quite a lot of C++ features work well across FFI, see the bindgen docs for details. There are more links to docs on C++ interop below. It can be a bit hit and miss figuring out exactly what works and what doesn't and unfortunately some issues are not caught at compile time.

Patterns

Architectural patterns

  • Modular interop - a high level approach for ensuring effective interop
  • Layered library design - how to structure libraries and crates for interop
  • -sys crate
  • Wrap a C library
  • Serialization
  • Cross-language ownership

Design patterns

  • Foreign dtor
  • Object-based API (https://rust-unofficial.github.io/patterns/patterns/ffi/export.html)
  • Rust version of C object
  • Something about intermediate types like CString/OsString (https://rust-unofficial.github.io/patterns/idioms/ffi/accepting-strings.html, https://rust-unofficial.github.io/patterns/idioms/ffi/passing-strings.html)
  • Transparent smart pointer
  • Consolidated wrapper (https://rust-unofficial.github.io/patterns/patterns/ffi/wrappers.html)
  • Strings (how to actually use them, see strings links above, https://snacky.blog/en/string-ffi-rust.html, https://dev.to/kgrech/7-ways-to-pass-a-string-between-rust-and-c-4iebZ)

Programming idioms and best practices

  • Representing Rust errors in C (https://rust-unofficial.github.io/patterns/idioms/ffi/errors.html)
  • Representing C errors in Rust

Anti-patterns

  • Disguising pointers as values (unclear, disguises unsafety)
  • Using C structs directly in Rust (back compat hazards including padding, due to different back compat between C and Rust)

Layered library design

When wrapping a foreign library for use in Rust, consider writing a first layer in C (especially if the legacy code is C++) with an API better suited for interacting with Rust. Then have a crate which is only bindings of C code into Rust (either hand-written or auto-generated). The next layer is a crate which only has the functionality of the foreign library (i.e., no client logic), but presented in a Rust-idiomatic way. The bindings crate will be all unsafe, the idiomatic crate should aim to have a 100% safe API. Clients should only use the idiomatic crate and never use the bindings crate (some advanced usages may require using the bindings in unanticipated ways, however these clients should create safe abstractions of their own rather than use the bindings directly).

If following this pattern, it is common to give the idiomatic Rust crate the same name as the foreign library, and the bindings library that name with the -sys suffix, e.g., foo and foo-sys. (On the topic of naming, it is idiomatic to always avoid using an -rs suffix on any Rust crate: it is nearly always obvious from context that the crate is a Rust library, so -rs usually adds nothing).

------------------------
     C/C++ library          libfoo
------------------------
       C wrapper            libfoo-ffi
------------------------
 Rust bindings (unsafe)     foo-sys
------------------------
Rust wrapper (idiomatic)    foo
------------------------
      Rust users
------------------------

When making a Rust library available to foreign code, you can adopt a similar strategy. Here, we have an idiomatic Rust crate which can be used directly by Rust users and is idiomatic and mostly safe code. There is then a Rust wrapper which is more C-like and presents an API which is more convenient to use for FFI and includes unsafe functions which make clear the invariants callers must maintain. C bindings reflect this wrapper into the C world. This can be used directly by C code, or there can be a C/C++ wrapper which is more idiomatic (this is much more useful for C++ than for C, since it is possible to have an idiomatic C API with the direct bindings, but that is much harder for C++). C/C++ users (again, more likely C++) then use this wrapper library rather than the bindings.

------------------------
 Rust crate (idiomatic)     foo
------------------------
 Rust wrapper (unsafe)      foo-ffi
------------------------
       C bindings           libfoo-ffi
------------------------
C/C++ wrapper (optional)    libfoo
------------------------
      C/C++ users
------------------------

There are not strong naming conventions in this direction, and the above example names are not great.

Tooling

You can use bindgen to generate the bindings layer (foo-sys in the example). You can use cxx to generate both the C wrapper and the bindings layer, or at least parts of both.

In the other direction, you can use cbindgen to generate C bindings (e.g., libfoo-ffi) or cxx to generate the Rust wrapper, C bindings, and C++ wrapper (although in this case the layers are not clearly defined).

See also

  • -sys crate - separating the Rust bindings from the idiomatic Rust wrapper - a component of this pattern,
  • Wrap a C library - the C wrapper layer - a component of this pattern.

Reference

This section is designed as a reference and you probably don't want to read it end to end. It is primarily aimed at those implementing and designing tools and low-level libraries, or users who need to do unusual and/or low-level interop work. Hopefully, if you're doing common integration work you mostly won't need this level of detail.

TODO assumes C/C++

  • Functions and Methods
  • statics and consts - TODO used attribute. Using the no_mangle attribute implies used. Use extern for external linkage
  • Data types
    • Numeric types
    • Strings
    • Pointers, references, and arrays - void pointers, fat pointers, const, arrays and slices, null/non-null, single allocation, no pointers into middle of an object, ZSTs, pointers to deallocated (e.g., dangling) mem, invalid metadata in wide pointers
    • structs, tuples, and unions
    • enums
    • properties - send, sync, eq, hash, etc.
    • classes? trait objects?

Linking

extern blocks

#[link(...)] attribute

Functions and Methods

Functions

Declaration

TODO

  • visibility

Extern blocks

TODO https://doc.rust-lang.org/reference/items/external-blocks.html

link attribute, ABI, implicit unsafe, see also statics

Name

The names of functions (and other items) are mangled by the compiler by default. Name mangling means that the name of the symbol in the compiled binary is not the same as the name in the source code. Name mangling is not stable, and you should not rely on mangled names being the same between compiler versions.

Use the no_mangle attribute to prevent name mangling of a function's name. E.g.,

#[no_mangle]
pub extern "C" fn foo() {}

If you will call a function from foreign code by name then you must use no_mangle (not doing so may cause linking errors or may cause incorrect runtime behaviour). If you will only call a function via a function pointer, then you don't need to.
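The function-pointer case can be sketched as follows. double_it has no no_mangle attribute, since it is only ever reached through a pointer. apply_twice stands in for a foreign function which accepts a callback; it is written in Rust here only to keep the example self-contained, and all names are hypothetical.

```rust
use std::os::raw::c_int;

/// What C would declare as `int (*callback)(int)`.
pub type Callback = extern "C" fn(c_int) -> c_int;

/// Uses the C calling convention but is not exported by name: foreign
/// code can only reach it through a function pointer.
extern "C" fn double_it(x: c_int) -> c_int {
    x.wrapping_mul(2)
}

/// Stand-in for a foreign function taking a callback.
pub extern "C" fn apply_twice(f: Callback, x: c_int) -> c_int {
    f(f(x))
}

/// Passes the Rust function across the (simulated) boundary by pointer.
pub fn quadruple(x: c_int) -> c_int {
    apply_twice(double_it, x)
}
```

Note that the calling convention still matters here: the callback must be extern "C" even though its symbol name does not.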

Alternatively, you can use the export_name attribute to explicitly specify the name to use for the exported symbol. E.g.,

#[export_name = "bar"]
pub extern "C" fn foo() {}

This function can be called using foo from Rust, and bar from C.

Likewise, by default C++ will mangle function names. This is inconsistent between platforms and compilers, so it is not advisable to use the mangled names (this can be done if absolutely necessary and tools like bindgen and cxx can help with this TODO is this true?). To prevent name mangling, define functions in an extern "C" block.

You can specify which section of the binary the function is placed in using the link_section attribute.

Calling convention

The extern keyword is used on function definitions to specify the calling convention (aka the ABI) used to call them (and on extern blocks to define the calling convention used to call the foreign functions declared inside, see above). The syntax is extern "ABI" where ABI is the optional ABI identifier string. E.g.,

pub extern "C" fn foo() {}

If no ABI identifier is supplied, then C is used. If extern is not used at all, then Rust is used.
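A small sketch of these defaults, using made-up function names. Recent compilers warn when the ABI string is omitted, hence the allow attribute.

```rust
// No ABI string: "C" is assumed.
#[allow(missing_abi)]
extern fn implicit_c(x: i32) -> i32 {
    x + 1
}

// Explicit "C" ABI: the same calling convention as `implicit_c`.
extern "C" fn explicit_c(x: i32) -> i32 {
    x + 1
}

// No `extern` at all: the (unstable) Rust ABI.
fn rust_abi(x: i32) -> i32 {
    x + 1
}

pub fn call_each(x: i32) -> [i32; 3] {
    // Both `extern` functions coerce to the same function pointer type.
    let c_fns: [extern "C" fn(i32) -> i32; 2] = [implicit_c, explicit_c];
    // The Rust-ABI function has a different pointer type and would not fit in `c_fns`.
    let rust_fn: fn(i32) -> i32 = rust_abi;
    [c_fns[0](x), c_fns[1](x), rust_fn(x)]
}
```

Writing the ABI string explicitly is usually clearer, even when "C" is what you want.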

The calling convention used in the declaration and definition of functions must match. This is likely to entail a somewhat complex interaction of defaults across different platforms and languages, and attributes in different languages. If you have control of both sides of the FFI, then making both extern "C" (either explicitly or by default) is probably the easiest option. You'll need to use other options if you want to match a calling convention in a library which cannot be changed.

The platform independent ABI identifier strings are:

  • Rust: Rust's ABI; this is unstable and should not be used for FFI code,
  • C: the default C calling convention,
  • system: the platform default calling convention for calling 'system functions'. Usually the same as extern "C", except on Win32, in which case it's "stdcall".

The platform-specific ABI identifier strings are:

  • cdecl: for x86_32 C code,
  • stdcall: for the Win32 API on x86_32,
  • win64: for C code on x86_64 Windows,
  • sysv64: for C code on non-Windows x86_64,
  • aapcs: for ARM,
  • fastcall: corresponds to MSVC's __fastcall and GCC and clang's __attribute__((fastcall)),
  • vectorcall: corresponds to MSVC's __vectorcall and clang's __attribute__((vectorcall)).

You might also come across rust-intrinsic, rust-call, and platform-intrinsic. These are used by the compiler and standard library, but you shouldn't use them in user code.

There also exist -unwind versions of the ABI identifier strings, e.g., C-unwind. These were unstable in older compilers; C-unwind has been stable since Rust 1.71. See the section below on unwinding for more details.

TODO thiscall is unstable, see discussion on methods

C/C++ linkage

C/C++ functions must have external linkage (this is the default, i.e., functions may not be marked static).

Signature

The types of all arguments in the function and its return type as written in the declaration and definition must agree. For more on type agreement, see the sections on data types. The names of arguments do not need to match. In Rust declarations of foreign functions, _ may be used instead of an argument name. No other patterns may be used in arguments. Patterns may be used instead of names in the usual way for Rust functions which are exported; the foreign declarations should use a name instead of a pattern.

The number of arguments in definition and declaration must match, including variadic arguments. Declarations (but not definitions) in Rust may be variadic (to match variadic functions defined in C). E.g.,

extern "C" {
    fn foo(format: *const u8, args: ...);
}
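For a runnable illustration, the sketch below declares C's variadic snprintf by hand and calls it from a hypothetical helper, format_pair. It assumes a platform whose C library exports snprintf as a linkable symbol (true on common Unix targets). The block is spelled unsafe extern, which Rust 1.82 and later accept.

```rust
use std::ffi::{c_char, c_int, CStr, CString};

unsafe extern "C" {
    // int snprintf(char *buf, size_t size, const char *format, ...);
    fn snprintf(buf: *mut c_char, size: usize, format: *const c_char, ...) -> c_int;
}

/// Formats `n` and `label` as "n:label" using C's `snprintf` (hypothetical helper).
pub fn format_pair(n: c_int, label: &str) -> String {
    let label = CString::new(label).expect("label must not contain nul bytes");
    let mut buf = [0u8; 64];
    // SAFETY: `buf` is writable for `buf.len()` bytes, the format string is
    // nul-terminated, and the variadic arguments match `%d` and `%s`.
    unsafe {
        snprintf(
            buf.as_mut_ptr() as *mut c_char,
            buf.len(),
            b"%d:%s\0".as_ptr() as *const c_char,
            n,
            label.as_ptr(),
        );
    }
    // `snprintf` always nul-terminates when the size is non-zero.
    let formatted = CStr::from_bytes_until_nul(&buf).expect("missing nul terminator");
    formatted.to_string_lossy().into_owned()
}
```

The compiler cannot check variadic arguments against the format string, so a mismatch is undefined behaviour rather than a type error.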

If a function diverges, then in Rust it should have the ! return type. In C/C++ the function should have a 'no return' attribute (__attribute__((noreturn)), [[noreturn]], [[__noreturn__]], [[_Noreturn]], etc. depending on the language, version, and compiler).

TODO what if sigs don't agree?

If the return type must be used, then the Rust function should have the #[must_use] attribute and the C/C++ function the __attribute__((warn_unused_result)) attribute. TODO [[nodiscard]] on ctors. Getting this wrong will lead to missing warnings which may in turn lead to runtime errors.

Function calls

TODO

calling convention (should work) calling variadics (just works) see also data unsafe

const functions

TODO

Unwinding

TODO

TODO -unwind ABIs

Exceptions

TODO

Closures

TODO

Methods

TODO

  • Virtual/static dispatch
  • ctors
  • dtors
  • operator overloading

Other

TODO

  • async
  • generators
  • templates/generics

Data Types

Data in both Rust and C/C++ is just ones and zeros in the computer's memory. These ones and zeros are independent of the language which generated them and there is no sense of 'compatibility'. However, when data is used, the compiler must have semantics for those ones and zeros which in Rust, C, and C++ is determined from the type of the data. When data is passed across the FFI boundary, two compilers are involved and each must have a type defined in its own language with which to understand the data. For this to produce correct results, corresponding types in the two languages must agree. As an abstract example, if a function f is declared in both Rust and C and has a single argument with type T_Rust in Rust and T_C in C, then T_Rust and T_C must agree. If they do not, then any operation on the data will be undefined behaviour.

This concept of agreement goes beyond what the bytes represent (e.g., that some sequence of four bytes should be interpreted as a little-endian, 32 bit, unsigned integer) and includes the invariants which are implicit for a type in a language. These invariants may be due to rules of the language, or due to the specific type itself. What makes this difficult is that invariants due to the language may be specified in a reference or spec, but may just be assumed by the compiler authors and otherwise undocumented. Invariants due to a specific type may be documented but, especially if they are invariants which users would not usually need to be aware of or are considered implementation details, may not be documented (or only documented in the source code). These may still be a concern when writing interop code since C/C++ allows treating data in ways usually forbidden in Rust.

Invariants due to specific data types can be found in their documentation or source code. Invariants due to the kind of data types can be found below and in sub-chapters. Rust also has some invariants which apply to all data (or nearly all data), we'll cover those in the next few paragraphs.

Much of Rust's ABI and invariants are formally undefined and may be subject to change. There have been very few RFCs which cover this kind of thing, and work is ongoing within the Rust project and in academia to better specify language-wide invariants. However, there is a large body of code which works today and is unlikely to be broken, so much of this stuff is de facto standardized.

TODO what does this all mean for doing FFI?

Uniqueness and mutability

In Rust, data is immutable by default. Data must be known to be mutable from its type to be mutated (TODO phrasing). It is undefined behaviour to convert immutable data into mutable data, or to directly mutate immutable data (contrast this to C, where const-ness can be cast away).

Data can be mutated only if it is known to be unique, i.e., data cannot be accessed other than via the reference used to mutate the data. Such uniqueness may be established either statically (e.g., references &T and &mut T) or dynamically (e.g., RefCell). All dynamic tracking of uniqueness must use unsafe code and raw pointers at some level, usually wrapped so that end users only see safe abstractions.

References and values can only be mutated if they are declared as mutable (e.g., a &mut T can be mutated and a &T can never be mutated) and the compiler can prove they are unique at the point mutation occurs (e.g., a &mut T cannot be mutated if there exists a live &T referencing the same value). Raw pointers have the former constraint but not the latter. A *mut T pointer can always be dereferenced (an unsafe operation) and mutated. The programmer must ensure that when the pointer is dereferenced, TODO undefined?
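A minimal sketch of mutation through a raw pointer at an FFI boundary. The function name and C prototype are hypothetical; the function is marked unsafe because the compiler cannot check the caller's pointer.

```rust
use std::ffi::c_int;

/// Hypothetical C prototype: void ffi_guide_increment(int *value);
///
/// # Safety
/// `value` must be null or point to a valid, aligned `c_int` that nothing
/// else reads or writes for the duration of the call.
pub unsafe extern "C" fn ffi_guide_increment(value: *mut c_int) {
    if value.is_null() {
        return;
    }
    // SAFETY: guaranteed by the caller, see above.
    unsafe {
        *value = (*value).wrapping_add(1);
    }
}

pub fn increment_twice(mut x: c_int) -> c_int {
    let p: *mut c_int = &mut x;
    // SAFETY: `p` points to a live local and `x` is not touched until both calls return.
    unsafe {
        ffi_guide_increment(p);
        ffi_guide_increment(p);
    }
    x
}
```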

An important aspect of Rust for preserving uniqueness is move semantics. When data is passed from one location to another (e.g., assigned to a variable or passed to a function), it is logically moved (you can think of this as a bitwise copy, then deleting the old copy, although the compiler may optimise that). That means that if data is unique before being passed, it is also unique after being passed. Compare this to C/C++ where data is copied or Java-like languages where passing most data implicitly passes a reference.

Some data in Rust is copied rather than moved. Primitive types, immutable references, raw pointers, and any data structure which implements the Copy marker trait are copied rather than moved.

Note that both moving and copying are simple, bitwise operations. Neither invokes a constructor or destructor (in contrast to C++).
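The difference between moving and copying can be sketched with two made-up types, one of which implements Copy:

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Pixel {
    pub r: u8,
    pub g: u8,
    pub b: u8,
}

// Owns a heap allocation, so it cannot implement `Copy`.
pub struct Buffer {
    pub bytes: Vec<u8>,
}

fn brighten(mut p: Pixel) -> Pixel {
    p.r = p.r.saturating_add(10);
    p
}

fn consume(buf: Buffer) -> usize {
    buf.bytes.len()
}

pub fn demo() -> (Pixel, Pixel, usize) {
    let original = Pixel { r: 1, g: 2, b: 3 };
    // Bitwise copy: `original` is still usable afterwards.
    let brighter = brighten(original);

    let buf = Buffer { bytes: vec![0; 4] };
    // Bitwise move: using `buf` after this line would not compile.
    let len = consume(buf);

    (original, brighter, len)
}
```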

If passing an object from Rust to C/C++, care must be taken around uniqueness. If passing by reference, then pointers/references in C/C++ are copied and so will not be unique. If passing by value, then data will be copied, not moved. Therefore, Rust data which implements Copy can be safely passed by value to C/C++ and passed around or stored. Immutable references/pointers can also be safely passed to C/C++ as long as the data is never mutated, whether directly or through them. If non-Copy data is to be passed by value, or data is passed by reference and is mutated, more care must be taken around these invariants.
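The read-only case might look like the sketch below. The extern "C" function is written in Rust here as a stand-in for the foreign side, so that the example is self-contained; its name and C prototype are hypothetical.

```rust
use std::ffi::c_int;

/// Hypothetical C prototype: int ffi_guide_sum(const int *values, size_t len);
///
/// # Safety
/// `values` must be null or point to `len` readable `c_int`s which are not
/// mutated during the call.
pub unsafe extern "C" fn ffi_guide_sum(values: *const c_int, len: usize) -> c_int {
    if values.is_null() {
        return 0;
    }
    // SAFETY: guaranteed by the caller, see above.
    let slice = unsafe { std::slice::from_raw_parts(values, len) };
    // Wrapping addition avoids a panic (and so an unwind) on overflow.
    slice.iter().fold(0 as c_int, |acc, v| acc.wrapping_add(*v))
}

pub fn sum_borrowed(values: &[c_int]) -> c_int {
    // The slice stays borrowed for the whole call and the callee only reads,
    // so Rust's aliasing rules are upheld.
    unsafe { ffi_guide_sum(values.as_ptr(), values.len()) }
}
```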

Invariants around pointer and reference types are covered in detail in the chapter on pointers, references, and arrays.

TODO raw mut pointers requirement in foreign/unsafe code UnsafeCell scope of requirement (data referenced from Rust? Allocated in Rust?)

Borrowing

TODO uniqueness mutable and immutable borrows overlapping borrows lifetimes lifetime due to scope - presence of dtor/attribute? and NLL drop check and phantomdata storing data 'static borrows variance

Initialization

TODO all memory is initialized

concurrency

TODO effect on uniqueness send/sync unsafe and dynamic guarantees (arc, mutex, scoped threads) thread-safety and FFI Rust stuff thread-safety guarantees from C/C++ atomic/non-atomic access to shared memory (even volatile ops) memory model is C++20 (nomicon atomics link)

Layout and alignment

TODO alignment and storage address size = multiple of alignment bounds/OOB access can't assume layout without repr, see structs and enums DSTs ZSTs

Platform-specific invariants

CHERI WASM - function/data pointers ARM? padding bits two's complement

Kinds of data type

Rust and C/C++ have many different kinds of data type. These include primitive data, compound data (enums and structs, etc.), pointers, and more. For data to agree across the FFI boundary the kind of data type must correspond (and then the details must agree, which will be covered in the following chapters).

Primitive types

Primitive types are numeric (signed and unsigned integers, and floating point numbers), characters (but not strings), or booleans. These types have the same semantics and interpretation in C/C++ and in Rust. In particular, they are always passed by simple copying (i.e., they are never moved and no constructor is invoked). The names of types and some details of their interpretation vary between C/C++ and Rust, see the chapter on numeric types for details. In particular, the names of types in both C/C++ and Rust can vary depending on the platform.

Because of the matching semantics and lack of aliasing, using these types for interop is usually very simple and efficient.

TODO void/()/!

Compound data

Compound data types are structs, unions, and enums in C/C++ and Rust, tuples and tuple structs in Rust, and classes in C++. Structs, unions, and some enums basically correspond between C/C++ and Rust, see the following sub-chapters for details. Tuples in Rust cannot be used in FFI because they always have the default representation (see below and the chapter on structs, tuples, and unions). Tuple structs correspond with foreign structs. Classes in C++ correspond with structs in Rust (although this is a complex correspondence), see the chapter on classes.

Individual compound data types are likely to have their own invariants which will need to be maintained in foreign code (or by Rust code for foreign types).

By default, the Rust compiler can lay out data however it likes and this can change between compiler versions (or even with the same compiler version, in theory). This is incompatible with FFI, and so you must specify an alternative representation for data types for them to agree with a foreign type. We'll cover this in detail in the following sub-chapters.
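As a preview, one such alternative representation is repr(C), which lays fields out in declaration order using the platform's C rules. The sketch below uses a made-up struct; the offset_of! macro needs Rust 1.77 or later.

```rust
use std::ffi::{c_char, c_int};
use std::mem::{align_of, size_of};

/// Hypothetical C counterpart: struct Sample { char tag; int value; };
#[repr(C)]
pub struct Sample {
    pub tag: c_char,
    pub value: c_int,
}

/// Returns (size, alignment, offset of `value`) for `Sample`.
pub fn sample_layout() -> (usize, usize, usize) {
    (
        size_of::<Sample>(),
        align_of::<Sample>(),
        std::mem::offset_of!(Sample, value),
    )
}
```

Without the repr(C) attribute, none of these three numbers is guaranteed.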

Aliases (typedef in C++ or type in Rust) are present only at compile time and do not affect the representation or the invariants of the data. Rust's 'newtypes' (usually a tuple struct with a single field) are not aliases and have the same behaviour as other compound types, i.e., can introduce new invariants and may have a different representation (unless explicitly specified), compared with the underlying data.

Strings, smart pointers, and array-like collections (e.g., Vec in Rust) are all compound data types in both Rust and C/C++. In principle, these do not require any special treatment over other user types. However, they are more likely to have important invariants which must be maintained for the sake of soundness. Several examples will be covered in the following sub-chapters.

pointers and references

TODO

smart pointers

arrays and slices

TODO

trait objects and class objects, and methods

TODO

Generic types

TODO

Numeric types

For numeric types to agree across an FFI, their kind (unsigned integer, signed integer, or floating point), size, and invariants must match. The size of most C/C++ types and usize/isize in Rust can vary depending on the platform. For all numeric types, if the size matches then the alignment will also match (on a single platform).

std::ffi defines type aliases for common numeric types which are platform-accurate; libc defines a few more aliases for less common types. Using these aliases is usually easier than using Rust types directly.

Integers, booleans, and characters

Rust integers

u8 ... u64 and i8 ... i64 are unsigned and signed respectively with the number in the type indicating the number of bits.

usize and isize are 32 bits on 32 bit platforms and 64 bits on 64 bit platforms.

C/C++ integers

A C/C++ integer is unsigned if it uses the unsigned keyword and signed otherwise.

On every platform Rust supports, a char is 8 bits, a short is 16 bits, and a long long is 64 bits (the C standard itself only guarantees minimum sizes).

The size of int and long are platform dependent, see std::ffi::c_{int|long}_definition
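A short sketch using the std::ffi aliases: it declares C's labs by hand and checks the fixed sizes mentioned above. The helper names are made up, and the extern block uses the unsafe extern spelling accepted by Rust 1.82 and later.

```rust
use std::ffi::{c_char, c_long, c_longlong, c_short};
use std::mem::size_of;

unsafe extern "C" {
    // long labs(long n);
    fn labs(n: c_long) -> c_long;
}

/// Absolute value via C's `labs`; returns `None` for `c_long::MIN`, where
/// the C function's behaviour is undefined.
pub fn abs_long(n: c_long) -> Option<c_long> {
    if n == c_long::MIN {
        return None;
    }
    // SAFETY: `labs` has no preconditions other than the one checked above.
    Some(unsafe { labs(n) })
}

/// Sizes in bytes of `char`, `short`, and `long long` as Rust sees them.
pub fn fixed_sizes() -> [usize; 3] {
    [
        size_of::<c_char>(),
        size_of::<c_short>(),
        size_of::<c_longlong>(),
    ]
}
```

Using c_long rather than i32 or i64 keeps the declaration correct on platforms where long has a different size.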

128 bit integers

Rust supports i128 and u128. These types are mostly not safe for FFI (will lead to UB) and must be avoided. In particular, they are not compatible with C's 128bit integer types where those exist. However, they can be used on non-Windows aarch64.

booleans

Rust (bool) and C's (strictly, C99 and later, _Bool) boolean types are compatible. Technically, C++'s bool is not guaranteed to be the same representation as C's _Bool, but they are on all known platforms, so it is safe to assume that Rust's bool is compatible with C++'s.

It is common to use integers to represent booleans in C programs (especially older programs or when using older toolchains). These can be converted to Rust bools if the size matches and they are guaranteed to only have values 0 or 1. (It is possible to use 0 for false and non-zero for true with C's boolean operators, however, storing any value other than 0 or 1 in a Rust bool is UB. You can check and convert in either Rust or C code, but in the latter case you must not use a Rust bool in your FFI).
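A minimal sketch of checking and converting on the Rust side, assuming the C API uses int for its flags (the function names are made up):

```rust
use std::ffi::c_int;

/// C truthiness: zero is false, anything else is true.
pub fn bool_from_c(flag: c_int) -> bool {
    flag != 0
}

/// Always produces exactly 0 or 1.
pub fn bool_to_c(flag: bool) -> c_int {
    flag as c_int
}

/// Hypothetical C prototype: int ffi_guide_is_positive(int x);
/// The FFI signature uses `c_int`, so no invalid `bool` can ever be created.
pub extern "C" fn ffi_guide_is_positive(x: c_int) -> c_int {
    bool_to_c(x > 0)
}
```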

characters

Rust and C character types are incompatible.

C character types can be converted to or from Rust's 8 bit integer types. unsigned char is always u8, signed char is always i8. char may be either i8 or u8 depending on the platform, see std::ffi::c_char_definition.

A Rust char is a 32 bit type which must be a valid Unicode scalar value. It is UB to create a char which is not valid Unicode. You should probably avoid using char in FFI unless you have a custom character type with the same size and invariant in your foreign code. Otherwise it is usually better to pass numeric bytes and use helper methods on char to create the Rust char.
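The conversions described above can be sketched with two small helpers (the names are made up). char::from_u32 performs the validity check, so no invalid char is ever created.

```rust
use std::ffi::c_char;

/// Reinterprets a C `char` as a byte, whether `c_char` is signed or unsigned.
pub fn byte_from_c_char(c: c_char) -> u8 {
    c as u8
}

/// Converts a numeric code point received over FFI into a `char`, rejecting
/// values which are not Unicode scalar values (e.g., surrogates).
pub fn char_from_code_point(code_point: u32) -> Option<char> {
    char::from_u32(code_point)
}
```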

TODO wchar_t

Non-zero integers

There are (currently unstable) type aliases for non-zero integers in core::ffi. These map to the non-zero integer types in core::num with the correct size for the C integer types. The user must maintain the non-zero invariant (whether that is a safety issue depends on how the types are used); i.e., Rust does not ensure that values with this type are in fact non-zero.

Floating point

A C float is equivalent to a Rust f32 and a C double is equivalent to a Rust f64.

SIMD

SIMD vectors cannot be used in FFI (UB). There is an accepted RFC to address this, but it has not been implemented.

Strings

TODO see string patterns, pointer reference (since C strings are pointers)

Rust, C, and C++ strings

There are many string types in Rust and C/C++. I'll cover them here, focussing on their representations and invariants, since that is what is most important for language interop. For correct FFI, you need to understand a string's layout in memory, whether the string is nul-terminated (and whether nul characters may be embedded within the string), and the encoding of the string (e.g., UTF-8).

Rust

Rust has three classes of string types in the standard library, each of which has owned and borrowed[1] types (the latter of which is usually a dynamically sized type, see the wide pointer section). The owned type is called a "string" and the borrowed type a "str". You could also use a sequence of characters or bytes as strings, or define your own custom string type (see the below section on Windows strings for some examples).

The standard Rust string types are String and str. Both are UTF-8 strings and must always be valid UTF-8. A String is a newtype wrapping a Vec<u8>; str is a built-in type and always has the same representation as a [u8]. This means that a String consists of a pointer (a unique, non-null pointer to a sequence of u8s, i.e., essentially a *mut u8 in terms of representation), a capacity (usize), and a length (usize); the order of these fields in memory is unspecified. A &str is a wide pointer consisting of a (non-null) pointer to a sequence of u8s and a length (usize). However, the order of the components of a wide pointer is unspecified and unstable (i.e., may change in the future).

Rust has the std::ffi::CString and std::ffi::CStr types for working more easily with foreign language string types. These types are not directly FFI compatible with C strings. These strings must be nul-terminated, have no internal nul characters, but do not have to be valid UTF-8. Use the as_ptr method to get an FFI-compatible pointer. The representation of these strings is not part of their interface.
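A sketch of both directions, using C's strlen as the foreign function. The helper names are made up, and the extern block uses the unsafe extern spelling accepted by Rust 1.82 and later.

```rust
use std::ffi::{c_char, CStr, CString};

unsafe extern "C" {
    // size_t strlen(const char *s);
    fn strlen(s: *const c_char) -> usize;
}

/// Rust to C: build a nul-terminated copy and lend it to C for one call.
pub fn c_strlen(text: &str) -> usize {
    let owned = CString::new(text).expect("text must not contain nul bytes");
    // SAFETY: `owned` is nul-terminated and outlives the call.
    unsafe { strlen(owned.as_ptr()) }
}

/// C to Rust: copy a borrowed C string into an owned Rust `String`.
///
/// # Safety
/// `ptr` must point to a valid nul-terminated string.
pub unsafe fn copy_from_c(ptr: *const c_char) -> String {
    // SAFETY: guaranteed by the caller, see above.
    let borrowed = unsafe { CStr::from_ptr(ptr) };
    // C strings need not be UTF-8, so invalid sequences are replaced.
    borrowed.to_string_lossy().into_owned()
}
```

A common mistake is calling as_ptr on a temporary CString, which leaves a dangling pointer once the statement ends; binding the CString to a variable first, as above, avoids that.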

Similar to CString/CStr, std::ffi::OsString and std::ffi::OsStr are meant to make working with foreign string types easier but are not directly FFI compatible with foreign strings. OsString/OsStr are easily convertible to both platform-native strings and Rust strings (String/str). Neither their representation nor whether they are valid Unicode is part of their interface. On Unix platforms, OsStr can be cheaply interconverted with byte slices, however, these are not nul-terminated. On Windows, OsStr can be losslessly converted into a UTF-16 (wide) string, however, this requires copying and processing the string data; again, the output string is not nul-terminated.

[1] Technically, these are just dynamically sized string types and are not intrinsically borrowed (e.g., Box<str> is a valid type). However, in practice these types are nearly always used with borrowed references (e.g., &str) to represent borrowed strings. These are often called string slices since they can be a slice (aka substring) of the underlying string.

C

C strings are pointers to a nul-terminated sequence of chars. They may be declared with either pointer or array types (an array decays to a pointer when passed to a function). C strings do not have a specified encoding; that is, a program is free to interpret a C string as ASCII, UTF-8, UTF-32, or any other encoding.

C++

The C++ standard library includes the string type (which is actually an alias of an instantiated generic type basic_string<char>). Like the C string type it does not have a specified encoding. Its methods are all byte-oriented (i.e., have no concept of a character beyond a char). It is not directly compatible with C strings and its representation is not part of its interface. It is easy to get a C string with the c_str method; whether this is guaranteed to return a pointer to the data in the string or a copy of it depends on the version of C++.

Windows

Windows uses many different string types: HSTRING, BSTR, and the PSTR family of types.

HSTRING is primarily used with WinRT and is immutable. It is usually (but not always) reference counted. It is nul-terminated, but may also include embedded nuls (it stores a length so doesn't rely on nul-termination). It's UTF-16 encoded. Empty strings are represented as a null pointer.

BSTR is primarily used with COM. It is a nul-terminated, mutable, UTF-16 string which may include embedded nuls. A null pointer is a valid BSTR and represents the empty string, though empty BSTRs may also be used. BSTRs always work in conjunction with the system allocator (SysAlloc*) and the length of the string is laid out in memory preceding the data, and a nul character comes after the data in memory; neither is included in the BSTR's length. A BSTR is a pointer and points at the first character, not the length.

The PSTR family of types are 'pointer to char's, pointing to a nul-terminated sequence of characters (similar to C strings). If there is a C in the name it is an immutable string (otherwise it's mutable); if there is a W then the characters are wide (two bytes per character) and the string is UTF-16 encoded. If there is no W, then the characters are one byte and there is no specified encoding (i.e., may be ASCII, UTF-8, or some other encoding; these are compatible with C strings). An L in the name can be ignored, e.g., PCWSTR and LPCWSTR are the same type.

There are Rust bindings for these types in windows-rs and macros for creating some of these string types in Rust. The type bindings are best used only for FFI: most are newtype wrappers of raw pointers, so it is very easy to create dangling pointers and other memory safety errors when using them.

Windows primarily uses UTF-16. Rust does not have UTF-16 strings in its standard library (though as mentioned above, OsString can losslessly handle UTF-16). The widestring crate provides types including several UTF-16 string types which can make working with Windows strings much easier.
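Without extra crates, the standard library can still convert between Rust strings and nul-terminated UTF-16 buffers of the kind a PCWSTR points to. A minimal sketch with made-up helper names:

```rust
/// Encodes `s` as UTF-16 and appends the terminating nul.
pub fn to_wide_nul(s: &str) -> Vec<u16> {
    s.encode_utf16().chain(std::iter::once(0)).collect()
}

/// Decodes UTF-16 code units up to (not including) the first nul, replacing
/// invalid sequences such as unpaired surrogates.
pub fn from_wide_nul(units: &[u16]) -> String {
    let end = units.iter().position(|&u| u == 0).unwrap_or(units.len());
    String::from_utf16_lossy(&units[..end])
}
```

The Vec returned by to_wide_nul must stay alive for as long as foreign code holds a pointer into it.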

FFI with foreign Strings

For the actual FFI, use the Rust string type which agrees with the foreign string type (see table below).

Foreign type                        Rust type
C string ([const] char [const] *)   *{const|mut} c_char
C++ string                          cxx::CxxString
HSTRING                             windows::core::HSTRING
BSTR                                windows::core::BSTR
PSTR/LPSTR                          windows::core::PSTR
PCSTR/LPCSTR                        windows::core::PCSTR
PWSTR/LPWSTR                        windows::core::PWSTR
PCWSTR/LPCWSTR                      windows::core::PCWSTR

Creating most of these strings in Rust is usually possible via some macro or conversion function.

The more interesting question is when and how to convert between the FFI-specific types and more standard Rust types (and which types to use). That is out of scope for the reference, but see TODO patterns.

Memory management

The usual rules of memory management with FFI apply: memory must be released in the same language it was allocated, and using borrowed data is easier.
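For a C string allocated in Rust, this rule can be followed by exporting a matching pair of functions, one which hands the string out and one which takes it back for destruction. The sketch below uses hypothetical function names and C prototypes.

```rust
use std::ffi::{c_char, CStr, CString};

/// Hypothetical C prototype: char *ffi_guide_greeting_new(void);
/// Ownership of the returned string passes to the caller.
pub extern "C" fn ffi_guide_greeting_new() -> *mut c_char {
    CString::new("hello from Rust")
        .expect("literal has no interior nul")
        .into_raw()
}

/// Hypothetical C prototype: void ffi_guide_greeting_free(char *s);
///
/// # Safety
/// `s` must be null or a pointer returned by `ffi_guide_greeting_new` which
/// has not been freed yet.
pub unsafe extern "C" fn ffi_guide_greeting_free(s: *mut c_char) {
    if s.is_null() {
        return;
    }
    // SAFETY: guaranteed by the caller; Rust's allocator reclaims the memory.
    drop(unsafe { CString::from_raw(s) });
}

/// Plays the role of the foreign caller: borrow, copy, then give it back.
pub fn round_trip() -> String {
    let raw = ffi_guide_greeting_new();
    // SAFETY: `raw` is a valid nul-terminated string until it is freed below.
    let borrowed = unsafe { CStr::from_ptr(raw) };
    let copy = borrowed.to_string_lossy().into_owned();
    // SAFETY: `raw` came from `ffi_guide_greeting_new` and is freed only once.
    unsafe { ffi_guide_greeting_free(raw) };
    copy
}
```

Calling C's free on the pointer instead would be a bug, since the memory did not come from malloc.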

FFI with Rust Strings

It is possible to pass Rust strings across FFI to foreign functions. However, if you are designing an API, it is usually easier to use foreign strings in the FFI and convert these to and from Rust strings internally in Rust code.

If you manipulate the contents of the strings (either in foreign code or unsafe Rust code), then you must respect both the usual invariants around pointers, and Rust's string invariants (from String docs):

  • the memory must have been allocated by the same allocator the standard library uses, with a required alignment of exactly 1,
  • the length of the string must be less than or equal to its capacity,
  • the capacity of the string must be the correct size of the allocation,
  • the first length bytes of the string must be valid UTF-8.

Note that if you are using the string types in Rust functions with foreign bindings, then you must establish these invariants in the foreign code. Doing so in the Rust code is likely to be unsound.
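To make the invariants listed above concrete, the sketch below splits a String into its raw parts and rebuilds it with String::from_raw_parts. The helper names are made up; the round trip is only sound because all three parts are returned unchanged.

```rust
use std::mem::ManuallyDrop;

/// Gives up ownership of `s`, returning its (pointer, length, capacity).
pub fn into_parts(s: String) -> (*mut u8, usize, usize) {
    // `ManuallyDrop` stops the destructor from freeing the buffer here.
    let mut s = ManuallyDrop::new(s);
    (s.as_mut_ptr(), s.len(), s.capacity())
}

/// Rebuilds the `String`, which takes ownership of the buffer again.
///
/// # Safety
/// The parts must come from `into_parts`, must be used at most once, and the
/// first `len` bytes must still be valid UTF-8.
pub unsafe fn from_parts(ptr: *mut u8, len: usize, cap: usize) -> String {
    // SAFETY: guaranteed by the caller, see above.
    unsafe { String::from_raw_parts(ptr, len, cap) }
}

pub fn round_trip(text: &str) -> String {
    let (ptr, len, cap) = into_parts(text.to_owned());
    // Foreign code could hold (ptr, len, cap) here, as long as it upholds
    // the invariants if it touches the bytes.
    unsafe { from_parts(ptr, len, cap) }
}
```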

To pass a Rust string to C++, you can use Cxx's bindings for String or &str.

To pass a Rust string to C, you can use a struct with the correct layout (you could look at the standard library source code, or just use the Cxx bindings as a reference).
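One possible shape for such a struct is a repr(C) pointer-and-length pair describing a borrowed &str. StrView and its methods are hypothetical names, not part of any library, and the struct's layout is fixed by repr(C) rather than copied from the standard library.

```rust
/// Hypothetical C counterpart: struct StrView { const uint8_t *ptr; size_t len; };
/// The bytes are UTF-8 and are NOT nul-terminated.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct StrView {
    pub ptr: *const u8,
    pub len: usize,
}

impl StrView {
    /// Borrows `s`; the view must not outlive it.
    pub fn new(s: &str) -> Self {
        StrView { ptr: s.as_ptr(), len: s.len() }
    }

    /// # Safety
    /// `ptr` must point to `len` readable bytes which stay alive and
    /// unmodified for the lifetime `'a`.
    pub unsafe fn as_str<'a>(&self) -> Option<&'a str> {
        // SAFETY: guaranteed by the caller, see above.
        let bytes = unsafe { std::slice::from_raw_parts(self.ptr, self.len) };
        // Re-validate, since foreign code may have produced the bytes.
        std::str::from_utf8(bytes).ok()
    }
}

pub fn view_round_trip(s: &str) -> Option<String> {
    let view = StrView::new(s);
    // SAFETY: `s` is alive and unmodified while `view` is in use.
    let borrowed = unsafe { view.as_str() };
    borrowed.map(str::to_owned)
}
```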

Memory management

The easiest scenario is to create a String in Rust, pass a borrowed &str to foreign code and ensure that the foreign code does not store the pointer, pass it to another thread, call its destructor, or deallocate it.

If you must store the string in foreign code, then you must pass the owned type String. In this case, you must ensure the pointer remains unique (in particular, you must not keep a reference in the Rust code) and pass it back to Rust for destruction.

If you allocate memory for the string in foreign code, then you must not run its destructor in Rust, and you must pass the string back to foreign code for destruction. The easiest way to do that is to pass &str to Rust. If you must pass String (or a raw pointer used to produce a String in Rust code), then you must ensure that there is no copy of the pointer kept in foreign code, and that the pointer is returned to foreign code for destruction. Using a custom reference counted type might be a better alternative, see TODO pattern.

Resources

Tooling

Bindgen is the most popular and mature tool and is maintained by the Rust project. It is used to create bindings for C code (and some C++) in Rust code. Cbindgen can be used to create C bindings to Rust code. The other tools below are for C++ interop; cxx is the current favourite tool with the community, but is not suitable for all use cases.

You may want to use COM/WinRT for inter-language interaction; the best Rust support for COM and WinRT is in windows-rs.

Documentation

Unsafe programming

Resources for learning about unsafe programming: