Types
Data model
C/C++ compilers implement a data model that affects what width the standard types are. The general rule is that:
1 == sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)
The four common data models in C++ are:
- LP32 -
int
is 16-bit,long
and pointers are 32-bit. This is an uncommon model, a throw-back to DOS / Windows 3.1 - ILP32 -
int
,long
and pointers are 32-bit. Used by Win32, Linux, OS X - LLP64 -
int
andlong
are 32-bit,long long
and pointers are 64-bit. Used by Win64 - LP64 -
int
is 32-bit,long
/long long
and pointers are 64-bit. Used by Linux, OS X
As you can see, potentially everything all the way to long long
could be a single byte, or there could be some other crazy definition. In practice however the models above are the most common.
Comparing C/C++ types to Rust
For this section, we'll cover the most likely analogous types between Rust and C/C++.
C/C++ | Rust | Notes |
---|---|---|
char |
i8 (or u8 ) |
The signedness of a C++ char can be signed or unsigned - the assumption here is signed but it varies by target system. A Rust char is not the same as a C/C++ char since it can hold any Unicode character. 1 |
unsigned char |
u8 |
|
signed char |
i8 |
|
short int |
i16 |
|
unsigned short int |
u16 |
|
(signed) int |
i32 or i16 |
In C/C++ this is data model dependent 2 |
unsigned int |
u32 or u16 |
In C/C++ this is data model dependent 2 |
(signed) long int |
i32 or i64 |
In C/C++ this is data model dependent 2 |
unsigned long int |
u32 or u64 |
In C/C++ this is data model dependent 2 |
(signed) long long int |
i64 |
|
unsigned long long int |
u64 |
|
size_t |
usize |
usize holds numbers as large as the address space 3 |
float |
f32 |
|
double |
f64 |
|
long double |
f128 support was present in Rust but removed due to issues for some platforms in implementing it. | |
bool |
bool |
|
void |
() |
The unit type (see below) |
1 Rust's char
type, is 4 bytes wide, enough to hold any Unicode character. This is equivalent to the belated char32_t
that appears in C++11 to rectify the abused C++98 wchar_t
type which on operating systems such as Windows is only 2 bytes wide. When you iterate strings in Rust you may do so either by character or u8
, i.e. a byte.
2 See the next section to for a discussion on data models.
3 Rust has a specific numeric type for indexing on arrays and collections called usize
. A usize
is designed to be able to reference as many elements in an array as there is addressable memory. i.e. if memory is 64-bit addressable then usize is 64-bits in length. There is also a signed isize
which is less used but also available.
stdint.h / cstdint
C provides a <stdint.h>
header that provides unambigious typedefs with length and signedess, e.g. uint32_t
. The equivalent in C++ is <cstdlib>
.
If you use the types defined in this header file the types become directly analogous and unambiguous between C/C++ and Rust.
C/C++ | Rust |
---|---|
int8_t |
i8 |
uint8_t |
u8 |
int16_t |
i16 |
uint16_t |
u16 |
uint32_t |
u32 |
int32_t |
i32 |
int64_t |
i64 |
uint64_t |
u64 |
Machine types under the covers
C/C++ and Rust will share the same machine types for each corresponding language type and the same compiler / backend technology, i.e.:
- Signed types are two's complement
- IEE 754-2008 binary32 and binary64 floating points for float and double precision types.
Integer types
C++
C/C++ has primitive types for numeric values, floating point values and booleans. Strings will be dealt in a separate section.
Integer types (char
, short
, int
, long
) come in signed
and unsigned
versions.
A char
is always 8-bits, but for historical reasons, the standards only guarantee the other types are "at least" a certain number of bits. So an int
is ordinarily 32-bits but the standard only say it should be at least as large as a short
, so potentially it could be 16-bits!
More recent versions of C and C++ provide a <cstdint>
(or <stdint.h>
for C) with typedefs that are unambiguous about their precision.
Even though <stdint.h>
can clear up the ambiguities, code frequently sacrifices correctness for terseness. It is not unusual to see an int
used as a temporary incremental value in a loop:
string s = read_file();
for (int i = 0; i < s.size(); ++i) {
//...
}
While int
is unlikely to fail for most loops in a modern compiler supporting ILP32 or greater, it is still technically wrong. In a LP32 data model incrementing 32767 by one would become -32768 so this loop would never terminate if s.size()
was a value greater than that.
But look again at this snippet. What if the file read by read_file()
is outside of our control. What if someone deliberately or accidentally feeds us a file so large that our loop will fail trying to iterate over it? In doing so our code is hopelessly broken.
This loop should be using the same type returned from string::size()
which is an opaque unsigned integer type called size_type
. This is usually a typedef for std::size_t
but not necessarily. Thus we have a type mismatch. A string
has an iterator which could be used instead but perhaps you need the index for some reason, but it can messy:
string s = read_file();
for (string::iterator i = s.begin(); i != s.end(); ++i) {
string::difference_type idx = std::distance(s.begin(), i);
//...
}
Now we've swapped from one opaque type size_type
to another called difference_type
. Ugh.
C/C++ types can also be needlessly wordy such as unsigned long long int
. Again, this sort of puffery encourages code to make bad assumptions, use a less wordy type, or bloat the code with typedefs.
Rust
Rust benefits from integer types that unambiguously denote their signedness and width in their name - i16
, u8
etc.
They are also extremely terse making it easy to declare and use them. For example a u32
is an unsigned 32-bit integer. An i64
is a signed 64-bit integer.
Types may be inferred or explicitly prefixed to the value:
let v1 = 1000;
let v2 : u32 = 25;
let v3 = 126i8;
Rust also has two types called usize
and isize
respectively. These are equivalent to size_t
in that they are as large enough to hold as many elements as there is addressable memory. So in a 32-bit operating system they will be 32-bits in size, in a 64-bit operating system they will be 64-bits in size.
Rust will not implicitly coerce an integer from one size to another without explicit use of the as
keyword.
let v1 = 1000u32;
let v2: u16 = v1 as u16;
Real types
C++
C/C++ has float, double and long double precision floating point types and they suffer the same vagueness as integer types.
float
double
- "at least as much precision as afloat
"long double
- "at least as much precision as adouble
"
In most compilers and architectures however a float is a 32-bit single precision value, and a double is an 64-bit double precision value. The most common machine representation is the IEEE 754-2008 format.
Long double
The long double
has proven quite problematic for compilers. Despite expectations that it is a quadruple precision value it usually isn't. Some compilers such as gcc may offer 80-bit extended precision on x86 processors with a floating point unit but it is implementation defined behaviour.
The Microsoft Visual C++ compiler treats it with the same precision as a double
. Other architectures may treat it as quadruple precision. The fundamental problem with long double
is that most desktop processors do not have the ability in hardware to perform 128-bit floating point operations so a compiler must either implement it in software or not bother.
Math functions
The <math.h>
C header provides math functions for working with different precision types.
#include <math.h>
const double PI = 3.1415927;
double result = cos(45.0 * PI / 180.0);
//..
double result2 = abs(-124.77);
//..
float result3 = sqrtf(9.0f);
//
long double result4 = powl(9,10);
Note how different calls are required according to the precision, e.g. sinf, sin or sinl. C99 supplies a "type-generic" set of macros in <tgmath.h>
which allows sin to be used regardless of type.
C++11 provides a <cmath>
that uses specialised inline functions for the same purpose:
#include <cmath>
float result = std::sqrt(9.0f);
Rust
Rust implements two floating point types - f32
and f64
. These would be analogous to a 32-bit float
and 64-bit double
in C/C++.
let v1 = 10.0;
let v2 = 99.99f32;
let v3 = -10e4f64;
Unlike in C/C++, the math functions are directly bound to the type itself providing you properly qualify the type.
let result = 10.0f32.sqrt();
//
let degrees = 45.0f64;
let result2 = angle.to_radians().cos();
Rust does not have a 128-bit double. A f128
did exist for a period of time but was removed to portability, complexity and maintenance issues. Note how long double
is treated (or not) according to the compiler and target platform.
At some point Rust might get a f128 or f80 but at this time does not have such a type.
Booleans
A bool
(boolean) type in C/C++ can have the value true
or false
, however it can be promoted to an integer type (0 == false
, 1 == true
) and a bool even has a ++ operator for turning false to true although it has no -- operator!?
But inverting true with a ! becomes false and vice versa.
!false == true
!true == false
Rust also has a bool
type that can have the value true
or false
. Unlike C/C++ it is a true type with no promotion to integer type
void / Unit type
C/C++ uses void
to specify a type of nothing or an indeterminate pointer to something.
// A function that doesn't return anything
void delete_directory(const std::string &path);
// Indeterminate pointer use
struct file_stat {
uint32_t creation_date;
uint32_t last_modified;
char file_name[MAX_PATH + 1];
};
// malloc returns a void * which must be cast to the type need
file_stat *s = (file_stat *) malloc(sizeof(file_stat));
// But casting is not required when going back to void *
free(s);
The nearest thing to void
in Rust is the Unit type. It's called a Unit type because it's type is ()
and it has one value of ()
.
Technically void
is absolutely nothing and ()
is a single value of type ()
so they're not analogous but they serve a similar purpose.
When a block evaluates to nothing it returns ()
. We can also use it in places where we don't care about one parameter. e.g. say we have a function do_action()
that succeeds or fails for various reasons. We don't need any payload with the Ok response so specify ()
as the payload of success:
fn do_action() -> Result<(), String> {
//...
Result::Ok(())
}
let result = do_action();
if result.is_ok() {
println!("Success!");
}
Empty enums
Rust does have something closer (but not the same as) void
- empty enumerations.
enum Void {}
Essentially this enum has no values at all so anything that assigns or matches this nothing-ness is unreachable and the compiler can issue warnings or errors. If the code had used ()
the compiler might not be able to determine this.
Tuples
A tuple is a collection of values of the same or different type passed to a function or returned by one as if it were a single value.
C/C++ has no concept of a tuple primitive type, however C++11 can construct a tuple using a template:
std::tuple<std::string, int> v1 = std::make_tuple("Sally", 25);
//
std::cout << "Name = " << std::get<0>(v1)
<< ", age = " << std::get<1>(v1) << std::endl;
Rust supports tuples as part of its language:
let v1 = ("Sally", 25);
println!("Name = {}, age = {}", v1.0, v1.1);
As you can see this is more terse and more useful. Note that the way a tuple is indexed is different from an array though, values are indexed via .0, .1 etc.
Tuples can also be returned by functions and assignment operators can ignore tuple members we're not interested in.
let (x, y, _) = calculate_coords();
println!("x = {}, y = {}", x, y);
//...
pub fn calculate_coords() -> (i16, i16, i16) {
(11, 200, -33)
}
In this example, the calculate_coords() function returns a tuple containing three i16
values. We assign the first two values to x
and y
respectively and ignore the third by passing an underscore. The underscore tells the compiler we're aware of the 3rd value but we just don't care about it.
Tuples can be particularly useful with code blocks. For example, let's say we want to get some values from a piece of code that uses a guard lock on a reference counted service. We can lock the service in the block and return all the values as a tuple to the recipients outside of the block:
let protected_service: Arc<Mutex<ProtectedService>> = Arc::new(Mutex::new(ProtectedService::new()));
//...
let (host, port, url) = {
// Lock and acquire access to ProtectedService
let protected_service = protected_service.lock().unwrap();
let host = protected_service.host();
let port = protected_service.port();
let url = protected_service.url();
(host, port, url)
}
This code is really neat - the lock allows us to obtain the values, the lock goes out of scope and the values are returned in one go.
Arrays
An array is a fixed size list of elements in a contiguous memory location that can be referenced by an index. Arrays can be allocated either on the stack or the heap.
E.g to create a 100 element array of double
values in C++ / C using the language features:
// Stack (uninitialized)
double values[100]; // ?,?,?,?,?,...
// Stack with assignment
double values[100] = [1, 2, 3]; // 1,2,3,?,?,?,?,...
// Heap
double *values = new double[100]; // ?,?,?,?,?,...
delete []values;
// C99 initialized arrays
double values[100] = { }; // 0,0,0,0,0,...
double values[100] = {1, 2, 3}; // 1,2,3,0,0,0,0...
// C99 initialized arrays with designators
double values[100] = {1, 2, 3, [99] = 99}; // 1,2,3,0,0,0,...,0,99
// C++ doesn't need the assignment
double values[100] {1, 2, 3}; // 1,2,3,0,0,0,0...
As can be seen, arrays have evolved a lot to resolve issues using uninitialized data but it is also leads to a lot of variation in how they are defined. Designators can be be incredibly powerful.
The language also doesn't help you know what the size of an array is, so you will often see code like this:
// Number of elements is the size of the entire array divided by the size of one element
int len = sizeof(values) / sizeof(values[0]);
But this isn't the end of it because C++ also defines std::array
which is slightly more convenient for having size()
, empty()
, begin()
, end()
etc. making it similar to other kinds of collection:
#include <array>
//...
std::array values {1, 2, 3};
for (int i = 0; i < values.size(); i++) {
//...
}
Rust has a less powerful syntax than is possible with initialized arrays in C++ but it is also less ambiguous:
// Stack
let values = [0f64; 100]; // 100 elements initialised to 0
let values = [1f64, 2f64, 3f64]; // 3 elements 1,2,3
// Heap
let values = Box::new([0f64; 100]);
Note how Rust provides a shorthand to initialise the array with the same value or assigns the array with every value. Initialisation in C and C++ is optional but it is more expressive in that portions of the array can be set or not set using enclosed list syntax.
But Rust forces you to initialise an array to something, ensuring the content of the array is predictable. Attempting to declare an array without assigning it a value is a compiler error.
In addition, a Rust array coerces to be a slice &[T]
, so methods like len()
, is_empty()
, get()
, swap()
, reverse()
are all instantly available:
// Reverse the order of values in this array in-place
let mut values = [1, 2, 3, 4];
values.reverse();
println!("Values = {:?}", values);
Multi-dimensional arrays
Slices
A slice is a runtime view of a part of an array or string. A slice is not a copy of the array / string rather that it is a reference to a portion of it. The reference holds a pointer to the starting element and the number of elements in the slice.
let array = ["Mary", "Sue", "Bob", "Michael"];
println!("{:?}", array);
let slice = &array[2..];
println!("{:?}", slice);
This slice represents the portion of array starting from index 2.
["Mary", "Sue", "Bob", "Michael"]
["Bob", "Michael"]
Size of the array
C and C++ basically give no easy way to know the length of the array unless you encapsulate the array with a std::array
or happen to remember it from the code that declares it.
// C++11
std::array<Element, 100> elements;
std::cout << "Size of array = " << elements.size() << std::endl;
The std::array
wrapper is of limited use because you cannot pass arrays of an unknown size to a function. Therefore even with this template you may pass the array into a function as one argument and its size as another.
Alternatively you might see code like this:
const size_t num_elements = 1024;
char buffer[num_elements];
//...
// fill_buffer needs to be told how many elements there are
fill_buffer(buffer, num_elements);
Or like this
Element elements[100];
//...
int num_elements = sizeof(elements) / sizeof(Element);
In Rust, the array has a function bound to it called len()
. This always provides the length of the array. In addition if we take a slice of the array, that also has a len()
.
let buffer: [u8; 1024]
println!("Buffer length = {}", buffer.len());
fill_buffer(&buffer[0..10]);
//...
fn fill_buffer(elements: &[Element]) {
println!("Number of elements = {}", elements.len());
}