Mutation - Part 1: Variables
What is a variable?
Ok, here there is really a lot that could be said, and a simple blog post will not cover it. I will just talk about two important and distinct thinking patterns involved:
the idea of binding a name to a value;
the idea of reading/writing values from/into memory;
1. Identity binding
The best example in Rust for “just a binding” (i.e., with no backing memory) is const
1type is = i32;
2
3const x :is= the_answer + 27;
4const the_answer :is= 42;
5Here we have defined the_answer as an alias for the value 42; meaning that, from now on, everytime we say the_answer what we actually mean is 42.
We are at the meta level: if we ask the program to evaluate x, what we are actually1 asking the program to do is to evaluate the_answer + 27, i.e., to evaluate 42 + 27.
The same in C, with #define:
1#define x (the_answer + 27)
2
3#define the_answer (42)If you look at both code snippets, you can notice that there is no notion of temporality (regarding the definitions):
It is all about essence / identity.
Thus, if the_answer is and will always be 42, there is no room for mutation.
42cannot mutate and become43;42is and will always be1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 02in a similar manner,
the_answercannot change either, since it was defined as being 42: it is an identity, and…
and yet … we feel like we should be able to redefine what
the_answeris, don’t we?re-define?
This implies some form of temporality, and for that there are two solutions:
scoped bindings (and shadowing).
a binding to (runtime) memory.
1.1 Scoped bindings
You can limit the “scope” of your identity-definition:
1// undefined
2{const the_answer :is= 42;
3 // defined
4}
5// undefined
6This is one of the ways to have some form of temporality with bindings: time progresses as we enter and leave scopes.
1{const the_answer :is= 42;
2 {const x :is= the_answer + 27;
3 // x -> the_answer + 27 -> 42 + 27
4 }
5}
6But what about re-defining
the_answer?
By using a narrower scope, we can indeed re-define the value of the_answer, even though what we are actually doing is defining a brand new the_answer that happens to lead to a name collision. But given the different scopes and their temporality, this is unambiguous and thus allowed:
1{const the_answer :is= 42;
2 // 'the_answer' <-> 42
3 {const the_answer :is= 0;
4 // 'the_answer' <-> 0
5 }
6 // 'the_answer' <-> 42
7}
8This trick is called shadowing and is a clever way to avoid using mutation when it is not needed.
2. Runtime memory
We are so used to manipulating memory through variables that I have defined some functions to pretend we are manipulating low-level memory; it thus provides “primitives / intrinsics” that allow us to read/write from/into the memory and also to print values:
1#![allow(non_camel_case_types, non_snake_case)]
2
3/// 4Ko RAM?
4static mut MEMORY: [u8; 4096] = [0; 4096];
5type address = usize;
6
7fn READ (whence: address) -> u8
8{
9 unsafe { MEMORY[whence] }
10}
11
12fn WRITE (where_to: address, what: u8)
13{
14 unsafe { MEMORY[where_to] = what }
15}
16
17use ::std::dbg as PRINT;
18
19fn main ()
20{
21 WRITE(1000, 42);
22 WRITE(1001, 27);
23 PRINT!(READ(1000)); // -> 42
24 WRITE(1000, READ(1000) + READ(1001));
25 PRINT!(READ(1000)); // -> 69
26}
27This way, we can see that we were able to
write at the address
1000the value42;then write at that same address the value
69.
We have successfully mutated something!
Good, now we have a clearer understanding of what mutation actually is: modifying memory contents.
But what have variables to do with any of this?
2.1 Runtime memory with variables
The following sentence explains the relation between bindings and memory:
A (runtime) variable is a (static) binding to some place in memory, and is used to read/write from/into that place in memory
const at_the_answer: address = 1000;
const at_x: address = 1001;
fn main ()
{
WRITE(at_the_answer, 42);
WRITE(at_x, 27);
PRINT(READ(at_the_answer)); // -> 42
WRITE(at_the_answer, READ(at_the_answer) + READ(at_x));
PRINT!(READ(at_the_answer)); // -> 69
}
And now with some sugar:
const at_the_answer: address = 1000;
const at_x: address = 1001;
main! {
the_answer <- 42;
x <- 27;
PRINT!(the_answer); // -> 42
the_answer <- the_answer + x;
PRINT!(the_answer); // -> 69
}
How does that sugar work? Let’s describe the unsugaring of line 139:
The statement
the_answer <- the_answer + x;is identified as an assignment: a value (right hand side) is written to memory (left hand side)First the right hand side of the assignment (
the_answer + x) is treated, and every variable occurrence is replaced with READ memory access:the_answerbecomesREAD(at_the_answer)xbecomesREAD(at_x)
Then the left hand side of the assignment (
the_answer) is treated, and the variable is replaced with a WRITE memory access:the_answer <- ...becomesWRITE(at_the_answer, ...)
We end up with the following unsugaring:
WRITE(at_the_answer, READ(at_the_answer) + READ(at_x));
Hence my aforementioned definition of a variable (let’s repeat it):
A (runtime) variable is a (static) binding to some place in memory, and is used to read/write from/into that place in memory
In memory? But where exactly?
static vs let
In Rust, there are two3 ways to declare a variable:
static, for global variables; the variable refers to a section of memory statically laid out before the program runs, and which is never deallocated.There often are two such sections:
The
.rodatasection, which is read-only. Attempting to write in this section will result in a memory violation, a.k.a. a Segmentation Fault.The
.datasection4, which is both readable and writable.
let5, for local variables: the variable refers to a section of memory called the stack. It is always readable and writable.Clever pointer arithmetic using two6 CPU registers leads to a very useful abstraction: stack frames, which can be created and accumulated on top of each other using LIFO semantics (hence the name of the section).
They thus become very well suited for scoped memory, such as function locals:
When a scope starts, the compiler looks at all the local variables declared within that scope (e.g.,
x: i32andy: u8withinfoo’s function);Then a new stack frame is created by shifting the stack pointer(s), in order to reserve enough memory for all those variables (e.g.,
4 + 1 = 5bytes are allocated infoo’s stack frame);Within that function’s body, each “variable” is just a statically known offset from the beginning of the stack frame (e.g.,
at_xandat_yare the offsets0and4from the beginning7 of the stack frame). And as said previously, each “variable access” is actually the result of dereferencing8 those offsets, each time;When the scope ends, the frame is automatically and inevitably
pop-ped off, i.e., freed.
We finally have the bases required for thinking about mutation in Rust!
Appendix
Full unsugaring code
#![allow(non_upper_case_globals, non_snake_case)]
/// 4Ko RAM?
static mut MEMORY: [u8; 4096] = [0; 4096];
fn READ (whence: address) -> u8
{
unsafe { MEMORY[whence] }
}
fn WRITE (where_to: address, what: u8)
{
unsafe { MEMORY[where_to] = what }
}
/// Recursively replaces:
/// the_answer => READ(at_the_answer)
/// x => READ(at_x)
macro_rules! replace_rvalues {
(
[ $($replaced:tt)* ]
the_answer
$($rest:tt)*
) => (replace_rvalues!(
[ $($replaced)* READ(at_the_answer) ]
$($rest)*
));
(
[ $($replaced:tt)* ]
x
$($rest:tt)*
) => (replace_rvalues!(
[ $($replaced)* READ(at_x) ]
$($rest)*
));
(
[ $($replaced:tt)* ]
$tt:tt
$($rest:tt)*
) => (replace_rvalues!(
[ $($replaced)* $tt ]
$($rest)*
));
(
[ $($replaced:tt)* ]
) => (
$($replaced)*
);
( $($tt:tt)* ) => (
replace_rvalues!([] $($tt)*)
)
}
macro_rules! exec {
/* replaces */ (
the_answer <- $($tt:tt)*
) => /* with */ (
WRITE(at_the_answer,
replace_rvalues!( $($tt)* )
)
);
/* replaces */ (
x <- $($tt:tt)*
) => /* WITH */ (
WRITE(at_x,
replace_rvalues!( $($tt)* )
)
);
/* replaces */ (
PRINT ( $($tt:tt)* )
) => /* with */ (
println!("{} = {}", stringify!($($tt)*), replace_rvalues!( $($tt)* ));
);
}
macro_rules! main {
(@parsing
[ $($stmts:tt)* ]
[ $($current:tt)* ]
;
$($rest:tt)*
) => (main!{@parsing
[ $($stmts)* ( $($current)* ) ]
[]
$($rest)*
});
(@parsing
$stmts:tt
[ $($current:tt)* ]
$tt:tt $($rest:tt)*
) => (main!{@parsing
$stmts
[ $($current)* $tt]
$($rest)*
});
(@parsing
[ $( ( $($stmts:tt)* ) )* ]
[]
) => (
fn main () {
$(
exec!{ $($stmts)* }
)*
}
);
(
$($tt:tt)*
) => (main!{@parsing
[]
[]
$($tt)*
});
}
#[allow(non_camel_case_types)]
type address = usize;
const at_the_answer: address = 1000;
const at_x: address = 1001;
main! {
the_answer <- 42;
x <- 27;
PRINT(the_answer); // -> 42
the_answer <- the_answer + x;
PRINT(the_answer); // -> 69
}
- i.e., on runtime [return]
- The representation of a number (here in base ten) vs its true nature is a (very interesting) topic in and on itself, which will most probably deserve its own blog post. One great definition / view of a (natural) number’s nature is Peano’s arithmetic, with which one can prove that
1+1+0 + 1+1+0 = 1+1+1+1+0[return] constdoes not declare a variable, but a meta binding. [return]- and
.bssfor initially zero-ed (global) memory [return] - and pattern bindings such as
matcharms or function parameters [return] - or can even use just one register (since the allocations are all static) [return]
- offsets are actually negative and do not start at 0, but those are just implementation details [return]
- accessing the memory at a given address [return]