Mutation - Part 1: Variables
What is a variable?
Ok, here there is really a lot that could be said, and a simple blog post will not cover it. I will just talk about two important and distinct thinking patterns involved:
the idea of binding a name to a value;
the idea of reading/writing values from/into memory;
1. Identity binding
The best example in Rust for “just a binding” (i.e., with no backing memory) is const
1type is = i32;
2
3const x :is= the_answer + 27;
4const the_answer :is= 42;
5
Here we have defined the_answer
as an alias for the value 42
; meaning that, from now on, everytime we say the_answer
what we actually mean is 42
.
We are at the meta level: if we ask the program to evaluate x
, what we are actually1 asking the program to do is to evaluate the_answer + 27
, i.e., to evaluate 42 + 27
.
The same in C, with #define
:
1#define x (the_answer + 27)
2
3#define the_answer (42)
If you look at both code snippets, you can notice that there is no notion of temporality (regarding the definitions):
It is all about essence / identity.
Thus, if the_answer
is and will always be 42
, there is no room for mutation.
42
cannot mutate and become43
;42
is and will always be1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 0
2in a similar manner,
the_answer
cannot change either, since it was defined as being 42: it is an identity, and…and yet … we feel like we should be able to redefine what
the_answer
is, don’t we?re-define?
This implies some form of temporality, and for that there are two solutions:
scoped bindings (and shadowing).
a binding to (runtime) memory.
1.1 Scoped bindings
You can limit the “scope” of your identity-definition:
1// undefined
2{const the_answer :is= 42;
3 // defined
4}
5// undefined
6
This is one of the ways to have some form of temporality with bindings: time progresses as we enter and leave scopes.
1{const the_answer :is= 42;
2 {const x :is= the_answer + 27;
3 // x -> the_answer + 27 -> 42 + 27
4 }
5}
6
But what about re-defining
the_answer
?
By using a narrower scope, we can indeed re-define the value of the_answer
, even though what we are actually doing is defining a brand new the_answer
that happens to lead to a name collision. But given the different scopes and their temporality, this is unambiguous and thus allowed:
1{const the_answer :is= 42;
2 // 'the_answer' <-> 42
3 {const the_answer :is= 0;
4 // 'the_answer' <-> 0
5 }
6 // 'the_answer' <-> 42
7}
8
This trick is called shadowing and is a clever way to avoid using mutation when it is not needed.
2. Runtime memory
We are so used to manipulating memory through variables that I have defined some functions to pretend we are manipulating low-level memory; it thus provides “primitives / intrinsics” that allow us to read/write from/into the memory and also to print values:
1#![allow(non_camel_case_types, non_snake_case)]
2
3/// 4Ko RAM?
4static mut MEMORY: [u8; 4096] = [0; 4096];
5type address = usize;
6
7fn READ (whence: address) -> u8
8{
9 unsafe { MEMORY[whence] }
10}
11
12fn WRITE (where_to: address, what: u8)
13{
14 unsafe { MEMORY[where_to] = what }
15}
16
17use ::std::dbg as PRINT;
18
19fn main ()
20{
21 WRITE(1000, 42);
22 WRITE(1001, 27);
23 PRINT!(READ(1000)); // -> 42
24 WRITE(1000, READ(1000) + READ(1001));
25 PRINT!(READ(1000)); // -> 69
26}
27
This way, we can see that we were able to
write at the address
1000
the value42
;then write at that same address the value
69
.
We have successfully mutated something!
Good, now we have a clearer understanding of what mutation actually is: modifying memory contents.
But what have variables to do with any of this?
2.1 Runtime memory with variables
The following sentence explains the relation between bindings and memory:
A (runtime) variable is a (static) binding to some place in memory, and is used to read/write from/into that place in memory
const at_the_answer: address = 1000;
const at_x: address = 1001;
fn main ()
{
WRITE(at_the_answer, 42);
WRITE(at_x, 27);
PRINT(READ(at_the_answer)); // -> 42
WRITE(at_the_answer, READ(at_the_answer) + READ(at_x));
PRINT!(READ(at_the_answer)); // -> 69
}
And now with some sugar:
const at_the_answer: address = 1000;
const at_x: address = 1001;
main! {
the_answer <- 42;
x <- 27;
PRINT!(the_answer); // -> 42
the_answer <- the_answer + x;
PRINT!(the_answer); // -> 69
}
How does that sugar work? Let’s describe the unsugaring of line 139
:
The statement
the_answer <- the_answer + x;
is identified as an assignment: a value (right hand side) is written to memory (left hand side)First the right hand side of the assignment (
the_answer + x
) is treated, and every variable occurrence is replaced with READ memory access:the_answer
becomesREAD(at_the_answer)
x
becomesREAD(at_x)
Then the left hand side of the assignment (
the_answer
) is treated, and the variable is replaced with a WRITE memory access:the_answer <- ...
becomesWRITE(at_the_answer, ...)
We end up with the following unsugaring:
WRITE(at_the_answer, READ(at_the_answer) + READ(at_x));
Hence my aforementioned definition of a variable (let’s repeat it):
A (runtime) variable is a (static) binding to some place in memory, and is used to read/write from/into that place in memory
In memory? But where exactly?
static
vs let
In Rust, there are two3 ways to declare a variable:
static
, for global variables; the variable refers to a section of memory statically laid out before the program runs, and which is never deallocated.There often are two such sections:
The
.rodata
section, which is read-only. Attempting to write in this section will result in a memory violation, a.k.a. a Segmentation Fault.The
.data
section4, which is both readable and writable.
let
5, for local variables: the variable refers to a section of memory called the stack. It is always readable and writable.Clever pointer arithmetic using two6 CPU registers leads to a very useful abstraction: stack frames, which can be created and accumulated on top of each other using LIFO semantics (hence the name of the section).
They thus become very well suited for scoped memory, such as function locals:
When a scope starts, the compiler looks at all the local variables declared within that scope (e.g.,
x: i32
andy: u8
withinfoo
’s function);Then a new stack frame is created by shifting the stack pointer(s), in order to reserve enough memory for all those variables (e.g.,
4 + 1 = 5
bytes are allocated infoo
’s stack frame);Within that function’s body, each “variable” is just a statically known offset from the beginning of the stack frame (e.g.,
at_x
andat_y
are the offsets0
and4
from the beginning7 of the stack frame). And as said previously, each “variable access” is actually the result of dereferencing8 those offsets, each time;When the scope ends, the frame is automatically and inevitably
pop
-ped off, i.e., freed.
We finally have the bases required for thinking about mutation in Rust!
Appendix
Full unsugaring code
#![allow(non_upper_case_globals, non_snake_case)]
/// 4Ko RAM?
static mut MEMORY: [u8; 4096] = [0; 4096];
fn READ (whence: address) -> u8
{
unsafe { MEMORY[whence] }
}
fn WRITE (where_to: address, what: u8)
{
unsafe { MEMORY[where_to] = what }
}
/// Recursively replaces:
/// the_answer => READ(at_the_answer)
/// x => READ(at_x)
macro_rules! replace_rvalues {
(
[ $($replaced:tt)* ]
the_answer
$($rest:tt)*
) => (replace_rvalues!(
[ $($replaced)* READ(at_the_answer) ]
$($rest)*
));
(
[ $($replaced:tt)* ]
x
$($rest:tt)*
) => (replace_rvalues!(
[ $($replaced)* READ(at_x) ]
$($rest)*
));
(
[ $($replaced:tt)* ]
$tt:tt
$($rest:tt)*
) => (replace_rvalues!(
[ $($replaced)* $tt ]
$($rest)*
));
(
[ $($replaced:tt)* ]
) => (
$($replaced)*
);
( $($tt:tt)* ) => (
replace_rvalues!([] $($tt)*)
)
}
macro_rules! exec {
/* replaces */ (
the_answer <- $($tt:tt)*
) => /* with */ (
WRITE(at_the_answer,
replace_rvalues!( $($tt)* )
)
);
/* replaces */ (
x <- $($tt:tt)*
) => /* WITH */ (
WRITE(at_x,
replace_rvalues!( $($tt)* )
)
);
/* replaces */ (
PRINT ( $($tt:tt)* )
) => /* with */ (
println!("{} = {}", stringify!($($tt)*), replace_rvalues!( $($tt)* ));
);
}
macro_rules! main {
(@parsing
[ $($stmts:tt)* ]
[ $($current:tt)* ]
;
$($rest:tt)*
) => (main!{@parsing
[ $($stmts)* ( $($current)* ) ]
[]
$($rest)*
});
(@parsing
$stmts:tt
[ $($current:tt)* ]
$tt:tt $($rest:tt)*
) => (main!{@parsing
$stmts
[ $($current)* $tt]
$($rest)*
});
(@parsing
[ $( ( $($stmts:tt)* ) )* ]
[]
) => (
fn main () {
$(
exec!{ $($stmts)* }
)*
}
);
(
$($tt:tt)*
) => (main!{@parsing
[]
[]
$($tt)*
});
}
#[allow(non_camel_case_types)]
type address = usize;
const at_the_answer: address = 1000;
const at_x: address = 1001;
main! {
the_answer <- 42;
x <- 27;
PRINT(the_answer); // -> 42
the_answer <- the_answer + x;
PRINT(the_answer); // -> 69
}
- i.e., on runtime [return]
- The representation of a number (here in base ten) vs its true nature is a (very interesting) topic in and on itself, which will most probably deserve its own blog post. One great definition / view of a (natural) number’s nature is Peano’s arithmetic, with which one can prove that
1+1+0 + 1+1+0 = 1+1+1+1+0
[return] const
does not declare a variable, but a meta binding. [return]- and
.bss
for initially zero-ed (global) memory [return] - and pattern bindings such as
match
arms or function parameters [return] - or can even use just one register (since the allocations are all static) [return]
- offsets are actually negative and do not start at 0, but those are just implementation details [return]
- accessing the memory at a given address [return]