Mutation - Part 1: Variables

What is a variable?

Ok, here there is really a lot that could be said, and a simple blog post will not cover it. I will just talk about two important and distinct thinking patterns involved:

  1. the idea of binding a name to a value;

  2. the idea of reading/writing values from/into memory;

1. Identity binding

The best example in Rust for “just a binding” (i.e., with no backing memory) is const

1type is = i32;
2
3const x          :is= the_answer + 27;
4const the_answer :is= 42;
5

Here we have defined the_answer as an alias for the value 42; meaning that, from now on, everytime we say the_answer what we actually mean is 42.

We are at the meta level: if we ask the program to evaluate x, what we are actually1 asking the program to do is to evaluate the_answer + 27, i.e., to evaluate 42 + 27.

The same in C, with #define:

1#define x          (the_answer + 27)
2
3#define the_answer (42)

If you look at both code snippets, you can notice that there is no notion of temporality (regarding the definitions):

It is all about essence / identity.

Thus, if the_answer is and will always be 42, there is no room for mutation.

1.1 Scoped bindings

You can limit the “scope” of your identity-definition:

1// undefined
2{const the_answer :is= 42;
3    // defined
4}
5// undefined
6

This is one of the ways to have some form of temporality with bindings: time progresses as we enter and leave scopes.

1{const the_answer :is= 42;
2    {const x :is= the_answer + 27;
3        // x -> the_answer + 27 -> 42 + 27
4    }
5}
6

But what about re-defining the_answer?

By using a narrower scope, we can indeed re-define the value of the_answer, even though what we are actually doing is defining a brand new the_answer that happens to lead to a name collision. But given the different scopes and their temporality, this is unambiguous and thus allowed:

1{const the_answer :is= 42;
2    // 'the_answer' <-> 42
3    {const the_answer :is= 0;
4        // 'the_answer' <-> 0
5    }
6    // 'the_answer' <-> 42
7}
8

This trick is called shadowing and is a clever way to avoid using mutation when it is not needed.

2. Runtime memory

We are so used to manipulating memory through variables that I have defined some functions to pretend we are manipulating low-level memory; it thus provides “primitives / intrinsics” that allow us to read/write from/into the memory and also to print values:

 1#![allow(non_camel_case_types, non_snake_case)]
 2
 3/// 4Ko RAM?
 4static mut MEMORY: [u8; 4096] = [0; 4096];
 5type address = usize;
 6
 7fn READ (whence: address) -> u8
 8{
 9    unsafe { MEMORY[whence] }
10}
11
12fn WRITE (where_to: address, what: u8)
13{
14    unsafe { MEMORY[where_to] = what }
15}
16
17use ::std::dbg as PRINT;
18
19fn main ()
20{
21    WRITE(1000, 42);
22    WRITE(1001, 27);
23    PRINT!(READ(1000)); // -> 42
24    WRITE(1000, READ(1000) + READ(1001));
25    PRINT!(READ(1000)); // -> 69
26}
27

This way, we can see that we were able to

  1. write at the address 1000 the value 42;

  2. then write at that same address the value 69.

We have successfully mutated something!

Good, now we have a clearer understanding of what mutation actually is: modifying memory contents.

But what have variables to do with any of this?

2.1 Runtime memory with variables

The following sentence explains the relation between bindings and memory:

A (runtime) variable is a (static) binding to some place in memory, and is used to read/write from/into that place in memory

const at_the_answer: address = 1000;
const at_x: address = 1001;

fn main ()
{
    WRITE(at_the_answer, 42);
    WRITE(at_x, 27);
    PRINT(READ(at_the_answer)); // -> 42
    WRITE(at_the_answer, READ(at_the_answer) + READ(at_x));
    PRINT!(READ(at_the_answer)); // -> 69
}

And now with some sugar:

const at_the_answer: address = 1000;
const at_x: address = 1001;

main! {
    the_answer <- 42;
    x <- 27;
    PRINT!(the_answer); // -> 42
    the_answer <- the_answer + x;
    PRINT!(the_answer); // -> 69
}

How does that sugar work? Let’s describe the unsugaring of line 139:

Hence my aforementioned definition of a variable (let’s repeat it):

A (runtime) variable is a (static) binding to some place in memory, and is used to read/write from/into that place in memory

In memory? But where exactly?

static vs let

In Rust, there are two3 ways to declare a variable:

We finally have the bases required for thinking about mutation in Rust!

Appendix

Full unsugaring code

#![allow(non_upper_case_globals, non_snake_case)]

/// 4Ko RAM?
static mut MEMORY: [u8; 4096] = [0; 4096];

fn READ (whence: address) -> u8
{
    unsafe { MEMORY[whence] }
}

fn WRITE (where_to: address, what: u8)
{
    unsafe { MEMORY[where_to] = what }
}

/// Recursively replaces:
///     the_answer => READ(at_the_answer)
///     x          => READ(at_x)
macro_rules! replace_rvalues {
    (
        [ $($replaced:tt)* ]
        the_answer
        $($rest:tt)*
    ) => (replace_rvalues!(
        [ $($replaced)* READ(at_the_answer) ]
        $($rest)*
    ));

    (
        [ $($replaced:tt)* ]
        x
        $($rest:tt)*
    ) => (replace_rvalues!(
        [ $($replaced)* READ(at_x) ]
        $($rest)*
    ));


    (
        [ $($replaced:tt)* ]
        $tt:tt
        $($rest:tt)*
    ) => (replace_rvalues!(
        [ $($replaced)* $tt ]
        $($rest)*
    ));

    (
        [ $($replaced:tt)* ]
    ) => (
        $($replaced)*
    );

    ( $($tt:tt)* ) => (
        replace_rvalues!([] $($tt)*)
    )
}

macro_rules! exec {
    /* replaces */ (
        the_answer <- $($tt:tt)*
    ) => /* with */ (
        WRITE(at_the_answer,
            replace_rvalues!( $($tt)* )
        )
    );

    /* replaces */ (
        x <- $($tt:tt)*
    ) => /* WITH */ (
        WRITE(at_x,
            replace_rvalues!( $($tt)* )
        )
    );

    /* replaces */ (
        PRINT ( $($tt:tt)* )
    ) => /* with */ (
        println!("{} = {}", stringify!($($tt)*), replace_rvalues!( $($tt)* ));
    );
}

macro_rules! main {
    (@parsing
        [ $($stmts:tt)* ]
        [ $($current:tt)* ]
        ;
        $($rest:tt)*
    ) => (main!{@parsing
        [ $($stmts)* ( $($current)* ) ]
        []
        $($rest)*
    });

    (@parsing
        $stmts:tt
        [ $($current:tt)* ]
        $tt:tt $($rest:tt)*
    ) => (main!{@parsing
        $stmts
        [ $($current)* $tt]
        $($rest)*
    });

    (@parsing
        [ $( ( $($stmts:tt)* ) )* ]
        []
    ) => (
        fn main () {
            $(
                exec!{ $($stmts)* }
            )*
        }
    );

    (
        $($tt:tt)*
    ) => (main!{@parsing
        []
        []
        $($tt)*
    });
}

#[allow(non_camel_case_types)]
type address = usize;

const at_the_answer: address = 1000;
const at_x: address = 1001;

main! {
    the_answer <- 42;
    x <- 27;
    PRINT(the_answer); // -> 42
    the_answer <- the_answer + x;
    PRINT(the_answer); // -> 69
}

  1. i.e., on runtime [return]
  2. The representation of a number (here in base ten) vs its true nature is a (very interesting) topic in and on itself, which will most probably deserve its own blog post. One great definition / view of a (natural) number’s nature is Peano’s arithmetic, with which one can prove that 1+1+0 + 1+1+0 = 1+1+1+1+0 [return]
  3. const does not declare a variable, but a meta binding. [return]
  4. and .bss for initially zero-ed (global) memory [return]
  5. and pattern bindings such as match arms or function parameters [return]
  6. or can even use just one register (since the allocations are all static) [return]
  7. offsets are actually negative and do not start at 0, but those are just implementation details [return]
  8. accessing the memory at a given address [return]