Looking Back on C# 7: refs enhancements

With C# 8 on our doorstep, I figure it is a good time to reflect on recent additions to the language. There are some great improvements you may have missed, some that I really enjoy using, and some that I consider to have reached canonical usage status. I think they are all worth some reflection.

We talked about out variables in the previous post in the series, but there are a few other enhancements related to ref as well.

Hopefully, the concepts of values, pointers, stacks and heaps make sense to you at a conceptual level. They are crucial to using and understanding ref and out.

As a quick recap, stack memory is local to a function call. A frame is semantically “pushed” for each method call and “popped” on each return (hence the call stack, stack overflow and so on). The heap is a shared memory space where objects are allocated and stored, and it is managed by the Garbage Collector.

Values are mostly stack-allocated. They are copied around, so setting one value variable does not affect any other variable. Primitive types and structs are value types. Classes are the most common reference types in C# (arrays, strings, interfaces and delegates are reference types too). Instances of classes are always stored on the heap, and all variables of class types are pointers or references. Assigning a reference to a new variable makes both variables point to the same object on the heap, so changing a field through one variable is reflected in the other. (Boxing values onto the heap is another thing, too.)
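To make the copy-versus-share distinction concrete, here is a minimal sketch. The PointValue and PointReference types are hypothetical names I made up for illustration; they are not from any library.

```csharp
using System;

// Hypothetical types, for illustration only.
struct PointValue { public int X; }
class PointReference { public int X; }

static class ValueVersusReferenceDemo
{
    public static (int original, int copy) CopyStruct()
    {
        var a = new PointValue { X = 1 };
        var b = a;   // copies the whole value
        b.X = 2;     // only the copy changes
        return (a.X, b.X);
    }

    public static (int first, int second) CopyReference()
    {
        var a = new PointReference { X = 1 };
        var b = a;   // copies the reference; both variables point at one object
        b.X = 2;     // visible through both variables
        return (a.X, b.X);
    }

    static void Main()
    {
        Console.WriteLine(CopyStruct());    // prints (1, 2)
        Console.WriteLine(CopyReference()); // prints (2, 2)
    }
}
```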

We already have out, which lets the caller declare a variable on the current stack and pass it by address (pointer, reference, or ref) to a method that is contracted to set it. This essentially gives a stack-allocated value some of the power of reference types. When we use ref, we gain all of the power of passing by reference that we get from heap-allocated class types. We are essentially saying that the called method can use the existing value behind the reference, and can also set a new value on the caller’s variable if it wants to. Basically, all the restrictions that out imposes are taken off. A ref need not point at a stack variable either; it could also be a field on a class that you want to access directly by reference, instead of having to constantly dereference the object.
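As a rough sketch of the difference between the two parameter modifiers, the following uses two hypothetical helpers, TryHalve and AddTo, that I invented for this example.

```csharp
using System;

static class OutVersusRefDemo
{
    // `out`: the method must assign the parameter before returning
    // and cannot read it before doing so.
    public static bool TryHalve(int input, out int half)
    {
        half = input / 2;
        return input % 2 == 0;
    }

    // `ref`: the method may read the caller's current value
    // and may also overwrite it.
    public static void AddTo(ref int total, int amount)
    {
        total += amount;
    }

    static void Main()
    {
        bool even = TryHalve(10, out int half); // `half` needs no initial value
        Console.WriteLine($"{even} {half}");     // prints True 5

        int total = 40;                          // a `ref` argument must be initialised first
        AddTo(ref total, 2);
        Console.WriteLine(total);                // prints 42
    }
}
```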

Pass by ref has been in C# since the beginning but, like out parameters, it only worked in method signatures. Until now, you could never declare a ref variable.

In C# 7, ref has been extended to work with return types and with variables. You can return a value by reference, and so that the returned result can be assigned to something in a useful way, we also now have ref local variables. Like most C# language features, there is safety built in.

You must add the ref keyword to the method signature and to all return statements in a method.
A ref return may be assigned to a value variable (by copy) or a ref variable.
You can’t assign a standard method return value to a ref local variable.
You can’t return a ref to a variable whose lifetime doesn’t extend beyond the execution of the method.

These rules keep your code safe and readable, so it is always clear what is happening.

I’m going to blatantly steal the Microsoft examples for this because I don’t want to invent an example and get it wrong.

// We declare the method as returning by reference (rather than copy)
public ref int Find(int number, int[] numbers)
{
    for (int i = 0; i < numbers.Length; i++)
    {
        if (numbers[i] == number)
        {
            // All returns must use the `ref` keyword
            return ref numbers[i]; // return the storage location, not the value
        }
    }
    // We can still throw an exception if necessary
    throw new IndexOutOfRangeException($"{nameof(number)} not found");
}

// arrays are already allocated to the heap and passed by reference rather than by value
// (see the `stackalloc` keyword for stack allocating arrays, though)
int[] array = { 1, 15, -39, 0, 7, 14, -12 };

// We have to use `ref` to call the method
// We choose to declare `place` as a reference
ref int place = ref Find(7, array); // aliases 7's place in the array
place = 9; // replaces 7 with 9 in the array
Console.WriteLine(array[4]); // prints 9

In the above example, we could have declared place without the ref keyword, and the returned value would have been copied instead. In that case, however, the assignment place = 9; would overwrite the local copy and not modify the original array.
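Here is a small sketch that puts the two declarations side by side. It reuses the Find method from the example above, made static so the snippet stands on its own; the variable names copy and alias are mine.

```csharp
using System;

static class RefReturnCopyDemo
{
    // Same shape as the Find method above, made static for a self-contained sketch.
    public static ref int Find(int number, int[] numbers)
    {
        for (int i = 0; i < numbers.Length; i++)
        {
            if (numbers[i] == number)
            {
                return ref numbers[i];
            }
        }
        throw new IndexOutOfRangeException($"{nameof(number)} not found");
    }

    static void Main()
    {
        int[] array = { 1, 15, -39, 0, 7, 14, -12 };

        int copy = Find(7, array);          // no `ref`: the value is copied
        copy = 9;                           // changes only the local copy
        Console.WriteLine(array[4]);        // prints 7

        ref int alias = ref Find(7, array); // `ref`: aliases the array element
        alias = 9;                          // writes through to the array
        Console.WriteLine(array[4]);        // prints 9
    }
}
```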

Why would you use these pass by reference additions? In certain algorithms, significant performance gains can be achieved by avoiding the copying of large values or repeated dereferencing. Performance is the name of the game here.

In C# 7.2, the conditional operator (isTrue ? x : y syntax) can now evaluate to a reference result when both operands (x and y above) are also references.

ref var r = ref (arr != null ? ref arr[0] : ref otherArr[0]);
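To show that line in a runnable context, here is a minimal sketch. SetFirst is a hypothetical helper of my own, and the extra Length check is an assumption added so the sketch does not index into an empty array.

```csharp
using System;

static class ConditionalRefDemo
{
    // Picks the first element of `preferred` when it is usable, otherwise
    // the first element of `fallback`, and writes through the chosen reference.
    public static void SetFirst(int[] preferred, int[] fallback, int value)
    {
        ref int first = ref (preferred != null && preferred.Length > 0
            ? ref preferred[0]
            : ref fallback[0]);
        first = value;
    }

    static void Main()
    {
        int[] arr = null;
        int[] otherArr = { 1, 2, 3 };

        SetFirst(arr, otherArr, 42);
        Console.WriteLine(otherArr[0]); // prints 42
    }
}
```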

In 7.2 we also got ref readonly, which lets a method return a reference that the compiler will not let the caller modify. This may save repeatedly copying or re-reading a child field when you need its latest value, for instance in a loop. Again, performance is the target use-case.
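As a sketch of how that might look, assume a hypothetical Server class that exposes a Settings struct field by read-only reference. Both types are invented for this example.

```csharp
using System;

// Hypothetical types, for illustration only.
struct Settings
{
    public int Timeout;
}

class Server
{
    private Settings _settings = new Settings { Timeout = 30 };

    // Callers get a reference to the field but cannot write through it.
    public ref readonly Settings Current => ref _settings;

    public void SetTimeout(int seconds) => _settings.Timeout = seconds;
}

static class RefReadonlyDemo
{
    public static int ReadThroughAlias()
    {
        var server = new Server();
        ref readonly Settings view = ref server.Current;

        server.SetTimeout(60);

        // view.Timeout = 5; // would not compile: `view` is a read-only reference
        return view.Timeout; // the alias sees the latest value, not a snapshot
    }

    static void Main()
    {
        Console.WriteLine(ReadThroughAlias()); // prints 60
    }
}
```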

The first version of ref variables was fixed at declaration: whatever you declared them to point to was what they referred to for their whole lifetime. In C# 7.3, they were updated so you can reassign a ref variable to point to a different storage location instead (using = ref).
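A minimal sketch of ref reassignment, assuming a C# 7.3 or later compiler; the arrays and the variable name current are just for illustration.

```csharp
using System;

static class RefReassignDemo
{
    public static (int first, int second) Run()
    {
        int[] a = { 1, 2 };
        int[] b = { 3, 4 };

        ref int current = ref a[0];
        current = 10;        // plain assignment: writes the value into a[0]

        current = ref b[0];  // ref assignment (C# 7.3): re-points the variable at b[0]
        current = 30;        // now writes into b[0]

        return (a[0], b[0]);
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints (10, 30)
    }
}
```

Note the difference between = and = ref here: the first writes through the reference, the second changes what the reference points to.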

To complement out, which is a more restricted form of ref, C# 7.2 also gives us the in keyword.

Declaring a method parameter with in essentially makes it a read-only reference. The method gets all the benefits of the argument being passed by reference rather than by value (copied), and the compiler ensures that the method cannot modify the original value passed in. If necessary, it will create a defensive shadow copy to ensure that is true.
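Roughly, an in parameter looks like this. Vector3 here is a hypothetical struct I defined for the example, not the one from System.Numerics.

```csharp
using System;

// Hypothetical struct, for illustration only.
struct Vector3
{
    public double X, Y, Z;
}

static class InParameterDemo
{
    // `in` passes a reference, but the method may only read through it.
    public static double LengthSquared(in Vector3 v)
    {
        // v.X = 0; // would not compile: `v` is a read-only reference
        return v.X * v.X + v.Y * v.Y + v.Z * v.Z;
    }

    static void Main()
    {
        var v = new Vector3 { X = 1, Y = 2, Z = 2 };

        // Writing `in` at the call site is optional.
        Console.WriteLine(LengthSquared(in v)); // prints 9
    }
}
```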

This pairs well with another new feature, readonly struct. Declaring a struct as readonly means the compiler will ensure that it really is immutable. (It will disallow public int Foo { get; private set; } for example.) You can use the in keyword on any method that takes a reference to one of these structs, which keeps the intent clear when reading the code and is also enforced by the compiler.

I mentioned the defensive shadow copy above. Nothing in the signature of a property or method guarantees that its implementation leaves the struct unmodified, so the compiler gets defensive and makes a copy before calling anything that might cause a mutation. This way, the language upholds the expectations of passing a read-only reference, but perhaps not your performance expectations. If you, as the developer, make the type a readonly struct instead, the compiler can rely on that guarantee and won’t make any copies. The struct won’t compile if it mutates any of its internal state, so we have stronger guarantees at compile-time and run-time.
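The defensive copy is easiest to see when a mutable struct is passed with in and then asked to mutate itself. The following is a sketch only; Counter and Celsius are hypothetical types invented for this example.

```csharp
using System;

// Hypothetical mutable struct: Increment changes internal state.
struct Counter
{
    public int Count;
    public void Increment() => Count++;
}

// Hypothetical readonly struct: the compiler rejects any mutation of its state.
readonly struct Celsius
{
    public Celsius(double degrees) { Degrees = degrees; }
    public double Degrees { get; }
    public double ToFahrenheit() => Degrees * 9 / 5 + 32;
}

static class DefensiveCopyDemo
{
    public static int IncrementThroughIn(in Counter counter)
    {
        counter.Increment();  // runs against a hidden defensive copy
        return counter.Count; // the original was never touched
    }

    // Celsius is a readonly struct, so no defensive copy is needed here.
    public static double ToFahrenheit(in Celsius c) => c.ToFahrenheit();

    static void Main()
    {
        var counter = new Counter();
        Console.WriteLine(IncrementThroughIn(counter));    // prints 0
        Console.WriteLine(counter.Count);                  // prints 0
        Console.WriteLine(ToFahrenheit(new Celsius(100))); // prints 212
    }
}
```

The silent no-op in IncrementThroughIn is the kind of surprise that readonly struct is meant to rule out at compile time.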

These are certainly power features, and when you need them they will be useful. But like most advanced features, you may be sacrificing readability for performance and optimisation. Use them sparingly: measure first, then sprinkle them in and measure again.