A persistent tree using indirect enums in Swift

UPDATE: now conforms to Swift 2.1b2

Suppose you want an ordered set. Set is great and all, but it’s unordered. You could always use a Set anyway, and sort the contents when needed. Or you could use an array, keep it sorted, and always insert new elements in the right place. Or, you could use a tree.

One way to build a tree in Swift would be with a class that refers to itself. Something like this for a binary search tree:

class Tree<Element: Comparable> {
    let value: Element
    // entries < value go on the left
    let left: Tree<Element>?
    // entries > value go on the right
    let right: Tree<Element>?
}

For the tree’s logic, Element would only have to be Comparable, not Hashable, which is another potential benefit over Set.

The left and right subtrees are optional, with nil at the leaves when there’s no lesser/greater value than the current node. But there’s a problem with this approach – how do you represent an empty tree? value is not optional. Sure you could make it optional, and have an empty tree be represented by a nil value. But that doesn’t feel healthy.

Really, you want an enum with an associated type. Something like this:

enum Tree<Element: Comparable> {
    case Empty
    case Node(Tree<Element>,Element,Tree<Element>)
}

So, a tree is either empty, or it has a left and right subtree and a value. The subtrees no longer need to be optional, because if they’re empty, they can just hold the .Empty value.

No more boxes

Except, this won’t work. Swift enums are value types. So they can’t contain themselves like this. You need some kind of indirection – a reference type, such as a class.

Up until 2.0b4, you would have to use a Box trick – inserting a class in between each enum:

final class Box<Element> {
    let unbox: Element
    init(_ x: Element) { self.unbox = x }
}

enum Tree <Element: Comparable> {
    case Empty
    case Node(Box<Tree<Element>>,Element,Box<Tree<Element>>)
}

Well that’s pretty gross. And we haven’t even got to the actual implementation logic yet. Trust me, Box is gonna stomp all over that code leaving its muddy footprints everywhere.

But good news! The latest release of Swift introduces the indirect keyword for enums. This is similar to a transparent version of the box above, putting a reference where the boxed value would go. Xcode will even suggest putting in the keyword for you.

So now we can write this:

indirect enum Tree<Element: Comparable> {
    case Empty
    case Node(Tree<Element>,Element,Tree<Element>)
}

But really, we want a balanced tree. Simple binary search trees work well with random data, but if you insert already-ordered data into them, they degenerate into a linked list (because all the insertions happen down one side of the tree).

So lets go with a red-black tree – a kind of binary search tree that colours its nodes red or black, and then uses some invariants to guarantee it remains balanced on insertion. If you follow that link you’ll see quite a lot of ugly code for maintaining those invariants – but don’t worry, we’re going to write something that hopefully looks a lot simpler.

So here’s the tree, now with added colour, plus a couple of initializers to make things easier:

enum Color { case R, B }

indirect enum Tree<Element: Comparable> {
    case Empty
    case Node(Color,Tree<Element>,Element,Tree<Element>)

    init() { self = .Empty }

    init(_ x: Element, color: Color = .B,
        left: Tree<Element> = .Empty, right: Tree<Element> = .Empty)
    {
        self = .Node(color, left, x, right)
    }
}

Some might not like that I called my Color cases R and B instead of Red and Black. It’s a personal preference, but I like to keep them shorter because a) they’re arbitrary – they are only called that because those were the two colours on the laser printer they had at Xerox PARC, and b) it keeps the pattern matching shorter and (therefore, IMO) clearer. Judge for yourself when we get to that part.

By the way, this implementation is a Swift translation of an ML version from Chris Okasaki’s Purely Functional Data Structures which is definitely worth reading if you like this kind of stuff. What’s cool is that with the new 2.0 features of Swift, the Swift version is getting pretty close to the ML one in terms of expressivity/simplicity.

Checking if the tree contains an element

So, let’s start with contains. This needs to recur down the tree, checking each value to see if it’s empty, or if not, whether its value is less than, greater than or equal to the value we’re looking for.

Ideally I would write this using a guard statement like so:

extension Tree {
    func contains(x: Element) -> Bool {
        guard case let .Node(_,left,y,right) = self
          else { return false }

        if x < y { return left.contains(x) }
        if y < x { return right.contains(x) }
        return true
    }
}

I like how guard helps here. It lets us unpack the important stuff used by the main body of the function (is it a node? If so, grab the left, right and value elements – discarding color, since its not important). But it also lets us put the failure case – if you hit an empty node, you know the tree does not contain the element – up front. Then, head left if the element we’re looking for is less than this one, or right if it’s greater.

We can rely on the fact that Comparable elements are required to be in a strict total order. This means if x is neither less than or greater than a value, it must be equal to that value. So we’ve found the element and contains is true.

Inserting an element

OK, insert is a little trickier. We’re going to define insert as a method that returns a new tree – one with the new value inserted. This is a little different to Set.insert, which is an in-place insert.

In doing this, we’re going to create a “persistent” data structure – one that does not change the old structure when it updates it. This might sound like multiple insertions in a row are going to perform many unnecessary copies of the whole structure, but it isn’t necessarily that bad. Because nodes of the tree are read-only, two trees can share nodes. Only the nodes that are actually changed as a result of the insert need to be copied.

First, the insert method. It doesn’t really do much except maintain one of the invariants required to maintain balance: the root node must always be black. It leaves most of the work to a second function, ins:

private func ins<T>(into: Tree<T>, _ x: T) -> Tree<T> {
    guard case let .Node(c, l, y, r) = into
        else { return Tree(x, color: .R) }

    if x < y { return balance(Tree(y, color: c, left: ins(l,x), right: r)) }
    if y < x { return balance(Tree(y, color: c, left: l, right: ins(r, x))) }
    return into
}


extension Tree {
    func insert(x: Element) -> Tree {
        guard case let .Node(_,l,y,r) = ins(self, x)
            else { fatalError("ins should never return an empty tree") }

        return .Node(.B,l,y,r)
    }
}

So, if we’ve hit an empty node, append the value here as a new node. New nodes are always red.

Otherwise, insert the value into the left or right subtree. And if the element is equal, stop and don’t insert anything, since we’re implementing set behaviour here.

But there’s also the call to balance. This function is going to ensure that, as we come back up the tree having inserted our new element, the tree is properly balanced, by looking at whether any invariants have been broken and fixing them.

This is normally where the ugliness of implementing a red-black tree resides. But we’re going to use Swift’s pattern matching to try and keep it simple. The balancing logic can be summarized as: “on the way back up from an insertion, if there’s a black node, with a red child and a red grandchild, this needs fixing”.

A red child and grandchild of a black node can happen in one of four different configurations. Here’s a visualization of one of those possibilities (excuse the ASCII art):

        z: black
       /        \
      x: red     d
     / \
    a   y: red
       / \
      b   c

needs rearranging into:

         y: red
       /        \
      x: black   z: black
     / \        / \
    a   b      c   d

Once you understand this visualization, you can then encode it into a Swift pattern matching if statement, that extracts all the relevant variables from the possible imbalanced structure, then returns them reconfigured as the second, balanced, version.

The above re-balancing operation would end up encoded in Swift like this:

case let .Node(.B, .Node(.R, a, x, .Node(.R, b, y, c)), z, d):
    return .Node(.R, .Node(.B, a, x, b), y, .Node(.B, c, z, d))

This is why I prefer .B and .R for the colour case names – single-letter names to me look clearer in this stage, which is when they matter. There’s no getting around it, the transformation code is going to be hard to read no matter what you do, so the best you can hope for is easy translation. The longer words wouldn’t help with the key problem, which is encoding the diagram above into a pattern-matching statement.

To write the full balancing function, you need to visualize all 4 imbalanced configurations, then encode them into 4 patterns. If a node doesn’t match any, it’s good and you can return it as-is:

private func balance<T>(tree: Tree<T>) -> Tree<T> {
    switch tree {
    case let .Node(.B, .Node(.R, .Node(.R, a, x, b), y, c), z, d):
        return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
    case let .Node(.B, .Node(.R, a, x, .Node(.R, b, y, c)), z, d):
        return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
    case let .Node(.B, a, x, .Node(.R, .Node(.R, b, y, c), z, d)):
        return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
    case let .Node(.B, a, x, .Node(.R, b, y, .Node(.R, c, z, d))):
        return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
    default:
        return tree
    }
}

Note, this pattern match recurses 3 levels deep into the enum. This is also where getting rid of Box was critical. It seriously gets in the way – here is how it would look before indirect:

if case let .Node(.B,l,z,d) = tree, case let .Node(.R,ll,y,c) = l.unbox, case let .Node(.R,a,x,b) = ll.unbox {
    return .Node(.R, Box(.Node(.B,a,x,b)),y,Box(.Node(.B,c,z,d)))
}

Note, the return statement is the same in all 4 rebalance patterns. It’s only the pattern match that changes, based on which of the 4 imbalance possibilities is being checked.

It might be nice to be able to group these 4 possible matches together like so:

    switch tree {
    case let .Node(.B, .Node(.R, .Node(.R, a, x, b), y, c), z, d),
             .Node(.B, .Node(.R, a, x, .Node(.R, b, y, c)), z, d),
             .Node(.B, a, x, .Node(.R, .Node(.R, b, y, c), z, d)),
             .Node(.B, a, x, .Node(.R, b, y, .Node(.R, c, z, d))):
        return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
    default:
        return tree
    }

but Swift won’t allow multiple patterns in a case when the pattern declares variables, so we have to stick with the multiple returns for now.

Traversing in-order

So, now we have a nice balanced tree. But to get the benefit of an ordered set, we want to iterate over it in order. This sounds like a job for SequenceType. That way, you could use for...in over the sorted elements.

In-order traversal of a tree is usually done recursively, but that’s a little tricky to do when you want to give anyGenerator a closure expression that spits out the next element. So instead, we can use a stack to do it iteratively:

extension Tree: SequenceType {
    func generate() -> AnyGenerator<Element> {
        var stack: [Tree] = []
        var current: Tree = self

        return anyGenerator { _ -> Element? in
            while true {
                // if there's a left-hand node, head down it
                if case let .Node(_,l,_,_) = current {
                    stack.append(current)
                    current = l
                }
                // if there isn’t, head back up, going right as
                // soon as you can:
                else if !stack.isEmpty, case let .Node(_,_,x,r) = stack.removeLast() {
                    current = r
                    return x
                }
                else {
                // otherwise, we’re done
                return nil
                }
            }
        }
    }
}

Initializing from another sequence

One last extension. When you have a container, it’s nice to be able to initialize it from another sequence, as well as from a literal. These are both pretty easy:

extension Tree: ArrayLiteralConvertible {
    init <S: SequenceType where S.Generator.Element == Element>(_ source: S) {
        self = source.reduce(Tree()) { $0.insert($1) }
    }

    init(arrayLiteral elements: Element...) {
        self = Tree(elements)
    }
}

Given these, you can now create new sets using literal syntax:

let alphabet = Tree("the quick brown fox jumps over the lazy dog".characters)
let primes: Tree = [2, 3, 5, 7, 11, 13, 17, 19]

That’s it for now. Several features of Swift 2.0 make this code a lot more pleasant to read and write. There are various further enhancements you could make. The tree maybe ought to be wrapped in an outer struct to avoid exposing the enum implementation. The next logical extension would be conforming to CollectionType.

And there are also performance improvements you could make (the functional data structures book suggests a few as exercises). Honestly, the times when this data structure will do better than an unordered set or array + sort are probably going to be pretty rare. But it’s nice to have the option now.

Below is the final code, along with some tests, or you can find the full code for the tree in a gist, here.

If you found this interesting (and if you made it all the way down here, hopefully that’s true!), it’ll be part of Chris Eidhof and my book, which you can get a preview of here.

enum Color { case R, B }

indirect enum Tree<Element: Comparable> {
    case Empty
    case Node(Color,Tree<Element>,Element,Tree<Element>)
    
    init() { self = .Empty }
    
    init(_ x: Element, color: Color = .B,
        left: Tree<Element> = .Empty, right: Tree<Element> = .Empty)
    {
        self = .Node(color, left, x, right)
    }
}


extension Tree {
    func contains(x: Element) -> Bool {
        guard case let .Node(_,left,y,right) = self
            else { return false }
        
        if x < y { return left.contains(x) }
        if y < x { return right.contains(x) }
        return true
    }
}

private func balance<T>(tree: Tree<T>) -> Tree<T> {
    switch tree {
    case let .Node(.B, .Node(.R, .Node(.R, a, x, b), y, c), z, d):
        return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
    case let .Node(.B, .Node(.R, a, x, .Node(.R, b, y, c)), z, d):
        return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
    case let .Node(.B, a, x, .Node(.R, .Node(.R, b, y, c), z, d)):
        return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
    case let .Node(.B, a, x, .Node(.R, b, y, .Node(.R, c, z, d))):
        return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
    default:
        return tree
    }
}


private func ins<T>(into: Tree<T>, _ x: T) -> Tree<T> {
    guard case let .Node(c, l, y, r) = into
        else { return Tree(x, color: .R) }

    if x < y { return balance(Tree(y, color: c, left: ins(l,x), right: r)) }
    if y < x { return balance(Tree(y, color: c, left: l, right: ins(r, x))) }
    return into
}


extension Tree {
    func insert(x: Element) -> Tree {
        guard case let .Node(_,l,y,r) = ins(self, x)
            else { fatalError("ins should never return an empty tree") }

        return .Node(.B,l,y,r)
    }
}



extension Tree: SequenceType {
    func generate() -> AnyGenerator<Element> {
        var stack: [Tree] = []
        var current: Tree = self

        return anyGenerator { _ -> Element? in
            while true {
                // if there's a left-hand node, head down it
                if case let .Node(_,l,_,_) = current {
                    stack.append(current)
                    current = l
                }
                // if there isn’t, head back up, going right as
                // soon as you can:
                else if !stack.isEmpty, case let .Node(_,_,x,r) = stack.removeLast() {
                    current = r
                    return x
                }
                else {
                // otherwise, we’re done
                return nil
                }
            }
        }
    }
}

extension Tree: ArrayLiteralConvertible {
    init <S: SequenceType where S.Generator.Element == Element>(_ source: S) {
        self = source.reduce(Tree()) { $0.insert($1) }
    }
    
    init(arrayLiteral elements: Element...) {
        self = Tree(elements)
    }
}

import Darwin

extension Array {
    func shuffle() -> [Element] {
        var list = self
        for i in 0..<(list.count - 1) {
            let j = Int(arc4random_uniform(UInt32(list.count - i))) + i
            guard i != j else { continue }
            swap(&list[i], &list[j])
        }
        return list
    }
}


let engines = [
    "Daisy", "Salty", "Harold", "Cranky",
    "Thomas", "Henry", "James", "Toby",
    "Belle", "Diesel", "Stepney", "Gordon",
    "Captain", "Percy", "Arry", "Bert",
    "Spencer",
]

// test various inserting engines in various different permutations
for permutation in [engines, engines.sort(), engines.sort(>),engines.shuffle(),engines.shuffle()] {
    let t = Tree(permutation)
    assert(!t.contains("Fred"))
    assert(t.elementsEqual(t.insert("Thomas")))
    assert(!engines.contains { !t.contains($0) })
    assert(t.elementsEqual(engines.sort()))
    print(t.joinWithSeparator(","))
}

6 thoughts on “A persistent tree using indirect enums in Swift

  1. This is great actually. I remember my computer science lecturer getting very confused over red black trees , which put me off to the point where I have not looked at them for years (and that was using Haskell !) This code and explanation is very clear.

    Interested in the book now 🙂

  2. To avoid the repetition of the same return statement in all 4 rebalance patterns, we could use the “fallthrough” keyword in the first three of them.

Leave a comment