SI221 (Fall AY2021)

Introduction to Trees and Binary Trees

Introduction

At this point in the semester, we've covered several data structures, and discussed the details of their implementation, as well as the pros can cons of their use. You may not have noticed, but every structure we've discussed has had one thing in common. Regardless of the way we interact with it, or what's the underlying container is, each Abstract Data Type (ADT) has been linear in nature. By linear, I mean that the contents of the container follow some kind of sequence from one to the next. Another way to think about "linear" is that each element has at most one item before it and one after.

Many times, the information we want to represent is not linear. For example, a file directory structure or, perhaps more appropriate in our environment, an organizational chart. For these things, a linear data structure would not suffice. If only there were a data structure that could handle non-linear relationships....

The Tree ADT is the first data structure we'll examine that is non-linear in nature. Because it is non-linear, the relationship between elements is more complex. But it allows us to represent non-linear data in fairly straight-forward a way as long as we follow some basic rules. Before we jump into the implementation considerations, let's take a few moments to review some basic nomenclature.

Tree Nomenclature

The tree data structure is not unlike a biological tree (thus the name, if you didn't already figure that out!). Like the majestic oak, our trees have roots. But unlike the oak, our tree has a single root, and we put that root at the top when represented visually. Actually, our tree looks more like a family tree (like you would see in genealogy)...and the terms we use are very similar.

Each data-holding element in the tree is called a Node. Each node in the tree can have 0 or more descendant Nodes. A node with descendants is called a Parent node, and these descendants are called Child nodes. Two nodes that have the same parent are said to be siblings. While a parent can have multiple children, a child can only have one parent! While there are data structures that allow such things, a tree is not one of them!

The tree Root is a special Node which has no parent. A node that has no children is called a Leaf node. Every node in the tree is the root of a Subtree.

All of the nodes above a given node are known as that node's ancestors. Likewise, all nodes below a given node are called that node's descendants.

The direct connection between two nodes is called an edge. A tree with N nodes always has N-1 edges (can you write a proof for that?).

There can never be more than one path between two nodes in a tree. The path length between two nodes is the number of edges that must be crossed to reach one node from another.

When we talk about the depth of the tree, we're referring to the maximum path length to a given leaf from the tree root. Conversely, the height refers to the maximum path length to the tree root from a given node.

Because there can only be one path between nodes, it's not possible for a path to lead back the starting node. In fact, when traversing a path in a tree, your depth will always increase and your height will always decrease!

Tree methods

Before we ever look at code, we could probably figure out pretty easily what functionality we would need to build into a Tree class to accomplish what we want. I think the following list is a good start:

root( ): returns root
parent(v): returns parent of v
children(v): returns iterator of children of v
elements( ): returns iterator of all elements
swapElements(v,w): swaps elements at two nodes
replaceElement(v,e): replaces element of a node

Let's not forget the standard methods included in all Standard Template Library ADTs:

size( ): returns number of nodes
isEmpty(): returns boolean value

We'll look at each of these in more detail in follow-on lectures as we delve into the details of implementation.

Binary Trees

The most popular tree in computer science is the binary tree. Ok, so I have no proof to back up that statement. But you can see it all over the place. A binary tree conforms to the standard introduced above, but applies a strict limitation on the number of children a node can have. Specifically, no node in a binary tree can have more than 2 children.

A full binary tree is one in which every leaf is at the same depth:

A binary tree doesn't have to be full. In fact, the are often not. A tree can be not full, but still be considered complete. To be a complete tree, all leaves at the highest depth (N) must be added from the left-most leaf first. The tree must be full at depth N-1.

Determining Binary Tree Depth

Limiting the number of children to two results in a data structure that is highly efficient at some of the most common tasks, such as searching. It also allows us figure out the depth of a tree given a number of nodes.

Finding the minimum depth is easy. Knowing the every node can have two children, we know that at depth d, a tree can have from 2^d to 2^d+1-1 nodes. This implies that the minimum depth of a binary tree is log₂N. The minimum depth represents the best case scenario (I'll leave it to you as an academic exercise to show that the minimum depth occurs when a tree is complete).

The maximum depth of a binary tree is found when a tree is in its most "incomplete" state. For example, every node in the following tree has only one child:

Take a moment to validate in your head that this is indeed a binary tree. Furthermore, it's in the worst possible state....it may as well be a linked list! Quick observation tells us that the maximum depth of a binary tree, given N nodes, is N-1.

In order to capitalize on the benefits of a binary tree, we should always strive to have a binary tree that is closer to a depth around log₂N.