List Class & Iterators

Introduction

The ADTs we've seen thus far are Stack and Queue. In both cases we've seen that the underlying implementation can be based on either arrays or linked lists. Arrays and linked lists don't have ADT's because we have no way of separating their interfaces from their implementation. For example, you can't use a linked list without understanding pointers, knowing about the Node class, knowing about the data and next fields of the Node class, knowing that a next pointer value of zero indicates the end of the lists, etc. So Stack and Queue are nice data structures with ADT's, which have the separation of interface and implementation that we've been trying so hard all semester to achieve. Linked lists and arrays are just tools we use to, among other things, implement our Stacks and Queues.

Despite the previous paragraph, it'd be very nice if we had a List ADT and a module which implemented it in the same nice way as the stack and queue modules we've been using. We're going to develop an ADT for list, but we'll see that doing so requires more than just specifying member functions!

A first attempt

A first attempt at specifying an ADT for a class List is this:

List();
~List();
void add2front(const DataType &val);
void add2back(const DataType &val);

So, we can add new values to the front of a list and we can add new values to the back of a list. While adding values to a list is important, it doesn't do us much good if we can't do anything with what we've put in the list. With Stack and Queue this was easy, because the user is only allowed to access data from a Stack via "pop", and the user is only allowed to access data from a Queue via "dequeue." With a List we expect to be able to access elements any way we want by traversing the list.

One way we could imagine letting a user access elements in a list is to subscript it the way we subscript an array. Perhaps we could add

DataType getElement(int i);    

to our ADT. There are at least two objections to this. The first, and most important, is that traversing a list becomes a lot of work for our List class. Suppose we simply want to cycle through a List, which is pretty typical. Remember, when getElement(i) is called, the List class is going to have to start at the beginning of the linked list and move all the way through to the ith node, which will take i steps. To get all the elements out of a list of n elements this way takes:

getElement(1);    1 step
getElement(2);    2 steps
getElement(3);    3 steps
.
.
.
getElement(n-1);  n-1 steps
getElement(n);    n steps
-----------------------------
                  n(n+1)/2 steps

So, with a list of 1,000 elements, it takes us 500,500 steps. When we were manipulating the pointers ourselves, we could traverse a 1,000-element linked list in 1000 steps and get the data along the way. With our fancy new ADT it takes over 500 times longer! This doesn't seem like progress.

A better idea

The problem with putting list access by index in our ADT for List is, basically, that it's not the way people use lists. When we use lists, we have some idea of our "current position," and we move our current position around in the list. We've been using pointers to tell us what our current position is, but requiring the user of our List class to access the list via pointers violates the whole concept of separating interface and implementation, and the whole point of having an ADT. So let's think about what the pointer was doing for us as we traversed our list. The pointer basically just remembered where we were in the list. That way we could tell it to access the data "where we were in the list", or move "where we were in the list" to the next item. What we need for our List ADT is a type of object that can remember positions in a list - i.e. we need to give our ADT a type, not just more functions. We refer to objects that allow us to move through collections of data remembering "where we were" as iterators. So what we want to do is augment our ADT with a datatype called Iterator.

All we need to be able to do with an iterator is get the value at the position the iterator is "remembering", and move the iterator to the next position. For the list we need an additional function begin() that returns an iterator set to the first element in the list.

List();
~List();
void add2front(const DataType &val);
void add2back(const DataType &val);

class Iterator {
public:
  DataType& get(); // <-- Note: return by reference!
  void next();
};
Iterator begin();

To give you a rough idea of what a program using a class List implementing this ADT would look like, consider the following code, which would read in numbers from the user and print the first 5 entered:

List L;
int k;
while(cin >> k)
  L.add2back(k);

Iterator itr = L.begin();
for(int i = 0; i < 5; i++) {
  cout << itr.get() << endl;
  itr.next();
}

In a sense, the iterator is giving us an ADT for a pointer - i.e. it's like a pointer whose interface and implementation are separate. Note: the get() function returns its object by reference. This is a subtle point, but what it means is that what's returned is not a copy of the value stored at the given node, rather it's the actual value. That means you can change what's stored at the given node. In the above we could've gone through the list as in our for-loop doing "itr.get() = 2*itr.get()", and we'd be modifying the list so that each element was double its original value.

Finishing off our List ADT

To finish off our list ADT, we need a mechanism for determining when an iterator has reached the end of a list. One could dream up many ways to do this. One nice mechanism, and one that we'll revisit later when we discuss the Standard Template Library, is to add a member end() to List that returns an iterator not to the last element in the list, but one element past. In other words, end() returns the iterator you'd get if you had an iterator to the last element in the list and then did a next(). If we also have a mechanism to determine whether two iterators are equal then we can check our iterator to see if it equals end()and, if so, we'll know we've traversed the whole list.

List();
~List();
void add2front(const DataType &val);
void add2back(const DataType &val);

class Iterator {
public:
  DataType& get();  <-- Note: return by reference!
  void next();
  bool equal(const Iterator& A);
};

Iterator begin();
Iterator end();

Now we can revise our previous program so that it actually prints out all the values stored in the list.

With a WHILE loop: With a FOR loop:
List L;
int k;
while(cin >> k)
  L.add2back(k);

Iterator itr = L.begin();
while (!itr.equal(L.end())) {
  cout << itr.get() << endl;
  itr.next();
}
or
List L;
int k;
while(cin >> k)
  L.add2back(k);

for (Iterator itr = L.begin(); !itr.equal(L.end()); itr.next()) {
  cout << itr.get() << endl;
}
  

Acting as user of a class implementing the List ADT

Just as with Stack and Queue, we're going to act as the user of a class that conforms to the List ADT before we worry about being the implementer of such a class. The file List.h, defines a class List conforming to the ADT. There is no associated .cpp file for this class. Just as with Stack and Queue, the name List must be followed by the name (in angle brackets) of the type you want to store in the list. Another subtle point is that the iterator class has been declared inside the class list. So its name isn't Iterator, rather it's List<DataType>::Iterator, which looks intimidating but which hopefully makes sense by now! So a complete version of the above program would be:

With a WHILE loop: With a FOR loop:
#include "List.h"
#include <iostream>
using namespace std;

int main() {
  // Read int's from the user and store in list L
  List<int> L;
  int k;
  while(cin >> k)
    L.add2back(k);

  // Print contents of the list
  List<int>::Iterator itr = L.begin();
  while (!itr.equal(L.end())) {
    cout << itr.get() << endl;
    itr.next();
  }
}
or
#include "List.h"
#include <iostream>
using namespace std;

int main() {
  // Read int's from the user and store in list L
  List<int> L;
  int k;
  while(cin >> k)
    L.add2back(k);

  // Print contents of the list
  for (List<int>::Iterator itr = L.begin(); !itr.equal(L.end()); itr.next()) {
    cout << itr.get() << endl;
  }
}

To understand why we can't just say Iterator and be done with it, imagine what would happen with our program if there were two lists, one storing doubles and one storing ints. When you said Iterator the compiler wouldn't know if it was an iterator for the list of double or for the list of ints. You might object "well why can't we say Iterator<double> and Iterator<int>?" Well, if things worked that way, then only Lists could have iterators. As we use more and more data structures you'll find that many of them ought to have iterators. So we need to be able to specify exactly what type of data structure and iterator belongs too, and List<double>::Iterator does that!