An array is one of the most common data types. It is also one of the most fundamental data structures. In this article, we would like to discuss its properties and try to answer the following question: why does the indexing of an array always start from 0?
Array and Random Access
An array is one kind of linear lists. It makes use of a contiguous block in memory to store a set of data with the same type.
A linear list means that data are stored in a linear structure (think of a line). At each node (datum), the relative position of other nodes is either in front or back.
Besides arrays, some other common linear list structures include linked list, queue, and stack.
There are also many data structures that are non-linear, e.g., tree (BT, BST), graph, and heap. In those non-linear structures, the relative connections between nodes are much more complicated.
A contiguous block in memory and a collection of data with the same type both contribute to the most crucial property of array: random accessibility. However, these two restrictions also cause problems of inefficient insertion and deletion in an array since we need to do extra work of moving data to ensure they are still living in a contiguous area.
So how does an array randomly accesses its items by their index? The computer acquires data in memory by their address. For arrays, we have the following formula:
a[i]_address = base_address + i * data_type_size
As an example, for an integer array,
data_type_size is 4. It is easy to see that the time complexity for accessing an element in an array is O(1). Note that it is random accessing an element in an array that costs O(1), but not searching an element. Even if an array is sorted, searching an element will cost O(log n), not O(1).
Inefficient Insertion and Deletion
As we all know, the requirement of contiguous memory address results in inefficient insertions and deletions for an array, whose time complexity is O(n) for both.
Container v.s. Array
The advantage over using a raw array to use a container, e.g.,
vector in C++, is that a container typically hides the details of various array manipulations and it enables dynamic expansion of the size. However, in some cases, an array is a better choice, for example:
we already know the size of data and simple manipulations are enough to complete the task
we do not want to pay extra overhead (even little) to use a container, especially in low-level development where we must be extremely efficient to optimize the performance
Why does indexing start from 0?
One explanation for “index” is “offset”, which measures the offset distance from the starting position of the array. Let
a be the starting memory address of an array, then
a stands for the element that has an offset 0, which is exactly the first element. Similarly,
a[k] stands for the element that has an offset
k * data_type_size. To acquire the value of
a[k], we only need to calculate
a[k]_address = base_address + k * data_type_size
Now let us suppose the indexing of array starts from 1, the above formula for finding
a[k]_address = base_address + (k-1) * data_type_size
In this formula, there is an extra subtraction, which is inefficient since the array is a very primary data structure and finding its elements by indexing is one of the most fundamental operations. We should try our best to optimize the performance, and thus indexing of array always starts from 0, not 1.
There may be some historical reasons that also explains this well. When the feature is implemented in C language initially, other later languages, such as Java, adopt this practice to reduce the learning cost. In fact, in some language array indexing does not start from 0, e.g., Matlab. Additionally, some language even supports negative indexing, e.g., Python.