What should we do if we have a large amount of related data, such as the ages of 1000 people? We could make a separate variable for each of those values, but that would require 1000 lines of code and the same number of variable names to keep track of. What if that number is in the hundreds of thousands? Arrays are a tool that makes problems like this much easier.
An array is a container for data which can store anywhere from a handful to many millions of separate values, each of which can easily be accessed by a corresponding index. The index of a value in an array is its location within that array, with the first element at index 0 and the last element at index (length of array - 1). If we are trying to retrieve a value from an array, as long as we know its index, we can retrieve it with square brackets ([ ]). Here is a simple example:
my_array = [3, 2, 1]
x = my_array[1]
print(x)
>>2
An array is declared and assigned by using the equals sign and putting the values on the right-hand side of the assignment, between square brackets and separated by commas. In the above example, we specified three values in our array, so it is of length three. You can access a certain element of the array through indexing, with the index inside brackets after the array’s variable name, as shown above.
Not only can we retrieve the value of an element in an array through indexing, but we can also modify the value of the element through the exact same method, by putting the indexed element on the left side of the assignment.
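As a small sketch of the index range described above (the name ages and its values are made up for illustration), the first element sits at index 0 and the last at index len(ages) - 1. Asking for an index past the end raises an IndexError rather than returning a value:

```python
# Hypothetical data: the ages of five people.
ages = [31, 25, 47, 19, 62]

first = ages[0]              # the first element is at index 0
last = ages[len(ages) - 1]   # the last element is at index (length - 1)
print(first, last)

# Indexing past the end raises an IndexError instead of returning a value.
try:
    ages[len(ages)]
    out_of_range = False
except IndexError:
    out_of_range = True
print(out_of_range)
```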
my_array = [3, 2, 1]
print(my_array)
my_array[0] = 1
print(my_array)
>>[3, 2, 1]
[1, 2, 1]
Unlike arrays in many other languages, Python arrays can hold data of many different types:
my_array = [4, 3.1, "Hello"]
print(my_array)
>>[4, 3.1, 'Hello']
Although I have been calling them arrays up to this point, Python’s terminology does not actually call these objects “arrays”. Instead, Python has three main array-like data types: lists, tuples, and dictionaries. All three are similar, but each excels in its own niche.
Lists are the most array-like of the three, and they are the data type you will probably use the most. In the examples above, we were using lists. Lists are declared and defined using square brackets, and we can initialize them with any values we like. A key aspect of lists is that they are mutable, meaning we can change their contents after creation, including altering their size by adding and removing elements whenever we want. For this reason, it is common practice to initialize a list as empty and fill it with a for loop, like this:
my_list = []
for i in range(10):
    my_list.append(i)
print(my_list)
>>[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Here, we use the function append(), which grows the list my_list by one slot and fills that slot with the value passed as its argument. I believe this is our first encounter with the type of function that is called with a period (.). append() is an example of a member function, which Python usually calls a method. Methods are functions that can only be called on a certain type of object. Objects are instances of classes. This is a lot of terminology associated with the structure known as a class, which we will explore in greater depth later on. For now, understand that this specific append() function can only be called on a list, and for that reason we use the dot syntax.
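append() is not the only function a list can call with the dot syntax. As a brief sketch, here are three other built-in ones, insert(), remove(), and pop(), which add and remove elements in different ways. The values are arbitrary and chosen only for illustration:

```python
my_list = [3, 2, 1]

my_list.append(7)      # add 7 to the end            -> [3, 2, 1, 7]
my_list.insert(0, 9)   # insert 9 at index 0         -> [9, 3, 2, 1, 7]
my_list.remove(2)      # remove the first 2 it finds -> [9, 3, 1, 7]
last = my_list.pop()   # remove and return the last element (7)

print(my_list)
print(last)
```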
You probably noticed in the above example we used a for loop to fill out our list. For loops are often used in conjunction with lists. In many scenarios, we may need to apply an operation to each element in a list, separately. To do so, we must loop over each element in the list so that we can access every element, without missing any. Here is how we might do that:
my_array = [0, 8, 1]
for i in range(len(my_array)):
    my_array[i] = my_array[i] + 5
print(my_array)
>>[5, 13, 6]
The first step is to recognize that we want a for loop which will start at i=0, the first index of the array, and end at, in this case, i=2, which is the last index of the array. Instead of hard-coding range(3), we use range(len(my_array)). The len() function returns the length of the array passed into it, in this case three. So, by using len(), we can loop over an array whose length we do not know in advance, a handy feature if we are implementing these instructions inside a function, where the array passed in is not necessarily the same length every time. This also highlights an interesting feature of range(x), which is that it will start at 0 and end at x-1. Try print(list(range(10))) to see this in action.
Since our goal is to add five to each array element, inside the for loop we index our array by the iteration variable, i, so that each index is covered. We then set each element of my_array to its current value plus five.
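To illustrate why len() is handy inside a function, here is a minimal sketch. The function name add_five is made up for this example, and it returns a new list rather than changing the one passed in, which is just one possible design:

```python
def add_five(values):
    """Return a new list with 5 added to every element of values."""
    result = []
    for i in range(len(values)):   # works for a list of any length
        result.append(values[i] + 5)
    return result

short = add_five([0, 8, 1])
longer = add_five([10, 20, 30, 40])
print(short)
print(longer)
```

When the index itself is not needed, Python also allows looping over the elements directly with `for value in values:`.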
If you do not have a strong grasp on the concepts so far, I recommend implementing your own lists and indexing them to get a feel of how they work, before we move on to two-dimensional lists.
For more complicated tasks, we might want lists whose elements are also lists, like in this example:
my_list = [[2, 3], [1, 0], [4, 2]]
Pay attention to the comma and bracket usage here to understand what is going on. This kind of structure would be useful in a scenario where we have data points such as age and gender, for each of many individuals.
Here, we have a list of length three, and each of its elements is a list of length two. This is an example of a two-dimensional list. The first dimension is the outermost list, so the one of length three, and the second dimension is the next outermost, or the one of length two. We can access every configuration of elements through clever indexing. Let’s see what happens when we only index by the first dimension of the array:
my_list = [[2, 3], [1, 0], [4, 2]]
for i in range(len(my_list)):
    print(my_list[i])
>>[2, 3]
[1, 0]
[4, 2]
We can see that the for loop is executed three times, and each time we print the list corresponding to the iteration. This is no different from the situation of a one dimensional list, except instead of the data contained in the element being an integer or float, it is another list. But what if we want to access a value inside a list of the second dimension? To do this, we can introduce a second index:
my_list = [[2, 3], [1, 0], [4, 2]]
val = my_list[2][0]
print(val)
>>4
We can have as many indices as we have dimensions in a list. Here we have two. The first index specifies which element of the first dimension we want to access (in this case the list at index 2), and the second index specifies which value we want from that inner list (in this case, the first value, at index 0).
What if we want to write a for loop, similar to the one that loops through each element in a one dimensional list, but for a two dimensional one? We could use a nested for loop, with each for loop looping through its own dimension of the list:
my_list = [[2, 3], [1, 0], [4, 2]]
for i in range(len(my_list)):
    for j in range(len(my_list[0])):
        print(my_list[i][j])
>>2
3
1
0
4
2
Notice that the number of iterations for the first for loop is len(my_list). As we have seen, this returns the length of the first dimension of my_list, three. However, for the second for loop, we don’t want it to loop three times, since there are only two elements in the second dimension of this list. So, we instead loop for len(my_list[0]) times. The first element of my_list is the list, [2, 3], which is of length two, so len(my_list[0]) = 2. For every extra dimension we might want to loop through, we would add another for loop and another variable of iteration.
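Using len(my_list[0]) assumes every inner list has the same length as the first one. If that is not guaranteed, one possible variation is to ask for the length of each inner list as we reach it. The list ragged below is a made-up example with inner lists of different lengths:

```python
# Hypothetical "ragged" list: the inner lists have different lengths.
ragged = [[2, 3], [1], [4, 2, 5]]

seen = []
for i in range(len(ragged)):
    for j in range(len(ragged[i])):   # length of the current inner list
        print(ragged[i][j])
        seen.append(ragged[i][j])
```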
Lists, and especially multi-dimensional lists, are tough concepts to wrap your head around. Once again, the best way to understand them is to head into your IDE and practice by creating your own lists and trying to index and loop through them.
If you feel confident with lists, here is a reference for other member functions lists can call, similar to .append(): https://www.w3schools.com/python/python_ref_list.asp
The next array-like structure is the tuple. We already saw one instance of the tuple, which pops up when we return more than one value from a function. The tuple is very similar to the list, except it is immutable: once it is created, its length is fixed and its elements cannot be reassigned. This limits what a tuple can do, but it also lets Python store tuples a little more compactly and create them a little faster than lists, since their size never changes. This benefit is not noticeable with small tuples and lists, but nevertheless it is there if we need it. Although lists can contain different data types, we often use tuples when we need an array-like structure whose elements are of different types, because their fixed length means we know more reliably what kind of data is in each position of a tuple.
To declare and initialize a tuple, we use the same syntax as a list, except that instead of square brackets ([ ]) we use parentheses (( )):
my_tuple = (2, "hello", [3, 2, 1])
print(my_tuple[2])
>>[3, 2, 1]
Note that we index a tuple in the exact same way we index a list. We can also take a tuple and directly assign its contents to a matching number of variables:
my_tuple = (2, "hello", [3, 2, 1])
num_cars, my_message, my_list = my_tuple
print(num_cars)
print(my_message)
print(my_list)
>>2
hello
[3, 2, 1]
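The two tuple behaviors mentioned earlier, returning several values from a function and being immutable, can be sketched together. The function min_and_max is a made-up helper for this example:

```python
def min_and_max(values):
    """Return the smallest and largest value packed in a tuple."""
    return min(values), max(values)

result = min_and_max([4, 9, 1])
print(result)         # both values come back in a single tuple
low, high = result    # unpack the tuple into two variables

# Tuples are immutable, so assigning to an element raises a TypeError.
try:
    result[0] = 100
    changed = True
except TypeError:
    changed = False
print(low, high, changed)
```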
Dictionaries are the final array-like structure. They are like lists in that they are mutable and indexable, but instead of being indexed by integer positions, their values are labeled by keys, which are most often strings. They are declared with a very different syntax, which can be explained most clearly through an example:
my_dict = {"fruit": "apple", "age": 3, "price": 2}
print(my_dict)
>>{'fruit': 'apple', 'age': 3, 'price': 2}
This is a dictionary of length three, but instead of “apple” being at index 0, 3 at index 1, and 2 at index 2, “apple” is said to correspond to the key, “fruit”, 3 to “age”, and 2 to “price”.
We would index this dictionary like a list, but instead of an integer we pass the name of the key, like so:
my_dict = {"fruit": "apple", "age": 3, "price": 2}
val = my_dict["age"]
print(val)
>>3
We might use dictionaries if we have a running collection of data points on someone or something, we want to keep adding or removing data points, and we want these data points to have descriptive keys which tell us what they represent, instead of just indices.
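As a short sketch of adding and removing data points, here is the same dictionary being changed in place. The "color" key and the new price are made up for illustration:

```python
my_dict = {"fruit": "apple", "age": 3, "price": 2}

my_dict["color"] = "red"   # add a new key/value pair
my_dict["price"] = 4       # overwrite the value stored under an existing key
del my_dict["age"]         # remove a key together with its value

# Loop over every key together with its value.
for key, value in my_dict.items():
    print(key, value)
```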
For more information on dictionaries, check out this link: https://www.w3schools.com/python/python_dictionaries.asp
Array-like structures are essential knowledge for robotics and machine learning. You will encounter them everywhere, and getting to know them can make your life so much easier. Later on we will learn about NumPy, a Python package which makes working with arrays more convenient, and if you understand these concepts, making the jump will not be a problem. Thanks for reading!
Challenge Activities:
Create a function that takes in a one-dimensional array of integers and returns a copy of it sorted from smallest to largest.
Create a three-dimensional list, where the first dimension has length five, the second has length three, and the third has length two. Then, loop through the entire list and print each value.