NumPy is one of the most fundamental packages for scientific computing in Python. At its core is the NumPy array – an efficient multidimensional container for homogeneous data.
In this comprehensive tutorial, you'll learn:
- The key differences between NumPy arrays and Python lists
- How to create NumPy arrays using a variety of methods
- How to manipulate arrays by slicing, indexing, and reshaping
- Broadcasting and vectorized operations with NumPy
- Aggregations like summing arrays, finding min/max
- Sorting and searching arrays
- Combining multiple arrays together (concatenation, stacking)
- Working with arrays in higher dimensions
This tutorial will give you a deep understanding of arrays in NumPy through practical examples and clear explanations. Let's get started!
NumPy Arrays vs Python Lists
The key difference between NumPy arrays and Python's built-in lists is that an array is a fixed-size, multidimensional container of homogeneous elements. Lists can grow and shrink as you add and remove items, whereas a NumPy array's size is fixed at creation (its elements can still be modified in place).
Consider this simple Python list:
py_list = [1, 2, 3, 4]
We can convert this list to a NumPy array:
import numpy as np
np_arr = np.array(py_list)
np_arr is now a 1D NumPy array with four elements. Let's go over some key differences:
- Data types: Every element of a NumPy array has the same data type (dtype). NumPy infers it from the input, for example a platform-dependent integer type for a list of Python ints, while functions such as np.zeros() default to np.float64 (64-bit float). Lists can contain mixed types.
- Size: NumPy arrays have a fixed size (the shape is defined at creation). Lists can keep growing dynamically.
- Contiguous memory: Arrays store data in a contiguous block of memory, leading to efficient lookups and computations. A list stores references to objects that can be scattered in memory.
- Broadcasting: Vectorized operations on arrays apply element-wise. Scalars broadcast across the dimensions.
- Dimensionality: Arrays are natively multidimensional: 1D, 2D, 3D and beyond. A list is a 1D sequence, so higher dimensions have to be emulated by nesting lists.
These attributes make NumPy arrays ideal for scientific computing tasks. The vectorized operations, broadcasting capabilities and multidimensional nature of arrays enable fast computations without loops.
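To make these differences concrete, here is a small sketch contrasting how the same operators behave on a list and on an array. The variable names are illustrative only:

```python
import numpy as np

py_list = [1, 2, 3, 4]
np_arr = np.array(py_list)

# Multiplying a list repeats it; multiplying an array scales each element.
doubled_list = py_list * 2       # [1, 2, 3, 4, 1, 2, 3, 4]
doubled_arr = np_arr * 2         # array([2, 4, 6, 8])

# Adding two lists concatenates them; adding two arrays sums element-wise.
summed_list = py_list + py_list  # [1, 2, 3, 4, 1, 2, 3, 4]
summed_arr = np_arr + np_arr     # array([2, 4, 6, 8])

# A list can mix types; an array has one dtype shared by all elements.
mixed_list = [1, "two", 3.0]
print(np_arr.dtype.kind)         # 'i' (an integer dtype)
```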
Now let's go over how to create NumPy arrays.
Creating NumPy Arrays
There are a variety of functions to create new arrays in NumPy:
From Python Lists
We already saw that you can create an array from a list using np.array(). For example:
data = [1, 2, 3, 4]
arr = np.array(data)
One thing to note – if your list contains elements of different numeric types, NumPy will upcast them to a common type:
data = [1, 2, 3, 4.0]
arr = np.array(data)
print(arr.dtype)
# float64 (a string element such as '3' would instead turn the whole array into a string dtype)
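As a rough illustration of how dtype inference plays out, the sketch below builds a few arrays and inspects their dtypes. The `dtype` argument and `astype()` are standard NumPy; the variable names are made up for this example:

```python
import numpy as np

ints = np.array([1, 2, 3])             # integer dtype (exact width is platform-dependent)
mixed = np.array([1, 2, 3.5])          # ints are upcast to float64
with_text = np.array([1, 2, "3"])      # everything becomes a Unicode string

# You can also request a dtype explicitly instead of relying on inference.
forced = np.array([1, 2, 3], dtype=np.float32)

# astype() converts to another dtype; float -> int truncates toward zero.
converted = mixed.astype(np.int64)     # [1 2 3]

print(ints.dtype.kind, mixed.dtype, with_text.dtype.kind, forced.dtype)
```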
Ones and Zeros
You can create arrays filled with zeros or ones:
np.zeros(shape) #creates array of all 0s
np.ones(shape) #creates array of all 1s
Shape is a tuple that defines each dimension, for example:
np.zeros((2, 3)) # 2D array with 2 rows, 3 columns
Passing an integer creates a 1D array:
np.ones(5) # 1D array of length 5
Ranges
The arange() function generates arrays with sequential elements (stop is excluded):
np.arange(start, stop, step)
For example:
np.arange(5) # [0 1 2 3 4]
np.arange(1, 5) # [1 2 3 4]
np.arange(1, 10, 2) # [1 3 5 7 9]
Linspace
np.linspace() returns evenly spaced numbers over an interval, including both endpoints:
np.linspace(start, stop, num_elements)
For example:
np.linspace(0, 5, 11)
# [0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5 5. ]
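The main practical difference between arange() and linspace() is how the end of the interval is handled. A short sketch, using values chosen so the results are exact in floating point:

```python
import numpy as np

# arange: you choose the step; the stop value is excluded.
a = np.arange(0, 1, 0.25)                 # [0.   0.25 0.5  0.75]

# linspace: you choose the number of points; the stop value is included.
b = np.linspace(0, 1, 5)                  # [0.   0.25 0.5  0.75 1.  ]

# endpoint=False makes linspace exclude the stop value as well.
c = np.linspace(0, 1, 4, endpoint=False)  # [0.   0.25 0.5  0.75]

print(a, b, c)
```

When the step is not exactly representable as a float, linspace() is usually the safer choice because the number of elements is fixed up front.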
Identity Matrix
The identity matrix is a square matrix with 1s along the diagonal and 0s elsewhere. Pass in the number of rows (and columns) n:
np.identity(n)
For example, a 3×3 identity matrix:
np.identity(3)
array([[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]])
Random
There are several ways to generate random number arrays:
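from numpy import random
random.rand(d0, d1, ...) # uniform distribution on [0, 1); dimensions are separate arguments, not a tuple
random.randn(d0, d1, ...) # standard normal distribution; dimensions are separate arguments
random.randint(low, high, size) # random integers in [low, high); size is an int or a shape tuple
For example:
random.rand(3, 3) # 3x3 of random floats
random.randn(5) # 1D array of 5 random normally distributed numbers
random.randint(0, 10, (3,3)) # 3x3 random ints from 0 to 9 (10 is excluded)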
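As an alternative sketch, newer NumPy code often uses the Generator interface created with np.random.default_rng(), which takes shapes as tuples and accepts a seed for reproducibility. The seed value 42 below is an arbitrary choice for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

uniform = rng.random((3, 3))             # 3x3 floats drawn from [0, 1)
normal = rng.standard_normal(5)          # 5 draws from the standard normal
ints = rng.integers(0, 10, size=(3, 3))  # 3x3 ints from 0 to 9

# Re-creating the generator with the same seed reproduces the same numbers.
repeat = np.random.default_rng(seed=42).random((3, 3))
print(np.array_equal(uniform, repeat))   # True
```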
This covers the common ways to create new arrays from scratch. Now let's discuss how to manipulate existing arrays.
Array Indexing and Slicing
NumPy arrays support indexing and slicing with a syntax much like Python lists, extended to multiple dimensions, so you can read or modify whole subsets of an array at once.
Indexing
You can index elements at specific positions in the array:
arr = np.arange(10)
arr[0] #0
arr[4] #4
arr[-1] #9 (last element)
For 2D arrays, provide an index for each dimension:
arr_2d = np.zeros((3,5))
arr_2d[1, 3] # row 1, column 3
You can also pass arrays of indices:
ind = np.array([1, 5, -3])
arr[ind] # [1 5 7]
Slicing
You can slice NumPy arrays similar to Python lists:
arr[start:stop] # elements from start to stop-1
arr[start:] # elements from start to end
arr[:stop] # elements from beginning up to stop-1
For 2D arrays, provide a slice for each dimension:
arr_2d[:2,::-1] # first two rows, with the column order reversed
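Slicing is easier to follow on an array whose values differ, so here is a small sketch on a 3x4 grid (the name `grid` is just for this example):

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

first_two_rows = grid[:2]        # rows 0 and 1, all columns
second_column = grid[:, 1]       # [1 5 9]
corner = grid[:2, 2:]            # [[2 3], [6 7]]
reversed_cols = grid[:2, ::-1]   # [[3 2 1 0], [7 6 5 4]]
every_other = grid[::2, ::2]     # [[0 2], [8 10]]

print(corner)
```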
Fancy Indexing
Fancy indexing provides more powerful capabilities, allowing you to select arbitrary subsets of an array with integer or boolean arrays. Unlike slicing, it returns a copy rather than a view:
arr[[1, 5, 7]] # elements at indices 1, 5, 7
bool_idx = (arr > 5)
arr[bool_idx] # elements greater than 5
rows = np.array([0, 2])
cols = np.array([1, 3])
arr_2d[rows, cols] # elements at [0,1] and [2,3]
In essence, fancy indexing lets you pick out exactly the elements you want in a single vectorized operation.
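Here is a minimal sketch of fancy indexing on a 2D array. It uses np.ix_() to select a full sub-grid, which is one common way to combine row and column index arrays; the array and names are illustrative:

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
rows = np.array([0, 2])
cols = np.array([1, 3])

# Paired indices: picks elements (0, 1) and (2, 3).
pairs = grid[rows, cols]            # [ 1 11]

# np.ix_ builds an open mesh: rows 0 and 2 crossed with columns 1 and 3.
block = grid[np.ix_(rows, cols)]    # [[ 1  3], [ 9 11]]

# Boolean mask: keeps the elements where the condition holds.
big = grid[grid > 8]                # [ 9 10 11]

# Fancy indexing returns a copy, so changing it leaves grid untouched.
pairs[0] = -1
print(grid[0, 1])                   # 1
```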
Now that you know how to slice and dice arrays, let's go over how to modify arrays.
Modifying Arrays
When you assign values into an array via indexing, the original array is modified:
arr = np.zeros(5)
arr[0] = 1
print(arr)
# [1. 0. 0. 0. 0.]
For fancy indexing, broadcasting allows us to assign to multiple elements simultaneously:
arr = np.zeros(5)
ind = [1, 3, 4]
arr[ind] = 5
print(arr)
# [0. 5. 0. 5. 5.]
We can also modify entire slices:
arr[1:4] = 3
print(arr)
# [0. 3. 3. 3. 5.]
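The same assignment syntax works with boolean masks and stepped slices. A short sketch with made-up sensor readings:

```python
import numpy as np

readings = np.array([3.0, -1.0, 4.0, -2.0, 5.0])

# Boolean-mask assignment: replace every negative value with 0.
readings[readings < 0] = 0.0
print(readings)   # [3. 0. 4. 0. 5.]

# In-place arithmetic on a slice: add 10 to every other element.
readings[::2] += 10
print(readings)   # [13.  0. 14.  0. 15.]
```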
Copying Arrays
Often we want to copy the original array before making changes. This can be done via:
arr_copy = arr.copy()
Now arr_copy references a new array, separate from the original arr.
Slicing and the view() method, by contrast, create views (shallow copies) of an array:
arr_view = arr[:] # slicing
arr_view = arr.view() # view method
arr_float = arr.astype(float) # not a view: astype returns a new copy by default
The key difference between views and deep copies is that views share data in memory with the original. So changes to values are reflected across both. Deep copies fully allocate new memory.
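The difference is easy to check for yourself. The sketch below modifies a view and a copy and uses np.shares_memory() to confirm which one is tied to the original:

```python
import numpy as np

arr = np.arange(5)            # [0 1 2 3 4]
view = arr[1:4]               # a view onto elements 1..3
copy = arr[1:4].copy()        # an independent copy of the same elements

view[0] = 100                 # writes through to arr
copy[1] = -1                  # affects only the copy

print(arr)                    # [  0 100   2   3   4]
print(copy)                   # [ 1 -1  3]
print(np.shares_memory(arr, view))   # True
print(np.shares_memory(arr, copy))   # False
```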
Reshaping Arrays
The shape of an array defines the dimensionality and length along each axis.
You can reshape arrays to alter the dimensions without changing the data:
arr = np.arange(6) # [0 1 2 3 4 5]
arr.reshape(3, 2)
# [[0 1]
# [2 3]
# [4 5]]
arr.reshape(2, 3)
# [[0 1 2]
# [3 4 5]]
The -1 value lets NumPy infer one of the dimensions:
arr.reshape(3, -1)
Reshaping returns a new view on the data whenever possible (and a copy otherwise). The shape of the original array remains unchanged.
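A brief sketch of these reshape behaviors, including what happens when the requested shape does not fit the number of elements:

```python
import numpy as np

arr = np.arange(6)             # [0 1 2 3 4 5]

grid = arr.reshape(2, -1)      # -1 is inferred as 3, giving shape (2, 3)
print(grid.shape)              # (2, 3)
print(arr.shape)               # (6,)  the original keeps its shape

# For a contiguous array like this one, reshape returns a view,
# so writing through grid also changes arr.
grid[0, 0] = 99
print(arr[0])                  # 99

# The total number of elements must match, otherwise NumPy raises ValueError.
error_message = None
try:
    arr.reshape(4, 2)
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```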
Transposing Arrays
The transpose switches the rows and columns:
arr = np.zeros((3, 5))
arr_t = arr.T
print(arr_t.shape)
# (5, 3)
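For higher dimensional arrays, pass the new order of the axes to transpose(). For example, to swap the last two axes of a 3D array:
arr_3d = np.zeros((2, 3, 4))
arr_3d.transpose((0, 2, 1)) # shape (2, 4, 3)
Transposing creates a view of the original data without copying.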
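As an illustrative check on a 3D array, the sketch below compares transpose() with an explicit axis order, the .T attribute, and np.swapaxes():

```python
import numpy as np

arr_3d = np.arange(24).reshape(2, 3, 4)

swapped = arr_3d.transpose((0, 2, 1))     # keep axis 0, swap axes 1 and 2
print(swapped.shape)                      # (2, 4, 3)

reversed_axes = arr_3d.T                  # .T reverses the order of all axes
print(reversed_axes.shape)                # (4, 3, 2)

# Element [i, j, k] of the original sits at [i, k, j] after the swap.
print(arr_3d[0, 2, 3], swapped[0, 3, 2])  # 11 11

# np.swapaxes expresses the same two-axis swap more directly.
same = np.array_equal(swapped, np.swapaxes(arr_3d, 1, 2))
print(same)                               # True
```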
Broadcasting
Broadcasting allows vectorized operations to be performed on arrays of different sizes. NumPy expands the smaller array to "broadcast" across the larger one:
arr1 = np.zeros((4,5))
arr2 = np.ones((5))
arr1 + arr2
# [[1. 1. 1. 1. 1.]
# [1. 1. 1. 1. 1.]
# [1. 1. 1. 1. 1.]
# [1. 1. 1. 1. 1.]]
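arr2 is broadcast across the rows of arr1.
Rules for broadcasting:
- Shapes are compared dimension by dimension, from right to left (trailing dimensions first)
- Two dimensions are compatible when they are equal or when one of them is 1; a dimension of size 1 is stretched to match the other
- If one array has fewer dimensions, 1s are prepended to its shape until both shapes have the same length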
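To see broadcasting at work, the sketch below combines a column of shape (3, 1) with a row of shape (4,), and then shows a pair of shapes that cannot be broadcast. The names are illustrative:

```python
import numpy as np

col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4)                 # shape (4,), treated as (1, 4)

# (3, 1) and (1, 4) broadcast to (3, 4).
table = col * 10 + row
print(table)
# [[ 0  1  2  3]
#  [10 11 12 13]
#  [20 21 22 23]]

# Shapes (3,) and (4,) are incompatible: the trailing dimensions
# differ and neither is 1, so NumPy raises a ValueError.
error_message = None
try:
    np.arange(3) + np.arange(4)
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```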
Vectorized Operations and Ufuncs
Vectorization allows arithmetic and logical operations to be performed element-wise on arrays. This is much faster than using Python loops.
For example:
arr1 = np.array([1, 2, 3])
arr2 = np.array([0, 2, 2])
arr1 + arr2 # [1 4 5]
arr1 > 1 # [False True True]
NumPy provides unary and binary universal functions (ufuncs) that also operate element-wise:
np.sqrt(arr1) # square root
np.add(arr1, arr2) # addition
np.maximum(arr1, arr2) # element-wise maximum
These provide fast element-wise computations without needing to use Python for loops.
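As a small sketch, here is the same temperature conversion written as a Python loop and as a vectorized expression. The data is invented for the example:

```python
import numpy as np

celsius = np.array([0.0, 25.0, 100.0])

# Loop version: converts one element at a time in Python.
fahrenheit_loop = np.array([c * 9 / 5 + 32 for c in celsius])

# Vectorized version: one expression applied to the whole array.
fahrenheit_vec = celsius * 9 / 5 + 32
print(fahrenheit_vec)                 # [ 32.  77. 212.]

# np.where is a vectorized if/else: 1 where the condition holds, else 0.
is_hot = np.where(celsius > 30, 1, 0)
print(is_hot)                         # [0 0 1]
```

Both versions give the same result; the vectorized one avoids the per-element overhead of the Python interpreter.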
Aggregations
It's common to need aggregations such as sums and means across an entire array:
arr = np.random.randn(5)
arr.sum()
arr.mean()
arr.max()
arr.min()
You can compute aggregations along a specific axis for multidimensional arrays:
arr_2d = np.random.randint(0, 10, (3,5))
arr_2d.sum(axis=0) # sums of each column
arr_2d.sum(axis=1) # sums of each row
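Because the arrays above are random, it can help to check the axis behavior on fixed values. A sketch with a small hand-written array:

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(scores.sum())            # 21          all elements
print(scores.sum(axis=0))      # [5 7 9]     one sum per column
print(scores.sum(axis=1))      # [ 6 15]     one sum per row
print(scores.mean(axis=1))     # [2. 5.]     one mean per row
print(scores.max(axis=0))      # [4 5 6]     column-wise maximum

# keepdims=True keeps the reduced axis with length 1, which is
# convenient for broadcasting the result back against the input.
row_totals = scores.sum(axis=1, keepdims=True)
print(row_totals.shape)        # (2, 1)
```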
Sorting Arrays
NumPy provides convenience methods to sort arrays in ascending order:
arr = np.random.randint(0, 10, 10)
# To get the indices that would sort the array:
ind = arr.argsort()
arr.sort() # sorts the array in place
# Sort along a specific axis:
arr_2d = np.random.randint(0, 10, (3,3))
arr_2d.sort(axis=1) # sort each row
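A deterministic sketch of the sorting tools, showing the difference between the copying function np.sort() and in-place sorting, and how argsort() can reorder a second array. The `labels` array is hypothetical:

```python
import numpy as np

arr = np.array([3, 1, 2])
labels = np.array(["c", "a", "b"])

sorted_copy = np.sort(arr)     # returns a sorted copy: [1 2 3]
print(arr)                     # [3 1 2]  the original is unchanged

order = np.argsort(arr)        # indices that would sort arr: [1 2 0]
print(labels[order])           # ['a' 'b' 'c']  reordered to match

descending = np.sort(arr)[::-1]
print(descending)              # [3 2 1]
```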
Finding Unique Elements
You can extract the unique elements in an array using np.unique():
arr = np.array([2, 1, 3, 1, 2])
unique_vals = np.unique(arr)
# [1 2 3]
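np.unique() can also report how often each value occurs and where it first appears, through its return_counts and return_index options. A short sketch:

```python
import numpy as np

arr = np.array([2, 1, 3, 1, 2])

values, counts = np.unique(arr, return_counts=True)
print(values)      # [1 2 3]
print(counts)      # [2 2 1]

values, first_index = np.unique(arr, return_index=True)
print(first_index) # [1 0 2]  position of the first occurrence of each value
```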
Set Operations
Let's say you have two arrays and want to find the intersection or difference. NumPy provides set functions to make this easy:
A = np.array([1, 2, 3, 4])
B = np.array([2, 4, 5])
np.intersect1d(A, B) # [2 4]
np.setdiff1d(A, B) # [1 3] Unique to A
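A couple of related functions round out the picture: np.union1d() for the union and np.isin() for an element-wise membership test. A sketch on the same two arrays:

```python
import numpy as np

A = np.array([1, 2, 3, 4])
B = np.array([2, 4, 5])

union = np.union1d(A, B)       # sorted unique values from both: [1 2 3 4 5]
in_b = np.isin(A, B)           # which elements of A appear in B
print(union)
print(in_b)                    # [False  True False  True]
print(A[in_b])                 # [2 4]  same values as np.intersect1d(A, B)
```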
Concatenating and Splitting
You can concatenate multiple arrays together:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
np.concatenate([arr1, arr2])
# [1 2 3 4 5 6]
Specify axis=0 to stack arrays vertically:
arr1 = np.zeros((2,3))
arr2 = np.ones((2,3))
np.concatenate([arr1, arr2], axis=0)
For splitting, use np.split(), np.hsplit(), np.vsplit():
arr = np.arange(16).reshape(4, 4) # vsplit needs at least 2 dimensions
top, bottom = np.vsplit(arr, 2) # Split vertically into two 2x4 arrays
left, right = np.hsplit(arr, 2) # Split horizontally into two 4x2 arrays
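To tie concatenation and splitting together, the sketch below joins two 2D arrays along each axis, splits the result back apart, and uses np.split() with explicit split points:

```python
import numpy as np

a = np.zeros((2, 3))
b = np.ones((2, 3))

vertical = np.concatenate([a, b], axis=0)     # rows stacked: shape (4, 3)
horizontal = np.concatenate([a, b], axis=1)   # columns joined: shape (2, 6)
print(vertical.shape, horizontal.shape)

# Splitting the vertical result in two recovers the original arrays.
top, bottom = np.vsplit(vertical, 2)
print(np.array_equal(top, a), np.array_equal(bottom, b))   # True True

# np.split with a list of indices cuts at those positions.
parts = np.split(np.arange(8), [2, 5])
print(parts)   # [array([0, 1]), array([2, 3, 4]), array([5, 6, 7])]
```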
Stacking and Tiling Arrays
The stack() function joins a sequence of arrays along a new axis:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
np.stack([arr1, arr2])
# [[1 2 3]
# [4 5 6]]
Tiling repeats an array along axes:
arr = np.array([[1, 2], [3, 4]])
np.tile(arr, 2)
# [[1 2 1 2]
# [3 4 3 4]]
Specify tiling for each dimension:
np.tile(arr, (2,1)) # Tile rows
np.tile(arr, (1,3)) # Tile columns
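The effect of the axis argument to stack() and of the per-dimension repetition counts in tile() is easiest to see from the results. A sketch with small arrays:

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

as_rows = np.stack([arr1, arr2], axis=0)     # shape (2, 3)
as_columns = np.stack([arr1, arr2], axis=1)  # shape (3, 2)
print(as_columns)
# [[1 4]
#  [2 5]
#  [3 6]]

arr = np.array([[1, 2], [3, 4]])
tiled_rows = np.tile(arr, (2, 1))   # repeat twice vertically: shape (4, 2)
tiled_cols = np.tile(arr, (1, 3))   # repeat three times horizontally: shape (2, 6)
print(tiled_rows)
# [[1 2]
#  [3 4]
#  [1 2]
#  [3 4]]
print(tiled_cols)
# [[1 2 1 2 1 2]
#  [3 4 3 4 3 4]]
```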
Higher Dimensional Arrays
Everything we've covered extends to arrays of higher dimensions.
Create higher dimensional arrays by passing a shape tuple:
arr_3d = np.zeros((2,2,3)) # 3D array: 2 blocks, each with 2 rows and 3 columns
Indexing works by providing an index or slice for each dimension:
arr_3d[0,1] # row 1 of block 0
arr_3d[:,1,:] # row 1 of every block
Compute aggregations along a specified axis:
arr_3d.sum(axis=2) # sums the 3 elements of each row; result has shape (2, 2)
And reshape/transpose dimensions:
arr_3d.transpose((1,0,2)) # Swap first two axes
arr_3d.reshape(4,3) # Collapse to a 2D array with 4 rows
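With an array of zeros it is hard to tell which elements an operation touched, so here is the same kind of walk-through on a 3D array with distinct values (a sketch; `arr_3d` is filled with np.arange purely for readability):

```python
import numpy as np

arr_3d = np.arange(12).reshape(2, 2, 3)
# block 0: [[0 1 2], [3 4 5]]
# block 1: [[6 7 8], [9 10 11]]

print(arr_3d[0, 1])        # [3 4 5]  row 1 of block 0
print(arr_3d[:, 1, :])     # [[ 3  4  5], [ 9 10 11]]  row 1 of every block

print(arr_3d.sum(axis=2))  # [[ 3 12], [21 30]]  sum of each row
print(arr_3d.sum(axis=0))  # [[ 6  8 10], [12 14 16]]  blocks added together

print(arr_3d.transpose((1, 0, 2)).shape)  # (2, 2, 3)
print(arr_3d.reshape(4, 3).shape)         # (4, 3)
```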
Higher dimensional arrays add even more flexibility!
Conclusion
This covers the core concepts and tools you'll need to work effectively with NumPy arrays! The key takeaways:
- NumPy arrays provide an efficient container for homogeneous, multidimensional data
- They offer fast vectorized operations without Python for loops
- NumPy provides functions to generate new arrays and manipulate existing arrays
- Arrays enable calculations across higher dimensions
Ultimately, NumPy forms the foundation for much of the scientific computing in Python, so mastering arrays is critical.
To learn more, be sure to check out NumPy's official documentation. You can also browse my other NumPy tutorials covering specific functionality and use cases in-depth. NumPy has many more features we didn't cover here, like linear algebra routines, Fourier transforms, histograms, masked arrays and more!