
NumPy Arrays: A Comprehensive Introduction with Examples

NumPy is one of the most fundamental packages for scientific computing in Python. At its core is the NumPy array – an efficient multidimensional container for homogeneous data.

In this comprehensive tutorial, you'll learn:

  • The key differences between NumPy arrays and Python lists
  • How to create NumPy arrays using a variety of methods
  • How to manipulate arrays by slicing, indexing, and reshaping
  • Broadcasting and vectorized operations with NumPy
  • Aggregations like summing arrays, finding min/max
  • Sorting and searching arrays
  • Combining multiple arrays together (concatenation, stacking)
  • Working with arrays in higher dimensions

This tutorial will give you a deep understanding of arrays in NumPy through practical examples and clear explanations. Let's get started!

NumPy Arrays vs Python Lists

The key difference between NumPy arrays and Python's built-in lists is that arrays are fixed-size, multidimensional containers of homogeneous elements. A list can grow or shrink as you append and remove items, whereas an array's size is fixed at creation: you can change its elements in place, but changing its length means allocating a new array.

Consider this simple Python list:

py_list = [1, 2, 3, 4]

We can convert this list to a NumPy array:

import numpy as np

np_arr = np.array(py_list)

np_arr is now a 1D NumPy array with four elements. Let's go over some key differences:

  • Data types: Every element of a NumPy array shares one data type (dtype), which NumPy infers from the input. For np_arr it is a platform integer type, typically np.int64; creation functions such as np.zeros() default to np.float64 (64-bit float). Lists can contain mixed types.

  • Size: NumPy arrays have a fixed size (the shape is defined at creation). Lists can grow and shrink dynamically.

  • Contiguous memory: Arrays store their data in a contiguous block of memory, which makes lookups and computations efficient. A list stores references to objects that can be scattered in memory.

  • Broadcasting: Vectorized operations on arrays apply element-wise. Scalars broadcast across the dimensions.

  • Dimensionality: Arrays are multidimensional: 1D, 2D, 3D and beyond. A list is a 1D sequence; nesting lists imitates more dimensions, but without a shape or multi-axis indexing.

These attributes make NumPy arrays ideal for scientific computing tasks. The vectorized operations, broadcasting capabilities and multidimensional nature of arrays enable fast computations without loops.
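
To make the differences above concrete, here is a minimal sketch comparing the same operations on a list and on an array. The variable names (doubled_list, doubled_arr, shifted, grid) are illustrative and not part of the original example.

```python
import numpy as np

py_list = [1, 2, 3, 4]
np_arr = np.array(py_list)

# Multiplying a list repeats it; multiplying an array scales each element.
doubled_list = py_list * 2
doubled_arr = np_arr * 2

# A scalar broadcasts across every element of the array.
shifted = np_arr + 10

# Nested lists passed to np.array() become a true 2D array with a shape.
grid = np.array([[1, 2, 3], [4, 5, 6]])

print(doubled_list)   # [1, 2, 3, 4, 1, 2, 3, 4]
print(doubled_arr)    # [2 4 6 8]
print(shifted)        # [11 12 13 14]
print(grid.shape)     # (2, 3)
```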

Now let's go over how to create NumPy arrays.

Creating NumPy Arrays

There are a variety of functions to create new arrays in NumPy:

From Python Lists

We already saw that you can create an array from a list using np.array(). For example:

data = [1, 2, 3, 4] 

arr = np.array(data)

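One thing to note: if your list contains elements of varying types, NumPy will upcast them to a common type. Mixing integers and floats produces a float array:

data = [1, 2, 3, 4.0]
arr = np.array(data)

print(arr.dtype)
# float64

If any element is a string, such as '3', the common type becomes a Unicode string dtype instead and every number is converted to text.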
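
The sketch below shows how the inferred dtype depends on the input. The variable names are illustrative, and the exact integer and string dtypes can vary by platform and NumPy version, so the comments describe them loosely.

```python
import numpy as np

ints = np.array([1, 2, 3])            # all integers
mixed = np.array([1, 2, 4.0])         # integers and a float
with_text = np.array([1, 2, "3"])     # numbers and a string

print(ints.dtype)        # a platform integer type, int64 on most systems
print(mixed.dtype)       # float64
print(with_text.dtype)   # a Unicode string dtype; the numbers became text

# Converting the text back to numbers is an explicit step.
as_numbers = with_text.astype(float)
print(as_numbers)        # [1. 2. 3.]
```

If you know the type you want, passing dtype= to np.array() up front avoids surprises from inference.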

Ones and Zeros

You can create arrays filled with zeros or ones:

np.zeros(shape) #creates array of all 0s
np.ones(shape)  #creates array of all 1s 

Shape is a tuple that defines each dimension, for example:

np.zeros((2, 3)) # 2D array with 2 rows, 3 columns

Passing an integer creates a 1D array:

np.ones(5) # 1D array of length 5

Ranges

The np.arange() function generates arrays of sequential values. As with Python's range(), the stop value is excluded:

np.arange(start, stop, step)  

For example:

np.arange(5) # [0 1 2 3 4] 
np.arange(1, 5) # [1 2 3 4]
np.arange(1, 10, 2) # [1 3 5 7 9] 

Linspace

np.linspace() returns a given number of evenly spaced values over an interval, including both endpoints:

np.linspace(start, stop, num_elements)

For example:

np.linspace(0, 5, 11)
# [0.  0.5 1.  1.5 2.  2.5 3.  3.5 4.  4.5 5. ]
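
The practical difference between arange() and linspace() is how the end of the interval is handled. A small sketch, with illustrative variable names, that puts the two side by side:

```python
import numpy as np

# arange takes a step and excludes the stop value.
stepped = np.arange(0, 1, 0.25)
print(stepped.tolist())    # [0.0, 0.25, 0.5, 0.75]

# linspace takes a count and includes the stop value by default.
spaced = np.linspace(0, 1, 5)
print(spaced.tolist())     # [0.0, 0.25, 0.5, 0.75, 1.0]

# endpoint=False makes linspace exclude the stop value as well.
open_end = np.linspace(0, 1, 4, endpoint=False)
print(open_end.tolist())   # [0.0, 0.25, 0.5, 0.75]
```

With non-integer steps, linspace() is usually the safer choice because the number of elements is stated explicitly rather than derived from floating-point arithmetic.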

Identity Matrix

The identity matrix contains 1s along the diagonal and 0s elsewhere. Pass in the dimensionality:

np.identity(n) 

For example, 3×3 identity matrix:

np.identity(3)

array([[1., 0., 0.],
       [0., 1., 0.],  
       [0., 0., 1.]])

Random

There are several ways to generate arrays of random numbers:

from numpy import random

random.rand(d0, d1, ...) # uniform distribution over [0, 1); dimensions are separate arguments
random.randn(d0, d1, ...) # standard normal distribution; dimensions are separate arguments
random.randint(low, high, size) # random integers from low (inclusive) to high (exclusive)

For example:

random.rand(3, 3) # 3x3 array of random floats
random.randn(5) # 1D array of 5 normally distributed numbers
random.randint(0, 10, (3, 3)) # 3x3 array of random ints from 0 to 9
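
NumPy also offers a newer Generator interface, created with np.random.default_rng(), which the official documentation recommends for new code. The sketch below mirrors the three calls above using that interface; the seed value 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# A seeded generator produces the same numbers on every run.
rng = np.random.default_rng(seed=42)

uniform = rng.random((3, 3))             # floats in [0, 1), shape given as a tuple
normal = rng.standard_normal(5)          # 5 draws from the standard normal
ints = rng.integers(0, 10, size=(3, 3))  # integers from 0 to 9 (10 is excluded)

print(uniform.shape)   # (3, 3)
print(normal.shape)    # (5,)
print(ints.shape)      # (3, 3)
```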

This covers the common ways to create new arrays from scratch. Now let's discuss how to manipulate existing arrays.

Array Indexing and Slicing

NumPy arrays support indexing and slicing much like Python lists, with extensions for multiple dimensions. Unlike list slices, which are copies, basic slices of an array are views onto the original data.

Indexing

You can index elements at specific positions in the array:

arr = np.arange(10)

arr[0] #0
arr[4] #4
arr[-1] #9 (last element)

For 2D arrays, provide index for each dimension:

arr_2d = np.zeros((3,5)) 

arr_2d[1, 3] # row 1, column 3

You can also pass arrays of indices:

ind = np.array([1, 5, -3])
arr[ind] # [1 5 7]

Slicing

You can slice NumPy arrays similar to Python lists:

arr[start:stop]  # elements from start to stop-1  
arr[start:] # elements from start to end
arr[:stop] # elements from beginning up to stop-1

For 2D arrays provide slice for each dimension:

arr_2d[:2, ::-1] # first two rows, with the column order reversed

Fancy Indexing

Fancy indexing selects elements with arrays of indices or boolean masks. Unlike basic slicing, it returns a copy of the selected data rather than a view:

arr[[1, 5, 7]] # elements at indices 1, 5, 7

bool_idx = (arr > 5)
arr[bool_idx] # elements greater than 5

rows = np.array([0, 2])
cols = np.array([1, 3])
arr_2d[rows, cols] # elements at [0, 1] and [2, 3]

In essence, fancy indexing lets you pick out arbitrary elements of an array in a single vectorized operation.
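
Here is a runnable sketch of these selections on arrays with distinct values, so the results are easy to check. The 3x5 grid and the use of np.ix_() to select a whole sub-block are additions for illustration, not part of the original example.

```python
import numpy as np

arr = np.arange(10)
grid = np.arange(15).reshape(3, 5)   # rows: 0..4, 5..9, 10..14

# Boolean mask: keep the elements where the condition holds.
picked = arr[arr > 5]
print(picked)                        # [6 7 8 9]

# Paired index arrays select individual (row, column) positions.
rows = np.array([0, 2])
cols = np.array([1, 3])
pairs = grid[rows, cols]             # grid[0, 1] and grid[2, 3]
print(pairs)                         # [ 1 13]

# np.ix_ selects every combination of the given rows and columns.
block = grid[np.ix_(rows, cols)]
print(block.tolist())                # [[1, 3], [11, 13]]

# The result of fancy indexing is a copy: editing it leaves arr untouched.
picked[0] = -1
print(arr[6])                        # 6
```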

Now that you know how to slice and dice arrays, let's go over how to modify arrays.

Modifying Arrays

When you assign values into an array via indexing, the original array is modified:

arr = np.zeros(5)
arr[0] = 1
print(arr)
# [1. 0. 0. 0. 0.] 

For fancy indexing, broadcasting allows us to assign to multiple elements simultaneously:

arr = np.zeros(5)
ind = [1, 3, 4]
arr[ind] = 5
print(arr) 
# [0. 5. 0. 5. 5.]

We can also modify entire slices:

arr[1:4] = 3
print(arr)
# [0. 3. 3. 3. 5.] 

Copying Arrays

Often we want to copy the original array before making changes. This can be done via:

arr_copy = arr.copy()

Now arr_copy references a new array, separate from the original arr.

There are a few ways to create views of an array:

arr_view = arr[:]  # basic slicing
arr_view = arr.view() # view method

Note that arr.astype(float) is not a view: by default it returns a new array, even when the dtype is unchanged.

The key difference between views and copies is that a view shares its data in memory with the original, so changes to values show up in both. A copy allocates new memory of its own.
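
One way to check whether two arrays share data is np.shares_memory(). The following sketch, with illustrative variable names, modifies a view and a copy and shows which change reaches the original:

```python
import numpy as np

arr = np.zeros(5)

view = arr[1:4]               # basic slice: a view
copy = arr.copy()             # explicit copy
converted = arr.astype(float) # astype copies by default

view[0] = 7                   # writes through to arr[1]
copy[0] = 9                   # affects only the copy

print(arr)                                 # [0. 7. 0. 0. 0.]
print(np.shares_memory(arr, view))         # True
print(np.shares_memory(arr, copy))         # False
print(np.shares_memory(arr, converted))    # False
```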

Reshaping Arrays

The shape of an array defines the dimensionality and length along each axis.

You can reshape arrays to alter the dimensions without changing the data:

arr = np.arange(6) # [0 1 2 3 4 5]

arr.reshape(3, 2) 
# [[0 1]
#  [2 3] 
#  [4 5]] 

arr.reshape(2, 3)
# [[0 1 2]
#  [3 4 5]]  

The -1 value lets NumPy infer one of the dimensions:

arr.reshape(3, -1)

Reshaping returns a view of the data whenever the memory layout allows it, and a copy otherwise. Either way, the shape of the original array remains unchanged.
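
The sketch below illustrates how reshape() relates to the original data. It assumes a freshly created, contiguous array, in which case the reshaped result shares memory with it; reshaping a transposed array is shown as a case where NumPy has to copy.

```python
import numpy as np

arr = np.arange(6)

# -1 asks NumPy to work out the missing dimension: 6 / 3 = 2 columns.
grid = arr.reshape(3, -1)
print(grid.shape)                      # (3, 2)
print(arr.shape)                       # (6,)  the original keeps its shape

# grid shares data with arr here, so a write shows up in both.
grid[0, 0] = 99
print(arr[0])                          # 99

# Flattening the transpose cannot be done without rearranging memory.
flat = grid.T.reshape(-1)
print(flat)                            # [99  2  4  1  3  5]
print(np.shares_memory(arr, flat))     # False
```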

Transposing Arrays

The transpose switches the rows and columns:

arr = np.zeros((3, 5))
arr_t = arr.T 

print(arr_t.shape)
# (5, 3) 

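For higher dimensional arrays, pass transpose() a permutation of all the axes. For example, to swap the last two axes of a 3D array:

arr_3d = np.zeros((2, 3, 4))
arr_3d.transpose((0, 2, 1)).shape
# (2, 4, 3)

To swap just two axes you can also use arr_3d.swapaxes(1, 2). Transposing creates a view of the original data without copying.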

Broadcasting

Broadcasting allows vectorized operations to be performed on arrays of different sizes. NumPy expands the smaller array to "broadcast" across the larger one:

arr1 = np.zeros((4,5)) 
arr2 = np.ones((5)) 

arr1 + arr2 

"""
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
"""

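arr2 is broadcast across the rows of arr1.

Rules for broadcasting:

  • Shapes are compared dimension by dimension, from right to left (starting with the trailing dimension)
  • Two dimensions are compatible when they are equal or when one of them is 1; a dimension of size 1 is stretched to match the other
  • If the arrays have different numbers of dimensions, 1s are prepended to the shape of the smaller one until the lengths match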
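
As a rough illustration of how shapes combine, the sketch below adds a column of shape (3, 1) to a row of shape (4,) and then shows a pair of shapes that cannot be broadcast. The variable names are illustrative.

```python
import numpy as np

col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4)                 # shape (4,), treated as (1, 4)

# (3, 1) and (1, 4) broadcast to (3, 4): table[i, j] == i + j
table = col + row
print(table.shape)                 # (3, 4)
print(table.tolist())              # [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]

# Trailing dimensions 5 and 4 differ and neither is 1, so this fails.
try:
    np.zeros((4, 5)) + np.ones(4)
    compatible = True
except ValueError:
    compatible = False
print(compatible)                  # False
```

To add a length-4 array down the columns of a (4, 5) array, reshape it to (4, 1) first so that the trailing dimensions line up.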

Vectorized Operations and Ufuncs

Vectorization allows arithmetic and logical operations to be performed element-wise on arrays. This is much faster than using Python loops.

For example:

arr1 = np.array([1, 2, 3])
arr2 = np.array([0, 2, 2]) 

arr1 + arr2 # [1 4 5] 

arr1 > 1 # [False True True]

NumPy provides unary and binary universal functions (ufuncs) that also operate element-wise:

np.sqrt(arr1) # square root 
np.add(arr1, arr2) # addition
np.maximum(arr1, arr2) # element-wise maximum

These provide very fast element computations without needing to use Python for loops.
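
To show what "element-wise" means in practice, this sketch computes square roots once with an explicit Python loop and once with a ufunc, and confirms the two agree. It does not measure speed; the values array is an illustrative input.

```python
import math
import numpy as np

values = np.array([1.0, 4.0, 9.0, 16.0])

# Loop version: one Python-level call per element.
looped = [math.sqrt(v) for v in values]

# Vectorized version: a single ufunc call over the whole array.
vectorized = np.sqrt(values)
print(vectorized)                    # [1. 2. 3. 4.]
print(np.allclose(looped, vectorized))   # True

# Binary ufuncs combine two arrays element by element.
arr1 = np.array([1, 2, 3])
arr2 = np.array([0, 2, 2])
print(np.maximum(arr1, arr2))        # [1 2 3]
print(np.add(arr1, arr2))            # [1 4 5]
```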

Aggregations

It's common to compute aggregations such as sums and means across an entire array:

arr = np.random.randn(5)

arr.sum()
arr.mean() 
arr.max()
arr.min()

You can compute aggregations across specific axis for multidimensional arrays:

arr_2d = np.random.randint(0, 10, (3,5))

arr_2d.sum(axis=0) # sums of each column 
arr_2d.sum(axis=1) # sums of each row
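
Because the arrays above are random, the results differ on every run. Here is a small sketch on a fixed 2x3 array (an illustrative input) where the axis behaviour can be checked by hand:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

print(grid.sum())          # 21       every element
print(grid.sum(axis=0))    # [5 7 9]  collapse the rows: one sum per column
print(grid.sum(axis=1))    # [ 6 15]  collapse the columns: one sum per row
print(grid.mean(axis=1))   # [2. 5.]  mean of each row
print(grid.max(axis=0))    # [4 5 6]  largest value in each column
```

A useful way to remember it: the axis you pass is the one that disappears from the result's shape.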

Sorting Arrays

NumPy provides convenience methods to sort arrays in ascending order:

arr = np.random.randint(0, 10, 10)

# To get the indices that would sort the array (arr itself is unchanged):
ind = arr.argsort()

# Sort in place; use np.sort(arr) to get a sorted copy instead:
arr.sort()

# Sort along a specific axis:
arr_2d = np.random.randint(0, 10, (3,3))
arr_2d.sort(axis=1) # sort each row
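
The difference between sorting a copy, sorting in place, and obtaining the sort order is easier to see on a small fixed array. A sketch with illustrative names:

```python
import numpy as np

arr = np.array([3, 1, 2])

# np.sort returns a sorted copy and leaves arr alone.
sorted_copy = np.sort(arr)
print(sorted_copy)       # [1 2 3]
print(arr)               # [3 1 2]

# argsort returns the indices that would sort the array.
order = np.argsort(arr)
print(order)             # [1 2 0]
print(arr[order])        # [1 2 3]

# Reversing a sorted copy gives descending order.
descending = np.sort(arr)[::-1]
print(descending)        # [3 2 1]

# The sort() method sorts in place and returns None.
arr.sort()
print(arr)               # [1 2 3]
```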

Finding Unique Elements

You can extract the unique elements in an array using np.unique():

arr = np.array([2, 1, 3, 1, 2])

unique_vals = np.unique(arr)
# [1 2 3]

Set Operations

Let's say you have two arrays and want to find the intersection or difference. NumPy provides set functions to make this easy:

A = np.array([1, 2, 3, 4])
B = np.array([2, 4, 5])

np.intersect1d(A, B) # [2 4] 

np.setdiff1d(A, B) # [1 3] Unique to A

Concatenating and Splitting

You can concatenate multiple arrays together:

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

np.concatenate([arr1, arr2]) 
# [1 2 3 4 5 6]

Specify axis=0 to stack arrays vertically:

arr1 = np.zeros((2,3))
arr2 = np.ones((2,3)) 

np.concatenate([arr1, arr2], axis=0)

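For splitting, use np.split(), np.hsplit(), and np.vsplit(). Note that np.vsplit() needs an array with at least two dimensions:

arr = np.arange(16).reshape(4, 4)

top, bottom = np.vsplit(arr, 2) # Split vertically into two 2x4 arrays
left, right = np.hsplit(arr, 2) # Split horizontally into two 4x2 arrays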
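
The sketch below runs the split functions on a 4x4 array and also shows np.split() with explicit split points instead of a section count. The array contents and the split indices [2, 5] are illustrative choices.

```python
import numpy as np

grid = np.arange(16).reshape(4, 4)

# Two equal pieces along the rows, then along the columns.
top, bottom = np.vsplit(grid, 2)
left, right = np.hsplit(grid, 2)
print(top.shape, bottom.shape)    # (2, 4) (2, 4)
print(left.shape, right.shape)    # (4, 2) (4, 2)

# A list of indices splits at those positions rather than into equal parts.
first, middle, last = np.split(np.arange(8), [2, 5])
print(first, middle, last)        # [0 1] [2 3 4] [5 6 7]
```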

Stacking and Tiling Arrays

The np.stack() function joins a sequence of arrays along a new axis:

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

np.stack([arr1, arr2]) 

# [[1 2 3]
#  [4 5 6]] 

Tiling repeats an array along axes:

arr = np.array([[1, 2], [3, 4]])

np.tile(arr, 2)

# [[1 2 1 2]
#  [3 4 3 4]] 

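Pass a tuple to specify the number of repetitions along each dimension:

np.tile(arr, (2,1)) # Repeat twice along the rows (4x2 result)
np.tile(arr, (1,3)) # Repeat three times along the columns (2x6 result)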
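
To see how the axis argument of np.stack() and the repetition tuple of np.tile() affect the result, here is a short sketch that prints the resulting shapes and values. Variable names such as as_rows and as_cols are illustrative.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# axis=0 (the default) makes each input a row; axis=1 makes each a column.
as_rows = np.stack([arr1, arr2], axis=0)
as_cols = np.stack([arr1, arr2], axis=1)
print(as_rows.shape)       # (2, 3)
print(as_cols.tolist())    # [[1, 4], [2, 5], [3, 6]]

square = np.array([[1, 2], [3, 4]])

# (2, 1): two copies down the rows, one across the columns.
tall = np.tile(square, (2, 1))
print(tall.tolist())       # [[1, 2], [3, 4], [1, 2], [3, 4]]

# (1, 3): one copy down the rows, three across the columns.
wide = np.tile(square, (1, 3))
print(wide.shape)          # (2, 6)
```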

Higher Dimensional Arrays

Everything we've covered extends to arrays of higher dimensions.

Create higher dimensional arrays by passing a shape tuple:

arr_3d = np.zeros((2,2,3)) # 3D array: 2 blocks, each with 2 rows and 3 columns

Indexing works by providing an index or slice for each dimension:

arr_3d[0,1] # row 1 of block 0
arr_3d[:,1,:] # row 1 of every block

Compute aggregations across a specified axis:

arr_3d.sum(axis=2) # Sum of the 3 elements in each row; result has shape (2, 2)

And reshape/transpose dimensions:

arr_3d.transpose((1,0,2)) # Swap first two axes
arr_3d.reshape(4,3) # Flatten to 2D array
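
With an array of zeros it is hard to tell which elements an index selects. The sketch below uses np.arange() with a shape of (2, 3, 4), an assumption chosen so that every axis has a different length and every element a distinct value.

```python
import numpy as np

# 2 blocks, each with 3 rows and 4 columns, holding the values 0..23.
arr_3d = np.arange(24).reshape(2, 3, 4)

print(arr_3d[0, 1])                  # [4 5 6 7]  row 1 of block 0
print(arr_3d[:, 1, :].tolist())      # [[4, 5, 6, 7], [16, 17, 18, 19]]

# Summing over the last axis leaves one value per (block, row).
row_sums = arr_3d.sum(axis=2)
print(row_sums.tolist())             # [[6, 22, 38], [54, 70, 86]]

# Swapping the first two axes changes the shape from (2, 3, 4) to (3, 2, 4).
swapped = arr_3d.transpose((1, 0, 2))
print(swapped.shape)                 # (3, 2, 4)

# Merging the first two axes gives a 2D array of 6 rows.
flat_rows = arr_3d.reshape(6, 4)
print(flat_rows.shape)               # (6, 4)
```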

Higher dimensional arrays add even more flexibility!

Conclusion

This covers the core concepts and tools you'll need to work effectively with NumPy arrays! The key takeaways:

  • NumPy arrays provide an efficient container for homogeneous, multidimensional data
  • They offer fast vectorized operations without Python for loops
  • NumPy provides functions to generate new arrays and manipulate existing ones
  • The same operations extend to calculations across higher dimensions
  • Ultimately, NumPy forms the foundation for much of the scientific computing in Python. Mastering arrays is critical.

To learn more, be sure to check out NumPy's official documentation. You can also browse my other NumPy tutorials covering specific functionality and use cases in-depth. NumPy has many more features we didn't cover here, like linear algebra routines, Fourier transforms, histograms, masked arrays and more!


Written by Alexis Kestler

A female web designer and programmer - Now is a 36-year IT professional with over 15 years of experience living in NorCal. I enjoy keeping my feet wet in the world of technology through reading, working, and researching topics that pique my interest.