Learning Python: Part 7 – Introduction to Libraries NumPy and Pandas

NumPy and Pandas

When programming with Python, libraries play a crucial role in enhancing functionality and streamlining the development process. Two of the most popular libraries for data manipulation and analysis are NumPy and Pandas. In this post, we’ll introduce you to these libraries, guide you through the installation process, and provide basic usage examples to get you started.

The Jupyter notebook file associated with this blog post, which includes all of this information, can be found here. As always, we encourage you to code along in your own Jupyter notebook file and use the associated notebooks found on our GitHub page as a guideline.

What are Libraries and How to Install Them?

A library in programming is a collection of pre-written code that you can call from your own programs instead of writing that functionality from scratch. Libraries are designed to handle specific tasks and can be reused across multiple projects, saving time and effort.

To install libraries in Python, you typically use the package manager pip. The approach mirrors the one from our previous post in this series, where we installed Jupyter Notebook. First, activate your desired Python environment so that the libraries are installed into that environment, then execute the following command to install NumPy and Pandas:

pip install numpy pandas

In the example image below, you can see that I first activate my chosen environment (I’ve created one named “PythonFoundationBeginner” for this series) and then install the two libraries. Since I already had NumPy installed in this environment, pip reports that the requirement is already satisfied before it downloads and installs the pandas package:
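Once the installation finishes, one simple way to confirm that both libraries are available in the active environment is to import them and print their versions. This is only a quick sanity check, and the variable names are illustrative. The version numbers you see will depend on what pip installed.

```python
# Confirm that both libraries can be imported from the active environment.
import numpy as np
import pandas as pd

# Every release of each library exposes its version as a string.
numpy_version = np.__version__
pandas_version = pd.__version__

print("NumPy version:", numpy_version)
print("Pandas version:", pandas_version)
```

If either import raises a ModuleNotFoundError, the most likely cause is that the library was installed into a different environment than the one currently running.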

Overview of NumPy and Pandas

NumPy: NumPy (Numerical Python) is a library used for numerical computations. It provides support for arrays, matrices, and many mathematical functions. NumPy is essential for performing numerical calculations in Python, especially in fields like data science, machine learning, and scientific computing.
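To make the idea of arrays and matrices a little more concrete, here is a minimal sketch of a two-dimensional array and a few of the mathematical functions NumPy applies to it. The values and variable names are made up for illustration.

```python
import numpy as np

# A 2x3 matrix (2 rows, 3 columns) built from a nested Python list.
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print("Shape:", matrix.shape)  # (2, 3)

# Arithmetic is applied element by element, with no explicit loop.
doubled = matrix * 2
print("Doubled:\n", doubled)

# Sum down each column (axis=0 collapses the rows).
col_sums = matrix.sum(axis=0)
print("Column sums:", col_sums)  # [5 7 9]

# Matrix multiplication of the 2x3 matrix with its 3x2 transpose gives a 2x2 result.
product = matrix @ matrix.T
print("matrix @ matrix.T:\n", product)
```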

Pandas: Pandas is a powerful library for data manipulation and analysis. It provides data structures like Series and DataFrame, which make it easy to handle structured data. With Pandas, you can perform operations like filtering, grouping, and merging datasets, making it an indispensable tool for data analysis.
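As a rough sketch of the filtering, grouping, and merging operations mentioned above, the example below uses two small DataFrames. The data, column names, and variable names are hypothetical and exist only to show the shape of each operation.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
people = pd.DataFrame({
    "Name": ["Alison", "Bob", "Charlie", "Dana"],
    "Age": [24, 27, 22, 31],
    "City": ["New York", "Orlando", "Chicago", "Orlando"],
})
cities = pd.DataFrame({
    "City": ["New York", "Orlando", "Chicago"],
    "State": ["NY", "FL", "IL"],
})

# Filtering: keep only the rows where Age is greater than 23.
over_23 = people[people["Age"] > 23]
print(over_23)

# Grouping: average age of the people in each city.
mean_age_by_city = people.groupby("City")["Age"].mean()
print(mean_age_by_city)

# Merging: attach each person's state by matching on the shared City column.
merged = people.merge(cities, on="City", how="left")
print(merged)
```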

Importing and Basic Usage of NumPy and Pandas

To start using NumPy and Pandas, you first need to import them into your Python script. Here’s how you can do that:

import numpy as np
import pandas as pd

Basic Usage of NumPy

Let’s create a simple array and perform basic operations using NumPy:

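# Creating a one-dimensional array from a Python list
arr = np.array([1, 2, 3, 4, 5])

# Performing operations
print("Array: ", arr)
print("Sum: ", np.sum(arr))
print("Mean: ", np.mean(arr))
# np.std returns the population standard deviation by default (ddof=0)
print("Standard Deviation: ", np.std(arr))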
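Building on the same array, the sketch below shows two things that often come up next: element-wise operations and the difference between the population and sample standard deviation. The variable names are illustrative. The ddof parameter is part of np.std, and whether you want ddof=0 or ddof=1 depends on your data.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Element-wise arithmetic: every value is squared without writing a loop.
squared = arr ** 2
print("Squared:", squared)  # [ 1  4  9 16 25]

# Boolean indexing: keep only the values above the mean (3.0).
above_mean = arr[arr > arr.mean()]
print("Above mean:", above_mean)  # [4 5]

# Population standard deviation (divides by N), the np.std default.
population_std = np.std(arr)
# Sample standard deviation (divides by N - 1).
sample_std = np.std(arr, ddof=1)
print("Population std:", round(population_std, 4))  # 1.4142
print("Sample std:", round(sample_std, 4))          # 1.5811
```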

Basic Usage of Pandas

Now, let’s create a DataFrame and perform some basic operations using Pandas:

# Creating a DataFrame
data = {'Name': ['Alison', 'Bob', 'Charlie'],
        'Age': [24, 27, 22],
        'City': ['New York', 'Orlando', 'Chicago']}

df = pd.DataFrame(data)

# Displaying the DataFrame
print("DataFrame:\n", df)

# Basic operations
print("Mean Age:", df['Age'].mean())
print("Cities:\n", df['City'].unique())
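A few more operations you might try on the same DataFrame are sketched below: pulling out a single column as a Series, adding a computed column, and sorting. The new column name "Age in 5 Years" and the variable names are made up for this example.

```python
import pandas as pd

# Same data as the DataFrame above, repeated so this block runs on its own.
df = pd.DataFrame({'Name': ['Alison', 'Bob', 'Charlie'],
                   'Age': [24, 27, 22],
                   'City': ['New York', 'Orlando', 'Chicago']})

# Selecting one column returns a Series, the one-dimensional Pandas structure.
ages = df['Age']
print(type(ages).__name__)  # Series

# Adding a new column computed from an existing one.
df['Age in 5 Years'] = df['Age'] + 5

# Sorting rows by age, youngest first (returns a new DataFrame).
youngest_first = df.sort_values('Age')
print(youngest_first)

# Looking up the name on the row with the highest age.
oldest = df.loc[df['Age'].idxmax(), 'Name']
print("Oldest:", oldest)  # Bob
```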

Conclusion

NumPy and Pandas are essential libraries for anyone working with data in Python. They provide powerful tools for numerical computations and data manipulation, respectively. By learning how to install, import, and use these libraries, you can significantly enhance your data analysis capabilities. Start experimenting with NumPy and Pandas today to unlock the full potential of your data projects.