Learning Python: Part 9 – Working with Data: Pandas Basics

In this ninth part of our Learning Python series, we’ll explore Pandas, a widely used Python library for data manipulation. Knowing the basics of Pandas is essential for most data analysis tasks. The library offers intuitive data structures and operations, and I use it often in my own work.

First, we’ll examine the two data structures that Pandas introduces, and then we’ll show how to read data from CSV or Excel files. We’ll also cover basic filtering and grouping, and illustrate two operations that are useful for reshaping data: melt and pivot.

The Jupyter notebook file associated with this blog post can be found in our GitHub repo here. The .ipynb notebook covers everything in this post in more detail, and the repo also includes small datasets to complement it. Have a look! As always, we encourage you to code along in your own Jupyter notebook and use the notebooks on our GitHub page as a guideline.

Introduction to Series and DataFrames

Pandas introduces two main data structures: Series and DataFrames. A Series is a one-dimensional labeled array and a DataFrame is a two-dimensional labeled data structure (in other words, a DataFrame is similar to the table format we see in Microsoft Excel).

To start using Pandas, you first need to install it. If you’ve been following along then this step will already be done as outlined in our Learning Python: Part 7 – Introduction to Libraries NumPy and Pandas post.

# Install Pandas (if not installed)
# The leading "!" runs a shell command from inside a Jupyter notebook cell
!pip install pandas

# Import Pandas
import pandas as pd

# Create a Series
series = pd.Series([1, 2, 3, 4, 5])
print("Series:\n", series)

# Create a DataFrame
data = {'Name': ['Amanda', 'Bill', 'Carol'],
        'Age': [25, 30, 35]}

df = pd.DataFrame(data)
print("DataFrame:\n", df)
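To make the "labeled" part of these definitions more concrete, here is a minimal sketch. The index labels ('a', 'b', 'c') and the variable names scores, people, and ages are assumptions chosen for illustration; they are not part of the example above.

```python
import pandas as pd

# A Series with explicit index labels (labels are made up for illustration)
scores = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(scores['b'])  # look up a value by its label rather than its position

# Each column of a DataFrame is itself a Series
people = pd.DataFrame({'Name': ['Amanda', 'Bill', 'Carol'],
                       'Age': [25, 30, 35]})
ages = people['Age']
print(type(ages))

# shape reports (number of rows, number of columns)
print(people.shape)
```

If no index is supplied, as in the earlier Series example, Pandas falls back to integer labels starting at 0.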

Reading Data from CSV or Excel Files

Pandas makes it easy to read data from various file formats such as CSV and Excel.

# Read from CSV
df_csv = pd.read_csv('data.csv')

# Read from Excel (.xlsx files need an Excel engine such as openpyxl installed)
df_excel = pd.read_excel('data.xlsx')
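If you don't have a data file at hand, the sketch below shows one way to try read_csv without one. It assumes a small, made-up CSV held in a string and wrapped in io.StringIO so that Pandas can treat it like a file; the name df_from_text and the contents are hypothetical.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk; the contents are made up for illustration
csv_text = "Name,Age\nAmanda,25\nBill,30\nCarol,35\n"

# read_csv accepts a file path or a file-like object such as StringIO
df_from_text = pd.read_csv(io.StringIO(csv_text))

# Inspect the first rows and the column types Pandas inferred
print(df_from_text.head())
print(df_from_text.dtypes)
```

Checking head() and dtypes right after loading is a quick way to confirm that the header row and the column types were read as you expected.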

Basic Data Manipulation with Pandas

Pandas provides tools for data manipulation, including filtering, grouping, and summarizing data. The examples below reuse the df DataFrame created earlier.

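# Example: Filtering data (keep only the rows where Age is greater than 30)
filtered_df = df[df['Age'] > 30]
print("Filtered DataFrame:\n", filtered_df)

# Example: Grouping data (count the rows for each Age value; the result is a Series)
grouped_df = df.groupby('Age').size()
print("Group sizes:\n", grouped_df)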
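The snippets above cover filtering and grouping. As a rough sketch of the summarizing step, the example below uses a hypothetical staff DataFrame with an added 'Team' column (an assumption, not part of the original data) so that each group holds more than one row.

```python
import pandas as pd

# Hypothetical data: the 'Team' column and the extra row are assumptions
staff = pd.DataFrame({
    'Name': ['Amanda', 'Bill', 'Carol', 'Dan'],
    'Team': ['A', 'B', 'A', 'B'],
    'Age': [25, 30, 35, 40],
})

# Summary statistics (count, mean, min, max, ...) for a single column
print(staff['Age'].describe())

# Mean age per team
mean_age = staff.groupby('Team')['Age'].mean()
print(mean_age)

# Several summaries per team in one call
summary = staff.groupby('Team')['Age'].agg(['min', 'max', 'count'])
print(summary)
```

Selecting the column before aggregating (['Age']) keeps non-numeric columns such as Name out of the calculation.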

Melt and Pivot Survey Data

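Pandas’ melt and pivot functions are useful for reshaping data. melt converts a wide table (one column per question) into a long one (one row per name and question), and pivot turns the long table back into wide form. Here’s an example with survey data: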

# Sample survey data
survey_data = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Q1': [5, 4],
    'Q2': [3, 5]
})

# Melt the DataFrame
melted_df = pd.melt(survey_data, id_vars=['Name'], var_name='Question', value_name='Score')
print("Melted DataFrame:\n", melted_df)

# Pivot the DataFrame
pivoted_df = melted_df.pivot(index='Name', columns='Question', values='Score')
print("Pivoted DataFrame:\n", pivoted_df)
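One detail that can surprise newcomers is that the pivoted result keeps 'Name' as its index and labels the column axis 'Question'. The sketch below shows one possible way to get back to a plain table; the names survey, long_df, wide_df, and restored are chosen here for illustration.

```python
import pandas as pd

survey = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Q1': [5, 4], 'Q2': [3, 5]})

# Wide -> long -> wide, as in the example above
long_df = pd.melt(survey, id_vars=['Name'], var_name='Question', value_name='Score')
wide_df = long_df.pivot(index='Name', columns='Question', values='Score')

# reset_index() turns 'Name' back into an ordinary column;
# clearing columns.name removes the leftover 'Question' label
restored = wide_df.reset_index()
restored.columns.name = None

print(restored)
```

Note that pivot expects each index and column pair to appear only once. If your data contains repeated pairs, pivot_table with an aggregation function is the usual alternative.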

Conclusion

By familiarizing yourself with these Pandas basics, you’ll be well equipped to handle and analyze data efficiently. If you would like to delve deeper into what Pandas has to offer, there are plenty of additional resources. For instance:

Official Pandas Documentation: This is the go-to resource for exploring the full potential of Pandas, including detailed descriptions of functions, methods, and use cases.
Pandas Documentation

Pandas Tutorial by W3Schools: A great beginner-friendly resource that provides simple, easy-to-understand examples and explanations for working with Pandas.
W3Schools Pandas Guide

Pandas Cookbook by Jupyter: A collection of practical Pandas recipes in Jupyter notebooks that cover different real-world data scenarios.
Pandas Cookbook