10. Pandas#

In this section, we will learn how to use the Pandas library. Pandas is the most commonly used Python library for data structures and data analysis. It is built on top of NumPy, a library for numerical computation, and provides a wide range of methods for data manipulation and analysis. It is widely used in data science, machine learning, and other fields that require data analysis. So, let’s get started with Pandas!

10.1. Creating DataFrames#

A DataFrame is a two-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dictionary of Series objects. DataFrames are the most commonly used data structures in Pandas. You can create a DataFrame from a dictionary, a list of dictionaries, a list of lists, or a NumPy array. Let’s see some examples of creating DataFrames.

10.1.1. Creating a DataFrame from a dictionary#

You can create a DataFrame from a dictionary using the pd.DataFrame() function. The keys of the dictionary will be the column names, and the values will be the column values. Let’s see an example.

import pandas as pd

# Create a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}

# Create a DataFrame from the dictionary
df = pd.DataFrame(data)

# Display the DataFrame
print(df)

Output:

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   40      Houston
4    Emily   45      Phoenix

In this example, we created a DataFrame from a dictionary with three columns: Name, Age, and City. The keys of the dictionary became the column names, and the values became the column values.

10.1.2. Creating a DataFrame from a list of dictionaries#

You can also create a DataFrame from a list of dictionaries. Each dictionary in the list will become a row in the DataFrame. Let’s see an example.

import pandas as pd

# Create a list of dictionaries
data = [
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 30, 'City': 'Los Angeles'},
    {'Name': 'Charlie', 'Age': 35, 'City': 'Chicago'},
    {'Name': 'David', 'Age': 40, 'City': 'Houston'},
    {'Name': 'Emily', 'Age': 45, 'City': 'Phoenix'}
]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)

# Display the DataFrame
print(df)

Output:

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   40      Houston
4    Emily   45      Phoenix

In this example, we created a DataFrame from a list of dictionaries with three columns: Name, Age, and City. Each dictionary in the list became a row in the DataFrame.

10.1.3. Creating a DataFrame from a list of lists#

You can also create a DataFrame from a list of lists. Each inner list will become a row in the DataFrame, and the column names are passed separately through the columns argument. Let’s see an example.

import pandas as pd

# Create a list of lists

data = [
    ['Alice', 25, 'New York'],
    ['Bob', 30, 'Los Angeles'],
    ['Charlie', 35, 'Chicago'],
    ['David', 40, 'Houston'],
    ['Emily', 45, 'Phoenix']
]

# Create a DataFrame from the list of lists
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])

# Display the DataFrame
print(df)

Output:

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   40      Houston
4    Emily   45      Phoenix

In this example, we created a DataFrame from a list of lists with three columns: Name, Age, and City. Each inner list became a row in the DataFrame.

10.1.4. Creating a DataFrame from a NumPy array#

You can also create a DataFrame from a NumPy array. Let’s see an example.

import pandas as pd
import numpy as np

# Create a NumPy array
data = np.array([
    ['Alice', 25, 'New York'],
    ['Bob', 30, 'Los Angeles'],
    ['Charlie', 35, 'Chicago'],
    ['David', 40, 'Houston'],
    ['Emily', 45, 'Phoenix']
])

# Create a DataFrame from the NumPy array
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])

# Display the DataFrame
print(df)

Output:

      Name Age         City
0    Alice  25     New York
1      Bob  30  Los Angeles
2  Charlie  35      Chicago
3    David  40      Houston
4    Emily  45      Phoenix

In this example, we created a DataFrame from a NumPy array with three columns: Name, Age, and City. Each row in the NumPy array became a row in the DataFrame.
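One detail worth checking here: a NumPy array holds a single data type, so mixing names and numbers stores every value, including Age, as a string. The following is a minimal sketch of how to confirm this and convert the column, assuming the Age values should be integers; the shortened two-row array is only for illustration.

```python
import numpy as np
import pandas as pd

# Mixed strings and integers are coerced to one all-string NumPy array
data = np.array([
    ['Alice', 25, 'New York'],
    ['Bob', 30, 'Los Angeles'],
])
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])

# The Age values arrive as strings, not integers
age_is_string = isinstance(df.loc[0, 'Age'], str)
print(age_is_string)  # True

# Convert the column to integers before doing arithmetic on it
df['Age'] = df['Age'].astype(int)
mean_age = df['Age'].mean()
print(mean_age)  # 27.5
```

If the columns have different types, building the DataFrame from a dictionary or a list of lists avoids the conversion step.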

10.2. Accessing DataFrames#

Once you have created a DataFrame, you can access its data using various methods. You can access individual columns, rows, or cells of the DataFrame. Let’s see some examples of accessing DataFrames.

10.2.1. Accessing individual columns#

You can access individual columns of a DataFrame using the column name. Let’s see an example.

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}
df = pd.DataFrame(data)

# Access the 'Name' column
print(df['Name'])

Output:

0      Alice
1        Bob
2    Charlie
3      David
4      Emily
Name: Name, dtype: object

In this example, we accessed the Name column of the DataFrame using the column name.
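Selecting several columns at once works the same way, except that a list of names is passed inside the brackets. A short sketch using the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
})

# A single column name returns a Series
ages = df['Age']

# A list of column names returns a new DataFrame with only those columns
subset = df[['Name', 'City']]
print(subset)
```

Note the double brackets: the outer pair is the indexing operator and the inner pair is the Python list of column names.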

10.2.2. Accessing individual rows#

You can access individual rows of a DataFrame using the iloc[] indexer, which selects rows by integer position. Let’s see an example.

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}
df = pd.DataFrame(data)

# Access the first row
print(df.iloc[0])

Output:

Name        Alice
Age            25
City     New York
Name: 0, dtype: object

In this example, we accessed the first row of the DataFrame using the iloc[] indexer. The result is a Series whose index holds the column names.
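Rows can also be selected as a slice of positions or, with loc[], by index label. The sketch below assumes we temporarily use the Name column as the index so that rows can be looked up by name; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
})

# Positions 1 and 2 (the end of an iloc slice is excluded)
middle = df.iloc[1:3]

# Negative positions count from the end
last = df.iloc[-1]

# loc[] selects by label; here the labels are the names
by_name = df.set_index('Name')
charlie = by_name.loc['Charlie']
print(charlie)
```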

10.2.3. Accessing individual cells#

You can access an individual cell of a DataFrame by selecting a row with the iloc[] indexer and then picking a column by name. Let’s see an example.

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}
df = pd.DataFrame(data)

# Access the cell at row 0 and column 'Name'
print(df.iloc[0]['Name'])

Output:

Alice

In this example, we accessed the cell at row 0 and column Name by first selecting the row with iloc[] and then indexing the resulting Series by column name.
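Pandas also offers indexers that take the row and the column in a single step, which avoids chaining two lookups. A brief sketch of the three common forms:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
})

# loc[row_label, column_label]
first_name = df.loc[0, 'Name']

# at[] is a scalar-only variant of loc[]
third_city = df.at[2, 'City']

# iat[row_position, column_position] uses integer positions for both axes
last_age = df.iat[4, 1]

print(first_name, third_city, last_age)
```

The single-step form matters most when assigning a value, because assigning through a chained lookup may modify a temporary copy instead of the DataFrame.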

10.3. Operations on DataFrames#

Once you have created a DataFrame, you can perform various operations on it. You can perform operations like filtering, sorting, grouping, and aggregating on the DataFrame. Let’s see some examples of operations on DataFrames.

10.3.1. Filtering DataFrames#

You can filter a DataFrame to select rows that satisfy a certain condition. Let’s see an example.

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}
df = pd.DataFrame(data)

# Filter the DataFrame to select rows where Age is greater than 30
filtered_df = df[df['Age'] > 30]

# Display the filtered DataFrame
print(filtered_df)

Output:

      Name  Age     City
2  Charlie   35  Chicago
3    David   40  Houston
4    Emily   45  Phoenix

In this example, we filtered the DataFrame to select rows where Age is greater than 30.
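Conditions can be combined with & (and), | (or), and ~ (not), as long as each condition is wrapped in parentheses. The sketch below also uses isin() to match against a list of values; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
})

# Both conditions must hold: older than 30 and not living in Houston
older_not_houston = df[(df['Age'] > 30) & (df['City'] != 'Houston')]

# Rows whose City is one of the listed values
coastal = df[df['City'].isin(['New York', 'Los Angeles'])]

print(older_not_houston)
print(coastal)
```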

10.3.2. Sorting DataFrames#

You can sort a DataFrame by one or more columns. Let’s see an example.

import pandas as pd

# Create a DataFrame

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}
df = pd.DataFrame(data)

# Sort the DataFrame by Age in descending order
sorted_df = df.sort_values(by='Age', ascending=False)

# Display the sorted DataFrame
print(sorted_df)

Output:

      Name  Age         City
4    Emily   45      Phoenix
3    David   40      Houston
2  Charlie   35      Chicago
1      Bob   30  Los Angeles
0    Alice   25     New York

In this example, we sorted the DataFrame by Age in descending order.
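Sorting by more than one column takes a list of names, with an optional list of directions. Because every Age in the data above is unique, the sketch below uses a small made-up table with repeated ages so that the second sort key has an effect.

```python
import pandas as pd

# Hypothetical data with tied Age values
df = pd.DataFrame({
    'Name': ['David', 'Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 25, 30]
})

# Sort by Age (descending), then by Name (ascending) within equal ages
sorted_df = df.sort_values(by=['Age', 'Name'], ascending=[False, True])

# Sorting keeps the original index labels; reset them if a fresh 0..n-1 index is wanted
sorted_df = sorted_df.reset_index(drop=True)
print(sorted_df)
```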

10.3.3. Statistics on DataFrames#

You can perform various statistical operations on a DataFrame, such as mean, median, sum, min, max, etc. Let’s see an example.

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}
df = pd.DataFrame(data)

# Calculate the mean of Age
mean_age = df['Age'].mean()

# Display the mean of Age
print(mean_age)

Output:

35.0

In this example, we calculated the mean of Age in the DataFrame.
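Several statistics can be computed in one call. The following sketch uses agg() with a list of statistic names, and describe() for a standard summary of a numeric column:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
})

# Compute several statistics at once; the result is a Series indexed by statistic name
summary = df['Age'].agg(['min', 'max', 'median', 'sum'])
print(summary)

# describe() reports count, mean, std, min, quartiles, and max
description = df['Age'].describe()
print(description)
```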

10.4. Adding and Removing Columns#

You can add and remove columns from a DataFrame. Let’s see some examples of adding and removing columns from DataFrames.

10.4.1. Adding a column#

You can add a new column to a DataFrame using the [] operator. Let’s see an example.

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}
df = pd.DataFrame(data)

# Add a new column
df['Country'] = ['USA', 'USA', 'USA', 'USA', 'USA']

# Display the DataFrame
print(df)

Output:

      Name  Age         City Country
0    Alice   25     New York     USA
1      Bob   30  Los Angeles     USA
2  Charlie   35      Chicago     USA
3    David   40      Houston     USA
4    Emily   45      Phoenix     USA

In this example, we added a new column Country to the DataFrame.
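A new column is often computed from existing ones rather than typed in by hand. In the sketch below, the column names AgeInMonths and IsOver30 are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
})

# Arithmetic on a column is applied to every row
df['AgeInMonths'] = df['Age'] * 12

# assign() returns a new DataFrame with the extra column
df = df.assign(IsOver30=df['Age'] > 30)

print(df)
```

Assigning a single value, such as df['Country'] = 'USA', fills the whole column with that value.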

10.4.2. Removing a column#

You can remove a column from a DataFrame using the drop() method. Let’s see an example.

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}
df = pd.DataFrame(data)

# Remove the 'City' column

df = df.drop('City', axis=1)

# Display the DataFrame
print(df)

Output:

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
3    David   40
4    Emily   45

In this example, we removed the City column from the DataFrame.
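The drop() method also accepts a columns argument, which some readers find clearer than axis=1, and it can remove several columns at once. A short sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
})

# drop() returns a new DataFrame; the original is left unchanged
names_only = df.drop(columns=['Age', 'City'])

print(names_only)
print(df.columns.tolist())
```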

## Adding and Removing Rows

You can add and remove rows from a DataFrame. Let’s see some examples of adding and removing rows from DataFrames.

10.4.3. Adding a row#

You can add a new row to a DataFrame by assigning to a new index label with the loc[] indexer. Let’s see an example.

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}
df = pd.DataFrame(data)

# Add a new row
df.loc[5] = ['Frank', 50, 'Las Vegas']

# Display the DataFrame
print(df)

Output:

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   40      Houston
4    Emily   45      Phoenix
5    Frank   50    Las Vegas

In this example, we added a new row to the DataFrame.
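Assigning through loc[] overwrites the row if the label already exists, so to append one or more rows regardless of the current labels you can use pd.concat(). A minimal sketch, with a hypothetical new_rows DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
})

# Put the new rows in their own DataFrame with matching column names
new_rows = pd.DataFrame([
    {'Name': 'Frank', 'Age': 50, 'City': 'Las Vegas'}
])

# ignore_index=True renumbers the result from 0 to n-1
df = pd.concat([df, new_rows], ignore_index=True)
print(df)
```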

10.4.4. Removing a row#

You can remove a row from a DataFrame using the drop() method. Let’s see an example.

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}
df = pd.DataFrame(data)

# Remove the row at index 2
df = df.drop(2)

# Display the DataFrame
print(df)

Output:

    Name  Age         City
0  Alice   25     New York
1    Bob   30  Los Angeles
3  David   40      Houston
4  Emily   45      Phoenix

In this example, we removed the row at index 2 from the DataFrame.
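After a row is dropped, the remaining rows keep their original labels, which leaves a gap (0, 1, 3, 4). If a continuous index is preferred, reset_index() can renumber the rows, as in this sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
})

# Drop the row labeled 2, then renumber; drop=True discards the old labels
df = df.drop(2).reset_index(drop=True)
print(df)
```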

10.5. Iterating over DataFrames#

You can iterate over a DataFrame to access its rows and columns. Let’s see some examples of iterating over DataFrames.

10.5.1. Iterating over rows#

You can iterate over the rows of a DataFrame using the iterrows() method. Let’s see an example.

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}
df = pd.DataFrame(data)

# Iterate over the rows of the DataFrame
for index, row in df.iterrows():
    print(index, row['Name'], row['Age'], row['City'])

Output:

0 Alice 25 New York
1 Bob 30 Los Angeles
2 Charlie 35 Chicago
3 David 40 Houston
4 Emily 45 Phoenix

In this example, we iterated over the rows of the DataFrame to access the Name, Age, and City columns of each row.
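An alternative is itertuples(), which yields each row as a named tuple and is generally faster than iterrows(). The sketch below collects the formatted rows in a list (the name lines is illustrative) and then prints them:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
})

# Each row is a named tuple; columns become attributes and the index is row.Index
lines = []
for row in df.itertuples():
    lines.append(f"{row.Index} {row.Name} {row.Age} {row.City}")

print("\n".join(lines))
```

For calculations over a whole column, operating on the column directly (for example df['Age'] * 2) is usually preferable to any explicit loop.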

10.5.2. Iterating over columns#

You can iterate over the columns of a DataFrame using the items() method (the older iteritems() method was removed in pandas 2.0). Let’s see an example.

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}
df = pd.DataFrame(data)

# Iterate over the columns of the DataFrame
for column, values in df.items():
    print(column, values.values)

Output:

Name ['Alice' 'Bob' 'Charlie' 'David' 'Emily']
Age [25 30 35 40 45]
City ['New York' 'Los Angeles' 'Chicago' 'Houston' 'Phoenix']

In this example, we iterated over the columns of the DataFrame to access the values of each column.

10.6. Merging DataFrames#

You can merge two or more DataFrames into a single DataFrame. Let’s see some examples of merging DataFrames.

10.6.1. Merging two DataFrames#

You can merge two DataFrames into a single DataFrame using the merge() method. Let’s see an example.

import pandas as pd

# Create the first DataFrame

data1 = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45]
}
df1 = pd.DataFrame(data1)

# Create the second DataFrame

data2 = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}
df2 = pd.DataFrame(data2)

# Merge the two DataFrames
df = pd.merge(df1, df2, on='Name')

# Display the merged DataFrame
print(df)

Output:

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   40      Houston
4    Emily   45      Phoenix

In this example, we merged two DataFrames df1 and df2 into a single DataFrame df using the merge() method.
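By default, merge() performs an inner join, so only names present in both DataFrames are kept. The how parameter changes this. The sketch below uses two small made-up DataFrames whose names only partly overlap:

```python
import pandas as pd

# Hypothetical data: Alice has no city, David has no age
df1 = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})
df2 = pd.DataFrame({'Name': ['Bob', 'Charlie', 'David'],
                    'City': ['Los Angeles', 'Chicago', 'Houston']})

# Inner join (default): only names found in both DataFrames
inner = pd.merge(df1, df2, on='Name')

# Left join: every row of df1; missing cities become NaN
left = pd.merge(df1, df2, on='Name', how='left')

# Outer join: every name from either DataFrame
outer = pd.merge(df1, df2, on='Name', how='outer')

print(inner)
print(left)
print(outer)
```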

10.6.2. Merging multiple DataFrames#

You can merge multiple DataFrames into a single DataFrame using the merge() method. Let’s see an example.

import pandas as pd

# Create the first DataFrame

data1 = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45]
}
df1 = pd.DataFrame(data1)

# Create the second DataFrame

data2 = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}
df2 = pd.DataFrame(data2)

# Create the third DataFrame

data3 = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Country': ['USA', 'USA', 'USA', 'USA', 'USA']
}
df3 = pd.DataFrame(data3)

# Merge the three DataFrames
df = pd.merge(df1, df2, on='Name')
df = pd.merge(df, df3, on='Name')

# Display the merged DataFrame
print(df)

Output:

      Name  Age         City Country
0    Alice   25     New York     USA
1      Bob   30  Los Angeles     USA
2  Charlie   35      Chicago     USA
3    David   40      Houston     USA
4    Emily   45      Phoenix     USA

In this example, we merged three DataFrames df1, df2, and df3 into a single DataFrame df using the merge() method.
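When there are many DataFrames sharing the same key, the repeated merge calls can be folded into one expression with functools.reduce from the standard library. This is one possible sketch, using shortened versions of the three DataFrames:

```python
from functools import reduce

import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})
df2 = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                    'City': ['New York', 'Los Angeles', 'Chicago']})
df3 = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                    'Country': ['USA', 'USA', 'USA']})

# Merge the list pairwise from left to right on the shared Name column
frames = [df1, df2, df3]
merged = reduce(lambda left, right: pd.merge(left, right, on='Name'), frames)

print(merged)
```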

10.7. Grouping and Aggregating DataFrames#

You can group a DataFrame by one or more columns and perform aggregations on the groups. Let’s see some examples of grouping and aggregating DataFrames.

10.7.1. Grouping and aggregating by one column#

You can group a DataFrame by one column and perform aggregations on the groups using the groupby() method. Let’s see an example.

import pandas as pd

# Create a DataFrame

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}
df = pd.DataFrame(data)

# Group the DataFrame by City and calculate the mean of Age for each group
grouped_df = df.groupby('City').agg({'Age': 'mean'})

# Display the grouped and aggregated DataFrame
print(grouped_df)

Output:

              Age
City
Chicago      35.0
Houston      40.0
Los Angeles  30.0
New York     25.0
Phoenix      45.0

In this example, we grouped the DataFrame by City and calculated the mean of Age for each group.
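Because every city appears only once in this data, each group contains a single row and the mean equals the original age. Grouping becomes more informative when values repeat. The sketch below uses a made-up table in which each city appears twice and computes two aggregations per group:

```python
import pandas as pd

# Hypothetical data with repeated cities
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily', 'Frank'],
    'Age': [25, 30, 35, 40, 45, 50],
    'City': ['New York', 'Chicago', 'New York', 'Chicago', 'Houston', 'Houston']
})

# Select the Age column of each group and compute its mean and size
stats = df.groupby('City')['Age'].agg(['mean', 'count'])
print(stats)
```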

10.7.2. Grouping and aggregating by multiple columns#

You can group a DataFrame by multiple columns and perform aggregations on the groups using the groupby() method. Let’s see an example.

import pandas as pd

# Create a DataFrame

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}
df = pd.DataFrame(data)

# Group the DataFrame by City and Name and calculate the mean of Age for each group
grouped_df = df.groupby(['City', 'Name']).agg({'Age': 'mean'})

# Display the grouped and aggregated DataFrame
print(grouped_df)

Output:

                      Age
City        Name
Chicago     Charlie  35.0
Houston     David    40.0
Los Angeles Bob      30.0
New York    Alice    25.0
Phoenix     Emily    45.0

In this example, we grouped the DataFrame by City and Name and calculated the mean of Age for each group. The result is indexed by both grouping columns. Because each City and Name combination appears only once in this data, every group contains a single row.

10.8. Reading and Writing DataFrames#

You can read data from various file formats into a DataFrame, and you can write a DataFrame to various file formats. Let’s see some examples of reading and writing DataFrames.

10.8.1. Reading a DataFrame from a CSV file#

You can read a DataFrame from a CSV file using the read_csv() method. Let’s see an example.

import pandas as pd

# Read a DataFrame from a CSV file
df = pd.read_csv('data.csv')

# Display the DataFrame
print(df)

In this example, we read a DataFrame from a CSV file called data.csv.

10.8.2. Writing a DataFrame to a CSV file#

You can write a DataFrame to a CSV file using the to_csv() method. Let’s see an example.

import pandas as pd

# Create a DataFrame

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}
df = pd.DataFrame(data)

# Write the DataFrame to a CSV file
df.to_csv('data.csv', index=False)

In this example, we wrote a DataFrame to a CSV file called data.csv.
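To see exactly what to_csv() produces and to confirm that read_csv() restores the same data, the round trip can be done in memory. This sketch assumes an io.StringIO buffer in place of a file on disk, since both functions accept file-like objects:

```python
import io

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
})

# Write the CSV text to an in-memory buffer instead of a file
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Rewind the buffer and read the data back
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```

With index=False the row labels are not written; without it, the index is saved as an extra unnamed first column.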

10.8.3. Reading a DataFrame from an Excel file#

You can read a DataFrame from an Excel file using the read_excel() function. Working with .xlsx files requires an additional package such as openpyxl to be installed. Let’s see an example.

import pandas as pd

# Read a DataFrame from an Excel file
df = pd.read_excel('data.xlsx')

# Display the DataFrame
print(df)

In this example, we read a DataFrame from an Excel file called data.xlsx.

10.8.4. Writing a DataFrame to an Excel file#

You can write a DataFrame to an Excel file using the to_excel() method. Let’s see an example.

import pandas as pd

# Create a DataFrame

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}

df = pd.DataFrame(data)

# Write the DataFrame to an Excel file
df.to_excel('data.xlsx', index=False)

In this example, we wrote a DataFrame to an Excel file called data.xlsx.

10.9. Conclusion#

In this section, we learned how to use the Pandas library. We created DataFrames from dictionaries, lists of dictionaries, lists of lists, and NumPy arrays; accessed individual columns, rows, and cells; filtered, sorted, and computed statistics; added and removed columns and rows; iterated over, merged, and grouped DataFrames; and read and wrote CSV and Excel files. We hope you found this section helpful and that you are now comfortable using Pandas for data manipulation and analysis.

10.10. Exercises#

After reading the previous section, you should be able to create your first Jupyter notebook and write a report about everything you have learned in this section. To give the report a good structure, we recommend following exactly the same structure as this notebook.

This time, however, you will create your own data. Your DataFrame must have at least 6 columns and 12 rows. You can use any data you want, but we recommend data that you are familiar with, for example from your work, your studies, or your hobbies. After you create your DataFrame, recreate all the examples from the previous section using it and write a report about the results.

Note

Your DataFrame must have at least 6 columns and 12 rows. It is also very important to include a column named “Connections” that symbolizes the connections between the rows. You will therefore end up with a DataFrame that has at least 12 rows and 6 columns plus the “Connections” column (6+1).