Introduction to Pandas

Chapter 1

📌 What is Pandas?

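Pandas is a widely used open-source Python library for data manipulation and analysis, built on top of NumPy. It's like "Excel on steroids" for programmers: tables of rows and columns that you filter, clean, and summarize with code instead of clicks.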

🚀 Setup

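Install pandas first (pip install pandas), then import it and check the version:

import pandas as pd  # "pd" is the conventional alias
print(pd.__version__)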

Pandas Series

Chapter 2

A Series is like a single column in a table: a one-dimensional labeled array that can hold data of any type. Its labels are called the index.

import pandas as pd

a = [1, 7, 2]
myvar = pd.Series(a, index=["x", "y", "z"])  # custom labels instead of 0, 1, 2
print(myvar["y"])  # prints 7
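A Series can also be built without explicit labels or from a dictionary. The sketch below shows both, plus element-wise arithmetic; the day1/day2/day3 labels and calorie values are hypothetical.

```python
import pandas as pd

# Without an explicit index, the labels default to 0, 1, 2, ...
s = pd.Series([1, 7, 2])
print(s.index.tolist())  # [0, 1, 2]

# From a dict: the keys become the index labels (hypothetical data)
calories = pd.Series({"day1": 420, "day2": 380, "day3": 390})
print(calories["day2"])  # 380

# Arithmetic is applied element-wise and keeps the labels
doubled = calories * 2
print(doubled["day1"])  # 840
```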

DataFrames

Chapter 3

📌 What is a DataFrame?

A DataFrame is a two-dimensional data structure, like a 2D array or a table with rows and columns. Each column is a Series, and all columns share the same row index.

import pandas as pd

# Each key becomes a column name; each list holds that column's values
data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}
df = pd.DataFrame(data)
print(df)

Diagram: common sources of a DataFrame (Dictionary → DataFrame, List of Lists → DataFrame, CSV File → DataFrame)
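Besides a dictionary, a DataFrame can be built from a list of lists, where each inner list is one row and the column names are passed separately. A minimal sketch reusing the calorie and duration numbers from the dictionary example:

```python
import pandas as pd

# Each inner list is one row; column names are given separately
rows = [[420, 50], [380, 40], [390, 45]]
df = pd.DataFrame(rows, columns=["calories", "duration"])
print(df.shape)                 # (3, 2): 3 rows, 2 columns
print(df["calories"].tolist())  # [420, 380, 390]
```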

Reading Files

Chapter 4

📌 CSV & JSON

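Data usually lives in files. Pandas loads a CSV or JSON file into a DataFrame with a single function call.

# 'data.csv' and 'data.json' are placeholder names; use your own file paths
df = pd.read_csv('data.csv')
df_json = pd.read_json('data.json')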
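Because read_csv() and read_json() also accept file-like objects, you can try them without any file on disk by wrapping a string in io.StringIO. In this sketch the CSV columns, the JSON records, and their values are all hypothetical; it also writes the data back out with to_csv().

```python
import io

import pandas as pd

# Hypothetical CSV content; io.StringIO lets read_csv() treat a string as a file
csv_text = """Duration,Pulse,Calories
60,110,409.1
60,117,479.0
45,109,282.4
"""
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (3, 3)

# The same works for JSON; orient="records" means a list of row objects
json_text = '[{"Fruit": "Apple", "Color": "Red"}, {"Fruit": "Banana", "Color": "Yellow"}]'
df_json = pd.read_json(io.StringIO(json_text), orient="records")
print(len(df_json))  # 2

# Round trip: write the DataFrame back out as CSV text, without the index column
out = df.to_csv(index=False)
print(out.splitlines()[0])  # Duration,Pulse,Calories
```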

Inspecting Data

Chapter 5

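Before cleaning or analyzing, take a quick look at the data:

df.head(10): First 10 rows (head() with no argument shows 5).
df.tail(): Last 5 rows.
df.info(): Column names, data types, non-null counts, and memory usage.
df.describe(): Summary statistics for numeric columns (count, mean, std, min, quartiles, max).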
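The sketch below runs these inspection methods on a small, hypothetical 7-row table. Note that info() prints its report directly and returns None, while describe() returns a DataFrame you can index into.

```python
import pandas as pd

# Hypothetical data with 7 rows
df = pd.DataFrame({
    "Duration": [60, 45, 30, 60, 45, 20, 90],
    "Calories": [409.1, 282.4, 195.0, 479.0, 300.5, 120.0, 600.2],
})

print(df.head(3))      # the first 3 rows
print(len(df.tail()))  # 5, because tail() defaults to n=5
df.info()              # prints dtypes, non-null counts and memory usage; returns None

stats = df.describe()  # a DataFrame with one column per numeric column
print(stats.index.tolist())
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(stats.loc["max", "Duration"])  # 90.0
```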

Selecting & Filtering

Chapter 6

Use square brackets to select columns, .loc to select rows by label, and a boolean condition to filter rows.

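import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"Name": ["Ana", "Ben", "Cho"], "Age": [34, 15, 22]})

# Select a column (returns a Series)
ages = df['Age']

# Select a row by its index label (df.iloc[0] selects by position instead)
row = df.loc[0]

# Conditional filtering: keep rows where Age is greater than 18
adults = df[df['Age'] > 18]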
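Conditions can be combined, and .loc can filter rows and pick columns in one step. A sketch on hypothetical data: note that & and | (not Python's and/or) combine conditions, and each condition needs its own parentheses.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cho", "Dev"],
    "Age": [34, 15, 22, 41],
    "City": ["Oslo", "Rome", "Oslo", "Lima"],
})

# Combine conditions with & (and) or | (or); parenthesize each one
oslo_adults = df[(df["Age"] > 18) & (df["City"] == "Oslo")]
print(oslo_adults["Name"].tolist())  # ['Ana', 'Cho']

# .loc[row_condition, column] filters rows and selects a column together
names = df.loc[df["Age"] > 30, "Name"]
print(names.tolist())  # ['Ana', 'Dev']

# isin() keeps rows whose value appears in the given list
in_list = df[df["City"].isin(["Rome", "Lima"])]
print(in_list["Name"].tolist())  # ['Ben', 'Dev']
```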

Cleaning Empty Cells

Chapter 7

Empty cells (shown as NaN) can skew or break calculations, so handle them before you analyze. Pick one of the strategies below.

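import pandas as pd

# Hypothetical sample data with missing values (None becomes NaN)
df = pd.DataFrame({"Duration": [60, 45, None], "Calories": [409.1, None, 282.4]})

# Option 1: remove rows with empty cells (returns a new DataFrame)
new_df = df.dropna()

# Option 2: fill every empty cell with a fixed value
filled_df = df.fillna(130)

# Option 3: fill one column's gaps with that column's mean (NaN is skipped)
x = df["Calories"].mean()
df["Calories"] = df["Calories"].fillna(x)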
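It helps to count the gaps first, and the median is a common alternative to the mean when a column has outliers. This sketch uses hypothetical data and assigns each result back to its column rather than calling fillna(..., inplace=True) on a selected column, a chained pattern that recent pandas versions warn about.

```python
import numpy as np
import pandas as pd

# Hypothetical data; np.nan marks an empty cell
df = pd.DataFrame({
    "Duration": [60, 45, np.nan, 30],
    "Calories": [400.0, np.nan, 300.0, 200.0],
})

# Count the empty cells in each column before choosing a strategy
print(df.isna().sum().tolist())  # [1, 1]

# Median fill is less affected by outliers; mean fill is shown for comparison
df["Duration"] = df["Duration"].fillna(df["Duration"].median())
df["Calories"] = df["Calories"].fillna(df["Calories"].mean())
print(df.isna().sum().sum())  # 0
print(df.loc[2, "Duration"], df.loc[1, "Calories"])  # 45.0 300.0
```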

Cleaning Wrong Data

Chapter 8

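Sometimes a value is present but logically wrong, for example a typo that produces an impossible workout duration. You can cap such values, replace them, or drop the rows.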

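import pandas as pd

# Hypothetical sample data: 450 looks like a typo for 45
df = pd.DataFrame({"Duration": [60, 450, 30]})

# Cap any duration above 120 minutes at 120
for x in df.index:
  if df.loc[x, "Duration"] > 120:
    df.loc[x, "Duration"] = 120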
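A row-by-row loop works but gets slow on large tables. This sketch shows vectorized alternatives on hypothetical data: a boolean mask with .loc, Series.clip(), and dropping the suspicious rows instead of correcting them.

```python
import pandas as pd

# Hypothetical sample data with one out-of-range duration
df = pd.DataFrame({"Duration": [60, 450, 30, 45]})

# Vectorized cap: assign 120 to every matching row at once
capped = df.copy()
capped.loc[capped["Duration"] > 120, "Duration"] = 120
print(capped["Duration"].tolist())  # [60, 120, 30, 45]

# Equivalent one-liner using clip()
clipped = df["Duration"].clip(upper=120)
print(clipped.tolist())  # [60, 120, 30, 45]

# Alternatively, drop the suspicious rows instead of correcting them
trimmed = df[df["Duration"] <= 120]
print(trimmed["Duration"].tolist())  # [60, 30, 45]
```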

Removing Duplicates

Chapter 9

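import pandas as pd

# Hypothetical sample data: rows 0 and 2 are identical
df = pd.DataFrame({"Duration": [60, 45, 60], "Calories": [409.1, 282.4, 409.1]})

# True marks a row that repeats an earlier row
print(df.duplicated())

# Remove duplicate rows, keeping the first occurrence
df.drop_duplicates(inplace=True)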
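By default, two rows count as duplicates only if every column matches. The subset parameter compares selected columns only, and keep chooses which copy survives. A sketch on hypothetical data where the same person logged the same date twice with different calorie counts:

```python
import pandas as pd

# Hypothetical sample data: Ana appears twice for the same date
df = pd.DataFrame({
    "Name": ["Ana", "Ben", "Ana"],
    "Date": ["2024-01-01", "2024-01-01", "2024-01-01"],
    "Calories": [300, 250, 310],
})

# Full-row comparison finds no duplicates because Calories differ
print(df.duplicated().sum())  # 0

# Compare only Name and Date
print(df.duplicated(subset=["Name", "Date"]).tolist())  # [False, False, True]

# keep="last" keeps the latest entry instead of the first
latest = df.drop_duplicates(subset=["Name", "Date"], keep="last")
print(latest["Calories"].tolist())  # [250, 310]
```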

Grouping & Aggregation

Chapter 10

The groupby() method splits the rows into groups that share a value (for example, the same workout type) and then applies a function such as mean() or sum() to each group.

# Average calories per workout type
print(df.groupby("Type")["Calories"].mean())

# Average of every numeric column per workout type
print(df.groupby("Type").mean(numeric_only=True))
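To compute several statistics at once, groupby() can be combined with agg() and named aggregation, where each keyword becomes an output column built from a (source column, function) pair. A sketch on a hypothetical workout log; the output names sessions, avg_calories and total_minutes are illustrative choices.

```python
import pandas as pd

# Hypothetical workout log
df = pd.DataFrame({
    "Type": ["Run", "Bike", "Run", "Bike", "Swim"],
    "Duration": [30, 60, 45, 40, 50],
    "Calories": [300, 450, 420, 310, 400],
})

# Named aggregation: output_name=(source_column, function)
summary = df.groupby("Type").agg(
    sessions=("Calories", "count"),
    avg_calories=("Calories", "mean"),
    total_minutes=("Duration", "sum"),
)
print(summary)  # one row per Type, sorted: Bike, Run, Swim
print(summary.loc["Run", "avg_calories"])  # 360.0
```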

Merging & Joins

Chapter 11

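pd.merge() combines two DataFrames on a shared key column, much like a SQL JOIN. The how parameter picks the join type, such as 'inner', 'left', 'right', or 'outer'.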

# Inner join: keep only rows whose 'ID' appears in both df1 and df2
merged = pd.merge(df1, df2, on='ID', how='inner')
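The join type decides what happens to keys that appear in only one table. A sketch with two hypothetical tables that share IDs 2 and 3:

```python
import pandas as pd

# Hypothetical tables sharing an 'ID' key
df1 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Ana", "Ben", "Cho"]})
df2 = pd.DataFrame({"ID": [2, 3, 4], "Score": [88, 92, 75]})

# inner: only IDs present in both tables
inner = pd.merge(df1, df2, on="ID", how="inner")
print(inner["ID"].tolist())  # [2, 3]

# left: every row of df1; IDs missing from df2 get NaN for Score
left = pd.merge(df1, df2, on="ID", how="left")
print(left["Score"].isna().tolist())  # [True, False, False]

# outer: the union of IDs from both tables
outer = pd.merge(df1, df2, on="ID", how="outer")
print(outer["ID"].tolist())  # [1, 2, 3, 4]
```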

Plotting Integration

Chapter 12

DataFrame.plot() uses Matplotlib as its default plotting backend, so you can chart a DataFrame directly and display it with plt.show().

import matplotlib.pyplot as plt

# Scatter plot of two columns: one point per row
df.plot(kind='scatter', x='Duration', y='Calories')
plt.show()
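plot() returns a Matplotlib Axes, so you can customize the chart with regular Matplotlib calls and save it to a file. This sketch assumes a headless environment (hence the non-interactive "Agg" backend); the data and the output file name calories_hist.png are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical workout data
df = pd.DataFrame({
    "Duration": [30, 45, 60, 60, 90],
    "Calories": [250, 320, 410, 450, 600],
})

# Histogram of one column; plot() returns the Axes it drew on
ax = df["Calories"].plot(kind="hist", bins=5, title="Calories distribution")
ax.set_xlabel("Calories")
plt.savefig("calories_hist.png")  # hypothetical output file name
plt.close()
```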

🚀 Real World Projects

Chapter 13

🟢 Beginner: Dataset Explorer

Goal: Load a CSV, print stats, and fix missing values.

🟡 Intermediate: Sales Analysis

Goal: Group sales by Month and Product Category to find top performers.
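One way to start the sales analysis is to total revenue per (Month, Category) pair and then pick the largest total within each month with idxmax(). The sales records below are hypothetical; with real data, use actual dates rather than month names, since strings such as "Feb" and "Jan" sort alphabetically.

```python
import pandas as pd

# Hypothetical sales records
sales = pd.DataFrame({
    "Month": ["Jan", "Jan", "Jan", "Feb", "Feb", "Feb"],
    "Category": ["Books", "Games", "Books", "Games", "Books", "Games"],
    "Revenue": [120, 150, 80, 250, 90, 100],
})

# Total revenue per (Month, Category) pair, kept as ordinary columns
totals = sales.groupby(["Month", "Category"], as_index=False)["Revenue"].sum()

# Top performer per month: the row label of the largest total in each Month group
top = totals.loc[totals.groupby("Month")["Revenue"].idxmax()]
print(top)  # Feb: Games 350, Jan: Books 200
```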

🔴 Advanced: Stock Predictor (Prep)

Goal: Calculate moving averages and daily returns on historical finance data.
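Both calculations for the stock prep project map onto built-in Series methods: pct_change() for daily returns and rolling(...).mean() for a simple moving average. A sketch on five hypothetical closing prices with a 3-day window (the window length is an arbitrary choice):

```python
import pandas as pd

# Hypothetical daily closing prices indexed by date
prices = pd.DataFrame(
    {"Close": [100.0, 102.0, 101.0, 105.0, 107.0]},
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

# Daily return: fractional change from the previous day's close (first row is NaN)
prices["Return"] = prices["Close"].pct_change()

# 3-day simple moving average; the first 2 rows are NaN (not enough history)
prices["SMA_3"] = prices["Close"].rolling(window=3).mean()
print(prices)
```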

🎯 Pandas Mini Task

Chapter 14

Goal: Create your own DataFrame.

📋 Requirements:

Import pandas.
Create a DataFrame with columns: "Fruit" and "Color".
Add 3 rows (e.g., Apple-Red, Banana-Yellow).
Print the whole table.
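If you want to check your work, here is one possible solution. The third row (Grape, Purple) is an arbitrary example; any three fruit and color pairs satisfy the requirements.

```python
import pandas as pd

# One possible solution; the Grape/Purple row is just an example
data = {
    "Fruit": ["Apple", "Banana", "Grape"],
    "Color": ["Red", "Yellow", "Purple"],
}
df = pd.DataFrame(data)
print(df)
```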

Data Science starts here! 🧪
