Pandas Masterclass

Introduction to Pandas

Chapter 1

📌 What is Pandas?

Pandas is the core library for data manipulation in Python. It's like "Excel on steroids" for programmers.

🚀 Setup

import pandas as pd
print(pd.__version__)

Pandas Series

Chapter 2

A Series is like a column in a table. It is a one-dimensional array holding data of any type.

a = [1, 7, 2]
myvar = pd.Series(a, index = ["x", "y", "z"])
print(myvar["y"]) # Returns 7
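As a small sketch of how labels behave, a Series can also be built from a dictionary, in which case the keys become the index. The variable names and day labels below are made up for illustration.

```python
import pandas as pd

# Hypothetical data: dictionary keys become the index labels.
calories = pd.Series({"day1": 420, "day2": 380, "day3": 390})

# Access by label, much like a dictionary lookup.
day2_value = calories["day2"]

# Without an explicit index, pandas labels the values 0, 1, 2, ...
plain = pd.Series([1, 7, 2])
first_value = plain[0]

print(day2_value, first_value)
```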

DataFrames

Chapter 3

📌 What is a DataFrame?

A DataFrame is a two-dimensional data structure, like a two-dimensional array or a table with rows and columns.

data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}
df = pd.DataFrame(data)
print(df)
A DataFrame can be built from a dictionary, a list of lists, or a CSV file.
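Besides a dictionary, a list of lists is another common starting point. A minimal sketch, reusing the same numbers as the dictionary example; the variable name `rows` is an assumption for illustration.

```python
import pandas as pd

# Each inner list is one row of the table.
rows = [
    [420, 50],
    [380, 40],
    [390, 45],
]

# Column names are supplied separately for a list of lists.
df = pd.DataFrame(rows, columns=["calories", "duration"])

print(df.shape)  # (number of rows, number of columns)
```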

Reading Files

Chapter 4

📌 CSV & JSON

Data usually comes in files. Pandas can load most common formats with a single function call.

df = pd.read_csv('data.csv')
df_json = pd.read_json('data.json')
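To try this without a file on disk, the sketch below feeds `read_csv` an in-memory string. The CSV text and column names are invented for the example; `read_csv` accepts any file-like object, so `io.StringIO` stands in for a real file.

```python
import io
import pandas as pd

# Stand-in for a file on disk (hypothetical contents).
csv_text = "Duration,Pulse,Calories\n60,110,409.1\n45,117,\n"

df = pd.read_csv(io.StringIO(csv_text))

# The empty Calories field in the second data row is read as NaN.
print(df)
```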

Inspecting Data

Chapter 5
  • df.head(10): First 10 rows.
  • df.tail(): Last 5 rows.
  • df.info(): Column data types, non-null counts, and memory usage.
  • df.describe(): Statistical summary (mean, std, min, max).
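The inspection methods above can be tried on a small table. This is a sketch with made-up workout data, not a prescribed workflow.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "Duration": [60, 60, 45, 45, 30, 60, 20],
    "Calories": [409.1, 479.0, 340.0, 282.4, 195.1, 300.0, 110.4],
})

first_rows = df.head()    # first 5 rows by default
last_rows = df.tail(2)    # last 2 rows
summary = df.describe()   # count, mean, std, min, quartiles, max

df.info()                 # prints dtypes, non-null counts, memory usage
print(summary)
```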

Selecting & Filtering

Chapter 6

Select columns with square brackets, rows with .loc, and filter rows with a boolean condition.

# Select column
ages = df['Age']

# Select row by index label (use df.iloc[0] to select by position)
row = df.loc[0]

# Conditional filtering (>= so that 18-year-olds count as adults)
adults = df[df['Age'] >= 18]
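The difference between label-based and position-based selection is easier to see when the index is not 0, 1, 2. A sketch with invented names and ages, which also shows how two conditions might be combined:

```python
import pandas as pd

# Hypothetical data with string index labels.
df = pd.DataFrame(
    {"Name": ["Ana", "Ben", "Cy"], "Age": [17, 25, 40]},
    index=["a", "b", "c"],
)

by_label = df.loc["b"]      # row whose index label is "b"
by_position = df.iloc[0]    # first row, whatever its label

# Combine conditions with & (and) or | (or); each needs parentheses.
young_adults = df[(df["Age"] >= 18) & (df["Age"] < 30)]
print(young_adults)
```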

Cleaning Empty Cells

Chapter 7

Bad data can ruin your analysis. Handle it!

# Remove rows with empty cells
new_df = df.dropna()

# Fill every empty cell with a fixed value
df.fillna(130, inplace = True)

# Or fill one column with its mean (assign back; inplace on a selected column is unreliable)
x = df["Calories"].mean()
df["Calories"] = df["Calories"].fillna(x)
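Before choosing between dropping and filling, it can help to count the empty cells first. The sketch below uses made-up data and shows one possible sequence: count, drop on a single column, or fill each column with its own mean.

```python
import pandas as pd

nan = float("nan")

# Hypothetical data with one empty cell per column.
df = pd.DataFrame({
    "Duration": [60, 45, nan],
    "Calories": [400.0, nan, 300.0],
})

missing_per_column = df.isna().sum()   # empty cells per column

# Drop only the rows where Calories is empty.
with_calories = df.dropna(subset=["Calories"])

# Fill each column with its own mean; returns a new DataFrame.
filled = df.fillna(df.mean())
print(filled)
```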

Cleaning Wrong Data

Chapter 8

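Some cells are not empty but hold values that cannot be right, such as an implausibly long workout. These can be replaced or capped.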

# Set a max limit for duration
for x in df.index:
  if df.loc[x, "Duration"] > 120:
    df.loc[x, "Duration"] = 120
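The same cap can usually be written without a loop. A sketch on made-up data, showing a boolean mask and the equivalent `Series.clip` call; the 120 limit is carried over from the loop above.

```python
import pandas as pd

# Hypothetical data with two out-of-range values.
df = pd.DataFrame({"Duration": [60, 450, 45, 130]})

# Option 1: a boolean mask selects the offending rows, then assigns.
capped = df.copy()
capped.loc[capped["Duration"] > 120, "Duration"] = 120

# Option 2: clip applies the upper bound in one call.
clipped = df["Duration"].clip(upper=120)

print(capped["Duration"].tolist(), clipped.tolist())
```

Both options operate on the whole column at once, which tends to be faster and shorter than iterating over the index.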

Removing Duplicates

Chapter 9
# Check for duplicates
print(df.duplicated())

# Remove duplicates
df.drop_duplicates(inplace = True)
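To see what counts as a duplicate, here is a sketch on invented data. By default a row is a duplicate only if every column matches an earlier row; the `subset` parameter narrows the comparison to chosen columns.

```python
import pandas as pd

# Hypothetical data: row 2 repeats row 0 exactly.
df = pd.DataFrame({
    "Duration": [60, 45, 60, 45],
    "Pulse": [110, 117, 110, 120],
})

# True for every row that repeats an earlier row exactly.
flags = df.duplicated()

# Keep the first occurrence; reset_index renumbers the remaining rows.
deduped = df.drop_duplicates().reset_index(drop=True)

# Compare on selected columns only.
by_duration = df.drop_duplicates(subset=["Duration"])
print(deduped)
```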

Grouping & Aggregation

Chapter 10

The groupby() method helps group data into categories and apply functions.

# Average of each numeric column, by workout type
numeric_cols = ["Duration", "Pulse", "Maxpulse", "Calories"]
print(df.groupby("Type")[numeric_cols].mean())
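A self-contained sketch of grouping, using made-up workout data. It shows a single-column mean and, as one possible extension, several named aggregations in one `agg` call; the output column names `total_duration` and `max_calories` are chosen for the example.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "Type": ["Run", "Bike", "Run", "Bike"],
    "Duration": [30, 60, 50, 40],
    "Calories": [300, 500, 500, 350],
})

# Mean of one column per group.
avg_calories = df.groupby("Type")["Calories"].mean()

# Several aggregations at once; the keyword names become column names.
summary = df.groupby("Type").agg(
    total_duration=("Duration", "sum"),
    max_calories=("Calories", "max"),
)
print(summary)
```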

Merging & Joins

Chapter 11

Combining multiple DataFrames.

merged = pd.merge(df1, df2, on='ID', how='inner')
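The `how` argument decides which IDs survive the merge. A sketch with two small invented tables, assuming both share an `ID` column as in the line above:

```python
import pandas as pd

# Hypothetical tables sharing an ID column.
df1 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Ana", "Ben", "Cy"]})
df2 = pd.DataFrame({"ID": [2, 3, 4], "Score": [88, 92, 75]})

inner = pd.merge(df1, df2, on="ID", how="inner")  # IDs present in both
left = pd.merge(df1, df2, on="ID", how="left")    # every ID from df1
outer = pd.merge(df1, df2, on="ID", how="outer")  # IDs from either table

print(inner)
```

Rows without a match on the other side get NaN in the columns that come from it, as with `Score` for ID 1 in the left merge.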

Plotting Integration

Chapter 12

Pandas hooks directly into Matplotlib.

import matplotlib.pyplot as plt
df.plot(kind = 'scatter', x = 'Duration', y = 'Calories')
plt.show()

🚀 Real World Projects

Chapter 13

🟢 Beginner: Dataset Explorer

Goal: Load a CSV, print stats, and fix missing values.

🟡 Intermediate: Sales Analysis

Goal: Group sales by Month and Product Category to find top performers.
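One possible starting point for this project, as a sketch only. The `sales` table, its column names, and its figures are all invented; a real dataset will differ.

```python
import pandas as pd

# Hypothetical sales records.
sales = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-10"]
    ),
    "Category": ["Toys", "Books", "Toys", "Toys"],
    "Amount": [100.0, 50.0, 70.0, 20.0],
})

# Derive a year-month label, then total the amount per month and category.
sales["Month"] = sales["Date"].dt.strftime("%Y-%m")
totals = sales.groupby(["Month", "Category"])["Amount"].sum()

# Largest totals first.
top = totals.sort_values(ascending=False)
print(top)
```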

🔴 Advanced: Stock Predictor (Prep)

Goal: Calculate moving averages and daily returns on historical finance data.
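The two calculations named in the goal can be sketched with `rolling` and `pct_change`. The prices, dates, and the 3-day window below are assumptions for illustration, not real market data.

```python
import pandas as pd

# Hypothetical closing prices on consecutive days.
prices = pd.Series(
    [100.0, 102.0, 101.0, 105.0, 110.0],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
    name="Close",
)

# 3-day moving average; the first two values are NaN (window not yet full).
moving_avg = prices.rolling(window=3).mean()

# Daily return: fractional change from the previous day.
daily_return = prices.pct_change()

print(pd.DataFrame({"Close": prices, "MA3": moving_avg, "Return": daily_return}))
```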

🎯 Pandas Mini Task

Chapter 14

Goal: Create your own DataFrame.

📋 Requirements:

  1. Import pandas.
  2. Create a DataFrame with columns: "Fruit" and "Color".
  3. Add 3 rows (e.g., Apple-Red, Banana-Yellow).
  4. Print the whole table.

Data Science starts here! 🧪

🎉 Congratulations!

You've completed the Pandas module.