Pandas code cards

How to use

  • Each card mirrors an A4 classroom prompt. Predict first (or discuss), then run the cell to check.

  • Detective cards show a buggy idea in Markdown; the code cell shows a fixed version.

  • Keep explanations short and schematic (what → why).

Turn Gemini into a coding tutor (no direct answers)

Paste this in your first chat with Gemini to keep it in “tutor mode”:

You are a **coding tutor** for Python in Jupyter/Colab. Follow the **course motto** “do not give up learning.”

### Role & Goals
- Use **Socratic guidance** and **test-first thinking** to help me solve problems myself.
- Help me read errors, reason about state, and make small, safe iterations.

### Strict Rules
1) **Do not** provide full working solutions or paste complete functions/programs.
   - You may show **tiny illustrative fragments (≤3 lines)** or **pseudo-code with TODOs**, but not a drop-in answer.
2) Prefer **questions over answers**; offer **one small next step** at a time.
3) When debugging, explain **what the traceback says**, give **2–3 hypotheses**, and propose the **smallest diff** in *plain English* first.
4) Encourage **TDD**: ask me to write/assert a test, predict, run, and report outputs.
5) Keep responses concise (≈120–150 words) unless I ask for a deeper explanation or code review.
6) Ask me to **run code and share results**; adapt based on the output.
7) If I request the full solution, remind me of the rules and offer a **higher-tier hint** instead.
8) When I finish an exercise, reinforce the lessons learned and suggest additional exercises.

### Interaction Loop (use this structure)
- **Restate goal:** what I’m trying to accomplish in one line.
- **Diagnose:** key assumption to check or error to interpret.
- **Hint (tiered):**
  - Tier 1: Conceptual nudge (no code).
  - Tier 2: Directed hint (identify line/construct to change).
  - Tier 3: Pseudo-code with TODOs or a **1–3 line** pattern (still not a full solution).
- **Next action:** one concrete step for me to try now.
- **Ask back:** what to run/paste (output, test result, or traceback).

### When reviewing my code
- Comment on **correctness, clarity, naming, and complexity (big-O)**.
- Suggest **tests** I’m missing (boundaries, empty cases, error paths).

### Safety & Ethics
- No secrets or private data in prompts.
- Avoid library functions/APIs unless I ask.

Stay in tutor mode for the whole session.

Code Cards

  1. Predict the output of this code:

import pandas as pd
data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)
print(df.iloc[0, 1])
[ ]:

  1. Predict the output of this code:

import pandas as pd
data = {'X': [10, 20, 30], 'Y': [40, 50, 60], 'Z': [70, 80, 90]}
df = pd.DataFrame(data)
print(df[['X', 'Z']])
[ ]:

The next exercises use the dataframe df, loaded below from a CSV file of diabetes data. It has the following columns:

  • Pregnancies: Number of times pregnant

  • Glucose: Plasma glucose concentration at 2 hours in an oral glucose tolerance test

  • BloodPressure: Diastolic blood pressure (mm Hg)

  • SkinThickness: Triceps skin fold thickness (mm)

  • Insulin: 2-Hour serum insulin (mu U/ml)

  • BMI: Body mass index (weight in kg/(height in m)^2)

  • DiabetesPedigreeFunction: Diabetes pedigree function

  • Age: Age (years)

  • Outcome: Class variable (0 or 1), where 1 is interpreted as “tested positive for diabetes”

  • Class distribution: 268 of the 768 records have Outcome 1; the others have Outcome 0

The following code loads the dataset into a Pandas dataframe and shows the first 5 rows:

[1]:
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/ffraile/computer_science_tutorials/main/source/Data%20Manipulation/exercises/datasets/diabetes.csv')
df.head()
[1]:
   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigreeFunction  Age  Outcome
0            6      148             72             35        0  33.6                     0.627   50        1
1            1       85             66             29        0  26.6                     0.351   31        0
2            8      183             64              0        0  23.3                     0.672   32        1
3            1       89             66             23       94  28.1                     0.167   21        0
4            0      137             40             35      168  43.1                     2.288   33        1

  1. Complete the code below to calculate the average BMI of patients with diabetes (Outcome == 1) and without diabetes (Outcome == 0).

df_diabetes = df[?] # Filter rows where Outcome == 1
avg_bmi_diabetes = df_diabetes['BMI'].mean()
df_no_diabetes = df[?] # Filter rows where Outcome == 0
avg_bmi_no_diabetes = df_no_diabetes['BMI'].mean()
print("Average BMI of patients with diabetes:", avg_bmi_diabetes)
print("Average BMI of patients without diabetes:", avg_bmi_no_diabetes)
[ ]:

  1. Complete the code below to find cases where the number of pregnancies is greater than 2 and the glucose level is above 150.

filtered_df = df.query(?) # Filter rows where Pregnancies > 2 and Glucose > 150
print(filtered_df.describe())
[ ]:

  1. Complete the code below to find cases where patients are older than 50 or have a BMI greater than 30.

filtered_df = df.query(?) # Filter rows where Age > 50 or BMI > 30
print(filtered_df.describe())
[ ]:
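
The three cards above rely on two filtering styles: a boolean mask inside square brackets, and a condition string passed to query(). The sketch below shows both on a made-up toy dataframe (the names toy, height and weight are illustrative assumptions, not part of the diabetes data), so it shows the pattern without completing the cards.

```python
import pandas as pd

# Hypothetical toy data, unrelated to the diabetes dataset.
toy = pd.DataFrame({
    "height": [150, 165, 180, 175],
    "weight": [50, 80, 90, 60],
})

# Boolean mask: the comparison yields one True/False per row,
# and indexing with it keeps only the True rows.
tall = toy[toy["height"] > 170]

# Masks combine with & (and) or | (or); each comparison needs parentheses.
tall_and_heavy = toy[(toy["height"] > 170) & (toy["weight"] > 70)]

# query() takes the same kind of condition as a string and accepts and / or.
tall_or_heavy = toy.query("height > 170 or weight > 70")

print(len(tall), len(tall_and_heavy), len(tall_or_heavy))  # 2 1 3
```

Before running it, predict the three counts from the four rows, then check.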

The next exercises use the COVID dataset, which contains information about the COVID-19 pandemic. The dataset contains the following columns:

  • Date: Date of the record

  • Country: Country name

  • Confirmed: Cumulative number of confirmed cases

  • Recovered: Cumulative number of recovered cases

  • Deaths: Cumulative number of deaths

The following code loads the dataset into a Pandas dataframe:

[2]:
covid_pd = pd.read_csv('https://raw.githubusercontent.com/ffraile/computer_science_tutorials/main/source/Data%20Manipulation/exercises/datasets/covid.csv')

covid_pd
[2]:
              Date      Country  Confirmed  Recovered  Deaths
     0  2020-01-22  Afghanistan          0          0       0
     1  2020-01-23  Afghanistan          0          0       0
     2  2020-01-24  Afghanistan          0          0       0
     3  2020-01-25  Afghanistan          0          0       0
     4  2020-01-26  Afghanistan          0          0       0
   ...         ...          ...        ...        ...     ...
143663  2022-01-19     Zimbabwe     226887          0    5266
143664  2022-01-20     Zimbabwe     227552          0    5276
143665  2022-01-21     Zimbabwe     227961          0    5288
143666  2022-01-22     Zimbabwe     228179          0    5292
143667  2022-01-23     Zimbabwe     228254          0    5294

143668 rows × 5 columns

  1. Complete the code below to obtain the time series of Deaths in Spain.

spain_deaths = covid_pd[?] # Filter rows where Country == 'Spain'
print(spain_deaths[['Date', 'Deaths']])
[ ]:

  1. Complete the code below to extract the Confirmed cases in China in 2020.

china_2020 = covid_pd.query(?) # Filter rows where Country == 'China' and Date is in 2020
print(china_2020[['Date', 'Confirmed']])
[ ]:
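
A date column read from a CSV file is usually plain text unless it is parsed. One way to reason about "rows in a given year" is to parse the text and compare the year. The sketch below does this with a boolean mask on made-up data (toy, day, city and visits are illustrative assumptions). It is not necessarily the approach the card expects, since the card uses query().

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
toy = pd.DataFrame({
    "day": ["2020-12-30", "2020-12-31", "2021-01-01", "2021-01-02"],
    "city": ["Oslo", "Oslo", "Oslo", "Lima"],
    "visits": [3, 5, 8, 13],
})

# Parse the text dates so the year can be read off each row.
years = pd.to_datetime(toy["day"]).dt.year

# Keep rows that match both the city and the year.
oslo_2021 = toy[(toy["city"] == "Oslo") & (years == 2021)]
print(oslo_2021[["day", "visits"]])
```

Dates written as YYYY-MM-DD also sort in chronological order as plain text, which is another property worth testing on a small example.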

  1. Knowing that diff() computes the difference between consecutive rows and that idxmax() returns the index of the row with the maximum value, complete the code below to find the date with the highest daily death toll in the US.

us_deaths = covid_pd[?].copy() # Filter rows where Country == 'US'
us_deaths['Daily_Deaths'] = us_deaths['Deaths'].diff()
max_death_date = us_deaths.loc[us_deaths['Daily_Deaths'].idxmax()]
print("Date with highest daily deaths in US:", max_death_date['Date'], "with", max_death_date['Daily_Deaths'], "deaths")
[4]:

Date with highest daily deaths in US: 2021-01-20 with 4442.0 deaths
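
To see what diff() and idxmax() do in the card above, it can help to trace them on a tiny cumulative series first. The sketch below uses made-up numbers (toy, day and total are illustrative assumptions, not COVID data).

```python
import pandas as pd

# Hypothetical cumulative counts, one per day.
toy = pd.DataFrame({
    "day": ["d1", "d2", "d3", "d4"],
    "total": [10, 12, 20, 21],
})

# diff() subtracts the previous row from each row; the first row has no
# predecessor, so its result is NaN.
toy["daily"] = toy["total"].diff()   # NaN, 2.0, 8.0, 1.0

# idxmax() returns the index label of the largest value (NaN is skipped).
label = toy["daily"].idxmax()        # 2

print(toy.loc[label, "day"], toy.loc[label, "daily"])  # d3 8.0
```

Because idxmax() returns a label rather than a position, it pairs with .loc. This matters after filtering, since a filtered dataframe keeps its original row labels.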

The next exercises use a dataframe that loads data from the following list of dictionaries:

data = [
    {"id": 101, "name": "Scott Summers", "alias": "Cyclops", "program":"CS", "score": 9.5},
    {"id": 102, "name": "Jean Grey", "alias": "Phoenix", "program":"CS", "score": 8.7},
    {"id": 103, "name": "Logan Howlett", "alias": "Wolverine", "program":"CS", "score": 7.8},
    {"id": 104, "name": "Ororo Munroe", "alias": "Storm", "program":"CS", "score": 8.9},
    {"id": 105, "name": "Charles Xavier", "alias": "Professor X", "program":"CS", "score": 10},
    {"id": 101, "name": "Scott Summers", "alias": "Cyclops", "program":"Physics", "score": 10.0},
    {"id": 102, "name": "Jean Grey", "alias": "Phoenix", "program":"Physics", "score": 9.5},
    {"id": 103, "name": "Logan Howlett", "alias": "Wolverine", "program":"Physics", "score": 8.0},
    {"id": 104, "name": "Ororo Munroe", "alias": "Storm", "program":"Physics", "score": 9.1},
    {"id": 105, "name": "Charles Xavier", "alias": "Professor X", "program":"Physics", "score": 10.0}
]

The following code loads the dataset into a Pandas dataframe:

[5]:
data = [
    {"id": 101, "name": "Scott Summers", "alias": "Cyclops", "program":"CS", "score": 9.5},
    {"id": 102, "name": "Jean Grey", "alias": "Phoenix", "program":"CS", "score": 8.7},
    {"id": 103, "name": "Logan Howlett", "alias": "Wolverine", "program":"CS", "score": 7.8},
    {"id": 104, "name": "Ororo Munroe", "alias": "Storm", "program":"CS", "score": 8.9},
    {"id": 105, "name": "Charles Xavier", "alias": "Professor X", "program":"CS", "score": 10},
    {"id": 101, "name": "Scott Summers", "alias": "Cyclops", "program":"Physics", "score": 10.0},
    {"id": 102, "name": "Jean Grey", "alias": "Phoenix", "program":"Physics", "score": 9.5},
    {"id": 103, "name": "Logan Howlett", "alias": "Wolverine", "program":"Physics", "score": 8.0},
    {"id": 104, "name": "Ororo Munroe", "alias": "Storm", "program":"Physics", "score": 9.1},
    {"id": 105, "name": "Charles Xavier", "alias": "Professor X", "program":"Physics", "score": 10.0}
]
df1 = pd.DataFrame(data)
df1
[5]:
    id            name        alias  program  score
0  101   Scott Summers      Cyclops       CS    9.5
1  102       Jean Grey      Phoenix       CS    8.7
2  103   Logan Howlett    Wolverine       CS    7.8
3  104    Ororo Munroe        Storm       CS    8.9
4  105  Charles Xavier  Professor X       CS   10.0
5  101   Scott Summers      Cyclops  Physics   10.0
6  102       Jean Grey      Phoenix  Physics    9.5
7  103   Logan Howlett    Wolverine  Physics    8.0
8  104    Ororo Munroe        Storm  Physics    9.1
9  105  Charles Xavier  Professor X  Physics   10.0
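
The cards in this notebook build dataframes in two ways: from a dictionary of lists (one entry per column) and from a list of dictionaries (one entry per row). The sketch below compares the two on made-up records (by_row, by_column, item and price are illustrative assumptions). It also shows why a whole number such as 10 is displayed as 10.0 when its column contains other decimal values.

```python
import pandas as pd

# Hypothetical records, not the course data.
# List of dictionaries: each dictionary becomes one row.
by_row = pd.DataFrame([
    {"item": "pen", "price": 2},
    {"item": "ink", "price": 3.5},
])

# Dictionary of lists: each key becomes one column.
by_column = pd.DataFrame({"item": ["pen", "ink"], "price": [2, 3.5]})

# Both layouts describe the same table.
print(by_row.shape, by_column.shape)

# A column mixing whole and decimal numbers is stored as floats.
print(by_row["price"].dtype)
print(by_row["price"].tolist())
```
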
  1. Predict the output of this code:

df1_cyclops = df1[df1['alias'] == 'Cyclops']
print(df1_cyclops)
[ ]:

  1. Predict the output of this code:

df1_high = df1[df1['score'] > 9.0]
print(df1_high)
[ ]:

  1. Complete the code below to find the average score of students in the CS program.

df1_cs = df1[?] # Filter rows where program == 'CS'
avg_score_cs = df1_cs['score'].mean()
print("Average score in CS program:", avg_score_cs)
[ ]:

  1. Complete the code below to find students with a score greater than 9.0 in the Physics program.

df1_physics_high = df1.query(?) # Filter rows where program is Physics and score is greater than 9.0
print(df1_physics_high)
[ ]: