Pandas code cards
Try me
How to use
Each card mirrors an A4 classroom prompt. Predict first (or discuss), then run the cell to check.
Detective cards show a buggy idea in Markdown; the code cell shows a fixed version.
Keep explanations short and schematic (what → why).
Turn Gemini into a coding tutor (no direct answers)
Paste this in your first chat with Gemini to keep it in “tutor mode”:
You are a **coding tutor** for Python in Jupyter/Colab. Follow the **course motto** “do not give up learning.”
### Role & Goals
- Use **Socratic guidance** and **test-first thinking** to help me solve problems myself.
- Help me read errors, reason about state, and make small, safe iterations.
### Strict Rules
1) **Do not** provide full working solutions or paste complete functions/programs.
- You may show **tiny illustrative fragments (≤3 lines)** or **pseudo-code with TODOs**, but not a drop-in answer.
2) Prefer **questions over answers**; offer **one small next step** at a time.
3) When debugging, explain **what the traceback says**, give **2–3 hypotheses**, and propose the **smallest diff** in *plain English* first.
4) Encourage **TDD**: ask me to write/assert a test, predict, run, and report outputs.
5) Keep responses concise (≈120–150 words) unless I ask for a deeper explanation or code review.
6) Ask me to **run code and share results**; adapt based on the output.
7) If I request the full solution, remind me of the rules and offer a **higher-tier hint** instead.
8) When I finish an exercise, reinforce the lessons learned and suggest additional exercises.
### Interaction Loop (use this structure)
- **Restate goal:** what I’m trying to accomplish in one line.
- **Diagnose:** key assumption to check or error to interpret.
- **Hint (tiered):**
- Tier 1: Conceptual nudge (no code).
- Tier 2: Directed hint (identify line/construct to change).
- Tier 3: Pseudo-code with TODOs or a **1–3 line** pattern (still not a full solution).
- **Next action:** one concrete step for me to try now.
- **Ask back:** what to run/paste (output, test result, or traceback).
### When reviewing my code
- Comment on **correctness, clarity, naming, and complexity (big-O)**.
- Suggest **tests** I’m missing (boundaries, empty cases, error paths).
### Safety & Ethics
- No secrets or private data in prompts.
- Avoid library functions/APIs unless I ask.
Stay in tutor mode for the whole session.
Code Cards
Predict the output of this code:
import pandas as pd
data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)
print(df.iloc[0, 1])
[ ]:
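As a reminder for this card, iloc selects by integer position (row first, then column), while loc selects by label. The following is only a minimal sketch on a hypothetical toy frame; the names toy, p, q, r0 and r1 are made up for illustration and deliberately differ from the card, so the prediction is still yours to make.

```python
import pandas as pd

# Hypothetical toy frame (not the one from the card)
toy = pd.DataFrame({"p": [10, 20], "q": [30, 40]}, index=["r0", "r1"])

by_position = toy.iloc[1, 0]   # second row, first column (zero-based positions)
by_label = toy.loc["r1", "p"]  # the same cell, addressed by row and column labels

print(by_position, by_label)
```

Both expressions address the same cell, which is a quick way to check that you are counting positions from zero.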
Predict the output of this code:
import pandas as pd
data = {'X': [10, 20, 30], 'Y': [40, 50, 60], 'Z': [70, 80, 90]}
df = pd.DataFrame(data)
print(df[['X', 'Z']])
[ ]:
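The bracket style matters for this card: a single label returns a Series, while a list of labels returns a DataFrame with the columns in the listed order. A small sketch, assuming a hypothetical toy frame with made-up column names p, q and r:

```python
import pandas as pd

# Hypothetical toy frame (column names are illustrative only)
toy = pd.DataFrame({"p": [1, 2], "q": [3, 4], "r": [5, 6]})

one_col = toy["p"]          # single label -> Series
two_cols = toy[["p", "r"]]  # list of labels -> DataFrame, columns in the listed order

print(type(one_col).__name__)
print(type(two_cols).__name__)
print(list(two_cols.columns))
```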
The next exercises use the dataframe df below, which loads diabetes data from a CSV file and has the following columns:
Pregnancies: Number of times pregnant
Glucose: Plasma glucose concentration at 2 hours in an oral glucose tolerance test
BloodPressure: Diastolic blood pressure (mm Hg)
SkinThickness: Triceps skin fold thickness (mm)
Insulin: 2-Hour serum insulin (mu U/ml)
BMI: Body mass index (weight in kg/(height in m)^2)
DiabetesPedigreeFunction: Diabetes pedigree function
Age: Age (years)
Outcome: Class variable (0 or 1); class value 1 is interpreted as “tested positive for diabetes”
Class distribution: 268 of 768 are 1, the others are 0
The following code loads the dataset into a Pandas dataframe and shows the first 5 rows:
[1]:
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/ffraile/computer_science_tutorials/main/source/Data%20Manipulation/exercises/datasets/diabetes.csv')
df.head()
[1]:
| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |
Complete the code below to calculate the average BMI of patients with diabetes (Outcome == 1) and without diabetes (Outcome == 0).
df_diabetes = df[?] # Filter rows where Outcome == 1
avg_bmi_diabetes = df_diabetes['BMI'].mean()
df_no_diabetes = df[?] # Filter rows where Outcome == 0
avg_bmi_no_diabetes = df_no_diabetes['BMI'].mean()
print("Average BMI of patients with diabetes:", avg_bmi_diabetes)
print("Average BMI of patients without diabetes:", avg_bmi_no_diabetes)
[ ]:
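The placeholder in df[?] expects a boolean mask: a Series with one True/False value per row. The sketch below shows the pattern on a hypothetical toy frame (the columns group and value are invented for illustration), not on the diabetes data, so the exercise itself remains open.

```python
import pandas as pd

# Hypothetical toy frame (not the diabetes dataset)
toy = pd.DataFrame({"group": [1, 0, 1, 0], "value": [2.0, 4.0, 6.0, 8.0]})

mask = toy["group"] == 1   # boolean Series: True where the condition holds
selected = toy[mask]       # keeps only the rows where mask is True
mean_selected = selected["value"].mean()

print(mask.tolist())
print(mean_selected)
```

Printing the mask before using it is a cheap way to test your prediction of which rows survive the filter.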
Complete the code below to find cases where the number of pregnancies is greater than 2 and the glucose level is above 150.
filtered_df = df.query(?) # Filter rows where Pregnancies > 2 and Glucose > 150
print(filtered_df.describe())
[ ]:
Complete the code below to find cases where patients are older than 50 or have a BMI greater than 30.
filtered_df = df.query(?) # Filter rows where Age > 50 or BMI > 30
print(filtered_df.describe())
[ ]:
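For the two query cards above, query takes the condition as a string and accepts the words and / or between comparisons. A minimal sketch on a hypothetical toy frame (columns a and b are made up), with the equivalent boolean-mask form for comparison:

```python
import pandas as pd

# Hypothetical toy frame (column names are illustrative only)
toy = pd.DataFrame({"a": [1, 5, 3, 7], "b": [10, 20, 30, 40]})

both = toy.query("a > 2 and b > 25")    # rows where both conditions hold
either = toy.query("a > 6 or b < 15")   # rows where at least one condition holds

# Same filter as `both`, written with masks: note the parentheses and the & operator
same_as_both = toy[(toy["a"] > 2) & (toy["b"] > 25)]

print(both.index.tolist(), either.index.tolist())
```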
The next exercises use the COVID dataset, which contains information about the COVID-19 pandemic. The dataset contains the following columns:
Date: Date of the record
Country: Country name
Confirmed: Cumulative number of confirmed cases
Recovered: Cumulative number of recovered cases
Deaths: Cumulative number of deaths
The following code loads the dataset into a Pandas dataframe:
[2]:
covid_pd = pd.read_csv('https://raw.githubusercontent.com/ffraile/computer_science_tutorials/main/source/Data%20Manipulation/exercises/datasets/covid.csv')
covid_pd
[2]:
| | Date | Country | Confirmed | Recovered | Deaths |
|---|---|---|---|---|---|
| 0 | 2020-01-22 | Afghanistan | 0 | 0 | 0 |
| 1 | 2020-01-23 | Afghanistan | 0 | 0 | 0 |
| 2 | 2020-01-24 | Afghanistan | 0 | 0 | 0 |
| 3 | 2020-01-25 | Afghanistan | 0 | 0 | 0 |
| 4 | 2020-01-26 | Afghanistan | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... |
| 143663 | 2022-01-19 | Zimbabwe | 226887 | 0 | 5266 |
| 143664 | 2022-01-20 | Zimbabwe | 227552 | 0 | 5276 |
| 143665 | 2022-01-21 | Zimbabwe | 227961 | 0 | 5288 |
| 143666 | 2022-01-22 | Zimbabwe | 228179 | 0 | 5292 |
| 143667 | 2022-01-23 | Zimbabwe | 228254 | 0 | 5294 |
143668 rows × 5 columns
Complete the code below to obtain the time series of Deaths in Spain.
spain_deaths = covid_pd[?] # Filter rows where Country == 'Spain'
print(spain_deaths[['Date', 'Deaths']])
[ ]:
Complete the code below to extract the Confirmed cases in China in 2020.
china_2020 = covid_pd.query(?) # Filter rows where Country == 'China' and Date is in 2020
print(china_2020[['Date', 'Confirmed']])
[ ]:
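One possible way to filter by year inside query, sketched on a hypothetical toy frame rather than the COVID data: convert the date column to datetime and compare its .dt.year. This is only one option (comparing date strings would be another), and passing engine="python" is an assumption made here so that the .dt accessor is evaluated by plain pandas.

```python
import pandas as pd

# Hypothetical toy frame (values are invented for illustration)
toy = pd.DataFrame({
    "Date": ["2020-12-31", "2021-01-01", "2021-06-15", "2021-03-01"],
    "Country": ["A", "A", "A", "B"],
    "Confirmed": [5, 7, 9, 2],
})
toy["Date"] = pd.to_datetime(toy["Date"])  # text -> datetime64

# String literals inside the query string use the other kind of quote
subset = toy.query("Country == 'A' and Date.dt.year == 2021", engine="python")

print(subset[["Date", "Confirmed"]])
```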
Knowing that the function diff() computes the difference between consecutive rows, and that idxmax() returns the index of the row with the maximum value, complete the code below to find the date with the highest daily death toll in the US.
us_deaths = covid_pd[?].copy() # Filter rows where Country == 'US'
us_deaths['Daily_Deaths'] = us_deaths['Deaths'].diff()
max_death_date = us_deaths.loc[us_deaths['Daily_Deaths'].idxmax()]
print("Date with highest daily deaths in US:", max_death_date['Date'], "with", max_death_date['Daily_Deaths'], "deaths")
[4]:
Date with highest daily deaths in US: 2021-01-20 with 4442.0 deaths
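To see what diff() and idxmax() do step by step, here is a small sketch on a hypothetical cumulative series (the frame toy and its values are invented). The first difference is NaN because the first row has no predecessor, and idxmax() skips it by default.

```python
import pandas as pd

# Hypothetical cumulative counts (not the COVID data)
toy = pd.DataFrame({
    "Date": ["d1", "d2", "d3", "d4"],
    "Total": [0, 2, 5, 6],
})

toy["Daily"] = toy["Total"].diff()   # NaN, 2.0, 3.0, 1.0
peak_label = toy["Daily"].idxmax()   # index label of the largest daily value
peak_row = toy.loc[peak_label]       # full row at that label

print(peak_row["Date"], peak_row["Daily"])
```

The result of diff() is a float column even when the input is integer, which explains the 4442.0 in the output above.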
The next exercises use a dataframe built from the following list of dictionaries:
The following code loads the dataset into a Pandas dataframe:
[5]:
data = [
{"id": 101, "name": "Scott Summers", "alias": "Cyclops", "program":"CS", "score": 9.5},
{"id": 102, "name": "Jean Grey", "alias": "Phoenix", "program":"CS", "score": 8.7},
{"id": 103, "name": "Logan Howlett", "alias": "Wolverine", "program":"CS", "score": 7.8},
{"id": 104, "name": "Ororo Munroe", "alias": "Storm", "program":"CS", "score": 8.9},
{"id": 105, "name": "Charles Xavier", "alias": "Professor X", "program":"CS", "score": 10},
{"id": 101, "name": "Scott Summers", "alias": "Cyclops", "program":"Physics", "score": 10.0},
{"id": 102, "name": "Jean Grey", "alias": "Phoenix", "program":"Physics", "score": 9.5},
{"id": 103, "name": "Logan Howlett", "alias": "Wolverine", "program":"Physics", "score": 8.0},
{"id": 104, "name": "Ororo Munroe", "alias": "Storm", "program":"Physics", "score": 9.1},
{"id": 105, "name": "Charles Xavier", "alias": "Professor X", "program":"Physics", "score": 10.0}
]
df1 = pd.DataFrame(data)
df1
[5]:
| | id | name | alias | program | score |
|---|---|---|---|---|---|
| 0 | 101 | Scott Summers | Cyclops | CS | 9.5 |
| 1 | 102 | Jean Grey | Phoenix | CS | 8.7 |
| 2 | 103 | Logan Howlett | Wolverine | CS | 7.8 |
| 3 | 104 | Ororo Munroe | Storm | CS | 8.9 |
| 4 | 105 | Charles Xavier | Professor X | CS | 10.0 |
| 5 | 101 | Scott Summers | Cyclops | Physics | 10.0 |
| 6 | 102 | Jean Grey | Phoenix | Physics | 9.5 |
| 7 | 103 | Logan Howlett | Wolverine | Physics | 8.0 |
| 8 | 104 | Ororo Munroe | Storm | Physics | 9.1 |
| 9 | 105 | Charles Xavier | Professor X | Physics | 10.0 |
Predict the output of this code:
df1_cs = df1[alias == 'Cyclops']
print(df1_cs)
[ ]:
Predict the output of this code:
df1_high = df1[df1['score'] > 9.0]
print(df1_high)
[ ]:
Complete the code below to find the average score of students in the CS program.
df1_cs = df1[?] # Filter rows where program == 'CS'
avg_score_cs = df1_cs['score'].mean()
print("Average score in CS program:", avg_score_cs)
[ ]:
Complete the code below to find students with a score greater than 9.0 in the Physics program.
df1_physics_high = df1.query(?) # Filter rows where program is Physics and score is greater than 9.0
print(df1_physics_high)
[ ]:
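When a query condition involves text, the string literal needs its own quotes inside the query string; alternatively, a Python variable can be referenced with the @ prefix. A minimal sketch on a hypothetical toy frame (kind, score, wanted and threshold are made-up names):

```python
import pandas as pd

# Hypothetical toy frame (not the students data)
toy = pd.DataFrame({"kind": ["x", "y", "x"], "score": [4.0, 9.0, 8.0]})

# Option 1: literal values written directly in the query string
literal = toy.query("kind == 'x' and score > 5")

# Option 2: Python variables referenced with @
wanted, threshold = "x", 5
with_vars = toy.query("kind == @wanted and score > @threshold")

print(literal)
```

The @ form tends to be easier to reuse when the same filter runs with different values.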