Programming Exercises for Data Analysis, Level A2
These exercises are for people who are learning a new programming language and would like to apply it to data analysis. They work with Python or R, and also when you are getting familiar with libraries like pandas.
During the exercises, you will analyze data on global demography from the Gapminder Foundation (www.gapminder.org).
In this challenge, you will learn:
- integrating data from multiple sources
- aggregating data
- plotting, plotting, plotting
Preparations
Download the Gapminder datasets from www.gapminder.org on:
- fertility
- life expectancy
- population
Exercise 1
Load all three tables and determine their dimensions.
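One possible starting point in Python with pandas is sketched here. It assumes the tables are in a wide layout (one row per country, one column per year) and have been saved as CSV. The inline strings are made-up toy values standing in for the downloaded files, and the names RAW and tables are only for illustration.

```python
import io
import pandas as pd

# Made-up toy values standing in for the downloaded files (not real Gapminder data).
# Assumed layout: one row per country, one column per year.
RAW = {
    "fertility": "country,1960,2010\nA,6.1,2.0\nB,5.4,1.8\nC,7.0,4.9\n",
    "life_expectancy": "country,1960,2010\nA,55.0,78.0\nB,60.0,80.5\n",
    "population": "country,1960,2010\nA,1000,2500\nB,2000,2600\nC,500,1900\nD,50,60\n",
}

# Read each table, using the country name as the row index
tables = {
    name: pd.read_csv(io.StringIO(text), index_col="country")
    for name, text in RAW.items()
}

# .shape is a (rows, columns) tuple
for name, table in tables.items():
    rows, cols = table.shape
    print(f"{name}: {rows} rows x {cols} columns")
```

With real files, pass the file path to pd.read_csv instead of io.StringIO(text). If the download is a spreadsheet, pd.read_excel is the likely counterpart.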
Exercise 2
Pick fertility and life expectancy for the year 2010. Integrate both in a single table.
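A minimal pandas sketch of this step, assuming wide tables indexed by country with year labels stored as strings (as they are after reading a CSV). The values are made-up placeholders.

```python
import pandas as pd

# Toy wide tables (made-up values): rows are countries, columns are years
fertility = pd.DataFrame(
    {"1960": [6.1, 5.4, 7.0], "2010": [2.0, 1.8, 4.9]}, index=["A", "B", "C"]
)
life_expectancy = pd.DataFrame(
    {"1960": [55.0, 60.0, 48.0], "2010": [78.0, 80.5, 66.0]}, index=["A", "B", "D"]
)

year = "2010"  # assumption: column labels are strings

# Take one column from each table and line them up by country.
# Countries present in only one table get a missing value in the other column.
combined = pd.concat(
    [
        fertility[year].rename("fertility"),
        life_expectancy[year].rename("life_expectancy"),
    ],
    axis=1,
)
print(combined)
```

Aligning on the country index keeps countries that appear in only one table, so the gaps stay visible for the next exercise.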
Exercise 3
Remove all rows with missing values.
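In pandas, this could look roughly like the following. The table is a made-up example with the same two columns as in Exercise 2.

```python
import numpy as np
import pandas as pd

# Toy table with gaps (made-up values)
combined = pd.DataFrame(
    {
        "fertility": [2.0, 1.8, 4.9, np.nan],
        "life_expectancy": [78.0, 80.5, np.nan, 70.0],
    },
    index=["A", "B", "C", "D"],
)

# Keep only rows where every column has a value
complete = combined.dropna()

print(f"dropped {len(combined) - len(complete)} of {len(combined)} rows")
```

Printing how many rows were dropped is a cheap sanity check before plotting.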
Exercise 4
Draw a scatterplot of fertility over life expectancy in 2010.
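A sketch with matplotlib, reading "fertility over life expectancy" as fertility on the y-axis and life expectancy on the x-axis. The data points are made up, and the Agg backend is chosen only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Toy values (made up), one row per country
data = pd.DataFrame(
    {
        "life_expectancy": [55.0, 60.0, 72.0, 78.0, 82.0],
        "fertility": [6.5, 5.0, 3.0, 2.0, 1.5],
    }
)

fig, ax = plt.subplots()
ax.scatter(data["life_expectancy"], data["fertility"])
ax.set_xlabel("Life expectancy (years)")
ax.set_ylabel("Fertility (children per woman)")
ax.set_title("Fertility over life expectancy, 2010 (toy data)")
```

Calling fig.savefig with a file name writes the figure to disk.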
Exercise 5
Draw a histogram of life expectancy in 2010. Try different bin sizes.
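One way to compare bin sizes is to draw the same values side by side with different bin counts. The sketch below uses synthetic random numbers as a placeholder for the real life expectancy column.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Synthetic placeholder values, not real life expectancy data
rng = np.random.default_rng(0)
life_expectancy = rng.normal(loc=70, scale=8, size=200)

bin_counts = [5, 10, 20]
fig, axes = plt.subplots(1, len(bin_counts), figsize=(12, 3))

# Same data in every panel; only the number of bins changes
for ax, bins in zip(axes, bin_counts):
    ax.hist(life_expectancy, bins=bins)
    ax.set_title(f"{bins} bins")
    ax.set_xlabel("Life expectancy (years)")
axes[0].set_ylabel("Number of countries")
```

Too few bins hide the shape of the distribution, too many make it noisy.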
Exercise 6
Make the histogram a publication-quality figure.
Exercise 7
Draw a bar plot displaying the fertility of a few selected countries in 2010.
Exercise 8
Repeat exercises 2-7 for the year 1960. Observe differences.
Exercise 9
Make it convenient to repeat the process for any given year.
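Wrapping the steps in a function with the year as a parameter is one common approach. The sketch below covers the selection, joining and cleaning steps. The function name table_for_year is hypothetical, and the tables are made-up wide tables with string year labels.

```python
import pandas as pd


def table_for_year(fertility, life_expectancy, year):
    """Return complete fertility / life expectancy rows for one year."""
    year = str(year)  # assumption: column labels are strings
    combined = pd.concat(
        [
            fertility[year].rename("fertility"),
            life_expectancy[year].rename("life_expectancy"),
        ],
        axis=1,
    )
    return combined.dropna()


# Toy wide tables (made-up values)
fertility = pd.DataFrame(
    {"1960": [6.1, 5.4, 7.0], "2010": [2.0, 1.8, 4.9]}, index=["A", "B", "C"]
)
life_expectancy = pd.DataFrame(
    {"1960": [55.0, 60.0, None], "2010": [78.0, 80.5, 62.0]}, index=["A", "B", "C"]
)

for year in (1960, 2010):
    print(year, len(table_for_year(fertility, life_expectancy, year)), "complete rows")
```

The plotting steps can be wrapped the same way, taking the returned table as input.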
Exercise 10
Calculate a correlation coefficient between fertility and life expectancy (for the year 2010).
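In pandas, Series.corr computes the Pearson correlation coefficient by default. A short sketch with made-up values:

```python
import pandas as pd

# Toy values (made up), one row per country
data = pd.DataFrame(
    {
        "life_expectancy": [55.0, 60.0, 72.0, 78.0, 82.0],
        "fertility": [6.5, 5.0, 3.0, 2.0, 1.5],
    }
)

# Pearson correlation between the two columns (between -1 and 1)
r = data["fertility"].corr(data["life_expectancy"])
print(f"Pearson r = {r:.3f}")

# The full correlation matrix contains the same number off the diagonal
matrix = data.corr()
```

A value close to -1 means that fertility tends to fall as life expectancy rises.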
Exercise 11
Fit a linear model that predicts fertility from life expectancy.
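A straight-line fit by least squares can be sketched with numpy.polyfit. This is only one option; statistics packages offer richer model summaries. The numbers are made up.

```python
import numpy as np

# Toy values (made up), one entry per country
life_expectancy = np.array([55.0, 60.0, 72.0, 78.0, 82.0])
fertility = np.array([6.5, 5.0, 3.0, 2.0, 1.5])

# Least-squares line: fertility ~ slope * life_expectancy + intercept
slope, intercept = np.polyfit(life_expectancy, fertility, deg=1)

predicted = slope * life_expectancy + intercept
residuals = fertility - predicted

# Share of the variance in fertility explained by the line
r_squared = 1 - residuals.var() / fertility.var()

print(f"fertility = {slope:.3f} * life_expectancy + {intercept:.2f}")
print(f"R^2 = {r_squared:.3f}")
```

The slope estimates the change in fertility per additional year of life expectancy.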
Exercise 12
Read a list of country-continent pairs. Associate the continent with each country (e.g. as an extra column).
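The exercise does not prescribe a file format for the country-continent pairs. The sketch below assumes a two-column CSV with the hypothetical headers country and continent, and uses placeholder country names.

```python
import io
import pandas as pd

# Hypothetical file content; country names are placeholders
CONTINENTS_CSV = "country,continent\nA,Europe\nB,Asia\nC,Africa\n"
continents = pd.read_csv(io.StringIO(CONTINENTS_CSV))

# Toy data table (made-up values)
data = pd.DataFrame({"country": ["A", "B", "D"], "fertility": [2.0, 1.8, 3.3]})

# A left join keeps every row of the data table and adds the continent column
with_continent = data.merge(continents, on="country", how="left")

# Countries without a match end up with a missing continent
unmatched = with_continent.loc[with_continent["continent"].isna(), "country"].tolist()
print("no continent found for:", unmatched)
```

Checking the unmatched names is worthwhile, because spelling differences between sources are a common reason for failed matches.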
Exercise 13
Summarize the world population by continent over time as a scatterplot.
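The aggregation step can be sketched as a groupby over the country rows, followed by one scatter series per continent. The population numbers and the country-to-continent mapping are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Toy wide table (made-up values): rows are countries, columns are years
population = pd.DataFrame(
    {"1960": [1000, 2000, 500], "2010": [2500, 2600, 1900]},
    index=pd.Index(["A", "B", "C"], name="country"),
)
# Placeholder mapping from country to continent
continent = pd.Series({"A": "Europe", "B": "Asia", "C": "Asia"}, name="continent")

# Sum the country rows within each continent; result: continents x years
by_continent = population.groupby(continent).sum()

fig, ax = plt.subplots()
years = [int(label) for label in by_continent.columns]
for name, row in by_continent.iterrows():
    ax.scatter(years, row.to_numpy(), label=name)
ax.set_xlabel("Year")
ax.set_ylabel("Population")
ax.legend()
```

Grouping by a Series works here because its index (the country names) matches the index of the population table.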
Exercise 14
Identify a few countries that are redundant in the dataset. Remove the respective entries.
Exercise 15
Integrate all three tables into a single data structure.
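One possible data structure is a long table with one row per country and year and one column per indicator. The sketch below reshapes each wide table with melt and joins them on country and year. The helper name to_long and all values are made up for illustration.

```python
import pandas as pd


def to_long(wide, value_name):
    """Turn a countries-by-years table into (country, year, value) rows."""
    return (
        wide.rename_axis("country")
        .reset_index()
        .melt(id_vars="country", var_name="year", value_name=value_name)
    )


# Toy wide tables (made-up values)
fertility = pd.DataFrame({"1960": [6.1, 5.4], "2010": [2.0, 1.8]}, index=["A", "B"])
life_expectancy = pd.DataFrame({"1960": [55.0, 60.0], "2010": [78.0, 80.5]}, index=["A", "B"])
population = pd.DataFrame({"1960": [1000, 2000, 500], "2010": [2500, 2600, 1900]}, index=["A", "B", "C"])

# Outer joins keep country-year pairs that are missing from some tables
merged = (
    to_long(fertility, "fertility")
    .merge(to_long(life_expectancy, "life_expectancy"), on=["country", "year"], how="outer")
    .merge(to_long(population, "population"), on=["country", "year"], how="outer")
)
merged["year"] = merged["year"].astype(int)
merged = merged.set_index(["country", "year"]).sort_index()
print(merged)
```

With this layout, a single country and year can be looked up with merged.loc[("A", 2010)].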
Exercise 16
Plot population, life expectancy and fertility from 1960 to 2010 in a single diagram.
Exercise 17
Create a series of scatterplots like in Exercise 4 (one for each year) where the color indicates the continent.
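A sketch of the per-year panels, assuming a long table with the hypothetical columns country, continent, year, fertility and life_expectancy. The values and the color choices are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Toy long table (made-up values): one row per country and year
data = pd.DataFrame(
    {
        "country": ["A", "B", "C"] * 2,
        "continent": ["Europe", "Asia", "Africa"] * 2,
        "year": [1960] * 3 + [2010] * 3,
        "life_expectancy": [68.0, 55.0, 45.0, 80.0, 76.0, 62.0],
        "fertility": [2.6, 5.8, 6.9, 1.6, 2.1, 4.7],
    }
)
# Arbitrary color per continent
colors = {"Europe": "tab:blue", "Asia": "tab:orange", "Africa": "tab:green"}

years = sorted(data["year"].unique())
fig, axes = plt.subplots(
    1, len(years), sharex=True, sharey=True, squeeze=False, figsize=(4 * len(years), 4)
)

# One panel per year; shared axes make the panels directly comparable
for ax, year in zip(axes[0], years):
    snapshot = data[data["year"] == year]
    ax.scatter(
        snapshot["life_expectancy"],
        snapshot["fertility"],
        c=snapshot["continent"].map(colors).tolist(),
    )
    ax.set_title(str(year))
    ax.set_xlabel("Life expectancy (years)")
axes[0][0].set_ylabel("Fertility (children per woman)")
```

Keeping the axis limits identical across years matters even more once the plots are shown in sequence.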
Exercise 18
Combine the scatterplots into an animation.