A complete knitted HTML file is due on Sakai by the beginning of class on Tuesday, July 10th (2:30pm).
The goal is to explore a new-to-you dataset and, in particular, to begin establishing a workflow for data frames, or “tibbles”. You will use dplyr and ggplot2 to do some description and visualization.
Your homework should serve as your own personal cheatsheet in the future for things to do with a new dataset. Give yourself the cheatsheet you deserve! You’ll submit your work as an HTML file knit from your .Rmd file. Remember:
- dplyr should be your data manipulation tool
- ggplot2 should be your visualization tool

You will explore the reprohealth dataset, which is distributed as an R package from my personal GitHub.
Install it, and remember to use this code only in your R console, not in a script or .Rmd file:
install.packages("remotes") # install the remotes package
library(remotes) # load remotes package so you can install from github
install_github("apreshill/reprohealth") # install the package
Then at the top of your lab, copy and paste this code:
library(tidyverse) # you'll need this too
library(reprohealth) # load the package
wb_stats <- wb_reprohealth # save the data to your local environment
Using R Markdown, do the following in R code chunks:
- Print the wb_stats data (hint: just name it!).
- Take a dplyr::glimpse of the wb_stats data.

In the text portion, answer the following questions based on the output above:
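You can check facts such as the number of rows and variables, or the data type of each column, in more than one way. The sketch below shows a few options. It assumes tidyverse and reprohealth are installed as described earlier. Comparing several outputs is a good way to check your answers.

```r
library(tidyverse)
library(reprohealth)

wb_stats <- wb_reprohealth

# What kind of object is this? A tibble also carries the "data.frame" class
class(wb_stats)

# Several ways to ask how big it is
nrow(wb_stats)   # number of rows (observations)
ncol(wb_stats)   # number of columns (variables)
dim(wb_stats)    # both at once: rows, then columns

# Data type of each column, as a named character vector
col_types <- map_chr(wb_stats, ~ class(.x)[1])
col_types
```

dplyr::glimpse() reports the same size and type information in a single display.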
Pick at least one categorical variable and at least one quantitative variable to explore individually (i.e., one-at-a-time).
Feel free to use summary stats, tables, figures. We’re NOT expecting high production value (yet).
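The sketch below is one possible starting point for the one-variable-at-a-time summaries. The column choices are placeholders only: the code takes the first character or factor column and the first numeric column, because this lab does not name specific variables. Swap in the ones you actually picked.

```r
library(tidyverse)
library(reprohealth)

wb_stats <- wb_reprohealth

# Placeholder choices: first text/factor column and first numeric column.
# Replace these with the variables you want to study.
cat_var <- names(select(wb_stats, where(~ is.character(.x) || is.factor(.x))))[1]
num_var <- names(select(wb_stats, where(is.numeric)))[1]

# Categorical variable: a frequency table, most common levels first
cat_counts <- wb_stats %>%
  count(.data[[cat_var]], sort = TRUE)
cat_counts

# Quantitative variable: a few summary statistics, skipping missing values
num_summary <- wb_stats %>%
  summarise(
    n_missing  = sum(is.na(.data[[num_var]])),
    minimum    = min(.data[[num_var]], na.rm = TRUE),
    median_val = median(.data[[num_var]], na.rm = TRUE),
    mean_val   = mean(.data[[num_var]], na.rm = TRUE),
    maximum    = max(.data[[num_var]], na.rm = TRUE)
  )
num_summary
```

The .data[[...]] pronoun lets the same code work with whichever column name is stored in cat_var or num_var. When you write about a specific variable, you can type its name directly instead.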
See the ggplot2 tutorial for ideas. You might also check out:
Make a few plots, probably of the same variable you chose to characterize numerically. Try to explore more than one plot type. Just as an example of what we mean:
You don’t have to use all the data in every plot! It’s fine to filter down to one country or small handful of countries.
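The sketch below shows two plot types for the same quantitative variable. It uses the same placeholder columns as the summary sketch, which you should replace with your own choices. The second plot keeps only a handful of categories so it stays readable.

```r
library(tidyverse)
library(reprohealth)

wb_stats <- wb_reprohealth

# Placeholder variable choices; replace with the columns you explored
cat_var <- names(select(wb_stats, where(~ is.character(.x) || is.factor(.x))))[1]
num_var <- names(select(wb_stats, where(is.numeric)))[1]

# Plot type 1: the distribution of the quantitative variable
p_hist <- ggplot(wb_stats, aes(x = .data[[num_var]])) +
  geom_histogram(bins = 30, na.rm = TRUE)

# Keep only the first five categories that appear in the data
few_levels <- head(unique(wb_stats[[cat_var]]), 5)
wb_small <- filter(wb_stats, .data[[cat_var]] %in% few_levels)

# Plot type 2: the same variable compared across those categories
p_box <- ggplot(wb_small, aes(x = .data[[cat_var]], y = .data[[num_var]])) +
  geom_boxplot(na.rm = TRUE) +
  coord_flip()  # horizontal boxes leave room for long category labels

p_hist
p_box
```

To filter down to one specific country, replace few_levels with a vector of the names you want. This assumes your categorical column holds country names.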
Super-extra bonus points for playing with knitr chunk options to improve the display of your plots in the knitted HTML file; try this blog post for tips.
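One common starting point is to set figure defaults once, in a setup chunk near the top of the .Rmd. The values below are arbitrary examples, not recommendations.

```r
# Set figure defaults for every chunk in this document
knitr::opts_chunk$set(
  fig.width = 7,         # figure width, in inches
  fig.height = 4,        # figure height, in inches
  fig.align = "center",  # center figures in the HTML output
  dpi = 150              # resolution of the rendered images
)

# The same options can be overridden for a single chunk in its header,
# e.g. {r my-hist, fig.width = 5, fig.height = 3}
```

Options set in a chunk header take precedence over these global defaults for that chunk only.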
You’re encouraged to reflect on what was hard/easy, problems you solved, helpful tutorials you read, etc. Give credit to your sources, whether it’s a blog post, a fellow student, an online tutorial, etc.
Our general grading guidelines
Check minus: There are some mistakes or omissions, such as the number of rows or variables in the data frame. Or maybe it is only in the output but not in the text. There are no plots or there’s just one type of plot (and it’s probably a scatterplot). It’s hard to figure out where to look for what in this doc.
Check: Hits all the elements. No obvious mistakes. Pleasant to read. No heroic detective work required. Solid.
Check plus: Some “above and beyond”, creativity, etc. You learned something new from reviewing their work and you’re eager to incorporate it into your work now. The ggplot2 figures are quite diverse. The knitted HTML is very organized and well formatted.
This lab was adapted from Jenny Bryan’s STAT545 class.