Load the tidyverse package (it must be installed first! Do not install packages within your scripts).
Using the readr package, we import our csv with the read_csv function.
Inside the read_csv() parentheses is the url to the dataset.
na = "." specifies that missing data in this dataset is denoted by a period.
We use <- to assign the result and call that object heart.
library(tidyverse)
heart <- read_csv("http://faculty.washington.edu/kenrice/heartgraphs/nhanesmedium.csv",
na = ".")
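To see what the na argument does without downloading anything, here is a minimal sketch on a made-up inline CSV. The toy_csv string and its id and sbp columns are hypothetical and are not part of the NHANES file; passing literal data through I() assumes readr 2.0.0 or later.

```r
library(readr)

# Hypothetical inline CSV (not the NHANES file); "." marks a missing value.
toy_csv <- "id,sbp\n1,120\n2,.\n3,135\n"

# I() tells readr to treat the string as literal data rather than a file path.
# na = "." turns every "." cell into NA, so sbp is parsed as a number.
toy <- read_csv(I(toy_csv), na = ".", show_col_types = FALSE)

toy
is.na(toy$sbp)
```

Without na = ".", the period would be read as text and the whole sbp column would come in as character.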
From the data dictionary:
BPXSAR: systolic blood pressure (mmHg)
BPXDAR: diastolic blood pressure (mmHg)
BPXDI1, BPXDI2: two diastolic blood pressure readings
race_ethc: race/ethnicity, coded as:
gender: sex, coded as Male/Female
DR1TFOLA: folate intake (μg/day)
RIAGENDR: sex, coded as 1/2
BMXBMI: body mass index (kg/m²)
RIDAGEY: age (years)
heart data?
# your code here
Use dplyr to answer these questions:
# your code here
If this was easy: there are actually (at least) 4 ways to do this with dplyr functions. Try to figure out four ways to do this!
Use dplyr to do the following: create a new variable in heart called RIDAGEMOS that converts RIDAGEYR to months.
# your code here
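If you have not used mutate() before, the pattern looks roughly like this on a toy tibble. The data and the age_years and age_months column names are made up for illustration; adapt the idea to the real variable names in heart.

```r
library(dplyr)

# Hypothetical toy data, not the heart dataset.
toy <- tibble(id = 1:3, age_years = c(2, 30, 61))

# mutate() adds a new column computed from existing ones;
# the original columns are kept.
toy_months <- toy %>%
  mutate(age_months = age_years * 12)

toy_months
```

Remember that mutate() returns a new data frame, so assign the result if you want to keep the new column.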
Use ggplot2 to make a scatterplot with age in years on the x-axis and systolic blood pressure on the y-axis.
# your code here
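As a reminder of the basic ggplot2 recipe (data, aesthetic mapping, geom), here is a small sketch on invented numbers. The toy data frame and its age_years and sbp columns are hypothetical stand-ins, not columns of heart.

```r
library(ggplot2)

# Hypothetical toy data, not the heart dataset.
toy <- data.frame(
  age_years = c(25, 40, 58, 70),
  sbp       = c(110, 122, 135, 148)
)

# Map one variable to x and one to y, then draw one point per row.
p <- ggplot(toy, aes(x = age_years, y = sbp)) +
  geom_point() +
  labs(x = "Age (years)", y = "Systolic blood pressure (mmHg)")

p
```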
If this was easy: import the larger dataset (http://faculty.washington.edu/kenrice/heartgraphs/nhaneslarge.csv) and make a hexagonal heatmap of 2d bin counts. Apply a custom continuous color palette to the hexbins, and reverse the colors such that lighter colors are for lower counts and darker colors are for higher counts.
Use dplyr and ggplot2 to do the following:
Use ?case_when to read the help documents for this dplyr function. Look carefully at the examples (hint: you might find the starwars example most helpful; you can run that code in your console!). Use this function to make a new variable called age_cat with 3 values:
between(<name_of_age_var>, 0, 30) is age_cat == "Under 30"
between(<name_of_age_var>, 31, 55) is age_cat == "31-55"
otherwise, age_cat == "56+"
age_cat variable. Color the points by gender.
Read ?facet_grid, and recreate the same plot, now using gender ~ age_cat.
# your case_when code here
# your plot code here
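If the help pages leave you stuck, the sketch below shows the general shape of case_when() with between(), followed by a faceted plot, on a made-up tibble. The toy data and the names age_years, sbp, sex, and age_group are hypothetical; this is one possible pattern under those assumptions, not the only way to do it.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data, not the heart dataset.
toy <- tibble(
  age_years = c(12, 30, 31, 55, 56, 80),
  sbp       = c(100, 115, 118, 130, 134, 150),
  sex       = c("Male", "Female", "Female", "Male", "Female", "Male")
)

# case_when() checks the conditions in order and uses the first that is TRUE.
# between() is inclusive at both ends.
# The final TRUE is a catch-all; note it would also catch missing ages.
toy_cat <- toy %>%
  mutate(age_group = case_when(
    between(age_years, 0, 30)  ~ "Under 30",
    between(age_years, 31, 55) ~ "31-55",
    TRUE                       ~ "56+"
  ))

# facet_grid(rows ~ columns) draws one panel per combination of the two variables.
p <- ggplot(toy_cat, aes(x = age_years, y = sbp, color = sex)) +
  geom_point() +
  facet_grid(sex ~ age_group)

p
```

Because the labels are plain character strings, the panels are ordered alphabetically; converting the new variable to a factor with explicit levels is one way to control that order.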
If this was easy, add linear regression lines to all six facets (keep the standard error ribbons). Apply a custom color palette and remove the color legend guide. Play with alpha levels for the points. Use something other than the default theme_gray().
Very important: delete the text below in order to be able to knit (you won’t have the image file in your directory).