You have working versions of R and RStudio installed on your computer.
Our notepad: http://bit.ly/conj620-cm012
Graphics and statistics for cardiology: comparing categorical and continuous variables
I provide this code in your lab worksheet:
library(tidyverse)
heart <- read_csv("http://faculty.washington.edu/kenrice/heartgraphs/nhanesmedium.csv", na = ".")
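If you are wondering what the `na = "."` argument is doing there, here is a minimal sketch. It assumes the course file marks missing values with a single period, which is what that argument suggests. The tiny file and the name `toy` are made up for illustration and are not part of the lab data.

```r
library(readr)

# Hypothetical three-row file; "." stands in for a missing reading.
tmp <- tempfile(fileext = ".csv")
writeLines(c("BPXDI1,BPXDI2", "48,48", "52,.", ".,80"), tmp)

# na = "." tells read_csv to turn every "." into NA while reading.
# col_types = cols() keeps the column guessing but quiets the message.
toy <- read_csv(tmp, na = ".", col_types = cols())

toy
is.na(toy$BPXDI2)
```

Without `na = "."`, the periods would be read as text and the columns would likely come in as character instead of numeric.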
Some notes:
I'm going to ask that you trust me with this worksheet! You haven't learned about this kind of document yet, called an R Markdown (.Rmd) file. Please just go with it! I promise we'll actually explain it in a later lab 😇
Don't forget the notepad: http://bit.ly/conj620-cm012
From the data dictionary:
BPXSAR: systolic blood pressure (mmHg)
BPXDAR: diastolic blood pressure (mmHg)
BPXDI1, BPXDI2: two diastolic blood pressure readings
race_ethc: race/ethnicity, coded as text categories
gender: sex, coded as Male/Female
DR1TFOLA: folate intake (μg/day)
RIAGENDR: sex, coded as 1/2
BMXBMI: body mass index (kg/m2)
RIDAGEYR: age (years)
print a tibble: heart
install a package: install.packages("dplyr")
load an installed package: library(dplyr)
assign a variable a name (<-)
dplyr::filter
dplyr::arrange
dplyr::mutate
%>%
"dataframe first, dataframe once"
library(dplyr)
RStudio Keyboard Shortcuts:
OSX: CMD + SHIFT + M
Else: CTRL + SHIFT + M
Nesting a dataframe inside a function is hard to read.
slice(heart, 1)
# A tibble: 1 x 10
  BPXSAR BPXDAR BPXDI1 BPXDI2 race_ethc    gender DR1TFOLA RIAGENDR BMXBMI
   <dbl>  <dbl>  <dbl>  <dbl> <chr>        <chr>     <dbl>    <dbl>  <dbl>
1   129.   50.7    48.    48. White non-H… Male       334.       1.   19.7
# ... with 1 more variable: RIDAGEYR <dbl>
Here, the "sentence" starts with a verb.
Piping a dataframe into a function lets you read left to right.
heart %>% slice(1)
# A tibble: 1 x 10
  BPXSAR BPXDAR BPXDI1 BPXDI2 race_ethc    gender DR1TFOLA RIAGENDR BMXBMI
   <dbl>  <dbl>  <dbl>  <dbl> <chr>        <chr>     <dbl>    <dbl>  <dbl>
1   129.   50.7    48.    48. White non-H… Male       334.       1.   19.7
# ... with 1 more variable: RIDAGEYR <dbl>
Now, the "sentence" starts with a noun.
Sequences of functions make you read inside out.
slice(filter(heart, gender == "Male"), 1)
# A tibble: 1 x 10
  BPXSAR BPXDAR BPXDI1 BPXDI2 race_ethc    gender DR1TFOLA RIAGENDR BMXBMI
   <dbl>  <dbl>  <dbl>  <dbl> <chr>        <chr>     <dbl>    <dbl>  <dbl>
1   129.   50.7    48.    48. White non-H… Male       334.       1.   19.7
# ... with 1 more variable: RIDAGEYR <dbl>
Chaining functions together lets you read left to right.
heart %>% filter(gender == "Male") %>% slice(1)
# A tibble: 1 x 10
  BPXSAR BPXDAR BPXDI1 BPXDI2 race_ethc    gender DR1TFOLA RIAGENDR BMXBMI
   <dbl>  <dbl>  <dbl>  <dbl> <chr>        <chr>     <dbl>    <dbl>  <dbl>
1   129.   50.7    48.    48. White non-H… Male       334.       1.   19.7
# ... with 1 more variable: RIDAGEYR <dbl>
This does the same thing:
heart %>% filter(.data = ., gender == "Male") %>% slice(.data = ., 1)
So does this:
heart %>% filter(., gender == "Male") %>% slice(., 1)
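If you want to convince yourself that the nested, piped, and "dot" versions really are the same call, here is a small sketch you can run. `toy_heart` is a made-up stand-in for `heart` (same column names, invented values) so that you can check the result by eye.

```r
library(dplyr)

# Hypothetical stand-in for `heart`; the values are invented.
toy_heart <- tibble(
  gender = c("Female", "Male", "Male"),
  BPXSAR = c(110, 129, 135)
)

# Verb first: read from the inside out.
nested <- slice(filter(toy_heart, gender == "Male"), 1)

# Dataframe first: read left to right.
piped <- toy_heart %>% filter(gender == "Male") %>% slice(1)

# Same thing, with the piped dataframe written out as a dot.
dotted <- toy_heart %>% filter(., gender == "Male") %>% slice(., 1)

all.equal(nested, piped)
all.equal(piped, dotted)
```

The pipe just takes whatever is on its left and hands it to the function on its right as the first argument, which is why "dataframe first, dataframe once" works.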
attach()
heart$gender
or other variants
filter
?base::Logic
Operator | Description | Usage |
---|---|---|
& | and | x & y |
\| | or | x \| y |
xor | exactly x or y | xor(x, y) |
! | not | !x |
Logical or (|) is inclusive, so x | y really means: x or y or both.
Exclusive or (xor) is exclusive, so xor(x, y) really means: x or y, but not both.
x <- c(0, 1, 0, 1)
y <- c(0, 0, 1, 1)
boolean_or <- x | y
exclusive_or <- xor(x, y)
cbind(x, y, boolean_or, exclusive_or)
     x y boolean_or exclusive_or
[1,] 0 0          0            0
[2,] 1 0          1            1
[3,] 0 1          1            1
[4,] 1 1          1            0
?Comparison
Operator | Description | Usage |
---|---|---|
< | less than | x < y |
<= | less than or equal to | x <= y |
> | greater than | x > y |
>= | greater than or equal to | x >= y |
== | exactly equal to | x == y |
!= | not equal to | x != y |
%in% | group membership* | x %in% y |
is.na | is missing | is.na(x) |
!is.na | is not missing | !is.na(x) |
*(shortcut to using | repeatedly with ==)
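Here is a rough sketch of how these operators look inside `filter`. The tibble `toy_heart` is invented for illustration; it borrows column names from the data dictionary, but the values are not from the course data.

```r
library(dplyr)

# Hypothetical data; note the missing systolic reading in row 3.
toy_heart <- tibble(
  gender = c("Male", "Female", "Male", "Female"),
  BPXSAR = c(129, 142, NA, 118),
  BMXBMI = c(19.7, 31.2, 27.5, 24.0)
)

# & : both conditions must be TRUE
high_bp_men <- toy_heart %>% filter(gender == "Male" & BPXSAR > 120)

# | : at least one condition must be TRUE
either <- toy_heart %>% filter(BPXSAR > 140 | BMXBMI > 27)

# %in% : shortcut for BPXSAR == 118 | BPXSAR == 129
in_set <- toy_heart %>% filter(BPXSAR %in% c(118, 129))

# !is.na : keep only rows where systolic blood pressure was recorded
measured <- toy_heart %>% filter(!is.na(BPXSAR))

high_bp_men
either
in_set
measured
```

One thing to watch for: `filter` keeps a row only when the condition is `TRUE`, so a row whose condition comes out as `NA` is dropped. That is why the third row never shows up in `high_bp_men`.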
mutate
Create a new variable with a specific value
heart_bp <- heart %>% select(BPXDI1, BPXDI2)
heart_bp %>% mutate(year = 2015)
# A tibble: 200 x 3
   BPXDI1 BPXDI2  year
    <dbl>  <dbl> <dbl>
 1    48.    48. 2015.
 2    76.    78. 2015.
 3    76.    76. 2015.
 4    64.    56. 2015.
 5    54.    56. 2015.
 6    80.    78. 2015.
 7    52.    NA  2015.
 8    NA     80. 2015.
 9    76.    NA  2015.
10    90.    80. 2015.
# ... with 190 more rows
Create a new variable based on other variables
heart_bp %>% mutate(bp_ratio = BPXDI1 / BPXDI2)
# A tibble: 200 x 3
   BPXDI1 BPXDI2 bp_ratio
    <dbl>  <dbl>    <dbl>
 1    48.    48.    1.00
 2    76.    78.    0.974
 3    76.    76.    1.00
 4    64.    56.    1.14
 5    54.    56.    0.964
 6    80.    78.    1.03
 7    52.    NA    NA
 8    NA     80.   NA
 9    76.    NA    NA
10    90.    80.    1.12
# ... with 190 more rows
Change an existing variable
heart_bp %>% mutate(bp_ratio = BPXDI1 / BPXDI2) %>% mutate(bp_ratio = bp_ratio * 100)
# A tibble: 200 x 3
   BPXDI1 BPXDI2 bp_ratio
    <dbl>  <dbl>    <dbl>
 1    48.    48.    100.
 2    76.    78.     97.4
 3    76.    76.    100.
 4    64.    56.    114.
 5    54.    56.     96.4
 6    80.    78.    103.
 7    52.    NA      NA
 8    NA     80.     NA
 9    76.    NA      NA
10    90.    80.    112.
# ... with 190 more rows
To make mutate changes "stick", assign the result back to a name:
heart_bp <- heart_bp %>% mutate(bp_ratio = BPXDI1 / BPXDI2)
Functions especially useful for mutate: see http://r4ds.had.co.nz/transform.html#mutate-funs
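It is easy to forget that `mutate` does not change the dataframe you piped in unless you assign the result. Here is a short sketch of the difference, using a made-up three-row tibble called `toy_bp` (the column names follow the lab data, the values do not).

```r
library(dplyr)

# Hypothetical readings for illustration only.
toy_bp <- tibble(BPXDI1 = c(48, 76, 52), BPXDI2 = c(48, 78, NA))

# This prints a tibble with bp_ratio, but nothing is saved.
toy_bp %>% mutate(bp_ratio = BPXDI1 / BPXDI2)
names_before <- names(toy_bp)   # bp_ratio is not here

# Assigning the result makes the new variable stick.
toy_bp <- toy_bp %>% mutate(bp_ratio = BPXDI1 / BPXDI2)
names_after <- names(toy_bp)    # now bp_ratio is here

# Once it exists, the variable can be changed in place.
toy_bp <- toy_bp %>% mutate(bp_ratio = bp_ratio * 100)
toy_bp
```

Any arithmetic involving a missing reading gives `NA`, so the third row stays `NA` through both steps.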
?Arithmetic
Operator | Description | Usage |
---|---|---|
+ | addition | x + y |
- | subtraction | x - y |
* | multiplication | x * y |
/ | division | x / y |
^ | raised to the power of | x ^ y |
abs | absolute value | abs(x) |
%/% | integer division | x %/% y |
%% | remainder after division | x %% y |
5 %/% 2 # 2 goes into 5 two times with...
[1] 2
5 %% 2 # 1 left over
[1] 1
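One way integer division and remainder can earn their keep inside `mutate` is by splitting a number into parts. The sketch below uses made-up ages in a tibble called `toy_ages`; neither the name nor the values come from the lab data.

```r
library(dplyr)

# Hypothetical ages, not taken from the course data.
toy_ages <- tibble(age_yrs = c(8, 19, 34, 67))

toy_ages <- toy_ages %>%
  mutate(
    decade = (age_yrs %/% 10) * 10,     # start of the decade each age falls in
    years_into_decade = age_yrs %% 10   # what is left over after that
  )

toy_ages
```

Adding `decade` and `years_into_decade` back together gives the original age, which is a handy check that the two operators are doing what you expect.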
all ggplot2
aes(x = , y = ) (aesthetics)
aes(x = , y = , color = ) (add color)
aes(x = , y = , size = ) (add size)
+ facet_wrap(~ ) (faceting)
The x-axis is variable age_yrs and the y-axis is variable systolic_bp
# A tibble: 4 x 4
  age_yrs systolic_bp bmi_z gender
    <dbl>       <dbl> <dbl> <chr>
1      8.         80.    1. male
2      9.         90.    2. male
3     10.        100.    3. female
4     11.        110.    4. female
The color of the points corresponds to gender
The size of the points corresponds to bmi_z
gender
[1] Shamelessly borrowed with much appreciation to Chester Ismay
library(ggplot2)
ggplot(nhanes, aes(age_yrs, systolic_bp)) +
  geom_point()
color points by gender
library(ggplot2)
ggplot(nhanes, aes(age_yrs, systolic_bp, color = gender)) +
  geom_point()
size points by bmi_z
library(ggplot2)
ggplot(nhanes, aes(age_yrs, systolic_bp, size = bmi_z)) +
  geom_point()
facet_wrap by gender
library(ggplot2)
ggplot(nhanes, aes(age_yrs, systolic_bp)) +
  geom_point() +
  facet_wrap(~gender)
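The plots above use an object called `nhanes` that is never created on the slides. Here is a sketch that assumes `nhanes` is simply the four-row table shown earlier, typed in by hand, and then stacks all three ideas (color, size, faceting) into one plot. Treat the data frame as an assumption for illustration.

```r
library(ggplot2)

# Assumed contents of `nhanes`: the four rows from the small table above.
nhanes <- data.frame(
  age_yrs     = c(8, 9, 10, 11),
  systolic_bp = c(80, 90, 100, 110),
  bmi_z       = c(1, 2, 3, 4),
  gender      = c("male", "male", "female", "female")
)

# x and y set the axes; color and size map two more variables onto the points;
# facet_wrap splits the plot into one panel per gender.
p <- ggplot(nhanes, aes(x = age_yrs, y = systolic_bp,
                        color = gender, size = bmi_z)) +
  geom_point() +
  facet_wrap(~gender)
```

Typing `p` (or `print(p)`) draws the plot. Dropping `color`, `size`, or the `facet_wrap` line one at a time gets you back to the simpler versions above.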
View it here
Download the file on Sakai to work on locally