
CONJ620: CM 1.2

Introduction to the Tidyverse

Alison Hill


I'm assuming...

You have working versions of:

  • R and

  • RStudio

installed on your computer.

Today's in-class lab

  • The data is from NHANES: the CDC's National Health and Nutrition Examination Survey

Set-up

I provide this code in your lab worksheet

library(tidyverse)
heart <- read_csv(
  "http://faculty.washington.edu/kenrice/heartgraphs/nhanesmedium.csv",
  na = "."  # treat "." in the file as a missing value (NA)
)

Some notes:

  • I'm going to ask that you trust me with this worksheet! It's a kind of document you haven't learned about yet, called an R Markdown (.Rmd) file. Please just go with it! I promise we'll explain it in a later lab 😇

  • Don't forget the notepad: http://bit.ly/conj620-cm012
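
If you're curious what the na = "." argument in the set-up code does, here is a minimal sketch you can run without downloading anything. It writes a tiny made-up CSV to a temporary file and reads it back two ways. The column names mimic the lab data, but the values and the file itself are invented for illustration.

```r
library(readr)

# Hypothetical two-row file; "." marks a missing reading
tmp <- tempfile(fileext = ".csv")
writeLines(c("BPXDI1,BPXDI2", "52,.", "76,78"), tmp)

# With na = ".", the period becomes NA and BPXDI2 is read as a number
# (col_types = cols() only silences the column-type message)
toy <- read_csv(tmp, na = ".", col_types = cols())
toy

# Without it, the period is kept as text, so BPXDI2 is read as character
toy_default <- read_csv(tmp, col_types = cols())
class(toy_default$BPXDI2)
```

A character column can't be averaged or plotted as a number, which is why the set-up code tells read_csv about the "." up front.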


Data Dictionary

From the data dictionary:

  • BPXSAR: systolic blood pressure (mmHg)
  • BPXDAR: diastolic blood pressure (mmHg)
  • BPXDI1, BPXDI2: two diastolic blood pressure readings
  • race_ethc: race/ethnicity, coded as:
    • Hispanic,
    • White non-Hispanic,
    • Black non-Hispanic and
    • Other
  • gender: sex, coded as Male/Female
  • DR1TFOLA: folate intake (μg/day)
  • RIAGENDR: sex, coded as 1/2
  • BMXBMI: body mass index (kg/m²)
  • RIDAGEYR: age (years)
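
The dictionary lists sex twice: once as text (gender) and once as a number (RIAGENDR). A quick way to check how two such columns line up is to count each combination. This is only a sketch on made-up rows, assuming 1 pairs with Male and 2 with Female, which is what the first row of the real data suggests.

```r
library(dplyr)

# Made-up rows using the dictionary's column names (not real NHANES values)
toy <- tibble(
  gender   = c("Male", "Female", "Female", "Male"),
  RIAGENDR = c(1, 2, 2, 1)
)

# One row per combination; if the columns agree, each label pairs with one code
sex_codes <- toy %>% count(gender, RIAGENDR)
sex_codes
```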

Chapter 1: Data Wrangling

  • print a tibble

    • heart
  • install a package

    • install.packages("dplyr")
    • do 1x per machine
  • load an installed package

    • library(dplyr)
    • do 1x per work session
  • assign a variable a name (<-)

  • dplyr::filter

  • dplyr::arrange

  • dplyr::mutate
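
As a quick refresher on the three verbs, here is a small sketch on a hypothetical mini version of heart. The column names come from the data dictionary; the values and the object names (toy, males, by_bp, with_months) are invented.

```r
library(dplyr)

# Hypothetical mini version of `heart`
toy <- tibble(
  gender   = c("Male", "Female", "Male", "Female"),
  BPXSAR   = c(129, 118, 141, 105),
  RIDAGEYR = c(34, 52, 67, 29)
)

males       <- filter(toy, gender == "Male")            # keep rows that meet a condition
by_bp       <- arrange(toy, desc(BPXSAR))               # sort rows, highest systolic first
with_months <- mutate(toy, age_months = RIDAGEYR * 12)  # add a new column
```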


Plus: %>%

image courtesy @LegoRLady

%>%

The pipe

"dataframe first, dataframe once"

library(dplyr)

RStudio Keyboard Shortcuts:

OSX: CMD + SHIFT + M

Else: CTRL + SHIFT + M

Nesting a dataframe inside a function is hard to read.

slice(heart, 1)
# A tibble: 1 x 10
  BPXSAR BPXDAR BPXDI1 BPXDI2 race_ethc    gender DR1TFOLA RIAGENDR BMXBMI
   <dbl>  <dbl>  <dbl>  <dbl> <chr>        <chr>     <dbl>    <dbl>  <dbl>
1   129.   50.7    48.    48. White non-H… Male       334.       1.   19.7
# ... with 1 more variable: RIDAGEYR <dbl>

Here, the "sentence" starts with a verb.

Piping a dataframe into a function lets you read left to right.

heart %>% slice(1)
# A tibble: 1 x 10
  BPXSAR BPXDAR BPXDI1 BPXDI2 race_ethc    gender DR1TFOLA RIAGENDR BMXBMI
   <dbl>  <dbl>  <dbl>  <dbl> <chr>        <chr>     <dbl>    <dbl>  <dbl>
1   129.   50.7    48.    48. White non-H… Male       334.       1.   19.7
# ... with 1 more variable: RIDAGEYR <dbl>

Now, the "sentence" starts with a noun.

Sequences of functions make you read inside out

slice(filter(heart, gender == "Male"), 1)
# A tibble: 1 x 10
  BPXSAR BPXDAR BPXDI1 BPXDI2 race_ethc    gender DR1TFOLA RIAGENDR BMXBMI
   <dbl>  <dbl>  <dbl>  <dbl> <chr>        <chr>     <dbl>    <dbl>  <dbl>
1   129.   50.7    48.    48. White non-H… Male       334.       1.   19.7
# ... with 1 more variable: RIDAGEYR <dbl>

Chaining functions together lets you read left to right

heart %>% filter(gender == "Male") %>% slice(1)
# A tibble: 1 x 10
  BPXSAR BPXDAR BPXDI1 BPXDI2 race_ethc    gender DR1TFOLA RIAGENDR BMXBMI
   <dbl>  <dbl>  <dbl>  <dbl> <chr>        <chr>     <dbl>    <dbl>  <dbl>
1   129.   50.7    48.    48. White non-H… Male       334.       1.   19.7
# ... with 1 more variable: RIDAGEYR <dbl>
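
If you want to convince yourself that the nested and chained versions really are the same thing, here is a small sketch on a made-up tibble (toy, nested, and piped are hypothetical names, and the values are invented).

```r
library(dplyr)

toy <- tibble(
  gender = c("Female", "Male", "Male"),
  BPXSAR = c(118, 129, 141)
)

# Inside out: filter runs first even though slice is written first
nested <- slice(filter(toy, gender == "Male"), 1)

# Left to right: the steps are written in the order they run
piped <- toy %>% filter(gender == "Male") %>% slice(1)

identical(nested, piped)
```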

"dataframe first, dataframe once"

heart %>% filter(gender == "Male") %>% slice(1)
# A tibble: 1 x 10
  BPXSAR BPXDAR BPXDI1 BPXDI2 race_ethc    gender DR1TFOLA RIAGENDR BMXBMI
   <dbl>  <dbl>  <dbl>  <dbl> <chr>        <chr>     <dbl>    <dbl>  <dbl>
1   129.   50.7    48.    48. White non-H… Male       334.       1.   19.7
# ... with 1 more variable: RIDAGEYR <dbl>

This does the same thing:

heart %>% filter(.data = ., gender == "Male") %>% slice(.data = ., 1)

So does this:

heart %>% filter(., gender == "Male") %>% slice(., 1)
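
The dot is optional with dplyr verbs because they all take the dataframe as their first argument. It starts to matter when a function wants the data somewhere else. Here is a minimal sketch using lm() from base R, which takes a formula first and the data second. The tibble is made up, and lm() is only here to illustrate the placeholder, not something the lab requires.

```r
library(dplyr)

# Invented values that fall on a straight line
toy <- tibble(
  RIDAGEYR = c(20, 40, 60, 80),
  BPXSAR   = c(110, 120, 130, 140)
)

# The dot tells the pipe to put `toy` in the data argument, not first
fit <- toy %>% lm(BPXSAR ~ RIDAGEYR, data = .)
coef(fit)
```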

attach()

heart$gender or other variants


⌛️

Let's review some helpful functions for filter


Base R + Tidyverse


💡

First:

Logical Operators

?base::Logic

Operator   Description      Usage
&          and              x & y
|          or               x | y
xor        exactly x or y   xor(x, y)
!          not              !x

Logical or (|) is inclusive, so x | y really means:

  • x or
  • y or
  • both x & y

Exclusive or (xor) is exclusive, so xor(x, y) really means:

  • x or
  • y...
  • but not both x & y

x <- c(0, 1, 0, 1)
y <- c(0, 0, 1, 1)
boolean_or <- x | y
exclusive_or <- xor(x, y)
cbind(x, y, boolean_or, exclusive_or)
     x y boolean_or exclusive_or
[1,] 0 0          0            0
[2,] 1 0          1            1
[3,] 0 1          1            1
[4,] 1 1          1            0
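
Here is how the same operators behave inside filter, sketched on a made-up tibble. The id column and all values are invented so that each row covers one combination of the two conditions.

```r
library(dplyr)

toy <- tibble(
  id       = 1:4,
  gender   = c("Male", "Male", "Female", "Female"),
  RIDAGEYR = c(70, 30, 70, 30)
)

# Inclusive or: male, or over 65, or both
either <- toy %>% filter(gender == "Male" | RIDAGEYR > 65)

# Exclusive or: male or over 65, but not both
only_one <- toy %>% filter(xor(gender == "Male", RIDAGEYR > 65))

# Not: flips the whole condition
neither <- toy %>% filter(!(gender == "Male" | RIDAGEYR > 65))
```

Row 1 (a 70-year-old male) meets both conditions, so it shows up in either but not in only_one.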

💡

Second:

Comparisons

?Comparison

Operator   Description                Usage
<          less than                  x < y
<=         less than or equal to      x <= y
>          greater than               x > y
>=         greater than or equal to   x >= y
==         exactly equal to           x == y
!=         not equal to               x != y
%in%       group membership*          x %in% y
is.na      is missing                 is.na(x)
!is.na     is not missing             !is.na(x)

*(shortcut to using | repeatedly with ==)
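
A small sketch of the two rows people trip over most: %in% as a shortcut, and is.na() for missing values. The tibble is made up; the race_ethc labels are the ones from the data dictionary.

```r
library(dplyr)

toy <- tibble(
  race_ethc = c("Hispanic", "Other", "White non-Hispanic", "Black non-Hispanic"),
  BPXDI2    = c(78, NA, 76, NA)
)

# %in% is shorthand for chaining == with |
long_way  <- toy %>% filter(race_ethc == "Hispanic" | race_ethc == "Other")
short_way <- toy %>% filter(race_ethc %in% c("Hispanic", "Other"))
identical(long_way, short_way)

# Comparing to NA with == gives NA rather than TRUE, so use is.na() instead
has_reading <- toy %>% filter(!is.na(BPXDI2))
```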


⌛️

Let's review mutate

3 ways to mutate

  1. Create a new variable with a specific value
  2. Create a new variable based on other variables

  3. Change an existing variable

heart_bp <- heart %>%
  select(BPXDI1, BPXDI2)
heart_bp %>%
  mutate(year = 2015)
# A tibble: 200 x 3
   BPXDI1 BPXDI2  year
    <dbl>  <dbl> <dbl>
 1    48.    48. 2015.
 2    76.    78. 2015.
 3    76.    76. 2015.
 4    64.    56. 2015.
 5    54.    56. 2015.
 6    80.    78. 2015.
 7    52.    NA  2015.
 8    NA     80. 2015.
 9    76.    NA  2015.
10    90.    80. 2015.
# ... with 190 more rows

3 ways to mutate

  1. Create a new variable with a specific value

  2. Create a new variable based on other variables
  3. Change an existing variable

heart_bp %>%
  mutate(bp_ratio = BPXDI1 / BPXDI2)
# A tibble: 200 x 3
   BPXDI1 BPXDI2 bp_ratio
    <dbl>  <dbl>    <dbl>
 1    48.    48.    1.00
 2    76.    78.    0.974
 3    76.    76.    1.00
 4    64.    56.    1.14
 5    54.    56.    0.964
 6    80.    78.    1.03
 7    52.    NA    NA
 8    NA     80.   NA
 9    76.    NA    NA
10    90.    80.    1.12
# ... with 190 more rows

3 ways to mutate

  1. Create a new variable with a specific value

  2. Create a new variable based on other variables

  3. Change an existing variable

heart_bp %>%
  # bp_ratio has to exist before it can be changed
  mutate(bp_ratio = BPXDI1 / BPXDI2) %>%
  mutate(bp_ratio = bp_ratio * 100)
# A tibble: 200 x 3
   BPXDI1 BPXDI2 bp_ratio
    <dbl>  <dbl>    <dbl>
 1    48.    48.    100.
 2    76.    78.     97.4
 3    76.    76.    100.
 4    64.    56.    114.
 5    54.    56.     96.4
 6    80.    78.    103.
 7    52.    NA      NA
 8    NA     80.     NA
 9    76.    NA      NA
10    90.    80.    112.
# ... with 190 more rows

To make your mutate "stick"

heart_bp <- heart_bp %>%
  mutate(bp_ratio = BPXDI1 / BPXDI2)
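
To see the difference between a mutate that sticks and one that doesn't, here is a minimal sketch. It rebuilds a three-row stand-in for heart_bp using a few of the values printed earlier; cols_before is a hypothetical helper name.

```r
library(dplyr)

# Three-row stand-in for heart_bp
heart_bp <- tibble(BPXDI1 = c(48, 76, 64), BPXDI2 = c(48, 78, 56))

# Printed but not saved: heart_bp itself is unchanged afterwards
heart_bp %>% mutate(bp_ratio = BPXDI1 / BPXDI2)
cols_before <- names(heart_bp)

# Assigning the result back to the name is what keeps the new column
heart_bp <- heart_bp %>% mutate(bp_ratio = BPXDI1 / BPXDI2)
names(heart_bp)
```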

⌛️

Let's review some helpful functions for mutate


Remember:

Base R + Tidyverse


💡

First:

Arithmetic

especially useful for mutate

See:

http://r4ds.had.co.nz/transform.html#mutate-funs

?Arithmetic

Operator   Description                Usage
+          addition                   x + y
-          subtraction                x - y
*          multiplication             x * y
/          division                   x / y
^          raised to the power of     x ^ y
abs        absolute value             abs(x)
%/%        integer division           x %/% y
%%         remainder after division   x %% y

5 %/% 2 # 2 goes into 5 two times with...
[1] 2
5 %% 2 # 1 left over
[1] 1
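
Integer division and remainder come in handy inside mutate, for example to split an age into its decade and the years left over. This is just a sketch on invented ages; decade and years_into are hypothetical column names.

```r
library(dplyr)

toy <- tibble(RIDAGEYR = c(8, 34, 67))

toy_ages <- toy %>%
  mutate(
    decade     = RIDAGEYR %/% 10 * 10,  # whole tens: 34 becomes 30
    years_into = RIDAGEYR %% 10         # what is left over: 34 becomes 4
  )
toy_ages
```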

Chapter 2: Data Visualization

all ggplot2

  • aes(x = , y = ) (aesthetics)
  • aes(x = , y = , color = ) (add color)
  • aes(x = , y = , size = ) (add size)
  • + facet_wrap(~ ) (faceting)

Old School [1]

  • Sketch the graphics below on paper, where the x-axis is the variable age_yrs and the y-axis is the variable systolic_bp

# A tibble: 4 x 4
  age_yrs systolic_bp bmi_z gender
    <dbl>       <dbl> <dbl> <chr>
1      8.         80.    1. male
2      9.         90.    2. male
3     10.        100.    3. female
4     11.        110.    4. female

  1. A scatter plot
  2. A scatter plot where the color of the points corresponds to gender
  3. A scatter plot where the size of the points corresponds to bmi_z
  4. A scatter plot faceted by gender

[1] Shamelessly borrowed with much appreciation to Chester Ismay
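
If you'd like to check your sketches in R afterwards, one way to type in the four-row tibble by hand is with tribble(). This is a sketch; calling the object nhanes is an assumption based on the name used in the plotting code.

```r
library(tibble)

# Rebuild the four-row practice tibble, one row per line
nhanes <- tribble(
  ~age_yrs, ~systolic_bp, ~bmi_z, ~gender,
         8,           80,      1, "male",
         9,           90,      2, "male",
        10,          100,      3, "female",
        11,          110,      4, "female"
)
nhanes
```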

1. A scatterplot

library(ggplot2)
# nhanes is assumed to be the four-row tibble from the sketching exercise
ggplot(nhanes, aes(age_yrs, systolic_bp)) +
  geom_point()

2. color points by gender

library(ggplot2)
ggplot(nhanes, aes(age_yrs, systolic_bp, color = gender)) +
  geom_point()

3. size points by bmi_z

library(ggplot2)
ggplot(nhanes, aes(age_yrs, systolic_bp, size = bmi_z)) +
  geom_point()

4. facet_wrap by gender

library(ggplot2)
ggplot(nhanes, aes(age_yrs, systolic_bp)) +
  geom_point() +
  facet_wrap(~gender)
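
The four plots each add one thing to the basic scatter plot, and nothing stops you from combining them. Here is a sketch that maps color and size and facets, all in one plot. The data frame is retyped from the four-row practice table so the block runs on its own, and p is a hypothetical name for the plot object.

```r
library(ggplot2)

# The four practice rows, retyped so this block runs on its own
nhanes <- data.frame(
  age_yrs     = c(8, 9, 10, 11),
  systolic_bp = c(80, 90, 100, 110),
  bmi_z       = c(1, 2, 3, 4),
  gender      = c("male", "male", "female", "female")
)

# Color and size go inside aes(); the facet is added as its own layer with +
p <- ggplot(nhanes, aes(age_yrs, systolic_bp, color = gender, size = bmi_z)) +
  geom_point() +
  facet_wrap(~gender)
p
```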


Work your way through the lab worksheet now

View it here

Download the file on Sakai to work on locally
