Chapter 4 Transform

4.1 Introduction

Transforming cleaned data to create summaries and aggregations is a common part of the data analysis process, along with extracting data to create new features. With the SQL-like commands in dplyr you can answer numerous questions in the exploratory phase of an analysis, before building a model.

4.1.1 dplyr

dplyr: A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
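
As a rough illustration of what "SQL-like" means here, the sketch below chains a few core dplyr verbs on a small data frame. The `claims` data frame and its columns (`id`, `seatbelt`, `loss`) are hypothetical and invented for this example; only the dplyr functions themselves come from the package.

```r
library(dplyr)

# Hypothetical toy claims data, invented for illustration only
claims <- data.frame(
  id       = 1:6,
  seatbelt = c("yes", "yes", "no", "no", "yes", NA),
  loss     = c(2.5, 1.0, 4.0, 6.0, 3.5, 2.0)
)

# Rough SQL analogies:
#   filter()  ~ WHERE
#   mutate()  ~ a computed column (a new feature)
#   select()  ~ SELECT
#   arrange() ~ ORDER BY
large_claims <- claims %>%
  filter(loss > 2) %>%
  mutate(log_loss = log(loss)) %>%
  select(id, seatbelt, loss, log_loss) %>%
  arrange(desc(loss))

large_claims
```

Each verb takes a data frame and returns a data frame, which is what lets the steps be chained with `%>%`.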

4.1.1.1 Examples

library(dplyr)
library(insuranceData)
library(magrittr)
data("AutoBi")

# median loss by seatbelt/child restraint use (SEATBELT) and marital status (MARITAL)

AutoBi %>% group_by(SEATBELT, MARITAL) %>% 
           summarise(mLOSS = median(LOSS))
## Source: local data frame [10 x 3]
## Groups: SEATBELT [?]
## 
##    SEATBELT MARITAL mLOSS
##       <int>   <int> <dbl>
## 1         1       1 2.641
## 2         1       2 1.780
## 3         1       3 1.703
## 4         1       4 3.845
## 5         1      NA 3.120
## 6         2       1 3.919
## 7         2       2 2.328
## 8        NA       1 1.985
## 9        NA       2 1.433
## 10       NA      NA 2.364
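
The output above includes groups where SEATBELT or MARITAL is NA. One possible way to handle this, sketched below on a hypothetical toy data frame (`toy`, with invented values, not the AutoBi data), is to drop the missing group keys before grouping and to report the group size next to the median so that small groups are easy to spot. The `.groups = "drop"` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
toy <- data.frame(
  seatbelt = c(1, 1, 1, 2, 2, NA),
  loss     = c(1.0, 3.0, 5.0, 2.0, 4.0, 9.0)
)

by_belt <- toy %>%
  filter(!is.na(seatbelt)) %>%   # drop rows whose grouping key is missing
  group_by(seatbelt) %>%
  summarise(
    n       = n(),               # number of rows in each group
    m_loss  = median(loss),      # median loss within each group
    .groups = "drop"             # return an ungrouped result
  )

by_belt
```

Whether dropping the NA rows is appropriate depends on the analysis; keeping them as their own group, as in the AutoBi example, is equally valid when the missingness itself is of interest.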