Chapter 4 Transform
4.1 Introduction
Transforming cleaned data to create summaries and aggregations, and extracting data to create new features, is a common part of the data analysis process. With the SQL-like commands in dplyr, you can answer many questions in the exploratory phase of an analysis before building a model.
4.1.1 dplyr
dplyr: A fast, consistent tool for working with data-frame-like objects, both in memory and out of memory.
4.1.1.1 Examples
library(dplyr)
library(insuranceData)
library(magrittr)  # provides %>% (dplyr also re-exports it)
data("AutoBi")
# median loss by seatbelt/child-restraint use (SEATBELT) and marital status (MARITAL)
AutoBi %>%
  group_by(SEATBELT, MARITAL) %>%
  summarise(mLOSS = median(LOSS))
## Source: local data frame [10 x 3]
## Groups: SEATBELT [?]
##
##    SEATBELT MARITAL mLOSS
##       <int>   <int> <dbl>
## 1         1       1 2.641
## 2         1       2 1.780
## 3         1       3 1.703
## 4         1       4 3.845
## 5         1      NA 3.120
## 6         2       1 3.919
## 7         2       2 2.328
## 8        NA       1 1.985
## 9        NA       2 1.433
## 10       NA      NA 2.364
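The output above keeps the NA groups as rows of their own. As a minimal sketch of how the other SQL-like verbs could be combined with the same pattern, the example below filters out missing values and derives a new feature before summarising. The `claims` data frame, its values, the `LARGE` flag, and the threshold of 3 are all hypothetical and chosen only for illustration; they are not taken from AutoBi.

```r
library(dplyr)

# Hypothetical toy data: column names mimic AutoBi, values are invented.
claims <- data.frame(
  SEATBELT = c(1L, 1L, 2L, 2L, NA, 1L),
  MARITAL  = c(1L, 2L, 1L, NA, 2L, 1L),
  LOSS     = c(2.0, 4.0, 10.0, 6.0, 1.0, 3.0)
)

summary_tbl <- claims %>%
  # keep only complete rows (roughly a SQL WHERE clause)
  filter(!is.na(SEATBELT), !is.na(MARITAL)) %>%
  # derive a new feature: flag losses above an arbitrary threshold
  mutate(LARGE = LOSS > 3) %>%
  # aggregate per seatbelt group (roughly a SQL GROUP BY)
  group_by(SEATBELT) %>%
  summarise(
    n           = n(),          # number of claims in the group
    mLOSS       = median(LOSS), # median loss in the group
    share_large = mean(LARGE)   # fraction of claims flagged as large
  )

print(summary_tbl)
```

With this toy data, the two rows containing an NA are dropped, leaving three claims with SEATBELT equal to 1 (median loss 3) and one claim with SEATBELT equal to 2 (median loss 10). Whether dropping missing values is appropriate depends on the analysis; keeping them as separate groups, as in the AutoBi example, is an equally valid choice.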