
You might notice that this data frame prints a little differently from other data frames you might have used in the past: it only shows the first few rows and all the columns that fit on one screen. (To see the whole dataset, you can run View(flights), which will open the dataset in the RStudio viewer.) It prints differently because it’s a tibble. Tibbles are data frames, but slightly tweaked to work better in the tidyverse. For now, you don’t need to worry about the differences; we’ll come back to tibbles in more detail in wrangle.

You might also have noticed the row of three (or four) letter abbreviations under the column names. These describe the type of each variable:

chr stands for character vectors, or strings.
dttm stands for date-times (a date + a time).

There are three other common types of variables that aren’t used in this dataset but you’ll encounter later in the book:

lgl stands for logical, vectors that contain only TRUE or FALSE.
fctr stands for factors, which R uses to represent categorical variables.
date stands for dates.

In this chapter you are going to learn the five key dplyr functions that allow you to solve the vast majority of your data manipulation challenges:

Pick observations by their values (filter()).
Reorder the rows (arrange()).
Pick variables by their names (select()).
Create new variables with functions of existing variables (mutate()).
Collapse many values down to a single summary (summarise()).

These can all be used in conjunction with group_by(), which changes the scope of each function from operating on the entire dataset to operating on it group-by-group. These six functions provide the verbs for a language of data manipulation.

All verbs work similarly: the first argument is a data frame; the subsequent arguments describe what to do with the data frame, using the variable names (without quotes); and the result is a new data frame. Together these properties make it easy to chain together multiple simple steps to achieve a complex result. Let’s dive in and see how these verbs work.

There are a number of helper functions you can use within select():

starts_with("abc"): matches names that begin with “abc”.
ends_with("xyz"): matches names that end with “xyz”.
contains("ijk"): matches names that contain “ijk”.
matches("(.)\\1"): selects variables that match a regular expression. This one matches any variables that contain repeated characters. Learn more about regular expressions in strings.

select() can be used to rename variables, but it’s rarely useful because it drops all of the variables not explicitly mentioned. Instead, use rename(), which is a variant of select() that keeps all the variables that aren’t explicitly mentioned.

There are many functions for creating new variables that you can use with mutate(). The key property is that the function must be vectorised: it must take a vector of values as input and return a vector with the same number of values as output. There’s no way to list every possible function that you might use, but here’s a selection of functions that are frequently useful:

Arithmetic operators: +, -, *, /, ^.
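As a minimal sketch of the ideas in this section, the R code below builds a small made-up tibble and applies the select() helpers, rename(), and mutate() with vectorised arithmetic. The object name trips and all of its values are assumptions chosen for illustration; they are not taken from the flights dataset, and the column names are only borrowed to keep the example readable.

```r
library(dplyr)

# Hypothetical toy data: names and values are made up for illustration.
trips <- tibble(
  dep_time  = c(517, 533, 542),
  dep_delay = c(2, 4, 2),
  arr_delay = c(11, 20, 33),
  distance  = c(1400, 1416, 1089),
  air_time  = c(227, 227, 160)
)

# Printing a tibble shows a type abbreviation (here <dbl>) under each column name.
print(trips)

# select() helpers pick columns by patterns in their names.
dep_cols  <- select(trips, starts_with("dep"))  # dep_time, dep_delay
delays    <- select(trips, ends_with("delay"))  # dep_delay, arr_delay
time_cols <- select(trips, contains("time"))    # dep_time, air_time
repeated  <- select(trips, matches("(.)\\1"))   # doubled character: arr_delay

# rename() changes one name and keeps every other column in place.
renamed <- rename(trips, departure_delay = dep_delay)

# mutate() adds columns computed with vectorised arithmetic:
# each operator works element-wise and returns one value per row.
speeds <- mutate(
  trips,
  gain  = dep_delay - arr_delay,
  speed = distance / air_time * 60
)
```

Because every verb takes a data frame as its first argument and returns a new data frame, trips itself is never modified; each result has to be assigned to a name if you want to keep it.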
