group_by {dplyr} | R Documentation |
Most data operations are usefully done on groups defined by variables in
the dataset. The group_by function takes an existing tbl and converts it
into a grouped tbl where operations are performed "by group".
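To make "by group" concrete, here is a minimal sketch using the built-in mtcars data. The column name mean_disp is an illustrative choice, not something defined by this page. It compares a grouped summarise with the equivalent base R tapply call.

```r
library(dplyr)

# Group mtcars by number of cylinders, then summarise within each group.
by_cyl <- group_by(mtcars, cyl)
grouped_means <- summarise(by_cyl, mean_disp = mean(disp))

# The same per-group means computed with base R, for comparison.
base_means <- tapply(mtcars$disp, mtcars$cyl, mean)

# The result has one row per distinct value of cyl, and the
# per-group means should agree with those from tapply().
all.equal(grouped_means$mean_disp, as.vector(base_means))
```

The grouped tbl itself holds the same rows as mtcars; the grouping only changes how later verbs such as summarise are applied.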
group_by(.data, ..., add = FALSE)
group_by_(.data, ..., .dots, add = FALSE)
.data | a tbl |
... | variables to group by. All tbls accept variable names; some will also accept functions of variables. Duplicated groups will be silently dropped. |
add | By default, when add = FALSE, group_by will override existing groups. To add to the existing groups instead, use add = TRUE. |
.dots | Used to work around non-standard evaluation. See |
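The .dots argument is mainly useful when the grouping variables are held as strings, for example when they are chosen programmatically. The following is a minimal sketch, assuming a dplyr version that provides group_by_ with the signature shown in the usage above (later releases deprecate it). The name group_vars_chr is hypothetical and used only for illustration.

```r
library(dplyr)

# Grouping variables supplied as a character vector (hypothetical example).
group_vars_chr <- c("vs", "am")

# group_by_ is the standard-evaluation counterpart of group_by:
# here it receives the variable names as strings through .dots.
by_vs_am <- group_by_(mtcars, .dots = group_vars_chr)

# Inspect the grouping variables that were set.
groups(by_vs_am)
```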
group_by is an S3 generic with methods for the built-in tbls. See the help
for the corresponding classes and their manip methods for more details:
data.frame: grouped_df
data.table: grouped_dt
SQLite: src_sqlite
PostgreSQL: src_postgres
MySQL: src_mysql
ungroup for the inverse operation, groups for accessors that don't do special evaluation.
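As a small illustration of how ungroup and groups relate, the sketch below groups the built-in mtcars data, inspects the grouping, and then removes it. The variable name no_groups is illustrative only.

```r
library(dplyr)

by_cyl <- group_by(mtcars, cyl)

# groups() returns the current grouping variables.
groups(by_cyl)

# ungroup() removes the grouping, so no grouping variables remain.
no_groups <- ungroup(by_cyl)
length(groups(no_groups)) == 0
```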
by_cyl <- group_by(mtcars, cyl)
summarise(by_cyl, mean(disp), mean(hp))
filter(by_cyl, disp == max(disp))

# summarise peels off a single layer of grouping
by_vs_am <- group_by(mtcars, vs, am)
by_vs <- summarise(by_vs_am, n = n())
by_vs
summarise(by_vs, n = sum(n))
# use ungroup() to remove if not wanted
summarise(ungroup(by_vs), n = sum(n))

# You can group by expressions: this is just short-hand for
# a mutate/rename followed by a simple group_by
group_by(mtcars, vsam = vs + am)
group_by(mtcars, vs2 = vs)

# You can also group by a constant, but it's not very useful
group_by(mtcars, "vs")

# By default, group_by sets groups. Use add = TRUE to add groups
groups(group_by(by_cyl, vs, am))
groups(group_by(by_cyl, vs, am, add = TRUE))

# Duplicate groups are silently dropped
groups(group_by(by_cyl, cyl, cyl))