Generate a statistics table with the chosen statistical functions. The result is nested if called with a grouped dataframe.
Usage
desc_table(data, ..., .auto, .labels)
# S3 method for default
desc_table(data, ..., .auto, .labels)
# S3 method for data.frame
desc_table(data, ..., .labels = NULL, .auto = stats_auto)
# S3 method for grouped_df
desc_table(data, ..., .auto = stats_auto, .labels = NULL)
Arguments
- data
The dataframe to analyze
- ...
A list of named statistics to apply to each element of the dataframe, or a function returning a list of named statistics
- .auto
A function to automatically determine appropriate statistics
- .labels
A named character vector of variable labels
Stats
The statistical functions to use in the table are passed as additional arguments.
If the argument is named (e.g. N = length), the name will be used as the column title instead of the function name (here, N instead of length).
Any R function can be a statistical function, as long as it returns only one value when applied to a vector, or as many values as there are levels in a factor, plus one.
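The return-length rule above can be illustrated in base R. This is only a sketch: n_missing and pct are hypothetical helpers written for this illustration, not functions from the package. The leading NA in pct mirrors how the % column in the example output leaves the variable's own row empty.

```r
# A statistic that returns exactly one value for any vector.
# (`n_missing` is a hypothetical helper for illustration.)
n_missing <- function(x) sum(is.na(x))

# A statistic that returns one value for non-factors, and
# (number of levels + 1) values for factors: a leading NA for the
# variable's own row, then one percentage per level.
# (`pct` is a hypothetical helper for illustration.)
pct <- function(x) {
  if (is.factor(x)) {
    c(NA, as.vector(prop.table(table(x))) * 100)
  } else {
    NA
  }
}

length(n_missing(iris$Sepal.Length))  # one value
length(pct(iris$Sepal.Length))        # one value
length(pct(iris$Species))             # nlevels(iris$Species) + 1 values
```

Both helpers satisfy the stated contract, so under that contract either could be passed as a named argument such as Missing = n_missing.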
Users can also use purrr::map-like formulas as quick anonymous functions (e.g. Q1 = ~ quantile(., .25) to get the first quartile in a column named Q1).
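To make the formula shorthand concrete, the sketch below compares a formula with the equivalent named function. It assumes the formula follows purrr's usual convention, where . stands for the vector being summarised. How desc_table converts formulas internally is not stated here, so purrr::as_mapper is used only as an illustration of that convention.

```r
x <- iris$Sepal.Length

# The explicit function that the formula stands for.
q1_fun <- function(x) quantile(x, .25)
q1_fun(x)

# The same statistic written as a purrr-style formula, converted with
# purrr::as_mapper (run only if purrr is installed).
if (requireNamespace("purrr", quietly = TRUE)) {
  q1_formula <- purrr::as_mapper(~ quantile(., .25))
  print(identical(q1_formula(x), q1_fun(x)))
}
```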
If no statistical function is given to desc_table, the .auto argument is used to provide a function that automatically determines the most appropriate statistical functions to use based on the contents of the table.
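A custom .auto function might look like the sketch below. It rests on an assumption: that the function receives the dataframe and returns a named list of statistical functions, as the description of ... suggests. simple_auto is a hypothetical name, and its selection rule is far cruder than whatever stats_auto does.

```r
# Hypothetical stand-in for stats_auto (illustration only).
# Assumption: a .auto function takes the dataframe and returns a
# named list of statistical functions.
simple_auto <- function(data) {
  stats <- list(N = length)
  # Add numeric summaries only if at least one column is numeric.
  if (any(vapply(data, is.numeric, logical(1)))) {
    stats <- c(stats, list(Mean = mean, sd = sd))
  }
  stats
}

names(simple_auto(mtcars))                     # numeric columns present
names(simple_auto(data.frame(g = letters)))    # no numeric columns
```

Under that assumption it would be supplied as desc_table(data, .auto = simple_auto).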
Labels
.labels is a named character vector to provide "pretty" labels to variables.
If given, the variable names for which there is a label will be replaced by their corresponding label.
Not all variables need to have a label, and labels for non-existing variables are ignored.
Labels must be given in the form c(unquoted_variable_name = "label").
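The matching rule can be sketched in base R as follows. This only illustrates the behaviour described above and is not the package's implementation. var_labels and shown are hypothetical names, and foo is a deliberately non-existing variable.

```r
# Labels in the documented form: unquoted variable name = "label".
var_labels <- c(mpg = "Miles per gallon",
                hp  = "Horsepower",
                foo = "Not a column of mtcars")

vars <- names(mtcars)

# Variables with a label take it; the others keep their name.
# The label for the non-existing variable `foo` is never used.
shown <- ifelse(vars %in% names(var_labels), var_labels[vars], vars)
head(shown, 4)
```

With the package loaded, the same vector would be passed as desc_table(mtcars, N = length, .labels = var_labels).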
Output
The output is either a dataframe in the case of a simple descriptive table, or nested dataframes in the case of a comparative table.
Examples
iris %>%
  desc_table()
#>                   Variables   N        % Min  Q1  Med     Mean  Q3 Max
#> 1              Sepal.Length 150       NA 4.3 5.1 5.80 5.843333 6.4 7.9
#> 2               Sepal.Width 150       NA 2.0 2.8 3.00 3.057333 3.3 4.4
#> 3              Petal.Length 150       NA 1.0 1.6 4.35 3.758000 5.1 6.9
#> 4               Petal.Width 150       NA 0.1 0.3 1.30 1.199333 1.8 2.5
#> 5               **Species** 150       NA  NA  NA   NA       NA  NA  NA
#> 6     **Species**: *setosa*  50 33.33333  NA  NA   NA       NA  NA  NA
#> 7 **Species**: *versicolor*  50 33.33333  NA  NA   NA       NA  NA  NA
#> 8  **Species**: *virginica*  50 33.33333  NA  NA   NA       NA  NA  NA
#>          sd IQR
#> 1 0.8280661 1.3
#> 2 0.4358663 0.5
#> 3 1.7652982 3.5
#> 4 0.7622377 1.5
#> 5        NA  NA
#> 6        NA  NA
#> 7        NA  NA
#> 8        NA  NA
# Same statistics as stats_auto here, without the % column
iris %>%
  desc_table("N" = length,
             "Min" = min,
             "Q1" = ~quantile(., .25),
             "Med" = median,
             "Mean" = mean,
             "Q3" = ~quantile(., .75),
             "Max" = max,
             "sd" = sd,
             "IQR" = IQR)
#>                   Variables   N Min  Q1  Med     Mean  Q3 Max        sd IQR
#> 1              Sepal.Length 150 4.3 5.1 5.80 5.843333 6.4 7.9 0.8280661 1.3
#> 2               Sepal.Width 150 2.0 2.8 3.00 3.057333 3.3 4.4 0.4358663 0.5
#> 3              Petal.Length 150 1.0 1.6 4.35 3.758000 5.1 6.9 1.7652982 3.5
#> 4               Petal.Width 150 0.1 0.3 1.30 1.199333 1.8 2.5 0.7622377 1.5
#> 5               **Species** 150  NA  NA   NA       NA  NA  NA        NA  NA
#> 6     **Species**: *setosa*  50  NA  NA   NA       NA  NA  NA        NA  NA
#> 7 **Species**: *versicolor*  50  NA  NA   NA       NA  NA  NA        NA  NA
#> 8  **Species**: *virginica*  50  NA  NA   NA       NA  NA  NA        NA  NA
# With grouping on a factor
iris %>%
  group_by(Species) %>%
  desc_table(.auto = stats_auto)
#> # A tibble: 3 × 4
#> # Groups: Species [3]
#>   Species    data              .stats       .vars
#>   <fct>      <list>            <list>       <list>
#> 1 setosa     <tibble [50 × 4]> <df [4 × 8]> <df [4 × 1]>
#> 2 versicolor <tibble [50 × 4]> <df [4 × 8]> <df [4 × 1]>
#> 3 virginica  <tibble [50 × 4]> <df [4 × 8]> <df [4 × 1]>