desctable usage vignette (deprecated)
Source:vignettes/desctable_deprecated.Rmd
Desctable is a comprehensive descriptive and comparative tables generator for R.
Every person doing data analysis has to create tables for descriptive summaries of data (a.k.a. Table.1), or comparative tables.
Many packages, such as the aptly named tableone, address this issue. However, they often include hard-coded behaviors, produce outputs that are not easily manipulated with standard R tools, or use an outdated syntax (e.g. an argument order that makes them difficult to use with the pipe, %>%).
Enter desctable, a package built with the following objectives in mind:
- generate descriptive and comparative statistics tables with nesting
- keep the syntax as simple as possible
- have reasonable defaults
- be entirely customizable, using standard R tools and functions
- produce the simplest (as a data structure) output possible
- provide helpers for different outputs
- integrate with “modern” R usage, and the tidyverse set of tools
- apply functional paradigms
Descriptive tables
Simple usage
desctable uses and exports the pipe operator (%>%, of magrittr and dplyr fame), though it is not mandatory to use it.
The single interface to the package is its eponymous desctable function.
When used on a data.frame, it returns a descriptive table:
iris %>%
  desctable()
## N % Min Q1 Med Mean Q3 Max sd IQR
## 1 Sepal.Length 150 NA 4.3 5.1 5.80 5.843333 6.4 7.9 0.8280661 1.3
## 2 Sepal.Width 150 NA 2.0 2.8 3.00 3.057333 3.3 4.4 0.4358663 0.5
## 3 Petal.Length 150 NA 1.0 1.6 4.35 3.758000 5.1 6.9 1.7652982 3.5
## 4 Petal.Width 150 NA 0.1 0.3 1.30 1.199333 1.8 2.5 0.7622377 1.5
## 5 Species 150 NA NA NA NA NA NA NA NA NA
## 6 Species: setosa 50 33.33333 NA NA NA NA NA NA NA NA
## 7 Species: versicolor 50 33.33333 NA NA NA NA NA NA NA NA
## 8 Species: virginica 50 33.33333 NA NA NA NA NA NA NA NA
desctable(mtcars)
## Min Q1 Med Mean Q3 Max sd
## 1 mpg 10.400 15.42500 19.200 20.090625 22.80 33.900 6.0269481
## 2 cyl 4.000 4.00000 6.000 6.187500 8.00 8.000 1.7859216
## 3 disp 71.100 120.82500 196.300 230.721875 326.00 472.000 123.9386938
## 4 hp 52.000 96.50000 123.000 146.687500 180.00 335.000 68.5628685
## 5 drat 2.760 3.08000 3.695 3.596563 3.92 4.930 0.5346787
## 6 wt 1.513 2.58125 3.325 3.217250 3.61 5.424 0.9784574
## 7 qsec 14.500 16.89250 17.710 17.848750 18.90 22.900 1.7869432
## 8 vs 0.000 0.00000 0.000 0.437500 1.00 1.000 0.5040161
## 9 am 0.000 0.00000 0.000 0.406250 1.00 1.000 0.4989909
## 10 gear 3.000 3.00000 4.000 3.687500 4.00 5.000 0.7378041
## 11 carb 1.000 2.00000 2.000 2.812500 4.00 8.000 1.6152000
## IQR
## 1 7.37500
## 2 4.00000
## 3 205.17500
## 4 83.50000
## 5 0.84000
## 6 1.02875
## 7 2.00750
## 8 1.00000
## 9 1.00000
## 10 1.00000
## 11 2.00000
As you can see with these two examples, desctable
describes every variable, with individual levels for factors. It picks
statistical functions depending on the type and distribution of the
variables in the data, and applies those statistical functions only on
the relevant variables.
Output
The object produced by desctable is in fact a list of data.frames, with a “desctable” class.
Methods are provided for reduction to a simple dataframe (as.data.frame, automatically used for printing), conversion to markdown (pander), and interactive html output with DT (datatable):
iris %>%
  desctable() %>%
  pander()
| | N | % | Min | Q1 | Med | Mean | Q3 | Max | sd | IQR |
|---|---|---|---|---|---|---|---|---|---|---|
| Sepal.Length | 150 | | 4.3 | 5.1 | 5.8 | 5.8 | 6.4 | 7.9 | 0.83 | 1.3 |
| Sepal.Width | 150 | | 2 | 2.8 | 3 | 3.1 | 3.3 | 4.4 | 0.44 | 0.5 |
| Petal.Length | 150 | | 1 | 1.6 | 4.3 | 3.8 | 5.1 | 6.9 | 1.8 | 3.5 |
| Petal.Width | 150 | | 0.1 | 0.3 | 1.3 | 1.2 | 1.8 | 2.5 | 0.76 | 1.5 |
| Species | 150 | | | | | | | | | |
| setosa | 50 | 33 | | | | | | | | |
| versicolor | 50 | 33 | | | | | | | | |
| virginica | 50 | 33 | | | | | | | | |
To use pander you need to load the package yourself.
Calls to pander and datatable with “regular” dataframes will not be affected by the defaults used in the package, and you can modify these defaults for desctable objects.
The datatable wrapper function for desctable objects comes with some default options and formatting, such as freezing the row names and table header, export buttons, and rounding of values. Both the pander and datatable wrappers take a digits argument to set the number of decimals to show.
(pander uses the digits, justify and missing arguments of pandoc.table, whereas datatable calls prettyNum with the digits parameter, and removes NA values. You can set digits = NULL if you want the full table and format it yourself.)
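As a minimal sketch of the digits argument described above, assuming the pander and DT packages are installed and that both wrappers accept digits as stated in this section:

```r
library(desctable)
library(pander)

tbl <- desctable(iris)

# Markdown output, overriding the default number of digits
pander(tbl, digits = 3)

# Interactive output with digits = NULL, keeping the full values
# so that they can be formatted afterwards
widget <- datatable(tbl, digits = NULL)
```

The widget is stored rather than printed so that the snippet can also be run outside an interactive session.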
Subsequent outputs in this vignette will use DT.
Advanced usage
desctable automatically chooses statistical functions if none are provided, using the following algorithm:
- always show N
- if there are factors, show %
- if there are normally distributed variables, show Mean and SD
- if there are non-normally distributed variables, show Median and IQR
For each variable in the table, compute the relevant statistical functions in that list (non-applicable functions will safely return NA).
You can specify the statistical functions yourself with the stats argument. This argument can either be:
- a function for automatic selection of appropriate statistical functions, depending on the data
- a named list of functions/formulas
The functions/formulas leverage the tidyverse way of working with anonymous functions, i.e.:
If a function, it is used as is. If a formula, e.g. ‘~ .x + 1’ or ~ . + 1, it is converted to a function.
There are three ways to refer to the arguments:
- For a single argument function, use ‘.’
- For a two argument function, use ‘.x’ and ‘.y’
- For more arguments, use ‘..1’, ‘..2’, ‘..3’ etc
This syntax allows you to create very compact anonymous functions, and is the same as in the map family of functions from purrr.
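The conversion rules quoted above are those documented for purrr's as_mapper. As a small illustration, assuming purrr is installed (desctable's own conversion may differ in detail):

```r
library(purrr)

# One-argument formulas: `.` (or `.x`) stands for the variable being summarised
third_quartile <- as_mapper(~ quantile(., 0.75))
add_one <- as_mapper(~ .x + 1)

third_quartile(iris$Sepal.Length)
add_one(2)
```

The same formulas can be used directly as elements of the stats list, e.g. Q3 = ~ quantile(., 0.75).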
Conditional formulas (condition ~ if_T | if_F) from previous versions are no longer supported!
Automatic function
The default value for the stats argument is stats_auto, provided in the package.
Several other “automatic statistical functions” are defined in this package: stats_auto, stats_default, stats_normal, stats_nonnormal.
You can also provide your own automatic function, which needs to
- accept a dataframe as its argument (whether to use this dataframe or not in the function is your choice), and
- return a named list of statistical functions to use, as defined in the subsequent paragraphs.
# Strictly equivalent to iris %>% desctable() %>% datatable()
iris %>%
desctable(stats = stats_auto) %>%
datatable()
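A custom automatic function could look like the sketch below. The name stats_compact is hypothetical; the only requirements taken from the text are that it accepts a dataframe and returns a named list of statistical functions.

```r
library(desctable)

# Hypothetical automatic function: always reports N and the median,
# and adds percentages only when the data contains a factor
stats_compact <- function(data) {
  has_factor <- any(vapply(data, is.factor, logical(1)))
  funs <- list(N = length, Med = stats::median)
  if (has_factor) funs <- c(funs, list(`%` = percent))
  funs
}

desctable(iris, stats = stats_compact)
desctable(mtcars, stats = stats_compact)
```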
For reference, here is the body of the stats_auto function in the package:
## function (data)
## {
## numeric <- data %>% lapply(is.numeric) %>% unlist() %>% any
## fact <- data %>% lapply(is.factor) %>% unlist() %>% any()
## stats <- list(Min = min, Q1 = ~quantile(., 0.25), Med = stats::median,
## Mean = mean, Q3 = ~quantile(., 0.75), Max = max, sd = stats::sd,
## IQR = IQR)
## if (fact & numeric)
## c(list(N = length, `%` = percent), stats)
## else if (fact & !numeric)
## list(N = length, `%` = percent)
## else if (!fact & numeric)
## stats
## }
## <bytecode: 0x5647791c1690>
## <environment: namespace:desctable>
Statistical functions
Statistical functions can be any function defined in R that you want to use, such as length or mean.
The only condition is that they return a single numerical value. One exception is when they return a vector of length 1 + nlevels(x) when applied to factors, as is needed for the percent function.
As mentioned above, they need to be passed inside a named list.
The names will be used as column headers in the resulting table, and the functions will be applied safely on the variables (errors return NA, and for factors the function will be used on individual levels).
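As an illustration, a named list mixing plain functions and formulas might look like this. The particular choice of statistics is only an example, not a recommendation from the package:

```r
library(desctable)

# Names become column headers; formulas are converted to functions
my_stats <- list(N    = length,
                 `%`  = percent,
                 Mean = mean,
                 SD   = sd,
                 Q1   = ~ quantile(., 0.25),
                 Q3   = ~ quantile(., 0.75))

tbl <- desctable(iris, stats = my_stats)
tbl
```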
Several convenience functions are included in this package.
- percent, which prints percentages of levels in a factor
- IQR, which re-implements stats::IQR but works better with NA values
- is.normal, which tests for normality using the following method: length(na.omit(x)) > 30 & shapiro.test(x)$p.value > .1
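The normality rule quoted above can be sketched in base R as follows. The name is_normal_sketch is hypothetical, and this is a re-statement of the rule rather than the package's own is.normal:

```r
# More than 30 non-missing values and a Shapiro-Wilk p value above 0.1
is_normal_sketch <- function(x) {
  x <- x[!is.na(x)]
  # && short-circuits, so shapiro.test is never called on short vectors
  length(x) > 30 && shapiro.test(x)$p.value > 0.1
}

is_normal_sketch(c(1, 2, 3))            # too few observations
is_normal_sketch(qnorm(ppoints(100)))   # idealised normal quantiles
is_normal_sketch(qexp(ppoints(100)))    # strongly skewed values
```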
Be aware that all functions will be used on variables stripped of their NA values! This is necessary for most statistical functions to be useful, and means that N (length) shows only the number of non-missing observations for each variable.
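To make the consequence for N concrete, here is a base R illustration on the built-in airquality data, assuming NA values are dropped before each function is called, as stated above:

```r
x <- airquality$Ozone

length(x)              # every row, missing values included
length(x[!is.na(x)])   # what a function such as N = length would receive
```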
Labels
It is often the case that variable names are not “pretty” enough to
be used as-is in a table.
Although you could still edit the variable labels in the table
afterwards using sub-setting or string replacement functions, we provide
a facility for this using the labels argument.
The labels argument is a named character vector
associating variable names and labels.
You don’t need to provide labels for all the variables, and extra labels
will be silently discarded. This allows you to define a “global” labels
vector and use it for multiple tables even after variable
selections.
mtlabels <- c(mpg = "Miles/(US) gallon",
cyl = "Number of cylinders",
disp = "Displacement (cu.in.)",
hp = "Gross horsepower",
drat = "Rear axle ratio",
wt = "Weight (1000 lbs)",
qsec = "¼ mile time",
vs = "V/S",
am = "Transmission",
gear = "Number of forward gears",
carb = "Number of carburetors")
mtcars %>%
dplyr::mutate(am = factor(am, labels = c("Automatic", "Manual"))) %>%
desctable(labels = mtlabels) %>%
datatable()
Comparative tables
Simple usage
Creating a comparative table (between groups defined by a factor) using desctable is as easy as creating a descriptive table.
It leverages the group_by function from dplyr:
iris_by_Species <- iris %>%
  group_by(Species) %>%
  desctable()

iris_by_Species
## Species: setosa (n=50) / Min Species: setosa (n=50) / Q1
## 1 Sepal.Length 4.3 4.8
## 2 Sepal.Width 2.3 3.2
## 3 Petal.Length 1.0 1.4
## 4 Petal.Width 0.1 0.2
## Species: setosa (n=50) / Med Species: setosa (n=50) / Mean
## 1 5.0 5.006
## 2 3.4 3.428
## 3 1.5 1.462
## 4 0.2 0.246
## Species: setosa (n=50) / Q3 Species: setosa (n=50) / Max
## 1 5.200 5.8
## 2 3.675 4.4
## 3 1.575 1.9
## 4 0.300 0.6
## Species: setosa (n=50) / sd Species: setosa (n=50) / IQR
## 1 0.3524897 0.400
## 2 0.3790644 0.475
## 3 0.1736640 0.175
## 4 0.1053856 0.100
## Species: versicolor (n=50) / Min Species: versicolor (n=50) / Q1
## 1 4.9 5.600
## 2 2.0 2.525
## 3 3.0 4.000
## 4 1.0 1.200
## Species: versicolor (n=50) / Med Species: versicolor (n=50) / Mean
## 1 5.90 5.936
## 2 2.80 2.770
## 3 4.35 4.260
## 4 1.30 1.326
## Species: versicolor (n=50) / Q3 Species: versicolor (n=50) / Max
## 1 6.3 7.0
## 2 3.0 3.4
## 3 4.6 5.1
## 4 1.5 1.8
## Species: versicolor (n=50) / sd Species: versicolor (n=50) / IQR
## 1 0.5161711 0.700
## 2 0.3137983 0.475
## 3 0.4699110 0.600
## 4 0.1977527 0.300
## Species: virginica (n=50) / Min Species: virginica (n=50) / Q1
## 1 4.9 6.225
## 2 2.2 2.800
## 3 4.5 5.100
## 4 1.4 1.800
## Species: virginica (n=50) / Med Species: virginica (n=50) / Mean
## 1 6.50 6.588
## 2 3.00 2.974
## 3 5.55 5.552
## 4 2.00 2.026
## Species: virginica (n=50) / Q3 Species: virginica (n=50) / Max
## 1 6.900 7.9
## 2 3.175 3.8
## 3 5.875 6.9
## 4 2.300 2.5
## Species: virginica (n=50) / sd Species: virginica (n=50) / IQR tests / p
## 1 0.6358796 0.675 8.918734e-22
## 2 0.3224966 0.375 1.569282e-14
## 3 0.5518947 0.775 4.803974e-29
## 4 0.2746501 0.500 3.261796e-29
## tests / test
## 1 kruskal.test
## 2 kruskal.test
## 3 kruskal.test
## 4 kruskal.test
The result is a table containing a descriptive sub-table for each level of the grouping factor (the statistical functions rules are applied to each sub-table independently), with the statistical tests performed, and their p values.
When displayed as a flat dataframe, the grouping header appears in each variable name.
You can also see the grouping headers by inspecting the resulting object, which is a nested list of dataframes, each dataframe being named after the grouping factor and its levels (with sample size for each).
str(iris_by_Species)
## List of 5
## $ Variables :'data.frame': 4 obs. of 1 variable:
## ..$ Variables: chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
## $ Species: setosa (n=50) :'data.frame': 4 obs. of 8 variables:
## ..$ Min : num [1:4] 4.3 2.3 1 0.1
## ..$ Q1 : num [1:4] 4.8 3.2 1.4 0.2
## ..$ Med : num [1:4] 5 3.4 1.5 0.2
## ..$ Mean: num [1:4] 5.006 3.428 1.462 0.246
## ..$ Q3 : num [1:4] 5.2 3.68 1.58 0.3
## ..$ Max : num [1:4] 5.8 4.4 1.9 0.6
## ..$ sd : num [1:4] 0.352 0.379 0.174 0.105
## ..$ IQR : num [1:4] 0.4 0.475 0.175 0.1
## $ Species: versicolor (n=50):'data.frame': 4 obs. of 8 variables:
## ..$ Min : num [1:4] 4.9 2 3 1
## ..$ Q1 : num [1:4] 5.6 2.52 4 1.2
## ..$ Med : num [1:4] 5.9 2.8 4.35 1.3
## ..$ Mean: num [1:4] 5.94 2.77 4.26 1.33
## ..$ Q3 : num [1:4] 6.3 3 4.6 1.5
## ..$ Max : num [1:4] 7 3.4 5.1 1.8
## ..$ sd : num [1:4] 0.516 0.314 0.47 0.198
## ..$ IQR : num [1:4] 0.7 0.475 0.6 0.3
## $ Species: virginica (n=50) :'data.frame': 4 obs. of 8 variables:
## ..$ Min : num [1:4] 4.9 2.2 4.5 1.4
## ..$ Q1 : num [1:4] 6.23 2.8 5.1 1.8
## ..$ Med : num [1:4] 6.5 3 5.55 2
## ..$ Mean: num [1:4] 6.59 2.97 5.55 2.03
## ..$ Q3 : num [1:4] 6.9 3.18 5.88 2.3
## ..$ Max : num [1:4] 7.9 3.8 6.9 2.5
## ..$ sd : num [1:4] 0.636 0.322 0.552 0.275
## ..$ IQR : num [1:4] 0.675 0.375 0.775 0.5
## $ tests :'data.frame': 4 obs. of 2 variables:
## ..$ p : num [1:4] 8.92e-22 1.57e-14 4.80e-29 3.26e-29
## ..$ test: chr [1:4] "kruskal.test" "kruskal.test" "kruskal.test" "kruskal.test"
## - attr(*, "class")= chr "desctable"
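Because the result is a list of dataframes, its parts can be reached with ordinary list tools. A short sketch, assuming the structure shown in the str output above:

```r
library(desctable)
library(dplyr)

iris_by_Species <- iris %>%
  group_by(Species) %>%
  desctable()

# Names of the sub-tables: variables, one entry per group, and the tests
names(iris_by_Species)

# p values and test names, as a regular data.frame
iris_by_Species$tests

# Flatten everything to a single data.frame for further processing
flat <- as.data.frame(iris_by_Species)
```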
You can specify groups based on any variable, not only factors:
mtcars %>%
  group_by(cyl) %>%
  desctable() %>%
  pander()
| | cyl: 4 (n=11) / Min | Q1 | Med | Mean | Q3 | Max | sd | IQR | cyl: 6 (n=7) / Min | Q1 | Med | Mean | Q3 | Max | sd | IQR | cyl: 8 (n=14) / Min | Q1 | Med | Mean | Q3 | Max | sd | IQR | tests / p | test |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| mpg | 21 | 23 | 26 | 27 | 30 | 34 | 4.5 | 7.6 | 18 | 19 | 20 | 20 | 21 | 21 | 1.5 | 2.4 | 10 | 14 | 15 | 15 | 16 | 19 | 2.6 | 1.8 | 2.6e-06 | kruskal.test |
| disp | 71 | 79 | 108 | 105 | 121 | 147 | 27 | 42 | 145 | 160 | 168 | 183 | 196 | 258 | 42 | 36 | 276 | 302 | 350 | 353 | 390 | 472 | 68 | 88 | 1.6e-06 | kruskal.test |
| hp | 52 | 66 | 91 | 83 | 96 | 113 | 21 | 30 | 105 | 110 | 110 | 122 | 123 | 175 | 24 | 13 | 150 | 176 | 192 | 209 | 241 | 335 | 51 | 65 | 3.3e-06 | kruskal.test |
| drat | 3.7 | 3.8 | 4.1 | 4.1 | 4.2 | 4.9 | 0.37 | 0.35 | 2.8 | 3.4 | 3.9 | 3.6 | 3.9 | 3.9 | 0.48 | 0.56 | 2.8 | 3.1 | 3.1 | 3.2 | 3.2 | 4.2 | 0.37 | 0.15 | 0.00075 | kruskal.test |
| wt | 1.5 | 1.9 | 2.2 | 2.3 | 2.6 | 3.2 | 0.57 | 0.74 | 2.6 | 2.8 | 3.2 | 3.1 | 3.4 | 3.5 | 0.36 | 0.62 | 3.2 | 3.5 | 3.8 | 4 | 4 | 5.4 | 0.76 | 0.48 | 1.1e-05 | kruskal.test |
| qsec | 17 | 19 | 19 | 19 | 20 | 23 | 1.7 | 1.4 | 16 | 17 | 18 | 18 | 19 | 20 | 1.7 | 2.4 | 14 | 16 | 17 | 17 | 18 | 18 | 1.2 | 1.5 | 0.0062 | kruskal.test |
| vs | 0 | 1 | 1 | 0.91 | 1 | 1 | 0.3 | 0 | 0 | 0 | 1 | 0.57 | 1 | 1 | 0.53 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3.2e-05 | kruskal.test |
| am | 0 | 0.5 | 1 | 0.73 | 1 | 1 | 0.47 | 0.5 | 0 | 0 | 0 | 0.43 | 1 | 1 | 0.53 | 1 | 0 | 0 | 0 | 0.14 | 0 | 1 | 0.36 | 0 | 0.014 | kruskal.test |
| gear | 3 | 4 | 4 | 4.1 | 4 | 5 | 0.54 | 0 | 3 | 3.5 | 4 | 3.9 | 4 | 5 | 0.69 | 0.5 | 3 | 3 | 3 | 3.3 | 3 | 5 | 0.73 | 0 | 0.0062 | kruskal.test |
| carb | 1 | 1 | 2 | 1.5 | 2 | 2 | 0.52 | 1 | 1 | 2.5 | 4 | 3.4 | 4 | 6 | 1.8 | 1.5 | 2 | 2.2 | 3.5 | 3.5 | 4 | 8 | 1.6 | 1.8 | 0.0017 | kruskal.test |
You can also specify groups based on an expression.
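For instance, a logical expression can be passed directly to group_by. This sketch reuses the Petal.Length > 5 split that appears in later examples of this vignette:

```r
library(desctable)
library(dplyr)

# Two groups: rows where the expression is FALSE and rows where it is TRUE
by_petal <- iris %>%
  group_by(Petal.Length > 5) %>%
  desctable()

by_petal
```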
Multiple nested groups are also possible:
mtcars %>%
dplyr::mutate(am = factor(am, labels = c("Automatic", "Manual"))) %>%
group_by(vs, am, cyl) %>%
desctable() %>%
datatable()
In the case of nested groups (a.k.a. sub-group analysis), statistical tests are performed only between the groups of the deepest grouping level.
Statistical tests are automatically selected depending on the data and the grouping factor.
Advanced usage
desctable automatically chooses statistical tests if none are provided, using the following algorithm:
- if the variable is a factor, use fisher.test
- if the grouping factor has only one level, use the provided no.test (which does nothing)
- if the grouping factor has two levels
  - and the variable presents homoskedasticity (p value for var.test > .1) and normality of distribution in both groups, use t.test(var.equal = T)
  - and the variable does not present homoskedasticity (p value for var.test < .1) but normality of distribution in both groups, use t.test(var.equal = F)
  - else use wilcox.test
- if the grouping factor has more than two levels
  - and the variable presents homoskedasticity (p value for bartlett.test > .1) and normality of distribution in all groups, use oneway.test(var.equal = T)
  - and the variable does not present homoskedasticity (p value for bartlett.test < .1) but normality of distribution in all groups, use oneway.test(var.equal = F)
  - else use kruskal.test
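The rules listed above can be sketched in base R as follows. The names choose_test_sketch and is_normal_sketch are hypothetical; this is a re-implementation of the description for illustration only, not the package's tests_auto.

```r
# Same normality rule as described for is.normal
is_normal_sketch <- function(x) {
  x <- x[!is.na(x)]
  length(x) > 30 && shapiro.test(x)$p.value > 0.1
}

# Returns a single-term formula naming the test to use
choose_test_sketch <- function(var, grp) {
  grp <- factor(grp)
  if (is.factor(var)) return(~fisher.test)
  if (nlevels(grp) < 2) return(~no.test)

  all_normal <- all(tapply(var, grp, is_normal_sketch))
  if (nlevels(grp) == 2) {
    if (!all_normal) return(~wilcox.test)
    equal_var <- var.test(var ~ grp)$p.value > 0.1
    if (equal_var) ~t.test(., var.equal = TRUE) else ~t.test(., var.equal = FALSE)
  } else {
    if (!all_normal) return(~kruskal.test)
    equal_var <- bartlett.test(var ~ grp)$p.value > 0.1
    if (equal_var) ~oneway.test(., var.equal = TRUE) else ~oneway.test(., var.equal = FALSE)
  }
}

choose_test_sketch(iris$Species, iris$Species)      # factor variable
choose_test_sketch(iris$Petal.Width, iris$Species)  # numeric, three groups
```

Returning formulas rather than functions keeps the name of the chosen test available for display, which matches the requirement stated for the tests argument.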
You can specify the statistical test functions yourself with the tests argument. This argument can either be:
- a function for automatic selection of appropriate statistical test functions, depending on the data
- a named list of statistical test functions
Please note that the statistical test functions must be given as formulas so as to capture the name of the test to display in the table. purrr-style formulas are also accepted, as with the statistical functions. This also lets you specify optional arguments of such functions, and work around non-standard test functions (see Statistical test functions).
Automatic function
The default value for the tests argument is tests_auto, provided in the package.
You can also provide your own automatic function, which needs to
- accept a variable and a grouping factor as its arguments, and
- return a single-term formula containing a statistical test function.
This function will be used on every variable and every grouping factor to determine the appropriate test.
# Strictly equivalent to iris %>% group_by(Species) %>% desctable() %>% datatable()
iris %>%
group_by(Species) %>%
desctable(tests = tests_auto) %>%
datatable()
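A custom automatic test function could be as small as the sketch below. The name tests_simple is hypothetical; the contract (a variable and a grouping factor in, a single-term formula out) is the one described above.

```r
library(desctable)
library(dplyr)

# Hypothetical automatic function: chi-squared test for factors,
# Kruskal-Wallis test for everything else
tests_simple <- function(var, grp) {
  if (is.factor(var)) ~chisq.test else ~kruskal.test
}

iris %>%
  group_by(Species) %>%
  desctable(tests = tests_simple)
```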
For reference, here is the body of the tests_auto function in the package:
## function (var, grp)
## {
## grp <- factor(grp)
## if (nlevels(grp) < 2)
## ~no.test
## else if (is.factor(var)) {
## if (tryCatch(is.numeric(fisher.test(var ~ grp)$p.value),
## error = function(e) F))
## ~fisher.test
## else ~chisq.test
## }
## else if (nlevels(grp) == 2)
## ~wilcox.test
## else ~kruskal.test
## }
## <bytecode: 0x5647758255c0>
## <environment: namespace:desctable>
Statistical test functions
You can provide a named list of statistical functions, but here the mechanism is a bit different from the stats argument.
The list must contain either .auto or .default.
- .auto needs to be an automatic function, such as tests_auto. It will be used by default on all variables to select a test
- .default needs to be a single-term formula containing a statistical test function that will be used on all variables
You can also provide overrides to use specific tests for specific
variables.
This is done using list items named as the variable and containing a
single-term formula function.
iris %>%
group_by(Petal.Length > 5) %>%
desctable(tests = list(.auto = tests_auto,
Species = ~chisq.test)) %>%
datatable()
mtcars %>%
dplyr::mutate(am = factor(am, labels = c("Automatic", "Manual"))) %>%
group_by(am) %>%
desctable(tests = list(.default = ~wilcox.test,
mpg = ~t.test)) %>%
datatable()
Here’s an example of a purrr-style function:
iris %>%
group_by(Petal.Length > 5) %>%
desctable(tests = list(.auto = tests_auto,
Petal.Width = ~oneway.test(., var.equal = T)))
## Petal.Length > 5: FALSE (n=108) / N
## 1 Sepal.Length 108
## 2 Sepal.Width 108
## 3 Petal.Length 108
## 4 Petal.Width 108
## 5 Species 108
## 6 Species: setosa 50
## 7 Species: versicolor 49
## 8 Species: virginica 9
## Petal.Length > 5: FALSE (n=108) / % Petal.Length > 5: FALSE (n=108) / Min
## 1 NA 4.3
## 2 NA 2.0
## 3 NA 1.0
## 4 NA 0.1
## 5 NA NA
## 6 46.296296 NA
## 7 45.370370 NA
## 8 8.333333 NA
## Petal.Length > 5: FALSE (n=108) / Q1 Petal.Length > 5: FALSE (n=108) / Med
## 1 5.0 5.5
## 2 2.8 3.0
## 3 1.5 3.5
## 4 0.2 1.0
## 5 NA NA
## 6 NA NA
## 7 NA NA
## 8 NA NA
## Petal.Length > 5: FALSE (n=108) / Mean Petal.Length > 5: FALSE (n=108) / Q3
## 1 5.5018519 6.0
## 2 3.0666667 3.4
## 3 3.0074074 4.5
## 4 0.8638889 1.4
## 5 NA NA
## 6 NA NA
## 7 NA NA
## 8 NA NA
## Petal.Length > 5: FALSE (n=108) / Max Petal.Length > 5: FALSE (n=108) / sd
## 1 7.0 0.6386290
## 2 4.4 0.4800701
## 3 5.0 1.4885673
## 4 2.0 0.6110292
## 5 NA NA
## 6 NA NA
## 7 NA NA
## 8 NA NA
## Petal.Length > 5: FALSE (n=108) / IQR Petal.Length > 5: TRUE (n=42) / N
## 1 1.0 42
## 2 0.6 42
## 3 3.0 42
## 4 1.2 42
## 5 NA 42
## 6 NA 0
## 7 NA 1
## 8 NA 41
## Petal.Length > 5: TRUE (n=42) / % Petal.Length > 5: TRUE (n=42) / Min
## 1 NA 5.8
## 2 NA 2.5
## 3 NA 5.1
## 4 NA 1.4
## 5 NA NA
## 6 0.000000 NA
## 7 2.380952 NA
## 8 97.619048 NA
## Petal.Length > 5: TRUE (n=42) / Q1 Petal.Length > 5: TRUE (n=42) / Med
## 1 6.325 6.7
## 2 2.800 3.0
## 3 5.300 5.6
## 4 1.825 2.1
## 5 NA NA
## 6 NA NA
## 7 NA NA
## 8 NA NA
## Petal.Length > 5: TRUE (n=42) / Mean Petal.Length > 5: TRUE (n=42) / Q3
## 1 6.721429 7.175
## 2 3.033333 3.200
## 3 5.688095 5.975
## 4 2.061905 2.300
## 5 NA NA
## 6 NA NA
## 7 NA NA
## 8 NA NA
## Petal.Length > 5: TRUE (n=42) / Max Petal.Length > 5: TRUE (n=42) / sd
## 1 7.9 0.5748958
## 2 3.8 0.2968671
## 3 6.9 0.4919857
## 4 2.5 0.2802023
## 5 NA NA
## 6 NA NA
## 7 NA NA
## 8 NA NA
## Petal.Length > 5: TRUE (n=42) / IQR tests / p
## 1 0.850 1.553676e-15
## 2 0.400 6.927432e-01
## 3 0.675 2.076978e-21
## 4 0.475 3.982443e-24
## 5 NA 2.453675e-26
## 6 NA NA
## 7 NA NA
## 8 NA NA
## tests / test
## 1 wilcox.test
## 2 wilcox.test
## 3 wilcox.test
## 4 oneway.test(., var.equal = T)
## 5 fisher.test
## 6 <NA>
## 7 <NA>
## 8 <NA>
As with statistical functions, any statistical test function defined in R can be used.
The conditions are that the function
- accepts a formula (variable ~ grouping_variable) as a first positional argument (as is the case with most tests, like t.test), and
- returns an object with a p.value element.
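Whether a given test meets these two conditions can be checked in base R before handing it to desctable. The helper accepts_formula is hypothetical and only illustrates the contract:

```r
# TRUE when `test` accepts a formula as its first argument and the
# result carries a numeric p.value element; FALSE if the call fails
accepts_formula <- function(test, formula) {
  tryCatch(is.numeric(test(formula)$p.value), error = function(e) FALSE)
}

accepts_formula(t.test, mtcars$mpg ~ mtcars$am)
accepts_formula(kruskal.test, iris$Sepal.Length ~ iris$Species)
```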
Several convenience functions are provided: formula versions of chisq.test and fisher.test using generic S3 methods (thus the behavior of standard calls to chisq.test and fisher.test is not modified), and ANOVA, a partial application of oneway.test with parameter var.equal = T.
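The idea of a partial application can be sketched in base R as below. The name anova_sketch is hypothetical and stands in for the package's ANOVA helper, whose exact definition is not shown here:

```r
# oneway.test with var.equal fixed to TRUE, i.e. a classical one-way ANOVA
anova_sketch <- function(formula) oneway.test(formula, var.equal = TRUE)

res <- anova_sketch(iris$Sepal.Length ~ iris$Species)
res$p.value
```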