This function takes a data.frame as input and compares the proportion of positive cases (for binary variables) or the mean (for continuous variables) between one subgroup and the remaining samples.

group_enrichment(
  df,
  grp_vars = NULL,
  enrich_vars = NULL,
  cross = TRUE,
  co_method = c("t.test", "wilcox.test")
)

Arguments

df

a data.frame.

grp_vars

character vector specifying group variables to split samples into subgroups (at least 2 subgroups, otherwise this variable will be skipped).

enrich_vars

character vector specifying measure variables to be compared. If a variable is not numeric, it must be binary, coded as TRUE/FALSE or P/N (P for positive cases and N for negative cases). Note that NA values are treated as negative cases.
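The accepted binary encodings can be illustrated with a small base R sketch. The helper `to_positive()` is hypothetical and not part of the package; it only shows one way the TRUE/FALSE and P/N forms, with NA counted as negative, could be mapped to a logical vector.

```r
# Hypothetical helper (an assumption for illustration, not the package's code):
# convert an accepted binary encoding to logical, counting NA as negative.
to_positive <- function(x) {
  if (is.logical(x)) {
    pos <- x
  } else {
    # "P" marks a positive case; anything else is negative
    pos <- as.character(x) == "P"
  }
  pos[is.na(pos)] <- FALSE
  pos
}

to_positive(c("P", "N", NA, "P")) # TRUE FALSE FALSE TRUE
to_positive(c(TRUE, NA, FALSE))   # TRUE FALSE FALSE
```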

cross

logical, default is TRUE. If TRUE, combine every variable in grp_vars with every variable in enrich_vars. For example, c('A', 'B') and c('C', 'D') will construct 4 combinations (i.e. "AC", "AD", "BC" and "BD"). A variable cannot be in both grp_vars and enrich_vars; such cases are automatically dropped. If FALSE, use pairwise combinations; see section "Examples" for use cases.
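The two combination modes can be sketched in base R as follows. This is an illustration of the pairing logic only, assuming that pairwise means matching the variables by position (as the example "Only compare g1:e1, g2:e2" suggests); it is not the package's internal code.

```r
grp_vars <- c("A", "B")
enrich_vars <- c("C", "D")

# cross = TRUE: every group variable paired with every enrich variable
crossed <- expand.grid(
  grp_var = grp_vars,
  enrich_var = enrich_vars,
  stringsAsFactors = FALSE
)
# Drop pairs where the same variable appears on both sides
crossed <- crossed[crossed$grp_var != crossed$enrich_var, ]
nrow(crossed) # 4

# cross = FALSE (assumed behaviour): pair the variables by position
paired <- data.frame(
  grp_var = grp_vars,
  enrich_var = enrich_vars,
  stringsAsFactors = FALSE
)
nrow(paired) # 2
```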

co_method

test method for continuous variables, one of 't.test' or 'wilcox.test'. Default is 't.test'.
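A minimal sketch of a one-group-versus-rest comparison for a continuous variable is given below. The function `compare_continuous()` and the simulated data are assumptions made for illustration; the sketch reports group means and the test p value only, and does not attempt to reproduce the scaled mean ratio returned by the package.

```r
set.seed(1)
value <- rnorm(30)                          # simulated continuous measure
in_grp1 <- rep(c(TRUE, FALSE), c(10, 20))   # TRUE = member of grp1, FALSE = Rest

# Hypothetical helper, not the package's implementation
compare_continuous <- function(value, in_grp1,
                               co_method = c("t.test", "wilcox.test")) {
  co_method <- match.arg(co_method)
  test_fun <- switch(co_method,
    t.test = stats::t.test,
    wilcox.test = stats::wilcox.test
  )
  res <- test_fun(value[in_grp1], value[!in_grp1])
  list(
    grp1_mean = mean(value[in_grp1]),
    grp2_mean = mean(value[!in_grp1]),
    p_value = res$p.value,
    method = co_method
  )
}

compare_continuous(value, in_grp1)
compare_continuous(value, in_grp1, co_method = "wilcox.test")
```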

Value

a data.table with the following columns:

  • grp_var: group variable name.

  • enrich_var: enrich variable (variable to be compared) name.

  • grp1: the first group name, should be a member in grp_var column.

  • grp2: the remaining samples, marked as 'Rest'.

  • grp1_size: sample size for grp1.

  • grp1_pos_measure: for binary variable, it stores the proportion of positive cases in grp1; for continuous variable, it stores mean value.

  • grp2_size: sample size for grp2.

  • grp2_pos_measure: same as grp1_pos_measure but for grp2.

  • measure_observed: for binary variable, it stores odds ratio; for continuous variable, it stores scaled mean ratio.

  • measure_tested: only for binary variable, it stores estimated odds ratio and its 95% CI from fisher.test().

  • p_value: for binary variable, it stores the p value from fisher.test(); for continuous variable, it stores the p value from wilcox.test() or t.test().

  • type: one of "binary" and "continuous".

  • method: one of "fish.test", "wilcox.test" and "t.test".
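For a binary variable, the quantities described above can be approximated with base R as in the sketch below. The data are made up, and treating measure_observed as the sample odds ratio of the 2x2 table is an assumption; the estimate returned by fisher.test() is a conditional maximum likelihood estimate and generally differs from the sample odds ratio.

```r
# Made-up data: 10 samples in grp1 (8 positive), 20 in Rest (5 positive)
pos <- c(rep(TRUE, 8), rep(FALSE, 2), rep(TRUE, 5), rep(FALSE, 15))
in_grp1 <- rep(c(TRUE, FALSE), c(10, 20))

# Proportion of positive cases in each group
grp1_prop <- mean(pos[in_grp1])  # 0.8
grp2_prop <- mean(pos[!in_grp1]) # 0.25

# 2x2 contingency table: group by positive/negative status
tab <- table(
  group = factor(in_grp1, levels = c(TRUE, FALSE), labels = c("grp1", "Rest")),
  status = factor(pos, levels = c(TRUE, FALSE), labels = c("P", "N"))
)

# Sample odds ratio (assumed analogue of measure_observed)
or_observed <- (tab["grp1", "P"] * tab["Rest", "N"]) /
  (tab["grp1", "N"] * tab["Rest", "P"]) # 12

# Fisher's exact test: estimated odds ratio, 95% CI and p value
ft <- stats::fisher.test(tab)
ft$estimate
ft$conf.int
ft$p.value
```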

See also

show_group_enrichment()

Examples

set.seed(1234)
df <- dplyr::tibble(
  g1 = factor(abs(round(rnorm(99, 0, 1)))),
  g2 = rep(LETTERS[1:4], c(50, 40, 8, 1)),
  e1 = sample(c("P", "N"), 99, replace = TRUE),
  e2 = rnorm(99)
)

# str() prints the structure itself; wrapping it in print() would also print NULL
str(df)
head(df)

# Compare g1:e1, g1:e2, g2:e1 and g2:e2
x1 <- group_enrichment(df, grp_vars = c("g1", "g2"), enrich_vars = c("e1", "e2"))
x1

# Only compare g1:e1, g2:e2
x2 <- group_enrichment(df,
  grp_vars = c("g1", "g2"),
  enrich_vars = c("e1", "e2"),
  co_method = "wilcox.test",
  cross = FALSE
)
x2

# Visualization
p1 <- show_group_enrichment(x1, fill_by_p_value = TRUE)
p1
p2 <- show_group_enrichment(x1, fill_by_p_value = FALSE)
p2
p3 <- show_group_enrichment(x1, return_list = TRUE)
p3