Chapter 10 Group Analysis and Visualization

Group analysis is a common task in cancer studies. Sigminer supports dividing samples into multiple groups and comparing genotype/phenotype feature measures between them.

10.1 Group Generation

There are multiple methods to generate groups, including ‘consensus’ (the default, which can only be used with a result from sig_extract()) and ‘k-means’. After determining the groups, sigminer assigns each group to the signature with the maximum fraction in that group. We then say that the group is Sig_x enriched.

mt_grps <- get_groups(mt_sig, method = "consensus", match_consensus = TRUE)
#> ℹ [2020-10-09 00:03:46]: Started.
#> ✓ [2020-10-09 00:03:46]: 'Signature' object detected.
#> ℹ [2020-10-09 00:03:46]: Obtaining clusters from the hierarchical clustering of the consensus matrix...
#> ℹ [2020-10-09 00:03:46]: Finding the dominant signature of each group...
#> => Generating a table of group and dominant signature:
#>    
#>     Sig1 Sig2 Sig3 Sig4 Sig5
#>   1   14    0    0    0    0
#>   2    0    2    4    0    0
#>   3    1    0    8   15    0
#>   4    0    2    0    0    8
#>   5    0   42    3    1    0
#> => Assigning a group to a signature with the maxium fraction (stored in 'map_table' attr)...
#> ℹ [2020-10-09 00:03:46]: Summarizing...
#> 	group #1: 14 samples with Sig1 enriched.
#> 	group #2: 6 samples with Sig3 enriched.
#> 	group #3: 24 samples with Sig4 enriched.
#> 	group #4: 10 samples with Sig5 enriched.
#> 	group #5: 46 samples with Sig2 enriched.
#> ! [2020-10-09 00:03:46]: The 'enrich_sig' column is set to dominant signature in one group, please check and make it consistent with biological meaning (correct it by hand if necessary).
#> ℹ [2020-10-09 00:03:46]: 0.088 secs elapsed.
head(mt_grps)
#>                          sample group silhouette_width enrich_sig
#> 1: TCGA-PE-A5DD-01A-12D-A27P-09     1            0.572       Sig1
#> 2: TCGA-D8-A1JJ-01A-31D-A14K-09     1            0.279       Sig1
#> 3: TCGA-BH-A18K-01A-11D-A12B-09     1            0.125       Sig1
#> 4: TCGA-AC-A2FO-01A-11D-A17W-09     1            0.259       Sig1
#> 5: TCGA-E2-A1IH-01A-11D-A188-09     1            0.535       Sig1
#> 6: TCGA-E2-A152-01A-11D-A12B-09     1            0.482       Sig1

The returned sample order matches the sample order in the clustered consensus matrix.

show_sig_consensusmap(mt_sig)
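
To make the 'consensus' log messages above more concrete, here is a minimal base-R sketch of how groups can be read off a consensus matrix by hierarchical clustering. It is not sigminer's internal code: the toy matrix `cons`, the average linkage, and the choice of `k = 2` are all assumptions made for illustration.

```r
# Toy consensus matrix (hypothetical): entry [i, j] is the fraction of runs
# in which samples i and j were clustered together.
samples <- paste0("S", 1:6)
cons <- matrix(0.1, nrow = 6, ncol = 6, dimnames = list(samples, samples))
cons[1:3, 1:3] <- 0.9 # S1-S3 usually co-cluster
cons[4:6, 4:6] <- 0.8 # S4-S6 usually co-cluster
diag(cons) <- 1

# Turn similarity into distance and cluster hierarchically
# (average linkage is an assumption of this sketch).
hc <- hclust(as.dist(1 - cons), method = "average")

# Cut the tree into k groups (k = 2 is an assumption for this toy example).
grp <- cutree(hc, k = 2)

# Report samples in dendrogram order, as they would appear in a clustered heatmap.
res <- data.frame(
  sample = hc$labels[hc$order],
  group = grp[hc$order],
  row.names = NULL
)
print(res)
```

Ordering the result by `hc$order` is what keeps the table aligned with a clustered heatmap of the same matrix.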

Sometimes the mapping between groups and enriched signatures may not be right. Users should check it and correct it manually if necessary.

attr(mt_grps, "map_table")
#>    
#>     Sig1 Sig2 Sig3 Sig4 Sig5
#>   1   14    0    0    0    0
#>   2    0    2    4    0    0
#>   3    1    0    8   15    0
#>   4    0    2    0    0    8
#>   5    0   42    3    1    0
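
The check can be done directly on this table. The sketch below re-enters the counts printed above as a plain matrix (so that it runs without `mt_grps`), picks the signature with the largest within-group fraction, and shows one way to overwrite a label by hand. The object names `tab` and `grps`, the toy samples, and the corrected label are hypothetical.

```r
# Counts copied from the 'map_table' printed above: rows are groups,
# columns are the dominant signatures of the samples in each group.
tab <- matrix(
  c(
    14,  0, 0,  0, 0,
     0,  2, 4,  0, 0,
     1,  0, 8, 15, 0,
     0,  2, 0,  0, 8,
     0, 42, 3,  1, 0
  ),
  nrow = 5, byrow = TRUE,
  dimnames = list(group = as.character(1:5), sig = paste0("Sig", 1:5))
)

# Within-group fractions, then the signature with the largest fraction per group.
frac <- prop.table(tab, margin = 1)
enrich_sig <- setNames(colnames(frac)[apply(frac, 1, which.max)], rownames(frac))
print(enrich_sig)

# Manual correction on a toy per-sample table (hypothetical data and label).
grps <- data.frame(
  sample = c("a", "b", "c"),
  group = c("2", "2", "3"),
  enrich_sig = unname(enrich_sig[c("2", "2", "3")]),
  stringsAsFactors = FALSE
)
grps$enrich_sig[grps$group == "2"] <- "Sig2" # overwrite only if biology suggests so
print(grps)
```

The derived labels agree with the summary lines printed by get_groups() above (group 2 with Sig3, group 5 with Sig2, and so on). The same assignment pattern should carry over to `mt_grps`, and the result can be re-checked afterwards with `table(mt_grps$group, mt_grps$enrich_sig)`.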

10.2 Group Comparison Analysis

load(system.file("extdata", "toy_copynumber_signature_by_M.RData",
  package = "sigminer", mustWork = TRUE
))

# Assign samples to clusters
groups <- get_groups(sig, method = "k-means")
#> ℹ [2020-10-09 00:03:46]: Started.
#> ✓ [2020-10-09 00:03:46]: 'Signature' object detected.
#> ℹ [2020-10-09 00:03:46]: Running k-means with 2 clusters...
#> ℹ [2020-10-09 00:03:46]: Generating a table of group and signature contribution (stored in 'map_table' attr):
#>          Sig1      Sig2
#> 1 0.003428449 0.9965716
#> 2 0.031799383 0.9682006
#> ℹ [2020-10-09 00:03:46]: Assigning a group to a signature with the maximum fraction...
#> ℹ [2020-10-09 00:03:46]: Summarizing...
#> 	group #1: 6 samples with Sig2 enriched.
#> 	group #2: 4 samples with Sig2 enriched.
#> ! [2020-10-09 00:03:46]: The 'enrich_sig' column is set to dominant signature in one group, please check and make it consistent with biological meaning (correct it by hand if necessary).
#> ℹ [2020-10-09 00:03:46]: 0.042 secs elapsed.
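
For readers curious what the 'k-means' log lines above correspond to, here is a minimal sketch using `stats::kmeans()` on a simulated relative exposure matrix. It mirrors the logged steps (cluster, tabulate the mean contribution per group, pick the maximum) but is not sigminer's implementation; the simulated matrix `expo` and all object names are assumptions.

```r
set.seed(42)

# Simulated relative exposures (hypothetical): 10 samples x 2 signatures,
# with each row summing to 1.
sig1 <- c(runif(5, 0.7, 0.9), runif(5, 0.1, 0.3))
expo <- cbind(Sig1 = sig1, Sig2 = 1 - sig1)
rownames(expo) <- paste0("sample", 1:10)

# Cluster samples on their exposure profiles.
km <- kmeans(expo, centers = 2, nstart = 10)

# Mean contribution of each signature within each cluster
# (analogous to the group-by-signature table in the log above).
map_table <- apply(expo, 2, function(x) tapply(x, km$cluster, mean))
print(map_table)

# Label each cluster with the signature of highest mean contribution.
enrich_sig <- colnames(map_table)[apply(map_table, 1, which.max)]
groups_toy <- data.frame(
  sample = rownames(expo),
  group = km$cluster,
  enrich_sig = enrich_sig[km$cluster],
  row.names = NULL
)
print(groups_toy)
```

In the real output above both clusters end up labeled as Sig2 enriched, because Sig2 dominates every sample in the toy dataset. This is exactly the kind of case the warning message asks users to review by hand.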

set.seed(1234)

groups$prob <- rnorm(10)
groups$new_group <- sample(c("1", "2", "3", "4", NA), size = nrow(groups), replace = TRUE)

# Compare groups (filter NAs for categorical columns)
groups.cmp <- get_group_comparison(groups[, -1],
  col_group = "group",
  cols_to_compare = c("prob", "new_group"),
  type = c("co", "ca"), verbose = TRUE
)
#> Treat prob as continuous variable.
#> Treat new_group as categorical variable.

# Compare groups (Set NAs of categorical columns to 'Rest')
groups.cmp2 <- get_group_comparison(groups[, -1],
  col_group = "group",
  cols_to_compare = c("prob", "new_group"),
  type = c("co", "ca"), NAs = "Rest", verbose = TRUE
)
#> Treat prob as continuous variable.
#> Treat new_group as categorical variable.
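
The two calls above differ only in how missing values in categorical columns are handled. The sketch below illustrates that difference, and the kind of test that can be applied to each variable type, using base R only. The choice of `aov()` for the continuous column and `fisher.test()` for the categorical one is an assumption of this sketch (see `?get_group_comparison` for what sigminer actually computes), and the toy data frame `df` is hypothetical.

```r
set.seed(1234)

# Toy data (hypothetical): a grouping column, one continuous feature and
# one categorical feature containing missing values.
df <- data.frame(
  group = rep(c("1", "2"), each = 10),
  prob = rnorm(20),
  new_group = c(
    "A", "A", "B", NA, "A", "B", "A", NA, "A", "B",
    "B", "B", NA, "A", "B", "B", NA, "B", "A", NA
  ),
  stringsAsFactors = FALSE
)

# Continuous feature: one-way ANOVA across groups.
fit <- aov(prob ~ group, data = df)
p_continuous <- summary(fit)[[1]][["Pr(>F)"]][1]

# Categorical feature, option 1: drop rows with NA before tabulating.
keep <- !is.na(df$new_group)
tab_drop <- table(df$group[keep], df$new_group[keep])

# Categorical feature, option 2: recode NA to an explicit "Rest" level.
recoded <- ifelse(is.na(df$new_group), "Rest", df$new_group)
tab_rest <- table(df$group, recoded)

p_drop <- fisher.test(tab_drop)$p.value
p_rest <- fisher.test(tab_rest)$p.value

print(tab_drop)
print(tab_rest)
print(c(continuous = p_continuous, drop_NA = p_drop, NA_as_Rest = p_rest))
```

Dropping NAs shrinks the sample size for the categorical test, while recoding them as 'Rest' keeps every sample but adds a category, so the two options can give different p-values for the same data.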

10.3 Group Visualization

ggcomp <- show_group_comparison(groups.cmp2)
ggcomp$co_comb

ggcomp$ca_comb
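
Judging by their names, `co_comb` and `ca_comb` hold the combined plots for the continuous and categorical variables respectively. For intuition about what such panels convey, here is a rough base-R analogue on toy data. The plot types chosen (boxplot, stacked fraction bars), the output file name, and all object names are assumptions for illustration and are not taken from sigminer's plotting code.

```r
set.seed(1)

# Toy data (hypothetical), same shape as the comparison input used above.
df <- data.frame(
  group = rep(c("1", "2"), each = 10),
  prob = rnorm(20),
  new_group = rep(c("A", "B", "Rest", "A"), times = 5),
  stringsAsFactors = FALSE
)

# Draw both panels side by side into a temporary PDF file.
out_file <- file.path(tempdir(), "group_comparison_sketch.pdf")
pdf(out_file, width = 8, height = 4)
op <- par(mfrow = c(1, 2))

# Continuous feature: distribution of values per group.
boxplot(prob ~ group, data = df, xlab = "group", ylab = "prob")

# Categorical feature: composition of each group as stacked fractions.
frac <- prop.table(table(df$new_group, df$group), margin = 2)
barplot(frac, xlab = "group", ylab = "fraction", legend.text = TRUE)

par(op)
dev.off()
```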