Chapter 11 Datasets

11.1 Reference Annotation

sigminer ships several reference annotation datasets that it uses for internal calculations. They can also be loaded for other purposes with either data() or get_genome_annotation().

Currently, there are the following datasets:

  • centromeres.hg19
  • centromeres.hg38
  • chromsize.hg19
  • chromsize.hg38
  • cytobands.hg19
  • cytobands.hg38

For example, to load a dataset with data():

data("centromeres.hg19")
head(centromeres.hg19)
#>   chrom left.base right.base
#> 1  chr1 121535434  124535434
#> 2  chr2  92326171   95326171
#> 3  chr3  90504854   93504854
#> 4  chr4  49660117   52660117
#> 5  chr5  46405641   49405641
#> 6  chr6  58830166   61830166

get_genome_annotation() gives finer control over the returned data.frame, such as which chromosomes and which genome build to include.

get_genome_annotation(
  data_type = "chr_size",
  chrs = c("chr1", "chr10", "chr20"),
  genome_build = "hg19"
)
#>   chrom      size
#> 1  chr1 249250621
#> 2 chr10 135534747
#> 3 chr20  63025520

For more details, see ?get_genome_annotation.
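
Because these tables are ordinary data frames, they can be combined with base R. The following is a minimal sketch that copies a few rows from the outputs shown above instead of loading the package objects, so it runs without sigminer installed. The variable names (centro, chr_size, merged) and the derived columns (width, centro_frac) are illustrative assumptions, not part of sigminer.

```r
# Rows copied from the head(centromeres.hg19) output above
centro <- data.frame(
  chrom      = c("chr1", "chr2", "chr3"),
  left.base  = c(121535434, 92326171, 90504854),
  right.base = c(124535434, 95326171, 93504854),
  stringsAsFactors = FALSE
)

# Rows copied from the get_genome_annotation() output above
chr_size <- data.frame(
  chrom = c("chr1", "chr10", "chr20"),
  size  = c(249250621, 135534747, 63025520),
  stringsAsFactors = FALSE
)

# Width of each annotated centromere region (illustrative column)
centro$width <- centro$right.base - centro$left.base

# Inner join on chromosome name; only chromosomes present in both tables remain
merged <- merge(centro, chr_size, by = "chrom")

# Fraction of the chromosome covered by the centromere region (illustrative column)
merged$centro_frac <- merged$width / merged$size
print(merged)
```

With the real package objects, the same steps should apply after data("centromeres.hg19") and data("chromsize.hg19"), provided their column names match the outputs shown above.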

11.2 Copy Number Component Settings

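The CN.features dataset is a predefined table of components used to identify copy number signatures with the “Wang” method. Each row assigns a component of a feature either to a single value (label "point") or to an interval bounded by min and max (label "range"). Users can define a custom table with the same structure and pass it to functions such as sig_tally().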

Details on how this dataset is generated can be found at https://github.com/ShixiangWang/sigminer/blob/master/data-raw/CN-features.R.

CN.features
#>     feature         component label  min max
#>  1:  BP10MB         BP10MB[0] point    0   0
#>  2:  BP10MB         BP10MB[1] point    1   1
#>  3:  BP10MB         BP10MB[2] point    2   2
#>  4:  BP10MB         BP10MB[3] point    3   3
#>  5:  BP10MB         BP10MB[4] point    4   4
#>  6:  BP10MB         BP10MB[5] point    5   5
#>  7:  BP10MB        BP10MB[>5] range    5 Inf
#>  8:   BPArm          BPArm[0] point    0   0
#>  9:   BPArm          BPArm[1] point    1   1
#> 10:   BPArm          BPArm[2] point    2   2
#> 11:   BPArm          BPArm[3] point    3   3
#> 12:   BPArm          BPArm[4] point    4   4
#> 13:   BPArm          BPArm[5] point    5   5
#> 14:   BPArm          BPArm[6] point    6   6
#> 15:   BPArm          BPArm[7] point    7   7
#> 16:   BPArm          BPArm[8] point    8   8
#> 17:   BPArm          BPArm[9] point    9   9
#> 18:   BPArm         BPArm[10] point   10  10
#> 19:   BPArm BPArm[>10 & <=20] range   10  20
#> 20:   BPArm BPArm[>20 & <=30] range   20  30
#> 21:   BPArm        BPArm[>30] range   30 Inf
#> 22:      CN             CN[0] point    0   0
#> 23:      CN             CN[1] point    1   1
#> 24:      CN             CN[2] point    2   2
#> 25:      CN             CN[3] point    3   3
#> 26:      CN             CN[4] point    4   4
#> 27:      CN      CN[>4 & <=8] range    4   8
#> 28:      CN            CN[>8] range    8 Inf
#> 29:    CNCP           CNCP[0] point    0   0
#> 30:    CNCP           CNCP[1] point    1   1
#> 31:    CNCP           CNCP[2] point    2   2
#> 32:    CNCP           CNCP[3] point    3   3
#> 33:    CNCP           CNCP[4] point    4   4
#> 34:    CNCP    CNCP[>4 & <=8] range    4   8
#> 35:    CNCP          CNCP[>8] range    8 Inf
#> 36:    OsCN           OsCN[0] point    0   0
#> 37:    OsCN           OsCN[1] point    1   1
#> 38:    OsCN           OsCN[2] point    2   2
#> 39:    OsCN           OsCN[3] point    3   3
#> 40:    OsCN           OsCN[4] point    4   4
#> 41:    OsCN   OsCN[>4 & <=10] range    4  10
#> 42:    OsCN         OsCN[>10] range   10 Inf
#> 43:      SS           SS[<=2] range -Inf   2
#> 44:      SS      SS[>2 & <=3] range    2   3
#> 45:      SS      SS[>3 & <=4] range    3   4
#> 46:      SS      SS[>4 & <=5] range    4   5
#> 47:      SS      SS[>5 & <=6] range    5   6
#> 48:      SS      SS[>6 & <=7] range    6   7
#> 49:      SS      SS[>7 & <=8] range    7   8
#> 50:      SS            SS[>8] range    8 Inf
#> 51:    NC50         NC50[<=2] range -Inf   2
#> 52:    NC50           NC50[3] point    3   3
#> 53:    NC50           NC50[4] point    4   4
#> 54:    NC50           NC50[5] point    5   5
#> 55:    NC50           NC50[6] point    6   6
#> 56:    NC50           NC50[7] point    7   7
#> 57:    NC50          NC50[>7] range    7 Inf
#> 58:   BoChr          BoChr[1] point    1   1
#> 59:   BoChr          BoChr[2] point    2   2
#> 60:   BoChr          BoChr[3] point    3   3
#> 61:   BoChr          BoChr[4] point    4   4
#> 62:   BoChr          BoChr[5] point    5   5
#> 63:   BoChr          BoChr[6] point    6   6
#> 64:   BoChr          BoChr[7] point    7   7
#> 65:   BoChr          BoChr[8] point    8   8
#> 66:   BoChr          BoChr[9] point    9   9
#> 67:   BoChr         BoChr[10] point   10  10
#> 68:   BoChr         BoChr[11] point   11  11
#> 69:   BoChr         BoChr[12] point   12  12
#> 70:   BoChr         BoChr[13] point   13  13
#> 71:   BoChr         BoChr[14] point   14  14
#> 72:   BoChr         BoChr[15] point   15  15
#> 73:   BoChr         BoChr[16] point   16  16
#> 74:   BoChr         BoChr[17] point   17  17
#> 75:   BoChr         BoChr[18] point   18  18
#> 76:   BoChr         BoChr[19] point   19  19
#> 77:   BoChr         BoChr[20] point   20  20
#> 78:   BoChr         BoChr[21] point   21  21
#> 79:   BoChr         BoChr[22] point   22  22
#> 80:   BoChr         BoChr[23] point   23  23
#>     feature         component label  min max
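
A custom table can be sketched in base R as follows. This is only an illustration under stated assumptions: the table custom_features and the helper classify_value() are hypothetical, and the matching rule (a "point" row matches value == min, a "range" row matches min < value <= max) is inferred from the component labels printed above, not taken from the sigminer source.

```r
# Hypothetical custom component table with the same columns as CN.features
custom_features <- data.frame(
  feature   = rep("CN", 5),
  component = c("CN[0]", "CN[1]", "CN[2]", "CN[>2 & <=4]", "CN[>4]"),
  label     = c("point", "point", "point", "range", "range"),
  min       = c(0, 1, 2, 2, 4),
  max       = c(0, 1, 2, 4, Inf),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the component(s) of `feature` that `value` falls into.
# Assumed rule: "point" rows match value == min; "range" rows match min < value <= max.
classify_value <- function(value, feature, setting) {
  rows <- setting[setting$feature == feature, ]
  hit <- ifelse(
    rows$label == "point",
    value == rows$min,
    value > rows$min & value <= rows$max
  )
  rows$component[hit]
}

classify_value(2, "CN", custom_features)
classify_value(3, "CN", custom_features)
classify_value(9, "CN", custom_features)

# Assumption: sig_tally() takes such a table through an argument like
# `feature_setting`; confirm the argument name and required columns in ?sig_tally.
```

Checking a few values this way before calling sig_tally() helps catch gaps or overlaps between the components of a custom table.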