MethCNA: a database for integrating genomic and epigenomic data in cancer
 
Home Search Browse Cluster Tutorial Contact  
  Tutorial  
 

1. Sample-level data query

2. Gene-level data query

3. Public data browsing

4. Data clustering

 
   
  1. Sample-level data query

 

 

MethCNA contains about 10,000 preprocessed tumor samples representing 37 cancer types. All data were collected from NCBI Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases. The sample ID can be inputted by users to explore data of interest.

 

 

 

Figure 1. The interface of sample-level data query.

 

 

 

(1) Input GEO sample ID to investigate the genomic and epigenomic alterations of specific tumor sample.
(2) The entire list of array and series accession IDs that were integrated in MethCNA.
(3) Parameters and thresholds to detect copy number alterations and hyper- or hypo-methylation.
(4) Set different colors to represent alteration data.

 
     
 

Figure 2. The genomic and epigenomic alterations of each chromosome.

 
 

(1) The general view of copy number alterations for the whole genome.
(2) The differentially methylated positions and copy number alterations for each chromosome.
(3) Input start and end genomic positions (Mb) to zoom in on particular region.
(4) The example of zoomed-in view of a particular region. It allows further investigation of a specific genomic region of interest.

 
 
top
  2. Gene-level data query
  Gene-level data query allows users access to gene-specific aberration information in selected datasets. Cancer types and data series can be chosen in the list box, and multiple gene names can be specified separated with a semicolon. There are seven gene regions and four CpG island (CGI) regions that can be selected for analysis.  
 

Figure 3. The interface of Gene-level data query.

 
 

(1) Select a cancer type.
(2) Select data series. The multiple selection list box shows available series IDs for the selected cancer type.
(3) Input gene symbols. Multiple gene symbols can be provided and separated by comma for batch processing.
(4) The entire list of genes that can be used for analysis.
(5) Gene-centric and CpG island regions selection. The NCBI Reference Sequence gene annotation and UCSC Genome Browser are used to define gene functional regions and CpG island regions.

 
     
 

Figure 4. The results of gene-level data query.

 
 

(1) The Beta value distribution of queried gene in different groups.
(2) Frequency of copy number alterations.
(3) Related links for detailed information.

 
     
  If the web interface does not return the beta value distribution of queried gene, there are three possible reasons:  
 

Figure 5. Pinpoint the problem that no probe is mapping to a specific gene.

 
  (1) The gene symbol is invalid or there is no probe representing this gene on the 450k array. In this way, users can click on the hyperlink to check the complete gene list of 450k array, and to check whether there are probes mapping to a specific gene.
(2) For a given gene, there is no probe covering the queried region. In this way, users can click on the hyperlink to obtain a detailed annotation of all probes mapping to the input gene, and to check whether there are probes mapping to specific regions of the gene.
(3) The desired probes were filtered out from the queried dataset due to poor quality. In this way, although the probe-level data is not available, the CNA data at gene-level will still be displayed if available.
 
     
top
  3. Public data browsing
 

There are more than 50 preprocessed GEO or TCGA data series integrated into MethCNA. Click on the study title and the related genomic and epigenomic alteration data will be displayed.

 
 

Figure 6. An example of the GEO data visualization interface.

 

(1) The general information of the selected data series.
(2) Click to show the detailed Differentially Methylated Regions (DMRs).
(3) The information of DMRs and genes for each chromosome.
(4) The frequency plot of copy number alterations for the selected data series.

 
top
  4. Data clustering
 

4.1 Methylation data clustering

The hierarchical clustering heatmap shows the DNA methylation patterns. Gene-level DNA methylation data clustering can be performed. The NCBI Reference Sequence gene annotation and UCSC Genome Browser are used to define gene functional regions and CpG island regions. The Euclidean distance was used for hierarchical clustering of samples.

 
 

Figure 7. The interface of gene-level DNA methylation data clustering.

 
  (1) Select a cancer type.
(2) Select data series. The multiple selection list box shows available series IDs for the selected cancer type.
(3) Input gene symbols. Multiple gene symbols can be provided and separated by comma.
(4) Gene-centric and CpG island regions selection. The NCBI Reference Sequence gene annotation and UCSC Genome Browser are used to define gene functional regions and CpG island regions.
 
 

Figure 8. The hierarchical clustering of DNA methylation data.

 
 

(1) The list of input genes.
(2) The list of tumor samples.
(3) The methylation profile clustering of queried genes among all the samples. The interactive clustering diagram can be zoomed in.

 
  4.2 Frequency plot of copy number alteration  
 

Figure 9. Selection of cancer type and dataset for copy number alteration frequency analysis.

 
  (1) Select a cancer type. The standard cancer names are obtained from International Classification of Diseases for Oncology, 3rd Edition (ICD-O-3). For TCGA data, the TCGA tumor-type abbreviations are provided.
(2) Select data series. The multiple selection list box shows available series IDs for the selected cancer type.
 
     
 

Figure 10. The hierarchical clustering of DNA copy number alterations.

 
 

 

 
top
 
© 2017 Cai Laboratory.  Last updated on 30 December 2017