Genomic Regions by Case Type

Emedgene uses regional information to process genomic data in several ways. Regional information influences which variants are presented in gene panels, exomes, and genomes, as well as variant quality and variant annotations.

V35.0 and up:

| Case Type | Regions of Interest BED | Default Region of Interest BED | QC BED |
|---|---|---|---|
| Research Genome | Any customer-selected BED | None | Optional |
| Whole Genome | Any customer-selected BED | Full Genes | Optional |
| Exome | Any customer-selected BED | Clinical Regions | Optional |
| Custom Panel | Any customer-selected BED | Clinical Regions | Optional |

V35.0+ Regions of Interest BEDs are applied as follows:

  • Research Genome: The customer can use any BED to restrict analysis. By default, there is no intersection and all variants are viewable. (Note: data pipeline time will increase, and intergenic variants have very limited annotations.)

  • Whole Genome: The customer can use any BED to restrict analysis. If no custom BED is selected, the "Full Genes" BED file is used by default (see below for more details), and only variants contained within this BED are presented. The exception is CNVs, which are always presented in full.

  • Exome: The customer can use any BED to restrict analysis. If no custom BED is selected, the variants presented in the output are those within the Emedgene "Clinical Regions" BED file, described below.

  • Custom Panel: The customer can use any BED to restrict analysis, and can also upload a separate BED file for QC. When a panel BED file is provided in Gene Panels, only variants from the VCF/FASTQ that fall within the ranges of the selected panel BED file are processed. BED files are typically unique to each enrichment kit panel. If no kit BED is available, the "Clinical Regions" BED is used.
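The selection rules in this list can be summarized in a few lines of Python. This is only an illustrative sketch of the documented behavior, not Emedgene code: the names `DEFAULT_ROI_BED` and `effective_roi_bed` are hypothetical, and BED files are represented here simply by their display names.

```python
from typing import Optional

# Default Regions of Interest BED per case type (V35.0 and up), as listed above.
# None means no intersection is applied by default.
DEFAULT_ROI_BED = {
    "Research Genome": None,
    "Whole Genome": "Full Genes",
    "Exome": "Clinical Regions",
    "Custom Panel": "Clinical Regions",
}


def effective_roi_bed(case_type: str, custom_bed: Optional[str] = None) -> Optional[str]:
    """Return the BED that restricts analysis, or None for no restriction.

    Hypothetical helper: a customer-selected BED takes precedence,
    otherwise the documented default for the case type applies.
    """
    if case_type not in DEFAULT_ROI_BED:
        raise ValueError(f"Unknown case type: {case_type}")
    if custom_bed is not None:
        return custom_bed
    return DEFAULT_ROI_BED[case_type]
```

For example, `effective_roi_bed("Exome")` falls back to "Clinical Regions", while `effective_roi_bed("Exome", "my_kit.bed")` returns the customer-selected file (the file name is made up for illustration).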

Custom BED files are managed in the BED files card on the Settings page.


Prior to V35.0:

| Case Type | Regions of Interest BED | QC BED |
|---|---|---|
| Research Genome | None | Optional |
| Whole Genome | Full Genes | Optional |
| Exome | Clinical Regions | Optional |
| Custom Panel | Clinical Regions OR custom BED uploaded to kit. If a custom BED was uploaded to a kit, it will be used. | Optional (same as regions of interest BED if one was used). |

Prior to V35.0, Regions of Interest BEDs are applied as follows:

  • Research Genome: No intersection, all variants are viewable (Note: data pipeline time will increase, and intergenic variants have very limited annotations).

  • Whole Genome: Only variants contained within the "Full Genes" BED file are displayed (see below for more details). The exception is CNVs, which are always presented in full.

  • Exome: Variants presented in the output will be those included in the Emedgene BED file defining the "Clinical Regions" as described below.

  • Custom Panel: When a panel BED file is provided in Gene Panels, only variants from the VCF/FASTQ that fall within the ranges of the selected panel BED file are processed. BED files are usually provided by the customer and are unique to each enrichment kit panel.
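To make the notion of "intersection" in the lists above concrete, here is a minimal Python sketch of position-based filtering against a BED. It is an assumption-laden illustration rather than the platform's implementation: the function names and the variant dictionary keys (`chrom`, `pos`, `type`) are hypothetical, only the variant start position is tested, and the CNV exemption mirrors the Whole Genome rule. It relies on two standard conventions: BED intervals are 0-based and half-open, while VCF positions are 1-based.

```python
from collections import defaultdict


def load_regions(bed_rows):
    """Group (chrom, start, end) BED rows by chromosome."""
    regions = defaultdict(list)
    for chrom, start, end in bed_rows:
        regions[chrom].append((start, end))
    return dict(regions)


def position_in_regions(regions, chrom, pos):
    """True if 1-based position `pos` falls in any 0-based, half-open interval."""
    p0 = pos - 1  # convert the VCF-style position to BED coordinates
    return any(start <= p0 < end for start, end in regions.get(chrom, []))


def keep_variant(variant, regions):
    """Decide whether a variant is presented.

    regions=None models "no intersection" (the Research Genome default).
    CNVs are always kept, following the Whole Genome rule above.
    """
    if regions is None:
        return True
    if variant["type"] == "CNV":
        return True
    return position_in_regions(regions, variant["chrom"], variant["pos"])
```

A linear scan per variant is used for readability; a real pipeline would more likely use a sorted or indexed interval structure.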


The following are the latest regional BEDs in the Emedgene platform, designed from the indicated sources for GRCh37 and GRCh38:

Clinical Regions

This is a comprehensive BED file that includes every clinically relevant region. The following are included:

  • "RefSeq Curated" and "GENCODE" regions with 50 bp flanking on each side, and 5' UTR and 3' UTR regions for protein-coding genes (based on RefSeq)

  • OMIM disease-related RNA genes (50 bp flanking)

  • All ClinVar pathogenic variant regions (50 bp flanking)

  • Promoter regions (EPDnew human version 006, 50 bp flanking)

  • Known STR regions (DRAGEN 4.0 specification file)

  • All microRNA genes (50 bp flanking, based on HGNC)

  • Full mtDNA region

For consistency, the GRCh38 version includes the lifted-over regions from GRCh37 (liftover performed with CrossMap).
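The 50 bp flanking mentioned for several of the sources above amounts to padding each interval and merging any overlaps that result. The following Python sketch shows one way to do that; it is not the procedure used to build the Emedgene BED files, and the function name `pad_and_merge` is hypothetical. Starts are clamped at 0, but ends are not clamped to chromosome length because no chromosome sizes are assumed here.

```python
def pad_and_merge(rows, flank=50):
    """Pad (chrom, start, end) intervals by `flank` bp and merge overlaps.

    Coordinates follow the BED convention (0-based, half-open).
    Returns a list sorted by chromosome name and start.
    """
    padded = sorted(
        (chrom, max(0, start - flank), end + flank) for chrom, start, end in rows
    )
    merged = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlapping or adjacent to the previous interval: extend it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged
```

Merging after padding keeps the resulting file free of overlapping lines, so that summing interval lengths reflects the number of covered bases.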

Full Genes

A BED file covering a wide range of genomic regions. It contains:

  • "RefSeq ALL" transcripts and "GENCODE" full gene regions, extended 5 kbp upstream and 5 kbp downstream

  • Within this range, all “Clinical Regions” are included

  • All dosage regions (HI/TS sig level 1, 2 or 3)

In addition, lifted-over versions of the regions from both reference builds are included, for both the current and previous versions of the ranges.

Sources:

  • Liftover done using CrossMap (v0.5.2), chain hg19ToHg38.over.chain.gz

  • NCBI RefSeq regions are based on releases 105 (hg19) and 110 (hg38)

  • GENCODE regions are based on releases V19 (hg19) and V41 (hg38)

  • All microRNA genes are based on the HGNC miRNA definition (Dec 2022)

  • ClinGen dosage regions (Dec 2022)

  • Promoters from EPDnew human version V6

  • mtDNA CRS

  • RNA disease genes based on OMIM and HGNC (Dec 2022): ATXN8OS, TERC, IL12A-AS1, FAAHP1, NUTM2B-AS1, GAS8-AS1, RNU12, MIR204, IGHG2, SLC7A2-IT1, MIR99A, RMRP, XIST, MEG3, DIRC3, MIR17HG, GNAS-AS1, LRTOMT, LINC00299, DUX4L1, MIR137, MIR140, MIR605, SNORD118, RNU4ATAC, HELLPAR, IGHG1, IGHM, MIR19B1, RNU7-1, LINC00237, MIR2861, MIR4718, IGHV3-21, IGHV4-34, IGKC, KCNQ1OT1, MIR184, MIR96, H19, HYMAI, PCDHA9, UGT1A1, AFG3L2P1, DISC2, SNORA31, TRU-TCA1-1, PCDHGA4, TRAC, ECEL1P3, MIAT

  • ClinVar variants (ClinVar Dec 2022) with any pathogenic or likely pathogenic significance (and some drug responses that are affiliated with pathogenicity)

  • 50K STR regions based on the DRAGEN 4.0 specification file

Table 1. Number of regions and total size per BED file

| BED file name | Number of lines | Size in bp |
|---|---|---|
| GRC38_coding | 206635 | 44959430 |
| GRC38_clinical_regions | 237652 | 121694892 |
| GRC38_full_genes | 37793 | 2200286025 |
| GRC37_coding | 200113 | 44420909 |
| GRC37_clinical_regions | 230619 | 119594638 |
| GRC37_full_genes | 35776 | 2368701647 |
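Figures like those in Table 1 can be computed for any BED file with a short script. The sketch below is illustrative only: `bed_stats` is a hypothetical helper, and it assumes "Size in bp" means the sum of `end - start` over all lines, which equals the number of covered bases only when the intervals do not overlap. The source does not state how the table values were derived.

```python
def bed_stats(lines):
    """Return (number_of_regions, total_size_in_bp) for BED-formatted lines.

    Blank lines, comments, and track/browser header lines are skipped.
    Sizes use the BED convention: 0-based start, exclusive end.
    """
    count = 0
    total_bp = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        start, end = int(fields[1]), int(fields[2])
        count += 1
        total_bp += end - start
    return count, total_bp
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, for example `with open(path) as fh: bed_stats(fh)`, where `path` points to a custom BED file.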
