Overview
GenomeTraFaC is a database of conserved regulatory elements obtained by systematically analyzing orthologous genes in the human and mouse genomes. It focuses mainly on genes that have high-quality mRNA entries for both mouse and human in the NCBI Reference Sequence (RefSeq) database.
Conserved potential cis-regulatory regions were identified with a computational pipeline built on an enhanced version of our previously developed TraFaC server. Because putative regulatory information is available for most well-annotated genes, the database can also greatly facilitate analyses of groups of co-expressed or functionally related genes for shared, ortholog-conserved transcriptional machinery.
Using the TraFaC (Jegga et al., 2002), PipMaker (Schwartz et al., 2000), and MatInspector (Quandt et al., 1995) programs, we aligned and analyzed more than 12,000 human and mouse orthologous gene pairs that had a validated RefSeq ID in the NCBI Reference Sequence database (Pruitt et al., 2003). The genomic sequences, each with 40 kb of flanking sequence upstream (5') and downstream (3'), were downloaded from the UCSC Genome Browser (human May 2004 and March 2006 assemblies; mouse August 2005 and February 2006 assemblies). Sequences were aligned with the BlastZ algorithm of PipMaker. Transcription factor (TF) binding sites were predicted with MatInspector, which scans sequences against a library of position weight matrices (PWMs). The TraFaC server was used to identify the cis-elements shared within the evolutionarily conserved regions of each human-mouse alignment.
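The Python sketch below illustrates what a "shared" cis-element means. It makes three assumptions. First, the BlastZ alignment has been reduced to ungapped blocks of (human start, mouse start, length). Second, each sequence has already been scanned for TF sites, for example with MatInspector. Third, a site pair counts as shared when both hits name the same factor and the human position maps through an aligned block to within a few base pairs of the mouse position. The names `Block`, `Site`, `shared_sites`, and the `tolerance` parameter are hypothetical. This is not the TraFaC server's actual algorithm or criteria.

```python
from typing import List, NamedTuple, Optional, Tuple


class Block(NamedTuple):
    """Ungapped aligned block: human[h_start:h_start+length] aligns to mouse[m_start:m_start+length]."""
    h_start: int
    m_start: int
    length: int


class Site(NamedTuple):
    """A TF binding-site hit: factor/matrix name and 0-based start in its own sequence."""
    tf: str
    pos: int


def map_human_to_mouse(pos: int, blocks: List[Block]) -> Optional[int]:
    """Return the mouse coordinate aligned to human position `pos`, or None if `pos` is unaligned."""
    for b in blocks:
        if b.h_start <= pos < b.h_start + b.length:
            return b.m_start + (pos - b.h_start)
    return None


def shared_sites(human: List[Site], mouse: List[Site], blocks: List[Block],
                 tolerance: int = 5) -> List[Tuple[Site, Site]]:
    """Pair same-factor human and mouse sites whose aligned positions differ by at most `tolerance` bp."""
    pairs = []
    for hs in human:
        mapped = map_human_to_mouse(hs.pos, blocks)
        if mapped is None:
            continue  # human site lies outside every conserved (aligned) block
        for ms in mouse:
            if ms.tf == hs.tf and abs(ms.pos - mapped) <= tolerance:
                pairs.append((hs, ms))
    return pairs


if __name__ == "__main__":
    blocks = [Block(h_start=100, m_start=250, length=300)]
    human = [Site("SP1", 120), Site("AP1", 500)]   # AP1 falls outside the aligned block
    mouse = [Site("SP1", 272), Site("AP1", 650)]
    print(shared_sites(human, mouse, blocks))
    # [(Site(tf='SP1', pos=120), Site(tf='SP1', pos=272))]
```

Restricting candidates to aligned blocks mirrors the idea that only sites in conserved regions are reported. The tolerance allows for small offsets between the two predicted site starts.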
The GenomeTraFaC database contains cis-regulatory analysis results for more than 12,000 RefSeq-annotated human and mouse gene pairs. We are updating the database as new RefSeq orthologous gene pairs become available. If you are interested in a particular gene that has human and mouse RefSeq annotations (specifically, an "NM" accession number), please email us the accession numbers, gene symbols, or sequences. We will let you know when the results are available.
From the GenomeTraFaC homepage, you have three options for searching the database: Cis-element clusters within BlastZ Alignments, Cis-element clusters shared between any gene pair, and Conserved Cis-element Scanner.
To search for cis-element clusters within BlastZ-aligned pairs, go to the GenomeTraFaC homepage and click Cis-element clusters within BlastZ Alignments.
From this screen, you can perform a basic search, or search by disease, gene ontology, pathway, gene family, or custom group.
Basic Search
1. From the drop-down menu at the top of the screen, select one of the following options, and enter related search terms in the text box:
2. Click Search.
A table of search results appears in the lower half of the screen.
3. Select the check box for each sequence you want to view.
4. Click Submit.
The BlastZ alignments page opens.
The first two columns show the sequence information for the human and mouse sequences.
The third column (Timestamp) indicates the date of entry.
From the last column (Action), you have two viewing options:
For more information about these and other views, see Choosing different views.
Search by Disease, Gene Ontology, Pathway, Gene Family, or Custom Group
1. Go to the lower section of the search screen, and select the type of query you wish to perform.
2. Define your query in one of the following ways:


3. Verify that the correct query type is selected.
4. To process your query, click Search.
5. Follow steps 3 and 4 under Basic Search to select sequences for visualization.
To search for cis-element clusters in a gene segment pair of your choice, complete the following steps:
1. On the GenomeTraFaC homepage, click Cis-element clusters shared between any gene pair.
2. Search for your first sequence by using the basic search options in the top half of the screen or the options for searching by disease, pathway, ontology, phenotype, gene family, or custom group in the bottom half of the screen.
A table of search results appears in the lower half of the screen.
3. Go to the table, and select your first sequence.
4. Click Submit.
5. Select your second sequence, repeating step 2 if necessary, and click Submit.
6. Use the TraFaC query page to modify any parameters you wish, as indicated below:
7. When you are finished modifying parameters, click Submit.
An image depicting the shared transcription factor binding sites appears. We call this a TraFaC image. To learn more about this image and others, see Choosing different views.
To select one or more transcription factor binding sites and search all database genes for clusters containing the selected site(s), complete the following steps:
1. On the GenomeTraFaC homepage, click Conserved Cis-element Scanner.
2. Go to the top of the screen, and customize the search region if you wish. By default, the system searches the 10 kb region upstream of the first exon of all genes.
3. Go to the list of binding sites in the bottom section of the screen, and select one or more sites to include in your search.
4. Click Search.
The search results page appears.
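The default search region and the site filter can be sketched in Python as follows. The sketch uses 0-based, half-open genomic coordinates and treats "upstream" in a strand-aware way. On the minus strand, the first exon is the highest-coordinate exon, so the upstream region lies to its right. The guide does not say whether the Conserved Cis-element Scanner requires all or any of the selected sites to be present. The `require_all` flag therefore covers both readings. It, the gene names, and all function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def upstream_region(exon_start: int, exon_end: int, strand: str,
                    length: int = 10_000) -> Tuple[int, int]:
    """Half-open (start, end) of the `length` bp immediately upstream of the first exon."""
    if strand == "+":
        # Upstream is to the left; clamp at the chromosome start.
        return max(0, exon_start - length), exon_start
    if strand == "-":
        # On the minus strand the first exon is the rightmost one, so upstream is to the right.
        # The end is not clamped to the chromosome length here.
        return exon_end, exon_end + length
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def matching_genes(gene_sites: Dict[str, List[Tuple[str, int]]],
                   regions: Dict[str, Tuple[int, int]],
                   selected: Set[str],
                   require_all: bool = True) -> List[str]:
    """Genes whose shared sites inside their search region include the selected factors."""
    hits = []
    for gene, sites in gene_sites.items():
        start, end = regions[gene]
        present = {tf for tf, pos in sites if start <= pos < end}
        if require_all:
            ok = selected <= present       # every selected factor occurs in the region
        else:
            ok = bool(selected & present)  # at least one selected factor occurs
        if ok:
            hits.append(gene)
    return sorted(hits)


if __name__ == "__main__":
    regions = {
        "GENE_A": upstream_region(50_000, 50_200, "+"),  # (40000, 50000)
        "GENE_B": upstream_region(50_000, 50_200, "-"),  # (50200, 60200)
    }
    gene_sites = {
        "GENE_A": [("SP1", 45_000), ("AP1", 41_000)],
        "GENE_B": [("SP1", 55_000), ("AP1", 49_000)],  # AP1 lies outside GENE_B's region
    }
    print(matching_genes(gene_sites, regions, {"SP1", "AP1"}))                     # ['GENE_A']
    print(matching_genes(gene_sites, regions, {"SP1", "AP1"}, require_all=False))  # ['GENE_A', 'GENE_B']
```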
As you search the database, you can view the data in several different ways, depending on which search option you chose: the Local Alignment, Concise Alignment, TraFaC image, and Regulogram views.
The Local Alignment page consists of the following table, which is essentially a summary of the alignment results of the orthologous (mostly, human and mouse) genomic sequences. It is based upon the BlastZ sequence alignment uploaded to the GenomeTraFaC database.
The Concise Alignment page consists of the following table, which is essentially the same as the Local alignment view page but has additional columns for the display of the shared TF binding sites ("hits"), family-wise or individual matrix-wise.
The numbers in the Hits columns indicate the number of shared TF binding sites in a 200 bp window between the two sequences being compared. Clicking a number opens the TraFaC image (shared transcription factor binding sites image), a graphical display of the shared TF binding sites between the two sequences.
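The following Python sketch shows one way to derive such per-window counts. It bins the positions of shared sites into consecutive, non-overlapping 200 bp windows. It assumes 0-based positions measured along one sequence of the pair. The guide does not specify whether GenomeTraFaC uses fixed or sliding windows, and `hits_per_window` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Iterable


def hits_per_window(positions: Iterable[int], window: int = 200) -> Dict[int, int]:
    """Map each window's start coordinate to the number of shared-site positions inside it."""
    counts = Counter((pos // window) * window for pos in positions)
    return dict(sorted(counts.items()))


print(hits_per_window([10, 150, 199, 200, 450]))
# {0: 3, 200: 1, 400: 1}
```

Windows with no hits are simply absent from the result. A display table would fill them in as zeros.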
The shared transcription factor binding sites image, or TraFaC image, shows the TF binding sites that occur in both sequences. Here is an example, along with numbered annotations explaining its parts:
The Regulogram, or Cis-element Hit Density Image, depicts a moving-window average of the number of shared cis-elements occurring in phylogenetically conserved regions. Here is an example, along with numbered annotations explaining its parts:
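A moving-window average of this kind can be sketched in Python as below. The input is assumed to be a list of shared cis-element counts for consecutive bins, such as per-window hit counts. The sketch uses a centered, odd-sized window that is truncated at the sequence ends. The guide gives neither the Regulogram's actual window size nor its edge handling, so `moving_average` and its defaults are hypothetical.

```python
from typing import List, Sequence


def moving_average(counts: Sequence[float], window: int = 5) -> List[float]:
    """Centered moving average; near the ends, only the values inside the sequence are averaged."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(counts)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        smoothed.append(sum(counts[lo:hi]) / (hi - lo))
    return smoothed


print(moving_average([0, 3, 0, 3, 0], window=3))
# [1.5, 1.0, 2.0, 1.0, 1.5]
```

Averaging over neighboring bins smooths isolated hits, so peaks in the plot favor stretches where shared elements cluster.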
References
Jegga, A.G., Sherwood, S.P., Carman, J.W., Pinski, A.T., Phillips, J.L., Pestian, J.P. and Aronow, B.J. (2002) Detection and visualization of compositionally similar cis-regulatory element clusters in orthologous and coordinately controlled genes. Genome Res. 12, 1408-1417.
Pruitt, K.D., Tatusova, T. and Maglott, D.R. (2003) NCBI Reference Sequence project: update and current status. Nucleic Acids Res. 31, 34-37.
Quandt, K., Frech, K., Karas, H., Wingender, E., and Werner, T. (1995) MatInd and MatInspector: New fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Res. 23, 4878-4884.
Schwartz, S., Zhang, Z., Frazer, K.A., Smit, A., Riemer, C., Bouck, J., Gibbs, R., Hardison, R., and Miller, W. (2000) PipMaker-A web server for aligning two genomic DNA sequences. Genome Res. 10, 577-586.
What kinds of queries and retrievals are possible in GenomeTraFaC?
GenomeTraFaC can be searched with an approved gene symbol (HGNC for human and MGI for mouse). The database currently has precomputed regulatory region analysis results for more than 12,000 RefSeq genes that have an "NM" mRNA entry for both human and mouse. Separate multiple entries (gene symbols or accession numbers) with commas.
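Here is a minimal Python sketch of splitting such a query. It assumes that a RefSeq mRNA accession is `NM_` followed by digits and an optional version suffix, and that every other entry is a gene symbol. Symbols keep their original case because HGNC (human) and MGI (mouse) symbols follow different capitalization conventions. `parse_query` is a hypothetical helper, not part of GenomeTraFaC.

```python
import re
from typing import List, Tuple

# RefSeq mRNA accession such as NM_000546 or NM_000546.5 (version suffix optional)
NM_ACCESSION = re.compile(r"NM_\d+(?:\.\d+)?")


def parse_query(text: str) -> Tuple[List[str], List[str]]:
    """Split a comma-separated query into (NM accessions, gene symbols), skipping blanks and repeats."""
    accessions: List[str] = []
    symbols: List[str] = []
    seen = set()
    for raw in text.split(","):
        entry = raw.strip()
        if not entry:
            continue  # tolerate stray or doubled commas
        is_accession = NM_ACCESSION.fullmatch(entry.upper()) is not None
        key = entry.upper() if is_accession else entry  # keep symbol case as typed
        if key in seen:
            continue
        seen.add(key)
        (accessions if is_accession else symbols).append(key)
    return accessions, symbols


print(parse_query("TP53, NM_000546, Trp53,, TP53"))
# (['NM_000546'], ['TP53', 'Trp53'])
```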
I have entered a RefSeq accession number but GenomeTraFaC returned no results.
Because we used a single mRNA accession number to download each genomic sequence, a search by accession number may return no results for genes that have alternative transcripts (and therefore multiple accession numbers). Try using the approved gene symbol instead.
I searched for the gene symbol p53 by sequence name, but no entries were found.
The HGNC approved symbol for human p53 is TP53, and the MGI approved symbol for mouse p53 is Trp53. As mentioned earlier, the current version supports searches with approved symbols only. In the next version, you should be able to search with aliases too. To find an approved symbol, use NCBI's LocusLink database (since superseded by NCBI Gene).
The start site and promoter region of the human gene map to the upstream or intronic region of the mouse gene.
This can result from incorrect exon annotation in one of the species (especially of the first exon) or from the presence of an alternative transcript. It should not pose a problem unless the first intron is larger than 40 kb, because for all genes in GenomeTraFaC we added 40 kb flanking regions at both the 5' and 3' ends.
What is the basis for the 40 kb flanking regions?
Earlier we used 10 kb flanking regions, but regulatory regions are known to occur farther upstream. We considered 40 kb a reasonable flanking region in which to search for potential conserved cis-element clusters. However, in some known cases regulatory regions occur as far as 100 kb upstream of the start site.
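You can check the effect of the flank size with simple coordinate arithmetic. The Python sketch below assumes 0-based, half-open coordinates, where `gene_start` and `gene_end` are the lowest and highest coordinates of the annotated gene. The same 40 kb is added on both sides, so the downloaded window does not depend on strand. The function names are hypothetical.

```python
from typing import Tuple

FLANK = 40_000  # bp added on each side of the gene


def downloaded_window(gene_start: int, gene_end: int, flank: int = FLANK) -> Tuple[int, int]:
    """Half-open genomic window retrieved for a gene: the gene plus `flank` bp on each side."""
    return max(0, gene_start - flank), gene_end + flank


def is_covered(pos: int, gene_start: int, gene_end: int, flank: int = FLANK) -> bool:
    """True if genomic position `pos` falls inside the downloaded window."""
    lo, hi = downloaded_window(gene_start, gene_end, flank)
    return lo <= pos < hi


# A plus-strand gene spanning 150,000-200,000: an element 30 kb upstream of the
# start site is covered, but one 100 kb upstream falls outside the window.
print(is_covered(120_000, 150_000, 200_000))  # True
print(is_covered(50_000, 150_000, 200_000))   # False
```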
In the TraFaC image, I can't find binding sites that have been experimentally validated or reported in the literature.
There are two likely reasons for this. First, the binding site may not be conserved between the orthologs. TraFaC shows only cis-elements that are conserved between two orthologous genes and that occur in a conserved sequence region. In such cases, try searching for binding sites in each sequence separately by clicking the TF BS graphs in the top frame of the Regulogram image. Second, the binding site may not be in the TRANSFAC library.
How can I copy the images (Regulogram and/or TraFaC image)?
If you are using a PC, right-click the image and use the Save As option to save it as a JPEG image. If you are using a Mac, drag the image to your desktop to save it. TFBS tables and exon tables can be copied and pasted into an Excel worksheet.