TraFaC (Transcription Factor Binding Site Comparison) is a web-based system developed by the Biomedical Informatics division of Cincinnati Children's Hospital Medical Center that identifies regulatory regions using a comparative sequence analysis approach. Specifically, it integrates analysis results from applications such as PipMaker (Schwartz et al., 2000), which computes alignments of similar regions in two DNA sequences, and MatInspector Professional (Quandt et al., 1995) and Match, which locate potential transcription factor (TF) binding sites. The result is a graphical depiction of conserved blocks of TF binding sites, or cis-elements, in the two sequences.
Submitting files for analysis
To submit files to TraFaC for analysis, you simply need to gather the required files and upload them to the database. Alternatively, you can email us the required files, and we will upload them for you.
After you submit your files for analysis, you can search them for cis-clusters and view results.
Required files
Sequence Files
The nucleotide sequence files must be in FASTA format.
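Before uploading, it can help to confirm that a sequence file at least looks like FASTA. The following is a minimal sketch, not part of TraFaC: the function name looks_like_fasta is hypothetical, and the check (a ">" header line followed by lines made up only of IUPAC nucleotide codes) is an assumption about what counts as acceptable input.

```python
def looks_like_fasta(text):
    """Rough check: a '>' header line followed by nucleotide-only lines."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        return False
    # IUPAC nucleotide codes in either case (assumed to be the accepted alphabet)
    allowed = set("ACGTUNRYKMSWBDHVacgtunrykmswbdhv")
    seq_lines = [ln for ln in lines if not ln.startswith(">")]
    # Require at least one sequence line, and only allowed characters in each
    return bool(seq_lines) and all(set(ln) <= allowed for ln in seq_lines)
```

A file that fails this check is worth inspecting by hand before it is submitted.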
RepeatMasker Output Files
The RepeatMasker web application is used to mask repeats in the sequences before they are aligned. Brief instructions for using RepeatMasker are given below; detailed instructions are available on the RepeatMasker web page.
- Use the Browse button to select the FASTA file created above, or copy the FASTA sequence into the text box.
- Select the "html" return format and the appropriate DNA Source, and press Submit Sequence.
- Save the Matches as a text file to your computer by selecting File>Save As from the main menu. Alternatively, you can copy and paste the Repeat Mask output into a text file.
- NOTE: Only the mask output is required. Do not save the masked sequence.
Exon Files
An Exon file is an optional text file providing the positions of transcriptional units in the first sequence. The directionality of a gene (< or >), its start and end positions, and its name should be on one line, followed by separate lines specifying the start and end positions of each exon. An optional line beginning with a "+" character can indicate the first and last nucleotides of the translated region (including the initiation codon, Met, and the stop codon). Blank lines are ignored. Exons must be specified in order of increasing address even if the gene is on the reverse strand (<). For example, the Exon file might begin as follows:
> 100 800 Gene 1
100 200
300 400
600 800
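The Exon file layout described above can be checked with a short script before upload. This is a minimal sketch under the rules stated in this section only; the function name parse_exon_file and the dictionary keys are hypothetical, and TraFaC's own parser may behave differently.

```python
def parse_exon_file(text):
    """Parse Exon-file text into a list of gene dictionaries."""
    genes = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        if line[0] in "<>":
            # Gene line: direction, start, end, then the gene name
            fields = line[1:].split(None, 2)
            genes.append({
                "strand": line[0],
                "start": int(fields[0]),
                "end": int(fields[1]),
                "name": fields[2] if len(fields) > 2 else "",
                "translated": None,
                "exons": [],
            })
            continue
        if not genes:
            raise ValueError("exon or '+' line found before any gene line")
        if line[0] == "+":
            # Optional line: first and last nucleotides of the translated region
            first, last = line[1:].split()[:2]
            genes[-1]["translated"] = (int(first), int(last))
        else:
            start, end = (int(x) for x in line.split()[:2])
            exons = genes[-1]["exons"]
            # Exons must appear in increasing order, even on the reverse strand
            if exons and start <= exons[-1][1]:
                raise ValueError("exons must be listed in increasing order")
            exons.append((start, end))
    return genes
```

Run on the example above, the sketch returns one gene named "Gene 1" on the forward strand with three exons.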
PipMaker Output Files
PipMaker computes alignments of similar regions in two DNA sequences. The resulting alignments are summarized with a Percent Identity Plot (PIP). PipMaker generates graphical output as a PDF document by default. For TraFaC, you need the BlastZ alignment file, the summary file and an optional PIP file. These are referred to as "text," "concise" and "pip" files, respectively, in the PipMaker output data that you receive by email. Here are brief instructions for using Advanced PipMaker. For detailed instructions, refer to the Advanced PipMaker instructions page.
- Follow the Advanced PipMaker instructions (the files you have created above can be used).
- Enter your email address.
- Press the Submit button.
- When the alignment is finished, you will receive an email containing multiple attachments. Of these output files, TraFaC needs only three:
- Concise Alignment file: This file has a summary of the sequence alignment information.
- Text file: This is a verbose file and is actually a BlastZ alignment file.
- PIP: a PDF file of the percent identity plot.
Store the first two files as text files.
MatInspector / Match Output Files
You can use either of these programs to detect transcription factor binding sites. MatInspector Professional and Match are tools that use a library of matrix descriptions for transcription factor binding sites to locate matches in sequences of unlimited length. To access either program, you must first register with it; registration is free. You can find the detailed instructions on these pages. Here are some basic instructions:
- Be certain to choose the appropriate matrix family, for example, vertebrates.
- Follow the default options for the rest of the parameters.
- When you have the output, save the MatInspector files as HTML files, but save Match files as text files.
Uploading files
Before you upload files to the TraFaC database, you need to obtain a login account by emailing the TraFaC administrator. You also need to gather all required files.
Once you have an account and all required files, you can upload the files by completing the following steps:
1. Go to the left-hand menu, and click Login:
2. Type your Username and Password, and click Login:
A greeting should appear, and the TraFaC home page should now include an option for Advanced Tools.
3. Under Advanced Tools, click Upload/Parse Sequence data:
The Multi-Upload screen appears:
4. For each field, click the Browse button to locate and select a file for uploading.
Notes:
- All files should be in text format except for the MatInspector files and PIP, which should be in HTML and PDF format, respectively.
- Always enter an accession number, or unique ID, unless you are only uploading MatInspector files.
- The exon file, repeat mask and the actual sequence file are optional. However, we advise that you upload all optional files so that you can comprehend your results more clearly.
5. When you have selected all files for uploading, click Upload. Uploading may take some time depending on your sequence size. An "Upload successful" message indicates that you are ready to view your results.
Note:
You can upload any of these file types separately, but remember to associate all of them with a unique accession number.
Searching the database
From the homepage of TraFaC, you have three options for searching the database:
- Cis-element clusters within BlastZ alignments - Finds conserved cis-clusters within BlastZ-identified conserved sequence alignment blocks. By clicking this link, you can visualize the alignment between an orthologous pair of genes (human and mouse sequences). Most importantly, you can view the common putative TF binding sites shared by the human and mouse genes in the context of the conserved regions. This utility is limited to a pairwise comparison of only those sequences for which the alignment data are present in the database. If you are interested in one or more genes not already present in the database, email us; if the gene has a RefSeq human and mouse mRNA entry, we will upload it to the database.
- Cis-element clusters between any gene pair - Finds cis-element clusters between user-selected gene segment pairs. By clicking this link, you can explore the genes for regulatory elements irrespective of the sequence similarity. The main advantage of choosing this option is that you can compare any gene with any other gene or known promoters/enhancers in the TraFaC database. If you are analyzing a group of co-expressed or coordinately regulated genes, this approach is recommended, especially when you know the transcription start site.
- Conserved Cis-element Scanner - Enables you to select one or more transcription factor binding sites and search all genes in the database for clusters containing the selected site(s). Within each cluster, you can view the exact position of each binding site.
Cis-element clusters within BlastZ alignments
To search for cis-element clusters within BlastZ aligned pairs, complete the following steps:
1. On the TraFaC homepage, under Basic Tools, click Cis-element clusters within BlastZ Alignments:
2. At the search screen, enter any of the following search criteria:
- Accession Number - any RefSeq GenBank accession number (starting with NM) can be used; multiple entries should be separated by commas
- Sequence Name - HUGO nomenclature only; multiple entries should be separated by commas
- Description - any one-word description (for example, repair, apoptosis, etc.); be sure to enter one term only
- Sequence Group - a group of related genes; for example, selecting "DNA Repair" displays all genes belonging to the DNA repair group that have the BlastZ alignment data entered into TraFaC.
Note:
At present, the list of genes under each group is not exhaustive. We are in the process of building up the database and adding more genes and more groups. If you are interested in any particular gene or group, please let us know so that we can add it to the database.
3. Click Search:
A table of search results appears in the lower half of the screen.
4. Select the check box next to each sequence you want to view:
5. Click Submit:
The BlastZ alignments page opens:
The first two columns show the sequence information for the human and mouse sequences.
The third column (Timestamp) indicates the date of entry.
From the last column (Action), you have three viewing options:
- View: clicking this option takes you to the local alignments page, which provides a summary view of the sequence alignment information
- Regulogram: clicking this option takes you to the regulogram page, which provides a cis-element hit density graph in the context of sequence similarity
- PIP: clicking this option takes you to a graphic display of the alignment image; this is a PDF file, so you need to have Adobe Reader installed to view it (available for free at http://www.adobe.com)
For more information about these and other views, see Choosing different views.
Cis-element clusters between any gene pair
To search for cis-element clusters in a gene segment pair of your choice, complete the following steps:
1. On the TraFaC homepage, under Basic Tools, click Cis-element clusters shared between any gene pair:
2. At the search screen, enter any of the following search criteria:
- Accession Number - any RefSeq GenBank accession number (starting with NM) can be used; multiple entries should be separated by commas
- Sequence Name - HUGO nomenclature only; multiple entries should be separated by commas
- Description - any one-word description (for example, repair, apoptosis, etc.); be sure to enter one term only
- Sequence Group - a group of related genes; for example, selecting "DNA Repair" displays all genes belonging to the DNA repair group that have the BlastZ alignment data entered into TraFaC.
Note: At present, the list of genes under each group is not exhaustive. We are in the process of building up the database and adding more genes and more groups. If you are interested in any particular gene or group, please let us know so that we can add it to the database.
3. Click Search:
A table of search results appears in the lower half of the screen.
4. Go to the table, and select your first sequence:
5. Click Submit:
6. Select your second sequence, and click Submit.
7. Use the TraFaC query page to modify any parameters you wish, as indicated below:
- Use the Sequence Filter to modify or change the sequence coordinates.
- Use the Matrix Filter to select which matrices to view and which to block; currently, up to four different matrices are supported.
- Use these options to adjust the image size and quality, to combine similar matrices and show them as a family, and to highlight regions of cis-element clusters within a selected base pair window.
8. When you are finished modifying parameters, click Submit.
An image depicting the shared transcription factor binding sites appears. We call this a TraFaC image. To learn more about this image and others, see Choosing different views.
Conserved cis-element scanner
To select one or more transcription factor binding sites and search all database genes for clusters containing the selected site(s), complete the following steps:
1. On the TraFaC homepage, under Basic Tools, click Conserved Cis-element Scanner:
2. Go to the top of the screen, and customize the search region if you wish. By default, the system searches the 10 kb region upstream of the first exon of all genes.
Example:
Typing 15000 for (a), selecting Downstream for (b), and selecting Last Exon for (c) instructs the system to search the region 15000 base pairs downstream of the last exon of all genes.
3. Go to the list of binding sites in the bottom section of the screen, and select one or more sites to include in your search:
4. Click Search:
The search results page appears:
- These columns identify the human and mouse sequences in which the selected binding sites were found.
- These columns indicate the location of each cis-cluster in the human and mouse sequences.
- This column lists all sites in the cis-cluster, including the sites you selected in step 3.
- Clicking this link takes you to a TraFaC image, or graph of the shared binding sites in the two sequences.
- Clicking this link takes you to a regulogram, or a cis-element hit density graph in the context of sequence similarity.
- By going to the first column of the table, selecting one or more cis-clusters and clicking Show Binding Site Positions, you can view the start and end positions of all binding sites in each cluster.
- By clicking Download, you can download the entire contents of the screen in Microsoft Excel format.
- By clicking Modify query, you can return to the search screen and modify your original search range and/or binding site selections. Or by clicking New Search, you can start over with a blank search screen.
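The search-region settings in step 2 (a length, an upstream or downstream direction, and the first or last exon as the anchor) translate into sequence coordinates roughly as sketched below. This is only an illustration: the function name search_region and its parameters are hypothetical, and the handling of reverse-strand genes (first exon at the highest coordinates, upstream toward higher coordinates) is an assumption, not a documented TraFaC rule.

```python
def search_region(exons, strand=">", length=10000,
                  direction="upstream", anchor="first"):
    """Return (start, end) of the region flanking the chosen exon.

    exons: list of (start, end) tuples in sequence coordinates.
    Defaults mirror the documented default: 10 kb upstream of the first exon.
    """
    exons = sorted(exons)
    forward = strand == ">"
    # Assumption: in transcription order, the first exon is the leftmost one
    # on the forward strand and the rightmost one on the reverse strand.
    use_leftmost = (anchor == "first") == forward
    exon = exons[0] if use_leftmost else exons[-1]
    # Assumption: upstream means lower coordinates on the forward strand
    # and higher coordinates on the reverse strand.
    toward_lower = (direction == "upstream") == forward
    if toward_lower:
        return max(1, exon[0] - length), exon[0] - 1
    return exon[1] + 1, exon[1] + length
```

With these assumptions, the example in step 2 (15000, Downstream, Last Exon) selects the 15000 bases immediately after the end of the last exon of a forward-strand gene.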
Choosing different views
As you search the database, you can view the data in several different ways, depending on which search option you chose. Click the following views to learn more about them:
Local alignment view
The Local Alignment page consists of the following table, which is essentially a summary of the alignment results of the orthologous (mostly human and mouse) genomic sequences. It is based upon the BlastZ sequence alignment uploaded to the TraFaC database.
- By going to the top of the page and clicking PIP, you will go to a display of the Percent Identity Plot (PIP), a graphical alignment image of the human and mouse sequences. The PIP is generated using the PipMaker software and is subsequently uploaded to the TraFaC database.
- By going to the last column and clicking View, you will go to the Concise alignment view page. This page is the same as the local alignment view page but also displays the shared TF binding sites as a number of "hits," either by matrix family or by individual matrix.
Concise alignment view
The Concise Alignment page consists of the following table, which is essentially the same as the Local alignment view page but has additional columns that display the shared TF binding sites ("Hits"), either by matrix family or by individual matrix.
- By going to the top of the page and clicking PIP, you will go to a display of the Percent Identity Plot (PIP), a graphical alignment image of the human and mouse sequences. The PIP is generated using the PipMaker software and is subsequently uploaded to the TraFaC database.
- The numbers in the Hits columns indicate the number of shared TF binding sites in a window of 200 base pairs between the two sequences that are compared. By clicking these numbers, you will go to the TraFaC image (Shared Transcription Factor Binding Sites Image), a graphical display of the shared TF binding sites between the two sequences that are compared.
- By going to the last column and clicking View, you will go to the Alignment Blocks page, which depicts the actual sequence alignment.
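To make the idea of a "hit" concrete, the sketch below counts binding sites whose matrix occurs in corresponding windows of both sequences. It is an illustration only: the function name shared_hits, the (matrix name, start position) input format, and the rule that a matrix counts once per matched occurrence are all assumptions; TraFaC's actual counting rule is not described here.

```python
from collections import Counter

def shared_hits(sites_a, sites_b, window_a, window_b):
    """Count binding sites shared by two sequences in corresponding windows.

    sites_a, sites_b: lists of (matrix_name, start_position) tuples.
    window_a, window_b: inclusive (start, end) coordinates, e.g. 200 bp each.
    """
    def names_in(sites, window):
        lo, hi = window
        return Counter(name for name, pos in sites if lo <= pos <= hi)

    a = names_in(sites_a, window_a)
    b = names_in(sites_b, window_b)
    # Counter intersection keeps, per matrix, the smaller of the two counts
    return sum((a & b).values())
```

For example, if one 200 bp window holds two SP1 sites and one AP1 site, and the matching window in the other sequence holds one SP1, one AP1 and one MYOD site, the sketch reports two shared hits.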
TraFaC image
The shared transcription factor binding sites image, or TraFaC Image, indicates the TF binding sites occurring in both sequences. Here is an example, along with numbered annotations explaining its parts:
- The two gray vertical bars are the two genes that are compared. The numbers represent the nucleotide positions with respect to the sequences used.
- The TF binding sites occurring in both genes are highlighted as colored bars drawn across the two genes. Click the image to zoom in on a site of interest. The TraFaC image can be viewed by individual TF binding site matrix or by matrix family.
- Indicates the names of the TF matrices. Click them to learn more about them. Note that these links work only if you have an account with Genomatix (http://www.genomatix.de).
- A table describing the putative sites displayed in the image. For each site, the start and end positions are listed along with the sequence string.
- Click Show Only Parallel Sites to display "ordered hits." Ordered hits limit the shared cis-elements to only those that are positionally conserved (or are almost evenly spread and equidistant in both orthologous genes). This feature helps filter out cluttered or complex regions. Cis-clusters whose constituent cis-elements occur in parallel in the orthologous genes frequently tend to be involved in regulatory function.
- Click Show Query Parameters to view or modify the query parameters.
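One way to picture "ordered hits" is as the largest set of shared sites whose left-to-right order is the same in both genes. The sketch below implements that reading with a longest-increasing-subsequence search. This is a swapped-in technique chosen for illustration, not TraFaC's documented algorithm, and the function name ordered_hits and the (position in gene A, position in gene B) input format are hypothetical.

```python
def ordered_hits(pairs):
    """Return the largest subset of shared sites in the same order in both genes.

    pairs: list of (pos_a, pos_b) tuples, one per shared binding site.
    """
    pairs = sorted(pairs)
    n = len(pairs)
    if n == 0:
        return []
    best = [1] * n    # length of the best ordered chain ending at site i
    prev = [-1] * n   # previous site in that chain, or -1 if none
    for i in range(n):
        for j in range(i):
            in_order = pairs[j][0] < pairs[i][0] and pairs[j][1] < pairs[i][1]
            if in_order and best[j] + 1 > best[i]:
                best[i] = best[j] + 1
                prev[i] = j
    # Walk back from the end of the longest chain
    i = max(range(n), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(pairs[i])
        i = prev[i]
    return chain[::-1]
```

Sites dropped by this filter are those that cross over other shared sites when lines are drawn between the two genes, which is what makes a region look cluttered.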
Regulogram
The Regulogram, or Cis-element Hit Density Image, depicts a moving-window average of the number of shared cis-elements occurring in phylogenetically conserved regions. Here is an example, along with numbered annotations explaining its parts:
- The grey horizontal bars are the nucleotide sequences of the two orthologous genes compared. The numbers represent the coordinates of the sequences used. The red bars are the exons.
- The green blocks plotted parallel to the genomic sequences are the repeat regions identified by RepeatMasker.
- The different-colored polygons stretching from one sequence to the other indicate the sequence similarity regions between the two genes.
- The Hits scale on the lower-left side refers to the number of shared cis-elements between the two sequences occurring in a sequence-conserved region.
- The TF BS Freq in the upper half of the left side refers to the frequency of binding sites in each sequence separately.
- Percent Identical refers to the percent similarity between the two sequences.
- To view hits by matrix family rather than by individual transcription factor binding site matrix, select Combine unordered same-family matrices, and click Refresh.
- To modify the default size of 850 X 412, type a different value for the width, and click Refresh.
- To zoom in for more clarity, select the radio button next to the Zoom drop-down menu, select a different value from the drop-down menu (by default, 10x magnification is selected), and click the image window.
- To look at the actual TF binding sites (constituent elements of hits), select the radio button next to the drop-down menu to the top-left of the regulogram, select a value other than the default window size of 200 bp if you wish, and click any point on the hits graph. The TraFaC image for this point is displayed.
- A new feature we have added is the option to plot "ordered hits." Ordered hits limit the shared cis-elements to only those that are positionally conserved (or are almost evenly spread and equidistant in both orthologous genes). This feature helps filter out cluttered or complex regions. Cis-clusters whose constituent cis-elements occur in parallel in the orthologous genes frequently tend to be involved in regulatory function.
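The moving-window average that the regulogram plots can be illustrated with a few lines of code. This is a generic sketch of a moving average over a list of hit counts; the function name moving_average is hypothetical, and how TraFaC bins positions or steps the window is not specified here.

```python
def moving_average(values, window):
    """Average each run of `window` consecutive values, sliding by one."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    averages = [total / window]
    for i in range(window, len(values)):
        # Slide the window: add the new value, drop the oldest one
        total += values[i] - values[i - window]
        averages.append(total / window)
    return averages
```

A wider window smooths the hit density curve, while a narrower one preserves sharp peaks at individual clusters.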
References
Jegga, A.G., Sherwood, S.P., Carman, J.W., Pinski, A.T., Phillips, J.L., Pestian, J.P. and Aronow, B.J. (2002) Detection and visualization of compositionally similar cis-regulatory element clusters in orthologous and coordinately controlled genes. Genome Res. 12, 1408-1417.
Pruitt, K.D., Tatusova, T. and Maglott, D.R. (2003) NCBI Reference Sequence project: update and current status. Nucleic Acids Res. 31, 34-37.
Quandt, K., Frech, K., Karas, H., Wingender, E., and Werner, T. (1995) MatInd and MatInspector: New fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Res. 23, 4878-4884.
Schwartz, S., Zhang, Z., Frazer, K.A., Smit, A., Riemer, C., Bouck, J., Gibbs, R., Hardison, R., and Miller, W. (2000) PipMaker-A web server for aligning two genomic DNA sequences. Genome Res. 10, 577-586.