Introduction
The Catalog of Inferred Sequence Binding Preferences (CIS-BP) is a library of transcription factor (TF) DNA binding motifs and specificities. The data are organized in a user-friendly manner for ease of searching, browsing, and downloading. CIS-BP also includes built-in web tools for scanning DNA sequences for putative TF binding sites, predicting the DNA binding motif of a given TF, and identifying a TF that might recognize a given DNA motif.
Searching or browsing for TFs
Searching and browsing capability is available for users interested in a specific TF, organism, data source, or TF family. To search for a specific TF by name or identifier, enter the search string into the box at the top of the home page labeled "Search for a TF by identifier", and press the GO! button. Wildcards (denoted as '*') are accepted, and the search is case insensitive. For example, a search for "hox*" will return all TFs whose name begins with "hox", in any organism. A spreadsheet file containing the search/browse results can be obtained by clicking on "Download excel spreadsheet (csv text format)" at the top of the page.
Searches can be restricted by using the pull-down menus under the text search box. For example, all mouse bZIP family TFs whose names start with "cebp" can be found by entering "cebp*" in the search box, selecting "Mus_musculus" under the "Species" pull-down menu, and selecting "bZIP" under the "Domain Type" pull-down menu. To browse all mouse bZIP family TFs, simply remove the "cebp*" search string from the search box.
The "Motif evidence" pull-down menu offers several options for restricting the search to, or browsing, TFs with a specific motif evidence status. The motif evidence status indicates how the motif for a given TF was determined. "Direct" indicates that the motif was directly determined for the TF using an experimental assay. "Inferred" indicates that the motif was determined indirectly, by inferring it from a TF with a similar DNA binding domain (DBD). For example, the mouse Gata4 TF has a motif that has been directly determined using a Protein Binding Microarray (PBM) assay, so its motif status is "Direct". The zebrafish (Danio rerio) gata4 TF has not had its motif directly determined, but its DNA binding domain is 98.6% identical to that of the mouse Gata4 TF, so its motif can be "inferred" to be similar to the directly determined mouse motif.
Inference thresholds were determined separately for each TF family, based on percent DBD identity, as described in our publication (Weirauch et al., Cell 2014).
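The inference rule described above can be illustrated with a minimal Python sketch. It assumes the two DBD sequences have already been aligned to equal length (with "-" marking gaps) and that a per-family threshold is known. The function names, the identity definition (matches over columns where neither sequence has a gap), the toy sequences, and the threshold value of 70 are all illustrative assumptions, not the procedure or values used by CIS-BP; see Weirauch et al., Cell 2014 for the actual method.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned, ungapped columns with identical residues.

    Assumption: both inputs come from the same alignment, so they have
    equal length and use '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def can_infer_motif(identity: float, family_threshold: float) -> bool:
    """True if the identity meets a (hypothetical) per-family threshold.

    Whether the real comparison is '>' or '>=' is an assumption here.
    """
    return identity >= family_threshold


# Toy example with made-up aligned fragments and a made-up threshold.
identity = percent_identity("CRNCGATK", "CRNCGSTK")
print(identity, can_infer_motif(identity, family_threshold=70.0))
```

In this toy case seven of eight aligned positions match, so the identity is 87.5% and the motif would be treated as inferable under the assumed threshold.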
TF pages
Each of the 160,000+ TFs contained in CIS-BP has its own page, which can be reached using the search and browse capabilities discussed above. At the top of each TF page is the name, organism, and TF family for the given TF. Each TF page is divided into several different sections, which are outlined below.
The "TF information" section provides basic information about the TF, and links to external databases. Clicking on the "Pfam ID" or "InterPro ID" links opens a new window for the corresponding domain database. Clicking on the "Gene ID" opens a link to the corresponding organism's genomic database (e.g., SGD for Saccharomyces cerevisiae, WormBase for Caenorhabditis elegans). Clicking on the "Sequence source" opens a link to the database from which the given TF's amino acid sequence was obtained. A link to the AnimalTF database is also provided for metazoan TFs.
Directly determined binding motifs
This section contains information about the DNA binding motif(s) that have been directly experimentally determined for the given TF. Sequence logos are displayed that summarize the binding preferences for the given TF (in forward and reverse orientations). Clicking on a sequence logo provides a popup window with the corresponding position frequency matrix (PFM). Under "Type/Study/Study ID", information is provided about the technology used to generate the motif (e.g., PBM, HT-SELEX, ChIP-seq). A link is also provided to PubMed for the publication from which the data were obtained, along with the ID used in the study. Note: many motifs derived from Transfac require a license; hence, we do not provide these motifs, and instead indicate that a "Transfac license is required."
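As a rough illustration of how the forward and reverse orientations of a motif relate, the Python sketch below reverse-complements a PFM and reads off a simple consensus. It assumes the PFM is held in memory as a list of rows, one per motif position, with columns in A, C, G, T order. That layout, the function names, and the toy matrix are assumptions made for this example and are not necessarily the format of the PFM files provided by CIS-BP.

```python
from typing import List

BASES = "ACGT"  # assumed column order of each PFM row


def reverse_complement_pfm(pfm: List[List[float]]) -> List[List[float]]:
    """Return the PFM for the opposite strand.

    Positions are reversed, and within each row A<->T and C<->G are
    swapped. With columns ordered A, C, G, T, that swap is the same as
    reversing the row.
    """
    return [row[::-1] for row in reversed(pfm)]


def consensus(pfm: List[List[float]]) -> str:
    """Most frequent base at each position (ties go to the first base)."""
    return "".join(BASES[row.index(max(row))] for row in pfm)


# Toy three-position motif (made-up frequencies).
pfm = [
    [0.7, 0.1, 0.1, 0.1],  # mostly A
    [0.1, 0.1, 0.7, 0.1],  # mostly G
    [0.1, 0.1, 0.1, 0.7],  # mostly T
]
print(consensus(pfm))                          # AGT
print(consensus(reverse_complement_pfm(pfm)))  # ACT
```

Applying the reverse complement twice returns the original matrix, which is a quick sanity check on any PFM handling code.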
Motifs from related TFs
This section provides motifs obtained for related TFs (i.e., TFs with DNA binding domains that are similar to that of the given TF). The format is similar to that of the "Directly determined binding motifs" section, with a few differences. First, clicking on the name of a TF takes the user directly to the CIS-BP page for that TF. Second, the final column contains values indicating the degree of similarity of the corresponding TF to the current TF. A value of 1 means that the DNA binding domain of the corresponding TF is identical in amino acid sequence to that of the current TF (based on Clustal Omega alignments within each TF family; see Weirauch et al., Cell 2014 for more details). Different TF families have different identity thresholds for consideration as an inferred motif; the threshold for the corresponding family is indicated at the bottom of this section, and only TFs exceeding this threshold are displayed.
This section provides information about the DNA binding domain(s) of the corresponding experimental construct used to assay the TF’s binding specificity (when known). At the top, a schematic indicates the location of each DNA binding domain within each construct. Below, a table indicates the location of each domain, along with its corresponding amino acid sequence. Clicking on a “Motif ID” provides the full amino acid sequence of the construct.
DNA Binding Domains
This section provides information about the DNA binding domain(s) of the corresponding TF protein isoform. At the top, a schematic indicates the location of each DNA binding domain within each isoform of the corresponding TF. Below, a table indicates the location of each domain, along with its corresponding amino acid sequence. Clicking on a “Protein ID” provides the full amino acid sequence of the protein.
This section provides links to other TFs from the same organism, or from the same TF family.
This section shows all related TFs across all organisms, regardless of their motif. The “motif evidence” section indicates if the corresponding TF has a Direct or Inferred motif, or None.
Bulk downloads
The bulk downloads section can be reached via the left navigation toolbar. Pre-compiled .zip files are available containing bulk downloads of various subsets of the data (and the entire dataset). Users can obtain all data for a specific organism or TF family, including sequence logos (in .png format), E- and Z-scores (as tab-delimited text files), PBM probe intensities (as tab-delimited text files), Position Frequency Matrices (text files), and TF information (see the "TF download cart" section below for more information). We also provide raw MySQL table dumps.
TF download cart
Throughout CIS-BP, you will find buttons for adding TFs to your cart. The CIS-BP cart acts in a similar manner to popular shopping websites such as Amazon, allowing the user to browse and search for TFs and add interesting TFs to the cart for later use. TFs can be added to (or removed from) the cart individually, or in groups (depending on the corresponding button). At any time, the user can view the contents of the cart by clicking on the "View cart" button in the left navigation window. The cart contains information on its current contents, as well as links to the individual TF pages. The cart can be emptied by clicking on the "Remove all TFs from the cart" link at the top. Data for the TFs currently in the cart can be obtained by clicking on the "Download TFs in cart" link. Doing so opens a page allowing the user to download information such as sequence logos (in .png format), E- and Z-scores (which provide comprehensive scores for all possible 8-base sequences and are available only for PBM data), Position Frequency Matrices (in simple text format), and information about the corresponding TFs (tab-delimited text format). Clicking on "Download Archive" initiates the downloading of a zipped archive containing the relevant files. Be aware that E- and Z-score files are large, and hence might take a while to download when many TFs are contained in the cart.
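To give a sense of why the E- and Z-score files are large, the short Python sketch below counts all possible 8-base sequences. It also counts how many remain if each sequence is merged with its reverse complement, which is a common convention for double-stranded DNA; whether the CIS-BP score files use that convention is an assumption here, and the function names are illustrative only.

```python
from itertools import product

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def all_kmers(k: int) -> list:
    """Every DNA sequence of length k, in lexicographic order."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


kmers = all_kmers(8)
# Keep one representative per sequence/reverse-complement pair.
canonical = {min(s, reverse_complement(s)) for s in kmers}

print(len(kmers))      # 65536 (4 ** 8)
print(len(canonical))  # 32896 after merging reverse complements
```

With one score per 8-base sequence per experiment, each additional TF in the cart adds tens of thousands of values to the download.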
Tools
Scan a single sequence for TF binding
This tool allows the user to input a DNA sequence (or sequences) in multiple formats and scan for putative TF binding sites (on both strands) for any organism, using one of three different scoring systems. Accepted input formats (max 8000 base limit):