Mercator4 - plant protein functional annotation

optionally, verify the validity of the FASTA-format of the sequences by using the online FASTA validator
upload a file with the protein or cDNA sequences in a valid FASTA-format (for cDNA sequences all six reading frames will be tested)
specify the type of sequence, protein or cDNA sequence
optionally, add an additional protein annotation from the results of a Blast sequence comparison to the Swiss-Prot database
optionally, provide a job name and an email address to which the annotation results will be sent

list with simple statistics on how many of the protein sequences were successfully categorized

Submitted sequences	total number of user-submitted sequences
Classified sequences (C)	protein sequences put into Mercator4 protein categories
Annotated sequences (A)	sum of (C) and the sequences for which a Swiss-Prot annotation is available (only relevant if the Swiss-Prot annotation option has been selected)
Occupied Mercator4 categories (O)	occupied Mercator4 protein categories for which the user-submitted set contains matching sequences
Mercator4 categories available	total number of the currently available Mercator4 protein categories
Expected Length sequences (EL)	sequences (C) whose length lies within the interquartile range (IQR) of category-specific reference lengths
Short Length sequences (SL)	sequences (C) whose length is below the IQR of category-specific reference lengths
Long Length sequences (LL)	sequences (C) whose length is above the IQR of category-specific reference lengths
Summary	summary of the results in a nutshell (data in per cent)
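The three length classes (EL, SL, LL) can be illustrated with a short sketch. It assumes that the IQR spans the first to the third quartile of the reference lengths and that both boundaries count as "within"; how Mercator4 computes the quartiles is not stated here. The function name `classify_length` and the sample lengths are hypothetical.

```python
from statistics import quantiles


def classify_length(length, reference_lengths):
    """Return 'EL', 'SL' or 'LL' for one classified sequence.

    Assumption: the IQR is the range from the first to the third
    quartile of the category-specific reference lengths, computed
    with the default ('exclusive') method of statistics.quantiles.
    """
    q1, _median, q3 = quantiles(reference_lengths, n=4)
    if length < q1:
        return "SL"  # shorter than the IQR
    if length > q3:
        return "LL"  # longer than the IQR
    return "EL"      # within the IQR (boundaries included)


# Hypothetical reference lengths for a single protein category
reference = [300, 310, 320, 330, 340, 350, 360]
for length in (305, 330, 400):
    print(length, classify_length(length, reference))
```

For the sample data the quartiles are 310 and 350, so a length of 330 falls into EL, 305 into SL and 400 into LL.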

bar chart summarising the protein assignments across the top-level context descriptions
Each bar represents a top-level context description and the percentage of its protein categories occupied by at least one protein from the submitted protein sequences.
bar chart displaying the distribution of protein lengths based on the differences to category-specific reference lengths
Each bar represents the number of proteins having a certain length difference to the median of reference lengths of the corresponding Mercator4 category.
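The percentage shown by each bar of the first chart can be sketched as follows. The sketch assumes that category identifiers are dot-separated hierarchical numbers whose first element is the top-level context; the function name `occupancy_per_top_level` and the identifiers in the example are hypothetical.

```python
from collections import defaultdict


def occupancy_per_top_level(all_categories, occupied_categories):
    """Percentage of protein categories occupied per top-level context.

    all_categories: identifiers of every available protein category
    occupied_categories: identifiers with at least one submitted protein
    """
    occupied_set = set(occupied_categories)
    total = defaultdict(int)
    occupied = defaultdict(int)
    for code in all_categories:
        top = code.split(".")[0]  # assumed: first number = top-level context
        total[top] += 1
        if code in occupied_set:
            occupied[top] += 1
    return {top: 100.0 * occupied[top] / total[top] for top in total}


# Hypothetical category identifiers
available = ["1.1.1", "1.1.2", "1.2.1", "1.2.2", "2.1.1", "2.1.2"]
hit = ["1.1.1", "2.1.1", "2.1.2"]
print(occupancy_per_top_level(available, hit))
```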

download of protein annotation files for further processing on your local computer

mercator4_result_data_fasta.zip	FASTA-format that contains the protein annotations
mercator4_result_data.zip	specific tabular format that is required for the and the MapMan desktop application

Visualize result in tree viewer

The TreeViewer shows the protein categorization visualized as hierarchical tree with annotation context descriptions as branch nodes and protein categories as leaf nodes.

click on Result Tree Viewer
select a user job to display the results in the tree structure
optionally, add pre-evaluated protein annotations from one or more of the listed reference plant species
click on the button Show checked data on tree

Expanding the tree diagram at a location of interest displays the protein count per species per category. A mouse-over on a count tab pops up the individual names of the categorized proteins.
The color of the square in front of a protein description indicates the encoding genome of the corresponding protein

encoded by the nuclear genome
encoded by the plastidial genome in most or all plant clades
encoded by the mitochondrial genome in most or all plant clades

Visualize result in heatmap viewer

The HeatmapViewer displays the comparison of two protein sets with protein categories as spots colored according to the comparison outcome.

click on Result Heatmap Viewer
select a data set proteome A (user jobs will be displayed at the bottom of the drop down list)
select a second data set proteome B for comparison from the second drop down list
click on the button Show protein comparison on heatmap

A mouse-over on a spot pops up the description of the protein category and its context.
The color of a spot indicates whether the protein category is present in one or both protein sets and whether one of the protein sets has more or fewer proteins assigned to that protein category.

not present in either proteome (default state before selecting the protein sets)
present only in proteome A
present only in proteome B
- both proteomes contain the same number of paralogs
- proteome B contains 1 paralog more than proteome A
- proteome B contains 1 paralog less than proteome A
- proteome B contains 2+ paralogs more than proteome A
- proteome B contains 2+ paralogs less than proteome A
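The comparison outcomes listed above can be expressed as a small decision function. This is a sketch only: it assumes that the per-category protein counts of both proteomes are already known and that the "present only in" states take precedence over the paralog differences. The function name `compare_category` is hypothetical and the colors used by the viewer are not modelled.

```python
def compare_category(count_a, count_b):
    """Describe how one protein category compares between proteome A and B."""
    if count_a == 0 and count_b == 0:
        return "not present in either proteome"
    if count_b == 0:
        return "present only in proteome A"
    if count_a == 0:
        return "present only in proteome B"
    diff = count_b - count_a
    if diff == 0:
        return "both proteomes contain the same number of paralogs"
    amount = "1 paralog" if abs(diff) == 1 else "2+ paralogs"
    direction = "more" if diff > 0 else "less"
    return f"proteome B contains {amount} {direction} than proteome A"


print(compare_category(2, 3))
print(compare_category(5, 1))
```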

In addition, a green or blue background-color of a spot indicates a non-nuclear encoding of the corresponding protein

encoded by the plastidial genome in most or all plant clades
encoded by the mitochondrial genome in most or all plant clades

The HeatmapViewer creates diagrams in Scalable Vector Graphics (SVG) format, which can conveniently be downloaded with a browser that has an add-on for SVG export installed (for example the add-on SVG Export).

Online legacy protein annotation tool

Mercator version 3.6 is an older release of the context-based annotation approach, based on a different annotation framework (see publication). The Mercator version 3.6 online tool is still available but is no longer actively maintained.

Publication

Lohse et al. (2014)
Mercator: a fast and simple web server for genome scale functional annotation of plant sequence data
Plant Cell Environ. 2014 May, 37(5): 1250-1258.

Online legacy protein annotation tool

Although it is recommended to use the latest version of Mercator4, it is possible to submit sequences to legacy versions. Please note that the available older Mercator4 versions no longer support the online tools and .

Online enrichment analysis of protein categories

The online Mercator4 enrichment analysis identifies protein classes that are over- or under-represented within the full set of Mercator4 protein categories (BINs). The method uses statistical approaches to identify significantly enriched or depleted groups of protein categories.

upload the Mercator4 mapping result file that maps protein sequences to Mercator4 protein categories (BINs)
select the type of the Fisher's exact test
- one-sided exact test - specify also the over-representation (enrichment) or the under-representation analysis (depletion)
- two-sided exact test to perform both the over- and under-representation analysis simultaneously
set the significance threshold by entering the False Discovery Rate (FDR)-adjusted p-value (e.g. 0.05) to specify the tolerated ratio of the false positive results to the total positive results
define the input lists
- Genes of Interest: enter the list of identifiers you want to analyze (e.g., your Differentially Expressed Genes / DEGs)
- Background Genes: enter the list of identifiers which represent the entire possibility space of your experiment
  - include ALL genes that were detectable/measurable in your experiment (e.g., all genes remaining after filtering for low counts)
  - ensure that the "Genes of Interest" are also included in the "Background Genes" list
  - avoid using only the non-differentially expressed genes
  - don't use the entire genome if many genes were not expressed/detectable in the tissue examined or under the given conditions
  - example RNA-Seq experiment: list of all genes that had sufficient read counts to be tested for differential expression (detected genes), not just the non-significant ones
- input format requirements
  - one gene identifier per line
  - gene identifiers have to be identical to the identifiers used in the Mercator4 mapping result file
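The following sketch illustrates the kind of calculation behind such an analysis for a single protein category. It is not the implementation used by the online tool: the one-sided Fisher's exact test is written out as a hypergeometric tail sum, and the Benjamini-Hochberg procedure is assumed as the FDR adjustment because the text does not name the method. All function names are hypothetical.

```python
from math import comb


def check_lists(genes_of_interest, background_genes):
    """Raise ValueError if a gene of interest is missing from the background."""
    missing = set(genes_of_interest) - set(background_genes)
    if missing:
        raise ValueError(f"not in background list: {sorted(missing)}")


def over_representation_p(k, n, K, N):
    """One-sided Fisher's exact test for over-representation.

    k: genes of interest assigned to the category
    n: genes of interest in total
    K: background genes assigned to the category
    N: background genes in total
    Returns P(X >= k) under the hypergeometric distribution.
    """
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)
    )
    return favourable / comb(N, n)


def benjamini_hochberg(p_values):
    """Return FDR-adjusted p-values (assumed method), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical numbers: 3 of 4 genes of interest fall into a category
# that holds 5 of the 20 background genes.
print(over_representation_p(3, 4, 5, 20))
```

A category would be reported as enriched when its adjusted p-value is at or below the chosen threshold (e.g. 0.05). Calling `check_lists` first catches the common mistake of a background list that does not contain the genes of interest.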

Result enrichment analysis

When an enrichment analysis finishes, the results are displayed.

Result Table displays the Mercator4 protein categories found to be enriched or depleted along with a description. Click on the Download CSV button to download the table.
MapMan BIN Chart visualizes the information from the table as an interactive BIN tree diagram with nodes colored according to the enrichment result
- yellow nodes indicate over- or under-representation (two-sided test)
- green nodes indicate over-representation (one-sided test)
- red nodes indicate under-representation (one-sided test)
- a click on a node toggles between expansion and collapsing of the subtree
- diagrams are created in Scalable Vector Graphics (SVG) format, which can conveniently be downloaded with a browser that has an add-on for SVG export installed (for example the add-on SVG Export)
Enrichment Visualisation provides a graphical summary of the most significant categories as a bubble plot.
- X-Axis represents the magnitude of enrichment for each category
- Y-Axis displays the specific Mercator4 category identifiers and descriptions
- size of a bubble is proportional to the number of genes of interest assigned to that specific category (count)
- color gradient indicates the statistical significance level (FDR-adjusted p-value) colored using a heatmap red-to-blue color scheme

Online validation of FASTA-formatted sequences

The FASTA-format is a text-based format for representing protein or nucleotide sequences. The FASTA validator allows users to test the FASTA-format of a sequence file before submitting it to Mercator4. Each record in the FASTA-formatted file will be validated and all records not supported by Mercator4 will be listed. Optionally, the user can check the Create Mercator4-valid FASTA file checkbox and download a Mercator4-valid version of the file with all records containing errors removed.

Requirements for a FASTA-formatted file

General requirements for a FASTA-formatted file

each entry in the file starts with > followed by the name of the record
the record name must be unique within the file
the maximal sequence length allowed is 25000 characters
the file must not contain a mix of nucleotide and protein sequences

Single-letter codes for protein sequences supported in Mercator4

A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y,U,O,B,J,Z
X for 'any'
* for 'stop'
- for 'gap'

Single-letter codes for nucleotide sequences supported in Mercator4

A,C,G,T,U
R,Y,S,W,K,M,B,D,H,V for certain ambiguous assignments
N for 'any'
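The requirements above can be checked locally before uploading a file. The sketch below is not the online FASTA validator. It makes several assumptions: the record name is the full text after ">", letters are compared case-insensitively, and the caller chooses the alphabet that matches the sequence type. The protein alphabet includes Q (glutamine) as one of the 20 standard amino acids. All names are hypothetical.

```python
PROTEIN_CODES = set("ACDEFGHIKLMNPQRSTVWYUOBJZX*-")
NUCLEOTIDE_CODES = set("ACGTURYSWKMBDHVN")
MAX_LENGTH = 25000  # maximal sequence length stated in the requirements


def parse_fasta(text):
    """Yield (name, sequence) pairs; lines before the first '>' are ignored."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line.upper())  # assumption: case is not significant
    if name is not None:
        yield name, "".join(chunks)


def validate(text, alphabet):
    """Return a list of (record name, problem) tuples; empty if all is fine."""
    problems, seen = [], set()
    for name, seq in parse_fasta(text):
        if name in seen:
            problems.append((name, "duplicate record name"))
        seen.add(name)
        if len(seq) > MAX_LENGTH:
            problems.append((name, "sequence longer than 25000 characters"))
        invalid = set(seq) - alphabet
        if invalid:
            problems.append(
                (name, "unsupported characters: " + "".join(sorted(invalid)))
            )
    return problems


example = ">a\nMKT*\n>b\nMK1\n>a\nMKT\n"
print(validate(example, PROTEIN_CODES))
```

In the example, record "b" is reported for the unsupported character "1" and the second record "a" for its duplicate name.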

Mercator4 - an online protein annotation tool

Mercator4 is an online tool to assign functional annotations to protein sequences of land plants (including flowering plants, ferns, horsetails, mosses, liverworts, and hornworts). Mercator4 can also annotate highly conserved proteins among the green algae groups of Archaeplastida. The results from user-submitted protein sequences can be visualized online and/or downloaded for further analysis.

The Mercator4 functional annotations are designed as a hierarchical framework (Mercator4/Mapman4 framework) that mainly describes the functional context of a protein, with each child node term being more specialised than its parent node term. The framework currently has 29 core categories (1 Photosynthesis to 29 Plant organogenesis) and the non-core category 50 Uncharacterised context. The protein sequences in all these categories are conserved in many land plant clades. An additional category 30 Clade-specific metabolism contains poorly conserved proteins that are specific to certain plant families (currently only available for Brassicaceae and Rubiaceae). Protein sequences are assigned to the protein categories at the leaf-level of the hierarchy, but the actual Mercator4 annotation is based on the complete hierarchical path, including all levels.

A protein's context and category are depicted as a hierarchical number. The first number of the hierarchy refers to one of the top-level categories. In an average land plant proteome, approximately 70% of the predicted protein sequences can be categorized by Mercator4 (version 8).
Protein sequences which cannot be categorized by Mercator4 are assigned by default to the pseudo-category 99 no Mercator4 annotation. Optionally, for these proteins, it is possible to check (by using a simple BLAST local alignment) whether a similar protein is contained in the reviewed Swiss-Prot protein set (Swiss-Prot dataset of Viridiplantae proteins). If the search is successful, the proteins are assigned to the pseudo-category 99.1 no Mercator4 annotation.other annotation available.
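When processing downloaded results, the hierarchical number can be split into its levels. The sketch below assumes a dot-separated notation (as in 99.1) and groups the top-level numbers as described in this section; the function names and the example number 1.2.3 are hypothetical.

```python
def hierarchy_path(code):
    """Return every level on the path from the top-level category to the leaf."""
    parts = code.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def top_level_kind(code):
    """Classify a hierarchical number by its first (top-level) number."""
    top = int(code.split(".")[0])
    if 1 <= top <= 29:
        return "core category"
    if top == 30:
        return "clade-specific metabolism"
    if top == 50:
        return "uncharacterised context"
    if top == 99:
        return "no Mercator4 annotation"
    return "unknown"


print(hierarchy_path("1.2.3"))   # ['1', '1.2', '1.2.3']
print(top_level_kind("99.1"))    # no Mercator4 annotation
```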

Protein function annotation results

When a Mercator4 job finishes, an overview of the results is displayed as

list that gives simple statistics on how many of the protein sequences were successfully categorized
bar chart that summarises the protein assignments across the top-level context descriptions
bar chart that displays the distribution of protein lengths based on the differences to category-specific reference lengths (each reference length has been evaluated from the median length of matching proteins from ~250 or more land plant species)

The Mercator4 protein annotation results can be downloaded for

any further processing on your local computer ('mercator4_result.zip' and 'mercator4_result_data_fasta.zip')
usage in the MapMan desktop application ('mercator4_result.zip')
usage in the online ('mercator4_result.zip')

The protein annotations can also be visualized by two interactive online tools

TreeViewer shows the protein categorization visualized as hierarchical tree with annotation context descriptions as branch nodes and protein categories as leaf nodes
HeatmapViewer displays the comparison of two protein sets with protein categories as spots colored according to the comparison outcome

Mercator4 updates

The hierarchical framework for Mercator4 is regularly updated and extended; details are listed in the version history. Although it is strongly recommended to use the latest version of Mercator4, it is also possible to submit sequences to legacy versions.

latest version of Mercator4 is release 8 (Dec 2025) with approximately 8800 individual protein categories

Terms of use

Usage of this site follows the GDPR Privacy Notice of Plant Biotechnology Information (plantBI), IBG-4, Jülich Research Centre (FZJ). The detailed GDPR Privacy Notice is available at .
In accordance with that policy, we use Matomo Analytics software to collect anonymised data on visits to, downloads from, and searches of this site. We make no warranties regarding the correctness of the data, and disclaim liability for any direct loss, damage or expense suffered by you or any third party, arising out of the use of Mercator4 (including any error in the information) or any transaction made in connection with Mercator4. Mercator4 does not endorse or recommend any commercial products or services.

Contact & publications

For any questions and suggestions, please feel free to contact us (plabipd@fz-juelich.de).

Bolger ME, Schwacke R, Usadel B (2021)
MapMan visualization of RNA-Seq data using Mercator4 functional annotations.
Methods Mol Biol. 2021, 2354: 195-212.
Schwacke R, Ponce-Soto GY, .., Bolger ME, Usadel B (2019)
Mapman4: a refined protein classification and annotation framework applicable to multi-omics data analysis.
Mol Plant. 2019 Jun 3, 12(6): 879-892.

Mercator4 v.8 (December 2025)
- added: new non-core category BIN-50 "Uncharacterised context"
- expanded: 26 of the 29 core categories
- updated: annotation framework
  - core categories BIN-01..BIN-29 (conserved proteins with characterised context)
    - 2155 context nodes
    - 6720 protein categories
  - non-core category BIN-30 (poorly conserved clade-specific proteins with characterised context)
    - 11 context nodes
    - 31 protein categories
  - non-core category BIN-50 (conserved proteins with poorly characterised or uncharacterised context)
    - 33 context nodes
    - 2052 protein categories

legacy versions

Mercator4 v.7 (October 2024)
- updated: annotation framework
  - BIN-01..BIN-30 + BIN-50 with 8689 nodes
    - top-level context nodes BIN-01..BIN-30
      - 2063 context nodes
      - 6567 protein categories
    - top-level context node BIN-50
      - 7 context nodes
      - 52 protein categories

Mercator4 v.6 (October 2023)
- added: evaluation of protein length distribution
- added: new top-level context (BIN-29 "Plant organogenesis")
- updated: annotation framework
  - BIN-01..BIN-30 + BIN-50 with 8131 nodes
    - top-level context nodes BIN-01..BIN-30
      - 1901 context nodes
      - 6171 protein categories
    - top-level context node BIN-50
      - 7 context nodes
      - 52 protein categories

Mercator4 v.5 (July 2022)
- updated: annotation framework
  - BIN-01..BIN-28 + BIN-30 consists of 7544 nodes
    - 29 top-level context nodes
    - 1782 context nodes
    - 5733 protein categories

Mercator4 v.4 (October 2021)
- added: HeatmapViewer online tool
- added: new top-level context (BIN-28 "Plant reproduction")
- updated: annotation framework
  - BIN-01..BIN-28 + BIN-30 consists of 6897 nodes
    - 29 top-level context nodes
    - 1667 context nodes
    - 5201 protein categories

Mercator4 v.3 (July 2020)
- added: download of FASTA-formatted annotation data
- added: new top-level context (BIN-30 "Clade-specific metabolism")
- updated: annotation framework
  - BIN-01..BIN-27 + BIN-30 consists of 6420 nodes
    - 28 top-level context nodes
    - 1573 context nodes
    - 4819 protein categories

Mercator4 v.2 (July 2019)
- updated: annotation framework
  - BIN-01..BIN-27 consists of 5934 nodes
    - 27 top-level context nodes
    - 1456 context nodes
    - 4450 protein categories

Mercator4 v.1 (May 2018)
- annotation framework
  - BIN-01..BIN-27 consists of 5427 nodes
    - 27 top-level context nodes
    - 1304 context nodes
    - 4095 protein categories

Can I submit a FASTA file containing both DNA and protein sequences?

No, this will result in an error. FASTA files submitted to Mercator4 must contain exclusively DNA or exclusively protein sequences.

Very few of my sequences are assigned to functional BINs. Why?

You should verify that you have selected the correct sequence type (DNA or Protein) before submitting the Mercator4 job. If you submit DNA sequences but specify the type of sequence as Protein, no error will be generated, but very few sequences are likely to be assigned to functional BINs. If you are sure that you have selected the correct sequence type, verify that you have submitted gene sequences (introns must be removed). Mercator4 is designed for land plant protein annotations: if you submit sequences from non-plant organisms, the classification and annotation rate will likely be low.

I get an error that my sequences are incompatible with Mercator4 - what can I do?

Mercator4 has been upgraded to accept a number of ambiguous protein sequences. However, there are still certain criteria which a sequence has to meet. To validate your sequences for Mercator4, you can check your FASTA file with the online FASTA validator, which gives a detailed report and can generate a Mercator4-valid FASTA file with the offending records removed.

I get an Unknown Error (or Internal Error or Server Error) when running my job. What does this mean?

We try to handle every error scenario and provide a detailed description of why the job failed. If you experience such an error, please send an email to plabipd@fz-juelich.de with the 'JOB ID' (starts with GFA-XXXXXXXX).

I ran my sequences on Mercator4 six months ago, but now the version has changed. Can I run my sequences against an older version?

Yes. Sequences can be submitted to legacy versions, which allows users to run against older versions of Mercator4.

My job has been queued for hours. Is it really running?

This cluster is capable of running many jobs in parallel, but can still be overloaded if many users submit jobs simultaneously. If your job has been queued for hours, submitting the same job again will not speed up the process. If your job has not completed after 4 hours, you should contact us at plabipd@fz-juelich.de, providing us with the 'JOB ID'.

My browser crashed while running a job, and now I cannot access my job any more. What can I do?

As we do not require users to log in to submit a job, the only way we have to track your job is using a 'browser session'. If your browser has crashed, then a new session is created and the link to your jobs is lost. However, if you entered an email address when you submitted the job, you will still be notified (along with a link to the results) when the job has finished. If you did not enter an email address, but have taken a note of the JOB ID, then you can email us at plabipd@fz-juelich.de to get the results.