Mercator Fasta Validator

This tool lets users validate a fasta file before submitting it to Mercator. Each record in the fasta file is validated, and 'Mercator-invalid' records are summarised under their error category (clicking on an error category shows the list of affected records). The user can optionally create a downloadable fasta file with the 'Mercator-invalid' records removed.

Mercator currently requires the fasta file to conform to the following criteria:

  • General
    • Each record in the fasta file must start with the record's name (the line which starts with '>').
    • The record name for each entry must be unique within the fasta file. 
    • The sequence must be between 30 and 20000 characters long (either nucleotide or protein). 
    • The fasta file cannot contain a mix of nucleotide and protein sequences. 
    • The period '.', which is sometimes used in fasta files, is also considered invalid.
  • Protein sequence submission
    • The 20 standard amino acid codes (ACDEFGHIKLMNPQRSTVWY) together with * (for stop) and X (for any) are considered valid.
    • U (for selenocysteine) and O (for pyrrolysine) are not accepted.
    • The ambiguous codes B (Aspartic acid - D or Asparagine - N), J (Leucine - L or Isoleucine - I), and Z (Glutamic acid - E or Glutamine - Q) are not accepted.
    • The '-' or '.' for gaps are also not accepted.
    • A warning will be provided if the submitted sequences are also valid DNA sequences.
  • DNA sequence submissions
    • ACGT and N (for unknown) are considered valid.
    • The nucleotide U (for Uracil) is not accepted. 
    • The following nucleotide codes are not accepted:
      • R (A or G), Y (C or T), S (G or C), W (A or T), K (G or T), M (A or C), B (C or G or T), D (A or G or T), H (A or C or T), V (A or C or G)
    • The '-' or '.' for gaps are also not accepted.
    • A warning will be provided if the DNA sequences are also valid protein sequences.
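
The criteria above can be pre-checked locally before uploading. The following Python sketch is illustrative only and is not the validator's own code. The function names (parse_fasta, validate) and the error category labels are hypothetical. It also assumes that the record name is the whole header line after '>' and that lowercase letters are treated like their uppercase equivalents; neither is stated above. Only the protein-side warning is sketched.

```python
from collections import defaultdict

# Alphabets taken from the criteria listed above.
PROTEIN_CODES = set("ACDEFGHIKLMNPQRSTVWY*X")
DNA_CODES = set("ACGTN")
MIN_LEN, MAX_LEN = 30, 20000


def parse_fasta(text):
    """Yield (name, sequence) pairs; name is None if sequence lines precede any '>' line."""
    name, chunks, started = None, [], False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if started:
                yield name, "".join(chunks)
            # Assumption: the record name is the full header text after '>'.
            name, chunks, started = line[1:].strip(), [], True
        else:
            started = True
            chunks.append(line)
    if started:
        yield name, "".join(chunks)


def validate(text, kind):
    """Return {category: [record names]} for a 'protein' or 'dna' submission."""
    allowed = PROTEIN_CODES if kind == "protein" else DNA_CODES
    report = defaultdict(list)
    seen = set()
    for name, seq in parse_fasta(text):
        label = name if name else "<unnamed>"
        if not name:
            report["missing record name"].append(label)
        elif name in seen:
            report["duplicate record name"].append(label)
        seen.add(name)
        if not MIN_LEN <= len(seq) <= MAX_LEN:
            report["length outside 30-20000"].append(label)
        # Assumption: case-insensitive comparison against the alphabet.
        chars = set(seq.upper())
        if chars - allowed:
            report["invalid characters"].append(label)
        elif kind == "protein" and chars and chars <= DNA_CODES:
            report["warning: also valid as DNA"].append(label)
    return dict(report)


if __name__ == "__main__":
    sample = "\n".join([
        ">good", "MKV" * 12,   # 36 residues, valid protein
        ">short", "MKVL",      # too short
        ">good", "MKB" * 12,   # duplicate name, ambiguous code B
    ])
    for category, names in validate(sample, "protein").items():
        print(category, names)
```

Grouping record names by category mirrors the summary view described at the top of this page. Records that appear under any non-warning category could then be dropped to produce a cleaned file.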