Mercator4 Fasta Validator

This tool allows users to validate a fasta file before submitting it to Mercator4. Each record in the fasta file is validated, and 'Mercator4-invalid' records are summarised by error category (clicking on an error category shows the list of affected records).

The user can optionally create a downloadable 'Mercator4-valid' fasta file from which all records containing errors have been removed.

 

Mercator4 currently requires the fasta file to conform to the following criteria:-

  • General criteria:-
    • Each record in the fasta file must start with the record's name (the line which starts with '>').
    • The record name for each entry must be unique within the fasta file. 
    • The sequence must be between 5 and 25000 characters long (either nucleotide or protein). 
    • The fasta file cannot contain a mix of nucleotide and protein sequences.
  • Protein sequence submissions:-
    • The amino acid codes (ACDEFGHIKLMNPQRSTVWYUOBJZ), together with * (for stop), X (for any) and - or . (for gaps), are considered valid characters.
    • A warning will be provided if any of the submitted records are also valid DNA sequences (these are not considered invalid, and will not be removed).
  • DNA sequence submissions:-
    • The nucleotide codes ACGTU and RYSWKMBDHV, together with N (for unknown), are considered valid characters.
    • The gap characters '-' and '.' are not accepted for nucleotide submissions.
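
The criteria above can be sketched as a small Python check. This is only an illustration of the rules as listed, not Mercator4's actual implementation. The function names (parse_fasta, validate, to_fasta) and the error category labels are hypothetical. The sketch also makes some assumptions: the record name is taken to be the whole header line after '>', sequences are compared case-insensitively, and Q is included among the amino acid codes. The "no mix of nucleotide and protein" rule is not checked, because the caller states the sequence type.

```python
# Assumed alphabets, transcribed from the criteria above.
# Q is included on the assumption that all standard amino acids are accepted.
PROTEIN_CHARS = set("ACDEFGHIKLMNPQRSTVWYUOBJZ" + "*X-.")
DNA_CHARS = set("ACGTU" + "RYSWKMBDHV" + "N")
MIN_LEN, MAX_LEN = 5, 25000


def parse_fasta(text):
    """Yield (name, sequence) pairs; sequence found before any '>' gets name None."""
    name, chunks, started = None, [], False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if started:
                yield name, "".join(chunks)
            name, chunks, started = line[1:].strip(), [], True
        else:
            chunks.append(line)
            started = True
    if started:
        yield name, "".join(chunks)


def validate(text, kind):
    """Check records for kind 'protein' or 'dna'.

    Returns (valid, errors, warnings): valid is a list of (name, sequence),
    errors maps a hypothetical category label to record names, and warnings
    lists protein records that would also pass as DNA.
    """
    allowed = PROTEIN_CHARS if kind == "protein" else DNA_CHARS
    seen, valid, errors, warnings = set(), [], {}, []
    for name, seq in parse_fasta(text):
        seq = seq.upper()  # assumption: case is ignored
        if not name:
            problem = "missing record name"
        elif name in seen:
            problem = "duplicate record name"
        elif not MIN_LEN <= len(seq) <= MAX_LEN:
            problem = "length out of range"
        elif not set(seq) <= allowed:
            problem = "invalid characters"
        else:
            problem = None
        if name:
            seen.add(name)
        if problem:
            errors.setdefault(problem, []).append(name)
            continue
        valid.append((name, seq))
        if kind == "protein" and set(seq) <= DNA_CHARS:
            warnings.append(name)  # kept, only flagged
    return valid, errors, warnings


def to_fasta(records):
    """Serialise the valid records back to fasta text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

Calling validate(text, "protein") on the contents of a file returns the records that pass, the failing record names grouped by category, and the names that triggered the DNA warning. Passing the first of these to to_fasta gives a file with the failing records removed.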