This tool allows users to validate a fasta file before submitting it to Mercator4. Each record in the fasta file is validated, and 'Mercator4-invalid' records are summarised by error category (clicking on an error category shows the list of affected records).
The user can optionally create a downloadable 'Mercator4-valid' fasta file from which all records containing errors have been removed.
Mercator4 currently requires the fasta file to conform to the following criteria:-
- General criteria:-
- Each record in the fasta file must start with the record's name (the line which starts with '>').
- The record name for each entry must be unique within the fasta file.
- The sequence must be between 5 and 25000 characters long (either nucleotide or protein).
- The fasta file cannot contain a mix of nucleotide and protein sequences.
- Protein sequence submissions:-
- The amino acid codes (ACDEFGHIKLMNPQRSTVWYUOBJZ), together with * (for stop), X (for any) and - or . (for gaps), are considered valid characters.
- A warning will be provided if any of the submitted records are also valid DNA sequences (these are not considered invalid, and will not be removed).
- DNA sequence submissions:-
- The nucleotide codes ACGTU and the ambiguity codes RYSWKMBDHV, together with N (for unknown), are considered valid characters.
- The gap characters '-' and '.' are not accepted for nucleotide submissions.
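
The criteria above can be sketched as a small Python check. This is only an illustration under stated assumptions, not the tool's actual implementation: the function names `parse_fasta` and `validate` and the error category names are hypothetical, the record name is taken to be the whole header line after '>', characters are compared case-insensitively, the protein alphabet is taken to include all 20 standard amino acid codes, every copy of a duplicated name is reported, and the caller states whether the file holds protein or DNA sequences.

```python
from collections import Counter

# Alphabets transcribed from the criteria above (assumed case-insensitive).
PROTEIN_CHARS = set("ACDEFGHIKLMNPQRSTVWYUOBJZ" "*X-.")
DNA_CHARS = set("ACGTU" "RYSWKMBDHV" "N")
MIN_LEN, MAX_LEN = 5, 25000


def parse_fasta(text):
    """Return (records, has_orphan); records is a list of (name, sequence)."""
    records, has_orphan = [], False
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Assumption: the record name is the full header line after '>'.
            name, chunks = line[1:].strip(), []
        elif name is None:
            # Sequence data found before any '>' header line.
            has_orphan = True
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records, has_orphan


def validate(text, seq_type):
    """Check records for seq_type 'protein' or 'dna' and group failures."""
    allowed = PROTEIN_CHARS if seq_type == "protein" else DNA_CHARS
    records, has_orphan = parse_fasta(text)
    name_counts = Counter(name for name, _ in records)
    errors = {"duplicate_name": [], "bad_length": [], "invalid_character": []}
    warnings, valid = [], []
    for name, seq in records:
        seq = seq.upper()
        ok = True
        if name_counts[name] > 1:
            # Assumption: every copy of a duplicated name is reported.
            errors["duplicate_name"].append(name)
            ok = False
        if not MIN_LEN <= len(seq) <= MAX_LEN:
            errors["bad_length"].append(name)
            ok = False
        if set(seq) - allowed:
            errors["invalid_character"].append(name)
            ok = False
        if ok:
            valid.append((name, seq))
            # Protein records that also read as DNA are warned about, not removed.
            if seq_type == "protein" and set(seq) <= DNA_CHARS:
                warnings.append(name)
    return {
        "missing_header": has_orphan,
        "errors": errors,
        "warnings": warnings,
        "valid": valid,
    }
```

In this sketch, calling `validate(fasta_text, "protein")` returns the invalid record names grouped by category, and the `valid` list holds the records that would go into a 'Mercator4-valid' file.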