FASTA format description
A sequence in FASTA format consists of a single-line description, followed by lines of sequence data. The first character of the description line is a greater-than (">") symbol in the first column. All lines should be shorter than 80 characters. An example sequence in FASTA format is:

>Name of the sequence
ctgcgagNcgcgcgatgatagMMM-NNNnnnnncgcggcgagcatgtagcatgctagctgtcgcgagcactUUUURRRrrrrrrr
cggccgagatcaggcgatgcatgcgcagggagcagcgagcgacgagcacagcatgctagctagatgcatgctaVvvvcgtaggcagc
cgccgagagacgatggagctgc

Sequences must be represented in the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions: lower-case letters are accepted and are mapped into upper-case; a single hyphen or dash can be used to represent a gap of indeterminate length; and in amino acid sequences, U and * are acceptable letters (see below). Before submitting a request, any numerical digits in the query sequence should either be removed or replaced by appropriate letter codes (e.g., N for unknown nucleic acid residue or X for unknown amino acid residue).
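The rules above can be sketched in a few lines of Python. This is only an illustration, not the code used by any BLAST program: the function names (parse_fasta_record, normalize_query, long_lines) are hypothetical, and the choice to replace digits with N rather than remove them is an assumption (the text allows either).

```python
def parse_fasta_record(text):
    """Split one FASTA record into (description, sequence)."""
    # Ignore blank lines; keep leading whitespace so the first-column check is honest.
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("description line must start with '>' in the first column")
    description = lines[0][1:].strip()
    # Sequence data may span several lines; join them into one string.
    sequence = "".join(line.strip() for line in lines[1:])
    return description, sequence


def normalize_query(sequence, unknown="N"):
    """Map lower case to upper case and replace digits with an unknown-residue code.

    Assumption: digits are replaced, not removed. Pass unknown="X" for amino acids.
    """
    return "".join(unknown if ch.isdigit() else ch.upper() for ch in sequence)


def long_lines(text, limit=80):
    """Return 1-based numbers of lines that are not shorter than `limit` characters."""
    return [i for i, line in enumerate(text.splitlines(), start=1) if len(line) >= limit]


# Hypothetical short record, not the example from the text above.
record = ">Name of the sequence\nctgcgagNcg12cgatg-NNNnn\ncgccgagagacg\n"
name, seq = parse_fasta_record(record)
print(name)
print(normalize_query(seq))
```

The hyphen is passed through unchanged, since a single dash is an accepted gap symbol. Checking the letters against the accepted IUB/IUPAC codes is left out of this sketch.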
For those programs that use amino acid query sequences (BLASTP and TBLASTN), the accepted amino acid codes are: