Contig Finder: Identify Contigs with Your Genes
Step 1: Upload Your Genome File
Step 2: Provide Query Sequences in FASTA Format
Contig Finder Genome File Requirements
- Your genome file must be in standard FASTA or FASTQ format.
- The filename should end with a '.fasta', '.fa', '.fastq' or '.fq' extension.
- The file must contain only DNA sequences.
- ID lines should not contain special characters; we recommend using only letters, numbers, periods and dashes. Other characters may be removed by the program!
- The file must not exceed 250MB in size.
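The genome file rules above can be sketched as a simple pre-upload check. This is a minimal sketch, not Contig Finder's own validator. The function names are hypothetical, and several choices are assumptions: A/C/G/T/N as the accepted DNA alphabet, 250MB read as 250 MiB, and FASTQ records being unwrapped four-line records. Under a strict reading of the ID rule, characters such as underscores are dropped by sanitize_id.

```python
import re

# Hypothetical rules paraphrased from the requirements list; the real
# Contig Finder checks may differ.
MAX_GENOME_BYTES = 250 * 1024 * 1024  # assumption: "250MB" means 250 MiB
ALLOWED_EXTENSIONS = (".fasta", ".fa", ".fastq", ".fq")
DNA_CHARS = set("ACGTN")  # assumption: only A/C/G/T/N count as DNA


def sanitize_id(raw_id: str) -> str:
    """Drop every character except letters, digits, periods and dashes."""
    return re.sub(r"[^A-Za-z0-9.\-]", "", raw_id)


def sequence_lines(lines: list[str]):
    """Yield (line_number, sequence) from FASTA or unwrapped 4-line FASTQ."""
    if lines and lines[0].startswith("@"):
        # FASTQ records: header, sequence, '+', quality (assumed unwrapped)
        for i in range(1, len(lines), 4):
            yield i + 1, lines[i]
    else:
        # FASTA: every non-empty line that is not a '>' header is sequence
        for i, line in enumerate(lines):
            if line and not line.startswith(">"):
                yield i + 1, line


def check_genome(filename: str, text: str) -> list[str]:
    """Return a list of problems; an empty list means the file passes."""
    problems = []
    if not filename.lower().endswith(ALLOWED_EXTENSIONS):
        problems.append("unsupported extension")
    if len(text.encode()) > MAX_GENOME_BYTES:
        problems.append("file larger than 250MB")
    lines = [line.strip() for line in text.splitlines()]
    for line_no, seq in sequence_lines(lines):
        bad = set(seq.upper()) - DNA_CHARS
        if bad:
            problems.append(f"line {line_no}: non-DNA characters {sorted(bad)}")
    return problems


if __name__ == "__main__":
    print(check_genome("genome.fa", ">contig-1\nACGTNNACGT\n"))  # []
    print(sanitize_id("contig#1 (chr)"))  # contig1chr
```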
Gene File Requirements
- Files must be in standard FASTA or FASTQ format.
- File names should end with a '.fasta', '.fa', '.fastq' or '.fq' extension.
- Gene files may contain DNA or standard protein sequences.
- A maximum of 15 gene files is currently allowed.
- Each gene file can currently contain at most 100 gene sequences.
- Identical gene names appearing in the same file will have numbers appended (e.g. MyGene.1, MyGene.2, MyGene.3, etc.) and will be treated as separate genes.
- When using multiple gene files, gene names that are not spelled identically will be treated as separate genes (e.g. MyGene1 ≠ MyGene ≠ My-Gene).
- Individual files must not exceed 100MB in size.
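The gene file naming rules and limits can be illustrated with a short sketch. It assumes two things. First, only names occurring more than once in a file are numbered, starting at .1 in file order, as in the MyGene.1, MyGene.2 example. Second, names are compared as exact strings. number_duplicates and check_gene_files are hypothetical helpers, not part of Contig Finder.

```python
from collections import Counter

# Hypothetical limits mirroring the requirements list; not Contig Finder code.
MAX_GENE_FILES = 15
MAX_GENES_PER_FILE = 100


def number_duplicates(names: list[str]) -> list[str]:
    """Append .1, .2, ... (in file order) to names that repeat within one file."""
    totals = Counter(names)
    seen = Counter()
    renamed = []
    for name in names:
        if totals[name] > 1:
            seen[name] += 1
            renamed.append(f"{name}.{seen[name]}")
        else:
            renamed.append(name)  # assumption: unique names are left unchanged
    return renamed


def check_gene_files(files: dict[str, list[str]]) -> list[str]:
    """Check the file-count and per-file gene-count limits.

    `files` maps each gene file name to the gene names it contains.
    """
    problems = []
    if len(files) > MAX_GENE_FILES:
        problems.append(f"more than {MAX_GENE_FILES} gene files")
    for filename, names in files.items():
        if len(names) > MAX_GENES_PER_FILE:
            problems.append(f"{filename}: more than {MAX_GENES_PER_FILE} genes")
    return problems


if __name__ == "__main__":
    print(number_duplicates(["MyGene", "recA", "MyGene", "MyGene"]))
    # ['MyGene.1', 'recA', 'MyGene.2', 'MyGene.3']
```

Because the comparison is an exact string match, MyGene1, MyGene and My-Gene stay three distinct genes.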
Minimum Query Coverage Cutoff
Only query sequences whose percentage of basepairs/residues falling within significant BLAST hits is at or above this value will appear in your results. Using the full-length gene query sequence, we tile the hits, or 'HSPs' (High-scoring Segment Pairs), that map in the same strand/direction as the HSP with the highest bitscore. A simplified example is shown below:
Original Query Gene:    1 ACCACCTTGAACAATCC 17
Genome Contig Sequence: 1 AACACCTCTCTCTTAAACTTT 21
BLAST HIT 1:
Query  1   ACCACCT  7
           | |||||
Sbjct  1   AACACCT  7
BLAST HIT 2:
Query  6   CTTGAACAAT  15
           ||| |||  |
Sbjct  12  CTTAAACTTT  21
Now we map the significant hits back to the original:
Original: ACCACCTTGAACAATCC
Hit1:     A-CACCT
Hit2:          CTT-AAC--T
Combined: ACCACCTTGAACAAT--
Coverage: 15/17 (88.24%)
Note how the gaps within BLAST hits (the '-' positions in Hit1 and Hit2) are ignored when calculating the final coverage score: every query position spanned by a hit counts as covered. If the 'Minimum Query Coverage Cutoff' were set to 88%, this gene would map; if it were set to 89%, it would not. This feature helps prevent queries that map to a genome through only a small fragment from cluttering your results. Setting the value to '1' will show any query with at least one significant hit in your results.
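The coverage calculation from the worked example can be expressed as a short sketch. It assumes each HSP has been reduced to four fields: its query start and end (1-based, inclusive), its strand and its bitscore. The HSP class, the query_coverage and passes_cutoff functions, and the bitscores used in the demo are hypothetical, chosen only to reproduce the example.

```python
from dataclasses import dataclass


@dataclass
class HSP:
    """One significant BLAST hit, reduced to the fields coverage needs."""
    q_start: int  # 1-based query start
    q_end: int    # 1-based query end (inclusive)
    strand: str   # orientation of the hit, e.g. "+" or "-"
    bitscore: float


def query_coverage(hsps: list[HSP], query_length: int) -> float:
    """Percent of query positions spanned by HSPs on the top HSP's strand."""
    if not hsps:
        return 0.0
    best = max(hsps, key=lambda h: h.bitscore)
    covered = set()
    for h in hsps:
        if h.strand != best.strand:
            continue  # hits in the other orientation are not tiled
        lo, hi = sorted((h.q_start, h.q_end))
        # every position inside the HSP counts, so mismatches/gaps are ignored;
        # overlapping hits are counted only once because `covered` is a set
        covered.update(range(lo, hi + 1))
    return 100.0 * len(covered) / query_length


def passes_cutoff(coverage: float, cutoff: float) -> bool:
    """A query is reported when its coverage is at or above the cutoff."""
    return coverage >= cutoff


if __name__ == "__main__":
    # The two hits from the worked example; the bitscores are made up.
    example = [HSP(1, 7, "+", 12.0), HSP(6, 15, "+", 14.0)]
    cov = query_coverage(example, 17)
    print(f"{cov:.2f}%")  # 88.24%
    print(passes_cutoff(cov, 88), passes_cutoff(cov, 89))  # True False
```

A set of positions keeps the sketch short; sorting and merging the HSP intervals would use less memory for very long queries.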
Circular Genome Mode
This mode can be useful for circular genomes, such as those of bacteria or mitochondria. In this mode, Genome1 acts as the reference for the order in which genes appear, and all other genomes are rotated to maximize the number of genes that match this order. This mode requires each genome to consist of a single circular contig and will be disabled if you upload a genome with more than one contig.
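The rotation step of Circular Genome Mode can be sketched under some explicit assumptions. First, each genome is reduced to the order in which genes are found along its single circular contig. Second, "matching the order" is scored as the length of the longest common subsequence (LCS) with Genome1's gene order. Both the LCS scoring rule and the function names are hypothetical; Contig Finder may measure agreement differently.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def best_rotation(reference: list[str], genes: list[str]) -> list[str]:
    """Rotate a circular gene order so it best agrees with the reference.

    Assumption: agreement is scored as LCS length with the reference order.
    """
    if not genes:
        return genes
    rotations = [genes[k:] + genes[:k] for k in range(len(genes))]
    # max() returns the first rotation with the top score, so ties favour
    # the smallest shift
    return max(rotations, key=lambda r: lcs_length(reference, r))


if __name__ == "__main__":
    genome1 = ["dnaA", "gyrB", "recA", "rpoB"]  # reference order
    genome2 = ["recA", "rpoB", "dnaA", "gyrB"]  # same genes, different start
    print(best_rotation(genome1, genome2))  # ['dnaA', 'gyrB', 'recA', 'rpoB']
```

Trying every rotation is the brute-force approach: it is simple, but each rotation needs its own LCS computation, so it slows down as the number of genes grows.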