EZmix


EZmix helps users recognize areas of intermolecular similarity that may indicate chimeric assemblies of mitochondrial genomes. The tool produces a graphical output in which such regions are highlighted.

We suggest using this tool on the complete set of preliminary assemblies (candidate mitochondrial genomes) obtained from sequencing a non-barcoded library of pooled DNA from different species. Although a correct assembly should not mix sequences from different species in a single candidate genome, this possibility cannot be ruled out a priori. Visualizing similarities across the complete set of candidate genomes can therefore be a useful step in the quality control of assemblies. Note that areas of intermolecular similarity may arise either from phylogenetic relatedness (not indicative of an assembly error) or from chimeric assembly (indicative of an error). Length and similarity thresholds can be tuned to the features of the sequence set to help discriminate between the two cases.

We suggest a length threshold of 200 bp and a similarity threshold of 95% as starting values.

Use EZmix according to the following instructions:

  1. Prepare a fasta file (.fasta, .fa, .fsa, .fas, .fna, .ffn, .faa, .frn) containing the complete candidate mitogenome sequences. Please make sure that:
    • there are no duplicate taxon names;
    • there are no gaps within the sequences (any gaps will be removed automatically);
    • sequence names remain informative when truncated to 10 characters (as they appear in the graphical output).
  2. Upload the fasta file using the Upload link button;
  3. Choose the similarity threshold;
  4. Choose the sequence length threshold;
  5. If the analysis completes without errors, a compressed folder containing both the input and output files will be downloaded.
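Before uploading, the requirements listed in step 1 can be checked locally. The following is a minimal Python sketch and is not part of EZmix; the helper names `parse_fasta` and `check_fasta` are hypothetical, and it assumes that gaps are written as `-` and that a sequence name is the full header line after `>`.

```python
from collections import Counter


def parse_fasta(text):
    """Return a list of (name, sequence) tuples from fasta-formatted text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            # Assumption: the whole header after ">" is the sequence name.
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def check_fasta(text, label_width=10):
    """Hypothetical pre-upload check; returns a list of problem messages."""
    records = parse_fasta(text)
    names = [name for name, _ in records]
    problems = []

    # Duplicate sequence (taxon) names.
    dupes = sorted(n for n, c in Counter(names).items() if c > 1)
    if dupes:
        problems.append(f"duplicate names: {', '.join(dupes)}")

    # Gaps; assumption: gaps are encoded as "-".
    gapped = [n for n, seq in records if "-" in seq]
    if gapped:
        problems.append(f"gaps in: {', '.join(gapped)}")

    # Distinct names that become identical once truncated for the plot labels.
    truncated = Counter(n[:label_width] for n in set(names))
    clashes = sorted(t for t, c in truncated.items() if c > 1)
    if clashes:
        problems.append(
            f"names ambiguous at {label_width} characters: {', '.join(clashes)}"
        )
    return problems


example = """>Species_alpha_mito
ACGT-ACGT
>Species_alpha_mito2
ACGTACGT
>Species_beta
ACGT
"""

for problem in check_fasta(example):
    print(problem)
```

For this made-up input, the sketch reports the gap in `Species_alpha_mito` and warns that two names both become `Species_al` once truncated to 10 characters, so they could not be told apart in the graphical output.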

Your data will go through the following steps:

  1. sequences will be checked for gaps and duplicate sequence IDs;
  2. BLAST (blastn, default settings) will be used to identify regions of similarity between every pair of sequences;
  3. the BLAST output will be filtered according to a length threshold (minimum number of nucleotides) and a similarity threshold (minimum % similarity);
  4. regions of similarity will be plotted over a graphical representation of the sequences, color-coded according to % similarity.
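The filtering in step 3 can be approximated offline with a short script. This is a hypothetical Python sketch rather than EZmix's actual code: it assumes BLAST tabular output with the 12 default columns (`-outfmt 6`), for example as produced by `blastn -query candidates.fasta -subject candidates.fasta -outfmt 6`, and it assumes that the similarity threshold is applied to the percent-identity column. Self-hits are dropped because only comparisons between different candidate genomes are of interest. The four hits below are invented for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Selected fields of one BLAST -outfmt 6 line."""
    query: str
    subject: str
    pident: float  # percent identity (column 3)
    length: int    # alignment length (column 4)
    qstart: int
    qend: int
    sstart: int
    send: int


def parse_outfmt6(text):
    """Parse the 12-column BLAST tabular format, skipping blank and '#' lines."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]),
                        int(f[6]), int(f[7]), int(f[8]), int(f[9])))
    return hits


def filter_hits(hits, min_length=200, min_identity=95.0):
    """Keep hits between different sequences that pass both thresholds."""
    return [h for h in hits
            if h.query != h.subject
            and h.length >= min_length
            and h.pident >= min_identity]


BLAST_TSV = "\n".join([
    "mitoA\tmitoA\t100.00\t16500\t0\t0\t1\t16500\t1\t16500\t0.0\t30000",
    "mitoA\tmitoB\t99.10\t450\t4\t0\t1200\t1649\t8800\t9249\t0.0\t800",
    "mitoA\tmitoC\t97.50\t120\t3\t0\t300\t419\t500\t619\t1e-50\t200",
    "mitoB\tmitoC\t88.00\t900\t108\t0\t2000\t2899\t2100\t2999\t0.0\t900",
])

for h in filter_hits(parse_outfmt6(BLAST_TSV)):
    print(h.query, h.subject, h.pident, h.length)
```

With the suggested starting values (200 bp, 95%), only the mitoA/mitoB hit survives: the self-hit is discarded, the mitoA/mitoC hit is too short, and the mitoB/mitoC hit falls below the identity threshold. The coordinates kept in `Hit` are what a plotting step would need to place each region on the sequences.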

Download an input example


An optional job ID helps organize your processed jobs. If none is provided, a random ID will be created.