Components

These are the currently available assemblerflow components, each with a short description of its task. For more detailed information, follow the link for each component.

Read Quality Control

  • Integrity_coverage: Tests the integrity of the provided FastQ files, optionally filters them based on the expected assembly coverage, and reports the maximum read length and sequence encoding.
  • FastQC: Runs FastQC on paired-end FastQ files.
  • Trimmomatic: Runs Trimmomatic on paired-end FastQ files.
  • Fastqc_trimmomatic: Runs Trimmomatic on paired-end FastQ files informed by the FastQC report.
  • Check_coverage: Estimates the coverage for each sample and filters FastQ files according to a specified minimum coverage threshold.
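
The coverage-based filtering described for Integrity_coverage and Check_coverage can be illustrated with a minimal sketch. It assumes coverage is estimated as the total number of sequenced bases divided by the expected genome size; the function names and parameters are hypothetical, and the components themselves may compute this differently.

```python
def estimate_coverage(read_lengths, genome_size_bp):
    """Estimate coverage as total sequenced bases / expected genome size.

    Assumption: this simple ratio is used for illustration only; it is not
    taken from the assemblerflow source code.
    """
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return sum(read_lengths) / genome_size_bp


def passes_min_coverage(read_lengths, genome_size_bp, min_coverage):
    """Return True when the estimated coverage reaches the minimum threshold."""
    return estimate_coverage(read_lengths, genome_size_bp) >= min_coverage


if __name__ == "__main__":
    # 300 reads of 100 bp against a 1,000 bp genome (toy numbers)
    reads = [100] * 300
    print(estimate_coverage(reads, 1000))
    print(passes_min_coverage(reads, 1000, min_coverage=15))
```

A sample whose estimate falls below the threshold would be dropped before assembly, which avoids spending compute on data that is unlikely to assemble well.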

Assembly

  • Spades: Assembles paired-end FastQ files using SPAdes.
  • Skesa: Assembles paired-end FastQ files using SKESA.

Post-assembly

  • Process_spades: Processes the assembly output from Spades and filters it based on quality criteria of GC content, k-mer coverage and read length.
  • Process_skesa: Processes the assembly output from Skesa and filters it based on quality criteria of GC content, k-mer coverage and read length.
  • Assembly_mapping: Maps the FastQ files onto their assembly and filters the assembly based on quality criteria of read coverage and genome size.
  • Pilon: Corrects and filters assemblies using Pilon.
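
As a rough illustration of the kind of filtering performed by Process_spades and Process_skesa, the sketch below keeps only the sequences that satisfy a GC content range, a minimum k-mer coverage and a minimum length. The data layout, function names and thresholds are all hypothetical, and applying the length criterion to each assembled sequence is an assumption; the actual components may apply these criteria differently.

```python
def gc_fraction(sequence):
    """Return the fraction of G and C bases in a sequence (0.0 if empty)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def filter_contigs(contigs, min_gc, max_gc, min_kmer_cov, min_length):
    """Keep entries passing all three criteria.

    `contigs` maps a name to a (sequence, kmer_coverage) tuple.
    This layout is an assumption made for the example.
    """
    kept = {}
    for name, (sequence, kmer_cov) in contigs.items():
        gc = gc_fraction(sequence)
        if (min_gc <= gc <= max_gc
                and kmer_cov >= min_kmer_cov
                and len(sequence) >= min_length):
            kept[name] = (sequence, kmer_cov)
    return kept


if __name__ == "__main__":
    toy = {
        "contig_1": ("ATGCATGC", 10.0),  # passes all criteria
        "contig_2": ("GGGGCCCC", 10.0),  # GC content too high
        "contig_3": ("ATGCATGC", 1.0),   # k-mer coverage too low
        "contig_4": ("ATGC", 10.0),      # too short
    }
    print(sorted(filter_contigs(toy, 0.2, 0.8, 2.0, 6)))
```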

Annotation

  • Prokka: Performs assembly annotation using prokka.
  • Abricate: Performs anti-microbial gene screening using abricate.

MLST

  • MLST: Determines the sequence type (ST) of an assembly using mlst.
  • Chewbbaca: Performs a cg/wgMLST analysis using ChewBBACA.

Reads typing

  • Seq_typing: Determines the type of a given sample from a set of reference sequences.
  • Patho_typing: Performs in silico pathogenic typing from raw Illumina reads.

Plasmids

  • mapping_patlas: Performs read mapping and generates a JSON input file for pATLAS.
  • mash_screen: Runs mash screen against a reference index plasmid database and generates a JSON input file for pATLAS. This component searches for containment of a given sequence in read sequencing data. However, if a different database is provided, mash screen can be used for other purposes.
  • mash_dist: Runs mash dist against a reference index plasmid database and generates a JSON input file for pATLAS. This component calculates pairwise distances between sequences (one from the database and the query sequence). However, if a different database is provided, mash dist can be used for other purposes.
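
The difference between the containment search of mash_screen and the pairwise distance of mash_dist can be sketched on exact k-mer sets. This is a simplification: Mash estimates these quantities from MinHash sketches rather than full k-mer sets, and the helper names below are hypothetical. The distance uses the published Mash formula D = -(1/k) ln(2j / (1 + j)), where j is the Jaccard index.

```python
import math


def kmers(sequence, k):
    """Return the set of all k-mers in a sequence."""
    seq = sequence.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index of two k-mer sets: shared / total distinct k-mers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(query, target):
    """Fraction of the query k-mers that are also present in the target."""
    return len(query & target) / len(query) if query else 0.0


def mash_distance(j, k):
    """Mash distance from a Jaccard index j and k-mer size k."""
    if j <= 0:
        return 1.0
    # Clamp to [0, 1] to guard against floating-point noise.
    return min(1.0, max(0.0, -math.log(2 * j / (1 + j)) / k))


if __name__ == "__main__":
    k = 3
    plasmid = kmers("ACGT", k)       # toy reference sequence
    reads = kmers("TTACGTTT", k)     # toy read data
    print(containment(plasmid, reads))               # screen-style question
    print(mash_distance(jaccard(plasmid, reads), k))  # dist-style question
```

Containment asks how much of a reference is present in a larger read set, so it is not penalised by unrelated reads, whereas the distance compares two sequences symmetrically.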