flowcraft.templates.process_assembly module¶

Purpose¶

This module processes the assembly of a single sample produced by an assembler such as SPAdes or SKESA. The main input is the assembly file, whose contigs are then filtered according to user-specified parameters.

Expected input¶

The following variables are expected whether using Nextflow or the main() executor.

sample_id: Sample Identification string.
- e.g.: 'SampleA'
assembly: Fasta file with the assembly.
- e.g.: 'contigs.fasta'
opts: List of options for processing the assembly.
1. Minimum contig length.
  e.g.: '150'
2. Minimum k-mer coverage.
  e.g.: '2'
3. Maximum number of contigs per 1.5 Mb.
  e.g.: '100'
assembler: The name of the assembler.
- e.g.: 'spades'
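
A minimal sketch of how the three `opts` values could be unpacked, assuming they arrive as strings in the order listed above. The helper names `parse_opts` and `scaled_contig_limit` are hypothetical, and scaling the contig limit linearly with assembly size is one plausible reading of "per 1.5 Mb", not a confirmed description of what the module does.

```python
def parse_opts(opts):
    """Convert the three string options into integers (hypothetical helper)."""
    min_contig_len, min_kmer_cov, max_contigs = (int(x) for x in opts)
    return min_contig_len, min_kmer_cov, max_contigs


def scaled_contig_limit(max_contigs_per_1_5mb, assembly_length):
    """Scale the per-1.5 Mb contig limit to an assembly of the given length (bp).

    Assumption: the limit grows linearly with assembly size.
    """
    return max_contigs_per_1_5mb * assembly_length / 1_500_000


min_len, min_cov, max_contigs = parse_opts(["150", "2", "100"])
# Under the linear assumption, a 3 Mb assembly gets twice the per-1.5 Mb limit.
limit = scaled_contig_limit(max_contigs, 3_000_000)
print(min_len, min_cov, max_contigs, limit)
```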

Generated output¶

(Values within ${} are substituted by the corresponding variable.)

'${sample_id}.assembly.fasta' : Fasta file with the filtered assembly.
- e.g.: 'Sample1.assembly.fasta'
'${sample_id}.report.csv' : CSV file with the results of the filters for each contig.
- e.g.: 'Sample1.report.csv'
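
The `${}` substitution can be illustrated with Python's `string.Template`, which uses the same placeholder syntax. This is only an analogy for how the output names are derived from `sample_id`; the `output_names` helper is hypothetical, and the report name follows the 'Sample1.report.csv' example.

```python
from string import Template

# Name patterns taken from the output list above.
ASSEMBLY_NAME = Template("${sample_id}.assembly.fasta")
REPORT_NAME = Template("${sample_id}.report.csv")


def output_names(sample_id):
    """Return the (assembly, report) file names for a sample (hypothetical helper)."""
    return (
        ASSEMBLY_NAME.substitute(sample_id=sample_id),
        REPORT_NAME.substitute(sample_id=sample_id),
    )


names = output_names("Sample1")
print(names)
```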

Code documentation¶

class flowcraft.templates.process_assembly.Assembly(assembly_file, min_contig_len, min_kmer_cov, sample_id)[source]¶

Bases: object

Class that parses and filters a Fasta assembly file.

This class parses an assembly Fasta file, collects summary statistics and metadata from the contigs, filters contigs based on user-defined metrics, and writes the filtered assembly and a report.

Parameters:
assembly_file : str
    Path to the assembly file.
min_contig_len : int
    Minimum contig length when applying the initial assembly filter.
min_kmer_cov : int
    Minimum k-mer coverage when applying the initial assembly filter.
sample_id : str
    Name of the sample for the current assembly.
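
The parsing step can be sketched as follows. This is a standalone illustration, not the class's actual implementation: the `parse_fasta` function and the layout of each contig entry (`header`, `sequence`, `length`, `gc`) are assumptions made for the example.

```python
import io


def parse_fasta(handle):
    """Parse a Fasta stream into {contig_id: stats} (hypothetical layout).

    Assumes every header is unique; duplicate headers would be merged.
    """
    sequences = {}  # header -> list of sequence lines
    header = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            sequences[header] = []
        elif header is not None:
            sequences[header].append(line.upper())

    contigs = {}
    for contig_id, (name, chunks) in enumerate(sequences.items()):
        seq = "".join(chunks)
        # GC proportion; empty sequences are reported as 0.0.
        gc = (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
        contigs[contig_id] = {
            "header": name,
            "sequence": seq,
            "length": len(seq),
            "gc": gc,
        }
    return contigs


example = io.StringIO(">c1\nACGT\nGG\n>c2\nAATT\n")
contigs = parse_fasta(example)
print(contigs[0]["length"], contigs[1]["gc"])
```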

Methods

`filter_contigs`(self, *comparisons)	Filters the contigs of the assembly according to user provided comparisons.
`get_assembly_length`(self)	Returns the length of the assembly, without the filtered contigs.
`write_assembly`(self, output_file[, filtered])	Writes the assembly to a new file.
`write_report`(self, output_file)	Writes a report with the test results for the current assembly.

contigs = None¶: dict: Dictionary storing data for each contig.

filtered_ids = None¶: list: List of filtered contig_ids.

min_gc = None¶: float: Sets the minimum GC content on a contig.

sample = None¶: str: The name of the sample for the assembly.

report = None¶: dict: Will contain the filtering results for each contig.

filters = None¶: list: Initial filters to check when parsing the assembly file. These can be changed later using the 'filter_contigs' method.

filter_contigs(self, *comparisons)[source]¶

Filters the contigs of the assembly according to user provided comparisons.

Each comparison must be a list of three elements: the contig key, the operator, and the test value. For example, to filter contigs with a minimum length of 250, a comparison would be:

self.filter_contigs(["length", ">=", 250])

The filtered contig ids will be stored in the filtered_ids list.

The result of the test for all contigs will be stored in the report dictionary.

Parameters:
comparisons : list
    List with contig key, operator and value to test.
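
The comparison mechanism described above can be sketched with the standard `operator` module. This is a standalone illustration under stated assumptions, not the method's actual code: the function takes the contigs as an argument instead of using `self`, the `kmer_cov` key is hypothetical, and the report stores the list of failed keys for each contig.

```python
import operator

# Map operator strings to the corresponding comparison functions.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


def filter_contigs(contigs, *comparisons):
    """Return (filtered_ids, report) for the given [key, operator, value] tests."""
    filtered_ids = []
    report = {}
    for contig_id, data in contigs.items():
        # Collect the keys of every comparison this contig fails.
        failed = [
            key
            for key, op, value in comparisons
            if not OPERATORS[op](data[key], value)
        ]
        report[contig_id] = failed
        if failed:
            filtered_ids.append(contig_id)
    return filtered_ids, report


contigs = {
    0: {"length": 300, "kmer_cov": 5.0},
    1: {"length": 100, "kmer_cov": 9.0},
    2: {"length": 400, "kmer_cov": 1.0},
}
filtered_ids, report = filter_contigs(
    contigs, ["length", ">=", 250], ["kmer_cov", ">=", 2]
)
print(filtered_ids, report)
```

A contig is kept only when it satisfies every comparison, so the comparison states the condition for passing, not for removal.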

get_assembly_length(self)[source]¶

Returns the length of the assembly, without the filtered contigs.

Returns:
x : int
    Total length of the assembly.
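
A minimal sketch of this computation, assuming each contig entry stores its length under a `length` key and that `filtered_ids` holds the ids to exclude. The standalone function `assembly_length` is hypothetical.

```python
def assembly_length(contigs, filtered_ids):
    """Sum the lengths of all contigs whose id is not in filtered_ids."""
    excluded = set(filtered_ids)
    return sum(
        data["length"]
        for contig_id, data in contigs.items()
        if contig_id not in excluded
    )


contigs = {0: {"length": 300}, 1: {"length": 100}, 2: {"length": 400}}
total = assembly_length(contigs, filtered_ids=[1])
print(total)
```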

write_assembly(self, output_file, filtered=True)[source]¶

Writes the assembly to a new file.

The filtered option controls whether filtered contigs are excluded from the new assembly.

Parameters:
output_file : str
    Name of the output assembly file.
filtered : bool
    If `True`, does not include filtered ids.
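
The effect of the `filtered` flag can be sketched as below. This is an illustration, not the method's actual code: it writes to any file-like object instead of a path, and the `header` and `sequence` keys are assumed names for the stored contig data.

```python
import io


def write_assembly(contigs, filtered_ids, handle, filtered=True):
    """Write contigs in Fasta format, skipping filtered ids when filtered=True."""
    excluded = set(filtered_ids) if filtered else set()
    for contig_id, data in contigs.items():
        if contig_id in excluded:
            continue
        handle.write(">{}\n{}\n".format(data["header"], data["sequence"]))


contigs = {
    0: {"header": "c1", "sequence": "ACGTGG"},
    1: {"header": "c2", "sequence": "AATT"},
}

kept = io.StringIO()
write_assembly(contigs, [1], kept)  # filtered assembly: c2 is dropped

everything = io.StringIO()
write_assembly(contigs, [1], everything, filtered=False)  # all contigs written
print(kept.getvalue(), everything.getvalue(), sep="---\n")
```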

write_report(self, output_file)[source]¶

Writes a report with the test results for the current assembly.

Parameters:
output_file : str
    Name of the output report file.
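
A minimal sketch of writing such a report with the standard `csv` module. The column layout (contig id followed by a result string) is an assumption for illustration; the actual report format may differ.

```python
import csv
import io


def write_report(report, handle):
    """Write one CSV row per contig: contig id, then its test result."""
    writer = csv.writer(handle, lineterminator="\n")
    for contig_id, result in report.items():
        writer.writerow([contig_id, result])


# Hypothetical report contents: 'pass' or the name of the failed test.
report = {0: "pass", 1: "length"}
out = io.StringIO()
write_report(report, out)
print(out.getvalue())
```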