flowcraft.templates.process_assembly module
Purpose
This module processes the assembly output for a single sample from assemblers such as SPAdes or Skesa. The main input is an assembly file produced by the assembler, which is then filtered according to user-specified parameters.
Expected input
The following variables are expected whether using Nextflow or the main() executor.
- sample_id : Sample identification string.
  - e.g.: 'SampleA'
- assembly : Fasta file with the assembly.
  - e.g.: 'contigs.fasta'
- opts : List of options for processing the assembly.
  - Minimum contig length, e.g.: '150'
  - Minimum k-mer coverage, e.g.: '2'
  - Maximum number of contigs per 1.5 Mb, e.g.: '100'
- assembler : The name of the assembler.
  - e.g.: spades
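A minimal sketch of turning the opts strings into typed values, assuming they arrive in the order listed above. The names AssemblyOpts, parse_opts and max_contigs_allowed are hypothetical, and scaling the "per 1.5 Mb" limit linearly with assembly length is an assumption made for illustration, not taken from the module.

```python
from typing import List, NamedTuple


class AssemblyOpts(NamedTuple):
    """Hypothetical container for the three documented options."""
    min_contig_len: int
    min_kmer_cov: int
    max_contigs: int  # maximum number of contigs per 1.5 Mb


def parse_opts(opts: List[str]) -> AssemblyOpts:
    """Converts the raw option strings, in the documented order, to ints."""
    min_len, min_cov, max_contigs = opts
    return AssemblyOpts(int(min_len), int(min_cov), int(max_contigs))


def max_contigs_allowed(max_per_1_5mb: int, assembly_length_bp: int) -> float:
    """Scales the per-1.5 Mb contig limit to an assembly of the given size.

    Assumption: the limit grows linearly with assembly length.
    """
    return max_per_1_5mb * assembly_length_bp / 1_500_000


opts = parse_opts(["150", "2", "100"])
print(opts)  # AssemblyOpts(min_contig_len=150, min_kmer_cov=2, max_contigs=100)
print(max_contigs_allowed(opts.max_contigs, 3_000_000))  # 200.0
```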
Generated output
(Values within ${} are substituted by the corresponding variable.)
- '${sample_id}.assembly.fasta' : Fasta file with the filtered assembly.
  - e.g.: 'Sample1.assembly.fasta'
- '${sample_id}.report.csv' : CSV file with the results of the filters for each contig.
  - e.g.: 'Sample1.report.csv'
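Nextflow performs the ${} substitution itself; Python's string.Template happens to use the same ${name} syntax, so it gives a compact illustration of how the output names are derived from sample_id. This is an illustrative sketch, not the module's code; the report name uses the .csv extension indicated by its description and example, and OUTPUT_PATTERNS and output_names are hypothetical names.

```python
from string import Template

# Output name patterns as documented; ${sample_id} is filled in per sample.
OUTPUT_PATTERNS = {
    "assembly": Template("${sample_id}.assembly.fasta"),
    "report": Template("${sample_id}.report.csv"),
}


def output_names(sample_id: str) -> dict:
    """Returns the expected output file names for one sample."""
    return {key: tpl.substitute(sample_id=sample_id)
            for key, tpl in OUTPUT_PATTERNS.items()}


print(output_names("Sample1"))
# {'assembly': 'Sample1.assembly.fasta', 'report': 'Sample1.report.csv'}
```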
Code documentation
class flowcraft.templates.process_assembly.Assembly(assembly_file, min_contig_len, min_kmer_cov, sample_id)
Bases: object
Class that parses and filters a Fasta assembly file.
This class parses an assembly fasta file, collects a number of summary statistics and metadata from the contigs, filters contigs based on user-defined metrics and writes filtered assemblies and reports.
Parameters:
- assembly_file : str
  Path to the assembly file.
- min_contig_len : int
  Minimum contig length when applying the initial assembly filter.
- min_kmer_cov : int
  Minimum k-mer coverage when applying the initial assembly filter.
- sample_id : str
  Name of the sample for the current assembly.
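The class collects per-contig metadata while parsing the assembly. A minimal standalone sketch of that step, assuming SPAdes-style headers such as >NODE_1_length_5000_cov_12.5 that carry the k-mer coverage after "_cov_". Apart from "length", the dictionary keys (header, sequence, gc_prop, kmer_cov) and the helper names read_fasta and contig_stats are hypothetical choices for this example.

```python
import re
from typing import Dict, Iterator, Tuple

# Assumption: SPAdes-style headers, e.g. ">NODE_1_length_5000_cov_12.5",
# where the k-mer coverage follows "_cov_".
SPADES_COV = re.compile(r"_cov_([0-9.]+)")


def read_fasta(lines) -> Iterator[Tuple[str, str]]:
    """Yields (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def contig_stats(lines) -> Dict[int, dict]:
    """Collects length, GC proportion and k-mer coverage for each contig."""
    contigs = {}
    for i, (header, seq) in enumerate(read_fasta(lines)):
        seq = seq.upper()
        gc = seq.count("G") + seq.count("C")
        match = SPADES_COV.search(header)
        contigs[i] = {
            "header": header,
            "sequence": seq,
            "length": len(seq),
            "gc_prop": gc / len(seq) if seq else 0.0,
            "kmer_cov": float(match.group(1)) if match else None,
        }
    return contigs


fasta = [">NODE_1_length_8_cov_12.5", "ACGTGGCC",
         ">NODE_2_length_4_cov_1.0", "ATAT"]
stats = contig_stats(fasta)
print(stats[0]["length"], stats[0]["gc_prop"], stats[0]["kmer_cov"])  # 8 0.75 12.5
```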
Methods
- filter_contigs(*comparisons) : Filters the contigs of the assembly according to user-provided comparisons.
- get_assembly_length() : Returns the length of the assembly, without the filtered contigs.
- write_assembly(output_file[, filtered]) : Writes the assembly to a new file.
- write_report(output_file) : Writes a report with the test results for the current assembly.
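A sketch of what write_assembly(output_file[, filtered]) might do, assuming that filtered=True means the contigs listed in filtered_ids are left out of the output. Unlike the documented method, this standalone version takes an open text handle instead of a path so it is easy to test; the header and sequence keys are hypothetical.

```python
import io
from typing import Dict, Iterable, TextIO


def write_assembly(contigs: Dict[int, dict], filtered_ids: Iterable[int],
                   handle: TextIO, filtered: bool = True) -> int:
    """Writes contigs in FASTA format, optionally skipping filtered ones.

    Assumption: filtered=True excludes the ids in filtered_ids.
    Returns the number of contigs written.
    """
    skip = set(filtered_ids) if filtered else set()
    written = 0
    for contig_id, contig in contigs.items():
        if contig_id in skip:
            continue
        handle.write(">{}\n{}\n".format(contig["header"], contig["sequence"]))
        written += 1
    return written


contigs = {0: {"header": "NODE_1", "sequence": "ACGT"},
           1: {"header": "NODE_2", "sequence": "GG"}}
buf = io.StringIO()
write_assembly(contigs, filtered_ids=[1], handle=buf)
print(buf.getvalue())  # only NODE_1 is written
```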
contigs = None
  dict: Dictionary storing data for each contig.
filtered_ids = None
  list: List of filtered contig ids.
min_gc = None
  float: Minimum GC content allowed for a contig.
sample = None
  str: The name of the sample for the assembly.
report = None
  dict: Will contain the filtering results for each contig.
filters = None
  list: Initial filters checked when parsing the assembly file. These can later be changed using the 'filter_contigs' method.
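The filters attribute holds comparisons in the same [key, operator, value] form that filter_contigs accepts. A sketch of building an initial list from the constructor arguments; the helper name initial_filters and the "kmer_cov" key name are assumptions, while "length" matches the key used in the filter_contigs example.

```python
from typing import List, Union

# One comparison: [contig key, operator string, threshold].
Comparison = List[Union[str, int, float]]


def initial_filters(min_contig_len: int, min_kmer_cov: int) -> List[Comparison]:
    """Builds the [key, operator, value] triples checked while parsing.

    The "kmer_cov" key name is a hypothetical choice for this sketch.
    """
    return [
        ["length", ">=", min_contig_len],
        ["kmer_cov", ">=", min_kmer_cov],
    ]


print(initial_filters(150, 2))  # [['length', '>=', 150], ['kmer_cov', '>=', 2]]
```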
filter_contigs(*comparisons)
Filters the contigs of the assembly according to user-provided comparisons.
Each comparison must be a list of three elements: the contigs key, the operator and the test value. For example, to filter contigs with a minimum length of 250, a comparison would be:
self.filter_contigs(["length", ">=", 250])
The ids of the filtered contigs are stored in the filtered_ids list. The result of the test for every contig is stored in the report dictionary.
Parameters:
- comparisons : list
  List with the contig key, operator and value to test.
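The comparison mechanism can be sketched with Python's operator module, mapping each operator string to a function and applying every [key, operator, value] triple to each contig. This is a standalone approximation, not the module's implementation: the supported operator set, recording only the first failing test, and the "key/value/threshold" report string are assumptions.

```python
import operator
from typing import Dict, List, Sequence, Tuple

# Maps the operator strings used in comparisons to Python functions.
OPERATORS = {
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
    "==": operator.eq, "!=": operator.ne,
}


def filter_contigs(contigs: Dict[int, dict],
                   *comparisons: Sequence) -> Tuple[List[int], Dict[int, str]]:
    """Applies each [key, operator, value] comparison to every contig.

    Returns the ids of contigs failing any test and a per-contig report.
    """
    filtered_ids: List[int] = []
    report: Dict[int, str] = {}
    for contig_id, contig in contigs.items():
        report[contig_id] = "pass"
        for key, op, value in comparisons:
            if not OPERATORS[op](contig[key], value):
                filtered_ids.append(contig_id)
                # Record the first failing test for this contig.
                report[contig_id] = "{}/{}/{}".format(key, contig[key], value)
                break
    return filtered_ids, report


contigs = {0: {"length": 300}, 1: {"length": 120}}
ids, report = filter_contigs(contigs, ["length", ">=", 250])
print(ids, report)  # [1] {0: 'pass', 1: 'length/120/250'}
```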
get_assembly_length()
Returns the length of the assembly, without the filtered contigs.
Returns:
- x : int
  Total length of the assembly.
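Conceptually, this amounts to summing the lengths of the contigs whose ids are not in filtered_ids. A standalone sketch, assuming each contig entry stores its length under the "length" key used in the filter_contigs example; it is not the module's own code.

```python
from typing import Dict, Iterable


def get_assembly_length(contigs: Dict[int, dict],
                        filtered_ids: Iterable[int]) -> int:
    """Sums contig lengths, skipping contigs whose id was filtered out."""
    excluded = set(filtered_ids)
    return sum(c["length"] for cid, c in contigs.items() if cid not in excluded)


contigs = {0: {"length": 300}, 1: {"length": 120}, 2: {"length": 1000}}
print(get_assembly_length(contigs, [1]))  # 1300
```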