flowcraft.templates.assembly_report module
Purpose
This module generates a summary report for a given assembly in Fasta format.
Expected input
The following variables are expected whether using Nextflow or the main() executor.
- sample_id: Sample identification string, e.g. 'SampleA'.
- assembly: Path to the assembly file in Fasta format, e.g. 'assembly.fasta'.
Generated output
- ${sample_id}_assembly_report.csv: CSV file with summary information about the assembly, e.g. 'SampleA_assembly_report.csv'.
Code documentation
class flowcraft.templates.assembly_report.Assembly(assembly_file, sample_id)
Bases: object
Class that parses and filters an assembly file in Fasta format.
This class parses an assembly file, collects a number of summary statistics and metadata from its contigs, and reports them.
Parameters:
- assembly_file : str
  Path to the assembly file.
- sample_id : str
  Name of the sample for the current assembly.
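The parsing step described above can be sketched as follows. This is a minimal illustration, not the module's own code: parse_fasta is a hypothetical helper, and it assumes that headers are stored without the leading '>' and that multi-line sequences are joined with surrounding whitespace stripped. The result has the same shape as the contigs attribute, an OrderedDict mapping headers to sequences.

```python
from collections import OrderedDict


def parse_fasta(path):
    """Hypothetical helper: map each Fasta header to its sequence.

    Assumptions: the leading '>' is dropped from headers, blank lines
    are ignored, and sequence lines are concatenated after stripping.
    """
    contigs = OrderedDict()
    header = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # Store the previous record before starting a new one
                if header is not None:
                    contigs[header] = "".join(chunks)
                header = line[1:]
                chunks = []
            else:
                chunks.append(line)
    # Store the last record in the file
    if header is not None:
        contigs[header] = "".join(chunks)
    return contigs
```

Keeping an OrderedDict preserves the contig order of the input file, which matters for any later step that walks the contigs as one concatenated genome.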
Methods
- get_coverage_sliding(self, coverage_file[, …])
- get_gc_sliding(self[, window]): Calculates a sliding window of the GC content for the assembly.
- get_summary_stats(self[, output_csv]): Generates a CSV report with summary statistics about the assembly.
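A sliding-window GC calculation like the one get_gc_sliding describes might look like the sketch below. It is an assumption-laden illustration, not the method's implementation: gc_sliding is a hypothetical function, the windows are taken as consecutive and non-overlapping, the last window may be shorter than window, every character (including N) counts toward the window length, and values are rounded to two decimals.

```python
def gc_sliding(sequence, window):
    """Hypothetical sketch: GC proportion of consecutive windows.

    Assumptions: windows do not overlap, the final window may be
    shorter than `window`, and N characters count toward the length.
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    seq = sequence.upper()
    proportions = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]  # never empty: start < len(seq)
        gc = chunk.count("G") + chunk.count("C")
        proportions.append(round(gc / len(chunk), 2))
    return proportions


print(gc_sliding("GCATGC", 4))
```

For the six-base sequence shown, the windows are "GCAT" and "GC", so the printed result is [0.5, 1.0]. Applied to the concatenation of all contigs, the same loop gives one value per window across the whole assembly.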
summary_info = None
OrderedDict: Summary information dictionary. Contains the keys:
- ncontigs: Number of contigs
- avg_contig_size: Average size of contigs
- n50: N50 metric
- total_len: Total assembly length
- avg_gc: Average GC proportion
- missing_data: Count of missing data characters
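The keys listed above can be filled from a header-to-sequence mapping as sketched here. The n50 function uses the standard definition: sort contig lengths in decreasing order and return the length at which the running total first reaches half of the assembly length. The summarize function is hypothetical, and two of its choices are assumptions: avg_gc is taken as the GC proportion of the whole assembly (not a mean of per-contig values), and missing_data counts 'N' characters only.

```python
from collections import OrderedDict


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half
    of the total assembly length (standard N50 definition)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= total / 2:
            return length
    return 0  # only reached for an empty assembly


def summarize(contigs):
    """Hypothetical sketch of the summary_info keys.

    Assumptions: avg_gc is the GC proportion of the concatenated
    assembly and missing_data counts 'N'/'n' characters.
    """
    seqs = list(contigs.values())
    lengths = [len(s) for s in seqs]
    total_len = sum(lengths)
    joined = "".join(seqs).upper()
    gc = joined.count("G") + joined.count("C")
    return OrderedDict([
        ("ncontigs", len(seqs)),
        ("avg_contig_size", total_len / len(seqs) if seqs else 0),
        ("n50", n50(lengths)),
        ("total_len", total_len),
        ("avg_gc", gc / total_len if total_len else 0),
        ("missing_data", joined.count("N")),
    ])


print(n50([2, 5, 1, 4, 3]))
```

For lengths 5, 4, 3, 2, and 1 (total 15), the running sum first reaches 7.5 after the contig of length 4, so the printed N50 is 4.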
contigs = None
OrderedDict: Maps the contig headers to the corresponding sequence.
contig_coverage = None
OrderedDict: Maps the contig headers to the corresponding list of per-base coverage values.
sample = None
str: Sample id.
contig_boundaries = None
dict: Maps the boundaries of each contig in the genome.
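One way to build the contig boundary mapping is to walk the contigs in file order and record where each one starts and ends in the concatenated genome. This is a hypothetical sketch: the function name, the tuple representation, and the 0-based, half-open coordinates (a contig occupies positions start to end - 1) are all assumptions, not the documented format of contig_boundaries.

```python
def contig_boundaries(contigs):
    """Hypothetical sketch: map each header to its (start, end) span
    in the genome formed by concatenating contigs in insertion order.

    Assumption: coordinates are 0-based and half-open.
    """
    boundaries = {}
    position = 0
    for header, seq in contigs.items():
        boundaries[header] = (position, position + len(seq))
        position += len(seq)
    return boundaries


print(contig_boundaries({"a": "ACGT", "b": "GG"}))
```

The printed result is {'a': (0, 4), 'b': (4, 6)}. A mapping like this lets a window position computed over the whole assembly, such as a sliding-window GC value, be traced back to the contig it falls in.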
get_summary_stats(self, output_csv=None)
Generates a CSV report with summary statistics about the assembly.
The calculated statistics are:
- Number of contigs
- Average contig size
- N50
- Total assembly length
- Average GC content
- Amount of missing data
Parameters:
- output_csv : str
  Name of the output CSV file.
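Writing the report could look like the sketch below, which uses the standard csv module. The layout is an assumption and may differ from the real file: one header row holding "Sample" followed by the summary keys, then one data row. Falling back to the ${sample_id}_assembly_report.csv name from Generated output when output_csv is None is also a hypothetical choice. write_summary_csv is not part of the module.

```python
import csv


def write_summary_csv(sample_id, summary, output_csv=None):
    """Hypothetical sketch: write a header row and a single data row.

    Assumptions: the first column is the sample id, and a missing
    file name falls back to "<sample_id>_assembly_report.csv".
    """
    if output_csv is None:
        output_csv = "{}_assembly_report.csv".format(sample_id)
    # newline="" lets the csv module control line endings
    with open(output_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Sample"] + list(summary.keys()))
        writer.writerow([sample_id] + list(summary.values()))
    return output_csv
```

Passing an OrderedDict such as summary_info keeps the columns in the order of its keys, so the header and data rows always line up.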