Integrity_coverage¶
Purpose¶
This component is intended to test the integrity of the provided FastQ files.
It does so by attempting to parse uncompressed or compressed (gz, bz2 or zip) FastQ files (paired-end or single-end). During this parse, if the
FastQ files are not corrupt, it retrieves the following information:
- sequence encoding: Estimates the sequence encoding based on the quality scores. This information can then be passed to other components that might require it.
- estimated coverage: Provides a rough coverage estimate for each sample based on a user-provided genome size (see Parameters). This estimate is essentially
\[\frac{\text{number of base pairs}}{\text{genome size} \times 10^{6}}\]
with the genome size expressed in megabases (hence the factor of \(10^{6}\)). This information is written to the reports directory (see Published reports).
- maximum read length: Retrieves the maximum read length for each sample.
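The coverage and maximum read length described above can be illustrated with a minimal Python sketch. It is not the component's actual implementation: the function names `open_fastq` and `summarize_fastq` are hypothetical, the genome size is assumed to be given in megabases, each FastQ record is assumed to span exactly four lines, and zip archives are left out for brevity.

```python
import bz2
import gzip


def open_fastq(path):
    """Open a plain, gzip or bzip2 FastQ file in text mode.

    The compression type is detected from the file's leading magic bytes.
    """
    with open(path, "rb") as handle:
        magic = handle.read(3)
    if magic[:2] == b"\x1f\x8b":  # gzip signature
        return gzip.open(path, "rt")
    if magic == b"BZh":  # bzip2 signature
        return bz2.open(path, "rt")
    return open(path, "rt")


def summarize_fastq(paths, genome_size_mb):
    """Return (total base pairs, maximum read length, estimated coverage).

    Assumes strict four-line FastQ records, so the sequence is always
    the second line of each record.
    """
    total_bp = 0
    max_len = 0
    for path in paths:
        with open_fastq(path) as handle:
            for line_number, line in enumerate(handle):
                if line_number % 4 == 1:  # sequence line
                    read_len = len(line.strip())
                    total_bp += read_len
                    max_len = max(max_len, read_len)
    # Coverage = number of base pairs / (genome size in Mb * 10^6)
    coverage = total_bp / (genome_size_mb * 1e6)
    return total_bp, max_len, coverage
```

For a paired-end sample, both files would be passed together, for example `summarize_fastq(["sample_1.fastq.gz", "sample_2.fastq.gz"], 3)`. A truncated or corrupt compressed file would typically raise an exception while it is being read, which is one way a parse like this can flag a sample as corrupted.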
Important
If the minCoverage parameter is set to a value higher than 0, this
component will filter out samples with an estimated coverage below that
threshold.
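The filtering rule can be sketched as follows. This is only an illustration under stated assumptions: `filter_by_coverage` is a hypothetical helper, and the input is assumed to be a mapping from sample name to estimated coverage.

```python
def filter_by_coverage(coverages, min_coverage):
    """Split samples into those that proceed and those filtered out.

    `coverages` maps a sample name to its estimated coverage.
    A threshold of 0 (or lower) lets every sample through.
    """
    if min_coverage <= 0:
        return dict(coverages), {}
    # Samples strictly below the threshold are filtered out.
    passed = {s: c for s, c in coverages.items() if c >= min_coverage}
    failed = {s: c for s, c in coverages.items() if c < min_coverage}
    return passed, failed
```

With `min_coverage` set to 15, a sample estimated at 30x would proceed while one estimated at 5x would be dropped.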
Input/Output type¶
- Input type: FastQ
- Output type: FastQ
Note
The default input parameter for FastQ data is --fastq. You can change
the default pattern of the --fastq parameter (fastq/*_{1,2}.*) according
to your input file names (e.g.: --fastq "path/to/fastq/*R{1,2}.*").
Parameters¶
- genomeSize: Genome size estimate for the samples. It is used to estimate the coverage and other assembly parameters and checks.
- minCoverage: Minimum coverage for a sample to proceed. Can be set to 0 to allow any coverage.
Note
You can use these parameters as in the following example: --genomeSize 3.
Published results¶
None.
Published reports¶
- reports/coverage: CSV table with the estimated sequencing coverage for each sample.
- reports/corrupted: Text file with the list of corrupted samples.
Default directives¶
None.