Result files
Marple produces many intermediate and result files, but only the files defined in output_files.yaml are kept. The rest are temporary and are deleted as soon as no subsequent rule needs them. To keep files other than the predefined ones, either add them to output_files.yaml or add --notemp to the run command.
Files
The following output files are located in the Results/ folder:
| File | Format | Description |
|---|---|---|
| multiqc_DNA.html | html | Aggregated QC values for the entire sequencing run; open in a browser |
| {sample}/{sample}.xlsx | xlsx | Excel file with QC stats (primarily coverage) for each sample |
| {sample}/{sample}_N.cram | cram | Deduplicated alignment file |
| {sample}/{sample}_N.cram.crai | crai | Index for alignment file |
| {sample}/{sample}.hard-filtered.vcf.gz | vcf.gz | Compressed VCF file, decomposed, normalized and annotated with VEP |
| {sample}/{sample}.hard-filtered.vcf.gz.tbi | tbi | Index for variant file |
| {sample}/{sample}.genome.vcf.gz | genome.vcf.gz | Compressed VCF file with all positions in the design, neither decomposed nor normalized |
| {sample}/{sample}.genome.vcf.gz.tbi | tbi | Index for genome VCF file |
| {sample}/{sample}_exomedepth_SV.txt | txt | Nexus SV text file with structural variants from ExomeDepth |
| {sample}/{sample}_exomedepth.aed | aed | AED text file with structural variants from ExomeDepth |
| {sample}/{sample}.cnv.vcf.gz | vcf.gz | Compressed VCF file with structural variants from ExomeDepth |
| {sample}/{sample}.cnv.vcf.gz.tbi | tbi | Index for ExomeDepth variant file |
| {sample}/mobile_elements/{sample}.ALU.vcf.gz | vcf.gz | Compressed VCF file with predicted ALU elements |
| {sample}/mobile_elements/{sample}.LINE1.vcf.gz | vcf.gz | Compressed VCF file with predicted LINE1 elements |
| {sample}/mobile_elements/{sample}.HERVK.vcf.gz | vcf.gz | Compressed VCF file with predicted HERVK elements |
| {sample}/mobile_elements/{sample}.SVA.vcf.gz | vcf.gz | Compressed VCF file with predicted SVA elements |
| {sample}/mosaic/{sample}.deepmosaic.txt | tsv | Candidate variants and their predictions from DeepMosaic |
| {sample}/mosaic/{sample}.deepsomatic.vcf.gz | vcf.gz | Compressed VCF file from DeepSomatic, where PASS variants are possible mosaic variants |
| {sample}/mosaic/{sample}.deepsomatic.vcf.gz.tbi | tbi | Index for DeepSomatic VCF file |
| {sample}/mosaic/{sample}.mosaicforecast.phasing | tsv | Candidate mosaic variants based on phasing from MosaicForecast |
| {sample}/mosaic/{sample}.mosaicforecast.DEL.predictions | tsv | Candidate deletion variants and their predictions from MosaicForecast |
| {sample}/mosaic/{sample}.mosaicforecast.INS.predictions | tsv | Candidate insertion variants and their predictions from MosaicForecast |
| {sample}/mosaic/{sample}.mosaicforecast.SNP.predictions | tsv | Candidate SNP variants and their predictions from MosaicForecast |
| {sequenceid}_config.yaml | yaml | Config file with program versions and extra settings used |
| {sequenceid}_config_exomedepth.yaml | yaml | Config file recording which reference was used for ExomeDepth |
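To check that a run produced the expected per-sample files, the {sample} patterns from the table can be expanded for each sample and compared with what is on disk. The sketch below does this for a subset of the table; the function name missing_result_files and the chosen subset are illustrative assumptions, not part of Marple, and the {sequenceid} files, mobile element and mosaic outputs are left out for brevity.

```python
from pathlib import Path

# Per-sample paths taken from the table above (a subset; extend as needed).
# "{sample}" is replaced by each sample name.
PER_SAMPLE_TEMPLATES = [
    "{sample}/{sample}.xlsx",
    "{sample}/{sample}_N.cram",
    "{sample}/{sample}_N.cram.crai",
    "{sample}/{sample}.hard-filtered.vcf.gz",
    "{sample}/{sample}.hard-filtered.vcf.gz.tbi",
    "{sample}/{sample}.genome.vcf.gz",
    "{sample}/{sample}.genome.vcf.gz.tbi",
    "{sample}/{sample}.cnv.vcf.gz",
    "{sample}/{sample}.cnv.vcf.gz.tbi",
]


def missing_result_files(results_dir, samples, templates=PER_SAMPLE_TEMPLATES):
    """Return the expected per-sample files that do not exist under results_dir."""
    root = Path(results_dir)
    missing = []
    for sample in samples:
        for template in templates:
            path = root / template.format(sample=sample)
            if not path.is_file():
                missing.append(path)
    return missing
```

For example, missing_result_files("Results", ["sampleA", "sampleT"]) returns an empty list when every listed file exists for both (hypothetical) samples.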
MultiQC report
Marple produces a MultiQC report for the entire sequencing run to make QC tracking easier. The report starts with a General Statistics table showing the most important QC values, followed by additional QC data and plots. The whole report is interactive: you can filter, highlight, hide or export data using the toolbox at the right edge of the report.
The report is configured with the MultiQC config file config.yaml, currently set up as follows:
```yaml
title: "Clinical Genomics MultiQC Report"
subtitle: "Twist Cancer"
intro_text: "The MultiQC report summarize analysis results from Twist Cancer data that been analyzed by the pipeline marple_rd_tc (https://github.com/clinical-genomics-uppsala/marple-rd-tc). Reference used: GRCh38."
report_header_info:
  - Contact E-mail: "igp-klinsek-bioinfo@lists.uu.se"
  - Application Type: "Twist Cancer Panel"
  - Project Type: "Clinical Samples"
  # - Sequencing Platform: "HiSeq 2500 High Output V4"
  # - Sequencing Setup: "2x150"
decimalPoint_format: ','
## Must adjust the config a bit more: 20x breadth, insert size and bases on target. Fold80?
extra_fn_clean_exts: ## from this until end
  - '.duplication_metrics'
  - '_N'
custom content:
  order:
    - fastqc
    - mosdepth
    - fastp
    - peddy
    - samtools
    - picard
mosdepth_config:
  include_contigs:
    - "chr1"
    - "chr2"
    - "chr3"
    - "chr4"
    - "chr5"
    - "chr6"
    - "chr7"
    - "chr8"
    - "chr9"
    - "chr10"
    - "chr11"
    - "chr12"
    - "chr13"
    - "chr14"
    - "chr15"
    - "chr16"
    - "chr17"
    - "chr18"
    - "chr19"
    - "chr20"
    - "chr21"
    - "chr22"
read_count_multiplier: 0.001
read_count_prefix: "K"
read_count_desc: "thousands"
table_columns_visible:
  FastQC:
    percent_duplicates: False
    percent_gc: False
    avg_sequence_length: False
    percent_fails: False
    total_sequences: False
  fastp:
    pct_adapter: True
    pct_surviving: False
    after_filtering_gc_content: False
    filtering_result_passed_filter_reads: False
    after_filtering_q30_bases: False
    after_filtering_q30_rate: False
    pct_duplication: False
  mosdepth:
    median_coverage: True
    mean_coverage: False
    1_x_pc: False
    5_x_pc: False
    10_x_pc: False
    20_x_pc: False
    30_x_pc: True
    50_x_pc: True
  Picard:
    PCT_PF_READS_ALIGNED: False
    summed_median: False
    summed_mean: True
    PERCENT_DUPLICATION: True
    MEDIAN_COVERAGE: False
    MEAN_COVERAGE: False
    SD_COVERAGE: False
    PCT_30X: False
    PCT_TARGET_BASES_30X: False
    FOLD_ENRICHMENT: False
    TOTAL_READS: False
  Samtools:
    error_rate: False
    non-primary_alignments: False
    reads_mapped: False
    reads_mapped_percent: True
    reads_properly_paired_percent: True
    reads_MQ0_percent: False
    raw_total_sequences: True # only on bed file, not total of FASTQ; bases on target only
# Patrik's plugin: add custom columns to General Statistics
multiqc_cgs:
  Picard:
    FOLD_80_BASE_PENALTY:
      title: "Fold80"
      description: "Fold80 penalty from picard hs metrics"
      min: 1
      max: 3
      scale: "RdYlGn-rev"
      format: "{:.1f}"
    PCT_SELECTED_BASES:
      title: "Bases on Target"
      description: "On+Near Bait Bases / PF Bases Aligned from Picard HsMetrics"
      format: "{:.2%}"
    ZERO_CVG_TARGETS_PCT:
      title: "Target bases with zero coverage [%]"
      description: "Target bases with zero coverage [%] from Picard"
      min: 0
      max: 100
      scale: "RdYlGn-rev"
      format: "{:.2%}"
  Samtools:
    average_quality:
      title: "Average Quality"
      description: "Ratio between the sum of base qualities and total length from Samtools stats"
      min: 0
      max: 60
      scale: "RdYlGn"
  mosdepth:
    20_x_pc: # Can't get it to work
      title: "20x percent"
      description: "Fraction of genome with at least 20X coverage"
      max: 100
      min: 0
      suffix: "%"
      scale: "RdYlGn"
# Applies to all columns regardless of module!
table_columns_placement:
  mosdepth:
    median_coverage: 601
    1_x_pc: 666
    5_x_pc: 666
    10_x_pc: 602
    20_x_pc: 603
    30_x_pc: 604
    50_x_pc: 605
  Samtools:
    raw_total_sequences: 500
    reads_mapped: 501
    reads_mapped_percent: 502
    reads_properly_paired_percent: 503
    average_quality: 504
    error_rate: 555
    reads_MQ0_percent: 555
    non-primary_alignments: 555
  Picard:
    TOTAL_READS: 500
    PCT_SELECTED_BASES: 801
    FOLD_80_BASE_PENALTY: 802
    PCT_PF_READS_ALIGNED: 888
    summed_median: 888
    PERCENT_DUPLICATION: 803
    summed_mean: 804
    STANDARD_DEVIATION: 805
    ZERO_CVG_TARGETS_PCT: 888
    MEDIAN_COVERAGE: 888
    MEAN_COVERAGE: 888
    SD_COVERAGE: 888
    PCT_30X: 888
    PCT_TARGET_BASES_30X: 888
    FOLD_ENRICHMENT: 888
```
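The two table_columns_* sections together decide which General Statistics columns are shown and in what order. The sketch below illustrates that interaction under stated assumptions: columns are sorted by ascending placement value, a column without a placement gets 1000, columns with equal placement keep their input order, and columns set to False in table_columns_visible are dropped. These rules are assumed to mirror MultiQC's behaviour rather than taken from its source, and PLACEMENT and HIDDEN are hand-transcribed subsets of the config above.

```python
# Assumed default for columns without an entry in table_columns_placement
DEFAULT_PLACEMENT = 1000

# Subset of table_columns_placement from the config above
PLACEMENT = {
    "mosdepth": {"median_coverage": 601, "20_x_pc": 603, "30_x_pc": 604, "50_x_pc": 605},
    "Samtools": {"raw_total_sequences": 500, "reads_mapped": 501,
                 "reads_mapped_percent": 502, "reads_properly_paired_percent": 503,
                 "average_quality": 504, "error_rate": 555},
    "Picard": {"TOTAL_READS": 500, "PCT_SELECTED_BASES": 801,
               "FOLD_80_BASE_PENALTY": 802, "PERCENT_DUPLICATION": 803,
               "summed_mean": 804, "ZERO_CVG_TARGETS_PCT": 888},
}

# Subset of the columns set to False in table_columns_visible
HIDDEN = {
    ("mosdepth", "20_x_pc"),
    ("Samtools", "reads_mapped"),
    ("Samtools", "error_rate"),
    ("Picard", "TOTAL_READS"),
}


def visible_column_order(columns, placement=PLACEMENT, hidden=HIDDEN):
    """Drop hidden (module, column) pairs and sort the rest by placement value."""
    shown = [col for col in columns if col not in hidden]
    # sorted() is stable, so columns with equal placement keep their input order
    return sorted(
        shown,
        key=lambda col: placement.get(col[0], {}).get(col[1], DEFAULT_PLACEMENT),
    )
```

With these values, fastp's pct_adapter (no placement) ends up last, after Picard's summed_mean at 804, which can be compared with the column list in the General Statistics section.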
General Statistics
The General Statistics table is ordered by the "S" index in the FASTQ file name, e.g. sampleT_S1_R1_001.fastq.gz comes before sampleA_S2_R1_001.fastq.gz. This is done by renaming the samples in two steps with the script sample_order_multiqc.py. To toggle between "Sample Order" and "Sample Name", use the buttons just above the General Statistics header.
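The ordering idea can be sketched as follows: parse the S index from each FASTQ name, sort numerically (so S2 comes before S10), and build an order label whose alphabetical order matches the sequencing order. This is not the code of sample_order_multiqc.py; the file-name pattern, the function names and the S01_sample label format are hypothetical choices for illustration.

```python
import re

# Assumed Illumina-style naming: <sample>_S<index>_R<1|2>_001.fastq.gz
FASTQ_NAME = re.compile(r"^(?P<sample>.+)_S(?P<index>\d+)_R[12]_\d{3}\.fastq\.gz$")


def parse_fastq_name(name):
    """Split a FASTQ file name into (sample, S index)."""
    match = FASTQ_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected FASTQ name: {name!r}")
    return match.group("sample"), int(match.group("index"))


def sample_order_rows(fastq_names):
    """Return (sample, order_label) rows sorted numerically by S index.

    The zero-padded index in the label makes a plain alphabetical sort of
    the labels reproduce the sequencing order.
    """
    # A set collapses the R1/R2 files of one sample into a single entry
    samples = {parse_fastq_name(name) for name in fastq_names}
    ordered = sorted(samples, key=lambda pair: pair[1])
    width = len(str(max(index for _, index in ordered)))
    return [(sample, f"S{index:0{width}d}_{sample}") for sample, index in ordered]
```

Rows like these could be written to a tab-separated file for MultiQC's sample-name replacement options, which can add buttons for switching between sets of names; whether Marple relies on exactly this mechanism is an assumption.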
| Column Name | Origin | Comment |
|---|---|---|
| K Reads | Samtools stats | Total number of reads in the input file (alignment/samtools_merge_bam/{sample}_{type}.bam) |
| % Mapped | Samtools stats | Percent of reads mapped anywhere in the reference (no design file used) |
| % Proper pairs | Samtools stats | Percent of reads that are properly paired. Only reads on target (config[reference][design_bed]) |
| Average Quality | Samtools stats | Ratio between the sum of base qualities and the total length. Only reads on target (config[reference][design_bed]) |
| Median | Mosdepth | Median coverage over the coding exons in the design (config[reference][exon_bed]) |
| >= 30X | Mosdepth | Fraction of bases in coding exons (config[reference][exon_bed]) with at least 30x coverage |
| >= 50X | Mosdepth | Fraction of bases in coding exons (config[reference][exon_bed]) with at least 50x coverage |
| Bases on Target | Picard HSMetrics | Fraction of aligned bases on or near the capture design (config[reference][design_intervals]) |
| Fold80 | Picard HSMetrics | The fold over-coverage necessary to raise 80% of bases in "non-zero-cvg" targets to the mean coverage level in those targets (config[reference][design_intervals]) |
| % Dups | Picard DuplicationMetrics | |
| Mean Insert Size | Picard InsertSizeMetrics | |
| Target Bases with zero coverage [%] | Picard HSMetrics | Percent of target bases (config[reference][design_intervals]) with zero coverage |
| % Adapter | fastp | |
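To make the >= 30X and Fold80 definitions concrete, the sketch below computes both from a list of per-base depths over target bases. The helper names are hypothetical, and the Fold80 function is an approximation of Picard's definition (mean depth of covered bases divided by the depth that at least 80% of those bases reach), not Picard's implementation, whose histogram-based details may differ.

```python
from statistics import fmean


def pct_at_least(depths, threshold):
    """Percent of bases with depth >= threshold (cf. the >= 30X and >= 50X columns)."""
    if not depths:
        raise ValueError("no depths given")
    return 100 * sum(d >= threshold for d in depths) / len(depths)


def fold80_penalty(depths):
    """Approximate Fold80 from per-base depths.

    Zero-coverage bases are excluded, following the "non-zero-cvg" wording.
    The depth at index n // 5 of the sorted list is reached by at least 80%
    of the covered bases; scaling coverage by mean / that depth would lift
    those bases to the mean.
    """
    covered = sorted(d for d in depths if d > 0)
    if not covered:
        raise ValueError("no covered bases")
    p20 = covered[len(covered) // 5]
    return fmean(covered) / p20
```

For depths [0, 10, 20, ..., 100], the covered bases have mean 55 and 20th-percentile depth 30, so the sketch gives a Fold80 of about 1.83; perfectly uniform coverage gives 1.0.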