Pipeline-specific rules/software used in Marple

Rules created specifically for the Marple pipeline are listed here.

add_ref_to_vcf.smk

A Python script that adds a `##reference=` line with the reference path to the VCF header, so that Alissa knows which genome build was used. The `##reference=` line needs to contain either hg38 or GRCh38 for Alissa to understand that the reference is not hg19.
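The header edit described above can be sketched as follows. This is a minimal illustration, not the contents of `add_ref_to_vcf.py`, which is not shown here. The function names `add_reference_header` and `names_hg38_build` are hypothetical, and placing the new line directly before the `#CHROM` line is an assumption.

```python
def names_hg38_build(reference_path):
    """Return True if the path contains one of the build names Alissa looks for."""
    return "hg38" in reference_path or "GRCh38" in reference_path


def add_reference_header(lines, reference_path):
    """Yield VCF lines with a ##reference= line inserted before #CHROM.

    Assumption: the input has exactly one #CHROM header line.
    """
    for line in lines:
        if line.startswith("#CHROM"):
            yield f"##reference={reference_path}\n"
        yield line


# Hypothetical header and reference path, for illustration only.
vcf_lines = ["##fileformat=VCFv4.2\n", "#CHROM\tPOS\tID\n", "chr1\t100\t.\n"]
result = list(add_reference_header(vcf_lines, "/ref/GRCh38.fasta"))
```

For the gzipped input of the rule, the lines could be read with `gzip.open(path, "rt")` and written to the uncompressed output with a plain `open(path, "w")`.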

Rule

rule add_ref_to_vcf:
    input:
        vcf="snv_indels/deepvariant/{sample}_N.normalized.sorted.vep_annotated.vcf.gz",
        ref=config["reference"]["fasta"],
    output:
        vcf=temp("snv_indels/deepvariant/{sample}_N.normalized.sorted.vep_annotated.ref.vcf"),
    log:
        "snv_indels/deepvariant/{sample}_N.normalized.sorted.vep_annotated.ref.vcf.log",
    benchmark:
        repeat(
            "snv_indels/deepvariant/{sample}_N.normalized.sorted.vep_annotated.ref.vcf.benchmark.tsv",
            config.get("add_ref_to_vcf", {}).get("benchmark_repeats", 1),
        )
    resources:
        mem_mb=config.get("add_ref_to_vcf", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("add_ref_to_vcf", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("add_ref_to_vcf", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("add_ref_to_vcf", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("add_ref_to_vcf", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("add_ref_to_vcf", {}).get("container", config["default_container"])
    message:
        "{rule}: Add reference to the header of the deepvariant vcf: {input.vcf}"
    script:
        "../scripts/add_ref_to_vcf.py"

input / output files

Rule parameters	Key	Value	Description
input	vcf	`"snv_indels/deepvariant/{sample}_N.normalized.sorted.vep_annotated.vcf.gz"`	final vcf where reference should be added to vcf-header
input	ref	`config["reference"]["fasta"]`	fasta reference used
output	vcf	`"snv_indels/deepvariant/{sample}_N.normalized.sorted.vep_annotated.ref.vcf"`	final vcf with reference genome in vcf-header

Configuration

Software settings (`config.yaml`)

Key	Type	Description
benchmark_repeats	integer	set number of times benchmark should be repeated
container	string	name or path to docker/singularity container

Resources settings (`resources.yaml`)

Key	Type	Description
mem_mb	integer	max memory in MB to be available
mem_per_cpu	integer	memory in MB used per cpu
partition	string	partition to use on cluster
threads	integer	number of threads to be available
time	string	max execution time
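Every rule on this page reads its settings with the same `config.get(rule, {}).get(key, default)` pattern. The sketch below shows how that lookup falls back to `default_resources` when a rule has no entry of its own. The helper name `rule_setting` and all values in the example config are hypothetical.

```python
def rule_setting(config, rule, key):
    """Return the rule-specific value for key, else the default_resources value.

    As in the rules, the default is looked up eagerly, so a key missing
    from default_resources raises KeyError even if the rule overrides it.
    """
    return config.get(rule, {}).get(key, config["default_resources"][key])


# Hypothetical merged configuration, for illustration only.
config = {
    "default_resources": {"mem_mb": 6144, "threads": 1, "time": "4:00:00"},
    "add_ref_to_vcf": {"mem_mb": 12288},
}

mem_mb = rule_setting(config, "add_ref_to_vcf", "mem_mb")    # rule override
threads = rule_setting(config, "add_ref_to_vcf", "threads")  # falls back to default
```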

exomedepth_export.smk

An R script that creates output files from exomedepth results.

Rule

rule exomedepth_export:
    input:
        exon="cnv_sv/exomedepth_call/{sample}_{type}.RData",
    output:
        aed=temp("cnv_sv/exomedepth_call/{sample}_{type}.aed"),
        nexus_sv=temp("cnv_sv/exomedepth_call/{sample}_{type}_SV.txt"),
    params:
        extra=config.get("exomedepth_export", {}).get("extra", ""),
    log:
        "cnv_sv/exomedepth_call/{sample}_{type}_SV.txt.log",
    benchmark:
        repeat(
            "cnv_sv/exomedepth_call/{sample}_{type}_SV.txt.benchmark.tsv",
            config.get("exomedepth_export", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("exomedepth_export", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("exomedepth_export", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("exomedepth_export", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("exomedepth_export", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("exomedepth_export", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("exomedepth_export", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("exomedepth_export", {}).get("container", config["default_container"])
    message:
        "{rule}: Export exomedepth CNV results from {input.exon} "
    script:
        "../scripts/exomedepth_export.R"

input / output files

Rule parameters	Key	Value	Description
input	exon	`"cnv_sv/exomedepth_call/{sample}_{type}.RData"`	RData from exomedepth call
output	aed	`"cnv_sv/exomedepth_call/{sample}_{type}.aed"`	calls from exomedepth in aed format
output	nexus_sv	`"cnv_sv/exomedepth_call/{sample}_{type}_SV.txt"`	nexus SV txt file with exomedepth calls

Configuration

Software settings (`config.yaml`)

Key	Type	Description
container	string	name or path to docker/singularity container

Resources settings (`resources.yaml`)

Key	Type	Description
threads	integer	number of threads that will be used by exomedepth_export
time	string	max execution time for exomedepth_export
mem_mb	integer	memory used for exomedepth_export
mem_per_cpu	integer	memory used per cpu for exomedepth_export
partition	string	partition to use on the cluster for exomedepth_export

export_qc.smk

Rules that create a .xlsx file per sample with aggregated coverage information.

Rule

rule export_qc_bedtools_intersect:
    input:
        left="qc/mosdepth_bed/{sample}_{type}.per-base.bed.gz",
        coverage_csi="qc/mosdepth_bed/{sample}_{type}.per-base.bed.gz.csi",
        right=config["reference"]["exon_bed"],
    output:
        results=temp("qc/mosdepth_bed/{sample}_{type}.mosdepth.per-base.exon_bed.txt"),
    params:
        extra=config.get("export_qc_bedtools_intersect", {}).get("extra", ""),
    log:
        "qc/mosdepth_bed/{sample}_{type}.mosdepth.per-base.exon_bed.log",
    benchmark:
        repeat(
            "qc/mosdepth_bed/{sample}_{type}.mosdepth.per-base.exon_bed.benchmark.tsv",
            config.get("export_qc_bedtools_intersect", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("export_qc_bedtools_intersect", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("export_qc_bedtools_intersect", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("export_qc_bedtools_intersect", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("export_qc_bedtools_intersect", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("export_qc_bedtools_intersect", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("export_qc_bedtools_intersect", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("export_qc_bedtools_intersect", {}).get("container", config["default_container"])
    message:
        "{rule}: export low cov regions from {input.left} based on {input.right}"
    wrapper:
        "v1.32.0/bio/bedtools/intersect"

input / output files

Rule parameters	Key	Value	Description
input	left	`"qc/mosdepth_bed/{sample}_{type}.per-base.bed.gz"`	per-base coverage file from mosdepth
input	coverage_csi	`"qc/mosdepth_bed/{sample}_{type}.per-base.bed.gz.csi"`	index file for per-base.bed.gz file
input	right	`config["reference"]["exon_bed"]`	exon bed file used to restrict the coverage to the regions of interest
output	results	`"qc/mosdepth_bed/{sample}_{type}.mosdepth.per-base.exon_bed.txt"`	.txt file with coverage per base for design file
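The rule itself delegates to the bedtools intersect wrapper. To illustrate what the intersection produces with default options (the overlapping part of each per-base interval, keeping its depth), here is a plain-Python sketch. It is a stand-in for explanation only, not what the pipeline runs, and the function name `intersect` and the example intervals are hypothetical.

```python
def intersect(per_base, regions):
    """Clip (chrom, start, end, depth) intervals to the overlapping regions.

    Intervals are half-open, as in BED files.
    """
    clipped = []
    for chrom, start, end, depth in per_base:
        for r_chrom, r_start, r_end in regions:
            if chrom != r_chrom:
                continue
            lo, hi = max(start, r_start), min(end, r_end)
            if lo < hi:  # non-empty overlap
                clipped.append((chrom, lo, hi, depth))
    return clipped


# Hypothetical per-base coverage and one exon, for illustration only.
per_base = [("chr1", 0, 100, 0), ("chr1", 100, 250, 35), ("chr1", 250, 400, 12)]
exons = [("chr1", 200, 300)]
covered = intersect(per_base, exons)
```

Anything passed through `extra` can change which columns bedtools reports, so the actual output layout depends on the configuration.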

Configuration

Software settings (`config.yaml`)

Key	Type	Description
container	string	path to container with bedtools (common)
extra	string	extra configuration for bedtools intersect

Resources settings (`resources.yaml`)

Key	Type	Description
threads	integer	number of threads that will be used by export_qc_bedtools_intersect
time	string	max execution time for export_qc_bedtools_intersect
mem_mb	integer	memory used for export_qc_bedtools_intersect
mem_per_cpu	integer	memory used per cpu for export_qc_bedtools_intersect
partition	string	partition to use on the cluster for export_qc_bedtools_intersect

Rule

rule export_qc_bedtools_intersect_pgrs:
    input:
        left="qc/mosdepth_bed/{sample}_{type}.per-base.bed.gz",
        coverage_csi="qc/mosdepth_bed/{sample}_{type}.per-base.bed.gz.csi",
        right=config["reference"]["pgrs_bed"],
    output:
        results=temp("qc/mosdepth_bed/{sample}_{type}.mosdepth.pgrs_cov.txt"),
    params:
        extra=config.get("export_qc_bedtools_intersect", {}).get("extra", ""),
    log:
        "qc/mosdepth_bed/{sample}_{type}.mosdepth.pgrs_cov.log",
    benchmark:
        repeat(
            "qc/mosdepth_bed/{sample}_{type}.mosdepth.pgrs_cov.benchmark.tsv",
            config.get("export_qc_bedtools_intersect", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("export_qc_bedtools_intersect", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("export_qc_bedtools_intersect", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("export_qc_bedtools_intersect", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("export_qc_bedtools_intersect", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("export_qc_bedtools_intersect", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("export_qc_bedtools_intersect", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("export_qc_bedtools_intersect", {}).get("container", config["default_container"])
    message:
        "{rule}: export low cov regions from {input.left} based on {input.right}"
    wrapper:
        "v1.32.0/bio/bedtools/intersect"

input / output files

Rule parameters	Key	Value	Description
input	left	`"qc/mosdepth_bed/{sample}_{type}.per-base.bed.gz"`	per-base coverage file from mosdepth
input	coverage_csi	`"qc/mosdepth_bed/{sample}_{type}.per-base.bed.gz.csi"`	index file for per-base.bed.gz file
input	right	`config["reference"]["pgrs_bed"]`	bed file used to restrict the coverage to the regions of interest, in this case PGRS positions
output	results	`"qc/mosdepth_bed/{sample}_{type}.mosdepth.pgrs_cov.txt"`	.txt file with coverage per base for the PGRS positions

Configuration

Software settings (`config.yaml`)

Key	Type	Description
container	string	path to container with bedtools

Resources settings (`resources.yaml`)

Key	Type	Description
threads	integer	number of threads that will be used by export_qc_bedtools_intersect_pgrs
time	string	max execution time for export_qc_bedtools_intersect_pgrs
mem_mb	integer	memory used for export_qc_bedtools_intersect_pgrs
mem_per_cpu	integer	memory used per cpu for export_qc_bedtools_intersect_pgrs
partition	string	partition to use on the cluster for export_qc_bedtools_intersect_pgrs

Rule

rule export_qc_xlsx_report:
    input:
        mosdepth_summary="qc/mosdepth_bed/{sample}_{type}.mosdepth.summary.txt",
        mosdepth_thresholds="qc/mosdepth_bed/{sample}_{type}.thresholds.bed.gz",
        mosdepth_regions="qc/mosdepth_bed/{sample}_{type}.regions.bed.gz",
        mosdepth_perbase="qc/mosdepth_bed/{sample}_{type}.mosdepth.per-base.exon_bed.txt",
        picard_dup="qc/picard_collect_duplication_metrics/{sample}_{type}.duplication_metrics.txt",
        pgrs_coverage="qc/mosdepth_bed/{sample}_{type}.mosdepth.pgrs_cov.txt",
        design_bed=config["reference"]["design_bed"],
        pgrs_bed=config["reference"]["pgrs_bed"],
        wanted_transcripts=config["export_qc_xlsx_report"]["wanted_transcripts"],
    output:
        results=temp("qc/xlsx_report/{sample}_{type}.xlsx"),
    params:
        coverage_thresholds=config["mosdepth_bed"]["thresholds"],
        sequenceid=config["sequenceid"],
    log:
        "qc/xlsx_report/{sample}_{type}.xlsx.log",
    benchmark:
        repeat(
            "qc/xlsx_report/{sample}_{type}.xlsx.benchmark.tsv",
            config.get("export_qc_xlsx_report", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("export_qc_xlsx_report", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("export_qc_xlsx_report", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("export_qc_xlsx_report", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("export_qc_xlsx_report", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("export_qc_xlsx_report", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("export_qc_xlsx_report", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("export_qc_xlsx_report", {}).get("container", config["default_container"])
    message:
        "{rule}: collecting qc values into {output}"
    # localrule: True
    script:
        "../scripts/export_qc_xlsx_report.py"

input / output files

Rule parameters	Key	Value	Description
input	mosdepth_summary	`"qc/mosdepth_bed/{sample}_{type}.mosdepth.summary.txt"`	mosdepth bed summary file
input	mosdepth_thresholds	`"qc/mosdepth_bed/{sample}_{type}.thresholds.bed.gz"`	mosdepth bed thresholds file
input	mosdepth_regions	`"qc/mosdepth_bed/{sample}_{type}.regions.bed.gz"`	mosdepth bed coverage per region file
input	mosdepth_perbase	`"qc/mosdepth_bed/{sample}_{type}.mosdepth.per-base.exon_bed.txt"`	mosdepth bed per-base results restricted to exons, output of export_qc_bedtools_intersect
input	picard_dup	`"qc/picard_collect_duplication_metrics/{sample}_{type}.duplication_metrics.txt"`	picard collect duplication metrics results file
input	pgrs_coverage	`"qc/mosdepth_bed/{sample}_{type}.mosdepth.pgrs_cov.txt"`	mosdepth per-base results restricted to PGRS positions, output of export_qc_bedtools_intersect_pgrs
input	design_bed	`config["reference"]["design_bed"]`	design bed defined in config-file
input	pgrs_bed	`config["reference"]["pgrs_bed"]`	bed file with PGRS score SNPs
input	wanted_transcripts	`config["export_qc_xlsx_report"]["wanted_transcripts"]`	path to txt-file in bed format with transcripts of interest
output	results	`"qc/xlsx_report/{sample}_{type}.xlsx"`	.xlsx file with summarized QC-values per sample
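As an example of how one of these inputs could be read, the sketch below parses a mosdepth summary and returns the mean coverage per row. It is not taken from `export_qc_xlsx_report.py`, which is not shown here. The function name `read_mean_coverage` and the example numbers are hypothetical, and the column names (`chrom`, `length`, `bases`, `mean`, `min`, `max`) are assumed to match the summary header that mosdepth writes.

```python
import csv
import io


def read_mean_coverage(summary_text):
    """Map each row name of a mosdepth summary to its mean coverage."""
    reader = csv.DictReader(io.StringIO(summary_text), delimiter="\t")
    return {row["chrom"]: float(row["mean"]) for row in reader}


# Hypothetical summary content, for illustration only.
summary = (
    "chrom\tlength\tbases\tmean\tmin\tmax\n"
    "chr1\t1000\t30000\t30.00\t0\t90\n"
    "chr1_region\t200\t9000\t45.00\t5\t90\n"
    "total\t1000\t30000\t30.00\t0\t90\n"
    "total_region\t200\t9000\t45.00\t5\t90\n"
)
means = read_mean_coverage(summary)
```

When mosdepth is run with a bed file, the `_region` rows describe coverage inside the bed regions only, so `means["total_region"]` would be the natural value for a per-sample QC sheet.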

Configuration

Software settings (`config.yaml`)

Key	Type	Description
container	string	path to container with python3, gzip, date and xlsxwriter
wanted_transcripts	string	transcripts of interest to be highlighted in xlsx report

Resources settings (`resources.yaml`)

Key	Type	Description
threads	integer	number of threads that will be used by export_qc_xlsx_report
time	string	max execution time for export_qc_xlsx_report
mem_mb	integer	memory used for export_qc_xlsx_report
mem_per_cpu	integer	memory used per cpu for export_qc_xlsx_report
partition	string	partition to use on the cluster for export_qc_xlsx_report

sample_order_multiqc.smk

A Python script that creates the sample_replacement and sample_order files used by MultiQC to order samples by the "S" index in the sample names.
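The ordering idea can be sketched as below, taking the same `(sample, fastq1)` pairs that the rule passes in `params.filelist`. This is an assumption-laden illustration, not the contents of `sample_order_multiqc.py`: it assumes the fastq file names follow the Illumina `_S<number>_` convention, and the function names, the `sample001` alias format and the example paths are all hypothetical.

```python
import os
import re


def s_index(fastq_path):
    """Extract the numeric S index from a fastq file name such as X_S3_R1_001.fastq.gz."""
    match = re.search(r"_S(\d+)_", os.path.basename(fastq_path))
    if match is None:
        raise ValueError(f"no S index found in {fastq_path}")
    return int(match.group(1))


def build_tables(filelist):
    """Return (replacement, order) rows sorted by S index."""
    ordered = sorted(filelist, key=lambda item: s_index(item[1]))
    replacement, order = [], []
    for position, (sample, _fastq) in enumerate(ordered, start=1):
        alias = f"sample{position:03d}"
        replacement.append((sample, alias))  # original name -> alias
        order.append((alias, sample))        # alias -> original name
    return replacement, order


# Hypothetical sample names and paths, for illustration only.
filelist = [
    ("NA12878", "/data/NA12878_S3_R1_001.fastq.gz"),
    ("HG002", "/data/HG002_S1_R1_001.fastq.gz"),
]
replacement, order = build_tables(filelist)
```

Each list could then be written as tab-separated rows to `sample_replacement.tsv` and `sample_order.tsv`.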

Rule

rule sample_order_multiqc:
    output:
        replacement=temp("qc/multiqc/sample_replacement.tsv"),
        order=temp("qc/multiqc/sample_order.tsv"),
    params:
        filelist=[(u.sample, u.fastq1) for u in units[units.type == "N"].itertuples()],
    log:
        "qc/multiqc/sample_order.tsv.log",
    benchmark:
        repeat("qc/multiqc/sample_order.tsv.benchmark.tsv", config.get("sample_order_multiqc", {}).get("benchmark_repeats", 1))
    threads: config.get("sample_order_multiqc", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("sample_order_multiqc", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("sample_order_multiqc", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("sample_order_multiqc", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("sample_order_multiqc", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("sample_order_multiqc", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("sample_order_multiqc", {}).get("container", config["default_container"])
    message:
        "{rule}: Create a sample order tsv based on S_index in {params.filelist} for multiqc"
    script:
        "../scripts/sample_order_multiqc.py"

input / output files

Rule parameters	Key	Value	Description
output	replacement	`"qc/multiqc/sample_replacement.tsv"`	list of sample name replacements, `sampleXXX` based on order in SampleSheet
output	order	`"qc/multiqc/sample_order.tsv"`	list of back-translations from `sampleXXX` to the original sample names

Configuration

Software settings (`config.yaml`)

Key	Type	Description
container	string	path to container

Resources settings (`resources.yaml`)

Key	Type	Description
threads	integer	number of threads that will be used by sample_order_multiqc
time	string	max execution time for sample_order_multiqc
mem_mb	integer	memory used for sample_order_multiqc
mem_per_cpu	integer	memory used per cpu for sample_order_multiqc
partition	string	partition to use on the cluster for sample_order_multiqc

tsv2vcf

Convert exomedepth calls in tsv format to VCF.
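The pipeline does this conversion with `tsv2vcf.sh`, which is not shown here. The Python sketch below only illustrates the general shape of such a conversion: one call becomes one VCF record with a symbolic `<DEL>` or `<DUP>` allele. The column names (`chromosome`, `start`, `end`, `type`) are assumed to match the exomedepth call table, `N` is used as a placeholder REF base, and the function name and example call are hypothetical.

```python
SVTYPES = {"deletion": "DEL", "duplication": "DUP"}


def call_to_vcf_record(call):
    """Format one exomedepth call (a dict of column -> value) as a VCF data line."""
    svtype = SVTYPES[call["type"]]
    fields = [
        call["chromosome"],                     # CHROM
        str(int(call["start"])),                # POS
        ".",                                    # ID
        "N",                                    # REF (placeholder, assumption)
        f"<{svtype}>",                          # ALT, symbolic allele
        ".",                                    # QUAL
        ".",                                    # FILTER
        f"SVTYPE={svtype};END={int(call['end'])}",  # INFO
    ]
    return "\t".join(fields)


# Hypothetical call, for illustration only.
call = {"chromosome": "chr7", "start": "117559450", "end": "117559700", "type": "deletion"}
record = call_to_vcf_record(call)
```

A complete VCF would also need the `##fileformat`, `##INFO`, `##ALT` and `#CHROM` header lines, and the reference fasta given as `ref` could supply the real REF base and contig information.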

Rule

rule tsv2vcf:
    input:
        tsv="cnv_sv/exomedepth_call/{sample}_{type}.txt",
        ref=config["reference"]["fasta"],
    output:
        vcf="cnv_sv/exomedepth_call/{sample}_{type}.vcf",
    params:
        extra=config.get("tsv2vcf", {}).get("extra", ""),
    log:
        "cnv_sv/exomedepth_call/{sample}_{type}.vcf.gz.log",
    benchmark:
        repeat(
            "cnv_sv/exomedepth_call/{sample}_{type}.vcf.gz.benchmark.tsv", config.get("tsv2vcf", {}).get("benchmark_repeats", 1)
        )
    threads: config.get("tsv2vcf", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("tsv2vcf", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("tsv2vcf", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("tsv2vcf", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("tsv2vcf", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("tsv2vcf", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("tsv2vcf", {}).get("container", config["default_container"])
    message:
        "{rule}: convert {input.tsv} to VCF"
    script:
        "../scripts/tsv2vcf.sh"

input / output files

Rule parameters	Key	Value	Description
input	tsv	`"cnv_sv/exomedepth_call/{sample}_{type}.txt"`	Exomedepth calls in tsv format
input	ref	`config["reference"]["fasta"]`	reference genome fasta file
output	vcf	`"cnv_sv/exomedepth_call/{sample}_{type}.vcf"`	Exomedepth calls in VCF format

Configuration

Software settings (`config.yaml`)

Key	Type	Description
benchmark_repeats	integer	set number of times benchmark should be repeated
container	string	name or path to docker/singularity container
extra	string	parameters that should be forwarded

Resources settings (`resources.yaml`)

Key	Type	Description
mem_mb	integer	max memory in MB to be available
mem_per_cpu	integer	memory in MB used per cpu
partition	string	partition to use on cluster
threads	integer	number of threads to be available
time	string	max execution time
