Pipeline-specific rules and software used in Marple
Rules created specifically for the Marple pipeline are listed here.
add_ref_to_vcf.smk
A Python script that adds a line with the reference path to VCF files so that Alissa knows which genome build was used. The ##reference= line needs to contain either hg38 or GRCh38 for Alissa to understand that the reference is not hg19.
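The header change described above can be sketched in a few lines of Python. This is only an illustration, not the contents of add_ref_to_vcf.py: the function names `add_reference_header` and `looks_like_hg38` and the example fasta path are hypothetical, and the case-sensitive substring check is an assumption about how the build name is matched.

```python
def add_reference_header(vcf_lines, reference_path):
    """Return VCF lines with one ##reference= line placed before #CHROM."""
    reference_line = f"##reference={reference_path}"
    result = []
    for line in vcf_lines:
        if line.startswith("##reference="):
            # Drop any existing reference line so only the new one remains.
            continue
        if line.startswith("#CHROM"):
            result.append(reference_line)
        result.append(line)
    return result


def looks_like_hg38(reference_path):
    """Check that the path names one of the builds mentioned in the text."""
    return "hg38" in reference_path or "GRCh38" in reference_path


# Hypothetical header and reference path, for illustration only.
header = ["##fileformat=VCFv4.2", "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"]
updated = add_reference_header(header, "/ref/GRCh38/genome.fasta")
```

In this sketch `updated` holds three lines, with the `##reference=` line sitting directly above the `#CHROM` column header.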
Rule
rule add_ref_to_vcf:
    input:
        vcf="snv_indels/deepvariant/{sample}_N.normalized.sorted.vep_annotated.vcf.gz",
        ref=config["reference"]["fasta"],
    output:
        vcf=temp("snv_indels/deepvariant/{sample}_N.normalized.sorted.vep_annotated.ref.vcf"),
    log:
        "snv_indels/deepvariant/{sample}_N.normalized.sorted.vep_annotated.ref.vcf.log",
    benchmark:
        repeat(
            "snv_indels/deepvariant/{sample}_N.normalized.sorted.vep_annotated.ref.vcf.benchmark.tsv",
            config.get("add_ref_to_vcf", {}).get("benchmark_repeats", 1),
        )
    resources:
        mem_mb=config.get("add_ref_to_vcf", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("add_ref_to_vcf", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("add_ref_to_vcf", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("add_ref_to_vcf", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("add_ref_to_vcf", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("add_ref_to_vcf", {}).get("container", config["default_container"])
    message:
        "{rule}: Add reference to the header of the deepvariant vcf: {input.vcf}"
    script:
        "../scripts/add_ref_to_vcf.py"
| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | vcf | "snv_indels/deepvariant/{sample}_N.normalized.sorted.vep_annotated.vcf.gz" | final vcf where reference should be added to vcf-header |
| | ref | config["reference"]["fasta"] | fasta reference used |
| output | vcf | "snv_indels/deepvariant/{sample}_N.normalized.sorted.vep_annotated.ref.vcf" | final vcf with reference genome in vcf-header |
Configuration
Software settings (config.yaml)
| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |
Resources settings (resources.yaml)
| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |
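Every rule on this page resolves its resource keys with the same expression: a rule-specific value if one is set, otherwise the value under default_resources. The sketch below reproduces that lookup in plain Python. The helper name `rule_setting` and all values in the example dictionary are hypothetical, and it assumes the settings from config.yaml and resources.yaml end up in one `config` dictionary.

```python
def rule_setting(config, rule_name, key):
    """Return the rule-specific value if present, else the pipeline default."""
    # Same expression as in the rules; the default is looked up eagerly,
    # so config["default_resources"] must always contain the key.
    return config.get(rule_name, {}).get(key, config["default_resources"][key])


# Hypothetical values, for illustration only.
config = {
    "default_resources": {
        "mem_mb": 6144,
        "mem_per_cpu": 6144,
        "partition": "core",
        "threads": 1,
        "time": "4:00:00",
    },
    "add_ref_to_vcf": {"time": "1:00:00"},
}

override = rule_setting(config, "add_ref_to_vcf", "time")    # rule-specific value
fallback = rule_setting(config, "add_ref_to_vcf", "mem_mb")  # default value
```

A rule that has no entry at all, such as tsv2vcf in this example dictionary, falls back to the defaults for every key.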
exomedepth_export.smk
An R script that creates output files from exomedepth results.
Rule
rule exomedepth_export:
    input:
        exon="cnv_sv/exomedepth_call/{sample}_{type}.RData",
    output:
        aed=temp("cnv_sv/exomedepth_call/{sample}_{type}.aed"),
        nexus_sv=temp("cnv_sv/exomedepth_call/{sample}_{type}_SV.txt"),
    params:
        extra=config.get("exomedepth_export", {}).get("extra", ""),
    log:
        "cnv_sv/exomedepth_call/{sample}_{type}_SV.txt.log",
    benchmark:
        repeat(
            "cnv_sv/exomedepth_call/{sample}_{type}_SV.txt.benchmark.tsv",
            config.get("exomedepth_export", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("exomedepth_export", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("exomedepth_export", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("exomedepth_export", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("exomedepth_export", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("exomedepth_export", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("exomedepth_export", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("exomedepth_export", {}).get("container", config["default_container"])
    message:
        "{rule}: Export exomedepth CNV results from {input.exon} "
    script:
        "../scripts/exomedepth_export.R"
| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | exon | "cnv_sv/exomedepth_call/{sample}_{type}.RData" | RData from exomedepth call |
| output | aed | "cnv_sv/exomedepth_call/{sample}_{type}.aed" | calls from exomedepth in aed format |
| | nexus_sv | "cnv_sv/exomedepth_call/{sample}_{type}_SV.txt" | nexus SV txt file with exomedepth calls |
Configuration
Software settings (config.yaml)
| Key | Type | Description |
| --- | --- | --- |
| container | string | name or path to docker/singularity container |
Resources settings (resources.yaml)
| Key | Type | Description |
| --- | --- | --- |
| threads | integer | number of threads that will be used by exomedepth_export |
| time | string | max execution time for exomedepth_export |
| mem_mb | integer | memory used for exomedepth_export |
| mem_per_cpu | integer | memory used per cpu for exomedepth_export |
| partition | string | partition to use on the cluster for exomedepth_export |
export_qc.smk
Rules that create one .xlsx file per sample with aggregated coverage information.
Rule
rule export_qc_bedtools_intersect:
    input:
        left="qc/mosdepth_bed/{sample}_{type}.per-base.bed.gz",
        coverage_csi="qc/mosdepth_bed/{sample}_{type}.per-base.bed.gz.csi",
        right=config["reference"]["exon_bed"],
    output:
        results=temp("qc/mosdepth_bed/{sample}_{type}.mosdepth.per-base.exon_bed.txt"),
    params:
        extra=config.get("export_qc_bedtools_intersect", {}).get("extra", ""),
    log:
        "qc/mosdepth_bed/{sample}_{type}.mosdepth.per-base.exon_bed.log",
    benchmark:
        repeat(
            "qc/mosdepth_bed/{sample}_{type}.mosdepth.per-base.exon_bed.benchmark.tsv",
            config.get("export_qc_bedtools_intersect", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("export_qc_bedtools_intersect", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("export_qc_bedtools_intersect", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("export_qc_bedtools_intersect", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("export_qc_bedtools_intersect", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("export_qc_bedtools_intersect", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("export_qc_bedtools_intersect", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("export_qc_bedtools_intersect", {}).get("container", config["default_container"])
    message:
        "{rule}: export low cov regions from {input.left} based on {input.right}"
    wrapper:
        "v1.32.0/bio/bedtools/intersect"
| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | left | "qc/mosdepth_bed/{sample}_{type}.per-base.bed.gz" | per-base coverage file from mosdepth |
| | coverage_csi | "qc/mosdepth_bed/{sample}_{type}.per-base.bed.gz.csi" | index file for per-base.bed.gz file |
| | right | config["reference"]["exon_bed"] | design bed used to only look at coverage based on bedfile |
| output | results | "qc/mosdepth_bed/{sample}_{type}.mosdepth.per-base.exon_bed.txt" | .txt file with coverage per base for design file |
Configuration
Software settings (config.yaml)
| Key | Type | Description |
| --- | --- | --- |
| container | string | path to container with bedtools (common) |
| extra | string | extra configuration for bedtools intersect |
Resources settings (resources.yaml)
| Key | Type | Description |
| --- | --- | --- |
| threads | integer | number of threads that will be used by export_qc_bedtools_intersect |
| time | string | max execution time for export_qc_bedtools_intersect |
| mem_mb | integer | memory used for export_qc_bedtools_intersect |
| mem_per_cpu | integer | memory used per cpu for export_qc_bedtools_intersect |
| partition | string | partition to use on the cluster for export_qc_bedtools_intersect |
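To make the intersect step concrete, the sketch below clips per-base coverage intervals to exon intervals in plain Python. It mimics what bedtools intersect reports by default (the overlapping part of each left-hand interval) and is not a replacement for the wrapper; options passed through `extra` may change the real output. The function name `intersect_coverage` and the example intervals are hypothetical.

```python
def intersect_coverage(per_base, exons):
    """Clip per-base coverage intervals to exon intervals.

    per_base: list of (chrom, start, end, depth) in half-open BED coordinates.
    exons: list of (chrom, start, end) in half-open BED coordinates.
    """
    clipped = []
    for chrom, start, end, depth in per_base:
        for exon_chrom, exon_start, exon_end in exons:
            if chrom != exon_chrom:
                continue
            overlap_start = max(start, exon_start)
            overlap_end = min(end, exon_end)
            # Keep only the part of the coverage interval inside the exon.
            if overlap_start < overlap_end:
                clipped.append((chrom, overlap_start, overlap_end, depth))
    return clipped


# Hypothetical coverage intervals and one exon, for illustration only.
per_base = [("chr1", 0, 100, 0), ("chr1", 100, 150, 32), ("chr1", 150, 300, 8)]
exons = [("chr1", 120, 200)]
exon_coverage = intersect_coverage(per_base, exons)
```

The nested loop is quadratic and only meant for reading; for real data bedtools works on the indexed, sorted files listed in the rule.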
Rule
rule export_qc_bedtools_intersect_pgrs:
    input:
        left="qc/mosdepth_bed/{sample}_{type}.per-base.bed.gz",
        coverage_csi="qc/mosdepth_bed/{sample}_{type}.per-base.bed.gz.csi",
        right=config["reference"]["pgrs_bed"],
    output:
        results=temp("qc/mosdepth_bed/{sample}_{type}.mosdepth.pgrs_cov.txt"),
    params:
        extra=config.get("export_qc_bedtools_intersect", {}).get("extra", ""),
    log:
        "qc/mosdepth_bed/{sample}_{type}.mosdepth.pgrs_cov.log",
    benchmark:
        repeat(
            "qc/mosdepth_bed/{sample}_{type}.mosdepth.pgrs_cov.benchmark.tsv",
            config.get("export_qc_bedtools_intersect", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("export_qc_bedtools_intersect", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("export_qc_bedtools_intersect", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("export_qc_bedtools_intersect", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("export_qc_bedtools_intersect", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("export_qc_bedtools_intersect", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("export_qc_bedtools_intersect", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("export_qc_bedtools_intersect", {}).get("container", config["default_container"])
    message:
        "{rule}: export low cov regions from {input.left} based on {input.right}"
    wrapper:
        "v1.32.0/bio/bedtools/intersect"
| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | left | "qc/mosdepth_bed/{sample}_{type}.per-base.bed.gz" | per-base coverage file from mosdepth |
| | coverage_csi | "qc/mosdepth_bed/{sample}_{type}.per-base.bed.gz.csi" | index file for per-base.bed.gz file |
| | right | config["reference"]["pgrs_bed"] | design bed used to only look at coverage based on bedfile, in this case pgrs positions |
| output | results | "qc/mosdepth_bed/{sample}_{type}.mosdepth.pgrs_cov.txt" | .txt file with coverage per base for design file |
Configuration
Software settings (config.yaml)
| Key | Type | Description |
| --- | --- | --- |
| container | string | path to container with bedtools |
Resources settings (resources.yaml)
| Key | Type | Description |
| --- | --- | --- |
| threads | integer | number of threads that will be used by export_qc_bedtools_intersect_pgrs |
| time | string | max execution time for export_qc_bedtools_intersect_pgrs |
| mem_mb | integer | memory used for export_qc_bedtools_intersect_pgrs |
| mem_per_cpu | integer | memory used per cpu for export_qc_bedtools_intersect_pgrs |
| partition | string | partition to use on the cluster for export_qc_bedtools_intersect_pgrs |
Rule
rule export_qc_xlsx_report:
    input:
        mosdepth_summary="qc/mosdepth_bed/{sample}_{type}.mosdepth.summary.txt",
        mosdepth_thresholds="qc/mosdepth_bed/{sample}_{type}.thresholds.bed.gz",
        mosdepth_regions="qc/mosdepth_bed/{sample}_{type}.regions.bed.gz",
        mosdepth_perbase="qc/mosdepth_bed/{sample}_{type}.mosdepth.per-base.exon_bed.txt",
        picard_dup="qc/picard_collect_duplication_metrics/{sample}_{type}.duplication_metrics.txt",
        pgrs_coverage="qc/mosdepth_bed/{sample}_{type}.mosdepth.pgrs_cov.txt",
        design_bed=config["reference"]["design_bed"],
        pgrs_bed=config["reference"]["pgrs_bed"],
        wanted_transcripts=config["export_qc_xlsx_report"]["wanted_transcripts"],
    output:
        results=temp("qc/xlsx_report/{sample}_{type}.xlsx"),
    params:
        coverage_thresholds=config["mosdepth_bed"]["thresholds"],
        sequenceid=config["sequenceid"],
    log:
        "qc/xlsx_report/{sample}_{type}.xlsx.log",
    benchmark:
        repeat(
            "qc/xlsx_report/{sample}_{type}.xlsx.benchmark.tsv",
            config.get("export_qc_xlsx_report", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("export_qc_xlsx_report", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("export_qc_xlsx_report", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("export_qc_xlsx_report", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("export_qc_xlsx_report", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("export_qc_xlsx_report", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("export_qc_xlsx_report", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("export_qc_xlsx_report", {}).get("container", config["default_container"])
    message:
        "{rule}: collecting qc values into {output}"
    # localrule: True
    script:
        "../scripts/export_qc_xlsx_report.py"
| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | mosdepth_summary | "qc/mosdepth_bed/{sample}_{type}.mosdepth.summary.txt" | mosdepth bed summary file |
| | mosdepth_thresholds | "qc/mosdepth_bed/{sample}_{type}.thresholds.bed.gz" | mosdepth bed thresholds file |
| | mosdepth_regions | "qc/mosdepth_bed/{sample}_{type}.regions.bed.gz" | mosdepth bed coverage per region file |
| | mosdepth_perbase | "qc/mosdepth_bed/{sample}_{type}.mosdepth.per-base.exon_bed.txt" | mosdepth bed per-base results restricted to exons (output of export_qc_bedtools_intersect) |
| | picard_dup | "qc/picard_collect_duplication_metrics/{sample}_{type}.duplication_metrics.txt" | picard collect duplication metrics results file |
| | pgrs_coverage | "qc/mosdepth_bed/{sample}_{type}.mosdepth.pgrs_cov.txt" | mosdepth per-base file from export_qc_bedtools_intersect_pgrs output |
| | design_bed | config["reference"]["design_bed"] | design bed defined in config-file |
| | pgrs_bed | config["reference"]["pgrs_bed"] | bedfile with PGRS score SNPs |
| | wanted_transcripts | config["export_qc_xlsx_report"]["wanted_transcripts"] | path to txt-file in bedformat of transcripts of interest |
| output | results | "qc/xlsx_report/{sample}_{type}.xlsx" | .xlsx file with summarized QC-values per sample |
Configuration
Software settings (config.yaml)
| Key | Type | Description |
| --- | --- | --- |
| container | string | path to container with python3, gzip, date and xlsxwriter |
| wanted_transcripts | string | transcripts of interest to be highlighted in xlsx report |
Resources settings (resources.yaml)
| Key | Type | Description |
| --- | --- | --- |
| threads | integer | number of threads that will be used by export_qc_xlsx_report |
| time | string | max execution time for export_qc_xlsx_report |
| mem_mb | integer | memory used for export_qc_xlsx_report |
| mem_per_cpu | integer | memory used per cpu for export_qc_xlsx_report |
| partition | string | partition to use on the cluster for export_qc_xlsx_report |
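One kind of aggregated coverage value such a report could contain is the fraction of bases at or above each coverage threshold. The sketch below shows that calculation on per-base intervals. It is an assumption about what "aggregated coverage information" might include, not the logic of export_qc_xlsx_report.py. The function name `coverage_fractions` and the example data are hypothetical, and the thresholds are assumed to be a list of integers, which may differ from how config["mosdepth_bed"]["thresholds"] is stored.

```python
def coverage_fractions(intervals, thresholds):
    """Return {threshold: fraction of bases with depth >= threshold}.

    intervals: list of (chrom, start, end, depth) in half-open BED coordinates.
    """
    total_bases = sum(end - start for _, start, end, _ in intervals)
    fractions = {}
    for threshold in thresholds:
        covered = sum(
            end - start for _, start, end, depth in intervals if depth >= threshold
        )
        # Avoid dividing by zero when no intervals were given.
        fractions[threshold] = covered / total_bases if total_bases else 0.0
    return fractions


# Hypothetical exon-restricted coverage, for illustration only.
intervals = [("chr1", 120, 150, 32), ("chr1", 150, 200, 8)]
fractions = coverage_fractions(intervals, [5, 10, 30])
```

Here 30 of the 80 bases have a depth of 32 and the remaining 50 have a depth of 8, so every base passes the threshold of 5 while only 30 of 80 pass the thresholds of 10 and 30.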
sample_order_multiqc.smk
A Python script that creates the sample_replacement and sample_order files used by MultiQC to order samples by the "S" index in the sample names.
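The ordering idea can be sketched as follows. This is not the contents of sample_order_multiqc.py: the function name `sample_order`, the example sample names and the assumption that the S-index appears as `_S<number>_` in the first fastq file name are all hypothetical, as is the exact layout of the two returned lists.

```python
import re


def sample_order(filelist):
    """Order samples by the S-index found in their first fastq file name.

    filelist: list of (sample, fastq1) tuples, as in the rule's params.
    Returns (replacement, order): sample -> sampleXXX pairs and the
    back-translation sampleXXX -> sample, both in S-index order.
    """
    indexed = []
    for sample, fastq1 in filelist:
        match = re.search(r"_S(\d+)_", fastq1)
        if match is None:
            raise ValueError(f"no S-index found in {fastq1}")
        # Convert to int so that S2 sorts before S10.
        indexed.append((int(match.group(1)), sample))
    indexed.sort()

    replacement = []
    order = []
    for position, (_, sample) in enumerate(indexed, start=1):
        alias = f"sample{position:03d}"
        replacement.append((sample, alias))
        order.append((alias, sample))
    return replacement, order


# Hypothetical samples and fastq names, for illustration only.
filelist = [
    ("D23-002", "fastq/D23-002_S10_L001_R1_001.fastq.gz"),
    ("D23-001", "fastq/D23-001_S2_L001_R1_001.fastq.gz"),
]
replacement, order = sample_order(filelist)
```

Sorting on the integer index rather than the string is the important detail: a plain string sort would place S10 before S2.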
Rule
rule sample_order_multiqc:
    output:
        replacement=temp("qc/multiqc/sample_replacement.tsv"),
        order=temp("qc/multiqc/sample_order.tsv"),
    params:
        filelist=[(u.sample, u.fastq1) for u in units[units.type == "N"].itertuples()],
    log:
        "qc/multiqc/sample_order.tsv.log",
    benchmark:
        repeat("qc/multiqc/sample_order.tsv.benchmark.tsv", config.get("sample_order_multiqc", {}).get("benchmark_repeats", 1))
    threads: config.get("sample_order_multiqc", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("sample_order_multiqc", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("sample_order_multiqc", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("sample_order_multiqc", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("sample_order_multiqc", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("sample_order_multiqc", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("sample_order_multiqc", {}).get("container", config["default_container"])
    message:
        "{rule}: Create a sample order tsv based on S_index in {params.filelist} for multiqc"
    script:
        "../scripts/sample_order_multiqc.py"
| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| output | replacement | "qc/multiqc/sample_replacement.tsv" | list of sample name replacements (sampleXXX), based on the order in the SampleSheet |
| | order | "qc/multiqc/sample_order.tsv" | list of back-translations from sampleXXX to the original sample names |
Configuration
Software settings (config.yaml)
| Key | Type | Description |
| --- | --- | --- |
| container | string | path to container |
Resources settings (resources.yaml)
| Key | Type | Description |
| --- | --- | --- |
| threads | integer | number of threads that will be used by sample_order_multiqc |
| time | string | max execution time for sample_order_multiqc |
| mem_mb | integer | memory used for sample_order_multiqc |
| mem_per_cpu | integer | memory used per cpu for sample_order_multiqc |
| partition | string | partition to use on the cluster for sample_order_multiqc |
tsv2vcf
Convert exomedepth calls in tsv format to VCF
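As a rough picture of what such a conversion involves, the sketch below turns one tab-separated CNV call into a VCF record with a symbolic ALT allele. The pipeline itself uses the shell script tsv2vcf.sh, so this Python version is only an illustration. The function name `cnv_row_to_vcf`, the column names (`chromosome`, `start`, `end`, `type`), the type labels and the placeholder reference base `N` are assumptions; a real conversion would read the reference base from the fasta file and handle the VCF coordinate convention for structural variants.

```python
import csv
import io


def cnv_row_to_vcf(row, ref_base="N"):
    """Turn one CNV call (a dict) into a tab-separated VCF record."""
    # Assumed mapping from call type to symbolic structural variant type.
    sv_types = {"deletion": "DEL", "duplication": "DUP"}
    sv_type = sv_types[row["type"]]
    fields = [
        row["chromosome"],                      # CHROM
        str(int(row["start"])),                 # POS (start used as-is here)
        ".",                                    # ID
        ref_base,                               # REF (placeholder base)
        f"<{sv_type}>",                         # ALT, symbolic allele
        ".",                                    # QUAL
        ".",                                    # FILTER
        f"SVTYPE={sv_type};END={int(row['end'])}",  # INFO
    ]
    return "\t".join(fields)


# Hypothetical input with assumed column names, for illustration only.
tsv_text = "chromosome\tstart\tend\ttype\nchr1\t1000\t2000\tdeletion\n"
reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
records = [cnv_row_to_vcf(row) for row in reader]
```

A complete VCF would also need the header lines, including `##fileformat`, the INFO definitions for SVTYPE and END, and the `#CHROM` column line.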
Rule
rule tsv2vcf:
    input:
        tsv="cnv_sv/exomedepth_call/{sample}_{type}.txt",
        ref=config["reference"]["fasta"],
    output:
        vcf="cnv_sv/exomedepth_call/{sample}_{type}.vcf",
    params:
        extra=config.get("tsv2vcf", {}).get("extra", ""),
    log:
        "cnv_sv/exomedepth_call/{sample}_{type}.vcf.gz.log",
    benchmark:
        repeat(
            "cnv_sv/exomedepth_call/{sample}_{type}.vcf.gz.benchmark.tsv", config.get("tsv2vcf", {}).get("benchmark_repeats", 1)
        )
    threads: config.get("tsv2vcf", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("tsv2vcf", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("tsv2vcf", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("tsv2vcf", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("tsv2vcf", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("tsv2vcf", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("tsv2vcf", {}).get("container", config["default_container"])
    message:
        "{rule}: convert {input.tsv} to VCF"
    script:
        "../scripts/tsv2vcf.sh"
| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | tsv | "cnv_sv/exomedepth_call/{sample}_{type}.txt" | Exomedepth calls in tsv format |
| | ref | config["reference"]["fasta"] | reference genome fasta file |
| output | vcf | "cnv_sv/exomedepth_call/{sample}_{type}.vcf" | Exomedepth calls in VCF format |
Configuration
Software settings (config.yaml)
| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |
| extra | string | parameters that should be forwarded |
Resources settings (resources.yaml)
| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |