Running the references pipeline

To run the references pipeline, you first have to set up and run Marple until it has produced .bam files for all samples you want to use in your normal pool.

Marple run command

To generate .bam and .bam.bai files for all samples, run Marple until a rule that uses alignment/samtools_merge_bam/{sample}_{type}.bam and alignment/samtools_merge_bam/{sample}_{type}.bam.bai as input files, e.g. qc_mosdepth_bed. Don't forget the --no-temp parameter, so that the .bam files are kept instead of being removed as temporary files!

# Run snakemake with the extra config parameters sequenceid and PATH_TO_REPO
# (both are given after a single --config flag, since a repeated --config flag can replace the earlier one)
snakemake --profile snakemakeprofile --configfile config.yaml -s /path/to/marple/workflow/Snakefile --no-temp --until qc_mosdepth_bed --config sequenceid="normal_samples" PATH_TO_REPO=/folder/containing/marple_rd_tc/

NOTE: If the variable PATH_TO_REPO (the folder containing marple_rd_tc) is used in the config file, it needs to be defined on the command line.
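
Before moving on, it can be worth confirming that every sample really has its .bam and .bam.bai file. The following is a minimal sketch and not part of Marple: the function name check_bams, its arguments and the default type letter N are assumptions made for illustration, and it assumes that the first column of samples.tsv holds the sample name and that sample names contain no whitespace.

```bash
# Hypothetical helper, not shipped with Marple.
# Usage: check_bams SAMPLES_TSV MARPLE_RUN_DIR [TYPE]
check_bams() {
    samples_tsv="$1"   # samples.tsv, sample name assumed in column 1
    marple_dir="$2"    # folder in which Marple was run
    type="${3:-N}"     # one-letter type; N is an assumed default
    missing=0
    # Skip the header line and loop over the sample names
    for sample in $(awk -F'\t' 'NR > 1 && $1 != "" { print $1 }' "$samples_tsv"); do
        bam="$marple_dir/alignment/samtools_merge_bam/${sample}_${type}.bam"
        for f in "$bam" "$bam.bai"; do
            # -s is true only for files that exist and are not empty
            if [ ! -s "$f" ]; then
                echo "missing or empty: $f" >&2
                missing=$((missing + 1))
            fi
        done
    done
    # Succeed only if nothing was missing
    [ "$missing" -eq 0 ]
}
```

Called as `check_bams samples.tsv /path/to/marple/run`, the function returns a non-zero status and lists the affected files on stderr if anything is missing.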

Input files

Four files need to be available in your run folder and adapted to your compute environment and sequencing run: samples.tsv, units_references.tsv, config_references.yaml and resources.yaml.
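
A quick pre-flight check for these four files could look like the sketch below. The function name check_inputs and its optional run folder argument are hypothetical; only the four file names come from this page.

```bash
# Hypothetical helper: verify that the four input files are present.
# Usage: check_inputs [RUN_FOLDER]   (defaults to the current folder)
check_inputs() {
    rundir="${1:-.}"
    status=0
    for f in samples.tsv units_references.tsv config_references.yaml resources.yaml; do
        # Report every file that is absent or empty, then fail at the end
        if [ ! -s "$rundir/$f" ]; then
            echo "missing or empty: $rundir/$f" >&2
            status=1
        fi
    done
    return "$status"
}
```

The check only confirms that the files exist. Whether their contents fit your environment still has to be reviewed by hand.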

Samples and Units

To run the references pipeline you can use the same samples.tsv format as the standard pipeline, but include only the samples you want in your normal pool. The units_references.tsv format differs from the standard units file and has the following columns.

Column id	Description
sample	Sample name that matches the `samples.tsv`
type	Data type identifier (one letter), one of T (Tumor), N (Normal) or R (RNA)
bam	Path to bam file produced by Marple `alignment/samtools_merge_bam/{sample}_{type}.bam`
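
Since the three columns can be derived from the sample names, units_references.tsv can be generated rather than typed by hand. This is a sketch under stated assumptions, not a Marple tool: make_units_references and its arguments are hypothetical, the sample name is assumed to be in the first column of samples.tsv, and all samples are assumed to share one type letter (N by default).

```bash
# Hypothetical helper: print a units_references.tsv to stdout.
# Usage: make_units_references SAMPLES_TSV MARPLE_RUN_DIR [TYPE]
make_units_references() {
    samples_tsv="$1"   # samples.tsv, sample name assumed in column 1
    marple_dir="$2"    # folder in which Marple was run
    type="${3:-N}"     # one-letter type; N is an assumed default
    awk -F'\t' -v dir="$marple_dir" -v type="$type" '
        # Header row with the three column ids described in the table
        BEGIN { OFS = "\t"; print "sample", "type", "bam" }
        # One row per sample, skipping the samples.tsv header and blank lines
        NR > 1 && $1 != "" {
            print $1, type, dir "/alignment/samtools_merge_bam/" $1 "_" type ".bam"
        }
    ' "$samples_tsv"
}
```

For example, `make_units_references samples.tsv /path/to/marple/run > units_references.tsv` writes the file into the current folder.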

Config

A bare-bones version of the config file can be found in config/config_references.yaml. It needs to be adapted to match the local paths to reference files, bed files, caches etc. on your system. Remember that config['reference']['design_bed'] needs to be the same bed file that is later used in standard Marple (config['exomedepth_call']['bedfile'])!

Current reference config.yaml:

samples: "samples.tsv"
units:  "units_references.tsv"
resources: "resources.yaml"

default_container: "docker://hydragenetics/common:1.8.1"

reference:
  design_bed: "/projects/wp3/nobackup/TwistCancer/Bedfiles/Twist_Cancer_230706_hg38_TE-98982205-wPGRS.merged_200bpwindows.bed"

exomedepth_reference:
  container: "docker://hydragenetics/exomedepth:1.1.15"
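
The requirement that design_bed and the exomedepth_call bedfile point to the same file can be checked with a small script. The sketch below is an assumption-laden illustration: yaml_value and check_same_bed are hypothetical names, the awk lookup is not a real YAML parser and only handles the simple two-level layout shown above, and the comparison is a plain string match on paths without spaces (so templated or differently written paths to the same file will be reported as a mismatch).

```bash
# Hypothetical helper: print the value of KEY inside top-level SECTION.
# Usage: yaml_value FILE SECTION KEY
yaml_value() {
    awk -v section="$2" -v key="$3" '
        # Any non-indented, non-comment line starts a new top-level section
        /^[^ #]/ { in_section = ($0 ~ "^" section ":") }
        # Inside the wanted section, print the unquoted value of the key
        in_section && $1 == key ":" { gsub(/"/, "", $2); print $2; exit }
    ' "$1"
}

# Usage: check_same_bed CONFIG_REFERENCES_YAML CONFIG_YAML
check_same_bed() {
    ref_bed=$(yaml_value "$1" reference design_bed)
    call_bed=$(yaml_value "$2" exomedepth_call bedfile)
    if [ -n "$ref_bed" ] && [ "$ref_bed" = "$call_bed" ]; then
        echo "OK: both configs use $ref_bed"
    else
        echo "MISMATCH: design_bed='$ref_bed' bedfile='$call_bed'" >&2
        return 1
    fi
}
```

A typical call would be `check_same_bed config_references.yaml config.yaml`.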

Resources

A resources.yaml file can also be found in the config/ folder. It is adapted to the Uppsala Clinical Genomics compute cluster but can be used as an indication of the resources needed by the different programs.

Run command

# Activate the virtual environment
source virtual/environment/bin/activate

# Run snakemake command
snakemake --profile snakemakeprofile -s /path/to/marple/workflow/Snakefile_references --configfile config_references.yaml

Result files

The Marple references workflow produces a single result file, an RData file containing the normal pool for ExomeDepth. This file is located at references/exomedepth_reference/RefCount.Rdata and should be moved and renamed before being added to the default config.yaml.

...
exomedepth_call:
  container: "docker://hydragenetics/exomedepth:1.1.15"
  ref_count: "exomedepth_reference/RefCount_male.Rdata"
...
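
The move-and-rename step can be sketched as a small function that refuses to overwrite an existing normal pool. The name install_ref_count is hypothetical, and the sketch copies rather than moves so that the original result file stays in place; the destination is whatever path you reference as ref_count in config.yaml.

```bash
# Hypothetical helper: copy the result file to its final, renamed location.
# Usage: install_ref_count SOURCE_RDATA DESTINATION_RDATA
install_ref_count() {
    src="$1"
    dest="$2"
    # The source must exist and be non-empty
    if [ ! -s "$src" ]; then
        echo "no result file at $src" >&2
        return 1
    fi
    # Never silently replace a normal pool that is already in use
    if [ -e "$dest" ]; then
        echo "refusing to overwrite $dest" >&2
        return 1
    fi
    mkdir -p "$(dirname "$dest")"
    cp "$src" "$dest"
}
```

For instance, `install_ref_count references/exomedepth_reference/RefCount.Rdata /your/design/folder/exomedepth_reference/RefCount_male.Rdata`, where the destination folder is a placeholder for your own setup.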