Running the references pipeline
To run the references pipeline, you first have to set up and run Marple until it has produced .bam files for all samples you want to use in your normal pool.
Marple run command
To generate .bam and .bai files for all samples, run Marple until a rule that uses alignment/samtools_merge_bam/{sample}_{type}.bam and alignment/samtools_merge_bam/{sample}_{type}.bam.bai as input files, e.g. qc_mosdepth_bed. Don't forget the --no-temp parameter, so that outputs marked as temporary are not deleted once the run finishes!
# Run snakemake command with the extra config parameters sequenceid and PATH_TO_REPO
# Both key=value pairs go after a single --config, since a repeated --config flag would override the first
snakemake --profile snakemakeprofile --configfile config.yaml -s /path/to/marple/workflow/Snakefile --no-temp --until qc_mosdepth_bed --config sequenceid="normal_samples" PATH_TO_REPO=/folder/containing/marple_rd_tc/
NOTE: If the variable PATH_TO_REPO (the folder containing marple_rd_tc) is used in the config file, it needs to be defined on the command line.
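Before moving on, it can be worth confirming that every .bam file has its .bai index. The following is a minimal sketch, not part of Marple: the function name `check_bam_indexes` is hypothetical, and it assumes the index sits next to the bam as `{sample}_{type}.bam.bai`, as described above.

```shell
# List every .bam in a directory that has no matching .bam.bai index.
# Prints the offending paths and returns non-zero if any are found.
check_bam_indexes() {
    bam_dir="$1"
    missing=0
    for bam in "$bam_dir"/*.bam; do
        # Skip the unexpanded pattern when the directory holds no .bam files
        [ -e "$bam" ] || continue
        if [ ! -e "$bam.bai" ]; then
            echo "missing index: $bam"
            missing=1
        fi
    done
    return "$missing"
}
```

It could be called as `check_bam_indexes alignment/samtools_merge_bam` from the run folder. Note that an empty directory passes silently, so it does not replace checking that all expected samples are present.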
Input files
Four files need to be available in your run folder and adapted to your compute environment and sequencing run: samples.tsv, units_references.tsv, config_references.yaml and resources.yaml.
Samples and Units
To run the references pipeline you can use the same samples.tsv format as the standard pipeline, but only with the samples you want in your normal pool. The units_references.tsv differs slightly from the standard format.
| Column id | Description |
|---|---|
| sample | Sample name that matches the samples.tsv |
| type | Data type identifier (one letter): T (Tumor), N (Normal) or R (RNA) |
| bam | Path to the bam file produced by Marple (alignment/samtools_merge_bam/{sample}_{type}.bam) |
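The table above can be filled in from the bam files themselves. This is a rough sketch under some assumptions: the function name `make_units_references` is hypothetical, the file is tab-separated with a header row using the column ids above, and the type is taken to be the part of the file name after the last underscore.

```shell
# Write a units table (sample, type, bam) for every {sample}_{type}.bam in a directory.
make_units_references() {
    bam_dir="$1"
    printf 'sample\ttype\tbam\n'
    for bam in "$bam_dir"/*.bam; do
        # Skip the unexpanded pattern when the directory holds no .bam files
        [ -e "$bam" ] || continue
        name=$(basename "$bam" .bam)
        sample=${name%_*}   # everything before the last underscore
        type=${name##*_}    # the token after the last underscore, e.g. N
        printf '%s\t%s\t%s\n' "$sample" "$type" "$bam"
    done
}
```

For example, `make_units_references alignment/samtools_merge_bam > units_references.tsv`. The bam column repeats the directory exactly as it was passed in, so pass an absolute path if the references pipeline runs from a different folder.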
Config
A bare-bones version of the config file can be found in config/config_references.yaml. This needs to be adapted to match the local paths to reference files, bed files, caches etc. on your system. Remember that config['reference']['design_bed'] needs to be the same bed file that is used later in standard Marple (config['exomedepth_call']['bedfile'])!
Current reference config.yaml:
samples: "samples.tsv"
units: "units_references.tsv"
resources: "resources.yaml"
default_container: "docker://hydragenetics/common:1.8.1"
reference:
  design_bed: "/projects/wp3/nobackup/TwistCancer/Bedfiles/Twist_Cancer_230706_hg38_TE-98982205-wPGRS.merged_200bpwindows.bed"
exomedepth_reference:
  container: "docker://hydragenetics/exomedepth:1.1.15"
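The requirement that design_bed is the same bed file as the one used by exomedepth_call in standard Marple can be checked by comparing the two files directly. A small sketch, assuming you pass in the two paths from the respective config files yourself; `same_bedfile` is a hypothetical helper, not something Marple provides.

```shell
# Succeeds only when both bed files exist and are byte-for-byte identical.
same_bedfile() {
    reference_bed="$1"   # path from config['reference']['design_bed']
    call_bed="$2"        # path from config['exomedepth_call']['bedfile']
    if [ ! -f "$reference_bed" ] || [ ! -f "$call_bed" ]; then
        echo "bed file not found" >&2
        return 2
    fi
    if cmp -s "$reference_bed" "$call_bed"; then
        echo "bed files match"
    else
        echo "bed files differ" >&2
        return 1
    fi
}
```

If both configs point at the same path the check is trivially true. It is mainly useful when the bed file has been copied between systems.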
Resources
A resources.yaml file can also be found in the config/ folder. It is adapted to the Uppsala Clinical Genomics compute cluster but can be used as an indication of the resources needed for the different programs.
Run command
# Activate the virtual environment
source virtual/environment/bin/activate
# Run snakemake command
snakemake --profile snakemakeprofile -s /path/to/marple/workflow/Snakefile_references --configfile config_references.yaml
Result files
The Marple references workflow produces a single result file: an RData file containing the normal pool for exomedepth.
This file is located at references/exomedepth_reference/RefCount.Rdata and should be moved and renamed before being added to the default config.yaml.
...
exomedepth_call:
  container: "docker://hydragenetics/exomedepth:1.1.15"
  ref_count: "exomedepth_reference/RefCount_male.Rdata"
...
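The move-and-rename step could look something like the sketch below. `install_ref_count` is a hypothetical helper, and the destination path is whatever you put under ref_count in your own config.yaml. It copies rather than moves, so the original stays in place until the new file has been verified.

```shell
# Copy the references workflow's result file to a new name, refusing to overwrite.
install_ref_count() {
    src="$1"
    dest="$2"
    if [ ! -s "$src" ]; then
        echo "result file missing or empty: $src" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "destination already exists: $dest" >&2
        return 1
    fi
    # Create the destination folder if needed, then copy under the new name
    mkdir -p "$(dirname "$dest")"
    cp "$src" "$dest"
}
```

For example, `install_ref_count references/exomedepth_reference/RefCount.Rdata exomedepth_reference/RefCount_male.Rdata`, where the second argument matches the ref_count entry shown above.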