Running hadge on multiple samples
The pipeline can also process several samples in one run. In this scenario, the input-data parameters for each sample are read from a sample sheet whose path is given by params.multi_sample, which is set to null by default.
Sample sheet
The sample sheet must contain a column called sampleId that holds a unique ID for each sample. The other required columns depend on the mode and methods you want to run (for an example file, see the Resources section below):
hashing mode:
columns: sampleId, rna_matrix_raw, rna_matrix_filtered, hto_matrix_raw, hto_matrix_filtered
rows: one per sample (e.g. sample1, sample2)
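To make the expected layout concrete, the sketch below writes a hashing-mode sample sheet with Python's csv module. It assumes the sample sheet is a comma-separated file with a header row (compare with the example file under Resources); the helper name `hashing_sample_sheet` and the file paths it generates are hypothetical placeholders.

```python
import csv
import io

# Columns required in hashing mode, in the order listed above.
HASHING_COLUMNS = [
    "sampleId",
    "rna_matrix_raw",
    "rna_matrix_filtered",
    "hto_matrix_raw",
    "hto_matrix_filtered",
]


def hashing_sample_sheet(samples: dict[str, str]) -> str:
    """Build hashing-mode sample sheet text with one row per sample.

    `samples` maps each sampleId to a directory; using dict keys keeps the
    IDs unique. The layout <directory>/<column name> is a hypothetical
    placeholder: point each column at your real count-matrix paths instead.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=HASHING_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for sample_id, directory in samples.items():
        row = {"sampleId": sample_id}
        for column in HASHING_COLUMNS[1:]:
            row[column] = f"{directory}/{column}"
        writer.writerow(row)
    return buffer.getvalue()


if __name__ == "__main__":
    print(hashing_sample_sheet({"sample1": "data/sample1", "sample2": "data/sample2"}))
```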
genetic mode:
As in single-sample mode, set a value to “None” if the corresponding input (for example, vcf_donor) is not available. If params.generate_anndata or params.generate_mudata is enabled, also include the HTO and RNA count-matrix columns (rna_matrix_raw, rna_matrix_filtered, hto_matrix_raw, hto_matrix_filtered).
columns: sampleId, bam, bam_index, barcodes, nsample, celldata, vcf_mixed, vcf_donor
rows: one per sample (e.g. sample1, sample2)
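The “None” convention for missing inputs can be illustrated with a small helper that completes each genetic-mode row before it is written. This is a sketch assuming a comma-separated sample sheet; the helper names are hypothetical, and the optional matrix columns are only added when you indicate that params.generate_anndata or params.generate_mudata is enabled.

```python
import csv
import io

MATRIX_COLUMNS = ["rna_matrix_raw", "rna_matrix_filtered",
                  "hto_matrix_raw", "hto_matrix_filtered"]
GENETIC_INPUTS = ["bam", "bam_index", "barcodes", "nsample",
                  "celldata", "vcf_mixed", "vcf_donor"]


def genetic_columns(with_matrices: bool = False) -> list[str]:
    """Genetic-mode columns; matrix columns are included when
    params.generate_anndata or params.generate_mudata is enabled."""
    return ["sampleId"] + (MATRIX_COLUMNS if with_matrices else []) + GENETIC_INPUTS


def genetic_row(values: dict[str, str], with_matrices: bool = False) -> dict[str, str]:
    """Complete one sample's row, writing "None" for every missing input."""
    columns = genetic_columns(with_matrices)
    unknown = set(values) - set(columns)
    if unknown:
        raise ValueError(f"unexpected columns: {sorted(unknown)}")
    if not values.get("sampleId"):
        raise ValueError("every row needs a sampleId")
    return {column: values.get(column, "None") for column in columns}


def write_sheet(rows: list[dict[str, str]], with_matrices: bool = False) -> str:
    """Serialise completed rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=genetic_columns(with_matrices),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, `genetic_row({"sampleId": "sample2", "bam": "sample2.bam"})` fills vcf_donor and every other unspecified input with “None”.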
rescue mode:
columns: sampleId, rna_matrix_raw, rna_matrix_filtered, hto_matrix_raw, hto_matrix_filtered, bam, bam_index, barcodes, nsample, celldata, vcf_mixed, vcf_donor
rows: one per sample (e.g. sample1, sample2)
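Before launching a run, a sample sheet can be checked against the per-mode column lists in this section. The sketch below assumes a comma-separated file with a header row; `REQUIRED_COLUMNS` and `check_sample_sheet` are hypothetical names that mirror this page, not hadge's own validation code.

```python
import csv
import io

HASHING = ["sampleId", "rna_matrix_raw", "rna_matrix_filtered",
           "hto_matrix_raw", "hto_matrix_filtered"]
GENETIC = ["sampleId", "bam", "bam_index", "barcodes", "nsample",
           "celldata", "vcf_mixed", "vcf_donor"]

# Rescue mode needs both sets: the hashing columns followed by the genetic inputs.
REQUIRED_COLUMNS = {
    "hashing": HASHING,
    "genetic": GENETIC,
    "rescue": HASHING + GENETIC[1:],
}


def check_sample_sheet(text: str, mode: str) -> list[str]:
    """Return the problems found in CSV `text` for `mode` (empty list if none)."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS[mode] if c not in header]
    seen = set()
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        sample_id = (row.get("sampleId") or "").strip()
        if not sample_id:
            problems.append(f"line {line_no}: empty sampleId")
        elif sample_id in seen:
            problems.append(f"line {line_no}: duplicate sampleId {sample_id}")
        seen.add(sample_id)
    return problems
```

Returning a list instead of raising on the first error lets you fix every issue in one pass before starting the pipeline.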
The remaining parameters for each process are specified in the nextflow.config file, just like when demultiplexing a single sample.
Process parameters behave differently in the two cases: for a single sample, a process parameter may take multiple comma-separated values, whereas for multiple samples each process parameter accepts only a single value.
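The single-value restriction for multi-sample runs can be checked ahead of time with a helper like the one below. It is a hypothetical sketch: it treats any string value containing a comma as a multi-value list, which may not match how every hadge parameter is parsed, and the parameter names in the usage note are placeholders.

```python
def multi_value_params(params: dict[str, object]) -> list[str]:
    """Names of parameters whose string value looks like a comma-separated list."""
    return sorted(
        name for name, value in params.items()
        if isinstance(value, str) and "," in value
    )


def check_params(params: dict[str, object], multi_sample: bool) -> None:
    """Raise if comma-separated values are used while running on multiple samples."""
    offending = multi_value_params(params)
    if multi_sample and offending:
        raise ValueError(
            "multi-sample runs accept a single value per process parameter; "
            f"got comma-separated values for: {', '.join(offending)}"
        )
```

With `{"tool_a_threshold": "0.1,0.2"}`, `check_params(..., multi_sample=False)` passes, while `multi_sample=True` raises a `ValueError` naming `tool_a_threshold`.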
Output
When running the pipeline on multiple samples, the output for each sample is written to the folder "$projectDir/$params.outdir/$sampleId/$params.mode".
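The per-sample output folder is built from four values. A minimal sketch with pathlib, where the function name and the projectDir, outdir and mode values in the example are placeholders:

```python
from pathlib import PurePosixPath


def sample_output_dir(project_dir: str, outdir: str, sample_id: str, mode: str) -> PurePosixPath:
    """Mirror the pattern $projectDir/$params.outdir/$sampleId/$params.mode.

    Assumes `outdir` is a relative path: joining an absolute path with
    pathlib would discard `project_dir`.
    """
    return PurePosixPath(project_dir) / outdir / sample_id / mode


if __name__ == "__main__":
    # Placeholder values; substitute your own projectDir, outdir and mode.
    for sample_id in ["sample1", "sample2"]:
        print(sample_output_dir("/path/to/hadge", "results", sample_id, "rescue"))
```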
Resources
There is an example sample sheet for multi_sample mode.