# Welcome to hadge’s documentation!

## Introduction
hadge is a one-stop pipeline for demultiplexing single-cell mixtures. It consists of 14 methods across two workflows, hashing-based and genetics-based deconvolution, which can be run in 3 modes.
The genetics-based deconvolution workflow includes 5 methods:
- Freemuxlet
- Demuxlet
- Vireo
- Souporcell
- scSplit
The hashing-based deconvolution workflow includes 8 methods:

- hashedDrops
- Multiseq
- HTODemux
- Demuxem
- HashSolo
- Demuxmix
- BFF
- GMM_Demux
## Installation
The hadge pipeline is implemented in Nextflow, so you first need to install Nextflow. Please refer to Nextflow for more details. Alternatively, you can also install Nextflow via conda.

Next, run the pipeline:

```bash
nextflow run http://github.com/theislab/hadge
```
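For the conda route, a minimal sketch (assuming the bioconda channel, which hosts the `nextflow` package, and an environment name of your choice):

```bash
# Create a fresh environment containing Nextflow from bioconda
conda create -n hadge -c bioconda nextflow
conda activate hadge

# Verify the installation
nextflow -version
```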
## Quick start
- Choose the mode: `--mode=<genetic/hashing/rescue>`.
- Specify the folder name `--outdir` to save the output files. This will create a folder automatically in the project directory.
- Specify the input data for each process.
- The pipeline can be run either locally or on an HPC with different resource specifications. By default, the pipeline runs locally. You can also set the SLURM executor by running the pipeline with `-profile cluster` (see the example invocation below).
- Please also check Usage for more details.
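A hypothetical invocation might look like the following; `my_results` is a placeholder, and real runs also need input-data parameters for each process (see Usage):

```bash
# Run the hashing workflow on a SLURM cluster, saving results under my_results/
nextflow run http://github.com/theislab/hadge \
    --mode hashing \
    --outdir my_results \
    -profile cluster
```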
## Pipeline output
By default, the pipeline is run on a single sample. In this case, all pipeline output will be saved in the folder `$projectDir/$params.outdir/$params.mode`. When running the pipeline on multiple samples, the pipeline output will be found in the folder `$projectDir/$params.outdir/$sampleId/$params.mode`. To simplify this, we’ll refer to this folder as `$pipeline_output_folder` from now on.
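For example, with the placeholder values `--outdir my_results` and `--mode hashing`, the output folder would be:

```
my_results/hashing/              # single-sample run
my_results/sample_1/hashing/     # multi-sample run, one subfolder per sampleId
```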
### Intermediate output
The pipeline saves the output of each process for the two workflows separately, so you will find the results of hashing-based and genetics-based deconvolution methods in the folders `hash_demulti` and `gene_demulti`, respectively.
If the pipeline is run on a single sample, each demultiplexing process generates its intermediate files in a folder of the form `$pipeline_output_folder/[method]/[method]_[task_ID]`, e.g. `htodemux/htodemux_1`. If the pipeline is run on multiple samples, the `task_ID` is replaced by the `sampleId`. In this folder, you can find the following files:

- `params.csv`: specified parameters in the task
- Output of the task, check Output for more details.
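Continuing the placeholder example above, a single-sample HTODemux task would leave its intermediate files at roughly:

```
my_results/hashing/hash_demulti/htodemux/htodemux_1/
├── params.csv    # parameters this task was run with
└── ...           # method-specific task output
```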
### Final output
After each demultiplexing workflow is complete, the pipeline generates TSV files to summarize the results in the folder `$pipeline_output_folder/[workflow]/[workflow]_summary`.
- `[method]_classification.csv`: classification of all trials for a given method
- `[method]_assignment.csv`: assignment of all trials for a given method
- `[method]_params.csv`: specified parameters of all trials for a given method
- `[mode]_classification_all.csv`: classification of all trials across different methods
- `[workflow]_assignment_all.csv`: assignment of all trials across different methods
- `adata` folder: stores the AnnData object with filtered scRNA-seq read counts and the assignment of each deconvolution method if `params.generate_anndata` is `True`.
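Using HTODemux as a placeholder method, the summary folder would contain files along these lines:

```
$pipeline_output_folder/[workflow]/[workflow]_summary/
├── htodemux_classification.csv
├── htodemux_assignment.csv
├── htodemux_params.csv
├── [mode]_classification_all.csv
├── [workflow]_assignment_all.csv
└── adata/           # only if params.generate_anndata is True
```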
### Additional output for rescue mode
Before running the donor-matching process, the pipeline merges the results of the hashing and genetic demultiplexing tools into `classification_all_genetic_and_hash.csv` and `assignment_all_genetic_and_hash.csv` in the `$pipeline_output_folder/summary` folder.
The output of the donor-matching process can be found in the folder `donor_match`; check Output for more details.
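To take a quick look at the merged tables, something like this works (the `my_results/rescue` prefix follows the placeholder example above; the exact schema is described in Output):

```bash
# Inspect the first rows of the merged classification and assignment tables
head my_results/rescue/summary/classification_all_genetic_and_hash.csv
head my_results/rescue/summary/assignment_all_genetic_and_hash.csv
```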