Combining results: rescue mode#

The joint call of hashing and genetic deconvolution methods has been shown to be beneficial for cell recovery rate and calling accuracy. hadge provides a rescue mode to run both genotype- and hashing-based approaches jointly to rescue problematic hashing experiments in cases where donors are genetically distinct. In this scenario, samples of both hashing and genetic multiplexing experiments are deconvoluted simultaneously. Furthermore, hadge allows for the automatic determination of the best combination of hashing and SNP- based donor deconvolution tools.

Overview#

Quick start#

cd hadge
nextflow run main.nf -profile test

Parameter#


match_donor	Whether to match donors. Default: True
demultiplexing_result	A CSV file with demultiplexing assignment only when running in donor_match mode. In other modes, the input is passed by the pipeline automatically. Default: None
match_donor_method1	The method name to match donors. If None, all genotype-based methods are compared. Default: None
match_donor_method2	The method name to match donors. If None, all hashing-based methods are compared. Default: None
findVariants	Whether to extract a subset of informative variants when best genotype-based method for donor matching is vireo. `default`: subset as described in paper; `vireo`: subset by Vireo; `True`: subset using both methods; `False`: not extracting variants. Default: False
variant_count	The threshold for the minimal read depth of a variant in the cell group when subseting the informative variants by default. Default: 10
variant_pct	The threshold for the minimal frequency of the alternative or reference allele to determine the dominant allele of a variant in the cell group when subseting the informative variants by default. Default: 0.9
vireo_parent_dir	A parent folder which contains the output folder of vireo in the format of `vireo_[taskID/sampleId]` generated by hadge pipeline only when running in donor_match mode. In other modes, the input is passed by the pipeline automatically. Default: None

Output#

By default, the pipeline is run on a single sample. In this case, all pipeline output will be saved in the folder $projectDir/$params.outdir/rescue. When running the pipeline on multiple samples, the pipeline output will be found in the folder "$projectDir/$params.outdir/$sampleId/rescue. To simplify this, we’ll refer to this folder as $pipeline_output_folder from now on.

In rescue mode, the genotype- and hashing-based demultiplexing workflow run in parallel. They save their output in $pipeline_output_folder/[gene/hash]_demulti. Before running the donor-matching preocess, the pipeline merges the results of two workflows into classification_all_genetic_and_hash.csv and assignment_all_genetic_and_hash.csv in the $pipeline_output_folder/summary folder.

The following additional output can be found in $pipeline_output_folder/donor_match.

Optional output: Donor matching#

Folder[method1]_[task_ID/sampleId]_vs_[method2]_[task_ID/sampleId] with:
- correlation_res.csv: correlation scores of donor matching
- concordance_heatmap.png: a heatmap visualising the the correlation scores
- donor_match.csv: a map between hashtag and donor identity.
- all_assignment_after_match.csv: assignment of all cell barcodes after donor matching
- intersect_assignment_after_match.csv: assignment of joint singlets after donor matching
General output in the $pipeline_output_folder/donor_match folder:
- all_assignment_after_match.csv: assignment of all cell barcodes based on the donor matching of the optimal match
- donor_match.csv: a map between hashtags and donor identities based on the donor matching of the optimal match
- score_record.csv: a CSV file storing the matching score and the number of matched donors for each method pair

Optional output: scverse data structures#

Folder data_output with:

an Anndata object which contains the filtered scRNA-seq counts from params.rna_matrix_filered and the assignment of the best-matched method pair after donor matching
an Mudata object which contains the filtered scRNA-seq counts from params.rna_matrix_filered and the filtered HTO read counts from params.hto_matrix_filered with the assignment of the best-matched method pair after donor matching

Optional output: Extracting donor-specific variants#

Only when 1) best_method1 for the optimal match (best_method1 and best_method2) is vireo and 2) identification of donor-specific or discriminatory variants is enabled, then in folder donor_match/donor_match_[best_method1]_[best_method2]:

donor_specific_variants.csv: a list of donor-specific variants
donor_specific_variants_upset.png: An upset plot showing the number of donor-specific variants
donor_genotype_subset_by_default_matched.vcf: Donor genotypes of donor-specific variants
donor_genotype_subset_by_vireo.vcf: Donor genotypes of a set of discriminatory variants filtered by Vireo