# Combining results: rescue mode

Calling cells jointly with hashing-based and genotype-based deconvolution methods has been shown to improve both cell recovery rate and calling accuracy. hadge provides a rescue mode that runs both approaches jointly to rescue problematic hashing experiments in cases where donors are genetically distinct. In this scenario, samples of both hashing and genetic multiplexing experiments are deconvoluted simultaneously. Furthermore, hadge can automatically determine the best combination of hashing-based and SNP-based donor deconvolution tools.

## Overview

<p align="center">
<img src="_static/images/rescue.png" width="500">
</p>

## **Quick start**

```bash
cd hadge
nextflow run main.nf -profile test
```

## **Parameter**

| Parameter             | Description |
| --------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| match_donor           | Whether to match donors. Default: True                                                                                                                                                                                                                                  |
| demultiplexing_result | A CSV file with demultiplexing assignments. Only required when running in donor_match mode; in other modes, the input is passed by the pipeline automatically. Default: None |
| match_donor_method1   | The method name to match donors. If None, all genotype-based methods are compared. Default: None                                                                                                                                                                        |
| match_donor_method2   | The method name to match donors. If None, all hashing-based methods are compared. Default: None                                                                                                                                                                         |
| findVariants          | Whether to extract a subset of informative variants when the best genotype-based method for donor matching is vireo. `default`: subset as described in the paper; `vireo`: subset by Vireo; `True`: subset using both methods; `False`: do not extract variants. Default: False |
| variant_count         | The minimal read depth of a variant in the cell group when subsetting the informative variants with the `default` method. Default: 10 |
| variant_pct           | The minimal frequency of the alternative or reference allele required to determine the dominant allele of a variant in the cell group when subsetting the informative variants with the `default` method. Default: 0.9 |
| vireo_parent_dir      | A parent folder containing the vireo output folders, named `vireo_[taskID/sampleId]`, as generated by the hadge pipeline. Only required when running in donor_match mode; in other modes, the input is passed by the pipeline automatically. Default: None |
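
The parameters above can be set on the command line with a double-dash prefix, which is Nextflow's general convention for pipeline parameters. The following is a minimal sketch that extends the quick-start command. It assumes `vireo` as the genotype-based method and leaves `match_donor_method2` unset so that all hashing-based methods are compared. The wrapper function `rescue_cmd` is hypothetical and only prints the command as a dry run.

```bash
# Hypothetical helper: prints the command instead of launching the pipeline.
# Remove the `echo` to actually run it.
rescue_cmd() {
  echo nextflow run main.nf -profile test \
    --match_donor_method1 vireo \
    --findVariants default \
    --variant_count 10 \
    --variant_pct 0.9
}

rescue_cmd
```

The values for `variant_count` and `variant_pct` shown here are the documented defaults, so they can be omitted unless you want to change them.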

## **Output**

By default, the pipeline runs on a single sample, and all output is saved in the folder `$projectDir/$params.outdir/rescue`. When running the pipeline on multiple samples, the output of each sample is found in the folder `$projectDir/$params.outdir/$sampleId/rescue`.
To simplify this, we'll refer to this folder as `$pipeline_output_folder` from now on.
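
As an illustration of the two cases, the folder can be resolved with a small shell helper. This is only a sketch: the function name `rescue_output_dir` and the example paths are hypothetical, and the three arguments stand in for `$projectDir`, `$params.outdir` and `$sampleId`.

```bash
# Hypothetical helper: resolve the rescue output folder.
# Usage: rescue_output_dir <projectDir> <outdir> [sampleId]
rescue_output_dir() {
  local project_dir=$1 outdir=$2 sample_id=${3:-}
  if [ -n "$sample_id" ]; then
    # Multiple samples: one subfolder per sample
    printf '%s/%s/%s/rescue\n' "$project_dir" "$outdir" "$sample_id"
  else
    # Single sample (default)
    printf '%s/%s/rescue\n' "$project_dir" "$outdir"
  fi
}

pipeline_output_folder=$(rescue_output_dir /path/to/hadge result sample1)
echo "$pipeline_output_folder"
```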

In rescue mode, the genotype- and hashing-based demultiplexing workflows run in parallel and save their output in `$pipeline_output_folder/[gene/hash]_demulti`. Before running the donor-matching process, the pipeline merges the results of the two workflows into `classification_all_genetic_and_hash.csv` and `assignment_all_genetic_and_hash.csv` in the `$pipeline_output_folder/summary` folder.

The following additional output can be found in `$pipeline_output_folder/donor_match`.

### Optional output: Donor matching

- Folder `[method1]_[task_ID/sampleId]_vs_[method2]_[task_ID/sampleId]` with:
  - `correlation_res.csv`: correlation scores of donor matching
  - `concordance_heatmap.png`: a heatmap visualising the correlation scores
  - `donor_match.csv`: a map between hashtags and donor identities
  - `all_assignment_after_match.csv`: assignment of all cell barcodes after donor matching
  - `intersect_assignment_after_match.csv`: assignment of joint singlets after donor matching
- General output in the `$pipeline_output_folder/donor_match` folder:
  - `all_assignment_after_match.csv`: assignment of all cell barcodes based on the donor matching of the optimal match
  - `donor_match.csv`: a map between hashtags and donor identities based on the donor matching of the optimal match
  - `score_record.csv`: a CSV file storing the matching score and the number of matched donors for each method pair
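
After a run, a quick way to confirm that the general donor-matching output was written is to check for the three files listed above. The helper below is a sketch only; the function name `check_donor_match_outputs` is hypothetical, and it expects the path of the `donor_match` folder as its argument.

```bash
# Hypothetical helper: report which general donor-matching files exist.
# Returns a non-zero status if any file is missing.
check_donor_match_outputs() {
  local dir=$1 missing=0 f
  for f in all_assignment_after_match.csv donor_match.csv score_record.csv; do
    if [ -f "$dir/$f" ]; then
      echo "found    $f"
    else
      echo "missing  $f"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_donor_match_outputs "$pipeline_output_folder/donor_match"` lists each file as found or missing.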

### Optional output: scverse data structures

Folder `data_output` with:

- an AnnData object which contains the filtered scRNA-seq counts from `params.rna_matrix_filered` and the assignment of the best-matched method pair after donor matching
- a MuData object which contains the filtered scRNA-seq counts from `params.rna_matrix_filered` and the filtered HTO read counts from `params.hto_matrix_filered` with the assignment of the best-matched method pair after donor matching

### Optional output: Extracting donor-specific variants

The following files are written to the folder `donor_match/donor_match_[best_method1]_[best_method2]` only when 1) `best_method1` of the optimal match (`best_method1` and `best_method2`) is `vireo` and 2) identification of donor-specific or discriminatory variants is enabled:

- `donor_specific_variants.csv`: a list of donor-specific variants
- `donor_specific_variants_upset.png`: an UpSet plot showing the number of donor-specific variants
- `donor_genotype_subset_by_default_matched.vcf`: donor genotypes of donor-specific variants
- `donor_genotype_subset_by_vireo.vcf`: donor genotypes of a set of discriminatory variants filtered by Vireo
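
To make the role of the `variant_count` and `variant_pct` thresholds more concrete, the sketch below shows one plausible reading of how they filter a single variant within a cell group: the variant needs a minimal read depth, and either the reference or the alternative allele must reach a minimal frequency to count as dominant. This is an illustration under that assumption, not hadge's actual implementation. The function name `classify_variant` and its output labels are hypothetical.

```bash
# Hypothetical illustration of the two thresholds for one variant in one cell group.
# Usage: classify_variant <ref_reads> <alt_reads> [variant_count] [variant_pct]
classify_variant() {
  local ref=$1 alt=$2 min_depth=${3:-10} min_pct=${4:-0.9}
  awk -v r="$ref" -v a="$alt" -v d="$min_depth" -v p="$min_pct" 'BEGIN {
    depth = r + a
    # Too few reads to trust the variant in this cell group
    if (depth == 0 || depth < d) { print "low_depth"; exit }
    dominant = (r > a) ? r : a
    # Neither allele is frequent enough to be called dominant
    if (dominant / depth < p) { print "ambiguous"; exit }
    label = (r > a) ? "ref" : "alt"
    print label
  }'
}

classify_variant 19 1   # depth 20, reference allele frequency 0.95
classify_variant 6 6    # depth 12, no dominant allele
classify_variant 4 3    # depth 7, below the default variant_count
```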