MultiVis Directory Generation Usage

Step 1. Preparation of the chromsize JSON

Before running the generation script, you must provide a JSON file specifying the chromosome sizes for the genome assembly of interest. You do not need to include every chromosome if your analysis does not require them all. Complete example files (chromsize_hg19.json and chromsize_hg38.json) can be found on our GitHub. A shortened hg19 example is shown below.


{
    "assembly_name": "hg19",
    "chromosomes": {
        "chr1": 249250621,
        "chr2": 243199373,
        "chr3": 198022430,
        "chr4": 191154276,
        "chr5": 180915260,
        "chr6": 171115067,
        "chr7": 159138663,
        "chr8": 146364022,
        "chr9": 141213431,
        "chr10": 135534747
    }
}
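
If you already have a two-column chromosome sizes table (chromosome name followed by length, as in a typical chrom.sizes file), a short script could convert it into the JSON layout shown above. The following is a minimal sketch and not part of MultiVis: the function name build_chromsize_json and the whitespace-separated input format are assumptions made for illustration.

```python
import json


def build_chromsize_json(assembly_name, size_lines, keep=None):
    """Build the chromsize dictionary from 'name size' lines.

    keep: optional collection of chromosome names to retain;
    any chromosome not listed in it is dropped.
    """
    chromosomes = {}
    for line in size_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, size = line.split()[:2]
        if keep is not None and name not in keep:
            continue
        chromosomes[name] = int(size)
    return {"assembly_name": assembly_name, "chromosomes": chromosomes}


# Example input: three hg19 chromosomes, of which only two are kept.
example_lines = ["chr1\t249250621", "chr2\t243199373", "chrM\t16571"]
chromsize = build_chromsize_json("hg19", example_lines, keep={"chr1", "chr2"})
print(json.dumps(chromsize, indent=4))
```

To write the result to disk, the dictionary can be passed to json.dump with an open file handle, for example one opened on chromsize_hg19.json.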

Step 2. Running MultiVis Directory Generation

To run MultiVis, use the main_sv.py script:

python main_sv.py -c <clusters_file> -s <genomic_file> -o <output_directory> -m <max_cluster_size> -n <min_cluster_size> -t

Arguments:

  • -c, --clusters: Path to the input clusters file (required).
  • -s, --genomic_size: Path to the JSON file containing genomic size information, e.g., chromsize_hg38.json. Sample files are provided (chromsize_hg38.json and chromsize_hg19.json); you can also create your own JSON file using these as references.
  • -o, --heatmap_MultiVis_output: Output directory for the generated MultiVis heatmap file (default: MultiVis).
  • -m, --max_cluster_size: Maximum number of reads allowed in a read-cluster. Clusters with more reads than this value will be skipped (default: 1000).
  • -n, --min_cluster_size: Minimum number of reads required in a read-cluster. Clusters with fewer reads than this value will be skipped (default: 2).
  • -t, --start_only: Flag indicating that the cluster file contains start positions only. If omitted, the script does not make this assumption (default: False).
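
To make the options above concrete, here is a minimal sketch of how they could be declared with argparse, together with the size filter implied by -m and -n. This is not the source of main_sv.py: the helper names build_parser and keep_cluster are hypothetical, and whether -s is mandatory is not stated above, so it is left optional here.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented options of main_sv.py."""
    parser = argparse.ArgumentParser(
        description="MultiVis directory generation (illustrative sketch)"
    )
    parser.add_argument("-c", "--clusters", required=True,
                        help="path to the input clusters file")
    parser.add_argument("-s", "--genomic_size",
                        help="path to the chromsize JSON file")
    parser.add_argument("-o", "--heatmap_MultiVis_output", default="MultiVis",
                        help="output directory")
    parser.add_argument("-m", "--max_cluster_size", type=int, default=1000,
                        help="skip clusters with more reads than this")
    parser.add_argument("-n", "--min_cluster_size", type=int, default=2,
                        help="skip clusters with fewer reads than this")
    parser.add_argument("-t", "--start_only", action="store_true",
                        help="cluster file contains start positions only")
    return parser


def keep_cluster(n_reads, min_size, max_size):
    """Return True when the read count lies within [min_size, max_size]."""
    return min_size <= n_reads <= max_size


# Parse an example command line instead of sys.argv.
args = build_parser().parse_args(
    ["-c", "example.clusters", "-s", "chromsize_hg19.json", "-m", "100", "-t"]
)
print(args.max_cluster_size, args.min_cluster_size, args.start_only)

# Read counts of 1 and 101 fall outside the allowed range and are skipped.
kept = [n for n in (1, 2, 100, 101)
        if keep_cluster(n, args.min_cluster_size, args.max_cluster_size)]
print(kept)
```

Both bounds are treated as inclusive here, matching the wording "more reads than" and "fewer reads than" in the argument descriptions.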

If you are using human.combined.mapq-ge10.clusters from GSE114242_human_combined_clusters.tar.gz, you can run the command below as an example:

python main_sv.py -c human.combined.mapq-ge10.clusters -s chromsize_hg19.json -o ge-10\ MULTIVis -m 100 -n 2 -t

Below is an example of the expected output. Running the tool generates a directory named ge-10 MULTIVis, which contains a subfolder for each chromosome pair of interest (e.g., chr1-chr1, chr1-chr2, and so on) with all the files needed for interactive exploration in MultiVis.
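
To get a rough idea of which subfolders to expect, the chromosome pairs can be enumerated from the chromosome names in your chromsize JSON. The sketch below assumes that each unordered pair (including a chromosome paired with itself) appears once and is named chrA-chrB in the order the chromosomes are listed. The actual tool may create only the pairs that occur in your data, and expected_pair_names is a hypothetical helper, not a MultiVis function.

```python
from itertools import combinations_with_replacement


def expected_pair_names(chromosomes):
    """List 'chrA-chrB' names for every unordered pair, self-pairs included."""
    return [f"{a}-{b}" for a, b in combinations_with_replacement(chromosomes, 2)]


pairs = expected_pair_names(["chr1", "chr2", "chr3"])
print(pairs)
```

Under this assumption, n chromosomes give n(n+1)/2 pairs, so the ten chromosomes in the example JSON above would yield 55 subfolders.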

MultiVis workflow for SPRITE, RD-SPRITE, and scSPRITE datasets.