Quickstart

Two paste-and-run examples, one for each interface. For full walkthroughs, see CLI Pipeline Walkthrough and Python API Tutorial. New to the terms? Skim the Glossary.

Command-line pipeline

For stranded ChIP-seq with input controls:

cherimoya pipeline-json \
    -s hg38.fa -p peaks.narrowPeak \
    -i input.bam -c control.bam \
    -m JASPAR_2024.meme -n my_experiment -o pipeline.json

cherimoya pipeline -p pipeline.json

This calls peaks with MACS3, converts BAMs to bigWigs, samples GC-matched negatives, trains a Cherimoya model, computes attributions via saturation mutagenesis, calls seqlets, annotates them with tomtom-lite, and runs TF-MoDISco. All outputs land in the working directory; the full output list and per-step descriptions are in CLI Pipeline Walkthrough. Assay-specific recipes: Recipe: TF ChIP-seq, Recipe: ATAC-seq, Recipe: DNase-seq.
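
Of the steps listed above, GC-matched negative sampling is the one whose logic is least obvious from its name. The block below is a minimal sketch of the general idea, not Cherimoya's implementation: candidate background sequences are binned by GC fraction, and each peak draws one unused candidate from its own bin. The function names, the `bin_width` default, and the one-negative-per-peak ratio are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def gc_fraction(seq):
    """Fraction of G/C among unambiguous bases (0.0 if there are none)."""
    seq = seq.upper()
    n_acgt = sum(seq.count(base) for base in "ACGT")
    if n_acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / n_acgt


def sample_gc_matched(peaks, candidates, bin_width=0.02, seed=0):
    """Hypothetical helper: draw one unused candidate per peak from the same GC bin."""
    rng = random.Random(seed)

    # Group candidate background sequences by GC bin, in random order.
    bins = defaultdict(list)
    for seq in candidates:
        bins[int(gc_fraction(seq) / bin_width)].append(seq)
    for pool in bins.values():
        rng.shuffle(pool)

    negatives = []
    for seq in peaks:
        pool = bins.get(int(gc_fraction(seq) / bin_width))
        if pool:  # peaks whose bin is empty or exhausted get no negative
            negatives.append(pool.pop())
    return negatives


peaks = ["ACGT", "GGCC"]
candidates = ["AATT", "AGCT", "TTAA", "GCGC", "CAGT"]
negatives = sample_gc_matched(peaks, candidates)
```

Matching on GC content keeps the model from separating peaks and background on base composition alone. A real pipeline would draw candidates from genomic windows outside the peak set rather than from a short hand-written list.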

Python API

import torch
from cherimoya import Cherimoya

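# Use the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = Cherimoya(n_filters=96, n_layers=9, n_outputs=1).to(device)
# Random stand-in for a batch of 2 one-hot sequences: (batch, 4, length).
X = torch.randn(2, 4, 2114, device=device)
with torch.no_grad():
    y_profile, y_counts = model(X)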

print(y_profile.shape)   # torch.Size([2, 1, 1000])
print(y_counts.shape)    # torch.Size([2, 1])

To one-hot encode real DNA, use tangermeme.utils.one_hot_encode (for a Python string) or tangermeme.io.extract_loci (for a FASTA plus a BED of loci). To train from scratch with the same defaults the CLI uses, see Python API Tutorial. To save and load checkpoints, see Saving and Loading Models. To compute attributions or score variants, see Attribution and Motif Analysis and Variant Effect Prediction.
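
For readers who want to see what a one-hot encoded input looks like before reaching for tangermeme, here is a small NumPy stand-in. It is a sketch only: `one_hot_encode_sketch` is a hypothetical name, and the A, C, G, T channel order and the all-zero encoding of N are assumptions that should be checked against the tangermeme documentation.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed channel order


def one_hot_encode_sketch(seq, alphabet=ALPHABET):
    """Return a (len(alphabet), len(seq)) float32 array; unknown bases such as N stay all-zero."""
    index = {base: i for i, base in enumerate(alphabet)}
    encoded = np.zeros((len(alphabet), len(seq)), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        row = index.get(base)
        if row is not None:
            encoded[row, pos] = 1.0
    return encoded


# Stack two equal-length sequences into a (batch, 4, length) array.
X = np.stack([one_hot_encode_sketch("ACGTN"), one_hot_encode_sketch("TTGCA")])
print(X.shape)  # (2, 4, 5)
```

The resulting array can be converted with torch.from_numpy(X) before it is passed to a model. The toy sequences here are 5 bp long, whereas the example above uses 2114 bp inputs.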