Quickstart
Two paste-and-run examples, one for each interface. For full walkthroughs, see CLI Pipeline Walkthrough and Python API Tutorial. New to the terms? Skim the Glossary.
Command-line pipeline
For stranded ChIP-seq with input controls:
cherimoya pipeline-json \
    -s hg38.fa -p peaks.narrowPeak \
    -i input.bam -c control.bam \
    -m JASPAR_2024.meme -n my_experiment -o pipeline.json
cherimoya pipeline -p pipeline.json
This calls peaks with MACS3, converts BAMs to bigWigs, samples GC-matched negatives, trains a Cherimoya model, computes attributions via saturation mutagenesis, calls seqlets, annotates them with tomtom-lite, and runs TF-MoDISco. All outputs land in the working directory; the full output list and per-step descriptions are in CLI Pipeline Walkthrough. Assay-specific recipes: Recipe: TF ChIP-seq, Recipe: ATAC-seq, Recipe: DNase-seq.
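Before running the second command, it can help to look at what `pipeline-json` wrote. The following is a minimal sketch, assuming only that `pipeline.json` is an ordinary JSON file with an object at the top level; no key names are assumed, and `summarize_config` is a hypothetical helper, not part of cherimoya.

```python
import json
from pathlib import Path


def summarize_config(path):
    """Return (key, type name) pairs for the top-level entries of a JSON file.

    Assumes the file holds a JSON object; raises ValueError otherwise.
    """
    with Path(path).open() as handle:
        config = json.load(handle)
    if not isinstance(config, dict):
        raise ValueError(f"{path} does not contain a top-level JSON object")
    return [(key, type(value).__name__) for key, value in config.items()]


if __name__ == "__main__":
    config_path = Path("pipeline.json")
    # Only report when the file exists, so the snippet is safe to paste anywhere.
    if config_path.exists():
        for key, kind in summarize_config(config_path):
            print(f"{key}: {kind}")
```

Because the file is plain JSON, it can also be edited by hand before `cherimoya pipeline -p pipeline.json` is run; see CLI Pipeline Walkthrough for what each entry means.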
Python API
import torch
from cherimoya import Cherimoya

# Build an untrained model and move it to the GPU.
model = Cherimoya(n_filters=96, n_layers=9, n_outputs=1).cuda()

# Random stand-in for a batch of 2 one-hot sequences: (batch, 4, length).
X = torch.randn(2, 4, 2114, device="cuda")

with torch.no_grad():
    y_profile, y_counts = model(X)

print(y_profile.shape)  # torch.Size([2, 1, 1000])
print(y_counts.shape)   # torch.Size([2, 1])
To one-hot encode real DNA, use tangermeme.utils.one_hot_encode
(for a Python string) or tangermeme.io.extract_loci (for a FASTA
plus a BED of loci). To train from scratch with the same defaults the
CLI uses, see Python API Tutorial. To save and load
checkpoints, see Saving and Loading Models. To compute attributions
or score variants, see Attribution and Motif Analysis and
Variant Effect Prediction.
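To make the `(batch, 4, length)` input layout concrete, here is a dependency-free sketch of what one-hot encoding a DNA string produces. It is an illustration only, not tangermeme's implementation: the function name `one_hot_encode_sketch` is hypothetical, and both the A, C, G, T channel order and the choice to leave unknown bases such as N as all-zero columns are assumptions of this sketch.

```python
def one_hot_encode_sketch(sequence, alphabet="ACGT"):
    """Encode a DNA string as a nested list of shape (len(alphabet), len(sequence)).

    Row i marks the positions where the base equals alphabet[i].
    Characters outside the alphabet (e.g. N) yield an all-zero column,
    which is an assumption of this sketch.
    """
    sequence = sequence.upper()
    return [
        [1.0 if base == letter else 0.0 for base in sequence]
        for letter in alphabet
    ]


encoded = one_hot_encode_sketch("ACGTN")
for letter, row in zip("ACGT", encoded):
    print(letter, row)
```

Wrapping the result as `torch.tensor(encoded).unsqueeze(0)` would give a tensor of shape `(1, 4, length)`. For real use, the sequence would need to match the model's input length (2114 in the example above), and the tangermeme functions are the better choice.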