Quickstart
This page shows the two main ways to use Cherimoya: via the command-line pipeline or via the Python API.
Using the CLI Pipeline
The fastest way to go from raw data to trained model and motif analysis is the end-to-end pipeline. You need:
A genome FASTA file (e.g., hg38.fa)
One or more signal files (BAM, SAM, BED, or bigWig)
(Optional) Control signal files
Step 1: Generate a pipeline JSON
cherimoya pipeline-json \
    -s hg38.fa \
    -i signal.bam \
    -n my_experiment \
    -o my_experiment.pipeline.json
Step 2: Run the full pipeline
cherimoya pipeline -p my_experiment.pipeline.json
This will automatically:
Call peaks using MACS3
Convert BAM files to bigWig format
Sample GC-matched negative regions
Train a Cherimoya model
Calculate attributions
Identify seqlets
Run TF-MoDISco motif discovery
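Both steps can also be driven from a script. The following is a minimal sketch, not part of Cherimoya itself: the helper `run_pipeline` and its `dry_run` argument are hypothetical, and it uses only the two commands and flags shown above, invoked through Python's standard `subprocess` module.

```python
import shlex
import subprocess


def run_pipeline(fasta, signal, name, dry_run=True):
    """Build (and optionally run) the two quickstart commands.

    Hypothetical helper: it only wraps the CLI calls shown above.
    """
    pipeline_json = f"{name}.pipeline.json"
    commands = [
        # Step 1: generate the pipeline JSON.
        ["cherimoya", "pipeline-json",
         "-s", fasta, "-i", signal, "-n", name, "-o", pipeline_json],
        # Step 2: run the full pipeline from that JSON.
        ["cherimoya", "pipeline", "-p", pipeline_json],
    ]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(cmd, check=True)
    return commands


# Dry run: print the commands without executing them.
commands = run_pipeline("hg38.fa", "signal.bam", "my_experiment")
```

Passing `dry_run=False` would execute the commands in order; because of `check=True`, a failure in step 1 prevents step 2 from starting.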
Using the Python API
For more control, use the Python API directly.
Instantiate a model:
from cherimoya import Cherimoya

model = Cherimoya(
    n_filters=96,          # Number of convolutional filters (default 96)
    n_layers=9,            # Number of Cheri Blocks
    n_outputs=2,           # Number of output tracks (e.g., 2 for stranded)
    n_control_tracks=0,    # Number of control tracks (0 if no controls)
).cuda()
Load training data:
from cherimoya.io import PeakGenerator

training_data = PeakGenerator(
    peaks="peaks.narrowPeak",
    negatives="negatives.bed",
    sequences="hg38.fa",
    signals=["signal.+.bw", "signal.-.bw"],
    chroms=["chr1", "chr2", "chr3"],  # Training chromosomes
    in_window=2114,
    out_window=1000,
    max_jitter=128,
    batch_size=64,
)
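The window settings interact in a way that is easy to get wrong. The sketch below works through the arithmetic under an assumption this page does not confirm: that, as in BPNet-style models, the output window is centered inside the input window and that jitter shifts both windows by the same random offset. The function name `example_windows` is hypothetical and does not exist in Cherimoya.

```python
import random


def example_windows(peak_center, in_window=2114, out_window=1000,
                    max_jitter=128, rng=random):
    """Return (input_start, input_end, output_start, output_end).

    Assumption: both windows share one jittered center.
    """
    # Shift the center by a random offset in [-max_jitter, max_jitter].
    center = peak_center + rng.randint(-max_jitter, max_jitter)
    in_start = center - in_window // 2
    out_start = center - out_window // 2
    return in_start, in_start + in_window, out_start, out_start + out_window


in_start, in_end, out_start, out_end = example_windows(1_000_000)
# Under the centering assumption, the model sees extra sequence context
# on each side of the region it predicts.
flank = out_start - in_start
```

With the values used above, each side of the output window would have (2114 - 1000) / 2 = 557 bp of flanking input sequence, regardless of the jitter drawn.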
Set up optimizers and train:
# Needs a PyTorch version that includes torch.optim.Muon
from torch.optim import AdamW, Muon
from torch.optim.lr_scheduler import LinearLR, CosineAnnealingLR, SequentialLR

# Separate parameters for Muon (2D weights) and AdamW (everything else)
muon_params, adam_params = [], []
for name, p in model.named_parameters():
    if p.ndim == 2 and "weight" in name and name != "linear.weight":
        muon_params.append(p)
    else:
        adam_params.append(p)

muon_optimizer = Muon(muon_params, lr=0.01)
adam_optimizer = AdamW(adam_params, lr=0.004)

# Warmup + cosine decay schedules
n_warmup = len(training_data) * 5
n_total = len(training_data) * 50

muon_scheduler = SequentialLR(muon_optimizer, schedulers=[
    LinearLR(muon_optimizer, start_factor=0.01, total_iters=n_warmup),
    CosineAnnealingLR(muon_optimizer, T_max=n_total, eta_min=1e-5),
], milestones=[n_warmup])

adam_scheduler = SequentialLR(adam_optimizer, schedulers=[
    LinearLR(adam_optimizer, start_factor=0.01, total_iters=n_warmup),
    CosineAnnealingLR(adam_optimizer, T_max=n_total, eta_min=1e-5),
], milestones=[n_warmup])

# Train
# X_valid and y_valid are held-out validation inputs and targets;
# they must be prepared separately and are not defined in this snippet.
model.fit(
    training_data,
    muon_optimizer, adam_optimizer,
    muon_scheduler, adam_scheduler,
    X_valid=X_valid,
    X_ctl_valid=None,
    y_valid=y_valid,
    max_epochs=50,
    batch_size=64,
)
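To see what the warmup and cosine schedule does to the learning rate, the following sketch computes it in closed form using only the standard library. It is an approximation of the `SequentialLR` setup above, assuming the scheduler is stepped once per batch and that the cosine phase starts counting from zero at the milestone; `lr_at_step` and the example value of 100 batches per epoch are hypothetical.

```python
import math


def lr_at_step(step, base_lr, n_warmup, n_total,
               start_factor=0.01, eta_min=1e-5):
    """Approximate learning rate after `step` scheduler steps."""
    if step < n_warmup:
        # LinearLR: scale factor rises linearly from start_factor to 1.
        factor = start_factor + (1.0 - start_factor) * step / n_warmup
        return base_lr * factor
    # CosineAnnealingLR: decay from base_lr toward eta_min over T_max steps.
    t = step - n_warmup
    cosine = (1.0 + math.cos(math.pi * t / n_total)) / 2.0
    return eta_min + (base_lr - eta_min) * cosine


steps_per_epoch = 100  # hypothetical stand-in for len(training_data)
n_warmup = steps_per_epoch * 5
n_total = steps_per_epoch * 50

# Muon base rate of 0.01 at the start, at the end of warmup, and at the end.
lr_start = lr_at_step(0, 0.01, n_warmup, n_total)
lr_peak = lr_at_step(n_warmup, 0.01, n_warmup, n_total)
lr_end = lr_at_step(n_total, 0.01, n_warmup, n_total)
```

Under these assumptions the rate climbs from 1% of its base value to the full value over the first five epochs, then decays. Because `T_max` is `n_total` while the cosine phase only begins after warmup, the rate after 50 epochs is still slightly above `eta_min`.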
Make predictions:
from tangermeme.predict import predict

# X_test holds the held-out input sequences; it is not defined in this snippet.
y_profile, y_counts = predict(
    model, X_test,
    batch_size=64,
    device='cuda',
)
Next Steps
Architecture — understand the Cheri Block and model design
CLI Pipeline Tutorial — detailed CLI pipeline walkthrough
Python API Tutorial — full Python API tutorial
Attribution & Motif Analysis — attribution and motif analysis