Quickstart
==========

This page shows the two main ways to use Cherimoya: via the **command-line pipeline** or via the **Python API**.

Using the CLI Pipeline
----------------------

The fastest way to go from raw data to a trained model and motif analysis is the end-to-end pipeline. You need:

1. A genome FASTA file (e.g., ``hg38.fa``)
2. One or more signal files (BAM, SAM, BED, or bigWig)
3. (Optional) Control signal files

**Step 1: Generate a pipeline JSON**

.. code-block:: bash

    cherimoya pipeline-json \
        -s hg38.fa \
        -i signal.bam \
        -n my_experiment \
        -o my_experiment.pipeline.json

**Step 2: Run the full pipeline**

.. code-block:: bash

    cherimoya pipeline -p my_experiment.pipeline.json

This will automatically:

- Call peaks using MACS3
- Convert BAM files to bigWig format
- Sample GC-matched negative regions
- Train a Cherimoya model
- Calculate attributions
- Identify seqlets
- Run TF-MoDISco motif discovery

Using the Python API
--------------------

For more control, use the Python API directly.

**Instantiate a model:**

.. code-block:: python

    from cherimoya import Cherimoya

    model = Cherimoya(
        n_filters=96,        # Number of convolutional filters (default 96)
        n_layers=9,          # Number of Cheri Blocks
        n_outputs=2,         # Number of output tracks (e.g., 2 for stranded)
        n_control_tracks=0,  # Number of control tracks (0 if no controls)
    ).cuda()

**Load training data:**

.. code-block:: python

    from cherimoya.io import PeakGenerator

    training_data = PeakGenerator(
        peaks="peaks.narrowPeak",
        negatives="negatives.bed",
        sequences="hg38.fa",
        signals=["signal.+.bw", "signal.-.bw"],
        chroms=["chr1", "chr2", "chr3"],  # Training chromosomes
        in_window=2114,
        out_window=1000,
        max_jitter=128,
        batch_size=64,
    )

**Set up optimizers and train:**

.. code-block:: python

    from torch.optim import AdamW, Muon
    from torch.optim.lr_scheduler import LinearLR, CosineAnnealingLR, SequentialLR

    # Separate parameters for Muon (2D weights) and AdamW (everything else)
    muon_params, adam_params = [], []
    for name, p in model.named_parameters():
        if p.ndim == 2 and "weight" in name and name != "linear.weight":
            muon_params.append(p)
        else:
            adam_params.append(p)

    muon_optimizer = Muon(muon_params, lr=0.01)
    adam_optimizer = AdamW(adam_params, lr=0.004)

    # Warmup + cosine decay schedules
    n_warmup = len(training_data) * 5
    n_total = len(training_data) * 50

    muon_scheduler = SequentialLR(muon_optimizer, schedulers=[
        LinearLR(muon_optimizer, start_factor=0.01, total_iters=n_warmup),
        CosineAnnealingLR(muon_optimizer, T_max=n_total, eta_min=1e-5),
    ], milestones=[n_warmup])

    adam_scheduler = SequentialLR(adam_optimizer, schedulers=[
        LinearLR(adam_optimizer, start_factor=0.01, total_iters=n_warmup),
        CosineAnnealingLR(adam_optimizer, T_max=n_total, eta_min=1e-5),
    ], milestones=[n_warmup])

    # Train
    # X_valid and y_valid are not defined on this page; they are assumed to be
    # validation sequences and signals that you have already loaded.
    model.fit(
        training_data,
        muon_optimizer,
        adam_optimizer,
        muon_scheduler,
        adam_scheduler,
        X_valid=X_valid,
        X_ctl_valid=None,
        y_valid=y_valid,
        max_epochs=50,
        batch_size=64,
    )

**Make predictions:**

.. code-block:: python

    from tangermeme.predict import predict

    # X_test is assumed to be a tensor of held-out input sequences.
    y_profile, y_counts = predict(
        model,
        X_test,
        batch_size=64,
        device='cuda',
    )

Next Steps
----------

- :doc:`architecture`: understand the Cheri Block and model design
- :doc:`tutorials/cli_pipeline`: detailed CLI pipeline walkthrough
- :doc:`tutorials/python_api`: full Python API tutorial
- :doc:`tutorials/attribution`: attribution and motif analysis