Cherimoya

A compact deep learning model for predicting genomic profile data from DNA sequence.

Cherimoya predicts genomic modalities — transcription factor binding, chromatin accessibility, and transcription initiation — directly from DNA sequence. It pairs a lightweight ConvNeXt-style backbone with custom Triton GPU kernels for both training and inference, and ships with an end-to-end CLI that takes BAM files through peak calling, training, attribution, and motif discovery in a single command.

Under Active Development

Cherimoya is still evolving and may introduce breaking changes between versions. Pin the version you train with if you need to reload checkpoints later.

Where to start

Bioinformaticians running Cherimoya on their data: read Installation, then CLI Pipeline Walkthrough for an end-to-end run, then pick the recipe that matches your assay: Recipe: TF ChIP-seq, Recipe: ATAC-seq, or Recipe: DNase-seq. If you are comparing across conditions, see Recipe: Differential / Conditional Analysis.
Researchers using Cherimoya from Python: read Installation, then Python API Tutorial, then explore Attribution and Motif Analysis, Variant Effect Prediction, and Saving and Loading Models.
Developers integrating Cherimoya or contributing to it: read Development for repo layout and the test suite, then Architecture and the cherimoya.cherimoya / cherimoya.cheri reference pages.

If a term is unfamiliar, Glossary defines everything used in the rest of these docs. If something is going wrong, start with Troubleshooting and FAQ.

Design highlights

Cheri Blocks. A dilated depthwise convolution fused with a per-example layer normalization and a channel-mixing MLP, implemented as a custom Triton kernel. The default 9-layer model is ~340K parameters with a 1115 bp receptive field.
Three forward paths, one set of weights. A CPU fallback, a Triton forward-and-backward kernel for training, and a forward-only megakernel for inference all share the same weights and agree to within a maximum absolute difference of ~1e-5.
Dual-optimizer training. Muon for 2D projection weights, AdamW for everything else, with hyperparameters tuned via large-scale sweeps.
Learned loss balancing. Kendall-Gal uncertainty weighting with two learnable scalars replaces a fixed profile/counts loss weight.
EMA at evaluation. An exponential moving average of the parameters is maintained during training and used at evaluation, smoothing both the validation curve and the final predictions.
Stability-first defaults. Small fixed residual scale at initialization, no biases inside Cheri Blocks, no weight decay on Muon-routed weights, and a 5-epoch warmup before cosine decay.
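
The training-side highlights above (learned loss balancing, EMA at evaluation, and warmup followed by cosine decay) can be illustrated with a minimal pure-Python sketch. This is not Cherimoya's code: the function names, the linear warmup shape, the EMA decay value, and the exact form of the uncertainty-weighted loss (constant factors differ between formulations) are all assumptions made for illustration.

```python
import math


def warmup_cosine_lr(epoch, total_epochs, peak_lr, warmup_epochs=5, min_lr=0.0):
    """Learning rate for one epoch: warmup first, then cosine decay.

    Assumption: the warmup is linear; the docs only state its length (5 epochs).
    """
    if epoch < warmup_epochs:
        return peak_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def ema_update(ema_params, params, decay=0.999):
    """Return the updated exponential moving average of a flat parameter list.

    Assumption: decay=0.999 is a placeholder, not a documented default.
    """
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]


def uncertainty_weighted_loss(profile_loss, count_loss, s_profile, s_count):
    """Combine two losses using two learnable scalars (log-variances).

    Each term is exp(-s) * loss + s: a larger s down-weights its loss, while
    the additive s keeps the scalar from growing without bound.
    """
    return (math.exp(-s_profile) * profile_loss + s_profile
            + math.exp(-s_count) * count_loss + s_count)


if __name__ == "__main__":
    # Hypothetical run: 50 epochs with a peak learning rate of 1e-3.
    schedule = [warmup_cosine_lr(e, total_epochs=50, peak_lr=1e-3) for e in range(50)]
    print("first epochs:", schedule[:6])
    print("combined loss:", uncertainty_weighted_loss(2.0, 3.0, 0.0, 0.0))
```

In a real training loop the two scalars would be optimized together with the model, and the EMA copy, not the raw parameters, would be used whenever the model is evaluated.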

See Architecture for the full story and Benchmarks for measured numbers.
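
For a concrete picture of the Cheri Block described under Design highlights, the following pure-Python sketch chains a dilated depthwise convolution, a per-example normalization, and a bias-free channel-mixing MLP inside a scaled residual connection. It is an illustration only, not the Triton kernel: the operation order, the normalization axes (all channels and positions of one example), the GELU activation, zero "same" padding, and the residual scale of 0.1 are assumptions, and every function name here is hypothetical.

```python
import math


def gelu(v):
    """Exact GELU using the Gaussian error function."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def depthwise_dilated_conv(x, kernels, dilation):
    """Per-channel 1D convolution with zero 'same' padding.

    x: list of C channels, each a list of L floats.
    kernels: list of C kernels, each with an odd number of taps.
    """
    out = []
    for channel, taps in zip(x, kernels):
        half = len(taps) // 2
        length = len(channel)
        row = []
        for pos in range(length):
            acc = 0.0
            for k, w in enumerate(taps):
                src = pos + (k - half) * dilation  # taps are spaced `dilation` apart
                if 0 <= src < length:
                    acc += w * channel[src]
            row.append(acc)
        out.append(row)
    return out


def layer_norm_per_example(x, eps=1e-5):
    """Normalize one example to zero mean and unit variance over all entries."""
    flat = [v for row in x for v in row]
    mean = sum(flat) / len(flat)
    var = sum((v - mean) ** 2 for v in flat) / len(flat)
    scale = 1.0 / math.sqrt(var + eps)
    return [[(v - mean) * scale for v in row] for row in x]


def channel_mlp(x, w_in, w_out):
    """Bias-free position-wise MLP: expand channels, apply GELU, project back.

    w_in has shape [H][C] and w_out has shape [C][H].
    """
    channels, length = len(x), len(x[0])
    out = [[0.0] * length for _ in range(channels)]
    for pos in range(length):
        col = [x[c][pos] for c in range(channels)]
        hidden = [gelu(sum(w * v for w, v in zip(w_row, col))) for w_row in w_in]
        for c, w_row in enumerate(w_out):
            out[c][pos] = sum(w * h for w, h in zip(w_row, hidden))
    return out


def cheri_block_sketch(x, kernels, w_in, w_out, dilation, residual_scale=0.1):
    """Residual block: x + residual_scale * MLP(Norm(DepthwiseConv(x)))."""
    y = depthwise_dilated_conv(x, kernels, dilation)
    y = layer_norm_per_example(y)
    y = channel_mlp(y, w_in, w_out)
    return [[xv + residual_scale * yv for xv, yv in zip(xr, yr)]
            for xr, yr in zip(x, y)]
```

Because the convolution is depthwise, each channel is filtered independently and only the MLP mixes information across channels. Stacking such blocks with growing dilation is what lets a small model cover a wide receptive field.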
