Recipe: Differential / Conditional Analysis
This recipe covers comparing two (or more) conditions — treated vs. control, knockout vs. wildtype, time point A vs. B. The standard pattern in Cherimoya is to train one model per condition, then compare their predictions on a shared set of loci. This is simpler to set up than a single multi-output model and gives cleaner attribution and marginalization results.
If you have only a single condition, use the assay-specific recipe (Recipe: TF ChIP-seq, Recipe: ATAC-seq, or Recipe: DNase-seq) instead.
Inputs
Reference genome FASTA.
Per-condition signal BAMs (with replicates pooled or treated as separate -i files).
Per-condition control BAMs (for ChIP-seq) or none (for ATAC/DNase).
A motif database in MEME format.
Step 1: train one model per condition
Generate one pipeline JSON per condition and run them sequentially
(or in parallel via the batch subcommand; see CLI Reference):
cherimoya pipeline-json \
-s hg38.fa \
-i condA_rep1.bam -i condA_rep2.bam \
-c condA_input.bam \
-m JASPAR_2024.meme -n condA -o condA.pipeline.json
cherimoya pipeline-json \
-s hg38.fa \
-i condB_rep1.bam -i condB_rep2.bam \
-c condB_input.bam \
-m JASPAR_2024.meme -n condB -o condB.pipeline.json
cherimoya pipeline -p condA.pipeline.json
cherimoya pipeline -p condB.pipeline.json
Each run produces a model checkpoint (condA.torch /
condB.torch), per-track bigWigs, attributions, seqlets, and a
TF-MoDISco report scoped to that condition’s peaks.
Use the same training_chroms / validation_chroms in both
JSONs so the held-out evaluation is comparable.
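A quick consistency check before training can catch a mismatched split. The sketch below is a minimal example, not part of Cherimoya: it assumes training_chroms and validation_chroms are top-level list-valued keys in each pipeline JSON, and the helper name split_mismatches is hypothetical.

```python
import json

# Assumption: these are top-level keys in the pipeline JSON.
SPLIT_KEYS = ("training_chroms", "validation_chroms")


def split_mismatches(path_a, path_b, keys=SPLIT_KEYS):
    """Return {key: (value_a, value_b)} for every split key that differs."""
    with open(path_a) as fa, open(path_b) as fb:
        cfg_a, cfg_b = json.load(fa), json.load(fb)
    mismatches = {}
    for key in keys:
        # Compare as sorted lists so chromosome order does not matter;
        # a missing key is treated as an empty list.
        a = sorted(cfg_a.get(key) or [])
        b = sorted(cfg_b.get(key) or [])
        if a != b:
            mismatches[key] = (a, b)
    return mismatches


# Example call (file names taken from the commands above):
#   split_mismatches("condA.pipeline.json", "condB.pipeline.json")
# An empty dict means both runs hold out the same chromosomes.
```

If the split keys are nested elsewhere in your JSONs, adjust the lookup accordingly.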
Step 4: identify differential motifs
For motif-level differences, run marginalization on both models and compare. Each model’s pipeline already runs marginalization on its own negative loci; to compare directly, run marginalization on a shared background:
cherimoya marginalize -p marginalize_A.json
cherimoya marginalize -p marginalize_B.json
with the two JSONs differing only in model and output_filename,
but identical in loci (the shared background) and motifs. Because
both models are scored on the same background and motif set, the
difference in per-motif marginalization scores between A and B
indicates which motifs each model associates with the
condition-specific signal.
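The per-motif comparison can be sketched as follows. This is an illustration under stated assumptions: the format of the marginalization output is not described here, so the example starts from two plain dictionaries mapping motif name to a single score per model. The function name motif_deltas and the toy values are hypothetical and are not real results.

```python
def motif_deltas(scores_a, scores_b):
    """Return (motif, score_b - score_a) pairs, largest absolute change first.

    Only motifs scored by both models are compared.
    """
    shared = scores_a.keys() & scores_b.keys()
    deltas = [(motif, scores_b[motif] - scores_a[motif]) for motif in shared]
    # Sort by magnitude (descending), then by name for a stable order.
    return sorted(deltas, key=lambda item: (-abs(item[1]), item[0]))


# Toy scores for illustration only; load real values from each model's
# marginalization output in whatever format it is written.
scores_a = {"CTCF": 1.5, "GATA1": 0.25, "SP1": 0.5}
scores_b = {"CTCF": 1.5, "GATA1": 1.25, "SP1": 0.25}

for motif, delta in motif_deltas(scores_a, scores_b):
    print(f"{motif}\t{delta:+.2f}")
```

Ranking by absolute difference puts motifs that gain or lose the most between conditions at the top; the sign shows the direction (positive means a higher score in B).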
Caveats
Held-out chromosomes must match. A model trained with chr8/chr20 as validation cannot be compared with one trained with chr1/chr12 as validation on common loci, because some of those loci were in the second model’s training set. Use the same split in both JSONs.
Replicate-to-replicate noise is the floor. Before treating delta_log_counts as biological signal, compare to the technical-replicate baseline by training two models on independent replicates of the same condition and computing the same delta. Biological deltas should be larger than the replicate baseline.
The two models share no parameters. Each model is fit independently and the comparison happens only at prediction time. Multi-output single-model training is possible by passing -i condA.bw -i condB.bw to one pipeline (Cherimoya treats each signal as a separate output track), but in practice per-condition models give cleaner attribution and motif results.
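The replicate-baseline caveat can be made concrete with a small sketch. It assumes you already have two lists of per-locus delta_log_counts values: one from the condition A vs. B comparison and one from two models trained on independent replicates of the same condition. Using the 95th percentile of the replicate deltas as the noise floor is an illustrative choice, not a Cherimoya default, and the function names are hypothetical.

```python
import math


def noise_floor(replicate_deltas, quantile=0.95):
    """Return the given quantile of |delta| across replicate-vs-replicate loci."""
    if not replicate_deltas:
        raise ValueError("need at least one replicate delta")
    magnitudes = sorted(abs(d) for d in replicate_deltas)
    # Nearest-rank quantile: the smallest magnitude with at least
    # `quantile` of the data at or below it.
    rank = max(1, math.ceil(quantile * len(magnitudes)))
    return magnitudes[rank - 1]


def above_noise(condition_deltas, replicate_deltas, quantile=0.95):
    """Return indices of loci whose |delta| exceeds the replicate noise floor."""
    floor = noise_floor(replicate_deltas, quantile)
    return [i for i, d in enumerate(condition_deltas) if abs(d) > floor]
```

Loci returned by above_noise are candidates for condition-specific signal; loci below the floor cannot be distinguished from replicate-to-replicate variation under this threshold.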