Development =========== This page is for contributors and integrators: how the source tree is organized, how to run the tests, and the conventions the codebase follows. End users do not need to read this. Repository layout ----------------- :: cherimoya/ ├── cherimoya/ # The Python package │ ├── __init__.py # Public API re-exports: Cherimoya, CheriBlock, EMA │ ├── cherimoya.py # Cherimoya model + EMA wrapper + fit/save/load │ ├── cheri.py # CheriBlock + Triton kernels + dispatcher │ ├── io.py # PeakGenerator + PeakNegativeSampler │ ├── losses.py # Profile MNLL + log1pMSE mixture loss │ └── performance.py # Evaluation metrics ├── cherimoya_cli/ # The CLI entry-point package │ ├── __main__.py # Argparse driver and subcommand registry │ ├── defaults.py # All default JSON parameter dicts │ ├── utils.py # JSON merging and parameter helpers │ └── commands/ # One file per subcommand │ ├── pipeline.py │ ├── pipeline_json.py │ ├── batch.py │ ├── fit.py │ ├── evaluate.py │ ├── attribute.py │ ├── seqlets.py │ ├── marginalize.py │ └── negatives.py ├── tests/ # Pytest suite (see below) ├── docs/ # Sphinx docs (this site) ├── imgs/ # Architecture / pipeline diagrams ├── bench_kernels.py # Standalone forward-path benchmark └── pyproject.toml # Build, deps, and tooling config Two top-level packages: ``cherimoya`` is the model and data plumbing, ``cherimoya_cli`` is the command-line tool. They are independent — ``cherimoya_cli`` imports ``cherimoya``, never the reverse. Public vs. private API ---------------------- The convention is the standard Python one: anything prefixed with an underscore is private, and may change or be removed without notice. Explicitly: * **Public** symbols, re-exported from ``cherimoya.__init__``: :class:`~cherimoya.Cherimoya`, :class:`~cherimoya.CheriBlock`, :class:`~cherimoya.cherimoya.EMA`. * **Public** module-level symbols: :func:`~cherimoya.io.PeakGenerator`, :class:`~cherimoya.io.PeakNegativeSampler`, :func:`~cherimoya.cheri.fused_dilated_conv_norm`, :class:`~cherimoya.cheri.FusedDilatedConvNormFunc`, :func:`~cherimoya.performance.calculate_performance_measures` and its component metrics, :func:`~cherimoya.losses._mixture_loss` (despite the underscore — it is the trainer's loss function and the API is stable). * **Private** and may change: anything else, including the Triton kernels (``_fwd_*``, ``_bwd_*``, ``_fwd_inf_*``), the CPU fallback (``_cheri_conv_norm_cpu``), the CheriBlock weight cache (``_w_cache``), and the model's checkpoint-payload helper (``_init_kwargs``). Development install ------------------- For development, install in editable mode with the ``docs`` extra: .. code-block:: bash git clone https://github.com/jmschrei/cherimoya.git cd cherimoya pip install -e .[docs] The ``docs`` extra adds ``sphinx``, ``furo``, and ``sphinx-copybutton``, which you need to build this documentation locally: .. code-block:: bash cd docs sphinx-build -b html . _build The build produces ``docs/_build/index.html``. Read the Docs runs the same command with the same dependency set. Running the tests ----------------- The test suite lives in ``tests/`` and uses pytest. .. code-block:: bash pytest tests/ Test files: .. list-table:: :header-rows: 1 :widths: 30 70 * - File - Covers * - ``tests/test_cheri.py`` - Cheri Block forward parity (CPU vs training Triton vs inference megakernel), backward parity against CPU autograd, weight-cache invalidation, dtype matrix. * - ``tests/test_model.py`` - Full Cherimoya forward/backward parity, no_grad == grad-enabled equivalence, EMA-applied save/load round trip. * - ``tests/test_io.py`` - ``PeakGenerator`` and ``PeakNegativeSampler`` reproducibility, per-epoch determinism, multi-worker equivalence. * - ``tests/test_ema.py`` - EMA update/apply/restore semantics. * - ``tests/test_losses.py`` - ``_mixture_loss`` shapes and edge cases. * - ``tests/test_performance.py`` - Evaluation-metric correctness. * - ``tests/test_fit_wiring.py`` - End-to-end fit step on tiny data: confirms optimizers, schedulers, EMA, and checkpoint paths are wired correctly. * - ``tests/test_cli_utils.py`` - JSON merge and default-handling helpers. Markers: * ``@pytest.mark.cuda`` — requires a CUDA device; skipped on CPU-only hosts. * ``@pytest.mark.triton`` — requires both a CUDA device and a Triton install. Both markers are wired through ``tests/conftest.py``, which also disables ``torch.compile`` for the suite so tests don't pay the several-minute autotune cost on every run. To run only the CPU-safe subset: .. code-block:: bash pytest tests/ -m "not cuda and not triton" To run only the GPU parity tests: .. code-block:: bash pytest tests/ -m "cuda or triton" Benchmarking ------------ ``bench_kernels.py`` at the repo root is a standalone script that times the three forward paths and checks they all agree within machine precision. It is intentionally not packaged with the install. Run it with: .. code-block:: bash python bench_kernels.py See :doc:`benchmarks` for the published numbers and the measurement methodology. Coding conventions ------------------ * **Tabs, not spaces.** The codebase uses tab indentation throughout. * **Channels-last layout** ``(N, L, C)`` is used inside the Cheri Block backbone. The input stem and output heads do the necessary transpositions. New blocks should follow the same convention. * **fp32 for normalization statistics** even under bf16 autocast. Both the CPU fallback and the Triton kernels accumulate ``sum`` / ``sq_sum`` in fp32; this is load-bearing for stability and shouldn't be changed casually. * **Triton autotune keys.** Kernels are keyed by ``(C, L)`` so the same configuration is reused across batches with the same shapes. Adding a new kernel that depends on a new shape parameter should add that parameter to the key. * **No public bias terms inside Cheri Blocks.** The input stem, profile head, and count head use biases; the block layers do not. This is intentional (see :doc:`architecture`).