Cookbooks

Concrete, runnable recipes for the most common real-world annotation workflows. Every snippet has been validated against the GENCODE v49 basic-annotation test corpus fetched by benchmarks/download_corpora.py.

| Cookbook | Topic |
|---|---|
| GENCODE / Ensembl | Deeply nested gene → transcript → exon hierarchies |
| NCBI RefSeq | Massive chromosome records, Dbxref, Note, gbkey tags |
| MANE | Filtering for tag=MANE_Select and tag=MANE_Plus_Clinical |
| Machine Learning Workflows | Bulk feature extraction → PyArrow → Hugging Face / PyTorch with zero per-row Python overhead |

Conventions

from gffbase import create_db, FeatureDB
# Ingest a GFF3 file into a DuckDB-backed database:
db = create_db("annotation.gff3", "annotation.duckdb", force=True)
# …or re-open an existing DB:
db = FeatureDB("annotation.duckdb")

The cookbooks assume that gffbase is on the import path (pip install gffbase, or pip install -e . from the repo root) and that DuckDB's spatial extension is available. The extension auto-installs on first ingest; see the per-seqid R-tree y-band design.
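Before running a recipe, it can help to confirm that these assumptions hold in the current environment. The following is a minimal sketch using only the standard library. The helper name `missing_modules` is hypothetical and not part of gffbase. It checks only that the `gffbase` and `duckdb` Python packages are importable, not that the spatial extension itself is installed.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical preflight check: "gffbase" is the package named above,
# "duckdb" is the import name of DuckDB's Python package.
missing = missing_modules(["gffbase", "duckdb"])
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("gffbase and duckdb are importable")
```

If either package is reported as not importable, installing gffbase as described above is the likely fix.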