Cookbooks
Concrete, runnable recipes for the most common real-world annotation
workflows. Every snippet has been validated against the GENCODE v49
basic-annotation test corpus fetched by
benchmarks/download_corpora.py.
| Cookbook | Topic |
|---|---|
| GENCODE / Ensembl | Deeply nested gene → transcript → exon hierarchies |
| NCBI RefSeq | Massive chromosome records, Dbxref, Note, gbkey tags |
| MANE | Filtering for tag=MANE_Select and tag=MANE_Plus_Clinical |
| Machine Learning Workflows | Bulk feature extraction → PyArrow → Hugging Face / PyTorch with zero per-row Python overhead |
Conventions
from gffbase import create_db, FeatureDB
db = create_db("annotation.gff3", "annotation.duckdb", force=True)
# …or re-open an existing DB:
db = FeatureDB("annotation.duckdb")
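The two calls above can be combined so that a script re-ingests only when the database is missing or stale. The following is a minimal sketch, not part of gffbase: `needs_rebuild` and `open_db` are hypothetical helper names, the modification-time comparison is an assumed staleness rule, and `create_db` / `FeatureDB` are called only with the arguments already shown above.

```python
from pathlib import Path


def needs_rebuild(gff_path, db_path):
    """Return True if the DB file is missing or older than the GFF3 source.

    Hypothetical helper: comparing modification times is an assumption,
    not something gffbase does for you.
    """
    gff, db = Path(gff_path), Path(db_path)
    if not db.exists():
        return True
    return gff.stat().st_mtime > db.stat().st_mtime


def open_db(gff_path="annotation.gff3", db_path="annotation.duckdb"):
    """Re-ingest when stale, otherwise re-open the existing database."""
    # Imported lazily so needs_rebuild() can be used without gffbase installed.
    from gffbase import create_db, FeatureDB

    if needs_rebuild(gff_path, db_path):
        return create_db(gff_path, db_path, force=True)
    return FeatureDB(db_path)
```

Calling `open_db()` with no arguments uses the same file names as the convention above.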
The cookbooks assume that gffbase is on the import path
(pip install gffbase, or pip install -e . from the repo root) and that
DuckDB's spatial extension is available (it auto-installs on first
ingest; see the per-seqid R-tree y-band design).
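Both assumptions can be checked before running a cookbook. The sketch below is illustrative only: `missing_modules` and `spatial_extension_status` are hypothetical helper names. It relies on the standard library's `importlib.util.find_spec` and on DuckDB's `duckdb_extensions()` table function, and it only reports status without installing anything.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


def spatial_extension_status(db_path=":memory:"):
    """Return (installed, loaded) for DuckDB's spatial extension.

    Returns None if this DuckDB build does not list the extension.
    """
    import duckdb  # imported lazily so missing_modules() works without it

    con = duckdb.connect(db_path)
    try:
        return con.execute(
            "SELECT installed, loaded FROM duckdb_extensions() "
            "WHERE extension_name = 'spatial'"
        ).fetchone()
    finally:
        con.close()


if __name__ == "__main__":
    missing = missing_modules(["gffbase", "duckdb"])
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("spatial (installed, loaded):", spatial_extension_status())
```

A result of `(False, False)` before the first ingest is expected, since the extension is installed at that point.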