Config
helical.models.tahoe.TahoeConfig
Configuration class for the Tahoe-x1 model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model_size` | `Literal['70m', '1b', '3b']` | The size of the model to use. Options are: "70m": 12-layer transformer with 512 embedding dimensions; "1b": larger model variant (1 billion parameters); "3b": largest model variant (3 billion parameters). | `"70m"` |
| `batch_size` | `int` | The batch size for inference. | `8` |
| `emb_mode` | `Literal['cell', 'gene']` | The embedding mode to use: "cell" returns cell-level embeddings (mean-pooled across genes); "gene" returns gene-level embeddings for each gene token. | `"cell"` |
| `device` | `Literal['cpu', 'cuda']` | The device to use, either "cuda" or "cpu". | `"cpu"` |
| `attn_impl` | `Literal['flash', 'torch']` | The attention implementation to use: "flash" uses Flash Attention for speed and memory efficiency (does not support attention output); "torch" uses standard PyTorch attention (supports attention output but is slower). | `"flash"` |
| `max_length` | `int` | The maximum sequence length for tokenization. | `2048` |
| `num_workers` | `int` | Number of workers for data loading. | `8` |
| `prefetch_factor` | `int` | Number of batches to prefetch per worker. | `48` |
| `hf_repo_id` | `str` | The Hugging Face repository ID to load the model from. | `"tahoebio/Tahoe-x1"` |
Returns:
| Type | Description |
|---|---|
| `TahoeConfig` | The Tahoe configuration object. |
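As a rough illustration of how the parameters above fit together, the sketch below copies the documented defaults into a plain dictionary and checks overrides against the documented options. The helper `build_tahoe_kwargs` and its `need_attention` flag are hypothetical names introduced here and are not part of Helical. The commented-out lines at the end assume the `helical` package is installed and that `TahoeConfig` accepts these keyword arguments as listed in the table.

```python
# Documented defaults for TahoeConfig, copied from the parameter table.
DEFAULTS = {
    "model_size": "70m",
    "batch_size": 8,
    "emb_mode": "cell",
    "device": "cpu",
    "attn_impl": "flash",
    "max_length": 2048,
    "num_workers": 8,
    "prefetch_factor": 48,
    "hf_repo_id": "tahoebio/Tahoe-x1",
}

# Allowed values for the Literal-typed parameters.
CHOICES = {
    "model_size": {"70m", "1b", "3b"},
    "emb_mode": {"cell", "gene"},
    "device": {"cpu", "cuda"},
    "attn_impl": {"flash", "torch"},
}


def build_tahoe_kwargs(need_attention=False, **overrides):
    """Hypothetical helper: merge overrides into the documented defaults.

    When attention output is needed, the result uses "torch" attention,
    because the table states that "flash" does not support attention output.
    """
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown parameter(s): {sorted(unknown)}")
    kwargs = {**DEFAULTS, **overrides}
    for name, allowed in CHOICES.items():
        if kwargs[name] not in allowed:
            raise ValueError(
                f"{name} must be one of {sorted(allowed)}, got {kwargs[name]!r}"
            )
    if need_attention:
        kwargs["attn_impl"] = "torch"
    return kwargs


kwargs = build_tahoe_kwargs(model_size="1b", device="cuda", need_attention=True)

# Assuming helical is installed, the result could then be passed on:
# from helical.models.tahoe import TahoeConfig
# config = TahoeConfig(**kwargs)
```

Validating the choices before constructing the config is only a convenience for catching typos early; the library may perform its own checks.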
Notes
The Tahoe-x1 model is a foundation model for single-cell RNA-seq data that uses a transformer architecture to learn representations of cellular states. The model accepts raw count data and produces embeddings for cells and optionally genes.
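To make the difference between the two `emb_mode` settings concrete, the following toy sketch shows what mean pooling across genes means: each gene token has its own vector, and the cell-level vector is the per-dimension average. The function `mean_pool` and the example values are hypothetical; how the model itself treats padding or special tokens during pooling is not stated here, so this is an illustration of the idea rather than the library's implementation.

```python
from statistics import fmean


def mean_pool(gene_embeddings):
    """Average per-gene vectors into a single cell-level vector."""
    if not gene_embeddings:
        raise ValueError("need at least one gene embedding")
    # zip(*rows) iterates over embedding dimensions (columns).
    return [fmean(dim) for dim in zip(*gene_embeddings)]


# Toy example: 3 gene tokens, 4 embedding dimensions.
gene_embeddings = [
    [1.0, 0.0, 2.0, 4.0],
    [3.0, 0.0, 2.0, 0.0],
    [2.0, 3.0, 2.0, 2.0],
]

cell_embedding = mean_pool(gene_embeddings)
print(cell_embedding)  # [2.0, 1.0, 2.0, 2.0]
```

In this picture, `emb_mode="gene"` corresponds to keeping the three rows, while `emb_mode="cell"` corresponds to the single pooled row.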