Config
helical.models.c2s.Cell2SenConfig
Configuration class for the Cell2Sen Model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `batch_size` | `int` | Number of samples to process in each batch during model operations. Default is 16. | `16` |
| `organism` | `str` | The organism from which the cell data is derived (e.g., 'human', 'mouse'). | `None` |
| `perturbation_column` | `str` | Column name in the input data that specifies the perturbation applied to cells. | `None` |
| `max_new_tokens` | `int` | Maximum number of new tokens that the model can generate for prediction. Default is 200. One gene is roughly 4 tokens. | `200` |
| `return_fit` | `bool` | Whether to return model fit parameters in outputs. Default is False. This fits a linear model (y=mx+c) to the gene rank and expression values in log10-transformed space and can be used to map between expression values and gene ranks. The paper shows this is well captured by a linear model. The fit parameters are returned in the | `False` |
| `dtype` | `str` | Data type for the model. Default is "bfloat16". | `'bfloat16'` |
| `model_size` | `str` | Size of the model. Default is "2B". Choices are "2B" or "27B". | `'2B'` |
| `use_quantization` | `bool` | Whether to use 4-bit quantization. Default is False. | `False` |
| `seed` | `int` | Random seed for reproducibility. Default is 42. | `42` |
| `use_flash_attn` | `bool` | Whether to use flash attention 2 for attention implementation. Default is False. Only available for CUDA devices. If True, the attention implementation will be set to "flash_attention_2". If False, the attention implementation will be set to "sdpa". | `False` |
| `max_genes` | `int` | Maximum number of genes to use for the model. Default is None. If None, all nonzero expressed genes will be used. If a number is provided, the genes will be sorted by expression level and the top max_genes will be used. | `None` |
| `aggregation_type` | `Literal['mean_pool', 'last_token']` | How to aggregate final-layer hidden states into a single embedding. Defaults to "mean_pool". "mean_pool": Computes the mean of all non-padding token embeddings in the last layer. "last_token": Uses only the embedding of the final non-padding token (i.e., the position where the model would predict the next token). | `'mean_pool'` |
| `embedding_prompt_template` | `str` | Optional custom embedding prompt template used to query the model. If None, a default built-in prompt template is used. Example: 'You are given a list of genes in descending order of expression levels in a {organism} cell. Genes: {cell_sentence} Using this information, describe the function of the cell in a few words. Answer:' | `None` |
| `device` | `Literal['cpu', 'cuda']` | Device to use for the model. Default is "cpu". Choices are "cpu" or "cuda". | `'cpu'` |
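The interaction between `max_genes` and `embedding_prompt_template` can be illustrated with a small, library-independent sketch. It is not the Cell2Sen implementation: the helper `build_cell_sentence`, the toy counts, and the choice to separate gene names with single spaces are assumptions made for illustration. Only the `{organism}` and `{cell_sentence}` placeholders and the example template text come from the parameter description.

```python
def build_cell_sentence(expression, max_genes=None):
    """Hypothetical helper: rank genes by descending expression, join the names."""
    # Keep only genes with nonzero expression, as described for max_genes=None.
    expressed = [(gene, value) for gene, value in expression.items() if value > 0]
    # Sort by expression level, highest first.
    ranked = sorted(expressed, key=lambda item: item[1], reverse=True)
    # If max_genes is given, keep only the top max_genes entries.
    if max_genes is not None:
        ranked = ranked[:max_genes]
    # Assumption: gene names are separated by single spaces.
    return " ".join(gene for gene, _ in ranked)


# Example template copied from the embedding_prompt_template description.
TEMPLATE = (
    "You are given a list of genes in descending order of expression levels "
    "in a {organism} cell. Genes: {cell_sentence} Using this information, "
    "describe the function of the cell in a few words. Answer:"
)

# Toy expression counts (made up for illustration).
counts = {"CD3E": 12, "ACTB": 40, "GAPDH": 25, "MS4A1": 0, "IL7R": 7}

sentence = build_cell_sentence(counts, max_genes=3)
prompt = TEMPLATE.format(organism="human", cell_sentence=sentence)
print(prompt)
```

With `max_genes=None` the same helper would keep every nonzero gene, so the zero-count `MS4A1` is dropped in both cases while `IL7R` is dropped only when the limit is 3.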
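The linear fit mentioned under `return_fit` can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's code: it assumes that both the rank and the expression value are log10-transformed before fitting (the description leaves the exact transform open), and the function names `fit_rank_expression` and `expression_from_rank` are hypothetical.

```python
import math


def fit_rank_expression(expression_values):
    """Hypothetical helper: least-squares fit of y = m*x + c,
    with x = log10(rank) and y = log10(expression)."""
    # Rank 1 is the most highly expressed gene; zeros are skipped (log10 undefined).
    values = sorted((v for v in expression_values if v > 0), reverse=True)
    if len(values) < 2:
        raise ValueError("need at least two nonzero values to fit a line")
    x = [math.log10(rank) for rank in range(1, len(values) + 1)]
    y = [math.log10(v) for v in values]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Ordinary least squares: slope = cov(x, y) / var(x).
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


def expression_from_rank(rank, m, c):
    """Map a gene rank back to an expression value using the fitted line."""
    return 10 ** (m * math.log10(rank) + c)


# Toy data that follows an exact power law: expression = 100 / rank.
toy_values = [100 / r for r in range(1, 6)]
m, c = fit_rank_expression(toy_values)
print(m, c)
print(expression_from_rank(2, m, c))
```

Because the toy data is an exact power law, the fitted slope and intercept are close to -1 and 2, and the inverse mapping recovers a value close to 50 for rank 2. Real expression data would only approximately follow such a line.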
Source code in helical/models/c2s/config.py