Model
helical.models.helix_mrna.HelixmRNA
Bases: HelicalRNAModel
Helix-mRNA Model.
The Helix-mRNA model extracts embeddings from mRNA sequences. It is based on Mamba2, a state-space model architecture, and was trained on mRNA sequences. The model is available through this interface.
Example
```python
from helical.models.helix_mrna import HelixmRNA, HelixmRNAConfig
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

helix_mrna_config = HelixmRNAConfig(batch_size=5, max_length=100, device=device)
helix_mrna = HelixmRNA(configurer=helix_mrna_config)

rna_sequences = ["EACUEGGG", "EACUEGGG", "EACUEGGG", "EACUEGGG", "EACUEGGG"]
dataset = helix_mrna.process_data(rna_sequences)
rna_embeddings = helix_mrna.get_embeddings(dataset)

print("Helix_mRNA embeddings shape: ", rna_embeddings.shape)
```
Parameters:
Name | Type | Description | Default |
---|---|---|---|
configurer | HelixmRNAConfig | The configuration object for the Helix-mRNA model. | default_configurer |
Notes
Helix-mRNA was trained with a separator character placed between the codons of each mRNA sequence, so that the model can learn the codon structure of the sequence. The model accepts a standard RNA sequence as input, but adding the letter E between codons, as in the example above, is recommended for better embeddings.
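The codon separation described in this note can be sketched as a small preprocessing helper. This is a minimal sketch, not part of the helical package: the function name `add_codon_separators` is hypothetical, and it assumes the input starts in the reading frame and that each codon should be preceded by `E`, following the pattern of the `"EACUEGGG"` string in the example. How untranslated regions or a trailing partial codon should be handled is not stated in the source; here a partial codon is simply kept as is.

```python
def add_codon_separators(sequence: str, separator: str = "E", codon_length: int = 3) -> str:
    """Prefix every codon of `sequence` with `separator` (hypothetical helper).

    Assumes the sequence starts in frame. A trailing partial codon is kept
    unchanged and still receives a separator in front of it.
    """
    # Split the sequence into consecutive chunks of `codon_length` characters.
    codons = [
        sequence[i:i + codon_length]
        for i in range(0, len(sequence), codon_length)
    ]
    # Put the separator in front of each chunk and join everything back together.
    return "".join(separator + codon for codon in codons)


print(add_codon_separators("ACUGGG"))  # EACUEGGG
```

The resulting strings can then be passed to `process_data` in place of the raw sequences.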
Source code in helical/models/helix_mrna/model.py
process_data(sequences)
Process the mRNA sequences and return a Dataset object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sequences | list[str] or DataFrame | The mRNA sequences. If a DataFrame is provided, it should have a column named 'Sequence'. | required |
Returns:
Type | Description |
---|---|
Dataset | The dataset object. |
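As an illustration of the DataFrame input form, the sketch below wraps a list of sequences in a pandas DataFrame with the required `Sequence` column. The helper names `build_sequence_frame` and `embed_frame` are hypothetical, and the configuration values are copied from the class example rather than recommended settings. The helical import is placed inside `embed_frame` so that building the frame does not require the model to be installed.

```python
import pandas as pd


def build_sequence_frame(sequences):
    """Wrap raw sequences in a DataFrame with the 'Sequence' column (hypothetical helper)."""
    return pd.DataFrame({"Sequence": list(sequences)})


def embed_frame(frame, device="cpu"):
    """Run the frame through Helix-mRNA, mirroring the class example above."""
    # Imported lazily: only needed when embeddings are actually computed.
    from helical.models.helix_mrna import HelixmRNA, HelixmRNAConfig

    config = HelixmRNAConfig(batch_size=5, max_length=100, device=device)
    model = HelixmRNA(configurer=config)
    dataset = model.process_data(frame)  # DataFrame with a 'Sequence' column
    return model.get_embeddings(dataset)


frame = build_sequence_frame(["EACUEGGG", "EACUEGGG"])
print(list(frame.columns))
```

Calling `embed_frame(frame)` would then return the embeddings array, provided the helical package and model weights are available.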
get_embeddings(dataset)
Get the embeddings for the mRNA sequences.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset | HelixmRNADataset | The dataset object. | required |
Returns:
Type | Description |
---|---|
ndarray | The embeddings array. |
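The shape of the returned array is not documented here, so it is worth checking `rna_embeddings.shape` before using it downstream. The sketch below shows one common way to reduce per-token embeddings to one vector per sequence. It assumes, without confirmation from the source, that a 3-D result is laid out as `(n_sequences, sequence_length, hidden_dim)`; the function name `mean_pool` and the stand-in array sizes are hypothetical.

```python
import numpy as np


def mean_pool(embeddings: np.ndarray) -> np.ndarray:
    """Return one vector per sequence (hypothetical helper).

    Assumes 3-D input is (n_sequences, sequence_length, hidden_dim) and
    averages over the sequence axis; 2-D input is returned unchanged.
    """
    if embeddings.ndim == 3:
        return embeddings.mean(axis=1)
    if embeddings.ndim == 2:
        return embeddings
    raise ValueError(f"Expected a 2-D or 3-D array, got {embeddings.ndim} dimensions")


# Stand-in array in place of real model output; the sizes are illustrative only.
fake_embeddings = np.ones((5, 100, 256), dtype=np.float32)
pooled = mean_pool(fake_embeddings)
print(pooled.shape)
```

This simple average also includes any padding positions, so a mask-aware mean may be preferable when sequences differ in length.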