Update README.md
I am planning to train this model for more epochs on the current dataset before moving on to a larger dataset of 6 million pairs generated from ChemBL34. However, this will take some time due to computational and financial constraints. A future project of mine is to develop a custom model specifically for cheminformatics, to address the biases and optimization issues that come with repurposing an embedding model designed for NLP tasks.
### Update
This model will not be trained further on either the current natural products dataset or ChemBL34, since for the past two weeks I have been pre-training a BERT-like base model that operates on SELFIES with a custom tokenizer. That base model was scheduled for release this week, but pre-training has been halted due to mistakes in parsing some SELFIES notations; I am working intensively to correct these issues and resume training. The base model will hopefully be released next week. Following this, I plan to fine-tune a sentence transformer and a classifier model built on top of that base model.
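For context on the parsing issues mentioned above: a SELFIES string is a flat sequence of bracketed symbols (e.g. `[C][=C][O]`), so a tokenizer must split exactly on those brackets and reject anything that falls outside them. The sketch below is a minimal, hypothetical illustration of that check; it is not the custom tokenizer used for the base model.

```python
import re

# A SELFIES symbol is a bracketed group such as [C], [=C], [Branch1], [Ring1].
TOKEN_RE = re.compile(r"\[[^\[\]]+\]")

def tokenize_selfies(s: str) -> list[str]:
    """Split a SELFIES string into its bracketed symbols.

    Raises ValueError if any characters fall outside well-formed brackets,
    the kind of silent parsing mistake that can corrupt a tokenizer vocabulary.
    """
    tokens = TOKEN_RE.findall(s)
    # Rejoining the matched tokens must reproduce the input exactly;
    # otherwise some characters were skipped and the string is malformed.
    if "".join(tokens) != s:
        raise ValueError(f"malformed SELFIES string: {s!r}")
    return tokens

print(tokenize_selfies("[C][=C][O]"))  # ['[C]', '[=C]', '[O]']
```

The rejoin check is what catches strings like `[C]X[O]`, where a regex-only split would quietly drop the stray character.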
The timeline for these tasks depends on the availability of a compute server and on my own time constraints, as I also need to finish my undergraduate thesis. Thank you for checking out this model.
### Disclaimer: For Academic Purposes Only