gbyuvd committed on
Commit
9ff1dc6
1 Parent(s): 4c201cb

Update README.md

  1. README.md +1 -1
README.md CHANGED
@@ -96,7 +96,7 @@ This prototype is a [sentence-transformers](https://www.SBERT.net) based on [Min
  I am planning to train this model with more epochs on current dataset, before moving on to a larger dataset with 6 million pairs generated from ChemBL34. However, this will take some time due to computational and financial constraints. A future project of mine is to develop a custom model specifically for cheminformatics to address any biases and optimization issues in repurposing an embedding model designed for NLP tasks.
 
  ### Update
- This model won't be trained further on current natural products dataset nor the ChemBL34, since I've been working on pre-training a BERT-like base model that operates on SELFIES with a custom tokenizer for past two weeks. This base model was scheduled for release this week, but due to mistakes in parsing some SELFIES notations, the pre-training is halted and I am working intensely to correct these issues and continue the training. The base model will hopefully released next week. Following this, I plant to fine-tune a sentence transformer and a classifier model built on top of that base model.
+ This model won't be trained further on current natural products dataset nor the ChemBL34, since I've been working on pre-training a BERT-like base model that operates on SELFIES with a custom tokenizer for past two weeks. This base model was scheduled for release this week, but due to mistakes in parsing some SELFIES notations, the pre-training is halted and I am working intensely to correct these issues and continue the training. The base model will hopefully released next week. Following this, I plan to fine-tune a sentence transformer and a classifier model built on top of that base model.
  The timeline for these tasks depends on the availability of compute server and my own time constraints, as I also need to finish my undergrad thesis. Thank you for checking out this model.
 
  ### Disclaimer: For Academic Purposes Only