Update README.md
I am planning to train this model for more epochs on the current dataset before moving on to a larger dataset of 6 million pairs generated from ChemBL34. However, this will take some time due to computational and financial constraints. A future project of mine is to develop a custom model specifically for cheminformatics, to address the biases and optimization issues that come with repurposing an embedding model designed for NLP tasks.
### Update
This model will not be trained further on either the current natural products dataset or ChemBL34, since for the past two weeks I have been pre-training a BERT-like base model that operates on SELFIES with a custom tokenizer. That base model was scheduled for release this week, but pre-training has been halted due to mistakes in parsing some SELFIES notations; I am working intensively to correct these issues and resume training. The base model will hopefully be released next week. Following this, I plan to fine-tune a sentence transformer and a classifier model built on top of that base model.
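For context on the parsing issues mentioned above: a SELFIES string is a flat sequence of bracketed symbols (e.g. `[C][=C][O]`), so a tokenizer must split exactly on those brackets and reject anything that falls outside them. The sketch below is a minimal, hypothetical illustration of that check; it is not the custom tokenizer used for the base model.

```python
import re

# A SELFIES symbol is a bracketed group such as [C], [=C], [Branch1], [Ring1].
TOKEN_RE = re.compile(r"\[[^\[\]]+\]")

def tokenize_selfies(s: str) -> list[str]:
    """Split a SELFIES string into its bracketed symbols.

    Raises ValueError if any characters fall outside well-formed brackets,
    the kind of silent parsing mistake that can corrupt a tokenizer vocabulary.
    """
    tokens = TOKEN_RE.findall(s)
    # Rejoining the matched tokens must reproduce the input exactly;
    # otherwise some characters were skipped and the string is malformed.
    if "".join(tokens) != s:
        raise ValueError(f"malformed SELFIES string: {s!r}")
    return tokens

print(tokenize_selfies("[C][=C][O]"))  # ['[C]', '[=C]', '[O]']
```

The rejoin check is what catches strings like `[C]X[O]`, where a regex-only split would quietly drop the stray character.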
The timeline for these tasks depends on the availability of a compute server and on my own time constraints, as I also need to finish my undergraduate thesis. Thank you for checking out this model.
### Disclaimer: For Academic Purposes Only