---
license: cc-by-nc-nd-4.0
---
# PepDoRA: A Modified Peptide-Specific Language Model via Weight-Decomposed Low-Rank Adaptation

In this work, we introduce PepDoRA, a SMILES transformer obtained by fine-tuning the state-of-the-art [ChemBERTa-77M-MLM](https://huggingface.co/DeepChem/ChemBERTa-77M-MLM) transformer on modified peptide SMILES via [DoRA](https://nbasyl.github.io/DoRA-project-page/), a parameter-efficient fine-tuning (PEFT) method that incorporates weight decomposition. The resulting embeddings can be leveraged for numerous downstream tasks, including membrane permeability prediction and target binding assessment, for both unmodified and modified peptides.
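
For readers unfamiliar with DoRA, the weight decomposition can be summarized as follows. This is a sketch of the general formulation from the DoRA paper, using that paper's notation; the rank $r$ and the layers adapted in PepDoRA are not stated in this card, so nothing here should be read as PepDoRA's specific configuration.

$$
W' = m \, \frac{W_0 + BA}{\lVert W_0 + BA \rVert_c}
$$

Here $W_0 \in \mathbb{R}^{d \times k}$ is the frozen pretrained weight, $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$ are the trainable low-rank factors with $r \ll \min(d, k)$, $\lVert \cdot \rVert_c$ is the column-wise vector norm, and $m \in \mathbb{R}^{1 \times k}$ is a trainable magnitude vector initialized to $\lVert W_0 \rVert_c$. The low-rank update changes the direction of each weight column, while $m$ rescales its magnitude separately.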

Here's how to extract PepDoRA embeddings for your input peptide:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load the model and tokenizer
model_name = "ChatterjeeLab/PepDoRA"
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Input peptide as a SMILES string
peptide = "CC(C)C[C@H]1NC(=O)[C@@H](C)NCCCCCCNC(=O)[C@H](CO)NC1=O"

# Tokenize the peptide
inputs = tokenizer(peptide, return_tensors="pt")

# Run a forward pass without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Extract the per-token embeddings from the last hidden layer
last_hidden_state = outputs.hidden_states[-1]

# Print the embedding shape: (batch_size, num_tokens, hidden_dim)
print(last_hidden_state.shape)
```
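
The snippet above yields one embedding per token, whereas downstream tasks such as permeability or binding prediction usually need a single fixed-size vector per peptide. One common way to get there is mask-aware mean pooling, sketched below. This is an assumption for illustration, not a pooling strategy this card prescribes; `mean_pool` is a hypothetical helper, and the tensors are synthetic stand-ins so the example runs without downloading the model.

```python
import torch


def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings over non-padding positions (hypothetical helper).

    last_hidden_state: (batch, seq_len, hidden_dim)
    attention_mask:    (batch, seq_len), 1 for real tokens and 0 for padding
    returns:           (batch, hidden_dim)
    """
    # Broadcast the mask over the hidden dimension
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    # Sum only the embeddings of real tokens
    summed = (last_hidden_state * mask).sum(dim=1)
    # Count real tokens per sequence, guarding against division by zero
    counts = mask.sum(dim=1).clamp(min=1.0)
    return summed / counts


# Synthetic stand-in: one sequence, three positions, the last one is padding
hidden = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = torch.tensor([[1, 1, 0]])

pooled = mean_pool(hidden, mask)
print(pooled.shape)  # torch.Size([1, 2])
```

With the extraction snippet above, the equivalent call would be `mean_pool(last_hidden_state, inputs["attention_mask"])`, which returns a tensor of shape `(1, hidden_dim)`. Other choices, such as taking the first token's embedding, are equally plausible; which works best for a given task would need to be checked empirically.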

## Repository Authors

[Leyao Wang](mailto:[email protected]), Undergraduate Intern in the Chatterjee Lab <br>
[Pranam Chatterjee](mailto:[email protected]), Assistant Professor at Duke University

Reach out to us with any questions!