---
license: mit
---

# BART models fine-tuned for keyphrase generation

## About

This repository contains 5 models that were trained and evaluated on three datasets: KPBiomed, KP20k, and KPTimes.

Details about the models and the KPBiomed dataset can be found in the original paper:

Maël Houbre, Florian Boudin and Béatrice Daille. 2022. A Large-Scale Dataset for Biomedical Keyphrase Generation. In Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI 2022).

## How to use

As this repository contains several models, using the Hugging Face API directly will not work. To use one of the models, first download the desired zip file and unzip it. For example, if we unzip the biobart-medium model into our source directory, we can load it with the API as follows:

```python
from transformers import BartTokenizerFast, BartForConditionalGeneration

tokenizer = BartTokenizerFast.from_pretrained("biobart-medium")
model = BartForConditionalGeneration.from_pretrained("biobart-medium")
model.to("cuda")
```

We can then generate keyphrases with the model using Hugging Face's `generate` function:

```python
inputs = tokenizer(input_text, padding="max_length", max_length=512,
                   truncation=True, return_tensors="pt")
input_ids = inputs.input_ids.to("cuda")
attention_mask = inputs.attention_mask.to("cuda")

outputs = model.generate(inputs=input_ids,
                         attention_mask=attention_mask,
                         num_beams=20,
                         num_return_sequences=20)

keyphrase_sequence = tokenizer.batch_decode(outputs, skip_special_tokens=False)
```
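Since the sequences are decoded with `skip_special_tokens=False`, each decoded string still contains BART special tokens and the model's keyphrase separator. As a minimal sketch of post-processing, the helper below assumes `;` is the separator between generated keyphrases; the actual separator token depends on how the model was trained, so check the decoded output and adjust accordingly.

```python
def parse_keyphrases(decoded_sequence, separator=";"):
    """Split one decoded output string into a list of keyphrases.

    NOTE: the separator (";") is a hypothetical assumption; replace it
    with whatever delimiter the model was trained to emit.
    """
    # Remove common BART special tokens left in by skip_special_tokens=False.
    for token in ("<s>", "</s>", "<pad>"):
        decoded_sequence = decoded_sequence.replace(token, "")
    # Split on the separator and drop empty fragments.
    return [kp.strip() for kp in decoded_sequence.split(separator) if kp.strip()]
```

For example, `parse_keyphrases("<s>deep learning;keyphrase generation</s><pad>")` yields `["deep learning", "keyphrase generation"]`.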