---
license: mit
---

# BART models fine-tuned for keyphrase generation

## About

This repository contains 5 models that were trained and evaluated on three datasets: KPBiomed, KP20k, and KPTimes.

Details about the models and the KPBiomed dataset can be found in the original paper:

Maël Houbre, Florian Boudin and Béatrice Daille. 2022. A Large-Scale Dataset for Biomedical Keyphrase Generation. In Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI 2022).

## How to use

As this repository contains several models, using the Hugging Face API directly will not work. To use one of the models, first download the desired zip file and unzip it. For example, if we unzip the biobart-medium model into our source directory, we can load it with the API as follows:

```python
from transformers import BartTokenizerFast, BartForConditionalGeneration

tokenizer = BartTokenizerFast.from_pretrained("biobart-medium")
model = BartForConditionalGeneration.from_pretrained("biobart-medium")
model.to("cuda")
```

We can then generate keyphrases with the model using Hugging Face's `generate` function:

```python
inputs = tokenizer(input_text, padding="max_length", max_length=512,
                   truncation=True, return_tensors="pt")
input_ids = inputs.input_ids.to("cuda")
attention_mask = inputs.attention_mask.to("cuda")

outputs = model.generate(inputs=input_ids,
                         attention_mask=attention_mask,
                         num_beams=20,
                         num_return_sequences=20)

keyphrase_sequence = tokenizer.batch_decode(outputs, skip_special_tokens=False)
```
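Since the sequences are decoded with `skip_special_tokens=False`, each decoded string still contains BART special tokens and the model's keyphrase separator. As a minimal sketch of post-processing, the helper below assumes `;` is the separator between generated keyphrases; the actual separator token depends on how the model was trained, so check the decoded output and adjust accordingly.

```python
def parse_keyphrases(decoded_sequence, separator=";"):
    """Split one decoded output string into a list of keyphrases.

    NOTE: the separator (";") is a hypothetical assumption; replace it
    with whatever delimiter the model was trained to emit.
    """
    # Remove common BART special tokens left in by skip_special_tokens=False.
    for token in ("<s>", "</s>", "<pad>"):
        decoded_sequence = decoded_sequence.replace(token, "")
    # Split on the separator and drop empty fragments.
    return [kp.strip() for kp in decoded_sequence.split(separator) if kp.strip()]
```

For example, `parse_keyphrases("<s>deep learning;keyphrase generation</s><pad>")` yields `["deep learning", "keyphrase generation"]`.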