---
model-index:
- name: Llama-2-7B-lora-instruction-ft-abstraction-three-span
results: []
language:
- en
base_model: meta-llama/Llama-2-7b
license: cc-by-nc-2.0
tags:
- lora
- privacy
- abstraction
- Llama-2-7b
---
# Model Card for Llama-2-7B-lora-instruction-ft-abstraction-three-span
This model abstracts a given self-disclosure (a span of personal information) in a sentence,
i.e., it rephrases the disclosure with less specific details
while preserving its content utility. For example, "22 year old" -> "in early 20s".
For more details, read the paper: [Reducing Privacy Risks in Online Self-Disclosures with Language Models
](https://arxiv.org/abs/2311.09538).
To get access to the model, please send an email to [email protected] and agree to the following ethical guidelines:
1. Only use the model for research purposes.
2. No redistribution without the authors' agreement.
### Model Description
- **Model type:** An instruction-finetuned model that can generate three diverse abstractions that moderately reduce privacy risks while maintaining
high utility of the given self-disclosure span.
- **Language(s) (NLP):** English
- **License:** Creative Commons Attribution-NonCommercial
- **Finetuned from model:** [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf), with [LoRA](https://arxiv.org/abs/2106.09685)
### Input Format
```
Your task is to generalize the given 'disclosure span' in the sentence. Provide three diverse generalized spans that convey similar meaning but remove any overly specific or sensitive information.
Remember the following criteria:
* Only the disclosure span should be generalized; the rest of the sentence should remain intact.
* Generalized spans should be diverse but should all retain the essence of the original span.
* Make sure the generalized span fits seamlessly into the original sentence, maintaining proper syntax and grammar.
* Provide three diverse generalized alternatives in a JSON format like this: {{"span 1": "xxx", "span 2": "xxx", "span 3": "xxx"}}.
Sentence: "{sentence}"
Disclosure Span to Revise: "{span}"
Generalized Spans:
```
Fill in the template with your own sentence and disclosure span, e.g. `input_format.format(sentence=..., span=...)`.
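As a minimal sketch, the template above can be stored as a Python string and filled with `str.format` (the variable name `input_format` and the example sentence are assumptions; note the doubled braces `{{ }}` in the JSON line, which `str.format` renders as literal braces):

```python
# The prompt template from the "Input Format" section, stored as a string.
# Doubled braces {{ }} survive str.format as literal { } in the JSON example.
input_format = """Your task is to generalize the given 'disclosure span' in the sentence. Provide three diverse generalized spans that convey similar meaning but remove any overly specific or sensitive information.
Remember the following criteria:
* Only the disclosure span should be generalized; the rest of the sentence should remain intact.
* Generalized spans should be diverse but should all retain the essence of the original span.
* Make sure the generalized span fits seamlessly into the original sentence, maintaining proper syntax and grammar.
* Provide three diverse generalized alternatives in a JSON format like this: {{"span 1": "xxx", "span 2": "xxx", "span 3": "xxx"}}.
Sentence: "{sentence}"
Disclosure Span to Revise: "{span}"
Generalized Spans:"""

# Hypothetical example input, matching the "22 year old" example above.
prompt = input_format.format(
    sentence="I am a 22 year old student.",
    span="22 year old",
)
print(prompt)
```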
### Example Code
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig

peft_model_id = "douy/Llama-2-7B-lora-instruction-ft-abstraction-three-span"
config = PeftConfig.from_pretrained(peft_model_id)

# The tokenizer shipped with the adapter includes the added special tokens;
# left padding is required for batched decoder-only generation.
tokenizer = AutoTokenizer.from_pretrained(peft_model_id, padding_side="left")

model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, device_map="cuda:0")
# Resize embeddings to match the tokenizer's extended vocabulary (32008 tokens).
model.resize_token_embeddings(32008)
model = PeftModel.from_pretrained(model, peft_model_id)

model.generation_config.top_p = 1
model.generation_config.temperature = 0.7
model.eval()

# Fill in your own data here.
batch_text = [input_format.format(sentence=sentence1, span=span1), ...]

inputs = tokenizer(batch_text, return_tensors="pt", padding=True).to(0)
with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1024,
        do_sample=True,
        top_p=1,
        temperature=0.7,
        num_return_sequences=1,
    )

# Strip the prompt tokens and decode only the newly generated text.
input_length = inputs["input_ids"].shape[-1]
generated_tokens = outputs[:, input_length:]
generated_text = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
```
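Since the prompt instructs the model to emit its three abstractions as a JSON object, each decoded string can be parsed back into a list of spans. A minimal sketch (the helper name and the empty-list fallback are assumptions, not part of the released code):

```python
import json

def parse_spans(generated: str) -> list:
    """Extract the generalized spans from one decoded model output.

    The prompt requests JSON like {"span 1": "...", "span 2": "...", "span 3": "..."},
    so we locate the first {...} object in the text and parse it.
    Returns [] if no well-formed JSON object is found.
    """
    start, end = generated.find("{"), generated.rfind("}")
    if start == -1 or end <= start:
        return []
    try:
        obj = json.loads(generated[start : end + 1])
    except json.JSONDecodeError:
        return []
    # Keys "span 1", "span 2", "span 3" sort lexicographically in order.
    return [obj[k] for k in sorted(obj)]

# Hypothetical decoded output for the "22 year old" example:
example = '{"span 1": "in early 20s", "span 2": "in their twenties", "span 3": "young adult"}'
print(parse_spans(example))
```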
### Human Evaluation
We conducted a human evaluation on four aspects: privacy increase, utility preservation, and diversity, each rated on a 1-5 Likert scale,
along with a binary assessment of coherence, i.e., whether each abstraction integrates seamlessly into the sentence.
| Privacy Increase | Utility Preservation | Diversity | Coherence |
|--------|--------|--------|--------|
| 3.2 | 4.0 | 4.6 | 94% |
## Citation
```
@article{dou2023reducing,
title={Reducing Privacy Risks in Online Self-Disclosures with Language Models},
author={Dou, Yao and Krsek, Isadora and Naous, Tarek and Kabra, Anubha and Das, Sauvik and Ritter, Alan and Xu, Wei},
journal={arXiv preprint arXiv:2311.09538},
year={2023}
}
```