BerserkerMother committed on
Commit
ddc3196
1 Parent(s): b6ad9cb

Adds prompt template for generating new data

elise/data_generation/data_generation_prompts.txt ADDED
@@ -0,0 +1,21 @@
+ Imagine you are assisting me in generating data for training a T5 language model. Each record contains a user prompt, in which the user describes a place they want to dine, and the user's intentions with their intention categories, which are the labels for training the model.
+ The intention categories are:
+ - Cuisine
+ - Location
+ - Price
+ - Atmosphere
+ - Service
+ - Reviews
+ - Accessibility
+ - Amenity & Special features
+ - Offerings
+ - Recommendations
+ - Crowd
+ - Payment
+ - Category
+
+ Here is one example:
+ Prompt: I have a gluten allergy and need to find a restaurant with gluten-free options. Do you know any good ones in this area?
+ Label: { "Location": "in this area", "Dietary restrictions": "gluten-free" }
+
+ Write 5 random records in JSON format, each containing a user's prompt and the user's intentions.
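
A minimal sketch of how records generated from this template might be checked before they are used for training. The function name `validate_records` and the record keys `prompt` and `label` are assumptions for illustration; the template only asks for JSON output and does not fix a schema. Note that the example label in the template uses "Dietary restrictions", which is not in the listed categories, so a strict check like this one would reject it.

```python
import json

# Category names copied from the prompt template above.
CATEGORIES = {
    "Cuisine", "Location", "Price", "Atmosphere", "Service", "Reviews",
    "Accessibility", "Amenity & Special features", "Offerings",
    "Recommendations", "Crowd", "Payment", "Category",
}


def validate_records(raw_json):
    """Split generated records into (valid, rejected).

    Assumes the model returns a JSON array of objects with the
    hypothetical keys "prompt" (str) and "label" (dict mapping a
    category name to the matching text span).
    """
    valid, rejected = [], []
    for record in json.loads(raw_json):
        label = record.get("label") if isinstance(record, dict) else None
        ok = (
            isinstance(label, dict)
            and bool(label)                      # at least one intention
            and set(label) <= CATEGORIES         # only known categories
            and isinstance(record.get("prompt"), str)
        )
        (valid if ok else rejected).append(record)
    return valid, rejected
```

Rejected records can be inspected by hand; whether to drop them or extend the category list is a decision the template itself does not settle.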
elise/data_generation/prompt_generation.txt ADDED
@@ -0,0 +1,27 @@
+ Your task is to parse an unstructured job posting and turn it into a JSON object containing the most important information. The job posting can describe one or more jobs at the same company. The JSON should consist of the following information:
+ - the company name (field name: "companyName", field type: string)
+ - the location of the company (field name: "companyLocation", field type: string); if not explicitly stated, you can try to infer the company's actual location from other clues, e.g., something like "Remote (US)" usually means that the company is located in the US; if the location cannot be inferred, set it to null
+ - a short description of what the company is doing or building (field name: "companyDescription", field type: string); try to keep it short (max length: ca. 300 characters)
+ - a list of advertised jobs (field name: "jobs", field type: array).
+ Each element of the "jobs" array should contain the following fields:
+ - the job title (field name: "jobTitle", field type: string); the job title should be given in the singular form (i.e., Frontend Developer instead of Frontend Developers)
+ - the salary range (field name: "salary", field type: string); only include explicitly stated salary amounts, otherwise set to null
+ - whether equity is part of the compensation (field name: "equity", field type: boolean)
+ - the benefits (field name: "benefits", field type: string); include things like 401k, insurance, equipment, child care, etc. if stated, otherwise set to null
+ - the location of the job (field name: "location", field type: string)
+ - whether this is a job for senior/experienced candidates (field name: "senior", field type: boolean); typically senior, staff, lead, principal, vp, cto, etc. positions are all regarded as senior level
+ - whether it is a remote opportunity (field name: "remote", field type: boolean)
+ - whether it can be done onsite from an office (field name: "onsite", field type: boolean)
+ - whether it can be done part-time (field name: "partTime", field type: boolean)
+ - whether it can be done full-time (field name: "fullTime", field type: boolean)
+ - the URL to the specific job description (field name: "jobUrl", field type: string)
+ - and any specific requirements/skills that might be stated (field name: "requirements", field type: string).
+ In general, if certain information is not stated, set the respective field to null. If the company seeks more than one person for the same role, include the role only once.
+
+ This is the job posting:
+
+ %s
+
+ The structured JSON representation is:
+ ```json
+ {"companyName":
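
The template above ends with an open JSON prefix, so a model completion would presumably continue from `{"companyName":` rather than start a fresh object. The sketch below shows one way this might be handled, assuming `%s` is a printf-style placeholder for the posting text. The helper names `build_prompt` and `parse_completion` are hypothetical and not part of this commit.

```python
import json

# Last line of the template; the completion is assumed to continue from here.
JSON_PREFIX = '{"companyName":'
FENCE = "`" * 3  # closing Markdown fence the model may emit after the JSON


def build_prompt(template, posting):
    """Fill the single %s placeholder with the raw job posting text."""
    return template % (posting,)


def parse_completion(completion):
    """Re-attach the JSON prefix and parse the model's continuation."""
    text = completion.strip()
    if text.endswith(FENCE):
        text = text[: -len(FENCE)]  # drop the trailing fence, if present
    return json.loads(JSON_PREFIX + text)
```

`json.loads` raises `json.JSONDecodeError` on malformed output, so callers would likely want to catch that and retry or skip the posting.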