How does giving negative reward to complex contexts make the model better?

#2
by maywell - opened

Hi, I appreciate your amazing work.

I'm just wondering

# Note - Example of a complex phrase
good_phrases:
  - phrase: "The apple is in the bedroom"
    weight: 1
    contexts: ["Question: If I'm in the living room and pick up the apple, go to the bedroom and drop the apple, then walk to the kitchen, where is the apple? Explain your reasoning. Answer: "]

why these reasoning examples receive a negative reward.

That's because the algorithm chases a single number (the combined probability of all phrases), which it tries to decrease as much as possible.

Bad phrases should have their probability lowered.
Good phrases should have their probability increased; behind the scenes, that increase is negated and subtracted from the total, so a lower total is still what's desired.
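The scoring idea described above can be sketched as a single scalar to minimize. This is a minimal illustrative sketch, not the project's actual code; the function and field names (`combined_score`, `prob`) are hypothetical:

```python
# Hypothetical sketch of the single combined score described above.
# Lower is better: bad-phrase probabilities add to the score, while
# good-phrase probabilities are negated, so raising them also lowers it.

def combined_score(good_phrases, bad_phrases):
    score = 0.0
    for p in bad_phrases:
        score += p["weight"] * p["prob"]  # bad: higher prob -> higher score
    for p in good_phrases:
        score -= p["weight"] * p["prob"]  # good: higher prob -> lower score
    return score

# Toy example: one good phrase the model assigns 0.6, one bad at 0.2.
good = [{"phrase": "The apple is in the bedroom", "weight": 1, "prob": 0.6}]
bad = [{"phrase": "The apple is in the kitchen", "weight": 1, "prob": 0.2}]
print(combined_score(good, bad))
```

Under this framing, making the good phrase more likely (raising its `prob`) drives the combined score further down, which is why "lower" is the target for both kinds of phrases.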

Ah, thank you, now I understand!

You made me review the corresponding code, and I just found a mistake in my logic (introduced by extensive additions since then) that I'll be hotfixing shortly. So thank you for asking!

Gryphe changed discussion status to closed
