PicoBlog

[Interesting content] InstructGPT, RLHF and SFT

Arize invited Long Ouyang and Ryan Lowe to their podcast to talk about InstructGPT, the model ChatGPT is based on, and the whole conversation is 🔥.

Key takeaways:

  • The concept of alignment (a term popularised by Stuart Russell from Berkeley; I link an interview with him in the comments).

  • InstructGPT is based on GPT-3, but it is aware that it is receiving instructions, while the older model was only "tricked" into performing them.

  • What a reward model is, and how it is incorporated into the training process (see the sketch after this list).

  • The difference between RLHF (reinforcement learning from human feedback) and SFT (supervised fine-tuning): RLHF does fine-grained steering, while SFT causes a more significant shift in model behaviour (both are illustrated in the sketch after this list).

  • Regarding prompts, the major improvement is that older models had to be prompted in a specific, almost "coded" language, whereas the new ones can be prompted far more intuitively. They are less sensitive to exact prompt wording but still steerable and, most importantly, "naturally" steerable: a plain instruction such as "Summarize this article in two sentences" works, where the older models needed carefully constructed few-shot examples.

    (Full disclosure: I am an advisor to Arize AI.)
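
To make the reward-model and SFT-vs-RLHF points above more concrete, here is a minimal PyTorch sketch of the three training stages: supervised fine-tuning on demonstrations, reward modelling on human comparisons, and RL against that reward with a KL penalty. It illustrates the general recipe discussed in the episode, not OpenAI's implementation; the toy model, the 0.1 KL coefficient and the plain REINFORCE-style update (InstructGPT uses PPO) are all simplifying assumptions of mine.

```python
# Minimal sketch of InstructGPT-style training stages (toy model, PyTorch).
# Everything here (model size, KL coefficient 0.1, REINFORCE-style update
# instead of PPO) is an illustrative assumption, not OpenAI's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM = 100, 32  # toy vocabulary and hidden size


class TinyLM(nn.Module):
    """Stand-in for a pretrained language model (think GPT-3)."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.lstm = nn.LSTM(DIM, DIM, batch_first=True)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, tokens):                   # tokens: (batch, seq)
        h, _ = self.lstm(self.embed(tokens))
        return self.head(h)                      # logits: (batch, seq, vocab)


def sft_loss(model, demo_tokens):
    """Stage 1 - SFT: next-token cross-entropy on human demonstrations.
    This is the stage that produces the large shift in model behaviour."""
    logits = model(demo_tokens[:, :-1])
    return F.cross_entropy(logits.reshape(-1, VOCAB), demo_tokens[:, 1:].reshape(-1))


class RewardModel(nn.Module):
    """Stage 2 - reward model: maps (prompt + response) tokens to a scalar score."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.lstm = nn.LSTM(DIM, DIM, batch_first=True)
        self.score = nn.Linear(DIM, 1)

    def forward(self, tokens):
        h, _ = self.lstm(self.embed(tokens))
        return self.score(h[:, -1]).squeeze(-1)  # scalar score from final position


def reward_loss(rm, chosen, rejected):
    """Pairwise ranking loss: the response humans preferred should score higher."""
    return -F.logsigmoid(rm(chosen) - rm(rejected)).mean()


def rlhf_loss(policy, sft_ref, rm, tokens):
    """Stage 3 - RLHF: push the policy toward high-reward outputs while a KL
    penalty keeps it close to the SFT model, which is why RLHF acts as
    fine-grained steering rather than a wholesale change in behaviour."""
    logp = F.log_softmax(policy(tokens[:, :-1]), dim=-1)
    logp = logp.gather(-1, tokens[:, 1:, None]).squeeze(-1).sum(-1)        # (batch,)
    with torch.no_grad():
        ref = F.log_softmax(sft_ref(tokens[:, :-1]), dim=-1)
        ref_logp = ref.gather(-1, tokens[:, 1:, None]).squeeze(-1).sum(-1)
        reward = rm(tokens)                                                # (batch,)
    advantage = reward - 0.1 * (logp - ref_logp)   # reward minus KL penalty
    return -(advantage.detach() * logp).mean()     # REINFORCE-style policy gradient


if __name__ == "__main__":
    torch.manual_seed(0)
    demo = torch.randint(0, VOCAB, (4, 16))        # fake demonstration batch
    rejected = torch.randint(0, VOCAB, (4, 16))    # fake "less preferred" responses
    policy, sft_ref, rm = TinyLM(), TinyLM(), RewardModel()
    print("SFT loss: ", sft_loss(policy, demo).item())
    print("RM loss:  ", reward_loss(rm, demo, rejected).item())
    print("RLHF loss:", rlhf_loss(policy, sft_ref, rm, demo).item())
```

In the real pipeline the reward model is initialised from the SFT model and the policy is updated with PPO, but the overall structure (demonstration loss, pairwise comparison loss, reward minus a KL penalty) is the same.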

Further material

One interesting takeaway from the paper is the training cost of fine-tuning: both the SFT and RLHF stages take only a small fraction of the compute used to pretrain GPT-3.
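
As a rough sense of scale, the back-of-the-envelope below uses the compute figures I remember from the paper for the 175B models (treat them as approximate and check the paper for the exact values):

```python
# Back-of-the-envelope comparison of fine-tuning vs pretraining compute.
# Figures (petaflop/s-days) are as I recall them from the InstructGPT paper;
# they are approximate and should be verified against the paper itself.
compute_pf_days = {
    "GPT-3 175B pretraining": 3640.0,
    "SFT 175B": 4.9,
    "RLHF (PPO) 175B": 60.0,
}

pretrain = compute_pf_days["GPT-3 175B pretraining"]
for stage, pf_days in compute_pf_days.items():
    share = 100.0 * pf_days / pretrain
    print(f"{stage:<24} {pf_days:>8.1f} PF-days  ({share:.2f}% of pretraining)")
```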

Due to the huge interest in ChatGPT, I plan to post about it regularly, so subscribe if you want to follow along.

Or follow me on LinkedIn.




Update: 2024-12-04