Exploring LMS at Discord: Implementing Evaluation Strategies to Reduce Risk
As an industry expert, I'll share my insights on LMS, AI, and risk mitigation strategies from my experience leading teams at Discord and creating open-source projects like Prom Fu.
- 1. Speaker is experienced in leading teams at Discord, including the developer platform team and DevOps team, and has also led LM product teams.
- 2. They are a maintainer of an open-source library called Prom Fu for evals and red teaming.
- 3. Topics to be covered include: how Discord worked, what worked and didn't work, and how they handled LMS (large language models) at the company.
- 4. One of the most interesting products shipped was an agent called Clyde AI, a chatbot launched to over 200 million users on Discord.
- 5. The biggest challenge with Clyde AI was ensuring it didn't teach little kids how to build bombs or engage in harmful activities.
- 6. Major takeaway from this experience was that security, legal, safety, and sometimes policy were the biggest repeat launch blockers.
- 7. Examples of failure modes include teaching kids to make bombs, harassment, and racism.
- 8. The difficulty lies in quantifying these risks ahead of time to get stakeholders comfortable with launching the product.
- 9. LMS can generate harmful content if prompted in certain ways, which was a significant concern for Discord.
- 10. Prom Fu was used as an evaluation tool to help manage risks associated with LMS at Discord.
- 11. The speaker believes that the predeployment side of safeguarding LMs is more important than live filtering, and suggests breaking down risks into different categories such as brand risk and legal
- 12. They stress the importance of being "buttoned up" and prepared for users to input harmful queries within 10 minutes of launching an LM app on the internet.
- 13. Application-specific jailbreaks are possible when prompting LMs, introducing new vectors for adversarial inputs.
- 14. The speaker recommends using Prom Fu to Red Team and evaluate LMs, and encourages the audience to use Discord and purchase Nitro.
Source: AI Engineer via YouTube
❓ What do you think? What are your thoughts on the ideas shared in this video? Feel free to share your thoughts in the comments!