Building a Viral AI Pictionary Game: Lessons in Multimodality with OpenAI and CLIP

Join me, Joseph, as we dive into paint.wtf, the viral AI Pictionary game built with OpenAI's CLIP that drew 120,000 players in its first week, and learn the lessons we took away about multimodality and the power of AI-driven creativity.

  • 1. Joseph and his team built a viral game called "paint.wtf" using GPT-3, CLIP, and an open source inference server.
  • 2. In its first week, the game had 120,000 players, processing seven requests per second.
  • 3. The goal of the game was to play AI Pictionary: users were given a prompt generated by GPT-3, asked to draw it using a Microsoft Paint-like interface, and then CLIP judged which image was most similar to the prompt.
  • 4. Users could use any device with a browser to participate in the game, leading to tens of thousands of hours spent on drawings.
  • 5. The team plans to build an MVP version of the app using less than 50 lines of Python and an open-source inference server.
  • 6. CLIP judges the vector similarity between the text embedding of the prompt and the image embedding of each drawing; whichever embeddings are most similar rank at the top of the leaderboard (see the first sketch after this list).
  • 7. The game went viral on Reddit and Hacker News in its first week, processing over 120,000 requests.
  • 8. The team learned several lessons from building the app, including the importance of moderating user-generated content; CLIP itself was used to detect handwriting and NSFW images (see the moderation sketch after this list).
  • 9. CLIP's similarity scores are conservative, with the lowest score being 8% and the highest being 48%.
  • 10. The team also learned that strangers on the internet can be smart, often trying to sneak in extra content or draw something other than the original prompt.
  • 11. The game led to some interesting learnings about what CLIP knows and doesn't know, such as its ability to recognize specific objects like a John Deere tractor.
  • 12. The game also highlighted the potential for building new paradigms of AI models that can understand open-form, open-set concepts rather than just specific lists of classes.
  • 13. The team plans to build an app where a prompt is generated and its text embedding computed, users draw the prompt, and CLIP judges the similarity between the prompt's embedding and the drawing's embedding.
  • 14. The winning entry is the one that minimizes the distance between the prompt's embedding and the drawing's embedding.
  • 15. The team built the original version of the app in 48 hours, and plans to build a new version using OpenCV, Inference, and supervision for real-time object detection (see the webcam sketch after this list).
  • 16. The team will use an open-source model called Rock Paper Scissors from Roboflow Universe, a hub that hosts over 50,000 pre-trained models.
  • 17. The new version of the app will use CLIP to judge the similarity between the prompt and the user's drawing.
  • 18. The team plans to display real-time results on a leaderboard, with users competing to create the most accurate drawing based on the prompt.
  • 19. The game highlights the potential for using AI models like GPT-3 and CLIP to build new types of online games and interactive experiences.
  • 20. The team's use of open source tools and pre-trained models also demonstrates the power of reusing existing resources to quickly build and deploy new applications.
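
To make the judging in items 6, 13, and 14 concrete, here is a minimal sketch of CLIP-as-judge. It uses Hugging Face's CLIP wrappers rather than the open-source inference server the team ran, and the checkpoint name, example prompt, and file names are illustrative assumptions, not details from the talk.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Publicly released OpenAI CLIP checkpoint (an assumption; the talk
# does not name the exact variant paint.wtf used).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def similarity(prompt: str, image: Image.Image) -> float:
    """Cosine similarity between the prompt's text embedding and the
    drawing's image embedding."""
    text_inputs = processor(text=[prompt], return_tensors="pt", padding=True)
    image_inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        text_emb = model.get_text_features(**text_inputs)
        image_emb = model.get_image_features(**image_inputs)
    # L2-normalize so the dot product is cosine similarity.
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    return (text_emb @ image_emb.T).item()

# Rank submissions: the drawing closest to the prompt tops the board.
prompt = "a raccoon driving a tractor"  # hypothetical GPT-3 prompt
drawings = {"alice": Image.open("alice.png"),
            "bob": Image.open("bob.png")}
leaderboard = sorted(drawings, key=lambda p: similarity(prompt, drawings[p]),
                     reverse=True)
for rank, player in enumerate(leaderboard, start=1):
    print(rank, player)
```

Because both embeddings are normalized, the score is a cosine similarity, and raw CLIP cosine scores cluster in a narrow band, which is consistent with the conservative 8%-48% range the team observed (item 9).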
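The moderation lesson in item 8 can be expressed as one extra comparison: score each drawing against the prompt and against a few moderation phrases, and hide it when a moderation phrase wins. This sketch reuses the similarity() helper above; the talk does not reveal the team's exact phrases, so the labels here are assumptions.

```python
# Hypothetical moderation phrases; the talk only says CLIP was used to
# catch handwriting and NSFW drawings, not which text it compared against.
MODERATION_LABELS = ["handwritten words on paper", "something NSFW"]

def is_allowed(prompt: str, image: Image.Image) -> bool:
    """Admit a drawing only if CLIP thinks it looks more like the
    prompt than like any moderation phrase."""
    scores = {label: similarity(label, image)
              for label in [prompt, *MODERATION_LABELS]}
    return max(scores, key=scores.get) == prompt
```

The handwriting label exists to counter the exploit hinted at in item 10: because CLIP can read text in images, players who write the prompt's words instead of drawing would otherwise score well.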
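Items 15 and 16 describe the planned real-time demo. Here is a hypothetical sketch of that pipeline with OpenCV, Roboflow's Inference package, and the supervision library; the model ID and version are guesses at the Roboflow Universe project, and a Roboflow API key is assumed to be set in the ROBOFLOW_API_KEY environment variable.

```python
import cv2
import supervision as sv
from inference import get_model

# Rock Paper Scissors model from Roboflow Universe; the exact
# project ID and version number are assumptions.
model = get_model(model_id="rock-paper-scissors-sxsw/14")
box_annotator = sv.BoxAnnotator()

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    result = model.infer(frame)[0]                     # run detection
    detections = sv.Detections.from_inference(result)  # to supervision format
    annotated = box_annotator.annotate(scene=frame, detections=detections)
    cv2.imshow("rock paper scissors", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):              # press q to quit
        break
cap.release()
cv2.destroyAllWindows()
```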

Source: AI Engineer via YouTube

❓ What do you think about the ideas shared in this video? Feel free to share your thoughts in the comments!