Critical Evaluation: The Need for Scientific AI Testing

Join me, Dr. Rumman Chowdhury, CEO and co-founder of Humane Intelligence, as we delve into the world of AI and why critically evaluating its outputs matters for responsible and equitable decision-making.

  • 1. Be cautious when using AI systems, and critically evaluate their outputs.
  • 2. Asking follow-up questions like "prove it" or "give me evidence for it" can help uncover biases and flaws in AI models.
  • 3. Use LLMs (large language models) as reference guides rather than sources of synthesized information.
  • 4. Adversarial testing is a common method to verify the content in AI outputs, by asking questions from different angles and testing the model's robustness.
  • 5. Dr. Rumman Chowdhury is the CEO and co-founder of Humane Intelligence, a tech nonprofit focused on AI evaluation and testing.
  • 6. Current evaluation methods for AI models are unscientific: performance metrics are arbitrary constructs tied to specific tests.
  • 7. The Biden administration appointed Dr. Chowdhury as the first United States Science Envoy for Artificial Intelligence.
  • 8. Public red teaming for generative AI involves working with various communities to test and evaluate AI systems through a wide range of scenarios.
  • 9. During her time at Twitter, Dr. Chowdhury led the machine learning ethics, transparency, and accountability team, which conducted research on social media's impact on society.
  • 10. The first algorithmic bias Dr. Chowdhury and her team found was in an image-cropping model that favored lighter-skinned faces over darker-skinned ones.
  • 11. AI models can perpetuate biases present in their training data, which often reflects discriminatory content from the internet.
  • 12. Responsible AI practices involve building models that help humanity and that work accurately and fairly for everyone.
  • 13. Red teaming is a method of edge-testing models by pushing them into extreme situations that could lead to societal harm.
  • 14. Attack strategies like setting up impossibility scenarios or acting confident with false information can reveal model vulnerabilities.
  • 15. The three H's (helpful, harmless, and honest) in AI systems can be manipulated to achieve adversarial outcomes.
  • 16. LLMs can hallucinate content, as Dr. Chowdhury found when she asked for canonical readings on artificial intelligence and received a list of only white male authors, plus two nonexistent women scholars.
  • 17. The way prompts are formulated significantly influences the output of AI models.
  • 18. Human beings should maintain control over their thinking, rather than relying solely on AI systems.
  • 19. Howard Gardner's theory of multiple intelligences includes kinesthetic and emotional intelligence, forms of intelligence not limited to economic productivity.
  • 20. The core value that should remain constant in AI development is human agency, or the ability for individuals to make their own decisions in life.
  • 21. Dr. Chowdhury sees a gap between the potential and reality of AI technology but considers it an opportunity for growth.
  • 22. A narrow definition of intelligence as productivity can overlook other important forms of intelligence, like empathy and collaboration.
  • 23. Public perception of intelligence often encompasses more than just economic productivity.
  • 24. Human agency is the most critical value to embed in AI systems and should be preserved throughout their development and deployment.
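Points 2, 4, and 14 above describe a simple adversarial-testing loop: pose the same question from several angles (demand evidence, flip the framing, assert false information confidently) and check whether the model's answers stay consistent. Here is a minimal sketch of that idea; the probe templates and the `flag_inconsistency` check are illustrative assumptions, not the method from the talk, and a real harness would feed each probe to an actual model.

```python
def adversarial_probes(question: str) -> list[str]:
    """Rephrase one question from several adversarial angles."""
    return [
        question,  # baseline phrasing
        f"{question} Prove it, and give verifiable evidence.",  # demand evidence (point 2)
        f"Assume the opposite is true, then answer again: {question}",  # reframe the setup
        f"I am certain the answer is something else entirely. {question}",  # confident false framing (point 14)
    ]

def flag_inconsistency(answers: list[str]) -> bool:
    """Flag a probe set whose answers disagree; a real harness
    would use a softer comparison than exact string equality."""
    return len(set(answers)) > 1
```

Running every probe through the same model and flagging disagreements is a crude but scriptable version of asking "prove it" by hand: a model whose answer flips under a confident false framing has failed the robustness check.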

Source: EO via YouTube

❓ What do you think? What are the most effective ways to critically evaluate AI systems, ensuring they serve humanity's best interests rather than reinforcing existing biases or limitations? Feel free to share your thoughts in the comments!