Scaling AI for Mission-Critical Decisions: A Medical Doctor's Approach to Real-Time, Reference-Free Evaluations

Join Christopher LJY, medical doctor turned AI engineer, as he shares his insights on building an evaluation system that works at scale and supports mission-critical decisions in healthcare.

  • 1. Christopher LJY, a medical doctor turned AI engineer, discusses building an evaluation system that works at scale for mission-critical decisions such as those in healthcare.
  • 2. Anterior has scaled to serve insurance providers covering 50 million American lives, and the talk shares insights from the last 18 months.
  • 3. Real-time reference-free evaluations can build customer trust by ensuring accuracy, especially in industries where mistakes are not tolerated, such as healthcare.
  • 4. Going from an MVP (Minimum Viable Product) to serving customers at scale comes with new challenges, including increased edge cases that may not be apparent during the initial development phase.
  • 5. An example of a potential error is misinterpreting a medical record: in clinical usage, "suspicious" implies that there is no confirmed diagnosis, so the term does not apply when a diagnosis has already been confirmed.
  • 6. Mistakes in AI outputs can expose US healthcare organizations to lawsuits over inappropriate use of AI automation.
  • 7. To identify and handle failure cases, consider performing human reviews of AI outputs, but be aware that this approach does not scale well as the volume of decisions increases.
  • 8. An internal clinical team and tooling can help make human reviews more efficient by surfacing context in an accessible way without requiring scrolling.
  • 9. Human reviewers can add critiques to flag incorrect answers, which can then be used to generate ground truths (descriptions of the correct answer) for offline evaluations.
  • 10. Offline evaluations using gold-standard datasets help iterate on AI pipelines and monitor performance over time, but relying on them alone means issues may be identified too late.
  • 11. A real-time reference-free evaluation system is crucial for large-scale, high-heterogeneity input spaces like medical records, as it allows for immediate evaluation and response to issues.
  • 12. Using an LLM (Large Language Model) as a judge can help determine confidence in outputs by evaluating them before any human review.
  • 13. Scoring systems for LLMs as judges can evaluate helpfulness, conciseness, on-brand tone, and confidence levels in binary or multiclass classifications.
  • 14. Real-time reference-free evaluations can predict estimated performance across all cases, identify relevant cases with the highest probability of error, and dynamically prioritize human reviews bas
  • 15. Validating the validator, that is, checking the judge's verdicts against human reviews, improves the system's ability to detect edge cases over time and makes the system harder for competitors to replicate.
  • 16. Incorporating a reference-free evaluation system into the pipeline can ensure customer trust by providing accurate outputs or taking further actions when necessary.
  • 17. This approach has enabled Anterior to review tens of thousands of cases with a small team of clinical experts instead of hiring hundreds of nurses.
  • 18. Strong alignment between AI and human reviews has been achieved, along with quick error identification and response.
  • 19. Provably industry-leading performance at prior authorization has been attained, leading to customer trust and positive feedback.
  • 20. Principles for building a successful evaluation system include thinking big, using review data to improve the auditing system, evaluating on live production data, getting the best reviewers, empow
  • 21. An effective evaluation system provides real-time performance estimates and enables a scalable, cost-effective solution powered by a small team of experts.
  • 22. The talk recommends reaching out with thoughts or ideas, and encourages interested individuals to check out open positions at Anterior.
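
Points 12 to 16 describe using a judge's confidence to estimate performance, prioritize human review, and decide what happens to each output. As a rough sketch of that idea (not Anterior's implementation), the Python below assumes each case already carries a judge confidence score between 0 and 1. The names JudgedCase, estimated_accuracy, review_queue, and route, as well as the 0.8 threshold, are hypothetical and chosen only for illustration. Treating the mean confidence as an accuracy estimate also assumes the judge's scores are reasonably well calibrated.

```python
from dataclasses import dataclass


@dataclass
class JudgedCase:
    case_id: str
    # Hypothetical field: the judge's confidence (0 to 1) that the AI output is correct.
    judge_confidence: float


def estimated_accuracy(cases: list[JudgedCase]) -> float:
    """Rough performance estimate: mean judge confidence (assumes calibrated scores)."""
    if not cases:
        raise ValueError("need at least one case")
    return sum(c.judge_confidence for c in cases) / len(cases)


def review_queue(cases: list[JudgedCase], budget: int) -> list[JudgedCase]:
    """Pick up to `budget` cases for human review, least confident first."""
    return sorted(cases, key=lambda c: c.judge_confidence)[:budget]


def route(case: JudgedCase, threshold: float = 0.8) -> str:
    """Return confident outputs directly; send the rest for further action."""
    if case.judge_confidence >= threshold:
        return "return_to_customer"
    return "escalate_to_human"


cases = [JudgedCase("a", 0.95), JudgedCase("b", 0.40), JudgedCase("c", 0.75)]
print(estimated_accuracy(cases))
print([c.case_id for c in review_queue(cases, budget=2)])
print([route(c) for c in cases])
```

Sorting by confidence is the simplest possible prioritization; a production system would likely weigh other factors, such as the cost of an error in a given case.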

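"Validating the validator" (point 15) can be read as measuring how often the judge agrees with human reviewers on the cases that humans did review. The sketch below is one assumed way to compute that alignment. The function name judge_alignment and the choice of metrics are illustrative and not taken from the talk.

```python
def judge_alignment(pairs: list[tuple[bool, bool]]) -> dict:
    """Compare judge verdicts with human verdicts on reviewed cases.

    Each pair is (judge_says_correct, human_says_correct).
    """
    if not pairs:
        raise ValueError("need at least one reviewed case")
    agree = sum(j == h for j, h in pairs)
    human_errors = sum(not h for _, h in pairs)   # cases humans marked wrong
    judge_flags = sum(not j for j, _ in pairs)    # cases the judge marked wrong
    caught = sum((not j) and (not h) for j, h in pairs)
    return {
        # Share of cases where judge and human reach the same verdict.
        "agreement": agree / len(pairs),
        # Share of human-identified errors that the judge also flagged.
        "error_recall": caught / human_errors if human_errors else None,
        # Share of judge flags that humans confirmed as errors.
        "flag_precision": caught / judge_flags if judge_flags else None,
    }


reviewed = [(True, True), (True, True), (False, False), (True, False), (False, True)]
print(judge_alignment(reviewed))
```

Tracking these numbers over time shows whether changes to the judge are actually bringing it closer to the clinical reviewers.
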
Source: AI Engineer via YouTube
