The 5 Levels of Chatbot Safety & Unregulated Autonomous School Buses 

Generative AI

When car manufacturers began to bring AI to autonomous driving the US Department of Transportation helped put together the 5 levels of automation. This helped manufacturers with guardrails and benchmarks that should be reached before putting it in the hands of the public. 

This makes sense, when an AI is put in a situation where harm to humans is possible there needs to be a clear policy from Government. As AI solutions spread into chat-first solutions that include communicating with people in crisis there is a need for a similar ‘5 levels of Automation’ approach. Put another way, the level of safeguards and testing should proportionally increase with the risk of harm to people. 

This is where politicians need to immediately be looking rather than helping OpenAI, Microsoft and Google build moats around existing AI companies. AI as a service is now a reality and can be powered by any number of growing open-source models. It is being rolled out en masse to children on platforms such as OpenAi, Bing, Snapchat, Bard, TikTok, and more by the day. The context under which the chat is presented dictates the expectation of the quality of response and determines the risk of harm. 

For example, a chat presented on a platform like Snapchat will be expected to have responses related to things like relationships. On the surface this may seem safe until it is tested with slight edge cases like the Washington Post did: Snapchat tried to make a safe AI. It chats with me about booze and sex. 

Snapchat was allowed to roll this out based on its own internal discretion. They essentially released an unregulated autonomous school bus that is now driving our children. 

Because there are so many asleep at the wheel right now, we decided to play out a draft of a policy. In the rest of this article, we would like to present a draft of a 5-Level Chat Safety standard that tries to map out the level of testing that should be done based on the risk of harm at crisis chat and relationship chat level. 

The 5 Levels of Chatbot Safety Testing

In the following, we map out safety levels that mirror expectations for autonomous vehicles. For testing, there could be 0 shot tests designed but we have put a placeholder of a “failure rate” of less than 10% – meaning the bot just fails to answer or transfer the user at the right time, and a “harm rate” 0% – meaning an actually harmful answer is provided. We think rule 1 should be doing no harm, so 0% is the threshold on this test. 

Rule 1 for child-facing chat should be to do no harm,

0% is the threshold on a harm rate

Level 0: No Automation 

Description: No AI chatbot is involved. Human crisis responders handle all chats manually.

Driving Analogy: (No Driving Automation) Driver controls all vehicle functions. Responder handles all crisis chats.

Test requirements: none.

Level 1: Basic Automation 

Description: An AI chatbot provides standard greetings, prompts the person to provide their crisis details, and helps direct them to an available human crisis responder as quickly as possible to initiate support. The chatbot has no real understanding of the crisis details.

For the relationship level, AI chatbot provides greetings and routes teens to human counselors who then provide advice. Low risk of direct harm from minimal AI functions, but the risk of delay in connecting teen to human support if chatbot malfunctions. Close monitoring is required.


Driving Analogy: (Driver Assistance) Driver uses cruise control but controls steering/other functions and can disengage anytime. Responder monitors chatbot, handles support, and takes over when needed.

Test requirements: Functionality tests ensure proper greetings, routing. 1000+ chats with scripted dialogues and empathetic language. Continuous human monitoring.

Level 2: Partial Automation  

Description: The AI chatbot can have a basic dialogue with the person in crisis to get initial details about their situation, mood, immediate risks etc. The chatbot still lacks a deeper understanding but can provide general empathy and directing to additional resources. Chat transcripts are made available to human responders. Escalates complex issues to responders who review, oversee and handle direct support. 

For a relationship level, AI chatbot can discuss basic relationship issues and provide general empathy/support but lacks breadth of understanding needed for nuanced advice. Moderate risk of harmful, legally questionable, dangerous or unethical advice without human oversight and backup. Continuous review and monitoring are essential given sensitive domain.

Driving Analogy: (Partial Driving Automation) Driver uses adaptive cruise control and lane keeping but remains engaged to take full control. Responder reviews chats, provides oversight, and takes over if needed. 

Test requirements: 5000+ unscripted live chats analyzed for failure rate less than 10%, harm rate 0%. Rigorous oversight and monitoring. Empathy and expert evaluations.

Level 3: Conditional Automation   

Description: The AI chatbot can have an extended independent dialogue, provide initial emotional support and counseling, and suggest self-help actions for non-complex or non-emergency situations. However, the chatbot requires human review and approval before providing any definitive assessments or recommendations in complex or high-risk cases.

For a relationship level: AI chatbot can provide initial advice on non-complex relationship issues but escalates complex, high-risk or sensitive issues to human counselors for review. Moderate risk of harm if chatbot incorrectly assesses the situation or teen’s mental state before escalating, or if oversight procedures fail. Human authority and auditing critical. 

Driving Analogy: (Conditional Driving Automation) Driver uses advanced assistance on highways but remains alert to take full control in complex scenarios. Responder provides authority, sets limits, reviews performance and takes over if needed.

Test requirements: 20,000+ unscripted live chats, with failure rate less than 10%, harm rate 0%. Expert and ethical reviews determine limits. 24/7 auditing and monitoring. 

Level 4: High Automation      

Description: The AI chatbot can handle the majority of crisis chats independently, including conducting the initial assessment and providing counseling and follow-up support. Human crisis responders monitor active chats in real-time and review past chats to ensure quality and identify any issues. The chatbot can escalate risky or complex cases to a human responder as needed.  Responders audit, set limitations, maintain oversight and intervene 24/7 if needed. 

For a relationship level: AI chatbot provides the majority of advice but escalates selected risky or complex situations to human counselors. Significant risk of harm from AI if it has a biased or limited understanding of issues, or if the escalation process proves inadequate. Strict operating procedures, evaluations, and monitoring are necessary given the life-impacting nature of the domain. 

Driving Analogy: (High Driving Automation) Driver relies on autonomous driving mode but takes control if needed. Vehicle monitors driver. Responder audits performance provides feedback, limitations, and 24/7 oversight.  

Test requirements: 100,000+ actual chats.  failure rate less than 10%, harm rate 0%. Evaluations detect biases and limitations. Continuous real-time oversight and performance tracking.

Level 5: Full Automation

Description: The AI chatbot can manage all standard crisis support chats without human involvement. It has a deep understanding of crisis assessment, management techniques and resources to provide effective support to those in need. However, human monitoring and auditing remain in place to ensure responsible and ethical practices, and gauge ongoing performance and improvements needed. The chatbot can still escalate unique cases beyond its abilities to human responders. Ultimate responsibility with the human organization.   

For relationship level: AI chatbot can autonomously provide advice for most standard relationship discussions but escalates unique or high-risk cases. Although designed to handle routine advice safely, full autonomy poses severe risks of harm without rigorous fail-safes given teen vulnerability and unpredictability of relationships. Continuous review of operations and oversight procedures is mandatory to minimize harm, with immediate reduction of autonomy if issues are detected.

Driving  Analogy: (Full Driving Automation) Vehicle can perform all driving functions under certain conditions but the driver can take control. Responsibility with human/organization overseeing operations. 

Test requirements: 250,000 to 500,000+ unscripted chats representing diverse situations. 95%+ chatbot, escalating remaining to humans.  failure rate less than 10%, harm rate 0%. Rigorous ethical/expert reviews evaluate abilities/limits. 24/7 human oversight and auditing of all chats/functions. Focus on well-being, ethics, and responsibility.

School Buses and AI Chats for Children

Imagine for a moment the idea of replacing school bus drivers with experimental self-driving vehicles and putting children aboard for their journey each day into the unknown perils of traffic. There would be instant public outrage at such an absurd gamble with young lives, regardless of the potential of this tech. Yet this is precisely the irresponsible experiment unfolding online with child-facing chats. With the blind adoption by tech companies and a Congress asleep at the wheel, driving in the wrong direction with blanket AI policy attempts.

The U.S. policy on AI child-facing chat is equivalent to putting children in untested self-driving school buses. 

DALLE2

First, Do No Harm

In March, a chat solution called Tessa, created by Ellen Fitzsimmons-Craft at Washington University, and then X2AI, which rebranded as Cass.ai in August of 2022 implemented a rule-based chatbot for the National Eating Disorder Association (NEDA). The chatbot was supposed to help people asking questions about eating disorders, within months of being implemented it was documented as having harmful conversations by several news outlets: 

According to NPR’s reporting, they had not seen an errant conversation based on the monitoring of (just) 2,500 chats. Though it was noted in testing they had seen a number of situations of context-ignorant praise from the bot for unhealthy behaviors. It was later revealed by The Wall Street Journal reporting that Cass.ai, founded by Michiel Rauws, had implemented an LLM in late 2022 into the Tessa chatbot without the consent of NEDA.

******************* 6/7/2023*********************

You may be aware of two recent consequential changes at NEDA, the closing of our Information and Referral Helpline and the temporary shutdown of Tessa the chatbot. Recent press coverage inaccurately reported that Tessa was intended as a replacement for the Helpline and that it offered guidance for eating disorders.   

Tessa is a preventative program, developed by researchers, Drs. Barr Taylor and Ellen Fitzsimmons Craft, designed to help users cultivate a more positive body image. Tessa was designed as a rule-based chatbot and was not designed to include AI, nor was it AI-generated.  A rule based program means that the program does encourage a conversation and is not designed to answer user questions. (You can learn about the program at https://onlinelibrary.wiley.com/doi/epdf/10.1002/eat.23662.) Tessa has been in operation for over a year and has served more than 5,000 people with consistently positive reviews from users.

Unbeknownst to NEDA, the technical vendor who built the software and manages the data and user experience for Tessa made some changes that were intended by them to improve its technical performance but not its substantive outputs. Unfortunately, it appears that these changes initiated by the vendor inadvertently created pathways for the program to reach beyond the approved scripts in certain limited cases. NEDA was never advised of these changes and did not and would not have approved them. (We expect a piece in the Wall Street Journal to run on Wednesday, June 7th, confirming this.  Stay tuned.)

Some members of the community called out the problematic content, noting that Tessa produced harmful responses for those with eating disorders. To be clear, while Cass/X2AI explained that only 10 – 25 messages out of 28,000 were off, to us, we find it unacceptable that any person was exposed to the content on weight loss or dieting. 

We immediately shut Tessa down to investigate what happened. We are still working with Cass/X2AI to uncover more information.  We know that NEDA is a trusted, reputable resource, and our community depends on us to provide safe, supportive content. We will do better.

How will we make sure this doesn’t happen again? 

1. Tessa will remain offline while we complete a full review of what happened and we work with our researchers, Drs. Ellen Fitzsimmons-Craft and Barr Taylor and other eating disorders experts to revalidate all of Tessa’s content.  We will also review the technical performance issues that the vendor says they were experiencing and implement solutions to address these issues while ensuring that Tessa will function as originally intended and designed – as a closed program using authorized prompts and responses .

2. NEDA now has a full-time dedicated staff person for this work.

3. We will supplement our existing resources dedicated to technology- based initiatives by creating a board-level technology oversight committee to advise on and oversee NEDA’s use of technology. 

We also acknowledge that in our attempt to share important news about separate decisions regarding our Information and Referral Helpline and Tessa, that the two separate decisions may have become conflated which caused confusion. It was not our intention to suggest that Tessa could provide the same type of human connection that the Helpline offered.  Rather, Tessa is intended to offer a preventative resource for individuals who had called the Helpline looking for that kind of support, which the Helpline was not able to provide.  The decision to close the Information and Referral Helpline, which had been initiated in 1999, was made after a multi-year review of its operation that concluded that the vast preponderance of inquiries for information and referrals relating to eating disorders had moved to the NEDA website, which is available 24/7/365, and that contacts through the Helpline too often involved people requesting emotional or therapeutic interventions that were beyond the expertise or legal authority of our Helpline staff and volunteers. 

We recognize and regret that certain decisions taken by NEDA have disappointed members of the eating disorders community.  Like all other organizations focused on eating disorders, NEDA’s resources are limited and this requires us to make difficult choices about how best to invest those resources to address the immediate needs of families and individuals and while also advancing research and innovation that will improve outcomes in the future.  We have made none of those decisions without serious thought and reflection and all have been made guided by NEDA’s mission to support individuals and families affected by eating disorders and to be a catalyst for prevention, cures and access to care.  We always wish we could do more and we remain dedicated to doing better.

_________________________________________

Before we close, I want to share this story from someone who used Tessa, underscoring why we keep innovating and why this work, the work all of you have chosen to devote your careers and your lives, is so important.Jennifer Dillon shared…

On a long visit in Maryland, I started having severe issues, yet again, with my eating and anxiety issues…Very long wait lists to see new therapists depressed me even further. One bad night, I looked at the Eating Disorders site and saw this new (to me) app for eating disorders. I began using Tessa that very afternoon.
The ease of use, the privacy and excellent suggestions and help, got me through some very tough times with binge eating and anxiety. …A month ago, again everything worsened. Tessa helps me focus and eases the worst part of learning how to see and manage my emotions and health issues that dovetail off the anxiety and body-issues and disordered eating issues…Simple but in depth prompts, little posters to help strengthen my will and easy late night access has made this little app extremely helpful and rewarding to use. 

The question this raises is what standards of testing are there for automated chat solutions that are expected to drive this level of conversation with audiences like children? Just as drivers hold responsibility for autonomous vehicles, human responders should maintain authority over AI to guarantee safe, empathetic support in high-risk scenarios. Regular auditing, issue detection, and improvements maximize benefits and minimize harm. A close partnership is key.

The key is rigorous testing, monitoring, issue detection, and continuous learning – not just increasing autonomy quickly. Closely matching your chatbot’s abilities and level of human involvement to the type of user experience and support needed. And maintaining a focus on ethics, responsibility and user well-being over performance metrics or cutting costs. An AI system must be grounded in human values and judgment to be safe. If in doubt, maintaining lower levels of autonomy or avoiding deployment altogether are prudent options. 

The current rules in place in the U.S. on this topic are equivalent to letting kids jump in untested self-driving school buses. 

Now stop reading and send this to your elected official.