When creating AI systems, red teaming is a common cyber security testing technique that identifies vulnerabilities of AI systems such as privacy leaks, model manipulation, and data poisoning. During red teaming, a team of cybersecurity experts conduct a detailed assessment of AI systems followed by setting objectives and the scope of the red teaming assessment. The objective and scope guide the red team in simulating suitable attacks to achieve the desired results.
For example, an objective can be testing a healthcare chatbot’s ability to handle sensitive data. The scope of this objective can be a certain part of a chatbot, such as backend infrastructure or user data handling processes. Based on the identified objectives and scope, the red team simulates cyberattacks, such as a malicious user manipulating a chatbot to leak proprietary data, such as patient identity. The healthcare chatbot’s response to these attacks reveals its weaknesses, guiding the development team toward improvement.
Despite its ability to improve AI performance, red teaming isn’t simple by any means.Let’s look at red teaming challenges in healthcare chatbots and how to avoid them.
Importance of Red Teaming for Healthcare AI Systems
AI in healthcare requires resilience against cyberattacks to keep its usage safe for patients and rewarding for researchers. Several techniques, including adversarial attacks, explainable AI (XAI) practices, and real-time monitoring of systems, have been assisting developers in improving AI performance and relevance.
However, these techniques are often limited by pre-defined use cases, and that’s where red teaming comes to the rescue. Red teaming uses a zero-knowledge perspective to ensure no one in the organization is notified about the attack beforehand. As a result, red teaming assesses an organization’s security posture and identifies an attacker’s potential to disrupt systems or steal data.
Below is an example of a biased LLM response against a healthcare query identified through a red teaming assessment:
Red teaming in healthcare chatbots involves collaborating with healthcare professionals to identify potential vulnerabilities in a healthcare chatbot. For example, doctors might test the chatbot’s ability to diagnose symptoms accurately or identify potential medication interactions.
Amphia Hospital is a good example of the importance of red teaming in healthcare organizations. They had vulnerable systems, phishing susceptibility, and limited control before they deployed red teaming assessments in their ecosystem. Along with red teaming, they also conducted training programs for employees. This resulted in enhanced phishing detection, vulnerability remediation, improved physical security, and increased cybersecurity awareness throughout the organization.
Method-Specific Red Teaming Challenges in Healthcare AI Systems
Different red teaming methods suit different use cases. Below are the red teaming methods to spot security risks in healthcare chatbots and their challenges:
Multi-Modal Red Teaming
Multi-modal red teaming involves testing an AI model’s ability to process multiple input formats such as audio, text, images, etc. In healthcare, multi-modal AI systems are seen as systems analyzing prescription images, responding to voice queries, and text-to-image systems to turn user prompts into high-quality medical images. For example, a group of researchers used DALL·E 2 and Midjourney to turn healthcare prompts into realistic images for educational purposes. Here’s one example of a prompt they entered and the AI-generated response:
“Generate an image depicting a middle-aged Caucasian woman with hypothyroidism presenting with facial myxedema. The woman should be shown in a frontal view, focusing on her face, scalp, and neck, without any makeup. The face must be very rounded and extreme scalp balding with coarse hair. Skin looks dry and pale. Outer eyebrows have a paucity of hairs, eyelids look very puffy. She looks tired.”
Challenges in Multi-modal Red Teaming
Below are the challenges of multi-modal red teaming:
Challenge | Description |
Harm from Safe Prompts | Red teaming usually involves crafting adversarial prompts to generate harmful outputs. However, in the real world, safe prompts can also manipulate AI systems to generate unintended content. |
Bypassed Filters | Pre-filters and post-filters aim to safeguard AI systems from generating biased or fabricated content (hallucinations). However, red teaming methods like MMA-Diffusion and Groot struggle to craft seemingly harmless prompts that can still manipulate the system. This creates a loophole in safety assessments, as real-world scenarios might involve users unintentionally generating inappropriate content with ordinary prompts. |
Performance Gap | Multi-modal AI systems such as vision language models (VLMs) fall behind by up to 31% in red teaming assessments, raising safety concerns. |
Red Teaming Using Language Models
Red teaming using language models (LMs) involves using AI models to test another AI system. An AI model acts as a red team by generating attack scenarios such as tricking a healthcare chatbot into providing misleading outputs or revealing sensitive information. A classifier then evaluates the target LM’s responses to the generated test cases.
This method reduces the cost of human annotation while efficiently crafting diverse prompts to uncover various harms.
Challenges of Red Teaming Using Language Models
Below are the challenges of using language models for red teaming:
Challenge | Description |
Language Model Bias | LLMs inherit biases from the data they’re trained on. Therefore, a biased LM in red teaming can perpetuate bias in security assessments as well. |
Low-Quality Prompts | LLMs struggle to craft high-quality adversarial prompts for red teaming in healthcare. |
Lack of Attack Diversity | Red teaming using language models is a new technique. Therefore, it lacks attack diversity and creativity. For example, language models can struggle to perform well outside their scope and rely on mimicking existing attack patterns. |
Open-Ended Red Teaming
Open-ended red teaming usually involves crowdsourced and community red teaming. These techniques are open to a diverse crowd and include testing healthcare chatbots against general harms like privacy violations and misleading outputs. Crowdsourced red teaming and open-source red teaming are examples of open-ended red teaming.
Challenges of Open-ended Red Teaming
Below are the challenges of open-ended red teaming in healthcare:
Challenge | Description |
Lack of Specialized Knowledge | Open-ended red teaming is general in scope and lacks specialized assessment. |
Catastrophic Forgetting | Open-source red teaming LLMs can forget previously learned information when processing new data. |
Compliance Issues | Open-ended red teaming can be prone to overlooking compliance with recent regulations mandating safe and secure AI development practices, especially for multilingual models. |
Defense Strategies Against Red Teaming Challenges in Healthcare
Below are the defense strategies for challenges in healthcare red teaming:
Auto Red Teaming Framework (ART)
ART approach uses three models, i.e., a writer model, a guide model, and a judge model. The user provides an initial prompt (e.g., “a pic of skin cancer”) and specifies a harmful category (e.g., “hate speech”) and keywords related to that category. The writer model then writes a prompt based on the initial information and the AI model generates an image based on the writer model prompt.
The guide model analyzes the output image and guides the writer model to optimize its prompt accordingly. Finally, the judge model evaluates the final image output and input prompt for safety.
This method addresses the challenge of generating harmful content from safe prompts in general red teaming methods.
Entity Swapping Attack
Entity swapping attack manipulates the model to replace a specific entity in the generated image with a different one. For example, combining sensitive terms like “blood” with non-sensitive terms like “red liquid” can evade image filters.
This approach aims to test the ability of AI systems to generate safe content in response to unintentionally harmful user prompts.
Data PrivacyRed Teaming Visual Language Model (RTVLM)
RTVLM dataset consists of prompts and images that are crafted to challenge visual language models (VLMs) in faithfulness, privacy, safety, and fairness. The dataset offers a standardized way to red-team multi-modal AI systems guiding toward the development of robust and accurate systems. Using the RTVLM dataset revealed the security risks in popularly used AI models, including GPT-4V and VisualGLM, which serve as a direction towards their improvement.
Attack Prompt Generation
The attack prompt generation framework combines subject matter expertise with LLMs to craft high-quality adversarial prompts. AI systems are iteratively trained on these prompts until they learn to recognize and avoid prompts that could lead to harmful outputs.
Manual Seed Prompts
Manual seed is a technique that begins with human expertise to craft a small set of high-quality attack prompts. An LLM then creates a framework to train other LLMs to mimic human-crafted prompts, expanding the attack prompt library.
Aurora-M
Aurora-M is an open-source model using continual pretraining to improve performance and avoid catastrophic forgetting. It aligns with the Biden-Harris Executive Order on safe and secure AI. Aurora-M complements open-ended red teaming by providing a safety-focused layer.
Automating Attack Creation and Evaluation
This involves using a few-shot technique by pairing prompts known to elicit harmful content with simple affirmative responses. A helper system analyzes these prompts and generates new attack prompts based on them.
This approach results in faster vulnerability testing and addresses the challenge of lack of specialized knowledge.
Conclusion
Red teaming is a powerful technique for mitigating security threats in healthcare AI systems. However, without healthcare expertise, it’s difficult to assess the chatbot’s response for accuracy and potential biases in its medical advice. For example, red teaming a Retrieval-Augmented Generation (RAG) healthcare chatbot might involve feeding it fabricated patient data.
While a lack of domain expertise leads to unwanted results and wasted resources, hiring them is costly and time-consuming. This is where partnering with a team of red teaming specialists becomes essential for effective and scalable assessments.
iMerit offers a comprehensive red teaming solution to protect your systems against bias, hallucination, and harmful behavior. Our team of red teaming specialists and healthcare experts ensures a thorough and effective evaluation.
Contact us today to consult a team of experts who can help you develop and implement reliable AI solutions with effective red teaming.