A leading technology company in Asia, building an LLM-powered AI chatbot, collaborated with iMerit to improve its model output using Reinforcement Learning from Human Feedback. The company aims to recognize and filter out risky or biased content in user prompts while creating a diverse corpus covering socially sensitive topics and personas.
“We aim to enhance the overall efficiency of our data pipeline with iMerit, ensuring that our LLM chatbot continues to evolve and adapt to meet the diverse needs of our users.”
– Head of Product Development
Problem
The company is committed to making its AI intelligent and respectful, and was looking for a partner who shares the mission of socially responsible AI. Given the diversity of languages and cultures across Asia, iMerit and the tech company built a team of language experts, cultural analysts, data annotators, solution architects, and other specialists. The AI chatbot must recognize and avoid prompts that contain bias or touch on sensitive topics, and it must be socio-linguistically aware.
Solution
The project required the manual creation of prompts through role-playing with different personas speaking diverse languages. iMerit’s team of experienced annotators generated these prompts, varying the sensitivity of the issues and the language proficiency levels. As part of the next steps of RLHF (Reinforcement Learning from Human Feedback), our team of experts trained the model with prompt-response pairs that go beyond boilerplate replies from the LLM bot.
Results
The project successfully showcased the capability to hand-create realistic prompts that emulate user interactions with a chatbot. It employed a simplified UI for prompt generation, focusing on the ingenuity required to generate a variety of prompts. The future roadmap for this annotation solution aligns with the vision of creating a more user-friendly and feature-rich environment, aiming to streamline the annotator’s role and enhance overall efficiency.
BOTTOM LINE IMPACT
Improved LLM Model Output
20% Improvement in Efficiency
Expert-in-the-Loop for Better Performance