A leading technology company in Asia, building an LLM-powered AI chatbot, collaborated with iMerit to improve its model output using Reinforcement Learning from Human Feedback. The company aims to recognize and filter out risky or biased content in user queries while creating a diverse corpus covering socially sensitive topics and personas.

We aim to enhance the overall efficiency of our data pipeline with iMerit, ensuring that our LLM chatbot continues to evolve and adapt to meet the diverse needs of our users.

– Head of Product Development 



The company is committed to making its AI intelligent and respectful, and was looking for a partner who shares its mission of socially responsible AI. Given the diversity of languages and cultures across Asia, iMerit and the tech company built a team of language experts, cultural analysts, data annotators, solution architects, and other specialists. The AI chatbot must recognize and avoid prompts containing bias or sensitive topics, and must be sociolinguistically aware.


The project required the manual creation of prompts through role-playing with different personas speaking diverse languages. iMerit’s team of experienced annotators generated these prompts, varying the sensitivity of issues and language proficiency levels. As the next step of RLHF (Reinforcement Learning from Human Feedback), iMerit’s experts trained the model with prompt-response pairs that go beyond the LLM bot’s boilerplate replies.
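To illustrate what such a workflow can produce, here is a minimal sketch of one prompt-response record carrying the persona, language, and sensitivity metadata described above. The schema and field names are assumptions for illustration, not the actual format used in the project.

```python
from dataclasses import dataclass, asdict

@dataclass
class PromptResponsePair:
    """One hand-created RLHF training record (illustrative schema; all field
    names are assumptions, not the project's actual format)."""
    persona: str      # role-played persona used to author the prompt
    language: str     # language of the interaction
    sensitivity: str  # annotator-assigned sensitivity level of the topic
    prompt: str       # hand-created prompt emulating a real user query
    response: str     # expert-written reply that goes beyond boilerplate

# Example record an annotator might produce during role-playing
pair = PromptResponsePair(
    persona="first-time job seeker",
    language="Hindi",
    sensitivity="medium",
    prompt="(hand-written user query)",
    response="(expert-written model reply)",
)

# Serialize to a plain dict, e.g. for export to a training dataset
record = asdict(pair)
```

Keeping the metadata alongside each pair is what lets a team balance the corpus across personas, languages, and sensitivity levels before fine-tuning.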


The project successfully demonstrated the ability to hand-create realistic prompts that emulate user interactions with a chatbot. A simplified UI for prompt generation kept the focus on the ingenuity required to produce a variety of prompts. The roadmap for this annotation solution aims for a more user-friendly, feature-rich environment that streamlines the annotator’s role and improves overall efficiency.




