Creating "Reasoning" Data for LLM Training


A global consumer tech company partnered with iMerit to create a diverse set of prompt-response pairs across various domains to improve LLM accuracy.

Challenge

A global consumer tech company needed to fine-tune its LLM by developing a corpus of prompt-response pairs that would serve as training examples of chain-of-thought reasoning across multiple domains.

The goal was to provide the LLM with models of step-by-step logical reasoning over specialized contexts, specific scenarios, social situations, emotional reasoning, spatial reasoning, and formal mathematics.

The challenge was to produce a coherent, relevant response or logical continuation, as if the LLM could externalize its own reasoning process, much as a textbook or counselor would explain an answer. Human experts needed to write each scenario as a prompt, externalize the reasoning step by step, and write a response that reflects that reasoning.

As a supervised fine-tuning (SFT) project, no models could be used; all answers had to be human-generated by experts. The quality of a generated response depends on the complexity and nature of the prompt.
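
To make the target concrete, a single training record of this kind might look like the sketch below; the field names and the worked example are illustrative assumptions, not the client's actual schema.

```python
# A hypothetical chain-of-thought SFT record. Field names are
# illustrative assumptions, not the client's actual schema.
example_record = {
    "domain": "spatial_reasoning",
    "prompt": (
        "A cube is painted red and cut into 27 equal smaller cubes. "
        "How many of the small cubes have exactly two painted faces?"
    ),
    "reasoning_steps": [
        "Cutting a cube into 27 equal pieces gives a 3x3x3 arrangement.",
        "A small cube has exactly two painted faces when it sits on an "
        "edge of the big cube but not at a corner.",
        "A cube has 12 edges, and each edge of a 3x3x3 cube contains "
        "exactly one non-corner piece, so there are 12 such cubes.",
    ],
    "response": "12 of the small cubes have exactly two painted faces.",
}
```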

“We needed a partner who could create a robust and diverse corpus to improve the reasoning capabilities and accuracy of our models”

Solution

  • Provided 80 experts in applied mathematics, law, biology, linguistics, philosophy, journalism, world affairs, and economics.
  • Customized task presentation, expert-task pairing, and automated routing for response writing.
  • Embedded customer review process for transparency and feedback.

iMerit assembled a diverse team of 80 experts, including language generalists for English, German, Spanish, and Vietnamese, as well as senior content writers and subject matter experts (SMEs) with backgrounds in applied mathematics, law, biology, linguistics, philosophy, journalism, world affairs, and economics.

A rigorous selection process was applied, including externally administered CEFR tests and custom skill-aligned assessments, to ensure high-quality candidates with the expertise needed to meet the stringent quality standards and achieve the desired outcomes.

A complex workflow was designed within the iMerit Ango Hub platform for end-to-end process automation, conditional routing, and QC. Tasks were clustered by high-level domain and class type, then routed to specialist work queues based on experience-level criteria. Experts completed their tasks and submitted prompts and responses.
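
A simplified sketch of what such conditional routing could look like is below; the class types, queue structure, and experience thresholds are assumptions for illustration, not the actual Ango Hub configuration.

```python
from collections import defaultdict

# Minimum years of experience required per class type (assumed values).
MIN_EXPERIENCE = {"formal_mathematics": 5, "emotional_reasoning": 3}

def route_tasks(tasks, experts):
    """Cluster tasks by (domain, class_type) and attach eligible experts."""
    queues = defaultdict(list)
    for task in tasks:
        required = MIN_EXPERIENCE.get(task["class_type"], 1)
        # An expert is eligible if they cover the task's domain and
        # meet the experience threshold for its class type.
        eligible = [
            e["name"]
            for e in experts
            if task["domain"] in e["domains"]
            and e["years_experience"] >= required
        ]
        queues[(task["domain"], task["class_type"])].append(
            {"task": task, "eligible_experts": eligible}
        )
    return queues

# Example: a formal-mathematics task routed to a senior specialist.
experts = [
    {"name": "expert_1", "domains": {"applied_mathematics"},
     "years_experience": 6},
]
tasks = [{"domain": "applied_mathematics",
          "class_type": "formal_mathematics"}]
print(route_tasks(tasks, experts))
```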

First- and second-level QC were performed by both iMerit subject matter experts and client reviewers. The QC process evaluated prompt and response content on a number of criteria, distinguishing minor from major content errors and minor from major stylistic errors. Based on the review, content was either passed, rejected, or sent back for modification. Additional review routing occurred if answers were suspected to be fraudulent, plagiarized, or produced using a model such as ChatGPT.
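
The review outcome might be decided along the lines sketched below; the error categories mirror the minor/major content and stylistic distinction described above, while the specific decision rules and the escalation path are assumptions.

```python
# Hypothetical QC decision logic. Error categories mirror the
# minor/major content vs. stylistic distinction described above;
# the decision rules and escalation path are assumptions.
def qc_decision(errors):
    """Map flagged errors to a review outcome.

    Each error is a (kind, severity) pair, e.g. ("content", "major").
    """
    # Suspected fraud, plagiarism, or model-generated text is routed
    # for additional review rather than handled in the normal flow.
    if any(kind == "integrity" for kind, _ in errors):
        return "escalate"
    if any(severity == "major" for _, severity in errors):
        return "reject"
    if errors:  # only minor content or stylistic issues remain
        return "modify"
    return "pass"

assert qc_decision([]) == "pass"
assert qc_decision([("style", "minor")]) == "modify"
assert qc_decision([("content", "major")]) == "reject"
assert qc_decision([("integrity", "major")]) == "escalate"
```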

Result

The rigorous prequalification and vetting process ensured that only qualified experts generated the prompts and responses. The Ango platform and its advanced tools provided a highly customized interface for presenting tasks to users, displaying them in a way that maximized clarity and engagement and made them easier to understand and complete.

This resulted in very high accuracy and efficiency, allowing iMerit to deliver high-quality LLM training data with step-by-step reasoning, the desired domain coverage, and high relevance. By setting a new standard in AI deployment, iMerit ensured the client’s LLMs were robust, reliable, and ready for diverse applications in the global market.


“Our quality requirements were very high and iMerit delivered.”