Human-Centered Robot Training

200 HOURS

Recorded Household Task Footage

9

Task Categories

100%

Dataset Consistency

iMerit partnered with a stealth-mode robotics startup to record, annotate, and classify real-world household task data to train next-generation humanoid robots.

Challenge

A stealth-mode robotics startup developing next-generation humanoid robots needed a large corpus of authentic, real-world household task data to train embodied AI models. Their early research models lacked exposure to the physical complexity, variability, and unpredictability of everyday environments.

To improve manipulation, fine-motor skills, and object-interaction understanding, the team required high-quality video data captured directly from a human perspective.

The client sought a partner capable of coordinating 200 hours of in-home task recording, using Meta Quest 3 head-mounted cameras worn by human participants while completing daily activities. However, the raw footage required extensive structuring: classification into 9 core household task types, expansion into 37 sub-classifications, and precise tracking of objects, motions, outcomes, and contextual cues. Previous attempts relied on lightweight internal tools that did not support complex labeling taxonomies, versioning, or scalable quality control.

“High-quality real-world data is essential for advancing embodied intelligence, and consistency in labeling is just as critical as the footage itself.”

Solution

  • Coordinated 200 hours of real-world household task capture using participant-worn Meta Quest 3 devices
  • Defined and expanded a classification schema with 9 top-level household tasks and 37 detailed sub-classes
  • Leveraged the Ango platform for video annotation, taxonomy management, and workflow automation
  • Implemented multi-stage quality checks to ensure dataset consistency and remove edge-case sequences
  • Collaborated closely with the client to refine labeling rules as new behaviors and objects emerged

iMerit partnered with the robotics startup to design and manage a comprehensive real-world data collection and annotation workflow. The process began with jointly defining an initial taxonomy of household task categories, including dishwashing, setting and clearing the table, pouring liquids, and object handling. As recording progressed, iMerit’s annotation team expanded this taxonomy to 37 sub-classifications to accurately represent the complexity of objects and human behaviors.
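A hierarchical taxonomy like the one described can be represented as a simple nested mapping with a lookup-based validity check. The sketch below is illustrative only: apart from the four task categories named in this case study (dishwashing, table setting and clearing, pouring liquids, object handling), the category and sub-class names are hypothetical placeholders, not the client's actual schema.

```python
# Hypothetical sketch of a two-level task taxonomy.
# Only the four named top-level categories appear in the case study;
# all sub-class names are illustrative placeholders.
TAXONOMY = {
    "dishwashing": ["rinse_item", "scrub_item", "place_on_rack"],
    "table_setting_clearing": ["place_plate", "place_utensil", "clear_item"],
    "pouring_liquids": ["pour_from_pitcher", "pour_into_cup"],
    "object_handling": ["pick_up", "put_down", "hand_over"],
}

def validate_label(task: str, sub_class: str) -> bool:
    """Return True only if the (task, sub_class) pair exists in the taxonomy,
    so annotators cannot attach a sub-class to the wrong parent task."""
    return sub_class in TAXONOMY.get(task, [])
```

Keeping validation next to the schema is one way a platform like Ango can enforce labeling consistency as the taxonomy grows from 9 categories toward 37 sub-classes.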

Participants wore Meta Quest 3 mixed-reality headsets to capture hands-free, first-person video of daily activities in natural home environments.

iMerit organized and standardized 200 hours of raw video, ensuring coverage across all major task types and capturing fine-grained variations needed for training humanoid robotics systems.

Using the Ango platform, iMerit annotated each video sequence with detailed action labels, object interactions, motion segments, and contextual metadata. For example, sequences such as identifying silverware in a sink, selecting a utensil, cleaning it, and placing it correctly onto a drying rack were broken down into step-wise components with precise boundaries.
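The step-wise breakdown described above can be sketched as a list of labeled segments with time boundaries. This is a minimal illustration of the idea, not the client's actual annotation format: the field names, sub-class labels, and timestamps are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ActionSegment:
    """One step-wise component of a task sequence, with time boundaries."""
    task: str                    # top-level task category
    sub_class: str               # detailed sub-classification
    start_s: float               # segment start, seconds into the clip
    end_s: float                 # segment end, seconds into the clip
    objects: list = field(default_factory=list)  # objects interacted with

# Illustrative breakdown of the dishwashing example from the text;
# labels and timestamps are placeholders, not real annotation data.
sequence = [
    ActionSegment("dishwashing", "identify_item", 0.0, 2.4, ["silverware", "sink"]),
    ActionSegment("dishwashing", "select_utensil", 2.4, 4.1, ["fork"]),
    ActionSegment("dishwashing", "clean_item", 4.1, 11.8, ["fork", "sponge"]),
    ActionSegment("dishwashing", "place_on_rack", 11.8, 14.0, ["fork", "drying_rack"]),
]

def boundaries_are_precise(segments):
    """Check that segments are contiguous and non-overlapping,
    i.e. each segment starts exactly where the previous one ends."""
    return all(a.end_s == b.start_s for a, b in zip(segments, segments[1:]))
```

A contiguity check like this is one simple way multi-stage QC can catch imprecise segment boundaries before sequences enter the training corpus.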

Throughout the engagement, iMerit introduced multiple feedback loops. New sub-labels were added when annotators encountered untagged behaviors, and edge cases—such as ambiguous object states, partial occlusions, or unsafe motions—were flagged, reviewed with the client, and excluded to maintain dataset purity. This iterative process ensured both coverage and consistency, allowing the client to refine their training corpus while avoiding mislabeled or low-quality inputs.
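The edge-case exclusion loop described above amounts to partitioning annotated sequences by QC flags. The following sketch assumes a hypothetical flag set and sequence representation; the flag names mirror the edge cases mentioned in the text, but the data structures are illustrative, not iMerit's actual tooling.

```python
# Hypothetical QC flags mirroring the edge cases named in the text.
EDGE_CASE_FLAGS = {"ambiguous_object_state", "partial_occlusion", "unsafe_motion"}

def partition_sequences(annotated):
    """Split sequences into those kept for training and those flagged
    for client review and exclusion, per the QC process described.
    Each sequence is assumed to be a dict with a 'flags' set."""
    kept, flagged = [], []
    for seq in annotated:
        if seq["flags"] & EDGE_CASE_FLAGS:
            flagged.append(seq)   # route to review, then exclude
        else:
            kept.append(seq)      # clean sequence enters the corpus
    return kept, flagged
```

Separating flagged sequences rather than silently dropping them preserves the feedback loop: reviewed edge cases can still motivate new sub-labels even when the footage itself is excluded.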

Result

The collaboration produced a structured, high-fidelity dataset of 200 hours of real-world household task demonstrations, enabling the client’s embodied AI models to better understand human actions, object states, and multi-step workflows. The expanded 9-category / 37-subcategory taxonomy gave the robotics research team a granular foundation for training manipulation, reasoning, and environment-interaction models.

iMerit’s iterative refinement process ensured that annotation quality improved over time—capturing subtle differences between similar actions while systematically excluding problematic edge cases.

The Ango-powered workflow provided efficient annotation, version control, and transparent quality auditing, ultimately accelerating the client’s dataset readiness and research velocity.

This data now serves as a core component of the startup’s humanoid robotics research and training pipeline, informing new experiments, calibrating simulations, and paving the way for large-scale behavior modeling.

“iMerit’s structured approach to data collection and labeling gave us training data we could trust—clean, consistent, and ready for model development.”