This major AI company came to iMerit to implement a vision-language-action model to improve model explainability, decision-making transparency, and overall safety.

“We needed curated data from experts who understand the rules of the road as well as what our models need to perform to make our demo a success.”

– Head of Computer Vision



To improve model performance on US roads, this company needed a unique dataset with detailed information about human driving styles, the rules of the road, and how to interact verbally with the passenger.


iMerit autonomous vehicle domain experts assessed real and synthetic driving scenarios and conditions, classifying features of each such as traffic signs, brake lights, and crosswalks. Environmental abnormalities such as construction zones, collisions, and out-of-place pedestrians were also classified and used to train the VLM to ask vehicle operators for guidance when challenged.


After training the model with iMerit datasets, the client found iMerit’s classifications were 95% accurate. iMerit also successfully improved time-per-task by 50%. The VLM was able to give the vehicle operator more clarity about the vehicle’s actions.





Time-Per-Task Improvement


Classification Accuracy


Improved Efficiency