Building a Top-Shelf E-Commerce AI Logic Engine

July 01, 2020

Often lost in valuing projects to develop commercially viable e-commerce AI algorithms is the human factor: the teams of developers, data analysts, and support staff who need to collaborate to build and train an effective logic engine. The best data analysts are smart, dogged, and creative: part detective and part long-distance runner.

Algorithms aside, e-commerce platform development is at its core a very human process, one that requires an ability to think like a shopper, to understand motivation and salesmanship, and to have, above all else, empathy for the consumer sitting in front of a computer screen, searching for the perfect result. An effective e-commerce data analyst also needs an eye for detail: the ability to spot and understand the often-subtle differences between one product and another based on multiple product attributes and descriptions.

Just as e-commerce algorithms rely on training data and a development ramp-up to reach a benchmark for success, the data analysts who help train them also need an initial period of learning and development before their project work begins. It can take five years or longer to train some e-commerce search engine algorithms before they hit their target accuracy, but the data annotation teams that develop the training data form and ramp up in a tiny fraction of that time. The nature and timing of that onboarding depend on the complexity of the project, but it typically requires two to four weeks of classroom instruction and work floor training to get data analysts sufficiently up to speed.

Collaboration is Key

In the case of iMerit, for example, the company’s Learning and Development team, solutions architect, and production delivery manager will sit down with a client for an initial kickoff and “train the trainer” orientation. The process includes a review of the data the team will analyze and a discussion of data relevance, accuracy metrics, and process, including adapting client documents into usable training materials for the data analysts.

In subsequent weeks the client and team will review initial results to confirm accuracy and quality metrics before proceeding to full production. If there is a snag, the Learning and Development team will investigate to determine if there’s been human error on the part of the data analysts that requires additional training, a flaw in the training materials that needs to be corrected, or perhaps a miscommunication between client and team. 

Often, the work will involve a mix of tasks over the life of the project, everything from data mining to mapping data against taxonomies and even “hard wiring” correct search results into a database. The Learning and Development team will play a central role in, for example, helping to delineate the differences between edge case outliers and basic, common queries.
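To make the idea of “hard wiring” correct search results concrete, here is a minimal sketch of one way it could work: a lookup table of analyst-confirmed results that takes precedence over whatever the algorithm would return. The names `normalize`, `search`, `pinned`, and `ranker`, along with the SKU values, are hypothetical and chosen for illustration; the article does not describe how any particular client stores or applies these results.

```python
from typing import Callable, Dict, List


def normalize(query: str) -> str:
    """Lowercase the query and collapse whitespace so near-identical queries match."""
    return " ".join(query.lower().split())


def search(
    query: str,
    pinned: Dict[str, List[str]],
    ranker: Callable[[str], List[str]],
) -> List[str]:
    """Return analyst-confirmed results when they exist, else fall back to the ranker.

    `pinned` maps a normalized query to the product IDs an analyst confirmed.
    `ranker` stands in for the search algorithm (an assumption for this sketch).
    """
    key = normalize(query)
    if key in pinned:
        return list(pinned[key])  # copy so callers cannot mutate the table
    return ranker(query)


# Hypothetical data: one common query with hand-confirmed results.
pinned_results = {"red running shoes": ["sku-1", "sku-2"]}
fallback = lambda q: ["sku-9"]  # placeholder for a real ranking model

print(search("  Red   Running Shoes ", pinned_results, fallback))
print(search("blue hat", pinned_results, fallback))
```

In a sketch like this, common queries are served from the confirmed table, while edge case outliers that nobody has reviewed still go to the algorithm.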

All of that needs to be communicated to the team of data analysts assigned to the project, with an initial focus on building to an acceptable level of accuracy on data categorization and other aspects of the algorithm training materials. The first couple of weeks of actual work on the project represent something of a shakedown cruise, managed through an initial QA process and feedback loop designed both to confirm a successful transition from the classroom to the work floor and to identify any flaws in either the materials or how they have been absorbed.

If the team of data analysts isn’t generating the expected results, the key question becomes: are the mistakes a function of human (data analyst) error, or something more systematic, such as a flaw in the training materials, or perhaps miscommunication requiring a reset between the client and the team?
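One way to approach that question is to look at where the errors cluster in the QA sample. The sketch below is only an illustration under stated assumptions: each reviewed item is a record with hypothetical `analyst`, `section` (the part of the training materials that applies), and `correct` fields, and the rule of thumb that errors spread across more than half the team point to a systematic cause is an assumption of this example, not a method described by iMerit.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def error_rates(records: Iterable[dict], key: str) -> Dict[str, float]:
    """Share of incorrect items, grouped by the given record field."""
    totals: Dict[str, int] = defaultdict(int)
    errors: Dict[str, int] = defaultdict(int)
    for record in records:
        totals[record[key]] += 1
        if not record["correct"]:
            errors[record[key]] += 1
    return {group: errors[group] / totals[group] for group in totals}


def triage(records: List[dict], max_error_rate: float = 0.05) -> dict:
    """Guess whether QA errors look individual or systematic.

    max_error_rate mirrors the 95 percent accuracy benchmark (5 percent errors).
    """
    by_analyst = error_rates(records, "analyst")
    by_section = error_rates(records, "section")
    flagged_analysts = sorted(a for a, r in by_analyst.items() if r > max_error_rate)
    flagged_sections = sorted(s for s, r in by_section.items() if r > max_error_rate)

    if not flagged_analysts:
        cause = "on_track"
    elif len(flagged_analysts) > len(by_analyst) / 2:
        # Most of the team is missing the benchmark: suspect the training
        # materials or a miscommunication with the client.
        cause = "systematic"
    else:
        # Only a few analysts are affected: suspect a need for more training.
        cause = "individual"
    return {
        "likely_cause": cause,
        "analysts": flagged_analysts,
        "sections": flagged_sections,
    }
```

A result of `systematic` would suggest reviewing the flagged sections of the training materials with the client, while `individual` would suggest additional training for the flagged analysts. Either way it is a starting point for the investigation, not a verdict.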

By the time the data analysts have finished this phase, usually two to four weeks in, they will be able to analyze data points at or above the commonly accepted accuracy benchmark of 95 percent. They will also annotate data points as rapidly as one every 17-18 seconds for search relevance projects, and about a minute each for data categorization projects: extraordinarily fast, even if not quite the speed of what will eventually become a fully functioning algorithm.
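To put those figures in perspective, the short sketch below converts seconds per data point into an hourly rate and checks a QA sample against the 95 percent benchmark. The function names and the sample counts are hypothetical, and the hourly figures assume uninterrupted work, which real shifts will not match.

```python
def annotations_per_hour(seconds_per_item: float) -> float:
    """Convert a per-item annotation time into items per uninterrupted hour."""
    if seconds_per_item <= 0:
        raise ValueError("seconds_per_item must be positive")
    return 3600.0 / seconds_per_item


def meets_benchmark(correct: int, total: int, benchmark: float = 0.95) -> bool:
    """True when the share of correct items is at or above the benchmark."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total >= benchmark


# Search relevance: one data point every 17-18 seconds.
print(round(annotations_per_hour(18)), "to", round(annotations_per_hour(17)), "per hour")
# Data categorization: roughly one data point per minute.
print(round(annotations_per_hour(60)), "per hour")
# Hypothetical QA sample of 200 items with 192 judged correct (96 percent).
print(meets_benchmark(192, 200))
```

At 17-18 seconds each, a single analyst works through roughly 200 to 212 search relevance data points per hour, compared with about 60 per hour for data categorization.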