Can You Engineer Your Way To Better Data?


Working with external teams to gather, clean and enhance datasets can be very rewarding. It minimizes the amount of repetitive work needed in-house, and can often bring even better data than working with your own teams. As we’ve talked about in previous weeks, however, it does come with its challenges.

This week, we’re offering a more technical tip: engineer your way through.

External teams will likely be completing tasks that require human intelligence. That doesn’t mean engineering has no role to play: human intelligence can be augmented through automation or pre-processing of data.

In fact, by isolating the parts of the task that require human judgement from those that can be automated in some way, you’ll likely get even better results than with only automation or only human judgement.

What this means in practice is the development of simple interventions that make tasks easier and quicker. You could start by implementing keyboard shortcuts and auto-complete functions. While simple, they lighten the burden on your external teams and improve the quality of their work. If you’re working on a task like categorization (be it of items, comments, or service tickets), you could pre-process the dataset and tag items with potential categories. Then, external teams need only fill in the gaps and correct errors.
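To make the pre-tagging idea concrete, here is a minimal sketch in Python. It assumes a simple keyword-matching approach; the category names, keyword lists, and function names are all hypothetical and would need to be replaced with your own taxonomy. A real project might well use a trained classifier instead.

```python
import re

# Hypothetical keyword lists for illustration only; a real project
# would derive these from its own category taxonomy.
CATEGORY_KEYWORDS = {
    "billing": {"invoice", "refund", "charge", "payment"},
    "shipping": {"delivery", "shipping", "tracking", "package"},
    "account": {"password", "login", "username", "email"},
}


def suggest_categories(text, keywords=CATEGORY_KEYWORDS):
    """Return candidate categories, ordered by number of keyword hits."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    hits = {cat: len(words & kws) for cat, kws in keywords.items()}
    # Keep only categories with at least one hit; break ties alphabetically.
    return sorted((c for c, n in hits.items() if n > 0),
                  key=lambda c: (-hits[c], c))


def pre_tag(items):
    """Attach suggested categories to each item before human review."""
    return [{"text": t, "suggested": suggest_categories(t)} for t in items]


if __name__ == "__main__":
    tickets = [
        "I was charged twice and need a refund",
        "Where is my package? The tracking page asks me to login",
        "Hello, I have a question",
    ]
    for row in pre_tag(tickets):
        print(row["suggested"], "-", row["text"])
```

Items that come back with an empty suggestion list are the gaps the external team fills in; items with suggestions only need to be confirmed or corrected.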

Along those lines, it is good practice to separate a big task into smaller tasks, each with its own skill requirements. For example, you can have one team focus on segmenting an image, another on categorizing it, and yet another on transcribing any text it contains. Each team develops proficiency in one skill, which pays off in both completion time and accuracy.

Finally, some cases are ripe for automation through services such as the Google Translate API or RSS feeds, which eliminate repetitive processes for your workers.



For an ecommerce marketplace client, we moderate posted items in order to identify repeat listings of the same item. (Items are often re-posted against marketplace rules in order to attempt to game search functionality.) We ran the task as presented to us by the client, and then again with a series of engineered interventions.

Ultimately, we found that three interventions produced the best results:

  • Including the Google Translate API to translate foreign-language item names into English,
  • Incorporating worker-requested keyboard shortcuts throughout the process, and
  • Implementing automated string comparison to prioritize obvious duplicates.

With these three interventions, we were able to increase agreement by over 25 per cent.
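As an illustration of the string-comparison intervention, the sketch below ranks pairs of listing titles by similarity so that the most obvious duplicates reach reviewers first. It is not the implementation used on the project: it assumes Python's standard-library `difflib.SequenceMatcher` as the similarity measure, and the function names, the 0.8 threshold, and the sample titles are made up for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(title):
    """Lowercase and collapse whitespace so trivial edits don't hide a match."""
    return " ".join(title.lower().split())


def rank_duplicate_candidates(titles, threshold=0.8):
    """Return (score, i, j) tuples for likely duplicate pairs, best first.

    The threshold is an assumed cut-off; tune it against reviewed data.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(titles), 2):
        score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if score >= threshold:
            pairs.append((score, i, j))
    # Highest similarity first, so reviewers see obvious duplicates at the top.
    pairs.sort(key=lambda p: (-p[0], p[1], p[2]))
    return pairs


if __name__ == "__main__":
    listings = [
        "Red Nike Running Shoes size 10",
        "red nike running shoes  size 10",
        "Vintage oak dining table",
        "Red Nike Running Shoes size 10!",
    ]
    for score, i, j in rank_duplicate_candidates(listings):
        print(f"{score:.2f}  {listings[i]!r}  <->  {listings[j]!r}")
```

Comparing every pair is quadratic in the number of listings, so on a large marketplace you would likely compare only within a seller, category, or time window. The human reviewer still makes the final call; the ranking only decides what they look at first.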

Overall, the path to a good relationship with your external teams means remembering that though they’re external, they’re still a part of your team! Imagining them as new colleagues, who need an introduction to your company, who want to communicate, and who benefit from automated steps, will get you well on your way to the data you need.
