The Foundation To Creating Datasets With External Teams

The path to the relevant, clean and complete dataset you need can be a long one, made up of many small, often time-consuming tasks. Maybe it’s tagging hand gestures in a video in order to build an algorithm training dataset. It could be reading individual user comments to keep your site clean and relevant. Or perhaps it’s conducting complex web research on financial entities.

These tasks take time and focus away from other core work, and the option of passing them along to an external team can be quite appealing. However, using external teams – from consultants to crowds – is not straightforward. Communication can be time-consuming, and results may not match what you need. To address these challenges, we pulled together tips we’ve learned over the course of our own data projects.

The first tip? Document, document, document.

No matter how you look at it, your external teams are like new hires. They don’t have the company knowledge or familiarity you do. That means it’s best to do all you can to start them off with a good infusion of knowledge.

To ensure a good knowledge hand-off, start with a process document. Chances are this isn’t the first time you have gathered or enhanced the particular dataset in question, so walk through the process in the form you’ve found works best and document that. Make notes of what teams can expect to see as they create or enhance the dataset, and include step-by-step instructions where appropriate. Don’t stop there, though! Remember, these are just like new team members. That means…

Adopt the persona of a complete newcomer and revisit your instructions.

Make sure there’s no insider jargon, preconceived notions or assumptions that might derail your external workforce. Remember, nothing is obvious. Double-check your language for clarity, and imagine how it would read to someone entirely unfamiliar with the process and the context.

If you can find common ways to break your instruction design, then you can make it more robust out of the gate.

To find bugs and weak spots in our instruction design, we have found it incredibly useful to discuss edge cases and outliers. It’s hard (perhaps impossible) to account for every possible edge case, but it’s critical to include at least a few. Talk through how your teams – or other external teams – have handled edge cases and outliers in the past. Do your best to explain the logic and assumptions behind decisions that fall outside the typical cases. This insight into your internal processes and priorities is invaluable to your external teams, and will help them even more than discussion of “typical” cases.

For one ecommerce client, we were asked to develop a set of tasks that would help them spot marketplace listings of counterfeit items. Though some items were quite obviously counterfeit, not all were as easily identifiable.


The less-well-known Pear brand smartphone

In addition to the clearer cases, we were able to identify some trends that marked the more difficult edge cases of counterfeit products. These included suspiciously low prices and seller account names that hinted something was amiss (a name like **CHEEP**REPROS** might be a giveaway). By incorporating these special cases into the documentation, we were able to ensure quicker identification of tough-to-spot products.
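For teams that want to hand reviewers a concrete starting point, edge-case signals like these can also be written down as simple rules alongside the prose instructions. The sketch below is only an illustration, not the process we used for the client: the `Listing` fields, the keyword pattern, and the 50% price threshold are all assumptions made up for the example.

```python
import re
from dataclasses import dataclass

# Hypothetical values for illustration only; tune them to your own marketplace.
SUSPICIOUS_NAME_PATTERN = re.compile(r"repro|replica|cheep|knock.?off", re.IGNORECASE)
LOW_PRICE_RATIO = 0.5  # flag listings priced below 50% of the typical price


@dataclass
class Listing:
    title: str
    seller_name: str
    price: float
    typical_price: float  # assumed reference price for the genuine item


def edge_case_flags(listing: Listing) -> list[str]:
    """Return the documented edge-case signals that a listing triggers."""
    flags = []
    # Signal 1: a price far below what the genuine item usually costs.
    if listing.typical_price > 0 and listing.price < LOW_PRICE_RATIO * listing.typical_price:
        flags.append("suspiciously_low_price")
    # Signal 2: a seller account name that hints at reproductions.
    if SUSPICIOUS_NAME_PATTERN.search(listing.seller_name):
        flags.append("suspicious_seller_name")
    return flags


example = Listing("Pear smartphone", "**CHEEP**REPROS**", price=99.0, typical_price=600.0)
print(edge_case_flags(example))
```

Rules like these are not a substitute for human judgment. Their value is that they force you to state each edge case precisely, which is exactly what an external team needs from your documentation.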

Tip sheet for process documentation


Keep this tip sheet handy for next time you need to document your data process, and stay tuned for more tips on using external teams.

