Our guest post is by Brandon Van Volkenburgh, a leading expert in progressive technologies including cloud computing, blockchain, and machine learning. Brandon has played an integral role in the creation of innovative CrowdReason products such as TotalPropertyTax and MetaTasker.
By now, I’ve attended many tech conferences to learn about machine learning. Amid the excitement generated by presentations, Q&A sessions, and keynote speakers, there’s usually an undercurrent of perplexity: where was the machine learning example that would show us how to get started using this technology? Most conference attendees (and businesspeople in general) agree that machine learning has great potential, but they also have practical concerns: how can I actually use it in my day-to-day business? Can I trust it? Where will I get the data to make it work?
For some companies, it seems like a technology that’s out of reach.
The truth is, machine learning isn’t as difficult to apply to business problems as you might think. Keep reading to find out how we at CrowdReason partnered with iMerit to provide an innovative software service—one based on machine learning—that has evolved to become one of our foundational offerings.
A machine learning business use case: MetaTasker, from CrowdReason
Our own story began a few years ago, when we first started working with a global telecommunications company that was using one of our property tax software products, TotalPropertyTax. The software was already saving them a great deal of time managing property taxes for what amounted to billions of dollars in tax liability. But they knew they could save even more time, and increase the value of the tax team’s contributions, by eliminating the data entry tasks that were inundating their highly skilled employees. They were ready and willing to pay for a solution ASAP; the only problem was that we didn’t have one.
Data extraction from documents is a perfect machine learning business case. But we’d previously run into challenges with data quality and had no processes in place to manage it, so we started thinking about it logically. We knew we needed three things in order to get machine learning into production:
- Technical expertise. (Among our team, we had this covered.)
- Good data on which to train machine learning algorithms. (We didn’t have this.)
- A process in place to handle exceptions, meaning cases where the algorithm has low confidence in its output. (More on this in a minute.)
Since number one was covered, we focused on numbers two and three.
How we got good data
Getting “clean” data was a hurdle. We had lots of historical data, but there had been no standardization or control over how it was collected, so it certainly wasn’t the clean data necessary to train a machine learning algorithm. How would we be able to provide our client with highly accurate data from incoming documents on Day 1? To get what we needed fast, we had the idea of breaking the work down into very easy tasks and assigning those tasks to iMerit workers.
For example, every time the client scanned in a property tax bill, we got the data we needed by asking a series of specific questions about it:
- What’s the origin of this tax bill?
- Who is the collector?
- What’s the due date?
- What’s the amount due? And so on…
iMerit workers answered these questions and returned responses. To ensure data accuracy we triplicated the process, asking three different workers the same question. If they all agreed, we accepted it as the right answer.
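The triplicate check described above can be sketched in a few lines of Python. This is a minimal illustration and not our production code: the function name `unanimous_answer` and the whitespace and case normalization are assumptions made for the example.

```python
from typing import Optional


def unanimous_answer(answers: list[str]) -> Optional[str]:
    """Return the shared answer if every worker agrees, otherwise None.

    Hypothetical helper: answers are compared after trimming whitespace
    and lowercasing, which is an assumption made for this sketch.
    """
    normalized = [answer.strip().lower() for answer in answers]
    if normalized and len(set(normalized)) == 1:
        return normalized[0]
    # No consensus: the question needs further review.
    return None


# Three workers answer the same question about one tax bill.
due_date = unanimous_answer(["2019-01-31", "2019-01-31 ", "2019-01-31"])
amount_due = unanimous_answer(["1200.00", "1200.00", "1260.00"])
```

Here `due_date` is accepted because all three responses match, while `amount_due` comes back as `None` and would need another look.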
We then created a robotic process to automate the workflow and aggregate the data, seamlessly surfacing the extracted data to our client in our TotalPropertyTax application. The secondary benefit was generating a “clean” database we could then use to train a machine learning algorithm for future use. Over time, we’ve worked machine learning into our production application. So where we initially began by using 100 percent iMerit (human-produced) data, we now use 20 to 40 percent iMerit data, with the rest being generated entirely by a machine learning algorithm.
How we handled exceptions
In some cases, the machine has “low confidence” in the data it generates. We needed a process in place to handle those cases.
That’s where iMerit comes in again. If the machine provides a low-confidence answer, our robotic process escalates it to an iMerit worker, and an actual person retrieves the correct data. When multiple iMerit workers aren’t able to reach consensus, we then escalate to an iMerit team leader. Our robotic process then aggregates the data and provides a full extraction of document information, along with the image of the document itself, to the client.
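As a rough sketch, that escalation path might look like the following Python. The names (`Prediction`, `resolve_field`, `ask_workers`, `ask_team_leader`) and the 0.9 confidence threshold are all hypothetical, chosen only to illustrate the routing; they are not taken from our actual system.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical cutoff; a real threshold would be tuned per field.
CONFIDENCE_THRESHOLD = 0.9


@dataclass
class Prediction:
    value: str         # the value the model extracted
    confidence: float  # model confidence between 0.0 and 1.0


def resolve_field(
    prediction: Prediction,
    ask_workers: Callable[[], list[str]],
    ask_team_leader: Callable[[], str],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> tuple[str, str]:
    """Return (value, source) for one extracted field."""
    # 1. Trust the machine when it is confident enough.
    if prediction.confidence >= threshold:
        return prediction.value, "machine"
    # 2. Otherwise escalate to human workers and look for consensus.
    answers = ask_workers()
    if answers and len(set(answers)) == 1:
        return answers[0], "workers"
    # 3. No consensus: escalate to a team leader for the final call.
    return ask_team_leader(), "team_leader"


# Example: a low-confidence prediction where the workers disagree.
result = resolve_field(
    Prediction(value="300.00", confidence=0.42),
    ask_workers=lambda: ["300.00", "360.00", "300.00"],
    ask_team_leader=lambda: "300.00",
)
```

In this example `result` is `("300.00", "team_leader")`, since the machine was unsure and the workers did not all agree. Recording the source alongside the value is a design choice for the sketch; it makes it easy to track how much of the data came from people versus the algorithm.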
Design your own machine learning example
Now, the data extraction solution we designed for one company is used by many companies. Our MetaTasker software handles data entry faster (less than a 24-hour turnaround time) and more accurately (greater than 99 percent accuracy) than an in-house team of humans. It represents an attractive option for organizations in any industry that want to use their skilled workers for more valuable, strategic tasks. In this case, those are tasks that can help them verify they are not overpaying on any of their tax bills.
iMerit was, and continues to be, an invaluable partner for us.
They provided us with accurate data early on, which helped us get up and running with the development of our software. At the same time, we were using that data to train our machine learning algorithms, which we then worked into our production application.
They continue to work our data exceptions, acting as the “human-in-the-loop.” Whenever we have low confidence in our results, iMerit resolves those data points for accuracy. They provide a secure workforce, which gives us confidence that our client data will remain private.
With the right elements in place, including thoughtful design of a human-in-the-loop component, every organization can harness machine learning to do more and work smarter. To learn more about CrowdReason and our software, visit our website.