OPEN THE BLACK BOX OF AI
Creating a Feedback Loop Between the User and the Machine
There’s nothing unusual about having to deal with millions, even tens of millions, of documents in litigation cases. DiscoveryIQ puts advanced machine-learning capabilities to work in the discovery phase of a case, filtering out irrelevant data so the legal team can speed up review through predictive coding. This Technology-Assisted Review process uses a machine learning algorithm to distinguish relevant from non-relevant documents, based on a subject matter expert’s coding of a training set of documents.
While integrating technology into the human review process saves people time and lets them put their effort where it is needed, predictive coding can still be a daunting task. Human reviewers have to feed the machine round after round of their review decisions on thousands of documents before the algorithm learns to categorize the document set.
As the lead interaction designer for Lexis® DiscoveryIQ, I worked with another UX architect to improve the product's workflow and overall experience. We used the design thinking framework throughout our problem solving.
I have omitted confidential information in this case study to comply with my non-disclosure agreement. All information reflects my own opinion and does not necessarily indicate the views of the employer.
There's no method to determine what a machine has learned except by comparing metrics from the initial control set that human reviewers provided.
Once the reviewer kicks off a Predictive Coding (PC) review with a set of high-quality reviewers, metrics such as F1, precision, and recall must be closely monitored to gauge the machine's learning progress. While the PC algorithm learns quickly during the early training rounds, in many cases the metrics level off just as quickly. What's more, this plateau often occurs below the ideal level.
The reviewer would like to see those metrics higher, but there's no way to identify what, if anything, is preventing the team from seeing better results from the machine.
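For readers less familiar with these metrics, here is a minimal sketch of how precision, recall, and F1 could be computed by comparing the machine's predictions with the reviewers' control set coding. The function name and the six-document data set are made up for illustration; this is not how DiscoveryIQ itself is implemented.

```python
def precision_recall_f1(control_labels, predictions):
    """Compare machine predictions with the reviewers' control-set coding.

    Both arguments are equal-length lists of booleans (True = responsive).
    """
    pairs = list(zip(control_labels, predictions))
    true_pos = sum(1 for truth, pred in pairs if truth and pred)
    false_pos = sum(1 for truth, pred in pairs if not truth and pred)
    false_neg = sum(1 for truth, pred in pairs if truth and not pred)

    # Guard against division by zero when a category is empty.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


# Hypothetical control set of six documents (True = responsive).
control = [True, True, True, False, False, False]
machine = [True, True, False, True, False, False]

precision, recall, f1 = precision_recall_f1(control, machine)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Precision answers "of the documents the machine flagged, how many were truly responsive?", recall answers "of the truly responsive documents, how many did the machine find?", and F1 is the harmonic mean of the two.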
I collaborated with another UX architect in the early stages of the problem discovery and definition for this project. I then took a lead role in ideating potential design solutions. Later, we conducted usability testing to evaluate the effectiveness of my approach.
To better understand the challenges our users face, we started by interviewing case managers and reviewers who are experienced with predictive coding. While everyone appreciated the efficiency the technology provided compared to manually reviewing every single file, it was also apparent that there was distrust concerning the accuracy of the machine learning results. What's more, there's no way for humans to tell how the machine comes to its conclusions. Thus, there's no way for a user to correct it, except to keep feeding it hundreds or thousands more documents and hope for a better result in the next round.
Reviewer codes the control set
We also took a close look at the as-is workflow of a predictive coding project:
1. Code Control Set
Reviewers pull a representative cross-section of documents from the full population of documents that need to be reviewed. Reviewers then label each document in the seed set as responsive or unresponsive and input those results into the predictive coding software.
2. Predictive formula generated
The software analyzes the seed set and creates an internal algorithm (formula) for predicting the responsiveness of future documents.
3. Refine algorithm through training rounds
Each round, the machine generates a training set for the user to code. Continuously coding and inputting these samples refines the algorithm, and the iterative process continues until the algorithm achieves the desired results.
4. Complete review
Once the desired metrics have been reached, the software applies that algorithm to the entire review set and codes all remaining documents as responsive or unresponsive.
While this is a fairly straightforward process, the unpredictable nature of the third step is a major pain point. Although the user can see the metrics after each round of training, it's unclear what has contributed to these results. What the algorithm learns becomes a complete black box to the reviewers.
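The four steps above can be sketched as a simple loop. This is only a toy illustration: the word-vote "model", the sample documents, and the use of plain accuracy in place of F1 are all assumptions made to keep the example short, and the product's actual algorithm is not described here.

```python
from collections import Counter


def train(coded_docs):
    """Toy stand-in for the real algorithm: each word votes +1 or -1."""
    weights = Counter()
    for text, responsive in coded_docs:
        for word in set(text.lower().split()):
            weights[word] += 1 if responsive else -1
    return weights


def predict(weights, text):
    """Predict responsive when the summed word votes are positive."""
    return sum(weights[word] for word in set(text.lower().split())) > 0


def accuracy(weights, control_set):
    """Share of control-set documents where machine and reviewer agree."""
    hits = sum(predict(weights, text) == label for text, label in control_set)
    return hits / len(control_set)


# Step 1: reviewers code the control set (hypothetical documents).
control_set = [("merger pricing memo", True), ("lunch menu friday", False)]
# Steps 2 and 3: one reviewer-coded batch per training round.
training_rounds = [
    [("friday parking notice", False)],
    [("pricing strategy for merger", True)],
]
remaining = ["merger timeline draft", "holiday lunch schedule"]

coded, target = [], 1.0
for round_number, batch in enumerate(training_rounds, start=1):
    coded.extend(batch)
    weights = train(coded)
    score = accuracy(weights, control_set)
    print(f"round {round_number}: control-set accuracy {score:.2f}")
    if score >= target:  # stop once the desired metric is reached
        break

# Step 4: apply the trained model to the remaining documents.
final_coding = {text: predict(weights, text) for text in remaining}
print(final_coding)
```

Notice that the only feedback the reviewer gets inside the loop is a single number per round. Nothing in that number says which coded documents pushed the model in one direction or the other, which is exactly the black box described above.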
Iterative training rounds to refine the algorithm
Complete review by applying the algorithm to code the remaining documents
Is it possible to create a workflow that facilitates better communication between the human and the machine so the learning can be more effective?
Our early discovery pointed to two aspects of this problem. First, human understanding of the case evolves as more evidence surfaces. The control set the reviewer coded at the very beginning rarely holds up as the source of truth. Mistakes may have been made, but there's no way for the reviewer to change them. Yet all future metrics use the control set coding decisions as the answer sheet for evaluating training progress. Second, when there's a disagreement, reviewers cannot tell which training documents contributed most to the algorithm's decision.
Our focus was to surface what the machine has been learning and give the reviewer an opportunity to correct human errors and resolve disagreements to improve outcomes.
A reviewer who is actively watching and managing a review to make sure it’s progressing efficiently and effectively can see the most impactful overturns first, resolve them, and witness the effect of those decisions upon the statistics measuring training progress.
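To make "the most impactful overturns first" concrete, here is a minimal sketch of how overturns could be found and ordered. It assumes the machine exposes a probability-like score per document and that confidence can be read as distance from a 0.5 threshold; the `ControlDoc` type, field names, and document IDs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ControlDoc:
    doc_id: str
    reviewer_says_responsive: bool
    machine_score: float  # assumed probability of "responsive", 0.0 to 1.0


def ranked_overturns(docs, threshold=0.5):
    """Return documents where machine and reviewer disagree, most confident first."""
    overturns = [
        doc for doc in docs
        if (doc.machine_score >= threshold) != doc.reviewer_says_responsive
    ]
    # Confidence is taken as the distance from the decision threshold.
    return sorted(
        overturns,
        key=lambda doc: abs(doc.machine_score - threshold),
        reverse=True,
    )


docs = [
    ControlDoc("DOC-001", True, 0.92),   # agreement
    ControlDoc("DOC-002", False, 0.97),  # overturn, machine very confident
    ControlDoc("DOC-003", True, 0.40),   # overturn, machine barely confident
    ControlDoc("DOC-004", False, 0.10),  # agreement
    ControlDoc("DOC-005", True, 0.05),   # overturn, machine very confident
]

for doc in ranked_overturns(docs):
    print(doc.doc_id, doc.machine_score)
```

Sorting by confidence puts the disagreements where the machine is most certain at the top of the reviewer's queue, since those are the ones most likely to reveal either a human coding mistake or a misleading training document.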
Once we had clearly defined the problem and our goal, I started sketching ideas on paper, looking for alternative ways of viewing the problem. As the ideas became more fleshed out, I used Adobe XD to create high-fidelity mockups for later prototyping and testing.
My main focus was to reinvent the third step of the predictive coding process and thereby change the way reviewers train and refine the algorithm. Instead of having the reviewer rely purely on metrics, I proposed making the step more interactive and allowing for more direct feedback. Here are a few of my early approaches:
You can select very specific chunks of content and designate it as what makes the document relevant.
You can tag that chunk of content with a specific issue tag and the system can learn from that process.
You can “undo” or rollback decisions.
Integrate usability and technology
Designers are always challenged with bringing together what is desirable from a human point of view with what is technologically feasible. This role is nowhere more apparent than when dealing with machine learning capability.
The initial approach was to let the reviewer identify keywords in a document and label them as relevant or irrelevant once a disagreement was spotted, since this felt the most direct and intuitive for a human reviewer. However, we soon discovered that this approach was not feasible due to the specific machine learning algorithm the product deploys.
I therefore pivoted towards a solution involving showing the contributing document set that the algorithm uses to form the decision. This allowed the reviewer to gain a better understanding of the "mindset" of the machine, while also giving them a second opportunity to review and correct past mistakes caused by human coding inconsistencies and shifted viewpoints as the case evolved.
A typical document reviewer goes through hundreds of files a day, and most of these files contain useless information. This tedious, repetitive work causes fatigue and poses a big challenge for reviewers trying to stay focused.
If our users have to look at a viewer for hours on end, then even the most trivial details, down to the location of a button, affect their overall happiness.
From our early observations, the "Yes" and "No" buttons are the two elements that a reviewer clicks the most. The reviewer also has to move the mouse back and forth between the two options as the review proceeds to the next document. We first had to determine whether these buttons should be grouped horizontally or vertically. After testing this with our reviewers in real case review conditions, the horizontal button arrangement turned out to be more advantageous and effortless.
A dark theme was also deployed so the reviewer could devote his or her attention to the documents. We also implemented an auto-proceed function that advances to the next document as soon as the reviewer makes a decision on the current one. This saved them the hassle of moving the mouse to click on the “proceed arrow” hundreds of times a day.
The Re-invented Iterative Process
In the new design, the sample-and-refine step was reimagined to create a direct feedback loop between the human reviewer and the machine. Instead of blindly feeding it more and more documents, the reviewer can now review all the disagreements (overturns), then either change his or her initial decision or show the machine which documents wrongly led to its conclusion.
Reviewer codes the control set
Training round to refine the algorithm
Training round to refine the algorithm
Review overturns and resolve some early mistakes
Prototype and Test
Each participant was asked to perform the following tasks:
Identify the overturns where the machine feels most confident.
Change the decision on one of the control set documents.
Change the decision on one of the contributing documents that led to the machine's decision.
We then used the System Usability Scale (SUS) to measure usability. The final score indicated higher perceived usability than 90% of other products.
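For context, SUS scores come from a ten-item questionnaire answered on a 1 to 5 scale. The sketch below follows the standard SUS scoring rule (odd items contribute the response minus 1, even items contribute 5 minus the response, and the sum is multiplied by 2.5). The participant answers are invented for illustration and are not the results of our study.

```python
def sus_score(responses):
    """Score one participant's ten SUS answers (each 1-5) on the 0-100 scale."""
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses between 1 and 5")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1  # odd items are positively worded
        else:
            total += 5 - response  # even items are negatively worded
    return total * 2.5


# Hypothetical answers from two participants.
participants = [
    [5, 1, 5, 2, 4, 1, 5, 1, 4, 2],
    [4, 2, 4, 2, 4, 2, 5, 1, 4, 1],
]
scores = [sus_score(answers) for answers in participants]
average = sum(scores) / len(scores)
print(scores, average)
```

The averaged score is then compared against published SUS benchmarks to obtain a percentile ranking like the one reported above.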
It seems obvious that in order to design for something, you first need to understand how it works. However, many of us overlook this fact. While machine learning and other Artificial Intelligence (AI) technologies present us with vast possibilities, they also come with their own limitations and constraints.
Throughout the history of computing, humans have had to interact with machines in abstract, complex ways. Facilitating that communication will continue to be the center of our focus and passion as UX designers as new technology emerges. In my opinion, having compassion for the user, having curiosity to learn, and having the guts to challenge the status quo and not accept what's taken for granted are critical to our success.