Legal tech: Beyond the myths #3 - How can robots read?

By Arnoud Engelfriet

“The question of whether machines can think is about as relevant as the question of whether submarines can swim”. This quote by computer scientist and visionary Edsger Dijkstra is still as relevant today as it was in 1984 when it was written. Computers do not think, they calculate and process information. This may result in outputs that look like the outcome of a thought process, but that is a mere coincidence. What does that mean for a language-oriented field like law? And how do robots then extract information from language?

Natural language processing

As early as the 1950s, at the advent of AI, the concept of robots reading and interpreting textual information came to the forefront. The so-called Turing Test, a test of a machine's ability to exhibit intelligent behaviour indistinguishable from that of a human, was created with the ability to interpret text in mind. In short, the test proposes that a human asks questions and gets responses from various counterparts, and must determine from those responses whether he is communicating with a robot or not. If the human cannot tell, then the robot is considered “intelligent”.

The earliest work on natural language processing focused on rules. Given a collection of rules the computer emulates natural language understanding (or other NLP tasks) by applying those rules to the data it confronts. The classic example is the “Chinese room” designed by philosopher John Searle in 1980. Suppose we put in a room many, many books that instruct the reader which Chinese symbols to write down given certain Chinese symbols received on paper. (Yes, we write out each and every possible question-answer pair that is possible in Chinese.) Then, we get a Chinese-speaking person to write questions and put them under the door. In the Chinese room, a person (or a robot) applies the books to produce an answer, which is shoved back under the door to the person asking the questions. Can this person now tell if a Chinese speaker is in the room? If not, the robot must be considered intelligent.

Of course, this approach requires the creation of a gargantuan number of rules, many of which even native speakers wouldn’t be able to formulate. Still, progress was significant and early experiments showed surprising results. For example, Joseph Weizenbaum’s ELIZA program simulated a psychologist able to engage in open discussion with “patients”, employing strategies such as “how do you feel about that” or “do you think this reveals something about your relationship with your parents” whenever the patient presented a topic that the system had no specific rule for.
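To make the rule-based approach concrete, here is a minimal sketch in Python of an ELIZA-style responder. The two rules and the function name respond are invented for illustration and are not taken from Weizenbaum's original script; the fallback phrases are the ones quoted above.

```python
import re

# Hypothetical rules (pattern, response template); not Weizenbaum's original script.
RULES = [
    (re.compile(r"\bI feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bmy (mother|father|parents)\b", re.IGNORECASE),
     "Tell me more about your {0}."),
]

# Generic prompts used when no specific rule matches the input.
FALLBACKS = [
    "How do you feel about that?",
    "Do you think this reveals something about your relationship with your parents?",
]


def respond(text: str, turn: int = 0) -> str:
    """Return the response of the first matching rule, else a generic fallback."""
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Reuse the patient's own words, minus trailing punctuation.
            return template.format(match.group(1).rstrip(".!?"))
    return FALLBACKS[turn % len(FALLBACKS)]


print(respond("I feel tired."))
print(respond("The weather is nice today."))
```

Every topic the system should handle needs its own hand-written rule, which is exactly why the number of rules grows so quickly.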

The rise of machine learning

In the late 1980s the introduction of machine learning algorithms for language processing presented something of a revolution. In machine learning, algorithms build a model from human-provided training data, applying statistical techniques to identify correlations and patterns. Using this model, predictions or decisions can be arrived at without any explicit rules having to be configured. This made it possible to use statistics-based approaches to analyze and respond to textual input. The first breakthrough was in automatic translation, and some successes were achieved in specific domains.

The hardware and memory limitations of then-current computers did put an upper limit on what could be achieved. This changed in the early 2000s with the advent of big data and cloud computing on the one hand, and the exponential increase in publicly available text data on the other: the World-Wide Web. Now it was possible to take huge corpora of text and apply tremendously complex statistical calculations and pattern-recognition algorithms to distill rules and schemes to transform text into other text. Whether question and answer or writing from prompts or interpretation, machines could now do it.

A key drawback remained: humans were needed to annotate the input from which the machine learning algorithms trained their models. This changed in the 2010s, when the rise of feature learning and deep neural networks allowed for so-called unsupervised learning of text features for recognition and interpretation. One of the keys to this breakthrough is the use of word embeddings. “A word is characterized by the company it keeps”, as English linguist John Rupert Firth put it. Meaning can thus be derived from context: if these and these words occur together, this other word must be involved and could for instance be used in the output.
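The idea that meaning follows from context can be illustrated with a small Python sketch. Note the assumptions: the corpus is a toy example invented here, and the vectors are plain co-occurrence counts rather than the dense embeddings a neural network would learn. The sketch only shows that words used in the same contexts end up with similar vectors, without any human annotation.

```python
import math
from collections import Counter, defaultdict

# Toy corpus, invented for illustration.
CORPUS = [
    "the court dismissed the claim",
    "the judge dismissed the claim",
    "the court upheld the appeal",
    "the judge upheld the appeal",
    "the chef cooked the dinner",
]


def context_vectors(sentences, window=2):
    """Count, for every word, which words occur within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


vectors = context_vectors(CORPUS)
sim_court_judge = cosine(vectors["court"], vectors["judge"])
sim_court_chef = cosine(vectors["court"], vectors["chef"])
print(sim_court_judge > sim_court_chef)
```

In this toy corpus “court” and “judge” keep identical company, so they come out as more similar to each other than “court” is to “chef”.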

Still, limitations remain. One common example is how to handle homonyms, as in the example “The club I tried yesterday was great!”. In this sentence, it is not clear if the term ‘club’ means a dance club, a social club, a golf club, a club sandwich or any other type of club. Humans can understand this from context, even when not given in the document itself: a senior lawyer in her sixties is more likely to mean the golf or social club than the twenty-year-old student known for his partying tendencies.

Machine learning on legal documents

As noted above, the initial focus of natural language processing was on translation. This had one important reason: especially in government documents, multiple-language versions of the same document were often available. For instance, the European Union publishes official documents in all its 24 official languages, allowing good comparison between the language structure and vocabulary of each. Further, there was a clear need for quick and “good enough” translation.

A second field where NLP made great strides was in transcribing dictation, especially in the medical sector. Doctors produce a large number of dictated reports (e.g. autopsies or surgery reports) that need to be transcribed, typically quickly. At the same time, absolute perfection is not necessary. And what’s more, the wording and phrases used will be limited and somewhat predictable: when trying to distinguish between, say, ‘patient’ and ‘patent’, it is safe to assume the doctor meant the former.

For similar reasons, machine transcription of legal dictation has seen success, albeit in more limited form, as the time and money factor present in government and medical fields is less pressing in the practice of law. The main focus of NLP in the legal field has been on automating legal processes, e.g. a case assessment to predict the outcome if it were to go to court. Here, NLP is a first step in the legal process: extracting the facts of a case, or identifying key factors that judges use when applying the law. But next steps require more advanced machine logic, e.g. figuring out which legal requirements apply. So far, success here has been limited.

Machine learning in contract review

A domain in the legal field where machine learning is quickly gaining attention is the review of contracts, mostly business-to-business agreements. This type of work long remained the realm of human experts, as such agreements represent significant business value (and risk), each agreement is different and the time factor for review was not considered crucial. Today, this has changed. More and more agreements (or at least, provisions therein) are considered standard, the cost of human review is becoming more and more prohibitive and speed is of the essence.

This change could first be seen in standard documents such as the confidentiality agreement (NDA), thousands of which are signed across the globe every day. While lawyers (correctly) stress the importance of reviewing each NDA’s provisions carefully, most businesspeople (also correctly) consider an NDA very much a standard text and just want to know “can I sign or not”. This has led to a value gap: businesspeople do not want to wait for, let alone pay for, a review of an NDA. Several legal tech providers have jumped into this niche to offer automated NDA review tools.

All of these use some variation on the same basic process: use statistical methods to recognize typical clauses found in such agreements, extract problematic aspects of such clauses (e.g. a too-long term or a liability cap) and report to the human user what was found. This works very well, mainly because the amount of variation in such clauses is very limited. There are only so many ways to declare the courts of Santa Clara, CA competent for any disputes. What’s more, NDAs tend to contain a high level of ‘borrowed’ language. Our own tool NDA Lynn for instance has reviewed over 14,000 NDAs and has found that for most clauses, there are only a handful of truly different structures. This type of limited variation makes analysis surprisingly effective.
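The recognize, extract and report process described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how NDA Lynn or any other product works: the example clauses, the label names, the five-year policy in MAX_TERM_YEARS and the simple word-overlap similarity are all hypothetical stand-ins for a model trained on thousands of real clauses.

```python
import math
import re
from collections import Counter

# Hypothetical labelled example clauses; a real tool would learn from thousands.
EXAMPLES = {
    "term": "the obligations of confidentiality shall remain in force "
            "for a period of five years",
    "governing_law": "any disputes shall be submitted to the competent "
                     "courts of the state",
    "liability": "the total liability of either party shall be limited "
                 "to the amount of",
}

MAX_TERM_YEARS = 5  # assumed review policy, not a legal standard


def bag(text):
    """Lower-cased bag-of-words representation of a clause."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def classify(clause):
    """Step 1: label a clause with the most similar example clause type."""
    return max(EXAMPLES, key=lambda label: cosine(bag(clause), bag(EXAMPLES[label])))


def review(clauses):
    """Steps 2 and 3: extract problematic aspects and report them per clause."""
    findings = []
    for clause in clauses:
        label = classify(clause)
        finding = "ok"
        if label == "term":
            match = re.search(r"(\d+)\s+years?", clause)
            if match and int(match.group(1)) > MAX_TERM_YEARS:
                finding = f"term of {match.group(1)} years exceeds {MAX_TERM_YEARS}"
        elif label == "liability":
            finding = "liability cap present"
        findings.append((label, finding))
    return findings


nda = [
    "The confidentiality obligations remain in force for a period of 10 years.",
    "Disputes shall be submitted to the courts of Santa Clara, CA.",
]
report = review(nda)
print(report)
```

Because clauses of the same type share so much wording, even this crude similarity measure separates them; that is the limited variation at work.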

Other document types may have similar attributes. For example, under the European Union’s General Data Protection Regulation (GDPR) a so-called data controller must have a specific type of agreement in place with its suppliers and other processing partners (“data processors”). This data processing agreement (DPA) must meet specific statutory obligations. While each organization has developed its own DPA, the language is very much shared as most lawyers tend to closely copy the letter of the law. Tools such as DPA Lynn thus can provide effective review of this document. However, automated review for contracts in general still seems far away due to the variability of the type of clause that may be present.

The Contract Understanding Atticus Dataset

A promising development in the field of contract review is the creation of the Contract Understanding Atticus Dataset (CUAD) by the Atticus Project, a US-based nonprofit organization of legal experts. This dataset is the result of a year-long effort by dozens of law student annotators, lawyers, and machine learning researchers. The dataset includes more than 500 contracts and more than 13,000 expert annotations that span 41 label categories (from applicable law to covenants not to sue, limitations of liability, payment obligations and warranties). Interestingly, the dataset contains human-made annotations of what a reviewer would like to know, such as the monetary cap on liability or the end date of a certain obligation. This allows a machine learning system to be accurately trained (or verified) on CUAD.
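What verifying a system against expert annotations could look like is sketched below in Python. The data layout, the example spans and the 0.5 threshold are assumptions made for this illustration; this is not the CUAD file format or its official evaluation script. The idea is simply to compare the text a system highlights for each category with the text the human annotator highlighted.

```python
import re

# Hypothetical annotations: category -> expert-highlighted text span.
# This layout is invented for illustration and is not the CUAD file format.
GOLD = {
    "Governing Law": "This Agreement is governed by the laws of the State of California",
    "Cap On Liability": "liability shall not exceed USD 100,000",
}

# Hypothetical output of a review system for the same contract.
PREDICTED = {
    "Governing Law": "governed by the laws of the State of California",
    "Cap On Liability": "each party shall bear its own costs",
}

THRESHOLD = 0.5  # assumed cut-off for counting a prediction as a match


def tokens(text):
    """Set of lower-cased word tokens in a span."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Word-level overlap: size of the intersection divided by size of the union."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def evaluate(gold, predicted, threshold=THRESHOLD):
    """Return the fraction of annotated categories the system matched."""
    hits = sum(
        1 for label, span in gold.items()
        if label in predicted and jaccard(span, predicted[label]) >= threshold
    )
    return hits / len(gold)


score = evaluate(GOLD, PREDICTED)
print(score)
```

Here the system finds the governing law clause but highlights the wrong sentence for the liability cap, so it matches one of the two annotated categories.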

Employing the CUAD would provide a well-deserved boost to machine learning contract review. The dataset can be enhanced with company (or law firm)-specific contracts for additional focus. For instance, an IT focused firm would add IT insourcing agreements and categories relevant for technology services, while an international supplier of goods would focus on adding shipping costs, risk allocation and insurance clauses.

Going forward

Machine learning for contract review has come a long way. While it is true that no contract reviewing robot can claim to have an “understanding” of what it has read, a lawyerbot can certainly produce highly accurate reviews of typical legal agreements. This is especially true for standard agreements such as NDAs, but with the advent of large datasets such as CUAD more general contract review is right around the corner. The challenge for any business therefore is: how do we create value with automated contract review, while reducing any new risks that may appear? This is something for the next article.


Read Parts 1 and 2

Legal tech: Beyond the myths #1

Legal tech is coming. With Artificial Intelligence on board. Ah, yes. We have seen and heard so many promises: it will transform our work. It will replace lawyers, reduce tedious work. And so on. Still, here we are, still typing away in Word while the shiny AI-powered workflow optimization tool gathers dust in the corner. Often the reason is the same: the tool was overpromised and underdelivers. This series will take a look at the various myths and misconceptions around AI in the legal sector. What can we expect, and what is still a fairytale?… [read on]

Legal tech: Beyond the myths #2 - The focus on accuracy

What’s the difference between a lawyer and a lawyerbot? In the twenty years I’ve worked as a lawyer, no one has ever asked me how accurately I worked. But every time we introduce our lawyerbot to a new audience, the first question we get is always “How accurate is it?”. Which is fine, as we have a good answer: 95.1%. But what does that even mean in a legal context?… [Read on]


About the Author

Arnoud Engelfriet is co-founder of the legal tech company JuriBlox, and creator of its AI contract review tool Lynn Legal. Arnoud has been working as an IT lawyer since 1993. After a career at Royal Philips as IP counsel, he became a partner at ICTRecht Legal Services, which has grown from a two-person firm in 2008 to an 80+ person legal consultancy firm.