Artificial Intelligence (AI) refers to the ability of a machine to perform tasks that normally require human intelligence. It already has a very significant impact on our lives, although we often do not realise it. Google’s translator and the advertising that appears on websites are two examples of how Artificial Intelligence is entering our daily lives. Google’s translator learns from millions of examples to produce a result for our specific query, while website advertising relies on our extensive click history.
Artificial Intelligence to manage contracts
At Bounsel we want to use these techniques to make your contracts smarter and more human, creating functionalities in our tool that help you execute tasks that would otherwise be unmanageable. Here we tell you about the steps we have taken to establish the categories into which we can classify our contracts, which is the first step towards building a broad legal corpus covering different types of documents. Realising the full potential of AI techniques takes two things: a large database, and a set of algorithms capable of extracting useful information from it to perform a specific task.
There are many, many types of documents. Within the legal sector alone, there are thousands of possibilities. In this respect, we face the challenge of reducing this immense number of categories to a more manageable number of groups. A person would be faced with the daunting challenge of sifting through the thousands of categories available to look for patterns and establish new groups. Fortunately, we can take advantage of machine learning (ML) techniques, so that it is our computer that performs this task. This allows us to find patterns that we would not be able to detect with the naked eye, as well as saving us hours and hours of manual work.
However, computers do not like words. Machines are more about numbers. Therefore, if we pass the possible thousands of categories to our computer as they are, it will not really know what to do with this information (as we can see in the image, our computer is confused with so much verbiage 😵‍💫). This is where Natural Language Processing (NLP) comes in. These techniques aim to make machines understand unstructured text and extract relevant information from it. In this case we rely on a model trained with more than a billion sentences that is able to convert phrases and words into a numeric vector, or embedding, so that our computers can start working.
And we can ask ourselves: in practice, how is a machine capable of converting a sequence of words into a numerical vector? The key is training. Artificial Intelligence techniques are based on a multitude of examples of all kinds. From them, our computers are able to detect patterns, learn the contexts in which a word appears, and even grasp the overall meaning of a paragraph or an entire text.
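To make the idea of turning text into numbers concrete, here is a minimal Python sketch. It is not the pre-trained model described above: as a stand-in it simply counts words, and the contract type names are hypothetical. What it does show is the principle that, once texts are vectors, their similarity can be measured, here with the cosine of the angle between them.

```python
import math
from collections import Counter


def build_vocabulary(texts):
    """Collect every distinct lowercase word, in sorted order."""
    return sorted({word for text in texts for word in text.lower().split()})


def embed(text, vocabulary):
    """Toy embedding (an assumption, not a trained model): one word count per vocabulary entry."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical contract type names, used only for illustration.
contract_types = [
    "house rental agreement",
    "apartment rental agreement",
    "software licence agreement",
]

vocabulary = build_vocabulary(contract_types)
vectors = [embed(name, vocabulary) for name in contract_types]

rental_vs_rental = cosine_similarity(vectors[0], vectors[1])
rental_vs_licence = cosine_similarity(vectors[0], vectors[2])
print(rental_vs_rental, rental_vs_licence)
```

In this toy example the two rental agreements score 2/3, while the rental and licence agreements score 1/3. A trained model goes further, because it can also place texts close together when they share a meaning but no words.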
One of the most notable examples of NLP for the Spanish language is the MarIA project, trained with more than 570GB of data collected by the National Library of Spain (BNE), with a corpus of more than 135 billion words resulting from web crawls carried out between 2009 and 2019. Analysing such a database is indeed a difficult task, but thanks to the revolution in computing power over recent decades it has become an affordable one. In the end, there is one key idea to take away: data is information and, as they say, information is power. Camouflaged within this huge database are the roots of our language, so that, through the use of AI, our computers are able to discern these patterns and learn all the intricacies of the language. That learning process, at the computational level, translates into a numerical conversion of information that can be used to perform specific tasks.
Thus, by using a model pre-trained on a large database, AI is able to help us achieve our goal. In particular, we are now able to convert each type of document into a sequence of numbers, and our computer is able to compare these sequences with each other and establish groups, or clusters, whose elements share common characteristics or patterns. It is then time to assign a label to each of the clusters formed. To this end, we can extract the most representative documents within each cluster, as well as the most frequent words (based on the similarity between their numerical representations), and in this way we can easily define a common category for all these types of documents.
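The grouping and labelling steps can be sketched in a few lines of Python. Note the assumptions: this post does not say which clustering algorithm we use, so k-means is chosen here purely for illustration, the document names are hypothetical, and the vectors have two dimensions instead of 384 so that they are easy to read. The "most representative" document of a cluster is taken to be the one closest to the cluster centre.

```python
import math


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise average of a list of vectors."""
    return [sum(values) / len(points) for values in zip(*points)]


def k_means(points, k, iterations=20):
    """Plain k-means (an illustrative choice, not necessarily the method used)."""
    # Deterministic start: the first k points act as the initial centres.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centre.
        labels = [
            min(range(k), key=lambda c: distance(p, centroids[c])) for p in points
        ]
        # Move every centre to the average of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids


def most_representative(names, points, labels, centroids, cluster):
    """Name of the document closest to the centre of the given cluster."""
    members = [
        (distance(p, centroids[cluster]), name)
        for name, p, label in zip(names, points, labels)
        if label == cluster
    ]
    return min(members)[1]


# Hypothetical document types with made-up 2-dimensional embeddings.
names = [
    "vehicle sale", "product warranty", "property sale",
    "share purchase", "extended warranty", "warranty certificate",
]
points = [
    [0.9, 0.1], [0.1, 0.9], [1.0, 0.2],
    [0.8, 0.0], [0.0, 1.0], [0.2, 0.9],
]

labels, centroids = k_means(points, k=2)
representatives = {
    c: most_representative(names, points, labels, centroids, c) for c in range(2)
}
print(labels)
print(representatives)
```

Reading the representative documents of each cluster is what lets a person decide on a sensible label, for example a sales category for the first group and a warranty category for the second.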
At this point we could assume that the result provided by our computer is correct and take it as valid. However, science fiction movies teach us that it is not good to trust our computers to such an extent, so it makes sense to find a way to check that our results are correct. When we talk about our computer’s numerical representation of a text, we are not talking about a number from 1 to 10, or even a number from -∞ to ∞. We are talking about high-dimensional numerical vectors (in this particular case, a vector of 384 dimensions, i.e. 384 numbers per contract type 😱). How can we then check that the result is correct? Are we again faced with the impossible challenge of analysing 384 numbers for each type of contract, and checking that they are correct? Fortunately, once again, AI comes to the rescue.
Another of AI’s many capabilities is reducing the complexity of a problem. In this particular case, a technique known as Principal Component Analysis (PCA) reduces these 384 dimensions to their most characteristic ones. A good way to understand this dimensionality reduction is as a search for correlations: the technique relies (again, yes) on a large database (in this case all our documents) to find correlated dimensions that can be grouped together, reducing our vector to something more tangible.
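As a rough illustration of how correlated dimensions get merged, the following Python sketch runs PCA on a tiny made-up dataset. Everything here is an assumption for the sake of the example: the data is invented, it has 3 dimensions rather than 384, and the components are found with power iteration and deflation, a simple textbook approach; in practice one would normally rely on a numerical library. The first two dimensions always move together, so a single component is enough to capture both.

```python
import math


def center(rows):
    """Subtract the mean of each dimension."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def multiply(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def top_component(matrix, iterations=200):
    """Power iteration: dominant eigenvector of a symmetric matrix."""
    d = len(matrix)
    vector = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        product = multiply(matrix, vector)
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break
        vector = [x / norm for x in product]
    return vector


def pca(rows, n_components):
    """Return the projected rows and the principal components."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(n_components):
        v = top_component(cov)
        eigenvalue = sum(x * y for x, y in zip(v, multiply(cov, v)))
        components.append(v)
        # Deflation: remove the direction just found before looking for the next.
        cov = [
            [cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
            for i in range(d)
        ]
    projected = [
        [sum(x * c for x, c in zip(row, comp)) for comp in components]
        for row in centered
    ]
    return projected, components


# Made-up data: dimensions 0 and 1 are perfectly correlated, dimension 2 is not.
rows = [[1.0, 1.0, 0.0], [2.0, 2.0, 1.0], [3.0, 3.0, 1.0], [4.0, 4.0, 0.0]]
projected, components = pca(rows, n_components=2)
print(components)
print(projected)
```

On this data the first component weights dimensions 0 and 1 equally and ignores dimension 2, which is exactly the "grouping of correlated dimensions" described above; the second component picks up what is left.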
By using these techniques we reduce the dimensions of our problem to just two. We can then plot our results and check that each cluster (or each colour in the diagram in the figure) corresponds to a well-defined region of space. Each of these clusters has a specific category associated with it (among other examples we find buying and selling, warranty…), and with this we have achieved our goal: using AI we have been able to reduce the huge number of categories (each point in the diagram) to a number we can work with (each colour in the diagram, ~50).
Thanks to this process it is now possible to add the category of your document when you manage your contracts with Bounsel. And this is only the first step. At Bounsel we want to be pioneers in leveraging NLP techniques applied to the Spanish language to make your contracts smarter and more human, providing you with new and exciting applications in our tool to help you be more productive when working with documents. Interested in knowing more? Stay tuned and don’t miss anything!