The Bitter Aloe Project: Building a Prosopographic Understanding of Apartheid-Era Violence Through Advanced Machine Learning

Stephen Davis, University of Kentucky

Research Grant, 2021–2022


The Bitter Aloe Project uses machine learning models to extract data from the large volumes of witness testimony and incident descriptions produced by South Africa’s Truth and Reconciliation Commission (TRC). Our goal is to open new avenues for research into political violence in South Africa during the apartheid era by making this massive archive legible in new ways. This phase of the project aims to harness these cultivated datasets to build a prosopography of the witnesses who appeared before the commission.

Prosopography is a genre of historical writing that consists of collective biographies of entire classes of individuals. The basic analytical assumption of prosopography is that collective biography can reveal more about a given time and place if common interests, outlooks, and motivations can be collated across a specific category of influential individuals. In general, conventional prosopography defines its subjects based on their social status. In its earliest form, prosopography focused on the lives of elites in the ancient world, but the subsequent impact of social history on the field shifted the focus of prosopography to the lives of ordinary people, where sources permitted.

The datasets we cultivated from TRC records allow for a new kind of prosopography that focuses on the experiences of victims, their families, or perpetrators. Our intention in writing such a prosopography is to use data to draw connections between individuals who had shared experiences of political violence, but may or may not otherwise fit within the same social category. So, unlike the prosopographies of classical elites, this ongoing work attempts to shed new light on political violence by grouping individuals by experience rather than status.

This approach would have been impossible without recent developments in natural language processing. In particular, we employed a technique known as document embedding to build a semantic search method that assembles lists of individuals who described their experience of political violence in similar ways. It effectively allows us to read across dozens, if not hundreds, of testimonies: a machine learning algorithm identifies semantic similarities, surfacing commonalities that might be invisible to keyword searches or prohibitively difficult to collate through close reading.
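The retrieval step behind this kind of search can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the function names (`embed`, `cosine`, `search`) are hypothetical, the sample statements are invented placeholders rather than TRC records, and `embed` is a deliberately crude stand-in that only counts words. A real system would replace it with a trained document embedding model, which is what allows matches on meaning rather than shared vocabulary. The ranking logic, which compares every document vector to the query vector by cosine similarity, would stay the same.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a trained embedding model (an assumption for this
    sketch): represent a text as a sparse vector of word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, documents, top_k=3):
    """Rank documents by similarity to the query and return the top_k
    (doc_id, score) pairs, highest score first."""
    query_vec = embed(query)
    scored = [
        (cosine(query_vec, embed(text)), doc_id)
        for doc_id, text in documents.items()
    ]
    # Sort by descending score, breaking ties by document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, round(score, 3)) for score, doc_id in scored[:top_k]]


# Invented placeholder statements, not drawn from the TRC archive.
documents = {
    "statement_a": "the family never learned where he was buried",
    "statement_b": "the meeting was held at the town hall on tuesday",
    "statement_c": "we never learned what happened to our son",
}

results = search("they never learned what happened", documents, top_k=2)
for doc_id, score in results:
    print(doc_id, score)
```

With the word-count stand-in, the ranking reflects only overlapping vocabulary, which is exactly the limitation of keyword search described above. Swapping in a learned embedding model is what would let statements that share no words, but express a similar experience, appear near one another in the results.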

One practical example of the application of this method relates to expressions of grief, a common feature of witness testimonies. Witnesses who appeared before the commission often spent a lengthy portion of their time describing the sense of loss they felt after a loved one was severely injured, imprisoned, murdered, or disappeared. This issue of grief is well suited to a prosopography of experience because it links so many individuals across various social categories, but the sheer volume of this material makes parsing it into manageable and meaningful categories difficult without advanced computational methods. With document embedding we began to see the complete contours of a highly detailed typology of grief borne out in the expressions of a wide range of witnesses. This typology is far more reflective of the contents of the entire corpus than would have been possible with exhaustive keyword searches using user-generated lists of synonymous terms, or manual identification of related expressions through close readings of testimony after testimony.

Our primary finding is that machine learning methods such as document embedding have the potential to revolutionize the study of truth commission records. All too often, truth commissions produce extraordinary descriptions and testimonies documenting human rights violations, but do not have the time or resources necessary to make the resulting archive truly accessible. Researchers and ordinary users alike can navigate document embeddings using queries that follow their desired lines of inquiry, rather than engage in the trial and error of guessing which keywords might turn up a useful result. Navigating archives by the meanings contained in a statement preserves the humanistic qualities of testimony that make it such a compelling and useful vessel for documenting human rights abuses. These techniques, for the first time, make such navigation possible.
