This month’s feature is provided by Annabelle Lukin (Macquarie University) and Rodrigo Araújo e Castro (Federal University of Minas Gerais/Macquarie University). They introduce a newly available corpus of key texts in the international law of war, which can now be queried using corpus linguistics techniques. The corpus enables critical legal scholars and linguists to collaborate on studies of approximately 170 years of the international law of war, informed by legal and linguistic theories and methods.
Every text leaves its mark on the world. Some marks are like a small ripple on a pond, while others, such as the texts of the international law of war, are like mighty waves, because they set the legal framework for states’ use of lethal violence. These texts construct a semiotic universe in which, among other extreme forms of human behavior, the killing of children can either be given legal imprimatur or be classified as a “war crime”.
To enable better interdisciplinary collaboration on these key texts, we have created the Macquarie Laws of War Corpus (MQLWC), based on the texts included by the International Committee of the Red Cross (ICRC) in their International Humanitarian Law Database.
The MQLWC is hosted by the Sydney Corpus Lab. It begins with the 1856 Paris Declaration on Maritime Law, the first open multilateral treaty, to which any State could become a party. The most recent document is the latest amendment to the Rome Statute (2019), the legal instrument that established the International Criminal Court, the body responsible for trying those accused of war crimes, crimes against humanity, the crime of aggression and genocide.
Figure 1: An example of concordance lines for “civilian” in the MQLWC
The corpus comprises 110 texts, totalling nearly 392,000 words, and can be explored using basic corpus linguistic techniques such as word frequency lists (see Table 1 for the 20 most frequent lexical items in the MQLWC), dispersion plots, concordances and collocations. The corpus can also be searched by the categories to which the ICRC assigns these documents, such as “victims of armed conflict”, “methods and means of warfare” and “criminal repression”. The dataset can also be downloaded for use in other programs, such as #LancsBox or Voyant Tools.
Table 1: Twenty most frequent lexical (content) items in the MQLWC
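For readers who want to work with the downloaded dataset outside the web interface, the two most basic operations mentioned above, a frequency list of content words and a KWIC (Key Word In Context) concordance, can be sketched in a few lines of Python. This is a minimal sketch, not how the Sydney Corpus Lab interface works: the two toy texts, the small stop-list and the function names are hypothetical, invented for illustration, and the tokenizer is a deliberately simple regular expression.

```python
import re
from collections import Counter

# Hypothetical toy texts invented for illustration; NOT actual MQLWC documents.
TEXTS = {
    "toy_text_a": "The civilian population shall be protected. "
                  "Attacks on civilian objects are prohibited.",
    "toy_text_b": "Military objectives may be attacked. "
                  "The civilian population enjoys general protection.",
}

# Hypothetical, very small stop-list of function words, so that the
# frequency list keeps only lexical (content) items.
STOPWORDS = {"the", "be", "shall", "on", "are", "may", "of", "a", "and", "to", "in"}


def tokenize(text):
    """Lower-cased word tokens from a simple regex (punctuation is dropped)."""
    return re.findall(r"[a-z]+", text.lower())


def frequency_list(texts, n=20):
    """Return the n most frequent non-stop-word tokens across all texts."""
    counts = Counter()
    for text in texts.values():
        counts.update(t for t in tokenize(text) if t not in STOPWORDS)
    return counts.most_common(n)


def kwic(texts, node, width=4):
    """Concordance lines showing `width` tokens either side of each hit."""
    lines = []
    for name, text in texts.items():
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok == node:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append(f"{name}: {left:>30} [{tok}] {right}")
    return lines


if __name__ == "__main__":
    for word, freq in frequency_list(TEXTS, n=5):
        print(word, freq)
    for line in kwic(TEXTS, "civilian"):
        print(line)
```

Real corpus tools add lemmatization, part-of-speech tagging and sorting of concordance lines by left or right context; the sketch only shows the core counting and windowing.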
Since each text is labelled with its year of adoption, diachronic questions (i.e., how do trends change over the nearly 170 years covered by the data?) can be asked. Figure 2, produced with Voyant Tools, compares “military*” and “civilian*” (the asterisk indicating that the search includes all related word forms, for example “civilian/s”). It shows both the relative predominance of “military” over “civilian” in the international law of war and that “civilian” becomes a concern of the international law of war only over time.
Figure 2: Comparison of “military*” and “civilian*” in the chronology of the international law of war
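A comparison like the one in Figure 2 rests on relative (normalized) frequencies: the hits for a wildcard query in each year’s texts divided by the total number of words in those texts, so that years with longer documents are not over-weighted. The sketch below assumes the texts have already been paired with their years of adoption; the (year, text) data, the per-10,000-words scaling and the helper names are hypothetical, and this is not Voyant Tools’ own implementation.

```python
import re
from collections import defaultdict

# Hypothetical (year, text) pairs invented for illustration; NOT MQLWC data.
DOCS = [
    (1899, "Military necessity and military operations."),
    (1949, "Civilians and the civilian population; military authorities."),
    (1977, "Civilian objects, civilians and military objectives."),
]


def wildcard_to_regex(pattern):
    """Turn a query such as 'civilian*' into a regex matching whole tokens
    that start with the stem ('civilian', 'civilians', ...)."""
    stem = re.escape(pattern.rstrip("*"))
    suffix = r"\w*" if pattern.endswith("*") else ""
    return re.compile(stem + suffix)


def relative_frequency_by_year(docs, pattern, per=10_000):
    """Hits per `per` words for each year, in chronological order."""
    regex = wildcard_to_regex(pattern)
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in docs:
        tokens = re.findall(r"[a-z]+", text.lower())
        totals[year] += len(tokens)
        hits[year] += sum(1 for t in tokens if regex.fullmatch(t))
    return {year: per * hits[year] / totals[year] for year in sorted(totals)}


if __name__ == "__main__":
    for query in ("military*", "civilian*"):
        print(query, relative_frequency_by_year(DOCS, query))
```

Plotting the two resulting series against year gives a trend line of the kind shown in Figure 2.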
Collocational searches show the words that typically accompany the keywords in these texts. Collocations are essential for understanding the meaning of a word and how it is used in a particular register. With a program like #LancsBox, we can visualize the collocates of a word and see whether it is connected to another keyword through shared collocates. Figure 3 compares two words, “violence” (left) and “war” (right). The diagram shows how distinct these two words are in this corpus: “war” is a clearly dominant concept, while “violence” is kept at a distance from “war”, a finding that echoes studies of other data. Despite what should be a logical association, we continue to use the word “war” in a way that shields it from the negative semantics of “violence”.
Figure 3: Collocations of “violence” and “war” in the MQLWC
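Collocation tools such as #LancsBox count the words that occur within a fixed window around a node word and rank them with an association measure. #LancsBox offers several such measures. The sketch below uses mutual information (MI), one common choice, with the expected frequency estimated as f(node) × f(collocate) × window span / corpus size. The toy text, the window of three words on each side and the function names are assumptions for illustration only, and the output says nothing about the MQLWC itself.

```python
import math
import re
from collections import Counter

# Hypothetical toy text invented for illustration; NOT MQLWC data.
TEXT = ("war crimes committed in war. acts of violence against civilians. "
        "the laws of war. sexual violence and violence to life.")
TOKENS = re.findall(r"[a-z]+", TEXT.lower())


def collocates(tokens, node, window=3):
    """Words within +/- `window` tokens of `node`, scored by MI (log2 O/E)."""
    n = len(tokens)
    freq = Counter(tokens)
    observed = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if j != i:
                observed[tokens[j]] += 1
    span = 2 * window  # number of positions inspected around each hit
    scores = {}
    for word, o in observed.items():
        expected = freq[node] * freq[word] * span / n
        scores[word] = math.log2(o / expected)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def shared_collocates(tokens, node_a, node_b, window=3):
    """Collocates common to both nodes (the links a collocation graph draws)."""
    a = {w for w, _ in collocates(tokens, node_a, window)}
    b = {w for w, _ in collocates(tokens, node_b, window)}
    return (a & b) - {node_a, node_b}


if __name__ == "__main__":
    for word, mi in collocates(TOKENS, "war"):
        print(f"{word:12} {mi:.2f}")
    print(shared_collocates(TOKENS, "war", "violence"))
```

In practice a minimum frequency threshold is usually applied before ranking, since MI tends to favour rare words that occur once near the node.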
To learn more, read our recently published article, in which we give examples of how this corpus can be used to understand the powerful role of the laws of war not only in containing geopolitical violence but also, very clearly, in enabling and legitimizing it.