Introduction:

Clustering Methods For Identification of Australian Wars and Resistance

This is the introduction to a collection of Jupyter notebooks for differentiating and identifying Australian Wars and Resistance up to the 1930s, using data from the Colonial Frontier Massacres in Australia, 1788-1930 project (Ryan et al., 2025).

STDB Clustering AWR_STDBClustering.html
KNN Clustering AWR_STKNNClustering.html
Identified Map AWR_IdentifiedMap.html

The first two apply two different clustering methods to help identify clusters. Use these if you want to see the working, or adjust parameters and interact with the maps for your own research.

The last shows the proposed identification of Australian Wars, including their stages, the regions of conflict, and periods, with adjustments informed by historical and Indigenous knowledge. Use this if you just want to see and interact with the results of this research. These results are only preliminary and will be adjusted in future, so it is important to read the following to understand their limitations.

This doesn't include 2 broad regions of maritime conflict related to:

i) sealing and whaling in the south (including Twofold Bay, Bass Strait, Albany and the south west)

ii) pearling and trepanging in the north (including northern WA, NT and NQLD).

The colonial frontier massacre data can easily be downloaded and visualised at TLCMap (https://tlcmap.org/layers/1336). A long-term archive has also been created with the Australian Data Archive (https://doi.org/10.26193/L0WEID). The original website on which the data was developed and presented is https://c21ch.newcastle.edu.au/colonialmassacres/

These notebooks use Kaine Usher's GeoJikuu Python library for spatio-temporal clustering.

Clustering

Clustering is a way to automatically find and separate groups of things that are close together out of a set of points. There are many different methods of clustering. Here we use only two basic methods: distance-based clustering and k-nearest neighbour.

What is Spatio-temporal Distance Based Clustering (STDB)?

Spatial distance based clustering uses distance in space to do this. For example, if you set a distance threshold of 10km, then any point within 10km of another point will be in the same cluster. If a point is more than 10km away from every point in a cluster it is not in that cluster, and may be part of a different cluster of points. These points connect up: if you start with point A, and B is within 10km of A, and C is within 10km of B, then A, B and C are all in the same cluster, even if point C is 15km away from A. If point D is more than 10km away from all those points, but is within 10km of point E, then D and E form a separate cluster.
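The chaining described above can be sketched in a few lines of plain Python. This is only an illustration, not the GeoJikuu implementation: the function name `distance_clusters`, the made-up coordinates (flat x, y positions in km) and the choice to treat "within" as "less than or equal to" are all assumptions.

```python
from math import hypot

def distance_clusters(points, threshold_km):
    """Group named points so that any two points within threshold_km are
    linked, and linked points chain together into a single cluster."""
    names = list(points)
    unvisited = set(names)
    clusters = []
    for start in names:
        if start not in unvisited:
            continue  # already absorbed into an earlier cluster
        unvisited.remove(start)
        cluster, frontier = [start], [start]
        while frontier:
            cx, cy = points[frontier.pop()]
            # Every unvisited point within the threshold joins the cluster
            # and is later used as a stepping stone itself.
            near = [n for n in unvisited
                    if hypot(points[n][0] - cx, points[n][1] - cy) <= threshold_km]
            for n in near:
                unvisited.remove(n)
                cluster.append(n)
                frontier.append(n)
        clusters.append(sorted(cluster))
    return clusters

# Hypothetical positions in km along a line: C is 15km from A but 7km from B.
points = {"A": (0, 0), "B": (8, 0), "C": (15, 0), "D": (40, 0), "E": (47, 0)}
clusters = distance_clusters(points, 10)
print(clusters)
```

With these made-up positions, A, B and C chain into one cluster and D and E form another, as in the example in the text.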

Temporal clustering uses distance in time. For example, if the threshold is set at 1 year, then two points within a year of each other will be in the same cluster.

Spatio-temporal clustering uses distance in both place and time. For example, a point within 10km AND within 1 year of another point will be in the same cluster. A point more than 10km away OR more than 1 year away would be in a different cluster.
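A minimal sketch of the AND rule, assuming each event is stored as (x in km, y in km, year). The names `linked` and `st_clusters` and the sample events are hypothetical, chosen only to illustrate the idea; they are not taken from GeoJikuu.

```python
from math import hypot

def linked(a, b, max_km, max_years):
    """Two events are linked only if they are close in space AND in time."""
    (ax, ay, at), (bx, by, bt) = a, b
    return hypot(ax - bx, ay - by) <= max_km and abs(at - bt) <= max_years

def st_clusters(events, max_km, max_years):
    """Return {event name: cluster label}, chaining linked events together."""
    labels = {}
    next_label = 0
    for name in events:
        if name in labels:
            continue
        labels[name] = next_label
        stack = [name]
        while stack:
            current = stack.pop()
            for other in events:
                if other not in labels and linked(events[current], events[other],
                                                  max_km, max_years):
                    labels[other] = next_label
                    stack.append(other)
        next_label += 1
    return labels

# Made-up events: R is close to P and Q in space but years later;
# S is close in time but far away.
events = {
    "P": (0, 0, 1840.0),
    "Q": (5, 0, 1840.5),
    "R": (5, 5, 1843.0),
    "S": (50, 0, 1840.2),
}
labels = st_clusters(events, max_km=10, max_years=1)
print(labels)
```

Only P and Q share a cluster here: R fails the time threshold and S fails the distance threshold.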

Spatio-temporal distance based clustering helps us identify events that are close together in time and space. This demonstrates an intensity in that time and place.

What is 'near' and 'far' in arid, flat areas with low population density is different to what is 'near' and 'far' in fertile, mountainous or highly populated areas. In some places it is more normal to travel for longer periods of time over greater distances than in others. For this reason, rather than consider a single threshold for distance in space and time, we need to look at a range of thresholds. These range from one extreme, with very large thresholds where almost all points are in a single cluster, to the other, with very small thresholds where almost all points are in a cluster consisting of only one or two points. Somewhere in between these extremes are thresholds that show informative and useful groupings. Larger thresholds are more informative for arid areas, and smaller thresholds are needed to show detail in more fertile areas.
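The effect of sweeping through a range of thresholds can be sketched by counting clusters at each setting. This is an illustrative sketch only: it uses spatial distance alone for brevity (a time threshold could be swept in the same way), and the function name `count_clusters` and the sample positions are assumptions.

```python
from math import hypot

def count_clusters(points, threshold_km):
    """Count distance-based clusters using a small union-find structure."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (xi, yi) in enumerate(points):
        for j in range(i + 1, len(points)):
            xj, yj = points[j]
            if hypot(xi - xj, yi - yj) <= threshold_km:
                parent[find(i)] = find(j)  # merge the two clusters
    return len({find(i) for i in range(len(points))})

# Made-up positions in km: a tight group, a pair, and one remote point.
points = [(0, 0), (2, 0), (4, 0), (30, 0), (33, 0), (100, 0)]
sweep = {t: count_clusters(points, t) for t in (1, 5, 50, 100)}
for threshold, n in sweep.items():
    print(f"threshold {threshold:>3} km -> {n} cluster(s)")
```

At the smallest threshold every point stands alone, at the largest everything merges into one cluster, and the informative groupings appear in between.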

What is K-Nearest Neighbour (KNN)?

KNN starts with one point. You can set 'k' to any number. If 'k' is set to 1 it finds the 1 place that is closest to that point. If 'k' is 2 it finds the 2 closest places. Then it takes those places and finds the nearest 'k' places to them, and so on. If the nearest place isn't already in the cluster it is added to it. The cluster can grow in a chain. For example, the closest place to A is B, so they are in the same cluster. C is closer to B than A is, so C is added and the cluster becomes A, B and C. If the closest place to C is B, then the chain ends and that is the cluster: A, B, C. Then a place that is not in a cluster is selected as the start of another cluster, and so on, until separate clusters have all been identified.
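One way to sketch this chaining is to link every point to its k nearest neighbours and then read off the connected groups. This is a simplified illustration under stated assumptions, not the notebooks' actual code: the name `knn_clusters` is hypothetical, distances are plain spatial distances in km, and links are treated as two-way so that the result does not depend on which point is chosen as the start.

```python
from math import hypot

def knn_clusters(points, k=1):
    """Link each point to its k nearest neighbours, then return the
    connected groups of points as sorted lists of names."""
    names = list(points)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for n in names:
        nx, ny = points[n]
        others = sorted(
            (m for m in names if m != n),
            key=lambda m: hypot(points[m][0] - nx, points[m][1] - ny),
        )
        for m in others[:k]:
            parent[find(n)] = find(m)  # n and its neighbour share a cluster

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

# Made-up positions in km: A's nearest place is B, B's is C, C's is B.
points = {"A": (0, 0), "B": (10, 0), "C": (16, 0), "D": (100, 0), "E": (103, 0)}
print(knn_clusters(points, k=1))
print(knn_clusters(points, k=2))
```

With k set to 1 the chain A, B, C forms one cluster and D, E another. With k set to 2, D's second nearest place is C, so everything joins into one cluster, which shows how sensitive the result is to the choice of k.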

KNN is a simple method, and can be useful in geospatial clustering over large spaces because it doesn't depend on specific distances in kilometres like some other clustering methods do. The sense of things being very 'far' is different in arid areas to fertile areas, and mountainous areas to plains, etc. If we simply look for the nearest place, we can avoid this problem, at least to some extent.

This particular version of KNN is spatio-temporal. It looks at the nearest neighbour in both space and time. If another point is very close in kilometres but happened long afterwards, it is not as close as one a little further away that happened soon afterwards.
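To compare 'nearness' in space and time together, some way of weighing years against kilometres is needed. The sketch below assumes a simple scaling factor, here called `km_per_year`, that converts a gap in years into an equivalent distance. This factor, the function names and the sample events are hypothetical and are not taken from GeoJikuu, which may combine space and time differently.

```python
from math import hypot

def st_distance(a, b, km_per_year):
    """Combined distance between two events given as (x km, y km, year).
    km_per_year is an assumed weighting of time against space."""
    (ax, ay, at), (bx, by, bt) = a, b
    space_km = hypot(ax - bx, ay - by)
    time_km = abs(at - bt) * km_per_year
    return hypot(space_km, time_km)

def nearest(origin, candidates, km_per_year):
    """Name of the candidate event closest to origin in space and time."""
    return min(candidates,
               key=lambda name: st_distance(origin, candidates[name], km_per_year))

origin = (0, 0, 1850)
candidates = {
    "X": (5, 0, 1870),   # very close in km, but 20 years later
    "Y": (30, 0, 1851),  # further away, but only 1 year later
}
print(nearest(origin, candidates, km_per_year=10))  # time counts
print(nearest(origin, candidates, km_per_year=0))   # time ignored
```

When time is given weight, Y is the nearest neighbour even though X is closer in kilometres. When time is ignored, X is nearest.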

What does this tell us?

There has long been a call to recognise the Australian Wars and Resistance. These methods can help answer the questions 'What Wars? When and where?'

From 1788 to around 1930 open warfare occurred across parts of the continent as colonisation spread. This frontier warfare was often unofficial and was fought as guerrilla warfare and resistance. These were 'small wars' (in Spanish: 'guerrilla') but nonetheless meet any dictionary definition of war, and accepted theories of war such as those of Clausewitz or the Australian Army. There was open violent conflict between two groups of people over land, resources and the ability to exist as a people. It's hard to deny that this is war.

Colonisation, violence and resistance continue to this day, and some say the war hasn't ended. The distinction made here is between this phase of history, when the colonial or Australian government doesn't effectively control a region and there is open violence between groups of people, and the following 'mission phase', when the government controls access to land and resources and individuals' lives very closely. This and other later phases of history involved changing forms of colonisation and violence.

This notebook was created to help distinguish one Australian War from another by regarding an intensity of open violence, represented by massacres, in a place and time as an indicator of a war. Separation in space and time, as indicated by 'clusters', suggests a distinction between one war and another. We can at least say that between certain dates, in this region, there was an intense period of open violence.

It is important to understand that clustering methods will identify clusters in any information, even completely random points, if you give them the right settings. This computational method is not the same as hypothesis testing, where we take a theory and test it by gathering data and measuring correlations, confidence intervals, etc. The purpose of using these clustering methods is twofold.
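The point about random data can be checked with a small experiment: scatter points uniformly at random and run distance-based clustering on them. The sketch below is illustrative only. The 100km square, the 300 points, the 10km threshold and the function name `cluster_sizes` are arbitrary assumptions.

```python
import random
from math import hypot

def cluster_sizes(points, threshold_km):
    """Sizes of distance-based clusters, largest first."""
    unvisited = set(range(len(points)))
    sizes = []
    while unvisited:
        stack = [unvisited.pop()]  # start a new cluster from any point
        size = 1
        while stack:
            cx, cy = points[stack.pop()]
            near = [i for i in unvisited
                    if hypot(points[i][0] - cx, points[i][1] - cy) <= threshold_km]
            unvisited.difference_update(near)
            stack.extend(near)
            size += len(near)
        sizes.append(size)
    return sorted(sizes, reverse=True)

# 300 points placed completely at random in a 100km x 100km square.
random.seed(0)
random_points = [(random.uniform(0, 100), random.uniform(0, 100))
                 for _ in range(300)]
sizes = cluster_sizes(random_points, 10)
print("number of clusters:", len(sizes))
print("largest cluster sizes:", sizes[:5])
```

Even though nothing connects these points, the method still reports clusters of several points. This is why the clusters are treated as an interpretative aid to be checked against other knowledge, rather than as proof in themselves.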

Firstly, a cluster does show us an intensity of massacres within a certain place and time, such as 9 massacres in 2 years. We don't need a computer to tell us that this is an intense and localised amount of violence. This indicates a war.

Secondly, the aim is to differentiate specific wars in specific regions and times out of the overall spread of warfare across the continent. Clustering is an interpretative aid for doing this, helping us see patterns and summaries that we might not perceive in a large number of points, or by reading descriptions of all of them one by one. Automatically identified clusters need to be checked against history and Indigenous knowledge to confirm they make sense according to the wider pattern of events.

That clustering is effective is demonstrated by: i) identifying wars that we are already aware of as distinct wars, such as the Eumeralla War, the Wiradjuri homeland war, or the Bunuba Resistance and ii) identifying wars we weren't previously aware of as distinct wars which make sense when we scrutinise the series of events that took place.

Following the clustering process there has been some slight augmentation of the data to properly recognise some known regions of war: the Tiwi Islands, the Eyre Peninsula, and the Flinders Ranges. Only one or two known massacres are recorded in the data for these regions, but they are already recognised as specific regions of warfare at particular times. At least 3 sites are needed to form a polygon so that a war can be shown on the map. Other incidents have therefore been selected and added to the data so that these wars can be represented. These are violent incidents that would be included as a result of further research.

There are some outliers in the data, such as the massacre of Macassans in the Northern Territory and a massacre in the south east in 1900, much later than other massacres in the area. These have been excluded from clusters.

Massacres are only part of the story of Australian Wars. There were many other violent incidents. Almost all massacres were by colonists against Aboriginal people. These were not evenly matched wars like wars in Europe or Asia, where, for example, an army of soldiers, cavalry and artillery lines up against a similar army on the other side. These were asymmetrical wars, and the two sides could hardly be more different: from opposite sides of the world, with little knowledge of each other, and with radically different cultures, economies, technologies and knowledge of local lands and waters. Each side had different tactics and strategy. Massacre was a strategy of occupying powers against insurgency. The British manual on colonial warfare 'Small Wars' by Callwell explains the use of massacre, including of women and children, and attacks on food sources, to compel guerrillas into open combat, so that they may then be massacred in a decisive confrontation. The strategy of defending guerrilla warriors focuses more on raiding, stealth, evasion and survival. By using only massacre data we are not including Indigenous actions in the analysis. Data on that is not yet available. The project 'The South Australian Frontier and its Legacies' includes many incidents of violence beyond massacres, but is limited to South Australia. It is feasible to map all such incidents across the continent, but it would be a very large project.

This is only the first step: to identify a minimum kernel of each war as indicated by massacres. This is at least sufficient to demonstrate a war. By identifying specific wars we can better research and acknowledge them. Each war identified requires further research to understand in more detail the flow of events, how it started and ended, and who was involved. The first step asks What wars? When? Where? so that we can move on to Who, How and Why?

40 wars (including maritime conflict) have been identified. There should be at least 40 collaborative research projects, 40 books, and 40 documentaries and other productions about these wars and resistance.