Nearest Neighbor Analysis (QGIS3)

GIS is very useful for analyzing spatial relationships between features. One such analysis is finding which features are closest to a given feature. There are multiple ways to do this in QGIS. You can do a spatial join using the Join Attributes by Nearest tool, or get the distances to all features from another layer using the Distance Matrix tool from the Processing Toolbox. In this tutorial, we will explore a tool named Distance to nearest hub from the Processing Toolbox, which not only finds the distance to the closest feature but also draws a line to it for visualizing the results.

Overview of the task

Given the locations of all known earthquakes between the years 1900 and 2000, find the nearest populated place for each earthquake location.

Get the data

  1. NOAA’s National Geophysical Data Center produces a great dataset of all significant earthquakes since 2150 BC. For this tutorial, we will download the earthquakes recorded between 1900 and 2000. Visit the NOAA NCEI portal and enter Min as 1900 and Max as 2000. This will return all earthquake incidents that occurred and were recorded by NOAA between those years. For other specific results, you can filter with different parameters. Click Search.

  1. As a result, we got 2585 earthquake incidents. Click on the Download TSV icon.


Natural Earth has a nice Populated Places dataset. Download the simple (fewer columns) dataset.

For convenience, you may directly download a copy of both the datasets from the links below:




  1. Locate the downloaded file in the Browser panel and expand it. Drag the ne_10m_populated_places_simple.shp file to the canvas.

  1. You will see a new layer ne_10m_populated_places_simple loaded in the Layers panel. This layer contains the points representing populated places. Now we will load the earthquakes layer. This layer comes as a Tab Separated Values (TSV) text file. To load this file, click the Open Data Source Manager button on the Data Source Toolbar. You can also use the Ctrl + L keyboard shortcut.

  1. In the Data Source Manager dialog box, select Delimited Text.

  1. Click the ... button next to File name and browse to the downloaded earthquakes-2021-11-25_13-39-30_+0530.tsv file. Depending on your operating system, the file may not be listed in the download directory. If that is the case, switch the file filter to All files in the Choose a Delimited Text File to Open dialog. Once the file is opened, select Custom delimiters in the File format section, and check Tab. In the Geometry definition section, choose Point coordinates. By default, the X field and Y field values will be auto-populated with the appropriate fields from the input. In our case, they are Longitude and Latitude. You can leave the Geometry CRS at the default EPSG:4326 - WGS 84 CRS. If your file contains coordinates in a different CRS, select the appropriate CRS here. Click Add followed by Close.

  1. Zoom around and explore both datasets. Each red point represents the location of an earthquake, and each green point represents the location of a populated place. Our goal is to find the nearest point from the populated places layer for each of the points in the earthquake layer. Let's inspect the attribute table of the earthquakes layer. Select the layer and click the Open Attribute Table icon in the toolbar.

  1. There are 2586 features, but the data contains a few entries with no latitude or longitude information. We have to remove them before proceeding further. Close the Attribute Table.

  1. Go to Processing ‣ Toolbox ‣ Vector geometry ‣ Remove null geometries tool. Double-click to open it.

  1. In the Remove Null Geometries dialog box, select earthquakes-2021-11-25_13-39-30_+0530 as the Input layer and check the box Also remove empty geometries. Click Run. Once the processing finishes, click Close.

  1. A new layer Non null geometries will be added to the Layers panel. For analysis we will use this layer instead of the original layer. Un-check the earthquakes-2021-11-25_13-39-30_+0530 layer in the Layers panel to hide it. Select the Non null geometries layer and click the Open Attribute Table button from the Attributes Toolbar.

  1. You will see a lower count for total features as all rows with empty latitude and longitude values were removed. Close the attribute table.

  1. Now it is time to perform the nearest neighbor analysis. Search and locate the Processing ‣ Toolbox ‣ Vector analysis ‣ Distance to nearest hub (line to hub) tool. Double-click to launch it.



To get a point layer as output instead, use the Distance to nearest hub (points) tool.
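
To make concrete what the Distance to nearest hub tools compute, here is a minimal sketch in plain Python, outside QGIS. For each source point it picks the hub with the smallest distance and reports that hub's name and distance, much like the HubName and HubDist attributes in the tool's output. The function names, the sample coordinates, and the spherical haversine formula are all assumptions of this sketch; QGIS performs its own distance calculation, so its values may differ slightly.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_hub(point, hubs):
    """Return (hub_name, distance_km) for the hub closest to point.

    point is a (lon, lat) tuple; hubs is a list of (name, lon, lat) tuples.
    This is a brute-force scan over every hub.
    """
    lon, lat = point
    name, hub_lon, hub_lat = min(
        hubs, key=lambda h: haversine_km(lon, lat, h[1], h[2])
    )
    return name, haversine_km(lon, lat, hub_lon, hub_lat)


# Hypothetical sample data, not taken from the tutorial datasets.
hubs = [("Tokyo", 139.69, 35.69), ("Lima", -77.04, -12.05), ("Rome", 12.50, 41.90)]
quakes = [(142.37, 38.32), (-72.90, -16.27)]

for quake in quakes:
    hub_name, hub_dist = nearest_hub(quake, hubs)
    print(f"{quake} -> HubName={hub_name}, HubDist={hub_dist:.1f} km")
```

The brute-force scan is fine for a few thousand points; for much larger layers a spatial index would be the usual choice.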

  1. In the Distance to Nearest Hub (Line to Hub) dialog box, select Non null geometries as the Source points layer. Select ne_10m_populated_places_simple as the Destination hubs layer. Select name as the Hub layer name attribute. The tool will also compute the straight-line distance between each earthquake and its nearest populated place. Set Kilometers as the Measurement unit. Click the ... button next to Hub Distance and click Save to File... to save the output as earthquakes_with_nearest_city.gpkg. Click Run. Once the processing finishes, click Close.

  1. Back in the main QGIS window, you will see a new line layer called earthquakes_with_nearest_city loaded in the Layers panel. This layer has line features connecting each earthquake point to the nearest populated place. Select the earthquakes_with_nearest_city layer and click the Open Attribute Table icon in the toolbar.

  1. Scroll right to the last columns, and you will see 2 new attributes called HubName and HubDist added to the original earthquake features. These hold the name of, and the distance to, the nearest neighbor from the populated places layer.
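
The same two processing steps can, in principle, be scripted from the QGIS Python Console. The sketch below is an illustration under stated assumptions, not a verified recipe: the algorithm IDs, the parameter names, and the index used for Kilometers are assumed from common QGIS 3 usage and should be checked with processing.algorithmHelp() in your own installation. The helper function names are hypothetical.

```python
# Assumed algorithm IDs; confirm with processing.algorithmHelp("<id>") in QGIS.
REMOVE_NULL_ALG = "native:removenullgeometries"
HUB_LINES_ALG = "qgis:distancetonearesthublinetohub"
UNIT_KILOMETERS = 3  # assumed position of "Kilometers" in the tool's unit list


def remove_null_params(input_layer):
    """Parameters mirroring the Remove Null Geometries dialog in this tutorial."""
    return {
        "INPUT": input_layer,
        "REMOVE_EMPTY": True,  # the "Also remove empty geometries" checkbox
        "OUTPUT": "TEMPORARY_OUTPUT",
    }


def hub_line_params(source_points, hubs, output_path):
    """Parameters mirroring the Distance to Nearest Hub (Line to Hub) dialog."""
    return {
        "INPUT": source_points,
        "HUBS": hubs,
        "FIELD": "name",  # hub layer name attribute chosen in the tutorial
        "UNIT": UNIT_KILOMETERS,
        "OUTPUT": output_path,
    }


def run_workflow(quakes_layer, places_layer, output_path):
    """Run both steps; works only inside QGIS, where `processing` is importable."""
    import processing  # provided by QGIS, not installable from PyPI

    cleaned = processing.run(REMOVE_NULL_ALG, remove_null_params(quakes_layer))["OUTPUT"]
    result = processing.run(
        HUB_LINES_ALG, hub_line_params(cleaned, places_layer, output_path)
    )
    return result["OUTPUT"]
```

Inside QGIS, run_workflow would be called with the loaded earthquakes layer, the ne_10m_populated_places_simple layer, and a path such as earthquakes_with_nearest_city.gpkg. The parameter builders are kept separate so they can be inspected without QGIS.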

