Performing Table Joins (QGIS3)¶
Not every dataset you want to use comes in spatial format. Often the data would come as a table or a spreadsheet and you would need to link it with your existing spatial data for use in your analysis. This operation is known as a Table Join and is done using the
Join attributes by field value Processing algorithm.
Overview of the task¶
We will use a shapefile of census tracts for California and population data table from US Census Bureau to create a population density map for california.
Other skills you will learn¶
Loading CSV files that do not contain any geometry in QGIS.
Using DB Manager to perform SQL queries to calculate group statistics.
Get the data¶
Americal FactFinder is a repository of all census data for the US. You can use Advanced Search and query for the Topic - Basic Count/Estimate and Geographies - All Census Tracts in California to create a custom CSV and download it. This tutorial uses
TOTAL POPULATION | 2017 ACS 5-year estimates data.
For convenience, you may directly download a copy of both the datasets from the links below:
tl_2018_06_tract.zipfile in the QGIS Browser and expand it. Select the
tl_2018_06_tract.shpfile and drag it to the canvas.
You will see the layer
tl_2018_06_tractloaded in the Layers panel. This layer contains the boundaries of census tracts in California. Right-click on the
tl_2018_06_tractlayer and select Open Attribute Table.
Examine the attributes of the layer. To join a table with this layer, we need a unique and common attribute for each feature. In this case, the
GEOIDfield is a unique identifier for each tract and can be used to link this layer with any other layer or table containing the same ID.
ACS_17_5YR_B01003.zipfile and open the
ACS_17_5YR_B01003_with_ann.csvfile in a text editor. You will notice that each row of the file contains information about a tract along with the unique identifier we saw in the previous step. Note that this field is called
GEO.id2in the CSV. You will also note that the
HD01_VD01column has population value for each of the census tract.
Before importing this CSV file, we need to make a minor edit. QGIS CSV importer expects the first row of the file to contain the column headers and all remaining rows to contain the data for these columns. This file contains an extra row 2 with column labels. Delete that row and save the file.
Now we are ready to import the CSV file to QGIS. Go to.
In the Data Source Manager window, click the … button and browse to the CSV file and select it. Make sure you have selected File format as CSV (comma separated values). Since we are importing this as a table, we must specify that our file contains no geometry using the No geometry (attribute table only) option. Verify that Sample Data preview at the bottom looks fine and click Add followed by Close.
The CSV will now be imported as a table to QGIS and will appear as
ACS_17_5YR_B01003_with_annin the Layers panel. Now we are ready to create the table join. Go to .
First we need to change a default setting in the Processing Toolbox. Click the Options button.
In the Processing Options tab, check the Use filename as layer name option. When using algorithms from Processing Toolbox, this option makes the output layer names much more intuitive and useful. Click OK.
Back in the Processing Toolbox, search and locate the algorithm and double-click it to open it.
In the Join Attributes by Field Values dialog, select
tl_2018_06_tractas Input layer and
GEOIDas the Table field. Select
ACS_17_5YR_B01003_with_annas the Input layer 2 and
GEO.id2as the Table field 2. Leave other options to their default values and click the … button to select the output file location and select
Save to GeoPackage....
Name the output geopackage as
joined.gpkgand the output layer as
joined. Click Run.
Once the processing finishes, verify that the algorithm was successful and click Close.
You will see a new layer
joinedloaded in the Layers panel. At this point, the fields from the CSV file are joined with the census tracts layer. You can close the Processing Toolbox for now. Right-click on the
joinedlayer and select Open Attribute Table.
You will see a new set of fields, including the
HD01_VD01field containing population estimates.
Now that we have the population data in the census tracts layer, we can style it to create a visualization of population density distribution. Select the
joinedlayer and click the Open the Layer Styling Panel button.
In the Layer Styling panel, select
Graduatedfrom the drop-down menu. As we are looking to create a population density map, we want to assign different color to each census tract feature based on the population density. We have the population in the HD01_VD01 field, but we don’t have population density in any fields to select as the Value. Fortunately, QGIS allows us to input an expression here. Click Expression button.
When creating a thematic (choropleth) map such as this, it is important to normalize the values you are mapping. Mapping total counts per polygon is not correct. It is important to normalize the values dividing by the area. If you are displaying totals such as crime, you can normalize them by diving by total population, thus mapping crime rate and not crime. Learn more
Enter the following expression to calculate the population density.
$areacalculates the area of the feature in square meters. We then convert it to square miles and calculate the population density with the formula Population/Area. Click OK.
Back in the Layer Styling Panel, choose a color ramp of your choice and click Classify. You can adjust the class ranges to be more appropriate to the region.
The visualization feels a bit cluttered because of the polygon borders. Click on the dropdown next to Symbol. Select Simple fill and check Transparent stroke.
Now we have a nice looking information visualization of population density in California.