Msitu iko wapi? – 2

Posted on Jan 11, 2016

Classifying Landsat data is a well-proven approach to extracting land use classes from satellite spectral bands. This post in the series shows how you can use QGIS for this workflow, presents my results, and opens a first discussion of data and methodological issues.

Because the landscape is a highly fragmented mix of farms, adjacent trees, bare soil and forest, Landsat data can serve only as a coarse input for extracting the different land use classes. The spatial resolution of 15 to 60 meters, depending on band and satellite, is the major constraint on how finely you can differentiate your classes. For the classification of the research area I chose four classes which can easily be told apart:

  • Primary forest – This land use class is dominated by trees that have attained a great age and mostly form large areas of dense vegetation. I also assume it to be the most important land use class for this analysis, as it hints at a healthy environment and diversity.
  • Secondary forest – This class is usually close to areas of primary forest and consists of trees that have regrown. It can be distinguished from the primary forest class because it appears light green on the RGB composite.
  • Grassland – Within forest areas and also on the forest edges you can find grassland. These areas have the lightest green and a low NDVI compared to the forest classes.
  • Farmland – This is the most complicated class. Most farms in this region are quite small and sometimes partly covered by shade trees, and their vegetation cover varies with the growing seasons of the different crops, so this land use class spans a wide range of pixel values.
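The NDVI mentioned for the grassland class is the normalized difference between near-infrared and red reflectance. The following is a minimal sketch of that formula in Python. The function name and the reflectance values are illustrative assumptions, not measurements from the research area.

```python
def ndvi(nir, red):
    """Return the NDVI of one pixel from near-infrared and red reflectance."""
    total = nir + red
    if total == 0:
        return 0.0  # avoid division by zero, e.g. on no-data pixels
    return (nir - red) / total


# Hypothetical (NIR, red) reflectance pairs, chosen only for illustration
samples = {
    "dense forest": (0.45, 0.04),
    "grassland": (0.30, 0.10),
}

for name, (nir, red) in samples.items():
    print(f"{name}: NDVI = {ndvi(nir, red):.2f}")
```

Dense, healthy vegetation reflects strongly in the near infrared and absorbs red light, so it yields a higher NDVI than sparser cover.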

Below, you will find a slide show of the classes.



To classify Landsat data with QGIS, there is no way around the neat Semi-Automatic Classification Plugin developed by Luca Congedo. The tool is well maintained and handles Landsat as well as Sentinel-2 data. I acquired most of my knowledge for this project from Luca’s homepage, which offers multiple tutorials and detailed documentation on how to use the tool. The plugin can assist you all the way from downloading and processing the data to classification and validation.

For this first blog post, the following tasks had to be done:

  1. Download and prepare Landsat 5 and Landsat 8 data
  2. Create a training shapefile for regions of interest (ROIs)
  3. Classify images to extract the land use classes
  4. Discuss first results

I will not write about each of these tasks, as almost everything would just repeat the general documentation on Luca’s homepage. Instead, I will jump straight to my first results and conclusions.


First of all, it is important to mention that the QGIS classification plugin does a great job. Once I had got used to all the options, working with it was very comfortable, and I experienced almost no technical issues while classifying. Most of the current issues stem from the limits of what the Landsat data can offer. The research area in Kenya is quite fragmented. On the one hand, this makes it even more reasonable to use automatic classification tools to examine the land use change, but on the other hand it also means that the post-processing and validation will be much more demanding.

The first results, before any post-processing, are definitely acceptable and will provide a decent basis for further processing. I chose the Maximum Likelihood classification method because it produced the most convincing result compared with the Minimum Distance and Spectral Angle Mapping methods.
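To make the difference between two of these methods concrete, here is a minimal sketch of Minimum Distance and Spectral Angle Mapping for a single pixel. It is not the plugin's implementation. The function names, the two-band (red, NIR) spectra and the class means are hypothetical values chosen for illustration. Maximum Likelihood is left out because it also needs a covariance matrix per class.

```python
import math


def minimum_distance(pixel, class_means):
    """Return the class whose mean spectrum is closest in Euclidean distance."""
    return min(class_means, key=lambda name: math.dist(pixel, class_means[name]))


def spectral_angle(a, b):
    """Return the angle in radians between two spectra treated as vectors."""
    norm = math.hypot(*a) * math.hypot(*b)
    if norm == 0:
        return math.pi / 2  # undefined for an all-zero spectrum; treat as dissimilar
    dot = sum(x * y for x, y in zip(a, b))
    # Clamp against rounding errors before taking the arc cosine
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def spectral_angle_mapping(pixel, class_means):
    """Return the class whose mean spectrum forms the smallest angle with the pixel."""
    return min(class_means, key=lambda name: spectral_angle(pixel, class_means[name]))


# Hypothetical class means as (red, NIR) reflectance
class_means = {
    "forest": (0.04, 0.45),
    "farmland": (0.12, 0.30),
}

# A shaded forest pixel: same spectral shape as forest, but half as bright
shaded_pixel = (0.02, 0.225)

print(minimum_distance(shaded_pixel, class_means))
print(spectral_angle_mapping(shaded_pixel, class_means))
```

In this constructed case the two methods disagree: Minimum Distance assigns the darker pixel to farmland, while Spectral Angle Mapping ignores overall brightness and assigns it to forest.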


The first classifications reflect the extent of forest quite well. General assumptions can be visually confirmed: the Kakamega Forest seems to have been recovering since it was placed under protection. Grassland areas within the forest are decreasing and the east-west gap in the middle of the forest is almost closed. At the same time, both Nandi forest areas are losing acreage. You can see this especially in the western part of South Nandi Forest.
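Such visual impressions can be backed up by counting how many pixels moved from one class to another between two classified scenes. The following is a minimal sketch of such a change matrix. The function name and the tiny grids are hypothetical and do not come from the actual classification.

```python
from collections import Counter


def change_matrix(before, after):
    """Count pixels for each (class_before, class_after) pair of two equal-sized grids."""
    counts = Counter()
    for row_before, row_after in zip(before, after):
        for old, new in zip(row_before, row_after):
            counts[(old, new)] += 1
    return counts


# Hypothetical 2x2 classified scenes, for illustration only
earlier = [["grassland", "forest"],
           ["forest", "forest"]]
later = [["forest", "forest"],
         ["forest", "farmland"]]

for (old, new), n in sorted(change_matrix(earlier, later).items()):
    print(f"{old} -> {new}: {n} pixel(s)")
```

Multiplying each count by the pixel area would turn these numbers into acreage gained or lost per class.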

Regarding the quality of the classification, you can see that there is still some work to be done. Especially in the 2015 scene, a lot of small areas were classified as “primary forest”, although they are simply too small to count as forest. These patches are usually groups of shade trees within the farmland and should be reclassified. The same applies to the grassland pixels in obvious farmland. For the final examination I only want to retain the grassland areas within or adjacent to the forest, as these can reasonably be counted as natural land use. Lastly, we can also observe quite noisy changes between primary and secondary forest in both pictures. I will therefore optimize these results in the following blog post and also discuss meaningful approaches to validating the classification.
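One possible way to reclassify such small patches is a sieve-like filter: find connected patches of one class and relabel those below a size threshold. The sketch below shows the idea on a tiny grid. It is not the plugin's post-processing tool, and the function name, class codes and threshold are assumptions made for illustration.

```python
from collections import deque


def reclassify_small_patches(grid, target, replacement, min_pixels):
    """Return a copy of grid in which 4-connected patches of `target`
    smaller than `min_pixels` are relabelled as `replacement`."""
    rows, cols = len(grid), len(grid[0])
    result = [row[:] for row in grid]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or grid[r][c] != target:
                continue
            # Breadth-first search collects one connected patch
            patch = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                patch.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == target):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(patch) < min_pixels:
                for y, x in patch:
                    result[y][x] = replacement
    return result


# Hypothetical class codes: 1 = primary forest, 4 = farmland
PRIMARY, FARM = 1, 4
classified = [
    [1, 1, 1, 4],
    [1, 1, 4, 4],
    [4, 4, 4, 1],
    [4, 4, 4, 4],
]

cleaned = reclassify_small_patches(classified, PRIMARY, FARM, min_pixels=3)
for row in cleaned:
    print(row)
```

The large forest block in the upper left is kept, while the isolated forest pixel inside the farmland is relabelled. A real sieve would typically assign the class of the largest neighbouring patch instead of a fixed replacement.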
