TY - JOUR
T1 - Combined analysis of satellite and ground data for winter wheat yield forecasting
AU - Broms, Camilla
AU - Nilsson, Mikael
AU - Oxenstierna, Andreas
AU - Sopasakis, Alexandros
AU - Åström, Karl
N1 - Publisher Copyright: © 2022 The Authors
PY - 2023/2
Y1 - 2023/2
AB - We built machine learning and image analysis tools to forecast winter wheat yield from a rich multidimensional tensor of agricultural information spanning different scales. This information consists of multi-band satellite images, local soil samples obtained from national databases, local weather data, and field data from 23 farms cultivating winter wheat in southern Sweden. This is inherently a large multi-scale problem because of the large temporal and spatial variation of the input data. We aggregate the data into spatially averaged features over grids that temporally span the season from seeding to harvest. Satellite images are cleaned by interpolation to compensate for cloud obstructions. Furthermore, the data are heavily imbalanced, since the amount of satellite information far exceeds that of the ground data. Data variance can therefore be an issue, which we counter by using a decision tree approach. We find that the Light Gradient Boosting decision tree trained on 262 input features is able to predict winter wheat yield with 82% accuracy. Subsequently, we employ game theory to better understand the relational importance of specific input features for forecasting yield. Specifically, we find that some of the most important features for the resulting predictions are the percent clay and magnesium in the soil. Similarly, the most important features from the satellite data are: a) the NORM index (Euclidean distance of all bands) computed in the second week of April, b) the NORM index computed in the middle of May, and c) the second spectral band from the last week of June.
KW - Decision trees
KW - Relational importance
KW - Satellite
KW - Shapley values
KW - Soil samples
KW - Winter wheat yield
UR - https://www.scopus.com/pages/publications/85148379558
U2 - 10.1016/j.atech.2022.100107
DO - 10.1016/j.atech.2022.100107
M3 - Article
AN - SCOPUS:85148379558
SN - 2772-3755
VL - 3
JO - Smart Agricultural Technology
JF - Smart Agricultural Technology
M1 - 100107
ER -