LCAS Report
LCAS team
2023-09-04
This document provides a sample report for automated insights into the drivers and outcomes of crop production systems at landscape level, as part of the LCAS system.
Initial overview
First, we need to load the dataset of interest. Make sure it has been cleaned properly using the cleaning files.
We can then produce some key statistics to provide an overview of the dataset. The dataset contains data on rice from India for the years 2017 and 2018. The average landholding size was 1.15 ha and the average size of the largest plot was 0.238 ha. Data were collected in the following states/provinces: Bihar, UttarPradesh. The dataset contains 9,694 samples, distributed across the states/provinces as follows:
State/Province | % of samples |
---|---|
Bihar | 73.66 |
UttarPradesh | 26.34 |
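A share table like the one above can be derived from a simple frequency count. The following is a minimal sketch in base R; the toy data frame and the column name `state` are assumptions for illustration and are not taken from the LCAS data.

```r
# Toy data standing in for the cleaned dataset; `state` is a hypothetical
# column name and the rows are made up for illustration.
toy <- data.frame(
  state = c("Bihar", "Bihar", "Bihar", "UttarPradesh"),
  stringsAsFactors = FALSE
)

# Share of samples per state in percent, largest share first.
share <- sort(prop.table(table(toy$state)) * 100, decreasing = TRUE)

share_tbl <- data.frame(
  state          = names(share),
  pct_of_samples = round(as.numeric(share), 2),
  stringsAsFactors = FALSE
)
print(share_tbl)
```

With the real data, the same two steps would be applied to whichever column holds the state/province label.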
Showing the sample locations
Yield and crop variety patterns
The average yield across the dataset is 4.04 t ha⁻¹ with a standard deviation of 1.12 t ha⁻¹.
The mean yield and standard deviation per state/province in descending order:
State/Province | Mean yield (t ha⁻¹) | Standard deviation (t ha⁻¹) |
---|---|---|
UttarPradesh | 4.24 | 1.05 |
Bihar | 3.97 | 1.14 |
The same table per variety looks as follows:
Variety type | Mean yield (t ha⁻¹) | Standard deviation (t ha⁻¹) |
---|---|---|
Improved | 4.09 | 1.15 |
Hybrid | 3.95 | 1.03 |
Basmati | 3.30 | 1.11 |
Local | 3.26 | 1.26 |
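Both yield tables follow the same pattern: group the samples, compute the mean and standard deviation of yield, and sort by mean yield in descending order. A minimal base R sketch of that pattern is given below. The column names `variety_type` and `yield_lp_t_ha` are the ones used in the modelling code of this report; the helper `summarise_yield()` and the toy values are hypothetical.

```r
# Toy data for illustration only; the values are made up.
toy <- data.frame(
  variety_type  = c("Improved", "Improved", "Hybrid", "Hybrid", "Local", "Local"),
  yield_lp_t_ha = c(4.0, 4.4, 3.8, 4.0, 3.0, 3.4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: mean and SD of yield per group, highest mean first.
summarise_yield <- function(data, group_col, yield_col = "yield_lp_t_ha") {
  means <- tapply(data[[yield_col]], data[[group_col]], mean, na.rm = TRUE)
  sds   <- tapply(data[[yield_col]], data[[group_col]], sd, na.rm = TRUE)
  ord   <- order(means, decreasing = TRUE)
  data.frame(
    group      = names(means)[ord],
    mean_yield = round(as.numeric(means)[ord], 2),
    sd_yield   = round(as.numeric(sds)[ord], 2),
    stringsAsFactors = FALSE
  )
}

res <- summarise_yield(toy, "variety_type")
print(res)
```

Calling the same helper with a state/province column instead of `variety_type` would give the per-state table.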
The most common varieties grown are the following:
TODO: Need a script for cleaning variety names (lower case, no spaces, etc.).
Variety name | Share of samples (proportion) |
---|---|
MTU_7029 | 0.25 |
Arize6444Gold | 0.19 |
BPT5204 | 0.08 |
Sarju52 | 0.06 |
Moti | 0.05 |
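The variety names above mix different spellings (underscores, camel case, no separators), which is what the TODO refers to. One possible cleaning step is sketched below. It assumes that "lower case, no spaces" can be extended to dropping every non-alphanumeric character; the function name `clean_variety_name()` is hypothetical and this is not the project's actual cleaning script.

```r
# Hypothetical helper: normalise variety names so that spelling variants
# of the same variety collapse to one key.
clean_variety_name <- function(x) {
  x <- trimws(x)                    # drop leading/trailing whitespace
  x <- tolower(x)                   # lower case
  x <- gsub("[^a-z0-9]+", "", x)    # drop spaces, underscores, hyphens, punctuation
  x[!is.na(x) & x == ""] <- NA_character_  # treat empty names as missing
  x
}

raw <- c("MTU_7029", " mtu 7029", "MTU-7029", "Arize 6444 Gold")
cleaned <- clean_variety_name(raw)
print(cleaned)
# "mtu7029" "mtu7029" "mtu7029" "arize6444gold"
```

Dropping all separators is a deliberately aggressive choice: it merges more variants, but it could also merge names that differ only by punctuation, so the result is worth checking against a list of known varieties.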
Management practices
On average, rice crops were (trans)planted or directly sown on 12 July and harvested on 14 November, with 3.95 irrigation events and a total of 127.34 kg urea and 48.46 kg NPK applied per plot.
Irrigation
For the number of irrigation events, the basic statistics look as follows:
Mean: 3.95
Median: 3
CV: 0.6
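The same three statistics are reported for every management variable in this section. A minimal sketch of a helper that computes them is shown below. It assumes that CV means the coefficient of variation, taken as the sample standard deviation divided by the mean; the function name `basic_stats()` and the toy values are hypothetical.

```r
# Hypothetical helper: mean, median and coefficient of variation (sd / mean).
basic_stats <- function(x, digits = 2) {
  x <- x[!is.na(x)]  # ignore missing values
  c(
    mean   = round(mean(x), digits),
    median = round(median(x), digits),
    cv     = round(sd(x) / mean(x), digits)
  )
}

# Made-up irrigation counts, for illustration only.
irrigation_toy <- c(2, 3, 3, 4, 8)
stats <- basic_stats(irrigation_toy)
print(stats)
# mean 4, median 3, cv 0.59
```

A mean above the median, as in the toy example and in the irrigation figures above, indicates a right-skewed distribution with a few plots receiving many irrigation events.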
N rate
For the N rate, the basic statistics look as follows:
Mean: 127.34
Median: 129.31
CV: 0.28
P rate
For the P rate, the basic statistics look as follows:
Mean: 48.46
Median: 50.1
CV: 0.48
Farm gate price
For the farm gate price, the basic statistics look as follows:
Unit: INR / 100 kg
Mean: 1273.93
Median: 1250
CV: 0.17
Establishment date
For the establishment date, the basic statistics look as follows:
Unit: day of year
Mean: 193.07
Median: 193
CV: 0.07
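The establishment date is summarised as a day of year (the modelling code uses `transplanting_date_doy`), so a mean of about 193 corresponds to the 12 July quoted earlier for a non-leap year such as 2017 or 2018. The conversion can be sketched as follows; the helper name `doy_to_date()` is hypothetical, and the year has to be supplied because leap years shift the mapping by one day.

```r
# Hypothetical helper: convert a day of year (1 = 1 January) to a calendar date.
doy_to_date <- function(doy, year) {
  as.Date(paste0(year, "-01-01")) + (doy - 1)
}

est_date <- doy_to_date(193, 2017)
print(format(est_date, "%Y-%m-%d"))
# "2017-07-12"

# Round trip: calendar date back to day of year.
doy_back <- as.integer(format(est_date, "%j"))
```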
Yield drivers
We construct a random forest yield prediction model and show its key predictors, followed by a regression tree (CART). Note that this is not a robust model but an automatically generated first-pass overview that gives a glimpse of the variables most strongly associated with yield outcomes. Random forest models are generally robust in identifying some important variables, but tend to overlook others.
library(ranger)

# Keep the yield response and the candidate predictors
model_vars <- c("yield_lp_t_ha",
                "irrigation_times",
                "n_rate",
                "p_rate",
                "soil_texture",
                "transplanting_date_doy",
                "variety_type",
                "population_density",
                "agri_income_importance",
                "Year",
                "market_distance",
                "market_sale_share",
                "total_crop_cult_area_ha",
                "insecticide_applied",
                "avg_farm_gate_price",
                "manual_weeding_times",
                "lodging_perc")
df1 <- df[, names(df) %in% model_vars]

# Drop rows with missing values in any of the selected variables
df1 <- na.omit(df1)

# Random forest regression with permutation-based variable importance
fit <- ranger(yield_lp_t_ha ~ ., data = df1, importance = "permutation")
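library(vip)
library(ggpubr)  # theme_classic2() comes from ggpubr
library(randomForestExplainer)

# Variable importance plot showing up to 20 predictors
vip::vip(fit, num_features = 20) +
  theme_classic2()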
library(rpart)
library(rpart.plot)

# Optional yield classes for a classification tree (currently unused)
#df1$yield_class <- ifelse(df1$yield_lp_t_ha < 3.5, "low",
#                   ifelse(df1$yield_lp_t_ha < 5, "medium", "high"))

# Grow a regression tree, then prune it back to the complexity parameter
# with the lowest cross-validated error
fit.tree <- rpart(yield_lp_t_ha ~ ., data = df1, method = "anova", cp = 0.0035)
bestcp <- fit.tree$cptable[which.min(fit.tree$cptable[, "xerror"]), "CP"]
pruned.tree <- prune(fit.tree, cp = bestcp)
rpart.plot(pruned.tree)
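The tree above is pruned at the complexity parameter with the lowest cross-validated error. A common alternative, which is not what this report uses, is the one-standard-error rule: pick the simplest tree whose cross-validated error lies within one standard error of the minimum. The sketch below illustrates it on the built-in `mtcars` data as a stand-in for `df1`; the data set, the `cp` value and the seed are assumptions for illustration only.

```r
library(rpart)

set.seed(42)  # the cross-validated errors in the cp table depend on random folds

# mtcars is a built-in data set, used here only as a stand-in for df1.
full_tree <- rpart(mpg ~ ., data = mtcars, method = "anova", cp = 0.001)

cp_tab    <- full_tree$cptable
min_row   <- which.min(cp_tab[, "xerror"])
threshold <- cp_tab[min_row, "xerror"] + cp_tab[min_row, "xstd"]

# Rows are ordered from the simplest to the most complex tree, so the first
# row within the threshold is the simplest acceptable tree.
one_se_row  <- which(cp_tab[, "xerror"] <= threshold)[1]
one_se_tree <- prune(full_tree, cp = cp_tab[one_se_row, "CP"])

# Number of terminal nodes in the pruned tree.
n_leaves <- sum(one_se_tree$frame$var == "<leaf>")
```

The rule trades a small amount of cross-validated accuracy for a smaller tree, which can be easier to read in an overview report.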