
LCAS Report



This document provides a sample report of automated insights into the drivers and outcomes of crop production systems at the landscape level, as part of the LCAS system.

Initial overview

First, we need to load the dataset of interest. Make sure it has been cleaned properly using the cleaning scripts.
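A minimal R sketch of this step, assuming the cleaned data were exported to a CSV file. The helper `load_lcas()` and the file used in the usage lines are hypothetical; the required column names are taken from the yield model later in this report. The check stops early if expected columns are missing, which usually means the cleaning scripts were not run.

```r
# Hypothetical helper: read a cleaned LCAS CSV and check for expected columns.
load_lcas <- function(path,
                      required = c("yield_lp_t_ha", "n_rate", "p_rate",
                                   "irrigation_times", "variety_type")) {
  df <- read.csv(path, stringsAsFactors = FALSE)
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns (were the cleaning scripts run?): ",
         paste(missing_cols, collapse = ", "))
  }
  df
}

# Usage with a throwaway file; replace tmp with the real cleaned file (assumed CSV).
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(yield_lp_t_ha = 4.1, n_rate = 120, p_rate = 50,
                     irrigation_times = 3, variety_type = "Hybrid"),
          tmp, row.names = FALSE)
df <- load_lcas(tmp)
```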

We can then produce some key statistics to provide an overview of the dataset. The dataset contains rice data from India for the years 2017 and 2018, collected in the states of Bihar and Uttar Pradesh. The average landholding size was 1.15 ha and the average size of the largest plot was 0.238 ha. The dataset contains 9,694 samples, distributed across states/provinces as follows:

State/Province | % of samples
Bihar | 73.66
Uttar Pradesh | 26.34
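The percentages in this table can be reproduced along these lines. This is a sketch only: the source does not name the state column, so `"state"` below is a hypothetical placeholder, and the toy data are made up.

```r
# Percentage of samples per state/province, largest share first.
# The default column name "state" is a hypothetical placeholder.
sample_share <- function(df, group_col = "state") {
  counts <- table(df[[group_col]])
  shares <- sort(round(100 * prop.table(counts), 2), decreasing = TRUE)
  data.frame(group = names(shares), pct_samples = as.numeric(shares),
             row.names = NULL, stringsAsFactors = FALSE)
}

toy <- data.frame(state = c("Bihar", "Bihar", "Bihar", "UttarPradesh"))
sample_share(toy)
```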

Showing the sample locations

Yield and crop variety patterns

The average yield across the dataset is 4.04 t ha⁻¹, with a standard deviation of 1.12 t ha⁻¹.

Mean yield and standard deviation per state/province, in descending order of mean yield:

State/Province | Mean yield (t ha⁻¹) | Standard deviation (t ha⁻¹)
Uttar Pradesh | 4.24 | 1.05
Bihar | 3.97 | 1.14

The same statistics per variety type:

Variety type | Mean yield (t ha⁻¹) | Standard deviation (t ha⁻¹)
Improved | 4.09 | 1.15
Hybrid | 3.95 | 1.03
Basmati | 3.30 | 1.11
Local | 3.26 | 1.26
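Both tables above follow the same recipe: group the yields, compute mean and standard deviation, and sort by mean. A minimal R sketch, assuming the yield column `yield_lp_t_ha` and the `variety_type` column used in the yield model below; for the per-state table, the state column name (not given in the source) would be passed instead. The helper name and toy data are hypothetical.

```r
# Mean and standard deviation of yield per group, highest mean first.
yield_by_group <- function(df, group_col, yield_col = "yield_lp_t_ha") {
  y <- df[[yield_col]]
  g <- df[[group_col]]
  means <- tapply(y, g, mean, na.rm = TRUE)
  sds <- tapply(y, g, sd, na.rm = TRUE)
  out <- data.frame(group = names(means),
                    mean_yield = round(as.numeric(means), 2),
                    sd_yield = round(as.numeric(sds), 2),
                    stringsAsFactors = FALSE)
  out <- out[order(out$mean_yield, decreasing = TRUE), ]
  rownames(out) <- NULL
  out
}

toy <- data.frame(variety_type = c("Local", "Local", "Hybrid", "Hybrid"),
                  yield_lp_t_ha = c(3, 5, 4, 6))
yield_by_group(toy, "variety_type")
```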

The most common varieties grown are the following:

TODO: Add a script to clean variety names (convert to lower case, remove spaces, etc.).

Variety name | % of samples
MTU_7029 | 0.25
Arize6444Gold | 0.19
BPT5204 | 0.08
Sarju52 | 0.06
Moti | 0.05
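One possible way to address the TODO above is sketched below. It is an assumption, not the project's cleaning script: it lower-cases names and strips every character that is not an ASCII letter or digit, so that variants such as "MTU_7029" and "MTU 7029" are counted together. The helper name is hypothetical, and note that this also drops underscores and any non-ASCII letters.

```r
# Hypothetical helper: normalise variety names so spelling variants match.
clean_variety_name <- function(x) {
  x <- tolower(trimws(x))
  # Remove everything that is not a lower-case ASCII letter or digit
  # (spaces, underscores, hyphens, dots, ...).
  x <- gsub("[^a-z0-9]", "", x)
  # Names that become empty are treated as missing.
  x[!is.na(x) & x == ""] <- NA
  x
}

raw <- c("MTU_7029", "MTU 7029", "mtu-7029", "Arize6444Gold", " ")
cleaned <- clean_variety_name(raw)
# Share of samples per cleaned name (NAs excluded), most common first
sort(prop.table(table(cleaned)), decreasing = TRUE)
```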
The yield distribution looks as follows:

Management practices

On average, rice crops were (trans)planted or directly sown on 12 July and harvested on 14 November, with 3.95 irrigation events and a total of 127.34 kg urea and 48.46 kg NPK applied per plot.

Irrigation

For the number of irrigation events, the basic statistics look as follows:

Mean: 3.95

Median: 3

CV: 0.6
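The statistics in this and the following subsections are the mean, the median and the coefficient of variation (CV = standard deviation / mean). A minimal R sketch, assuming the CV is based on the sample standard deviation returned by `sd()` and that missing values are dropped; the helper name is hypothetical.

```r
# Mean, median and coefficient of variation (CV = SD / mean) of a vector.
basic_stats <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  c(mean = round(m, 2),
    median = median(x),
    cv = round(sd(x) / m, 2))
}

basic_stats(c(2, 4, 6, NA))
# On the report data this would be, e.g., basic_stats(df$irrigation_times)
```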

N rate

For the N rate, the basic statistics look as follows:

Mean: 127.34

Median: 129.31

CV: 0.28

P rate

For the P rate, the basic statistics look as follows:

Mean: 48.46

Median: 50.1

CV: 0.48

Farm gate price

For the farm-gate price, the basic statistics look as follows:

Unit: INR / 100 kg

Mean: 1273.93

Median: 1250

CV: 0.17

Establishment date

For the establishment date (day of year), the basic statistics look as follows:

Mean: 193.07

Median: 193

CV: 0.07
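The establishment date is expressed as a day of year (1 = 1 January), so the mean of 193.07 corresponds to the 12 July quoted in the management summary. A small R sketch of the conversion; the helper name is hypothetical. In a leap year the same day number falls one calendar day earlier.

```r
# Convert a day of year (1 = 1 January) to a calendar date.
doy_to_date <- function(doy, year) {
  as.Date(round(doy) - 1, origin = paste0(year, "-01-01"))
}

doy_to_date(193.07, 2017)
```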

Agriculture income share

For the agriculture income share, the basic statistics look as follows:

Mean: 37.46

Median: 40

CV: 0.66

Market sales share

For the market sales share, the basic statistics look as follows:

Mean: 32.35

Median: 30

CV: 0.84

Yield drivers

We construct a random forest yield prediction model, show its key predictors, and then fit a regression tree (CART). Note that this is not a robust model but an automatically generated, first-pass overview to glimpse the variables that are strongly associated with yield outcomes. Random forest models are generally good at identifying some important variables but can overlook others.

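# Packages: ranger (random forest), vip (importance plot),
# ggplot2 (plot theme), rpart and rpart.plot (regression tree)
library(ranger)
library(vip)
library(ggplot2)
library(rpart)
library(rpart.plot)

# Response (yield_lp_t_ha) plus candidate predictors
df1 <- df[,names(df) %in% c("yield_lp_t_ha",
                            "irrigation_times",
                            "n_rate", 
                            "p_rate" , 
                            "soil_texture",
                            "transplanting_date_doy" , 
                            "variety_type" , 
                            "population_density" , 
                            "agri_income_importance" ,
                            "Year" , 
                            "market_distance",
                            "market_sale_share",
                            "total_crop_cult_area_ha",
                            "insecticide_applied",
                            "avg_farm_gate_price",
                            "manual_weeding_times",
                            "lodging_perc"
                            )]
# ranger does not accept missing values, so keep complete cases only
df1 <- na.omit(df1)
# Convert text columns (e.g. variety_type, soil_texture) to factors
df1[] <- lapply(df1, function(x) if (is.character(x)) factor(x) else x)

# Random forest with permutation-based variable importance
fit <- ranger(yield_lp_t_ha ~ ., data = df1, importance = "permutation")
# Plot the (up to) 20 most important predictors
vip::vip(fit, num_features = 20) +
  theme_classic()

# Optional yield classes: low < 3.5, medium 3.5 to < 5, high >= 5 t/ha.
# Do not leave yield_class in df1 when fitting the regression tree below,
# since it is derived from the response.
#df1$yield_class <- cut(df1$yield_lp_t_ha, breaks = c(-Inf, 3.5, 5, Inf),
#                       labels = c("low", "medium", "high"), right = FALSE)

# Regression tree: grow with a small complexity parameter (cp), then prune
# back to the cp with the lowest cross-validated error (xerror)
fit.tree <- rpart(yield_lp_t_ha ~ ., data = df1, method = "anova", cp = 0.0035)
bestcp <- fit.tree$cptable[which.min(fit.tree$cptable[, "xerror"]), "CP"]
pruned.tree <- prune(fit.tree, cp = bestcp)
rpart.plot(pruned.tree)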
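Because the forest is a first-pass model, one informal check on the importance ranking is to refit it with several random seeds and compare ranks: predictors whose rank jumps between seeds deserve less weight. The sketch below uses simulated data (`x1`, `x2`, `x3` and `y` are made-up variables) so it runs on its own; applying it to the report would mean substituting `df1` and `yield_lp_t_ha ~ .`, which is an assumption rather than part of the original workflow.

```r
library(ranger)

# Simulated data: y depends strongly on x1, weakly on x2, not at all on x3.
set.seed(1)
n <- 300
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 2 * sim$x1 + 0.5 * sim$x2 + rnorm(n)

# Refit with different seeds and rank permutation importance (1 = most important).
importance_ranks <- sapply(1:5, function(s) {
  fit <- ranger(y ~ ., data = sim, num.trees = 200,
                importance = "permutation", seed = s)
  rank(-fit$variable.importance)
})
colnames(importance_ranks) <- paste0("seed_", 1:5)
importance_ranks  # rows: predictors, columns: seeds
```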