Build a Simple Crop Disease Detection Model with PyTorch

October 12, 2020

By Rose Wambui


There has been an increase in deep learning applications in recent years, such as credit card fraud detection in finance, smart farming in agriculture, etc.

In this tutorial, we will create a simple crop disease detection model using PyTorch. We will use a plant leaf dataset of RGB images that covers 39 different classes of crop diseases. We will leverage the power of the Convolutional Neural Network (CNN) to achieve this.


1. Install PyTorch

2. Understanding of Convolutional Neural Networks (CNN)

A CNN is a type of neural network that includes convolutional and pooling layers.

Creating a CNN will involve the following:

Step 1: Data loading and transformation

Step 2: Defining the CNN architecture

Step 3: Define loss and optimizer functions

Step 4: Training the model using the training set of data

Step 5: Validating the model using the test set

Step 6: Predict

1.1 Import our packages

import torch 
from torchvision import datasets, transforms, models

1.2 Load data

Set up the data directory folder

data_dir = "data"

Every image is made up of pixels that translate into arrays. PyTorch uses PIL, a Python library for image processing, to read them.

PyTorch uses the torchvision module to load datasets. The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision. We will use the ImageFolder class to load our dataset.

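To load data using ImageFolder, the data must be arranged with one sub-folder per class (the folder and file names here are placeholders):

    data/train/class_a/image1.jpg
    data/train/class_a/image2.jpg
    data/train/class_b/image3.jpg

and NOT this format, with all images in a single folder:

    data/train/image1.jpg
    data/train/image2.jpg
    data/train/image3.jpg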


1.3 Split the dataset into train and validation sets

It’s advisable to set aside validation data to evaluate the model on images it has not seen during training.

I have created a module split_data that splits any given image classification data into train and validation with a ratio of 0.8:0.2.
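The split_data module itself is not shown here. As a rough idea of what such a split can look like, below is a minimal sketch using only the standard library. The function name split_image_folder and its parameters are hypothetical and are not taken from split_data; it assumes the source folder has one sub-folder per class.

```python
import random
import shutil
from pathlib import Path


def split_image_folder(source_dir, output_dir, train_ratio=0.8, seed=0):
    """Copy each class sub-folder of source_dir into output_dir/train and output_dir/val.

    Hypothetical helper for illustration; not the author's split_data module.
    """
    source_dir, output_dir = Path(source_dir), Path(output_dir)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    for class_dir in sorted(p for p in source_dir.iterdir() if p.is_dir()):
        files = sorted(p for p in class_dir.iterdir() if p.is_file())
        rng.shuffle(files)
        n_train = int(len(files) * train_ratio)
        splits = (("train", files[:n_train]), ("val", files[n_train:]))
        for split_name, subset in splits:
            target = output_dir / split_name / class_dir.name
            target.mkdir(parents=True, exist_ok=True)
            for file_path in subset:
                shutil.copy2(file_path, target / file_path.name)
```

Calling split_image_folder("raw_images", "data") would then produce data/train and data/val folders in the layout ImageFolder expects ("raw_images" is a placeholder path).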

train_data = datasets.ImageFolder(data_dir + '/train')
val_data = datasets.ImageFolder(data_dir + '/val')

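1.4 Make the data iterable

dataiter = iter(train_data)
images, classes = dataiter

The command above raises an error: a dataset yields single (image, label) samples, so it cannot be unpacked into a batch of images and classes.

This means we cannot loop over the dataset in batches directly. PyTorch uses DataLoader to make the dataset iterable in batches.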

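train_loader = torch.utils.data.DataLoader(train_data, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_data)

dataiter = iter(train_loader)
images, classes = next(dataiter)

The code above raises an error, because the data loader cannot collate PIL images into a batch.

The __getitem__ method of ImageFolder returns an unprocessed PIL image. PyTorch works with tensors; since we will pass this data through PyTorch models, we need to transform the image to a tensor before using the data loader.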

train_transforms = transforms.Compose([transforms.ToTensor()])

val_transforms = transforms.Compose([transforms.ToTensor()])

train_data = datasets.ImageFolder(data_dir + '/train', transform=train_transforms)
val_data = datasets.ImageFolder(data_dir + '/val', transform=val_transforms)

train_loader = torch.utils.data.DataLoader(train_data, batch_size=8, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_data, batch_size=8)

batch_size=8 means that the loader yields eight samples per iteration.
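To make the effect of batch_size concrete, the number of iterations in one pass over the data can be worked out by hand. The sketch below mirrors what len() of a data loader reports; the sample count of 100 is a made-up number, and the helper name batches_per_epoch is hypothetical.

```python
import math


def batches_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches a loader yields in one pass over num_samples items."""
    if drop_last:
        # an incomplete final batch is discarded
        return num_samples // batch_size
    # an incomplete final batch is still returned
    return math.ceil(num_samples / batch_size)


print(batches_per_epoch(100, 8))                  # 12 full batches plus one batch of 4
print(batches_per_epoch(100, 8, drop_last=True))  # only the 12 full batches
```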

Rerun the dataiter, which will raise a runtime error when the images in a batch differ in size.

In most scenarios, you will get images of different dimensions. The images in a batch are stacked into a single tensor, and the fully connected layer of the model expects a fixed input size, so every image must have the same shape. Thus we need to resize the images to the same shape and then transform them into tensors. The code below combines all the steps we have discussed above.

Data Transformation and Augmentation

train_transforms = transforms.Compose([transforms.RandomRotation(30), # data augmentation
                                       transforms.RandomResizedCrop(224), # resize
                                       transforms.RandomHorizontalFlip(), # data augmentation
                                       transforms.ToTensor()])

val_transforms = transforms.Compose([
                                      transforms.RandomResizedCrop(224), # resize
                                      transforms.ToTensor()])

train_data = datasets.ImageFolder(data_dir + '/train', transform=train_transforms)
val_data = datasets.ImageFolder(data_dir + '/val', transform=val_transforms)

train_loader = torch.utils.data.DataLoader(train_data, batch_size=8, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_data, batch_size=8)

dataiter = iter(train_loader)
images, classes = next(dataiter)

Step 2: Model architecture

We’ll be using PyTorch’s nn module to build models.

When creating CNN, understanding the output dimensions after every convolutional and pooling layer is important.

2.1 Calculate output dimensions

Below is the formula to calculate dimensions through a convolutional layer:

O = (W - K + 2P) / S + 1

O - The output height/width
W - The input height/width
K - The kernel size
P - Padding
S - Stride

The formula below calculates dimensions after a max pool layer with kernel size K:

O = W / K

We will create a sample CNN model of this architecture:

Input image shape (RGB) = 224 x 224 x 3

1st convolutional layer

K = 3, P = 0, S = 1

(224 - 3 + 2*0)/1 + 1 = 222

The output will be 222 x 222 x 16. Note that 16 is the number of output channels we have selected.

1st max-pooling layer

Input shape = 222 x 222 x 16, K = 2

222 / 2 = 111

The output image will be 111 x 111 x 16 (the color channel does not change after a max pool layer)

2nd Convolutional Layer

K = 3, P = 1, S = 2

(111 - 3 + 2*1)/2 + 1 = 56

output image 56 x 56 x 32

2nd max-pooling layer

56/2 = 28

output image 28 x 28 x 32
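The arithmetic above can be double-checked with two small helper functions. This is only a sketch: the names conv_output_size and pool_output_size are made up for this example, and integer division is used because a layer cannot produce a fractional number of pixels.

```python
def conv_output_size(w, k, p=0, s=1):
    """Output height/width of a convolutional layer: (W - K + 2P) / S + 1."""
    return (w - k + 2 * p) // s + 1


def pool_output_size(w, k):
    """Output height/width of a max pool layer with kernel size K (stride K)."""
    return w // k


size = 224                                    # input image height/width
size = conv_output_size(size, k=3)            # 1st convolutional layer
size = pool_output_size(size, k=2)            # 1st max-pooling layer
size = conv_output_size(size, k=3, p=1, s=2)  # 2nd convolutional layer
size = pool_output_size(size, k=2)            # 2nd max-pooling layer
flattened = 32 * size * size                  # 32 channels after the 2nd convolution
print(size, flattened)
```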

Fully connected layer

In the fully connected layer, you pass in the flattened image (32 x 28 x 28 = 25088 values) and get out one score per class; the number of output classes required in this case is 39.

import torch.nn as nn
import numpy as np

class CropDetectCNN(nn.Module):
    # initialize the class and the parameters
    def __init__(self):
        super(CropDetectCNN, self).__init__()
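        # convolutional layer 1 & max pool layer 1
        self.layer1 = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3),
            nn.ReLU(),  # non-linearity between the layers (assumed)
            nn.MaxPool2d(kernel_size=2))
        # convolutional layer 2 & max pool layer 2
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=3, padding=1, stride=2),
            nn.ReLU(),  # non-linearity between the layers (assumed)
            nn.MaxPool2d(kernel_size=2))
        # fully connected layer
        self.fc = nn.Linear(32*28*28, 39)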

    # Feed forward the network
    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        return out

model = CropDetectCNN()
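Before training, it can help to push a dummy batch through the layers and confirm that the shapes match the calculations from section 2.1. The snippet below is a stand-alone sketch of the convolutional part only; the ReLU activations are an assumption on my side, and the random tensor simply stands in for two 224 x 224 RGB images.

```python
import torch
import torch.nn as nn

# stand-in for the two convolutional blocks described above
layers = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1, stride=2),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
)

# a fake batch of 2 RGB images, 224 x 224 pixels each
dummy_batch = torch.randn(2, 3, 224, 224)
with torch.no_grad():
    features = layers(dummy_batch)

feature_shape = tuple(features.shape)
# number of values per image after flattening, i.e. the input size of the fully connected layer
flat_features = features.reshape(features.size(0), -1).shape[1]
print(feature_shape, flat_features)
```

If the flattened size printed here differs from the first argument of nn.Linear, the forward pass will fail with a shape mismatch.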

Step 3: Loss and Optimizer

The loss measures how far the model’s predictions deviate from the true values. The optimizer is the function that updates the neural network’s parameters, such as its weights, in order to reduce the loss; how large each update is depends on the learning rate.

These functions are dependent on the type of machine learning problem you are trying to solve. In our case, we are dealing with multi-class classification. You can research more on loss and optimization in neural networks.

For this case, we’ll use Cross-Entropy Loss and Stochastic Gradient Descent(SGD)

import torch.optim as optim

criterion = nn.CrossEntropyLoss()

optimizer = optim.SGD(model.parameters(), lr=0.01)
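To get a feel for what these two functions compute, here is a plain-Python sketch of cross-entropy on a single sample and of one SGD update on a single weight. It is a simplified illustration, not PyTorch’s implementation; the helper names and the example numbers are made up.

```python
import math


def softmax(logits):
    """Turn raw model outputs (logits) into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log of the probability assigned to the true class."""
    return -math.log(softmax(logits)[target])


def sgd_step(weight, gradient, lr=0.01):
    """Move the weight a small step against its gradient."""
    return weight - lr * gradient


# a model that gives all 39 classes the same score is guessing uniformly,
# so its loss is ln(39)
uniform_loss = cross_entropy([0.0] * 39, target=5)
print(uniform_loss, math.log(39))
```

This gives a useful reference point: a training loss close to ln(39), roughly 3.66, means the model is still doing no better than guessing.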

Image analysis requires a lot of processing power, so you can leverage the free GPUs available on the market. PyTorch uses CUDA to let developers run their models in GPU-enabled environments.

# run on GPU if available else run on a CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

Step 4 & 5: Model Training and Validation

epochs = 1 # increase the number of epochs for better results

# move the model to the CPU or GPU
model.to(device)

for epoch in range(epochs):
    running_loss = 0
    for images, classes in train_loader:
        # To device - move the images and classes to the CPU|GPU
        images, classes = images.to(device), classes.to(device)
        # clear old gradients from the last step
        optimizer.zero_grad()
        # pass the images through the model
        outputs = model(images)
        # calculate the loss given the outputs and the classes
        loss = criterion(outputs, classes)
        # compute the gradient of the loss for every parameter
        loss.backward()
        # apply the optimizer to update the parameters
        optimizer.step()
        # update the running loss
        running_loss += loss.item()

    # validate once at the end of every epoch
    validation_loss = 0
    accuracy = 0
    # gradients are not needed for validation, so we disable them to run faster
    with torch.no_grad():
        # specify that this is validation and not training
        model.eval()
        for images, classes in val_loader:
            # use the GPU if available
            images, classes = images.to(device), classes.to(device)
            # validate the images
            outputs = model(images)
            # compute validation loss
            loss = criterion(outputs, classes)
            # update loss
            validation_loss += loss.item()
            # the model returns raw scores, so apply softmax to get probabilities
            ps = torch.softmax(outputs, dim=1)
            # returns the k largest elements of the given input tensor along a given dimension
            top_p, top_class = ps.topk(1, dim=1)
            # reshape the classes to match top_class and compare
            equals = top_class == classes.view(*top_class.shape)
            # calculate the accuracy of this batch
            accuracy += torch.mean(equals.type(torch.FloatTensor)).item()
    # change the mode to train for the next epochs
    model.train()

    print("Epoch: {}/{}.. ".format(epoch+1, epochs),
          "Training Loss: {:.3f}.. ".format(running_loss/len(train_loader)),
          "Valid Loss: {:.3f}.. ".format(validation_loss/len(val_loader)),
          "Valid Accuracy: {:.3f}".format(accuracy/len(val_loader)))
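The accuracy bookkeeping in the validation loop boils down to picking the highest-scoring class for every image and counting how often it matches the true class. Here is a plain-Python sketch of that idea; the function name top1_accuracy and the scores below are made up for illustration.

```python
def top1_accuracy(batch_scores, batch_labels):
    """Fraction of samples whose highest-scoring class equals the true label."""
    correct = 0
    for scores, label in zip(batch_scores, batch_labels):
        # index of the largest score, i.e. what topk(1) selects
        predicted = max(range(len(scores)), key=lambda i: scores[i])
        correct += int(predicted == label)
    return correct / len(batch_labels)


# three samples, three classes: the first two predictions are right, the last is wrong
scores = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3], [0.0, 0.1, 0.9]]
labels = [1, 0, 1]
print(top1_accuracy(scores, labels))
```

Note that the loop above averages the per-batch accuracies, which equals the overall accuracy only when every batch has the same number of images; a smaller final batch shifts the result slightly.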

Step 6: Model prediction

Let’s see how our model can predict one of the images.

The PyTorch ImageFolder we used has an attribute class_to_idx, which maps each class name to its index. Since training uses the index, we need to convert the predicted index back to the corresponding class name.

model.class_to_idx = train_data.class_to_idx
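Converting indices back to names is just a matter of inverting that dictionary. A small sketch follows; the three-class mapping is a shortened, made-up example, while the real mapping has 39 entries.

```python
# shortened example mapping; the real class_to_idx has 39 entries
class_to_idx = {
    "Apple___Apple_scab": 0,
    "Apple___healthy": 1,
    "Tomato___healthy": 2,
}

# invert the mapping so that an index leads back to its class name
idx_to_class = {idx: name for name, idx in class_to_idx.items()}

predicted_indices = [2, 0]  # e.g. the indices returned by topk
predicted_names = [idx_to_class[idx] for idx in predicted_indices]
print(predicted_names)
```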

6.1 Process the image

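from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

# Plot the image
def imshow(image_numpy_array):
    fig, ax = plt.subplots()
    # convert the shape from (3, height, width) to (height, width, 3)
    image = image_numpy_array.transpose(1, 2, 0)
    ax.imshow(image)

    return ax

def process_image(image_path):
    # resize to the 224 x 224 input the model expects, then convert to a tensor
    # (the exact transforms are assumed; they must produce a 3 x 224 x 224 tensor)
    test_transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor()])
    im = Image.open(image_path).convert("RGB")
    im = test_transform(im)

    return im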

def predict(image, model):
    # we have to process the image as we did while training the others
    image = process_image(image)
    # add a batch dimension: (3, 224, 224) becomes (1, 3, 224, 224)
    image_input = image.unsqueeze(0)
    # move the image to either gpu|cpu
    image_input = image_input.to(device)
    # pass the image through the model
    outputs = model(image_input)

    # the model returns raw scores, so apply softmax to get probabilities
    ps = torch.softmax(outputs, dim=1)
    # return the top 5 most predicted classes
    top_p, top_cls = ps.topk(5, dim=1)

    # move to the CPU, convert to numpy, then to list
    top_cls = top_cls.detach().cpu().numpy().tolist()[0]
    # convert indices to classes
    idx_to_class = {v: k for k, v in model.class_to_idx.items()}
    top_cls = [idx_to_class[top_class] for top_class in top_cls]
    return top_p, top_cls

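import seaborn as sns
import matplotlib.pyplot as plt

def plot_solution(image_path, ps, classes):
    # show the image in its own figure
    image = process_image(image_path)
    imshow(image.numpy())

    # plot the predicted probabilities in a second figure
    plt.figure(figsize=(6, 10))
    sns.barplot(x=ps, y=classes, color=sns.color_palette()[2])
    plt.show()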

Image sample prediction

image = "data/val/Apple___Apple_scab/image (102).JPG"
ps, classes = predict(image, model)
ps = ps.detach().cpu().numpy().tolist()[0]

plot_solution(image, ps, classes)
