<aside> 💡 GitHub page: https://github.com/anya765/defective-solar-cell-classification/

</aside>

A typical problem with the installation and maintenance of solar cells is determining when a cell is defective.

problem and model overview

Dataset

The dataset linked below contains images of functional and defective solar cells extracted from electroluminescence (EL) imagery of photovoltaic modules:

https://github.com/zae-bayern/elpv-dataset

“The dataset contains 2,624 samples of 300x300 pixels 8-bit grayscale images of functional and defective solar cells with varying degree of degradations extracted from 44 different solar modules. The defects in the annotated images are either of intrinsic or extrinsic type and are known to reduce the power efficiency of solar modules.

All images are normalized with respect to size and perspective. Additionally, any distortion induced by the camera lens used to capture the EL images was eliminated prior to solar cell extraction.”
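As a rough illustration of how the annotations could be read, here is a minimal sketch. It assumes the label file lists one cell per line as whitespace-separated `path probability type`; the two sample lines and the helper name `parse_labels` are hypothetical and used only for illustration.

```python
from typing import List, Tuple


def parse_labels(text: str) -> Tuple[List[str], List[float], List[str]]:
    """Split label text into image paths, defect probabilities, and cell types."""
    paths, probas, types = [], [], []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        path, proba, cell_type = parts
        paths.append(path)
        probas.append(float(proba))
        types.append(cell_type)
    return paths, probas, types


# Hypothetical sample lines in the assumed format (not real dataset rows)
sample = """images/cell0001.png  1.0  mono
images/cell0002.png  0.3333333333333333  poly
"""

elpv_paths, elpv_proba, elpv_types = parse_labels(sample)
print(elpv_paths)
print(elpv_proba)
```

Each path could then be opened as an 8-bit grayscale array, for example with `cv2.imread(path, cv2.IMREAD_GRAYSCALE)`, to build a list like the `elpv_images` used later.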

1. open images + read labels from dataset

# PyTorch needs to know how to open images and read their labels

import cv2
import torch

class PlainDataset(torch.utils.data.Dataset):
  def __init__(self, img_array_list, transform, label_list):
    self.transform = transform 
    self.img_array_list = img_array_list
    self.label_list = label_list

  def __len__(self):
    return len(self.img_array_list)

  def __getitem__(self, idx):
    gray = self.img_array_list[idx]
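    # downsample the 300x300 cell image to 40x40
    gray = cv2.resize(gray, dsize=(40, 40), interpolation=cv2.INTER_CUBIC)
    gray = gray[4:36, 4:36]  # center crop to a 32 px square
    # replicate the grayscale channel to get a 3-channel image
    image = cv2.merge((gray, gray, gray))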
    image = self.transform(image)

    label = self.label_list[idx]

    return image, label

This code defines a class called PlainDataset, which is a subclass of torch.utils.data.Dataset. This is a useful class to subclass when you want to create a custom dataset to use with PyTorch's data loading utilities.

The PlainDataset class has three methods:

  1. __init__: the constructor. It takes three arguments (img_array_list, transform, and label_list) and stores them as instance variables so the other methods can access them.
  2. __len__: returns the length of img_array_list. PyTorch's data loading utilities use it to determine the number of samples in the dataset.
  3. __getitem__: called by PyTorch's data loading utilities to fetch one sample. It takes an index (idx) and returns a tuple of the image and label at that index. The image is taken from img_array_list, resized to 40x40, center-cropped to a 32x32 square, replicated into three channels, and passed through the transform function. The label is the corresponding element of label_list.
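The shape bookkeeping in __getitem__ can be checked with a small NumPy-only sketch. It assumes a random 40x40 array as a stand-in for the output of cv2.resize, and uses np.stack in place of cv2.merge, which produces the same height x width x 3 layout.

```python
import numpy as np

# Stand-in for a cell image after cv2.resize(..., dsize=(40, 40)); random values for illustration
resized = np.random.randint(0, 256, size=(40, 40), dtype=np.uint8)

# Same slice as in __getitem__: drops a 4 px border on every side
cropped = resized[4:36, 4:36]

# Stand-in for cv2.merge((gray, gray, gray)): copy the single channel three times
image = np.stack((cropped, cropped, cropped), axis=-1)

print(cropped.shape)  # (32, 32)
print(image.shape)    # (32, 32, 3)
```

Cropping after the resize keeps the central region of the cell and discards the border pixels, so each sample handed to the transform is a 32x32 image with three identical channels.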

2. setting up workflow to train CNN

import numpy as np
import torch
import torchvision.transforms as transforms

# normalizing images before loading into cnn
transform = transforms.Compose(
    [transforms.ToTensor(), 
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# number of images per update step
batch_size = 4

# split dataset into train and test (2000/624), randomly sampled
shuffle_idx = [ind for ind in range(len(elpv_images))]
np.random.shuffle(shuffle_idx)

# instead of predicting a continuous defect probability, the network classifies each image as 0 (good) or 1, 2, 3, with 3 being most defective
trainset = PlainDataset(img_array_list=[elpv_images[j] for j in shuffle_idx[:2000]], 
                        transform=transform, 
                        label_list=[int(elpv_proba[j]*3) for j in shuffle_idx[:2000]])
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)

testset = PlainDataset(img_array_list=[elpv_images[j] for j in shuffle_idx[2000:]], 
                        transform=transform, 
                        label_list=[int(elpv_proba[j]*3) for j in shuffle_idx[2000:]])
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)
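
Step by step, this code:

  1. imports the numpy module under the alias np.
  2. creates a transform object that converts images to tensors (scaling 8-bit pixel values to [0, 1]) and then normalizes each channel by subtracting 0.5 and dividing by 0.5, which maps values to [-1, 1]. The 0.5 values are fixed constants, not the measured mean and standard deviation of the dataset.
  3. defines batch_size with a value of 4, the number of images used in each update step.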