<aside> 💡 GitHub page: https://github.com/anya765/defective-solar-cell-classification/
</aside>
A typical problem with the installation and maintenance of solar cells is determining when a cell is defective.
Below is a dataset of functional and defective solar cells extracted from electroluminescence (EL) imagery of photovoltaic modules:
https://github.com/zae-bayern/elpv-dataset
“The dataset contains 2,624 samples of 300x300 pixels 8-bit grayscale images of functional and defective solar cells with varying degree of degradations extracted from 44 different solar modules. The defects in the annotated images are either of intrinsic or extrinsic type and are known to reduce the power efficiency of solar modules.
All images are normalized with respect to size and perspective. Additionally, any distortion induced by the camera lens used to capture the EL images was eliminated prior to solar cell extraction.”
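The training code further down does not use these probabilities directly. It turns each one into one of four class labels by multiplying it by 3. The sketch below shows that conversion on a handful of made-up values (not rows from the dataset). It assumes the annotated defect probabilities come in steps of 1/3 (0, 1/3, 2/3, 1), as that mapping implies. It rounds before casting rather than relying on a bare `int()` cast, because truncation turns a probability stored with limited precision, such as 0.6666666, into class 1 instead of 2.

```python
import numpy as np

# Made-up defect probabilities in steps of 1/3 (illustrative, not dataset rows)
proba = np.array([0.0, 1 / 3, 2 / 3, 1.0, 1 / 3, 0.0])

# Map each probability to a class index: 0 = functional ... 3 = most defective.
# np.rint rounds to the nearest integer before the cast, so small
# floating-point errors cannot push a value down to the class below.
labels = np.rint(proba * 3).astype(np.int64)
print(labels)                            # [0 1 2 3 1 0]
print(np.bincount(labels, minlength=4))  # samples per class: [2 2 1 1]

# Why not a bare int() cast: truncation drops a value stored with limited precision
approx = 0.6666666
print(int(approx * 3), int(round(approx * 3)))  # 1 2
```

Treating the four probability levels as classes turns the task into ordinary 4-way classification, so a standard cross-entropy loss can be used.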
```python
# PyTorch needs to know how to open the images and read their labels
import cv2
import torch


class PlainDataset(torch.utils.data.Dataset):
    def __init__(self, img_array_list, transform, label_list):
        self.transform = transform
        self.img_array_list = img_array_list
        self.label_list = label_list

    def __len__(self):
        return len(self.img_array_list)

    def __getitem__(self, idx):
        gray = self.img_array_list[idx]
        # downsample the 300x300 cell image to 40x40
        gray = cv2.resize(gray, dsize=(40, 40), interpolation=cv2.INTER_CUBIC)
        gray = gray[4:36, 4:36]  # 32x32 center crop
        # replicate the grayscale channel to get a 3-channel image
        image = cv2.merge((gray, gray, gray))
        image = self.transform(image)
        label = self.label_list[idx]
        return image, label
```
This code defines a class called `PlainDataset`, which is a subclass of `torch.utils.data.Dataset`. This is a useful class to subclass when you want to create a custom dataset to use with PyTorch's data loading utilities.
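The sketch below is a minimal example of that contract. The toy dataset, a hypothetical `ToyDataset` holding random tensors rather than solar-cell images, implements only `__len__` and `__getitem__`. `torch.utils.data.DataLoader` then handles indexing and stacks the individual samples into batches of 4, the batch size used later for training.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Hypothetical dataset: n random 3x32x32 'images' with labels 0..3."""

    def __init__(self, n: int):
        generator = torch.Generator().manual_seed(0)
        self.images = torch.rand(n, 3, 32, 32, generator=generator)
        self.labels = torch.arange(n) % 4

    def __len__(self) -> int:
        # DataLoader uses this to know which indices it may request
        return len(self.images)

    def __getitem__(self, idx: int):
        # Return one (image, label) pair; DataLoader stacks these into batches
        return self.images[idx], self.labels[idx]


dataset = ToyDataset(10)
loader = DataLoader(dataset, batch_size=4, shuffle=False)

for images, labels in loader:
    print(images.shape, labels.tolist())
```

With 10 samples and `batch_size=4`, the loader yields batches of 4, 4, and 2 samples, because the last partial batch is kept by default. By the same arithmetic, the 2,000-image training split gives 500 batches per epoch and the 624-image test split gives 156.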
The `PlainDataset` class has three methods:

- `__init__`: the constructor. It takes three arguments, `img_array_list`, `transform`, and `label_list`, and stores them as instance variables so that the other methods of the class can access them.
- `__len__`: returns the length of the `img_array_list` instance variable, which PyTorch's data loading utilities use to determine the number of samples in the dataset.
- `__getitem__`: called by PyTorch's data loading utilities to get a specific sample from the dataset. It takes an index (`idx`) as an argument and returns a tuple containing the image and label for the sample at that index. The image is obtained by selecting the corresponding element from the `img_array_list` instance variable, resizing it to 40x40, cropping it to a 32x32 square, stacking the grayscale channel three times, and applying the `transform` function to it. The label is the corresponding element of the `label_list` instance variable.

```python
import numpy as np
import torch
from torchvision import transforms

# elpv_images and elpv_proba are assumed to be loaded earlier (not shown),
# e.g. with the loader that ships with the elpv-dataset repository

# normalize images before loading them into the CNN
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# number of images per update step (mini-batch size)
batch_size = 4

# split the dataset into train and test sets (2000/624), randomly sampled
shuffle_idx = list(range(len(elpv_images)))
np.random.shuffle(shuffle_idx)

# instead of computing a continuous probability of whether a solar cell is defective,
# the network classifies each image as 0 (functional) or 1, 2, 3, with 3 being most defective;
# round() guards against floating-point error in the stored probabilities
trainset = PlainDataset(img_array_list=[elpv_images[j] for j in shuffle_idx[:2000]],
                        transform=transform,
                        label_list=[int(round(elpv_proba[j] * 3)) for j in shuffle_idx[:2000]])
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)

testset = PlainDataset(img_array_list=[elpv_images[j] for j in shuffle_idx[2000:]],
                       transform=transform,
                       label_list=[int(round(elpv_proba[j] * 3)) for j in shuffle_idx[2000:]])
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)
```
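This code:

- imports the `numpy` module and assigns it to the alias `np`;
- creates a `transform` object that converts images to tensors and normalizes each channel by subtracting a mean of 0.5 and dividing by a standard deviation of 0.5, which maps pixel values from [0, 1] to [-1, 1] (these are fixed values, not statistics computed from the dataset);
- sets the `batch_size` variable to 4.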