*Hoverview is a series of blog posts throughout the ICAERUS project discussing all things drone technology. This third instalment presents an implementation of quantization for deep learning models to improve inference speed on an ICAERUS dataset.*

## Why do we need quantization?

Unmanned Aerial Vehicles (UAVs) are increasingly seen as an important tool for precision agriculture. Precision agriculture is a farming paradigm in which precise data from various sensors and platforms on the farm are acquired to inform farm activities. Using these data, farmers are expected to make better-informed decisions, which should increase yields, provide healthier food and make farming more sustainable altogether (Ryan, Isakhanyan & Tekinerdogan, 2023).

One use case where UAVs are applied at the farm level is cattle management. As Alanezi et al. (2022) put it in their review on UAVs for livestock management, the UAV’s “strength is the capability to reach a remote location with minimum time, effort, and energy, without human presence”. Within this use case, cattle detection and counting are ideal candidates for applying UAVs.

In cattle detection and counting, the UAV provides an aerial perspective from which image data are analyzed and individual cattle are detected and counted. Deep learning in particular has attracted much attention for solving the cattle detection problem (Alanezi et al., 2022), and various object detection networks have shown remarkable accuracy in detecting cattle (Mahmud et al., 2021). The problem with these networks, however, is that they require additional hardware to be applied in the field. Ideally, the deep learning networks run on the hardware of the UAV itself, so that the UAV provides insights into the status of the cattle while airborne. This enables the UAV to be used in the real-time decision-making loop of the farmer.

A potential pathway to on-device, real-time insights is quantization. In quantization, the floating-point numbers that make up the majority of the detection network are replaced by integers. Integer representations make networks much smaller (in bytes) and can increase processing speed, often at the cost of some accuracy. Two approaches exist to convert floating-point numbers to integers in deep learning: quantization-aware training (QAT) and post-training quantization (PTQ) (Jacob et al., 2017). In QAT, the effect of integer rounding is simulated during training, so the model learns weights that are robust to it; this requires a dedicated training procedure. PTQ, in contrast, is applied after training and can be used on any trained network.
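To make the float-to-integer mapping concrete, here is a minimal numpy sketch of symmetric uniform quantization. It assumes a single scale factor per tensor and uses synthetic Gaussian weights, neither of which comes from the model in this post. Each float is divided by the scale, rounded to the nearest integer and stored as int8; multiplying back by the scale recovers an approximation whose error is at most half a quantization step.

```python
import numpy as np


def quantize_symmetric(x, bits=8):
    """Map floats to signed integer codes with one scale factor for the whole tensor.

    Assumption: bits <= 8, so the codes fit in int8.
    """
    qmax = 2 ** (bits - 1) - 1               # 127 for 8 bits
    scale = float(np.abs(x).max()) / qmax    # float distance between neighbouring codes
    codes = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return codes, scale


def dequantize(codes, scale):
    """Approximate reconstruction of the original floats."""
    return codes.astype(np.float32) * scale


rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.05, size=10_000).astype(np.float32)  # synthetic weights
codes, scale = quantize_symmetric(weights, bits=8)
restored = dequantize(codes, scale)

print(weights.nbytes, codes.nbytes)  # 40000 10000: int8 codes need a quarter of the bytes
print(np.abs(weights - restored).max() <= scale / 2 + 1e-6)  # rounding error is at most half a step
```

The only extra information needed to use the integer codes is the single float `scale`, which is why the size saving is close to the ratio of bit widths.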

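This post uses a specific PTQ approach called piecewise linear quantization (PWLQ) from Fang et al. (2020). Instead of spreading the quantization levels evenly over the full range of a weight tensor, PWLQ splits the range at a breakpoint into a central region, where most weights are concentrated, and the tails, and quantizes each region uniformly. This places more levels where most values lie, which lets PWLQ maintain a higher accuracy than a naive uniform quantization (Fang et al., 2020).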
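The intuition behind PWLQ can be checked on synthetic data. The numpy sketch below is a simplified two-region quantizer in the spirit of PWLQ, not the authors' implementation: values inside the breakpoint get their own fine uniform grid, values beyond it get a coarser grid for the tails, and here both regions simply use the same bit budget (the exact bit allocation in Fang et al. may differ). The helper names and the Laplace-distributed "weights" are our own assumptions for illustration.

```python
import numpy as np


def uniform_quant(x, bits, limit):
    """Simulated symmetric uniform quantization of x onto a grid spanning [-limit, limit]."""
    levels = 2 ** (bits - 1) - 1
    step = limit / levels
    return np.clip(np.round(x / step), -levels, levels) * step


def two_region_quant(x, bits, breakpoint):
    """Hypothetical simplified piecewise quantizer: fine grid inside the breakpoint, coarse grid for the tails."""
    limit = np.abs(x).max()
    center = uniform_quant(x, bits, breakpoint)
    # Tails: quantize the distance beyond the breakpoint on a grid over [0, limit - breakpoint]
    tail = np.sign(x) * (breakpoint + uniform_quant(np.abs(x) - breakpoint, bits, limit - breakpoint))
    return np.where(np.abs(x) <= breakpoint, center, tail)


def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))


rng = np.random.default_rng(0)
w = rng.laplace(0.0, 0.02, size=100_000)  # synthetic bell-shaped "weights"
bits = 4
limit = np.abs(w).max()

err_uniform = rmse(uniform_quant(w, bits, limit), w)
# Try breakpoints at 5%, 10%, ..., 95% of the largest magnitude
errors = {round(r, 2): rmse(two_region_quant(w, bits, r * limit), w) for r in np.arange(0.05, 1.0, 0.05)}
best_ratio = min(errors, key=errors.get)
print(f"uniform: {err_uniform:.5f}  two-region (ratio {best_ratio}): {errors[best_ratio]:.5f}")
```

Because most of the synthetic weights sit close to zero, giving them a finer grid lowers the overall error even though the tails are quantized more coarsely.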

## What is shown in this blog post?

- Using the pretrained ssdlite320 MobileNet V3 Large model, fine-tuned on a dedicated cow dataset, for drone-based cow detection
- Implementing Piecewise Linear Quantization (PWLQ) of the model parameters for different bit sizes
- Analyzing the impact of different PWLQ bit sizes on precision, recall and processing time for cow detection

Through this exploration, we aim to show how piecewise linear quantization affects cow detection from drone imagery, and which trade-offs between accuracy and efficiency come with adjusting the bit size.

### Setting up the environment

```python
import os
import numpy as np
from PIL import Image
import xml.etree.ElementTree as ET
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import torch
import torchvision
from torchvision.models.detection import ssdlite320_mobilenet_v3_large
from torchvision.datasets import VisionDataset
from torchvision.transforms import v2
from torch.utils.data import DataLoader
from torch.utils.data import Dataset
import torchvision.transforms.v2 as transforms
```

First, we create a custom VisionDataset class to read our JPEG images and the XML annotations of the cows. For convenience, we already predefined a train-val-test split. The dataset can be downloaded here.
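The class that follows expects one Pascal VOC-style XML file per image. As a hedged illustration, using a made-up annotation rather than a file from the actual dataset, this is how the cow boxes are pulled out of such a file with the same `xml.etree.ElementTree` calls; objects with any other name are skipped, just as in the dataset's `__getitem__`. The helper `parse_voc_boxes` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up annotation for illustration only; real files follow the same VOC structure
SAMPLE_XML = """<annotation>
  <object>
    <name>cow</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>90</ymax></bndbox>
  </object>
  <object>
    <name>sheep</name>
    <bndbox><xmin>5</xmin><ymin>5</ymin><xmax>15</xmax><ymax>15</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc_boxes(xml_text, keep="cow"):
    """Hypothetical helper: return [xmin, ymin, xmax, ymax] for every object named `keep`."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall("object"):
        if obj.find("name").text != keep:
            continue  # other object classes are ignored
        bndbox = obj.find("bndbox")
        boxes.append([float(bndbox.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")])
    return boxes


print(parse_voc_boxes(SAMPLE_XML))  # [[10.0, 20.0, 110.0, 90.0]]
```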

```python
class Cows_VOCDataset(VisionDataset):
    """Reads Pascal VOC-style JPEG images and XML annotations, keeping only 'cow' objects."""

    def __init__(self, root_dir, label_map, transform=None, image_set="train"):
        super().__init__(root_dir)
        self.root_dir = root_dir
        self.transform = transform
        self.image_set = image_set  # "train", "val" or "test"
        self.label_map = label_map
        self.image_ids = self.load_image_ids()

    def load_image_ids(self):
        image_set_file = os.path.join(self.root_dir, f"ImageSets/Main/{self.image_set}.txt")
        with open(image_set_file, "r") as f:
            image_ids = [line.strip() for line in f if line.strip()]
        return image_ids

    def __getitem__(self, idx):
        img_id = self.image_ids[idx]
        img_path = os.path.join(self.root_dir, f"JPEGImages/{img_id}.JPG")
        annotation_path = os.path.join(self.root_dir, f"Annotations/{img_id}.xml")
        # Load image
        image = Image.open(img_path).convert("RGB")
        # Parse the XML for bounding-box annotations
        root = ET.parse(annotation_path).getroot()
        boxes = []
        labels = []
        for obj in root.findall("object"):
            label = obj.find("name").text
            if label != "cow":
                continue
            labels.append(label)
            bbox = obj.find("bndbox")
            xmin = float(bbox.find("xmin").text)
            ymin = float(bbox.find("ymin").text)
            xmax = float(bbox.find("xmax").text)
            ymax = float(bbox.find("ymax").text)
            boxes.append([xmin, ymin, xmax, ymax])
        # Map labels to numerical indices
        labels = [self.label_map[label] for label in labels]
        # torchvision detection models expect an (N, 4) float box tensor and int64 labels
        target = {"boxes": torch.tensor(boxes, dtype=torch.float32).reshape(-1, 4),
                  "labels": torch.tensor(labels, dtype=torch.int64)}
        if self.transform:
            image, target = self.transform(image, target)
        return image, target

    def __len__(self):
        return len(self.image_ids)
```

Following the previous post, set up the Python file structure as you please. In our case, the cow imagery and annotations are located under /home/jovyan/a16-winterschool-cowtization/data/voc_cows.
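The dataset class assumes the usual VOC layout under that root: `ImageSets/Main/{split}.txt` listing the image ids, plus `JPEGImages/{id}.JPG` and `Annotations/{id}.xml` for each id. A small, hypothetical helper like the one below (our own naming, standard library only) can flag ids whose image or annotation file is missing before any tiling or training starts.

```python
from pathlib import Path


def check_voc_layout(root, splits=("train", "val", "test")):
    """Hypothetical helper: return {split: [ids missing a .JPG image or .xml annotation]}."""
    root = Path(root)
    missing = {}
    for split in splits:
        lines = (root / "ImageSets" / "Main" / f"{split}.txt").read_text().splitlines()
        ids = [line.strip() for line in lines if line.strip()]  # same parsing as load_image_ids
        missing[split] = [
            image_id for image_id in ids
            if not (root / "JPEGImages" / f"{image_id}.JPG").is_file()
            or not (root / "Annotations" / f"{image_id}.xml").is_file()
        ]
    return missing
```

Calling `check_voc_layout(directory)` on the dataset root should return empty lists for all three splits when the layout is complete.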

```python
directory = '/home/jovyan/a16-winterschool-cowtization/data/voc_cows'
train_dataset = Cows_VOCDataset(root_dir=directory, image_set='train', label_map={"cow": 1})
test_dataset = Cows_VOCDataset(root_dir=directory, image_set='test', label_map={"cow": 1})
val_dataset = Cows_VOCDataset(root_dir=directory, image_set='val', label_map={"cow": 1})
```

Let’s visualize the first item of our training dataset without and with the annotation of the cows (bounding boxes).

```python
img, target = next(iter(train_dataset))
img = np.asarray(img)
boxes = np.asarray(target["boxes"])
fig, ax = plt.subplots(1, 2, figsize=(15, 10))
ax[0].imshow(img / 255.0)
ax[1].imshow(img / 255.0)
for box in boxes:
    xmin, ymin, xmax, ymax = box[0], box[1], box[2], box[3]
    rectangle = patches.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin, linewidth=2, edgecolor='r', facecolor='none')
    ax[1].add_patch(rectangle)
ax[0].axis('off')
ax[1].axis('off')
plt.show()
```

## Preparing the dataset by tiling

As shown above, the images in our dataset are very large (3000 by 4000 pixels). Since SSDlite320 resizes its input to 320 by 320 pixels, feeding it a full image would shrink the cows to just a few pixels, so we first tile the images into 320 by 320 pixel crops. The tiling function below crops each image into these tiles and shifts the bounding boxes of the cows so that they still match.

*Note: Best practice is to perform random tiling during training and to use a method like a sliding window to calculate the performance over the whole image during inference. For simplicity, all downstream analyses were conducted on the tiled images, and empty tiles (i.e. tiles without cows) were discarded.*
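The bookkeeping in the tiling function that follows boils down to two pieces of arithmetic: enumerating the tile grid, and shifting each overlapping box into tile coordinates before clipping it to the tile border. The standalone sketch below mirrors that logic in plain Python; the function names are illustrative, not part of any library. For a 4000 × 3000 pixel image and 320 pixel tiles, the grid has 13 columns and 10 rows, so the right-most column and the bottom row of tiles extend past the image border.

```python
def tile_origins(width, height, tile_size):
    """Top-left corners of a non-overlapping tile grid covering the whole image."""
    return [(x, y) for y in range(0, height, tile_size) for x in range(0, width, tile_size)]


def clip_box_to_tile(box, x0, y0, tile_size):
    """Shift an (xmin, ymin, xmax, ymax) box into tile coordinates and clip it to the tile.

    Returns None when the box does not overlap the tile starting at (x0, y0).
    """
    xmin, ymin, xmax, ymax = box
    if xmax <= x0 or xmin >= x0 + tile_size or ymax <= y0 or ymin >= y0 + tile_size:
        return None
    return [max(0, xmin - x0), max(0, ymin - y0),
            min(tile_size, xmax - x0), min(tile_size, ymax - y0)]


print(len(tile_origins(4000, 3000, 320)))                 # 130 = 13 columns x 10 rows
print(clip_box_to_tile([300, 10, 360, 50], 320, 0, 320))  # [0, 10, 40, 50]: left part is cut off
```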

```python
def tile_dataset(torch_dataset, tile_size):
    """Cut every image into tile_size x tile_size crops and shift the cow boxes
    into tile coordinates. Tiles without any cow are discarded."""
    imgs = []
    boxes = []
    labs = []
    for image, target in torch_dataset:
        bboxes = target["boxes"].tolist()
        width, height = image.size
        for ymin in range(0, height, tile_size):
            for xmin in range(0, width, tile_size):
                ymax = ymin + tile_size
                xmax = xmin + tile_size
                # Tiles on the right and bottom edge may extend past the image border
                tile = image.crop((xmin, ymin, xmax, ymax))
                adjusted_bboxes = []
                for box_xmin, box_ymin, box_xmax, box_ymax in bboxes:
                    # Keep only boxes that overlap this tile
                    if (box_xmax > xmin and box_xmin < xmax and
                            box_ymax > ymin and box_ymin < ymax):
                        # Shift to local tile coordinates and clip to the tile border
                        adjusted_box = [max(0, box_xmin - xmin), max(0, box_ymin - ymin),
                                        min(tile_size, box_xmax - xmin), min(tile_size, box_ymax - ymin)]
                        adjusted_bboxes.append(adjusted_box)
                if adjusted_bboxes:
                    imgs.append(tile)
                    boxes.append(adjusted_bboxes)
                    # Class index 1 = cow (as in label_map); 0 is background in torchvision detectors
                    labs.append([1] * len(adjusted_bboxes))
    return imgs, boxes, labs


train_imgs, train_boxes, train_labs = tile_dataset(train_dataset, tile_size=320)
```

After tiling, we create a new CustomDataset. We also convert the tiles to tensors and scale the input values to the 0-1 range, as this is what SSDlite expects.

```python
class CustomDataset(Dataset):
    """Holds the tiles (PIL images) together with their boxes and labels."""

    def __init__(self, images, labels, boxes, transform=None):
        self.images = images
        self.labels = labels
        self.boxes = boxes
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image = self.images[idx]
        # torchvision detection models expect int64 labels
        labels = torch.tensor(self.labels[idx], dtype=torch.int64)
        boxes = torch.tensor(self.boxes[idx], dtype=torch.float32)
        target = {'boxes': boxes,
                  'labels': labels}
        if self.transform:
            image = self.transform(image)
        return image, target


def collate_fn(batch):
    # Keep images and targets as tuples, since the number of boxes differs per tile
    return tuple(zip(*batch))
```

```python
tfs = transforms.Compose([
    transforms.PILToTensor(),
    # SSDlite expects RGB float32 input in the 0-1 range; scale=True divides the uint8 values by 255
    transforms.ToDtype(torch.float32, scale=True),
])
train_set = CustomDataset(images=train_imgs, labels=train_labs, boxes=train_boxes, transform=tfs)
train_loader = DataLoader(train_set, batch_size=1, shuffle=False, collate_fn=collate_fn)
```

Let’s now visualize a few items of our tiled training dataset with the annotation of the cows (bounding boxes).

*Note: for further analysis, we already implemented the plot function so that it can also show the model's predictions.*

```python
def show_tiles(dataloader, model=None, score_threshold=0.1):
    """Plot a 3 x 4 grid of tiles with ground-truth boxes (red) and, optionally, predicted cows (blue)."""
    fig, ax = plt.subplots(3, 4, figsize=(20, 15))
    data_iter = iter(dataloader)
    for i in range(3):
        for j in range(4):
            image, target = next(data_iter)
            img = image[0].permute(1, 2, 0).numpy()
            # imshow expects floats in 0-1; rescale if the tensor still holds 0-255 values
            ax[i, j].imshow(img / 255.0 if img.max() > 1.0 else img)
            for box in target[0]["boxes"]:
                xmin, ymin, xmax, ymax = box.tolist()
                rectangle = patches.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin, linewidth=2, edgecolor='r', facecolor='none')
                ax[i, j].add_patch(rectangle)
            # Add predictions; note that this also runs a forward pass through the network
            if model is not None:
                with torch.no_grad():
                    prediction = model(list(image))
                for box, score, label in zip(prediction[0]["boxes"], prediction[0]["scores"], prediction[0]["labels"]):
                    # Only show cows (label 1) with a confidence score >= score_threshold
                    if score.item() >= score_threshold and label.item() == 1:
                        xmin, ymin, xmax, ymax = box.tolist()
                        rectangle = patches.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin, linewidth=2, edgecolor='b', facecolor='none')
                        ax[i, j].add_patch(rectangle)
            ax[i, j].axis('off')
    plt.subplots_adjust(wspace=0.1, hspace=0.1)
    plt.show()
```

## Inference of ssdlite320 MobileNet V3 for cow detection

We used the pretrained ssdlite320 MobileNet V3 Large model, in line with Fang et al. (2020).

It is essential to note that the pretrained model was trained on COCO with 91 classes. Although COCO includes a cow category, its images are everyday photographs rather than top-down drone imagery like ours. Throughout our investigation, we experimented with various ssdlite320 configurations, namely:

- Training from scratch, with the model restructured to accommodate only 2 classes (background, cow).
- Swapping the final classification layer for a new 2-class output layer (background, cow) instead of the original 91 classes.
- Fine-tuning the pretrained model while overwriting class 1 with the cow class.

Interestingly, the third approach yielded the best performance. The model weights can be downloaded here.

*Note: There is a lot of room for improvement in training a better object detection model for cow detection. As this was not our main focus, we used the third approach for the downstream analysis. Also, as performance is very low, we are going to cheat a bit by using the training dataset for the downstream analysis as well, i.e. for the quantization experiments.*

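```python
model = ssdlite320_mobilenet_v3_large(weights="SSDLite320_MobileNet_V3_Large_Weights.DEFAULT")
# Load the fine-tuned cow weights; map_location lets a checkpoint saved on GPU load on CPU
model.load_state_dict(torch.load("models/ssdlite_cows_model_v2_500e.pth", map_location="cpu"))
model = model.eval()
```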

```python
show_tiles(train_loader, model=model)
```

```python
def calculate_iou(box1, box2):
    """IoU between two boxes in (xmin, ymin, xmax, ymax) format, as used by torchvision and the VOC annotations."""
    x1_min, y1_min, x1_max, y1_max = box1
    x2_min, y2_min, x2_max, y2_max = box2
    intersect_x = max(0, min(x1_max, x2_max) - max(x1_min, x2_min))
    intersect_y = max(0, min(y1_max, y2_max) - max(y1_min, y2_min))
    intersection = intersect_x * intersect_y
    area1 = (x1_max - x1_min) * (y1_max - y1_min)
    area2 = (x2_max - x2_min) * (y2_max - y2_min)
    union = area1 + area2 - intersection
    return intersection / union if union > 0 else 0.0


def calculate_performance(model, data_loader, iou_threshold=0.5, score_threshold=0.1):
    """Precision and recall of cow detections (label 1) at fixed IoU and score thresholds."""
    model.eval()
    model.to("cpu")
    true_positives = 0   # predictions that overlap a ground-truth box
    false_positives = 0  # predictions that overlap no ground-truth box
    detected_gt = 0      # ground-truth boxes found by at least one prediction
    missed_gt = 0        # ground-truth boxes found by no prediction
    with torch.no_grad():
        for images, targets in data_loader:
            predictions = model(list(images))
            # The loader uses batch_size=1, so only the first image of each batch is evaluated
            gt_boxes = targets[0]["boxes"].numpy()
            pred = {k: v.numpy() for k, v in predictions[0].items()}
            keep = (pred["scores"] >= score_threshold) & (pred["labels"] == 1)
            pred_boxes = pred["boxes"][keep]
            for box in pred_boxes:
                if any(calculate_iou(box, gt_box) > iou_threshold for gt_box in gt_boxes):
                    true_positives += 1
                else:
                    false_positives += 1
            for gt_box in gt_boxes:
                if any(calculate_iou(box, gt_box) > iou_threshold for box in pred_boxes):
                    detected_gt += 1
                else:
                    missed_gt += 1
    precision = true_positives / max(true_positives + false_positives, 1)
    recall = detected_gt / max(detected_gt + missed_gt, 1)
    return precision, recall
```

```python
precision, recall = calculate_performance(model, train_loader)
print("Precision:", precision, "Recall:", recall)
```

*Precision: 0.36016655100624567 Recall: 0.001084655365909219*
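A common convention for detection metrics is one-to-one matching: predictions are processed from highest to lowest score, each ground-truth box can be claimed by at most one prediction, and duplicate detections count as false positives. The numpy sketch below illustrates that convention on hand-made boxes in `(xmin, ymin, xmax, ymax)` format; the helper names are our own, and it is an alternative shown for comparison, not the metric used to produce the numbers above.

```python
import numpy as np


def iou_xyxy(a, b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(pred_boxes, pred_scores, gt_boxes, iou_threshold=0.5, score_threshold=0.1):
    """Greedy one-to-one matching; returns (true_positives, false_positives, false_negatives)."""
    unmatched = list(range(len(gt_boxes)))
    tp = fp = 0
    for i in np.argsort(-np.asarray(pred_scores, dtype=float)):  # highest score first
        if pred_scores[i] < score_threshold:
            continue
        best_j, best_iou = None, iou_threshold
        for j in unmatched:
            iou = iou_xyxy(pred_boxes[i], gt_boxes[j])
            if iou > best_iou:
                best_j, best_iou = j, iou
        if best_j is None:
            fp += 1  # no free ground-truth box overlaps enough (this includes duplicates)
        else:
            tp += 1
            unmatched.remove(best_j)
    return tp, fp, len(unmatched)


example_gt = [[0, 0, 10, 10], [20, 20, 30, 30]]
example_preds = [[0, 0, 10, 10], [1, 1, 10, 10], [50, 50, 60, 60]]  # a hit, a duplicate, a miss
tp, fp, fn = match_detections(example_preds, [0.9, 0.8, 0.7], example_gt)
print(tp, fp, fn)  # 1 2 1
```

From these counts, precision is `tp / (tp + fp)` and recall is `tp / (tp + fn)`, both with each ground-truth box counted at most once.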

As you can see, the precision, and especially the recall, of this model are… not great: it misses most of the cows in the images. Moving on! Let’s get to the meat of this post: quantization.

## Piecewise Linear Quantization (PWLQ)

Piecewise Linear Quantization (PWLQ) splits the value range of a weight tensor at a breakpoint into a dense central region and sparse tails, and quantizes each region uniformly (Fang et al., 2020). The implementation below searches for the breakpoint in three stages of increasingly fine grids (steps of 0.1, 0.01 and 0.001 times the tensor's absolute maximum), each centred on the best ratio found in the previous stage, and keeps the breakpoint with the lowest quantization error. The function returns the quantized weights with the least error, the optimal breakpoint ratio and the corresponding minimum quantization error.

```python
## This code was provided by the authors (Fang et al., 2020): https://github.com/jun-fang/PWLQ ##
## We simplified it and added documentation to improve its readability ##
def piecewise_linear_quant(w, bits=4.0, scale_bits=0.0, search_range=10):
    """
    Perform piecewise linear quantization on a given weight tensor.

    Parameters:
    - w (torch.Tensor): Input weight tensor to be quantized.
    - bits (float): Total number of bits for quantization.
    - scale_bits (float): Number of bits dedicated to scaling.
    - search_range (int): Number of fine-grid steps searched on each side of the previous best ratio.

    Returns:
    - torch.Tensor: Quantized weights with the least error.
    - float: The optimal ratio of the breakpoint.
    - float: Minimum quantization error achieved.
    """
    abs_max = torch.max(torch.abs(w))
    if abs_max == 0:
        # All-zero tensor: nothing to quantize (a zero range would divide by zero)
        return w.clone(), 0.0, 0.0
    min_err = float("inf")
    ## first stage: coarse search over breakpoint ratios 0.1, 0.2, ..., 0.9
    for bkp_ratio in np.arange(0.1, 1.0, 0.1):
        break_point = bkp_ratio * abs_max
        err, qw = pwlq_quant_error(w, bits, scale_bits, abs_max, break_point)
        if err < min_err:
            min_err = err
            best_ratio = bkp_ratio
            best_qw = qw
    ## second stage: refine around the best ratio in steps of 0.01
    ratio_start, ratio_end = best_ratio - 0.01 * search_range, best_ratio + 0.01 * search_range
    for bkp_ratio in np.arange(ratio_start, ratio_end, 0.01):
        break_point = bkp_ratio * abs_max
        err, qw = pwlq_quant_error(w, bits, scale_bits, abs_max, break_point)
        if err < min_err:
            min_err = err
            best_ratio = bkp_ratio
            best_qw = qw
    ## third stage: refine again in steps of 0.001
    ratio_start, ratio_end = best_ratio - 0.001 * search_range, best_ratio + 0.001 * search_range
    for bkp_ratio in np.arange(ratio_start, ratio_end, 0.001):
        break_point = bkp_ratio * abs_max
        err, qw = pwlq_quant_error(w, bits, scale_bits, abs_max, break_point)
        if err < min_err:
            min_err = err
            best_ratio = bkp_ratio
            best_qw = qw
    return best_qw, best_ratio, min_err


def pwlq_quant_error(w, bits, scale_bits, abs_max, break_point):
    """
    Calculate quantization error and perform piecewise linear quantization.

    Parameters:
    - w (torch.Tensor): Input weight tensor to be quantized.
    - bits (float): Total number of bits for quantization.
    - scale_bits (float): Number of bits dedicated to scaling.
    - abs_max (torch.Tensor): Absolute maximum value in the weight tensor.
    - break_point (float): Breakpoint for piecewise linear quantization.

    Returns:
    - float: Quantization error (L2 norm of the difference).
    - torch.Tensor: Quantized weights based on the piecewise linear approach.
    """
    # Tail region: uniform grid over the full range [-abs_max, abs_max]
    qw_tail = uniform_symmetric_quantizer(w,
        bits=bits, scale_bits=scale_bits, minv=-abs_max, maxv=abs_max)
    # Middle region: finer uniform grid over [-break_point, break_point]
    qw_middle = uniform_symmetric_quantizer(w,
        bits=bits, scale_bits=scale_bits, minv=-break_point, maxv=break_point)
    # Use the middle grid inside the breakpoint and the tail grid outside it
    qw = torch.where(-break_point < w, qw_middle, qw_tail)
    qw = torch.where(break_point > w, qw, qw_tail)
    err = torch.sqrt(torch.sum(torch.mul(qw - w, qw - w)))
    return err, qw


def uniform_symmetric_quantizer(x, bits=8.0, minv=None, maxv=None, signed=True,
                                scale_bits=0.0, num_levels=None, scale=None, simulated=True):
    """
    Perform uniform symmetric quantization on a given tensor.

    Parameters:
    - x (torch.Tensor): Input tensor to be quantized.
    - bits (float): Total number of bits for quantization.
    - minv (float): Minimum value for quantization.
    - maxv (float): Maximum value for quantization.
    - signed (bool): Whether the quantization is signed or not.
    - scale_bits (float): Number of bits dedicated to scaling.
    - num_levels (int): Number of quantization levels.
    - scale (torch.Tensor): Scaling factor for quantization.
    - simulated (bool): Flag indicating simulated or actual quantization.

    Returns:
    - torch.Tensor: The dequantized (simulated) tensor if simulated is True, otherwise the integer codes.
    """
    if minv is None:
        maxv = torch.max(torch.abs(x))
        minv = - maxv if signed else 0
    if signed:
        maxv = np.max([-float(minv), float(maxv)])
        minv = - maxv
    else:
        minv = 0
    if num_levels is None:
        num_levels = 2 ** bits
    if scale is None:
        scale = (maxv - minv) / (num_levels - 1)
    if scale_bits > 0:
        scale_levels = 2 ** scale_bits
        scale = torch.round(torch.mul(scale, scale_levels)) / scale_levels
    x = torch.clamp(x, min=float(minv), max=float(maxv))
    x_int = torch.round(x / scale)
    if signed:
        x_quant = torch.clamp(x_int, min=-num_levels/2, max=num_levels/2 - 1)
        assert(minv == - maxv)
    else:
        x_quant = torch.clamp(x_int, min=0, max=num_levels - 1)
        assert(minv == 0 and maxv > 0)
    x_dequant = x_quant * scale
    return x_dequant if simulated else x_quant
```

Now we create three copies of the initial model and quantize them with bit sizes of 32, 8 and 4. *Note: at 32 bits the quantization grid is so fine that the model is effectively the same as the original.*
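Keep in mind that this quantization is simulated: the quantized values are written back into the model as float32, so the model's memory footprint does not change by itself. Realizing the savings requires storing the integer codes. As an illustration in plain numpy (our own helpers, not part of PyTorch's quantization API), two signed 4-bit codes in the range [-8, 7], the range used by the signed quantizer above at 4 bits, fit into one byte, so a 4-bit tensor needs one eighth of the float32 storage.

```python
import numpy as np


def pack_int4(codes):
    """Pack signed 4-bit codes (range [-8, 7]) two per byte."""
    codes = np.asarray(codes, dtype=np.int8).ravel()
    if codes.size and (codes.min() < -8 or codes.max() > 7):
        raise ValueError("codes must lie in [-8, 7]")
    if codes.size % 2:
        codes = np.append(codes, np.int8(0))   # pad to an even length
    nibbles = (codes & 0x0F).astype(np.uint8)  # two's-complement low 4 bits
    return nibbles[0::2] | (nibbles[1::2] << 4)


def unpack_int4(packed, n):
    """Inverse of pack_int4: recover the first n signed 4-bit codes."""
    packed = np.asarray(packed, dtype=np.uint8)
    nibbles = np.empty(packed.size * 2, dtype=np.uint8)
    nibbles[0::2] = packed & 0x0F
    nibbles[1::2] = packed >> 4
    codes = nibbles.astype(np.int8)
    codes[codes > 7] -= 16                     # restore the sign of negative codes
    return codes[:n]


codes = np.array([-8, -1, 0, 7, 3], dtype=np.int8)
packed = pack_int4(codes)
print(packed.nbytes)                    # 3 bytes for 5 codes
print(unpack_int4(packed, codes.size))  # the original codes
```

Alongside the packed codes, the per-tensor scale (and, for PWLQ, the breakpoint) would also have to be stored, but that is a handful of floats per tensor.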

```python
import copy

model.eval()
# Quantize one copy of the fine-tuned model per bit size: 32 (effectively the original), 8 and 4 bits
quantization = ["32", "8", "4"]
models_quant = []
for quant in quantization:
    m = copy.deepcopy(model)
    with torch.no_grad():
        for param in m.parameters():
            # Simulated quantization: the PWLQ-quantized values are written back as float32
            param.copy_(piecewise_linear_quant(param, bits=int(quant))[0])
    models_quant.append(m)
```

We can visualize how the values of the weights change for each bit size in PWLQ. Notice that from 32 to 8 bits little information is lost. Moving to an even lower bit size, however, makes a noticeable difference, especially for (very) high and low values.
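One way to make this comparison, sketched below with our own helper names, is to take the same parameter from each model (for example `[m.state_dict()[name].numpy() for m in models_quant]` for some parameter `name`) and bin all versions on shared bin edges. Counting the distinct values also shows how coarse the 4-bit grid is compared with the 32-bit one.

```python
import numpy as np


def compare_weight_distributions(weight_arrays, labels, bins=100):
    """Histogram several versions of one weight tensor on shared bin edges.

    Returns {label: {"counts": ..., "edges": ..., "n_unique": ...}}.
    """
    flat = [np.asarray(w, dtype=np.float64).ravel() for w in weight_arrays]
    lo = min(f.min() for f in flat)
    hi = max(max(f.max() for f in flat), lo + 1e-12)  # avoid zero-width edges for constant tensors
    edges = np.linspace(lo, hi, bins + 1)
    return {
        label: {"counts": np.histogram(f, bins=edges)[0], "edges": edges, "n_unique": np.unique(f).size}
        for label, f in zip(labels, flat)
    }


def plot_weight_distributions(summary):
    """Overlay the histograms as step lines (Axes.stairs needs matplotlib >= 3.4)."""
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots(figsize=(8, 4))
    for label, s in summary.items():
        ax.stairs(s["counts"], s["edges"], label=f"{label} bits ({s['n_unique']} distinct values)")
    ax.set_yscale("log")  # the tails hold few weights, so a log scale keeps them visible
    ax.set_xlabel("weight value")
    ax.set_ylabel("count")
    ax.legend()
    return fig
```

For example, `plot_weight_distributions(compare_weight_distributions(arrays, quantization))` overlays the 32-, 8- and 4-bit versions of the chosen parameter.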


Next, we compare precision, recall and run time across the bit sizes. As described above, there is little change in performance between 32 and 8 bits, while 4 bits drastically reduces the model's precision. Do note that the qualitative results are not exactly the same anymore: some boxes have changed, for better or for worse.

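Finally, the run time decreases only slightly with smaller bit sizes. One likely reason is that the quantization here is simulated: the quantized weights are stored back as float32, so the forward pass still uses the same floating-point arithmetic. In addition, much of the measured time is evaluation overhead, since performing the forward pass is only a fraction of the total time.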
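To separate the forward pass from the evaluation overhead, one option is to time only the model call on a fixed input, after a few warm-up runs, and report the median. The helper below is a generic sketch (our own naming, standard library only). In this notebook it could be called as `time_forward(lambda imgs: m(list(imgs)), images)` inside `torch.no_grad()`, for a model `m` from `models_quant` and `images` taken from one batch of `train_loader`.

```python
import statistics
import time


def time_forward(fn, inputs, warmup=3, repeats=10):
    """Median wall-clock time in seconds of fn(inputs), excluding warm-up calls."""
    for _ in range(warmup):
        fn(inputs)  # warm-up runs (caches, lazy initialisation) are not measured
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(inputs)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

The median is less sensitive than the mean to occasional slow runs caused by other processes on the machine.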

```python
import time

# Precision, recall and total evaluation time for each bit size
for size, qmodel in zip(quantization, models_quant):
    print(f"Now showing results for {size} quant model")
    start = time.time()
    precision, recall = calculate_performance(qmodel, train_loader)
    end = time.time()
    print("Precision:", precision, "Recall:", recall, "Time:", end - start)
    print("")
```

**Now showing results for 32 quant model**

**Precision: 0.36016655100624567 Recall: 0.001084655365909219 Time: 64.98871850967407**

**Now showing results for 8 quant model**

**Precision: 0.37833827893175076 Recall: 0.0010638830479623512 Time: 64.54551482200623**

**Now showing results for 4 quant model**

**Precision: 0.1421251949209178 Recall: 0.0013851648411294087 Time: 63.691110134124756**

```python
# models_quant[1] is the 8-bit model
show_tiles(train_loader, model=models_quant[1])
```

## Conclusion

In this post, we covered the concept and application of quantization for a difficult object-detection task: detecting cows in imagery taken from an aerial platform. The images were taken at three different locations in France and provided with bounding-box annotations, with each location forming one of the training, validation and test subsets. On this data, a well-known object-detection network, SSD-lite, was trained, both from random initialization and from an ImageNet-pretrained backbone, without significant differences in accuracy. The chosen quantization scheme, PWLQ by Fang et al. (2020), does not significantly degrade the accuracy of the model in the higher bit range (8 bits), but starts to suffer larger accuracy losses when quantized to only 4 bits. Sadly, the inference time does not noticeably decrease with lower bit sizes, as the quantized weights are still stored and computed as float32. Storing the 8-bit weights as actual integers, however, would reduce the model's size in memory to roughly a quarter of its original size.

## References

Alanezi, M. A., Shahriar, M. S., Hasan, M. B., Ahmed, S., Sha’aban, Y. A., & Bouchekara, H. R. E. H. (2022). Livestock Management With Unmanned Aerial Vehicles: A Review. *IEEE Access*, *10*, 45001–45028. https://doi.org/10.1109/ACCESS.2022.3168295

Fang, J., Shafiee, A., Abdel-Aziz, H., Thorsley, D., Georgiadis, G., & Hassoun, J. H. (2020). Post-Training Piecewise Linear Quantization for Deep Neural Networks. *Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)*, *12347 LNCS*, 69–86. https://doi.org/10.1007/978-3-030-58536-5_5

Jacob, B., Kligys, S., Chen, B., Zhu, M., Tang, M., Howard, A., Adam, H., & Kalenichenko, D. (2017). Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. *Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition*, 2704–2713. https://doi.org/10.1109/CVPR.2018.00286

Mahmud, M. S., Zahid, A., Das, A. K., Muzammil, M., & Khan, M. U. (2021). A systematic literature review on deep learning applications for precision cattle farming. *Computers and Electronics in Agriculture*, *187*, 106313. https://doi.org/10.1016/J.COMPAG.2021.106313

Ryan, M., Isakhanyan, G., & Tekinerdogan, B. (2023). *An interdisciplinary approach to artificial intelligence in agriculture*. https://doi.org/10.1080/27685241.2023.2168568