Summary

Five key points (top, center, bottom, left, and right) have been used to estimate the shape of the fruit, and a provision has been made to harvest the fruit from either the center or the bottom key point, whichever has the higher detection confidence, which enables fruit harvesting from any view. The YOLOv8 pose model has been trained on public and custom datasets and has been evaluated against pre-trained and custom-trained YOLOv8 object detection models.
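The paper does not include code, but the shape estimation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the key point names, the circle approximation, and the example pixel coordinates are assumptions.

```python
from math import hypot

def estimate_fruit_shape(keypoints):
    """Approximate the fruit as a circle from the five detected key points.

    `keypoints` maps key point names to (x, y) pixel coordinates. The
    radius is taken as the average distance from the center key point
    to the top, bottom, left, and right key points.
    """
    cx, cy = keypoints["center"]
    rim = ("top", "bottom", "left", "right")
    radius = sum(hypot(keypoints[k][0] - cx, keypoints[k][1] - cy)
                 for k in rim) / len(rim)
    return (cx, cy), radius

# Example with hypothetical pixel coordinates of one detected orange
center, radius = estimate_fruit_shape({
    "center": (320, 240),
    "top":    (320, 200),
    "bottom": (320, 280),
    "left":   (280, 240),
    "right":  (360, 240),
})
print(center, radius)  # (320, 240) 40.0
```

Averaging over the four rim key points makes the radius estimate tolerant to a single noisy key point, which matters when part of the fruit rim is occluded.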

The YOLOv8 pose model has performed better than the baseline models in terms of detection accuracy and in ensuring a stable grasp, and it achieves a higher harvesting success rate than the baselines when tested on similar targets and situations. Due to limited computational resources, lightweight model weights have been used to reduce training and inference time for the harvesting task. The approach has been tested and evaluated successfully with the robot setup in the laboratory on an orange plant. The benefits of the proposed approach are as follows:

  1. The approach detects the key points together with the fruits in a single stage, requiring less effort than two-stage approaches, which first detect the fruit and then use the detections to isolate the point cloud of the fruit.

  2. The bottom or center key point of a fruit is proposed as the goal position, so there is no dependency on any fixed side or bottom view.

  3. The methodology has performed better than the object detection model based approach.
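The selection of the grasp goal between the center and bottom key points (benefit 2 above) can be sketched as a simple confidence-based fallback. This is an illustrative assumption: the preference order, the threshold value, and the data layout are not specified in the text.

```python
def select_goal_keypoint(detections, conf_threshold=0.5):
    """Pick the grasp goal key point for one detected fruit.

    `detections` maps key point names to (x, y, confidence) triples,
    as a pose model would produce. Prefer the center key point when
    its confidence clears the threshold, otherwise fall back to the
    bottom key point; return None if neither is reliable.
    """
    for name in ("center", "bottom"):
        x, y, conf = detections.get(name, (0.0, 0.0, 0.0))
        if conf >= conf_threshold:
            return name, (x, y)
    return None

# Hypothetical detections: center is occluded, bottom is confident
goal = select_goal_keypoint({
    "center": (320.0, 240.0, 0.35),
    "bottom": (318.0, 282.0, 0.82),
})
print(goal)  # ('bottom', (318.0, 282.0))
```

Because either key point can serve as the goal, the arm does not depend on the fruit presenting a particular side or bottom view to the camera.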

The following are the limitations of the proposed approach:

  1. The YOLOv8 pose model detects only one class; it is not possible to train a single model for multiple classes of fruits.

  2. The methodology with this setup has been tested only for small-region movements, due to the working range limits of the camera and the geometric constraints of the arm.

  3. The methodology has been tested on a small plant and requires testing on a tree or on a farm, where occlusion and obstruction from branches and leaves are greater.

Contribution


Key point-based methodologies have appeared in prior fruit harvesting works, and most of them revolve around determining a mask and estimating the centroid, a fixed pixel distance, and so on. Multiple key point detection with computer vision models has been tested before for estimating fruit and vegetable shape, for instance with Detectron2 [Ref.]; however, it has not been deployed in the scope of fruit harvesting with robots. The latest version of the human pose estimation model from Ultralytics, YOLOv8 pose, has been tuned to detect key points on oranges directly, without any post-processing. The devised approach estimates the fruit shape by averaging the distances between key points, and ensures that whether the fruit is visible in the side view, the bottom view, or any other view, the robot arm can approach it and perform the harvesting task successfully.
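Tuning YOLOv8 pose from human key points to fruit key points is driven by the dataset configuration. A sketch of how such a single-class, five-key-point setup is typically expressed for Ultralytics pose training is shown below; the paths, key point order, and visibility encoding are assumptions, not the authors' actual configuration.

```yaml
# Hypothetical dataset configuration for a five-key-point orange pose model
path: datasets/oranges      # dataset root (assumed layout)
train: images/train
val: images/val

# a single class, as noted in limitation 1
names:
  0: orange

# five key points per fruit (top, center, bottom, left, right),
# each labeled as (x, y, visibility)
kpt_shape: [5, 3]
```

With `kpt_shape` reduced from the human skeleton's 17 points to 5, the same pose architecture learns the fruit geometry directly, which is what removes the need for post-processing.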

Future works

The key point estimation model based approach has been tested and compared with object detection models. The methodology could further be evaluated against point cloud-based or segmentation model-based approaches. Testing on a farm and making the complete process autonomous is the next step, with the mobile base navigating a prescribed area on a map or navigating outdoors with the Global Positioning System (GPS), the height of the robot arm adjusted with a prismatic joint setup, and the fruit harvesting performed. A possible improvement in the approach could be the use of larger network weights, which, on the one hand, would stabilize the detections but, on the other hand, would require higher computational resources.