State-of-the-art
A wide range of approaches has emerged for the vision-based robotic fruit harvesting task. To give a holistic view of the trend, the state-of-the-art works are categorized below by the problems they aim to solve during robotic grasping and harvesting:
Occlusion is a difficult problem to tackle, and the trend in recent works is to estimate the shape of the hidden portion of the fruit by processing point cloud data. Having only a small portion of the fruit visible remains a challenge in these works. False detections may damage both fruit and gripper, so there is a need, and scope, for robust detection and damage-free grasping. Incorporating a computer vision model that provides multiple grasping points, in the way the CenterNet model in [Ref.] uses a single point for the whole object, would increase the chances of safe and successful grasping. In the segmentation-improvement works, the trend is towards utilizing information from the surroundings or determining the fruit axis. Such approaches have attempted to incorporate tree branch or stem information; nonetheless, this does not apply in all cases and requires considerable annotation time and effort for individual fruits. Moreover, labeling branches in hundreds or thousands of images is time-intensive. Transformer-based networks have also been utilized to generate masks.
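As a minimal sketch of this shape-completion idea, assuming an approximately spherical fruit such as an apple, a least-squares sphere fit to the visible points can recover the occluded geometry. The helper `fit_sphere` and the toy hemisphere data below are illustrative assumptions, not code from any of the cited works:

```python
import numpy as np

def fit_sphere(points: np.ndarray):
    """Least-squares sphere fit to an (N, 3) array of visible fruit points.

    Rewrites x^2 + y^2 + z^2 = 2ax + 2by + 2cz + d as a linear system in
    (a, b, c, d), then recovers the radius from r^2 = d + a^2 + b^2 + c^2.
    """
    A = np.hstack([2.0 * points, np.ones((points.shape[0], 1))])
    f = np.sum(points ** 2, axis=1)
    w, *_ = np.linalg.lstsq(A, f, rcond=None)
    center = w[:3]
    radius = np.sqrt(w[3] + center @ center)
    return center, radius

# Toy data: points sampled from only the visible (front) hemisphere,
# mimicking a fruit whose back half is occluded by leaves.
rng = np.random.default_rng(0)
theta = rng.uniform(0, np.pi / 2, 500)
phi = rng.uniform(0, 2 * np.pi, 500)
true_center, true_r = np.array([0.1, 0.2, 0.9]), 0.04
pts = true_center + true_r * np.stack(
    [np.sin(theta) * np.cos(phi), np.sin(theta) * np.sin(phi), np.cos(theta)],
    axis=1,
)
center, radius = fit_sphere(pts)
print(center, radius)  # recovers the full fruit geometry from partial visibility
```

The same idea generalizes to other primitives (ellipsoids, frustums) when the fruit is not well approximated by a sphere.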
Instance-segmented masks are used more often than semantically segmented masks in the related works, as they delineate individual fruits. A second network for point cloud processing is then used to identify the grasping pose, since mask quality is affected by obstruction from leaves, lighting conditions, surrounding fruits, branches, and so on. In the object-detection works, by contrast, the models produce an estimated rectangular bounding box for the fruit, and additional features such as the bottom or the stem are integrated as grasping points. If all the points inside a bounding box were used for point cloud filtering, in the way the instance-segmentation works use mask pixels to extract fruit points, background points and noise would be included as well; this is why instance segmentation networks are favored over object detection models in recent works. The analysis of these works suggests that a suitable approach to robotic grasping should focus on distinct, identifiable key features of the fruit that support a quick judgment. Since occlusion is inevitable, an approach is needed that concentrates on the available information and on how it can be combined to map the fruit shape accurately, making the robot grasping task easier. For instance, the fruit center and stem are identifiable features that should be sufficient to judge the size in most situations. An approach built around such key fruit features would therefore be a feasible, generalizable solution that simplifies robotic harvesting.
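To make the box-versus-mask argument concrete, the sketch below back-projects depth pixels selected by each strategy. The scene, intrinsics, and `backproject` helper are synthetic and hypothetical, not taken from any cited work; the point is that the box cloud inherits background points at foliage depth, while the instance mask keeps only fruit points:

```python
import numpy as np

def backproject(depth, select, fx=600.0, fy=600.0, cx=64.0, cy=64.0):
    """Back-project the selected depth pixels into 3-D camera coordinates
    using the pinhole model (illustrative intrinsics)."""
    v, u = np.nonzero(select)
    z = depth[v, u]
    keep = z > 0                        # drop invalid depth readings
    u, v, z = u[keep], v[keep], z[keep]
    return np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=1)

# Synthetic 128x128 scene: background foliage at ~1.5 m, one fruit at ~0.8 m.
H = W = 128
depth = np.full((H, W), 1.5)
vv, uu = np.mgrid[:H, :W]
mask = (uu - 64) ** 2 + (vv - 64) ** 2 < 20 ** 2   # circular fruit instance mask
depth[mask] = 0.8

# Bounding-box filtering keeps every pixel in the rectangle, so the distant
# background leaks into the "fruit" cloud.
box = np.zeros_like(mask)
box[44:84, 44:84] = True
box_cloud = backproject(depth, box)

# Mask filtering keeps only pixels labelled as this fruit instance.
fruit_cloud = backproject(depth, mask)

print(box_cloud[:, 2].max(), fruit_cloud[:, 2].max())  # 1.5 vs 0.8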
Author, year | Methodology | Data Type | Key Innovations | Pros | Cons | Harvest / grasp (fruit)
---|---|---|---|---|---|---
Li et al. (2022) [Ref.] | Occlusion workaround | RGB-D | Frustum point cloud fitting | Robust against occlusion | Structured farm testing only | Yes (apples)
Gong et al. (2022) [Ref.] | Occlusion workaround | RGB-D, infrared | Reconstruction with CNNs | Restoration of shape | Collision | Yes (tomatoes)
Menon et al. (2022) [Ref.] | Occlusion workaround | RGB-D | Reconstruction with software | Less manual effort | Complicated | Yes (sweet peppers)
Liu et al. (2022) [Ref.] | Occlusion workaround | RGB | Key-point estimation | Circular bounding boxes | Not tested on robot | Yes (tomatoes)
Yan et al. (2023) [Ref.] | Segmentation improvement | RGB-D | Transformer segmentation | Stem & grasping key points | Not tested on robot | Yes (pumpkins)
Kang et al. (2020) [Ref.] | Segmentation improvement | RGB-D | DaSNet | Fruit & branch segmentation | Obstruction from leaves | Yes (apples)
Kang et al. (2020) [Ref.] | Segmentation improvement | RGB-D | Mobile-DaSNet and PointNet | Robust fruit point cloud | Obstruction from other fruits | Yes (apples)
Kang et al. (2021) [Ref.] | Segmentation improvement | RGB-D | YOLACT (You Only Look At CoefficienTs) & PointNet | Robust fruit point cloud | Tested only in structured farm | Yes (apples)
Lin et al. (2019) [Ref.] | Segmentation improvement | RGB-D | Branch normals prediction | Fruit axis estimation | Occlusion-affected results | Yes (guavas)
Lin et al. (2019) [Ref.] | Segmentation improvement | RGB-D | Gaussian mixture models | Adaptable to multiple fruits | Not tested on robot | Yes (citrus fruits)
Yu et al. (2020) [Ref.] | Object detection | RGB-D | Oriented bounding boxes | Stem orientation | False detections | Yes (strawberries)
Onishi et al. (2019) [Ref.] | Object detection | RGB-D | Underside grasping | Damage-free grasping | Vertical orientation only | Yes (apples)
Chen et al. (2022) [Ref.] | Object detection | RGB | Vision-based impedance control | Damage-free grasping | Planar surface grasping | No (apples, oranges)
Lin et al. (2023) [Ref.] | Grasping rectangle proposals | RGB | Shape approximation | Works for unseen objects | Planar surface grasping | No (bananas)
Chen et al. (2023) [Ref.] | Reinforcement learning | RGB-D | Soft Actor-Critic (SAC) algorithm | Learning in simulation | Planar surface grasping | No (bananas)
Table: Literature review