State-of-the-art

A wide range of approaches has emerged for vision-based robotic fruit harvesting. To give a holistic view of the trend, the state-of-the-art works are categorized by the problems they aim to solve during robotic grasping and harvesting:

  1. Occlusion-related works [Ref.] [Ref.] [Ref.] [Ref.]

  2. Segmentation improvement-related works [Ref.] [Ref.] [Ref.] [Ref.] [Ref.] [Ref.]

  3. Localization improvement-related works [Ref.] [Ref.] [Ref.]

  4. Novel architectures or approaches [Ref.] [Ref.]

Occlusion is a difficult problem to tackle, and the trend in recent works is to estimate the shape of the hidden portion of the fruit by processing point cloud data. Having only a few portions of the fruit visible has been a persistent challenge in these works. False detections may lead to fruit and gripper damage, so there is a need, and scope, for robust detection and damage-free grasping. Incorporating a computer vision model that provides multiple grasping points, rather than the single point per object used by the CenterNet model in [Ref.], would increase the chances of safe and successful grasping. In the segmentation improvement works, the trend is towards utilizing information from the surroundings or determining the fruit axis. These approaches have attempted to incorporate tree branch or stem information; nonetheless, this does not apply to all cases and requires considerable data annotation time and effort for individual fruits. Moreover, labelling branches in hundreds or thousands of images is a time-intensive task. Transformer-based networks have also been utilized for mask generation.
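As a toy illustration of this idea (the coordinates and helper names here are assumptions, not drawn from any cited work), two detected key points such as the stem attachment and the fruit centre already define a fruit axis, from which a gripper closing direction perpendicular to the stem can be derived:

```python
import numpy as np

# Hypothetical key points in the camera frame (metres) -- assumed values,
# standing in for the output of a multi-key-point detector.
stem = np.array([0.10, 0.02, 0.55])    # stem attachment point
center = np.array([0.10, 0.06, 0.56])  # fruit centre

# Unit fruit axis, pointing from stem to centre.
axis = center - stem
axis /= np.linalg.norm(axis)

# Close the gripper perpendicular to the fruit axis so the grasp does not
# pull on the stem; here we take any direction orthogonal to the axis.
closing = np.cross(axis, np.array([0.0, 0.0, 1.0]))
closing /= np.linalg.norm(closing)

print(axis, closing)  # closing is orthogonal to the fruit axis
```

With a single centre point, as in CenterNet-style detection, no such axis is defined; the second key point is what enables an orientation-aware, stem-safe grasp.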

Instance segmentation masks are used more often than semantic segmentation masks in the related works, as they delineate individual fruits. Because mask quality is affected by obstruction from leaves, lighting conditions, surrounding fruits, branches, etc., a separate network for point cloud processing is often used to identify the grasping pose. In the object detection-related works, by contrast, the models generate an estimated rectangular bounding box for the fruit, and additional features such as the bottom or stem are integrated as grasping points. If all points inside the bounding box were used for point cloud filtering, in the way fruit points are obtained in the instance segmentation improvement works, background points and noise would also be included; hence instance segmentation networks are preferred over object detection models in recent works. The analysis of these works suggests that a suitable approach for robotic grasping should focus on distinct, identifiable key features of the fruit that support a quick judgment. Keeping in mind that occlusion is inevitable, an approach is needed that focuses on the available information and on how it can be combined to map the fruit shape accurately, making the robot grasping task easier. For instance, the fruit centre and stem are identifiable features that should be sufficient to judge the size in most situations. Therefore, an approach built around key features of the fruit would be a feasible solution; it could generalize well and simplify robotic harvesting.
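The point about bounding-box filtering can be shown with a minimal sketch (all arrays below are toy data, not from any cited work): an organized point cloud aligned with the image is filtered once with an instance mask and once with the enclosing bounding box, and the box inevitably retains extra background points.

```python
import numpy as np

# Toy organized point cloud (H x W x 3), pixel-aligned with the RGB image.
H, W = 6, 6
points = np.random.rand(H, W, 3)

# Hypothetical instance mask: only the 2x2 "fruit" region is labelled.
mask = np.zeros((H, W), dtype=bool)
mask[2:4, 2:4] = True                 # 4 fruit pixels

# Enclosing rectangular bounding box around the same fruit.
bbox = np.zeros((H, W), dtype=bool)
bbox[1:5, 1:5] = True                 # 16 pixels inside the rectangle

fruit_pts = points[mask]              # (4, 3): fruit surface points only
bbox_pts = points[bbox]               # (16, 3): fruit plus background/noise

print(len(fruit_pts), len(bbox_pts))  # 4 16
```

The 12 extra points kept by the box are exactly the background and noise that would corrupt any pose or shape estimate computed from the filtered cloud, which is why the instance-segmentation route is favoured.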

| Author, year | Methodology | Data Type | Key Innovations | Pros | Cons | Harvest / Grasp fruits |
|---|---|---|---|---|---|---|
| Li et al. (2022) [Ref.] | Occlusion work-around | RGB-D | Frustum point cloud fitting | Robust against occlusion | Structured farm testing | Yes, apples |
| Gong et al. (2022) [Ref.] | Occlusion work-around | RGB-D, infrared | Reconstruction with CNNs | Restoration of shape | Collision | Yes, tomatoes |
| Menon et al. (2022) [Ref.] | Occlusion work-around | RGB-D | Reconstruction with software | Less manual touch | Complicated | Yes, sweet peppers |
| Liu et al. (2022) [Ref.] | Occlusion work-around | RGB | Key point estimation | Circular bounding boxes | Not tested on robot | Yes, tomatoes |
| Yan et al. (2023) [Ref.] | Segmentation improvement | RGB-D | Transformer segmentation | Stem & grasping key points | Not tested on robot | Yes, pumpkin |
| Kang et al. (2020) [Ref.] | Segmentation improvement | RGB-D | DasNet | Fruit & branch segmentation | Obstruction from leaves | Yes, apples |
| Kang et al. (2020) [Ref.] | Segmentation improvement | RGB-D | Mobile-DasNet and PointNet | Robust fruit point cloud | Obstruction from other fruits | Yes, apples |
| Kang et al. (2021) [Ref.] | Segmentation improvement | RGB-D | YOLACT (You Only Look At Coefficients) & PointNet | Robust fruit point cloud | Tested in structured farm | Yes, apples |
| Lin et al. (2019) [Ref.] | Segmentation improvement | RGB-D | Branch normals prediction | Fruit axis estimation | Occlusion-affected results | Yes, guava |
| Lin et al. (2019) [Ref.] | Segmentation improvement | RGB-D | Gaussian mixture models | Adaptable for multiple fruits | Not tested on robot | Yes, citrus fruits |
| Yu et al. (2020) [Ref.] | Object detection | RGB-D | Oriented bounding boxes | Stem orientation | False detections | Yes, strawberries |
| Onishi et al. (2019) [Ref.] | Object detection | RGB-D | Underside grasping | Damage-free grasping | Vertical orientation only | Yes, apples |
| Chen et al. (2022) [Ref.] | Object detection | RGB | Vision-based impedance | Damage-free grasping | Planar surface grasping | No, apples, oranges |
| Lin et al. (2023) [Ref.] | Grasping rectangle proposals | RGB | Shape approximation | Works for unseen objects | Planar surface grasping | No, banana |
| Chen et al. (2023) [Ref.] | Reinforcement learning | RGB-D | Soft Actor-Critic (SAC) algorithm | Learning in simulation | Planar surface grasping | No, banana |

Table: Literature review