State-of-the-art

A wide range of approaches has emerged for vision-based robotic fruit harvesting. To give a holistic view of the trend, the state-of-the-art works are categorized by the problems they aim to solve during robotic grasping and harvesting:

  1. Occlusion-related works [Ref.] [Ref.] [Ref.] [Ref.]

  2. Segmentation improvement-related works [Ref.] [Ref.] [Ref.] [Ref.] [Ref.] [Ref.]

  3. Localization improvement-related works [Ref.] [Ref.] [Ref.]

  4. Novel architectures or approaches [Ref.] [Ref.]

Occlusion is a difficult problem to tackle, and the trend in recent works is to estimate the shape of the hidden portion of the fruit by processing point cloud data. Having only a few portions of the fruit visible has been a persistent challenge in these works. False detections may lead to fruit and gripper damage, so there is a need, and scope, for robust detection and damage-free grasping. Incorporating a computer vision model that provides multiple grasping points, rather than the single point per object used by the CenterNet model in [Ref.], would increase the chances of safe and successful grasping. In the segmentation improvement works, the trend is towards utilizing information from the surroundings or determining the fruit axis. These approaches have attempted to incorporate tree branch or stem information; nonetheless, this does not apply to all cases and requires considerable data annotation time and effort for individual fruits. Moreover, labelling branches in hundreds or thousands of images is a time-intensive task. Transformer-based networks have also been utilized for mask generation.
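As a toy illustration of this idea (the coordinates and helper names here are assumptions, not drawn from any cited work), two detected key points such as the stem attachment and the fruit centre already define a fruit axis, from which a gripper closing direction perpendicular to the stem can be derived:

```python
import numpy as np

# Hypothetical key points in the camera frame (metres) -- assumed values,
# standing in for the output of a multi-key-point detector.
stem = np.array([0.10, 0.02, 0.55])    # stem attachment point
center = np.array([0.10, 0.06, 0.56])  # fruit centre

# Unit fruit axis, pointing from stem to centre.
axis = center - stem
axis /= np.linalg.norm(axis)

# Close the gripper perpendicular to the fruit axis so the grasp does not
# pull on the stem; here we take any direction orthogonal to the axis.
closing = np.cross(axis, np.array([0.0, 0.0, 1.0]))
closing /= np.linalg.norm(closing)

print(axis, closing)  # closing is orthogonal to the fruit axis
```

With a single centre point, as in CenterNet-style detection, no such axis is defined; the second key point is what enables an orientation-aware, stem-safe grasp.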

Instance segmentation masks are used more often than semantic segmentation masks in the related works, as they delineate individual fruits. Because mask quality is affected by obstruction from leaves, lighting conditions, surrounding fruits, branches, etc., a separate network for point cloud processing is often used to identify the grasping pose. In the object detection-related works, by contrast, the models generate an estimated rectangular bounding box for the fruit, and additional features such as the bottom or stem are integrated as grasping points. If all points inside the bounding box were used for point cloud filtering, in the way fruit points are obtained in the instance segmentation improvement works, background points and noise would also be included; hence instance segmentation networks are preferred over object detection models in recent works. The analysis of these works suggests that a suitable approach for robotic grasping should focus on distinct, identifiable key features of the fruit that support a quick judgment. Keeping in mind that occlusion is inevitable, an approach is needed that focuses on the available information and on how it can be combined to map the fruit shape accurately, making the robot grasping task easier. For instance, the fruit centre and stem are identifiable features that should be sufficient to judge the size in most situations. Therefore, an approach built around key features of the fruit would be a feasible solution; it could generalize well and simplify robotic harvesting.
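The point about bounding-box filtering can be shown with a minimal sketch (all arrays below are toy data, not from any cited work): an organized point cloud aligned with the image is filtered once with an instance mask and once with the enclosing bounding box, and the box inevitably retains extra background points.

```python
import numpy as np

# Toy organized point cloud (H x W x 3), pixel-aligned with the RGB image.
H, W = 6, 6
points = np.random.rand(H, W, 3)

# Hypothetical instance mask: only the 2x2 "fruit" region is labelled.
mask = np.zeros((H, W), dtype=bool)
mask[2:4, 2:4] = True                 # 4 fruit pixels

# Enclosing rectangular bounding box around the same fruit.
bbox = np.zeros((H, W), dtype=bool)
bbox[1:5, 1:5] = True                 # 16 pixels inside the rectangle

fruit_pts = points[mask]              # (4, 3): fruit surface points only
bbox_pts = points[bbox]               # (16, 3): fruit plus background/noise

print(len(fruit_pts), len(bbox_pts))  # 4 16
```

The 12 extra points kept by the box are exactly the background and noise that would corrupt any pose or shape estimate computed from the filtered cloud, which is why the instance-segmentation route is favoured.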

| Author, year | Methodology | Data Type | Key Innovations | Pros | Cons | Harvest / Grasp fruits |
|---|---|---|---|---|---|---|
| Li et al. (2022) [Ref.] | Occlusion work-around | RGB-D | Frustum point cloud fitting | Robust against occlusion | Structured farm testing | Yes, apples |
| Gong et al. (2022) [Ref.] | Occlusion work-around | RGB-D, infrared | Reconstruction with CNNs | Restoration of shape | Collision | Yes, tomatoes |
| Menon et al. (2022) [Ref.] | Occlusion work-around | RGB-D | Reconstruction with software | Less manual touch | Complicated | Yes, sweet peppers |
| Liu et al. (2022) [Ref.] | Occlusion work-around | RGB | Key point estimation | Circular bounding boxes | Not tested on robot | Yes, tomatoes |
| Yan et al. (2023) [Ref.] | Segmentation improvement | RGB-D | Transformer segmentation | Stem & grasping key points | Not tested on robot | Yes, pumpkin |
| Kang et al. (2020) [Ref.] | Segmentation improvement | RGB-D | DasNet | Fruit & branch segmentation | Obstruction from leaves | Yes, apples |
| Kang et al. (2020) [Ref.] | Segmentation improvement | RGB-D | Mobile-DasNet and PointNet | Robust fruit point cloud | Obstruction from other fruits | Yes, apples |
| Kang et al. (2021) [Ref.] | Segmentation improvement | RGB-D | YOLACT (You Only Look At Coefficients) & PointNet | Robust fruit point cloud | Tested in structured farm | Yes, apples |
| Lin et al. (2019) [Ref.] | Segmentation improvement | RGB-D | Branch normals prediction | Fruit axis estimation | Occlusion-affected results | Yes, guava |
| Lin et al. (2019) [Ref.] | Segmentation improvement | RGB-D | Gaussian mixture models | Adaptable for multiple fruits | Not tested on robot | Yes, citrus fruits |
| Yu et al. (2020) [Ref.] | Object detection | RGB-D | Oriented bounding boxes | Stem orientation | False detections | Yes, strawberries |
| Onishi et al. (2019) [Ref.] | Object detection | RGB-D | Underside grasping | Damage-free grasping | Vertical orientation only | Yes, apples |
| Chen et al. (2022) [Ref.] | Object detection | RGB | Vision-based impedance | Damage-free grasping | Planar surface grasping | No, apples, oranges |
| Lin et al. (2023) [Ref.] | Grasping rectangle proposals | RGB | Shape approximation | Works for unseen objects | Planar surface grasping | No, banana |
| Chen et al. (2023) [Ref.] | Reinforcement learning | RGB-D | Soft Actor-Critic (SAC) algorithm | Learning in simulation | Planar surface grasping | No, banana |

Table: Literature review