Occlusion in Sports Analytics: Overcoming Occlusion in Object Detection through Annotation and Post-Processing
Object detection is an important aspect of sports analytics, but occlusions, which arise when players or objects collide with or obstruct one another, significantly degrade detection accuracy. In this paper, we propose an approach to enhance object detection in sports footage by combining explicit annotation with post-processing. Frames in which occlusion is likely are annotated; a post-processing pipeline then identifies overlapping objects and discards the occluded frames. Object positions are interpolated from the non-occluded frames immediately before and after the occlusion, preserving continuity and accuracy. Experimental results demonstrate a significant improvement in detection precision and recall in occlusion-heavy sports scenarios, showing the effectiveness of this approach for real-time sports analytics. The proposed method enhances the robustness of object detection models, making them better suited to dynamic environments where occlusions frequently occur.
INTRODUCTION
In recent years, sports analytics has emerged as a crucial tool for optimizing player performance, refining team strategies, and enhancing the viewing experience. At the core of many sports analytics applications lies object detection, the ability to automatically identify and track players, balls, and other relevant entities in video footage. Whether in football, basketball, or tennis, the capacity to track player movements, analyze team formations, or monitor ball trajectories offers significant insights for teams, coaches, and spectators alike. However, one persistent challenge in object detection, particularly in the dynamic environment of sports, is occlusion. Occlusion occurs when objects or players overlap or are obscured from the camera's view, resulting in inaccuracies in detection and tracking algorithms.
This paper introduces a novel approach to mitigate the issue of occlusion in object detection within sports analytics. Our proposed solution centres on two key components: explicit annotation of boundary cases and post-processing of occluded frames. By integrating these techniques, we aim to significantly improve the accuracy and reliability of object detection in sports video, thus enhancing the robustness of these systems in real-time sports analytics applications.
1.1 The Role of Object Detection in Sports Analytics
Object detection is fundamental in sports analytics, playing a pivotal role in tracking player positions, calculating speed and distance, analyzing team formations, and assessing individual performance. Accurate detection enables teams to extract critical insights into game dynamics, evaluate performance metrics, and identify areas for potential injury prevention.
The rise of live sports data analytics has further underscored the need for advanced object detection systems. Automated systems that provide real-time statistics on player positions, game momentum, and predictive outcomes are transforming the spectator experience and the tools available to coaches. For example, live heatmaps of player movement in soccer or shot trajectory analysis in tennis provide a more immersive viewing experience, while coaches gain access to data-driven insights that can influence in-game decisions and long-term strategy.
Despite significant advances in deep learning and computer vision technologies, occlusions remain a major challenge in object detection systems, leading to inaccuracies that can compromise the quality of analytics.
RELATED WORK
Object detection is critical in applications such as tracking players and the ball, where visual occlusion is one of the major challenges. Drawing on advances in machine learning and computer vision, researchers have developed numerous techniques to handle occlusions. Here we review relevant work on occlusion handling for object detection in sports analytics, covering deep learning, tracking algorithms, multi-view systems, and real-time applications.
Utsumi et al. present a method for object detection and tracking in soccer broadcasts that uses color rarity and local edge properties. The approach includes field region extraction, noise reduction with a Laplacian or gradient filter, and player tracking through color-based template matching. It achieves high accuracy for non-occluded players while highlighting challenges with occlusions and fast camera movements.
Huda et al. address the challenge of counting soccer players in occluded scenarios using thermal cameras, leveraging machine learning for classification and maximum likelihood estimation to enhance accuracy. The methodology simulates player positions and occlusions to train a bagged tree classifier, extending player detection to outdoor settings, whereas previous research focused on indoor scenes.
Karakostas et al. (2021) propose a context-aware method for handling occlusion in object tracking that can be applied to sports analytics. Their paper, "Occlusion Detection and Drift-Avoidance Framework for 2D Visual Object Tracking", develops a framework that detects occlusions through spatial relations among objects and predicts occlusion events from the movement patterns of game actors and other objects in the scene. By using context such as player formations and motion trajectories, the system predicts the likelihood and manner of occlusions, improving tracking performance during occlusion periods.
Identifying the ball is one of the significant problems in sports analytics, made hard by the environment and by occlusion from players. According to Rezaei and Wu, ball detection is especially difficult in broadcast soccer videos because the ball is small, moves fast, and is often held in a player's possession for long periods (Rezaei & Wu, 2022). Naik and Hashmi agree, stating that existing methods do not detect the ball with sufficient accuracy during high-velocity movement and under occlusion (Naik & Hashmi, 2022). Abulwafa et al. add that, beyond occlusion, poor lighting and low color contrast are major obstacles to ball detection (Abulwafa et al., 2021).
Zhu and Peng (2015), in "A Boosted Multi-Task Model for Pedestrian Detection with Occlusion Handling", target pedestrian detection, but the principles carry over to sports analytics. The authors propose a multi-task model that addresses occlusions by learning relationships between occluded and non-occluded samples, thereby improving detection performance even in heavily occluded scenes. It could be applied in sports analytics to track players during partial or full occlusion by other players or by obstacles such as goal posts or referees.
PROPOSED WORK
In this section, we describe our methodology for overcoming occlusions in object detection for sports analytics. The method is based on two key techniques: explicit annotation of boundary cases and post-processing through temporal interpolation. This two-stage process aims to minimize the errors arising from occlusions, which occur most frequently when players overlap or block each other from the camera's view. The goal is to improve the accuracy of object detection systems in dynamic, fast-paced environments such as team sports.
2.1 Explicit Boundary Case Annotation
The first step in our proposed method is the explicit annotation of boundary cases where occlusion is likely or certain to take place. This step matters because it tells the system which frames are prone to occlusion and need special handling during object detection.
2.1.1 Identify Boundary Cases: The primary goal of this phase is to manually annotate frames at likely occlusion locations. In sports analytics, occlusions often occur predictably in certain scenarios, such as when multiple players group together near the ball, or near the basket in basketball, where player congestion is frequent. Other predictable scenarios include set pieces such as free kicks or corner kicks in soccer. The manual identification of boundary cases takes several factors into account:
Player Proximity: Where players are close together, frames are likely to contain occlusion. In soccer or basketball, for example, play frequently concentrates many players in small areas of the pitch or court. These frames are marked as boundary cases.
Object Overlap: When a player or an object such as the ball is likely to be wholly or partially occluded by another player, the frame is annotated. For instance, when a soccer player dribbling the ball is closely followed by another player, the ball may be occluded. Such scenarios are typical of dynamic sports environments.
Game Context: Certain events in a game are more likely to lead to occlusions. For example, free kicks, corners, or scrambles near the goal in soccer often lead to overlapping players. Similarly, fast breaks or congested plays near the rim in basketball frequently cause occlusions. By annotating these situations as boundary cases, the system is preemptively prepared to handle potential occlusions.
2.1.2 Manual Annotation Process: Once the boundary cases have been identified, human annotators review the video material and manually mark the frames. This requires annotators who understand both the sport and the subtleties of object detection, so that potentially heavily occluded frames are marked correctly.
Each marked frame carries additional metadata specifying the type of occlusion to expect (e.g., player-to-player occlusion, player-to-ball occlusion), the degree of occlusion (partial or full), and other contextual information that may help the post-processing step.
Although manual annotation is labor-intensive, it has several benefits:
Accuracy in Challenging Cases: High-quality manual annotation ensures that the system handles complex occlusion scenarios properly. Automatic detection systems often fail to recognize subtle occlusions or cases where the detection is ambiguous.
Informing Model Training: The annotated data can be used to train models that learn to identify occlusions in future datasets, potentially automating parts of the process in the long term.
The manual annotation phase lays the groundwork for the post-processing phase, where the actual object detection is carried out with the explicit knowledge of occlusion-prone frames.
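The per-frame metadata described above could be represented as a small record. The following is a minimal sketch only; the class names, field names, and example values are hypothetical and are not taken from the annotation tooling used in this work.

```python
from dataclasses import dataclass
from enum import Enum


class OcclusionType(Enum):
    # The two occlusion types named in the text.
    PLAYER_TO_PLAYER = "player_to_player"
    PLAYER_TO_BALL = "player_to_ball"


class OcclusionDegree(Enum):
    PARTIAL = "partial"
    FULL = "full"


@dataclass(frozen=True)
class BoundaryCaseAnnotation:
    """One manually marked frame (hypothetical schema)."""
    frame_index: int
    occlusion_type: OcclusionType
    degree: OcclusionDegree
    context: str = ""  # free-text game context, e.g. "corner kick"


def annotated_frames(annotations):
    """Return the set of frame indices marked as boundary cases."""
    return {a.frame_index for a in annotations}


# Illustrative values only, not real annotation data.
annotations = [
    BoundaryCaseAnnotation(120, OcclusionType.PLAYER_TO_PLAYER,
                           OcclusionDegree.PARTIAL, "corner kick"),
    BoundaryCaseAnnotation(121, OcclusionType.PLAYER_TO_BALL,
                           OcclusionDegree.FULL, "corner kick"),
]
marked = annotated_frames(annotations)
```

A set of marked frame indices is a convenient form for the post-processing step, since checking whether a frame was annotated is then a single membership test.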
2.1.3 Addressing Basic Image Limitations: While this approach performs well in complex, dynamic situations such as sports, it offers little benefit for simple or abstract images where occlusion is rare or absent. In simple images, object interactions are minimal, resulting in fewer boundary cases and reduced predictive accuracy for occlusions.
For instance, in static images or synthetic data where objects are placed on plain backgrounds, explicit annotation is less necessary. Our method is therefore most productive in environments where real-world dynamics, such as movement and interaction among players, often cause occlusions.
2.2 Post-Processing of Occluded Frames
The second phase of our proposed solution is post-processing, which addresses occlusion in the detection pipeline by leveraging temporal data from non-occluded frames. The goal is to discard unreliable detection results from occluded frames and then reconstruct object positions using data from before and after the occlusion event.
2.2.1 Occlusion Detection and Discarding: Once the object detection algorithm has run on the annotated frames, the system detects and removes occluded frames based on the manual annotations and automated overlap detection. It follows these steps:
Detection of Overlap: The system tracks the movement of each object from one frame to the next. When objects such as players overlap or occlude one another, it flags the affected video frames as unreliable. Overlap detection compares the positions and bounding boxes of objects across consecutive frames. When the overlapping region covers a large percentage of an object's bounding box, the system marks the frame as containing occlusion.
Occlusion Discarding: The system temporarily removes the identified occluded frames from the detection pipeline. This step is important because detection results in occluded frames are mostly false positives or missed detections. By discarding occluded frames, the system avoids introducing errors into the tracking data.
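The two steps above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in this work: boxes are assumed to be (x1, y1, x2, y2) tuples, overlap is measured within a single frame as the shared area divided by the smaller box's area, the 0.5 threshold is an arbitrary placeholder for "a large percentage", and a frame is discarded if it was either manually annotated or flagged by the overlap test. All function names are hypothetical.

```python
def intersection_area(a, b):
    """Area shared by two boxes given as (x1, y1, x2, y2)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, width) * max(0.0, height)


def box_area(box):
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def frame_is_occluded(boxes, threshold=0.5):
    """True if any pair of boxes shares at least `threshold` of the smaller box."""
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            shared = intersection_area(boxes[i], boxes[j])
            smaller = min(box_area(boxes[i]), box_area(boxes[j]))
            if smaller > 0 and shared / smaller >= threshold:
                return True
    return False


def discard_occluded(frames, annotated=frozenset(), threshold=0.5):
    """Split frames into kept detections and discarded frame indices.

    frames: dict mapping frame index -> list of boxes.
    annotated: frame indices manually marked as boundary cases (assumed input).
    """
    kept, discarded = {}, []
    for index in sorted(frames):
        if index in annotated or frame_is_occluded(frames[index], threshold):
            discarded.append(index)
        else:
            kept[index] = frames[index]
    return kept, discarded


# Illustrative boxes: the two objects overlap heavily only in frame 1.
frames = {
    0: [(0, 0, 10, 20), (30, 0, 40, 20)],
    1: [(0, 0, 10, 20), (4, 0, 14, 20)],
    2: [(0, 0, 10, 20), (30, 0, 40, 20)],
}
kept, discarded = discard_occluded(frames)
```

Dividing by the smaller box's area rather than by the union means that a small object such as the ball is flagged when it is mostly covered by a much larger player box, which a plain intersection-over-union score would tend to miss.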
2.2.2 Temporal Interpolation for Occluded Frames: To recover the data lost with the discarded frames, post-processing uses temporal interpolation. The underlying strategy is to fill in the missing positions of occluded objects using data from the neighboring non-occluded frames. This maintains continuity for the object detection system even though some data is unavailable because of occlusion.
Linear Interpolation: This estimates the position of the occluded object by assuming that it moves along a straight line between its positions before and after the occlusion. It is the simplest method, but it is workable for short occlusions in which the object's movement is relatively smooth and predictable.
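As a minimal sketch of linear interpolation across a gap, assume each object is summarized by a single (x, y) position, such as the centre of its bounding box, and that frames are evenly spaced in time. The function name and the example coordinates are hypothetical.

```python
def interpolate_positions(before, after):
    """Linearly interpolate positions for the frames strictly between two anchors.

    before, after: (frame_index, (x, y)) for the last non-occluded frame
    before the occlusion and the first non-occluded frame after it.
    Returns a dict mapping each missing frame index to an estimated (x, y).
    """
    (f0, (x0, y0)), (f1, (x1, y1)) = before, after
    if f1 <= f0:
        raise ValueError("'after' must be a later frame than 'before'")
    filled = {}
    for f in range(f0 + 1, f1):
        t = (f - f0) / (f1 - f0)  # fraction of the gap elapsed at frame f
        filled[f] = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    return filled


# Object seen at (100, 50) in frame 10 and at (140, 70) in frame 14;
# frames 11 to 13 were discarded as occluded.
filled = interpolate_positions((10, (100.0, 50.0)), (14, (140.0, 70.0)))
# filled == {11: (110.0, 55.0), 12: (120.0, 60.0), 13: (130.0, 65.0)}
```

The estimate degrades as the gap grows or when the object changes direction while hidden, which is why the straight-line assumption suits short occlusions best.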
2.2.3 Application to Sports Footage: To test the effectiveness of this post-processing method, we applied it to sports footage, specifically soccer and basketball, where occlusions are common because of the number of players on the field and the nature of play. The system tracked both players and the ball in several occlusion-heavy scenarios, including corner kicks, free throws, and congested plays near the basket.
Player Tracking: In soccer, players frequently occlude each other during corner kicks, free kicks, or when multiple players converge on the ball. By discarding occluded frames and using interpolation, the system maintained accurate player positions throughout the sequence.
Across the sports footage, interpolation decreased false detections and smoothed the tracking data, showing the advantages of post-processing in dealing with occlusion.
EXPERIMENTAL RESULTS AND DISCUSSION
We report here experimental results for our proposed method on occlusion handling in object detection for sports analytics. We experimented on two substantially different datasets: one of dynamic sports footage, such as soccer and basketball game streams, where occlusions are frequent, and one of basic images with negligible object interaction, where occlusions are rare. Overall, the results demonstrate large improvements in occlusion-heavy environments while also revealing the limits of our approach in simpler contexts.
3.1 Results on Sports Dataset
We used a sports dataset containing soccer and basketball footage annotated with occlusion-prone scenarios, and evaluated our method, explicit annotation of occluded frames followed by post-processing, on several key metrics.
3.1.1 Precision, Recall, F1 score: Precision and recall are among the most basic metrics for evaluating object detection systems. Both improved significantly, especially in scenarios where occlusions are hard to deal with. Explicitly marking the frames containing occlusion and post-processing them reduced false positives and improved the recovery of occluded objects.
Improved Precision: Our approach improved average precision by 12.5% across the different occlusion cases. The system also sharply reduced false alarms by rejecting detections that could not be relied on when occlusion was present in the frame.
Improved Recall: Our interpolation mechanism was able to recover from temporary occlusions. Because it predicted the locations of players and objects, including the ball, accurately even while they were not observable during an occlusion event, using information from the surrounding frames, we obtained average improvements in recall of up to 15.8%.
For example, in complex set-piece situations in soccer, such as corner kicks and free kicks, there was heavy occlusion because many players converged on the ball. In basketball, players who overlap beside the basket often hide critical parts of the scene, such as the ball or other important players. In all of these challenging situations, the post-processing method produced a smooth and correct detection stream.
F1 Score: The F1 score, which balances precision and recall, increased on average by 14.1%. This reflects the overall effectiveness of our method in dealing with occlusions and maintaining consistent detection of objects.
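For reference, the three metrics follow their standard definitions from counts of true positives, false positives, and false negatives. The sketch below uses made-up counts chosen only to show the arithmetic; they are not results from our experiments, and the function name is hypothetical.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Standard detection metrics from raw counts.

    precision = TP / (TP + FP)
    recall    = TP / (TP + FN)
    F1        = harmonic mean of precision and recall
    """
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative counts only: 80 correct detections, 20 false alarms, 20 misses.
precision, recall, f1 = precision_recall_f1(80, 20, 20)
```

Discarding unreliable detections lowers the false positive count and so raises precision, while interpolation turns would-be misses into recovered positions and so raises recall. The F1 score rises only when neither gain comes at the expense of the other.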
3.1.2 Occlusion Recovery Metric (ORM): Finally, we introduced the Occlusion Recovery Metric (ORM), which quantifies how well our system recovers object positions during occlusion events, that is, how closely the interpolated positions of occluded objects match manually annotated ground truth.
ORM: The mean ORM was 85.4%, which indicates that the estimated positions were quite close to the true positions of occluded objects. The metric provides a means of measuring the system's ability to maintain reliable object detection during occlusion events, when the objects' positions are not directly observable.
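No closed-form expression for the ORM is given here, so the following is only one plausible way such a score could be computed, offered as an assumption rather than as the definition behind the reported figure: the percentage of occluded frames in which the interpolated position falls within a fixed pixel tolerance of the ground-truth position. The function name, the 10-pixel tolerance, and the sample coordinates are all hypothetical.

```python
import math


def occlusion_recovery_rate(estimated, ground_truth, tolerance=10.0):
    """Percentage of occluded frames recovered within `tolerance` pixels.

    estimated, ground_truth: dicts mapping frame index -> (x, y).
    Frames missing from `estimated` count as not recovered.
    """
    if not ground_truth:
        raise ValueError("ground_truth must contain at least one frame")
    recovered = 0
    for frame, (gx, gy) in ground_truth.items():
        if frame in estimated:
            ex, ey = estimated[frame]
            if math.hypot(ex - gx, ey - gy) <= tolerance:
                recovered += 1
    return 100.0 * recovered / len(ground_truth)


# Illustrative data: three of the four interpolated positions are close
# to the annotated truth; frame 13 is off by 20 pixels.
ground_truth = {11: (110.0, 55.0), 12: (121.0, 60.0),
                13: (150.0, 65.0), 14: (160.0, 70.0)}
estimated = {11: (110.0, 55.0), 12: (120.0, 60.0),
             13: (130.0, 65.0), 14: (158.0, 71.0)}
score = occlusion_recovery_rate(estimated, ground_truth)
```

A threshold-based score of this kind is easy to interpret as a percentage, but it depends on the chosen tolerance; a mean distance error would be an alternative formulation.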
[Figure: detection results before occlusion handling and after occlusion handling]
3.2 Discussion
The results of our experiments show that the proposed approach is effective at increasing the accuracy of object detection in sports analytics, where occlusions are frequent and unavoidable. Explicitly annotating occlusion-sensitive frames and discarding unreliable detections reduces the false positives that would otherwise be produced. Recall rises substantially with occlusion interpolation, meaning that objects are still located even while they are occluded.
The experiments also demonstrate how context-dependent the evaluation of object detection systems is. In domains such as sports, where occlusion is prevalent, methods like ours are needed to ensure accurate and reliable object detection. With static or relatively simple image datasets, where occlusions are rarer, our method has little advantage.
Scope and Observation
In this paper we addressed the problem of occlusion in object detection within sports analytics, where the dynamic environments of soccer and basketball games introduce occlusions frequently because players or objects overlap. We examined explicit boundary case annotation together with post-processing to improve detection accuracy in regimes where classical methods are not particularly accurate. The paper shows how annotating frames with characteristics that make occlusion likely, such as crowded plays, enhanced the system's ability to deal with occlusions by discarding the unreliable frames and then filling in the missing positions using temporal interpolation to maintain object continuity.
Experimental results indicate that precision and recall increase significantly in occlusion-heavy situations, while the ORM indicates a recovery accuracy of 85.4%. Despite its effectiveness in complex sports settings, the approach provides relatively minor benefits in simpler settings with minimal object interaction. Advanced models such as RCNN might be the subject of future work to further enhance occlusion handling. Overall, the approach is a practical basis for developing accurate and reliable object detection systems for real-time sports analytics.
CONCLUSION AND FUTURE WORK
Here, we introduced an approach to address occlusions in object detection for sports analytics that combines explicit annotation of occlusion-prone frames with a post-processing interpolation mechanism through which occluded objects can be recovered with higher accuracy. Experiments on dynamic content from sports such as soccer and basketball reveal clear improvements in precision, recall, and F1 score relative to more traditional detection approaches. The method also performed well in cases with heavy occlusion, such as corner kicks in soccer and congested zones near the basket in basketball, where significant object overlap is expected.
Our experiments point to a promising outcome: the proposed method increases detection performance by decreasing false positives and improving the recovery of occluded objects. We designed and presented the Occlusion Recovery Metric to test how accurate our interpolation mechanism was. It scored high, showing that the method could "fill in the gaps" during occlusion events and thus support credible tracking of players and objects such as the ball. The method also appears robust in occlusion-rich scenes, which suggests robustness in the changing environment of sports events. However, when applied to simple static images with minimal object interaction and rare occlusions, where conventional object detection algorithms already perform well, its contribution is limited.
Future Work: Investigating RCNN as an Occlusion Approach
Although YOLO is effective for real-time object detection in sports analytics, future research could use RCNN to improve occlusion handling. The region proposals of RCNN and the instance segmentation of Mask RCNN predict partially occluded objects much better, especially in crowded sports scenarios such as soccer or basketball. However, RCNN models are too slow for real-time applications. Future work should concentrate on optimizing models like RCNN through pruning or GPU acceleration, or on hybrid approaches that combine YOLO's speed with RCNN's precision. Further improvement in detection and tracking in dynamic sports scenarios could be achieved by incorporating temporal models and deep learning into occlusion prediction.
REFERENCES
- Karakostas, I., Mygdalis, V., Tefas, A., & Pitas, I. (2021). Occlusion Detection and Drift-Avoidance Framework for 2D Visual Object Tracking. Signal Processing: Image Communication.
- Zhu, C., & Peng, Y. (2015). A Boosted Multi-Task Model for Pedestrian Detection with Occlusion Handling. IEEE Transactions on Image Processing.
- Abulwafa, A., Saleh, A., Saraya, M., & Ali, H. (2021). A new ball detection strategy for enhancing the performance of ball bees based on a fuzzy inference engine. International Journal of Intelligent Systems, 37(11), 9620-9654.
- Naik, B., Hashmi, M., Geem, Z., & Bokde, N. (2022). Deepplayer-track: player and referee tracking with jersey color recognition in soccer. IEEE Access, 10, 32494-32509.
- Rezaei, A., & Wu, L. (2022). Automated soccer head impact exposure tracking using video and deep learning. Scientific Reports, 12(1).
- Huda, N. U., Jensen, K. H., Gade, R., & Moeslund, T. B. Estimating the Number of Soccer Players using Simulation-based Occlusion Handling.
- Utsumi, O., Miura, K., Ide, I., Sakai, S., & Tanaka, H. An Object Detection Method for Describing Soccer Games from Video.