AI Revolution in Football: One Camera to Rule Them All?
Table of Contents
- AI Revolution in Football: One Camera to Rule Them All?
- The Growing Role of AI in Football
- Democratizing AI: The Quest for a Single-Camera Solution
- WorldPose: A Leap Towards Single-Camera Analysis
- Monocular Pose Estimation: Reconstructing 3D from 2D
- FIFA’s Challenge: Pushing the Boundaries of AI in Sports
- The Devil in the Details: Calibration and Synchronization
- Zooming In on the Problem: Limitations of Current Technology
- The Future of Football Analysis: An Open Invitation
Researchers are working to make AI-driven football analysis more accessible by developing systems that rely on a single camera, which could bring advanced analytics to all levels of the sport and change how matches are analyzed and officiated.
The Growing Role of AI in Football
Artificial intelligence (AI) is increasingly common in professional football, where it helps analyze player movements and supports referees in crucial decisions such as offside calls. Semi-automated offside technology (SAOT), used by Video Assistant Referees (VAR), aims to make outcomes fairer by digitally tracking player positions and movements in real time [[1]].
Currently, such computer-aided systems are largely confined to major tournaments because of their cost and complexity. They typically require ten to twelve synchronized static cameras positioned around the stadium to capture play from multiple angles. As computer scientist Tianjian Jiang puts it, "All cameras must be perfectly synchronized so that it results in a precise digital image."
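Why do multi-camera systems need both calibration and synchronization? A minimal sketch, assuming two calibrated cameras whose positions and viewing rays toward a player are already known: each camera contributes a 3D ray, and the player's position is estimated where the rays (nearly) meet. The midpoint method used here is a textbook triangulation technique chosen for illustration, not a description of how SAOT works, and the camera positions and player coordinates are hypothetical. The second case shows what a missing sync does: if one camera's frame is captured a fraction of a second later, its ray points at where the player has moved to, and the triangulated position drifts.

```python
import math

Vec3 = tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def triangulate_midpoint(c1: Vec3, d1: Vec3, c2: Vec3, d2: Vec3) -> Vec3:
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2."""
    w0 = sub(c1, c2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; cannot triangulate")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    closest_on_ray1 = add(c1, scale(d1, s))
    closest_on_ray2 = add(c2, scale(d2, t))
    return scale(add(closest_on_ray1, closest_on_ray2), 0.5)


# Hypothetical setup: two elevated cameras, coordinates in metres.
cam1: Vec3 = (0.0, -40.0, 15.0)
cam2: Vec3 = (40.0, 0.0, 15.0)
player: Vec3 = (10.0, 5.0, 1.0)

# Synchronized: both rays point at the player's position at the same instant.
synced = triangulate_midpoint(cam1, sub(player, cam1), cam2, sub(player, cam2))

# Unsynchronized: camera 2's frame is 0.1 s late; a player sprinting at
# about 8 m/s has moved 0.8 m along x in that time.
moved: Vec3 = (10.8, 5.0, 1.0)
desynced = triangulate_midpoint(cam1, sub(player, cam1), cam2, sub(moved, cam2))

print(f"synchronized error:   {math.dist(synced, player):.3f} m")
print(f"unsynchronized error: {math.dist(desynced, player):.3f} m")
```

With synchronized frames the two rays meet exactly at the player; with a 0.1 s offset the estimate lands roughly a quarter of a metre away, which is already significant for a tight offside call.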
Democratizing AI: The Quest for a Single-Camera Solution
Researchers at the Advanced Interactive Technologies (AIT) Lab of ETH Zurich, in collaboration with FIFA, are working to broaden access to AI in football. Their main objective is to simplify existing systems by reducing the number of cameras they require to just one, which would significantly lower the barrier to entry for smaller leagues and clubs.
The rationale is that virtually every professional football match is already filmed by a main broadcast camera positioned along the sideline, and an estimated 75% of televised match footage comes from this camera. Building on this existing resource could dramatically reduce the infrastructure needed for AI-driven analysis.
WorldPose: A Leap Towards Single-Camera Analysis
While a fully reliable single-camera analysis system is still several years away, the AIT Lab has reached a significant milestone by digitizing nearly 50 minutes of footage from the 2022 World Cup. The resulting dataset, WorldPose, contains more than 2.5 million individual player poses in 3D, enabling detailed analysis of player positioning and movement both with and without the ball.
In machine learning, this task is known as pose estimation. Because computers lack humans' intuitive visual understanding, they rely on large amounts of data to identify and interpret the positions and movements of people and objects in a scene. Through training, they learn to extract meaningful details from image and video data, recognize patterns, and make informed decisions [[2]]. Learning algorithms let the machine derive these patterns from examples rather than from explicitly programmed rules [[3]].
Monocular Pose Estimation: Reconstructing 3D from 2D
Existing algorithms can generate three-dimensional representations of objects and bodies directly from two-dimensional images. Monocular Pose Estimation (MPE) lets a computer determine the location and movement of people and objects from the images of a single camera. Without the depth information provided by 3D cameras or multi-camera setups, the computer has to infer posture and movement trajectories from the 2D image alone.
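The core difficulty is visible in a simple pinhole camera model, assumed here for illustration (real broadcast lenses also introduce distortion): a point with camera coordinates (X, Y, Z) lands on pixel (f*X/Z + cx, f*Y/Z + cy). Every point along the same viewing ray, such as one twice as far away and twice as far off-axis, lands on exactly the same pixel, so a single image does not fix depth. The focal length `F` and principal point `CX`, `CY` below are hypothetical values.

```python
def project(point: tuple[float, float, float], f: float, cx: float, cy: float) -> tuple[float, float]:
    """Pinhole projection of a point given in camera coordinates (Z points forward)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return (f * x / z + cx, f * y / z + cy)


F, CX, CY = 1000.0, 960.0, 540.0  # hypothetical intrinsics for a 1920x1080 image

near = (1.0, 0.5, 10.0)
far = tuple(2.0 * v for v in near)  # same viewing ray, twice the depth

print(project(near, F, CX, CY))  # (1060.0, 590.0)
print(project(far, F, CX, CY))   # (1060.0, 590.0)
```

Because both points produce identical pixels, an MPE method has to resolve depth from other cues, such as typical body proportions or the known geometry of the pitch.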
While current MPE methods are good at predicting the poses of individual players, they struggle to stay accurate over the large distances involved in football, where players spread across the whole pitch for 90 minutes. "We wanted to find an algorithm that is precise enough even at larger distances," Jiang explains.
FIFA’s Challenge: Pushing the Boundaries of AI in Sports
In 2021, FIFA partnered with ETH Zurich to create a comprehensive dataset for training AI systems in pose estimation and to evaluate the capabilities of existing MPE methods. FIFA provided video sequences from the 2022 World Cup in Qatar, captured using various camera types, along with detailed measurements of the playing fields.
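Evaluating MPE methods against a reference dataset means comparing predicted 3D joint positions with reference ones. A common score is the mean per-joint position error (MPJPE), the average Euclidean distance between matching joints. The sketch below uses three hypothetical joints to contrast the error in global pitch coordinates with the root-relative variant that many pose benchmarks report: a correctly shaped pose placed 2 m off scores perfectly once both skeletons are aligned at the root joint. Which metrics the ETH team actually used is not stated here.

```python
import math

Joint = tuple[float, float, float]


def mpjpe(pred: list[Joint], gt: list[Joint]) -> float:
    """Mean per-joint position error: average Euclidean distance between matching joints."""
    if not gt or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def root_relative(joints: list[Joint], root: int = 0) -> list[Joint]:
    """Express every joint relative to the root joint (e.g. the pelvis)."""
    rx, ry, rz = joints[root]
    return [(x - rx, y - ry, z - rz) for x, y, z in joints]


# Hypothetical skeleton in pitch coordinates (metres): pelvis, neck, head.
gt = [(10.0, 5.0, 1.0), (10.0, 5.0, 1.5), (10.0, 5.0, 1.8)]
# Prediction with the right body shape but placed 2 m too far along x.
pred = [(x + 2.0, y, z) for x, y, z in gt]

global_error = mpjpe(pred, gt)
relative_error = mpjpe(root_relative(pred), root_relative(gt))
# prints: global MPJPE: 2.00 m, root-relative MPJPE: 0.00 m
print(f"global MPJPE: {global_error:.2f} m, root-relative MPJPE: {relative_error:.2f} m")
```

For analysing player positions on the pitch, the global number is the one that matters, since a root-relative score can hide exactly the kind of placement error that decides an offside call.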
The three-year project, a long time in the rapidly evolving field of AI, ran into unexpected challenges early on. "At the beginning, we expected to have an exact dataset quickly. At the time, we already had a system that could digitally represent poses and movements, and we assumed that it could simply be transferred to the World Cup recordings," Jiang recalls.
The researchers soon discovered the complexities of scaling the system to handle large datasets, encountering issues such as player occlusion, motion blur, and camera distortions.
The Devil in the Details: Calibration and Synchronization
To ensure accurate alignment between real and digital players, the researchers meticulously calibrated video recordings from multiple static cameras, accounting for variations in focal length, sensor size, and optical distortions. Digital reference lines were overlaid onto the camera images to visually assess the calibration accuracy.
"If the calibration is right, the digital field line overlaps perfectly with the real field line – from all perspectives," Jiang notes, emphasizing the importance of precise calibration.
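Checking a calibration against the field lines amounts to measuring reprojection error: project known points of a pitch line through the calibrated camera model and compare them with where the line actually appears in the image. A minimal sketch, assuming a pinhole camera with a world-to-camera rotation `R` and translation `t`, a focal length in pixels derived from the lens focal length and sensor width, and a simple two-coefficient radial distortion term (`k1`, `k2`). All function names, camera values and the synthetic "detected" pixels are hypothetical; the article does not say which camera model the researchers used. To keep the numbers easy to follow, the example camera faces the pitch plane head-on from 100 m away.

```python
import math

Vec3 = tuple[float, float, float]
Mat3 = tuple[Vec3, Vec3, Vec3]


def focal_length_px(f_mm: float, sensor_width_mm: float, image_width_px: int) -> float:
    """Convert a lens focal length to pixels using the sensor width."""
    return f_mm * image_width_px / sensor_width_mm


def project(point: Vec3, R: Mat3, t: Vec3, f: float, cx: float, cy: float,
            k1: float = 0.0, k2: float = 0.0) -> tuple[float, float]:
    """Project a world point to pixels: rigid transform, pinhole, radial distortion."""
    # World -> camera coordinates: Xc = R @ X + t
    xc, yc, zc = (sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3))
    if zc <= 0:
        raise ValueError("point is behind the camera")
    xn, yn = xc / zc, yc / zc              # normalized image coordinates
    r2 = xn * xn + yn * yn
    radial = 1.0 + k1 * r2 + k2 * r2 * r2  # radial distortion factor
    return (f * xn * radial + cx, f * yn * radial + cy)


def mean_reprojection_error(world_pts: list[Vec3], detected_px: list[tuple[float, float]],
                            **camera) -> float:
    """Average pixel distance between projected model points and detected image points."""
    if not world_pts or len(world_pts) != len(detected_px):
        raise ValueError("need matching, non-empty point lists")
    errors = [math.dist(project(p, **camera), d) for p, d in zip(world_pts, detected_px)]
    return sum(errors) / len(errors)


# Hypothetical lens and sensor: a 25 mm lens on a 12.8 mm-wide sensor, 1920 px wide.
print(focal_length_px(25.0, 12.8, 1920))  # roughly 3750 px

IDENTITY: Mat3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
camera = dict(R=IDENTITY, t=(0.0, 0.0, 100.0), f=1000.0, cx=960.0, cy=540.0)

# Sample points on the halfway line (x = 0) of the pitch plane (z = 0).
halfway_line = [(0.0, y, 0.0) for y in (-30.0, -10.0, 10.0, 30.0)]
# Synthetic "detections": the real line appears 2 px to the right of the model.
detected = [(u + 2.0, v) for u, v in (project(p, **camera) for p in halfway_line)]

print(mean_reprojection_error(halfway_line, detected, **camera))  # 2.0
```

A well-calibrated camera drives this error toward zero for every visible line; doing the same check from each camera's viewpoint corresponds to the lines overlapping "from all perspectives".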
Using the calibrated parameters of the static cameras, the computer can estimate player poses and movement paths, representing each body with the SMPL model, a standard parametric body model in computer vision. This data is then used to calibrate the movable broadcast camera, accounting for its movements and zoom.
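A broadcast camera typically stays mounted in one place while it pans, tilts and zooms, so one way to model it, assumed here for illustration, is a fixed camera position plus a per-frame pan angle, tilt angle and focal length. Calibrating the moving camera then means estimating those few values for every frame. The sketch below (hypothetical function and parameter names; lens distortion ignored) projects a pitch point for a given pan, tilt and zoom setting. Doubling the focal length doubles every point's offset from the image centre, which is why an error in the estimated zoom shifts where players appear to stand.

```python
import math

Vec3 = tuple[float, float, float]


def ptz_project(point: Vec3, cam_pos: Vec3, pan: float, tilt: float,
                f: float, cx: float, cy: float) -> tuple[float, float]:
    """Project a world point (z up, metres) for a camera with the given pan/tilt (radians).

    At pan = tilt = 0 the camera looks along world +y; positive tilt looks down,
    positive pan turns the camera counter-clockwise as seen from above.
    """
    cp, sp = math.cos(pan), math.sin(pan)
    ct, st = math.cos(tilt), math.sin(tilt)
    right = (cp, sp, 0.0)               # camera x-axis (image right)
    down = (sp * st, -cp * st, -ct)     # camera y-axis (image down)
    forward = (-sp * ct, cp * ct, -st)  # camera z-axis (viewing direction)
    rel = tuple(p - c for p, c in zip(point, cam_pos))
    xc, yc, zc = (sum(a * b for a, b in zip(axis, rel)) for axis in (right, down, forward))
    if zc <= 0:
        raise ValueError("point is behind the camera")
    return (f * xc / zc + cx, f * yc / zc + cy)


# Hypothetical broadcast camera 50 m from the centre spot and 15 m high.
CAM: Vec3 = (0.0, -50.0, 15.0)
TILT = math.atan2(15.0, 50.0)  # tilted down so it aims at the centre spot (0, 0, 0)
target: Vec3 = (5.0, 0.0, 0.0)  # a player 5 m from the centre spot along the pitch

wide = ptz_project(target, CAM, 0.0, TILT, f=1500.0, cx=960.0, cy=540.0)
zoomed = ptz_project(target, CAM, 0.0, TILT, f=3000.0, cx=960.0, cy=540.0)
print(f"offset from centre: wide {wide[0] - 960.0:.1f} px, zoomed {zoomed[0] - 960.0:.1f} px")
```

The centre spot stays at the image centre in both shots, while the player's horizontal offset doubles with the focal length.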
Zooming In on the Problem: Limitations of Current Technology
Analysis of the WorldPose dataset revealed that existing MPE methods struggle with the demands of single-camera analysis, particularly the broadcast camera's frequent zooming. Pose estimation performs well in small spaces and for individual movements, but it has difficulty determining the relative positions of multiple players, especially when the camera zooms.
"This confirmed that a lot of research is still required for a functioning and stable system," Jiang concludes.
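One way to see why zoom complicates relative positions: under a pinhole model, a player of height H at distance Z appears f*H/Z pixels tall, so a zoomed-in shot of a distant player can look identical to a wide shot of a nearby one. If a single-camera system estimated depth from apparent size, any error in the estimated focal length would scale every depth estimate, and with it the gaps between players. The numbers below, including the assumed 1.8 m player height, are hypothetical, and real MPE methods rely on learned cues rather than this single formula; the sketch illustrates the ambiguity, not the ETH analysis.

```python
def apparent_height_px(f_px: float, height_m: float, depth_m: float) -> float:
    """Pinhole model: on-screen height of an upright player at the given depth."""
    return f_px * height_m / depth_m


def depth_from_height(f_px: float, height_m: float, height_px: float) -> float:
    """Invert the pinhole model, assuming the player's real height is known."""
    return f_px * height_m / height_px


HEIGHT = 1.8  # assumed player height in metres

# A wide shot of a near player and a 2x zoom on a player twice as far away
# produce the same on-screen height.
near_wide = apparent_height_px(2000.0, HEIGHT, 20.0)
far_zoomed = apparent_height_px(4000.0, HEIGHT, 40.0)

# Two players 20 m and 30 m from the camera, filmed at f = 2000 px ...
heights_px = [apparent_height_px(2000.0, HEIGHT, z) for z in (20.0, 30.0)]
# ... but the zoom is misestimated as f = 2200 px (10% too high).
estimated = [depth_from_height(2200.0, HEIGHT, h) for h in heights_px]

print(near_wide, far_zoomed)
print(f"estimated depths: {estimated[0]:.1f} m and {estimated[1]:.1f} m; "
      f"gap {estimated[1] - estimated[0]:.1f} m instead of 10.0 m")
```

A 10% error in the zoom estimate turns a 10 m gap between two players into an 11 m gap, the kind of relative-position error described above.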
The Future of Football Analysis: An Open Invitation
To accelerate progress, FIFA has launched an Innovation Challenge that makes the WorldPose dataset and video sequences from broadcast cameras available to researchers worldwide. The goal is to spur the development of algorithms that can deliver accurate AI analysis from a single movable camera.
"Sharing the data with others could accelerate research in this area," Jiang explains. "If models that analyze footage from a single camera reach the same quality as our dataset, the technology will be widely usable."
More than 150 researchers from around the world have registered for the competition so far. ETH Zurich is also continuing its own research, refining the dataset and developing new models. If single-camera analysis matures, AI could enhance officiating, player development, and fan engagement well beyond the major tournaments.
