Samantha Wu

GymVision

An AI tool that analyzes gymnastics technique from video using pose detection to estimate movement quality and judging metrics.

Role

  • Product concept
  • Computer vision experimentation
  • AI pipeline prototyping
  • Interface prototyping (Cursor)

Tools

  • Python
  • MediaPipe
  • Cursor
  • Computer vision (pose detection)

Duration

  • ~1.5 weeks

Why I built this

As a collegiate gymnast, I noticed that athletes often rely on subjective feedback from coaches when evaluating form and technique. I wanted to explore whether computer vision could automatically analyze gymnastics skills from video and provide objective feedback on execution.

PROBLEM

Challenges in Gymnastics Skill Analysis

  • Judging relies heavily on human observation
  • Small details (split angles, body alignment) affect scoring
  • Video review is time-consuming
  • Athletes often struggle to measure progress objectively

This raised the central question:

Could computer vision analyze gymnastics technique automatically?

Project Goal

Build a prototype that can:

  • Detect a gymnast performing a split jump
  • Calculate the split angle
  • Provide a numerical result that reflects technique quality

APPROACH

Technical Approach

Pose Estimation:

I used a pose detection model to track key body landmarks (hips, knees, ankles) across each frame. To improve consistency, the system dynamically focuses on the gymnast by cropping and centering the region of interest, while filtering out low-confidence detections.
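Concretely, the confidence-filtering step can be sketched as a small helper. In the real pipeline the landmarks come from MediaPipe Pose, which reports each joint as normalized `(x, y)` coordinates plus a `visibility` score; the plain-dict format and the 0.5 threshold below are illustrative assumptions, not the project's exact values:

```python
# Minimal sketch of confidence filtering. In the real pipeline each landmark
# comes from MediaPipe Pose as normalized (x, y) plus a visibility score;
# here landmarks are modeled as a plain dict so the logic is self-contained.

MIN_VISIBILITY = 0.5  # assumed threshold; would be tuned per footage quality

def filter_landmarks(landmarks, min_visibility=MIN_VISIBILITY):
    """Keep only joints detected with sufficient confidence.

    landmarks: dict mapping joint name -> (x, y, visibility), with x and y
    normalized to [0, 1] the way MediaPipe reports them.
    """
    return {
        name: (x, y)
        for name, (x, y, vis) in landmarks.items()
        if vis >= min_visibility
    }

frame_landmarks = {
    "left_hip":    (0.45, 0.50, 0.98),
    "right_hip":   (0.55, 0.50, 0.97),
    "left_ankle":  (0.10, 0.52, 0.91),
    "right_ankle": (0.90, 0.51, 0.23),  # occluded joint -> dropped
}
kept = filter_landmarks(frame_landmarks)
```

Dropping low-visibility joints early keeps a single occluded ankle from corrupting the downstream angle math.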

Split Angle Calculation:

Instead of relying on a single joint angle, I calculated the split using the relationship between both legs and the hips. This allowed the system to more accurately capture variations like uneven splits and oversplits, which are common in gymnastics.
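That relationship can be expressed as the angle between two leg vectors drawn from the hip midpoint to each ankle. A pure-geometry sketch, with illustrative coordinates in MediaPipe's normalized image space:

```python
import math

def split_angle(left_ankle, right_ankle, left_hip, right_hip):
    """Angle in degrees between the two leg vectors, each measured from
    the hip midpoint to an ankle. 180 degrees = a flat split."""
    hip_cx = (left_hip[0] + right_hip[0]) / 2
    hip_cy = (left_hip[1] + right_hip[1]) / 2
    v1 = (left_ankle[0] - hip_cx, left_ankle[1] - hip_cy)
    v2 = (right_ankle[0] - hip_cx, right_ankle[1] - hip_cy)
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos))

# Flat split: ankles level with the hip midpoint on opposite sides.
angle = split_angle((0.10, 0.50), (0.90, 0.50), (0.45, 0.50), (0.55, 0.50))
```

Because the angle is anchored at the hip midpoint rather than a single joint, an uneven split shows up as asymmetric leg vectors instead of being averaged away, and an oversplit simply exceeds 180° when the per-leg components are summed.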

Robustness in Real-World Video:

To handle inconsistent video quality, I built a fallback system that adapts to different conditions. The model retries detection using enhanced versions of the frame and selects the most reliable result. When multiple people appear, the system evaluates several factors (movement, position, pose quality) to consistently track the correct athlete.

Output Generation:

The system processes the full video to identify the peak split moment, then outputs:

  • the best frame
  • the calculated split angle
  • a processed video with pose overlays
  • structured data for further analysis
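The structured data can be pictured as a JSON report like the sketch below; the field names and values here are illustrative assumptions, not GymVision's exact schema:

```python
import json

# Illustrative shape of the structured report; every field name and value
# here is an assumption for demonstration, not the tool's actual output.
report = {
    "peak_frame_index": 142,
    "max_split_angle_deg": 176.4,
    "split_type": "normal",          # normal | lopsided | oversplit
    "torso_lean_deg": 8.2,
    "knee_straightness_deg": 174.9,
    "frames_analyzed": 480,
    "frames_skipped": 37,
}

payload = json.dumps(report, indent=2)
```

A flat JSON document like this is easy to diff across training sessions, which is what makes the "further analysis" step possible.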

Video Processing Workflow

Video processing workflow: upload, MediaPipe pose detection, maximum split frame, angle calculation, and result display

Prototype Demo

Key Challenges + Solutions

Tracking the Correct Person

Problem: Multiple people on screen confused pose detection. In competition footage, for example, the athlete, coach, judges, and spectators can all appear in frame, causing the model to lock onto the wrong pose.

Approach: I developed a multi-factor selection system to consistently track the correct athlete. Instead of relying on a single heuristic, the model evaluates movement, position within the frame, pose visibility, and detection quality to identify the most likely subject.
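A minimal sketch of such a multi-factor scorer, with illustrative (untuned) weights and hypothetical candidate fields:

```python
def athlete_score(candidate, frame_center=(0.5, 0.5)):
    """Combine several weak signals into one score. The weights below are
    illustrative assumptions, not tuned values from the project."""
    cx, cy = candidate["center"]          # normalized bounding-box center
    dist = ((cx - frame_center[0]) ** 2 + (cy - frame_center[1]) ** 2) ** 0.5
    centrality = 1.0 - min(dist, 1.0)     # closer to frame center -> higher
    return (
        0.4 * candidate["movement"]       # frame-to-frame motion, 0..1
        + 0.3 * centrality
        + 0.2 * candidate["visibility"]   # mean landmark visibility
        + 0.1 * candidate["det_conf"]     # detector confidence
    )

def pick_athlete(candidates):
    return max(candidates, key=athlete_score)

people = [
    {"id": "judge",   "center": (0.1, 0.8), "movement": 0.05,
     "visibility": 0.90, "det_conf": 0.90},
    {"id": "athlete", "center": (0.5, 0.5), "movement": 0.95,
     "visibility": 0.80, "det_conf": 0.85},
]
best = pick_athlete(people)
```

No single signal is decisive here: a stationary judge can be larger and more visible than the athlete, but the weighted combination still favors the moving, centered subject.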

Key Insight: Simple heuristics (like choosing the largest person) were unreliable. Combining multiple signals led to more stable tracking across frames.

Video Quality

Problem: Low-quality or distant footage reduced pose detection accuracy, especially during fast movements.

Approach: I implemented a fallback system that processes each frame in multiple ways (e.g., contrast adjustments, upscaling) and selects the version with the most reliable landmark detection.
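The fallback loop can be sketched generically: run detection on the raw frame, then on each enhanced variant, and keep whichever attempt yields the highest confidence. The `detect` and `enhancers` callables below stand in for the real MediaPipe call and OpenCV-style enhancements, so the sketch stays library-agnostic:

```python
def detect_with_fallback(frame, detect, enhancers):
    """Try detection on the raw frame and on each enhanced variant,
    returning the attempt with the highest confidence.

    detect(frame) -> (landmarks, confidence). enhancers are callables
    such as contrast adjustment or upscaling (hypothetical stand-ins for
    the real OpenCV-style transforms).
    """
    best = detect(frame)
    for enhance in enhancers:
        candidate = detect(enhance(frame))
        if candidate[1] > best[1]:
            best = candidate
    return best

# Stub detector for demonstration: confidence improves on the enhanced frame.
def fake_detect(frame):
    return ({"left_hip": (0.4, 0.5)}, 0.9 if frame == "bright" else 0.4)

landmarks, conf = detect_with_fallback("raw", fake_detect, [lambda f: "bright"])
```

The cost is extra detection passes per frame, which is an acceptable trade in an offline analysis tool.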

Key Insight: Instead of relying on a single input, adapting dynamically to video quality significantly improved detection consistency.

Frame Selection

Problem: The peak split occurs in a very short moment, making it difficult to reliably capture the correct frame.

Approach: The system analyzes the full video and identifies the frame with the maximum valid split angle, while filtering out low-confidence detections.

Key Insight: A global, frame-by-frame evaluation was more reliable than trying to detect the peak moment in real time.
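That global pass reduces to a filter-then-max over per-frame measurements; the confidence threshold here is an illustrative assumption:

```python
def peak_split_frame(frames, min_confidence=0.6):
    """Global pass over per-frame measurements: drop low-confidence
    detections, then take the frame with the maximum split angle.
    The threshold is an assumed value for illustration."""
    valid = [f for f in frames if f["confidence"] >= min_confidence]
    if not valid:
        return None
    return max(valid, key=lambda f: f["angle"])

measurements = [
    {"frame": 10, "angle": 150.0, "confidence": 0.9},
    {"frame": 11, "angle": 179.0, "confidence": 0.3},  # blurry peak -> rejected
    {"frame": 12, "angle": 172.0, "confidence": 0.8},
]
peak = peak_split_frame(measurements)
```

Note how the largest raw angle loses to a slightly smaller but trustworthy one: filtering before the max is what keeps motion-blur artifacts from being reported as the peak.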

RESULTS

From a single video, GymVision outputs:

  • a processed video with pose overlay
  • the peak split frame
  • a structured report for analysis

The system identifies:

  • maximum split angle
  • split type (normal, lopsided, oversplit)

At the exact peak frame, it also captures:

  • torso lean
  • knee straightness
  • landmark confidence
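Knee straightness reduces to the interior angle at the knee (hip–knee–ankle), where 180° means a fully straight leg. A sketch of that joint-angle measurement, with illustrative coordinates:

```python
import math

def joint_angle(a, b, c):
    """Interior angle in degrees at point b, formed by points a-b-c.
    For knee straightness: a = hip, b = knee, c = ankle; 180 = straight."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

# Collinear hip-knee-ankle -> fully straight leg.
straight = joint_angle((0.3, 0.5), (0.5, 0.5), (0.7, 0.5))
```

The same three-point formula could measure torso lean by substituting shoulder, hip, and a vertical reference point.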

Transparency & Debugging

The analysis output also includes:

  • landmark visibility for key joints
  • number of frames analyzed vs. skipped
  • downloadable outputs (video, image, JSON)

GymVision analysis results dashboard showing split angle, torso lean, knee straightness, landmark confidence, and processing information

Limitations

Performance is strongest on clear, single-athlete footage where key joints remain visible. Crowded scenes, occlusion, or low-resolution videos still challenge pose detection. While the system adapts using cropping and image enhancements, it remains a prototype rather than a fully robust production system.

Summary

GymVision demonstrates an end-to-end pipeline: upload a video once, and receive a synchronized visual and quantitative breakdown of the skill.

REFLECTION

Future Expansions

Expanding to Similar Skills (Short-Term)

The next step is extending the model to analyze skills that share a similar structure to the split jump, such as switch leaps and split ring jumps. These skills rely on similar body mechanics (leg extension, hip positioning, and timing), making them a natural progression for improving the model while reusing much of the existing pipeline.

Generalizing Across Beam Skills (Mid-Term)

Beyond individual leap variations, the system could expand to a broader set of beam elements, including both jumps and acrobatic skills. This would require adapting the model to recognize different movement patterns and defining new metrics for evaluating execution.

Full Routine Analysis (Long-Term Vision)

The long-term goal is to move beyond single-skill analysis and evaluate entire beam routines. This would involve:

  • segmenting routines into individual skills
  • identifying each skill automatically
  • generating a full breakdown of performance across a routine

Ultimately, this could enable a system that provides structured, objective feedback similar to how judges evaluate routines.

Reflection

This project was my first experience working with computer vision, and I approached it without any prior background in the field. Much of the process involved learning through experimentation, testing different approaches, and iterating on what worked in practice.

One of the biggest challenges I encountered was how sensitive computer vision systems are to real-world conditions. Small changes in video quality, camera distance, or background complexity could significantly affect detection accuracy. This highlighted how important it is to design systems that can handle imperfect input, rather than assuming ideal conditions.

I also learned that building AI systems requires more than just applying a model—it involves defining clear rules, metrics, and constraints to translate messy real-world data into meaningful outputs. In this case, even something as simple as identifying a ‘peak split’ required careful decisions about frame selection, validation, and consistency.

More broadly, this project showed me how AI can augment human judgment in sports. While coaching and judging are often subjective, tools like this have the potential to provide more objective, repeatable feedback that can support both athletes and coaches.

Presentation

I also created a short presentation to communicate this concept and prototype.