fixing wobbly AI-generated GIF frames with phase correlation

easygif.lol (previous post) has been using Gemini’s image generation for a while now, and while the results are often OK, sometimes the frames have this annoying jitter - the subject shifts around between frames even though it shouldn’t, which makes the animation look wobbly.

time to fix that.

original
from this
fixed
to this

the problem

when easygif asks Gemini to generate “a cute white bear dancing on a white background” as a sequence of animation frames, each frame is generated somewhat independently. while I am looking into ways to improve that (and make the generation faster/cheaper), it’s what I have. the model tries to keep things consistent, but it’s not perfect:

  • misalignment: the bear might be centered in frame 1, then shift 10 pixels left in frame 2, then 5 pixels right in frame 3
  • zoom inconsistency: the bear might appear slightly larger or smaller between frames

the result is a GIF that wobbles and jitters instead of having smooth animation. not good enough.

exploring solutions

I looked into several approaches, keeping in mind that it should not be too slow on my cheap VPS:

phase correlation

  • uses FFT to find translation offsets between frames in the frequency domain
  • fast, robust to noise
  • only handles translation (not rotation/scale)
  • OpenCV has cv2.phaseCorrelate() built-in
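for intuition, here’s roughly what phase correlation computes under the hood - a pure-numpy sketch, not the OpenCV call itself (the function name and the 1e-12 epsilon are mine):

```python
import numpy as np

def phase_correlate(ref, cur):
    """Estimate the (dx, dy) translation of `cur` relative to `ref` via FFT."""
    f_ref = np.fft.fft2(ref)
    f_cur = np.fft.fft2(cur)
    # cross-power spectrum, normalised to unit magnitude (keeps only phase info)
    cps = np.conj(f_ref) * f_cur
    cps /= np.abs(cps) + 1e-12
    corr = np.real(np.fft.ifft2(cps))
    # the correlation surface peaks at the translation offset
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # offsets past half the image wrap around to negative shifts
    dy, dx = (p if p <= s // 2 else p - s for p, s in zip(peak, corr.shape))
    return dx, dy, float(corr[peak])  # peak height ≈ confidence (1.0 = perfect)
```

cv2.phaseCorrelate() does the same thing but with windowing and sub-pixel peak interpolation, which is why it’s the one actually used here.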

feature-based alignment (ORB + homography)

  • detect keypoints, match them across frames, compute transformation matrix
  • can handle translation, rotation, AND scale
  • more complex, can fail if subject changes appearance
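the “compute transformation matrix” step from matched keypoints can be sketched with a least-squares affine fit in plain numpy (`estimate_affine` is a made-up name; actual keypoint detection and matching, e.g. ORB plus a brute-force matcher, are assumed to have happened already):

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares 2x3 affine transform mapping src points onto dst points.

    src, dst: (N, 2) arrays of matched keypoint coordinates, N >= 3.
    Solves for [a, b, tx, c, d, ty] in x' = a*x + b*y + tx, y' = c*x + d*y + ty.
    """
    n = len(src)
    A = np.zeros((2 * n, 6))
    b = dst.reshape(-1)            # interleaved [x0', y0', x1', y1', ...]
    A[0::2, 0:2] = src             # rows for x' equations
    A[0::2, 2] = 1.0
    A[1::2, 3:5] = src             # rows for y' equations
    A[1::2, 5] = 1.0
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params.reshape(2, 3)
```

a bad match set feeds garbage point pairs into this fit, which is exactly how you end up with the wild transformations described below.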

bounding box normalisation

  • detect subject bounding box in each frame
  • scale frames to normalise subject size
  • addresses zoom directly
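the threshold-based subject detection behind this (the part that later turned out unreliable) looks something like the sketch below - `subject_bbox` and the 240 cutoff are my own illustrative choices, assuming a near-white background:

```python
import numpy as np

def subject_bbox(frame, white_thresh=240):
    """Bounding box of non-background pixels, assuming a near-white background.

    Returns (x0, y0, x1, y1) or None if no subject pixels are found.
    """
    if frame.ndim == 3:
        mask = frame.min(axis=-1) < white_thresh  # any channel darker than bg
    else:
        mask = frame < white_thresh
    ys, xs = np.nonzero(mask)
    if len(ys) == 0:
        return None
    return xs.min(), ys.min(), xs.max() + 1, ys.max() + 1
```

the fragility is visible in the signature: one global threshold has to work for every prompt, background and lighting, and when it misfires the bbox (and therefore the rescale) is nonsense.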

first attempt - align everything to frame 0

I ran experiments on 10 sessions for which I still had the data… results were mixed.

phase correlation worked reasonably well, but feature-based and bbox normalisation were producing broken images. like, completely distorted. the homography was finding matches but applying wild transformations that made no sense.

feature normalisation is broken

my theory is that with actual differences between frames (and especially when comparing to frame 0), features are easily lost and the matcher gets confused.

chain to previous frame

while previous results were OK, I tried comparing each frame n to frame n-1 instead, for better jitter detection.

the chained alignment implementation (python):
def align_sequence(self, frames, chain_to_previous=True):
    aligned_frames = [frames[0].copy()]  # first frame is the anchor, unchanged
    metrics_list = []

    for i, current in enumerate(frames[1:], start=1):
        if chain_to_previous:
            reference = aligned_frames[i - 1]  # previous ALIGNED frame
        else:
            reference = frames[0]

        aligned, metrics = self.compute_alignment(reference, current)
        aligned_frames.append(aligned)
        metrics_list.append(metrics)  # keep per-frame shift/confidence data

    return aligned_frames, metrics_list

now phase correlation only corrects the incremental jitter between consecutive frames, not the overall position changes from animation.

confidence threshold

manually reviewing possible fixes from the phase correlation showed that the confidence score was a good indicator of, well, how confident I should be that the detected shift should indeed be corrected. applying a somewhat arbitrary 0.1 threshold did a good job, so I added a threshold check:
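the check itself is tiny; something like this (`maybe_apply_shift` and the roll-based translation are illustrative - real code would warp with proper border handling rather than wrapping pixels around):

```python
import numpy as np

MIN_CONFIDENCE = 0.1  # below this, the detected shift is probably noise

def maybe_apply_shift(frame, shift, confidence, min_confidence=MIN_CONFIDENCE):
    """Correct the frame only when phase correlation is confident enough."""
    if confidence < min_confidence:
        return frame, False  # leave the frame untouched
    dx, dy = shift
    # integer-pixel translation via roll, for illustration only;
    # negating the shift moves the frame back toward the reference
    corrected = np.roll(frame, (-round(dy), -round(dx)), axis=(0, 1))
    return corrected, True
```

the second return value makes the “applied?” column in the result tables below trivial to log.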

results

ran the full experiment on 146 debug sessions (893 frames total).

processing time: 28 minutes on my machine (~20 seconds per session, ~125ms per frame for phase correlation). very much not multi-threaded.

example 1: dancing white bear

session: 20250322_113430_a_cute_funny_white_bear_dancing_white_background

frame | dx (px) | dy (px) | confidence | applied?
------|---------|---------|------------|------------
0     | -       | -       | -          | (reference)
1     | +8.23   | +10.10  | 0.757      | ✓ yes
2     | +41.25  | +20.39  | 0.496      | ✓ yes
3     | -12.11  | +27.95  | 0.768      | ✓ yes
4     | -11.94  | +11.03  | 0.784      | ✓ yes

the bear was drifting around quite a bit (up to 46px magnitude). phase correlation caught and fixed all of it. the bear now stays centered.

total processing time: 124ms for 5 frames.

GIF comparison (original vs fixed):

original
Original - notice the wobble
fixed
Phase correlation applied
individual frame comparison (frame 2, the worst offender at +41px dx):

original frame 2
Original frame 2
corrected frame 2
Corrected frame 2

example 2: rsfgwrsgwrsg

session: 20250322_111643_rsfgwrsgwrsg

(yes that’s the actual prompt. can’t remember what I was doing.)

frame | dx (px) | dy (px) | confidence | applied?
------|---------|---------|------------|------------
0     | -       | -       | -          | (reference)
1     | -0.02   | +0.03   | 0.403      | ✓ yes
2     | +0.18   | -0.08   | 0.342      | ✓ yes
3     | +0.85   | +39.80  | 0.417      | ✓ yes
4     | +5.21   | +27.53  | 0.455      | ✓ yes
5     | +1.53   | +28.45  | 0.780      | ✓ yes

frames 3-5 had significant vertical drift (up to 40px!). all fixed.

GIF comparison (original vs fixed):

original
Original - vertical drift
fixed
Phase correlation applied
individual frame comparison (frame 3, biggest shift at +39.8px dy):

original frame 3
Original frame 3
corrected frame 3
Corrected frame 3

example 3: meowing black cat (subtle fix)

session: 20250321_215009_cute_meowing_black_cat_white_background

frame | dx (px) | dy (px) | confidence | applied?
------|---------|---------|------------|---------------------
0     | -       | -       | -          | (reference)
1     | +0.04   | +0.03   | 0.759      | ✓ yes
2     | +0.00   | +0.17   | 0.721      | ✓ yes
3     | -3.53   | +1.02   | 0.615      | ✓ yes
4     | -1.13   | +1.43   | 0.366      | ✓ yes
5     | -6.38   | -0.27   | 0.094      | ✗ no (low confidence)

this one is more subtle - small shifts of a few pixels. but the cat’s tail was wobbling in the original. now it’s stable.

GIF comparison (original vs fixed):

original
Original - tail wobble
fixed
Phase correlation applied
individual frame comparison (frame 3, shift of -3.5px):

original frame 3
Original frame 3
corrected frame 3
Corrected frame 3

what about feature-based and bbox?

I tested all three algorithms. here’s the verdict:

phase correlation

  • fast (~30ms per frame)
  • handles translation well
  • confidence score lets us skip uncertain fixes
  • doesn’t break images

feature-based

  • too aggressive for animation frames
  • gets confused, not worth it

bounding box normalisation

  • struggles with subject detection
  • threshold-based detection unreliable across varied content
  • when it gets the bbox wrong, results are just bad