DreamMover: Leveraging the Prior of Diffusion Models
for Image Interpolation with Large Motion

ECCV 2024


Liao Shen1, Tianqi Liu1, Huiqiang Sun1, Xinyi Ye1, Baopu Li1, Jianming Zhang2, Zhiguo Cao1*

1Huazhong University of Science and Technology    2Adobe Research

Given two input images with large motion, our method generates a short video with higher fidelity and better semantic consistency than previous approaches.


Abstract


We study the problem of generating intermediate images from image pairs with large motion while maintaining semantic consistency. Because of the large motion, the intermediate semantic information may be absent from the input images. Existing methods are either limited to small motion or focus on topologically similar objects, leading to artifacts and inconsistency in the interpolation results. To overcome this challenge, we exploit pre-trained image diffusion models for their semantic representation and generation capabilities, which ensures that the absent intermediate semantic representations are expressed consistently with the inputs. To this end, we propose DreamMover, a novel image interpolation framework with three main components: 1) A natural flow estimator based on the diffusion model that implicitly reasons about the semantic correspondence between the two images. 2) To avoid losing detailed information during fusion, our key insight is to fuse information in two spaces: a high-level space and a low-level space. 3) To enhance the consistency between the generated images and the inputs, we propose a self-attention concatenation and replacement approach. Lastly, we present a challenging benchmark dataset, InterpBench, to evaluate the semantic consistency of generated results. Extensive experiments demonstrate the effectiveness of our method. Code will be released soon.


Method


Given two input images, each is inverted to a noisy latent code with a pre-trained diffusion model via DDIM inversion. We extract feature maps from the up-blocks of the U-Net and use them to establish pixel correspondences between the two images, yielding bidirectional optical flow. We then decompose the noisy latent code into low-level and high-level components. To maintain the high-frequency information of the images, we perform softmax splatting and time interpolation in the high-level space, while in the low-level space we replace all weighted-average operations with "Winner-Takes-All" (WTA). In addition, we propose a novel self-attention concatenation and replacement method for consistency. Our method thus generates a sequence of high-fidelity, semantically consistent interpolation frames, as sketched below.
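For intuition, here is a minimal sketch of the flow-estimation idea: pixel correspondences are read off the extracted diffusion features by cosine-similarity matching. This is our simplified illustration, not the paper's implementation; the function name and the brute-force argmax matching are assumptions.

# A minimal sketch (PyTorch; not the authors' released code) of deriving a
# dense flow field from U-Net up-block features via nearest-neighbor matching.
import torch
import torch.nn.functional as F

def feature_flow(feat0: torch.Tensor, feat1: torch.Tensor) -> torch.Tensor:
    """feat0, feat1: [C, H, W] diffusion features for the two input images.
    Returns a flow field [H, W, 2] mapping pixels of image 0 into image 1."""
    C, H, W = feat0.shape
    f0 = F.normalize(feat0.reshape(C, -1), dim=0)  # [C, H*W], unit-norm columns
    f1 = F.normalize(feat1.reshape(C, -1), dim=0)
    sim = f0.t() @ f1                              # [H*W, H*W] cosine similarity
    match = sim.argmax(dim=1)                      # best match in image 1 per pixel
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    src = torch.stack([xs, ys], dim=-1).reshape(-1, 2).float()
    dst = torch.stack([match % W, match // W], dim=-1).float()
    return (dst - src).reshape(H, W, 2)            # flow = matched pos - source pos

# Calling feature_flow(feat1, feat0) gives the reverse direction, so both
# halves of the bidirectional flow come from the same routine.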

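The two-branch fusion at an interpolation time t can likewise be sketched. We assume the latents have already been forward-warped to time t (e.g., by softmax splatting) along with their accumulated splatting weights; the names and shapes below are hypothetical.

# A simplified sketch (assumption, not the paper's code) of fusing warped
# latents at time t: the high-level branch blends by a time-weighted average,
# the low-level branch keeps only the dominant contributor ("Winner-Takes-All").
import torch

def fuse_latents(z0_warp, z1_warp, w0, w1, t: float):
    """z0_warp, z1_warp: [C, H, W] latents forward-warped to time t.
    w0, w1:             [1, H, W] accumulated splatting weights.
    Returns (z_high, z_low), the fused high- and low-level latents."""
    a0, a1 = (1.0 - t) * w0, t * w1
    denom = (a0 + a1).clamp(min=1e-6)                  # avoid division by zero
    z_high = (a0 * z0_warp + a1 * z1_warp) / denom     # weighted average
    keep0 = (a0 >= a1).float()                         # WTA mask per pixel
    z_low = keep0 * z0_warp + (1.0 - keep0) * z1_warp  # hard selection
    return z_high, z_low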

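Lastly, a hedged sketch of self-attention concatenation: while denoising an intermediate frame, each self-attention layer also attends to keys and values cached from the two inputs' inversion passes, tying the generated content to both sources. The caching scheme and injection point here are illustrative assumptions.

# A hedged sketch of self-attention key/value concatenation. The intermediate
# frame's queries attend to its own tokens plus tokens cached from the two
# input images at the same layer and timestep. Names are hypothetical.
import torch
import torch.nn.functional as F

def concat_self_attention(q, k, v, k_ref0, v_ref0, k_ref1, v_ref1, num_heads=8):
    """q, k, v: [B, N, C] tokens of the frame being denoised.
    k_ref*, v_ref*: [B, N, C] cached tokens from the two input images.
    C must be divisible by num_heads."""
    B, N, C = q.shape
    k_all = torch.cat([k, k_ref0, k_ref1], dim=1)  # extend the key set
    v_all = torch.cat([v, v_ref0, v_ref1], dim=1)  # extend the value set
    def split(x):  # [B, M, C] -> [B, num_heads, M, C // num_heads]
        return x.reshape(B, -1, num_heads, C // num_heads).transpose(1, 2)
    out = F.scaled_dot_product_attention(split(q), split(k_all), split(v_all))
    return out.transpose(1, 2).reshape(B, N, C)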
Results



Dynamic Comparison



Visualization Comparison


Visual comparison between the baselines and our method. We show the middle frame produced by each method. Our approach generates intermediate results that maintain the best semantic consistency.


Citation


@article{shen2024dreammover,
  title={DreamMover: Leveraging the Prior of Diffusion Models for Image Interpolation with Large Motion},
  author={Shen, Liao and Liu, Tianqi and Sun, Huiqiang and Ye, Xinyi and Li, Baopu and Zhang, Jianming and Cao, Zhiguo},
  journal={arXiv preprint arXiv:2409.09605},
  year={2024}
}