WAN Video Start image to End image with ComfyUI: I2V using two images

Date: 2025.7.30

WAN Video's Image to Video (I2V) mode can also generate a video from a specified start image and end image. Because you fix both the first and the last frame, the result is more likely to match the video you have in mind.

For information on how to generate a video from a single image, please refer to our previous article.

1. Basics

The workflow for generating a video from two images needs only a few modifications to the single-image workflow. This time, I modified the workflow I built previously, which uses the ComfyUI-WanVideoWrapper nodes.

Clicking the image below opens the copy saved in Google Drive. Download it from there and drag and drop it into ComfyUI to recreate the workflow.
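Drag and drop works because ComfyUI normally embeds the workflow as JSON in a text chunk of the PNG it saves. If the drop does nothing, a quick check like the one below can tell you whether the file you downloaded still carries that data. This is only a minimal sketch: it assumes the workflow is stored in an uncompressed tEXt chunk under the key "workflow", and it ignores compressed (zTXt) and international (iTXt) chunks. The function names are my own, not part of ComfyUI.

```python
import json
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data: bytes) -> dict:
    """Return the uncompressed tEXt chunks of a PNG as {keyword: text}."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    texts = {}
    pos = len(PNG_SIGNATURE)
    # Each chunk is: 4-byte length, 4-byte type, body, 4-byte CRC.
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length
        if chunk_type == b"tEXt":
            # tEXt body is "keyword\0text", both Latin-1 encoded.
            keyword, _, text = body.partition(b"\x00")
            texts[keyword.decode("latin-1")] = text.decode("latin-1")
        elif chunk_type == b"IEND":
            break
    return texts


def extract_workflow(data: bytes):
    """Return the embedded workflow as a dict, or None if there is none.

    Assumption: the workflow sits in a tEXt chunk named "workflow".
    """
    raw = read_png_text_chunks(data).get("workflow")
    return json.loads(raw) if raw is not None else None
```

To use it, read the downloaded file as bytes (for example with `pathlib.Path("workflow.png").read_bytes()`, where the file name is just a placeholder) and pass the result to `extract_workflow`. If it returns None, the copy you have was probably re-encoded somewhere along the way, so try downloading the original file rather than a preview.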

2. How to use

We will use the following two images. The first frame is an image of a snowman, and the last frame is an image of a woman.

I wanted to make a video of a snowman exploding and a woman appearing, so I used the following prompt:

The snowman exploded and turned into a girl.

The generated video is below:

I think it turned out well.

A video of a snowman exploding and a person appearing from inside it can also be made with standard I2V or with T2V. The advantage of Start-End, however, is that you can specify exactly what the person who appears looks like.

In cases like this where you need to determine the end point of a video, using two images can be very effective.
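If you want to try several start and end pairs, the same workflow can also be driven from a script through ComfyUI's HTTP API instead of the browser. The sketch below is an illustration under several assumptions, not the workflow shown in this article: it assumes the workflow was exported in API format, that both images are loaded with the core LoadImage node, and that the server runs at ComfyUI's default address. The node IDs and file names are placeholders you would look up in your own export.

```python
import copy
import json
import urllib.request


def set_start_end_images(workflow, start_node_id, end_node_id,
                         start_file, end_file):
    """Return a copy of an API-format workflow with both images replaced.

    Assumption: start_node_id and end_node_id refer to LoadImage nodes,
    and the files already exist in ComfyUI's input folder.
    """
    patched = copy.deepcopy(workflow)
    for node_id, filename in ((start_node_id, start_file),
                              (end_node_id, end_file)):
        node = patched[node_id]
        if node.get("class_type") != "LoadImage":
            raise ValueError(f"node {node_id} is not a LoadImage node")
        node["inputs"]["image"] = filename
    return patched


def build_prompt_request(workflow, server="http://127.0.0.1:8188"):
    """Build (but do not send) the POST request that queues the workflow."""
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With a running server, `urllib.request.urlopen(build_prompt_request(patched))` would queue the job. Keeping the patching step separate from the request makes it easy to loop over many image pairs while leaving the rest of the workflow untouched.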

By the way, generating this 2-second video on my usual setup gave the following result:

Memory usage: 14.750 GB

Time: 154.63 seconds
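To put those two figures in perspective, it can help to convert them into a cost per second of finished video. The small helper below is only a convenience sketch; the function name is my own, and the optional frame rate is something you must supply yourself, since this article does not state the frame rate of the clip.

```python
def generation_cost(video_seconds, wall_seconds, fps=None):
    """Express generation time relative to the length of the clip.

    fps is optional and is an assumption supplied by the caller.
    """
    cost = {"compute_s_per_video_s": wall_seconds / video_seconds}
    if fps is not None:
        cost["compute_s_per_frame"] = wall_seconds / (video_seconds * fps)
    return cost


# Figures reported above: a 2-second clip generated in 154.63 seconds.
print(generation_cost(2, 154.63))
```

For this run the ratio works out to roughly 77 seconds of compute per second of video.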

