Using WAN Video Torch Compile with ComfyUI: Improving Generation Speed with Torch Compile

Date: 2025.7.29

in English

"Torch Compile" (PyTorch's torch.compile) is a mechanism that compiles PyTorch model code into optimized kernels so that it runs more efficiently.

It can also be applied when generating videos with WAN Video, where it is expected to increase generation speed.
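As a rough illustration of what the Torch Compile setting does underneath, here is a minimal sketch built on PyTorch's torch.compile. The helper name maybe_compile and the tiny Linear layer in demo are assumptions made for illustration; this is not the WanVideoWrapper implementation, and the real WAN Video model is far larger.

```python
import importlib.util


def maybe_compile(module, enabled=True):
    """Wrap `module` with torch.compile when requested and available.

    Hypothetical helper for illustration only; it is not part of ComfyUI
    or WanVideoWrapper.
    """
    if not enabled or importlib.util.find_spec("torch") is None:
        return module  # fall back to ordinary (eager) execution
    import torch

    if not hasattr(torch, "compile"):  # torch.compile was added in PyTorch 2.0
        return module
    # torch.compile returns a wrapper immediately; the actual compilation is
    # deferred until the wrapper is first called with real inputs.
    return torch.compile(module)


def demo():
    """Compare eager and compiled outputs of a tiny stand-in layer.

    Requires PyTorch 2.x and a working compiler backend on the machine.
    """
    import torch

    layer = torch.nn.Linear(16, 16).eval()
    fast_layer = maybe_compile(layer)
    x = torch.randn(4, 16)
    with torch.no_grad():
        # The first call pays the one-time compile cost; later calls reuse it.
        return torch.allclose(layer(x), fast_layer(x), atol=1e-5)
```

Because compilation is deferred to the first call, the first generation after enabling Torch Compile carries extra overhead, and the benefit shows up mainly on later runs.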

1. Measurement conditions

Model : WAN21_I2V_14B_480p_Q5_K_M.gguf

Text Encoder : umt5_xxl_fp8_e4m3fn_scaled.safetensors

Clip Vision : clip_vision_h.safetensors

VAE : WAN21_VAE_bf16.safetensors

steps = 4, CFG = 1.0, shift = 8.0

Block Swap = 15 (What is Block Swap?)

Self Forcing Lightx2v LoRA used (What is a lightweight LoRA?)

The input image is resized to 480x720 before use.

The workflow is the one built in the previous article. Clicking the workflow image opens the copy saved on Google Drive; download it from there and drag and drop it into ComfyUI to reproduce the workflow. If the WanVideoWrapper custom node is not installed yet, install it by following the previous article.

Add the WanVideo Torch Compile Settings node as shown below and connect it to the WanVideo Model Loader node.

The PC environment used during the measurement is as follows:

M/B: MPG B550 GAMING PLUS (note that the slot is PCIe 4.0)

CPU : Ryzen 7 5700X

GPU : RTX 5060 Ti 16GB

RAM : DDR4 3200 64GB (32GBx2)

2. Measurement

First, without Torch Compile, we measured the first generation immediately after launching ComfyUI and then a subsequent generation.

First Time

Memory used: 11.750 GB

Time: 132.94 seconds

Second Time

Memory used: 11.688 GB

Time: 109.48 seconds
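The article does not say how the timings were collected. One possible way to take a first-run versus second-run measurement in plain Python is sketched below; time_runs and fake_generation are hypothetical names, and the dummy workload merely stands in for a real generation call.

```python
import time
from typing import Callable, List


def time_runs(generate: Callable[[], object], runs: int = 2) -> List[float]:
    """Call `generate` `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        generate()
        durations.append(time.perf_counter() - start)
    return durations


def fake_generation() -> None:
    """Dummy workload; replace with the real generation call when measuring."""
    time.sleep(0.01)


cold, warm = time_runs(fake_generation, runs=2)
print(f"first: {cold:.2f} s, second: {warm:.2f} s")
```

Keeping the first run separate from later runs matters because the first run includes one-time costs such as loading the model.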

Next, the same measurements with Torch Compile enabled.

First Time

Memory used: 12.656 GB

Time: 127.36 seconds

Second Time

Memory used: 11.500 GB

Time: 101.39 seconds

With Torch Compile, generation time dropped by about 4% on the first run (132.94 s to 127.36 s) and about 7% on the second (109.48 s to 101.39 s). On the first run, memory usage was noticeably higher than without Torch Compile (12.656 GB versus 11.750 GB), probably because the compilation itself uses memory. On the second run, memory usage was about the same (11.500 GB versus 11.688 GB) and only the speed improved.
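The percentages can be recomputed from the measured times. The short sketch below uses only the four timings reported above; the function names are chosen here for illustration.

```python
def time_reduction_pct(baseline_s: float, compiled_s: float) -> float:
    """Percentage of wall-clock time saved relative to the baseline."""
    return (baseline_s - compiled_s) / baseline_s * 100.0


def speedup_pct(baseline_s: float, compiled_s: float) -> float:
    """Percentage increase in generation speed (work per unit time)."""
    return (baseline_s / compiled_s - 1.0) * 100.0


# (without Torch Compile, with Torch Compile), in seconds
measurements = {
    "first run": (132.94, 127.36),
    "second run": (109.48, 101.39),
}

for label, (baseline, compiled) in measurements.items():
    print(
        f"{label}: {time_reduction_pct(baseline, compiled):.1f}% less time, "
        f"{speedup_pct(baseline, compiled):.1f}% faster"
    )
# first run: 4.2% less time, 4.4% faster
# second run: 7.4% less time, 8.0% faster
```

Time saved and speed gained are slightly different figures because they divide by different baselines, so it helps to state which one is being quoted.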

These runs used few steps and a low CFG, so the difference is small. For heavier workloads, a speedup of around 30% reportedly can be achieved in some cases.

Incidentally, there was no difference in video quality with and without Torch Compile.

