andrew-w 1 day ago

One way our approach differs is in the model architecture. We rely on a single pass of a diffusion transformer (DiT), whereas Live Portrait relies on intermediate representations and multiple distinct modules. Getting a DiT to run in real time was a big part of our work. Quoting the Live Portrait paper: "Diffusion-based portrait animation methods [...] are usually [too] computationally expensive." As you hinted at, we had to compromise on resolution to get there (this demo is 256x256), but we think that will improve over time.
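For intuition, here is a minimal sketch (PyTorch, with made-up module names, shapes, and conditioning; not our actual implementation) of what "single pass per frame" means computationally: one conditioned forward pass through a stack of transformer blocks, rather than a chain of separate keypoint/warping/refinement modules or an iterative denoising loop.

```python
import torch
import torch.nn as nn

class TinyDiTBlock(nn.Module):
    """One transformer block with conditioning injected additively.
    (Simplified stand-in for AdaLN-style modulation; an assumption for this sketch.)"""
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.cond_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # Inject the per-frame driving signal (e.g., audio features) into every token.
        h = self.norm(x) + self.cond_proj(cond).unsqueeze(1)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm(x))

class SinglePassDiT(nn.Module):
    """Produces a frame's latent tokens in ONE forward pass, with no iterative
    denoising loop -- the property that makes real-time generation feasible."""
    def __init__(self, dim: int = 256, tokens: int = 16 * 16, depth: int = 4):
        super().__init__()
        self.tokens = nn.Parameter(torch.randn(1, tokens, dim))
        self.blocks = nn.ModuleList(TinyDiTBlock(dim) for _ in range(depth))
        self.out = nn.Linear(dim, dim)

    def forward(self, cond: torch.Tensor) -> torch.Tensor:
        x = self.tokens.expand(cond.shape[0], -1, -1)
        for blk in self.blocks:
            x = blk(x, cond)
        # Latent tokens; a separate decoder would map these to a 256x256 frame.
        return self.out(x)

if __name__ == "__main__":
    model = SinglePassDiT()
    cond = torch.randn(1, 256)  # hypothetical per-frame conditioning vector
    with torch.no_grad():
        frame_latents = model(cond)  # one pass per frame
    print(frame_latents.shape)  # torch.Size([1, 256, 256])
```

The point of the sketch is the control flow, not the specifics: per frame there is exactly one model call, so latency is bounded by a single forward pass instead of many denoising steps or multiple pipeline stages.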

andrew-w 1 day ago

Not relying on facial keypoints means we can animate a wide range of non-humanoid characters. My favorite is talking to the Doge meme.