TMD: Transition Matching Distillation for Fast Video Generation

Weili Nie¹*, Julius Berner¹*, Nanye Ma², Chao Liu¹, Saining Xie², Arash Vahdat¹
¹NVIDIA, ²NYU, *equal contribution
480p videos generated with TMD (distilled from Wan2.1 14B T2V) using fewer than 3 function evaluations
Close-up of an artist painting on a canvas using a large round brush. The artist has medium-length brown hair tied back in a loose ponytail and wears a white apron over their clothes. They are focused intently, their hand moving smoothly across the canvas. The brushstrokes are visible, blending colors together in fluid motions. The background shows part of the artist's workspace, with other brushes and paint tubes nearby. The scene emphasizes the intricate detail and the flowing motion of the brush as it interacts with the paint and canvas.
A vibrant underwater scene featuring several clownfish swimming gracefully through a colorful coral reef. The clownfish have distinctive orange bodies with white bars and black outlines, darting among the intricate and diverse coral formations. The corals are a mix of soft and hard varieties, showcasing various shapes and hues such as pink, green, and purple. The water is clear, with sunlight filtering through, creating a serene and lively atmosphere. The camera remains static, capturing the continuous motion of the clownfish as they explore their environment. Close-up view.
A serene boat gently sailing along the Seine River in Paris, with the iconic Eiffel Tower subtly visible in the distance. The scene is rendered in a vibrant Van Gogh-style painting, featuring swirling brushstrokes and a palette of rich blues, greens, and yellows. The boat is small, with a sail partially unfurled, and the water reflects the sky and distant structures. The riverbanks are lined with lush greenery and trees, adding to the picturesque ambiance. The overall composition captures the tranquility and beauty of a peaceful afternoon on the river. Medium shot, static view.
A CGI animation of Yoda, the iconic green Jedi master, playing a guitar on a stage. Yoda is dressed in his traditional Jedi robes and has a playful, focused expression as he strums the guitar strings. The stage is dimly lit with spotlights shining down on him, creating dramatic shadows. The background features a live audience with various alien species cheering him on. Yoda is sitting on a stool, leaning slightly forward, and his fingers move nimbly over the guitar. Medium close-up shot focusing on Yoda's face and hands interacting with the guitar.
A happy, medium-sized dog wearing a bright yellow turtleneck sweater sits in a studio setting. The dog has a friendly expression with its tail wagging gently and ears perked up, facing the camera directly. It is standing in a relaxed posture with its front paws slightly apart. The background is entirely dark, creating a stark contrast with the dog’s cheerful demeanor. The lighting highlights the dog’s fur and the texture of the turtleneck. Close-up portrait shot.
In a photorealistic style, an astronaut in a sleek, modern spacesuit is riding a horse floating in the vast expanse of space. The astronaut has a helmet with a clear visor, allowing viewers to see their determined expression as they hold the reins. The horse, a majestic creature with flowing mane and tail, is gracefully suspended against a backdrop of stars and distant planets. Both the astronaut and the horse are illuminated by soft, ambient light from a nearby spaceship. The scene captures the surreal beauty and tranquility of space, with a close-up view focusing on the interaction between the astronaut and the horse.
A plump, fluffy rabbit donning a voluminous purple robe walks gracefully through a vibrant fantasy landscape. The rabbit has large, expressive eyes and a gentle, curious expression. Its fur is soft and thick, and the robe drapes elegantly over its body. The landscape features rolling hills covered in lush green grass, colorful wildflowers, and towering magical trees with shimmering leaves. In the distance, there are sparkling waterfalls and mystical castles. The scene is bathed in warm, golden sunlight. Medium shot, focusing on the rabbit's walk through the picturesque environment.
Macro slow-motion video. A cropped close-up of roasted coffee beans falling gracefully into an empty bowl. The beans are glossy with rich brown hues and intricate textures. As they descend, the beans emit a soft rustling sound, emphasizing their natural beauty and the smooth motion of their fall. The bowl is plain and white, providing a stark contrast to the dark beans. The camera focuses tightly on each bean as it arcs through the air before landing softly in the bowl. The scene is captured in slow motion to highlight the detailed movement and the sensory experience of the falling beans.
3D render of origami dancers made from white paper, performing a modern dance routine against a pristine white background in a studio setting. Each dancer has delicate folds and creases that catch the light, enhancing their ethereal appearance. They are gracefully moving in synchronized motions, expressing fluidity and elegance through their postures. The scene focuses on a medium close-up to capture the intricate details of the dancers' movements and the soft, clean background.
A serene landscape featuring vibrant yellow flowers swaying gently in the breeze. The flowers are arranged in a field with patches of green grass, creating a soft and inviting backdrop. The sun is shining brightly, casting dappled shadows on the ground. A light wind causes the petals to flutter gracefully. The sky is clear and blue, with fluffy white clouds drifting by. The camera starts at a wide angle to capture the expansive field, then slowly zooms in to focus on the individual flowers as they dance in the wind. Medium close-up shot.
A cozy, warm café setting with soft ambient lighting and wooden furnishings. A young adult, casually dressed in a sweater and jeans, sits at a small round table. They hold a steaming cup of coffee in their hand, taking a sip while looking pensively out the window. The café is moderately busy with other patrons engaged in conversations. The background showcases various coffee drinks and pastries displayed on a counter. The person’s expression is relaxed and content. Medium shot focusing on the person’s face and the coffee cup, capturing the intimate atmosphere of the café.
A realistic animation of a cute, bushy-tailed squirrel sitting on a tree branch, holding a juicy burger in its tiny paws. The squirrel has a curious expression as it takes a bite out of the burger, revealing its sharp little teeth. Its fur is brown with lighter underparts, and its large, expressive eyes convey excitement and delight. The background shows a lush forest with sunlight filtering through the leaves, casting dappled shadows. The squirrel is focused on the burger, ignoring its surroundings. Medium close-up shot to capture the squirrel's facial expressions and the burger clearly.

Large video diffusion and flow models have achieved remarkable success in high-quality video generation, but their use in real-time interactive applications remains limited due to their inefficient multi-step sampling process. In this work, we present Transition Matching Distillation (TMD), a novel framework for distilling video diffusion models into efficient few-step generators.

The central idea of TMD is to match the multi-step denoising trajectory of a diffusion model with a few-step probability transition process, where each transition is modeled as a lightweight conditional flow. To enable efficient distillation, we decompose the original diffusion backbone into two components: (1) a main backbone, comprising the majority of early layers, that extracts semantic representations at each outer transition step; and (2) a flow head, consisting of the last few layers, that leverages these representations to perform multiple inner flow updates.
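To make the two-level structure concrete, here is a minimal toy sketch in Python. It is not the authors' implementation. It assumes scalar states, a hypothetical `backbone(x, t)` that returns a representation once per outer transition, and a hypothetical `flow_head(y, s, h)` that returns a velocity for the inner flow, integrated with plain Euler steps from fresh noise. The toy head simply steers the inner flow toward the backbone output, so the example only demonstrates the control flow and the call counts: one backbone pass per transition and `inner_steps` head passes per transition.

```python
import random
from typing import Callable, Dict, List

# Hypothetical interfaces (not the real TMD or Wan2.1 API):
#   backbone(x, t) -> representation h   (expensive: most of the network)
#   flow_head(y, s, h) -> inner velocity (cheap: the last few layers)
Backbone = Callable[[float, float], float]
FlowHead = Callable[[float, float, float], float]


def tmd_style_sample(x: float, outer_times: List[float], inner_steps: int,
                     backbone: Backbone, flow_head: FlowHead, seed: int = 0) -> float:
    """Toy few-step sampler: one backbone call per outer transition,
    `inner_steps` flow-head calls (Euler updates) inside each transition."""
    rng = random.Random(seed)
    ds = 1.0 / inner_steps
    for t in outer_times:                  # outer transition steps
        h = backbone(x, t)                 # run the expensive backbone once
        y = rng.gauss(0.0, 1.0)            # inner flow starts from fresh noise (assumption)
        for i in range(inner_steps):       # inner flow updates with the cheap head
            y = y + ds * flow_head(y, i * ds, h)
        x = y                              # assumption: inner flow output is the next state
    return x


calls: Dict[str, int] = {"backbone": 0, "head": 0}


def toy_backbone(x: float, t: float) -> float:
    calls["backbone"] += 1
    return 0.5 * x                         # stand-in "representation"


def toy_head(y: float, s: float, h: float) -> float:
    calls["head"] += 1
    return (h - y) / (1.0 - s)             # straight-line velocity that reaches h at s = 1


x_final = tmd_style_sample(8.0, outer_times=[1.0, 0.75, 0.5], inner_steps=4,
                           backbone=toy_backbone, flow_head=toy_head)
print(x_final, calls)
```

Because the head is only the last few layers, the inner loop adds comparatively little cost on top of the single backbone pass per transition.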

Given a pretrained video flow model, we first introduce a flow head into the model and adapt it into a conditional flow map. We then apply distribution matching distillation to the student model with a flow-head rollout in each transition step. Extensive experiments on distilling Wan2.1 1.3B and 14B text-to-video models demonstrate that TMD provides a flexible and strong trade-off between generation speed and visual quality. In particular, TMD outperforms existing distilled models under comparable inference costs in terms of visual fidelity and prompt adherence.
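The distribution-matching step can be illustrated with a deliberately tiny example, a hedged sketch rather than the TMD training code. Assumptions: a one-parameter student whose output stands in for the flow-head rollout, Gaussian teacher data so that both scores have closed forms, and no loss weighting or auxiliary-model training. The update uses the DMD-style generator gradient: the difference between the student ("fake") and teacher ("real") scores at a noised student sample, backpropagated through that sample.

```python
import random

# Toy 1-D setting (all assumptions, chosen so both scores are analytic):
#   teacher ("real") data:    N(m_real, 1)
#   student generator:        x = theta + z, z ~ N(0, 1), i.e. N(theta, 1)
#   diffusion at level sigma: x_t = x + sigma * eps, eps ~ N(0, 1)
# In DMD proper, s_real comes from the frozen teacher and s_fake from an auxiliary
# model trained online on student samples; here both are exact Gaussian scores.


def gaussian_score(x_t: float, mean: float, sigma: float) -> float:
    """Score of N(mean, 1 + sigma^2) evaluated at x_t."""
    return -(x_t - mean) / (1.0 + sigma ** 2)


def dmd_update(theta: float, m_real: float, lr: float,
               rng: random.Random, batch: int = 64) -> float:
    """One gradient-descent step on KL(p_student || p_teacher) using the
    distribution-matching gradient E[(s_fake - s_real) * dx_t/dtheta]."""
    grad = 0.0
    for _ in range(batch):
        z, eps = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        sigma = rng.uniform(0.1, 2.0)          # random noise level per sample
        x_t = (theta + z) + sigma * eps        # diffused student sample
        s_fake = gaussian_score(x_t, theta, sigma)
        s_real = gaussian_score(x_t, m_real, sigma)
        grad += (s_fake - s_real) * 1.0        # dx_t/dtheta = 1 for this generator
    return theta - lr * grad / batch


rng = random.Random(0)
theta = -3.0
for _ in range(200):
    theta = dmd_update(theta, m_real=2.0, lr=0.5, rng=rng)
print(theta)  # moves toward m_real = 2.0
```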

TMD Pipeline
Comparison between the teacher (Wan2.1 14B T2V) and TMD with different (effective) numbers of function evaluations (NFEs)
Acceleration
Teacher (100 NFEs)
TMD (2.75 NFEs)
TMD (1.38 NFEs)
Teacher (100 NFEs)
A stunning young woman with vampire-inspired makeup, featuring pale skin, dark eye shadow, and red lipstick. She has red contact lenses that give her an intense, blood-red gaze. Her hair is flowing freely, styled in soft waves that frame her face. She stands confidently with a slight tilt of her head, exuding an air of mystery and allure. The background is dimly lit with shadows casting an eerie glow. Medium close-up shot focusing on her face and upper body.
TMD (2.75 NFEs)
TMD (1.38 NFEs)
Teacher (100 NFEs)
A vibrant display of fireworks lighting up the night sky. The fireworks burst into colorful explosions of red, blue, green, and gold, creating intricate patterns against a dark, starry backdrop. Each explosion is followed by trails of light that slowly fade away. The ground is seen below, with people watching in awe, their faces illuminated momentarily by the bright flashes. The scene captures the excitement and joy of a festive celebration. Wide shot, capturing the expansive sky and the crowd below.
TMD (2.75 NFEs)
TMD (1.38 NFEs)
Teacher (100 NFEs)
A koala bear playing a grand piano in a lush, dense forest. The koala has soft, grey fur and large, round ears. It sits upright on the piano bench, paws delicately placed on the keys, creating gentle melodies. The forest background is filled with tall eucalyptus trees, dappled sunlight filtering through the leaves, and a carpet of green moss beneath the piano. The scene is calm and serene, with the koala focused intently on its performance. Medium close-up shot, capturing the koala and part of the forest surroundings.
TMD (2.75 NFEs)
TMD (1.38 NFEs)
Teacher (100 NFEs)
A close-up view of a cluster of vibrant green grapes on a rotating glass table under soft, diffused lighting. The grapes are large and plump, reflecting the gentle light as they rotate slowly, showcasing their smooth, shiny surfaces. The background is blurred, focusing attention solely on the grapes and their subtle reflections. The camera remains static, capturing the serene and detailed motion of the rotating grapes. Close-up shot.
TMD (2.75 NFEs)
TMD (1.38 NFEs)
Teacher (100 NFEs)
A large great white shark is swimming gracefully through the vast, deep blue ocean. Its sleek, muscular body cuts through the water as it propels forward with powerful tail strokes. The shark's dorsal fin slices through the surface, while smaller fish dart around it. The camera begins at a wide shot of the shark and the surrounding ocean, then smoothly zooms in to focus closely on the shark's sharp teeth and piercing eyes. The scene is filled with sunlight filtering through the water, creating a dynamic interplay of light and shadow. Close-up underwater perspective.
TMD (2.75 NFEs)
TMD (1.38 NFEs)
Teacher (100 NFEs)
A person is playing a video game controller in a cozy living room setting. They are focused intently, with a determined expression on their face. The living room is decorated with soft lighting, comfortable couches, and shelves filled with various games and books. The person is seated on a couch, holding the controller with both hands, pressing buttons and moving the joystick rapidly. The screen of a large TV shows the gameplay in action. The background is blurred slightly to focus attention on the player's interaction with the game. Medium shot, static scene.
TMD (2.75 NFEs)
TMD (1.38 NFEs)
Teacher (100 NFEs)
A dramatic and intense scene featuring an erupting volcano. The volcano is spewing lava and ash into the air, creating a vivid orange glow against a dark night sky filled with billowing smoke clouds. The ground trembles as molten rock flows down the sides of the volcano, lighting up the surrounding landscape. In the foreground, a few scattered trees and rocks are illuminated by the fiery eruption. The camera remains fixed on the volcano, capturing the powerful motion and scale of the event. Nighttime, wide shot.
TMD (2.75 NFEs)
TMD (1.38 NFEs)
Teacher (100 NFEs)
A panoramic view of a towering skyscraper at sunset, showcasing its sleek glass facade and modern architecture. The building stands majestically against a backdrop of orange and pink hues, with city lights beginning to twinkle below. The camera starts from a wide shot, gradually zooming in to capture reflections on the skyscraper's windows as evening traffic flows beneath. The scene emphasizes the verticality and grandeur of the structure, highlighting its presence in the urban skyline. Wide shot transitioning to medium close-up.
TMD (2.75 NFEs)
TMD (1.38 NFEs)
Teacher (100 NFEs)
A close-up of a young adult playing a classic acoustic guitar, with their fingers deftly plucking the strings. They sit in a cozy, dimly lit room, with soft ambient lighting casting gentle shadows across their face. The person has tousled brown hair, a relaxed expression, and wears a casual outfit consisting of a vintage t-shirt and jeans. Their posture is casual yet focused, leaning slightly forward as they play. The guitar's wooden body reflects the warm glow from a nearby lamp. The scene captures the intimate atmosphere of a musical performance, with the focus on the person's expressive face and skilled hands moving over the fretboard.
TMD (2.75 NFEs)
TMD (1.38 NFEs)
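The fractional NFE counts in this comparison can be understood with a simple cost model, which is our assumption rather than a definition given on this page: one full forward pass of the network costs 1 NFE, the flow head accounts for a fraction f of that cost, and each of K outer transitions runs the backbone once plus N inner head updates. That gives K * ((1 - f) + N * f) = K * (1 + (N - 1) * f) effective NFEs. The values N = 4 and f = 1/8 below are hypothetical. They happen to reproduce 2.75 for K = 2 and 1.375 (shown as 1.38) for K = 1, but any setting with (N - 1) * f = 3/8 does the same, so the actual configuration cannot be read off these numbers.

```python
from fractions import Fraction


def effective_nfe(outer_steps: int, inner_steps: int, head_fraction: Fraction) -> Fraction:
    """Effective NFEs under an assumed cost model: per outer transition, the
    backbone costs (1 - f) of a full pass and each inner head update costs f."""
    per_transition = (1 - head_fraction) + inner_steps * head_fraction
    return outer_steps * per_transition


# Hypothetical values: 4 inner steps, head = 1/8 of a full forward pass.
for k in (2, 1):
    print(k, float(effective_nfe(k, inner_steps=4, head_fraction=Fraction(1, 8))))
# prints: 2 2.75
#         1 1.375
```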
Comparison between our improved version of DMD2 and TMD
DMD2-v (our improved baseline)
TMD (ours)
DMD2-v (our improved baseline)
A warm and tender moment captured in a close-up shot, featuring a person embracing another person in a tight hug. Both individuals have their arms wrapped around each other, with one person's head resting gently on the other's shoulder. They appear to be sharing a loving and emotional connection. The scene is set in a cozy, dimly lit room with soft ambient lighting, creating a serene and intimate atmosphere. The focus is on the expressions of affection and comfort displayed through body language and facial expressions, conveying a sense of security and warmth.
TMD (ours)
DMD2-v (our improved baseline)
A sleek black motorcycle cruising along a scenic coastal highway during sunset. The motorcycle has a polished chrome finish and tinted windows on the helmet. The rider, wearing a black leather jacket and jeans, has their hands firmly on the handlebars and is leaning slightly forward, maintaining a steady posture. Waves crash against the rocky shoreline on one side of the highway, while lush green hills roll into the distance on the other. The sky is painted with vibrant hues of orange and pink, casting a warm glow over the entire scene. The camera follows the motorcycle from behind, capturing the wind-swept road and the dramatic coastal landscape in a wide shot.
TMD (ours)
DMD2-v (our improved baseline)
A person is riding a brown horse through a picturesque countryside, with rolling hills and lush green fields stretching out behind them. The rider, wearing a cowboy hat and a casual western outfit, sits tall in the saddle with one hand gently guiding the reins. The horse moves gracefully at a steady trot, its mane flowing freely. In the background, there are scattered trees and a clear blue sky with fluffy clouds. The scene is captured from a mid-shot perspective, focusing on the rider and the horse as they navigate the serene landscape.
TMD (ours)
DMD2-v (our improved baseline)
A space shuttle launching into orbit, with intense flames and thick smoke billowing out from its engines. The shuttle is seen lifting off from the launchpad, gradually ascending into the sky. The bright orange flames contrast vividly against the blue sky, creating a dramatic and awe-inspiring scene. Smoke trails behind the shuttle as it accelerates, leaving a trail of white vapor. The camera captures the entire process from a medium distance, focusing on the powerful thrust and the shuttle’s journey towards space. The shot remains static, emphasizing the majesty and power of the launch.
TMD (ours)
DMD2-v (our improved baseline)
Time-lapse footage of a sunrise on Mars, showcasing the reddish-orange hues of the Martian sky as the sun gradually rises over the rugged, dusty terrain. The landscape features large boulders, sand dunes, and the distinctive rocky outcrops characteristic of the Martian surface. The atmosphere is thin and hazy, giving the sunrise a soft, diffused glow. The camera remains stationary throughout, capturing the subtle changes in light and shadow across the Martian landscape. Wide shot, emphasizing the vastness and desolation of the environment.
TMD (ours)
DMD2-v (our improved baseline)
A serene African savanna scene with tall grasses and scattered trees. A tall giraffe, with distinctive brown spots on its creamy white coat, bends its long neck gracefully to drink from a calm river. The giraffe's gaze is focused intently on the water as it lowers its head, revealing its long eyelashes and gentle expression. The river reflects the golden hues of the late afternoon sun, casting a warm glow over the scene. The background shows the vast expanse of the savanna with distant hills. The video is a medium close-up, capturing the giraffe's elegant movement and the tranquil environment.
TMD (ours)
DMD2-v (our improved baseline)
A person in a cozy kitchen setting is washing dishes at a sink filled with sudsy water. The individual, who appears to be middle-aged with casual attire, is scrubbing a plate with a sponge. Water droplets splash and cling to the dishes as they are washed. The kitchen is warmly lit, with sunlight streaming in from a window nearby. The background shows other kitchen appliances and cabinets. The person is standing with a focused expression, moving the sponge in circular motions across the dish. Medium close-up shot, static scene.
TMD (ours)
DMD2-v (our improved baseline)
A happy, playful Corgi running and jumping in a park during sunset, captured in black and white. The Corgi has a friendly face with floppy ears and a wagging tail as it moves through the grassy area. The sky behind the dog shows soft gradients of orange and pink fading into shades of gray and black. The park includes a few trees and benches in the background, adding depth to the scene. The Corgi is in motion, emphasizing its joyful playfulness. Medium close-up shot, focusing on the Corgi's expressive face and body language.
TMD (ours)