If image-to-video results look unstable, the issue is usually not the image. It is the prompt. The goal of an image-to-video prompt is to tell the model how the scene should move without rewriting the scene itself. Too many creative instructions can fight your source image and cause warping, flicker, or drifting backgrounds.
This guide gives you a prompt framework, 10 copy-paste templates, stability constraints, and a troubleshooting section that maps common problems to specific prompt fixes.
Prompt Anatomy (Use This Every Time)
A stable image-to-video prompt has five components:
- Anchor (confirm the image should be preserved)
- Motion (small, specific movement)
- Camera (gentle movement, not chaotic)
- Lighting and style (consistent, not changing)
- Constraints (no flicker, no warping, stable edges)
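If you generate prompts in bulk, the five components above can be assembled by a small helper. This is a minimal sketch, not part of any image-to-video tool: the function name `build_prompt` and its parameters are hypothetical, and the output is only a text string that you would paste or send to whatever model you use.

```python
def build_prompt(subject, motion, camera, lighting, style, constraints):
    """Join the five prompt components into one string, in a fixed order.

    All parameter names are illustrative assumptions, not a real API.
    """
    sentences = [
        f"Preserve the original {subject}.",  # anchor
        f"Add {motion}.",                     # motion
        f"Camera: {camera}.",                 # camera
        f"Lighting: {lighting}.",             # lighting
        f"Style: {style}.",                   # style
    ]
    if constraints:
        clause = ", ".join(constraints)
        # Capitalize only the first letter so the clause reads as a sentence.
        sentences.append(clause[0].upper() + clause[1:] + ".")
    return " ".join(sentences)


prompt = build_prompt(
    subject="product photo",
    motion="subtle parallax",
    camera="slow push-in",
    lighting="soft studio lighting",
    style="realistic commercial style",
    constraints=["stable motion", "no flicker"],
)
print(prompt)
```

Keeping the order fixed makes it easier to compare two runs and see which component changed the result.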
Preserve the original image composition. Add [subtle motion]. Camera: [gentle movement]. Lighting: [consistent mood]. Style: [simple style]. Stable motion, no flicker, no warping, preserve edges.
10 Copy-Paste Prompt Templates
- Clean product reveal: "Preserve the original product photo. Add subtle parallax and a slow camera push-in. Soft studio lighting, realistic commercial style. Stable motion, no flicker, no warping, crisp edges."
- Product rotation feel: "Preserve the image. Create a gentle camera arc from left to right, subtle highlight movement on the product, clean background. Stable subject shape, no distortion."
- Portrait cinematic micro-motion: "Preserve the portrait. Add subtle breathing motion, minimal hair movement, gentle push-in. Cinematic lighting, natural skin texture. Stable face, no warping."
- Travel landscape pan: "Preserve the landscape photo. Slow pan across the scene, subtle atmospheric motion (mist/clouds). Stable horizon, consistent exposure, no jitter."
- Food steam and push-in: "Preserve the food image. Add gentle push-in and subtle steam motion, warm soft lighting, realistic detail. Stable plate edges, no melting."
- Fashion editorial: "Preserve the image composition. Add slow push-in and subtle fabric motion, soft spotlight, editorial style. Stable face and hands, no distortion."
- Motion poster (light sweep): "Preserve the poster design. Add subtle light sweep across the background and slow drifting particles. Crisp text edges, no flicker, stable shapes."
- Nature macro (minimal motion): "Preserve the macro photo. Add subtle depth shift and gentle camera push-in, natural lighting, realistic texture. No warping, stable focus."
- Architecture tilt: "Preserve the architecture image. Add gentle tilt up and minimal parallax, clean daylight lighting. Stable vertical lines, no bending."
- Art or illustration subtle animation: "Preserve the illustration. Add subtle animated texture (grain or brush movement) and gentle push-in, consistent color grading. No flicker, stable outlines."
Stability Constraints (Add These When Needed)
- preserve original shapes
- stable subject, stable edges
- consistent lighting and exposure
- no flicker, no jitter, no warping
- minimal motion
- locked horizon / locked background
Avoid stacking too many negatives. Choose the two to four most relevant constraints.
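One way to enforce the "two to four" guideline when prompts are built by a script is to cap the number of constraints before appending them. The sketch below is an assumption-laden illustration: `add_constraints` and `max_count` are hypothetical names, and the cap of four simply mirrors the advice above.

```python
def add_constraints(prompt, constraints, max_count=4):
    """Append a short constraint clause to a prompt.

    Hypothetical helper: rejects lists longer than max_count so that
    negatives are not stacked.
    """
    # Drop duplicates while keeping the original order.
    unique = list(dict.fromkeys(constraints))
    if len(unique) > max_count:
        raise ValueError(
            f"{len(unique)} constraints given; keep at most {max_count}"
        )
    if not unique:
        return prompt
    clause = ", ".join(unique)
    return f"{prompt.rstrip()} {clause[0].upper()}{clause[1:]}."


print(add_constraints(
    "Preserve the landscape photo. Slow pan across the scene.",
    ["locked horizon", "consistent lighting and exposure"],
))
```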
Motion and Camera Words That Usually Work
Safe camera moves
- slow push-in
- gentle pan
- slight tilt
- subtle parallax
Safer motion descriptors
- subtle
- gentle
- minimal
- smooth
- stable
Risky words (use carefully)
- fast
- shaky
- handheld
- extreme
- glitch
- chaotic
- rapid
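The risky-word list above lends itself to a quick check before you submit a prompt. The following is a rough sketch under simple assumptions: `find_risky_words` is a hypothetical helper, it matches whole lowercase words only, and a hit is a reason to review the prompt rather than proof that the result will be unstable.

```python
import re

# Words taken from the "risky words" list in this guide.
RISKY_WORDS = {"fast", "shaky", "handheld", "extreme", "glitch", "chaotic", "rapid"}


def find_risky_words(prompt):
    """Return the risky words found in the prompt, sorted alphabetically."""
    words = re.findall(r"[a-z]+", prompt.lower())
    return sorted(set(words) & RISKY_WORDS)


print(find_risky_words("Fast handheld pan with a chaotic zoom"))
```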
Common Pitfalls (And Prompt Fixes)
- Flicker: Add "consistent exposure, stable lighting, no flicker." Reduce motion.
- Melting textures (hair, packaging, hands): Add "preserve original shapes, stable edges." Keep camera simple.
- Background drift: Add "locked background, subtle parallax only." Reduce depth complexity.
- Face distortion: Add "stable face, natural facial features." Prefer push-in over pan.
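The problem-to-fix pairs above can be kept as a small lookup table so that a retry always appends the same wording. This is a sketch only: `FIXES` and `apply_fix` are hypothetical names, and the non-prompt advice (reduce motion, keep the camera simple, prefer push-in over pan) still has to be applied by editing the prompt by hand.

```python
# Prompt phrases copied from the pitfalls list in this guide.
FIXES = {
    "flicker": "consistent exposure, stable lighting, no flicker",
    "melting textures": "preserve original shapes, stable edges",
    "background drift": "locked background, subtle parallax only",
    "face distortion": "stable face, natural facial features",
}


def apply_fix(prompt, problem):
    """Append the fix phrase for a known problem to the end of the prompt."""
    fix = FIXES[problem]  # raises KeyError for problems not in the table
    base = prompt.rstrip().rstrip(".")
    return f"{base}. {fix[0].upper()}{fix[1:]}."


print(apply_fix("Preserve the portrait. Add gentle push-in.", "face distortion"))
```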
FAQ
Do I need long prompts for image-to-video?
No. Short prompts with clear motion and a few constraints tend to produce more stable results.
Should I describe the entire image?
Only if the model misinterprets the subject. Otherwise focus on motion.
What is the best first camera movement?
A slow push-in is usually the most reliable choice for stable results.