Over the past few weeks, I decided to explore the fascinating (and frustrating) world of AI video generation — testing how far current tools can go when asked to create realistic videos from simple prompts and photos.
Spoiler: we’re not quite there yet!


🌱 The Goal: Create a Simple, Realistic AI Video

My experiment began with a simple prompt:

Prompt:
“A young Indian woman in her late 20s, dressed in casual ethnic attire, sits peacefully by a large window in a sunlit modern apartment in Mumbai, sipping chai with a book in one hand. Behind her, lush indoor plants like Monstera, Areca Palm, and Aglaonema fill the corner. The vibe is serene, calm, natural, and premium. Soft morning light streams in. Branding text on image: ‘Breathe Life In – [Brand Name] Indoor Plants’.”

I uploaded two images of myself and asked each AI model to generate a video using them as references (a rough scripted version of this workflow is sketched below). The goal was to see whether modern AI tools could accurately replicate my facial features, outfit, and scene sequence.
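If you'd rather script a test like this than click through each web UI, the loop usually has the same shape: submit a job with a prompt and reference images, poll until it finishes, then download the clip. The sketch below is purely illustrative; the endpoint, field names, and parameters are placeholders I made up, not any of these vendors' real APIs, so you'd swap in the actual SDK or REST calls for whichever tool you're testing.

```python
import time
import requests

# Placeholder endpoint and key: this is NOT any real vendor's API.
API_URL = "https://api.example-video.ai/v1/generations"
API_KEY = "YOUR_API_KEY"

PROMPT = (
    "A young Indian woman in her late 20s, dressed in casual ethnic attire, "
    "sits peacefully by a large window in a sunlit modern apartment in Mumbai, "
    "sipping chai with a book in one hand..."
)

def generate_video(prompt: str, reference_paths: list[str]) -> bytes:
    """Submit a text + reference-image video job, then poll until it finishes."""
    # Most video-generation APIs are asynchronous: POST a job, then poll its status.
    files = [("reference_images", open(p, "rb")) for p in reference_paths]
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        data={"prompt": prompt, "duration_seconds": "8"},
        files=files,
    )
    resp.raise_for_status()
    job_id = resp.json()["id"]

    while True:
        job = requests.get(
            f"{API_URL}/{job_id}",
            headers={"Authorization": f"Bearer {API_KEY}"},
        ).json()
        if job["status"] == "completed":
            return requests.get(job["video_url"]).content
        if job["status"] == "failed":
            raise RuntimeError(job.get("error", "generation failed"))
        time.sleep(10)  # generation can take minutes; don't hammer the endpoint

if __name__ == "__main__":
    clip = generate_video(PROMPT, ["me_front.jpg", "me_side.jpg"])
    with open("test_output.mp4", "wb") as f:
        f.write(clip)
```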


🧠 The Tools I Tried

I tested four major AI video models available to the public:

  1. OpenAI Sora (old version)
    The much-hyped Sora 2 model isn’t available in India yet, so I could only use the older one.
    Sadly, the output was far from usable — distorted faces, inconsistent expressions, and unclear scene transitions.

  2. Google’s Gemini Video Studio
    Gemini couldn’t generate a complete video directly; it required the Google AI Studio environment.
    Even then, the results were random and lacked scene logic — it seemed to misunderstand the sequence and atmosphere I was aiming for.

  3. InVideo
    This one gave the most acceptable results — but not because it generated something new.
    It relied heavily on stock footage and text-to-speech narration, which looked polished but wasn’t true “AI generation” in the creative sense.
    I even added my own voiceover, but the voice cloning still didn’t match my delivery well.

  4. HeyGen (under testing)
    I’m still experimenting with HeyGen — early impressions suggest it might handle face structure better, but it’s too soon to conclude.


🎞️ What Worked — and What Didn’t

  • ❌ Accuracy: None of the models could preserve my face or outfit exactly.

  • ❌ Scene continuity: The generated sequences often felt random or disjointed.

  • ✅ Presentation: InVideo’s use of professional stock visuals gave a refined “ad-like” output.

  • ✅ Accessibility: All tools were relatively easy to use, but none gave instant, high-quality custom videos.


💡 Lessons Learned: What AI Creators Should Know

  1. Manage Expectations:
    Even with detailed prompts and good reference photos, the results may not be “as delightful as you want them to be.”

  2. Be Ready to Experiment:
    It often takes three or four tries, or more, across different models to find something usable. Almost all of the good generation models also require payment.

  3. Understand the Limitations:
    For now, AI video tools are better at concept simulation than precise personal representation.

  4. Stay Updated:
    Models like Sora 2 and NVIDIA’s premium tools may soon change the game — but they’re not yet open to everyone.


🎬 Conclusion: The Future Looks Exciting, but Not Instant

After these tests, I can safely say that AI video generation still needs time before everyday creators can make seamless, professional-quality clips “just like that.”
But with rapid development happening across OpenAI, Google, and NVIDIA, I’m hopeful that the next generation of models will cross that line soon.

Until then, experimentation continues. 🚀
Stay tuned — I’ll update this post once I test Sora 2 and HeyGen’s more advanced features.


📎 Attachments

(Video outputs will be embedded or linked here.)

  • 🎥 Sora – Test Output

  • 🎥 Gemini – Test Output

  • 🎥 InVideo – Stock-based Result

  • 🎥 HeyGen – Under Testing

A quick caveat: I’ve experimented with these tools for only about 3-4 hours so far, and my analysis is based on that limited time.

 
