Prompts, Settings and Real-World Results
Today I thought it would be interesting to take a walk around Vheer text to image, a lightweight AI tool that turns a few lines of written text into an image. The exercise consists of testing how it responds to a simple prompt I came up with this morning while drinking my first coffee. Now I’ll explain everything I discovered.
And What The Hell Is VHEER?
Vheer is an easy-to-use, browser-based AI tool. Besides the text-to-image generator we are testing here, it lets you create short cinematic clips in two ways: by animating a still image you upload (image-to-video), or by typing a short description that the AI turns into moving scenes (text-to-video). Instead of dealing with heavy software, you can generate 5–10 second videos directly in your browser, adjusting settings like frame rate, resolution, and format. The platform’s core features are free and watermark-free, making it ideal for creators who want to explore motion in their digital art quickly and without complications.
In addition to Text to Image, there are many other options available on their website. For example, the Image Tools section offers all of these options.

Now, let’s get started with the prompts.
My Coffee Prompt
Let us set the stage with an artistic proposal that is balanced and vivid, but not overdone. I must give the model enough information to work with, while leaving it room to contribute its own ability to generate interesting results.
As I mentioned before, I will test it with two versions: one with descriptive text and the other in JSON format. A JSON prompt is a structured way of giving instructions to an AI. Think of it like a recipe: you list ingredients and quantities (key–value pairs) so the AI knows exactly what to do, whether it’s generating text, images, or video. For example, you might see something like this:
```json
{
  "prompt": "A glowing moon above a calm river",
  "width": 720,
  "height": 1280,
  "fps": 24
}
```
👉 Then you can instantly see:
- "prompt" = the creative instruction
- "width", "height" = the output size
- "fps" = the frame rate
Let’s continue with our exercise; we can’t stop now. To put Vheer text to image through its paces, we need both the text prompt and the JSON prompt.
🎬 Core Prompt (Text-to-Image)
Text Prompt:
A lone figure walks through a neon-lit city street at night in soft rain; wet asphalt reflecting colourful lights; cinematic, photorealistic look; natural colours, shallow depth of field, 35mm feel, soft bokeh, gentle rim light on the silhouette; crisp realistic detail. Avoid cartoon, anime, illustration, heavy oversaturation, text, watermark.
📝 The JSON Prompt
```json
{
  "prompt": "A lone figure walks through a neon-lit city street at night in soft rain; wet asphalt reflecting colourful lights; cinematic, photorealistic look; natural colours, shallow depth of field, 35mm feel, soft bokeh, gentle rim light on the silhouette; crisp realistic detail.",
  "negative_prompt": "cartoon, anime, illustration, heavy oversaturation, low-res, blurry, text, watermark, logo, jpeg artifacts",
  "width": 1248,
  "height": 1248,
  "num_inference_steps": 28,
  "guidance_scale": 7.5,
  "seed": null
}
```
What are Inference Steps?
Think of them as the number of brushstrokes the AI uses to refine your image from noise to detail.
- Fewer steps = faster, rougher. Good for drafts.
- More steps = slower, cleaner. Better detail and fewer artefacts, up to a point.
- Practical range: 20–32 for most models; going far beyond that often brings diminishing returns.
What is Guidance Scale (CFG)?
This is how strongly the AI follows your prompt—like turning up the GPS volume.
- Lower guidance (5–6) = freer, more interpretive, sometimes softer or dreamier.
- Medium (6.5–8) = balanced; faithful to the prompt without looking forced.
- High (9–12) = very literal, but can cause over-saturation, harsh contrast, or weird artefacts.
Quick tips for your JSON prompts
- Start around 28 steps and 7–7.5 guidance for realistic styles.
- If details look mushy → add steps a little.
- If the image drifts from your idea → raise guidance slightly.
- If things look crunchy or unnatural → lower guidance a notch.
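As a sketch, the tips above can be encoded in a tiny helper that warns when settings drift outside the practical ranges we just discussed. The function name and thresholds are my own illustration, not anything built into Vheer:

```python
def check_settings(steps: int, guidance: float) -> list[str]:
    """Return warnings for settings outside the practical ranges."""
    warnings = []
    if not 20 <= steps <= 32:
        warnings.append("steps outside the usual 20-32 range: "
                        "expect rough drafts or diminishing returns")
    if guidance < 6.5:
        warnings.append("low guidance: freer and dreamier, may drift from the prompt")
    elif guidance > 8:
        warnings.append("high guidance: very literal, risk of over-saturation and artefacts")
    return warnings

# Our JSON prompt uses 28 steps and guidance 7.5: both in range.
print(check_settings(28, 7.5))   # no warnings
print(check_settings(12, 10.0))  # two warnings
```

Starting at 28 steps and 7.5 guidance, as in our JSON prompt, passes cleanly; a quick draft at 12 steps with guidance 10 trips both checks.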
Converting Text to Image
Text-to-image means you write a description, and the AI generates a still picture from it. It’s like painting with words instead of brushes. Enter a description and let the AI work its magic. On the right-hand side of the screen is the space where we type our prompt; on the left, the application displays a sample image. If you just want to test how it works, the writing space shows a light-bulb icon that offers randomly generated “bright” and brilliant ideas to get you started.

First, I run the test with the simple prompt, the text one. I save the JSON prompt for the next step.
I type the prompt: A lone figure walks through a neon-lit city street at night in soft rain; wet asphalt reflecting colourful lights; cinematic, photorealistic look; natural colours, shallow depth of field, 35mm feel, soft bokeh, gentle rim light on the silhouette; crisp realistic detail. Avoid cartoon, anime, illustration, heavy oversaturation, text, watermark.
Before clicking the “Generate” button, I check that the aspect ratio is 1:1, which is what I prefer for this image. Then I click “Generate”. In a few seconds, I have my figure walking among the neon lights reflecting at her feet. I think it looks pretty good, so I click the “Download” icon at the top right of the image. If you don’t like what you see, or aren’t entirely convinced, you can repeat the process as many times as you like, free of charge, which is a particularly welcome feature we have Vheer to thank for.
This is the image, which is 436 KB in size and measures 1248×1248.

We have managed to make our character walk alone under the neon rain, where light, shadows and silence offer us a cinematic moment.
Introducing our JSON prompt
Let’s move on to the second part of our exercise. Now we use the prompt we configured in JSON, but first I need to explain something: Vheer’s text-to-image (T2I) model is not designed to receive prompts in JSON format. In other words, we are going to run the test to see how it responds, bearing in mind that restriction.
We want to convey the same message to Vheer: a silhouette walking at night with neon lights reflecting on the street. The only difference is how we share that message with the application, spelling out and defining each detail for it to interpret.
I copied the JSON prompt I showed you in the previous section, and I kept the same quality and aspect ratio, so that the only change is Vheer’s interpretation of this new format in the prompt. Then I press the “Generate” button and in a few seconds this image appears.

As you can see, there is a bit more contrast in the colours, and Vheer has also added more details that do not appear in the previous image: the walking silhouette is wearing a hood, there are vehicles parked on the street, and there are more signs and neon lights.
Deterministic Setups
It is almost impossible for a T2I model to generate two exactly identical images, even if we always use the same prompt, unless everything is locked down.
Text-to-image models start from random noise, so even the same prompt can “land” differently each run. To get the same image you’d need the same seed, model + exact version, sampler, steps, resolution, and even the same software/hardware settings. Most web tools change some of these (or hide the seed), so outputs vary by design. You can get similar results; exact duplicates are rare outside tightly controlled, deterministic setups.
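To make the seed point concrete, here is a toy simulation in Python: diffusion starts from random noise, and only a fixed seed makes that starting point repeatable. This only models the noise stage, not Vheer’s actual pipeline.

```python
import random

def starting_noise(seed=None, n=8):
    """Simulate the initial latent noise a text-to-image model denoises."""
    rng = random.Random(seed)  # seed=None -> fresh entropy on every call
    return [rng.random() for _ in range(n)]

# Same locked seed -> identical starting noise -> a reproducible image.
assert starting_noise(seed=42) == starting_noise(seed=42)

# No seed (what most web tools effectively give you) -> a different
# starting point, and therefore a different image, on every run.
print(starting_noise() == starting_noise())
```

Since Vheer hides the seed (the JSON prompt’s `"seed": null` is never honoured), every click of “Generate” behaves like the unseeded call.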
Here are some reasons why these small differences occur even when using the same prompt.
- The model fills the gaps. If your prompt says “neon night street” and “a lone figure”, the AI completes the scene with likely elements from its training: hooded silhouettes, parked cars, glowing shop signs. If you don’t forbid them, it may add them.
- Randomness by design. Text-to-image models start from random noise and “paint” the image step by step. Unless a tool lets you fix a seed (Vheer doesn’t), each run will drift—so colours, contrast and details can change.
- Word biases. Terms like neon, cinematic, moody, photorealistic tend to push the model toward stronger contrast, wetter reflections and richer highlights. One or two extra adjectives can tilt the whole look.
- Hidden settings & post-processing. Different internal models/quality modes (or the same mode on a different run) can slightly change sharpness, tone-mapping and saturation.
If you want to keep chasing a near-duplicate, an image really similar to the one you generated with the text prompt, here are some tips you can keep in mind:
- Nail composition: “single subject centred, medium shot, shallow depth of field”.
- Specify absence: “empty street, no parked cars, minimal signage, no hood (short hair visible)”.
- Control colour: “muted palette, low contrast, soft neon glow” (and avoid “vibrant”).
- Add an “avoid” tail: “avoid cartoon, heavy saturation, extra signs, hooded clothing”.
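Putting those tips together, a small helper can assemble the positive description and the “avoid” tail into the single block of text that Vheer’s prompt box accepts. The function is my own convenience for composing prompts, not a Vheer feature:

```python
def build_prompt(description: str, avoid: list[str]) -> str:
    """Join a description with an 'Avoid ...' tail, as in the text prompt above."""
    prompt = description.rstrip(".") + "."
    if avoid:
        prompt += " Avoid " + ", ".join(avoid) + "."
    return prompt

print(build_prompt(
    "single subject centred, medium shot, shallow depth of field, "
    "empty street, no parked cars, minimal signage, muted palette, "
    "low contrast, soft neon glow",
    ["cartoon", "heavy saturation", "extra signs", "hooded clothing"],
))
```

Keeping the composition, absence, and colour constraints in one reusable function makes it easier to rerun the same wording while you hunt for that near-duplicate.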
That’s all about this test with Vheer text-to-image.
I hope you found it interesting and I would like to hear your opinion when you have a chance to try out this T2I model.
💜 Thank you for taking the time to read this.