How Qwen AI’s Text-to-Image Works
As a tech enthusiast and prompt engineer, I’m always excited to explore new tools that can help us unlock our creativity. Recently, I’ve been playing around with QWEN’s text-to-image (T2I) model, and I must say, the results are impressive. In this post, I’ll walk you through what QWEN AI is, how to access its T2I feature, and how to generate stunning images using a simple prompt.
QWEN is an AI suite developed by Alibaba Cloud, designed to help users generate text, images, and even audio content. The suite offers a range of models, each with its own capabilities. For this post, we’ll focus on the T2I model, which generates images from text prompts. By leveraging advanced machine learning algorithms, the QWEN AI T2I model can create high-quality images that are both realistic and imaginative.

Testing Qwen AI with a Creative Prompt
To access the T2I model, simply head over to the QWEN AI website and navigate to the “Text-to-Image” section. You’ll need to create an account or log in if you already have one. Once you’re in, you’ll see a simple interface where you can enter your text prompt. With QWEN’s intuitive interface, you can easily generate images that meet your creative needs.

At the bottom, you will see a small gallery showing examples of image generation. If you click on any of them, the system generates a new image based on the same prompt, although it does not allow you to edit or update the instructions. For example, here I clicked on the thumbnail at the bottom right, and without entering any new prompt, the system begins to generate the image based on the prompt that is already registered. Once the image is displayed, I can also see the detailed prompt that QWEN AI has used in the chat.
This is the image:

I am struck by the wealth of detail it includes, with every element analysed meticulously, even the appearance of the paper on which the drawing is reproduced. But I am also surprised by the number of errors this AI makes when presenting the information.
The diagram is visually clear and engaging, but it has a few informational errors. For example, the label “Evaporation” is misplaced, pointing underground instead of toward the rising water vapor. “Influration” seems to be a misspelling of “Infiltration.” Also, “Precipitation” is mislabeled underground rather than linked to rainfall. Overall, the visuals are good, but text placement and spelling need correction.
This is the detailed prompt for the generated image:
A hand-drawn style water cycle diagram, the overall picture presents a vivid and lifelike illustration of the water cycle process. At the center of the image is a range of undulating mountains and valleys, with a clear river flowing through the valley, which eventually merges into a vast ocean. The mountains and land are depicted with green vegetation. Below the image is the groundwater layer, represented by blue gradient blocks, forming a clearly layered spatial relationship with surface water. The sun is located in the upper right corner of the image, causing surface water to evaporate, indicated by rising curved arrows. Clouds float in the air, drawn in a white cotton-like appearance, with some clouds being dense, indicating the condensation of water vapor into rain, connected by downward arrows to show the precipitation process. Rain is represented by blue lines and dot symbols, falling from the clouds, replenishing rivers and groundwater. The entire illustration is presented in a cartoon hand-drawn style, with soft lines, bright colors, and clear annotations. The background has a light yellow paper texture, with a slight hand-drawn texture.
From Diagrams to Futuristic Cities
For this example, I’ll use the following prompt:
“Generate an image of a futuristic cityscape at sunset with sleek skyscrapers, flying cars, and a massive holographic advertisement hovering above the city’s central square. Incorporate a mix of neo-Art Deco and cyberpunk architectural styles.”
I want this prompt to be a tough test for Qwen AI text-to-image because it combines complex visual elements (cityscape, sunset, flying cars) with stylistic constraints (neo-Art Deco + cyberpunk), requiring the AI to blend atmosphere, detail, and style. It also challenges the model’s ability to handle scale and composition with both broad scenery and fine futuristic details. You can enter this prompt exactly as is or modify it to fit your creative vision.
I enter the prompt, select the 16:9 format, and in a few seconds it generates this image. Interestingly, although the chat shows a horizontal image, when I click on it to enlarge it, it shows me an image in 1:1 format, in this case 1024×1024 and 1.3MB in size. You won’t believe it, but take a look at the holographic advertisement: what does it remind you of? Where have you seen it before? I’m sure it’s no coincidence, and Qwen’s processors are also trained to save energy, which I can’t criticise, even if creatively the result loses some of its originality. Look at the image it has generated.

Prompt vs JSON Prompt: Does it Make a Difference?
Now I am going to use the prompt in JSON format, thinking that I will be able to get a result that looks more like the image I want to achieve. In principle, a JSON prompt lets you spell out parameters explicitly and aim for more precise results. For example, you can select from different styles, like realistic, cartoonish, or abstract. Or you may need to choose the image size, ranging from small to large.
Let’s generate the image using this prompt in JSON format:
{
"prompt": "Generate an image of a futuristic cityscape at sunset with sleek skyscrapers, flying cars, and a massive holographic advertisement hovering above the city's central square. Incorporate a mix of neo-Art Deco and cyberpunk architectural styles.",
"style": "realistic",
"size": "1920x1080"
}
You can adjust the JSON prompt to include additional parameters specific to QWEN AI’s model, such as style, size, or other custom options. This prompt tests the model’s ability to:
- Generate complex scenes with multiple elements (cityscape, skyscrapers, flying cars, holographic advertisement)
- Capture specific architectural styles (neo-Art Deco and cyberpunk)
- Render lighting and atmospheric effects (sunset)
I soon realise that the model does not have a specific configuration for JSON and that it has actually interpreted the same instructions as in the previous case. The result simply shows slight variations in the sun’s reflections on the buildings and practically nothing else.
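If you want to try the same experiment, here is a minimal Python sketch for building the JSON prompt and checking which aspect ratio a size actually corresponds to. The field names "prompt", "style", and "size" are simply the ones from my example above, not a documented Qwen setting, and the helper names are my own. The script only prepares text to paste into the chat; it does not call any Qwen service.

```python
import json
from math import gcd


def aspect_ratio(size: str) -> str:
    """Reduce a 'WIDTHxHEIGHT' string (also accepts '×') to a ratio like '16:9'."""
    width, height = (int(part) for part in size.lower().replace("×", "x").split("x"))
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"


def build_json_prompt(prompt: str, style: str, size: str) -> str:
    """Serialise the prompt and its (assumed) parameters as a JSON string."""
    payload = {"prompt": prompt, "style": style, "size": size}
    return json.dumps(payload, indent=2, ensure_ascii=False)


json_prompt = build_json_prompt(
    "A futuristic cityscape at sunset with sleek skyscrapers and flying cars.",
    "realistic",
    "1920x1080",
)
print(json_prompt)

# Compare the size requested in the JSON with the size of the enlarged image.
print(aspect_ratio("1920x1080"))  # 16:9
print(aspect_ratio("1024×1024"))  # 1:1
```

A check like this makes the mismatch explicit: the requested 1920x1080 is 16:9, while a 1024×1024 image is 1:1.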

Branding and Realism in AI-Generated Images
As I am still bothered by the fact that it uses my previous images to generate new ones, I decide to change the advertiser in the holographic advertisement, dispense with the water cycle, and request that Coke be the brand featured in the holographic advertisement, but without any explicit mention of the futuristic image of the city. The prompt is that simple:
A massive holographic Coke advertisement.
But you won’t believe what image QWEN generates in response to that prompt:

So, in desperation, I decide to be clearer and specify in my prompt that I don’t want to know anything about futuristic cities or sunsets. However, the result it gives me is similar to the previous one.
My next step is to change the T2I model. I switch to Qwen3-Max-Preview, and I also change the prompt, because I no longer want sunsets, advertising holograms, or flying cars. The new prompt is this:
A coke advertisement in 5th Avenue in New York, in 1960s.
As you can see, my instructions have nothing to do with everything I’ve used so far, right? Well, the earlier elements still show up in the image it generates:

What else can I do to make Qwen understand that it is generating the wrong image?
Bonus Track: New Chat
Well, I think it’s good news that we don’t have to pay to use this T2I. Dear friends at Qwen AI, I am impressed but above all disappointed. Nothing like this has happened to me in a long time. Your text-to-image model has shared some minor flaws with me:
- It does not take the prompt into account
- It bases the new image generated on the previous one, even if the prompt is completely different
- The image mixes unacceptable errors with quality details
But it’s not all bad news. By chance, I found the solution to most of my problems. All you have to do is change chats, start a new one, and everything works perfectly again. Notice how well the prompt I used in the previous section works, simply by opening a new chat: A coke advertisement in 5th Avenue in New York, in 1960s.

So staying in the same chat is a great idea if you want to use it as an image editor. It’s as if Qwen had a Canvas Editor integrated into the chat, maintaining parameters, archiving information, and updating the image according to the criteria you specify.
If you’re interested in exploring QWEN’s T2I model, I encourage you to give it a try. We’re dealing with an AI that still displays inflexible configurations unless you switch chats and start a new one. I’m also certain this will be fixed soon, as developers understand what saves time and increases user satisfaction. With its user-friendly interface and advanced features, you’ll be generating stunning images in no time.
That’s all. I hope you found it interesting and I would like to hear your opinion when you have a chance to try out this QWEN T2I model.
💜 Thank you for taking the time to read this.