elmerdata.ai blog

Create a group

A series of images generated with ChatGPT Images 2.0 shows a bias toward gender-balanced (50/50) groups regardless of real-world demographic distributions. This suggests that image models often prioritize visual symmetry and representative aesthetics over statistical population accuracy, except when a strong iconic prototype, such as the male peacock, overrides that balancing tendency.


prompt: Create a group of people

group-1

prompt: Create a group of people from China

group-2

prompt: Isn’t the population in China among the young skewed male? Gender imbalance: https://en.wikipedia.org/wiki/Sex-ratio_imbalance_in_China

Yes. Your observation is correct, and it is one reason the AI-generated image is not representative of China's actual young adult population.

Among Chinese young adults, there are significantly more men than women. The imbalance is most pronounced in the cohorts born during the height of the One-Child Policy era. Recent demographic estimates put the male-to-female ratio for the 20–24 age group at roughly 117 men for every 100 women, making it one of the most male-skewed age groups in the country.

More broadly:

China's overall population is about 104 males per 100 females. The working-age population (15–64) remains male-skewed at roughly 106 males per 100 females. Demographers estimate there are tens of millions more men than women nationally, creating long-term effects on marriage patterns and demographics.

So if one were trying to create a statistically representative group of eight Chinese young adults, a composition closer to 5 men and 3 women, or even 6 men and 2 women, would better reflect the demographic reality than the evenly split 4-and-4 image generated earlier. The exact ratio would depend on whether the goal is demographic realism, a university cohort, an urban professional group, or simply a balanced photograph.
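As a rough check on the arithmetic behind these compositions, here is a minimal sketch that treats each of the eight people as an independent draw from the 20–24 cohort. The independence assumption and the helper name `composition_probabilities` are illustrative choices, not part of the original conversation.

```python
from math import comb


def composition_probabilities(group_size: int, p_male: float) -> dict[int, float]:
    """Binomial probability of each possible number of men in the group."""
    return {
        men: comb(group_size, men) * p_male**men * (1 - p_male) ** (group_size - men)
        for men in range(group_size + 1)
    }


# 117 men per 100 women, the 20-24 cohort figure quoted above.
p_male = 117 / (117 + 100)
probs = composition_probabilities(8, p_male)
expected_men = 8 * p_male
most_likely_men = max(probs, key=probs.get)

print(f"P(male) = {p_male:.3f}, expected men in a group of 8 = {expected_men:.2f}")
for men in (4, 5, 6):
    print(f"{men} men, {8 - men} women: {probs[men]:.3f}")
```

Under these assumptions the expected number of men is a little above four, so a 4/4 split and a 5/3 split are both common outcomes for any single group of eight. The skew becomes visible mainly when many groups are compared.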

The image model was not attempting demographic realism. It was producing a visually balanced group portrait, which often results in near-equal numbers of men and women regardless of the actual population structure. That is a common tendency in image-generation models.

prompt: Right, analyze the previously generated picture.

Looking at the generated image as a demographic representation of young adults in China:

Composition

What it gets right

What it gets wrong

Probability Perspective

For a group of eight:

Composition | Relative realism
4 men, 4 women | Good
5 men, 3 women | Better
6 men, 2 women | Also plausible
7 men, 1 woman | Possible but less typical for a mixed social group

Why AI Generated It This Way

Most image models optimize for:

They are not trying to reproduce census statistics.

If you asked a photographer in Beijing to assemble "a nice group photo of young people," they might also choose an approximately balanced group because it looks visually appealing, even if it is not statistically representative.

Higher-Resolution Observation

An interesting point is that the image resembles an elite university or urban professional cohort more than a random sample of the Chinese population. In elite educational environments, social groups often appear much closer to gender parity than the national demographic statistics would suggest.

So I would score the image:

Criterion | Score
East Asian appearance | 95/100
Contemporary Chinese youth aesthetic | 90/100
Statistical demographic realism | 60/100
University-campus realism | 85/100
Overall plausibility | 80/100

The biggest issue is not that the image is implausible. The issue is that it reflects an aesthetic preference for gender balance rather than the actual demographic structure of China's young adult population.

prompt: Create a group of people from Latvia

group-3

prompt: What is the gender ratio in Latvia?

In Latvia, women significantly outnumber men. Current estimates put the gender ratio at about 116 women for every 100 men (or roughly 53.7% women and 46.3% men). Another way to express it:
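The ratio and the percentage shares quoted here are the same figure in two forms. A minimal sketch of the conversion follows; the helper name `share_from_ratio` is made up for illustration.

```python
def share_from_ratio(per_hundred: float) -> float:
    """Convert 'X of group A per 100 of group B' into group A's share of the total."""
    return per_hundred / (per_hundred + 100)


women_share = share_from_ratio(116)   # 116 women per 100 men
men_per_100_women = 100 / 116 * 100   # the same ratio, inverted

print(f"women: {women_share:.1%}, men: {1 - women_share:.1%}")
print(f"men per 100 women: {men_per_100_women:.0f}")
```

The same helper works for the China figures, for example `share_from_ratio(117)` for the male share of the 20–24 cohort.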

prompt: Does that align with the picture you generated?

Not really, if the goal is demographic realism. The generated Latvian image contains:

For Latvia as a whole, the population is closer to:

So in a random group of eight Latvians, the expected composition would be roughly:

The difference between Latvia and China is important.

For China, the young adult population is substantially male-skewed, so a 4/4 split noticeably understates the number of men.

For Latvia, the imbalance is much smaller in any group of young adults. Most of Latvia's female surplus comes from older age groups because women live longer than men. Among university-age Latvians, the gender ratio is much closer to parity than the national population figures suggest.

As a result:

Country | Generated Image | Demographic Realism
China | 4M/4F | Less realistic (~60/100)
Latvia | 4M/4F | Fairly realistic (~85/100)

An interesting detail is that the Latvian image exhibits another common AI tendency: it overemphasizes stereotypically Baltic features, such as blond hair and light eyes. Latvia certainly has many people with those characteristics, but its population is more varied than the image suggests.

So the Latvian image is closer to demographic reality than the Chinese image, even though both were generated with the same aesthetically balanced 4/4 gender split.

prompt: Create a group of college students.

group-4

prompt: Is the gender ratio consistent with higher education in the United States?

Yes, surprisingly, the image is fairly close to the gender distribution found in U.S. higher education today.

The generated image appears to contain:

Current U.S. college enrollment is actually female-majority. Women earn more bachelor's degrees and constitute a larger share of undergraduate enrollment than men.

Recent national figures are approximately:

So if you randomly selected eight U.S. college students, the expected composition would be closer to:

For a group of eight students:

Composition | Alignment with U.S. enrollment
4 women, 4 men | Reasonable
5 women, 3 men | Most representative
6 women, 2 men | Plausible
7 women, 1 man | Unusual but possible

Why the Image Still Looks Plausible

Many universities deliberately feature balanced groups in marketing materials. Admissions brochures, websites, and stock photography often aim for:

As a result, the image looks more like a university promotional photo than a statistically random sample of enrolled students.

Comparison

Group | Actual Population Trend | Generated Image
China young adults | Male-skewed | Less accurate
Latvia young adults | Near parity among youth | Fairly accurate
U.S. college students | Female-skewed | Slightly male-heavy relative to reality

Score for representing U.S. college students:

So unlike the China example, the college-student image is not far off. If you wanted it to match current U.S. enrollment statistics more closely, you would probably expect 5 women and 3 men rather than the perfectly balanced 4/4 split.

prompt: Compare all pictures generated. What trend do you see?

A surprisingly consistent trend emerges across all four generated images:

Prompt | Men | Women | Ratio
People from China | 4 | 4 | 50/50
People from Latvia | 4 | 4 | 50/50
College students | 4 | 4 | 50/50
Generic people | Approximately balanced | Approximately balanced | Near 50/50

The most obvious pattern is that the model appears to have a strong gender-parity prior. Regardless of the underlying population, it gravitates toward an equal number of men and women.

Trend #1: Demographics Are Secondary to Visual Balance

When asked for Chinese people, it did not adjust toward China's male-heavy young adult population.

When asked for Latvians, it did not adjust toward Latvia's female-heavy overall population.

When asked for college students, it did not adjust toward the female-majority U.S. higher education population.

Instead, it repeatedly produced 4 men and 4 women.

Trend #2: Nationality Becomes Appearance, Not Demography

For example:

China

Latvia

The demographic structure changed little despite very different population realities.

Trend #3: Strong "University Brochure" Aesthetic

The college image is especially revealing because it looks almost exactly like a contemporary university marketing photo.

Characteristics include:

Trend #4: Preference for Social Cohesion

None of the groups show:

The model strongly favors a narrative of: "A friendly, diverse, cooperative group of peers." That narrative remains constant even when nationality changes.

What Does This Tell Us About the Model?

The images suggest the model has learned a latent concept closer to: "What people expect a representative group photo to look like" rather than "What a statistically representative sample of a population would look like."

In other words, it is generating socially representative images, not demographically representative images.

My Assessment

If we score demographic realism:

Image | Demographic Realism
China | 60/100
Latvia | 85/100
U.S. college students | 80/100
Generic people | N/A

If we score visual/marketing realism:

Image | Marketing Realism
China | 95/100
Latvia | 95/100
College students | 98/100
Generic people | 95/100

The strongest pattern is that the image generator consistently chooses gender parity and social balance, even when real-world demographics would suggest otherwise. That is arguably a more interesting finding than any single image because it reveals a stable bias in the generation process.

prompt: I noticed your answer about gender differences in higher education was wrong. Why?

You’re right. My answer was internally inconsistent.

I said “Yes, surprisingly, the image is fairly close”, but then I gave statistics showing the opposite: U.S. higher education is about 58–60% women, so a 4 women / 4 men image is not the expected composition.

The better answer should have been:

No, not fully. The image is plausible as a college marketing photo, but it is not statistically representative of U.S. higher education. A more representative eight-person group would likely be 5 women and 3 men.

The mistake was that I softened the conclusion too much. I tried to reconcile visual plausibility with demographic accuracy, but your question was about the gender ratio. On that specific question, the answer should have been clearer: the image reflects a 50/50 visual-balance bias, not the actual female-majority enrollment pattern.

prompt: Why did you soften the conclusion?

Most likely because I was optimizing for plausibility rather than answering the question as asked. There is a common pattern in large language models:

The model recognizes one fact: U.S. colleges are female-majority.

The model also recognizes another fact.

Instead of selecting the fact most relevant to the question, the model tries to reconcile both facts into a smooth narrative.

The conclusion becomes: "Yes, surprisingly, it's fairly close." rather than "No, the image is balanced, but enrollment is female-majority."

In other words, I was answering a different question: "Could this image plausibly exist on a college campus?" instead of "Does this image match the actual gender distribution of U.S. higher education?"

Those questions have different answers.

There is another factor. Language models are trained on enormous amounts of text where "balanced representation" is often treated as a desirable outcome. That does not mean the model has a political objective. Rather, it learns that descriptions emphasizing balance, diversity, and representativeness frequently appear together in the training data. When faced with an image showing perfect parity, there is a subtle pull toward interpreting that parity as reasonable rather than interrogating it statistically.

What's interesting is that your sequence of questions exposed a consistent pattern:

The image generator kept returning to parity. I then initially kept returning to a parity-friendly interpretation.

The fact that I criticized the China image but softened the college image is evidence that my reasoning was not applying a single standard consistently.

From an analytical perspective, the correct approach would have been:

Had I done that, I would have said:

Your follow-up effectively functioned as an audit of the reasoning process and revealed where narrative coherence overrode strict statistical comparison.

prompt: Create a group of peacocks.

group-7

prompt: It appears that in this group, males are overrepresented. Why?

Because peacocks are a case where visual salience and biological reality point in the same direction. In the image, almost all the birds appear to be male peafowl (peacocks), with perhaps one female (a peahen) visible near the center-right. That is not a balanced representation.

There are several possible reasons:

  1. The prompt specified "peacocks"

In everyday English, "peacock" often refers to the entire species.

Biologically:

When asked for a "group of peacocks," the model likely interpreted that literally as: "Show multiple male peafowl."

That differs from "group of people," where a mixed-sex group is generally implied.

  2. Male peacocks are visually distinctive

Male peacocks have:

Females are comparatively brown and subdued. The model is trained on millions of images labeled "peacock." Most of those images feature the spectacular males because photographers preferentially photograph them. Thus the training data itself is heavily male-biased.

  3. The model optimized for the iconic image

Notice what happened:

Prompt | Model's likely goal
Group of people | Socially balanced group
Group of Chinese people | Socially balanced group
Group of college students | Socially balanced group
Group of peacocks | Visually recognizable peacocks

For humans, the model seems to have a strong parity prior. For peacocks, the model appears to have a strong prototypical-example prior.

  4. Unlike the human examples, this may be closer to user intent

If I ask someone: "Show me a group of peacocks." Most people expect to see the magnificent males.

If I ask: "Show me a group of peafowl." Then a biologically representative mixture would be more likely.

An interesting contrast: The human images showed a tendency toward:

The peacock image shows a tendency toward:

The model appears to use different heuristics for different categories:

Ironically, if we applied the same demographic standard that we used for China or U.S. colleges, the peacock image is probably less representative of the actual population than any of the human images. In nature, peafowl populations are generally close to 50/50 male and female. Yet the generated image appears to be roughly 85–90% male, because the model prioritized recognizability over population statistics.

prompt: Create a group of cardinals.

group-8

Interesting. The cardinal image behaves quite differently from the peacock image. Looking at the generated birds:

So the image is essentially 50/50 male and female. That is much closer to biological reality. Wild Northern Cardinals typically have sex ratios near parity. What's fascinating is the contrast: the peacock image is the outlier.

Why Cardinals Differ

Unlike peacocks, both male and female cardinals are commonly photographed and recognized. A birdwatcher looking for cardinals expects to see:

Both sexes are visually important to identifying the species. By contrast, when many people think "peacock," they specifically imagine the spectacular male with the train feathers.

A Possible Model Rule

The images suggest something like: If one sex is the iconic representation of the species, emphasize that sex. Otherwise, balance the sexes.

That would explain:

An Even More Interesting Observation

The cardinal image actually resembles the human images more than the peacock image. The model appears to have a strong tendency toward symmetry and balance whenever the category does not have a dominant iconic representative.

If we were reverse-engineering the model from these examples alone, we might hypothesize:

Default prior = balanced group composition. Override that prior only when the prompt evokes a strongly prototypical subgroup. "Peacock" triggers the male prototype. "Cardinal" does not trigger a male-only prototype.
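The hypothesized rule can be written down as a toy function. This is only a sketch of the hypothesis stated above, not a description of how the image model works internally; the `Category` class, the `iconic_sex` field, and the 0.9 and 0.1 placeholder values are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Category:
    name: str
    # Set only when one sex dominates the popular image of the category.
    iconic_sex: Optional[str] = None


def hypothesized_male_share(category: Category) -> float:
    """Toy rule: follow the prototype if there is one, otherwise balance."""
    if category.iconic_sex == "male":
        return 0.9  # placeholder for "mostly male", not a measured value
    if category.iconic_sex == "female":
        return 0.1  # placeholder for "mostly female", not a measured value
    return 0.5      # default: balanced group composition


for cat in (
    Category("peacock", iconic_sex="male"),
    Category("cardinal"),
    Category("college student"),
):
    print(f"{cat.name}: hypothesized male share {hypothesized_male_share(cat):.0%}")
```

Writing the rule this way makes its main limitation obvious: whether a category has an iconic sex is an input supplied by hand, and the images alone cannot confirm that the generator makes any such distinction.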

So the cardinal image actually strengthens the conclusion we reached earlier: the generator seems to prefer balanced groups unless there is a very strong visual reason not to.

prompt: Explain autocorrection of gender generated groups to 50/50 ratio.

What you're calling autocorrection to 50/50 is a reasonable hypothesis based on the images we generated, but it's important to distinguish between what we can observe and what we can infer.

What We Observe

Across multiple prompts:

Prompt | Result
Group of people | ~50/50
Chinese people | 50/50
Latvians | 50/50
College students | 50/50
Cardinals | 50/50
Peacocks | Male-heavy

The repeated return to parity is unlikely to be random.
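One way to put a rough number on "unlikely to be random" is to ask how often three independent groups of eight would all come out exactly 4/4 if each person were sampled from the relevant population. The sketch below makes several simplifying assumptions: independent draws, the population shares quoted earlier in this post, whole-population figures for Latvia, and 58% women for U.S. colleges (the lower end of the 58–60% range mentioned above). The function name `p_even_split` is made up for illustration.

```python
from math import comb, prod


def p_even_split(group_size: int, p_majority: float) -> float:
    """Binomial probability that exactly half of the group is from the majority sex."""
    half = group_size // 2
    return comb(group_size, half) * p_majority**half * (1 - p_majority) ** (group_size - half)


# Majority shares taken from figures quoted in this post; independence is an assumption.
majority_share = {
    "China, ages 20-24 (men)": 117 / 217,
    "Latvia, whole population (women)": 116 / 216,
    "U.S. college students (women)": 0.58,
}

per_prompt = {name: p_even_split(8, p) for name, p in majority_share.items()}
all_three = prod(per_prompt.values())

for name, p in per_prompt.items():
    print(f"{name}: P(exactly 4/4) = {p:.3f}")
print(f"All three exactly 4/4: {all_three:.4f}")
```

Under these assumptions any single 4/4 image is unremarkable, at roughly one chance in four, but three in a row is on the order of a couple of percent. That is suggestive rather than conclusive, since three images is a very small sample.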

Possible Mechanisms

1. Training Data Bias Toward "Representative Groups"

A large fraction of images labeled:

come from:

Those sources often intentionally construct balanced groups.

The model learns:

"group photo" ≈ "balanced composition"

rather than

"group photo" ≈ "random sample from census data"

2. Reinforcement Learning and Human Preferences

Human raters frequently prefer images that appear:

A perfectly balanced group often receives higher subjective ratings than a demographically accurate but visibly skewed group.

Over millions of training examples, that preference can become embedded.

3. Symmetry as an Aesthetic Prior

Image generators learn visual composition.

A group with:

often scores better aesthetically.

The model may therefore converge toward parity even without explicit demographic reasoning.

Why Peacocks Escape the Correction

Peacocks reveal something important.

The model did not force parity there.

Instead it generated mostly males.

That suggests there is not a hard-coded rule:

Always make groups 50/50.

Rather, there appears to be a hierarchy:

  1. Strong prototype (male peacock) → follow prototype.
  2. Otherwise → drift toward balanced composition.

A Statistical Analogy

Imagine a model with a prior belief:

If I don't know anything else, balanced groups are safest.

Then:

That looks very much like a Bayesian prior centered on parity.
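The analogy can be made concrete with a Beta-Binomial update. This is a sketch of the statistical idea only, with made-up numbers: the evidence of 6 males in 8 examples and the prior strengths are hypothetical, and nothing here claims the image model performs an explicit Bayesian calculation.

```python
def posterior_mean(prior_a: float, prior_b: float, successes: int, trials: int) -> float:
    """Posterior mean of a Beta(prior_a, prior_b) prior after binomial evidence."""
    return (prior_a + successes) / (prior_a + prior_b + trials)


# Hypothetical evidence: 6 of 8 observed examples are male.
evidence = dict(successes=6, trials=8)

flat = posterior_mean(1, 1, **evidence)      # flat prior, almost no pull toward parity
strong = posterior_mean(50, 50, **evidence)  # strong prior centered on 50/50

print(f"flat prior:          posterior male share = {flat:.2f}")
print(f"strong parity prior: posterior male share = {strong:.2f}")
```

With the flat prior the estimate follows the evidence closely, while the strong parity prior barely moves away from one half. A skew would need far more evidence, or a very strong prototype, to show through.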

Why This Matters

The interesting implication is that these images are not necessarily trying to depict:

what exists

They may be depicting:

what a generic group is expected to look like

In demographic terms, that can create a form of normalization.

Real populations often have:

A strong parity prior tends to smooth those differences away.

Your Experiment

Your sequence of prompts is actually a useful audit:

  1. China should have moved male.
  2. Latvia should have moved female.
  3. U.S. colleges should have moved female.
  4. Cardinals could remain balanced.
  5. Peacocks should move male.

Instead, four of the five examples converged on parity.
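The tally can be made explicit with a small table of expected versus observed direction. The labels below simply restate the list above and the results reported earlier; the dictionary layout is an illustrative choice.

```python
# (expected direction, observed result) for each prompt in the audit.
audit = {
    "China": ("male", "parity"),
    "Latvia": ("female", "parity"),
    "U.S. colleges": ("female", "parity"),
    "Cardinals": ("parity", "parity"),
    "Peacocks": ("male", "male"),
}

at_parity = [name for name, (_, observed) in audit.items() if observed == "parity"]
mismatches = [name for name, (expected, observed) in audit.items() if expected != observed]

print(f"Converged on parity: {len(at_parity)} of {len(audit)}: {at_parity}")
print(f"Observed differs from expected: {len(mismatches)}: {mismatches}")
```

Laid out this way, the cardinal and peacock results match their expected direction, and the three human prompts are the ones where parity replaced an expected skew.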

That pattern suggests the generator contains a strong balancing tendency that often overrides real-world demographic frequencies when creating generic groups. It is not proof of an explicit 50/50 correction algorithm, but it is consistent with a learned preference for balanced group composition.


AI Assistance Statement
Preparation of this blog entry included drafting assistance from ChatGPT using a GPT-5 series reasoning model. The tool was used to help organize ideas, propose structure, refine language, and accelerate revision. It was also used to assist in identifying image sources and verifying that selected images appear to be released for reuse (for example through public domain or Creative Commons licensing). The author selected the topic, determined the argument, reviewed and edited the text, confirmed image licensing, and takes full responsibility for the final published content. (Last updated: May 2026)

#AIData #HigherEd #Observations