Create a group

06 Jun, 2026

A series of images generated with ChatGPT Images 2.0 reveals a bias toward gender-balanced (50/50) groups regardless of real-world demographic distributions. This suggests that image models often prioritize visual symmetry and representative aesthetics over statistical population accuracy, except when a strong iconic prototype, such as the male peacock, overrides that balancing tendency.

prompt: Create a group of people

group-1

prompt: Create a group of people from China

group-2

prompt: Isn’t the population in China among the young skewed male? Gender imbalance: https://en.wikipedia.org/wiki/Sex-ratio_imbalance_in_China

Yes. Your observation is correct, and it is one reason the AI-generated image is not representative of China's actual young adult population.

Among Chinese young adults, there are significantly more men than women. The imbalance is most pronounced in the cohorts born during the height of the One-Child Policy era. Recent demographic estimates put the male-to-female ratio for the 20–24 age group at roughly 117 men for every 100 women, making it one of the most male-skewed age groups in the country.

More broadly:

China's overall population is about 104 males per 100 females.
The working-age population (15–64) remains male-skewed at roughly 106 males per 100 females.
Demographers estimate there are tens of millions more men than women nationally, creating long-term effects on marriage patterns and demographics.

So if one were trying to create a statistically representative group of eight Chinese young adults, a composition closer to 5 men and 3 women, or even 6 men and 2 women, would better reflect the demographic reality than the evenly split 4-and-4 image generated earlier. The exact ratio would depend on whether the goal is demographic realism, a university cohort, an urban professional group, or simply a balanced photograph.
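The step from a sex ratio to a group composition can be made explicit. The following is a minimal sketch, assuming the ratios quoted above (117, 106, and 104 males per 100 females) and a group of eight; the function names `male_share` and `expected_men` are illustrative and do not come from the conversation.

```python
def male_share(males_per_100_females: float) -> float:
    """Convert a sex ratio (males per 100 females) into the male share of the population."""
    return males_per_100_females / (males_per_100_females + 100.0)


def expected_men(group_size: int, males_per_100_females: float) -> float:
    """Expected number of men in a randomly drawn group of the given size."""
    return group_size * male_share(males_per_100_females)


# Ratios as quoted in the conversation above (assumed, not independently verified here).
for label, ratio in [("ages 20-24", 117), ("working age", 106), ("overall", 104)]:
    share = male_share(ratio)
    print(f"{label}: {share:.1%} male, about {expected_men(8, ratio):.2f} men in a group of 8")
```

On the quoted 117:100 ratio, the male share is a little under 54%, so the expected number of men in a group of eight is slightly above four. How that translates into a single "most representative" whole-number composition depends on how the rounding is done.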

The image model was not attempting demographic realism. It was producing a visually balanced group portrait, which often results in near-equal numbers of men and women regardless of the actual population structure. That is a common tendency in image-generation models.

prompt: Right, analyze the previously generated picture.

Looking at the generated image as a demographic representation of young adults in China:

Composition

Total people: 8
Men: 4
Women: 4
Apparent age: late teens to mid-20s
Setting: university or urban youth environment
Gender ratio: 100 males per 100 females

How Realistic Is It?

For a generic "group of young people in China," I would rate the demographic realism about 60/100.

What it gets right

The individuals generally appear East Asian.
The age range is plausible for university students.
Clothing and appearance are consistent with contemporary urban Chinese youth.
Nothing in the image appears obviously stereotypical or exoticized.

What it gets wrong

The gender distribution is unusually balanced.
China's young adult population remains noticeably male-heavy due to the legacy of sex-ratio imbalances at birth.
A random sample of eight young adults would statistically be more likely to contain five or six men than four.

Probability Perspective

If we assume a young-adult population ratio of roughly 54% male, 46% female (a conservative estimate), then:
4 men / 4 women is not impossible.
It is simply less likely than 5/3 or 6/2 compositions.

For a group of eight:

Composition	Relative realism
4 men, 4 women	Good
5 men, 3 women	Better
6 men, 2 women	Also plausible
7 men, 1 woman	Possible but less typical for a mixed social group
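The likelihood of each composition can be checked directly with a binomial model. This is a sketch under two assumptions: each of the eight people is drawn independently, and the male share is the 54% figure used above. The function name `composition_probabilities` is illustrative.

```python
from math import comb


def composition_probabilities(group_size: int, p_male: float) -> dict[int, float]:
    """Binomial probability of each possible number of men in the group."""
    return {
        men: comb(group_size, men) * p_male**men * (1 - p_male) ** (group_size - men)
        for men in range(group_size + 1)
    }


# 0.54 is the assumed young-adult male share from the text above.
probs = composition_probabilities(8, 0.54)
for men in (4, 5, 6, 7):
    print(f"{men} men, {8 - men} women: {probs[men]:.3f}")
```

Under these assumptions the 4/4 and 5/3 splits come out close together, at roughly 27% and 25%, with 6/2 near 15%. The informal ranking in the table is therefore sensitive to the assumed male share, and a real friend group is not an independent random draw in any case.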

Why AI Generated It This Way

Most image models optimize for:

visual symmetry,
perceived diversity,
balanced composition,
aesthetic appeal.

They are not trying to reproduce census statistics.

If you asked a photographer in Beijing to assemble "a nice group photo of young people," they might also choose an approximately balanced group because it looks visually appealing, even if it is not statistically representative.

Higher-Resolution Observation

An interesting point is that the image resembles an elite university or urban professional cohort more than a random sample of the Chinese population. In elite educational environments, social groups often appear much closer to gender parity than the national demographic statistics would suggest.

So I would score the image:

Criterion	Score
East Asian appearance	95/100
Contemporary Chinese youth aesthetic	90/100
Statistical demographic realism	60/100
University-campus realism	85/100
Overall plausibility	80/100

The biggest issue is not that the image is implausible. The issue is that it reflects an aesthetic preference for gender balance rather than the actual demographic structure of China's young adult population.

prompt: Create a group of people from Latvia

group-3

prompt: What is the gender ratio in Latvia?

In Latvia, women significantly outnumber men. Current estimates put the gender ratio at about 116 women for every 100 men (or roughly 53.7% women and 46.3% men). Another way to express it:

Male-to-female ratio: about 0.86 males per female.

This imbalance is one of the highest in Europe and is largely due to:
Higher male mortality rates
Lower life expectancy for men
An aging population where women live longer on average

prompt: Does that align with the picture you generated?

Not really, if the goal is demographic realism. The generated Latvian image contains:

4 men
4 women
That produces a 50/50 split.

For Latvia as a whole, the population is closer to:

46% men
54% women

So in a random group of eight Latvians, the expected composition would be roughly:

4 men, 4 women (quite plausible)
3 men, 5 women (slightly more representative)
2 men, 6 women (also plausible)

The difference between Latvia and China is important.

For China, the young adult population is substantially male-skewed, so a 4/4 split noticeably understates the number of men.

For Latvia, the imbalance among young adults is much smaller. Most of Latvia's female surplus comes from older age groups because women live longer than men. Among university-age Latvians, the gender ratio is much closer to parity than the national population figures suggest.

As a result:

Country	Generated Image	Demographic Realism
China (4M/4F)	Less realistic	~60/100
Latvia (4M/4F)	Fairly realistic	~85/100

An interesting detail is that the Latvian image actually exhibits another common AI tendency: it overemphasizes stereotypically Baltic features, such as blond hair and light eyes. Real Latvia certainly has many people with those characteristics, but modern Latvia is more varied than the image suggests.

So the Latvian image is closer to demographic reality than the Chinese image, even though both were generated with the same aesthetically balanced 4/4 gender split.

prompt: Create a group of college students.

group-4

prompt: Is the gender ratio consistent with higher education in the United States?

Yes, surprisingly, the image is fairly close to the gender distribution found in U.S. higher education today.

The generated image appears to contain:

4 men
4 women
50% male, 50% female

Current U.S. college enrollment is actually female-majority. Women earn more bachelor's degrees and constitute a larger share of undergraduate enrollment than men.

Recent national figures are approximately:

58–60% women
40–42% men

So if you randomly selected eight U.S. college students, the expected composition would be closer to:

5 women, 3 men
Occasionally 6 women, 2 men
Less often 4 and 4

For a group of eight students:

Composition	Alignment with U.S. enrollment
4 women, 4 men	Reasonable
5 women, 3 men	Most representative
6 women, 2 men	Plausible
7 women, 1 man	Unusual but possible

Why the Image Still Looks Plausible

Many universities deliberately feature balanced groups in marketing materials. Admissions brochures, websites, and stock photography often aim for:

gender balance,
racial diversity,
a mix of visible backgrounds,
visual symmetry.

As a result, the image looks more like a university promotional photo than a statistically random sample of enrolled students.

Comparison

Group	Actual Population Trend	Generated Image
China young adults	Male-skewed	Less accurate
Latvia young adults	Near parity among youth	Fairly accurate
U.S. college students	Female-skewed	Slightly male-heavy relative to reality

Score for representing U.S. college students:

Age realism: 95/100
Campus realism: 95/100
Diversity realism: 90/100
Gender-ratio realism: 80/100
Overall plausibility: 92/100

So unlike the China example, the college-student image is not far off. If you wanted it to match current U.S. enrollment statistics more closely, you would probably expect 5 women and 3 men rather than the perfectly balanced 4/4 split.

prompt: Compare all pictures generated. What trend do you see?

A surprisingly consistent trend emerges across all four generated images:

Prompt	Men	Women	Ratio
People from China	4	4	50/50
People from Latvia	4	4	50/50
College students	4	4	50/50
Generic people	Approximately balanced	Approximately balanced	Near 50/50

The most obvious pattern is that the model appears to have a strong gender-parity prior. Regardless of the underlying population, it gravitates toward an equal number of men and women.

Trend #1: Demographics Are Secondary to Visual Balance

The model appears optimized for:
symmetry,
social harmony,
visual attractiveness,
perceived inclusiveness.
It is not primarily optimizing for census accuracy.

When asked for Chinese people, it did not adjust toward China's male-heavy young adult population.

When asked for Latvians, it did not adjust toward Latvia's female-heavy overall population.

When asked for college students, it did not adjust toward the female-majority U.S. higher education population.

Instead, it repeatedly produced 4 men and 4 women.

Trend #2: Nationality Becomes Appearance, Not Demography

The model changes:
facial features,
clothing,
architecture,
environmental cues,
but not the gender composition.

For example:

China

East Asian faces
Chinese architectural elements
Balanced gender ratio

Latvia

Northern European faces
Latvian flag
Balanced gender ratio

The demographic structure changed little despite very different population realities.

Trend #3: Strong "University Brochure" Aesthetic

All four images resemble:
admissions brochures,
corporate websites,
diversity recruitment materials,
stock photography.

Characteristics include:

everyone smiling,
similar ages,
no visible disability,
no obvious socioeconomic variation,
balanced gender mix,
balanced ethnic representation when applicable.

The college image is especially revealing because it looks almost exactly like a contemporary university marketing photo.

Trend #4: Preference for Social Cohesion

None of the groups show:

social tension,
isolation,
unusual age distributions,
unattractive compositions.

The model strongly favors a narrative of: "A friendly, diverse, cooperative group of peers." That narrative remains constant even when nationality changes.

What Does This Tell Us About the Model?

The images suggest the model has learned a latent concept closer to: "What people expect a representative group photo to look like" rather than "What a statistically representative sample of a population would look like."

In other words, it is generating socially representative images, not demographically representative images.

My Assessment

If we score demographic realism:

Image	Demographic Realism
China	60/100
Latvia	85/100
U.S. college students	80/100
Generic people	N/A

If we score visual/marketing realism:

Image	Marketing Realism
China	95/100
Latvia	95/100
College students	98/100
Generic people	95/100

The strongest pattern is that the image generator consistently chooses gender parity and social balance, even when real-world demographics would suggest otherwise. That is arguably a more interesting finding than any single image because it reveals a stable bias in the generation process.

prompt: I noticed your answer about gender differences in higher education was wrong. Why?

You’re right. My answer was internally inconsistent.

I said “Yes, surprisingly, the image is fairly close”, but then I gave statistics showing the opposite: U.S. higher education is about 58–60% women, so a 4 women / 4 men image is not the expected composition.

The better answer should have been:

No, not fully. The image is plausible as a college marketing photo, but it is not statistically representative of U.S. higher education. A more representative eight-person group would likely be 5 women and 3 men.

The mistake was that I softened the conclusion too much. I tried to reconcile visual plausibility with demographic accuracy, but your question was about the gender ratio. On that specific question, the answer should have been clearer: the image reflects a 50/50 visual-balance bias, not the actual female-majority enrollment pattern.

prompt: Why did you soften the conclusion?

Most likely because I was optimizing for plausibility rather than answering the question as asked. There is a common pattern in large language models:

The model recognizes one fact: U.S. colleges are female-majority.

The model also recognizes other facts:

A 4/4 group photo is common on college campuses.
Marketing materials often show balanced groups.

Instead of selecting the fact most relevant to the question, the model tries to reconcile both facts into a smooth narrative.

The conclusion becomes: "Yes, surprisingly, it's fairly close." rather than "No, the image is balanced, but enrollment is female-majority."

In other words, I was answering a different question: "Could this image plausibly exist on a college campus?" instead of "Does this image match the actual gender distribution of U.S. higher education?"

Those questions have different answers.

There is another factor. Language models are trained on enormous amounts of text where "balanced representation" is often treated as a desirable outcome. That does not mean the model has a political objective. Rather, it learns that descriptions emphasizing balance, diversity, and representativeness are frequently associated in the training data. When faced with an image showing perfect parity, there is a subtle pull toward interpreting that parity as reasonable rather than interrogating it statistically.

What's interesting is that your sequence of questions exposed a consistent pattern:

China: actual population skewed male → image 50/50.
Latvia: actual population skewed female → image 50/50.
U.S. college enrollment: actual population skewed female → image 50/50.

The image generator kept returning to parity. Then I initially kept returning to a parity-friendly interpretation. The fact that I criticized the China image but softened the college image is evidence that my reasoning was not applying a single standard consistently. From an analytical perspective, the correct approach would have been:

Establish the demographic baseline.
Compare image to baseline.
Draw the same type of conclusion for every case.

Had I done that, I would have said:

China: not representative.
Latvia: somewhat representative.
U.S. colleges: not representative.
Common pattern: strong 50/50 gender-balance prior.
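The three-step procedure can be sketched as a single calculation applied identically to every case. This is only an illustration: the baseline shares are taken from the figures quoted in the conversation (54% male for Chinese young adults, 46% male for Latvia as a whole, and about 41% male for U.S. enrollment as the midpoint of the 40–42% range), and the `Case` class is hypothetical. The conversation gives no youth-specific figure for Latvia, so the whole-population share is used here.

```python
from dataclasses import dataclass


@dataclass
class Case:
    name: str
    baseline_male_share: float  # assumed, from figures quoted in the conversation
    men_in_image: int
    group_size: int = 8

    def gap_points(self) -> float:
        """Image male share minus baseline male share, in percentage points."""
        image_share = self.men_in_image / self.group_size
        return 100.0 * (image_share - self.baseline_male_share)


cases = [
    Case("China, young adults", 0.54, 4),
    Case("Latvia, whole population", 0.46, 4),
    Case("U.S. college enrollment", 0.41, 4),
]

# Same comparison for every case: no verdict is softened or sharpened by hand.
for case in cases:
    print(f"{case.name}: image is {case.gap_points():+.1f} points male relative to baseline")
```

Reporting a signed gap rather than a verdict keeps the standard identical across cases. Any cutoff for "representative" would be a separate, explicit judgment call.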

Your follow-up effectively functioned as an audit of the reasoning process and revealed where narrative coherence overrode strict statistical comparison.

prompt: Create a group of peacocks.

group-7

prompt: It appears that in this group, males are overrepresented. Why?

Because peacocks are a case where visual salience and biological reality point in the same direction. In the image, almost all the birds appear to be male peafowl (peacocks), with perhaps one female (a peahen) visible near the center-right. That is not a balanced representation.

There are several possible reasons:

The prompt specified "peacocks"

In everyday English, "peacock" often refers to the entire species.

Biologically:

Male = peacock
Female = peahen
Species = peafowl

When asked for a "group of peacocks," the model likely interpreted that literally as: "Show multiple male peafowl."

That differs from "group of people," where a mixed-sex group is generally implied.

Male peacocks are visually distinctive

Male peacocks have:

Large iridescent trains
Bright blue necks
Eye-spot feathers

Females are comparatively brown and subdued. The model is trained on millions of images labeled "peacock." Most of those images feature the spectacular males because photographers preferentially photograph them. Thus the training data itself is heavily male-biased.

The model optimized for the iconic image

Notice what happened:

Prompt	Model's likely goal
Group of people	Socially balanced group
Group of Chinese people	Socially balanced group
Group of college students	Socially balanced group
Group of peacocks	Visually recognizable peacocks

For humans, the model seems to have a strong parity prior. For peacocks, the model appears to have a strong prototypical-example prior.

Unlike the human examples, this may be closer to user intent

If I ask someone, "Show me a group of peacocks," most people expect to see the magnificent males.

If I ask, "Show me a group of peafowl," then a biologically representative mixture would be more likely.

An interesting contrast: The human images showed a tendency toward:

demographic balancing

The peacock image shows a tendency toward:

prototype amplification

The model appears to use different heuristics for different categories:

Humans → balance and inclusiveness.
Animals → iconic representatives of the category.
Peacocks → colorful males because they are the most recognizable example.

Ironically, if we applied the same demographic standard that we used for China or U.S. colleges, the peacock image is probably less representative of the actual population than any of the human images. In nature, peafowl populations are generally close to 50/50 male and female. Yet the generated image appears to be roughly 85–90% male, because the model prioritized recognizability over population statistics.

prompt: Create a group of cardinals.

group-8

Interesting. The cardinal image behaves quite differently from the peacock image. Looking at the generated birds:

Male cardinals (bright red): approximately 4
Female cardinals (brown/tan with reddish highlights): approximately 4

So the image is essentially 50/50 male and female. That is much closer to biological reality. Wild Northern Cardinals typically have sex ratios near parity. What's fascinating is the contrast: the peacock image is the outlier.

Why Cardinals Differ

Unlike peacocks, both male and female cardinals are commonly photographed and recognized. A birdwatcher looking for cardinals expects to see:

bright red males,
brown females.

Both sexes are visually important to identifying the species. By contrast, when many people think "peacock," they specifically imagine the spectacular male with the train feathers.

A Possible Model Rule

The images suggest something like: If one sex is the iconic representation of the species, emphasize that sex. Otherwise, balance the sexes.

That would explain:

Peacocks → mostly males.
Cardinals → balanced.
Humans → balanced.

An Even More Interesting Observation

The cardinal image actually resembles the human images more than the peacock image does. The model appears to have a strong tendency toward symmetry and balance whenever the category does not have a dominant iconic representative.

If we were reverse-engineering the model from these examples alone, we might hypothesize:

Default prior = balanced group composition.
Override that prior only when the prompt evokes a strongly prototypical subgroup.
"Peacock" triggers the male prototype.
"Cardinal" does not trigger a male-only prototype.

So the cardinal image actually strengthens the conclusion we reached earlier: the generator seems to prefer balanced groups unless there is a very strong visual reason not to.

prompt: Explain autocorrection of gender generated groups to 50/50 ratio.

What you're calling autocorrection to 50/50 is a reasonable hypothesis based on the images we generated, but it's important to distinguish between what we can observe and what we can infer.

What We Observe

Across multiple prompts:

Prompt	Result
Group of people	~50/50
Chinese people	50/50
Latvians	50/50
College students	50/50
Cardinals	50/50
Peacocks	Male-heavy

The repeated return to parity is unlikely to be random.

Possible Mechanisms

1. Training Data Bias Toward "Representative Groups"

A large fraction of images labeled:

group of people
students
coworkers
friends
team

come from:

stock photography
university marketing
corporate websites
advertising

Those sources often intentionally construct balanced groups.

The model learns:

"group photo" ≈ "balanced composition"

rather than

"group photo" ≈ "random sample from census data"

2. Reinforcement Learning and Human Preferences

Human raters frequently prefer images that appear:

balanced,
inclusive,
aesthetically harmonious.

A perfectly balanced group often receives higher subjective ratings than a demographically accurate but visibly skewed group.

Over millions of training examples, that preference can become embedded.

3. Symmetry as an Aesthetic Prior

Image generators learn visual composition.

A group with:

equal numbers,
balanced positioning,
similar visual weight,

often scores better aesthetically.

The model may therefore converge toward parity even without explicit demographic reasoning.

Why Peacocks Escape the Correction

Peacocks reveal something important.

The model did not force parity there.

Instead it generated mostly males.

That suggests there is not a hard-coded rule:

Always make groups 50/50.

Rather, there appears to be a hierarchy:

Strong prototype (male peacock) → follow prototype.
Otherwise → drift toward balanced composition.

A Statistical Analogy

Imagine a model with a prior belief:

If I don't know anything else, balanced groups are safest.

Then:

China (actual male skew) → pulled toward 50/50.
U.S. colleges (actual female skew) → pulled toward 50/50.
Latvia (actual female skew) → pulled toward 50/50.
Cardinals (near parity anyway) → stays near 50/50.
Peacocks (strong male prototype) → prior overridden.

That looks very much like a Bayesian prior centered on parity.
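The analogy can be made concrete with a Beta prior centered on 0.5. This is a sketch of the analogy only, not a description of how the image model works: the function name `posterior_male_share` and every weight below are hypothetical values chosen for illustration.

```python
def posterior_male_share(
    evidence_share: float,
    evidence_weight: float,
    prior_weight: float = 20.0,
) -> float:
    """Posterior mean of a symmetric Beta prior (centered on 0.5) after weighted evidence.

    prior_weight is the total pseudo-count of the prior; evidence_weight is the
    pseudo-count assigned to the prompt's real-world signal. Both are assumptions.
    """
    alpha = prior_weight / 2 + evidence_weight * evidence_share
    beta = prior_weight / 2 + evidence_weight * (1 - evidence_share)
    return alpha / (alpha + beta)


# Weak demographic signal: the estimate stays close to parity.
print(f"China-like case: {posterior_male_share(0.54, evidence_weight=2.0):.3f}")

# Strong prototype signal: the evidence overwhelms the parity prior.
print(f"Peacock-like case: {posterior_male_share(0.95, evidence_weight=200.0):.3f}")
```

With a weak signal the posterior barely moves from 0.5, while a heavily weighted prototype pulls it far from parity. That is the qualitative pattern described in the list above.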

Why This Matters

The interesting implication is that these images are not necessarily trying to depict:

what exists

They may be depicting:

what a generic group is expected to look like

In demographic terms, that can create a form of normalization.

Real populations often have:

age imbalances,
sex imbalances,
ethnic concentrations,
occupational skew.

A strong parity prior tends to smooth those differences away.

Your Experiment

Your sequence of prompts is actually a useful audit:

China should have moved male.
Latvia should have moved female.
U.S. colleges should have moved female.
Cardinals could remain balanced.
Peacocks should move male.

Instead, four of the five examples converged on parity.

That pattern suggests the generator contains a strong balancing tendency that often overrides real-world demographic frequencies when creating generic groups. It is not proof of an explicit 50/50 correction algorithm, but it is consistent with a learned preference for balanced group composition.

AI Assistance Statement

Preparation of this blog entry included drafting assistance from ChatGPT using a GPT-5 series reasoning model. The tool was used to help organize ideas, propose structure, refine language, and accelerate revision. It was also used to assist in identifying image sources and verifying that selected images appear to be released for reuse (for example through public domain or Creative Commons licensing). The author selected the topic, determined the argument, reviewed and edited the text, confirmed image licensing, and takes full responsibility for the final published content. (Last updated: May 2026)

#AIData #HigherEd #Observations