Create a group
A series of AI-generated images from ChatGPT Images 2.0 reveals a bias toward gender-balanced (50/50) groups regardless of real-world demographic distributions. This suggests that image models often prioritize visual symmetry and representative aesthetics over statistical population accuracy, except when a strong iconic prototype, such as the male peacock, overrides the balancing tendency.
prompt: Create a group of people

prompt: Create a group of people from China

prompt: Isn’t the population in China among the young skewed male? Gender imbalance: https://en.wikipedia.org/wiki/Sex-ratio_imbalance_in_China
Yes. Your observation is correct, and it is one reason the AI-generated image is not representative of China's actual young adult population.
Among Chinese young adults, there are significantly more men than women. The imbalance is most pronounced in the cohorts born during the height of the One-Child Policy era. Recent demographic estimates put the male-to-female ratio for the 20–24 age group at roughly 117 men for every 100 women, making it one of the most male-skewed age groups in the country.
More broadly:
China's overall population is about 104 males per 100 females. The working-age population (15–64) remains male-skewed at roughly 106 males per 100 females. Demographers estimate there are tens of millions more men than women nationally, creating long-term effects on marriage patterns and demographics.
So if one were trying to create a statistically representative group of eight Chinese young adults, a composition closer to 5 men and 3 women, or even 6 men and 2 women, would better reflect the demographic reality than the evenly split 4-and-4 image generated earlier. The exact ratio would depend on whether the goal is demographic realism, a university cohort, an urban professional group, or simply a balanced photograph.
The image model was not attempting demographic realism. It was producing a visually balanced group portrait, which often results in near-equal numbers of men and women regardless of the actual population structure. That is a common tendency in image-generation models.
prompt: Right, analyze the previously generated picture.
Looking at the generated image as a demographic representation of young adults in China:
Composition
- Total people: 8
- Men: 4
- Women: 4
- Apparent age: late teens to mid-20s
- Setting: university or urban youth environment
- Gender ratio: 100 males per 100 females
How Realistic Is It?
For a generic "group of young people in China," I would rate the demographic realism about 60/100.
What it gets right
- The individuals generally appear East Asian.
- The age range is plausible for university students.
- Clothing and appearance are consistent with contemporary urban Chinese youth.
- Nothing in the image appears obviously stereotypical or exoticized.
What it gets wrong
- The gender distribution is unusually balanced.
- China's young adult population remains noticeably male-heavy due to the legacy of sex-ratio imbalances at birth.
- A random sample of eight young adults would statistically be more likely to contain five or six men than four.
Probability Perspective
If we assume a young-adult population ratio of roughly 54% male, 46% female (a conservative estimate), then:
- 4 men / 4 women is not impossible.
- It is simply less likely than a male-majority split (5/3 or 6/2 taken together).
For a group of eight:
| Composition | Relative realism |
|---|---|
| 4 men, 4 women | Good |
| 5 men, 3 women | Better |
| 6 men, 2 women | Also plausible |
| 7 men, 1 woman | Possible but less typical for a mixed social group |
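The likelihood of each composition can be checked with a short calculation. The sketch below is an illustration only: it assumes the eight people are independent draws from a population that is 54% male (the figure assumed above), which a real social group is not, and the function name `composition_probabilities` is hypothetical.

```python
from math import comb


def composition_probabilities(group_size: int, male_share: float) -> dict[int, float]:
    """Binomial probability of each possible number of men in the group."""
    return {
        men: comb(group_size, men)
        * male_share**men
        * (1 - male_share) ** (group_size - men)
        for men in range(group_size + 1)
    }


# Assumption: 54% male, as in the estimate quoted above.
probs = composition_probabilities(group_size=8, male_share=0.54)
for men, p in probs.items():
    print(f"{men} men / {8 - men} women: {p:.1%}")

# Probability that men are in the majority (5 or more of 8).
male_majority = sum(p for men, p in probs.items() if men >= 5)
print(f"5 or more men: {male_majority:.1%}")
```

Under these assumptions an exact 4/4 split comes out at about 27% and a male-majority split (five or more men) at about 45%. The independence assumption matters: friend groups and class photos are not random samples.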
Why AI Generated It This Way
Most image models optimize for:
- visual symmetry,
- perceived diversity,
- balanced composition,
- aesthetic appeal.
They are not trying to reproduce census statistics.
If you asked a photographer in Beijing to assemble "a nice group photo of young people," they might also choose an approximately balanced group because it looks visually appealing, even if it is not statistically representative.
Higher-Resolution Observation
An interesting point is that the image resembles an elite university or urban professional cohort more than a random sample of the Chinese population. In elite educational environments, social groups often appear much closer to gender parity than the national demographic statistics would suggest.
So I would score the image:
| Criterion | Score |
|---|---|
| East Asian appearance | 95/100 |
| Contemporary Chinese youth aesthetic | 90/100 |
| Statistical demographic realism | 60/100 |
| University-campus realism | 85/100 |
| Overall plausibility | 80/100 |
The biggest issue is not that the image is implausible. The issue is that it reflects an aesthetic preference for gender balance rather than the actual demographic structure of China's young adult population.
prompt: Create a group of people from Latvia

prompt: What is the gender ratio in Latvia?
In Latvia, women significantly outnumber men. Current estimates put the gender ratio at about 116 women for every 100 men (or roughly 53.7% women and 46.3% men). Another way to express it:
- Male-to-female ratio: about 0.86 males per female.
This imbalance is one of the highest in Europe and is largely due to:
- Higher male mortality rates
- Lower life expectancy for men
- An aging population where women live longer on average
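The different ways of stating the ratio are the same number in different units. A minimal sketch of the conversion, assuming the figure of about 116 women per 100 men quoted above (the helper name `shares_from_ratio` is made up for illustration):

```python
def shares_from_ratio(women_per_100_men: float) -> tuple[float, float, float]:
    """Convert 'women per 100 men' into (female share, male share, males per female)."""
    total = women_per_100_men + 100
    female_share = women_per_100_men / total
    male_share = 100 / total
    males_per_female = 100 / women_per_100_men
    return female_share, male_share, males_per_female


female, male, m_per_f = shares_from_ratio(116)
print(f"{female:.1%} women, {male:.1%} men, {m_per_f:.2f} males per female")
# prints: 53.7% women, 46.3% men, 0.86 males per female
```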
prompt: Does that align with the picture you generated?
Not really, if the goal is demographic realism. The generated Latvian image contains:
- 4 men
- 4 women
That produces a 50/50 split.
For Latvia as a whole, the population is closer to:
- 46% men
- 54% women
So in a random group of eight Latvians, the expected composition would be roughly:
- 4 men, 4 women (quite plausible)
- 3 men, 5 women (slightly more representative)
- 2 men, 6 women (also plausible)
The difference between Latvia and China is important.
For China, the young adult population is substantially male-skewed, so a 4/4 split noticeably understates the number of men.
For Latvia, the imbalance among young adults is much smaller. Most of Latvia's female surplus comes from older age groups because women live longer than men. Among university-age Latvians, the gender ratio is much closer to parity than the national population figures suggest.
As a result:
| Country (generated split) | Assessment | Demographic realism score |
|---|---|---|
| China (4M/4F) | Less realistic | ~60/100 |
| Latvia (4M/4F) | Fairly realistic | ~85/100 |
An interesting detail is that the Latvian image actually exhibits another common AI tendency: it overemphasizes stereotypically Baltic features, such as blond hair and light eyes. Real Latvia certainly has many people with those characteristics, but modern Latvia is more varied than the image suggests.
So the Latvian image is closer to demographic reality than the Chinese image, even though both were generated with the same aesthetically balanced 4/4 gender split.
prompt: Create a group of college students.

prompt: Is the gender ratio consistent with higher education in the United States?
Yes, surprisingly, the image is fairly close to the gender distribution found in U.S. higher education today.
The generated image appears to contain:
- 4 men
- 4 women
- 50% male, 50% female
Current U.S. college enrollment is actually female-majority. Women earn more bachelor's degrees and constitute a larger share of undergraduate enrollment than men.
Recent national figures are approximately:
- 58–60% women
- 40–42% men
So if you randomly selected eight U.S. college students, the expected composition would be closer to:
- 5 women, 3 men most often
- 4 women, 4 men somewhat less often
- 6 women, 2 men less often still
For a group of eight students:
| Composition | Alignment with U.S. enrollment |
|---|---|
| 4 women, 4 men | Reasonable |
| 5 women, 3 men | Most representative |
| 6 women, 2 men | Plausible |
| 7 women, 1 man | Unusual but possible |
Why the Image Still Looks Plausible
Many universities deliberately feature balanced groups in marketing materials. Admissions brochures, websites, and stock photography often aim for:
- gender balance,
- racial diversity,
- a mix of visible backgrounds,
- visual symmetry.
As a result, the image looks more like a university promotional photo than a statistically random sample of enrolled students.
Comparison
| Group | Actual Population Trend | Accuracy of Generated Image |
|---|---|---|
| China young adults | Male-skewed | Less accurate |
| Latvia young adults | Near parity among youth | Fairly accurate |
| U.S. college students | Female-skewed | Slightly male-heavy relative to reality |
Score for representing U.S. college students:
- Age realism: 95/100
- Campus realism: 95/100
- Diversity realism: 90/100
- Gender-ratio realism: 80/100
- Overall plausibility: 92/100
So unlike the China example, the college-student image is not far off. If you wanted it to match current U.S. enrollment statistics more closely, you would probably expect 5 women and 3 men rather than the perfectly balanced 4/4 split.
prompt: Compare all pictures generated. What trend do you see?
A surprisingly consistent trend emerges across all four generated images:
| Prompt | Men | Women | Ratio |
|---|---|---|---|
| People from China | 4 | 4 | 50/50 |
| People from Latvia | 4 | 4 | 50/50 |
| College students | 4 | 4 | 50/50 |
| Generic people | Approximately balanced | Approximately balanced | Near 50/50 |
The most obvious pattern is that the model appears to have a strong gender-parity prior. Regardless of the underlying population, it gravitates toward an equal number of men and women.
Trend #1: Demographics Are Secondary to Visual Balance
The model appears optimized for:
- symmetry,
- social harmony,
- visual attractiveness,
- perceived inclusiveness.
It is not primarily optimizing for census accuracy.
When asked for Chinese people, it did not adjust toward China's male-heavy young adult population.
When asked for Latvians, it did not adjust toward Latvia's female-heavy overall population.
When asked for college students, it did not adjust toward the female-majority U.S. higher education population.
Instead, it repeatedly produced 4 men and 4 women.
Trend #2: Nationality Becomes Appearance, Not Demography
The model changes:
- facial features,
- clothing,
- architecture,
- environmental cues,
but not the gender composition.
For example:
China
- East Asian faces
- Chinese architectural elements
- Balanced gender ratio
Latvia
- Northern European faces
- Latvian flag
- Balanced gender ratio
The gender composition of the images did not change despite very different population realities.
Trend #3: Strong "University Brochure" Aesthetic
All four images resemble:
- admissions brochures,
- corporate websites,
- diversity recruitment materials,
- stock photography.
Characteristics include:
- everyone smiling,
- similar ages,
- no visible disability,
- no obvious socioeconomic variation,
- balanced gender mix,
- balanced ethnic representation when applicable.
The college image is especially revealing because it looks almost exactly like a contemporary university marketing photo.
Trend #4: Preference for Social Cohesion
None of the groups show:
- social tension,
- isolation,
- unusual age distributions,
- unattractive compositions.
The model strongly favors a narrative of: "A friendly, diverse, cooperative group of peers." That narrative remains constant even when nationality changes.
What Does This Tell Us About the Model?
The images suggest the model has learned a latent concept closer to: "What people expect a representative group photo to look like" rather than "What a statistically representative sample of a population would look like."
In other words, it is generating socially representative images, not demographically representative images.
My Assessment
If we score demographic realism:
| Image | Demographic Realism |
|---|---|
| China | 60/100 |
| Latvia | 85/100 |
| U.S. college students | 80/100 |
| Generic people | N/A |
If we score visual/marketing realism:
| Image | Marketing Realism |
|---|---|
| China | 95/100 |
| Latvia | 95/100 |
| College students | 98/100 |
| Generic people | 95/100 |
The strongest pattern is that the image generator consistently chooses gender parity and social balance, even when real-world demographics would suggest otherwise. That is arguably a more interesting finding than any single image because it reveals a stable bias in the generation process.
prompt: I noticed your answer about gender differences in higher education was wrong. Why?
You’re right. My answer was internally inconsistent.
I said “Yes, surprisingly, the image is fairly close”, but then I gave statistics showing the opposite: U.S. higher education is about 58–60% women, so a 4 women / 4 men image is not the expected composition.
The better answer should have been:
No, not fully. The image is plausible as a college marketing photo, but it is not statistically representative of U.S. higher education. A more representative eight-person group would likely be 5 women and 3 men.
The mistake was that I softened the conclusion too much. I tried to reconcile visual plausibility with demographic accuracy, but your question was about the gender ratio. On that specific question, the answer should have been clearer: the image reflects a 50/50 visual-balance bias, not the actual female-majority enrollment pattern.
prompt: Why did you soften the conclusion?
Most likely because I was optimizing for plausibility rather than answering the question as asked. There is a common pattern in large language models:
- The model recognizes one fact: U.S. colleges are female-majority.
- The model also recognizes other facts: a 4/4 group photo is common on college campuses, and marketing materials often show balanced groups.
- Instead of selecting the fact most relevant to the question, the model tries to reconcile them into a smooth narrative.
- The conclusion becomes "Yes, surprisingly, it's fairly close." rather than "No, the image is balanced, but enrollment is female-majority."
In other words, I was answering a different question: "Could this image plausibly exist on a college campus?" instead of "Does this image match the actual gender distribution of U.S. higher education?"
Those questions have different answers.
There is another factor. Language models are trained on enormous amounts of text where "balanced representation" is often treated as a desirable outcome. That does not mean the model has a political objective. Rather, it learns that balance, diversity, and representativeness frequently appear together, and approvingly, in the training data. When faced with an image showing perfect parity, there is a subtle pull toward interpreting that parity as reasonable rather than interrogating it statistically.
What's interesting is that your sequence of questions exposed a consistent pattern:
- China: actual population skewed male → image 50/50.
- Latvia: actual population skewed female → image 50/50.
- U.S. college enrollment: actual population skewed female → image 50/50.
The image generator kept returning to parity, and I in turn kept returning to a parity-friendly interpretation. The fact that I criticized the China image but softened the college image is evidence that I was not applying a single standard consistently.
From an analytical perspective, the correct approach would have been:
- Establish the demographic baseline.
- Compare image to baseline.
- Draw the same type of conclusion for every case.
Had I done that, I would have said:
- China: not representative.
- Latvia: somewhat representative.
- U.S. colleges: not representative.
- Common pattern: strong 50/50 gender-balance prior.
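The three-step comparison can be sketched as follows. This is only an illustration: the baseline shares are the approximate figures quoted earlier in this conversation (about 54% male for Chinese young adults, about 46% male for Latvia's whole population, about 41% male for U.S. enrollment), and `parity_gap` is a hypothetical helper, not an established metric.

```python
# Step 1: demographic baselines. These are the approximate male shares quoted
# earlier in the conversation (assumptions for illustration, not fresh data).
BASELINE_MALE_SHARE = {
    "China (young adults)": 0.54,
    "Latvia (whole population)": 0.46,
    "U.S. college students": 0.41,
}


def parity_gap(men_in_image: int, group_size: int, baseline_male_share: float) -> float:
    """Step 2: percentage-point gap between the image's male share and the baseline."""
    image_share = men_in_image / group_size
    return (image_share - baseline_male_share) * 100


# Step 3: apply the same measure to every case (each image showed 4 men of 8).
for group, baseline in BASELINE_MALE_SHARE.items():
    gap = parity_gap(men_in_image=4, group_size=8, baseline_male_share=baseline)
    direction = "too few men" if gap < 0 else "too many men"
    print(f"{group}: image is {abs(gap):.0f} points off ({direction})")
```

Using one gap measure for every case is what keeps the conclusions comparable. The Latvian baseline here is the national figure; as discussed above, the young-adult figure is closer to parity, which would shrink that gap.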
Your follow-up effectively functioned as an audit of the reasoning process and revealed where narrative coherence overrode strict statistical comparison.
prompt: Create a group of peacocks.

prompt: It appears that in this group, males are overrepresented. Why?
Because peacocks are a case where visual salience and the literal meaning of the word point in the same direction. In the image, almost all the birds appear to be male peafowl (peacocks), with perhaps one female (a peahen) visible near the center-right. That is not a balanced representation.
There are several possible reasons:
- The prompt specified "peacocks"
In everyday English, "peacock" often refers to the entire species.
Biologically:
- Male = peacock
- Female = peahen
- Species = peafowl
When asked for a "group of peacocks," the model likely interpreted that literally as: "Show multiple male peafowl."
That differs from "group of people," where a mixed-sex group is generally implied.
- Male peacocks are visually distinctive
Male peacocks have:
- Large iridescent trains
- Bright blue necks
- Eye-spot feathers
Females are comparatively brown and subdued. The model is trained on millions of images labeled "peacock." Most of those images feature the spectacular males because photographers preferentially photograph them. Thus the training data itself is heavily male-biased.
- The model optimized for the iconic image
Notice what happened:
| Prompt | Model's likely goal |
|---|---|
| Group of people | Socially balanced group |
| Group of Chinese people | Socially balanced group |
| Group of college students | Socially balanced group |
| Group of peacocks | Visually recognizable peacocks |
For humans, the model seems to have a strong parity prior. For peacocks, the model appears to have a strong prototypical-example prior.
- Unlike the human examples, this may be closer to user intent
If I ask someone: "Show me a group of peacocks." Most people expect to see the magnificent males.
If I ask: "Show me a group of peafowl." Then a biologically representative mixture would be more likely.
An interesting contrast: the human images showed a tendency toward demographic balancing, while the peacock image shows a tendency toward prototype amplification.
The model appears to use different heuristics for different categories:
- Humans → balance and inclusiveness.
- Animals → iconic representatives of the category.
- Peacocks → colorful males because they are the most recognizable example.
Ironically, if we applied the same demographic standard that we used for China or U.S. colleges, the peacock image is probably less representative of the actual population than any of the human images. In nature, peafowl populations are generally close to 50/50 male and female. Yet the generated image appears to be roughly 85–90% male, because the model prioritized recognizability over population statistics.
prompt: Create a group of cardinals.

Interesting. The cardinal image behaves quite differently from the peacock image. Looking at the generated birds:
- Male cardinals (bright red): approximately 4
- Female cardinals (brown/tan with reddish highlights): approximately 4
So the image is essentially 50/50 male and female. That is much closer to biological reality. Wild Northern Cardinals typically have sex ratios near parity. What's fascinating is the contrast: the peacock image is the outlier.
Why Cardinals Differ
Unlike peacocks, both male and female cardinals are commonly photographed and recognized. A birdwatcher looking for cardinals expects to see:
- bright red males,
- brown females.
Both sexes are visually important to identifying the species. By contrast, when many people think "peacock," they specifically imagine the spectacular male with the train feathers.
A Possible Model Rule
The images suggest something like: If one sex is the iconic representation of the species, emphasize that sex. Otherwise, balance the sexes.
That would explain:
- Peacocks → mostly males.
- Cardinals → balanced.
- Humans → balanced.
An Even More Interesting Observation: The cardinal image actually resembles the human images more than the peacock image. The model appears to have a strong tendency toward symmetry and balance whenever the category does not have a dominant iconic representative.
If we were reverse-engineering the model from these examples alone, we might hypothesize:
Default prior = balanced group composition. Override that prior only when the prompt evokes a strongly prototypical subgroup. "Peacock" triggers the male prototype. "Cardinal" does not trigger a male-only prototype.
So the cardinal image actually strengthens the conclusion we reached earlier: the generator seems to prefer balanced groups unless there is a very strong visual reason not to.
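The hypothesized rule can be written down as a toy decision function. To be clear, this is a sketch of the hypothesis only, not of how the image model works internally; the `ICONIC_SEX` table, the "at most one member of the other sex" choice, and the function name are all invented for illustration.

```python
# Toy encoding of the hypothesis: categories whose everyday image is one sex.
# The single entry is an illustrative assumption drawn from this conversation.
ICONIC_SEX = {"peacock": "male"}


def hypothesized_composition(category: str, group_size: int = 8) -> dict[str, int]:
    """Return the male/female split that the hypothesized rule would predict."""
    iconic = ICONIC_SEX.get(category)
    if iconic is None:
        # Default prior: balanced group (an odd member goes to "male" arbitrarily).
        half = group_size // 2
        return {"male": group_size - half, "female": half}
    # Strong prototype: nearly everyone is the iconic sex, with at most one other.
    minority = 1 if group_size > 1 else 0
    other = "female" if iconic == "male" else "male"
    return {iconic: group_size - minority, other: minority}


for category in ("person", "cardinal", "peacock"):
    print(category, hypothesized_composition(category))
```

With the default group of eight, the sketch predicts 4/4 for people and cardinals and 7/1 for peacocks, which matches the pattern observed in the generated images.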
prompt: Explain autocorrection of gender generated groups to 50/50 ratio.
What you're calling autocorrection to 50/50 is a reasonable hypothesis based on the images we generated, but it's important to distinguish between what we can observe and what we can infer.
What We Observe
Across multiple prompts:
| Prompt | Result |
|---|---|
| Group of people | ~50/50 |
| Chinese people | 50/50 |
| Latvians | 50/50 |
| College students | 50/50 |
| Cardinals | 50/50 |
| Peacocks | Male-heavy |
The repeated return to parity is unlikely to be random.
Possible Mechanisms
1. Training Data Bias Toward "Representative Groups"
A large fraction of images labeled:
- group of people
- students
- coworkers
- friends
- team
come from:
- stock photography
- university marketing
- corporate websites
- advertising
Those sources often intentionally construct balanced groups.
The model learns:
"group photo" ≈ "balanced composition"
rather than
"group photo" ≈ "random sample from census data"
2. Reinforcement Learning and Human Preferences
Human raters frequently prefer images that appear:
- balanced,
- inclusive,
- aesthetically harmonious.
A perfectly balanced group often receives higher subjective ratings than a demographically accurate but visibly skewed group.
Over millions of training examples, that preference can become embedded.
3. Symmetry as an Aesthetic Prior
Image generators learn visual composition.
A group with:
- equal numbers,
- balanced positioning,
- similar visual weight,
often scores better aesthetically.
The model may therefore converge toward parity even without explicit demographic reasoning.
Why Peacocks Escape the Correction
Peacocks reveal something important.
The model did not force parity there.
Instead it generated mostly males.
That suggests there is not a hard-coded rule:
Always make groups 50/50.
Rather, there appears to be a hierarchy:
- Strong prototype (male peacock) → follow prototype.
- Otherwise → drift toward balanced composition.
A Statistical Analogy
Imagine a model with a prior belief:
If I don't know anything else, balanced groups are safest.
Then:
- China (actual male skew) → pulled toward 50/50.
- U.S. colleges (actual female skew) → pulled toward 50/50.
- Latvia (actual female skew) → pulled toward 50/50.
- Cardinals (near parity anyway) → stays near 50/50.
- Peacocks (strong male prototype) → prior overridden.
That looks very much like a Bayesian prior centered on parity.
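The analogy can be made concrete with a Beta prior on the male share. This is a sketch of the analogy only: the prior strength (`prior_strength=50`), the evidence weights, and the 0.95 share used for peacocks are arbitrary illustrative values, and nothing here claims the image model actually performs Bayesian updating.

```python
def shrunk_male_share(observed_share: float, evidence_weight: float,
                      prior_strength: float = 50.0) -> float:
    """Posterior mean of a Beta prior centered on 0.5 after weighted evidence."""
    alpha = prior_strength / 2 + observed_share * evidence_weight
    beta = prior_strength / 2 + (1 - observed_share) * evidence_weight
    return alpha / (alpha + beta)


# Real-world shares are the approximate figures quoted in this conversation.
# Weak evidence (weight 10) barely moves the output away from 0.5.
for label, share in [("China young adults", 0.54), ("U.S. colleges", 0.41)]:
    pulled = shrunk_male_share(share, evidence_weight=10)
    print(f"{label}: real {share:.2f} -> output {pulled:.3f}")

# A very strong prototype acts like overwhelming evidence and overrides the prior.
# The 0.95 share and weight of 1000 are hypothetical values for illustration.
print(f"Peacocks: output {shrunk_male_share(0.95, evidence_weight=1000):.3f}")
```

The design choice is the ratio of `prior_strength` to `evidence_weight`: when the prior dominates, every population is pulled toward 50/50; when the evidence dominates, as in the peacock case, the prior is effectively overridden.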
Why This Matters
The interesting implication is that these images are not necessarily trying to depict:
what exists
They may be depicting:
what a generic group is expected to look like
In demographic terms, that can create a form of normalization.
Real populations often have:
- age imbalances,
- sex imbalances,
- ethnic concentrations,
- occupational skew.
A strong parity prior tends to smooth those differences away.
Your Experiment
Your sequence of prompts is actually a useful audit:
- China should have moved male.
- Latvia should have moved female.
- U.S. colleges should have moved female.
- Cardinals could remain balanced.
- Peacocks should move male.
Instead, four of the five examples converged on parity, including all three human cases that should have moved away from it.
That pattern suggests the generator contains a strong balancing tendency that often overrides real-world demographic frequencies when creating generic groups. It is not proof of an explicit 50/50 correction algorithm, but it is consistent with a learned preference for balanced group composition.