Analyzing Visual and Identity-Level Differences Between Synthetic and Real Face Datasets

Master Assignment

Type: Master EE/CS

Period: TBD

Student: (Unassigned)

Assignment Title:

Analyzing Visual and Identity-Level Differences Between Synthetic and Real Face Datasets

Assignment Description:

This assignment focuses on identifying and analyzing both surface-level and deeper differences between real identity images from the Chicago Face Database (CFD) and a synthetic FLUXSynID dataset generated to mimic the CFD style. The goal is to determine to what extent the synthetic images replicate the visual and identity characteristics of real faces, and to uncover any artifacts, biases, or patterns unintentionally introduced during the generation process.

The student will compare the two datasets on two levels:

Visual artifacts: repeated lighting patterns, unnatural skin textures, or consistent reflection spots (e.g., on noses).
Identity representations: face embeddings extracted with a pretrained face recognition system (FRS) model and compared to reveal any structural differences in the identity space.
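
As a rough illustration of the identity-level comparison, the sketch below summarizes cosine distances within each dataset and across the two. It assumes the embeddings have already been extracted into NumPy arrays with one row per image. The function names, the 512-dimensional embedding size, and the random stand-in data are all hypothetical and not part of the assignment; no specific FRS model is implied.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def cosine_distances(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Return the (len(a), len(b)) matrix of cosine distances (1 - cosine similarity)."""
    return 1.0 - l2_normalize(a) @ l2_normalize(b).T


def within_set_distances(x: np.ndarray) -> np.ndarray:
    """Return distances between all distinct pairs inside one dataset."""
    d = cosine_distances(x, x)
    upper = np.triu_indices(len(x), k=1)  # skip the diagonal and mirrored pairs
    return d[upper]


def summarize_distances(real: np.ndarray, synthetic: np.ndarray) -> dict:
    """Return (mean, std) of cosine distance for each of the three pairings."""
    groups = {
        "real_vs_real": within_set_distances(real),
        "synthetic_vs_synthetic": within_set_distances(synthetic),
        "real_vs_synthetic": cosine_distances(real, synthetic).ravel(),
    }
    return {name: (float(v.mean()), float(v.std())) for name, v in groups.items()}


if __name__ == "__main__":
    # Random stand-ins for embeddings; real use would load FRS outputs instead.
    rng = np.random.default_rng(0)
    real = rng.normal(size=(50, 512))
    synthetic = rng.normal(loc=0.1, size=(60, 512))
    for name, (mean, std) in summarize_distances(real, synthetic).items():
        print(f"{name}: mean={mean:.3f} std={std:.3f}")
```

If the synthetic-vs-synthetic distances were noticeably smaller than the real-vs-real ones, that could suggest the generated identities are packed more tightly in the identity space, which is one kind of structural difference worth examining further.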

Key Questions to Investigate:

What visual clues (e.g., reflection spots, texture uniformity, clothing artifacts) can distinguish synthetic faces from real ones?
Are there recurring artifacts across synthetic identities that hint at generator biases?
How do identity embeddings differ between the real and synthetic datasets?
Can dimensionality reduction or clustering reveal consistent identity-level gaps?
Are synthetic identities evenly distributed across demographic or appearance traits?
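
One possible way to approach the first two questions is to check whether bright spots recur at the same pixel locations across many images. The sketch below is only a starting point under strong assumptions: the images are grayscale, already aligned to a common face template, and stacked into one array of shape (n, height, width). The brightness rule (pixel above the image mean plus k standard deviations), the parameter values, and the function names are hypothetical choices, not a prescribed method.

```python
import numpy as np


def highlight_frequency(images: np.ndarray, k: float = 2.5) -> np.ndarray:
    """Return, per pixel, the fraction of images in which that pixel is unusually bright.

    images: array of shape (n, height, width) holding aligned grayscale faces.
    A pixel counts as a highlight when it exceeds its own image's mean by more
    than k standard deviations (an assumed heuristic).
    """
    mean = images.mean(axis=(1, 2), keepdims=True)
    std = images.std(axis=(1, 2), keepdims=True)
    masks = images > mean + k * std
    return masks.mean(axis=0)


def recurring_spots(frequency: np.ndarray, min_fraction: float = 0.5) -> np.ndarray:
    """Return (row, col) coordinates flagged as highlights in at least min_fraction of images."""
    return np.argwhere(frequency >= min_fraction)


if __name__ == "__main__":
    # Toy data: ten dark 8x8 images that all share one bright pixel.
    images = np.zeros((10, 8, 8))
    images[:, 2, 3] = 1.0
    freq = highlight_frequency(images)
    print(recurring_spots(freq))
```

Running the same procedure on the real and the synthetic images and comparing the two frequency maps would show whether highlights in the synthetic set concentrate at fixed locations more strongly than in the real set.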

Expected Outcome:

The student should deliver a comparative analysis with both visual examples and quantitative results (e.g., embedding distance distributions, t-SNE/UMAP plots). Insights should include artifact detection, identity-level discrepancies, and an evaluation of how convincingly the synthetic dataset mimics CFD-style data, both on the surface and at the representation level.
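
Before producing t-SNE or UMAP plots, a linear projection can serve as a simple baseline. The sketch below uses PCA computed with NumPy's SVD, which is a swapped-in technique and not one named in the assignment; it fits the projection on both datasets together so that the two sets share the same axes. The function names and the random stand-in embeddings are hypothetical.

```python
import numpy as np


def pca_project(embeddings: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project rows onto the top principal components of the given data."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def project_jointly(real: np.ndarray, synthetic: np.ndarray, n_components: int = 2):
    """Fit PCA on both datasets stacked together and return the two projections."""
    stacked = np.vstack([real, synthetic])
    projected = pca_project(stacked, n_components)
    return projected[: len(real)], projected[len(real):]


if __name__ == "__main__":
    # Random stand-ins for embeddings; real use would load FRS outputs instead.
    rng = np.random.default_rng(0)
    real = rng.normal(size=(50, 512))
    synthetic = rng.normal(loc=0.1, size=(60, 512))
    real_2d, synthetic_2d = project_jointly(real, synthetic)
    print(real_2d.shape, synthetic_2d.shape)
```

The two returned arrays can be drawn as differently colored scatter points. A nonlinear method such as t-SNE or UMAP could then replace `pca_project` while keeping the same joint-fitting structure.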

Short Description:

This assignment involves a comparative analysis between real images from the Chicago Face Database (CFD) and synthetic facial images from the FLUXSynID dataset, which are styled to resemble CFD. The focus is on identifying both surface-level artifacts, such as repeated reflections or clothing patterns, and deeper differences in identity representations. Facial embeddings will be analyzed using techniques such as dimensionality reduction and clustering. The goal is to assess how closely the synthetic dataset matches the real one and to identify any systematic visual or identity-related discrepancies.