Overview
Virtual Try-On is an emerging consumer application that enables users to see products on their own bodies in a virtual or mixed-reality space. The retail e-commerce industry is rapidly adopting these technologies, particularly in the beauty, fashion, and accessories segments, enabling users to visualize products before purchase and opening opportunities to customize and personalize them. In principle, try-on experiences can have significant environmental impact by reducing product returns, improving satisfaction with purchased items, and improving accessibility. Enabling these applications requires solving diverse challenges across computer vision, 3D modeling and reconstruction, geometry processing and learning, generative AI, and perception, making this an active and multi-disciplinary area of research. The primary goal of this inaugural workshop is to bring together expert academic and industry researchers, as well as young researchers working in this space, to present and discuss the state of the art and the open challenges that are core to enabling convincing, useful, and safe try-on experiences.
Keynote Speakers
Ming Lin
Ira Kemelmacher-Shlizerman
Gerard Pons-Moll
Sunil Hadap
Invited Short Talks
Abstract We present M&M VTO, a mix-and-match virtual try-on method that takes as input multiple garment images, a text description of the garment layout, and an image of a person. An example input includes: an image of a shirt, an image of a pair of pants, "rolled sleeves, shirt tucked in", and an image of a person. The output is a visualization of how those garments (in the desired layout) would look on the given person. Key contributions of our method are: 1) a single-stage diffusion-based model, with no super-resolution cascading, that mixes and matches multiple garments at 1024x512 resolution while preserving and warping intricate garment details; 2) an architecture design (VTO UNet Diffusion Transformer) that disentangles denoising from person-specific features, allowing for a highly effective finetuning strategy for identity preservation (a 6 MB model per individual vs. the 4 GB achieved with, e.g., DreamBooth finetuning), solving a common identity-loss problem in current virtual try-on methods; 3) layout control for multiple garments via text inputs, specifically finetuned over PaLI-3 for the virtual try-on task. Experimental results indicate that M&M VTO achieves state-of-the-art performance both qualitatively and quantitatively, and opens up new opportunities for language-guided and multi-garment virtual try-on.
Abstract In this talk, we'll introduce a novel approach, ucVTON, for photorealistic virtual try-on of personalized clothing on human images. Unlike previous methods limited by input types, ours allows flexible style (text or image) and texture (full garment, cropped sections, or patches) specifications. To tackle the challenge of full garment entanglement, we use a two-stage pipeline to separate style and texture. We first generate a human parsing map for the desired style and then composite textures onto it based on the input. Our method introduces hierarchical CLIP features and position encoding in VTON for complex, non-stationary textures, setting a new standard in fashion editing.
Abstract The growing digital landscape of fashion e-commerce calls for interactive and user-friendly interfaces for virtually trying on clothes. Traditional try-on methods grapple with challenges in adapting to diverse backgrounds, poses, and subjects. While newer methods, utilizing the recent advances of diffusion models, have achieved higher-quality image generation, the human-centered dimensions of mobile interface delivery and privacy concerns remain largely unexplored. We present Mobile Fitting Room, the first on-device diffusion-based virtual try-on system. To address multiple inter-related technical challenges such as high-quality garment placement and model compression for mobile devices, we present a novel technical pipeline and an interface design that enables privacy preservation and user customization. A usage scenario highlights how our tool can provide a seamless, interactive virtual try-on experience for customers and a valuable service for fashion e-commerce businesses.
Abstract Despite recent advances in content generation and rendering with generative models, real-time video virtual try-on remains challenging, especially on mobile devices and in web browsers. We present our framework and a series of works that bridge the gap between state-of-the-art neural networks and real-world constraints imposed by device and data limitations.
Abstract We introduce two types of makeup prior models, PCA-based and StyleGAN2-based, to enhance existing 3D face prior models. These priors are pivotal in estimating 3D makeup patterns from single makeup face images. Such patterns play a significant role in a broad spectrum of makeup-related applications, substantially enriching virtual try-on technologies with more realistic and customizable experiences. Our contributions support crucial functionalities, including 3D makeup face reconstruction, user-friendly makeup editing, makeup removal, makeup transfer, and interpolation.
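To make the idea of a linear (PCA-based) makeup prior concrete, here is a minimal numpy sketch: fit principal directions over flattened makeup texture maps, then encode and decode a texture through a low-dimensional makeup code. The function names, array shapes, and random data are illustrative assumptions only, not the speakers' implementation, which is trained on real makeup textures and coupled to a 3D face model.

```python
import numpy as np

def fit_pca_prior(textures, n_components=8):
    """Fit a linear (PCA) prior over flattened makeup texture maps.

    textures: (N, D) array, one flattened UV texture per subject.
    Returns the mean texture and the top principal directions.
    Illustrative sketch only; shapes and names are assumptions.
    """
    mean = textures.mean(axis=0)
    centered = textures - mean
    # SVD of the centered data yields the principal components as rows of vt.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def project(texture, mean, basis):
    """Low-dimensional makeup code for one flattened texture."""
    return basis @ (texture - mean)

def reconstruct(code, mean, basis):
    """Texture estimated from its makeup code."""
    return mean + basis.T @ code
```

Editing or interpolating makeup then reduces to arithmetic on the low-dimensional codes before decoding, which is what makes such priors attractive for interactive try-on.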
Abstract Makeup transfer aims to realistically and naturally reproduce diverse makeup styles on a given face image. Due to the inherently unsupervised nature of makeup transfer, most previous approaches adopt a pseudo-ground-truth (PGT)-guided strategy for model training. In this talk, we first reveal that the quality of the PGTs is the key factor limiting the performance of makeup transfer. Next, we propose a Content-Style Decoupled Makeup Transfer (CSD-MT) method, which works in a purely unsupervised manner and thus eliminates the negative effects of PGT generation. Finally, extensive quantitative and qualitative analyses show the effectiveness of our CSD-MT method.
Abstract We present a framework that takes as input a single-layer clothed 3D human mesh and decomposes it into complete multi-layered 3D assets. The outputs can then be combined with other assets to create novel clothed human avatars in any pose. We first separate the input mesh using a 3D surface segmentation extracted from multi-view 2D segmentations. Then we synthesize the missing geometry of the different layers in both posed and canonical spaces using a novel pose-guided Score Distillation Sampling (SDS) loss. Once the high-fidelity 3D geometry is inpainted, we apply the same SDS loss to its texture to obtain the complete appearance, including the initially occluded regions. Through a series of decomposition steps, we obtain multiple layers of 3D assets in a shared canonical space normalized in terms of poses and human shapes, hence supporting effortless composition onto novel identities and reanimation with novel poses.
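For readers unfamiliar with Score Distillation Sampling, the core update can be sketched in a few lines of numpy: noise the current parameters at a random diffusion timestep, ask a frozen denoiser to predict that noise, and step along the prediction error. Everything below (the denoiser interface, the weighting, the learning rate) is an illustrative assumption; the talk's pose-guided variant additionally conditions the denoiser on the target body pose.

```python
import numpy as np

def sds_step(x, denoiser, alpha_bar, rng, lr=0.1):
    """One Score Distillation Sampling (SDS) update on parameters x.

    Toy sketch of the standard SDS gradient w(t) * (eps_hat - eps),
    applied directly to x; in practice x would be 3D geometry or
    texture parameters rendered into the denoiser's input space.
    """
    t = rng.integers(len(alpha_bar))        # random timestep
    a = alpha_bar[t]                        # cumulative signal level
    eps = rng.standard_normal(x.shape)      # injected noise
    x_noisy = np.sqrt(a) * x + np.sqrt(1.0 - a) * eps
    eps_hat = denoiser(x_noisy, t)          # frozen model's noise estimate
    w = 1.0 - a                             # a common timestep weighting
    return x - lr * w * (eps_hat - eps)
```

Iterating this step distills the frozen diffusion model's learned distribution into the optimized parameters, which is what lets the framework hallucinate the occluded garment layers.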
Abstract Given a clothing image and a person image, image-based virtual try-on aims to generate a customized image that appears natural and accurately reflects the characteristics of the clothing. In this presentation, we introduce StableVITON, which learns the semantic correspondence between the clothing and the human body within the latent space of a pre-trained diffusion model in an end-to-end manner. Our proposed zero cross-attention blocks not only preserve clothing details by learning the semantic correspondence but also generate high-fidelity images by utilizing the inherent knowledge of the pre-trained model in the warping process. Through our proposed novel attention total variation loss and data augmentation, we achieve sharp attention maps, resulting in a more precise representation of clothing details. StableVITON shows state-of-the-art performance over existing virtual try-on models in both qualitative and quantitative results. Moreover, evaluation of the trained model on multiple datasets demonstrates its promising quality in real-world settings.
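As a rough illustration of regularizing attention with a total-variation penalty, the numpy sketch below computes each token's attention center of mass over the spatial map and penalizes abrupt jumps between the locations attended to by neighboring tokens. This is our own hedged reading of the idea, with assumed shapes and names, not StableVITON's exact formulation; see the paper for that.

```python
import numpy as np

def attention_centers(attn):
    """Center of mass (row, col) of each token's 2D attention map.

    attn: (T, H, W), each map non-negative and summing to 1.
    """
    _, H, W = attn.shape
    rows = np.arange(H)[None, :, None]
    cols = np.arange(W)[None, None, :]
    cy = (attn * rows).sum(axis=(1, 2))
    cx = (attn * cols).sum(axis=(1, 2))
    return np.stack([cy, cx], axis=1)          # (T, 2)

def attention_tv_loss(attn, grid_shape):
    """Total variation of attention centers over the token grid.

    Penalizes large jumps between the attended locations of
    horizontally and vertically adjacent tokens, encouraging a
    spatially coherent garment-to-body correspondence.
    """
    gh, gw = grid_shape
    centers = attention_centers(attn).reshape(gh, gw, 2)
    tv = np.abs(np.diff(centers, axis=0)).sum()
    tv += np.abs(np.diff(centers, axis=1)).sum()
    return tv
```

A smooth, grid-like assignment of tokens to image locations yields a low loss, while a scrambled correspondence yields a higher one, which is the property such a regularizer exploits during training.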
Abstract Learning-based Virtual Try-On (VTO) has garnered significant attention for its potential to revolutionize the fashion industry. Its impact extends across different segments of the fashion value chain, with distinct technical priorities: garment-detail preservation for brands, size accuracy for consumers, and generative controllability in early design stages. As an emerging startup in this space, we address two principal challenges: visual hallucination and high-resolution synthesis. To overcome the limitations of end-to-end feed-forward approaches, we present a modular pipeline that mitigates visual hallucination through a sequence of garment sanitization, signature-component localization, and precise stitching. This methodology improves production quality by maintaining the fidelity of the garment's visual details. For high-resolution synthesis, we identify the inadequacy of conventional upsampling techniques in meeting the fashion industry's specific demands. To this end, we train our own super-resolution model, leveraging adversarial training to significantly improve texture detail and visual quality in upscaled images. We're excited to share the ongoing challenges and our competitive efforts within the industry to push the boundaries of current methodologies.
Organizers
Vidya Narayanan
Sunil Hadap
Javier Romero
Katie Lewis
Hanbyul Joo
Alla Sheffer
Hao (Richard) Zhang
Contact Info
E-mail: vtocvpr24 AT gmail.com