Shaojin Wu<sup>1</sup>, Fei Ding<sup>1,*</sup>, Mengqi Huang<sup>1,2</sup>, Wei Liu<sup>1</sup>, Qian He<sup>1</sup>
<sup>1</sup>ByteDance Inc. <sup>2</sup>University of Science and Technology of China
We propose VMix, a plug-and-play aesthetics adapter that upgrades the quality of generated images while maintaining generality across visual concepts. It does so by (1) disentangling the input text prompt into a content description and an aesthetic description through the initialization of an aesthetic embedding, and (2) integrating aesthetic conditions into the denoising process through value-mixed cross-attention, with the network connected by zero-initialized linear layers. VMix outperforms other state-of-the-art methods and is flexible enough to be applied to community modules (e.g., LoRA, ControlNet, and IP-Adapter) for better visual performance without retraining.
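The official code has not been released yet, so the following is only a minimal PyTorch sketch of value-mixed cross-attention as we read it from the abstract: the attention map is computed from the content embedding as usual, while a second value path carries the aesthetic embedding and is gated by a zero-initialized linear layer, so the adapter initially leaves the base model's output unchanged. All names here (`ValueMixedCrossAttention`, `aesthetic_emb`, `aes_proj`) are hypothetical, as is the assumption that both embeddings come from the same text encoder and share a token length (e.g., 77 CLIP tokens). This is not the official implementation.

```python
# Hypothetical sketch of value-mixed cross-attention; not the released VMix code.
import torch
import torch.nn as nn


class ValueMixedCrossAttention(nn.Module):
    """Cross-attention with two value paths: one from the content text
    embedding and one from a separate aesthetic embedding. A zero-initialized
    linear layer gates the aesthetic branch, so before any fine-tuning the
    module reproduces vanilla cross-attention exactly."""

    def __init__(self, dim: int, text_dim: int, heads: int = 8):
        super().__init__()
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(text_dim, dim, bias=False)
        self.to_v = nn.Linear(text_dim, dim, bias=False)       # content values
        self.to_v_aes = nn.Linear(text_dim, dim, bias=False)   # aesthetic values
        self.to_out = nn.Linear(dim, dim)
        # Zero-initialized projection: the aesthetic branch contributes
        # nothing until training moves these weights away from zero.
        self.aes_proj = nn.Linear(dim, dim)
        nn.init.zeros_(self.aes_proj.weight)
        nn.init.zeros_(self.aes_proj.bias)

    def forward(self, x, content_emb, aesthetic_emb):
        # Assumes content_emb and aesthetic_emb have the same token length,
        # so one attention map can weight both value streams.
        b, n, _ = x.shape
        h = self.heads

        def split(t):  # (b, seq, dim) -> (b*h, seq, dim // h)
            return t.reshape(b, -1, h, t.shape[-1] // h).transpose(1, 2).flatten(0, 1)

        def merge(t):  # (b*h, n, dim // h) -> (b, n, dim)
            return t.reshape(b, h, n, -1).transpose(1, 2).reshape(b, n, -1)

        q = split(self.to_q(x))
        k = split(self.to_k(content_emb))
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)

        # Shared attention map, two value paths.
        out_content = attn @ split(self.to_v(content_emb))
        out_aes = attn @ split(self.to_v_aes(aesthetic_emb))

        # Mix the streams; aes_proj starts at zero, so the output initially
        # equals the base model's cross-attention result.
        return self.to_out(merge(out_content) + self.aes_proj(merge(out_aes)))


# Shape check with dummy tensors.
attn = ValueMixedCrossAttention(dim=320, text_dim=768)
x = torch.randn(2, 64, 320)            # latent tokens
content = torch.randn(2, 77, 768)      # content text embedding
aesthetic = torch.randn(2, 77, 768)    # aesthetic embedding (same length)
print(attn(x, content, aesthetic).shape)  # torch.Size([2, 64, 320])
```

The zero initialization is what makes such an adapter plug-and-play: at the start of fine-tuning the aesthetic branch contributes nothing, so training can only build on top of the frozen base model rather than destabilize it.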
Qualitative comparison between results with VMix (right) and without VMix (left).
We will open source this project as soon as possible. Thank you for your patience and support! 🌟
If you find VMix helpful, please ⭐ the repo.
If you find this project useful for your research, please consider citing our paper:
```bibtex
@misc{wu2024vmix,
  title={VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control},
  author={Shaojin Wu and Fei Ding and Mengqi Huang and Wei Liu and Qian He},
  year={2024},
  eprint={2412.20800},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```