CosyVoice v1 and v2 paper review
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens
SupertonicTTS: Towards Highly Scalable and Efficient Text-to-Speech System
I read Supertone's paper.
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
Mean Flows for One-step Generative Modeling
After reading the papers