Automated Conversion of Music Videos into Lyric Videos

Jiaju Ma, Anyi Rao, Li-Yi Wei, Rubaiat Habib Kazi, Hijung Valentina Shin, and Maneesh Agrawala

The ACM Symposium on User Interface Software and Technology (UIST 2023)


Musicians and fans often produce lyric videos, a form of music video that showcases a song's lyrics, for their favorite songs. However, making such videos can be challenging and time-consuming because the lyrics need to be added in synchrony and visual harmony with the video. Informed by prior work and close examination of existing lyric videos, we propose a set of design guidelines to help creators make such videos. Our guidelines ensure the readability of the lyric text while maintaining a unified focus of attention. We instantiate these guidelines in a fully automated pipeline that converts an input music video into a lyric video. We demonstrate the robustness of our pipeline by generating lyric videos from a diverse range of input sources. A user study shows that lyric videos generated by our pipeline are effective in maintaining text readability and unifying the focus of attention.


We propose a set of design guidelines for adding lyrics to music videos in a manner that ensures text readability and unifies the viewer's focus of attention. We further implement a fully automated pipeline that instantiates these guidelines to convert an input music video into a lyric video. The results shown above demonstrate that our pipeline is able to generate lyric videos from a wide variety of inputs. (a, b, d, e) are in landscape format, while (c, f) are in portrait format. Video sources: (a) Selfish Soul by Sudan Archives, (b) August by Taylor Swift, (c) The Scientist by Coldplay, (d) Let It Go by Idina Menzel, (e) iPad by Chainsmokers, (f) Sandman by Ed Sheeran.



@inproceedings{10.1145/3586183.3606757,
  author = {Ma, Jiaju and Rao, Anyi and Wei, Li-Yi and Kazi, Rubaiat Habib and Shin, Hijung Valentina and Agrawala, Maneesh},
  title = {Automated Conversion of Music Videos into Lyric Videos},
  year = {2023},
  isbn = {9798400701320},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  doi = {10.1145/3586183.3606757},
  booktitle = {Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology},
  articleno = {13},
  numpages = {11},
  keywords = {lyrics, video generation, Design guidelines},
  location = {San Francisco, CA, USA},
  series = {UIST '23}
}