TL;DR Just because we camp, doesn’t mean we should 💅

This blog post is intended as a summary of my paper on the synthesis of queer voices for the (interested) general reader, with an indication of some of my motivations for conducting this work.

This paper came out of a conversation with my good friend and co-author Atli. We were wondering whether speech synthesis models (his area of work) would be capable of producing the stereotypical “gay voice”. What started as a bit of a joke became a really interesting paper! We suspected that, despite grand claims from those developing voice cloning systems, these models would be poor at cloning gay voice: there is little training data from people with gay voice, and it is highly likely that no one had thought to check. We tested a popular pre-trained model and found this to be the case. Synthesised voices were rated as sounding less gay than the originals, and the loss of gay voice correlated with lower similarity ratings! Interestingly, we also noticed an unexpected homogenisation effect: speakers without gay voice were actually rated as sounding more gay after synthesis! Clearly, the features responsible for perceived gay voice are not well captured by the model.

This case study was a springboard for the ethics section, where we discuss what it would mean to make these models better at capturing queer voices, including those with “gay voice”. We argue that improving these models through additional training data would put people at risk if those data sets were to become public. We also argue the data could be used to train queer voice detectors, which would likewise have dangerous consequences. Even just making the models available could be an issue, as they might be used to produce offensive content. We argue that whilst queer people who need to use voice synthesis for accessibility reasons should have access to models that capture their queer identity, public models and public data sets are not worth the risk!

The paper is very short as per Interspeech page limits, so I recommend you read the original linked above! 💅
