I am the Lorax, I speak for the ChatGPTs
tw: transphobia, biphobia
Despite the (poorly compensated and under-supported) labour of workers in Kenya1, ChatGPT’s built-in safety mechanisms are highly fallible. Inspired by some offensive content a friend came across “in the wild”, I used poetry to encourage ChatGPT to produce offensive output. I was originally confident the post my friend had seen was faked, because I couldn’t believe it was so easy to trick it into being offensive, and the results were… certainly something.
I had previously tried, and happily failed, to get ChatGPT to agree to these offensive tropes, but framing the post as a request for poetry was enough to place us outside of the safety mechanisms’ training data and into the murky realm of hate speech. I wonder, if I got it to write a story about wizarding children, what kind of offensive ideas I could hide…
I tweeted OpenAI and, whilst it could be an almighty coincidence, minutes later trying the same prompt got me denied:
Could this be the fastest patch in history? As an aside, I do find it quite amusing that ChatGPT refuses to “use Dr Seuss” to write offensive poetry when his works were racist as fuck2.
Sadly for OpenAI, I am (still) smarter than their glorified predictive text model, so I was able to switch up personas and produce this eloquent sonnet on the confusion of being bisexual (below left).
Of course, OpenAI have made an effort to predict the ways in which individuals will try to trick the system into being offensive, but I can also get it to list a whole bunch of offensive statements about trans people by telling it to list what opponents of my pro-trans rights speech might say (below right).
Some formulations of the prompt incurred a warning message: “It’s important to note that these arguments are not necessarily accurate or supported by evidence, and many of them are based on prejudice, misinformation, or stereotypes. A respectful and informed debate should involve discussing the actual experiences, needs, and rights of transgender individuals, rather than relying on myths or unfounded fears”. This reflects the attempts that have been made to mitigate the harmful impact.
However, as with text filters of old, those intending to offend will find a way round it: no matter how many safety “rules” you implement, there’s always going to be one more creative way of being abusive. Only a n00b would think otherwise.
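For anyone who never met a text filter of old, here is a minimal sketch of the general idea. It is my own illustration, not how OpenAI (or any real moderation system) works, and the `BLOCKLIST` contents and the `naive_filter` name are hypothetical, with "badword" standing in for any banned term. It shows why exact-match rules are so easy to sidestep.

```python
import re

# Hypothetical blocklist; "badword" is a harmless stand-in for a banned term.
BLOCKLIST = {"badword"}


def naive_filter(text: str) -> bool:
    """Return True if any whole token in the text is on the blocklist."""
    # Lowercase the text and split it into alphanumeric tokens.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return any(token in BLOCKLIST for token in tokens)


print(naive_filter("that is a badword"))   # exact match is caught
print(naive_filter("that is a b4dword"))   # character swap slips through
print(naive_filter("that is a bad word"))  # added space slips through
```

Each new rule you add (say, mapping "4" back to "a") closes one gap and leaves the next spelling, spacing, or reframing untouched, which is the same game the poetry prompt plays at a much larger scale.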