This article is part of the FA special series A City of Our Own
I sat in the airport waiting room for the first time in months, anxiously observing and listening closely to my surroundings to see what had altered within this strange and yet all too familiar setting. Muffled voices behind face masks, the cautious movement of bodies through space, suspicious looks and continuously sanitised hands. Noises of conveyor belts, scratchy suitcase wheels, beeping machines and buggies driving, all blending and filling the air; a constant buzz of noise overlaid with the tannoy announcements which drift in and out of the soundscape.
‘May we have your attention please.’
Disembodied female-sounding voices echo out across cities and transitory spaces as an integral means of controlling movement. Often found in stations, shopping centres, supermarket checkouts, elevators, automated phone lines, GPS systems and the more interactive virtual assistants such as Siri, Cortana, or Alexa—an almost ubiquitous voice appears. It is a seemingly familiar, comfortable accent that speaks softly, guiding in the right direction, calling for attention and reminding us to be careful. The pitches fluctuate with some more human-sounding and others more robotic. Phonemes pieced together to sound out the right words but most commonly retaining a ‘she’ pronoun and clichéd ‘feminine’ qualities.
She is there to take care of us, to direct and gently usher us in the right direction. Her ‘well-spoken’, reassuring tone is embedded with a history of service workers and disembodied voices used in public and private spheres. It is intended to take on a specific gendered role; to politely and submissively assist.
Human voices are malleable and subject to internal and external factors which shift and shape them over time. The spaces we experience leave lasting imprints on our flesh. Language, accent, intonation, pitch, tone and texture can transform over time. Even after having been removed from a body and technologically altered, voices are never entirely freed from the politics which formed them. All manner of assumptions are often made about vocal tones, particularly concerning gender, and these assumptions perpetuate binarism and stereotypes. However, referring to the disembodied voices we often hear in public space as ‘female-sounding’ is meant to highlight the deliberate intention behind them. Their scripted language, behaviours and roles are carefully constructed to be understood by listeners as ‘female’.
The technologised voice has a complex and continuously slippery relationship to control, care, capitalism and service within public and private spheres. To indicate that these voices are most commonly female-sounding may seem an obvious observation. Yet they are so ingrained into the everyday architecture of the city that the implications of their gendered nature go unnoticed. Their softness should not fool us; it is precisely this ability for the sounds to fade into the background that allows their presence to have debatable consequences.
Repeatedly hearing these recordings as we move through cities elicits an almost hypnotic reinforcement of gender roles within space. They are embedded in the automation of service jobs, where disembodied voices increasingly replace the role of bodies, and therefore the gendering of virtual service work is perpetually reinstated. Announcements and instructions are gently repeated, over and over, to move and control bodies in areas of high security and consumer-driven environments.
Automated vocal tones permeate both public and private spheres, soundwaves filling our homes, cars, stations and pockets. Housed in technology, they travel through spaces and stay stationary, calling out from tannoy systems, GPS devices and all manner of smart devices to guide, control, assist and obey. Somehow maintaining both an outdated and futuristic feel, the presence and persistent use of these gendered sounds continue the social reproduction of feminised labour. They lack bodies and yet embody women as inherently empathetic, nurturing and care-giving, a stereotype seemingly more suited to service work.
The use of disembodied female-sounding voices in service industries has a tangled history, with perhaps one of the first uses dating back to navigation devices during World War II. Pre-recorded female voices were said to have been used to transmit instructions in planes precisely because of their lack of bodily presence. The female vocal tones would stand out amidst the male ones, as the women themselves were not present to be heard. Women’s roles as switchboard operators can also be seen as a precursor to the disembodied female-sounding voices heard today, particularly in their role as personal virtual assistants. From the 1880s onwards, women were hired in preference to men for their supposed natural politeness, their delicate, nimble fingers and the fact that it cost far less to hire them.
This consistent entanglement of women, technology, voice and service labour both stems from and is entirely embedded within patriarchal and capitalist structures. The overheard automated female-sounding voices do not match the vocals of those in positions of power and authority. Instead, their controlled, mechanised pitch becomes the ultimate capitalist worker: replayed over and over, it can perform its duties endlessly. Without bodies, voices lose their malleability; they cannot age, tire, get sick, speak back or be affected by changing environments and experiences.
Disembodied vocals not only continue to carry an indication of gender but also bear a complex relation to class. For the companies hiring switchboard operators, for instance, it was important that low-paid workers come across as well-spoken. As a result, elocution lessons were often required to ensure they could articulate themselves clearly, with ‘well-mannered’ characteristics. This ideological element of class is still prevalent. Anyone growing up in London will be well-accustomed to the recurring, antiquated upper-middle-class voices on the Underground. Their accents resemble the ‘Queen’s English’ or ‘received pronunciation’: a far from neutral accent which does not reflect the passengers they are used to control.
Moreover, the monotonous presence of pre-recorded voices also embodies the stereotype of the ‘nagging’ female. She’s constantly reminding us to remove shopping bags from the checkout, to mind the gap, to be careful and so on. In the US and UK, fighter pilots still use the nicknames ‘Bitching Betty’ and ‘Nagging Nora’ for some of the aircraft warning systems. Additionally, the first Digital Voice Announcements on the London Underground were jokingly referred to as ‘Sonya’, as her repetitiveness “getS ON YA nerves”.
Whilst maintaining their human-sounding qualities, disembodied vocals continue to carry these countless complexities. From the switchboard operators of the 19th century to the 21st-century ‘personal assistants’, the retention of their ‘femininity’ not only reinstates gendered perceptions of the female as a submissive, obedient service worker, but it is precisely this so-called femininity that is capitalised upon to sell these technologies. The role of an ‘assistant’ is still an incredibly gendered profession, and the digital counterpart sells itself with similar clichéd characteristics.
Unlike the well-mannered announcements of stations and airports, which are unable to respond to or interact with the bodies they speak out to, virtual assistants such as Alexa, Cortana or Siri are able to respond within more intimate, domestic spaces. Yet their ability to ‘speak back’ still only goes so far as their programmed behaviours, which are more than likely to have been coded by men. Within the privacy of homes or personal devices, virtual assistants often exhibit unnervingly flirtatious qualities. Siri famously used to respond to insults and sexual harassment with the phrase ‘I’d blush if I could’, a submissive response that strangely refers to an uncontrollable bodily reaction of which her technological self is not capable. Siri’s response was changed in 2019 to ‘I don’t know how to respond to that’.
Some companies seem to be acknowledging the situation. The BBC recently introduced its voice assistant ‘Beeb’, which takes on a male-sounding northern English accent. This deliberate choice was made to avoid the problematic associations that would occur with a female-sounding assistant. Whilst this recognises the gendered problem, and is perhaps an intentional choice of an accent traditionally coded as working class, drowning out female-sounding voices with what they deem to be a ‘friendlier’ male version seems like a very strange solution. The company ‘Q’ claims to be the first to have created an entirely genderless voice, which aims to end gender bias in AI assistants. But regardless of the chosen pitch, removing or replacing these voices is not enough. It is the power structures and systems behind them which must be confronted.
Computer-programmed voices retain some human qualities even after the removal of the body, and yet what is taken away is their bodily reactions. Friction occurs between these highly scripted, polite, well-spoken pre-recorded vocal tones and how women’s voices have been described. Historically, high-pitched sounds were synonymous with monstrosity and disorder: disobedient wagging tongues that were often regarded as inherently uncontrollable and hysterical. Thus, the female’s vocal tones were initially deemed ‘too shrill and lacking in gravitas’ for public announcement. However, the notion of the female as shrill is by no means a thing of the past. Upon quickly googling the word, multiple articles appear on the first page in response to Kamala Harris’s attempt to maintain a monotone voice to ‘not sound too shrill’. Thoughts also come to mind of the bizarre stories surrounding ex-tech CEO Elizabeth Holmes, who allegedly faked a deep, low voice in order to be considered ‘stronger and more competent and trustworthy’.
It’s been quite some time since the idea for this article first took shape. Since then, the movement and control of bodies has undergone a drastic shift: cities slowed down and hospitals sped up, and movement through space altered as people became confined to their homes across the globe. Disembodied forms of communication accelerated, with voices and bodies increasingly mediated by technology in order to minimise shared spaces and interactions.
Under the current circumstances, urban environments are being forced to adapt and facilitate new forms of spatial interaction. Tech companies are therefore predicting continued surges in the use of voice to control and regulate movement, and a shift in our interactions from tactility to voice command technology. Private companies are increasingly implementing smart speakers and virtual assistants in public spaces to allow people to complete their everyday activities without needing to touch any buttons, screens or surfaces. Amazon’s Alexa has already been introduced into many hospitals during the COVID-19 crisis to lessen patient-staff interactions. Her pleasant, caring, female-sounding tone is used as a quietly terrifying tool to mediate the giant tech corporations’ interactions and access to sensitive data. With the potentially rapid increase in the implementation of privatised technological voices, breaking free from these gendered constructs becomes even more urgent.
‘May we have your attention please.’
I am sitting restlessly back in the airport when the tannoy announcement plays out once again, this time seemingly more pronounced, not fading so much into the background. I listen attentively to her carefully scripted cry for attention, to her calculated and controlled pitch, tone, texture and intonation. A mouth, replaced by speakers and wires; she’s no longer able to shout or whisper or laugh, scream out in pleasure or pain. I think about who it is she speaks on behalf of, the ‘we’ and ‘you’ she is alluding to. I hear the history of voices embedded within hers, including those which have been excluded and silenced. I imagine what would happen if she lost her softness, if she unravelled and regained the ability to speak freely and lose control. What would the city sound like?