
From Taskmaster to Mimic: Assessing AI's Transition from Original Intent to Destabilizing Identity with Voice and Image Synthesis

Joanna Plaskonka and Lorina Pardi

Abstract

In the age of advanced artificial intelligence (AI), the ability to convincingly replicate voices and images has far-reaching implications. The surge in ‘deepfakes’—synthetic media seamlessly blending fabricated videos and speeches—on social media platforms highlights the ubiquity of this technology. As AI tools become increasingly accessible to the general public, the absence of robust legal frameworks has created a regulatory gap. This article explores the technical nuances of artificial neural networks and scrutinizes their legal implications, emphasizing the urgent need for comprehensive laws to navigate the challenges posed by AI-driven identity destabilization.

Introduction

The saying “Believe nothing you hear, and only half of what you see” is commonly attributed to Edgar Allan Poe and captures the author’s scepticism. It is a sentiment that applies equally to the current climate of artificial intelligence. With the ability to recreate not only the voice but also the image of your loved ones, people in power, and just about anyone and anything, the opportunities are endless. The repercussions of this capability are plentiful and span several fields of not only law but also ethics. Most people have already come across some form of ‘deepfake’, also referred to as synthetic media, video, or even speech. Recently, social media platforms have seen an influx of videos of certain artists covering popular songs, or of animated characters performing well-known pop songs. We watch these without consciously considering how they are produced, or how the same effort could be turned to something less innocent.

We currently live in a time when, through artificial intelligence, our voices can be used for a variety of purposes. The software and technologies behind this phenomenon are no longer limited to professionals; they are readily available, often free of charge, to the general public. Although this is not a new technology per se, its domestication and rapid progression have outpaced our legal systems, which are notorious for being slow. In the absence of laws and regulations governing robotics in general, as well as the subtopics within the field, mindful use of the internet and internet literacy are no longer sufficient to protect ourselves. As such, this article will explore artificial neural networks from a technical point of view and reflect on them from a legal perspective, highlighting the effects of the destabilization of our identity as we know it.

The origins of artificial neural networks (JP)

The integration of artificial neural networks into everyday life has become a reality for many people with the advent of solutions like ChatGPT,[1] Copilot,[2] Midjourney,[3] and others. Arguably, the term that defined 2023 globally was “AI.” Is the concept of artificial neural networks something novel for humanity, or has it been a longstanding idea?

It appears that the idea originated many years ago. A 1943 paper by neurophysiologist Warren McCulloch and mathematician Walter Pitts is considered the key publication on the subject.[4] Their work titled “A Logical Calculus of the Ideas Immanent in Nervous Activity” laid the foundation for the formalization of artificial neural networks. McCulloch and Pitts’ work can be regarded as the first computational theory of mind and brain. Their research aimed to explore the idea that the brain’s functioning could be explained in terms of simple, formalized elements. They sought to construct a mathematical model of the neural networks within the brain, driven by the conviction that the computational processes of the brain could be encapsulated by a set of logical rules.
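
To make this concrete, McCulloch and Pitts reduced the neuron to a unit that sums weighted binary inputs and “fires” only when the sum reaches a threshold; with suitable weights and thresholds, such units realize elementary logical operations. The short Python sketch below is a modern illustrative reconstruction of that scheme (the specific weights and thresholds are chosen for this example, not drawn from the 1943 paper):

# Illustrative McCulloch-Pitts threshold unit: binary inputs, fixed weights,
# and a hard threshold determine whether the "neuron" fires (1) or stays silent (0).
def mp_unit(inputs, weights, threshold):
    activation = sum(i * w for i, w in zip(inputs, weights))
    return 1 if activation >= threshold else 0

# With suitable parameters the same unit computes simple logic gates,
# echoing the claim that neural activity can embody logical rules.
def logical_and(x, y):
    return mp_unit([x, y], [1, 1], threshold=2)

def logical_or(x, y):
    return mp_unit([x, y], [1, 1], threshold=1)

print(logical_and(1, 1), logical_and(1, 0))  # prints: 1 0
print(logical_or(0, 1), logical_or(0, 0))    # prints: 1 0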

At the time of their publication, modern computers didn’t exist yet. Consequently, the early development of artificial neural networks was constrained in terms of practical implementations due to the limitations of computing technology during that era. The field of neural networks has witnessed fluctuations in interest, alternating between periods of significant advancement and relative dormancy.[5] Notably, substantial progress emerged in the 1980s and continued thereafter,[6] driven by the escalation of computational power and the development of novel training algorithms.[7]

An overview of chosen current practical implementations of artificial intelligence (JP)

Over the years, AI technology has undergone significant advancements, enabling a diverse range of applications and solutions across various domains. Equally noteworthy is the expanding accessibility of tools and solutions built on the latest technological achievements, reaching an increasingly wider audience.

In the realm of Natural Language Processing (NLP), AI showcases its prowess in comprehending and processing human language.[8] This includes tasks such as text summarization, sentiment analysis, language translation, and the creation of interactive chatbots and virtual assistants. Furthermore, advancements in speech recognition enable seamless interaction with technology through speech-to-text conversion and the creation of voice assistants, facilitating hands-free engagement.
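
To illustrate how accessible these NLP capabilities have become, the following minimal Python sketch runs sentiment analysis with a pretrained model via the open-source Hugging Face Transformers library (an assumption of this example; the default model is downloaded on first use and its exact output may vary):

# Minimal sentiment-analysis example using the Hugging Face Transformers
# pipeline helper; requires: pip install transformers torch
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # loads a default pretrained model
result = classifier("This synthetic voice sounds remarkably convincing.")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99}]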

Moving to Computer Vision, AI leverages visual data to excel in tasks like image and video recognition.[9] Additionally, it plays a pivotal role in object detection, tracking, facial recognition, and contributes to the development of autonomous vehicles. AI’s integration leads to the development of autonomous robots and robotic process automation (RPA), streamlining tasks in industries such as manufacturing and logistics.
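
The barrier to entry in computer vision is similarly low; as a sketch only, the same library can classify the contents of an image with a pretrained model (the file name below is hypothetical, and the predicted labels depend on the model chosen):

# Minimal image-classification example with a pretrained vision model;
# requires: pip install transformers torch pillow
from transformers import pipeline

classifier = pipeline("image-classification")   # default pretrained model
predictions = classifier("example_photo.jpg")   # hypothetical local image file
for p in predictions[:3]:
    print(p["label"], round(p["score"], 3))     # top predicted labels with scores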

Generative Models showcase AI’s ability to create realistic images and generate coherent text, pushing the boundaries of creativity in fields such as art and literature.[10] Furthermore, AI’s impact on creativity extends to artistic image generation and music composition, showcasing the potential for collaboration between human creativity and machine intelligence.
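
The generative side can be reached from the same toolkit; the sketch below prompts a small pretrained language model to continue a sentence (the prompt is invented for illustration, and output quality varies considerably with the model used):

# Minimal text-generation example with a small pretrained language model;
# requires: pip install transformers torch
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # small open model, chosen for illustration
output = generator("Synthetic media will change how we", max_new_tokens=20)
print(output[0]["generated_text"])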

The evolution of technology has enabled a multitude of possibilities and now facilitates the creation of incredibly realistic images and videos, prompting one to ponder: are they real? Amidst the opportunities brought by technological advancements, there are concurrent threats to individuals, including cybersecurity challenges, the proliferation of fake news, and an increase in common digital frauds. As the digital landscape expands, the legal implications of emerging technologies like voice and image synthesis become increasingly critical to address, raising questions about authenticity, privacy, and the need for robust legal frameworks to navigate the evolving digital terrain.

From a legal point of view: lack of regulation (LP)

In the context of artificial speech synthesis, facial re-enactment, and 3D printing, our ability to exercise full control over our own identity has been destabilized, and there is no concrete instrument at our disposal to regain that control or to moderate and regulate the use of our own image and voice. The barriers around technology that was once limited to computer-generated imagery specialists no longer exist.[11] As such, many of us have come across AI-generated videos of certain artists covering other artists’ works on various social media platforms. While these versions are known to be recreations, and most hobbyists or amateurs using these technologies for recreational purposes have yet to achieve a completely foolproof result, the technology has produced very convincing copies of certain aspects or characteristics of an individual that have fooled many. One example is the UK-based CEO who believed he was speaking with his boss but was in fact conversing with a synthetic replica of that voice; the fraudsters convinced him to transfer 220,000 euros to their account.[12]

As a matter of fact, there are now communities of people who aspire to become proficient in this technology and who share tips and tricks on forums where anyone and everyone can practice using it. Although making a robot in someone’s image still requires a certain level of expertise and skill, the power to produce deepfakes is readily available to the general public. This is in large part thanks to the open-source face-swapping programs that were made available in 2017.[13] Furthermore, there are apps such as Zao (currently only available in China),[14] and companies like Deepfakes web β that advertise deepfake services at a cost.[15]

Adopting a legal response to this budding issue would carry complex implications. It could even be argued that deepfakes will ultimately prove immune to legal action, because the available legal remedies are cursory: once a deepfake video is on the internet, subsequent legal action is of little use against the views, shares, downloads, and screen captures that may already have been taken. Additionally, there is a lack of international harmonization on this topic, so an instance where a deepfake is created in one country and then uploaded in another involves cross-border elements for which no legislation is in place. As such, our current best defences are internet literacy and public awareness. However, Kelsey Farish suggests that “ex ante technological mechanisms could prevent the use or dissemination of deepfakes”.[16]

Approaching voice and image synthesis (LP)

When approaching the concept of voice and image synthesis from a legal point of view, it becomes apparent that the area of law currently most affected by these phenomena is intellectual property law. As previously alluded to, videos of your favourite artists or TV characters singing famous songs are a current ‘trend’ on social media. Naturally, when it comes to artists’ voices being used, they are not completely defenceless – but can the same be said for the rest of us?

First of all, artists in the US can sue on the basis of the ‘right of publicity’,[17] while outside of the US those same celebrity artists can rely on unfair competition laws. Furthermore, in the cases of Midler v Ford Motor Co., Apple Corps v Leber et al., and Estate of Presley v Russen, the courts found that freedom of expression (the First Amendment to the US Constitution) did not protect ‘imitative entertainment without creative components’.[18] The courts also established that, although an individual’s voice is not copyrightable, it is as distinctive and unique as their face. However, these authorities are dated, little development has occurred since these rulings, and they address the matter only from the point of view of celebrities.

In the absence of international harmonization and binding instruments, the legal landscape surrounding voice and image synthesis is left to evolve on a case-by-case basis. This organic development risks creating a lack of standardized principles, emphasizing the need for contemporary legal frameworks to keep pace with the rapid advancements in technology and their legal implications.

Conclusions

In the era of advanced AI, the maxim “Believe nothing you hear, and only half of what you see” takes on a new significance, echoing the caution demanded by AI’s transformative capabilities. The rise of ‘deepfakes,’ synthetic media seamlessly blending fabricated videos and speeches, underscores the ubiquity of this technology and the challenges it poses to our understanding of identity. As AI tools become increasingly accessible, a regulatory gap has emerged, raising urgent concerns about the need for comprehensive laws to address the risks associated with AI-driven identity destabilization.

This article has delved into the technical intricacies of artificial neural networks, tracing their origins to the ground-breaking work of Warren McCulloch and Walter Pitts in 1943. The evolution of these networks has been propelled by advancements in computational power and training algorithms, leading to their integration into everyday life, as evidenced by solutions like ChatGPT, Copilot, and Midjourney.

Examining current practical implementations of AI, the article has showcased the vast scope of its applications, from Natural Language Processing (NLP) and Computer Vision to Generative Models. While these advancements bring unprecedented possibilities, they also raise questions about the authenticity of images and videos, giving rise to concerns about cybersecurity, fake news, and digital fraud.

From a legal perspective, the article has highlighted the lack of regulation in the realm of artificial speech synthesis, facial re-enactment, and 3D printing, emphasizing the destabilization of personal identity. The accessibility of AI tools to the general public, exemplified by communities sharing tips on deepfake creation, poses challenges to legal responses. Existing legal remedies appear cursory, and the absence of international harmonization further complicates matters, leaving internet literacy and public awareness as current safeguards.

The synthesis of voice and image from a legal standpoint reveals a landscape dominated by challenges, particularly in intellectual property law. While artists in the U.S. can leverage the ‘right of publicity,’ the absence of a standardized approach on an international level leaves room for case-by-case decisions, resulting in a lack of established standards for protecting individuals against AI-driven identity manipulation.

In conclusion, as we navigate the uncharted waters of AI-driven identity destabilization, the urgency for comprehensive legal frameworks is evident. The article underscores the need for international collaboration, innovative legal responses, and heightened public awareness to mitigate the risks posed by deepfakes and other AI-driven threats to personal identity in this rapidly evolving technological landscape.

 

References

[1] ChatGPT, OpenAI, accessed: 5 December 2023.

[2] Microsoft Copilot, accessed: 5 December 2023.

[3] Midjourney, accessed: 5 December 2023.

[4] W. S. McCulloch and W. Pitts, ‘A logical calculus of the ideas immanent in nervous activity’ (1943) 5 Bulletin of Mathematical Biophysics 115-133

[5] D. O. Hebb, The Organization of Behavior (New York: Wiley, 1949); B. Widrow and M. E. Hoff, ‘Adaptive switching circuits’ (1960) 4 IRE WESCON Convention Record 96-104; F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms (Washington, DC: Spartan Books, 1962); J. A. Anderson, ‘A simple neural network generating an interactive memory’ (1972) 14 Mathematical Biosciences 197-220

[6] J. J. Hopfield, ‘Neural networks and physical systems with emergent collective computational abilities’ (1982) 79 Proceedings of the National Academy of Sciences 2554-2558; D. E. Rumelhart, G. E. Hinton, and R. J. Williams, ‘Learning representations by back-propagating errors’ (1986) 323 Nature 533-536

[7] Y. Le Cun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, ‘Handwritten digit recognition with a back-propagation network’ in Advances in Neural Information Processing Systems (1990) 2, 396-404; J. Schmidhuber, ‘A fixed size storage O(n^3) time complexity learning algorithm for fully recurrent continually running networks’ (1992) 4(2) Neural Computation 243-248; A. Krizhevsky, I. Sutskever, and G. E. Hinton, ‘ImageNet classification with deep convolutional neural networks’ in Advances in Neural Information Processing Systems (2012) 25, 1097-1105; K. He, X. Zhang, S. Ren, and J. Sun, ‘Deep Residual Learning for Image Recognition’ (2016) 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 770-778, doi: 10.1109/CVPR.2016.90; A. Vaswani et al., ‘Attention is all you need’ in Advances in Neural Information Processing Systems 30 (2017) 5998-6008

[8] D. Khurana et al., ‘Natural language processing: state of the art, current trends and challenges’ (2023) 82 Multimedia Tools and Applications 3713-3744, https://doi.org/10.1007/s11042-022-13428-4

[9] J. Chai, H. Zeng, A. Li, and E. W. T. Ngai, ‘Deep learning in computer vision: A critical review of emerging techniques and application scenarios’ (2021) 6 Machine Learning with Applications 100134, https://doi.org/10.1016/j.mlwa.2021.100134

[10] A. I. Miller, The Artist in the Machine: The World of AI-Powered Creativity (MIT Press, 2019)

[11] Susie Dunn, Identity Manipulation: Responding to Advances in Artificial Intelligence and Robotics (Draft version: 16 July 2020) (University of Ottawa 2020) 1

[12] Catherine Stupp, “Fraudsters Used AI to Mimic CEO’s Voice in Unusual Cybercrime Case” The Wall Street Journal (25 February 2020) https://www.wsj.com/articles/fraudsters-use-ai-to-mimic-ceos-voice-in-unusual-cybercrime-case-11567157402 (accessed: 25 November 2023)

[13] Robert Volkert & Henry Ajder, “Analyzing the Commoditization of Deepfakes” NYU Journal of Legislation and Public Policy (27 February 2020)

[14] Zak Doffman, “Chinese Deepfake App ZAO Goes Viral, Privacy Of Millions ‘At Risk’” (2 September 2019) Forbes; https://apps.apple.com/cn/app/id1465199127 (accessed: 1 December 2023)

[15] https://deepfakesweb.com/ (accessed: 3 December 2023).

[16] Kelsey Farish, “Do deepfakes pose a golden opportunity? Considering whether English law should adopt California’s publicity right in the age of the deepfake” (2020) 15-1 Journal of Intellectual Property Law & Practice 48

[17] Haelan Laboratories, Inc. v Topps Chewing Gum, Inc. [1953] 202 F (2d) 866 (2d Cir)

[18] Midler v Ford Motor Co [1988] 849 F (2d) 460 (9th Cir), Apple Corps v Leber et al [1972] 340 F Supp 1002 (SDNY), Estate of Presley v Russen [1981] 513 F Supp 1339 (D NJ).