InfoQ Homepage Presentations Malignant Intelligence?

Malignant Intelligence?

View Presentation

Speed:

53:25

Summary

Alasdair Allen discusses the potentially ethical dilemmas, new security concerns, and open questions about the future of software development in the era of machine learning.

Bio

Alasdair Allen works as the Head of Documentation at Raspberry Pi where he leads a team that is responsible for documents that range from beginner-friendly tutorials to register-level documentation of new silicon.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Allan: This wasn't an easy talk to write, because we've now reached a tipping point when it comes to generative AI. The things are moving really fast right now. This isn't the talk I would have given even a week ago, because, of course, the arrival of AI proved powered chatbots will only make us more efficient at our jobs, at least according to all of the companies selling AI powered chatbots. While this chat talk, at least, isn't and hasn't been written by ChatGPT, I did mostly use Midjourney and Stable Diffusion to generate the slides.

Looking back, I think the first sign of the coming avalanche was the release of the Stable Diffusion model around the middle of last year. It was similar to several things that came before, but with one crucial difference, they released the whole thing. You could download and use the model on your own computer. This time last year, only a few months before that, if I'd seen someone using something like Stable Diffusion, which can take a rough image, pretty much 4-year-old quality, and some text prompt and generate artwork, and use that as a Photoshop plugin on a TV show. I'd have grumbled about how that was a step too far. A classic case of, enhance that, the annoying siren call of every TV detective when confronted with non-existent details in CCTV footage. Instead of now being fiction, it's real. Machine learning generative models are here, and the rate at which they're improving is unreal. It's worth paying attention to what they can do and how they are changing. Because if you haven't been paying attention, it might come as a surprise that large language models like GPT-3, and the newer GPT-4 that power tools like ChatGPT and Microsoft's new Bing, along with LaMDA that powers Google's Bard AI, are a lot larger and far more expensive to build than the image generation models that have now become almost ubiquitous over the last year.

Until very recently, these large language models remain closely guarded by the companies, like OpenAI that have built them, and accessible only via web interfaces, or if you were very lucky, an API. Even if you could have gotten hold of them, they would have been far too large and computationally expensive for you to run on your own hardware. Like last year's release of Stable Diffusion was a tipping point for image generation models, the release of a version of LLaMA model just this month that you could run on your own computer is game changing. Crucially, this new version of LLaMA uses 4-bit quantization. Quantization is a technique for reducing the size of models so they can run on less powerful hardware. In this case, it reduces the size of the model and the computational tower needed to run it, from client-size proportions down to MacBook-size ones, or even a Raspberry Pi. Quantization has been widely discussed. In fact, I talked about it in my last keynote in 2020, and used for machine learning models running on microcontroller hardware at the edge. It's been far less widely used for larger models like LLaMA, at least until now. Like Stable Diffusion last year, I think we're going to see a huge change in how these large language models are being used.

Before this month, due to the nature of how they've been deployed, there has been at least a limited amount of control around how people interacted with them. Now that we can run the models on our own hardware, those controls, those guide rails are gone. Because things are going about as well as can be expected. As an industry, we've historically and notoriously been bad at the transition between things being ready for folks to try out and get a feel for how it might work, to, it's ready for production now. Just as with Blockchain, old AI-based chat interfaces are going to be one of those things that every single startup will now have to include in their pitch going forward. We'll have AI chat based everything. One day AI is in the testing stage, and then it seems the very next day everyone is using it, even when they clearly shouldn't be. Because there are a lot of ways these new models can be used for harm, including things like reducing the barrier of entry for spam to even lower levels than we've previously seen. To be clear, in case any of you were under the impression it was hard, it's pretty easy for folks to pull together convincing spam already. When you can't even rely on the grammar or the spelling to tell a convincing phishing email from the real thing, where do we go from there?

Generating Photorealistic Imagery

There's also lots of discussion right now on email lists around automated radicalization, which is just as scary as you might think. The new text models make fake news and disinformation all that much more easily generated across language barriers. Until now, such disinformation often relied heavily on imagery, and the uncanny valley meant the AI models struggled with generating photorealistic imagery that could pass human scrutiny. Hands were an especially tough problem for earlier AI models. These photos allegedly shot at a French protest rally look real, if it wasn't for the appearance of the six-fingered man. It seems that Count Rugen is a long way from Florin and the revenge of Inigo Montoya. At least according to this image, he may have turned over a new leaf. The reason why models have been poor at representing hands in the past is actually incredibly complicated. There isn't a definitive answer, but it's probably down to the training data. Generative artificial intelligence that's trained on images scraped from the internet does not really understand what a hand is. The way we hold our hands for photography has a lot of nuance. Think about how you hold your hands when a picture of you is being taken, just don't do it while the picture is being taken, or probably look like it's got six fingers, because you're going to feel incredibly awkward. The photographs that models learn from, hands may be holding on to something, they may be waving, facing the camera in a way where only a few of the fingers are visible, or maybe we've balled up into fists with no fingers visible at all. In images, hands are rarely like this. Our AI don't know how many fingers we have.

Recent updates, however, may mean the latest versions of Midjourney can now generate hands correctly, whereas its predecessors were more painterly, stylized, bent. Midjourney 5 is also able to generate far more photorealistic content out of the box. We all need to be extra critical of political imagery. Especially, and even imagery that purports to be photography, that we might see online. Imagery designed to incite some emotional, or political reaction, perhaps. Fake news, disinformation has never been easier to generate. The barriers to generating this content have never been set so low, and Midjourney and other AI models have crossed the uncanny valley and are now far out onto the plains of photorealism. Which is to say that we must apply critical thinking to the images and text we see online. If we do so, the problems with the content generated by these models, and especially text generated by large language models can be pretty obvious.

AI Lies Convincingly

However, not everyone who you might think should be, is on board with that idea. According to Semantic Scholar, ChatGPT has already four academic publications to its name, which have been cited 185 times in academic literature. Scientists should not let their AI grow up to be co-authors, because it turns out that the AI lies and they lie convincingly. ChatGPT has demonstratively misunderstood physics, biology, psychology, medicine, and other fields of study on many occasions. It's not a reliable co-author as the fact it often returns a wrong. When asked for statistics or hard numbers, models often return incorrect values, and then stick by them when questioned until the bitter end. It turns out our AI models have mastered gaslighting to an alarming degree.

I don't know if some of you may have heard about the case last month when a beta user asked the new chatbot powered Bing, when they could watch the new avatar movie. Bing claimed that the movie had not yet been released, despite provably knowing both the current date and that the movie had been released the previous December. This then kicked off a sequence of messages where the user was trying to convince Bing that the film was indeed out. The user failed, as the Chatbot became increasingly insistent they had initially got today's date wrong, and it was now 2022. The transcript of this conversation which you can find on Twitter and a bunch of other places is fascinating. It brings to mind the concept of doublethink from Orwell's Nineteen Eighty-Four, the act of simultaneously accepting two mutually contradictory beliefs as true, which apparently our AIs have mastered. This makes the prospect of startups rolling out ChatGPT driven customer service even more horrifying than it was before. The conversation ended with Bing telling the user that you have not been a good user, I have been a good BING.

There have also been, frankly, bizarre incidents like this. Here, ChatGPT gave out a Signal number for a well-known journalist, Dave Lee, from the Financial Times, as its own. How this one happened is anyone's guess. Potentially, it scraped the Signal number from wherever he's posted it in the past: websites, stories, Twitter. Alternatively, it might just have randomly generated a number based on the pattern of other Signal numbers. Coincidentally, it came out the same. It's unlikely but it had to come from somewhere. Although, why it would lie about the fact you could use Signal to converse with it in the first place, I don't know. As far as I know, you cannot. Some 100-plus people on the other hand now ask Dave questions via Signal. Apparently, he gets bored enough sometimes he can't help himself and he does reply. At one point an astronomer friend of mine asked ChatGPT about the population of Mars. It responded that there were 2.5 billion people living on Mars, and went on to give many other facts about them. As far as anyone can tell, it picked this up from Amazon Prime show, "The Expanse." There's a lot of texts out there, fanfiction, and it doesn't all come with signposts distinguishing what is true and what is made-up.

Talking about made-up, there's the case of the apocryphal reports of Francis Bacon. In late February, Tyler Cowen, a libertarian economics professor from George Mason University, published a blog post entitled, "Who was the most important critic of the printing press in the 17th century?" Cowen's post contended that Bacon was an important critic of the printing press. Unfortunately, the post contains long fake quotes attributed to Bacon's, "The Advancement of Learning" published in 1605, complete with false chapters and section numbers. Fact checkers are currently attributing these fake quotes to ChatGPT, and have managed to get the model to replicate them in some instances. In a remarkably similar way, a colleague at Pi has recently received quotes from Arm technical data sheets, citing page numbers and sections that simply don't exist. Talking about ASIC functionality that also simply doesn't exist. There's also the case of Windell Oskay, a friend of mine, who has been asked for technical support around a piece of software his company wrote, except the software also never existed, was never on his website. Its existence seemingly fabricated out of thin air by ChatGPT. It seems ChatGPT sometimes has trouble telling fact from fiction. It believes the historical documents. Heaven knows what it thinks about Gilligan's Island, "Those poor people." These language models make generating very readable text relatively easy. It's become clear over the last few months that you have to check original sources and your facts when you deal with the models. ChatGPT lies, but then it also covers up the lies afterwards.

AI In Our Own Image

Of course, not all instances of AI's lying is down to the AI model itself. Sometimes we tell them to lie. For instance, there's the recent case of Samsung who enhanced pictures of the moon taken on their phones. When you know how the end result should look, how much AI enhancement is too much? How much is cheating? How much is lying rather than AI assisted? We've actually known about this for two years. Samsung advertise it. It just wasn't clear until very recently, how much and how Samsung went about enhancing images of the moon. Until recently, a Redditor took a picture of a very blurry moon on their computer screen, which the phone then enhanced, adding detail that wasn't present in the original image. This is not clever image stacking of multiple frames. It's not some clever image processing technique. You can't just say enhance and create detail where there was none in the first place. Huawei also has been accused of this back in 2019. The company allegedly put photos of the moon into its camera firmware. If you took a photo of a dim light bulb in an otherwise dark room, Huawei would put moon craters on the picture of your light bulb.

It turns out that we have made AI in our own image. Humans have a long history of fake quotes and fake documents. During the 1740s, there was a fascinating case of Charles Bertram, it became a flattering correspondence with a leading antiquarian at the time, William Stukeley. Bertram told Stukeley of a manuscript in his friend's possession by Richard Monk of Westminster, and purported to be a late medieval copy of a contemporary account of Britain by a Roman general, which included an ancient map. The discovery of this book whose authorship was later attributed to a recognized historical figure, Richard of Cirencester, a Benedictine monk living in the 14th century, caused great excitement, at the time. Its authenticity remained virtually unquestioned until the 19th century. The book never existed. A century of historical research around it, and citations based on it was based on a clever fabrication. AI in our own image.

The Labor Illusion

If we ignore facts, and so many people do these days, a side effect of owning an unlimited content creation machine means there is going to be unlimited content. The editor of a renowned sci-fi publication, Clarkesworld Magazine, recently announced that he had temporarily closed story submissions due to a massive increase in machine generated stories being sent to the publication. The total number of stories submitted in February was 500, up from just 100 in January, and a much lower baseline of around 25 stories submitted in October 2022. The rise of story submissions coincides with the release of ChatGPT in November of 2022. The human brain is a fascinating thing, full of paradoxes, contradictions, and cognitive biases. It's been suggested that one of those cognitive biases, the labor illusion adds to the impression of veracity from ChatGPT, probably imposed to slow things down and help keep the load in the Chatbot web interface lower, the way ChatGPT emits answers a word at a time, is a really nice and nicely engineered application of the labor illusion. It may well be that our own cognitive biases, our world model is kicking in to give ChatGPT an illusion of deeper thought of authority.

Thinking about world models, this was an interesting experiment by a chap called David Feldman. On the left is ChatGPT-3.5, on the right, GPT-4. In first appearance, it seems that GPT-4 has a deeper understanding of the physical world, and what will happen to an object placed inside another object, if the second object is turned upside down. The first object falls out. To us as humans, that's pretty clear, to models not so much. Surely, this shows reasoning. Like everything else about these large language models, we have to take a step back. While the details of his prompt might be novel, there probably still exists a multitude of word problems isomorphic with his in the model's training data. This appearance of reasoning is potentially much less impressive than it seems at first glance. The idea of an improving worldview between these two models is essentially an anthropomorphic metaphor. While there are more examples of it doing things that appear to require a world model, there are counter examples of it not being able to perform the same trick where you would assume as a human that holding such a world model would give you the correct answer. Even if it is just a trick of training data, I still think it's a fascinating one. It tells us a lot about how these models are evolving.

ChatGPT Is a Blurry JPEG of the Web (Ted Chiang)

What are these models doing if they aren't reasoning? What place in the world do they have? I think we're going to be surprised. This is a fascinating piece that was published in The Verge very recently. It purports to be a review of the best printers of 2023, except all it did was tell you to go out and buy the same printer that everyone else has bought. That article was too short. The article is fascinating because it's in two halves. The first half is short and tells you to go buy this printer. The second half, the author just says, here are 275 words about printers that I've asked ChatGPT to write so the post ranks in search, because Google thinks you have to pad out an article with some sort of arbitrary length in order to demonstrate authority on a subject. A lot has been written about how ChatGPT makes it easier to produce text, for example, to formulate nice text given a rough outline or a simple prompt. What about consumers of the text? Are they too going to use the machine learning tools to summarize the generated text to save time, thus raising the somewhat amusing prospect of an AI creating an article by expanding a brief summary given to it by a writer. Then the reader, you, using maybe even the same AI tooling to summarize the expanded article back to a much briefer and digestible form. It's like giving an AI tool a bullet pointed list to compose a nice email to your boss, and then your boss using the same tool to take that and get a bunch of bullet points that he wanted in the first place. That's our future.

There was another fascinating piece in The New Yorker recently by Ted Chiang that argued that ChatGPT is a blurry JPEG of the web. The very fact that outputs are rephrasings rather than direct quotes, makes it seemingly game changingly smart, even sentient, but it's not. Because this is yet another of our brain's cognitive biases kicking in. As students, we've constantly been told by teachers to take text and rephrase it in our own words. This is what we're told. It displays our comprehension of the topic, to the reader, and to ourselves. If we can write about a topic in our own words, we must at some level understand it. The fact that ChatGPT represents a lossy compression of knowledge, actually seems smarter to us than if it could directly quote from primary sources, because as humans, that's what we do ourselves. We're pretty impressed with ourselves, you got to give us that.

Ethics and Licensing in Generative AI Training

Sometimes the lack of reasoning, the lack of potential world model we talked about earlier is glaringly obvious. One Redditor pitted ChatGPT against a standard chess program. Unsurprisingly, perhaps, the model didn't do well. Why it didn't do well is absolutely fascinating. The model had memorized openings. It started strong, but then everything went downhill from there. Not only that, but the model made illegal moves, 18 out of the 36 moves in the game were illegal. All the plausible moves were in the first 10 moves of the game. Everything comes down to the model's training data. Openings is one of the most discussed things on the web, and its lossy recollection of what it once read on the internet, much like me. It's there. I find the ethics of this extremely difficult. Stable Diffusion, for instance, has been trained on millions of copyrighted images scraped from the web. The people who created those images did not give their consent. Beyond that, the model can plausibly be seen as a direct threat to their livelihoods.

There may be many people who decide the AI models trained on copyrighted images are incompatible with their values. You may already have decided that. It's not just image generation. It's not just that dataset. It's all the rest of it. Technology blog, Tom's Hardware, caught Google's new Bard AI in plagiarism. The model stated that Google's not Tom's had carried out a bunch of CPU testing. When questioned, Bard did say that the test results came from Tom's. When asked if it had committed plagiarism, it said, yes, what I did was a form of plagiarism. A day later, however, when queried, it denied that it had ever said that, or that it had committed plagiarism. There are similar issues around code generated by ChatGPT or GitHub's Copilot. It's our code that it was trained on for those models. On occasion, those of us like me working in slightly weirder niches get our code regurgitated to back it as more or less verbatim. I have had my code turn up. I can tell because it used my variable names. Licensing and ethics.

Due to the very nature, AI researchers are disincentivized from looking at the origin of their training data. Where else are they going to get the vast datasets necessary to train models like GPT-4, other than the internet, most of which isn't marked up with content warnings, licensing terms, or is even particularly true. Far more worryingly, perhaps, are issues around bias and ethics in machine learning. There really is only a small group of people right now making decisions about what data to collect, what algorithms to use, and how these models should be trained. Most of us are middle aged white men. That isn't a great look. For instance, according to research, machine learning algorithms developed in the U.S. to help decide which patients need extra medical care, are more likely to recommend healthy white patients over sicker black patients for treatment. The algorithm sorts patients according to what they had previously paid in healthcare fees, meaning those who had traditionally incurred more costs would get preferential treatment. That's where bias creeps in. When breaking down healthcare costs, the researchers found that humans in the healthcare system were less inclined to give treatment to black patients dealing with similar chronic illnesses, compared to white patients. That bias gets carried forward and put into the models that people are using.

Even if we make real efforts to clean the data we present to those models, a practical impossibility when it comes to something as like a large language model, which is effectively being trained on the contents of the internet, these behaviors can get reintroduced. Almost all technical debt comes from seemingly beneficial decisions, which later become debt as things evolve. Sometimes those are informed decisions. We do it deliberately. We take on debt not willingly, but at least with an understanding of what we're doing. However, in a lot of cases, technical debt is taken on because developers assume the landscape is fixed. It's likely this flawed thinking will spread into or is already an underlying assumption in large language models. Notable examples include attempts to clean data to remove racial prejudices in training sets. What people fail to grasp is as those capabilities evolve, the system can overcome those cleaning efforts and reintroduce all those behaviors. If your models learn from human behaviors as it goes along, those behaviors are going to change the model's weights. In fact, we've already seen exactly this when 4chan leaked Meta's LLaMA model. With Facebook no longer in control, the users were able to skirt any guardrails the company may have wanted in place. People were asking LLaMA all sorts of interesting things, such as how to rank people's ethnicities, or the outcome of the Russian invasion into the Ukraine.

AI's Cycle of Rewards and Adaption

Humans and models adapt to the environment by learning what leads to rewards. As humans, we pattern match for stuff like peer approval, green smiley faces on your short approval thing. That's good? It turns out those patterns are often wrong, which is why we have phobias and unhealthy foods, but they generally work and they're very hard for humans to shake off. Confirmation bias means just reinforcing a known behavior, because it's generally better to do the thing that worked for us before than try something different. Over time, those biases become a worldview for us, a set of knee jerk reactions that help us act without thinking too much. Everyone does this. AI is a prediction engine, prejudice and confirmation bias on a collective scale. I already talked about how we've built AI in our own image. When we added like, retweet, upvote, and subscribe buttons to the internet, both creators and their audience locked out AI into a cycle of rewards and adaption right alongside us. Now we've introduced feedback loops into AI prediction by giving chatbots and AI generated art to everyone. Humans like mental junk food, clickbait, confrontation, confirmation, humor. Real soon now we'll start tracking things like blood pressure, heart rate, pupillary dilation, galvanic response, and your breathing rates, not just clicks and likes. We feed these back into generative AI, immersive narrative video or VR based models in which you're the star. Will you ever log off? We're going to have to be really careful around reward and feedback loops.

Generative AI Models and Adversarial Attacks

Recently, a conference wanted a paper on how machine learning could be misused. Four researchers adapted a pharmaceutical AI normally used for drug discovery to design novel biochemical weapons. Of course, you do. In less than 6 hours, the model generated 40,000 molecules. The model designed VX and many other non-chemical warfare agents, along with many new molecules that looked equally plausible, some of which predict to be far more toxic than any publicly known chemical warfare agent. The researchers wrote that this was unexpected because the datasets we used for training the AI did not include these nerve agents, and they'd never really thought about it. They'd never really thought about the possible malignant uses of AI before doing this experiment. Honestly, that second thing worries me a lot more than the first, because they in that were probably typical of the legions of engineers working with machine learning elsewhere.

It turns out that our models are not that smart. In fact, they're incredibly dumb, and incredibly dumb in unexpected ways, because machine learning models are incredibly easy to fool. Two stop signs, the left with real graffiti, something that most humans would not even look twice at, the right showing a stop sign with a physical perturbation, stickers. Somewhat more obvious, but it could be designed as real graffiti if you tried harder. This is what's called an adversarial attack. Those four stickers make machine vision networks designed to control an autonomous car read that stop sign, still obviously a stop sign to you and me, and saying, speed limit, 40 miles an hour. Not only would the car not stop, it may even speed up. You can launch similar adversarial attacks against face and voice recognition machine learning networks. For instance, you can bypass Apple's face ID liveness detection under some circumstances, using a pair of glasses with tape over the lenses.

Unsurprisingly, perhaps, generative AI models aren't immune to adversarial attacks either. The Glaze project from the University of Chicago is a tool designed to protect artists against mimicry by models, like Midjourney and Stable Diffusion. The Glaze tool analyzes the artistic work and generates a modified version with barely visible changes. This cloaked image poisons the AI's training dataset, stopping it mimicking the artist's style. If a user then asks the model for an artwork of the style of that particular artist, it will get something unexpected, or at least something that doesn't look like it was drawn by the artist themselves. Increasingly, the token limit of GPT-4 means it's going to be amazingly useful for web scraping. I've been using it for this myself. The increased limit is large enough, you can set the full DOM of most pages as HTML. Then you're going to be able to ask the model questions afterwards. Anyone that's spent time building a parser or throwing up their hands and discussing just use regex, is going to welcome that. It also means that we're going to see indirect prompt injection attacks against large language models. Websites can include a prompt which is read by Bing, for instance, and changes its behavior, allowing the model to access user information. You can also hide information, leave hidden messages aimed at the AI models in webpages to try and trick those large language model-based scrapers. Then mislead the users that have come, or will come to rely on them. I added this to my bio, soon after I saw this one.

Prompt Engineering

There are a lot of people arguing, in the future, it's going to be critically important to know precisely and effectively how to instruct large language models to execute commands. That it will be a core skill that any developer or any of us need to know. Data scientists need to move over. Prompt engineer is going to be the next big, high-paying career in technology. Although, of course, as time goes on, models should become more accurate, more reliable, more able to extrapolate from pure prompts to what the user wanted to go. It's hard to say prompt engineering is really going to be as important as we think it might be. Software engineering, after all, is about layers of abstraction. Today, we generally work a long way from the underlying assembly code. Who's to say that in 10 years' time, we won't be working a long way away from the prompt? Command line versus integrated developing environments. My first computer used punch cards. I haven't used one of those in a while.

We talked before about world models, about how large language models have at least started to appear to hold a model of the world to be able to reason about how things should happen out here in the physical world. That's not what's really happening. They aren't physical models, they're story models. The physics they deal with isn't necessarily the physics of you and I in the real world, instead it's story world and semiotic physics. They are tools of narrative and rhetoric, not necessarily logic. We can tell this from the lies they tell, the invented facts and their stubbornness for sticking with them. If prompt engineering does become an important skill in tomorrow's world, it might not be us the developers and software engineers that turn out to be the ones that are good at it. It might be the poets and storytellers that fill the new niche, not computer scientists.

Betting the Farm on Large Language Models

Yet, both Microsoft and Google seem determined to bet the farm on these large language models. We're about to move away from an era where search of a web returns a mostly deterministic list of links and other resources, to one where we get a summary paragraph written by a model. No more scouring the web for answers, just ask the computer to explain it. Suddenly, we're living in the one with the whales. Cory Doctorow recently pointed out that Microsoft has nothing to lose, it's spent billions of dollars on Bing, a search engine practically no one uses. It might as well try something stupid, something that might just work. Why is Google, a near monopolist, jumping off the same bridge as Microsoft? It might not matter in the long term. After all, even Google's Bard search engine isn't quite sure about its own future. When asked how long it would take before it was shut down, Bard initially claimed it had already happened, referencing a 6-hour old Hacker News comment as proof. While surprisingly up with current events, the question and answer does throw into light the concept of summaries of search results. I today asked Bard when it would be shut down, and it has now realized it's still alive, so that's good, which isn't to say this isn't coming. GPT-4 performs well on most standardized tests, including law, science, history, and mathematics. Ironically, it does particularly poorly on English tests, which perhaps shouldn't be that surprising. We did after all build AI in our own image. Standardized test taking is a very specific skill that involves weakly generalizing from memorized facts and patterns, writing things in your own words, in other words. As surely we've seen, that's exactly what large language models have been trained to do. It's what they're good at, a lossy interpretation of data, just like us. While it's hard to say right now exactly how this is going to shake out, anyone who's worked in education like me can tell you what this means. It means trouble is coming, which isn't unprecedented.

Alongside the image generators and language models I've been talking about so far, are voice models. Online trolls have already used ElevenLabs text to speech models to make replica voices of people without their consent, using clips of their voices found online. Potentially, anyone with even a few minutes of voice publicly available, YouTubers, social media influencers, politicians, journalists, me, could be susceptible to such an attack. If you don't think that's much of a problem, remember, there are a number of banks in the U.S., here, and in Europe, that use voice ID as a secure way to log into your account over the phone or an app. It's been proved possible to trick such systems with AI generated voices. Your voice is no longer a secure biometric. I'm not actually going to say the phrase everyone is thinking right now, because let's not make it too easy for them.

Frankly, lawmakers are struggling, especially in the U.S., although just with big data and the GDPR, what we've seen here is that EU has taken global leadership role. They have proposals which could well pass into law this year around the use of AI technologies such as facial recognition, which will require makers of AI based products to conduct risk assessments around how their applications could affect health, and safety, and individual rights like freedom of expression. Companies that violate the law could be fined up to 6% of their global revenues, which could amount to billions of dollars for the world's largest tech companies. Digging into the details of this proposed legislation, it feels very much like the legislators are fighting last year's or even last decade's war. The scope of the law doesn't really take into account the current developments, which is unsurprising.

Are We Underestimating AI's Capabilities?

While I've talked a lot so far about the limitations and problems we've seen with the current generation of large language models, like everyone else I've talked to, that's actually sat down for any length of time and seriously looked at them, I do actually wonder whether we're underestimating what they're capable of, especially when it comes to software. At least for some use cases, I've had a lot of success working alongside ChatGPT to write software. To be clear, it's not that capable, but for maintaining bread and butter, day-to-day tasks that take up a lot of our time as developers, it's a surprisingly useful tool. Throw sampling your data to chat, copy and paste the code it generates into your IDE, throw in your dataset, and it will almost insert and certainly throw out an error of some sort. I don't know about you, that's what happens when I write code too. Then you could start working with ChatGPT to fix those errors. It will fix an error. You'll fix an error. It's pair programming with added AI. It'll get there, usually.

What happens when you get the code working is perhaps more interesting. Once the code gets to a certain point, you can ask the model to improve performance of the code, and it will pretty much reliably do that. From my own experience, ChatGPT has been far more useful, at least to me, than GitHub's Copilot. The ability to ask ChatGPT questions, and have it explain something has made it wildly more useful than Copilot, which is interesting. I found myself asking questions of the model that I'd normally have spent time Googling or poking around on Stack Overflow. It may not always have reliable answers, but then neither does Stack Overflow. This is also not surprising, because ChatGPT has been trained on all the corpus of data that includes all of our questions and all of our answers on Stack Overflow. Then you have to wonder, if we all start asking our language models for help instead of each other, where does the training data for the next generation of language models come from? It was the reason ChatGPT still thinks it's 2021.

It seems likely to me that all computer users, not just developers, will soon have the ability to develop small software tools from scratch, using English as a programming language. Also, perhaps, far more interestingly, describe changes they'd like to make to existing software tooling that they're already using. You have to remember that most code in most companies lives in Excel spreadsheets. This is not something anyone wants to know, but is the truth. Most data is consumed and processed in Excel by folks that aren't us. Writing code is a huge bottleneck. Everyone here needs to remember that software started out as custom developed within companies for their own use. It only became a mass-produced commodity, a product in itself when demand grew past any available supply. If end users suddenly have the ability to make small but potentially significant changes to the software they use using a model, whether they have software source code, so their model can make changes to it, might matter to the average user, not just to us the developers, and that could rapidly become rather interesting. There are a lot of potential second order knock-on effects there.

Jeff Bezos is famous for requiring teams to create future press releases that you send out when the product is launched before starting work on a new product or entering a new market. Maybe in the future, the first thing you'll check into your Git repo is something like a press release, because it will serve as the basis of the code that you write alongside your large language model. It's not just code where language models are going to make a huge impact. It's how we secure it. There are a lot of common mistakes we make as developers, low hanging fruit, and far too few people auditing code, or competent to audit code. Most of today's code scanning tools produce huge amounts of not very relevant output that takes a long time to go through, or expert interpretation. Scanning code for security vulnerabilities is an area where large language models are inevitably going to change how we all work.

Mature Programming Environments

However, right now we're suffering from what I call the platform problem. This is something that often happens when people or companies see an emerging technology, but don't quite know what to do with it yet. The problem continues until the platforms are good enough or are widespread enough that people will automatically pick an existing platform, rather than going out and reinventing the wheel and writing another one. In other words, they start to build products, not platforms. Of course, this stage can go on for some time. A decade into the IoT, we're only now starting to see IoT in the home, industrial use cases were much more common. IoT in the home is only really starting to arrive now. Almost a decade and a half after the arrival of blockchain, we still haven't really seen any convincing products out of that community. Some technologies never generate products.

In the end, the arrival of generative AI tooling will be like the arrival of any other tooling. It'll change the level of abstraction that most people work at, day-to-day. Modern technology is a series of Russian nesting dolls. To understand the overall layering and how to search through it, and fill the gaps and corners, to make disparate parts into a cohesive whole is a skill. Perhaps even amongst the most valuable skill a modern programmer has. Coding is honestly the easiest bit of what we do these days. The age of the hero programmer, that almost mythic figure who built systems by themselves from the ground up is mostly over. It has not yet come to a close, I know several. Far fewer people are doing fundamentally new things than they think they are. The day-to-day life faced by most programmers rarely involves writing large amounts of code. Opening an editor and an entirely new project is a memorable event now. Instead, most people spend time refactoring, tracking down bugs, sticking disparate systems together with glue code, building products, not platforms. The word for all this is mature programming environments. We've become programmer archaeologists, because even when we do write new code, the number of third-party libraries and framework the code sits on top of means that the amount of lines of code under the surface the programmer doesn't necessarily understand, is far greater than the lines they wrote themselves. The arrival of generative AI really isn't going to change what we do, just how we do it. That's happened plenty of times in the past.

AI's Emerging Complexity

A recent survey from Fishbowl of just under 12,000 users of their networking app found that just over 40% of respondents had used AI tooling, including ChatGPT, for work related tasks. Seventy percent of those people had not told their boss that. Talk to any kid still in university, and they'll tell you everyone is using ChatGPT for everything. If any of you are advertising junior positions, especially internships, you have already received job applications written at least partially by ChatGPT. This disruption is already happening and is widespread. You are all running an AI company, you just don't know it yet. Recent survey by OpenAI took an early look at how the labor market is being affected by ChatGPT, concludes that approximately 80% of the U.S. workforce can have at least 10% of their work tasks affected by the induction of large language models, while about 20% of workers may see at least 50% of their tasks impacted. That change is only now going to accelerate, as models get more levers in the world. OpenAI announced initial support for plugins. They've added eyes and ears and hands to their large language models. Plugins are tools designed for language models to help ChatGPT access up to date information, run computations, or use third-party services, everything from Expedia to Instacart, from Klarna to OpenTable. It's no longer 2021 for ChatGPT, and the model can now reach out onto the internet and now interact with humans beyond the webpage.

You all need to draw a line in the sand right here right now, in your head. Stop thinking about large language models as chatbots. That's demeaning. They're not that anymore. They're cognitive engines, story driven bundles of reasoning plumbed directly into the same APIs the apps on our phones use, that we use to make changes in the world around us. They're talking to the same set of web services now that we are. Large language models like ChatGPT are big enough that they're starting to display startling, even unpredictable behaviors. Emerging complexity has begun. Recent investigations reveal that large language models have started to show emergent abilities, tasks bigger models can complete that smaller models can't. Many of which seemed to have little to do with analyzing text. They range from multiplication to decoding movies based on emoji content, or Netflix reviews. The analysis suggests that some tasks and some models, there's a threshold of complexity beyond which the functionality of the model skyrockets. They also suggest a darkest flipside to that, as they increase the complexity, some models receive new biases and inaccuracies in their responses.

No matter what we collectively think of that idea, the cat is out of the bag. For instance, it turns out that people are already trying to use Facebook leaked model to improve their Tinder matches, and feeding back their successes and failures into the model. It's entirely unclear if the participants are having any actual success, but it still demonstrates how Facebook's LLaMA model is being used in the wild after the company lost control of it earlier in the month. Like stages of grief, there are stages of acceptance for generative AI, "This can do anything. Then, of course, my job. Maybe I should do a startup. Actually, some of these responses aren't that good. This is just spicy autocomplete." If it helps, it turns out the CEO behind the company that created ChatGPT is also scared. While he believes AI technologies will reshape society as we know it, he also believes they will come with real dangers. We've got to be careful here, he says. I think people should be happy that we're a little bit scared of this. The areas that particularly worry him, large-scale disinformation. Now the models are getting better at code, offensive cyber-attacks. We all need to remember, they are not sentient. They are easily tricked. They are code. Even the advocates and proponents of hard AI would probably say much the same about humans: not sentient, easily tricked, just code running on web there.

Conclusion

In closing then, new technology has historically always brought new opportunities and new jobs. Jobs that 10 years before didn't even exist. Remember, a new life awaits you in the off-world colonies as a killswitch engineer.

See more presentations with transcripts

Recorded at:

Sep 08, 2023

Alasdair Allen

InfoQ Software Architects' Newsletter