Generally AI Episode 4: Sold out!

Feb 14, 2024

In this episode of Generally AI, Roland and Anthony explore the theme of "sold out" and delve into the world of GPUs, hot sauce, and beer. The hosts cover CUDA-enabled GPUs and parallel programming patterns. Then they explore the parallels between the scarcity of GPUs and Sriracha hot sauce; the historical context of GPU shortages; and how beer and college students can model supply chain dynamics.

Key Takeaways

A key concept in achieving computational efficiency in CUDA-enabled GPUs is the parallelizability of an algorithm
Efficiently utilizing GPU cores involves considering factors beyond sheer computation, such as memory allocation, synchronization, and avoiding data bottlenecks
High demand for GPUs, driven not only by gaming but also by cryptocurrency mining and the growing use of GPUs in machine learning applications, has led to supply shortages
Supply chain disruptions, driven by factors like demand spikes, shortages of essential components, and external events, contribute to product scarcity and price fluctuations
The "Beer Game" illustrates the challenges of coordinating supply chains and introduces the concept of the bullwhip effect, where small fluctuations in demand are magnified upstream

Introduction

Roland Meertens: Let me start with a fun fact. So the theme of the podcast today is sold out, and I thought it's kind of interesting that it's inherent about software that it can't really sell out. However, the media with which you distribute or use the software, can of course sell out. And quick question to you, what is the most sold game console?

Anthony Alford: The most sold game console. Oh, goodness. I'm going to guess a PS, something.

Roland Meertens: Yes. Which one?

Anthony Alford: PS2.

Roland Meertens: Yes. So the PlayStation 2, which was released in the year 2000, sold 155 million units, only followed a bit by the Nintendo DS, and Sony stopped producing it in 2013 after 12 years of production.

Anthony Alford: Wow. So that might be the longest running one as well.

Roland Meertens: Yes, probably. It's insane. So this was the same year in which the PlayStation 4 was released. So while they were selling the PlayStation 3, they were also still making the PlayStation 2.

Anthony Alford: If it ain't broke, don't fix it.

Roland Meertens: If it ain't broke, don't fix it. Yeah, so directly after its release, it was difficult to find PlayStation 2 units on retailer shelves, mostly just due to manufacturing delays, I think. But yeah, that's a fun fact.

Welcome to Generally AI, an InfoQ podcast. My name is Roland Meertens and I'm joined by Anthony Alford.

Anthony Alford: Hello, Roland. Good to see you.

Roland Meertens: Also, good to see you. So as I said, the theme for today is sold out. Shall I start with what I brought this week?

Anthony Alford: Absolutely.

GPUs: The Horsepower of 1,000 Chickens [01:52]

Roland Meertens: I think we have something which fits together relatively well, because we were both thinking, "What is always sold out or what is currently sold out?" NVIDIA GPUs, you can't get on the Google Cloud Platform. You can't get them on Hugging Face, you can't get them in the shop.

So yeah, GPU stands for Graphics Processing Unit and the NVIDIA ones can run CUDA. And I was just thinking, "Why are things which are graphics processing units always sold out?" And CUDA stands for Compute Unified Device Architecture, and basically it means that you can do general purpose computing on GPUs nowadays.

And what I wanted to do with you today, Anthony, and what I want to do in this podcast is try to bridge the gap a bit between high level thinking and low level implementation, because a lot of podcasts either give you an analogy of what CUDA is or can do, or you go to YouTube videos where people start programming directly. And when you are working with CUDA, it's not about the programming, it's also about a different way of thinking and a different way of reasoning. So yeah, hope you're ready for that.

Anthony Alford: I am ready because, believe it or not, I'm not as familiar with CUDA as I probably should be.

Roland Meertens: Yes. I think that, if I think about CUDA, when I learned it, I was like, "Wow, this thing is the coolest thing since machine learning," which is where I started. But for a lot of people they probably heard about it and they have an idea of what it can do, but they have never really tried it, which is a shame.

So yeah, when you have a CUDA-enabled GPU, you have a certain amount of CUDA cores, that's I think what most people know, to do processing, and I will give an analogy. So I will start with a very high level thinking because one thing which people always say is, "Oh, using a lot of GPU cores is kind of like using one or two cows or a lot of chickens." I don't know if you heard this analogy before.

Anthony Alford: I have not.

Roland Meertens: Well, in that case, the question to you is, Anthony, would you rather have two cows or a thousand chickens for working your land?

Anthony Alford: Interesting. I suppose it depends on how your costs scale, if they scale per head...

Roland Meertens: I think the real answer is it depends on your task, right? Because if you want to tow a big cart, then probably having one cow to tow a cart is better. If you want to plant a lot of seeds in a field, maybe your army of chickens can distribute it faster. And maybe another better analogy here is either using grownup people versus an army of toddlers, where toddlers can't keep track of anything and can't do complicated tasks, that's why they are toddlers.

Anthony Alford: Ah, I see.

Roland Meertens: But for a lot of tasks you don't really need to. So if you want to clear the trash from a forest, you can go with an army of scouts, children and toddlers, into the forest. They can clean it up very efficiently.

Anthony Alford: Yes. So parallelism versus just absolute power.

Is It Parallelizable? [04:49]

Roland Meertens: Yeah, so the parallelism versus absolute power, it's really about, can you run something completely parallel? Is there a context needed for a task you do? And when you are learning programming, you're really learning for yourself. I do one step, and I do another step, and I do another step.

And this is the thing which this CUDA thinking, this parallel thinking, I'm really trying to advocate here, is doing. Okay, so it's time for the CUDA quiz where I give you problems and you are going to tell me whether they are parallelizable, if you can efficiently solve them using CUDA or not.

Anthony Alford: I am ready.

Roland Meertens: Question one. I want to add two arrays together, as in, I want to add all the elements of one array to all the elements of a second array.

Anthony Alford: Do you add them element-wise? So the first element of each array, you add that, and then that's the first element in a new array?

Roland Meertens: Yes.

Anthony Alford: That seems quite parallelizable, since they're independent.

Roland Meertens: Yes, indeed. So every index of an array you can add together and put it in a new array. So you can have a thousand GPU cores do that for you simultaneously. So that's good.
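A minimal Python sketch of the element-wise addition described above, assuming a thread pool as a stand-in for GPU cores. The names `add_one_element` and `parallel_add` are hypothetical and only illustrate that each index is an independent task; this is not CUDA code and makes no claim about real GPU performance.

```python
from concurrent.futures import ThreadPoolExecutor


def add_one_element(a, b, out, i):
    # One "thread" handles exactly one index and touches no other element.
    out[i] = a[i] + b[i]


def parallel_add(a, b, workers=4):
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    out = [0] * len(a)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One independent task per index; the execution order does not matter.
        list(pool.map(lambda i: add_one_element(a, b, out, i), range(len(a))))
    return out


print(parallel_add([1, 2, 3, 4], [10, 20, 30, 40]))  # [11, 22, 33, 44]
```

Because no task reads what another task writes, the work can be split over any number of workers without synchronization.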

Anthony Alford: Woo-hoo.

Roland Meertens: What about comparing two strings? Comparing one string to a second string and checking if one of them is contained inside the other one.

Anthony Alford: Oh.

Roland Meertens: So I have a very long text and I want to find if a certain word is in the text.

Anthony Alford: Let me think. I'm going to say maybe, let me think about this. How would we do that? So I think you could, it might require duplicating data or else letting them have access to a shared data structure somehow.

Roland Meertens: Yeah, well, at least you can have multiple processes check different starting points at the same time. So at least that way you can more quickly find specific values or sequences in an array. So that's doable.
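The idea of checking different starting points at the same time could be sketched roughly as follows. This is an illustrative Python version with hypothetical helper names (`matches_at`, `find_all`), again using a thread pool in place of GPU threads; every starting position is tested independently of the others.

```python
from concurrent.futures import ThreadPoolExecutor


def matches_at(text, word, start):
    # One independent check: does `word` begin at position `start`?
    return text[start:start + len(word)] == word


def find_all(text, word, workers=4):
    # Candidate starting positions; empty when the word is longer than the text.
    starts = range(len(text) - len(word) + 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flags = list(pool.map(lambda s: matches_at(text, word, s), starts))
    return [s for s, hit in zip(starts, flags) if hit]


print(find_all("the cat and the hat", "the"))  # [0, 12]
```

All workers read the same text, which corresponds to the shared data structure mentioned in the conversation; nothing is written to it, so no locking is needed.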

How about calculating the nth number of the Fibonacci sequence?

Anthony Alford: Okay, so this depends on how you implement it. This is actually a favorite thing of mine to ask people.

Roland Meertens: Oh.

Anthony Alford: Because we know that the classic way you do the Fibonacci sequence is recursion, right? A lot of times I'll pose that problem to a candidate as an example of recursion. However…

Roland Meertens: However.

Anthony Alford: You can also do it with matrices. Actually you can create a matrix, it's a simple two-by-two matrix and I think the numbers are, the elements are all one except one of them is zero. Anyway, you find the eigenvalues and you can raise the eigenvalue to a power.

Roland Meertens: Yes.

Anthony Alford: Now, I don't know, that actually might be parallelizable.

Roland Meertens: Oh, okay, because I didn't know that there was this trick. When I was googling this, I did find that there is apparently a closed-form available for the Fibonacci sequence.

Anthony Alford: Yes, there's a number called phi or phi. It's 1.5 something I think. So you can just basically find powers of that. Yeah, I think there may also be a closed-form. I've started to just talk, but I think you can maybe parallelize it somewhat, but I'm not a hundred percent sure just off the cuff.

Roland Meertens: So ignoring the closed-form or matrix multiplication, which I didn't know was a thing for the Fibonacci sequence, normally we say that each outcome depends on the outcome of the previous step, and that's why this is a pretty good example of something which you normally wouldn't want to parallelize.
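To make the two approaches concrete, here is a small Python sketch. `fib_iterative` shows the step-by-step dependency, and `fib_matrix` uses the two-by-two matrix with entries 1, 1, 1, 0 raised to a power by repeated squaring. The function names are hypothetical, and the matrix version is shown only because it needs far fewer dependent steps, not as a claim that it maps well onto a GPU.

```python
def fib_iterative(n):
    # Each step needs the result of the previous one: inherently sequential.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def mat_mul(x, y):
    # Product of two 2x2 matrices stored as nested lists.
    return [
        [x[0][0] * y[0][0] + x[0][1] * y[1][0], x[0][0] * y[0][1] + x[0][1] * y[1][1]],
        [x[1][0] * y[0][0] + x[1][1] * y[1][0], x[1][0] * y[0][1] + x[1][1] * y[1][1]],
    ]


def fib_matrix(n):
    # [[1, 1], [1, 0]] ** n == [[F(n+1), F(n)], [F(n), F(n-1)]]
    result = [[1, 0], [0, 1]]  # identity matrix
    base = [[1, 1], [1, 0]]
    while n > 0:
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)  # repeated squaring: about log2(n) rounds
        n >>= 1
    return result[0][1]


print(fib_iterative(10), fib_matrix(10))  # 55 55
```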

How about adding blur to an image? You want to make an image blurry.

Anthony Alford: Interesting. Okay. So again, this is actually something that I have a little bit of experience with, a long time ago, with a class of processor called the digital signal processor. Long story short, if you frame the problem as doing a convolution of an image with a small kernel, I think yes, that's parallelizable, because each pixel in the new image is a sum of the pixels around it in the other one. And so yeah, you could parallelize that.

Roland Meertens: Yes, indeed. So if you have a patch of three-by-three and you're blurring based on that, you indeed can do a convolution and then you can compute each pixel individually. And the fact that this is a convolution, we'll tie back to neural networks later.
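A rough sketch of the three-by-three blur as a per-pixel computation, in plain Python. It assumes a simple box blur (plain average) and skips neighbours that fall outside the image; both choices, and the names `blur_pixel` and `box_blur`, are illustrative assumptions rather than anything specified in the conversation.

```python
def blur_pixel(image, row, col):
    # Average of the 3x3 neighbourhood; positions outside the image are skipped.
    height, width = len(image), len(image[0])
    total, count = 0.0, 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = row + dr, col + dc
            if 0 <= r < height and 0 <= c < width:
                total += image[r][c]
                count += 1
    return total / count


def box_blur(image):
    # Every output pixel depends only on the input image, never on other outputs,
    # so each call to blur_pixel could run on its own core.
    return [[blur_pixel(image, r, c) for c in range(len(image[0]))]
            for r in range(len(image))]


image = [[0, 0, 0],
         [0, 9, 0],
         [0, 0, 0]]
print(box_blur(image)[1][1])  # 1.0
```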

Anthony Alford: You can see I'm old school, I didn't go straight to convolutional neural networks. I just took convolution to be a signal processing operation.

Roland Meertens: Also, if you like signal processing, there is for example a CUDA FFT library, like a cuFFT. So if you want to do Fourier transforms, you can do that efficiently with CUDA, apparently.

Anthony Alford: I believe it.

Roland Meertens: How about Newton's Method where you want to find the roots of a function, where the function is equal to zero basically.

Anthony Alford: Now, that's been a long time. I remember Newton-Raphson as an iterative method for finding a zero. I don't know how it works if you know there are multiple ones. So again, with iterative, I'm thinking maybe not.

Roland Meertens: Yes, no, it is mentioned on Wikipedia as an inherently serial problem because we know that step two has a direct dependency on step one, and step three has a direct dependency on step two. And I guess you could try a thousand initial steps and then have them all solve that. But yeah, it is apparently an inherent serial problem.
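The dependency between steps is easy to see in a short sketch. The Python below is a minimal Newton iteration for an example function chosen here purely for illustration, f(x) = x^2 - 2. Each iteration needs the previous value of `x`, but runs from different starting points are independent of one another.

```python
def newton(f, df, x0, steps=20):
    x = x0
    for _ in range(steps):
        # Step k+1 cannot start before step k has produced x.
        x = x - f(x) / df(x)
    return x


def f(x):
    return x * x - 2.0


def df(x):
    return 2.0 * x


# Independent starting points: these runs could execute side by side.
starts = [1.0, 3.0, -1.0, -5.0]
roots = [newton(f, df, x0) for x0 in starts]
print(roots)
```

Positive starting points settle on the positive square root of 2 and negative ones on the negative root, which is one way to look for multiple zeros.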

How about the Game of Life?

Anthony Alford: Okay, again, I think this one is quite similar to our convolution case where the next state of the system is pretty localized, right? The state of a cell in the next iteration depends on the cells around it. So again, I would guess, yes, you could parallelize that.

Roland Meertens: Yep, indeed. Yeah. So the rules of the game are indeed something like any cell with fewer than two live neighbors dies and you can indeed calculate it for every cell at the same time. So you could implement the Game of Life with CUDA.
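As an illustration, one generation of the Game of Life can be written so that each cell is computed only from the previous grid. This Python sketch uses the standard rules (a live cell survives with two or three live neighbours, a dead cell becomes alive with exactly three) and assumes a bounded grid whose edges do not wrap around; `next_cell` and `step` are hypothetical names.

```python
def next_cell(grid, row, col):
    height, width = len(grid), len(grid[0])
    neighbours = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) != (0, 0):
                r, c = row + dr, col + dc
                if 0 <= r < height and 0 <= c < width:
                    neighbours += grid[r][c]
    if grid[row][col] == 1:
        # Fewer than two or more than three live neighbours: the cell dies.
        return 1 if neighbours in (2, 3) else 0
    return 1 if neighbours == 3 else 0


def step(grid):
    # Reads only the old grid and writes a new one, so all cells are independent.
    return [[next_cell(grid, r, c) for c in range(len(grid[0]))]
            for r in range(len(grid))]


blinker = [[0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 1, 1, 1, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0]]
for line in step(blinker):
    print(line)
```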

Now we're getting to something I think which might be a bit more interesting. How about summing up all the numbers in a list? So you have one list and you want to get the sum of all the numbers in there.

Anthony Alford: Ah, okay. So I used to think about MapReduce back in the day. And because a sum is commutative, I think you can do it parallel somewhat, but you have to do multiple steps.

Roland Meertens: Yes, indeed. So this seems, to the naive serial mind, like a problem which you couldn't parallelize because of course you're relying on the previous answer, right? But you can also see it as a binary tree where you sum two numbers at a time and then you can sum the results.

Anthony Alford: I said commutative, but I meant associative, I think. Sorry, but anyway, yes, it doesn't matter what order you add the numbers because the answer is the same.

Roland Meertens: Yes, it indeed doesn't matter in which order you add the numbers, it's basically like every toddler can add two numbers together and then they can share the result with the next person. And if you do that in a binary way, you end up with one result. So in that sense, you can do this super fast.
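The binary-tree idea can be sketched in a few lines of Python. In this illustrative version (`tree_sum` is a hypothetical name), each pass adds neighbouring pairs; all pairs within a pass are independent, so a list of n numbers needs roughly log2(n) passes instead of n - 1 sequential additions.

```python
def tree_sum(values):
    level = list(values)
    if not level:
        return 0
    while len(level) > 1:
        # All pairs in this pass are independent and could be added at once.
        next_level = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            # An odd element out is carried to the next pass unchanged.
            next_level.append(level[-1])
        level = next_level
    return level[0]


print(tree_sum(range(1, 101)))  # 5050
```

Regrouping the additions this way relies on addition being associative, as noted in the conversation. With floating-point numbers the regrouping can change the rounding slightly.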

How about multiplying two matrices?

Anthony Alford: Oh, so I was actually going to say, that one is not commutative, but again, you have to share the data, but as we know in the product matrix, each element is a dot product of a row from one and a column from the other. So dot products are what these things are designed to do very quickly. So I would say yes.

Roland Meertens: Yep.

Anthony Alford: Again, if we think about deep learning, multiplying matrices is what that's all about.
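A small Python sketch of the point about dot products: every element of the product matrix is the dot product of one row and one column, and no element depends on any other. The names `dot` and `matmul` are illustrative, and the matrices are plain nested lists.

```python
def dot(row, col):
    return sum(x * y for x, y in zip(row, col))


def matmul(a, b):
    cols_of_b = list(zip(*b))  # transpose b so its columns are easy to reach
    # Each output element is an independent dot product.
    return [[dot(row, col) for col in cols_of_b] for row in a]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul(a, b))  # [[19, 22], [43, 50]]
print(matmul(b, a))  # [[23, 34], [31, 46]]
```

The two printed results differ, which is the non-commutativity mentioned above; it does not prevent the individual dot products from running in parallel.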

Roland Meertens: Yep, indeed. Yeah. This is I guess the reason that the GPUs are now always sold out. But yeah, so if we're talking about parallel computing, this was great by the way. Congratulations. I think you had all the answers correct.

Anthony Alford: Do I get the job?

Roland Meertens: You're hired. I will send an email to Jensen saying that you passed.

Anthony Alford: Very cool.

Parallel Programming Patterns [12:35]

Roland Meertens: But yeah, so this is when I was learning CUDA for myself a couple of years ago, I noticed that this is a very interesting way of thinking, which you need to acquire, is that you can just break down problems by solving them simultaneously, by multiple processes communicating over shared memory, and that these processes don't necessarily execute in the same order. So you don't really know what is going to happen first, what's going to happen next.

And that makes it a bit different than for-loops, for example, where you know exactly in what order they're going to execute and you can put breakpoints or you can keep track of some things over time. And if you want something to be synchronized, you can use the __syncthreads function to synchronize the threads within the block to prevent data races.

So you can basically, for the summing of the numbers, you can continue summing numbers as soon as every thread is done summing numbers. You can use mutexes in CUDA, but yeah, then processes are waiting for other processes, which isn't very efficient.
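The role of the synchronization point can be imitated on a CPU. The sketch below is Python, not CUDA: it uses `threading.Barrier` as a rough analogue of a block-wide synchronization, and a plain list as the "shared memory". The function name `barrier_sum` and the restriction to power-of-two lengths are simplifying assumptions made for this illustration.

```python
import threading


def barrier_sum(values):
    n = len(values)
    if n < 2 or n & (n - 1) != 0:
        raise ValueError("this sketch assumes a power-of-two length of at least 2")
    shared = list(values)               # stands in for shared memory
    barrier = threading.Barrier(n // 2)

    def worker(tid):
        stride = n // 2
        while stride > 0:
            if tid < stride:
                # Writes land in [0, stride), reads come from [stride, 2 * stride).
                shared[tid] += shared[tid + stride]
            # Nobody starts the next round until every thread finished this one.
            barrier.wait()
            stride //= 2

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n // 2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared[0]


print(barrier_sum([1, 2, 3, 4, 5, 6, 7, 8]))  # 36
```

Without the barrier, a thread could read a partial sum that another thread has not finished writing, which is the kind of data race the synchronization is meant to prevent.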

Anthony Alford: So I have a question, when you get to a stopping point for questions.

Roland Meertens: Yes, go ahead.

Anthony Alford: So what are some general principles for helping someone figure out: is this a good fit for CUDA or is it not? We've talked about something that's inherently serial, but are there some others?

Roland Meertens: Well, the one thing I noticed when I was doing this was that there are way more algorithms than you can think of which you can calculate in parallel. However, you also quickly start thinking more about the device itself, and you have to get a bit of a feeling for how a GPU works and how everything is laid out.

Anthony Alford: I see.

Roland Meertens: So for example, with synchronization, I just mentioned, and mutexes, sometimes people can come up with algorithms where you do not need to synchronize between different threads, and sometimes people can come up with algorithms where you don't need a mutex.

So there are loads of ways in which you can, in a very nice way, make things faster. Also, some CUDA programmers are just insanely good. They know exactly the layout of their device and they know exactly what kind of hacks they can use to do things faster.

Anthony Alford: Gotcha.

Roland Meertens: So yes, in that sense, just starting to think about it helps a lot. There's often programs where you can do more parallel than you might expect.

Anthony Alford: And what we've seen in the deep learning world is of course frameworks like TensorFlow or PyTorch where you adopt another layer of abstraction and think about your problems in those terms. And then it's up to the framework developers to figure out how to map that stuff to CUDA to run efficiently.

Roland Meertens: Yeah, even inside CUDA, you quickly start to see certain patterns like the one with the binary tree I just mentioned. That's something which you will start seeing very often, or scattering operation, or putting things at a specific place in an array so you can later get it back again. Those are things you will start seeing quite quickly.

Anthony Alford: Interesting.

Roland Meertens: And also talking about libraries, so there are of course the very high level libraries for machine learning like PyTorch, which can do a lot of things, and has a lot of the same functions as NumPy.

Anthony Alford: Mm-hmm. Yes.

Roland Meertens: But then you can run it on a GPU. So that's already a cool way to speed up your programs. The way you program CUDA is in a bit of their C++ or C variation, which can be quite hard to read and understand, to be honest. It's also hard to debug because it's difficult to figure out what went wrong and for what reasons when you're suddenly debugging a thousand processes at the same time.

Because in normal linear [sequential] programming, you just have breakpoints and you figure out where exactly in your for-loop it goes wrong, but you can't do that with CUDA. Oh, and CUDA has a lot of libraries which you can start using.

So there is the cuBLAS, which is relatively well-known, the Basic Linear Algebra Subprograms library. There's a cuRAND for random number generation, which is used a lot. There's a graph analytics library. There's cuFFT, their Fast Fourier Transform library we were just talking about. So yeah, loads of things like that.

Anthony Alford: Very cool. I have to admit, I love the FFT. Definitely one of the great tools.

Roland Meertens: It is, maybe we should have a whole episode dedicated to Fast Fourier Transforms.

Anthony Alford: I mean, that's not a bad idea.

Roland Meertens: Definitely not. Anyways, what's also one thing I found interesting while doing this research is how does it work in terms of GPU cores? Because an old GTX 1080 Ti has 3,584 cores. So you basically have this amount of processors available to you if you have a 1080 Ti. I have an RTX 3060 Ti and I have 4,864 CUDA cores. So that's pretty nice for my gaming PC. If you now buy a new GPU, you can buy an RTX 4090 and that has 16,384 cores.

Anthony Alford: So let's talk about that then, if you have a second.

Roland Meertens: Yes.

Anthony Alford: How common is it for speedup to happen? When you have more cores, you go faster, you know what I'm saying? So for example, are most video games, say, designed in a way that if you get a better graphics card, you get a card with more cores, your game becomes better?

Roland Meertens: Yes, I was also thinking about that. So also to make it more complicated for me is that this RTX 4090 with the 16,000 cores is 1,500 pounds [sterling], and this is a consumer graphics card, right? So 1,500 pounds, loads of money, but still you could maybe afford it. If you go to the enterprise cards, an A100 has 6,912 cores and is 7,100 pounds.

Anthony Alford: It's a pound a core or more.

Roland Meertens: Yes, I didn't reason about it that way, but yeah, you're right. You are indeed paying one pound a core.

Anthony Alford: Approximately.

Roland Meertens: Yeah, this does go up again when you're talking about Hopper, the H100, the newest generation, because it has 16,896 cores and costs 32,000 pounds.

Anthony Alford: Oh, so it's super linear.

Roland Meertens: Yes, it's the same as with the stock price of NVIDIA. It goes up very quickly.

Anthony Alford: Yes. Interesting. Okay.
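The price-per-core arithmetic can be worked out from the figures quoted in the conversation. The Python below simply divides the quoted prices by the quoted core counts; the numbers are taken as stated in the episode and are not checked against current prices.

```python
# (CUDA cores, price in pounds) as quoted in the conversation
cards = {
    "RTX 4090": (16384, 1500),
    "A100": (6912, 7100),
    "H100": (16896, 32000),
}

pounds_per_core = {name: price / cores for name, (cores, price) in cards.items()}

for name, value in pounds_per_core.items():
    print(f"{name}: {value:.2f} pounds per core")
# RTX 4090: 0.09 pounds per core
# A100: 1.03 pounds per core
# H100: 1.89 pounds per core
```

On these figures, the "pound a core" estimate matches the A100, the consumer card comes out at roughly a tenth of that, and the H100 at nearly double.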

Roland Meertens: But what I was then thinking is if you can get 16,000 or 17,000 cores on a consumer card for 1,500 pounds or for 32,000 pounds, what, except for the license agreement with NVIDIA, stops you from using one over the other? And this is all going back to your question about, do I need more cores to be faster?

And it is that the memory you operate on, on your GPU, is different than your RAM memory. So you have your GPU internal memory and you have to allocate your own memory on the GPU, send the data from your computer to your GPU and retrieve it when you need it again. Maybe you want to communicate between GPUs to share data between GPUs, and it seems that getting data in the right place at the right moment is a larger bottleneck than just sheer computation.

Anthony Alford: I see. So when in parallel computing you talk about speedup, that's an actual metric. At some point it saturates because you've got something that, in your algorithm, that's not completely parallelizable. So it sounds like moving the data in and out of memory is one of those bottlenecks.
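The saturation described here is commonly expressed with Amdahl's law: if a fraction p of the work can be parallelized over n cores, the speedup is 1 / ((1 - p) + p / n). A short Python sketch follows; the 95 percent parallel fraction is a made-up figure for illustration only, while the core counts are the ones quoted earlier in the conversation.

```python
def amdahl_speedup(parallel_fraction, cores):
    # The serial part is unaffected by the number of cores.
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workload in which 95% of the run time is parallelizable.
for cores in (1, 3584, 4864, 16384):
    print(cores, round(amdahl_speedup(0.95, cores), 2))
```

With a 5 percent serial portion the speedup can never exceed 20, no matter how many cores are added, which is why time spent moving data can dominate.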

Roland Meertens: Yes, indeed, indeed. So yeah, I think that NVIDIA keeps stretching it and keeps coming up with faster cards, which is really cool to see, but it doesn't always mean that it's faster. And this is also something I might want to say to machine learning engineers who are listening to this, also maybe take a look at, what are you doing with your GPU? Are you using it efficiently? Are you using all the cores? Are you moving data in and out of it fast enough? Yeah, take some time to learn your tools.

Anthony Alford: I would say that might be a whole podcast because effectively we're talking about starvation of the cores, I guess. You've got cores sitting there with nothing to do because of that bottleneck of memory.

Roland Meertens: Yeah, you can see it. So there's this nvidia-smi command in your terminal you can type in, and then you can see your GPU utilization and you can get that to almost a hundred percent if you have your data ready to be computed at all times.
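For readers who want to log utilization rather than watch it, here is a hedged Python sketch around the `nvidia-smi` query mode. It assumes the tool is on the PATH and that every queried field comes back as a number; the helper names `parse_utilization` and `read_gpu_utilization` are hypothetical.

```python
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_utilization(csv_text):
    # One line per GPU, for example: "97, 10240, 24576"
    rows = []
    for line in csv_text.strip().splitlines():
        util, used, total = (float(field) for field in line.split(","))
        rows.append({
            "gpu_util_percent": util,
            "memory_used_mib": used,
            "memory_total_mib": total,
        })
    return rows


def read_gpu_utilization():
    # Returns an empty list on machines without nvidia-smi (assumed behaviour
    # chosen for this sketch).
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_utilization(result.stdout)


print(parse_utilization("97, 10240, 24576\n"))
```

Calling `read_gpu_utilization()` periodically during training gives a rough view of whether the GPU is being kept busy or is waiting for data.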

But sometimes you see that people are having a bottleneck in their CPU and thus, they don't have enough data in their GPU to compute things. So it's kind of like putting a very fast horse in front of a, I don't know, a brick or something, something which doesn't move at all. Yeah, you can have the fastest horse, but if you're not using it efficiently, it's not working.

Also, just still coming back to your one question you asked me five minutes ago. The one other thing to keep in mind is that later GPUs also have, so-called, tensor cores and they can do multiple multiply and add operations at the same time. So I think from the top of my head, they can multiply two-by-two matrices, they can multiply and add them together in one clock cycle or something insane like that.

Anthony Alford: Nice.

Roland Meertens: So if you can manage to write something in a way that these tensor cores are used, then you can go even faster because then you don't need to retrieve all your data, multiply it, add it, and then put it somewhere back.

Anthony Alford: I mentioned those digital signal processor chips from long ago, most of them had a single instruction to multiply two numbers and add them to a running total. It's called a multiply-accumulate.

Roland Meertens: Oh, nice.

Anthony Alford: I'm sure the GPUs have that too.

Roland Meertens: That sounds like something which should be there, but it shouldn't run then at the same time, should it?

Anthony Alford: Well, no. It's just a single instruction to do what we would think of as two things, multiply two numbers together and add them to a third number.

Roland Meertens: Oh, interesting.

Anthony Alford: So the use case of that is to calculate the dot product of two vectors.
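The multiply-accumulate pattern can be written out in a few lines. In this illustrative Python sketch, `multiply_accumulate` is a hypothetical stand-in for the single hardware instruction, and the loop shows how a dot product is just that operation repeated over two vectors.

```python
def multiply_accumulate(acc, a, b):
    # Conceptually one instruction on a DSP: multiply two numbers and add the
    # product to a running total.
    return acc + a * b


def dot_product(xs, ys):
    acc = 0
    for x, y in zip(xs, ys):
        acc = multiply_accumulate(acc, x, y)
    return acc


print(dot_product([1, 2, 3], [4, 5, 6]))  # 32
```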

Roland Meertens: Yeah. Yeah, yeah. That sounds pretty useful. Anyways, just going back to the conclusion then, why is it sold out? And I think that when Bitcoin became slightly popular in 2015, cards have already been hard to acquire all the time because people figured out that you can apparently, very efficiently, calculate hashes on it. And that's, I guess, parts of hash computation don't rely on previous parts of hash computation.

Anthony Alford: Yes, and this is where I've done some of my research for my segment.

Roland Meertens: Okay, perfect. Yeah, just reminding that it became pure insanity during the pandemic when everyone started buying them, both for gaming, and Bitcoin mining, and neural networks. And I think everybody right now is both training and running inference on GPUs, simply because you can do matrix multiplication very efficiently and can do so, especially for images when you need to apply them on each pixel.

Anthony Alford: Yep.

QCon London [23:46]

Hey, it is Roland Meertens here. I wanted to tell you about QCon London 2024. It is QCon's flagship international software development conference that takes place in the heart of London, next April 8 till 10. I will be there learning about senior practitioners' experiences and exploring their points of view on emerging trends and best practices across topics like software architecture, generative AI, platform engineering, observability and secure software supply chains. Discover what your peers have learned, explore the techniques they are using and learn about all the pitfalls to avoid. Learn more at qconlondon.com, and we really hope to see you there. Please say, "Hi," to me when you are.

GPUs and Hot Sauce [24:40]

Roland Meertens: So yeah, but you say you have a segment on why they were sold out all the time. Tell me more.

Anthony Alford: That's right, yeah. Well, I'm going to start with a quiz for you.

Roland Meertens: Okay.

Anthony Alford: How is a GPU like Sriracha hot sauce?

Roland Meertens: How is a GPU like Sriracha hot sauce? Both CEOs look hot all the time.

Anthony Alford: Pretty much. So I asked ChatGPT for these and ChatGPT gave me two, somewhat okay, jokes. One is, “They're both red-hot commodities right now and you can never seem to find them when you need them.”

Roland Meertens: That's a pretty good one.

Anthony Alford: And the other one was, “They both make you break a sweat just trying to find them, but once you get a taste, you're hooked.” So again, if those are lame, that's ChatGPT. I also asked ChatGPT, what does GPU stand for? And the answer is “Gone, Permanently Unavailable.”

Roland Meertens: Oh.

Anthony Alford: Which I think is a little bit better.

Roland Meertens: That's a pretty good joke.

Anthony Alford: Yeah. So I'm going to let ChatGPT do all my material from now on, but it's true that you mentioned GPUs are sold out. Sriracha hot sauce, you may have heard, is also sold out, at least here in the US.

Roland Meertens: I didn't know that.

Anthony Alford: Yeah, at least here in the USA, there's a brand of hot sauce called Sriracha, and there's been an ongoing shortage of it for about three years or so. The shortage was so bad this summer that scalpers were selling the bottles for more than $50. People would buy them wherever they could, they would go on Craigslist or eBay and sell them for $50.

Roland Meertens: That's insane. That's absolutely insane.

Anthony Alford: That is pretty nuts. And you cannot even do a convolution with it. So the question might be, what's going on? And the short answer is, supply chain. And we all learned to say, "Supply chain," and blame supply chain back in 2020 when you may…I don't know if you had a problem finding toilet paper and other daily essentials, but we started seeing some scarcities here in the supermarkets.

Roland Meertens: And that's when the Sriracha shortage started.

Anthony Alford: Well, it actually started... The roots of it go further back than that. So the problem with the hot sauce is a shortage of the peppers that they use to make the sauce. According to my research, they usually use about 50,000 tons of these peppers each year, and lately they've been having trouble sourcing them. One reason is, the company back in 2017... Huy Fong, I think, is the name of the company. But anyway, according to my research, back in 2017, they ended their 28-year relationship with their pepper supplier.

Roland Meertens: Oh.

Anthony Alford: So they had to find a new source for peppers.

Roland Meertens: In hindsight, that's not a very good idea, is it?

Supply and Demand [27:20]

Anthony Alford: Especially since that supplier says they could have kept up with demand, but instead the new suppliers are having trouble keeping up with demand because of drought, and so there's a general shortage. And because they don't have enough peppers, they can't produce enough supply. In the meantime, turns out the demand for hot sauce is static. It doesn't change much, and so if you've learned nothing else from economics, what are the two words you remember from economics?

Roland Meertens: Oh man, supply and demand.

Anthony Alford: Supply and demand. That's it, right? All of economics is supply, demand. So if supply does not meet demand, you either get shortages, or price increases, or both. So does that sound like GPUs?

Roland Meertens: That sounds exactly like GPUs.

Anthony Alford: Yes, and again, this is nothing new. You mentioned all the way back to 2015, 2016, there were some supply chain issues. There were limited yields on the chip production, so there were not enough GPUs to meet demand. At the same time, demand was rising because of cryptocurrency mining.

And as you mentioned, those computations, they would run faster on a GPU than on a CPU. So less supply, more demand, shortages. And when there's shortages, you have prices going up, either from the actual seller of these things or people scalping them, buying them up and reselling them for a markup.

Roland Meertens: For the crypto parts, did you know that the cheaper gaming cards now have some protection built in to prevent you from mining cryptocurrency?

Anthony Alford: I have heard that, yeah. I don't know how they do it.

Roland Meertens: I also don't know how they do it, but I think it's insane that they are limiting what their cards can do to just get the cards in the hands of the gamers instead of scalpers.

Anthony Alford: And back when I started looking at this, again, a long time ago, probably 2017 or 2018, Bitcoin mining had gotten to the point where you really couldn't do it very efficiently even on a GPU. Everybody that was seriously mining Bitcoin was using custom hardware to do it.

Roland Meertens: Yeah.

Anthony Alford: I think some of the other coins like Ethereum, maybe it's still viable. So that's still driving some of the demand.

Roland Meertens: Yeah. I'm also surprised that this was still a thing in the last couple of years because I thought we were over this and people had come up with faster hardware, but no.

Anthony Alford: No, it's never going to end. So anyway, so that was 2015, '16, '17. Fast-forward to 2020 and 2021, and things just got even worse. So the supply chain became just completely bonkers. When we have things like pandemics and then container ships blocking global trade route choke points, talk about bottlenecks.

Roland Meertens: Haha.

Anthony Alford: I'm sure a lot of people remember that, the boat parked sideways in the Suez Canal.

Roland Meertens: What a time was that, yes.

Anthony Alford: We're going to have a great time looking back at telling the young people in 30 or 40 years that “You kids today, you just don't know how good you have it.” I hope, I hope we'll be able to do that.

Roland Meertens: You should have been there when the boat blocked the canal.

Anthony Alford: Oh man, but anyway, at the same time that supply was messed up, as you said, demand was at an all time high because now everyone needed, maybe not a high-end gaming rig or mining rig, but a reasonably good Zoom or Webex video chat rig.

And so the PC market grew by 13% in 2020, which I think was the highest since, I forget when, but it's pretty high historically. Now, things did seem to get better around 2022, prices were coming down, GPU cards were more available. But then that's when the AI took off. So in the second half of that year, there were some pretty spectacular AI models released, which you and all our listeners already know because they read InfoQ news.

Roland Meertens: Of course.

Anthony Alford: So for those who it's their first time, talking about things like Stable Diffusion, the image creation model, and then ChatGPT. And as you mentioned, training these models and running the inference, that's done on GPUs for the reasons that it's parallel and it's fast. So now, demand has shot up and it's stayed high, and that means, again, shortages. Even if there were not supply problems, the demand is causing that.

Roland Meertens: I find it so interesting to see that there is such a massive shortage and I have a feeling that in my daily life, I don't see, besides me purposely finding them because I love playing with AI stuff, but I don't have the feeling it's being used that much right now. Or at least there's such a long way to go and we're already seeing shortages, it's insane.

Anthony Alford: Well, and not only the consumers trying to find graphics cards for their gaming rigs, we talked about the big cloud players are struggling to find GPUs. Even Google, who has their own custom AI accelerator, the TPUs, the Tensor Processing Unit, they're having a hard time.

Roland Meertens: And the chip shortage itself seems to be relatively over, right?

The Beer Game - Not What You Think [32:31]

Anthony Alford: Yes. We'll get into that. So the problem is, now there's time lag, right? Everything has time lag in the supply chain, and that's actually one of the sources of the problems. So I'm probably not the best person to provide insights to this problem, right? People come and listen to this for the amazing insights on the podcast, but you might think an economist would be the best person, but it turns out that one of the pioneers in studying supply chains was actually an electrical engineer.

So this is a man named Jay Forrester. He got a bachelor's degree in electrical engineering from MIT in 1939. He got a master's in 1945. He was a pioneer in computing technology. He co-invented the magnetic core memory. You familiar with that, Roland?

Roland Meertens: I have never used a magnetic core memory. I didn't know this was a thing.

Anthony Alford: Okay. So I've never used one either. I'm not that old, but apparently back in the day, for RAM, computers would use something called magnetic core memory. It was an array of tiny little magnets and you could orient them one way for a one and the other way for a zero. And so basically you're using the magnetic field orientation to store a one or a zero.

Obviously kind of big, and heavy, and slow because it's not completely mechanical, but once we got RAM on microchips, it kind of went away. So he also worked on one of the first animations in the history of computer graphics. His team did an animation of a bouncing ball using an oscilloscope.

Roland Meertens: Oh, interesting. That must be a super weird mathematical equation, right?

Anthony Alford: Yeah, I'm not exactly sure. If you think about, that was the early displays. Vector graphics I think is essentially an oscilloscope of some kind. Back in the old, old days, like the old Star Wars games, I don't know if you ever played one of those.

Roland Meertens: Yeah, the Star Wars game is insane. It is so cool to see. I love it.

Anthony Alford: I put a lot of quarters into that, back in 1985, let me tell you.

Roland Meertens: Yeah, it's such a good game. It just plays so well.

Anthony Alford: So this guy Forrester, during the 1950s and 1960s, he became a professor at the MIT Sloan School of Management, and he was actually a founder of the discipline of system dynamics where he's studying things like supply chains, modeling them as dynamic systems. Some of his research, he wrote a book, which is said to have influenced the video game SimCity.

Roland Meertens: Oh, must have been a pretty good book.

Anthony Alford: Yeah, so again, he didn't just do one thing. So back to supply chains. In 1960, he was a professor in college. So he invented a tabletop game called the Beer Distribution Game, or simply “The Beer Game,” which was real popular with college students until they found out that you don't actually drink beer in this game.

Roland Meertens: Okay, so it's a board game?

Anthony Alford: Yes. So instead, the players take on the role of a business in a beer supply chain. It's a very simple four-step supply chain. You've got a brewer, distributor, wholesaler, and retailer.

Roland Meertens: Okay.

Anthony Alford: So each turn represents a week, and in every turn, the retailer gets a random number indicating the amount of beer demanded by consumers. So I think in the original iteration of the game, it was a number from zero, one or two, and this is, say, cases of beer. Each player receives order requests from downstream.

So the retailer sends his order to the wholesaler, and then each player also fulfills orders back downstream out of their inventory. So the retailer, he wants two cases of beer from the wholesaler, and then the wholesaler has to send the beer to the retailer. And then when you get an order in, you add it to your inventory and then you'll fulfill your orders out of your inventory. However, it takes two weeks for an order request to go upstream and another two weeks for it to go back downstream.

Roland Meertens: In real life or in this board game?

Anthony Alford: In the board game.

Roland Meertens: Okay.

Anthony Alford: In the board game. So the retailer, first turn, he has demand for two cases of beer. So he has some on hand. He fulfills that. Then he also has to ask the wholesaler for some beer. The wholesaler doesn't know about that until two turns later.

Roland Meertens: Oh, okay.

Anthony Alford: And then the retailer doesn't get that back, what he's asked for, until two weeks later. So it's basically a month lag between what you ask for and what you get. So one more thing is, you incur a cost every turn for the amount of inventory on hand and an even higher cost if you cannot fulfill a request, so if you have to back order.
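The turn described above can be sketched in code. This is a minimal illustration of one player's week, not the official rules of the game. The class name `Stage`, the starting inventory of 4 cases, and the cost values (0.5 per case held, 1.0 per case on back order) are assumptions; the conversation only says that back orders cost more than inventory. Only the two-week shipping delay is modeled here; the two-week order delay would be a second queue of the same shape.

```python
from collections import deque
from dataclasses import dataclass, field

HOLDING_COST = 0.5    # assumed cost per case in inventory per week
BACKORDER_COST = 1.0  # assumed (higher) cost per case on back order per week


@dataclass
class Stage:
    """One player in the chain (hypothetical helper, e.g. the retailer)."""
    inventory: int = 4   # assumed starting stock
    backorder: int = 0
    # Shipments travelling towards us; the leftmost entry arrives this week.
    inbound: deque = field(default_factory=lambda: deque([0, 0]))

    def play_week(self, demand: int, shipment_dispatched: int) -> float:
        """Receive, ship, and return the cost incurred this week."""
        # 1. The shipment dispatched two weeks ago arrives now;
        #    the one dispatched this week joins the back of the queue.
        self.inventory += self.inbound.popleft()
        self.inbound.append(shipment_dispatched)
        # 2. Fill this week's demand plus any old back orders from stock.
        owed = demand + self.backorder
        shipped = min(owed, self.inventory)
        self.inventory -= shipped
        self.backorder = owed - shipped
        # 3. Pay for stock on hand and (more heavily) for unfilled orders.
        return HOLDING_COST * self.inventory + BACKORDER_COST * self.backorder


retailer = Stage()
weekly_costs = [retailer.play_week(demand=2, shipment_dispatched=0) for _ in range(3)]
```

With four cases on hand and no resupply, three weeks of demand for two cases cost 1.0, 0.0, and 2.0: a holding cost first, then a back order once the shelf is empty.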

Okay, so as it stands, it's a very simple model of the supply chain. And Forrester had students play this over the years, and what inevitably happened is that things just go out of control very quickly. In particular, there is a phenomenon that's now known as the bullwhip effect. The short description of this is that you get small fluctuations in demand at the retailer end, from zero, one or two cases. Those turn into bigger and bigger fluctuations in orders as you go upstream.

Roland Meertens: Yes, because you think, "Oh, I see so many orders, let's order a lot," and then you have too much.

Anthony Alford: Exactly. So in fact, if you ever study control systems, you'll learn that if there's a lag or delay in your control loop, you are very susceptible to oscillations. And in fact, that's what the bullwhip is, it oscillates from nothing to very high and back to nothing. So it turns out it's very difficult for human beings to, well, at first glance, your first time to play this, it's hard for you to understand how you need to play the game.
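The amplification can be shown with a deliberately stylized sketch. It is not the full game: stock-outs and actual shipping delays are ignored, and the lead time only enters through how strongly each stage reacts. The assumptions are that every stage "chases" the latest demand it saw, ordering what it sold plus `lead_time` times the change since last week, and that consumer demand steps once from 4 to 8 cases. The function names and those numbers are illustrative.

```python
def stage_orders(demand_seen, lead_time=2):
    """Orders placed by one stage that overreacts to changes in demand."""
    orders = []
    previous = demand_seen[0]  # assume demand was steady before week 0
    for demand in demand_seen:
        # Replace what was sold, then resize the stock target by
        # lead_time weeks' worth of the change. Orders cannot be negative.
        orders.append(max(0, demand + lead_time * (demand - previous)))
        previous = demand
    return orders


def simulate_chain(consumer_demand,
                   stages=("retailer", "wholesaler", "distributor", "brewer")):
    """Feed each stage's orders upstream as the next stage's demand."""
    result = {}
    seen = consumer_demand
    for name in stages:
        seen = stage_orders(seen)
        result[name] = seen
    return result


chain = simulate_chain([4, 4, 4, 8, 8, 8, 8, 8])
peaks = {name: max(orders) for name, orders in chain.items()}
```

Under these assumptions a single step from 4 to 8 cases produces peak orders of 16, 40, 112, and 328 going upstream, and the brewer's orders swing between 328 and 0 before settling, which is the "nothing to very high and back to nothing" pattern.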

Roland Meertens: I checked the boardgamegeek.com site to see how people rated it, and it was a 7.2, with only five ratings. And someone left a comment, this game feels like it will never end. It was fun at first, but then as time went on, I gradually lost patience. So I think they were whipped by the bullwhip.

Anthony Alford: Yeah, it's probably not intended to be a party game. Instead, it's something for students to do in a classroom setting. But, let's bring this discussion back to AI. So as I was studying up on this beer game, I started thinking, "Well, surely someone's trained a model to play this."

So let me ask you... Well, actually let me first talk about what is the optimal strategy if you're a human player. So this is actually probably what you might consider a cooperative board game. Basically, you're trying to work together to make the supply chain run smoothly. So since this game's been around for a while, the optimal strategy is known. Somebody probably did their PhD dissertation on this. I don't know.

Roland Meertens: Is it a PID controller or something?

Anthony Alford: Well, it's simpler than that. It's called the base-stock policy. The policy is, always order an amount of beer to bring your inventory position, which is your on-hand, minus back orders, plus on-order, bring that to a fixed value. And that's called the base-stock value. So it's apparently very simple to do the optimal strategy. So if you were going to train an AI model to play the beer game, what would you do?

Roland Meertens: In terms of what information would I give it, or what would the strategy be?

Anthony Alford: Like what kind of AI model? What would your model architecture look like, maybe?

Roland Meertens: I guess you would have the inputs and you would have some kind of recurrent network, right, because you have to learn that amount over time?

Anthony Alford: Okay.

Roland Meertens: It's probably going to be some kind of temporal model. Now that we know the optimal strategy, I guess we can just compute it every turn.

Anthony Alford: So, right, if you're not going to learn it, right? If you're just going to program an agent to do it, yeah, you might program it to just play the optimal strategy. But if you're going to learn it, turns out for a lot of games, people use reinforcement learning.

So basically the model plays games over and over. So someone did do that, at least one research group that I found, a group at Lehigh University used reinforcement learning with a Deep Q-Network and it learned the optimal policy.

Roland Meertens: Oh, okay. So it actually learns the actual optimal policy. Oh, nice.

Anthony Alford: And it turns out that it even learned how to play the game pretty well when the other players are humans who are not following the optimal policy. It works best when everybody follows the optimal policy, but not everybody knows it or plays that. So the reinforcement learning did pretty well.

Roland Meertens: By also learning from humans to play optimally.

Anthony Alford: Yes, exactly. Well, typically, reinforcement learning, there's that whole trade-off between exploration and exploitation. So it'll try just random stuff at first, but over time, after playing multiple rounds of the game, multiple games, it tries to do better. So you have a reward function built in, which is the cost, right? Every turn you incur a cost, so you want the opposite of that to be the reward function.
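The two ideas in that answer, a reward that is the negative of the cost and a gradual shift from exploration to exploitation, can be sketched as follows. This is a generic epsilon-greedy illustration, not the Lehigh group's code; the function names and the decay schedule (start at 1.0, multiply by 0.99 per game, floor at 0.05) are assumptions.

```python
import random


def reward(holding_cost, backorder_cost):
    """The agent is paid the opposite of what the turn cost."""
    return -(holding_cost + backorder_cost)


def epsilon_greedy(q_values, epsilon, rng):
    """Pick an action index: random with probability epsilon, else the best known."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: try something random
    # exploit: choose the action with the highest estimated value
    return max(range(len(q_values)), key=q_values.__getitem__)


def decayed_epsilon(game_number, start=1.0, end=0.05, decay=0.99):
    """Explore a lot in early games, less and less later (assumed schedule)."""
    return max(end, start * decay ** game_number)


rng = random.Random(0)
# Hypothetical value estimates for ordering 0, 1, or 2 cases.
estimates = [-3.0, -1.5, -2.0]
late_choice = epsilon_greedy(estimates, epsilon=0.0, rng=rng)
```

With `epsilon=0.0` the agent always picks the action with the least negative estimate, here index 1; early in training, `decayed_epsilon(0)` is 1.0, so every choice is random.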

Roland Meertens: The poor humans playing this board game over and over again just to make the AI learn it optimally.

Anthony Alford: Yeah. So I think actually they programmed fake humans to follow basically just an irrational strategy. But the code is available on GitHub.

Roland Meertens: Nice.

Anthony Alford: I don't know if we have show notes or anything like that.

Roland Meertens: We hereby have show notes and we can put it in there.

Anthony Alford: Yeah. So does that mean that we should put AI in charge of our supply chains?

Roland Meertens: Given that it's still learning, that sounds like a bad idea, but if it can learn from all the past experiences, I guess it can work out.

Anthony Alford: Well, in fact, there's an AI doom scenario that people talk about called paperclip maximization.

Roland Meertens: Yeah.

Anthony Alford: You've got an AI whose job is to maximize production of paperclips, and that leads to unintended consequences like, well, if there were no human beings, I could turn the human beings into paperclips or something completely disastrous. So...

Roland Meertens: We can use all the land available on the Earth to make paperclips.

Anthony Alford: Yeah, exactly. Use all the oxygen in the air, something like that.

Roland Meertens: Yeah.

Anthony Alford: So anyway, that was what I learned about supply chains, and AI, and GPUs, and hot sauce, and beer. So I guess to tie it all together, you should have game night using your GPU, have some beer and hot wings.

Roland Meertens: That sounds pretty good. I will add one more fun story about predicting supply and that is that I knew someone who was working at a company which would have a daily demand for products. And they trained a model on one country, like let's say Germany, and then they also deployed that on another country, let's say the Netherlands.

But the Netherlands and Germany, they have a couple of different days, like maybe in the Netherlands you have King's Day where we celebrate King's Day, and then of course the month is going to be way higher or way lower than predicted. So basically, they had to work on the other end of this model, where the model was just wrong half of the time, and they had to try to fix it.

Anthony Alford: Oh, no.

Roland Meertens: So in terms of predicting, that's the moment when I learned that when you're predicting things, you should really learn your local patterns instead of your global patterns.
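One common way to give a demand model its local patterns is a per-country holiday feature. The sketch below is only an illustration of that idea, not what the company in the story did. The calendars are assumed and deliberately incomplete (one fixed-date holiday per country), and `is_local_holiday` is a hypothetical helper; a real system would use a maintained holiday source.

```python
from datetime import date

# Assumed, incomplete calendars keyed by country code: (month, day) pairs.
LOCAL_HOLIDAYS = {
    "NL": {(4, 27)},  # King's Day
    "DE": {(10, 3)},  # German Unity Day
}


def is_local_holiday(country, day):
    """True if `day` is a listed holiday in `country` (unknown countries: False)."""
    return (day.month, day.day) in LOCAL_HOLIDAYS.get(country, set())


kings_day = date(2023, 4, 27)
features = {country: is_local_holiday(country, kings_day) for country in ("NL", "DE")}
```

The same date yields a different feature value per country, which is the signal a model trained only on German data would never have seen for King's Day.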

Anthony Alford: Definitely true. Well, it's interesting, I don't know if I want to save this for later, but to me it was interesting that the optimal strategy for this game is very simple.

You just try to hit a target and in fact, you don't really need to look back or forward very much. You just need to keep track of what you have on hand, and what you've got on back order, and what you've got ordered. So you basically just need to watch three numbers.
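The three-number rule is short enough to write down directly. This sketch follows the base-stock policy as described in the conversation: inventory position is on-hand minus back orders plus on-order, and each order tops that position up to a fixed target. The function names and the example target of 10 cases are assumptions, and clamping at zero reflects that a player cannot place a negative order.

```python
def inventory_position(on_hand, back_orders, on_order):
    """What you effectively have: stock, minus what you owe, plus what is coming."""
    return on_hand - back_orders + on_order


def base_stock_order(on_hand, back_orders, on_order, base_stock):
    """Order just enough to bring the inventory position up to the target."""
    gap = base_stock - inventory_position(on_hand, back_orders, on_order)
    return max(0, gap)  # never order a negative amount


# 3 cases on hand, 1 owed, 4 already ordered: position is 6, so order 4 more.
order = base_stock_order(on_hand=3, back_orders=1, on_order=4, base_stock=10)
```

Counting what is already on order is what keeps a player from ordering the same shortfall again every week while the first order is still in transit.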

Conclusion [43:39]

Roland Meertens: That sounds easier than expected. So that brings us to words of wisdom. What did we learn in this podcast? Anthony, what did you learn in this podcast? And did you learn anything yourself recently, which you want to share with the listeners?

Anthony Alford: Again, I think it was interesting to me that the optimal strategy for the beer game was so simple.

Roland Meertens: I was also, in terms of podcast, surprised about that, that there is such a simple strategy. I also really liked the analogy of supply chains to control algorithms, where I can definitely imagine it's very hard to predict how much you need of a product if you have people anxiously shouting that they want it. But you don't know what percentage of customers is doing that.

Anthony Alford: Yeah, and that's something I didn't even get into, is that people act like they want it, but then maybe they don't actually, or they buy it and they return it, or those kinds of things bring all kinds of complications to that simple model.

Roland Meertens: I think the best example of this, which I experienced in real life was fidget spinners, where from one day to the other, fidget spinners were the hot commodity. In Mall of America, they had them in every shop, and two weeks later, people were like, "Well, we all have a fidget spinner now, so we don't need one." And that's when the back orders started to come in from all the stores who were like, "Oh, let's get into this new hype."

Anthony Alford: Oh boy.

Roland Meertens: And it was a very short-lived hype, and then afterwards, all the shops just had masses. They gave them away for free. It's crazy.

Anthony Alford: Kind of like paperclips.

Roland Meertens: Kind of like paperclips, yes. One day you just make fidget spinners. Anything you learned in your private life, any interesting insights, fun facts, fun prompts you found?

Anthony Alford: No. I did meet with some folks at my job that I had not talked to much before and we bonded over talking about state machines, so that was a lot of fun.

Roland Meertens: Nice. From my side, in terms of things I learned is that, one, chicken is vegetarian. So I had a birthday dinner yesterday and I asked ChatGPT to make me a menu which was vegetarian. So it proposed to me to start with a tomato soup, then have some roast vegetables, have some chicken and end with a pie. Yeah, had an amazing vegetarian dinner and made an amazing chicken. I guess we're learning again that you always have to check your outputs of ChatGPT.

Anthony Alford: I don't even have anything funny to say in response to that. So astonishing.

Roland Meertens: I thought it was very weird because it even ended with, "Hey, look, I made you this amazing menu which is suitable for all people vegetarian," or something, like, bro, you just-

Anthony Alford: At least it wasn't poison though.

Roland Meertens: No, indeed. Indeed. I know we talked about this in a previous podcast, so I don't even know why I was asking ChatGPT for recipes, but I thought it would be fun to try to do this.

The other thing I have is an appeal to Microsoft. So I just realized this. I have this super weird shaped keyboard, the Microsoft Sculpt, and it is the best keyboard ever made, especially if you have RSI because it has a nice gentle curve and it is very small. So I want a small, tiny bit curved keyboard. And last week I had two of them at home, like two sets because they come as a set of a mouse and a keyboard. And I had two of them at home and both broke in the same week.

And here comes the thing I learned, Microsoft stopped producing them in September. So I guess I'm royally screwed, and I just received one from Italy. So I now have a keyboard with an Italian layout. So if anyone at Microsoft is listening, please bring the Microsoft Sculpt back, it's the best keyboard ever.

Anthony Alford: I was going to recommend you buy as many of them as you can find, but maybe not.

Roland Meertens: Yes, I found one, but I was also thinking, "How many keyboards do I want to buy right now?" Because I really like it and I tried many keyboards and didn't like any other one actually. Yeah, if listeners have recommendations, please, please add me to LinkedIn and please send me your recommendation. I'm eager to try anything.

Okay, anyways, I guess that brings us to the end of the podcast. Please like this podcast on whatever platform you're listening to and tell your friends about it. The best way to get to know podcasts is by telling your friends, word of mouth. I don't know if you have any podcast recommendation engines, Anthony. I don't. I literally rely on people telling me what podcasts they like.

Anthony Alford: I'm the same. Yep.

Roland Meertens: Yes, it's so weird that for videos, YouTube recommends you content. For podcasts, nothing. It's really word of mouth. So if you have friends who are commuting a long time to work, sitting for about an hour in a car, then this podcast is perfect because it is, by now, about an hour.

Anthony Alford: 1.5x. So whatever that works out to, 40 minutes.

Roland Meertens: Yes, but whenever I listen to this back and I hear my own voice, I'm like, "Man, this is way too fast." So sorry about that. I'm already talking at 1.5x.

Anthony Alford: Oh dear.

Roland Meertens: Anyways, so yeah, this was Generally AI, an InfoQ podcast. Thank you very much for listening and looking forward to the next episode.

Anthony Alford: Pleasure as always.

Roland Meertens: Thank you, Anthony.

Anthony Alford: Cheers.

About the Authors

Anthony Alford

Roland Meertens

More about our podcasts

You can keep up-to-date with the podcasts via our RSS Feed, and they are available via SoundCloud, Apple Podcasts, Spotify, Overcast and YouTube. From this page you also have access to our recorded show notes. They all have clickable links that will take you directly to that part of the audio.