Real-world Architecture Panel

Summary

The panelists discuss the unique challenges and opportunities in software / hardware architectures that interact with the physical world, with particular emphasis on data flow, control, and machine learning.

Bio

Randy Shoup is currently VP Engineering at WeWork in San Francisco. David Banyard is Senior Construction Technology Disrupter at WeWork. Tessa Lau is currently Founder/CEO at Dusty Robotics. Jeff Williams is Robotics Systems Developer at AddRobots. Colin Breck is Sr. Staff Software Engineer at Tesla.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Shoup: Welcome to the last session of the day in the "Architectures You've Always Wondered About" track. This is a little bit of a divergence, because when we say "Architectures You've Always Wondered About" at QCon, there's usually an implicit bracket in front: "(Software) Architectures You've Always Wondered About," right? The interesting thing I thought, and I'll tell you the inspiration in a moment, was, "What if we got together people that did some software, but also did hardware that interacted with the real world?" There's a lot of hype, both deserved and otherwise, around the Internet of Things and various IoT stuff, and if you haven't done it, as I haven't, you might have this weird idea that controlling physical systems in the real world is some new thing that we're just now inventing. As we will find out, and you won't be surprised to learn, this stuff has actually been done for decades, and some of the people up here have been doing it for decades.

So the way I thought I'd do this is pretty casual. I'll have everybody on the panel introduce themselves. I have a bunch of questions that I've queued up with them, but once we get going, people can feel free to raise their hands and we'll kind of go from there. The one thing I need to say is that I do have to moderate the questions a little bit. Some of us work at places where we're not able to talk about what we're doing in the real world, if that makes any sense. So just be sensitive to that if I have to say, "No, he's not going to be able to answer that." Does that make sense? Colin Breck works at Tesla. Colin, if you wouldn't mind starting us off, maybe you can tell us a little bit about the things you've been doing in your career.

About the Panelists

Breck: So I have a history in industrial automation and control. I worked at a company for over a decade that builds a time series infrastructure that's used widely in industry. In the past three years, I've been working at Tesla on distributed systems for monitoring, aggregation, optimization, and control of distributed energy assets. So that's stationary battery storage, solar, vehicle charging, these kinds of things. And I'm one of these people who needs to be careful about what I say. So just to make it clear, I'm speaking from my personal experience. I'm not speaking on behalf of my employer. So if you tweet about this event, I hope you can respect that.

Banyard: My name is David Banyard. I'm an architect by training. Most recently before joining WeWork, I was the manager of the building of the Warriors arena over in Mission Bay. I have a wide-ranging career that spans from training for my A+ certification at 14 years old to not really doing a lot of programming now. My main job now is applying artificial intelligence, machine learning, robots, and different things in construction. My current title is manager of construction technology, so I'm finding ways to get both software and hardware and actual technology into the building environment.

Shoup: And I just want to point out that he's, like, a real architect.

Lau: Hard to follow that. My name is Tessa Lau. Up until about seven months ago, I was chief robot whisperer and CTO at a little startup called Savioke. If you haven't seen it yet, Savioke Relay is a hotel delivery robot. So you can imagine if you're staying in a hotel room and you want to order room service late at night, a little R2D2-sized robot trundles up to your door and delivers it to you. So we got about 100 robots out into the world, but then I left and decided to start a new thing. And so I'm now founder and CEO of a new company called Dusty Robotics. We are actually going after the construction industry, and trying to build robots that can actually automate some of the building of the buildings that we all live, work and eat in.

Williams: Hi, I'm Jeff Williams. I actually have a day job. I'm an IC at Autodesk. I handle millions of user transactions that are coming in on various things every day. But I have a passion project, which a couple of other people and I have been working on for several years, where our goal is to reduce the cost of manipulating the physical world, probably on the order of about 100X, and to make it significantly simpler. And our goal is that every person in this room will be able to do some industrial-strength manipulation in the world of various kinds. You know, over the next couple of years, that'll just become much, much easier. And it's totally open source. It's not a business. It's just a thing that we want to see in the world.

Democratizing Automation for Everybody

Shoup: Yes, I'm going to start with you for this. I know you don't like it. So again, the problem that you're trying to solve, you could say it, but democratizing automation for everybody. And then we can start to talk a little bit about some of the software and some of the components in the systems, does that make sense?

Williams: Sure. There are kind of two big pieces. Big piece number one is that the automation landscape is really deeply unfair to workers at the moment. And if things continue to play forward as they are, these tens of millions or hundreds of millions of robots will be owned by a very small number of people. Had the workers made the decisions to do things themselves on their own, let's for the sake of argument say add a third arm, the decisions they would make about what they would automate and how they would want this automation to help them would be very different from what, say, Jeff Bezos or Jack Ma might decide to do.

So that's kind of in the socialist camp. But then in the capitalist camp, the robotic business is largely controlled by integrators. And it's not that they're bad companies, it's that the integrator business is kind of hard. And as a function of the cost of sale, it doesn't make a lot of sense for integrators to touch small projects. So, as you know in the past, $10 million, $15 million is kind of the range under which these projects start to make sense, and they take a long time. But there's really no industrial answer for, "I want to do a thing. I'm the only person who cares about it. I have a small budget and maybe a couple of weeks to get it done," which is really common for how we do software. And so those two things are driving this project for me.

Applications for Robotics

Shoup: Tessa, do you want to talk a little bit about some of the things that inspired you to found Dusty Robotics? And I know you're a bit early, but maybe you can talk a little bit about what systems you're thinking of putting together, or that you have done already.

Lau: Sure. I've been building robots for a while. When we started Dusty, I actually was remodeling my house. And I was looking for a new opportunity to get robots out into the world doing something useful. We'd already explored the hospitality industries like hotels and hospitals through Savioke. And when I was looking for inspiration, I actually talked to my general contractor and learned about how construction is done today. And I realized that it's the way it's always been done. The way they're doing it now is, in a lot of cases, the same way the construction has been done for the last 50 or 100 years. And it's also very labor intensive. So it's a great opportunity to get robots out into the world to do useful things.

The challenge is that the expectations are really high. If you talk to someone who doesn't know anything about robotics, they expect that you can build a humanoid robot, and it can do human equivalent tasks. And, “Oh, yes, you can get that done next year.” And that's a really tall order to do for someone who understands the technology. So as we've been developing Dusty, we've been looking for applications for robotics that are doable using today's technology. So we're leveraging our experience building robots that can drive around in human environments and try to find applications where mobile robots like that, that we can build on a reasonable timeframe, can actually do some useful work.

Banyard: I think a lot of people think that since we're interested in using robots in construction, that we're trying to get rid of people's jobs. So I will dispel that right now. I think we are happy with actually employing more people to monitor those robots to make sure that they're doing what they're doing as long as we find ways to do things more consistently and faster. So if we're paying somebody more to do that same job, that's why we're interested very much so in getting robots to do things and finding ways to automate a lot of processes and even monitor things. Because we find that if we can get a very consistent and fast product, we can be cheaper and more efficient in our work. So I think that's why we're very interested in it.

Motivation

Breck: Motivation question?

Shoup: Yes. I was actually thinking we started with software-hardware architecture, but you can go with motivation, and then we'll get there.

Breck: So what motivates me. I mean, the ultimate driver is changing the way people produce and consume electricity. But day-to-day for me, what that looks like is building operational technology. So basic technologies that are controlling physical systems and trying to do this in aggregate, at scale in distributed systems over the internet with unreliable messaging and things that are offline and this kind of thing.

And the large piece for me actually is something I'm fairly passionate about is growing teams that are effective in doing this. A lot of the work I do is kind of the intersection between, say, process engineering, industrial control, and distributed systems. And I find there's a lot of people with a lot of experience in, say, building cloud-native distributed systems really well. And there's a lot of people with a background, say, in industrial automation. But at the intersection of the two, I think there's not a ton of people with that experience. A big part of my job, I would say, is actually growing teams with that intersection of experience.

Shoup: If we could riff on that for a little bit. Maybe you could talk a little bit about some of the components. Actually, if you want to riff, go for it.

Lau: I would love to tell a story about this. One of the challenges in building robots is that it's a multi-disciplinary process. You can't have just a single person who knows how to build all of the different parts that go into a robot. And so, talking about assembling teams, that's one of the things you need to do every time you're building a robotics business. And so I'll tell you a story about one of the problems that we debugged with the Savioke Relay. It's a robot that drives down the hallway, and it's supposed to drive straight. And there's one time we had this robot, and it was actually driving what they call drunk. It was actually wobbling back and forth. And our customers were complaining to us, "Our robot is drunk, what's going on?"

So we sent out a tech to investigate. And we first thought it was the software, right? Obviously, you start at the top of the stack. It's the software. It's giving it the wrong commands. We ruled that out. Then we thought it was the electrical system. You know, maybe there was a loose contact or something. We ruled that out. And then we checked the mechanical systems. And it turned out that there was a loose bolt, and one of our wheels was actually not turning when it should have been turning. And so it took the entire team to debug that all the way down the stack just to figure out, you know, start at a top level behavior and work your way all the way down. So these systems that we're building are getting very, very complex and need a lot of people.

Shoup: I was expecting you to say it was the DNS.

Lau: That too.

Time Series Data

Shoup: So actually where I was going to go was riffing off a little bit the intersection between industrial control and distributed systems, and maybe you could talk briefly about the collection of the time series data that you've been working on, and just motivate that problem a little bit for everybody.

Breck: Yes, maybe building blocks is a good way for me to kind of speak about. I can't speak about any technologies I use in particular, but the building blocks that are important for me in these systems. There are three things. There's state, there's messaging, and there's dealing with all this data in motion. So first of all, in terms of state, you often want to model these physical things, whether they're a battery or a heat exchanger, or a robot. This notion of like a digital twin that lives in software, I think GE coined that term. So it lives in software and represents the current state of a device is really important.

And also, I want state machines for modeling because you want a model that a device is online or offline and maybe have different behaviors in those states, or model workflows of devices coming online, something like this. So for me, that plays out in the actor model programming. I'm a huge fan of actor model programming for this, and using actors as the unit for concurrency, distribution, and modeling state. Big fan of that.
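As an illustration of the state-machine and actor ideas described here, the following is a minimal Python sketch, not anything taken from the talk or from a production system. The `DeviceTwin` class, its message names, and the "drop readings while offline" behavior are all hypothetical. It uses a plain `asyncio.Queue` as the actor's mailbox rather than a dedicated actor framework.

```python
import asyncio
from enum import Enum
from typing import Optional


class DeviceState(Enum):
    OFFLINE = "offline"
    ONLINE = "online"


class DeviceTwin:
    """Hypothetical digital twin: one actor per physical device."""

    def __init__(self, device_id: str) -> None:
        self.device_id = device_id
        self.state = DeviceState.OFFLINE
        self.last_reading: Optional[float] = None
        self.dropped = 0
        self._mailbox: asyncio.Queue = asyncio.Queue()

    async def tell(self, kind: str, payload=None) -> None:
        """Send a message; callers never touch the twin's state directly."""
        await self._mailbox.put((kind, payload))

    async def run(self) -> None:
        """Handle one message at a time, so state changes need no locks."""
        while True:
            kind, payload = await self._mailbox.get()
            if kind == "stop":
                return
            self._handle(kind, payload)

    def _handle(self, kind: str, payload) -> None:
        if kind == "connected":
            self.state = DeviceState.ONLINE
        elif kind == "disconnected":
            self.state = DeviceState.OFFLINE
        elif kind == "reading":
            # Behavior depends on the current state (assumed policy).
            if self.state is DeviceState.ONLINE:
                self.last_reading = payload
            else:
                self.dropped += 1


async def demo() -> DeviceTwin:
    twin = DeviceTwin("battery-1")
    worker = asyncio.create_task(twin.run())
    await twin.tell("reading", 1.0)      # arrives while offline: counted, not stored
    await twin.tell("connected")
    await twin.tell("reading", 48.2)     # accepted while online
    await twin.tell("disconnected")
    await twin.tell("stop")
    await worker
    return twin
```

Running `asyncio.run(demo())` returns a twin that has gone back to the offline state while remembering its last accepted reading. Because each twin processes its mailbox sequentially, the same object serves as the unit of state, concurrency, and (in a real actor system) distribution.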

The next thing is messaging. These systems don't look like CRUD. You have events coming in, and you want to have bi-directional communication, often in near real time. You want to have message queues that have persistence so that you can share messages. You want to have pub/sub systems so you can subscribe to events. So messaging is really, really diverse, and you need a huge ecosystem in your toolkit to make that happen. And then I think the third thing that I feel is a fundamental building block is dealing with the data in motion, because in these systems there's data flying everywhere, and you need to have something to make the system reliable through these dynamics. I'm a big fan of Reactive Streams for this.

So Reactive Streams is a protocol. It's for interoperability of systems and libraries. But ultimately, in building these systems, you want some kind of high-level Reactive Streams library, so you can express your programs in terms of mapping, filtering, throttling, merging streams, and bifurcating streams. That's kind of it. So to me, those are the fundamental pieces. They take many different forms once you actually get down to different technologies, but those are the pieces.
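To make the idea of expressing programs as mapping, filtering, and merging streams concrete, here is a small Python sketch built on async generators. It is not a Reactive Streams implementation; it only imitates the style, and the bounded queue in `merge` stands in for backpressure by making producers wait when the consumer falls behind. All function names and sample values are invented for illustration.

```python
import asyncio


async def source(values):
    """Emit values one at a time, yielding control between items."""
    for value in values:
        yield value
        await asyncio.sleep(0)


async def map_stream(stream, fn):
    async for item in stream:
        yield fn(item)


async def filter_stream(stream, predicate):
    async for item in stream:
        if predicate(item):
            yield item


async def merge(*streams, buffer_size=4):
    """Merge streams through a bounded queue.

    When the queue is full, producers block on put(), which is a
    simple stand-in for backpressure.
    """
    queue = asyncio.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of one input stream

    async def pump(stream):
        async for item in stream:
            await queue.put(item)
        await queue.put(done)

    tasks = [asyncio.create_task(pump(s)) for s in streams]
    remaining = len(tasks)
    while remaining:
        item = await queue.get()
        if item is done:
            remaining -= 1
        else:
            yield item


async def demo():
    volts = map_stream(source([11.9, 12.4, 12.6]), lambda v: ("battery", v))
    hot = filter_stream(source([20, 95, 21]), lambda t: t > 90)
    alerts = map_stream(hot, lambda t: ("overheat", t))
    return [event async for event in merge(volts, alerts)]
```

Calling `asyncio.run(demo())` yields the three battery readings and the single overheat alert in one merged stream. The interleaving order depends on scheduling, so downstream code should not rely on it.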

Leveraging Data and Algorithms

Shoup: David, we talked earlier about leveraging data and algorithms in a bunch of the work that you do. Maybe that's something you could expand on a little bit, or maybe talk about some of the components that you put in place at the Warriors stadium or something like that.

Banyard: Well, in general, one of the ways that we use algorithms in design is parametrically, pushing an idea through a model. So you can take a computational algorithm and actually form your design and architecture from it. In the Warriors arena, you'll see there are little slots all around the face of the actual arena circle. Those slots were actually formed by a computer algorithm that they had determined for, I forget, something else, like some space movie that the owner liked or something, and they just conceptualized that into the building. But then in our actual construction work, we're using algorithms to analyze how workers are in a space and how efficient we're being in a space. And so that's where we're trying to go now: using algorithms that were made for other things in our actual built environments.

Shoup: Just before we go on, we talked about this before, but I'd love for you to say the scale of that work, like how many people, because it's pretty sweet.

Banyard: On the Warriors arena, we had up to about 1,400 people per day. We had to kind of understand where all those people are working and understand where all the logistics of all those people were coming on site. That amount of people produced, on most days, about $2.5 million worth of work every single day. So you're placing $2.5 million worth of work, and that's steel, that's chip, that's studs, that's concrete, that's rebar. I mean, we're putting in a lot of different things. We had to understand where all the logistics and where trucks were coming in of different types, of different sizes. So we had to judge all of that together. And so we did some of that by human brain, but some of it was also by using machine understanding of where things were so that we didn't have major conflicts on our sites.

Lau: So, data. One of the common questions we get about robotics is, "Do you use machine learning inside of a robot?" Everyone asks me that. The truth is, we don't. Not yet. And one of the reasons why is because there's a lot of machine learning capabilities coming online now, TensorFlow, you name it. Google and Amazon are leading the pack in terms of providing those technologies for everyone to use in the cloud. The problem with the robot is that it's a disconnected environment. You can't have a robot driving down the hallway and checking with its cloud server and saying, "Should I stop?" Especially if you're standing right in front of it.

So a lot of the robotics that, I think, are being developed today are making use of edge computing. Before it was called edge computing, that's what we were doing with our robots. We were processing all the sensor data from all the cameras, and lidars, and sonars coming off the robot and trying to make decisions about, "How do I get to where I'm going? How do I plan a path around the things that I'm seeing in my environment so that I can keep it safe and not run into anything that I shouldn't?" And so all of that is currently onboard a robot, although I think one of the interesting frontiers is, how do we push some of that compute out to the cloud and try to take advantage of all the stuff that's happening online these days but still retain the safety, the basic safety, about how to operate that in a partially disconnected environment?
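One way to picture the frontier described above, using the cloud when it is reachable while keeping basic safety on board, is sketched below in Python. This is an assumption-laden illustration, not how Savioke's robots work: the threshold, the timeout, the default speed, and the `cloud_advice` callable are all hypothetical.

```python
import asyncio

STOP_DISTANCE_M = 0.5  # assumed safety threshold, illustrative only


def local_safety_speed(nearest_obstacle_m: float, requested_speed: float) -> float:
    """On-board check that never depends on the network."""
    if nearest_obstacle_m < STOP_DISTANCE_M:
        return 0.0
    return requested_speed


async def choose_speed(nearest_obstacle_m, cloud_advice,
                       default_speed=0.3, timeout_s=0.05):
    """Ask the cloud for a speed, but fall back locally if it is slow or down."""
    try:
        requested = await asyncio.wait_for(cloud_advice(), timeout=timeout_s)
    except (asyncio.TimeoutError, ConnectionError):
        requested = default_speed
    # The local safety check always has the final say.
    return local_safety_speed(nearest_obstacle_m, requested)


async def fast_cloud():
    """Stand-in for a reachable cloud planner."""
    return 0.8


async def slow_cloud():
    """Stand-in for a planner that answers too late."""
    await asyncio.sleep(1.0)
    return 0.8
```

With a clear hallway, `asyncio.run(choose_speed(2.0, fast_cloud))` uses the cloud's suggestion, and `slow_cloud` falls back to the default speed. With an obstacle inside the stop distance, the result is zero no matter what the cloud says.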

Data Components

Shoup: Cool. We're talking about data components in your system?

Williams: Very similar to what you guys have described, the way it starts is the whole thing runs in Kubernetes. The goal is to make it really approachable to the kinds of people who are in this room. So for those of you who aren't familiar with Eclipse Che, it's a cloud IDE. It's essentially a no-install cloud IDE. And so, you sign up, and you get your session, and you get a container, and now your robot, your motion control, is actually running in the cloud. The entire thing is commanded in real time through Google Protocol Buffer-encoded messages that are being sent back and forth, and the larger distribution of those messages goes through Google Firebase. And that was sort of like, "Do we run that in RabbitMQ or do we let Google run it?" Finally, we let Google run it. The performance is actually quite high.

The Edge device in our case is actually Android, and the reason for that is twofold. Well, actually three. Thing one is our sort of key innovation is that the motors are USB 3 Type-C devices with power delivery, so a USB cable delivers both the power and the data to the motor. So that's it. That's your whole bus. And there's a little board that goes on the back of the motor that does all the local drive. But then we incorporate a native version of OpenCV, so all the behaviors of OpenCV are available to the phone for local control, although the motion planning is all done entirely within the cloud. Round trip time from Google on these things is about 50 milliseconds, plus or minus, but we actually schedule the commands so you can get microsecond precision. And then those Google protocol buffer commands are bi-directional, and they're making their way all the way through. And they're reliable, right through all the way down across the board.

So when it gets to the board, those same commands are being spooled out over a serial peripheral bus to reach all the various components that are on the board. And the goal is to be able to use things that are cheap. So the coding is done in the cloud. You could take a broken school Chromebook, and that's your coding. You close it, the program is still running. I'm able to find these USB handsets now, the sort of older versions that do power delivery, you know, used, in the $100-$150 range. And that's sort of like the broad overview of the tech stack at the moment.
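The point above about a roughly 50-millisecond round trip alongside microsecond-precision execution suggests that commands are timestamped ahead of time and executed by the device when its own clock reaches the target. The Python sketch below shows that general pattern only; it is not the project's actual code, and `MotorCommand`, `CommandScheduler`, and the assumption of a device clock synchronized with the sender are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class MotorCommand:
    execute_at_us: int                        # target time on the device clock (microseconds)
    motor_id: int = field(compare=False)
    velocity: float = field(compare=False)


class CommandScheduler:
    """Buffers commands that arrive early and releases them when due."""

    def __init__(self) -> None:
        self._pending = []  # min-heap ordered by execute_at_us

    def receive(self, command: MotorCommand) -> None:
        """Network arrival order does not matter; only the timestamp does."""
        heapq.heappush(self._pending, command)

    def due(self, now_us: int) -> list:
        """Return every command whose execution time has been reached."""
        ready = []
        while self._pending and self._pending[0].execute_at_us <= now_us:
            ready.append(heapq.heappop(self._pending))
        return ready


scheduler = CommandScheduler()
# Commands arrive out of order because of network jitter.
scheduler.receive(MotorCommand(execute_at_us=2_000, motor_id=1, velocity=0.5))
scheduler.receive(MotorCommand(execute_at_us=1_000, motor_id=1, velocity=0.2))
```

As long as a command arrives before its `execute_at_us`, network latency only affects how far ahead the sender has to plan, not when the motor actually moves.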

Shoup: I think maybe I'll open it up for broader questions. I have a bunch of other ones queued up, but maybe you can get some audience participation in here too. So I'll happily walk around if anybody has any questions.

Continuous Deployment?

Participant 1: How can you do deployment when on the other side there is a warehouse full of robots? Do you do continuous deployment still, or it's not possible? I'm asking because it's easy when there is a software on the other side only in the cloud, right? You can always scale it up and down. But what if there is actual production?

Lau: Yes. I'll take this one, and then you guys can chime in if you like. The deployment was actually one of the biggest challenges that we had at Savioke. And it's hard because you are maintaining both the hardware and the software platform at the same time. And your hardware platform, if you're operating fast, and you're a lean startup, and you're continuing to iterate, your hardware platform is constantly changing. We did one analysis at some point of how many different software hardware versions we had in the field, and it was in the hundreds, even though we only had one product on the market nominally, and several dozen instances of that one product. And it's because when things break, you fix them, but it's pretty cost prohibitive to fly someone out to fix it everywhere it exists. So you end up doing things piecemeal. And I'm sure that could be done better, but that's how we did it.

And so, some of the challenges that we had in terms of deployment are, how do you package things so that they can transactionally get upgraded on each new hardware when each new hardware might have a different configuration? You can't test everything adequately in simulation because it's really hard to simulate the real world and all of its complexity, so you can't rely on testing. And also, it's sort of like the consumer router upgrade problem. How do you upgrade your hardware without breaking it, when you're providing a 24/7 service and your customers are expecting you to keep it up to date and running the latest security patches and adding new features? So those are some of the challenges that still remain to be solved, I think, for robotics.

Breck: I can comment on it too, if you want. I think one really important lesson actually when you're controlling physical systems is that you can't rely on rapid iteration necessarily of the software that's running close to the hardware. If you have millions of IoT devices, you're not going to update those in some tight feedback loop. It's just going to take time. Some of them might be offline for weeks at a time and come back on old firmware versions. And you can also run into environments where certain hardware is, say, validated. So you can't even upgrade software at will. It has to be revalidated.

And then you may also have legacy systems that you just can't even update. So I think a real lesson is you need to be responsive server-side to this kind of stuff. And that comes down to really modeling your assets and understanding their capabilities, which is challenging. And also, thinking of ways that you can evolve your system and compensate server-side for devices that you can't necessarily iterate as fast as you'd like in terms of the software running close to the hardware.
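A rough Python sketch of what "modeling your assets and understanding their capabilities" and compensating server-side could look like follows. The capability fields, firmware numbers, and batching behavior are invented for illustration and do not describe any real fleet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    supports_batch_setpoints: bool
    max_message_bytes: int


# Hypothetical capability table keyed by firmware major version.
CAPABILITIES_BY_FIRMWARE = {
    1: Capabilities(supports_batch_setpoints=False, max_message_bytes=256),
    2: Capabilities(supports_batch_setpoints=True, max_message_bytes=1024),
}


def capabilities_for(firmware_major: int) -> Capabilities:
    """Use the newest known entry not above the device's version.

    Versions older than anything in the table get the oldest entry,
    which is the most conservative assumption.
    """
    known = [v for v in CAPABILITIES_BY_FIRMWARE if v <= firmware_major]
    version = max(known) if known else min(CAPABILITIES_BY_FIRMWARE)
    return CAPABILITIES_BY_FIRMWARE[version]


def plan_messages(firmware_major: int, setpoints: list) -> list:
    """Server-side compensation: adapt the message shape to the device."""
    caps = capabilities_for(firmware_major)
    if caps.supports_batch_setpoints:
        return [list(setpoints)]          # one batched message
    return [[s] for s in setpoints]       # one message per setpoint
```

A device that comes back online after weeks on old firmware still gets messages it can handle, because the server consults the model instead of assuming every device runs the latest release.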

Williams: Yes. Having been creamed by this problem many times in the past, one of the early design decisions I made was to adopt this … If anyone's familiar with JTAG, which is a form of writing firmware, there's a, I call it C doubt for short. But essentially, it allows you to rewrite firmware from the USB bus. So early on, I'm like, "Oh my god, I don't want to do field updates anymore." So we actually provide a mechanism by which the Android itself will issue the USB commands over the bus to update the firmware for the motors, and you can schedule that. And I feel like these are the kinds of things that need to be solved in order for actuating the physical world to become much more approachable to people who are more comfortable with software.

And I think we're going to see more of this kind of thing where, you know, which is more standardizing, making things more uniform so that the model of the program environment actually looks coherent for the people writing the program that's running in the cloud, which is where we sort of naturally want everything to be right now.

Challenges of Cueing Software in the Physical Environment

Shoup: Riffing on that a little bit, we've mentioned some of the challenges about cueing software in the physical environment and the real world. And I think one of them is the upgrade situation, but I think there are maybe others. And David, I know some of the things that we've talked about over email are the challenges around like, "Hey, there's this built environment, and I'm trying to install, I don't know, software," something like that. And maybe we could talk a little bit about that.

Banyard: I keep on saying “in general.” I should stop. We're working in spaces that are relatively similar as far as WeWork goes. So what we try to do is, because we have a lot of things that we can do a certain type of wall over and over and over. I could iterate one thing like 50 times if I want, because we're doing hundreds of projects at a time. I can have a similar or iterative advanced exploration of whatever that particular thing is in a lot of different forms. And I don't have to necessarily update the early one because I'm seeing what becomes of that in progress so I can start to see which one gave me the best result.

So it kind of gives us a little bit of flexibility that way, that we can do a lot of different iterations of things without having to go back and say, "Oh, that one needs some update on some software." I can just let that one run unless there's something that's critical, or if it's causing us a delay. Because if something causes delay, it's not good. So we have to have ways of still getting the work done. So we have some contingencies usually to be able to do it either manually or some other way, or we are allowed to go back if it is a critical path. But generally, we're just trying to [inaudible 00:27:28] We're trying to just see what happens in a lot of different ways to see what we come up with.

Shoup: I love that idea of iterating and learning as you go. That's the thing that we do a lot in software, but it's pretty cool to be able to do that at scale in a physical environment.

Participant 2: How far do you think the physical world, and us building there, is from the software world, where we have open source? A lot of what we do, even for large enterprise projects, is just gluing things together, and the standards make it all kind of snap together.

Lau: The interesting thing in robotics right now is that there are starting to be more building blocks. When we started Savioke about five years ago, we were building a lot of it from scratch. There was the Robot Operating System, ROS, which was one of those building blocks, thanks to open source. And I know there's an OSR person in the audience here. Nowadays, there are starting to be more tools. But it's still far from what you see in the software case. It's much, much easier to bring a startup.com online now, build your own website, and deploy it with a couple of clicks. You can do that pretty quickly.

But with a robot, it's still at least a couple months effort to get their first prototype out, just because the availability of parts, there's a long lead time for supplies. You have to bring them all together. You need lab space to assemble it. You can't just do it in your living room with a laptop. So, I'm excited that there's more stuff coming available, but I think we're still a long way from seeing the same richness of toolsets and primitive building blocks that we can cobble together for robotics quite yet.

Banyard: I think for us in a vertical building environment, it's much quicker to see that type of thing. In regular construction, it takes a long time to start to see robots and different things applied to the field. You'll see that there are companies like ours, there's Katerra, there are some others that are doing some vertical there. They can control the whole process, so that gives them the ability to do more exploration and pre-fabrication and things like that. So we're allowed to do more iterations of that, but I think in the regular built environment it's going to take a little bit longer because they wait until something's totally proven before they'll even touch it. I mean, I think some guys are still stuck on calculators because they don't trust computers yet.

Shoup: Componentization. No, I don't want to do it. Do you want to do it? You want to do it, Jeff?

Williams: What's that?

Shoup: The question was around a philosophy that is open source, where we can assemble things and components.

Williams: Oh, yes. Well, I mean, they speak my language. Yes.

Shoup: Yes. Well, I was like, "What do you mean you don't want to talk about …?"

Williams: So the one thing I think is really important, and you bring this up, David, is this parametric design and design constraints. I think a lot of us, when we hear the word robot, think of six-degree-of-freedom arms, something that's sort of anthropomorphic. But really, this room is a robot. I mean, it is. There are several systems that are automated in this room. And so, when we start to change what we think of as a robot or machine control, the kinds of applications I'm interested in are really narrowly focused. I'm serious when I say it's like a thing that does a thing that only one person cares about. And it maybe doesn't even move around on the ground. I don't really want to define what that domain is, I just want to make it so that it's easy to do and cheap to do.

And so, I think there are two kinds of development going on there. There are people like me who are super general and, like, pie in the sky, and there are people who are very specific. And it's hard to be me when you're trying to solve a specific problem. So I kind of have the freedom to just sort of, you know, do my little side project. But if you're trying to build a real thing, that doesn't necessarily work. So you kind of have to decide what the domain you care about is before you can say how easy or how hard it will be to make it purely software, I think, at least now.

Security Concerns

Participant 3: Can you speak to some of the security concerns that you had to address or that manifest in your domain?

Banyard: I guess I could start with that. We have had to look into a lot of things because we operate globally, including following things like GDPR in Britain and in Europe. So a lot of the things that we were already exploring, we've had to adjust because now an image is considered data. So we have to be careful that if we have people coming into our spaces that are from places that have restrictions, that we can accommodate and be respectful of that. So we're having to adjust a lot of things that we're doing to make sure that no matter what we're doing, we're keeping that in mind.

Lau: I'll tell a little story about hotel robots. A robot is basically a camera on wheels, many, many sensors on wheels, a little server farm in there. And one of the things that they do is record their video of all the stuff around them. And you can imagine if you're in a hotel and you have that camera on board a robot that could become a security or privacy problem for some people. There have been some problems in the past with hotels getting upset because pictures were taken of people who were staying in those hotels and they were published on the internet, and those people didn't want them published.

So as a company that builds that hardware device that's roaming around inside of a hotel and taking pictures, we struggled with how you preserve people's privacy in an era where everyone's worried about online privacy. We considered things like, do we put a red recording light on a robot to indicate that its sensors are actually grabbing video data? But we thought that was very intrusive, and we didn't want to do that. We didn't want to remind people that this thing is essentially a little video recorder.

And so what we did was we made a compromise. We decided that the public spaces in a hotel, like the hallways, are public and filmable. And the private spaces behind your door, when you're staying in a hotel room, are private. And so, we decided that all of the video that our robot was capturing would be blurred when it's pointed into a hotel room. And so that's a compromise that we got to in order to be able to operate safely without destroying the aesthetic of our product, but also preserving some of the right to privacy that people might have when they're staying in a public environment like that.

Breck: I can't talk about any specifics. No surprise. But I'd say definitely anytime you're controlling physical devices, there's always a safety concern for sure. So security needs to be foremost in the design of your systems. And then certainly with the rise of importance of personal data protection, it's also very important. So if I especially reflect back on my previous experience in industrial automation and control, a lot of those environments are under regulation. You're under NERC CIP, these kind of things, or you're a pharmaceutical manufacturer, and you're regulated. So you have all these rules you need to follow that often dictate the security concerns around your systems as well.

And another kind of unique thing about those systems is you're often dealing with legacy systems that you just can't update. So you have some really terrible piece of software written in C with all sorts of buffer overrun, this kind of thing. And if that ever gets exposed to somewhere on your network that can be taken advantage of, it's a huge problem. So figuring out how to isolate those systems and protect them in the industrial world, that's a big, big problem for sure.

I think what's actually important for our industry is taking software development almost to an engineering discipline. Because if we don't, it's going to get forced upon us through government, through regulations, through these kinds of things, just because these systems are so critical and they have safety concerns around them.

Technical Skills and Technologies

Participant 4: What type of skills or technical skills or technologies does one need to know to be involved in your fields?

Breck: Systems thinking is number one on my list, actually, because, as I think you already mentioned, the complexity of these systems is that you're just interacting with so many things. It's not like monitoring a database in your data center. There are just dynamics that you can't imagine. So thinking in systems and training people to think in systems I think is really, really important.

Banyard: We hire and acquire, so give me your resume now. We are looking for all types of people, whether it's in robotics, programming, regardless. But also, you've probably seen that we've acquired quite a few companies, whether it's a company that does spatial comfort, or teams that do software that analyzes a lot of information. But within our own R&D team, we do a number of R&D explorations through creating our own software. So we have a whole full stack team that does a lot of that technology development.

Lau: So Dusty Robotics is actually hiring. Thanks for the plug. It takes a village to build a robot. We look for people with mechanical skill sets to actually assemble the form. People with electrical or EE skill sets to actually build the systems and glue them all together. We look for roboticists who can actually make it move. We look for systems integrators who can actually write code that talks to all the hardware bits and makes them all work together. And we look for application-level people to create beautiful, delightful front-end interfaces that actually let our customers enjoy using the product.

Williams: I'm [inaudible 00:38:23] before you guys.

Shoup: I don't know if this was the intent of the question, but I also work for WeWork. We're also hiring. I'll just double down on what David said. I'll be quiet after 30 seconds, but you have to indulge me after this great day. So yes, we have actually 600 software engineers around the world, headquartered in New York. Most of the engineers are there. A decent sized team here in San Francisco, also Tel Aviv, in Israel, and in Singapore. And basically, you can imagine any kind of software that you would build in other places, we have some aspect of that, from buildings to web apps and mobile apps, the whole gamut, back office, front office. Anything you would do, we do that. So yes, come talk to me afterward.

Testing at Scale

Participant 5: How can you test at scale where there are hundreds of robots before sending this code to them?

Lau: It's very hard. I don't have a good answer yet. I'm hoping someone in this room can actually develop a solution for us. Simulation is the obvious answer. You try to simulate as much as you can, and do as much testing as you can, unit tests, functional tests, all the low-level tests you can try to do. But the problem is that there is so much data coming in, there is so much input into your system that it can be nonlinear in terms of what its responses are. So how do you test for all of those? Honestly, we did a lot of our testing with customers. I hate to say that, but that's the reality of robotics.

So you do all the testing you can, and then you release it into the wild, and you see what breaks. And you hope that you didn't make a big mistake. I would love to see better environments for doing testing. One of the challenges that we face with simulation is that it's very hard to simulate all of the details about the environment that you're going to be operating in. If you imagine a construction site, it's full of dust, so that means that your robot has to operate on this uneven ground. It's very hard to model that and hard to predict what's going to happen when it encounters it. And so that's one of the ongoing challenges that we have in robotics.

Williams: In a previous thing that I worked on, we had a couple of sites that had 6,000 little low-power wireless devices. And I think the right way to think about a problem like that is like microservices at the insanity level, because each one of these things is operating independently and communicating with each other. So all the things that we would apply to reasoning about a microservices environment probably apply here too. Observability, logging, sort of early detection of problems, the ability to sort of safe things off. Like, what's that Hystrix thing where you cut off when they overrun? I'm blanking on the name. Anyway...

Shoup: Circuit breaker.

Williams: Circuit breaker. Thank you. Yes. So when we're running a lot of distributed systems in the cloud and they're at arm's reach, the problems are actually kind of similar. It's a little bit hard to reason about that from your laptop. So you really have to build all these systems around how you would reason about that when it's off your laptop and it's running, say, at Google Compute or AWS. And my experience is that when you scale that up for small devices that are physical, the same things apply. And it's hard. It's just really hard, as we know.
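As an illustration of the circuit breaker pattern named above, here is a minimal sketch in Python. It is not taken from any panelist's system or from Hystrix itself; the class name, the `max_failures` and `reset_after_s` parameters, and the injectable `clock` are all assumptions made for the example.

```python
import time


class CircuitBreaker:
    """Hypothetical minimal breaker: open after repeated failures, retry after a cooldown."""

    def __init__(self, max_failures=3, reset_after_s=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.clock = clock          # injectable so the cooldown can be tested
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                # Still cooling down: fail fast instead of hitting the device.
                raise RuntimeError("circuit open: call skipped")
            # Cooldown elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0           # any success closes the circuit fully
        return result
```

Wrapping each call to a remote device or service in `breaker.call(...)` means that a misbehaving endpoint is cut off after a few failures and only probed again once the cooldown has passed.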

Breck: I think you can learn a lot just from the way we deploy software in the cloud. I think it's very similar. You can still do canary deploys, you can do A/B testing, you can do phased deployments. I think that all still applies to physical systems. I think it's really important to embrace failure as part of your design, because things are going to fail. So, don't make failure exceptional. It's part of the design, embrace it. It's going to happen. And I wanted to say one more thing, but I can't remember what it is.

I think, which has already been mentioned, as these systems become more complex, especially at scale, I think we're going to start to take a more empirical approach to answering these kind of questions. It won't be like, “Is the system working or is it not?” Because it's always going to be partially broken, and we'll operate these systems a lot more like we would in the physical world, actually. It'll look a lot more like operating the oil refinery or something like this, like a physical process. And we'll take empirical approaches, actually, to determining the health of the system.
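The canary and phased deployments Breck mentions, combined with an empirical rather than binary view of health, might be sketched roughly as follows. This is an assumption-laden illustration, not a description of any panelist's tooling: the function names, the stage fractions, and the thresholds are all hypothetical.

```python
def canary_is_healthy(baseline_failures, baseline_total,
                      canary_failures, canary_total,
                      max_extra_failure_rate=0.01, min_samples=100):
    """Return True if the canary group's failure rate is within tolerance of the baseline.

    The fleet is assumed to be always partially broken, so the check compares
    rates instead of asking whether anything failed at all.
    """
    if canary_total < min_samples or baseline_total == 0:
        return False  # not enough evidence yet; do not advance the rollout
    baseline_rate = baseline_failures / baseline_total
    canary_rate = canary_failures / canary_total
    return canary_rate <= baseline_rate + max_extra_failure_rate


def next_rollout_fraction(current, healthy, stages=(0.01, 0.05, 0.25, 1.0)):
    """Advance to the next (hypothetical) stage if healthy; otherwise roll back to zero."""
    if not healthy:
        return 0.0
    for stage in stages:
        if stage > current:
            return stage
    return stages[-1]
```

For example, a new firmware build would first go to 1% of devices, and only move to 5% once that group's observed failure rate is statistically no worse than the rest of the fleet.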

Lau: I'll add another story to that about testing in the real world. You can design your system to work as perfectly as you can, speaking of embracing failure, but you can't conceive of everything happening to you in the real world. And one of the things that happened to us was that one of our robots, out of, I don't know, maybe 60 that we had out in the world at the time, would just randomly reboot during operation. And luckily, we built it so that after reboot, it'd come right back up and keep doing what it was supposed to do.

And that was an example of how you program it to be tolerant and graceful in the case of failure because you don't know what's going to happen. So we brought that one back in and we tried to figure out what was causing it. Was it a hairline fracture in the motherboard? Was it a loose connection somewhere? We never actually figured it out. We put a new robot in the same environment, and it also started failing in the same way. So it must have been something in the environment, but you can't predict everything that's going to happen. You can't simulate all those things in simulation, so you just got to make it tolerant to anything that you can throw at it.

A Surprising Thing

Shoup: Each of you, go in any order, but tell us one super surprising thing, or maybe one thing you wish you could tell your self of 5, 10, or 20 years ago, a thing that you know now that you didn't know before. Something that was, yes, maybe surprising, or maybe wisdom that you wish you could give back to your previous self?

Banyard: Well, I think that I have to be more open to things that I'm not thinking would be applicable to my situation. So very often, I'll get thrown something that I'm like, "Well, how am I going to use that in my situation?" Or, "How is that going to help us do anything in what we're doing, trying to get a building built?" And so, I've learned to just try to take whatever I'm given and find the value in it to possibly get me to our end goals. And maybe it's not an end goal that I can see, but maybe it's something that I have to be looking for so that I'm not just like, "Oh, I need this result, so I'm going to use this thing." I have to say, "Well, I'm going to use this thing, and maybe it'll give me a result. And maybe it won't, but I'm willing to explore it."

Breck: I've already said the first two that come to mind: systems thinking and designing with failure in mind. So I'll try and think of something else.

Shoup: Maybe a surprising thing that happened.

Breck: There are lots of ideas for, say, scaling. These systems are architected in interesting ways. It's the cost-effectiveness of that at scale, especially when dealing with IoT. That often becomes a real concern. And so it's pretty interesting, you can architect a system. It works. It's reliable. You're pretty happy. And then if you want to sell a million of those products or 10 million of those products and you do the math, you're like, "This is …" We just don't have a profitable business at that point. So I think that can be pretty surprising to people at certain times, that the cost-effectiveness of your design is actually going to dictate your system in large part.

Lau: Interesting.

Shoup: Systems thinking, plan for failure, do the math.

Lau: I'll add one thing to that. One of the surprises that struck me as we were developing Savioke was that you'd assume that a lot of your, almost all of your resources are going to building the physical robot. That's the thing that people see. It's the thing that's driving around in the world. But I would say that that's only about half of the problem. And the other half of the problem is, how does that unit interface with the rest of the world? And we actually spend a ginormous amount of time developing all that infrastructure. So that was anything from fleet monitoring systems to customer support tools, to customer-facing web applications, to technician-facing web applications so we could monitor and control our devices. We spent an inordinate amount of time developing all of that stuff, even though it's not the robot, but it's actually really important to actually getting the system to work in the real world.

Williams: For me, the most surprising thing was caused by myself. So like many of you, I'm like, "All right, I'm going to rip into Envoy, and I'm going to change the Envoy [config] file. And I'm going to, like, proxy, you know, gRPC-Web, no problem." And usually, we'll succeed. But with physical things, sometimes that hubris can get you in trouble. And I have this Broadcom magnetic rotary encoder that sits on top of the motor shaft. And theoretically, it's 16-bit so it should be able to give you, you know, 0.01875 degrees of motor shaft detent.

And I know sensors are noisy, so I have a lot of experience with this. But I wasn't ready for the fact that motors are a flux field of evilness when you have a magnetic rotary encoder. And a simple PID control loop isn't going to cut it because, depending on where the motor shaft is, and what the flux field is in the inductor, the magnetic environment is totally changing. So I think it's a classic example of things seeming really simple, like, "How hard could it be? It's a motor, and it's an encoder, and it's a sensor." And then you open it up and it's like, "Oh, it's the natural world and physics applies, and now I have to solve this crazy problem," which mostly got solved by switching the algorithm.
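For readers unfamiliar with the "simple PID control loop" referred to above, a textbook discrete-time version looks roughly like the sketch below. It is a generic illustration, not Williams's code, and the gains and class name are assumptions. Note that the derivative term differentiates the measured signal, so noise on the encoder reading is amplified, and nothing in the loop models a disturbance that changes with shaft position. The talk does not say which algorithm replaced it.

```python
class PID:
    """Textbook discrete PID controller (hypothetical gains, no anti-windup or filtering)."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None  # no derivative on the very first sample

    def update(self, setpoint, measured, dt):
        error = setpoint - measured
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            # Differencing consecutive errors amplifies measurement noise.
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

Called once per control period, `update` returns the actuator command (for example a motor drive level) from the target and measured shaft angle.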

Recorded at:

Feb 01, 2019