Multi-Modal Input Design for Magic Leap

Summary

Colman Bryant talks about what types of new input modalities are coming online and how they can be used and combined in different ways to surpass existing approaches in terms of throughput, discoverability, accessibility, and prediction with stories and examples from Magic Leap's Interaction Lab.

Bio

Colman Bryant is currently Lead Designer for the Interaction Lab at Magic Leap, an internal incubation team that explores interaction paradigms and prototypes new products and use cases for Magic Leap's Mixed Reality platform.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

My name is Colman Bryant and I am the design lead for the Interaction Lab at Magic Leap. I'm here to talk to you today a little bit about where we think the future of HCI is headed and, specifically, how we think we may be moving the ball forward in terms of introducing new types of inputs and modalities that are focused around sort of natural human-computer interaction.

A little background on me. I worked in video games before I came to Magic Leap. I was about 11 years in the game industry, previously a game designer. It was my childhood dream. I got it. I worked on almost the full breadth of what game designers can do. So, did a little bit of everything, started with some prototyping for educational games. Had a little brief stint at a lottery game company, not my personal favorite. Did some level design for an MMO, systems, AI, inputs, quests, social systems, competitive gaming, casual gaming, you name it. And I've found that at the mobile incubator at my last company, Hi-Rez Studios. After that, I was kind of feeling like I was beginning to repeat myself a bit, and I wanted to look around a little bit and see what was new and what was coming in computing and in different types of new mediums.

That's about when I heard about this little secretive startup company called Magic Leap down in Florida. I went down, did an interview, tried their device, and I was blown away at the potential of what this new wearable computing platform is that can put content in the world around you. So, that's when I started. This was about two years ago, and I've been leading R&D and prototyping in the Interaction Lab since then, seeing what we can do with our different types of inputs.

What is the Future of HCI?

What is the future of HCI? This is a very big topic, where do you even begin? It's kind of like asking, "What can you do with the computer?" Tons of stuff. What can you do with the computer in the future, a future computer? Depends on what that computer is. Before we really dig in deep, I'll just make sure we have a couple definitions here. Who in here is a designer? One in the back. All right, cool. So, yes, we're going to do just a little quick overview, some terminology, make sure we're all on the same page.

Some fundamentals. HCI, who here is familiar with what that is? Okay, more people, good. At a basic level, it's the study of how humans interact with computers. This probably seems a little bit obvious, so we'll dig a little deeper. It's part science, so it's sort of the mechanics of how this happens, the mechanics of interaction informed by some sort of a purpose, a goal that you're trying to work towards, and adapted to our physiology. We have to be able to actually interact with the computer's interface. It's also part art. We're appealing to human senses, human emotion. We want to make interfaces that are understandable, that are relatable, that are not hostile, not offensive. Ideally, they're easy to use and pleasant. Art with purpose is design, and this is what I work in.

To really understand what HCI is, we also need to get a little bit deeper down into it. A prereq for HCI for the interaction part is I/O, input/output. Think of it kind of like the physics of interactions. It's like Newton's third law. You provide an input, you get an output. You can't really have interactions without this kind of a system. The human puts something in, or outside world, the computer responds. Just a real quick breakdown of that: human provides input, computer churns on it, does stuff, provides response, and then we call this a feedback loop. So this cycle continues as you go, and it's up to the computer or the environment how to react once it gets that feedback back.

What's the future of I/O then? To understand where we're going, it helps to know where we've been. A couple more quick definitions here. There's this term affordance. This is a word that gets thrown around in design circles a lot. It's really important to what we do. An affordance is the properties of an object that reveal how that object is used. For instance, you can see here on the left, we have a row of elevator buttons, and a button has a certain type of affordance. There's a little mechanism that you can press. It depresses when you push on it. Maybe there's some sort of a light or something around that that provides feedback that you've given input. And, from there, you can tell that you pressed the button, you're going to the right floor.

In this case, you have good agency. Agency is another design term we use a lot. It's sort of how tight the pairing is between the input and the output. So it's that good feeling you get when you push a button and the light comes on. You push a button, the light doesn't come on, you feel like you're not sure if your response actually happened. That's bad agency.

Here's another classic example of an affordance. So this is the doorknob, right? Here you can see, we have this handle. It lends itself to being grabbed. It kind of fits in the hands. It's cylindrical. There's a lever on it that you twist. So you can't do much else with it at that point. You grab it, you twist, and then you hear a click, and then you can open the door. Again, good agency, clear, understandable use case. You don't have to train somebody how to open a door. It's just kind of inherent.

Another thing, just to make sure we all are on the same page: input modality. An input modality is a distinct channel of input for a system. That could be something like voice or a microphone that provides an audio feed in. It could also be something like a keyboard that can send different ASCII characters across the pipe.

A Brief History of HCI (and I/O)

We've got our definitions. Let's take a little bit of a quick look at where we've been so that we can understand where we're going. We're not going to go all the way back to these mechanical calculation machines. We're going to start at like a programmable computer level. All right. So this is largely considered to be the first programmable computer. This is called the Colossus. This is from 1943. It was built by the British, and it was used to decrypt German communications during World War II. It probably, debatably, helped the Allies win the war, or at least helped them do it faster and with fewer casualties. And it's super powerful to be able to know where your enemies are going, where they're going to be, where you need not to be, where they're vulnerable, where the supply lines are. It was aptly named the Colossus. It's a giant metal, not very portable hunk of machine. It's a total beast to use. As you can see, it's kind of a monster. Where do you begin? You tell somebody, "Hey, go decrypt a signal." They're going to look at you crazy.

The inputs for this: you can see on the left-hand side, there's a bunch of these little switches and relays. That's actually the computer itself. It's not really programmable in the sense that you are writing code for it. You're more manually reconfiguring where the electrons flow. That's a very low-level way of interacting with a computer. You can also see on the right side, there's another input. There are these spools, and those were actually feeding reels of communications in that are being decrypted. Then the outputs are these little indicator-like panels that somebody has to actually stand there and watch, and manually capture what's happening there and make sense of it. The usability is very poor. You might be able to flip a few knobs and switches, but how that relates to how the overall system is going to work is not very clear.

Moving forward a little bit, here's another large room scale computer. This one's called the ENIAC. This is from 1945. And this is the first Turing-complete general purpose computer. Again, it's a big room scale thing. It's a mess of wires and cables and switches and blinking lights. The press at the time called it a Giant Brain. It probably took a giant brain to use it. Again, not very user-friendly. Where do you begin using this? You had to be specifically trained to use it, or one of the people who built it. Only a handful of people really could. The affordances were not discoverable, and it wasn't really ready for primetime. This is not a home computer by any stretch. Just a quick note before we move on, these last two images, if anybody ever says that women don't belong in tech, I'd like you to show them these first two images, because these are actually the first computer programmers, and it was actually considered a profession for women initially.

Here's the ENIAC again running a more complicated program. You can tell because there's more crap hanging out of it. All right. So we're starting to move forward to the Teletype machine route. This is actually starting to get a little bit more relatable to the modern computer form factor. Here you can see a bunch of people sitting there with their little keyboards, typing away. On the shelves in front of them are these punch cards. These are actually the early computer programs. Your input is keyboards, you use that, you punch these cards in a certain way, you create pieces of programs, and then those come out of this computer. And then you feed those into a different machine that does processing, and then it outputs other sheets of paper.

Here's a compiled program. If you want to know where the word compiler comes from, it's a stack of cards. And fun note, some of these machines were used for the Manhattan Project originally, for running simulations for that, and their program files were about a million cards thick. Imagine trying to debug that or, God forbid, you drop it and have to put it back together.

When your inputs and your outputs at the program level are sheets of paper, what could possibly go wrong? The computer can actually eat your program. There are a lot of pain points here. It's tedious, it's error-prone. If this happens, you then have to retype that sheet of paper. If you mess up one character, you have to start that whole sheet again, because it's all just punching as you go. The mapping from input to output is very unclear. So this is not really human readable. This is machine code basically that they're writing in. You still had to be a very trained expert to use despite the keyboard.

Who here recognizes this one? Yes. So this is probably the first computer for a lot of people here. This is the next real major milestone in computing. This is the Apple II. So this is 1977, and it's a big advance in HCI. Here we still have a keyboard as input using real characters that people can read, so it's not switches and abstract knobs. And then, now, we have this improved output, the monitor. So it's actually starting to show some text that's human readable. You don't have to be trained to use this. Also, the form factor fits on a desk. So this isn't something that you have to put in your room, it's not something that you have to borrow time from at a university. This is really the first real step towards democratizing computer use and bringing it home.

Meanwhile, in another thread, there was a fellow named Douglas Engelbart, and he was experimenting with this other new type of input. He called it a mouse. You can see here on the left, this is his first prototype. This is the Engelbart mouse, and it's a game changer for HCI. It used a very intuitive one-to-one mapping between physical motion and a cursor on a screen. We're starting to become more human as we go with these inputs. You can see there, there are some little wheels that it can use to track its motion, and then there's a single clicker, and then there's a chassis that you can hold on to.

So, the mouse evolved. This is actually the first mouse that I ever used. This is from the Apple Mac Plus. You can see here, it's fundamentally the same thing. It's a chassis with a wire and a little clicker, but it's getting more usable. The button's a little bit bigger. It's a little bit easier to press. The form factor is a little bit tighter. It fits more naturally under the hand. The edges are starting to get more rounded off. Continuing into the future, this is a modern mouse. This is actually the same I used at my last company, and it's basically still the same thing. There are some more buttons, it's a little more programmable, there's a scroll wheel now, but it's basically the same thing, just starting to fit a little bit better into the human form factor. The ergonomics are continuing to improve. And this is actually a little bit of a preview of where we're going with this future of HCI.

After the home computer was introduced, we start to see this explosion of new types of computing and new types of ways of interacting with them. Does anybody know this guy? Atari 2600. This is the first video game console for home use. You can see here, its input is a little controller with a joystick, with two degrees of freedom, so X and Y, and it has a single button. And then the output is a color TV. You plug it into that, it provides images and actual graphical form on your screen, and it can play audio.

Here's the next step in home gaming console. This was actually my sibling growing up. It's why I'm here today. This is the NES, the Nintendo Entertainment System, circa 1985. It also had a wired game controller, with two degrees of freedom D-pad in this case, and it had multiple buttons. The addition of these extra buttons started to allow some new things to happen. Instead of just moving a character and having a single action, you can move a character on the screen, and you can make Mario run by holding B button, and then you can press A while he's doing that to jump. And so we're starting to see that more things are becoming more possible as we're adding more input channels. Then the output, again, color TV, slightly better color, slightly better audio.

Here's a weird one, but it's worth calling out. This was called the Power Glove. This was another attempt by Nintendo to make something kind of cool and novel. Unfortunately, this one didn't really catch on. It was a little bit weird, a little bit niche, but you can kind of start to see what this underlying desire was. This is really one of the first times that somebody made a consumer input product that actually fits a user, like a glove, literally. You can see here, we're starting to try to wrap the computers around the user. We're not just interfacing with a giant wall of buttons anymore. The computers are becoming part of us.

Here's another interesting milestone on the continuum of computing. This one's called the Virtual Boy, 1995. First attempt to democratize home VR. It wasn't a great product. It had plenty of issues. The inputs are a standard controller with a little bit weird feeling, but the outputs are doing something interesting here. So this is actually a stereoscopic view. It actually simulates depth. As you look through it, you see content at different places in the world, and that's where, again, we're starting to wrap the computer around the human, but from an output standpoint. We're trying to make it a little bit more comfortable to see your content and to feel immersed in it.

All right. A modern game controller, so you can see the inputs have exploded. It's now got dual analog sticks, multiple buttons, triggers, bumpers, a touchpad, and it's all wireless. Again, more portable, more accessible, easier to use, more possibilities. Here's an old style touchscreen. So this is an interesting thing of note. In this case, we're actually starting to merge the inputs and the outputs. You are directly interfacing with the output device to do your inputs. Again, trying to make this more intuitive than before, less abstract mappings. These earlier ones are kind of lower res, they only do a single point of contact, but they're an interesting proof of concept for physical interactions with a computer using your hands.

Here, we start to reach the modern era. Here, we have an iPhone. This is an iPhone 4, which is the first one that really sort of broke out into large scale consumer use. This is an interesting device. It has a multitouch screen. Again, the input and the output are the same in this case. It also has other sensors in there. There's a gyroscope, an accelerometer, a compass, a mic. You can start to capture new types of things, and that enables all kinds of new experiences. Many, in fact, that we can hardly even imagine living without these now, and using them for navigation, using them for calls, all kinds of other things. The output is the screen, the speakers, and it has haptics. It can actually call your attention to it, which is important for a device that sits in your pocket. As far as democratizing computers, this was a huge step. The iPhone and the Android have now exploded, and now, give or take, roughly half the world's population has access to a computer that they can carry in their pocket and has a huge amount of functionality. And that's really significant for where we're going.

Can You See the Trend?

We've talked about how we got to where we are. Can you see the trend? Basically, over time, as time goes on, accessibility increases. It's worth noting, this increases your potential user base. The less you have to train people, the more intuitive things are, the more accessible they are, the more people can use computers. That means the more people can benefit from digital technology. Also worth noting, as we get further along this curve, our job gets harder. We have to think more about how people are going to interact with these computers, how to embrace their natural intuitions, how to make things feel natural. We can't assume, as a foregone conclusion, that somebody's going to come in fully trained, knowing how to flip all the switches on a giant board. Also, our engines and our tools need to get better. Can you imagine trying to program for the iPhone with punch cards? No. Cool.

So, that's the future of HCI. Thank you for coming. I think we have time for one or two questions. No, just kidding. We haven't gotten to the cool stuff yet. A couple more things to note before we really dig in. Bad design actually has consequences. Who can tell which direction this door opens just from looking at it? This has probably happened to everybody in this room, and if it hasn't, you're lying. But, you walk up to the door, it's got this ambiguous handle, it could be push, it could pull, it's symmetrical from both sides, there's no other real clues to tell you where the latches are and where it's going. And this is a classic example of a bad user interface in an everyday context. In this case, it's more of an annoyance. The worst case scenario is that maybe you bump into it or you spill your coffee or your papers go flying everywhere, if you're me. But it does still have impact, and this is still a common problem. I run into these kinds of doors every day, wherever I go.

Here's another example of bad design with a much bigger consequence. Does anybody recognize this picture here? Yes, this is Three Mile Island. This is the site of the worst nuclear disaster on U.S. soil in history. Why was there a nuclear disaster here? Well, here's their control room. So, as you can see, it's a giant, huge gross-looking room scale computer, with a whole bunch of computers and monitors and switches and knobs all over the place. The usability is poor.

What happened here is basically you have a staff who is trained, maybe not as trained as they needed to be, and you have a computer interface that is bad, very unusable. And there was a small issue that happened in the secondary system. A little valve got stuck, and it didn't go noticed at the right times by the right people. The feedback loop was broken, the output from the computer was not clear. There was a little blinking light in the corner, and they didn't see it, and so they did the wrong thing. What ended up happening is they flushed a bunch of radioactive waste into the surroundings of this area, and it had to be evacuated and cleaned up, and it caused a big old mess.

We have a confusing UI, compounded by poor training, and that allowed a problem that was very small to go way too far and cause enormous repercussions. This is an example of how good design could have potentially solved this problem. Maybe your industry might have its own Three Mile Island setup as well. Do you have a control room like this? Is all of your technology usable? Is everybody trained adequately? Are you going to potentially hit the wrong switch, or not see something and lose everybody's data? It's totally possible. So usability matters. It matters because we want to democratize access to computing. It matters because we want to minimize errors. We don't want to accidentally release huge amounts of radioactive waste into our environment. Generally a bad thing.

Magic Leap to the Rescue!

And so, that's where Magic Leap comes in. Our mission: we want to harmonize people and technology to create a better, more unified world. We want to democratize computing so that everybody can use it. We want to embrace natural inputs and natural human tendencies. We want to make computing mobile so you can take it with you. You don't have to be locked into a little box or tied to a wall. We want everything to be comfortable and ergonomically friendly, to play with your senses in a natural, approachable, comfortable way. And we want to be ethical, and we want to keep the humans at the center of the experience and build the computer around them.

This is Magic Leap One, or Magic Leap Creator Edition. This is our first offering to the world. We just shipped it this summer. Think of it kind of like a DevKit for the future. So we call this a mixed reality device. Some people call it spatial computing. What can this do? What Is and Os can it support? Well, first, let's make sure we have our definition of mixed reality correct. So this is a graphic that you will see in a lot of XR circles. XR is the larger augmented reality/virtual reality/mixed reality community. This is called the reality continuum. Mixed reality here is on the right, virtual reality is on the left. Some call mixed reality spatial computing. I really don't like that term. I consider that more of a class of mediums, where mixed reality is one of those. But I actually hate this chart, because it's kind of misleading. It implies that all of these things are on the same axis, and there are actually some nuances that are worth calling out.

Just to be clear, here's what I think is a better way of thinking about this, two continuums. So you have one that's virtuality to reality, and you have another that's non-spatial to spatial. So our old school mediums, like our personal computers, are highly virtual and highly non-spatial. So all of the content lives inside the box, it has no concept of the outside world, and it doesn't matter where you put your computer in your room, the only thing that changes is maybe your seat and the view behind you.

Here, we have virtual reality. It's also highly virtual. All of the content lives in the system, it's not aware of the outside world, but it's starting to get spatial. You are now in the computing experience. The content can be around you, it can be above you, it can be behind you. You can move your head around, and it can track. And that's sort of an interesting thing. Then augmented reality is more on the reality side. So the reality is the backdrop, the content sort of lives on top of it, but not in a super smart way. It's a little bit spatial, but it's not very environmentally-aware. It's more static, more low level. We're starting to see that improve with things like ARKit, ARCore, but it's still not quite the same.

And then, here, we have mixed reality. This is Magic Leap. We are heavily based on reality. The world is your backdrop. In game terms, the world is your level. We're also heavily spatial. The content can exist around you. But we're also very smart about it. We're context-sensitive. Not only is our content living in the world, it's actually aware of the world. We're creating digital content that follows the rules of reality.

Here's an example. This is sort of what Magic Leap is doing under the hood, as far as its world understanding. It's mapping your surroundings, it's finding feature points around, so little high-contrast areas, and it uses those using some computer vision and deep learning to track how you're moving. It can see in parallax how you're moving around a room by tracking these points. Then, on top of that, it's starting to create a depth model of the world. This is figuring out where empty space is and where solid space is. And then, once it figures that out, it starts to skin that with a mesh.

So it's actually creating a 3-D mesh of your environment, and so this is how we're pulling the real world into the digital so that we can have our content actually be smart and be aware of where it's placed. It's finding planes and surfaces, and it could even recognize objects. It can tell, "This is a table," or, "This is a chair," or, "This is a cup." So you can start to be really smart about how your content interacts with the world. Once you have all of this, you have a mesh and everything, you can put content on surfaces, and you can even put it behind surfaces. You can have a character jump behind the table and you no longer see them, and that's pretty amazing. So, cool.
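The "character jumps behind the table" effect comes down to a depth comparison between the real surfaces and the virtual content. Here is a minimal sketch of that idea in Python. It is not Magic Leap's renderer; the function name `visible_mask` and the flat lists of per-pixel depths are assumptions made up for illustration.

```python
def visible_mask(world_depth, content_depth):
    """Decide, per pixel, whether virtual content should be drawn.

    world_depth:   distance in meters from the viewer to the nearest real
                   surface at each pixel (hypothetically sampled from the mesh).
    content_depth: distance in meters to the virtual content at each pixel,
                   or None where the content does not cover that pixel.
    """
    mask = []
    for world, content in zip(world_depth, content_depth):
        # Content is visible only if it exists here and is closer than the real surface.
        mask.append(content is not None and content < world)
    return mask


# A character 2.5 m away stepping behind a table edge that is 2.0 m away.
world = [4.0, 4.0, 2.0, 2.0]
character = [2.5, 2.5, 2.5, None]
print(visible_mask(world, character))  # [True, True, False, False]
```

The first two pixels show the character in front of the far wall; the third is hidden by the table, and the fourth has no content at all.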

Outputs

That's what it is, how do you use it? How do you interact with this thing? What are the Is and the Os that it supports? How is it progressing HCI? I'll start with the outputs, actually. So, our outputs, we think of them as fields. It's not just a single thing. We're actually creating spatial outputs. That's important when you're making a computer that's more tightly paired to your senses than anything ever before. Confused senses equals a bad user experience. It's one reason that a lot of people have nausea when they try a VR device. For instance, if you're on a cruise ship and you're below the deck, your inner ear is getting all this rocking motion, but your eyes are still getting this kind of static environment. And so, that creates a mismatch and that makes you feel nauseous. That's why they tell you to go above board and look at the horizon when you're on a ship feeling sick.

You can have the same kinds of things with wearable experiences as well. VR does that. If your view is slightly off, it can make things swimmy, and you can get a little bit pukey, and that's not very fun. At Magic Leap, we're aware of all this. We want to be ethical and we want to be physiologically conscious. We want to make sure that the computer is bending to accommodate its users and respectful of their senses, and integrating with them in a natural way so you don't have these kinds of bad side effects.

Here's an example of how we do this on the rendering standpoint. We have this thing we call lightfields. We make sure that the content in the world around you feels like it's in the right place in the world around you. Quick example, this picture on the left, you see this flower here. It's in focus and the background is out of focus. The reason it's in focus, if you're actually looking at this with your eyes, it's like when you pull a finger in close, your eyes cross to track it. This is called vergence or vergence-accommodation.

So what Magic Leap is doing is it's actually tracking your eyes, and it's figuring out where your vergence is, and it's making sure that content is on the right depth planes. We're actually rendering content on multiple planes, and we're doing this in a way that integrates seamlessly with the other light that's coming in so that we can put content on this table and it looks in focus with the table, or I can put it at the back of the room and it's in focus with the back of the room. And that way, that makes the depth more believable, it makes me be able to target better, and it reduces the chances of me having eye strain.
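To make the vergence idea concrete, here is a rough geometric sketch in Python. It assumes the simplest possible case: the user fixates a point straight ahead, so the two gaze lines and the line between the eyes form an isosceles triangle. The function names, the 64 mm interpupillary distance, and the two plane distances are all hypothetical values chosen for illustration, not figures from the device.

```python
import math


def fixation_distance(ipd_m, vergence_deg):
    """Distance to the fixation point for a symmetric, straight-ahead gaze.

    ipd_m:        interpupillary distance in meters (assumed known).
    vergence_deg: angle between the two eyes' gaze directions, in degrees.
    """
    half_angle = math.radians(vergence_deg) / 2.0
    if half_angle <= 0.0:
        return math.inf  # parallel gaze lines: looking at "infinity"
    return (ipd_m / 2.0) / math.tan(half_angle)


def nearest_plane(distance_m, planes_m):
    """Pick the depth plane closest to the fixation distance.

    Comparison is done in diopters (1 / meters), a common choice in optics
    because focus error scales with inverse distance.
    """
    target = 0.0 if math.isinf(distance_m) else 1.0 / distance_m
    return min(planes_m, key=lambda plane: abs(1.0 / plane - target))


# Hypothetical setup: 64 mm IPD, planes at 0.5 m and 3.0 m.
distance = fixation_distance(0.064, 6.0)
print(round(distance, 2), nearest_plane(distance, [0.5, 3.0]))
```

A larger vergence angle means the eyes are crossing more, which maps to a nearer fixation point and a nearer plane.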

We also do this with audio. On Magic Leap, we have soundfields. We call it spatial audio. So we can actually make a sound sound like it's anywhere in your surroundings. Think of it kind of like if you go to an IMAX movie. When you first sit down, they do their little sound demo, right, and they play the little like rinky-dink band that's in mono. Then they expand it to stereo, and it starts to sound like there's a little bit of depth. And then they expand it out to full surround sound, and it's like starting to sound rich and like it's fully immersing you. Then they throw it all in and do the full IMAX, like sounds all around you. It sounds like there's a band and you're sitting in the middle of it.

We can actually do this on a Magic Leap device using soundfields. We have a speaker array, and it's basically blasting the sound in, in a way that simulates depth. If you want to make something sound like it's up here, you can do that, and you can use that to guide the user's attention. If you want to call someone's focus, you can play a sound and call their focus. But you can also use it just to make sure that the sounds feel right, that all of the senses match up. So, if I see something over there and I hear it from over there, then I know it's over there.

Input Modalities

Magic Leap can place content in the world around you in a convincing way that fools your senses and blends naturally with the physical world. That's awesome. That's a really good O. But what about the I in the I/O? How do you interact with it? What makes it more than just a glorified 3-D video player? Magic Leap actually supports a wide range of input methods. We have a bunch of different modalities. So these are all the ones that we support natively. You can see a wide range up here, and I'm going to do a little deep dive on each one of those, and we're going to talk about what they are and what types of use cases they can be used for.

The most fundamental of these, we call it headpose. This is kind of something you get for free if you're making a wearable HMD that can track its movement in the world. If I can track my motion, then I know where the headset is. If I know where the headset is and where it's facing, then I can shoot a vector out from there and I can do stuff with that. For instance, here's a very simple example of a couple pieces of content in the world. A user is looking back and forth between them, and the content that is currently the focus of that user is being highlighted. So, that's just a very low hanging fruit use of headpose.
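The headpose highlight described above can be sketched in a few lines of Python. This is not the Magic Leap SDK; `focused_item`, the item names, and the 15 degree cone are assumptions invented for the example. The idea is simply to compare the head's forward vector with the direction to each piece of content and highlight the one closest to center.

```python
import math


def normalize(v):
    """Scale a 3-D vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def focused_item(head_pos, head_forward, items, max_angle_deg=15.0):
    """Return the name of the item nearest the head's forward ray, or None.

    items maps a name to a 3-D position. An item only counts as focused if it
    lies within max_angle_deg of the forward direction (a hypothetical cone).
    """
    forward = normalize(head_forward)
    best_name, best_angle = None, max_angle_deg
    for name, pos in items.items():
        to_item = tuple(p - h for p, h in zip(pos, head_pos))
        if all(c == 0 for c in to_item):
            continue  # item sits exactly at the head position; no direction
        to_item = normalize(to_item)
        dot = sum(a * b for a, b in zip(forward, to_item))
        dot = max(-1.0, min(1.0, dot))  # guard acos against rounding error
        angle = math.degrees(math.acos(dot))
        if angle <= best_angle:
            best_name, best_angle = name, angle
    return best_name


items = {"left": (-1.0, 0.0, -2.0), "right": (1.0, 0.0, -2.0)}
print(focused_item((0.0, 0.0, 0.0), (-1.0, 0.0, -2.0), items))  # left
print(focused_item((0.0, 0.0, 0.0), (0.0, 0.0, -1.0), items))   # None
```

In the second call the user is looking straight between the two items, so neither falls inside the cone and nothing is highlighted.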

I mentioned this - we track your vision. We want to make sure that we are putting content at the right depth plane. So we have eye tracking. Not only are we doing this for making sure that the rendering looks good, but eye tracking can also be used as an input. Here's an example, just a very simple prototype of an array of cubes. The camera represents their headpose, you can see sort of where their head is turning. And then there's this little blue dot that's actually highlighting the cubes, and that actually represents where their visual focus is from their eyes. So it's actually doing a ray cast there, colliding with cubes, and picking one based on that. That's really powerful potentially for targeting mechanisms.
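As a rough illustration of "ray cast, collide with cubes, pick one", here is a Python sketch using the standard slab test for a ray against an axis-aligned box. The prototype's actual implementation isn't described in the talk, so the function names, the cube layout, and the assumption that cubes are axis-aligned are all mine.

```python
import math


def ray_hits_box(origin, direction, box_min, box_max):
    """Slab test: return the entry distance along the ray, or None on a miss."""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of faces: it must start between them.
            if o < lo or o > hi:
                return None
            continue
        t1, t2 = (lo - o) / d, (hi - o) / d
        if t1 > t2:
            t1, t2 = t2, t1
        t_near = max(t_near, t1)
        t_far = min(t_far, t2)
        if t_near > t_far:
            return None
    return t_near


def pick_cube(origin, direction, cubes):
    """Return the name of the closest cube hit by the gaze ray, or None.

    cubes maps a name to (center, half_size) for an axis-aligned cube.
    """
    best = None
    for name, (center, half) in cubes.items():
        box_min = tuple(c - half for c in center)
        box_max = tuple(c + half for c in center)
        t = ray_hits_box(origin, direction, box_min, box_max)
        if t is not None and (best is None or t < best[1]):
            best = (name, t)
    return best[0] if best else None


cubes = {
    "a": ((0.0, 0.0, -2.0), 0.25),
    "b": ((0.0, 0.0, -4.0), 0.25),  # directly behind "a"
    "c": ((1.0, 0.0, -2.0), 0.25),
}
print(pick_cube((0.0, 0.0, 0.0), (0.0, 0.0, -1.0), cubes))  # a
print(pick_cube((0.0, 0.0, 0.0), (0.5, 0.0, -1.0), cubes))  # c
```

Picking the nearest hit matters: cube "b" is also on the first ray, but "a" is in front of it, so "a" is the one the user is actually looking at.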

We also can do blink detection, or detect if your eyes are open or closed. Here's a little simple example on the right. This is from a Magic Kit example. Magic Kit is some of our sample projects. I'll talk a little bit more about that as we go. But this is one called Blink, it's coming soon to a developer portal near you. It's intended to show you how to use our different eye tracking systems. So, here, we have a cloud that you can look at, and it'll grow and shrink, but also, every time you blink, it changes colors. That's actually something that's pretty interesting. So there are use cases where you might want to do things when people blink. You might want to swap content out, do a little smoke and mirrors potentially. Or you might want to detect that for some reason.

What can these types of inputs be used for in real-world use cases? Potentially, training. We go back to our Three Mile Island example. If you had been able to highlight that button that was flashing in the corner and lead the user to it and know when they're looking at it and then pop up a little instruction saying, "Hey, fix the valve, it's stuck." That'd be pretty powerful. Or for instance, if you could lead an actual maintenance person down a hall to that and call it out, also very powerful. You could also use this for habit training. So, somebody who maybe is a little bit socially awkward, they want to train for an interview, and they maybe have issues maintaining eye contact, you could have them do a training session with a virtual character, and then you could count how much they're actually looking at that character while they're doing that interview. And then you could score them on that, or even give them prompts to say, "Hey, look back at the person who's talking to you."
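The eye-contact scoring idea could be sketched like this. It assumes some upstream system already reports which object the user is looking at over time; the sample format, the `"interviewer"` label, and the function name are hypothetical.

```python
def eye_contact_ratio(samples, target="interviewer"):
    """Fraction of session time spent looking at the target.

    samples: list of (timestamp_seconds, looked_at_name_or_None), sorted by
    time. Each sample is assumed to hold until the next one arrives.
    """
    on_target = 0.0
    total = 0.0
    for (t0, name), (t1, _) in zip(samples, samples[1:]):
        dt = t1 - t0
        total += dt
        if name == target:
            on_target += dt
    return on_target / total if total > 0 else 0.0
```

A score like this could be shown after the session, or checked during it to decide when to prompt the user to look back at the person talking.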

You could use this in medicine and wellness. Earlier this year, my mom had a stroke, and she now has a blind spot where she didn't before on her right side. I'm actually really excited about some potential use cases for this where we could help her become aware of that, maybe work around it, or maybe even retrain her vision to be able to see it again. This is a really powerful tool. Eye tracking is really awesome, very excited about it.

We also have this thing we call control. This is our physical device to interact with the world. Think of it like a tool that can reach out and touch digital content. So the control uses 6DoF tracking or 6 degrees of freedom. It tracks in position on three axes, X, Y, and Z. It also tracks rotation in three axes, X, Y, and Z. So you can literally know where this thing is in the world, and it's completely wireless. There's no outside lighthouses or anything. It's all done using electromagnetic fields based on your device. In addition, it has a touchpad which you can use for fine controls or for micro controls. I'll give some examples of that. There's also a bumper and a trigger and a home button. So lots of options for things. You can use these for grabbing, for instance, reach out, touch content, pick it up, place it over here, rotate it, whatever you need, shine a laser pointer across the room.

Here's an example of this in use. This is another Magic Kit example. This one's called Drive. So, in this one, you pilot different vehicles around the room. Here, we have a little helicopter, and you can see (and this is playing in slow motion, it's playing in bullet time, but that's fine) there's a little helicopter that's flying around, and as the user rotates the control, the helicopter pitches and leans in that direction. So, that's how you control the flight. Then you can also see, the user can reach out and give a little swivel on the touchpad to redirect it while they're doing this. So it's an example of how even one input like this can use multiple modalities to accomplish some interesting effects. Here's another pure 6DoF example. This is using the control like a laser pointer, as I mentioned, so reaching out, pointing, grabbing, selecting. All right.

Well, also worth noting, the control has an LED halo and haptic vibrations as well, so there's actually some output there. There's a little light on it that lights up. You can use that for providing different information. And then haptic feedback, kind of like on your phone, can be used to call attention to things or potentially to give you feedback when you've touched a virtual object, so you know that you've touched it.

We also do hand tracking. This is really powerful. Hands are the original input tool for humans. It's defining for our species. We rely on them for nearly everything we do. So to be able to have your hands physically interact with digital content is groundbreaking. There are two aspects of hand tracking on Magic Leap. So the first is feature tracking. This is what you see here on the left. Your hand has a number of feature points that we can track, which are associated with your joints, center of your palm, base of your hand, etc. And so we can actually track these as you move, we can attach content to them, we can know which fingers are touching which things. We also have these things called static hand poses that we support. These are more kind of like a discrete way of interacting with your hands. For instance, you can use these for simple selections. So, while you're targeting some content, you can do a little pinch gesture to select it.

Here's an example of this in practice. Here we have a little hand rig built by tracking these feature points. And a user is shooting a ray from the base of their hand through the center point between their fingers to do targeting. So you're actually targeting it like a little hand laser. That's great for little simple things, yes, and for grabbing, it could be used for direct interactions or for targeting things across the room. In this case, it's actually combining things. So there's a hand rig, which we're using for targeting, but then, also, the pinch is what we detect for actually firing these particle effects and doing that kind of discrete click motion.
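A minimal sketch of that hand laser, under stated assumptions: the three feature points (`wrist`, `thumb_tip`, `index_tip`) are hypothetical names, and the pinch is approximated with a simple distance threshold, whereas on the device the pinch is one of the static hand poses the platform reports.

```python
import math

def hand_ray(wrist, thumb_tip, index_tip):
    """Ray from the base of the hand through the midpoint between two fingertips."""
    mid = tuple((a + b) / 2 for a, b in zip(thumb_tip, index_tip))
    offset = tuple(m - w for m, w in zip(mid, wrist))
    length = math.sqrt(sum(c * c for c in offset))
    if length == 0:
        raise ValueError("fingertip midpoint coincides with the wrist")
    direction = tuple(c / length for c in offset)
    return wrist, direction

def is_pinching(thumb_tip, index_tip, threshold=0.02):
    """Stand-in pinch check: fingertips closer than an assumed 2 cm."""
    return math.dist(thumb_tip, index_tip) < threshold
```

The ray does the targeting every frame, and the pinch acts as the click that fires whatever the ray is pointing at.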

Here's another example of hand tracking. This one actually has a fun story with it. This is an example called Dodge, another Magic Kit thing. Basically in this experience, you place a few little points around the room, and this little golem pops up, and he's a really hostile guy. He just throws a rock at your head. This is actually the first Magic Kit lesson, and it teaches you about some different things, like placement, how to respect the user's personal bubble, and it also teaches you a little bit about hand tracking. But it was originally more of a headpose-focused lesson, but we do daily playtests in my team. So we iterate quickly. We try to get to the core of what's good as quickly as possible. We don't do this ivory tower design.

One thing we noticed while we were doing these playtests is we'd have new users come in, and they would do some things that we didn't really expect. We wanted people to either get hit in the face or to dodge. And it was cool with the Dodge because we could play some spatial audio. It's really immersive. You hear this rock flying past. It plays the whoosh. It's awesome. But a lot of users in practice, when they played this experience, the first time, they'd take the hit to the face, and then the second time that little guy pops up and throws a rock, they'd decide they don't want that. And they would just hold their hand out to try to block it. Initially, we didn't support that. The rock would fly right through their hand, pop them in the face again, and they felt cheated.

This was something that we decided to embrace, because, again, we're trying to make the computing experience more about the human. For that, you need to embrace natural human tendencies. So, in this case, we just figured out, "Oh, lots of people want to be able to block the rock with their hands. We should let them." We enabled hand tracking. We put a collider on it, and if you hit any of those feature points, basically, the rock gets blocked. Here's another real quick example of how you can use hand posing and hand tracking in a personified AI environment. Again, you could have a character react to you when you wave at them.
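Blocking the rock with a hand could be sketched as below. This is only an illustration of "put a collider on the feature points": the radii, the straight-line flight path, and the function names are assumptions, and a real engine would use its own physics colliders rather than this step-by-step check.

```python
import math

def rock_blocked(rock_center, rock_radius, feature_points, point_radius=0.02):
    """True if the rock overlaps a small sphere around any hand feature point."""
    return any(
        math.dist(rock_center, point) <= rock_radius + point_radius
        for point in feature_points
    )

def simulate_throw(start, head, feature_points, rock_radius=0.05, steps=100):
    """Move the rock in a straight line toward the head, checking the hand each step."""
    for i in range(steps + 1):
        t = i / steps
        position = tuple(s + (h - s) * t for s, h in zip(start, head))
        if rock_blocked(position, rock_radius, feature_points):
            return "blocked"
    return "hit"
```

With coarse steps a fast rock could skip past a thin collider, so the step count here is chosen to be small relative to the radii; dodging is not modeled.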

We also have mic and speech support. Speech, I think you all probably can see some value in that if you have an Alexa or if you've used Siri and things like that before. On the Magic Leap device, we have both inward and outward-facing mics. Again, we're even detecting sound in a field format in this case. Speech is a very powerful system. You could use it to tell your computer to do something, to tell an AI, have a conversation with the AI for instance. But even just the mics are also pretty interesting. Imagine a training scenario where a user is walking around, and you're recording everything they're doing, including what they're saying. Then you can play that back for the next person and have them very quickly ramp up in training. Maybe you don't need a physical trainer anymore, you just need one person to do it, and everybody can do it again after that.

Then there are some other kind of like more subtle things you can do with it, too. You can use these mics to actually detect breath. So we've built experiences before where, when you breathe, when you exhale and inhale, a little flower unfolds or folds back as a wellness experience, or you could use this in your training to monitor fatigue, if a user's breath rate gets high, maybe you tell them to take a break or chill down a little bit, put the gun down, whatever it is.

Then, we also have this mobile companion app that we can use. This is just a standard app you get off the App Store. You link it to your Magic Leap device. It can do some OS platform things, like import your images into your Magic Leap gallery, for instance. But it can also be used for more interesting inputs. Let me show you a couple examples. So, here, we're using it for text entry. It has a keyboard. Text entry is kind of an unsolved problem in mixed reality, so we still haven't found that Holy Grail of text entry that makes it really awesome, that's comparable to modern throughput methods. But at the very least, with the mobile companion app, you can be no worse than the current bar for mobile computing, which is your current keyboard with autocorrect and everything else.

There's also some interesting things you can do with that. You can see, at the top of this, there's this little grid, that's actually a little touch area where you can do swipes and gestures, like 2-D phone gestures. You can actually make experiences, and we've prototyped this in my lab, where you can have your phone and, as a user touches it, they can type some text in, and then they can pinch in this area and pull the text out of their phone into mixed reality space. These things are networked together. So you're literally pulling text from one medium to another and leaving a sticky note on your table, for instance. That's a really cool magical experience.

Beyond that, if it can connect, you can use it. So, game controllers, for instance, that little helicopter example I showed earlier, we've recreated a different input scheme for that using a drone controller, a Bluetooth drone controller, and it works just fine. We've also integrated things like a Wacom tablet. One of the examples that we showed at LeapCon, which is our Magic Leap Convention, we just did the first one about a month ago, was an experience where multiple people with these tablets could all walk around and preview an industrial design on a table. So they're all seeing it in the same place, they're looking at it, and they're making little annotations and drawing over it and things. And so, that enables all new kinds of use cases in physical space.

Combining Input Modalities

All right. So those are the core input modalities. That's all some really cool stuff, but the real magic even comes when you start combining these in different ways. Here's an example of two things combined. This is back to Dodge, the little rock golem guy. So, here, our user is using headpose to place the experience initially. They're defining the points where the golem is going to pop up, and there's this little targeter you can see, and there are some like minimum radius requirements and things. Basically, if they find a good flat space where the golem can spawn, then they can reach out and do a pinch and define that as a place for the golem to be placed. So this is how you can combine headpose for targeting, and then hand tracking and gestures for confirmation.
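The targeting-plus-confirmation flow could look something like the sketch below. It assumes the headpose ray has already been resolved to a surface point and a flatness flag by some other system, and it interprets the "minimum radius requirements" as a minimum spacing between spawn points; that interpretation, the class name, and the 0.5 m default are all assumptions.

```python
import math

class PlacementSession:
    """Headpose picks the spot; a pinch confirms it."""

    def __init__(self, min_spacing=0.5):
        self.min_spacing = min_spacing  # assumed minimum distance between spawn points
        self.spawn_points = []
        self._was_pinching = False

    def is_valid(self, target, surface_is_flat):
        """A target is valid if it is on a flat surface and not too close to others."""
        if target is None or not surface_is_flat:
            return False
        return all(
            math.dist(target, point) >= self.min_spacing
            for point in self.spawn_points
        )

    def update(self, target, surface_is_flat, pinching):
        """Call once per frame; returns True when a spawn point was just placed."""
        pinch_started = pinching and not self._was_pinching
        self._was_pinching = pinching
        if pinch_started and self.is_valid(target, surface_is_flat):
            self.spawn_points.append(target)
            return True
        return False
```

Confirming only on the frame the pinch starts keeps a held pinch from dropping a spawn point every frame while the user keeps looking around.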

Here's another example. This is actually one of my favorite prototypes that we've built in my lab. So this is an example where we are teaching users how to speak another language and learn terminology. We're using headpose plus a custom phone app that we built, plus object recognition. What it's doing is it's finding objects in their surrounding, it's labeling them in Spanish, and then they're using headpose to target them and the phone to select the objects that they want to learn. Then, once they do that, we quiz them in different ways, where they have to retype the terms to make sure that they get the spelling correct, or do some simple matching problems. We can even quiz them on the phone as they're out in the world just to make sure that they're keeping up their training and staying on top of it.

Here's another example of speech plus gaze. As a user is saying something, you can combine speech and gaze, and if you can parse the words in the right way, you can know where they're looking when they say certain words. You can literally say something like, "Hey, put that over there." And so, when you're looking at that object, it selects it. When you say "over there," it moves it and it knows where to go. That's a multi-modal use for speech combined with the user's focus.
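One way to sketch "put that over there" is to timestamp each recognized word and look up what the user was gazing at when it was spoken. The word and gaze log formats, the keyword matching, and the function names below are hypothetical; a real system would get these from its speech recognizer and eye tracker.

```python
from bisect import bisect_right

def gaze_at(gaze_log, t):
    """Latest gaze target at or before time t.

    gaze_log: list of (timestamp_seconds, target_name), sorted by time.
    """
    times = [timestamp for timestamp, _ in gaze_log]
    i = bisect_right(times, t) - 1
    return gaze_log[i][1] if i >= 0 else None

def resolve_command(words, gaze_log):
    """Bind 'that' and 'there' to whatever the user was looking at when saying them.

    words: list of (timestamp_seconds, word) from a speech recognizer.
    """
    obj = None
    destination = None
    for t, word in words:
        word = word.lower().strip(".,!?")
        if word == "that" and obj is None:
            obj = gaze_at(gaze_log, t)
        elif word == "there":
            destination = gaze_at(gaze_log, t)
    if obj is None or destination is None:
        return None
    return {"action": "move", "object": obj, "destination": destination}
```

The key design point is that the gaze lookup uses the time each word was spoken, not the time the whole sentence finished, since the user's eyes have usually moved on by then.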

These are just some cool things you can do now. As we're continuing to go, we're finding some other cool use cases, too. We can combine inputs to even get to the point where we can start predicting user behaviors and analyzing their intent. So a quick example of that is this water cup. If I want to reach out and grab this water cup and I'm not already looking at it, there's a set number of actions I go through that are very consistent across all humans. You'll see this is very reproducible. I'll do it real quick. Observe, and then, okay. I'll do it again, and pay attention to how my senses are going. I'm using my eyes, I'm using my head, I'm using my hands. My eyes lead as my head is starting to turn, they find the object, my hand starts to reach out, and it grabs it, and then I have my cup of water. What's happening there is my eyes are actually giving away where I'm trying to go with this, what I'm focused on. So we can actually start to predict what a user is doing. If this was mixed reality content, we could have it pre-highlighted or we could change its state knowing that a user is reaching out and trying to do something with it.
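A very simple version of that prediction might combine the two signals directly: the eyes name the candidate, and the hand moving toward it confirms the intent. The function name, the per-frame hand positions, and the 1 cm approach threshold are assumptions for illustration, not a description of how the lab's prototypes work.

```python
import math

def predict_grab_target(gaze_target, objects, hand_prev, hand_now, min_approach=0.01):
    """Return the gazed-at object if the hand is closing in on it, else None.

    objects maps a (hypothetical) object name to its world position.
    hand_prev and hand_now are hand positions from two consecutive frames.
    """
    if gaze_target not in objects:
        return None
    position = objects[gaze_target]
    # Positive when the hand ended this frame closer to the object than it started.
    approach = math.dist(hand_prev, position) - math.dist(hand_now, position)
    return gaze_target if approach >= min_approach else None
```

Whatever this returns could be pre-highlighted before the hand arrives; a fuller model would also use head turn and how long the gaze has dwelled on the object.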

Recap

Quick recap. We began talking about where HCI began and where it's going, and what inputs it originally started as, and how we're starting to see those advance and become more usable. We talked a little bit about why this matters. So we're democratizing computing. We're opening up computers to more people with less friction. We're trying to avoid the next Three Mile Island incident. We talked a little bit about how Magic Leap fits into that. So we're striving to be the next great milestone in human-computer interaction. We're putting the humans at the center of it, and we're building the computers around them. Then we talked a little bit about where we're going from here, some cool new features that are starting to come online, and how you can use these for your own experiences and use cases. Hopefully, by now, you've seen that the future of HCI is actually now.

We hope you'll join us. For more info, if you'd like to learn, I mentioned Magic Kit along the way. So this is our collection of high-end design-focused developer examples. We also have a lot of lower level examples on our Creator Portal. They can help you get up to speed and learn concepts about design and how to code on Magic Leap. We support engines like Unity, Unreal, and we also have our own custom one. So I highly recommend you check it out. I've written about half of these design diaries myself. There's documentation, sample code, and sample projects. So, definitely a good place to get started. And that's it. Thanks for coming.

Recorded at:

Mar 31, 2019

Colman Bryant
