We’re used to navigating our computers with keyboards, mice, and maybe trackpads: analog input. But those inputs work for desktop computers; they’re clunky for XR interfaces. That’s why we need gesture controls ASAP, according to today’s guest, Clay AIR’s Varag Gharibjanian.
Alan: Hey, everyone, Alan Smithson here. Today we're speaking with Varag Gharibjanian, the chief revenue officer at Clay AIR, a software company shaping the future of how we interact with the digital world, using natural gesture recognition. We're going to find out how Clay will bring our real hands into the virtual world. Coming up next, on the XR for Business podcast.
Varag, welcome to the show, my friend.
Varag: Hey, Alan. Glad to be here.
Alan: It's my absolute pleasure to have you on the show. I know you guys are working on some cutting-edge stuff, so rather than ruin it, I'll just let you tell us: what is Clay AIR?
Varag: So Clay is a software company specializing in hand tracking and gesture recognition, mostly in the AR and VR space. We're also tackling a couple of other industries, such as automotive. And our third product category, which we call Clay Control, covers all the devices that can use gesture interaction at a distance.
Alan: Are you doing this from single cameras, or is this from infrared cameras, or a combination of everything?
Varag: Yes. Clay is hardware agnostic, so it'll work across all the types you just mentioned. It could be one camera, two cameras, or more. And all different types: we work on the RGB cameras you'll find on everyday smartphones, on what you might find embedded in AR and VR devices, on monochrome cameras, and on time-of-flight cameras. We're pure software, we've worked across a lot of those different types, and we're now compatible with most of them, which gives us a lot of flexibility and is really useful.
Alan: So I'm going to be able to look at watches on my wrist in AR, right? I'm going to be able to hold my hand up and see what the newest, latest, greatest watch is?
Varag: It's actually pretty cool that you say that, because that's one of the use cases that comes to us inbound quite often. It hasn't happened yet, but companies are definitely brainstorming about how to track the hands with just a smartphone and overlay something on them.
Alan: We actually did it. We did a project just using Google's hand tracking library. We managed to make the watch sit on the wrist, but it was kind of glitchy; it would sit weird. It was not great. We made it work; it just wasn't sellable.
Alan: So this is really foundational software. And I know you guys are working with some of the larger manufacturers. Can you talk about that, and what that might look like?
Varag: Yeah, I can speak a little bit about that. Like you said, we feel this is software that really needs to be optimized for the hardware it's running on. The deeper it is in the stack, the better performance you'll get, and the better synergies you'll get with all the other technologies working on these devices. That's why, when I joined the company, I made it the focus to get as deep into the stack as possible. A couple of years ago we looked at the market to see who was really central to defining the reference stack: what's going into most AR and VR devices? To me, Qualcomm made the most sense, so we spent a lot of time working with them. As you know, and as some of our listeners might know, they really do define a lot of what goes into the guts of most AR and VR devices today, and likely in the future, too. So we work closely with them. What that means is that, from both a software architecture standpoint and a hardware standpoint, we try to make our software as optimized as possible for their reference designs. As a result, for any OEM that wants to pick up a Qualcomm chip and all the technologies around it, we're really well suited to fit alongside all those other technologies.
Alan: With Qualcomm's 835, or 845, whatever, their new chips are really powering the future devices. I know pretty much all of the standalone VR headsets right now, and most of the AR glasses, are running on Qualcomm chips. So this kind of opens up the world of spatial computing, and hand tracking needs to be there; it's part and parcel. And now that Qualcomm has announced their new XR2 chip, I think it's going to really, really unlock it. It's basically 10x performance right across the board. So you guys are well-positioned for that.
Varag: Absolutely, yeah. And it's pretty cool: we get every chip that comes out. The best part of that partnership, in a sense, is not just the amazing people we get to work with, but also that we get some of the first reference designs in our office and our labs. It's pretty cool that we get to experiment with the latest and greatest machine learning models and try to get the most out of those chips. With every single chip that comes out, which makes sense, we get a little more accuracy and a few more points of interest at lower power consumption on that hardware. So it's pretty cool to see that evolve. Day to day it may not look like much, but it moves more quickly than people think.
Alan: I think this whole XR technology stack is moving way, way faster than I had ever anticipated. I was looking at 2025 for ubiquitous AR glasses. But after seeing what came out of CES this year, and learning about this Qualcomm XR2 chip, you've now got AR glasses coming out en masse. At CES, I think 11 pairs of AR glasses came out this year, all of them in the form factor of a pair of sunglasses.
Varag: What did you think of that, by the way? For me, Nreal's glasses are the closest thing you've got to consumer AR, and I saw a lot of companies coming out kind of mirroring what they were doing: slightly different form factors, fast followers in a way. To me that seemed like a good signal.
Alan: I think it's a great signal. I think what will happen is what happened with the VR market: there were thousands of Chinese knockoffs of VR headsets, and all of them have gone out of business now. I think you're going to see the market flooded with these cheap AR glasses. It will come down to things like embedding technologies like Clay, and having proper Qualcomm chipsets in there. It really comes down to having the full tech stack to deliver the quality that people really want. So you're going to see a lot of new entrants, a lot of people coming in and trying to take over the market. But these are not easy problems to solve, as you know. And then--
Varag: Especially because every maker has a different hardware make-up, in a sense. So it's really about bringing these technologies together in that specific device. I think what Qualcomm is doing, and companies like us working together with them, is making it more-- look, this is why I joined the company, in a sense. I'm really looking forward to that iPhone moment in this industry, where the right technologies come together and a "dominant design," so to speak, is developed.
Alan: Facebook is now working on AR, and you've got Magic Leap and Microsoft's HoloLens. They've kind of set the bar pretty high. The HoloLens was an amazing piece of tech, and now there's HoloLens 2. And with Magic Leap, people are going to expect that when they walk around holograms, they stay put. They stay rock-solid, steady, attached to the real world, which is what really gives you that mixed reality experience. But that's not easy to do. I think a lot of these companies are saying, "Here's a heads-up display; you can see things in mid-air." But they're not thinking about the tracking. They're not thinking about object recognition. They're not thinking about the full tech stack that's required for truly pervasive augmented reality. Then you've got glasses like the North glasses, which just give you a little heads-up display, like your Apple Watch display, and I think those are going to have a great place, too. But let's get back to hand tracking, because this is a vital part. I know Oculus Quest just introduced hand tracking. And I just read an article today saying the Quest made up over half of all the VR headsets sold in the world last year.
Alan: Yeah, it's a game changer. And they introduced hand tracking. As soon as we start to see apps that have that built in, that will be the new standard, the new normal, just like standalone, wireless, untethered VR is now the new normal. We're not going backwards. We're not going to say, "Hey, let me connect this to a big computer with cables and stuff." Nobody wants that anymore.
Varag: Right, exactly. And you know what, Alan? I think it's already becoming the standard. You're right that it's not the standard yet among users and consumers, but it is among the OEMs, and they're who I speak with a lot. They want the same thing, ASAP.
Alan: What's the next step for hand tracking, then? You're able to track hands very precisely. I guess Leap Motion would be a competitor to you guys, even though they're using a hardware solution to do that. But what about things like midair haptics, with Ultrahaptics, or Ultraleap now? These things are really foundational and interesting.
Varag: Yeah, I think there are still some things to be solved even in basic, marker-less, hardware-less, computer vision-based hand tracking. Things like occlusion, field of view, and compute all need to get even better. What I mean by that is, if you take your hands out of the field of view and quickly bring them back in, what does that activation time look like? How quickly do you go from not recognizing that hand to recognizing it? In particular, a lot of people are now using machine learning models, so it's about optimizing those models and making sure there are fewer false positives, for example when there are other hands in the field of view. And then from a compute standpoint, it's making sure it all works without burning through the battery. That's an ongoing challenge, especially if you're not a huge multi-billion-dollar company like Facebook with all the resources they have to stitch it all together in the best way. Companies like ours are getting better and better at doing that and offering that. Once that's solved, or in tandem, you're right, there are some cool things you might be able to do with haptics. One thing you notice is that as our technology gets better and better, and you really feel like your hands are there, it becomes more confusing when you actually go to touch something in the virtual world, especially in VR, and you don't get any sensation back. It's really confusing to your brain. So having something like that, even a small form of feedback, could be really, really helpful. That said, at least for consumer VR, I'm against adding any extra hardware.
I think it's hard enough for the regular consumer, even the early majority of consumers, to adopt VR. We don't want to make them wear anything more on their hands; even controllers, to me, can sometimes be excessive or too much of a requirement.
Alan: It's interesting you say that, because I actually got to try something at CES last year, and it was these little sensors that went over my fingers, almost like a pulse oximeter.
Alan: And they just gave slight haptic feedback on the tips of your fingers, and it turned out to be quite convincing. We didn't need a whole haptic glove or anything. It was just a very small sensor that clipped onto your finger. I was wired in, but I'm assuming it could be Bluetooth or whatever. It was enough sensation that I actually felt it was real. I reached out and touched a fire, and it buzzed, and I jumped back. [chuckles] I felt like an idiot. It freaked me out.
Varag: It's pretty cool. Yeah. I even met a company not too long ago; unfortunately I'm blanking on the name right now, but it was at VRX the last time I was there. They were working on something really interesting with haptics: changing what the haptic feedback felt like, in both frequency and intensity, based on what you were touching in VR. I thought that was really cool. I'm looking at a desk right now, and touching a desk in real life feels very different from touching this cup right here. So how do you bring those differences into VR too, along with hand tracking? That's super, super exciting. I think it definitely makes sense now, as you know, to do all this in the enterprise space, but whether it can roll into consumer, where prices need to be lower and things need to be even easier to use, is yet to be seen. Maybe there needs to be something embedded in the headset, with ultrasound around it or something like that. That's pretty far out there, though, I think.
Alan: Yeah, I don't know. There are going to be so many use cases for this. I'm just thinking of the different ways hand tracking can be deployed: use cases from e-commerce and being able to see products in your hands, but also just interacting with the real world in a natural way. One thing that always stuck with me is from when HoloLens 1 came out. When you showed a kid something, they just picked it up and did it. But when you showed anybody over about 30 that clicking gesture in HoloLens, where you had to reach out and do this kind of thumb-and-finger click, nobody could figure it out. Everybody just reached out naturally and tried to push or touch things. I believe they've addressed that in HoloLens 2. But there are going to be so many different devices out there, and I think you guys made a really good decision in putting it at the chipset level and making it available as that reference design for everybody.
Varag: Yeah, it's paid off so far, in the sense that when we go to OEMs or companies that are building this hardware, or even those that are buying it, they love the fact that we understand the architecture; the software itself understands the rest of the architecture. And we've got a version we can put in there that, most importantly, interoperates with everything else. There are just so many other functions on these devices, and sometimes we're literally using the same cameras that are being used for those other functions. So we need to respect those as well and make sure everything works harmoniously together to enable those use cases.
Alan: Indeed. So speaking of use cases, let's dig into some of them. I've got a pair of AR glasses or VR goggles on. What are some of the use cases where I'd want to use gesture recognition? I'm looking at the gesture recognition page on your site here. You have flip, grab, up, down, swipe, pinch, victory, point, all these things. But what are some practical use cases that people can wrap their heads around?
Varag: The first thing I'd like to say is that I think of hand tracking, gestures, and some of the other input methods as the keyboard and mouse of the PC era: basic navigation around the device. And I think it's not going to be only hand tracking; it will be all of those together in some multi-modal form. So number one is just getting around the device as easily as possible. When you first put that headset on, you don't necessarily want to reach for the controllers or something else every time you want to navigate the device. So that's first and foremost. Then there's in-app control as well. Obviously for gaming, which is more on the consumer side, you can be shooting out of your hand; we've done applications like that before. But especially on the enterprise side, where these devices are used a lot today, I see things like communicating with the device, or using grab gestures as shortcuts, making a lot of sense. Sometimes it's just capturing a screenshot of what's going on: you might use one of those gestures to take a screenshot, or capture an image of what you're looking at in AR. But what definitely comes up a lot, for gestures at least, is shortcuts. If there's something you do very often and want to do very quickly on the device, using a gesture for it can make a lot of sense, especially when other inputs are just not convenient. With voice, for example, if there's a noisy background, you might want to just use your hands instead. Or in the enterprise, if you've got gloves on or something, you can't necessarily use touch on the device itself, and you might want to use a gesture instead.
Those shortcuts can include things like muting and unmuting, changing the volume, switching apps, and waking the device up or putting it to sleep, and so on.
Alan: Gestures as shortcuts are going to be amazing, because, like you said, multi-modal is going to be really important: being able to use voice, being able to use gaze, understanding what you're looking at. Once we have eye tracking, this gesture recognition, and voice, it's just going to be "Show me that thing," and it knows you're pointing, it knows what you're looking at, it knows what you said, and it gives you what you want.
Varag: And Alan, one thing I think about a lot, because I like to think several years ahead (it's fun), is what happens when all those things come together. Some people might say this is creepy, but just imagine what you can observe about what a user wants and what they're doing. Today we look at someone's behavior in an app, on a phone, or on their laptop, and you can track a lot through the sensors onboard. But just imagine what you could track about where someone's eyes are looking in a given scene, how long they're looking at a certain ad, or, specifically, what they're doing with their hands. In some cases we track at least 22 points of interest, and we can see where they are in 3D space relative to everything else in 3D space. That, I think, is really interesting for capturing the intent of what a user is doing. Super exciting stuff. How you might monetize that data at some point is interesting too. But just the fact that you can understand what a user wants from how they're moving their hands around... I think hands are a really natural, expressive indicator of what a person is doing.
Alan: Indeed. I wonder if you're going to have people flipping the bird, and can you edit that out, so it doesn't do it? [laughs]
Varag: [laughs] That could be added. Like every time that's read--
Alan: "This user needs help." And support comes up. [laughs] "I see you're having troubles. Can I help you?"
Alan: Awesome. So let's get practical here. If you're going to be building this into all sorts of things, not just VR and AR headsets but automobiles and that sort of thing, why would I want gesturing or hand recognition in a car?
Varag: That's a good question. We're seeing that in the automotive space, for example, that-- actually, I'll make it broader than automotive. Anywhere the rules of how you interact with a device are either being written for the first time or being rewritten, I think that's where gestures and hand tracking are interesting. In the automotive space, I think a lot of players are trying to figure out what the future looks like, especially with autonomous driving coming; I think that's part of what's driving it. That means passengers and drivers have an opportunity to have a different experience of how they interact with the car, and that's where something like gestures comes in. Even before things are autonomous, actually: one of our clients in that space is Renault, and I think one of the key drivers there is safety. The use case there is drivers making gestures to control the infotainment system. For any function of the infotainment system that takes too many button clicks to get to, you can, again, have a shortcut to the feature. Or the button you want to press is just too far away, and there's too much reach. That seems like a small thing, but when you're driving, every millisecond matters. It's about making things a little more convenient for the driver, so they can mute, unmute, answer a call, end a call, zoom in, zoom out, and so on. We can map gestures to the functions that make the most sense for that specific car. And it's not just for convenience, but also for safety: the more we can reduce drivers' cognitive load, the safer they are. That's the idea.
Alan: Yeah, I guess that makes sense. So what's next for you guys then? You've partnered with Qualcomm. You're gonna be kind of at the root level of all of these new devices. What's next then?
Varag: What's next is continuing that work with Qualcomm and really trying to onboard more and more of their customers. This year we've been focusing on bringing on Tier 1 companies that are building AR/VR devices. In almost every case, these device makers make little tweaks and changes to the reference design they get. For example, they might use different kinds of cameras. So there is some work for Clay to onboard them the right way and make sure the hand tracking works well with all the other functions, and we'd like to own that process a little and be involved. I think this will be the year for us, because our tech has never been as ready as it is today. What I mean by that is that we're now interoperable with the cameras that are most available on the market, including the monochrome cameras used for six-degrees-of-freedom tracking. So we can work with those, and the next step is just bringing on those big customers. I think the demand is going to be there this year more than ever. That's what I'm excited about, mainly because we have software that runs on the right hardware more than ever. And I think Oculus launching hand tracking as a feature proves that it's important, and now everyone else wants it too. So that's what I'm excited about for AR and VR in 2020, for sure. From a technology standpoint, we're always improving; that's never going to stop. The key thing we'll be driving there is more and more detailed and accurate tracking, while at the same time monitoring how much of the CPU, GPU, or DSPs onboard we're really consuming, and optimizing like crazy, because we don't want to be--
Alan: You guys don't want to be the one that sucks up your battery.
Varag: Totally. Yeah. We want to be the one where people say, "That works better than anyone else," and no one says you're too expensive.
Alan: I have to ask, because I'm on your new website (it's not published yet, but it'll be out soon), and you have here, "scalability: one to four hands." I'm not sure what that's all about, but alright!
Varag: [laughs] So maybe that applies to, like-- [laughs] maybe that applies to the other use cases. We have three product categories; we've got Clay Reality for all the AR and VR stuff. But as for one to four hands, four hands could actually make sense in situations where you've got a really, really big screen at big shows, things like that, where you might want to have multiple people and multiple hands.
Alan: I was also thinking of being able, in virtual space, to see other people's hands come into the space, and hand them things.
Varag: I haven't seen that come up yet, but I could see it coming, for sure. So, yeah, absolutely.
Alan: Very interesting. What is one problem in the world that you want to see solved using XR technologies?
Varag: It's a good, good question. I think there's a lot that AR and VR can solve. What I'm personally most excited about (this is less about Clay and more about me) is the problem of distance. That's very general, but what got me super excited about XR in the first place was this idea of two people who are far away from one another. I know we can connect now through a lot of different mediums, but I think those mediums really only get us halfway there. And strangely, being halfway there makes us less likely to actually be there, in person, face-to-face. I actually despise that about social media and all the other technologies that help us connect today; I feel like people are lazier than ever about really connecting. What I'm excited about is that AR and VR are the best candidate for a technology to actually get us 90 percent of the way there, maybe even close to 100. A lot of problems obviously get solved by being able to do that. One is just connecting better with people. But all the problems we work around today by not doing that well enough will be solved too. If it's seeing a loved one more often and feeling like you're present with them, that's great. If it's trying to do business with someone who's really far away, feeling present with them in a room, like you can in VR, is pretty amazing. So that's what I'd say.
Alan: It's a really good answer, and I think it's a great way to position XR as a medium by which we can create a more global unified world.
Varag: Absolutely. I remember the closest thing I did to that. I haven't done enough multiplayer or multi-user VR experiences lately, but a while back, maybe a year and a half ago, I tried some software called VRChat; you may or may not have heard of it.
Alan: Yeah, absolutely. Back maybe three years ago, I was in Gunther's Universe, which was one of the VRChat rooms, and I was interviewed there. It was super fun.
Varag: Yeah. And I just remember sitting there (I'm sure it's much better today than when I tried it) and thinking, "Oh my God. Wow, I really feel like I'm here with someone who's a stranger." It was kind of strange, because it creeped me out a little bit, in the sense that I thought, "Wait, I'm in a room with this person, but I don't know them." And strangely enough, I didn't have all the information you get in person. I love networking and meeting strangers all the time, and when you're in person, you get 100 percent of who they are: you see them, interact with them, shake their hand. But in VR at that time, three years ago, you were only halfway there.
Alan: The first time I had that was with Altspace. I was lying in bed, and I put on Altspace. It was with the Gear VR.
Alan: And I was just walking around, kind of talking to myself, like, "What is this silly thing?" And somebody stood beside me and said, "Hey, it must be your first time in here." And I couldn't figure out that they were talking to me. I was like, "What?!"
Varag: That was your first time. That's an interesting note, for sure.
Alan: Yeah, and it's funny because up until recently, Altspace really didn't improve. But I was in there the other day, and it actually looks like they've made some real improvements, which is fantastic to see. Everything's getting better.
Varag: Absolutely. I've got to check that out.
Alan: And then the new one that we really love is Meet In VR. It's a Danish company. And they've really nailed the interactions for corporate clients, for enterprise, because it allows you to be in a VR chatroom. But you have access to photos, you have access to 3D models. You can write on the walls, you can have conversations. It's-- I don't know. To me, it's the most comprehensive business tool for communication, it's really great.
Varag: That's a big one for me, because I'm on the business side of Clay, which means I travel around a lot. I actually like traveling a lot; I don't mind it too much. But being able to do that quickly in virtual reality would be amazing. I've got to try that out; it hasn't happened for me yet.
Alan: Indeed. Well, Varag, I want to thank you again for taking the time out of your busy schedule to join us on the show.
Varag: Thanks for having me, Alan. This is fun.
Alan: Absolutely. Where can people find out more information about Clay and the work you guys are doing?
Varag: They can go to www.clayair.io. They can also reach out to me by email; it's my first name, [email protected]. We have a new website coming soon, and we'll be announcing that on LinkedIn, social media, and all that good stuff. Alan, I love what you do in the industry. Please keep it up, and keep in touch.
Alan: Thank you very much. And you know what? I'm going to take this opportunity then to tell people to subscribe to the podcast. I always forget this part and people keep telling me, "You've got to tell them to subscribe." So, subscribe to the podcast and you'll get more XR for Business deliciousness, all the time.