ATU630 – Envision AI Updates with Karthik Kannan

Your weekly dose of information that keeps you up to date on the latest developments in the field of technology designed to assist people with disabilities and special needs.
Special Guest:
Karthik Kannan – CTO and Co-Founder – Envision
BridgingApps: www.bridgingapps.org
——————————
If you have an AT question, leave us a voice mail at: 317-721-7124 or email tech@eastersealscrossroads.org
Check out our web site: http://www.eastersealstech.com
Follow us on Twitter: @INDATAproject
Like us on Facebook: www.Facebook.com/INDATA
—– Transcript Starts Here —–
Karthik Kannan:

Hi, my name is Karthik Kannan. I’m one of the founders of Envision, and this is your Assistive Technology Update.

Josh Anderson:

Hello and welcome to your Assistive Technology Update, a weekly dose of information that keeps you up to date on the latest developments in the field of technology designed to assist individuals with disabilities and special needs. I’m your host, Josh Anderson with the INDATA Project at Easterseals Crossroads in beautiful Indianapolis, Indiana.

Welcome to episode 630 of Assistive Technology Update. It is scheduled to be released on June 23rd, 2023. On today's show, we're excited to have Envision AI back to give us some updates on some of the really cool features they now have. We've also got BridgingApps on with an app worth mentioning. A lot of things to cover today, so let's go ahead and get on with the show.

If you're out there and you're looking for more great content on assistive technology, head over to our website at eastersealstech.com. Over at eastersealstech.com, you can find not just this podcast, along with transcripts of it, but also our sister podcasts, ATFAQ and Accessibility Minute. Accessibility Minute, hosted by Laura Metcalf, is a very quick little taste of something assistive technology based, while ATFAQ is a question and answer show where we do rely on your questions.

So be sure, if you do have questions about assistive technology, to get those over to us so that maybe we can actually answer them on the show. But there's so much more than just the podcasts there. You can also find links to our YouTube channel, where you'll find tech tips that come out every Monday about different pieces of assistive technology. These short little three to five minute videos will just show you what a piece of technology is, how it works, and how it might benefit an individual with a disability.

You can find consumer highlights, stories, and blog posts, as well as all the services of INDATA and our clinical assistive technology, all outlined right there on the page. If you're looking to find your local Assistive Technology Act here in the United States, you can go to eastersealstech.com/states and it'll get you over there to find it. So really, for a lot of the things you might need assistive technology wise, check us out at eastersealstech.com: for more shows like this, for videos, for blog posts, for consumer stories, heck, even for pictures of the whole darn team. You can find those over there as well.

So thanks as always for listening and don’t forget if you’re looking for even more content, check us out at eastersealstech.com. Next up on the show, please join me in welcoming back Amy Barry from BridgingApps with an app worth mentioning.

Amy Barry:

This is Amy Barry with BridgingApps, and this is an app worth mentioning. This week's featured app is called myNoise: Relax, Sleep, Work. myNoise is a free app that covers up sounds you don't like with sounds you do like, sort of like a customizable noise-canceling machine. The app is designed to help with sleep, relaxation, or work. We used the app in multiple environments, mostly to block out distracting noises while working in an office and working from home. Leaf blowers, lawnmowers, screaming kids, and loud talking were easily drowned out by selecting a noise generator, typically rain or white noise. It can be used with or without headphones, which was also very helpful. Using the soft piano or nature sounds with headphones also helped with concentration and relaxation. Using it to take a 30-minute break between work tasks or to clear the mind was also helpful.

Anyone who has difficulty concentrating due to noise distractions will want to try this app. People looking for a tool to help take a mental break or to relax will like the variety of sound options available to them. Kids, teens, adults, and even older adults may all benefit from trying this app. Noise pollution and distracting sounds can be a real problem in our busy world, so this is a tool that can be used to focus the mind and mask some of those distractions. myNoise is currently available for iOS devices and is free to download with optional in-app purchases. For more information on this app and others like it, visit BridgingApps.org.

Josh Anderson:

Listeners, back in August of 2021, we were very excited to welcome Karthik from Envision on the show to tell us all about Envision and the Envision Glasses. Well, today we're even more excited to welcome him back to the show to talk about their new Ask Envision feature and the other great things from Envision. Karthik, welcome back to the show.

Karthik Kannan:

Thank you so much, Josh, for having me. I really appreciate you guys having us on the show. Thank you.

Josh Anderson:

And I really appreciate you taking time out to come on the show. But before we get into kind of talking about Envision, the new Ask Envision feature and everything else, could you tell our listeners a little bit about yourself?

Karthik Kannan:

Sure. So my name is Karthik Kannan. I am the founder and CTO at Envision. So my life revolves around Envision and working on the Envision product itself. So I spend a lot of my time working on the technology that goes into Envision’s products, the AI, and I work closely with the design team at Envision. I also spend a lot of my time talking to end customers directly, either through platforms like these or on online forums or just meeting them face to face and getting to know about how they use the product and so on.

So my job is probably the most fun job ever, at least according to me because I get to really play with the coolest of tech and I get to spend every minute of my waking hours trying to think of how we can take those and apply it to help people in the blind and visually impaired community. And for me, that’s just super rewarding. So that’s a little bit about me, about my job, and that’s who I am.

Josh Anderson:

And that's awesome, and that's great that you get to see it from both sides: being able to develop it, but also seeing it in people's hands. Doing all of that together makes it a little bit more rewarding and probably makes those long hours a little bit easier to bear as well.

Karthik Kannan:

Yeah, no, definitely. I think that is the most rewarding thing. A lot of people ask me, "How do you continue to do this?" We've been doing this for almost seven years now, and I think I could do this for another 70, to be honest with you, because we work on this stuff and put it out, and then people react to it in such heartwarming ways. People make it part of their lives. They use it to read greeting cards written by their grandkids, or sometimes they use it in their art.

People make videos about their glasses, or they make it part of their YouTube videos, or they write songs about it. And to me it's incredibly heartwarming to see that something I've been a part of and contributing to is literally woven into the lives of people and impacting them in a positive way. So that's the biggest source of strength and energy. It's like having 10 Red Bulls every day.

Josh Anderson:

That is great. And I can hear and just feel your energy even over Zoom right now, so that is absolutely awesome. Before we really dig into the new Ask Envision feature, let's get just a little bit of background for our listeners who aren't aware: what are Envision and the Envision Glasses?

Karthik Kannan:

Sure. So the Envision Glasses are basically a pair of smart glasses with a camera and a speaker in them. What they do is help you translate the visual world around you into speech. They basically take images of things around you, extract information from those images, and speak it out to you. And here the information could be text, say you're looking to read a menu card at a restaurant, or it could even be the faces of friends and family members. It could be getting a description of what is around you, or detecting certain objects in your environment. So the glasses are capable of dealing with all types of visual information: text, objects, faces, and so on. And it's all in the form factor of a regular pair of sunglasses.

So you just wear them on your face, then you go ahead and interact with the world, and it basically converts all of that into audio for you. So that's the Envision Glasses. We also have the Envision app, which does a lot of what the Glasses do, but it's a completely free smartphone app that can help you read text, recognize faces, and even read PDF files. It's available on both iOS and Android completely for free. So those are the two products that Envision makes: the Envision Glasses and the Envision app.

Josh Anderson:

Awesome. And the reason we have you on here today is a really cool new thing you have there at Envision called Ask Envision. Now what is that?

Karthik Kannan:

So one of the main things I believe the Envision Glasses help people with visual impairments do is get access to information really quickly. Back in the days before the Envision Glasses came on the scene, if you wanted to read a document, for example, you had to buy yourself one of those fat OCR scanners that you'd put on a desk and couldn't really take anywhere. You'd scan the document with that and hope the output came out right, and so on. That was before the Envision Glasses. Then, when the Envision Glasses came around and AI started to really take off, people could just take a picture with the glasses and it would read out all the text to you. Now the Ask Envision feature is the next step.

Say you take a picture of a menu card, for example, and you want to know what the appetizers are on that particular menu. Earlier, with the Envision Glasses, you had to scroll through the entire menu, or jump through the headings, get to the appetizers, and listen to them. Now, with the Ask Envision feature, you can quite literally ask the Envision Glasses a question. You can ask it, "Okay, tell me all the vegan appetizers on this menu," and the glasses will understand the text, understand the question, and then go ahead and answer it for you. So this basically means you don't have to scroll through a wall of text anymore. You can just ask it questions the way you'd ask a human being, and it will give you the answer back.

And this is made possible because, as a lot of tech-savvy users might have heard in the last few months, there's a really groundbreaking piece of technology called GPT that helps you ask questions and get answers in a very easy way. We've incorporated the GPT-4 API by OpenAI into the Envision Glasses. What it basically does is understand the text you give it, understand the question, and help you answer pretty much any question you have about the text you just scanned with the Envision Glasses. So this is the new Ask Envision feature, and it's actually a huge game changer in terms of productivity, in terms of that very natural way of asking a question and getting an answer.
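
For the technically curious, the pattern Karthik describes, pairing scanned text with a question and keeping the model pinned to that text, can be sketched in a few lines of Python against OpenAI's chat completions API. This is only a rough illustration, not Envision's actual implementation: the function name, prompt wording, and the assumption that OCR has already produced the text are all invented here for the example.

```python
# Illustrative sketch only -- not Envision's code. Assumes an earlier OCR
# step has already turned the camera image into `scanned_text`, and that
# an OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

def ask_about_text(scanned_text: str, question: str) -> str:
    """Answer a user's question using only the scanned document text."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            # Pin the model to the scanned text, echoing Karthik's point
            # later in the interview about not letting the AI run wild.
            {
                "role": "system",
                "content": (
                    "Answer using only the document below. If the answer "
                    "is not in the document, say you cannot find it.\n\n"
                    f"Document:\n{scanned_text}"
                ),
            },
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

# The menu example from the interview:
# ask_about_text(menu_text, "Tell me all the vegan appetizers on this menu.")
# The same call covers translation: "Summarize this German letter in English."
```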

Josh Anderson:

Oh, for sure. And I always think of bills, just because I know a lot of individuals do still get some bills on paper, and with any kind of scanning, it's not always easy to figure out exactly where the amount due is or where the due date is, because they're all in different columns and kind of scattered all over the place. You just have to wait until you hear something and hope that's the right part. But with this, I could just ask it, what's the due date? And as long as it's close to something that says due date, it could just rattle that right off to me. Is that pretty correct?

Karthik Kannan:

Exactly. And it's extremely powerful in so many other ways. For example, you could scan a document that's in German, ask a question in English, and the glasses will actually understand the German text, understand the English question, and give you the answer in English if you want. You can ask it to translate documents in over a hundred different languages, and this GPT tech is able to do that.

So it goes beyond just trying to match the question to the text. It's able to actually understand the text and help you make sense of it. So it's extremely powerful, and the wider world is still trying to figure out what the use case for such a technology is. But when we saw this, we immediately knew that if we just took this technology and paired it with the Scan Text feature on the Envision Glasses, it could help people do things much faster. And we're already seeing a lot of people use the glasses in the workplace, for example, where they're able to do tasks literally 10x faster than before. It's incredibly fascinating. Yeah.

Josh Anderson:

Oh yeah, it really is. And I never even thought about the translation, but man, if you're overseas or the signs aren't in your native tongue, that'd be so great to have, because you could actually know what they all said just by simply asking. That's really awesome. And Karthik, I know it can also do some fun things, just because I did a little bit of research on this. I saw you demonstrate a little bit of making it rap. So can you tell us some of the fun things it can do? I know it can do tons, but I happened to see you demonstrate that and got a pretty big kick out of it.

Karthik Kannan:

Yeah, no. So like I said, let's say you've just scanned a boring legal document. You're going through it, and just for kicks, you say, "Okay, can you rap this document to me in the style of my favorite rapper?" In my case it was Dr. Dre. I took a scan of this document, and it was just some dry information about a room and what size the room was and so on. And I just said, okay, can you rap this in the style of Dr. Dre? And the AI gave the whole thing back to me in the form of a rap song that could have been written by Dr. Dre.

You could ask it to write you a poem based on the document you just scanned, in the style of your favorite poet. Or you could say, "Can you make a dad joke out of this document you just scanned?" It can do so many things. For example, I scanned the manual for the airbag in my car and said, okay, can you make a dad joke out of this? And it gave me this really funny joke: "Why did the airbag's girlfriend leave him? Because he was so full of himself."

Josh Anderson:

Nice.

Karthik Kannan:

It's a perfect dad joke, right? And yeah, it can do so many things like these. Literally, the possibilities are limitless, and we're trying to make this not just really easy to use, but also accurate. Because what we're trying to do is take this technology and, instead of letting it run wild, focus it so that it only sticks to the text it has been shown. That way it's able to deliver something accurate based on the text you just shared with it, and at the same time, you can do it in such an easy way. And if you want to really have fun with it, you can go have fun with it.

Josh Anderson:

That is awesome. And again, I just love that, because if you're just scanning text, like you said, it can get boring after a while, so being able to liven it up a little can always be really helpful. And if I can ask you, Karthik, just looking to the future, because I know Envision has been using AI for a while and this is a big step that has just happened lately: where do you see this all going as far as the integration of AI and AT in the future? And feel free, if we look back at this in 10 years and you're completely wrong, no one's going to say anything. I know the sky's kind of the limit, but do you see any trends or any way it might be heading?

Karthik Kannan:

Sure. In the traditional way of interacting with a product, especially in the AT world, everything is a button, and you click on a button if you want something. Even today in the Envision app, let's say you want to detect color; you have to go to this specific feature called Detect Color and activate it. Or if you want a description of the scene, you have to go to the Describe Scene functionality and activate that. So we've always been used to products where we interact with buttons and menus and things like that. But with where AI is going, I basically see nothing but a microphone, a camera, and a speaker. That's it. That's all you need, and the only input is conversation. Conversational interfaces have been around for quite some time; everyone's used to Siri and voice assistants like Google Assistant or Alexa.

This takes it to the next level, where you don't just work with audio and text; you're able to work with images and video as well. So in the future, you could just say something like, okay, tell me what's in front of me. Or you could hold up a document, and the AI automatically understands that you want to read it. Or you could hold a menu in front of the glasses, and the glasses know it's a menu, and you can just directly start asking questions instead of having to go into a specific feature and activate it and so on. And I see in the future, and this could happen maybe in the next two to three years, a world where there are literally no menus. All of computing is just a camera and your voice, and you can converse with the world just like you converse with another human being.

That kind of technology used to be sci-fi. Ten years ago, five years ago, if somebody had told me this kind of tech was on the horizon, I would've laughed them out of the room. But having been around for these last five, six years and seeing the rapid progress that's happened in the past year, I think we're definitely closer to that kind of a world. And when that happens, what's most exciting to me is that everybody will be more or less on a level playing field. Because for the longest time, people in the blind and visually impaired community have been asked to do 21st century work with 19th century technology. And especially in the AT space, it's always been products that are clunky, and it's been pretty stagnant, to be honest.

But with AI coming into the picture, all of a sudden the AT space has the same or better tools than the wider public. And over a period of time, I see something like smart glasses going mainstream. When that happens, and these conversational AIs become really smart and combine the visual world with audio and text, I think it's going to be so much easier for someone in the community to do things. It's just going to make them really independent and productive. And that's what's most exciting to me about the future.

I also think AI is going to become a lot more personalized. That's also a future we're working towards at Envision, where we believe that today's AI is very generic. You're given an AI that's trained on millions of images, but it doesn't know a lot about your own surroundings. What if you could teach an AI not just to recognize the objects it already knows, but to recognize new ones? Even if your AI model can't recognize AirPods, what if you could teach it to recognize your AirPods case? What if you could teach it to tell your favorite coffee mug from your wife's coffee mug?

That's the kind of future we're heading towards. That's what we're working towards at Envision, and that's what I'm most excited about as well. So ease of use and personalization: these are the two things that are going to come together. And to be honest, without a lot of exaggeration, the future is extremely exciting and bright. Yeah, I can't wait for it.

Josh Anderson:

Oh, for sure. And like you said, it's just amazing how far it's come in just the last few years: the ease of use, just being intuitive. If I think of the others, you mentioned Siri and some of the rest, you've got to kind of know how to talk to them. You've got to know those keywords; you've got to know how to get it to do things. There's a lot of "I don't understand what you want me to do" and "I didn't understand that" kind of thing. So the newer AI is intuitive: you can just use plain language, say what it is you want, and it knows what you want and kind of goes from there.

So yeah, that makes just a huge difference, because as you said, especially in the blind and low vision space, I've got to swipe, I've got to tap, I've got to move, I've got to do this, I've got to do that. There are a lot of steps involved. For some individuals who are tech savvy, that's not that big of a deal, but for others, that's just another barrier to being able to access things. So that ease of access is just a huge one. And I love the whole personalization part. It would be absolutely wonderful if it could, like you said, just tell things apart, tell all this different stuff, and really give you more and more information about the world around you. That would be absolutely awesome.

Karthik Kannan:

Yeah, no, that's exactly what it's going to look like. And we actually have a feature we're working on internally where we're basically combining the power of the whole GPT AI with images. So you could take a picture of something and just ask it questions about the image itself. I could take a picture of you and say, okay, can you describe what is in front of me in as much detail as possible? And then the glasses actually start giving you a lot of detail. Or you could say, okay, can you tell me what Josh looks like? What is he wearing? What's the color of his hair, for example?

So you could ask it questions about the picture you just took. We're doing a few tests with users internally, and they absolutely love it. So I think a lot of that stuff is going to come in the near term. We're going to be pushing those things out in the coming months, hopefully this year. It's going to be a huge upgrade to the glasses in terms of what the AI is capable of doing.
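
Again for the curious, the image question-and-answer flow Karthik previews here corresponds to what vision-capable chat models expose today. Here is a minimal sketch under the same caveats as before: it is not Envision's implementation, the model choice and function name are assumptions, and it simply sends a camera frame plus a question to the API.

```python
# Illustrative sketch only -- not Envision's code. Sends a photo and a
# spoken question to a vision-capable chat model and returns the answer.
import base64
from openai import OpenAI

client = OpenAI()

def ask_about_image(image_path: str, question: str) -> str:
    """Answer a question about a photo, e.g. 'What is Josh wearing?'"""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable chat model would do
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                # The image is passed inline as a base64 data URL.
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

# ask_about_image("frame.jpg", "Describe what is in front of me in detail.")
```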

Josh Anderson:

Oh, that is great. That is absolutely awesome. And Karthik, we can't wait to see all those different features and everything else, because Ask Envision is definitely a huge step forward, and I'm sure, from the few times I've been able to talk to you, it's just going to continue to snowball and get bigger and better and better. If our listeners want to find out more about Envision, about Ask Envision, about the glasses, what's the best way for them to do that?

Karthik Kannan:

So the best way for people to find out more information about the glasses, and even get a demo of them, and I would strongly encourage everyone listening to this podcast to actually try out the glasses for themselves, is to go to our website, letsenvision.com, that's L-E-T-S-E-N-V-I-S-I-O-N dot com, and request a demo of the Envision Glasses. Someone from Envision will get in touch with you and walk you through all the features of the glasses, and if you have any questions, you can use that time to ask them as well. So that's our website. Find out more information about the glasses and the app, and if you like it, just go ahead and request a free demo of the Envision Glasses. We'd love to take them for a spin with you.

Josh Anderson:

Awesome. We'll put that down in the show notes so that folks can easily access it. Well, thank you so much for coming on today, for talking about not just Ask Envision, but kind of where things are going with tech. And I must admit that all your excitement, not just about the product you have but about the work that you all do, is a little bit contagious too. So hopefully that gets out there to the listeners as well. But Karthik, thank you so much for coming back on the show and telling us about these really cool new features and the really cool new Ask Envision feature that goes along with the Envision Glasses. Thank you again.

Karthik Kannan:

Oh, thank you so much. Thank you so much, Josh, for having me. Thank you.

Josh Anderson:

Do you have a question about assistive technology? Do you have a suggestion for someone we should interview on Assistive Technology Update? If so, call our listener line at (317) 721-7124. Send us an email at tech@eastersealscrossroads.org or shoot us a note on Twitter @INDATAProject.

Our captions and transcripts for the show are sponsored by the Indiana Telephone Relay Access Corporation or InTRAC. You can find out more about InTRAC at relayindiana.com. A special thanks to Nikol Prieto for scheduling our amazing guests and making a mess of my schedule.

Today's show was produced, edited, hosted, and fretted over by yours truly. The opinions expressed by our guests are their own and may or may not reflect those of the INDATA Project, Easterseals Crossroads, our supporting partners, or this host.

This was your Assistive Technology Update, and I’m Josh Anderson with the INDATA Project at Easterseals Crossroads in beautiful Indianapolis, Indiana. We look forward to seeing you next time. Bye-bye.
