A couple of years ago, the world of mobile apps was shaken by the appearance of the Word Lens app, which detected and translated words in a live camera view. Even though the mobile phones of the time had quite limited camera capabilities, what Word Lens showed us was the future of translation. Naturally, revolutionary apps like that do not just disappear, as Google proved by acquiring the app's maker, Quest Visual.
Word Lens for Google Glass was one of the most impressive things I have personally used on Google's first attempt at augmented reality, as you can see in the video below:
Now, more than a year has passed since the acquisition, and the first results are in. Google just released a new version of its Google Translate app, and it is nothing short of impressive. While Google Translate could originally detect and translate text in photos in seven languages, the new update brings the total to 27 languages:
English, French, German, Italian, Portuguese, Russian, Spanish, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, Filipino, Finnish, Hungarian, Indonesian, Lithuanian, Norwegian, Polish, Romanian, Slovak, Swedish, Turkish, and Ukrainian
This number is nothing to be sneezed at, and the algorithms can now translate a picture from Ukrainian to Croatian, Turkish to Norwegian, and many more pairs. Using Word Lens technology and dramatically improved deep neural networks, software engineers now utilize the parallel computing offered by GPGPU products such as Nvidia Tesla – or, less efficiently, Xeon Phi accelerators – to 'learn new languages'. You can read more about Google's data centers here. Why is this important? Because it works in the real world:
In a blog post on Google Research, Otavio Good explained how they pushed both sides – the front end (mobile phones, tablets, etc.) and the back end (data centers) – to accelerate recognition beyond what we might have expected:
“We needed to develop a very small neural net, and put severe limits on how much we tried to teach it—in essence, put an upper bound on the density of information it handles. The challenge here was in creating the most effective training data. Since we’re generating our own training data, we put a lot of effort into including just the right data and nothing more. For instance, we want to be able to recognize a letter with a small amount of rotation, but not too much. If we overdo the rotation, the neural network will use too much of its information density on unimportant things. So we put effort into making tools that would give us a fast iteration time and good visualizations. Inside of a few minutes, we can change the algorithms for generating training data, generate it, retrain, and visualize. From there we can look at what kind of letters are failing and why. At one point, we were warping our training data too much, and ‘$’ started to be recognized as ‘S’. We were able to quickly identify that and adjust the warping parameters to fix the problem. It was like trying to paint a picture of letters that you’d see in real life with all their imperfections painted just perfectly.”
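The blog post does not include code, but the core idea in the quote – synthetic training examples with augmentation deliberately clamped to a small range, so the tiny network does not waste its limited capacity on extreme poses – can be sketched roughly as follows. This is an illustrative sketch, not Google's actual pipeline; the function names, the 9×9 glyph, and the ±10° bound are assumptions for the example.

```python
import random
import numpy as np

def rotate_glyph(glyph, degrees):
    """Rotate a small binary glyph image by `degrees` using
    nearest-neighbour sampling around the image centre."""
    theta = np.deg2rad(degrees)
    h, w = glyph.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    out = np.zeros_like(glyph)
    for y in range(h):
        for x in range(w):
            # Inverse-map each output pixel back into the source image.
            sx = cos_t * (x - cx) + sin_t * (y - cy) + cx
            sy = -sin_t * (x - cx) + cos_t * (y - cy) + cy
            si, sj = int(round(sy)), int(round(sx))
            if 0 <= si < h and 0 <= sj < w:
                out[y, x] = glyph[si, sj]
    return out

def augment(glyph, max_rotation_deg=10.0, rng=random):
    """Sample one synthetic training example. The rotation is clamped
    to a small range: enough to cover real-world imperfections, but
    not so much that the network spends its information density on
    poses it will never see."""
    angle = rng.uniform(-max_rotation_deg, max_rotation_deg)
    return rotate_glyph(glyph, angle), angle

# A toy 9x9 "letter" (a vertical bar) standing in for a rendered glyph.
glyph = np.zeros((9, 9), dtype=np.uint8)
glyph[2:7, 4] = 1
example, angle = augment(glyph)
```

In a real pipeline the interesting knob is `max_rotation_deg`: as the quote describes, setting it too high is exactly the kind of over-warping that made '$' start to look like 'S', and the fast generate–retrain–visualize loop exists to tune such parameters quickly.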
And how does it look in practice? Watch this La Bamba video: