You’re trying to get a pair of tickets to the latest Jay Chou concert and the ticketing website demands you prove yourself a human being.
“Click on all the squares containing pictures of cats, click next when there are none left”.
Cat walking across the street click, cat in a monkey costume click, cat half-hidden behind a streetlamp click. And… next.
“Darn it, they’ve sold out! Curse you, cats!”
How does being able to identify cats, dogs, apples, oranges, street signs, or even being able to decipher squiggly text make us different from robots (or “bots”)?
What is CAPTCHA?
CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. It is precisely what it sounds like: a tool that tells humans and bots apart when they try to access a website.
Websites usually use CAPTCHA to stave off attempts by individuals trying to utilise bots to conduct mass activities, such as creating email accounts, filling out survey forms or mass-buying tickets with the intention of then scalping them.
Earlier CAPTCHA tasks relied on the individual’s ability to identify distorted words.
Unfortunately, artificial intelligence can now solve these word identification tasks with 99% accuracy.
This is why you may have noticed that CAPTCHA tasks these days require you to identify images (something that computers still have trouble with) rather than words.
The task itself seems simple enough, right? So just what is it about the human eyes and brain that bots are unable to replicate?
A cat scurries across the room. You tell yourself, “That’s a cat”.
It took all of 150 milliseconds for your eyes to receive the light bouncing off the creature, for the neural signals to travel down the optic nerve, for the visual cortex to process it and for you to finally decide that the creature is a “cat”.
What does it take for a computer to identify a cat?
To build a visual processing system like the one humans have, which is the task of engineers in the field of computer vision, we would have to mimic the functions of both the human eye and the human brain. The former is the easier part, with cameras becoming better and clearer.
The latter is the monumental task. The supercomputer Dawn managed to simulate only 1% of the human cerebral cortex (the part of the brain responsible for higher-order functions) at 1/600th of its actual speed: if it took you 150 ms to recognise a cat scurrying across the room, it would take Dawn about one and a half minutes. All this required 1 million watts of power and approximately 80 000 cubic metres of chilled air per minute.
Theoretical physicist Michio Kaku wrote in Physics of the Future that modelling the activity of an actual human brain would require a supercomputer 1000 times that size, the power of a nuclear plant, an entire river to cool it and a space large enough to house several city blocks.
All that just to tell you, “That’s a cat”.
Computer vision built on machine learning
Technology companies have been trying to create computer vision and artificial intelligence capable of mimicking the activity of the human eyes and brain in other ways that won’t require as many resources. Google, in a bid to create machines capable of recognising images with high accuracy, introduced machine learning algorithms into its search engine 2-3 years ago.
Here is how it works: large amounts of raw data, such as images of cats, are fed into the machine so that it can learn the features associated with cats. New data, which could be images of both cats and dogs, is then fed in to test how much it has learned.
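The learn-then-test process described above can be sketched in a few lines of Python. This is only a toy illustration, not how Google or any CAPTCHA service actually works: each "image" is reduced to two made-up numbers (ear pointiness and snout length), the machine "learns" by averaging those numbers per label, and the function names `train`, `predict` and `accuracy` are hypothetical. It also assumes the training data contains labelled dogs as well as cats, so the machine has something to compare against.

```python
from statistics import mean

# Toy data: (ear pointiness, snout length) -> label.
# These features and values are invented purely for illustration.
training_set = [
    ((0.9, 0.2), "cat"), ((0.8, 0.3), "cat"), ((0.85, 0.25), "cat"),
    ((0.3, 0.8), "dog"), ((0.4, 0.9), "dog"), ((0.35, 0.7), "dog"),
]
# New, unseen examples used to test how much the machine has learned.
test_set = [((0.88, 0.22), "cat"), ((0.32, 0.85), "dog"), ((0.7, 0.4), "cat")]


def train(examples):
    """Learn one average feature vector (a centroid) per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(model, features):
    """Return the label whose learned centroid is closest to the features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(predict(model, features) == label for features, label in examples)
    return correct / len(examples)


model = train(training_set)
print("Accuracy on unseen examples:", accuracy(model, test_set))
```

Real image recognition systems learn from millions of pixels rather than two hand-picked numbers, but the shape of the process is the same: learn from labelled examples first, then measure performance on examples the machine has never seen.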
In fact, while you were furiously clicking away at the cats to get to your tickets, the machine continued to reinforce its learning through the confirmation of the images it had already determined were cats.
Now, repeat that process for every single other image or pattern you want the machine to recognise and you’ve more or less got a human being’s visual processing capabilities. Easier said than done.
Computer vision powered by machine learning is nowhere close to replicating what an actual human being can do holistically, but it is showing up in an increasing number of industries. From face detection technology in apps like Instagram and Snapchat to cashierless stores like Amazon Go to driverless cars, technology companies are just beginning to explore the potential of computer vision.
We still have a long way to go before our machines are able to recognise objects with the efficiency and accuracy of human beings. If all this sounds interesting to you, and you want to explore computer vision in greater depth, why not sign up for a course in Computer Vision with us at UpCode Academy?