Hot on the heels of Microsoft’s Project Oxford, Google is bringing its Cloud Vision API into beta. The offering lets software developers build new ways of reading faces and emotions, pushing the limits of what can be done with AI and machine learning.
Earlier this month, Google moved its Cloud Vision API out of limited release into open beta. The tool will enable developers to create apps that can parse the emotional content contained in a photo or image. The API also offers a window into how Google views the future of artificial intelligence and machine learning.
This effort comes at a time when other companies, notably Microsoft, are doing the same.
There’s also a business model here. When announcing the API, Google detailed its pricing scheme for the offering, which takes effect March 1. During the beta period, each user will have a quota of 20 million images per month. Label detection will cost $2 per 1,000 images; optical character recognition (OCR) comes in at $0.60 per 1,000 images.
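As a rough sketch of what those rates imply, assuming the per-1,000-image prices quoted above apply linearly (with no free tier or volume discounts, which is a simplification of any real billing scheme):

```python
# Hypothetical cost estimate based on the beta prices quoted above:
# label detection at $2.00 per 1,000 images, OCR at $0.60 per 1,000.
LABEL_RATE_PER_1000 = 2.00
OCR_RATE_PER_1000 = 0.60

def monthly_cost(label_images: int, ocr_images: int) -> float:
    """Estimate a monthly bill, assuming purely linear pricing."""
    return (label_images / 1000) * LABEL_RATE_PER_1000 + \
           (ocr_images / 1000) * OCR_RATE_PER_1000

# Labeling 100,000 images and running OCR on 50,000 of them:
print(monthly_cost(100_000, 50_000))  # 230.0
```

At those rates, a developer well under the 20-million-image beta quota could still run a meaningful workload for a few hundred dollars a month.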
However, Google notes: “Cloud Vision API is not intended for real-time mission critical applications.” Instead, Google is offering the API to developers to push the limits of AI and machine learning. The company can watch all these developments and learn as it goes.
The Vision API is designed to analyze images stored in Google Cloud Storage (GCS) and return characteristics of those images. But the API is not limited to the GCS platform. A Google representative told InformationWeek in an email that, “Users can integrate the REST API to upload their images in other environments than Google Cloud Storage.”
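A minimal sketch of such a REST request, assuming the v1 `images:annotate` endpoint and an image embedded inline as base64 rather than referenced by a Cloud Storage URI (the field names follow Google's published JSON format, but treat the endpoint URL and key handling here as illustrative):

```python
import base64
import json

def build_annotate_request(image_bytes: bytes) -> str:
    """Build the JSON body for a Cloud Vision images:annotate call,
    embedding the image inline instead of referencing Cloud Storage."""
    payload = {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [
                {"type": "LABEL_DETECTION", "maxResults": 5},
                {"type": "FACE_DETECTION", "maxResults": 5},
            ],
        }]
    }
    return json.dumps(payload)

# The body would then be POSTed to something like
# https://vision.googleapis.com/v1/images:annotate?key=API_KEY
body = build_annotate_request(b"\x89PNG...")  # placeholder bytes, not a real image
print(json.loads(body)["requests"][0]["features"][0]["type"])  # LABEL_DETECTION
```

The same request shape works for GCS-hosted images by swapping the inline `content` field for a `source` object pointing at the stored file.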
Google said the Cloud Vision API is powered by the same technologies behind Google Photos and will, for example, identify broad sets of objects in images, from flowers to popular landmarks.
Google is also applying the same SafeSearch filtering it uses to screen “inappropriate content” out of its Web-based image search results to images submitted via the API.
One of the limited-release users, Photofy, noted that the API can flag potentially violent and adult content in user-created photos, in line with its abuse policies. Photofy CTO Chris Keenan noted in the same statement that protecting these branded photos from abuse was nearly impossible before the Cloud Vision API.
Cloud Vision API can also analyze emotional attributes of people in images, finding joy, sorrow, and anger, and can detect popular product logos, according to Google. Note, however, that the API works only on static images, not on video. (The same Google representative confirmed this to InformationWeek.)
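Face-detection results come back as `faceAnnotations`, each carrying likelihood ratings for the emotions mentioned above. A sketch of pulling those out (the sample response below is invented for illustration; the field names match the format Google documents for the API):

```python
import json

# Invented sample response, shaped like a Vision API face-detection result.
sample_response = json.loads("""
{
  "responses": [{
    "faceAnnotations": [{
      "joyLikelihood": "VERY_LIKELY",
      "sorrowLikelihood": "VERY_UNLIKELY",
      "angerLikelihood": "UNLIKELY",
      "surpriseLikelihood": "POSSIBLE"
    }]
  }]
}
""")

def emotions(response: dict) -> list:
    """Collect the emotion-likelihood fields from each detected face."""
    keys = ("joyLikelihood", "sorrowLikelihood",
            "angerLikelihood", "surpriseLikelihood")
    return [{k: face[k] for k in keys if k in face}
            for r in response["responses"]
            for face in r.get("faceAnnotations", [])]

print(emotions(sample_response)[0]["joyLikelihood"])  # VERY_LIKELY
```

Rather than a single emotion label, each attribute is graded on a likelihood scale (from VERY_UNLIKELY to VERY_LIKELY), leaving threshold decisions to the developer.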
This image-only restriction makes the Google effort similar to what Microsoft has announced in Project Oxford.
Oxford also offers an API with specific tools for deriving emotional states from static images.
“The emotion tool released [in November 2015] can be used to create systems that recognize eight core emotional states — anger, contempt, fear, disgust, happiness, neutral, sadness, or surprise — based on universal facial expressions that reflect those feelings,” according to Microsoft. It returns those eight states as text labels above parts of the images.