Google and Amazon are stepping up their competition on machine learning, with both companies introducing new video and natural language recognition tools to help customers make sense of piles of undifferentiated moving images and words.
Google (Nasdaq: GOOG) plans on Tuesday to make its Cloud Video Intelligence machine learning API generally available. The API analyzes video content and includes video transcription. Google is also adding Content Classification to its Cloud Natural Language platform to automatically classify content into more than 700 categories.
For Cloud Video Intelligence, Google is adding deeper video analysis tools, including shot change detection, content moderation and detection of 20,000 labels. The labels can identify 180 types of fruit, including kiwis and jujubes; 229 models of airplanes; 667 car models; and more than 200 types of buildings, including supermarkets and convention centers. The details come from a post scheduled to go live Tuesday morning on the Google Cloud Big Data and Machine Learning Blog, written by Wei Hua, Google engineering manager for cloud AI Video Solutions.
A Google demo shows how a user can search a large library of videos for the keyword "baseball" -- like running a Google search on the web -- and find only videos pertaining to that sport. The result shows which videos have baseball, and when in each video baseball appears.
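The search the demo describes can be pictured with a short sketch. It assumes the video analysis has already produced a flat list of labels with start and end times; the `LabelSegment` shape, the `search_label` helper and the sample file names are hypothetical illustrations, not the Cloud Video Intelligence response format.

```python
from dataclasses import dataclass


@dataclass
class LabelSegment:
    """One detected label and the time span where it appears (hypothetical shape)."""
    video: str
    label: str
    start_s: float
    end_s: float


def search_label(segments, keyword):
    """Return {video: [(start_s, end_s), ...]} for labels matching keyword.

    Matching is a case-insensitive exact comparison on the label text.
    """
    needle = keyword.lower()
    hits = {}
    for seg in segments:
        if seg.label.lower() == needle:
            hits.setdefault(seg.video, []).append((seg.start_s, seg.end_s))
    # Sort each video's spans so results read in playback order.
    for spans in hits.values():
        spans.sort()
    return hits


# Made-up sample data standing in for an analyzed video library.
library = [
    LabelSegment("game_recap.mp4", "baseball", 12.0, 47.5),
    LabelSegment("game_recap.mp4", "stadium", 0.0, 12.0),
    LabelSegment("cooking_show.mp4", "kiwi", 3.0, 9.0),
    LabelSegment("highlights.mp4", "Baseball", 80.0, 95.0),
    LabelSegment("highlights.mp4", "baseball", 5.0, 20.0),
]

results = search_label(library, "baseball")
for video, spans in results.items():
    print(video, spans)
```

Only the two videos that carry a "baseball" label come back, each with the time spans where the label applies.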
Cloud Video Intelligence can now automatically transcribe video audio into text. The feature is in private beta in English only, but will expand to new languages.
Hua outlines several customer use cases for the video service. Incentro, which provides IT solutions on Google Cloud Platform for media and publishing customers, uses machine learning to help its customers store and find digital media assets. Cloud Video Intelligence labels the details it finds in videos and timestamps where each one appears. Previous solutions were manual, time-consuming, and less complete and accurate than the automated tool, Google says.
Robotix Media uses the Google video tool to optimize social marketing campaigns, analyzing video ad performance.
Google's Content Classification sorts documents and content into more than 700 different categories, including Arts & Entertainment, Hobbies & Leisure, Law & Government, News and Health. Hearst Newspapers is using the system to classify more than 3,000 articles daily.
Amazon debuts video, natural language tools
At last week's AWS re:Invent conference, Amazon Web Services Inc. introduced its own machine learning improvements. Amazon Rekognition Video is a follow-up to the Amazon Rekognition service AWS introduced last year, which searches, analyzes and organizes millions of still images. Rekognition Video provides real-time video analytics: it detects objects and scenes, such as a package arriving; flags inappropriate video content; recognizes celebrities; and tracks people. Using technology called Skeleton Monitoring, Rekognition Video can track people even when they are outside the frame, AWS CEO Andy Jassy said.
Amazon also introduced Amazon Kinesis Video Streams to get video into the cloud from cameras, phones, satellites, radar and other diverse devices where video lives, Jassy said.
Amazon Transcribe provides long-form speech recognition on a WAV or MP3 audio file, with applications such as logging calls, subtitling video, and capturing presentations and meetings. The service will initially be available in English and Spanish, with other languages to come. Unlike other automated transcription services, Amazon Transcribe adds punctuation and formatting, rather than producing an undifferentiated stream of text. Amazon Transcribe timestamps every word to align subtitles to video. It supports lower-quality audio, such as phone calls.
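To illustrate why per-word timestamps matter for subtitling, here is a minimal sketch that groups timed words into cues in the widely used SubRip (SRT) subtitle format. The `(text, start_s, end_s)` tuple shape and both helper names are assumptions made for the example; this is not Amazon Transcribe's output format or API.

```python
def fmt_srt_time(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Group (text, start_s, end_s) word tuples into numbered SRT cues.

    Each cue spans from its first word's start to its last word's end,
    so the subtitle stays aligned with the audio.
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = chunk[0][1]
        end = chunk[-1][2]
        text = " ".join(word for word, _, _ in chunk)
        cues.append(
            f"{len(cues) + 1}\n"
            f"{fmt_srt_time(start)} --> {fmt_srt_time(end)}\n"
            f"{text}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)


# Made-up timed words standing in for a transcription result.
sample_words = [
    ("Welcome,", 0.0, 0.4),
    ("everyone.", 0.45, 1.1),
    ("Let's", 1.6, 1.9),
    ("begin.", 1.95, 2.5),
]

srt_text = words_to_srt(sample_words, max_words=2)
print(srt_text)
```

Grouping by a fixed word count is the simplest possible choice; a production subtitler would more likely break cues on punctuation or pauses.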
In coming months, the transcription service will be able to distinguish between multiple speakers, and users will be able to add custom vocabulary libraries.
The new Amazon Translate service automatically translates text into different languages for real-time translation and batch analytics. It includes automatic language recognition, meaning the service will recognize what language a person is speaking.
And the new Amazon Comprehend is a fully managed natural language processing service, which can understand documents such as social networking posts and articles stored in a data lake in AWS's S3 storage service. Hotels.com, an Expedia business unit, is using the AWS service to classify reviews and comments to find out what people like and what they don't like, Jassy said. The service will classify documents to sort them into topics and categories such as sports, politics and business. A healthcare provider can organize documents based on symptoms, Jassy said.
— Mitch Wagner Editor, Enterprise Cloud News