What Is Computer Vision

Welcome to the first section of the Computer Vision Fundamentals with Google Cloud course.

In this part of the course, you'll learn what computer vision is and recognize the rapid growth in available high-resolution image data.

You'll explore different types of computer vision problems and the business applications where high-resolution image data can be applied.

And you'll recognize the various machine learning tools that Google Cloud offers to solve computer vision problems.

Finally, you'll learn about and experiment with the pre-built Vision API, which lets you derive insights from your images by detecting emotion, understanding text, and more.
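To make that concrete, here is a minimal sketch of calling the Vision API from the Python client library. It assumes the google-cloud-vision package is installed and credentials are already configured; the file name photo.jpg and the particular features shown are illustrative choices, not the course's exact example.

```python
# Minimal sketch: annotating a local image with the Cloud Vision API.
# Assumes google-cloud-vision is installed and application credentials
# are configured; "photo.jpg" is a placeholder file name.
from google.cloud import vision

client = vision.ImageAnnotatorClient()

with open("photo.jpg", "rb") as f:
    image = vision.Image(content=f.read())

# Detect faces and report the likelihood of joy for each one.
faces = client.face_detection(image=image).face_annotations
for face in faces:
    print("Joy likelihood:", face.joy_likelihood)

# Detect and print any text found in the image.
texts = client.text_detection(image=image).text_annotations
if texts:
    print("Detected text:", texts[0].description)
```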

Computer vision is a subset of machine learning (ML), and therefore of artificial intelligence (AI), which focuses on how computers see and understand digital images and videos.

Although capturing images in digital form began in the 1960s, research on configuring computers to understand the subject matter, or the meaning of all those pixels, has only progressed in the last few decades.

And it is only within the last few years, with the advent of cloud computing and specialized processors designed for computer vision, that this technology can be leveraged at scale.

This advancement is possible primarily because every digital image can be represented as a two-dimensional array of numbers, known as pixels.
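As a quick illustration of that idea, here is a minimal sketch that loads an image as an array of pixel values. It assumes NumPy and Pillow are available and uses a placeholder file name; neither library is prescribed by the course.

```python
# Minimal sketch: a digital image is just an array of numbers.
# Assumes NumPy and Pillow are installed; "photo.jpg" is a placeholder.
import numpy as np
from PIL import Image

img = Image.open("photo.jpg").convert("L")  # convert to grayscale
pixels = np.array(img)                      # 2-D array of pixel intensities

print(pixels.shape)   # (height, width)
print(pixels[0, 0])   # value of the top-left pixel, 0-255
```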

Computer vision has also experienced growth because of the massive amounts of data now being generated, which allow researchers to train computer vision models.

In 2002, the first phones with built-in cameras became publicly available. By 2006, more than half of all phones had integrated cameras. Now, mobile technology with built-in cameras has saturated the world with visual inputs like digital images and videos.

Internet growth statistics from Statista show that the total amount of data consumed globally in 2021 was 79 zettabytes, and that 93% of people accessed the internet through mobile devices.

A 'byte' is a unit of digital storage capacity, and a zettabyte is 10 to the 21st power bytes.

Considering that the phone in your pocket can capture high-resolution images, the amount of visual data is even more impressive.

To give you another sense of the scale of visual data, consider that for every second that passes, 8.5 hours of video are uploaded to YouTube. That's 500 hours of video data every minute. And that's just the video uploaded to YouTube.

As of November 2020, 4 trillion photos were stored in Google Photos, and every week, 28 billion new photos and videos are uploaded. That roughly corresponds to 46,300 visual inputs per second.
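As a rough sanity check of that per-second figure, dividing 28 billion weekly uploads by the number of seconds in a week gives about the same number:

```python
# Rough check: 28 billion uploads per week, spread over one week's seconds.
uploads_per_week = 28_000_000_000            # 28 billion photos and videos
seconds_per_week = 7 * 24 * 60 * 60          # 604,800 seconds
print(uploads_per_week / seconds_per_week)   # ~46,296 visual inputs per second
```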

If you've ever read the statistic, repeated in countless articles, that an estimated 90% of the world's data was created in the last two years, consider how recent events have driven demand for data and analytics to unprecedented levels.

Finally, new algorithms such as convolutional neural networks build on the hardware and software capabilities of our era.