Time to find new tricks and say goodbye to the "camera arms race"

Oct 18, 2021 AI

One reason Instagram became so popular ten years ago was that phone camera lenses were poor at the time, and filters covered up that "incompetence" with beautification. Ten years later, things have changed. Phones carry more and more lenses, camera modules keep getting thicker, and sensor resolution has soared past 100 million pixels... Yet in recent years, manufacturers have stopped competing purely by "stacking hardware": constrained by the space inside a phone and by the chip process, growth in both pixel count and sensor size is slowing down.

But look more closely and you will find that the "camera arms race" has not stopped; manufacturers have simply switched tracks.

The concept of "computational photography" has sprung up on cell phones since 2019. In a device like a cell phone that strives to be portable, and it is impossible to make the lens too large, so the manufacturer simply changed direction: improving photography by AI.

01 Taking photos is about more than hardware

The most representative example is the Google Pixel 3, announced three years ago, which used a single rear camera yet easily beat its rivals in zoom, night mode, background blur and other features.

This made users start to realize that "stacking hardware" does not necessarily meet their shooting needs. Do users really want 100-megapixel photos on a device that is not only a camera and has limited storage? More than "big enough", what they need is "good enough": stable, balanced, easy to use, and able to spark creativity.

Unlike a traditional camera, a cell phone carries chips whose computing power keeps improving. That inspired manufacturers: if the hardware alone cannot capture photos comparable to a professional camera's, can the missing part be "guessed" by AI and "made up" by algorithms?

So, in what ways has AI changed traditional mobile photography?

Take "super resolution" as an example, the so-called super resolution is to change from a low-resolution photo to a high-resolution photo. AI adds details that are not available in low-resolution photos through "guessing" .

Where does this capability come from? Put simply, we take a high-resolution photo A and deliberately "lose" some of its detail to get a low-resolution version. The model then upscales that version, extracting features to produce a high-resolution photo B with the details restored, and B is compared against A. If the two are close, the parameters in between have been adjusted correctly; repeated over a large number of photos, this trains the AI.
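As a rough illustration of that training loop, here is a minimal sketch in PyTorch. The tiny SRNet model, the 4x scale factor and the L1 loss are illustrative assumptions, not any manufacturer's actual pipeline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SRNet(nn.Module):
    """Toy super-resolution network: upsample naively, then learn a correction."""
    def __init__(self, scale=4):
        super().__init__()
        self.scale = scale
        self.body = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, lr):
        # Naive bicubic upsample, then let the network "guess" the missing detail.
        up = F.interpolate(lr, scale_factor=self.scale, mode="bicubic", align_corners=False)
        return up + self.body(up)  # predict a residual correction on top of the upsample

model = SRNet(scale=4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(hr_batch):
    """hr_batch: high-resolution photos "A", shape (N, 3, H, W) in [0, 1], H and W divisible by 4."""
    # Deliberately "lose" detail to create the low-resolution input.
    lr_batch = F.interpolate(hr_batch, scale_factor=0.25, mode="bicubic", align_corners=False)
    sr_batch = model(lr_batch)             # reconstructed high-resolution photo "B"
    loss = F.l1_loss(sr_batch, hr_batch)   # how close is B to A?
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Calling train_step over many photos is exactly the "adjust the parameters in the middle until B is close to A" process described above.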

Super resolution is what lets a phone cover a wide zoom range. When an image is magnified, noise and artifacts become very noticeable; with this feature, AI inference corrects them so the enlarged image still looks clear.

Then there is night mode. When users shoot at night, there is rarely enough light and noise is obvious. Gathering enough light calls for a long exposure, but any shake during a long exposure produces visible blur; too short an exposure leaves the picture dim, and if the frame contains bright lights, a long exposure blows them out. In short, it is hard for a camera to render both the bright and the dark parts of a night scene clearly.

The traditional solution is for the user to fix the camera in place, take one long exposure and one short exposure, and manually merge the two photos afterwards, a time-consuming and laborious process. A phone's AI can now denoise intelligently, capture several frames at different exposures automatically and composite them automatically, completing the whole sequence in the instant after the user presses the shutter. The result is an HDR (High Dynamic Range) photo.
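The compositing step can be sketched with classical tools. Below is a minimal example using OpenCV's Mertens exposure fusion; the three bracketed file names are hypothetical, and a real phone pipeline adds burst alignment, learned denoising and tone mapping far beyond this.

```python
import cv2
import numpy as np

# Three bracketed frames of the same night scene (hypothetical file names):
# the short exposure preserves highlights, the long exposure reveals shadows.
frames = [cv2.imread(p) for p in ["night_short.jpg", "night_mid.jpg", "night_long.jpg"]]

# Align the frames first; hand-held shots shift slightly between exposures.
cv2.createAlignMTB().process(frames, frames)

# Light denoising on each frame, a crude stand-in for "intelligent noise reduction".
frames = [cv2.fastNlMeansDenoisingColored(f, None, 5, 5, 7, 21) for f in frames]

# Mertens fusion weights each pixel by contrast, saturation and well-exposedness,
# producing one well-exposed result without needing the camera response curve.
fused = cv2.createMergeMertens().process(frames)  # float image, roughly in [0, 1]

cv2.imwrite("night_fused.jpg", np.clip(fused * 255, 0, 255).astype(np.uint8))
```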

How does AI produce a natural background blur? A dual camera works like a pair of human eyes: each views the same object from a slightly different angle, and the resulting parallax reveals which objects sit in front and which behind. But the method fails for distant subjects, and keeping several cameras running for long periods drains the battery, so the industry brought in AI. Manufacturers train it on huge volumes of images, after which it can tell foreground from background in a single 2D photo, and can therefore also segment portraits from their backgrounds accurately.
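Once a depth estimate exists, applying the blur itself is straightforward. The sketch below assumes a per-pixel depth map has already been produced by some monocular depth network (a model such as MiDaS, for example); the function name, thresholds and kernel sizes are purely illustrative.

```python
import cv2
import numpy as np

def portrait_blur(image: np.ndarray, depth: np.ndarray, focus_depth: float,
                  depth_tolerance: float = 0.1, blur_ksize: int = 31) -> np.ndarray:
    """Blur everything whose estimated depth differs from the subject's depth.

    image: BGR photo (H, W, 3), uint8.
    depth: per-pixel depth estimate (H, W), normalized to [0, 1],
           e.g. predicted from the single 2D photo by a trained network.
    focus_depth: depth of the subject to keep sharp (e.g. sampled inside a face box).
    """
    # 1 where the pixel is "in focus" (close to the subject's depth), 0 elsewhere.
    mask = (np.abs(depth - focus_depth) < depth_tolerance).astype(np.float32)
    # Feather the mask so the sharp/blurred transition looks natural at subject edges.
    mask = cv2.GaussianBlur(mask, (21, 21), 0)[..., None]

    blurred = cv2.GaussianBlur(image, (blur_ksize, blur_ksize), 0)
    out = mask * image.astype(np.float32) + (1.0 - mask) * blurred.astype(np.float32)
    return np.clip(out, 0, 255).astype(np.uint8)
```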

AI stabilization, old-photo restoration, passerby removal and other features all rely on algorithms. In recent years manufacturers fought over the number of lenses; in the future they will fight over AI and algorithms. AI's penetration of the smartphone keeps rising, and it is becoming ever more tightly woven into phone photography.

02 Do what the traditional camera cannot do

"The 2020 China Artificial Intelligence Mobile White Paper" shows that in the first half of 2020, among the apps consumers commonly use, short-video applications accounted for nearly 32% of the effective daily usage time on a single phone.

In addition, the camera has become the feature Chinese consumers value most in a phone, which will keep pushing manufacturers to improve shooting capability, features and experience. The White Paper points out that phone manufacturers cooperate extensively with external algorithm companies on imaging: nearly 60% of phone imaging algorithms are obtained through such cooperation, and the "AI + light perception" algorithm market is highly concentrated, with leading players represented by Megvii holding a share of nearly 80%.

Beyond super resolution, night mode and background blur, manufacturers are also using algorithms to coordinate and schedule increasingly rich sets of lenses, chasing near-DSLR results with features such as multi-camera bokeh and smooth zoom.

A single lens struggles to cover scenarios as different as sweeping landscapes, everyday documentary, street photography and portraits. Add a wide-angle, an ultra-wide-angle and a telephoto lens, and the phone can handle a much broader range of shooting scenes.

But space inside a phone is precious: the camera module has to stay small and share the body with plenty of other hardware, so it cannot adopt the optical structure of an SLR. Most phone lenses today are therefore fixed-focus, and true optical zoom is achieved only at a few specific zoom nodes.

Between those nodes, algorithms are needed to deliver a smooth, continuous zoom across lenses of different focal lengths.
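One simplified way to picture the handover is sketched below: crop the wide camera digitally until the requested zoom reaches the telephoto's native magnification, then cross-fade to the telephoto frame. The focal-length ratio, the blend band and the assumption that the two frames are already geometrically and photometrically aligned are all illustrative simplifications.

```python
import cv2
import numpy as np

WIDE_ZOOM = 1.0   # native magnification of the main (wide) camera
TELE_ZOOM = 3.0   # native magnification of the telephoto camera (assumed)

def center_crop_zoom(img: np.ndarray, factor: float) -> np.ndarray:
    """Digitally zoom by cropping the center and scaling back to full size."""
    h, w = img.shape[:2]
    ch, cw = int(h / factor), int(w / factor)
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    crop = img[y0:y0 + ch, x0:x0 + cw]
    return cv2.resize(crop, (w, h), interpolation=cv2.INTER_LINEAR)

def smooth_zoom(wide: np.ndarray, tele: np.ndarray, zoom: float,
                blend_band: float = 0.5) -> np.ndarray:
    """Produce a frame at the requested zoom from two aligned, same-size camera frames."""
    wide_frame = center_crop_zoom(wide, zoom / WIDE_ZOOM)
    if zoom < TELE_ZOOM:
        return wide_frame  # the telephoto cannot cover this field of view yet
    tele_frame = center_crop_zoom(tele, zoom / TELE_ZOOM)
    # Cross-fade from the wide crop to the sharper tele crop just past the handover
    # point, so the jump in detail and color is not visible while zooming.
    w = float(np.clip((zoom - TELE_ZOOM) / blend_band, 0.0, 1.0))
    return cv2.addWeighted(wide_frame, 1.0 - w, tele_frame, w, 0)
```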

More rear cameras, or bigger ones, are not automatically better. Manufacturers are starting to shift their thinking from "how to make one lens strong" to "how multiple lenses can work together better."

AI multi-camera systems have gradually entered public view, thanks not only to manufacturers' iterative product upgrades but also to the innovation and enablement that Megvii and other AI companies bring to the underlying technology. Take multi-camera fusion as an example: Megvii's fusion algorithm is built on a neural network model and effectively addresses problems common in traditional fusion algorithms, such as uneven sharpness, curved lines at the subject's edges, ghosting in some areas, broken or misaligned lines, and misaligned repeated textures. A simplified classical baseline is sketched below.
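To be clear, the sketch that follows is not Megvii's algorithm, only a minimal classical illustration of what "multi-camera fusion" involves: estimate a homography from matched features, warp the detailed telephoto frame onto the wide frame, and feather the blend. The seams and parallax errors this naive approach leaves behind are exactly the artifacts, such as ghosting and broken lines, that learned fusion models are meant to remove.

```python
import cv2
import numpy as np

def fuse_wide_tele(wide: np.ndarray, tele: np.ndarray) -> np.ndarray:
    """Naive multi-camera fusion: warp the detailed tele frame onto the wide frame."""
    g_tele = cv2.cvtColor(tele, cv2.COLOR_BGR2GRAY)
    g_wide = cv2.cvtColor(wide, cv2.COLOR_BGR2GRAY)

    # Match features between the two frames to estimate a homography.
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(g_tele, None)
    k2, d2 = orb.detectAndCompute(g_wide, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)[:500]

    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)

    # Warp the tele frame into the wide frame's coordinates.
    h, w = wide.shape[:2]
    tele_warped = cv2.warpPerspective(tele, H, (w, h))

    # Blend the tele detail only where it actually has coverage, with a feathered
    # mask; hard seams here are exactly where ghosting and broken lines show up.
    coverage = cv2.warpPerspective(np.ones(tele.shape[:2], np.float32), H, (w, h))
    mask = cv2.GaussianBlur(coverage, (51, 51), 0)[..., None]
    fused = mask * tele_warped.astype(np.float32) + (1 - mask) * wide.astype(np.float32)
    return np.clip(fused, 0, 255).astype(np.uint8)
```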

Cell phone imaging is a piece of systems engineering; what it really tests is how well the various elements fit together. Megvii believes the earlier light-perception pipeline ran as relatively independent stages of "hardware upgrade - algorithm assistance - post-processing optimization", but with AI added, these stages, together with the hardware and software beneath them, can be upgraded in concert. This is the concept of "AI redefines the light perception system" that Megvii pioneered.

With AI working alongside it, the phone can even do what a traditional camera cannot.

03 Create a brand-new visual imagination with powerful algorithms

On April 10, 2019, astronomers around the world simultaneously released the first real image of a black hole, the first such photo in human history. The bright ring of fire was thrilling, and it confirmed that Einstein's general theory of relativity still holds under extreme conditions. Taking the photo, however, was extremely difficult, because a black hole itself emits no light.

Researchers assembled eight high-sensitivity radio telescopes around the world, from the Atacama Desert in Chile to the Antarctic ice fields, from the mountains of Spain to the islands of Hawaii. Combined through "very long baseline interferometry", the eight instruments simulated a telescope with an aperture as large as the Earth.

After the observations were collected, it took roughly two years of data processing and theoretical analysis before astronomers could finally produce the image. In other words, the black hole photograph is itself "computational photography".

That is what computational photography does: it uses AI to break through the physical limits of optics, powerful computing to create a new visual imagination, and deep learning to reach detail beyond what the human eye can see. Compared with a phone's physical upgrades, computational photography is invisible, and the AI companies that supply its technology are even more so.

As AI multi-camera imaging moves into public view, the credit belongs not only to manufacturers' iterative product upgrades but also, inseparably, to the AI companies innovating in and empowering the underlying technology.