The Image Signal Processor (ISP)

So what purpose does ISP have? Well, pixels are sensitive to light between some set of wavelengths, essentially they’re color agnostic. The way to get a color image out is to put a filter on top, usually a bayer pattern color filter, then interpolate the color of the pixels adjacent. Your 8 MP CMOS doesn’t sense red green and blue for each pixel, it senses one color for each, then ISP guesses the color based on what’s next to it. This is called demosaicing, and it’s probably the primary job of ISP, and there are many secret sauce methods to computing this interpolated image. In addition ISP does all the other housekeeping, it controls autofocus, exposure, and white balance for the camera system. Recently correcting for lens imperfections like vignetting or color shading imparted by the imperfect lens system (which you’ll add right back in with instagram, you heathen) has been added, along with things like HDR recombining, noise reduction, other filtering, face or object detection, and conversion between color spaces. There’s variance between the features that ISP does, but this is really the controller for getting that bayer data into a workable image array.

Obviously the last part is the human interface part of the equation, which is an ongoing pain point for many OEMs. There are two divergent camps in smartphone camera UX – deliver almost no options, let the ISP and software configure everything automatically (Apple), and offer nearly every option and toggle that makes sense to the user (Samsung). Meanwhile other OEMs sit somewhere in-between (HTC, others). The ideal is an opt-in option for allowing users to have exposure control, with safe naive-user defaults. There are still many players making horrible, almost unthinkable mistakes in this area too. I wrote about how the iPhone 5 crops the preview to a 16:9 size, yet captures a 4:3 image, and later was amazed to see the AOSP camera UI on the Nexus 4 deliver an arbitrary shape (not even 16:9 or something logical) crop in the preview, and also capture a 4:3 image. Composition unsurprisingly matters when taking a photograph, and it’s mind-blowing to see established players blow off things like preview. In addition, preview framerate and resolution can be an issue on some platforms, to say nothing of outright broken or unstable user interfaces on some devices. Many OEMs have been thrust into crafting a camera UI who really have limited to no camera experience — previously it was a feature to have a camera period, much less controls. As the smartphone evolves from being a camera of convenience to the primary imaging device for most people, having robust controls for when ISP and auto exposure functionalities fail will become important. Right now camera UI and UX is rapidly changing from generation to generation, with more and more serious toggles being added. I don’t think any one player has a perfect solution yet.

For video we need to also consider the encoder. The pipeline is much the same, though the ISP will usually request a center crop or subsample from the CMOS, depending on the capabilities of the sensor. The encoder takes these images and compresses them into a format and bitrate of the OEM or user’s choice, basically H.264 at present. Not every encoder is the same, as Ganesh will tell you. There are a number of players in this market supplying IP blocks, and other players using what they have built in-house. Many OEMs make interesting choices to err on the side of not using too much storage, and don’t encode at the full capabilities of the encoder. This latest generation of phones we saw settle somewhere between 15 and 20 Mbps H.264 high profile for 1080p30 video.

The Camera Module & CMOS Sensor Trends Evaluating Image Quality


View All Comments

  • ssj3gohan - Sunday, February 24, 2013 - link

    Couple of comments on this and your rant in the podcast :)

    First of all, you're lauding HTC for their larger pixel size and lamenting the move towards smaller pixels. But isn't it true that effective resolution, especially when your pixels are significantly smaller than the airy disk, is basically a function of integration area? The only downside to using smaller pixels is that you increase the effect of read noise and decrease fill factor. In an ideal world, a 100MP phone camera with the same sensor size as a 10MP one would make pictures that are just as good. With read noise being essentially absent nowadays, I don't see the reason to particularly bash on 13MP phone cameras compared to larger-pixel but same-integration-area sensors. They make the same pictures, just take up a little less space on the sd card.

    Of course, you could make the argument that it's wrong to give in to the 'moar megapixels!' consumer side of things and try to educate people that sometimes less is more.

    Next, you say that refractive index and focal length is essentially what limits the focal length for very thin cameras, but this can be alleviated by using diffractive optics (not yet now, but in the future). We may very well see 3mm-thickness 35mm focal length equivalent camera modules with large sensors someday. It's technically possible. Especially with, as you said, nanodiamonds and other very high refractive index synthetic lens materials in the making.

    Next, about the resolving power. There's the airy disk and rayleigh's criterion, but this is not the end of resolving power. It does make sense to oversample beyond this point, you will get extra image information. It becomes exponentially less as you increase the megapixel count but you can still get about 150% extra image information by oversampling beyond the size of the airy disk. Again, in an ideal world without drawbacks to doing so, this does make sense.
  • tuxRoller - Sunday, February 24, 2013 - link

    Especially, with the use of metamaterials that make use of negative indexes of refraction to allow you to resolve detail beyond the diffraction limit? Reply
  • ssj3gohan - Monday, February 25, 2013 - link

    Well, keep in mind that the reason you can resolve beyond the diffraction limit is the fact that the geometrical properties of the sensor and optics differ. Optics will by definition cause gaussian blur as their defect mode, while the sensor has square and offset pixels. These areas do not overlap perfectly, so in order to perfectly image that blurry optical image you need pixels that are smaller than the fundamental size of the diffraction pattern (airy disk).

    These optical effects don't go away when you're using metamaterials/quantum opticss/etc. Light will still be a wave that will not necessarily enter the sensor perfectly perpendicular.
  • UltraTech79 - Monday, February 25, 2013 - link

    I ave seen many many reviews of lenses and the technical details of digital imaging ect, and almost every time the article would have really shitty JPG images. I found it highly ironic. Kudos to you for using PNG throughout this quality article. Reply
  • AnnihilatorX - Monday, February 25, 2013 - link

    I was reading the review for Sony's Xperia Z at techradar, I was astonished at how poor the 13MP Exmor RS sensor performs. Frankly, the image looks blurry and more like it's taken by a 5MP scaled up, with heavy noise even in a well lit scene:

    While I don't really care too much about smart phone camera, and I use my budget DSLR (cheaper than a smart phone) for my photography pleasure, I was thinking if the MP race and new gen smart phones can eliminate the need for me to lunge a DSLR around. If this article is correct on the physical limitations of smartphone camera technology, looks like there is still a future for DSLRs.
  • danacee - Monday, February 25, 2013 - link

    Traditional, aka -crap- P&S clearly are at a disadvantage now, only the still very useful of optical zoom keeping them alive. However high end, 'big' sensor P&S such as the not too young Sony RX100 are still many many generations ahead of smartphone cameras, even the Nokia Pureview has terrible image quality next to it. Reply
  • pandemonium - Tuesday, February 26, 2013 - link

    I am surprised at the lack of mention for Carl Zeiss lenses in here. If you're going to make an article about lens quality and cameraphone technology, why wouldn't you include the best in the market for such? Or are we disputing that fact?

    Also, not all cameraphones suffer as much from dramatic lens flare discoloration issues as said "very popular phone."
  • ShieTar - Tuesday, February 26, 2013 - link

    Sure, you get a 3µm diffraction spot on your camera, and with 1.1µm pixels it gets oversampled. But that does not have to be a waste. As long as the diffraction pattern is well characterised, you can remove the diffraction effect through a deconvolution as part of your ISP. This even remains true for near-field optical effects that occur once you pixel size gets close to or below the image wavelength. As long as such corrections are implemented, and as long as your per-pixel noise is small enough for these algorithms to work, decreasing the pixel size does make a certain sense.

    Once noise becomes a larger problem then resolution, the smaller pixels hurt though, by wasting light through the larger crop factor and also by increasing the overall read-out noise. When exactly that point is reached depends on the light conditions you want to use your camera in, so it would be interesting to understand for which kind of conditions smartphone-cameras are being optimised.
  • rwei - Wednesday, February 27, 2013 - link

    hurr hurr Reply
  • theSuede - Wednesday, February 27, 2013 - link

    I don't know where your Rayleigh limit comes from, but in real world optics, Rayleigh is:
    [1.22 x F# x wavelength] -giving 1.3µm for green (550nm) light in an F2.0 lens.
    But maybe it's your interpretation of Rayleigh that is wrong, and that's where the error stems from. From the graphs, you show spot resolution limit as 2xRayleigh - and it isn't. Spot resolution is 1xRayleigh - giving an F2.0 lens a maximum resolution of the aforementioned 1.3µm - NOT 2.6µm.

    The definition of Rayleigh:
    -"Two point sources are regarded as just resolved when the principal diffraction maximum of one image coincides with the first minimum of the other.

    "Just resolved" in this case means a resulting MTF of about 7% - i.e The minimum distance between two peaks where you can still resolve that they are two, not one large is equal to the RADIUS of the first null on the Airy disk. Not the diameter. This is quite a common error made by people from the "E" side of ElectrOptics.

Log in

Don't have an account? Sign up now