Yeah, binning for a color camera can be a little confusing.
Basically, all color cameras start life as monochrome sensors (except for the Foveon sensor, which is not as sensitive as "regular" CMOS sensors today, so it is seldom even mentioned anymore; Foveon itself was purchased by the Japanese company Sigma).
Tiny color filters are deposited over each of the pixels in an array named after its inventor (who worked at Kodak). Typically, each superpixel is made of a 2x2 mosaic of filtered pixels. The color vision (tri-stimulus model) of the human eye-brain system can be approximated by red, green and blue channels. By changing the ratio of red, green and blue, your brain gets the sensation of seeing the color spectrum that Newton first discovered. There have been attempts to model human color vision differently, including one by Polaroid's inventor Edwin Land, but the tri-stimulus model is the most widely accepted one.
If you look at a natural rainbow, you can actually see the color orange, formed by a certain wavelength of light. However, you can mix a certain ratio of red, green and blue light (all three have different wavelengths) and the brain can be fooled into thinking it is seeing the wavelength of orange! It is like tuning a radio to three different frequencies and somehow receiving a completely different fourth frequency -- which of course cannot happen -- but the brain is a fascinating organ.
Since there are three color channels but 4 pixels in a 2x2 array, Bryce Bayer decided to assign 2 of them to green, one to red and the remaining one to blue. This is because the green part of the spectrum is the most prominent one for the eye. This is actually unfortunate for astrophotographers, because the most interesting colors for us in terms of emission nebulae are the red and blue-green regions! We could actually use 2 red, 1 green and 1 blue pixel and get away with less exposure time for nebulae :-).
There are multiple ways to arrange 2 green, 1 red and 1 blue pixel within the 2x2 cell. The one Sony has chosen (many of the CMOS sensors today come from Sony) is one of the popular ones: the top row of the 2x2 array is red and green, and the bottom row is green and blue. You often see it written as "RGGB."
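Assuming an RGGB mosaic stored as a plain 2-D array, the four color planes can be sliced out with strides. This is just an illustration of the layout (the function name is mine, and real debayering interpolates rather than slicing):

```python
import numpy as np

def split_rggb(raw):
    """Slice the four color planes out of an RGGB Bayer mosaic.

    raw[0, 0] is assumed to be red, i.e. the layout is:
        R G R G ...
        G B G B ...
    """
    r  = raw[0::2, 0::2]   # red:   even rows, even columns
    g1 = raw[0::2, 1::2]   # green: even rows, odd columns
    g2 = raw[1::2, 0::2]   # green: odd rows, even columns
    b  = raw[1::2, 1::2]   # blue:  odd rows, odd columns
    return r, g1, g2, b
```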
To create a color image from this hodge-podge of pixels, you use a demosaicing process that is named after Bryce Bayer -- "deBayering," or "debayering." He is one of the few people whose name has become a verb!
When it comes to binning a Bayer array, you are faced with an interesting dilemma. The normal way of binning a monochrome image is to take a 2x2 array (of monochrome pixels) and sum them. If you did that to a Bayer 2x2 cell, guess what: you end up adding two green pixels, one red and one blue, so you get something that approximates a grayscale image to the human eye!
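A quick way to see the problem (again assuming an RGGB mosaic as a 2-D NumPy array; this is what NOT to do for a color camera):

```python
import numpy as np

def naive_bin_2x2(raw):
    """Sum each 2x2 Bayer cell as if the sensor were monochrome.

    Every output pixel becomes R + 2G + B -- a luminance-like value --
    so all hue information is destroyed.
    """
    return (raw[0::2, 0::2].astype(np.uint32) + raw[0::2, 1::2]
            + raw[1::2, 0::2] + raw[1::2, 1::2])
```

Feed it a mosaic where red, green and blue have different values, and every output pixel comes out identical -- the color is gone.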
There is no way to recover a color image from such a binning. It is like taking red, green and blue M&M candies and melting them together -- you can no longer produce larger red, green and blue M&M candies from the result.
What you need to do instead is to take the red from one Bayer cell and sum it with the reds from the three adjacent Bayer cells (and likewise for the greens and blues)! This way, after binning, you still end up with pure red, pure green and pure blue superpixels, at half the resolution of the original image. A post-processing program can then debayer this lower-resolution image to form a color image.
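The scheme above can be sketched in a few lines of NumPy (a minimal illustration, assuming an RGGB mosaic as a 2-D array with dimensions divisible by 4; the function name is mine, not from any camera SDK):

```python
import numpy as np

def bin_bayer_2x2(raw):
    """Color-preserving 2x2 binning of an RGGB Bayer mosaic.

    Each pixel is summed with the same-color pixels in the three
    neighboring Bayer cells, so the output is a half-resolution
    mosaic that still has the RGGB pattern and can be debayered.
    """
    h, w = raw.shape
    assert h % 4 == 0 and w % 4 == 0
    out = np.zeros((h // 2, w // 2), dtype=np.uint32)
    for i in (0, 1):            # row offset within the 2x2 Bayer cell
        for j in (0, 1):        # column offset within the cell
            plane = raw[i::2, j::2].astype(np.uint32)  # one color plane
            # sum 2x2 neighborhoods within this single-color plane
            binned = (plane[0::2, 0::2] + plane[0::2, 1::2]
                      + plane[1::2, 0::2] + plane[1::2, 1::2])
            out[i::2, j::2] = binned
    return out
```

Note the output stays a Bayer mosaic (just at half resolution), which is exactly what a downstream debayering step expects.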
Chen