What is H.264 / H.265?

H.264 (MPEG-4 Part 10, AVC) and H.265 (MPEG-H Part 2, HEVC) are video compression standards developed jointly by the ITU-T and ISO/IEC MPEG groups. H.265 is the newer of the two.

What is Motion JPEG?

Motion JPEG is a compression format that is older than H.264/H.265 and was created as a version of the still image JPEG format for video use. It is often shortened to just JPEG in video applications.

Why choose H.264/H.265 over Motion JPEG?

Motion JPEG ignores frame-to-frame data redundancy. When using Motion JPEG, each video frame is compressed using JPEG. The Motion JPEG video stream is then presented by displaying each frame in order.

H.264/H.265 improve upon this by using compression algorithms that compare data from frame to frame, which avoids transmitting redundant data.

Both H.264 and H.265 provide far greater video compression than Motion JPEG at comparable image quality.

Compared with Motion JPEG, greater compression means big savings in both your network’s bandwidth consumption and the amount of space required to store recorded video on an NVR. Needing less storage space for the same video leads to longer retention times without increasing storage capacity. This is even more important for use cases that require high numbers of cameras, high frame rates, and long retention times.
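
The link between storage consumption and retention time can be sketched with a few lines of Python. This is a minimal illustration, not a sizing tool: the function name, the 8 TB volume, the 16 cameras, and the per-camera daily figures are all hypothetical values chosen for the example, and decimal units (1 TB = 1000 GB) are assumed.

```python
def retention_days(capacity_tb, daily_gb_per_camera, cameras):
    """Days of video that fit on a volume (assumes 1 TB = 1000 GB)."""
    daily_total_gb = daily_gb_per_camera * cameras
    return capacity_tb * 1000 / daily_total_gb

# Hypothetical example: 16 cameras recording to an 8 TB volume.
# The daily figures are illustrative only, not measurements.
for label, daily_gb in [("higher-bitrate format", 200.0),
                        ("lower-bitrate format", 10.0)]:
    days = retention_days(8, daily_gb, 16)
    print(f"{label}: {days:.1f} days of retention")
```

With the capacity held fixed, retention scales inversely with the data each camera produces per day, which is why the choice of recording format matters more as camera counts grow.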

The following graph compares three 2MP cameras of the same model, each set to 15 frames per second and running the same firmware version. Each camera, however, is set to a different recording format. The camera set to Motion JPEG is shown at the top, in red. Below it, the H.264 stream is shown in blue, and the H.265 stream just under that in green. The changes in data correspond with a person walking through the scene.

ExacqVision records files in 5-minute increments. Taking the files created by these 2MP cameras while recording continuously for 5 minutes, and extrapolating their sizes to 24 hours and to 1 year, shows that the amount of storage space saved is considerable.

Format	5 minutes (KB)	24 hours (GB)	1 year (TB)
Motion JPEG	705,453	203.17	74.2
H.264	32,132	9.25	3.4
H.265	21,966	6.3	2.3
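
The extrapolation behind the table can be reproduced with a short Python sketch. It assumes decimal units (1 GB = 1,000,000 KB, 1 TB = 1000 GB) and a 365-day year, which appear to be what the table uses; the function and constant names are illustrative only.

```python
KB_PER_GB = 1_000_000             # decimal units (assumption)
GB_PER_TB = 1_000
SEGMENTS_PER_DAY = 24 * 60 // 5   # 288 five-minute files per day

def extrapolate(kb_per_5_min):
    """Return (GB per 24 hours, TB per year) for one 5-minute file size."""
    gb_per_day = kb_per_5_min * SEGMENTS_PER_DAY / KB_PER_GB
    tb_per_year = gb_per_day * 365 / GB_PER_TB
    return gb_per_day, tb_per_year

# 5-minute file sizes in KB, taken from the table above
measured = {"Motion JPEG": 705_453, "H.264": 32_132, "H.265": 21_966}

for name, kb in measured.items():
    gb_day, tb_year = extrapolate(kb)
    print(f"{name}: {gb_day:.2f} GB/day, {tb_year:.1f} TB/year")
```

Because the projection simply multiplies one 5-minute sample by 288 and then by 365, it assumes the scene activity in that sample is representative of the whole period.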

How does H.264 work?

As mentioned, Motion JPEG compresses each frame individually, and all frames are essentially equal. H.264, by contrast, provides for multiple types of frames.

I-frames are somewhat similar to a Motion JPEG frame. An I-frame contains all the data needed and can be decoded without reference to any other frames. H.264 always begins with an I-frame, and you’ll learn why in a moment. These I-frames occur at regular intervals in the video. A camera’s GOV, or GOP, setting determines the distance between I-frames.
P-frames lie between the I-frames. P-frames reference earlier frames (the preceding I-frame or P-frames) and are smaller because they include only the regions that have changed. This provides a huge benefit over Motion JPEG because you are not retransmitting data that hasn’t changed.

Imagine a scene in which a building is in the background and a person enters the field of view. The camera is not moving, nor is the building, so the pixels making up the part of the image where the building stands do not need to be sent again. Instead, only those pixels representing the person that entered the scene are sent and these replace the pixels in the image displayed to you.
B-frames are not supported by all applications or video devices. B-frames occur between I-frames and P-frames, or between multiple P-frames. B-frames are bi-directionally predicted: they refer not only to previous frames but to future frames as well. For this reason B-frames are not always used in live video applications, since a slight delay is introduced by the need to wait for later frames to arrive before the B-frame can be processed. A stream that uses only I-frames and P-frames is typical of the H.264 Baseline profile, which does not support B-frames.
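
The relationship between I-frames and P-frames can be illustrated with a toy Python sketch. This is not how H.264 actually encodes video (real encoders use motion-compensated prediction and transform coding on blocks, not per-pixel change lists), and the function names and the one-dimensional "frames" are assumptions made purely for illustration. B-frames are left out.

```python
def encode(frames, gop_length):
    """Toy encoder: a full frame every gop_length frames, changes otherwise."""
    stream = []
    previous = None
    for index, frame in enumerate(frames):
        if index % gop_length == 0:
            # "I-frame": carries every pixel, decodable on its own
            stream.append(("I", list(frame)))
        else:
            # "P-frame": carries only pixels that differ from the previous frame
            changes = {i: value
                       for i, (value, old) in enumerate(zip(frame, previous))
                       if value != old}
            stream.append(("P", changes))
        previous = frame
    return stream

def decode(stream):
    """Rebuild every frame; a P-frame is useless without the frame before it."""
    frames = []
    current = None
    for kind, payload in stream:
        if kind == "I":
            current = list(payload)
        else:
            current = list(current)
            for i, value in payload.items():
                current[i] = value
        frames.append(current)
    return frames

# Static background (0) with a "person" (9) moving one pixel per frame
frames = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [9, 0, 0, 0, 0, 0, 0, 0],
    [0, 9, 0, 0, 0, 0, 0, 0],
    [0, 0, 9, 0, 0, 0, 0, 0],
]
stream = encode(frames, gop_length=4)
for kind, payload in stream:
    print(kind, payload)
```

In this sketch each P-frame carries at most two values instead of eight, and the decoder cannot start from a P-frame because it has nothing to apply the changes to. That dependency is the reason a stream has to begin with an I-frame.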

H.264 compression is performed by processing and compressing the frames in regularly sized ‘macroblocks’ of 16×16 pixels, which can be further broken down into blocks as small as 4×4 pixels for compression. The main takeaway is that the image frame is divided up into very regularly sized areas.
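
To give a sense of scale, the following Python sketch counts the 16×16 macroblocks needed to cover a frame. The 1920×1080 resolution is an assumption (a common resolution for a 2MP camera), and the function name is illustrative.

```python
import math

def macroblock_grid(width, height, block=16):
    """Return (columns, rows, total) of block x block areas covering a frame."""
    cols = math.ceil(width / block)
    rows = math.ceil(height / block)   # a partial row still needs a full block
    return cols, rows, cols * rows

cols, rows, total = macroblock_grid(1920, 1080)
print(f"{cols} x {rows} = {total} macroblocks")
```

Because 1080 is not a multiple of 16, the last row of macroblocks extends past the bottom of the visible image, so the grid is rounded up rather than down.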

How does H.265 work?

H.265 includes the same types of frames as mentioned above, but it improves on H.264 by dynamically sizing the regions into which the frame is divided. In place of the fixed-size macroblocks of H.264, H.265 uses Coding Tree Units (CTUs) of up to 64×64 pixels, each of which can be split into smaller coding, prediction, and transform blocks as the content requires. This usually translates into more efficient compression than H.264 because large uniform areas can be handled as large blocks while detailed areas are divided more finely, so the frame can be compressed more heavily in some areas than others when needed.
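
The idea of dynamically sized regions can be sketched as a recursive quadtree split. This is a toy illustration only: real H.265 encoders decide where to split using rate-distortion analysis, whereas this sketch simply splits any block that is not perfectly uniform. The function name, the 8×8 minimum size, and the test image are assumptions.

```python
def split_block(image, x, y, size, min_size=8):
    """Return (x, y, size) leaf blocks, splitting while a block is not uniform."""
    values = {image[row][col]
              for row in range(y, y + size)
              for col in range(x, x + size)}
    if len(values) == 1 or size <= min_size:
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves.extend(split_block(image, x + dx, y + dy, half, min_size))
    return leaves

# 64x64 toy region: flat background with a small patch of detail near one corner
size = 64
image = [[0] * size for _ in range(size)]
for row in range(4, 12):
    for col in range(4, 12):
        image[row][col] = 1

leaves = split_block(image, 0, 0, size)
for leaf in leaves:
    print(leaf)
```

The flat parts of the region stay as a few large blocks, while only the corner containing detail is subdivided down to the minimum size. A fixed 16×16 grid would have used 16 blocks for the same area regardless of content.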

As illustrated below, H.264 on the left breaks up the image into equal blocks for compression, whereas H.265 on the right divides the image into dynamically sized regions to better compress those regions based on what is in them.

An Illustration of H.264 (left) vs H.265 (right)