H.264 is an emerging video coding standard that aims to compress high-quality video content at low bit rates. Although its encoding and decoding processes are similar to those of many previous standards, H.264 includes a number of new features and therefore requires much more computation than most existing standards. This complexity poses many challenges to implementing a real-time encoder/decoder in software on personal computers. This work analyzes software implementation of the H.264 encoder and decoder on general-purpose processors with media instructions and multi-threading capabilities. Specifically, we discuss how to optimize the H.264 encoding and decoding algorithms on Intel Pentium 4 processors. We first analyze the reference implementation to identify the most time-consuming modules, and then present optimization methods that use media instructions to speed up these modules. After appropriate optimizations, the codec runs more than 3× faster. Nonetheless, the H.264 encoder is still too complex to run in real time on a single processor. Thus, we also study how to partition the H.264 encoder into multiple threads, which can then run on systems with multiple processors or multi-threading capabilities. We analyze several multi-threading schemes that offer different quality/performance trade-offs, and propose a scheme that achieves both good scalability (i.e., speed) and good quality. Our encoder obtains a further 3.8× speedup on a four-processor system, or 4.6× on a four-processor system with Hyper-Threading Technology. This work demonstrates that hardware-specific algorithm modifications can speed up the H.264 decoder and encoder substantially. The performance-improvement techniques demonstrated here on modern microprocessors can be applied not only to H.264 but also to other video and multimedia processing applications.
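The abstract does not name the specific modules optimized with media instructions. Purely as an illustration, the sketch below assumes a sum-of-absolute-differences (SAD) kernel over a 16x16 block, a common hotspot in block-matching motion estimation. It shows how one SSE2 `PSADBW` instruction (`_mm_sad_epu8`) replaces 16 scalar subtract/absolute-value/add steps per row. The function names `sad16x16_scalar` and `sad16x16_simd` are hypothetical; this is not the authors' code.

```c
#include <stdint.h>
#include <stdlib.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical scalar reference: SAD between two 16x16 blocks of 8-bit
   samples. `stride` is the distance in bytes between consecutive rows. */
unsigned sad16x16_scalar(const uint8_t *a, const uint8_t *b, int stride)
{
    unsigned sum = 0;
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++)
            sum += (unsigned)abs(a[x] - b[x]);
        a += stride;
        b += stride;
    }
    return sum;
}

/* Hypothetical SIMD version: one _mm_sad_epu8 (PSADBW) per row processes
   16 pixels and leaves two partial sums, one in each 64-bit lane. */
unsigned sad16x16_simd(const uint8_t *a, const uint8_t *b, int stride)
{
#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();
    for (int y = 0; y < 16; y++) {
        __m128i ra = _mm_loadu_si128((const __m128i *)(a + y * stride));
        __m128i rb = _mm_loadu_si128((const __m128i *)(b + y * stride));
        acc = _mm_add_epi64(acc, _mm_sad_epu8(ra, rb));
    }
    /* Combine the two lanes; each lane's sum is at most 16 * 8 * 255,
       so reading only the low 32 bits of each lane is safe. */
    return (unsigned)_mm_cvtsi128_si32(acc) +
           (unsigned)_mm_cvtsi128_si32(_mm_srli_si128(acc, 8));
#else
    /* SSE2 not available on this target: fall back to the scalar code. */
    return sad16x16_scalar(a, b, stride);
#endif
}
```

Keeping the scalar reference alongside the SIMD version makes it easy to check that the optimized kernel returns identical results. This kind of bit-exactness check matters for a codec, where the encoder and decoder must stay in sync.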