I read here that Intel introduced SSE 4.2 instructions
for accelerating string processing.
Quote from the article:
The SSE 4.2 instruction set, first implemented in Intel's Core i7, provides string and text processing instructions (STTNI) that utilize SIMD operations for processing character data. Though originally conceived for accelerating string, text, and XML processing, the powerful new capabilities of these instructions are useful outside of these domains, and it is worth revisiting the search and recognition stages of numerous applications to utilize STTNI to improve performance.
As for software libraries, I would look at Agner Fog's asmlib. It is a collection of routines optimized in assembly, including several string manipulation functions that use SSE4.2. It also provides functions, which I use myself, that report information about the CPU, such as the cache size at each level and which instruction set extensions (e.g. SSE4.2) are supported.
http://www.agner.org/optimize/asmlib.zip
To enable SSE4.2 in GCC, compile with -msse4.2. If your processor supports AVX, you can use -mavx instead, which also enables SSE4.2.
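To give a feel for what the string instructions look like from C, here is a minimal sketch that uses the `_mm_cmpistri` intrinsic (PCMPISTRI) to find the first character of a string that belongs to a small set. It assumes GCC or Clang on x86. The names `find_first_of` and `find_first_of_sse42` are my own for illustration and are not taken from the article or from asmlib. The sketch uses a per-function target attribute and a runtime check so that it also builds without -msse4.2; if you compile the whole file with -msse4.2 the attribute is not needed.

```c
#include <assert.h>
#include <nmmintrin.h>   /* SSE4.2 intrinsics, including _mm_cmpistri */
#include <stddef.h>
#include <string.h>

/* "Equal any" on unsigned bytes, reporting the lowest matching index. */
#define STTNI_MODE (_SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY | _SIDD_LEAST_SIGNIFICANT)

/* Hypothetical helper: index of the first byte of s that occurs in set,
   or strlen(s) if there is none. Assumes strlen(set) <= 16. */
__attribute__((target("sse4.2")))
static size_t find_first_of_sse42(const char *s, const char *set)
{
    char buf[16] = {0};
    size_t set_len = strlen(set);
    size_t len = strlen(s);   /* simplification: lets us avoid reading past the end */
    size_t i = 0;

    /* Zero-padded copy of the set; the zero byte marks its end. */
    memcpy(buf, set, set_len < 16 ? set_len : 16);
    const __m128i vset = _mm_loadu_si128((const __m128i *)buf);

    /* Full 16-byte chunks: one instruction tests 16 characters against the set. */
    for (; i + 16 <= len; i += 16) {
        const __m128i chunk = _mm_loadu_si128((const __m128i *)(s + i));
        int idx = _mm_cmpistri(vset, chunk, STTNI_MODE);
        if (idx < 16)
            return i + (size_t)idx;
    }

    /* Remaining 0..15 bytes, copied into a zero-padded buffer. */
    memset(buf, 0, sizeof buf);
    memcpy(buf, s + i, len - i);
    const __m128i tail = _mm_loadu_si128((const __m128i *)buf);
    int idx = _mm_cmpistri(vset, tail, STTNI_MODE);
    return idx < 16 ? i + (size_t)idx : len;
}

/* Uses the SSE4.2 path when the CPU supports it, otherwise plain strcspn. */
size_t find_first_of(const char *s, const char *set)
{
    assert(s != NULL && set != NULL);
    if (strlen(set) <= 16 && __builtin_cpu_supports("sse4.2"))
        return find_first_of_sse42(s, set);
    return strcspn(s, set);   /* portable fallback, same result */
}
```

For example, `find_first_of("hello, world", ",!")` returns 5, the same value `strcspn` gives. A tuned routine would avoid the up-front `strlen` call; I kept it here only to make the sketch obviously safe.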