Anyhow, I was pulling down a 14GB MySQL database dump today. Compressing it with plain Jane gzip was pretty slow, so I looked for some parallel options. The server I was pulling from has 16 cores, so I figured I could make use of them. Here's what I found:
- pbzip2 - Parallel BZIP2: Parallel implementation of BZIP2. BZIP2 is well known for being balls slow, so this speeds it up by spreading the work across multiple CPUs.
- pigz - Parallel GZIP: Parallel implementation of GZIP written by Mark Adler (guy who co-authored zlib and gzip, so you can be reasonably confident he has his shit together).

Here are the timings on the 14GB dump:
- Plain gzip, default compression level: 11 minutes, 58 seconds. Resultant file is 2.3GB.
- pbzip2, default compression level: 8 minutes, 48 seconds. Resultant file is 1.7GB.
- pigz, default compression level: 1 minute, 33 seconds. Resultant file is 2.3GB.
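
To put those numbers side by side, here is a quick Python sketch that works out the speedup over plain gzip and the compression ratio for each tool. The timings and sizes are the ones listed above, and it treats the 14GB dump size as the uncompressed input. The names `RESULTS` and `summarize` are made up for illustration.

```python
# Timings and sizes as reported above; the dump is 14GB uncompressed.
DUMP_GB = 14.0

# name -> (wall-clock seconds, compressed size in GB)
RESULTS = {
    "gzip": (11 * 60 + 58, 2.3),
    "pbzip2": (8 * 60 + 48, 1.7),
    "pigz": (1 * 60 + 33, 2.3),
}


def summarize(results, baseline="gzip", dump_gb=DUMP_GB):
    """Return {name: (speedup vs baseline, compression ratio)}."""
    base_seconds = results[baseline][0]
    return {
        name: (base_seconds / seconds, dump_gb / size_gb)
        for name, (seconds, size_gb) in results.items()
    }


if __name__ == "__main__":
    for name, (speedup, ratio) in summarize(RESULTS).items():
        print(f"{name:7s} speedup {speedup:5.2f}x  ratio {ratio:4.1f}:1")
```

By that math pigz is roughly 7.7 times faster than plain gzip at the same output size, while pbzip2 is only about 1.4 times faster but produces a file about a quarter smaller.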
If any readers know of other parallel compression schemes I can try, e-mail me and let me know. I will post stats here.
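
If you want to time a compressor yourself, something like the following Python sketch should do. It is not the way the numbers above were collected, just one possible harness. It assumes each tool reads stdin and writes stdout when given `-c`, which should hold for gzip, pigz and pbzip2 but is worth checking for anything else you add. `dump.sql` is a placeholder path, and `time_compressor` and `benchmark` are hypothetical helper names.

```python
import os
import shutil
import subprocess
import time


def time_compressor(argv, src_path, dst_path):
    """Run argv with src_path on stdin and dst_path on stdout.

    Returns (elapsed seconds, output size in bytes).
    """
    start = time.perf_counter()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        subprocess.run(argv, stdin=src, stdout=dst, check=True)
    elapsed = time.perf_counter() - start
    return elapsed, os.path.getsize(dst_path)


def benchmark(src_path, candidates):
    """Time every candidate whose binary is on PATH; skip the rest."""
    results = {}
    for name, argv in candidates.items():
        if shutil.which(argv[0]) is None:
            continue  # tool not installed on this machine
        dst_path = f"{src_path}.{name}.out"
        results[name] = time_compressor(argv, src_path, dst_path)
    return results


if __name__ == "__main__":
    # Default compression level for each tool, output to stdout.
    candidates = {
        "gzip": ["gzip", "-c"],
        "pigz": ["pigz", "-c"],
        "pbzip2": ["pbzip2", "-c"],
    }
    # "dump.sql" is a placeholder, not the actual file name.
    for name, (seconds, size) in benchmark("dump.sql", candidates).items():
        print(f"{name}: {seconds:.1f}s, {size} bytes")
```

This measures wall-clock time, the same kind of figure as the minutes and seconds quoted above. For a fair comparison, run each tool more than once so disk caching does not favor whichever one goes second.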