The Power of the Pipe

It's a Unix thing:

ps aux | grep 'sidekiq\|redis\|puma' | grep -v grep | awk '{ ORS=" "; print $5, $6 ; ORS="\n"; s = ""; for (i = 11; i <= NF; i++) s = s $i " "; print s }'

I needed ongoing memory usage on three processes on a server: you can probably guess from the above that they're sidekiq, redis, and puma. Sidekiq is a significant memory hog, so we're tracking memory usage while we figure out what to do about it.

Let's break the above down. ps aux produces output in the following form:

# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1  28728  1792 ?        Ss   10:10   0:01 /sbin/init
root         2  0.0  0.0      0     0 ?        S    10:10   0:00 [kthreadd]
...

I'm only interested in VSZ, RSS, and the command line that started the process, and only for three processes. The first step is to narrow the output to those processes, using grep to match the three names:

# ps aux | grep 'sidekiq\|redis\|puma'
root       596  0.1 16.1 746100 165320 ?       Ssl  10:10   0:06 puma 3.4.0 (unix:///var/run/grand.sock) [project]
redis      599  0.2 10.6 251528 109368 ?       Ssl  10:10   0:09 /usr/bin/redis-server 127.0.0.1:0
root       673  2.4 60.4 1093492 619176 ?      Ssl  10:10   1:24 sidekiq 4.1.3 project [0 of 5 busy]
root     24256  0.0  0.0  12700   136 pts/0    S+   11:09   0:00 grep sidekiq\|redis\|puma

Now you see why we have the weird | grep -v grep clause: we're getting rid of grep finding itself in the results. We want to cut this down further, printing only fields 5, 6, and 11 and up, so we pipe the output through awk. There are other choices like cut, sed, or even perl, but I don't know Perl well, and cut has a nasty habit of treating every single delimiter character as a field boundary (i.e. if you have "a---b" and the dash ("-") is your delimiter, cut sees this as four fields, two of them empty).
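The cut behaviour is easy to see in isolation. Here's a small sketch using the "a---b" string from above; the variable names are just for illustration, and the awk line shows one way (a regex field separator) to treat a run of dashes as a single delimiter:

```shell
# For cut, every dash is its own delimiter, so "a---b" splits into
# four fields: "a", "", "" and "b".
cut_field2=$(printf 'a---b\n' | cut -d'-' -f2)
cut_field4=$(printf 'a---b\n' | cut -d'-' -f4)

# awk accepts a regular expression as the field separator, so a run of
# dashes counts as one delimiter and "b" becomes field 2.
awk_field2=$(printf 'a---b\n' | awk -F'-+' '{ print $2 }')

echo "cut field 2: '$cut_field2'"
echo "cut field 4: '$cut_field4'"
echo "awk field 2: '$awk_field2'"
```

With its default separator awk already treats any run of spaces or tabs as one delimiter, which is why it copes with the padded columns of ps without extra help.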

awk requires some weird manoeuvring to make this work: we print fields 5 and 6, but temporarily set the Output Record Separator (ORS) to a space so we don't get a newline between field 6 and field 11. Then we have to build up a string of the fields to print, everything from 11 onward (honestly, I don't fully understand the last little bit, I cribbed it from Stack Overflow; see the link below). Our final output looks like this:

746100 165320 puma 3.4.0 (unix:///var/run/grand.sock) [project]
251528 111064 /usr/bin/redis-server 127.0.0.1:0
1088496 597852 sidekiq 4.1.3 project [0 of 5 busy]
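For anyone else puzzled by the awk step, here's a sketch of a more spelled-out variant. It's not the command I used: it swaps the ORS juggling for printf, which never adds a newline on its own, and it runs against a sample line copied from the ps output above rather than a live ps. The variable names are only for illustration.

```shell
# One line of ps aux output, copied from the listing above.
sample='root       596  0.1 16.1 746100 165320 ?       Ssl  10:10   0:06 puma 3.4.0 (unix:///var/run/grand.sock) [project]'

formatted=$(printf '%s\n' "$sample" | awk '{
    # printf adds no newline, so VSZ ($5) and RSS ($6) start the line.
    printf "%s %s", $5, $6
    # NF is the number of fields on this line; append field 11 (the
    # start of the command) and everything after it, space-separated.
    for (i = 11; i <= NF; i++) printf " %s", $i
    # End the record with a newline.
    printf "\n"
}')

echo "$formatted"
```

The loop is needed because the command line itself contains spaces, so awk splits it into several fields and there's no single field that holds "the rest of the line".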

Comments on Stack Overflow led me to think about the alternatives, and ... I'd like to say I simplified it. It's somewhat more readable, but I've hauled in another command line utility (awk is replaced by sed and cut). Still, I think it's a bit better.

ps aux | grep 'sidekiq\|redis\|puma' | grep -v grep | sed -e 's/\s\s*/ /g' | cut -d' ' -f5,6,11-

sed matches every run of whitespace and replaces it with a single space. The pattern is \s\s* rather than \s+ because sed uses basic regular expressions by default, where + is a literal plus sign; GNU sed accepts \s\+, or \s+ with the -E flag. This means cut, despite its nasty single delimiter limitation, can now do its work. We have to tell it to use a space as a delimiter (-d' ') because its default delimiter is the tab, and then tell it to print fields 5, 6, and 11 and up. I tried telling sed to replace whitespace with a tab (\t) to simplify the cut command slightly, but then the words in fields 11+ are also separated by tabs and the output looks worse.
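Two other ways to do the squeezing step, sketched against a sample line copied from the ps output above (the variable names are only for illustration). The first uses sed's extended-regex flag with a POSIX character class; the second uses tr -s, which only squeezes spaces, not tabs, but that's all ps pads with here.

```shell
# One line of ps aux output, copied from the listing above.
sample='redis      599  0.2 10.6 251528 109368 ?       Ssl  10:10   0:09 /usr/bin/redis-server 127.0.0.1:0'

# -E switches sed to extended regular expressions, where + means
# "one or more"; [[:space:]] is the POSIX spelling of \s.
with_sed=$(printf '%s\n' "$sample" | sed -E 's/[[:space:]]+/ /g' | cut -d' ' -f5,6,11-)

# tr -s squeezes each run of the given character into a single one.
with_tr=$(printf '%s\n' "$sample" | tr -s ' ' | cut -d' ' -f5,6,11-)

echo "$with_sed"
echo "$with_tr"
```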

Now to put this on a crontab and output to a file to watch the damage in action.
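A rough sketch of what that could look like. Everything named here is an assumption for illustration: the log path, the function names, the script location and the five-minute schedule. It also uses grep -E and tr -s in place of the \| and \s forms in the pipeline above, since those are GNU extensions, and it stamps each line with the time so the log is readable later.

```shell
#!/bin/sh
# Hypothetical log location; override by setting LOG_FILE.
LOG_FILE="${LOG_FILE:-/tmp/process-memory.log}"

# Read ps aux output on stdin and keep VSZ, RSS and the command line
# for the three processes, prefixing each line with the timestamp
# given as the first argument.
filter_memory() {
    grep -E 'sidekiq|redis|puma' | grep -v grep |
        tr -s ' ' | cut -d' ' -f5,6,11- |
        awk -v ts="$1" '{ print ts, $0 }'
}

# Append one timestamped snapshot to the log file.
log_memory() {
    ps aux | filter_memory "$(date '+%Y-%m-%dT%H:%M:%S')" >> "$LOG_FILE"
}

# To take a snapshot, call the function:
#   log_memory
#
# Example crontab line, assuming the script is saved (with that call
# uncommented) as /usr/local/bin/log-memory.sh:
#   */5 * * * * /usr/local/bin/log-memory.sh
```

Keeping the filter separate from the ps call means it can be tried on saved ps output before trusting it to cron.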

Bibliography

http://stackoverflow.com/questions/5081916/how-to-print-all-the-columns-after-a-particular-number-using-awk
http://stackoverflow.com/questions/7880784/what-is-rss-and-vsz-in-linux-memory-management - what are VSZ and RSS?
Linux Process Memory Usage - my blog entry about memory usage and tracking dirty memory