Running Commands in Parallel in Linux

root · 2 years ago

Illecors · 2 years ago

Thank you for this! I always wanted to try Parallel, but jumping straight into the man page discouraged me. This example makes perfect sense!

@[email protected] · 2 years ago

Messing about with a file seems a bit superfluous when you could just use a ‘here document’, even straight into the shell:

$ parallel -j 3 << EOF
  sleep 5 && echo five
  sleep 3 && echo three
  sleep 1 && echo one
EOF

outputs what you’d expect:

one
three
five
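If GNU parallel isn't installed, roughly the same here-document trick can be sketched with xargs. This assumes an xargs that supports -P (GNU and BSD xargs both do); the run_jobs function name is only for illustration, and the sleep times are shortened to 3, 2, and 1 seconds.

```shell
# Feed three command lines to xargs through a here document.
# -P 3 allows up to three jobs at once; -I {} substitutes each
# whole input line, which sh -c then runs as a command.
run_jobs() {
  xargs -P 3 -I {} sh -c '{}' << 'EOF'
sleep 3 && echo three
sleep 2 && echo two
sleep 1 && echo one
EOF
}

run_jobs
```

Because all three jobs start together, the shortest sleep finishes first, so the lines come out as one, two, three. Only feed it lines you trust, since each one is executed as a shell command.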

Illecors · 2 years ago

HEREDOCs have their place, but I always prefer a file if possible, as their formatting is never nice nor consistent. I do appreciate having an option, though!

fatboy93 · 2 years ago

Love posts like this, because I can plug a tool that I recently found!

It’s called ParaFly and I use it a lot on HPCs. It doesn’t really have multi-node support, but it does offer logging and resuming of jobs.

So your point 3 is essentially this: ParaFly -c commands.txt -CPU N, where N is the number of jobs you want to run in parallel.

30021190 · 2 years ago

For anyone else reading this, please make sure this tool is correct for your HPC.

I would be annoyed at my users if they used any of these tools without fully understanding them and without correctly judging when to use the scheduler versus parallelism.

fatboy93 · 2 years ago

Absolutely! Sometimes it’s just easier for me to keep jobs in a single list and run them on a big fat node rather than submit an array and block half the queue!

root · 2 years ago

Hmm, I didn’t know about ParaFly, so I learned something today as well 😀.

@[email protected] · 2 years ago

Or the good old make -j

@[email protected] · 2 years ago

Love parallel! For example, encoding a bunch of WAVs to Opus:

parallel --eta 'opusenc --bitrate 256 {} {.}.opus' ::: *.wav
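For anyone puzzled by the placeholders: {} is the input file and {.} is the same name with its extension removed. A small plain-shell sketch of the command lines this would build, without running opusenc, might look like the following. The preview function and the file names are made up for illustration.

```shell
# Print the command line that would be built for each input file.
# "$f" plays the role of {} and "${f%.*}" the role of {.}
# (the file name with its last extension stripped).
preview() {
  for f in "$@"; do
    printf 'opusenc --bitrate 256 %s %s.opus\n' "$f" "${f%.*}"
  done
}

preview intro.wav outro.wav
```

GNU parallel can do the same preview itself with --dry-run, which prints the commands instead of executing them.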

@sudneo · 2 years ago

TIL GNU parallel. Honestly I had never heard of it before.

I don’t really have many use cases for things running in parallel (which I cannot achieve with tmux), but it seems a better solution for some testing. Thanks!

@[email protected] · 2 years ago

GNU Parallel can also run jobs across the network on other machines as well as the local CPUs

@andybug · 2 years ago

I’ve been using xargs forever and never noticed the -P option, thanks!

For some reason I always remember parallel being difficult to use, but maybe I was always trying to do something difficult like processing different batches of files simultaneously.
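On the batches point, xargs can handle a simple version of that by combining -n with -P. A minimal sketch, assuming a GNU or BSD xargs and using echo as a stand-in for the real per-batch command (the file names are placeholders):

```shell
# Split six file names into batches of two (-n 2) and run up to
# three batches at the same time (-P 3). echo stands in for the
# real command; sort makes the output order predictable, since
# parallel jobs can finish in any order.
out=$(printf '%s\n' a.txt b.txt c.txt d.txt e.txt f.txt |
  xargs -n 2 -P 3 echo batch: |
  sort)

echo "$out"
```

Each output line shows one invocation and the two files it received.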

zero_iq · 2 years ago

Don’t forget pipes: |

cmd1 | cmd2 | cmd3

…will run all 3 in parallel: cmd3 can be processing cmd2’s output while cmd2 is generating new data, and so on.

How much parallelism actually occurs depends on the nature of the processing being done, but it is a powerful technique, which can be combined with the others to great effect.

PHLAK · edited 2 years ago

I don’t think that’s correct. Pipes are synchronous in nature since they take the output of the preceding command and feed (or “pipe”) it into the following command.

@anupcshan · 2 years ago

Synchronous means each write (flush) to the pipe blocks on a read. Both programs need to be running concurrently.
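A quick way to check this yourself is to time a pipeline whose stages each sleep. This is only a sketch with made-up stages: if they ran one after another it would take about 6 seconds, but since the shell starts them all at once it should take about 2.

```shell
# Three stages, each sleeping 2 seconds before handling its data.
# The shell starts every stage of a pipeline at the same time,
# so the sleeps overlap instead of adding up.
start=$(date +%s)
{ sleep 2; echo data; } | { sleep 2; cat; } | { sleep 2; cat; }
end=$(date +%s)

elapsed=$((end - start))
echo "pipeline took ${elapsed}s"
```

The stages only wait on each other when one has nothing to read yet or the pipe buffer is full, which is why real speedups depend on what each command is doing.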

Method 1: Using `&` (ampersand) symbol

Method 2: Using `xargs` with `-P` option

Method 3: Using GNU Parallel

Conclusion
