cross-posted from: https://lemmy.run/post/15922
Running Commands in Parallel in Linux
In Linux, you can execute multiple commands simultaneously by running them in parallel. This can help improve the overall execution time and efficiency of your tasks. In this tutorial, we will explore different methods to run commands in parallel in a Linux environment.
Method 1: Using the & (ampersand) symbol
The simplest way to run commands in parallel is to append the & symbol to the end of each command. Here’s how you can do it:
command_1 & command_2 & command_3 &
This syntax allows each command to run in the background, enabling parallel execution. The shell will immediately return the command prompt, and the commands will execute concurrently.
For example, to compress three different files in parallel using the gzip command:
gzip file1.txt & gzip file2.txt & gzip file3.txt &
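One thing the & approach leaves open is knowing when the background jobs have finished and whether they succeeded. A minimal sketch, assuming a POSIX shell and using throwaway files in a scratch directory (the file names and the failed counter are just for illustration), could record each PID with $! and then wait on it:

```shell
#!/bin/sh
# Work in a scratch directory so the example does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for n in 1 2 3; do
    echo "sample data $n" > "file$n.txt"
done

# Start each compression in the background and remember its PID.
gzip file1.txt & pid1=$!
gzip file2.txt & pid2=$!
gzip file3.txt & pid3=$!

# "wait PID" blocks until that job exits and returns its exit status.
failed=0
for pid in "$pid1" "$pid2" "$pid3"; do
    wait "$pid" || failed=$((failed + 1))
done
echo "jobs failed: $failed"
```

A bare wait with no arguments also works if you only need to block until everything is done and don't care about the individual exit codes.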
Method 2: Using xargs with the -P option
The xargs command is useful for building and executing commands from standard input. By utilizing its -P option, you can specify the maximum number of commands to run in parallel. Here’s an example:
echo -e "command_1\ncommand_2\ncommand_3" | xargs -P 3 -I {} sh -c "{}" &
In this example, we use the echo command to generate a list of commands separated by newline characters. This list is then piped (|) to xargs, which executes each command in parallel. The -P 3 option indicates that a maximum of three commands should run concurrently. Adjust the number according to your requirements.
For instance, to run three different wget commands in parallel to download files:
echo -e "wget http://example.com/file1.txt\nwget http://example.com/file2.txt\nwget http://example.com/file3.txt" | xargs -P 3 -I {} sh -c "{}" &
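When every job runs the same program on a different argument, it may be simpler to feed xargs the arguments instead of whole command strings, which also avoids passing input through sh -c. A rough sketch, assuming an xargs that supports -P (GNU and BSD versions do; it is not required by POSIX) and using made-up sample files in a scratch directory:

```shell
#!/bin/sh
# Create a few throwaway files to compress (names are illustrative only).
workdir=$(mktemp -d)
for n in 1 2 3 4; do
    echo "sample data $n" > "$workdir/file$n.txt"
done

# -n 1 passes one file name to each gzip invocation;
# -P 2 keeps at most two gzip processes running at a time.
printf '%s\n' "$workdir"/file*.txt | xargs -n 1 -P 2 gzip
status=$?

ls "$workdir"
```

Note that this splits input on whitespace, so it assumes the file names contain no spaces or newlines. Because there is no trailing &, the shell waits for xargs here, and xargs in turn waits for all of its children before returning.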
Method 3: Using GNU Parallel
GNU Parallel is a powerful tool specifically designed to run jobs in parallel. It provides extensive features and flexibility. To use GNU Parallel, follow these steps:
Install GNU Parallel if it’s not already installed. You can typically find it in your Linux distribution’s package manager.
Create a file (e.g., commands.txt) and add one command per line:
command_1
command_2
command_3
Run the following command to execute the commands in parallel:
parallel -j 3 < commands.txt
The -j 3 option specifies the maximum number of parallel jobs to run. Adjust it according to your needs.
For example, if you have a file called urls.txt containing URLs and you want to download them in parallel using wget:
parallel -j 3 wget {} < urls.txt
GNU Parallel also offers numerous advanced options for complex parallel job management. Refer to its documentation for further information.
Conclusion
Running commands in parallel can significantly speed up your tasks by utilizing the available resources efficiently. In this tutorial, you’ve learned three methods for running commands in parallel in Linux:
- Using the & symbol to run commands in the background.
- Utilizing xargs with the -P option to define the maximum parallelism.
- Using GNU Parallel for advanced parallel job management.
Choose the method that best suits your requirements and optimize your workflow by executing commands concurrently.
Thank you for this! Always wanted to try Parallel, but jumping straight into the manpage discouraged me. This example makes perfect sense!
Messing about with a file seems a bit superfluous when you could just use a ‘here document’, even straight into the shell:
$ parallel -j 3 << EOF
sleep 5 && echo five
sleep 3 && echo three
sleep 1 && echo one
EOF
outputs what you’d expect:
one
three
five
HEREDOCs have their place, but I always prefer a file if possible, as their formatting is never nice nor consistent. I do appreciate having an option, though!
Love posts like this, because I can plug a tool that I recently found!
It’s called ParaFly and I use it a lot on HPCs. It doesn’t really have multi-node support, but it does offer logging and resuming of jobs.
So your point 3 is essentially this:
ParaFly -c commands.txt -CPU N
where N is the number of jobs you want to run in parallel.
For anyone else reading this, please make sure this tool is correct for your HPC.
I would be annoyed at my users if they tried using any of these tools without fully understanding them and correctly judging when to use the scheduler vs. parallelism.
Absolutely! Sometimes it’s just easier for me to keep jobs in a single list and run them on a big fat node rather than array submit and block half the queue!
Hmm, I didn’t know about ParaFly, so something I learned today as well 😀.
Or the good old make -j
Love parallel! For example, encoding a bunch of WAVs to Opus:
parallel --eta 'opusenc --bitrate 256 {} {.}.opus' ::: *.wav
TIL GNU parallel. Honestly I had never heard of it before.
I don’t really have many use cases for things running in parallel (which I cannot achieve with tmux), but it seems a better solution for some testing. Thanks!
GNU Parallel can also run jobs across the network on other machines as well as on the local CPUs.
I’ve been using xargs forever and never noticed the -P option, thanks!
For some reason I always remember parallel being difficult to use, but maybe I was always trying to do something difficult like processing different batches of files simultaneously.
Don’t forget pipes: |
cmd1 | cmd2 | cmd3
…will run all 3 in parallel: cmd3 can be processing cmd2’s output while cmd2 is generating new data, and so on.
How much parallelism actually occurs depends on the nature of the processing being done, but it is a powerful technique, which can be combined with the others to great effect.
I don’t think that’s correct. Pipes are synchronous in nature since they take the output of the preceding command and feed (or “pipe”) it into the following command.
Synchronous means each write (flush) to the pipe blocks on a read. Both programs need to be running concurrently.
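For anyone who wants to check the pipe behaviour for themselves, here is a small timing sketch. It assumes a POSIX-style shell and a date that supports +%s (GNU and BSD versions do). Each stage sleeps for one second before touching the pipe, so the total run time shows whether the stages were started one after another or all at once:

```shell
#!/bin/sh
# Three stages, each sleeping 1 second before reading or writing.
# Run back to back they would need about 3 seconds; started together
# by the shell they need about 1 second.
start=$(date +%s)
{ sleep 1; echo data; } | { sleep 1; cat; } | { sleep 1; cat; } > /dev/null
end=$(date +%s)

elapsed=$((end - start))
echo "elapsed: ${elapsed}s"
```

The shell starts every stage of the pipeline as its own process at the same time. A stage only blocks when it tries to read from an empty pipe or write to a full one, so how much real overlap you get depends on how the commands produce and consume data.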