I've got about 50 folders of data to process, and a Ruby script that processes one folder's files (which folder it processes is set in a .yml configuration file). The computer has four CPUs.
I'd like to be able to start up the 50 processes, but have only 4 of them actively running at any time, and have the other 46 paused. Once one of the processes finishes, I'd like one of the paused processes to become un-paused, until all 50 are completed. That way, I can do
./super_script.rb > folder_1_log.txt
*edit config.yml*
./super_script.rb > folder_2_log.txt
*edit config.yml*
...
And focus on something else until the processing's done.
Is it possible to do this? Are there some terms for what I'm wanting that I can google?
(Another alternative would be to make super_script capable of multi-threading - maybe I'm a scaredy-cat for not taking that approach)
(The operating system is Ubuntu Linux, and most of the CPU time isn't taken up by super_script.rb itself, but by other Ruby programs it calls via system())
Answer
Here's a bash script that appears to do something close to what you want: it starts a number of processes in parallel but makes sure that no more than n are running at any one time.
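For a rough idea of how that kind of limit can work, here is a minimal sketch, not the script referred to above. It assumes GNU xargs (standard on Ubuntu) and uses its -P option to cap the number of simultaneous processes. The function name run_limited and the wrapper process_folder.sh are made up for illustration. Because super_script.rb reads a single config.yml, each parallel run would need its own configuration, which is why the usage line assumes a wrapper that takes care of that.

```shell
#!/usr/bin/env bash
# Sketch: run one job per item, never more than MAX_JOBS at once.

# run_limited MAX_JOBS COMMAND ITEM...
# Runs "COMMAND ITEM" once per item, keeping at most MAX_JOBS running.
run_limited() {
  local max_jobs=$1 cmd=$2
  shift 2
  # Nothing to do if no items were given.
  [ "$#" -gt 0 ] || return 0
  # NUL-separate the items so names with spaces survive; xargs -P caps
  # the number of child processes running at the same time.
  printf '%s\0' "$@" | xargs -0 -n 1 -P "$max_jobs" "$cmd"
}

# Hypothetical usage: process_folder.sh is an assumed wrapper that sets up
# a per-folder config, runs super_script.rb, and writes folder_N_log.txt.
# run_limited 4 ./process_folder.sh $(seq 1 50)
```

With four CPUs, a limit of 4 keeps every core busy while the remaining jobs simply wait to be started, rather than being started and then paused.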
On the other hand, if what you're doing is disk-bound rather than CPU-bound (I mention this because you say you've got "50 folders of data to process"), then you may actually be better off running your processes one after another, so that they don't compete with each other for the disk.