
Amazon AWS - Python For Beginners

I have a computationally intensive program doing calculations that I intend to parallelise. It is written in Python, and I hope to use the multiprocessing module. I would like some help.

Solution 1:

You would want to use the multiprocessing module only if you need the processes to share data in memory. That is something I would recommend ONLY if you absolutely must have shared memory due to performance considerations. Python multiprocessing applications with shared state are non-trivial to write and debug.
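To illustrate the shared-memory case the paragraph above refers to, here is a minimal sketch using the standard library's `multiprocessing` module. The function name and the workload are illustrative, not from the question; the point is that a shared `Value` needs a `Lock` to avoid races, which is part of why such programs are harder to debug.

```python
# Minimal sketch (illustrative names): several processes accumulate into one
# shared counter, guarded by a lock.
from multiprocessing import Process, Value, Lock

def add_squares(total, lock, numbers):
    """Accumulate the squares of `numbers` into the shared counter `total`."""
    for n in numbers:
        with lock:                 # guard the shared value against races
            total.value += n * n

if __name__ == "__main__":
    total = Value("d", 0.0)        # a double shared across processes
    lock = Lock()
    workers = [Process(target=add_squares, args=(total, lock, range(i, 100, 4)))
               for i in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(total.value)             # sum of squares of 0..99
```

Without the lock, concurrent `total.value += ...` updates can be lost; that is the kind of subtle bug the answer warns about.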

If you are doing something like the distributed.net or SETI@home projects, where the tasks are computationally intensive but reasonably isolated, you can follow this process:

  1. Create a master application that breaks the large task down into smaller computation chunks (assuming the task can be broken down and the results then combined centrally).
  2. Create Python code that takes a task from the master (perhaps as a file, or some other one-time communication with instructions on what to do), and run multiple copies of these Python processes.
  3. These Python processes work independently of each other, process their data, and return their results to the master process for collation.

You could run these processes on AWS single-core instances if you wanted, or use your laptop to run as many copies as you have cores to spare.
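The three steps above can be sketched in plain Python. Everything here is a placeholder (`expensive_calc` stands in for the real computation, and the workers run in-process for illustration; in practice each chunk would go to a separate process or machine):

```python
# Illustrative sketch of the master/worker split described in steps 1-3.

def expensive_calc(param):
    return param * param                    # stand-in for the real computation

def split_into_chunks(params, n_chunks):
    """Step 1: the master breaks the task into smaller chunks."""
    return [params[i::n_chunks] for i in range(n_chunks)]

def worker(chunk):
    """Step 2: each worker processes its chunk in isolation."""
    return [expensive_calc(p) for p in chunk]

def collate(results):
    """Step 3: the master combines the per-worker results."""
    return sorted(x for chunk in results for x in chunk)

if __name__ == "__main__":
    chunks = split_into_chunks(list(range(10)), 3)
    results = [worker(c) for c in chunks]   # in reality, separate processes/machines
    print(collate(results))
```

Because the workers never share state, they can run anywhere: as local processes, or on separate AWS instances.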

EDIT: Based on the updated question

So your master process will create files (or some other data structures) containing the parameter info, one for each parameter set to process. These files will be stored in a shared folder called needed-work.

Each Python worker (on an AWS instance) will watch the needed-work shared folder for available files to work on (or wait on a socket for the master process to assign a file to it).

The Python process that takes on a file will work on it and store the result in a separate shared folder, with the parameter as part of the file name.

The master process will look at the files in the work-done folder, process them, and generate the combined response.
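A minimal sketch of this shared-folder protocol might look like the following. The folder names (needed-work, work-done) come from the description above; the file format, function names, and the squaring computation are illustrative assumptions:

```python
# Illustrative sketch of the needed-work / work-done folder protocol.
import json
import os

def master_create_work(work_dir, params):
    """Master: write one parameter file per unit of work into needed-work."""
    for i, param in enumerate(params):
        with open(os.path.join(work_dir, f"param-{i}.json"), "w") as f:
            json.dump({"id": i, "param": param}, f)

def worker_process_one(work_dir, done_dir):
    """Worker: pick up one file from needed-work, compute, write to work-done."""
    for name in sorted(os.listdir(work_dir)):
        path = os.path.join(work_dir, name)
        with open(path) as f:
            task = json.load(f)
        os.remove(path)                      # naive claim; racy with many workers
        result = task["param"] ** 2          # stand-in for the real computation
        out = os.path.join(done_dir, f"result-{task['id']}.json")
        with open(out, "w") as f:
            json.dump({"id": task["id"], "result": result}, f)
        return True
    return False                             # no work available

def master_collate(done_dir):
    """Master: read everything in work-done and combine the results."""
    results = {}
    for name in os.listdir(done_dir):
        with open(os.path.join(done_dir, name)) as f:
            rec = json.load(f)
        results[rec["id"]] = rec["result"]
    return results
```

Note the naive claim (`os.remove` after reading): with multiple workers, two of them can read the same file before either deletes it, which is exactly the race the file-based approach has to deal with.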

This whole solution could be implemented with sockets as well: workers listen on a socket for the master to assign work to them, and the master waits on a socket for the workers to submit responses.

The file-based approach requires a way for workers to make sure the work they pick up is not also taken by another worker. This could be handled by having a separate work folder for each worker, with the master process deciding when a worker needs more work.

Workers could delete the files they pick up from their work folder, and the master process could watch for a folder becoming empty and add more work files to it.
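Another way to keep two workers from grabbing the same file, without per-worker folders, is to have each worker atomically rename the file into a claimed folder before working on it. This is a sketch under the assumption that the shared folder sits on a single POSIX filesystem, where `os.rename` is atomic (network filesystems may not give this guarantee); the names are illustrative:

```python
# Illustrative sketch: claim a work file by atomically renaming it.
# Only one of two racing workers can win the rename; the loser gets an
# OSError because the source file is already gone.
import os

def try_claim(work_dir, claimed_dir, name):
    """Return the claimed path, or None if another worker got there first."""
    src = os.path.join(work_dir, name)
    dst = os.path.join(claimed_dir, name)
    try:
        os.rename(src, dst)      # atomic on a single local POSIX filesystem
        return dst
    except OSError:              # file already taken (or gone)
        return None
```

A worker would call `try_claim` on each candidate file and simply move on to the next one whenever it returns `None`.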

Again, this is more elegant to do with sockets, if you are comfortable with that.
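The socket variant can be sketched with the standard library's `multiprocessing.connection` helpers, which handle message framing for you. The address, auth key, and squaring computation are placeholder assumptions; a real system would also need error handling and a shutdown protocol:

```python
# Illustrative sketch: master listens and hands out work; each worker
# connects, receives one task, and sends back the result.
from multiprocessing.connection import Listener, Client

def master(address, params, authkey=b"secret"):
    """Serve one task per connecting worker, then collect its result."""
    results = []
    with Listener(address, authkey=authkey) as listener:
        for param in params:
            with listener.accept() as conn:
                conn.send(param)             # assign work to this worker
                results.append(conn.recv())  # block until the result comes back
    return results

def worker(address, authkey=b"secret"):
    """Fetch one task from the master, compute, and send the result back."""
    with Client(address, authkey=authkey) as conn:
        param = conn.recv()
        conn.send(param ** 2)                # stand-in for the real computation
```

Compared with the shared-folder approach, this removes the duplicate-pickup race entirely: the master decides who gets each task, so no two workers can ever hold the same one.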
