This article is the third in a series on Xgrid (see Part I and Part II). In the present article, we look at a real-life example to see how one can use Xgrid to actually get something done.
The question has come up several times (on the Web, on various forums and on the Apple mailing list): what is Xgrid good for? Xgrid is good for programs that can be broken up into smaller pieces that are independent of each other. An example from science is Monte Carlo calculations, where the same (relatively simple) calculation is repeated several million times. Another is what is called a "parameter study", where the same program is run several times with different parameters. The Mandelbrot calculation provided with Xgrid.app is another good example: the calculation at a given point is completely independent of the calculations for other points (it computes the "speed" at which the repeated application of a function diverges). An example that is likely to interest most people is graphic rendering. Each part of an image tends to be independent of the other parts, hence one can break up an image (or, e.g., a scene in a movie) into smaller images and render them on several computers. This is exactly what we are going to do here, using a program called Persistence of Vision Raytracer (POVray for short). I will try to keep the details of POVray out of this article whenever possible.
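To make the independence concrete, here is a minimal escape-time sketch in Python. It is my own illustration, not the code behind Xgrid.app's Mandelbrot demo, and the grid size, complex-plane bounds and iteration limit are arbitrary choices. Each point depends only on its own coordinate c. The rows of the map can therefore be split into bands, computed separately (on different machines, in any order) and stacked back together.

```python
def escape_time(c: complex, max_iter: int = 100) -> int:
    """Iterations of z -> z*z + c (starting at z = 0) before |z| exceeds 2.

    Points that have not escaped after max_iter iterations return max_iter.
    """
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2:
            return n
    return max_iter


def render_rows(first_row: int, last_row: int, width: int, height: int,
                max_iter: int = 100) -> list[list[int]]:
    """Compute rows first_row..last_row (inclusive) of a width x height map.

    Only the row indices are needed: no row depends on any other row,
    which is what makes this kind of problem easy to farm out.
    """
    rows = []
    for y in range(first_row, last_row + 1):
        imag = 1.5 - 3.0 * y / (height - 1)        # top row is +1.5i
        row = []
        for x in range(width):
            real = -2.0 + 3.0 * x / (width - 1)    # left column is -2.0
            row.append(escape_time(complex(real, imag), max_iter))
        rows.append(row)
    return rows


if __name__ == "__main__":
    W, H = 60, 24
    whole = render_rows(0, H - 1, W, H)
    # Two "agents", each computing half of the rows on its own.
    top = render_rows(0, H // 2 - 1, W, H)
    bottom = render_rows(H // 2, H - 1, W, H)
    print(top + bottom == whole)  # the two bands stack back into the full map
```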
You will need Xgrid and the command-line version of POVray, plus two simple programs: generate (to generate the .INI files) and combineppm (to stitch the graphics together). Links to those files are available below, in context.
We want to render (that is, create) a complex image using POVray. We need the command-line version of POVray installed. This can be done with DarwinPorts: sudo port install povray. The program gets installed in the /opt/local/ tree, which is assumed throughout this article. POVray comes with a wide selection of scenes; we will render chess2.pov, available in the scenes/advanced/ directory, at a resolution of 1024 x 768. Because the rendering can take a long time (say, hours), it is advantageous to split it into several subtasks and have several computers each render a small piece of the image. That's where Xgrid can help.
There is no magic with Xgrid: if you want every agent to do a slightly different task from the other agents, you need to provide a list of slightly different arguments. What they are and how you generate them depends on the problem at hand. For POVray, we can create .INI files that are passed as arguments to the povray executable. The .INI files contain everything POVray needs to do its thing. We generate those files such that each node renders a different slice of the image and saves it under its own name. (I use a trivial Perl script called generate to create the files.) I arbitrarily decided that 4 slices were enough, but typically you would set up as many tasks as you have agents, as long as those tasks are not too small. Past a certain point, splitting gains you nothing, because Xgrid spends most of its time copying files over the network.
All four files can be generated with the Perl script generate.
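The generate script itself is Perl and is not reproduced here. The following Python sketch does the same kind of job under stated assumptions: it writes one .INI file per slice, each rendering a consecutive band of rows of the 1024 x 768 image to its own PPM file. The option names (Input_File_Name, Output_File_Name, Output_File_Type, Width, Height, Start_Row, End_Row, Library_Path, Display) are standard POVray INI options. The following details are my own assumptions and not necessarily what generate writes:
- the scene path scenes/advanced/chess2.pov, relative to the working directory /opt/local/share/povray-3.5/;
- the include library directory;
- the hypothetical chess2_sliceN names;
- 1-based row numbering.

```python
from pathlib import Path


def row_ranges(height: int, slices: int) -> list[tuple[int, int]]:
    """Split rows 1..height into `slices` consecutive, non-overlapping ranges.

    Assumes POVray numbers rows from 1; ranges are inclusive. Leftover rows
    (when height is not a multiple of slices) go to the first slices.
    """
    base, extra = divmod(height, slices)
    ranges, start = [], 1
    for i in range(slices):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def write_ini_files(args_dir: Path, scene: str = "scenes/advanced/chess2.pov",
                    width: int = 1024, height: int = 768,
                    slices: int = 4) -> list[Path]:
    """Write one .INI file per slice into args_dir and return their paths.

    Paths inside the files are relative to the working directory
    (assumed /opt/local/share/povray-3.5/) that povray is started from.
    """
    args_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for n, (first, last) in enumerate(row_ranges(height, slices), start=1):
        ini = args_dir / f"chess2_slice{n}.ini"   # hypothetical naming scheme
        ini.write_text(
            f"Input_File_Name={scene}\n"
            f"Output_File_Name=chess2_slice{n}.ppm\n"  # unique name per task
            "Output_File_Type=P\n"                      # P = PPM
            f"Width={width}\n"
            f"Height={height}\n"
            f"Start_Row={first}\n"
            f"End_Row={last}\n"
            "Library_Path=include\n"                    # assumed include dir
            "Display=Off\n"
        )
        written.append(ini)
    return written
```

Calling write_ini_files(Path.home() / "Desktop" / "Povray_args") would produce four files covering rows 1-192, 193-384, 385-576 and 577-768. Make sure nothing else ends up in that directory.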
Hence, at this point, we have 4 files in Povray_args and no other files. Each file in that directory gets passed as an argument to an agent, so you don't want anything else in there.


We then fill in the form with:

which pretty much says: "Run the command /opt/local/bin/povray from (the equivalent of) the working directory /opt/local/share/povray-3.5/, with the files in ~/Desktop/Povray_args/ as arguments (one file per task), and store the results (I'll get back to that) on the desktop." We need to run from /opt/local/share/povray-3.5/ because povray needs access to all sorts of files stored in that directory. At this point, if you click Submit Job (and there are no mistakes in the argument files), everything goes through and processing starts. If you have more than 4 machines, you might want to split the job into more than 4 slices (see the generate script).
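Before submitting, it can help to check the argument directory and try the same commands locally. The Python sketch below reflects my reading of the form: one task per file, each running povray from the working directory. It is not Xgrid's actual mechanism. It passes absolute paths because the command runs from a different directory. It also flags stray files, such as a .DS_Store left by the Finder, that would otherwise become extra tasks. The paths are the ones from the form; the function names are hypothetical.

```python
import subprocess
from pathlib import Path

# Values from the form above; adjust for your installation.
POVRAY = "/opt/local/bin/povray"
WORKING_DIR = "/opt/local/share/povray-3.5"


def task_commands(args_dir: Path, executable: str = POVRAY) -> list[list[str]]:
    """One command line per file in args_dir (sorted), as each task would run.

    Absolute paths are used because povray runs from WORKING_DIR.
    """
    return [[executable, str(f.resolve())]
            for f in sorted(args_dir.iterdir()) if f.is_file()]


def unexpected_files(args_dir: Path) -> list[Path]:
    """Files that are not .INI files, e.g. a .DS_Store left by the Finder."""
    return [f for f in sorted(args_dir.iterdir())
            if f.is_file() and f.suffix.lower() != ".ini"]


def run_locally(args_dir: Path, working_dir: str = WORKING_DIR) -> None:
    """Run every task sequentially on this machine (a slow dry run)."""
    stray = unexpected_files(args_dir)
    if stray:
        raise ValueError(f"non-.INI files would become extra tasks: {stray}")
    for cmd in task_commands(args_dir):
        print("running:", " ".join(cmd))
        subprocess.run(cmd, cwd=working_dir, check=True)
```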
A note: I got a "Job Timeout" several times, but I don't think the jobs really timed out. The Tachyometer was up, and a trip to the Terminal showed that povray was actually running. Since the output files only come back at the very end of the job (when all of the tasks are done), you don't see any output until then. I never had the patience to wait and see whether I eventually got the files.

To stitch the files together, you need a simple program that takes those files and produces one big final file. I found a program called combineppm that does just that (the web page I got it from also discusses POVray on a grid, incidentally).
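For readers curious about what such a program has to do, here is a minimal Python sketch. It is not combineppm, and it assumes the following about the slice files:
- each one is a binary (P6) PPM containing only its own rows;
- all slices share the same width and maxval;
- they are given in top-to-bottom order.

If your POVray version writes partial renders differently, the sketch does not apply as is.

```python
def read_ppm(data: bytes) -> tuple[int, int, int, bytes]:
    """Parse a binary (P6) PPM; return (width, height, maxval, pixel bytes)."""
    fields, pos = [], 0
    while len(fields) < 4:
        while data[pos:pos + 1].isspace():          # skip whitespace
            pos += 1
        if data[pos:pos + 1] == b"#":               # skip a comment line
            pos = data.index(b"\n", pos) + 1
            continue
        end = pos
        while end < len(data) and not data[end:end + 1].isspace():
            end += 1
        fields.append(data[pos:end])
        pos = end
    if fields[0] != b"P6":
        raise ValueError("only binary (P6) PPM files are handled")
    width, height, maxval = (int(f) for f in fields[1:])
    pos += 1                          # a single whitespace byte ends the header
    size = width * height * 3 * (1 if maxval < 256 else 2)
    pixels = data[pos:pos + size]
    if len(pixels) != size:
        raise ValueError("truncated PPM pixel data")
    return width, height, maxval, pixels


def stitch_ppm(slices: list[bytes]) -> bytes:
    """Stack PPM slices top to bottom, in the order given, into one PPM."""
    if not slices:
        raise ValueError("no slices given")
    width = maxval = None
    total_height, parts = 0, []
    for data in slices:
        w, h, m, pixels = read_ppm(data)
        if width is None:
            width, maxval = w, m
        elif (w, m) != (width, maxval):
            raise ValueError("all slices must share the same width and maxval")
        total_height += h
        parts.append(pixels)
    header = f"P6\n{width} {total_height}\n{maxval}\n".encode("ascii")
    return header + b"".join(parts)
```

With from pathlib import Path and four hypothetical slices named chess2_slice1.ppm to chess2_slice4.ppm, the final image would be Path("chess2.ppm").write_bytes(stitch_ppm([Path(f"chess2_slice{n}.ppm").read_bytes() for n in range(1, 5)])). Listing the files explicitly keeps the order right even past 9 slices, where a plain alphabetical sort would put slice10 before slice2.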
You can then open chess2.ppm in GraphicConverter (Preview.app does not open ppm files). You now have a nicely rendered graphic that looks like this:

You don't need the executable or the "working" directory on the agent machines to make Xgrid work. The binary and the working directory get tarred and extracted into /tmp/xgagent.XXXXX and /tmp/xgagent.YYYYY on the agents (the full directory tree gets extracted). Hence, when the binary is launched from the working directory, all the files are accessible. Moreover, when the task is done, the working directory (which was copied to the agent) is copied back to your computer (again via tar, I assume). Hence, at the end of the job, your destination directory (~/Desktop/ in this case) contains a copy of the working directory in the state it was in at the end of the calculation, including any output files. A side effect: you must make sure that each task produces a file with a different name; otherwise the files will overwrite each other.
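Since a name clash only shows up after the whole job has come back, it is worth checking the argument files beforehand. A minimal Python sketch: it reads the Output_File_Name option from every .INI file in a directory and reports names used more than once. As an assumption (labeled in the code too), an .INI file without that option is treated as using POVray's default name, which I take to be derived from the scene file and therefore shared by all such slices.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches lines such as "Output_File_Name=chess2_slice1.ppm".
OUTPUT_OPTION = re.compile(r"^\s*Output_File_Name\s*=\s*(\S+)",
                           re.IGNORECASE | re.MULTILINE)


def clashing_outputs(args_dir: Path) -> dict[str, list[str]]:
    """Map each output name used by more than one .INI file to those files."""
    users: dict[str, list[str]] = defaultdict(list)
    for ini in sorted(args_dir.glob("*.ini")):
        names = OUTPUT_OPTION.findall(ini.read_text())
        # Assumption: without Output_File_Name, POVray falls back to a name
        # derived from the scene file, so all such slices would share it.
        for name in names or ["(default name)"]:
            users[name].append(ini.name)
    return {name: inis for name, inis in users.items() if len(inis) > 1}
```

An empty result means every task writes its own file.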
The purpose of BEEP in all this is to provide the underlying protocol between agents, controller and clients. I don't know enough about it to say much, except that BEEP makes it relatively easy to exchange more than just text without having to define an entirely new protocol.
Since all Xgrid tasks run in user space as the user nobody (not in the kernel), running them is safe. In addition, the communication between agents and controller is well defined and convenient. The agents contact the controller, hence only the controller needs its firewall adjusted (port 4111 open), not the agents.
Those are exactly the kinds of things that are not implemented in a home-made solution, and this is why Apple should do them.
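Because the agents open the connection, a quick way to check the firewall setup is to try a plain TCP connection to port 4111 on the controller from one of the agent machines. A minimal Python sketch follows; the function name and the host name in the usage note are hypothetical. It only shows that something accepts connections on that port, not that the Xgrid controller itself is healthy.

```python
import socket


def controller_reachable(host: str, port: int = 4111, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unknown host, ...
        return False
```

On an agent, controller_reachable("controller.local") returning False points at the controller's firewall (or a wrong host name). The agents' own firewalls are not involved.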
Also, a better way to set arguments: you can't provide dependent ranges (like 1-10, 11-20, 21-30, etc.), which would have been useful in the present case.
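In the meantime, such ranges have to be produced outside Xgrid and written into the argument files, which is essentially what generate does for the rows of the image. A minimal Python sketch of fixed-size consecutive ranges in the "1-10, 11-20, 21-30" style; the function name and parameters are my own, for illustration:

```python
def dependent_ranges(first: int, last: int, chunk: int) -> list[str]:
    """Consecutive, non-overlapping ranges covering first..last, `chunk` values each.

    The final range is shorter when the total is not a multiple of chunk.
    """
    if chunk < 1:
        raise ValueError("chunk must be at least 1")
    return [f"{start}-{min(start + chunk - 1, last)}"
            for start in range(first, last + 1, chunk)]


if __name__ == "__main__":
    print(dependent_ranges(1, 30, 10))  # ['1-10', '11-20', '21-30']
```

For the chess scene, dependent_ranges(1, 768, 192) gives the four row bands 1-192 through 577-768.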
Lots of things could be done (and they have been discussed on the Xgrid mailing list; see the archives). The most important to me is agents for other architectures. Another application (I haven't tried it, but it looks promising) would be an AppleScript that contacts the local machine (via a remote AppleScript call) under a username and password defined for that machine, to process something using GUI applications. This requires that the user is logged in at the console and that the machine accepts remote AppleScript events. Nothing in the current implementation of Xgrid would prevent this from working.
Feel free to contact me at dccote_at_novajo.ca with comments and questions about this article.
Some keywords: example, tutorial, Xgrid, Apple, cluster, parallel processing, rendering, render farm, povray, Mac OS X
Posted by dccote at January 11, 2004 09:04 PM

I'm a student employee for my university's IT department. My job right now is to set up a distributed rendering cluster using the spare cycles on a bunch of Mac G5s.
This tutorial is a great resource. I really appreciate you taking the time to put it together. Thanks, it helped me a lot.
Posted by: Aaron Suggs at February 23, 2004 07:17 PM

Nice article. I used a lot of it in understanding Xgrid, and it helped a lot in creating some of my own documentation:
http://www.macos.utah.edu/xgrid
Posted by: James Reynolds at March 8, 2004 07:38 PM