Part II. Running an application on the cluster

These exercises introduce the example mandelbrot application and then the cluster job queueing system. After that, you will try to make the two work together.

The mandelbrot rendering application

The application that we will experiment with renders mandelbrot fractals.

The C code to do this is in mandel10.c in Ben's home directory.

Copy the source file from there and compile it:

$ cp ~benc/mandel/mandel10.c .
$ gcc -o mandel10 mandel10.c -lm

Now you should have an executable called mandel10 in the current directory.

You can generate a small mandelbrot image using these parameters:

$ ./mandel10 0 0 1 0.0582 1.99965 2000  1000 1000 32000 > fractal.pgm
$ convert fractal.pgm fractal.png

So now, fractal.png should contain the rendered fractal.

You can view the picture by moving or copying it into your webspace and opening it in your web browser, as in the intro exercise.

$ mv fractal.png public_html/

You should see a fractal in your web browser.

You can measure how long the fractal generation takes by prefixing the mandel10 command with the time command.

$ time ./mandel10 0 0 1 0.0582 1.99965 2000  1000 1000 32000 > fractal.pgm

You will see the same output, followed by a time summary:

...
row 997
row 998
row 999

real    0m8.298s
user    0m8.052s
sys     0m0.122s

This says that the wallclock time (how long the job took according to a clock on the wall of the room) was 8.298 seconds, and that 8.052 + 0.122 = 8.174 seconds of that was spent by osg-ui.grid.ac.za computing your job. The remaining 0.124 seconds went on other things.

If you are running at the same time as other users, it is likely that the real time will be much longer than the user+sys time, because osg-ui.grid.ac.za needs to be shared between lots of users; but the user+sys time should still add up to about 8 seconds, no matter how long the real time is.
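
The arithmetic above can be reproduced in the shell. This is only a minimal sketch using the sample figures from the time summary shown earlier; the variable names are illustrative, and your own numbers will differ.

```shell
# Sample figures copied from the time summary above; substitute your own.
real=8.298
user=8.052
sys=0.122

# CPU time actually spent on the job (user + sys).
cpu=$(awk -v u="$user" -v s="$sys" 'BEGIN { printf "%.3f", u + s }')

# Wallclock time not accounted for by the job's own CPU use.
other=$(awk -v r="$real" -v c="$cpu" 'BEGIN { printf "%.3f", r - c }')

echo "cpu=${cpu}s other=${other}s"
```

On a busy machine the second figure grows while the first stays roughly constant.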

Splitting the fractal into several tiles

The same fractal can be rendered in several smaller components. The first three parameters of mandel10 control this. The first two parameters specify the x and y tile number (starting at 0), and the third parameter specifies how many tiles across and down there are.

Above, we specified 1x1 tiles (the third parameter is 1), and we chose to render tile (0,0) which is the single tile in a 1x1 division of the plane.

We can divide the rendering up into more sections by increasing the third parameter. So for example, we could divide up 2x2, render each of the four tiles separately, and join them together to get the same image.

Here are commands to render four tiles in a 2x2 division. Write down the real times and the user+sys times in your copybooks or on a handy scrap of paper.

$ time ./mandel10 0 0 2 0.0582 1.99965 2000  1000 1000 32000 > tile-0-0.pgm
$ time ./mandel10 0 1 2 0.0582 1.99965 2000  1000 1000 32000 > tile-0-1.pgm
$ time ./mandel10 1 0 2 0.0582 1.99965 2000  1000 1000 32000 > tile-1-0.pgm
$ time ./mandel10 1 1 2 0.0582 1.99965 2000  1000 1000 32000 > tile-1-1.pgm
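
For larger divisions, typing every command by hand becomes tedious. The following is a rough sketch of a shell function that prints one mandel10 command per tile of an n x n division; the function name tile_commands is made up for this illustration, and the remaining parameters are simply copied from the commands above.

```shell
# Print one mandel10 command per tile for an n x n division.
# tile_commands is a hypothetical helper, not part of mandel10.
tile_commands() {
    n=$1
    x=0
    while [ "$x" -lt "$n" ]; do
        y=0
        while [ "$y" -lt "$n" ]; do
            echo "./mandel10 $x $y $n 0.0582 1.99965 2000 1000 1000 32000 > tile-$x-$y.pgm"
            y=$((y + 1))
        done
        x=$((x + 1))
    done
}

# Show the four commands for a 2x2 division.
tile_commands 2
```

The function only prints the commands, so you can check them before running anything. Once they look right, they could be run by piping the output into sh (tile_commands 2 | sh), although that does not time each tile.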

Now convert each of those tiles into a png and view it in your web browser:

$ convert tile-0-0.pgm tile-0-0.png
$ mv tile-0-0.png public_html/

and so on for the other three tiles. You should be able to see that each of those tiles is a quarter of the first fractal that we generated.

Now combine the tiles together like this:

$ montage -tile 2x2 -geometry +0+0 tile-*-*.pgm fractal-assembled.png
$ mv fractal-assembled.png public_html/

You should be able to see that this image looks like the original fractal.png. We have split the application into four pieces and assembled the results, and it's just as if we had run the application all in one piece.

Submitting jobs to the cluster

Now we will submit some jobs to the cluster using PBS.

The qsub command submits jobs to PBS:

$ qsub
echo hello world
CTRL-D

51155.gridvm.grid.uj.ac.za

This job will go into the queue and eventually be executed. When the cluster has low load, it will run immediately. If the cluster has a heavier load, then this job will wait in the queue until a CPU is available to run the job. The qstat command gives job status:

$ qstat 51155
qstat: Unknown Job Id 51155.gridvm.grid.uj.ac.za

In the above output, the job had already finished, so qstat could not find it in the queue. If the job were still waiting in the queue, the output would look like this:

$ qstat 51155
Job id              Name             User            Time Use S Queue
------------------- ---------------- --------------- -------- - -----
51155.gridvm        STDIN            benc                   0 Q batch

When the job is eventually finished, the output appears in a file STDIN.o51155:

$ cat STDIN.o51155
hello world

So now you should be able to run jobs on the cluster.
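
Typing commands into qsub by hand works for one-liners, but qsub also accepts a script file as its argument. The sketch below writes such a file; the file name hello.sh and the job name hello are made up for illustration. The #PBS -N directive and the PBS_O_WORKDIR variable are standard PBS features, but check that they behave as expected on this cluster.

```shell
# Write a small job script (hello.sh is a hypothetical name).
cat > hello.sh <<'EOF'
#!/bin/sh
#PBS -N hello
# Jobs normally start in your home directory; move to where qsub was run.
cd "$PBS_O_WORKDIR"
echo hello world
EOF

# Submit it with:  qsub hello.sh
```

Submitted this way, the output should appear in a file named after the job rather than STDIN, that is, hello.o followed by the job number.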

Fractals on the cluster

Using information from the above sections, you should be able to recreate the runs shown in the slides, where 589s of computation is performed in less than 589s of wallclock time.

The run shown in the slides used almost the same parameters, except that the number of iterations, the 6th parameter, is set to 200000 (two hundred thousand) instead of 2000. This causes the renderer to generate a more detailed picture, but it takes much more CPU time. So try to break this command into four pieces, submit each piece separately to the cluster so that they all run simultaneously, and then join the results together to generate a single, more detailed picture. This is the command for a single run - don't run it directly, as it will be very slow and will make osg-ui.grid.uj.ac.za slow for everyone else too:

./mandel10 0 0 1 0.0582 1.99965 200000  1000 1000 32000
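
One possible starting point, offered as a sketch rather than the intended solution, is to generate one job script per tile. The file names job-X-Y.sh and tile-X-Y.pgm are illustrative choices, and the scripts assume that mandel10 sits in the directory you run qsub from. Submitting the scripts, waiting for them to finish and assembling the tiles are left to you.

```shell
# Write one PBS job script per tile of a 2x2 division.
# File names (job-X-Y.sh, tile-X-Y.pgm) are illustrative choices.
n=2
for x in 0 1; do
    for y in 0 1; do
        cat > "job-$x-$y.sh" <<EOF
#!/bin/sh
# Assumes mandel10 is in the directory qsub is run from.
cd "\$PBS_O_WORKDIR"
./mandel10 $x $y $n 0.0582 1.99965 200000 1000 1000 32000 > tile-$x-$y.pgm
EOF
    done
done

# Submit each with:  qsub job-0-0.sh   (and so on for the other three)
```

Because the here-document delimiter is unquoted, the tile numbers are filled in when the scripts are written, while the escaped \$PBS_O_WORKDIR is left for PBS to set when the job runs.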
