Integrating Sun Grid Engine and Globus systems

Stefán Péter <stefan@iif.hu>

NIIFI


The goal of this presentation is twofold: first, to outline why a job scheduler such as Sun Grid Engine (SGE) is necessary, and second, to point out some issues that arise when integrating it with a higher-level job management tool set such as Globus Tools.

SGE is the successor of the former CODINE batch scheduling system, released as open source by Sun in July 2001. The software organizes high-performance computing (HPC) jobs into queues and schedules their execution on high-end computing servers and/or clusters of workstations. Using SGE has numerous benefits, such as load balancing between execution hosts and loose or tight integration with other parallel tools such as Parallel Virtual Machine (PVM) or Sun HPC Cluster Tools. Furthermore, it has been shown that when an HPC server is more than 20 percent overloaded, the time taken by parallel barrier operations grows drastically as the number of parallel ranks increases. SGE takes this burden over from the operating system's scheduler and thereby keeps the Solaris scheduler from becoming overloaded.

SGE is a so-called local scheduler, responsible for scheduling jobs within a single cluster. Individual clusters can be connected via the Globus toolkit. Essentially, two operations are considered: passing jobs from Globus to SGE, and passing jobs from SGE to Globus. Both require specific configuration and some shell script programming on both sides.
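
As a rough illustration of the Globus-to-SGE direction, the glue code essentially has to translate an incoming job description into an SGE submission command. The sketch below is not the integration presented here (which relies on shell scripts); it is a minimal Python stand-in under stated assumptions. The function name build_qsub_command and the dictionary keys (name, queue, slots, parallel_env, stdout, stderr, script, arguments) are hypothetical; only the qsub options used (-N, -cwd, -q, -pe, -o, -e) are standard SGE options.

```python
def build_qsub_command(job):
    """Translate a simplified job description (hypothetical format)
    into an argument list for SGE's qsub command."""
    # Job name and "run in the current working directory".
    cmd = ["qsub", "-N", job["name"], "-cwd"]

    # Optional target queue.
    if "queue" in job:
        cmd += ["-q", job["queue"]]

    # Parallel jobs need a parallel environment (e.g. one set up for PVM)
    # and a slot count; serial jobs skip this.
    slots = job.get("slots", 1)
    if slots > 1:
        cmd += ["-pe", job["parallel_env"], str(slots)]

    # Where SGE should write the job's standard output and error.
    cmd += ["-o", job.get("stdout", "/dev/null")]
    cmd += ["-e", job.get("stderr", "/dev/null")]

    # The job script itself, followed by its arguments.
    cmd.append(job["script"])
    cmd += [str(arg) for arg in job.get("arguments", [])]
    return cmd


if __name__ == "__main__":
    # Hypothetical 4-slot parallel job; the names are illustrative only.
    example = {
        "name": "demo",
        "queue": "all.q",
        "slots": 4,
        "parallel_env": "pvm",
        "script": "run.sh",
        "arguments": ["input.dat"],
    }
    print(" ".join(build_qsub_command(example)))
```

Returning an argument list rather than a single string keeps the sketch safe to hand to a process launcher without shell quoting issues. The opposite direction, from SGE to Globus, would need an analogous translation on the SGE side.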