The AliEn System for DemoGrid


Judit Novák <n_jude@elte.hu>

György Vesztergombi <veszter@rmki.kfki.hu>


Predrag Buncic <Predrag.Buncic@cern.ch>

Jan-Erik Revsbech <revsbech@sys.ku.dk>

Pablo Saiz <pablo.saiz@cern.ch>

http://alien.cern.ch




Abstract

The aim of the AliEn framework is to implement a subset of Grid features on distributed computing resources. Users of AliEn can use this collection of loosely connected computers as a single computing resource capable of running batch jobs that need huge amounts of computing capacity.

The jobs are scheduled for execution on the local batch systems of the participating sites. The ALICE experiment at CERN is the centre of AliEn development, but about two dozen research institutes from all around the world have already added processors to the system while participating in detector simulation activities.

The participation of the DemoGRID collaboration in the AliEn framework has two basic aims: (a) it should provide a working system for large-volume physics data analysis in Budapest; (b) it should demonstrate the ability of Hungarian institutes to join international grid-like structures.


  1. Introduction

The DemoGRID project intends to strengthen Grid development and usage in Hungary. It will collaborate closely with other projects involved in advanced grid technology and scalable storage solutions: DataGrid, the LHC Computing Grid and the Sloan Digital Sky Survey.

The testbed of this project will connect the heterogeneous computing resources of eight universities and research labs into a meta-computer or virtual supercomputer with 300 hosts and 5 Terabytes of storage, connected by the network of the National Information Infrastructure Development Program. To demonstrate the applicability of the grid middleware, applications will be developed in several research domains: physics, neurobiology and cosmology.

In this article we want to concentrate on a physics application in the AliEn framework.

  2. Goals of the DemoGrid Project

Recently it has become a worldwide trend among scientific and research institutes to connect their clusters and supercomputers using high-speed network connections, unifying their resources in the GRID. The DemoGRID project intends to strengthen this progress in Hungary as well by pursuing the following goals:

    1. General GRID architecture

There are mature tools that successfully address the communication and data storage problems of supercomputers and clusters on local or site-wide networks, but extending these solutions to connect geographically widespread, heterogeneous networks requires different approaches in some respects. A meta-computing environment is to be built on top of the custom computing environments. The Globus Toolkit or a similar system will be the basis of this environment.

    2. GRID subsystems

The goal is to study GRID subsystems and to enhance them to meet the requirements of the pilot applications.

      1. Storage subsystem

In a couple of years, applications will require storage capacity on the Petabyte scale. Storage solutions must be scalable to this order without a change in the basic technology. Several forms of storage will be evaluated.


      2. Monitoring subsystem

Monitoring is essential for developing effective applications, for debugging, and for the effective use of GRID resources.

      3. Security subsystem

Existing security solutions are not applicable in widespread, heterogeneous systems, either because of their technological limitations or because of high costs. We believe that an adequate infrastructure based on open source software can be implemented.

    3. Applications

It is an important goal of the DemoGRID project to develop and run pilot applications in real environments and to aid the early adoption of this technology. Several types of applications will be considered to ensure the system's applicability to a large variety of problems.

    4. Hardware

    5. Participants

(Numbers on the figure above indicate which institutes participate in the given sub-project.)

  1. Eötvös Loránd University of Sciences

  2. MTA Computer and Automation Research Institute

  3. MTA KFKI, Research Institute for Particle and Nuclear Physics

  4. Széchenyi István University of Applied Sciences

  5. MTA, Research Institute for Technical Physics and Materials Sciences

    6. International Relations

CERN LHC Computing

We join the LHC Computing GRID project at CERN, as the two projects have many goals in common.

DataGRID

We join the European DataGrid project through several sub-projects.

Sloan Digital Sky Survey

We plan to partially mirror the American SDSS database. In the long term we plan full mirroring, becoming their European Tier 0 partner.

AliEn

The ALIce ENvironment, a production environment for the ALICE heavy-ion experiment at the CERN LHC.

  3. CERN

CERN, the European Organization for Nuclear Research, funded by 20 European nations, is constructing a new particle accelerator on the Swiss-French border on the outskirts of Geneva. When it begins operation in 2006, this machine, the Large Hadron Collider (LHC), will be the most powerful machine of its type in the world, providing research facilities for several thousand High Energy Physics (HEP) researchers from all over the world.

The computational requirements of the experiments that will use the LHC are enormous: 5-8 Petabytes of data will be generated each year, the analysis of which will require some 10 Petabytes of disk storage and the equivalent of 200,000 of today's fastest PC processors. Even allowing for the continuing increase in storage densities and processor performance this will be a very large and complex computing system, and about two thirds of the computing capacity will be installed in "regional computing centers" spread across Europe, America and Asia.

The computing facility for LHC will thus be implemented as a global computational grid, with the goal of integrating large geographically distributed computing fabrics into a virtual computing environment. There are challenging problems to be tackled in many areas, including: distributed scientific applications; computational grid middleware; automated computer system management; high performance networking; object database management; security; and global grid operations.

The LHC poses exceptional security challenges, given the conflicting needs of providing data and computational resources worldwide to a large and open scientific community while ensuring the confidentiality of the data and operating the computing facilities in a secure way.

The development and prototyping work is being organized as a project that includes many scientific institutes and industrial partners, coordinated by CERN. The project will be integrated with several European national computational grid activities (such as GridPP in the United Kingdom and the INFN Grid in Italy), and it will collaborate closely with other projects involved in advanced grid technology and high performance wide area networking, such as: GEANT, DataGrid and DataTAG (partially funded by the European Union), GriPhyN, Globus, iVDGL and PPDG (funded in the US by the National Science Foundation and Department of Energy).

CERN is involved in the EU-funded DataGrid project, which aims to develop technologies for data-intensive applications like those in the LHC Computing Grid. There are also several internal projects, like AliEn, which aim to provide similar solutions and prototypes to help understand these technologies better and to aid the development of grid-aware high energy physics applications today.

The DemoGrid project participates in both kinds of activity, because we also want to make improvements to the middleware and help the development of grid-aware applications in our country.

This article discusses a project of the second category, which brings an early but already working prototype of this grid technology to the developers of the ALICE and other heavy-ion experiments.


Glossary

ACL

Access Control List; used to enable/disable kinds of access to resources.

CE

Computing Element; a Grid-enabled computing resource.

EDG

European DataGrid; the project name is often written "DataGrid".

FTD

File Transfer Daemon; performs file transfers between remote computers.

GSI

Grid Security Infrastructure (Globus Security mechanism).

HEP

High Energy Physics.

IS

Information Service.

LCM

Local Cache Manager; deals with locally cached files.

LFN

Logical File Name; A globally unique name to identify a specific file which is mapped by the RC onto one or more PFNs.

LRMS

Local Resource Management System; controls resources within a CE, e.g. PBS or LSF.

LQ

Local Queue; the queue of jobs being executed in a cluster, managed by an LRMS.

MSS

Mass Storage System.

PFN

Physical File Name; URL of actual physical instance of an LFN.

RC

Replica Catalog; associates an LFN to one or more PFNs.

Replica

A copy of a file that is managed by the Grid middleware.

SE

Storage Element; a Grid-enabled storage system.

VO

Virtual Organization; A set of individuals defined by certain sharing rules - e.g. members of collaborations.

  4. AliEn at Grid

The AliEn framework is an implementation of a Grid consisting of clusters and labs with distributed computing resources all around the world, to which jobs can be submitted for scheduled execution.

    1. Introduction: Alice Environment

As the name shows, AliEn (Alice Environment) is developed at CERN for the ALICE particle physics experiment, and at present it is used to carry out CPU-intensive simulation, which produces huge amounts of data. Since resources for such an enormous task are not available at any one site, it is necessary to use all available CPU power distributed around the world. The networking cost is almost negligible for long-running jobs that mainly execute mathematical algorithms for physics applications. As a solution, in line with the Grid philosophy, AliEn connects remote clusters into a virtual unit that can be used just like one computer.

    2. Interfaces

Two interfaces are provided: a web interface and a command-line interface.


The first one can be found at http://alien.cern.ch . Not only can information about the framework be found here; all running and finished jobs and all registered sites are monitored, too. There is a list of Active Clusters, where we can get information about all the events that have happened since AliEn was installed at that site. We can also send messages to the clusters to initiate various procedures: file transfers can be started from the web, and jobs can be submitted, too.


The second is the command-line interface. Entering the (already properly installed) system means just typing the command alien at the UNIX prompt. A connection is opened to CERN, and we find ourselves in a directory structure where every user has a home directory, can create further directories, and can access files with the basic UNIX commands (cd, ls, etc.; even tab completion works). The help command gives a list of the commands that can be used. Permissions on files can be manipulated, too (chmod).

This gives another way of submitting jobs and of getting or storing files at remote sites.

    3. Implementation

Though it gives the feeling of being a user on a single UNIX computer, this is only a virtual structure. All the information needed to build it up is stored in a relational database (e.g. MySQL). So when a user accesses a file, the path (s)he gives is just a logical file name (Lfn), and (s)he does not need to know where the file is physically stored (the Pfn). One Lfn can refer to more than one physical file on different hosts. When an Lfn is requested, the nearest copy will be delivered.

Lfns live in global namespaces. At the moment there is one for Alice (Lfn://alice) and one for NA49 (Lfn://na49). The jobs to be submitted must be in Lfn://bin or in the user's $HOME/bin, which is stored in the \alien\system table; this table also stores the virtual /proc directory (Lfn://proc). Users can find their jobs' standard output and standard error files there, in subdirectories named after their process IDs.
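To make the Lfn/Pfn mapping concrete, the following toy sketch shows how one logical name can map to several physical replicas, with the nearest copy chosen on access. It is purely illustrative: the real catalogue is a set of MySQL tables accessed through Perl/SOAP interfaces, and the site names, paths and distance table below are invented.

    # Toy illustration of the File Catalogue idea: one logical file name (Lfn)
    # maps to one or more physical file names (Pfn); on access the "nearest"
    # replica is chosen.  All names and the distance table are invented.

    SITE_DISTANCE = {"Budapest": 0, "CERN": 10, "LBL": 50}   # hypothetical "closeness"

    catalogue = {                                            # Lfn -> list of (site, Pfn)
        "Lfn://alice/run42/event.root": [
            ("CERN", "gsiftp://se.cern.ch/alice/run42/event.root"),
            ("LBL",  "gsiftp://se.lbl.gov/alice/run42/event.root"),
        ],
    }

    def register(lfn, site, pfn):
        """Insert a new replica for an Lfn (cf. the 'Registering a file' use case)."""
        catalogue.setdefault(lfn, []).append((site, pfn))

    def lookup(lfn):
        """Return the Pfn of the replica nearest to the local site."""
        site, pfn = min(catalogue[lfn], key=lambda r: SITE_DISTANCE.get(r[0], 999))
        return pfn

    register("Lfn://alice/run42/event.root", "Budapest", "file:///data/run42/event.root")
    print(lookup("Lfn://alice/run42/event.root"))   # prints the local copy's Pfn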

    4. Compatibility

Aiming to keep it runnable on as many platforms as possible, the code is written in Perl5. The communication between the different components is implemented using the SOAP protocol. Globus/X509, SSH and token authentication methods are supported using SASL, and different file transfer methods (such as FTP, BBFTP and GSIFTP) can be used.

On the local sites the batch queue systems already supported are LSF, PBS, CONDOR, DQS, BQS and Globus, but since the code is freely distributable, it is not too difficult to write a module to communicate with any other batch system.
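Since AliEn's own modules are written in Perl, we cannot quote them here; the sketch below only illustrates, in Python and with invented class and method names, what "writing a module for another batch system" amounts to: a small adapter exposing uniform submit and status operations on top of the native commands of the LRMS (PBS's qsub/qstat are used as the example).

    # Hypothetical sketch of a batch-system adapter: AliEn's real modules are
    # Perl, but the pattern is the same -- one small class per LRMS exposing
    # uniform submit/status operations on top of the native commands.
    import subprocess
    from abc import ABC, abstractmethod

    class BatchQueue(ABC):
        """Uniform interface the rest of the system talks to."""
        @abstractmethod
        def submit(self, script_path: str) -> str: ...
        @abstractmethod
        def status(self, job_id: str) -> str: ...

    class PBSQueue(BatchQueue):
        def submit(self, script_path: str) -> str:
            # 'qsub' prints the identifier of the new job on stdout
            out = subprocess.run(["qsub", script_path],
                                 capture_output=True, text=True, check=True)
            return out.stdout.strip()

        def status(self, job_id: str) -> str:
            # 'qstat <id>' succeeds while the job is still known to PBS
            out = subprocess.run(["qstat", job_id], capture_output=True, text=True)
            return "RUNNING" if out.returncode == 0 else "DONE"

    # Supporting a further LRMS (LSF, Condor, BQS, ...) only means writing one
    # more subclass that wraps that system's own submission and query commands.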

    5. Main components

The central engine

The heart of AliEn is located at CERN; no other site has the information that can be found there.

File Catalogue

Databases store all the information about the users, the remote sites and the registered files. They cannot be accessed directly; they are hidden behind the interface of a file structure.

Each node on the branches of the directory tree can reside in a different database.

CPUServer

The MetaScheduler, managing the TaskQueue, which contains all submitted but not yet finished jobs.

IS

The Information Service stores the locations of all FTDs. This is necessary to be able to open any kind of connection to a remote site.


Components present at every cluster

FTD

The File Transfer Daemons contact each other whenever a file transfer takes place. They can access the Storage Elements holding the registered files, and file transfer requests are also made through them.

LCM

The Local Cache Manager is the component that directly accesses the CATALOGUE at CERN. It also parses Pfns; this is needed when the local system has to decide whether a file must be transferred over the network or is already stored in that system's SE.

SE

Storage Elements can be disk servers, tapes, etc. When a file is registered, it must exist in at least one SE.

MSS

The Mass Storage System: disks, tapes, etc., where the registered files physically reside.

CE

The Computing Element: the cluster that runs the jobs. Its most important tasks are monitoring and keeping the connection with the Local Queue Manager.

LQ

This object realizes the communication interface with the Local Queue Manager (the LRMS) of each site.

ProcessMonitor

Every job being executed has its own ProcessMonitor. Since clusters are usually behind a firewall, this is the part of the Monitoring System that can directly watch the job's execution.

ClusterMonitor

The part of the Monitoring System that sits "on" the firewall, connecting the outside world with the cluster.
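The division of labour between the ProcessMonitor and the ClusterMonitor can be pictured with a short, schematic sketch. The real components communicate via SOAP and are written in Perl; the Python below uses invented class and method names and only shows that the per-job monitor never talks to the outside world directly, but reports to the ClusterMonitor, which sits on the firewall and forwards the information.

    # Schematic only: per-job ProcessMonitors report inside the cluster to the
    # ClusterMonitor, which is the single point that crosses the firewall.
    # Method names are invented; the real components communicate via SOAP.

    class CentralServer:                       # stands in for the services at CERN
        def receive(self, site, job_id, status):
            print(f"[central] {site}: job {job_id} is {status}")

    class ClusterMonitor:
        def __init__(self, site, central):
            self.site, self.central = site, central
        def report(self, job_id, status):      # called from inside the cluster
            self.central.receive(self.site, job_id, status)   # crosses the firewall

    class ProcessMonitor:
        def __init__(self, job_id, cluster_monitor):
            self.job_id, self.cm = job_id, cluster_monitor
        def heartbeat(self, status):
            self.cm.report(self.job_id, status)   # never leaves the site directly

    central = CentralServer()
    cm = ClusterMonitor("Budapest", central)
    ProcessMonitor(4711, cm).heartbeat("RUNNING")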

  5. AliEn Use Cases

To illustrate how the framework works, some use cases are presented here. They are typical of the event-processing jobs running in this environment.

    1. Registering a file

All files used by AliEn must be registered in the File Catalogue. This is how a new file's attributes get into the registry.


There are two cases. First let us see what happens if the current location of the file is the place where we would like to store it in the future.


    1. The LCM makes a request to the central CATALOGUE, sending the Lfn and the current location as the Pfn.

    2. The new entry is inserted into the CATALOGUE.



The case when the file has to be delivered to a remote SE is more complicated. (A short sketch in code follows the list of steps below.)


    1. The LCM sends the Lfn-Pfn pair and the name of the SE to the Information Service at CERN.

    2. The IS returns the hostname where the FTD of that Storage Element is located.

    3. The LCM contacts the File Transfer Daemon of the local site.

    4. The local FTD opens a connection to the remote site by contacting the FTD there. If the Mass Storage System is behind a firewall, the FTD is the object "on" the firewall: it can communicate both with the outside world and with the system behind it. The copy of the file is therefore first transferred to and cached by the remote system's FTD.

    5. The remote FTD can then send the file to the Storage Element.

    6. Before the file is copied, the Mass Storage System has to give it a new name. This is necessary to keep filenames distinct.

    7. Having the new filename, the MSS is ready to store the copy.

    8. When the transfer is completed, the return value of the invoked function tells whether the operation was successful.

    9. The FTD now notifies the local site of the result of the transfer.

a.) If there was no problem, the new Pfn is returned as well.

b.) If an error occurred, the FTD on the local site sends the request again to the remote one to get the file and put it into the MSS.

    10. The local FTD now sends the LCM the new physical file name.

    11. The LCM, which is the component that can access the database at CERN, inserts the attributes of the new file, which means the file is now registered in AliEn.
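The remote-registration steps above can be condensed into a short, runnable trace. Every function and variable name below is an invented stand-in for the SOAP calls exchanged by the LCM, the FTDs, the MSS and the central CATALOGUE; only the order of the operations reflects the list of steps.

    # Runnable trace of the "register a file on a remote SE" message flow.
    # All names are invented stand-ins; only the order of the steps matters.

    catalogue = {}                          # toy stand-in for the central CATALOGUE
    IS = {"CERN-SE": "ftd.cern.ch"}         # Information Service: SE name -> FTD host

    def register_remote(lfn, local_pfn, se_name):
        ftd_host = IS[se_name]                          # steps 1-2: ask the IS at CERN
        print("local FTD connects to remote FTD on", ftd_host)         # steps 3-4
        cached = "cache:" + local_pfn                   # step 4: copy cached by remote FTD
        new_name = f"{se_name}/{abs(hash(lfn))}"        # step 6: MSS assigns a unique name
        print("remote FTD stores", cached, "in the MSS as", new_name)  # steps 5, 7
        ok = True                                       # step 8: return value of the call
        if not ok:                                      # step 9b: on error the local FTD
            print("local FTD retries the transfer")     #          asks the remote one again
        new_pfn = "mss://" + new_name                   # steps 9a-10: new Pfn sent back
        catalogue.setdefault(lfn, []).append(new_pfn)   # step 11: LCM registers the entry
        return new_pfn

    print(register_remote("Lfn://alice/run42/event.root",
                          "/data/run42/event.root", "CERN-SE"))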

    2. Requesting a file

When a user refers to a file, (s)he just uses the Lfn in the virtual filesystem, but this may mean that the system has to access the file on a remote SE. (A short sketch in code follows the list of steps below.)


    1. As a first step the Lfn is sent to CERN to be translated into a Pfn. If the file is stored at more than one site, the copy nearest to the local site will be transferred.

    2. The LCM receives the Pfn.

    3. The LCM parses the Pfn. This way it can decide whether the file is stored at the local site.

a.) If so, it can be accessed directly.

b.) Otherwise: step 4.

    4. The file is on a remote SE. The FTD has to contact the IS at CERN to get the location of the host running that Storage Element's FTD.

    5. The hostname to contact is returned.

    6. A request is made to the remote site's FTD.

    7. The SE's Mass Storage System is accessed via the remote site's FTD. This operation cannot be done directly, because the MSS is possibly behind a firewall.

    8. The file is copied from the MSS to the FTD, to pass it through the firewall.

    9. Now the local FTD can transfer the file.

    10. The local site sends an acknowledgement if the transfer was successful.

    11. The cached copy can be removed from the remote site's FTD.
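The same can be done for the file-request use case. Again all names are invented; the point is that the decision "local or remote?" is taken by parsing the Pfn (step 3), and only in the remote case are the IS and the FTDs involved.

    # Runnable trace of "requesting a file"; all names are invented stand-ins.

    catalogue = {"Lfn://alice/run42/hits.root":
                 ["cern-se:/mss/alice/run42/hits.root"]}   # toy central catalogue
    LOCAL_SITE = "budapest-se"

    def get_file(lfn):
        pfn = catalogue[lfn][0]                 # steps 1-2: nearest Pfn comes from CERN
        site, path = pfn.split(":", 1)          # step 3: the LCM parses the Pfn
        if site == LOCAL_SITE:                  # step 3a: local copy, direct access
            return path
        # steps 4-6: the IS gives the remote FTD host, the local FTD contacts it
        print("asking the IS for the FTD of", site, "and contacting it")
        print("remote FTD fetches", path, "from the MSS and caches it")   # steps 7-8
        local_copy = "/cache/" + path.rsplit("/", 1)[-1]                  # step 9
        print("acknowledging the transfer; the remote cache can be dropped")  # steps 10-11
        return local_copy

    print(get_file("Lfn://alice/run42/hits.root"))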

    3. Submitting a job

After logging in, an AliEn user on the local site executes /bin/somecommand. This is in reality Lfn://bin/somecommand (so somecommand is an already registered executable file), in other words just a request for a file. First we have to find the file in the SE where it is stored, then deliver it to the CE where it will be executed. Since file transfer was already discussed, only the scheduling steps are mentioned here. (A short sketch in code follows the list of steps below.)

  1. Invocation of /bin/somecommand.

  2. The submitted job is inserted into the global Task Queue with job status SUBMITTED.

  3. The Broker in the CPUServer, which is constantly checking for SUBMITTED jobs, finds the new one.

  4. It makes a request to the IS for a site where the job should be executed.

  5. It receives a CE from the IS.

  6. The job status can now be changed to WAITING.

  7. Every site periodically checks for jobs submitted for execution on its cluster. When it finds one, it picks the job up and fetches the corresponding files.

  8. The remote Computing Element, which can contact its local queue, submits the job. (It therefore runs with the username and rights of the AliEn production user who installed the system at that site.)
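The pull-style scheduling described above (the central TaskQueue merely stores the job and its status, while the sites come and fetch the work assigned to them) can be summarised in a few lines of illustrative Python. The status names SUBMITTED and WAITING come from the text; everything else (function names, the RUNNING status, the matching rule) is an invented simplification.

    # Illustrative pull-model scheduler: the central TaskQueue only stores jobs
    # and their status; idle sites poll it and pick up work assigned to them.
    # SUBMITTED and WAITING are the statuses named in the text; the rest is invented.

    task_queue = []          # central TaskQueue in the CPUServer

    def submit(command):                                # steps 1-2
        task_queue.append({"cmd": command, "status": "SUBMITTED", "ce": None})

    def broker_pass(information_service):               # steps 3-6
        for job in task_queue:
            if job["status"] == "SUBMITTED":
                job["ce"] = information_service()        # the IS proposes a CE
                job["status"] = "WAITING"

    def site_poll(my_ce):                               # steps 7-8
        for job in task_queue:
            if job["status"] == "WAITING" and job["ce"] == my_ce:
                job["status"] = "RUNNING"                # handed to the local queue
                print(my_ce, "runs", job["cmd"], "as the local AliEn production user")

    submit("/bin/somecommand")
    broker_pass(lambda: "ELTE-CE")
    site_poll("ELTE-CE")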

  6. Summary

    1. The present system

The present AliEn framework consists of 13 clusters at 9 sites distributed world-wide, including CERN (Switzerland), CCIN2P3 (France), LBL and OSC (USA), Bari, Catania, Padova and Torino (Italy), and NIKHEF (The Netherlands).

In 2001 this collaboration ran a data challenge corresponding to about 10⁵ CPU hours, with up to 300 concurrently running jobs. During this five-week test period 5682 events were validated, of which 118 failed (2%). These jobs generated 5 TB of data at the sites with mass storage capability (CERN 73%, CCIN2P3 14%, LBL 14%, OSC 1%).

Since this data challenge, new collaborators have started installations at GSI Darmstadt and Karlsruhe (Germany), Dubna (Russia), Nantes (France), Zagreb (Croatia), Birmingham (UK), Calcutta (India) and Budapest (Hungary).

    2. Hungarian participation

The Hungarian participation is based on the DemoGRID project, whose basic aim is to investigate existing and planned realizations of grid-like environments. This is a rapidly evolving field. At the moment it is not known whether an all-embracing universal grid framework exists that is optimal for every type of application. Some participants of the DemoGRID project are oriented towards basic research, and it is not guaranteed that commercial systems will satisfy their needs. The AliEn framework provides an ideal testing ground for them, because it is tailored to this specific application domain.

The Hungarian participants want to provide computing resources similar to those of the other international participants. KFKI/RMKI will install a 1 TB storage system, and ELTE will make available its computer cluster consisting of 25 PCs.

The Hungarian participants will also investigate the installation and administration of this framework, previously oriented towards the RedHat Linux distribution, in a Debian environment.

    3. Future plans

The next step is to repeat the previous data challenge on a five times larger scale.

The essence of this challenge is to produce data for real physics applications.