Distributed GRID-based data storage

Nagy Zsombor
NIIF Intézet

This presentation describes the work carried out over the last year and a half within the KnowARC project on GRID-based distributed data storage. I will briefly present the properties of the chosen framework and the functionality of the system from the user's point of view.

The KnowARC project is part of the Sixth Framework Programme of the European Union; its goal is to develop a modern GRID middleware. One of the main tasks of the project was to design and implement a distributed data storage system. This work started in November 2007 and finished in March 2009.

The main design goals of the storage system were simplicity, reliability, transparency, and user-friendliness. Users should not have to care about the internal structure of the system: they simply upload their files and can be sure that the files will remain accessible through an interface very similar to the one we usually use for local file systems. Each uploaded file is stored in multiple copies to ensure high availability.

The system is built from several cooperating web services. The scope of each service is well defined: communicating with the users, physically storing the files, managing the metadata of the distributed file system, and so on.
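The division of responsibilities described above can be illustrated with a small in-memory sketch. It is not the KnowARC implementation: the class names (StorageNode, Catalog, FrontEnd), their methods, and the default of two copies per file are all assumptions made for this example. It only shows how a user-facing service, a metadata service, and storage services might cooperate so that a file stays readable when one copy becomes unreachable.

```python
import random


class StorageNode:
    """Hypothetical stand-in for a service that physically stores file data."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.blobs = {}

    def put(self, file_id, data):
        if not self.online:
            raise ConnectionError(f"{self.name} is offline")
        self.blobs[file_id] = data

    def get(self, file_id):
        if not self.online:
            raise ConnectionError(f"{self.name} is offline")
        return self.blobs[file_id]


class Catalog:
    """Hypothetical stand-in for the metadata service: path -> replica locations."""

    def __init__(self):
        self.entries = {}

    def register(self, path, node_names):
        self.entries[path] = list(node_names)

    def locate(self, path):
        return self.entries[path]


class FrontEnd:
    """Hypothetical stand-in for the service the users talk to."""

    def __init__(self, catalog, nodes, copies=2):
        self.catalog = catalog
        self.nodes = {node.name: node for node in nodes}
        self.copies = copies  # assumed replica count, chosen for illustration

    def upload(self, path, data):
        online = [node for node in self.nodes.values() if node.online]
        if len(online) < self.copies:
            raise RuntimeError("not enough storage nodes online")
        chosen = random.sample(online, self.copies)
        for node in chosen:
            node.put(path, data)
        self.catalog.register(path, [node.name for node in chosen])

    def download(self, path):
        # Try each known replica in turn; skip the ones that are unreachable.
        for name in self.catalog.locate(path):
            try:
                return self.nodes[name].get(path)
            except ConnectionError:
                continue
        raise RuntimeError(f"no replica of {path} is reachable")
```

In this sketch the user only ever calls upload and download on the front end; which nodes hold the copies is recorded in the catalog and never exposed to the user.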

The HED hosting environment framework was also developed in the KnowARC project, and we used it to implement our services. HED is written in C++ and makes it very easy to develop web services. Besides C++, services can also be developed in Java or Python. The services of the storage system were developed in Python.

The system currently has only a command-line user interface, whose commands are very similar to those used with local file systems: we can create directories (so-called collections), upload and download files, query their metadata, change access policies, and so on.
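To make the file-system-like operations concrete, the following toy model keeps a tree of collections and files in memory. It is only a sketch under stated assumptions: the Namespace class and its method names are hypothetical and do not correspond to the actual command-line client or its commands, and access policies are left out entirely.

```python
class Namespace:
    """Toy in-memory tree of collections and files (hypothetical, not the real client)."""

    def __init__(self):
        # A collection is represented by a dict, a file by a bytes object.
        self.root = {}

    @staticmethod
    def _split(path):
        return [part for part in path.split("/") if part]

    def _walk(self, parts):
        node = self.root
        for part in parts:
            node = node[part]  # KeyError if a collection on the path is missing
            if not isinstance(node, dict):
                raise NotADirectoryError(part)
        return node

    def make_collection(self, path):
        *parents, name = self._split(path)
        parent = self._walk(parents)
        if name in parent:
            raise FileExistsError(path)
        parent[name] = {}

    def put(self, path, data):
        *parents, name = self._split(path)
        parent = self._walk(parents)
        if isinstance(parent.get(name), dict):
            raise IsADirectoryError(path)
        parent[name] = bytes(data)

    def get(self, path):
        *parents, name = self._split(path)
        entry = self._walk(parents)[name]
        if isinstance(entry, dict):
            raise IsADirectoryError(path)
        return entry

    def list_collection(self, path="/"):
        return sorted(self._walk(self._split(path)))

    def stat(self, path):
        # Minimal metadata query: entry type plus size or number of entries.
        *parents, name = self._split(path)
        entry = self._walk(parents)[name]
        if isinstance(entry, dict):
            return {"type": "collection", "entries": len(entry)}
        return {"type": "file", "size": len(entry)}
```

A session with this model mirrors the usual local workflow: create a collection, upload a file into it, list the collection, and query the metadata of the file.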