Title: | Enabling Scalable Data Processing and Management through Standards-based Job Execution and the Global Federated File System |
Author: | Shahbaz Memon; Morris Riedel; Shiraz Memon; Chris Koeritz; Andrew Grimshaw; Helmut Neukirchen |
Date: | 2016-05-01 |
Language: | English |
Scope: | 115-128 |
University/Institute: | Háskóli Íslands University of Iceland |
School: | Verkfræði- og náttúruvísindasvið (HÍ) School of Engineering and Natural Sciences (UI) |
Department: | Iðnaðarverkfræði-, vélaverkfræði- og tölvunarfræðideild (HÍ) Faculty of Industrial Eng., Mechanical Eng. and Computer Science (UI) |
Series: | Scalable Computing: Practice and Experience;17(2) |
ISSN: | 1895-1767 |
DOI: | 10.12694/scpe.v17i2.1160 |
Subject: | Statistical data mining; Data processing; Distributed file system; Gagnavinnsla (data processing); Gagnanám (data mining); Skráning gagna (data recording) |
URI: | https://hdl.handle.net/20.500.11815/184 |
Citation: Shahbaz Memon, Morris Riedel, Shiraz Memon, Chris Koeritz, Andrew Grimshaw, Helmut Neukirchen. (2016). Enabling Scalable Data Processing and Management through Standards-based Job Execution and the Global Federated File System. Scalable Computing: Practice and Experience, 17(2), 115-128. DOI: http://dx.doi.org/10.12694/scpe.v17i2.1160
|
|
Abstract: An emerging challenge for scientific communities is to efficiently process big data obtained by experimentation
and computational simulations. Supercomputing architectures are available to support scalable, high-performance processing
environments, but many existing algorithm implementations are still unable to cope with their architectural complexity. One
approach is to have innovative technologies that effectively use these resources and also deal with geographically dispersed large
datasets. Those technologies should be accessible in such a way that data scientists who run data-intensive computations
do not have to deal with the technical intricacies of the underlying execution system. Our work primarily focuses on providing data
scientists with transparent access to these resources so that they can easily analyze data. We demonstrate the impact of our work by describing
how we enabled access to multiple high-performance computing resources through an open standards-based middleware that takes
advantage of the unified data management provided by the Global Federated File System. Our architectural design and its
associated implementation are validated by a use case that requires massively parallel DBSCAN outlier detection on a 3D point
cloud dataset.
|
|
Rights: Open Access
|