
Squerall

An implementation of the so-called Semantic Data Lake, using Apache Spark and Presto. A Semantic Data Lake is a Data Lake accessed using Semantic Web technologies: ontologies and a query language (SPARQL).

Currently supported data sources:

To get an understanding of Squerall basics, which also helps understand the installation steps hereafter, please refer to this Wiki page: Squerall Basics.

Setup and Execution

- Prerequisite: You need Maven to build Squerall from source. Refer to the official documentation for installation instructions: Maven and SBT. Once it is installed, run:

git clone https://github.com/EIS-Bonn/squerall.git
cd squerall
mvn package
cd target

…by default, you will find a squerall-0.2.0.jar file in this target directory.
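To confirm that the build steps above produced the jar, a small check like the following can help. This is a minimal sketch: `find_squerall_jar` is a hypothetical helper, not part of Squerall, and it only assumes the jar follows the `squerall-<version>.jar` naming shown above.

```shell
# Hypothetical helper (not part of Squerall): print the path of the first
# squerall-*.jar found in the given directory (default: current directory,
# which is target/ if you followed the steps above).
find_squerall_jar() {
    build_dir="${1:-.}"
    for jar in "$build_dir"/squerall-*.jar; do
        # If the glob matched nothing, $jar is the literal pattern and -f fails.
        if [ -f "$jar" ]; then
            printf '%s\n' "$jar"
            return 0
        fi
    done
    echo "No squerall jar found in $build_dir; did 'mvn package' succeed?" >&2
    return 1
}
```

For example, `find_squerall_jar` run inside `target`, or `find_squerall_jar target` run from the repository root, prints the jar path or exits with a non-zero status if the build did not produce one.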

Squerall (previously Sparkall) uses Spark and Presto as query engines. The user specifies which underlying query engine to use, so Spark and/or Presto has to be installed beforehand. Both Spark and Presto are known to be among the easiest frameworks to configure and get started with. You can choose to run Spark/Presto, and thus Squerall, on a single node, or deploy them in a cluster.
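Before going further, it can be useful to verify that the required tools are reachable from your shell. The sketch below is an illustration only: `check_tool` is a hypothetical helper, and the binary names `mvn` and `spark-submit` are assumed to be on your `PATH`. Presto is left out because how it is exposed depends on how you installed it.

```shell
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
        return 1
    fi
}

# Assumed binary names: mvn (Maven) and spark-submit (Spark).
# Adjust the list if your installation exposes them differently.
for tool in mvn spark-submit; do
    check_tool "$tool" || true
done
```

A "missing" line means the tool is either not installed or not on your `PATH`.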

Spark

Presto

Squerall-GUI

Squerall has three interfaces to (1) provide access configuration for data in the Data Lake, (2) map data to ontology terms, and (3) query the mapped data. These interfaces generate the input files needed for query execution: config, mappings and query, respectively. Refer to the Squerall-GUI repository for more information.

Evaluation

We provide in this repository the source code, queries and Docker image for anyone who wants to try Squerall on their own. Refer to the dedicated page.

Extensibility

Squerall is extensible by design: developers can add support for more data sources themselves, or even add a new query engine alongside Spark and Presto. Refer to the Wiki for the details.

Publications

Contact

For any setup difficulties or other inquiries, please contact me at mami@cs.uni-bonn.de, or ask directly in the Gitter chat.

License

This project is openly shared under the terms of the Apache License v2.0 (read for more).