Beyer and Laney1 defined Big Data in terms of three characteristics: high Volume, high Velocity, and high Variety. Volume refers to the sheer amount of data; Velocity to the rate at which data arrives and must be processed, often in real time; Variety to the diversity of the data. The Luzzu framework currently scales with regard to both Volume and Variety. Concerning Volume, the processor runtime grows linearly with the number of triples. Luzzu also caters for Variety, since its results are not affected by data diversity: it supports the analysis of any data represented as RDF, so any data schema, and even other data models, can be handled as long as they can be mapped to or encoded in RDF (e.g. relational data via R2RML mappings). Velocity completes the Big Data definition. So far we have employed Luzzu for quality assessment at well-defined checkpoints rather than in real time; however, owing to its streaming architecture, Luzzu can also assess data streams, thus catering for Velocity as well.
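To illustrate how non-RDF data can be brought into Luzzu's scope, the following is a minimal R2RML mapping sketch; the table name, column names, and vocabulary are hypothetical, chosen only to show the shape of such a mapping.

```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.com/ns#> .

# Hypothetical mapping: each row of a relational table EMP
# becomes an ex:Employee resource with an ex:name property.
<#EmployeeMapping>
    rr:logicalTable [ rr:tableName "EMP" ] ;
    rr:subjectMap [
        rr:template "http://example.com/employee/{EMPNO}" ;
        rr:class ex:Employee
    ] ;
    rr:predicateObjectMap [
        rr:predicate ex:name ;
        rr:objectMap [ rr:column "ENAME" ]
    ] .
```

An R2RML processor executing such a mapping produces RDF triples, which can then be streamed through Luzzu's quality metrics like any native RDF dataset.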
We regularly evaluate our stream processors (the Jena Stream Processor and the Spark Stream Processor) and the framework as a whole with regard to scalability on big data. The following test parameters were used: