Customize GeoRDFBench for your Hosts

By Theofilos Ioannidis (tioannid [at] di [dot] uoa [dot] gr)


Why should you do it?

The GeoRDFBench Framework is intended for Semantic Web benchmarking researchers and reviewers. Users install the framework, along with the necessary software and datasets, on their own hosts (virtual or physical). Benchmarking researchers are strongly advised to configure their hosts before starting experiments, because the resources (IP address, RAM size) and names (host name, filesystem mount points) of their hosts will most probably not match those of the hosts pre-bundled with GeoRDFBench, which represent the GeoRDFBench developer's hosts. This configuration is required once at installation time and again every time you add new hosts or modify your hosts' capabilities.

Which are the prebundled hosts?

The prebundled hosts represent machines used for the development of GeoRDFBench or machines where benchmark experiments were executed with GeoRDFBench. These hosts are:

Where are the host descriptions?

Version 1.0 of GeoRDFBench Framework currently has two different locations where host information is kept. Properties for all hosts have to be included in the repository creation preparation script and in the JSON Specs library. The properties of HOST_X in the preparation script have to match the properties in the corresponding host specification file.

Preparation Script for Repository Creation

The script geordfbench/scripts/prepareRunEnvironment.sh must be sourced, so that you do not have to pass multiple arguments to the repository creation scripts of the individual systems. Its syntax is:

/data/geordfbench/scripts$ source prepareRunEnvironment.sh
SYNTAX1: source prepareRunEnvironment.sh <environment> <activesut> <short description>
SYNTAX2: source prepareRunEnvironment.sh <environment> <activesut>
SYNTAX3: source prepareRunEnvironment.sh <environment>
	<environment>	:	Environment the Geographica will run. One of {VM | PAVILIONDV7 | PYRAVLOS6 | TELEIOS3 | NUC8I7BEH}
	<activesut>	:	Active SUT. One of {RDF4JSUT | GraphDBSUT | StrabonSUT | StardogSUT | VirtuosoSUT | JenaGeoSPARQLSUT}
	<shortdesc>	:	Experiment short description     

This script sets up environment variables for all hosts and all systems. There are common variables and system-specific variables. SYNTAX1 allows the user to set up the environment for creating repositories for a specific system on a specific host and to provide a description for this environment.
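The three syntaxes differ only in how many positional arguments are supplied. The following is a minimal sketch of that pattern, not the actual contents of prepareRunEnvironment.sh: the function name resolve_run_args and the "<none>" placeholders are assumptions made for illustration, while the list of environments is copied from the usage message above. The upper-casing step mirrors the behaviour visible in the transcripts on this page, where "vm" is reported as "VM".

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GeoRDFBench): accepts one, two or three
# positional arguments, in the spirit of SYNTAX3, SYNTAX2 and SYNTAX1.

# Environment names copied from the usage message of prepareRunEnvironment.sh.
VALID_ENVIRONMENTS="VM PAVILIONDV7 PYRAVLOS6 TELEIOS3 NUC8I7BEH"

resolve_run_args() {
    if [ "$#" -lt 1 ] || [ "$#" -gt 3 ]; then
        echo "expected 1 to 3 arguments, got $#" >&2
        return 1
    fi

    # Upper-case the environment so that "vm" and "VM" are treated alike.
    local environment
    environment=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')

    # Reject names that are not in the known list.
    case " $VALID_ENVIRONMENTS " in
        *" $environment "*) ;;
        *) echo "unknown environment: $1" >&2; return 1 ;;
    esac

    # Assumed placeholders for arguments that were not supplied.
    local active_sut="${2:-<none>}"
    local short_desc="${3:-<none>}"

    echo "Environment=$environment ActiveSUT=$active_sut ShortDesc=$short_desc"
}

resolve_run_args vm RDF4JSUT "CreateScal10KRepoRDF4J"
resolve_run_args TELEIOS3
```

What the real script does when the second or third argument is omitted is not documented on this page, so the placeholders above should not be read as its defaults.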

Hosts folder under the JSON Benchmark Specification Library

The second location is in the specification files under the json_defs/hosts folder:

/data/geordfbench$ tree -L 1 json_defs/hosts/
json_defs/hosts/
├── nuc8i7behHOSToriginal.json
├── teleios3HOSToriginal.json
├── tioa-paviliondv7HOSToriginal.json
├── ubuntu_vma_tioaHOSToriginal_1.json
└── ubuntu_vma_tioaHOSToriginal.json

First, set up your development host

The most important host is the development host where you initially deploy GeoRDFBench. You will modify the description of the VM host so that it matches your development machine (physical or virtual machine). In Figure 1 we can see the script locations (red color) and values (green color) that need to be modified to match the actual development host.

Fig.1 - Customize VM Host in Preparation Script

The marked environment variables have the following usage:

When you source this updated preparation script, a number of other calculated environment variables will also be set, which you can test by using the print environment script, e.g.:

/data/geordfbench/scripts$ source prepareRunEnvironment.sh vm RDF4JSUT "CreateScal10KRepoRDF4J"
Running script with syntax: source prepareRunEnvironment.sh VM RDF4JSUT CreateScal10KRepoRDF4J
tioannid@ubuntu-vma-tioa:/media/sf_VM_Shared/PHD/NetBeansProjects/PhD/GeoRDFBench/scripts$ ./printRunEnvironment.sh 
All SUTs
--------
Environment = VM
GeographicaScriptsDir = /media/sf_VM_Shared/PHD/NetBeansProjects/PhD/GeoRDFBench/scripts
DatasetBaseDir = /media/sf_VM_Shared/PHD/Geographica2_Datasets
QuerysetBaseDir = /media/sf_VM_Shared/PHD/Geographica2_Datasets/QuerySets
ResultsBaseDir = /media/sf_VM_Shared/PHD/Results_Store/VM_Results
ResultsDirName = 2#_2023-05-08_RDF4JSUT_CreateScal10KRepoRDF4J
ActiveSUT = RDF4JSUT
ExperimentResultDir = /media/sf_VM_Shared/PHD/Results_Store/VM_Results/RDF4JSUT/2#_2023-05-08_RDF4JSUT_CreateScal10KRepoRDF4J
ExperimentDesc = 2#_2023-05-08_RDF4JSUT_CreateScal10KRepoRDF4J
CompletionReportDaemonIP = 10.0.2.15
CompletionReportDaemonPort = 3333
ScalabilityGenScriptName = /media/sf_VM_Shared/PHD/Geographica2_Datasets/Scalability/scalabilityDSGen.sh
ScalabilityGzipRefDSName = /media/sf_VM_Shared/PHD/Geographica2_Datasets/Scalability/scalability500MRefDS.nt.gz
SystemMemorySizeInGB = 11 GBs
JVM_Xmx = -Xmx8g
...
RDF4J SUT
---------
RDF4JRepoBaseDir = /media/sf_VM_Shared/PHD/RDF4J_3.7.7_Repos/server
EnableLuceneSail = false
RDF4JLuceneReposPrefix = 
Version = 3.7.7
...
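After customizing the host values, it can help to confirm that the directories reported above actually exist on your machine. The sketch below is a hypothetical helper, not part of GeoRDFBench: it assumes only the "Name = value" line format shown in the output above and treats every entry whose name ends in "Dir" as a directory path.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GeoRDFBench): reads "Name = value" lines
# on standard input and reports every entry whose name ends in "Dir" and
# whose value is not an existing directory.
report_missing_dirs() {
    local line name value missing=0
    while IFS= read -r line; do
        # Skip headings, separators and any line without " = ".
        case "$line" in
            *" = "*) ;;
            *) continue ;;
        esac
        name=${line%% = *}    # text before the first " = "
        value=${line#* = }    # text after the first " = "
        case "$name" in
            *Dir)
                if [ ! -d "$value" ]; then
                    echo "MISSING $name -> $value"
                    missing=1
                fi
                ;;
        esac
    done
    # Exit status 1 signals that at least one directory was not found.
    return "$missing"
}
```

With the function defined in the current shell, a possible use is: ./printRunEnvironment.sh | report_missing_dirs. A reported entry is a prompt to check the corresponding value in the preparation script; whether a given directory is expected to exist before the first experiment is not stated on this page.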

We now have to make similar modifications to the VM host specification file in the JSON library, json_defs/hosts/ubuntu_vma_tioaHOSToriginal.json. Figure 2 shows the locations and values that can be modified in the JSON host file.

Fig.2 - Customize JSON Host Specification file for VM

The listed JSON properties have the following usage:

In the host JSON file, the user can also specify the hostname and IP address.
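To fill in these values you first need the actual facts of your host. The following sketch shows one way to collect them on a Linux machine. The function names are hypothetical, "hostname -I" is assumed to be available (it is on Debian and Ubuntu), and the RAM figure is simply MemTotal from /proc/meminfo truncated to whole gigabytes, which may differ from how GeoRDFBench computes its own SystemMemorySizeInGB value.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of GeoRDFBench) that print host facts
# you may want to copy into the preparation script and the JSON host file.

# Host name, falling back to "uname -n" if the hostname command is missing.
host_name() {
    hostname 2>/dev/null || uname -n
}

# First address reported by "hostname -I", or "unknown" if unavailable.
primary_ip() {
    local ips
    ips=$(hostname -I 2>/dev/null) || ips=""
    # Word-split the address list; $1 becomes the first address, if any.
    set -- $ips
    echo "${1:-unknown}"
}

# Total RAM in whole GB, read from a meminfo-style file (values are in kB).
mem_total_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

echo "hostname=$(host_name) ip=$(primary_ip) ram_gb=$(mem_total_gb 2>/dev/null)"
```

On a host with several network interfaces the first address is not necessarily the one you want, so check the printed value before copying it.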