Docker Container Example : RDF4J, GraphDB and JenaGeoSPARQL
against Scalability-{10K, 100K, 1M} Workloads

By Theofilos Ioannidis (tioannid [at] di [dot] uoa [dot] gr)


Caution

Containers are not ideal for the kind of benchmarking that the GeoRDFBench Framework performs, because they do not allow the system caches to be cleared. The reason for this is that:

Therefore, in the following example, although the user can verify that the experiments run properly and results are correctly calculated and reported, the COLD cache response times will not be accurate. However, for experiments that do not require COLD cache response time measurements, e.g., macro benchmark scenarios, response times should be accurate enough for drawing basic conclusions.
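
As a rough illustration of the limitation described above, the following sketch checks whether the Linux page cache could be dropped from the current environment. It assumes that cache clearing relies on the standard kernel interface /proc/sys/vm/drop_caches; this page does not state which mechanism GeoRDFBench actually uses, and the function name can_clear_system_caches is made up for this example.

```python
import os

# Standard Linux kernel interface for dropping the page cache, dentries and inodes.
# Assumption: the benchmark's cache clearing depends on this file being writable.
DROP_CACHES = "/proc/sys/vm/drop_caches"


def can_clear_system_caches(path: str = DROP_CACHES) -> bool:
    """Return True only if `path` exists and the current process may write to it."""
    return os.path.exists(path) and os.access(path, os.W_OK)


if __name__ == "__main__":
    if can_clear_system_caches():
        print("System caches can be cleared: COLD measurements are meaningful.")
    else:
        print("System caches cannot be cleared: treat COLD measurements as WARM.")
```

Inside an unprivileged Docker container, /proc/sys is typically mounted read-only, so a check like this would be expected to report that the caches cannot be cleared.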

Key Features

This example features:

Docker Image

For this example, we will use a Windows 10 host machine with 16 GB of memory, an Intel i7-9700 CPU, and a 1 TB 2.5" SATA III 7200 rpm HDD as the data disk.

Pull the image

The Docker image is stored in the GitHub container registry (ghcr.io) under tioannid/geordfbench/multistore/scal10k_1m. We assume that the current directory is D:\TEMP. We then issue the following commands, which pull the image from the registry and verify its presence in the Docker image list:

D:\TEMP>docker pull ghcr.io/tioannid/geordfbench/multistore/scal10k_1m:latest
latest: Pulling from tioannid/geordfbench/multistore/scal10k_1m
43f89b94cd7d: Already exists
a210b5448f2b: Pull complete
ccb39becbf1d: Pull complete
6b9dc856ed68: Pull complete
c055aac453ce: Pull complete
40f6fe015a03: Pull complete
d8e9218a888a: Pull complete
bf45bd810d56: Pull complete
4a990b4589b9: Pull complete
Digest: sha256:a0e137b93d82ab3dabd0b5b8e6a66123b6894e541b07649d72474ab8356200c5
Status: Downloaded newer image for ghcr.io/tioannid/geordfbench/multistore/scal10k_1m:latest
ghcr.io/tioannid/geordfbench/multistore/scal10k_1m:latest

D:\TEMP>docker images
REPOSITORY                                           TAG       IMAGE ID       CREATED       SIZE
ghcr.io/tioannid/geordfbench/multistore/scal10k_1m   latest    751b6aa47c5f   4 hours ago   3.14GB

Start a container

We start a container named mybench from this image with the following command:

D:\TEMP>docker run -e POSTGRES_PASSWORD=postgres -p 5432:5432/tcp --hostname NUC8i7BEH --cpus="4" --memory="11g" --memory-swap="11g" --mount "type=bind,src=%cd%,target=/src" ^
--name mybench -d ghcr.io/tioannid/geordfbench/multistore/scal10k_1m
8ef7c438dcdbe2f2a79138995f8c2938f5e73edad10944f6237f3a9635939a96

This command launches the mybench container, while defining:

Launch the experiments through a terminal

We start the experiments by opening a terminal in the running mybench container and issuing the following command:

D:\TEMP>docker exec -it mybench /bin/bash
root@NUC8i7BEH:/data# ./startUpScript.sh 
 * Starting PostgreSQL 14 database server [ OK ]
...
CREATE DATABASE
ALTER DATABASE
You are now connected to database "geographica3" as user "postgres".
...
CREATE TABLE
ALTER TABLE
CREATE SEQUENCE
...
GRANT
GRANT
...

The default terminal acts as a log window. After some time, all experiments (3 stores * 3 workloads = 9) complete and the output ends with:

...
207994 [main] INFO  JDBCRepSrc  - Deferred mode for JDBCRepSrc was enabled. 18 records were flushed
207994 [main] INFO  GenericExprerimentResultsCollector  - Export statistics in "/data/Results_Store/JenaGeoSPARQLSUT/2023-11-24_JenaGeoSPARQL_RunWL_Scal1M/Scalability/1M/JenaGeoSPARQLSUT-ExperimentWorkload"
207995 [main] INFO  GenericExprerimentResultsCollector  - Created non existing directory
207996 [main] INFO  GenericExprerimentResultsCollector  - Statistiscs printed: /data/Results_Store/JenaGeoSPARQLSUT/2023-11-24_JenaGeoSPARQL_RunWL_Scal1M/Scalability/1M/JenaGeoSPARQLSUT-ExperimentWorkload/00-SC1_Geometries_Intersects_GivenPolygon-cold
207996 [main] INFO  GenericExprerimentResultsCollector  - Statistiscs printed: /data/Results_Store/JenaGeoSPARQLSUT/2023-11-24_JenaGeoSPARQL_RunWL_Scal1M/Scalability/1M/JenaGeoSPARQLSUT-ExperimentWorkload/00-SC1_Geometries_Intersects_GivenPolygon-cold-long
207996 [main] INFO  GenericExprerimentResultsCollector  - Statistiscs printed: /data/Results_Store/JenaGeoSPARQLSUT/2023-11-24_JenaGeoSPARQL_RunWL_Scal1M/Scalability/1M/JenaGeoSPARQLSUT-ExperimentWorkload/01-SC2_Intensive_Geometries_Intersect_Geometries-cold
207996 [main] INFO  GenericExprerimentResultsCollector  - Statistiscs printed: /data/Results_Store/JenaGeoSPARQLSUT/2023-11-24_JenaGeoSPARQL_RunWL_Scal1M/Scalability/1M/JenaGeoSPARQLSUT-ExperimentWorkload/01-SC2_Intensive_Geometries_Intersect_Geometries-cold-long
207997 [main] INFO  GenericExprerimentResultsCollector  - Statistiscs printed: /data/Results_Store/JenaGeoSPARQLSUT/2023-11-24_JenaGeoSPARQL_RunWL_Scal1M/Scalability/1M/JenaGeoSPARQLSUT-ExperimentWorkload/02-SC3_Relaxed_Geometries_Intersect_Geometries-cold
207997 [main] INFO  GenericExprerimentResultsCollector  - Statistiscs printed: /data/Results_Store/JenaGeoSPARQLSUT/2023-11-24_JenaGeoSPARQL_RunWL_Scal1M/Scalability/1M/JenaGeoSPARQLSUT-ExperimentWorkload/02-SC3_Relaxed_Geometries_Intersect_Geometries-cold-long
207997 [main] INFO  GenericExprerimentResultsCollector  - Statistiscs printed: /data/Results_Store/JenaGeoSPARQLSUT/2023-11-24_JenaGeoSPARQL_RunWL_Scal1M/Scalability/1M/JenaGeoSPARQLSUT-ExperimentWorkload/00-SC1_Geometries_Intersects_GivenPolygon-warm
207997 [main] INFO  GenericExprerimentResultsCollector  - Statistiscs printed: /data/Results_Store/JenaGeoSPARQLSUT/2023-11-24_JenaGeoSPARQL_RunWL_Scal1M/Scalability/1M/JenaGeoSPARQLSUT-ExperimentWorkload/00-SC1_Geometries_Intersects_GivenPolygon-warm-long
207998 [main] INFO  GenericExprerimentResultsCollector  - Statistiscs printed: /data/Results_Store/JenaGeoSPARQLSUT/2023-11-24_JenaGeoSPARQL_RunWL_Scal1M/Scalability/1M/JenaGeoSPARQLSUT-ExperimentWorkload/01-SC2_Intensive_Geometries_Intersect_Geometries-warm
207998 [main] INFO  GenericExprerimentResultsCollector  - Statistiscs printed: /data/Results_Store/JenaGeoSPARQLSUT/2023-11-24_JenaGeoSPARQL_RunWL_Scal1M/Scalability/1M/JenaGeoSPARQLSUT-ExperimentWorkload/01-SC2_Intensive_Geometries_Intersect_Geometries-warm-long
207998 [main] INFO  GenericExprerimentResultsCollector  - Statistiscs printed: /data/Results_Store/JenaGeoSPARQLSUT/2023-11-24_JenaGeoSPARQL_RunWL_Scal1M/Scalability/1M/JenaGeoSPARQLSUT-ExperimentWorkload/02-SC3_Relaxed_Geometries_Intersect_Geometries-warm
207998 [main] INFO  GenericExprerimentResultsCollector  - Statistiscs printed: /data/Results_Store/JenaGeoSPARQLSUT/2023-11-24_JenaGeoSPARQL_RunWL_Scal1M/Scalability/1M/JenaGeoSPARQLSUT-ExperimentWorkload/02-SC3_Relaxed_Geometries_Intersect_Geometries-warm-long
207998 [main] INFO  GenericExprerimentResultsCollector  - Cache COLD
207998 [main] INFO  GenericExprerimentResultsCollector  - 	Query 0
207998 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 0	 13393855 + 5298249279 = 5311643134 nsecs, 80500 results, 0 scan errors
207998 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 1	 522743 + 2507703570 = 2508226313 nsecs, 80500 results, 0 scan errors
207998 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 2	 560117 + 2723106278 = 2723666395 nsecs, 80500 results, 0 scan errors
207998 [main] INFO  GenericExprerimentResultsCollector  - 	Query 1
207998 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 0	 564997 + 5743910675 = 5744475672 nsecs, 813 results, 0 scan errors
207998 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 1	 561855 + 5245618640 = 5246180495 nsecs, 813 results, 0 scan errors
207998 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 2	 477203 + 4990848640 = 4991325843 nsecs, 813 results, 0 scan errors
207998 [main] INFO  GenericExprerimentResultsCollector  - 	Query 2
207998 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 0	 387923 + 4445226344 = 4445614267 nsecs, 239 results, 0 scan errors
207998 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 1	 459486 + 4666550119 = 4667009605 nsecs, 239 results, 0 scan errors
207999 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 2	 420259 + 4367426647 = 4367846906 nsecs, 239 results, 0 scan errors
207999 [main] INFO  GenericExprerimentResultsCollector  - Cache WARM
207999 [main] INFO  GenericExprerimentResultsCollector  - 	Query 0
207999 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 0	 283741 + 564062310 = 564346051 nsecs, 80500 results, 0 scan errors
207999 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 1	 365657 + 543304810 = 543670467 nsecs, 80500 results, 0 scan errors
207999 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 2	 333364 + 462587392 = 462920756 nsecs, 80500 results, 0 scan errors
207999 [main] INFO  GenericExprerimentResultsCollector  - 	Query 1
207999 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 0	 276068 + 3535236646 = 3535512714 nsecs, 813 results, 0 scan errors
207999 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 1	 264122 + 3482117315 = 3482381437 nsecs, 813 results, 0 scan errors
207999 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 2	 334113 + 3207956521 = 3208290634 nsecs, 813 results, 0 scan errors
207999 [main] INFO  GenericExprerimentResultsCollector  - 	Query 2
207999 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 0	 285634 + 3835300193 = 3835585827 nsecs, 239 results, 0 scan errors
207999 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 1	 377806 + 4705149249 = 4705527055 nsecs, 239 results, 0 scan errors
207999 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 2	 223249 + 3930920794 = 3931144043 nsecs, 239 results, 0 scan errors
207999 [main] INFO  RunJenaGeoSPARQLExperimentWorkload  - End ScalabilityFunc
Start time = Wed Nov 29 21:57:52 UTC 2023
End time = Wed Nov 29 22:01:20 UTC 2023
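
The per-repetition lines in the log excerpt above can be checked mechanically. The sketch below parses three of them and verifies that the two partial times add up to the reported total. The interpretation of the two summands as evaluation time and scan time is an assumption based on the column names of the result files, and the helper name parse_rep_line is made up for this example.

```python
import re

# Pattern for the per-repetition lines printed by the results collector, e.g.
# "Rep 0   13393855 + 5298249279 = 5311643134 nsecs, 80500 results, 0 scan errors"
REP_LINE = re.compile(
    r"Rep (\d+)\s+(\d+) \+ (\d+) = (\d+) nsecs, (\d+) results, (\d+) scan errors"
)


def parse_rep_line(line: str):
    """Return (rep, evaluation_ns, scan_ns, total_ns, results, scan_errors) or None."""
    match = REP_LINE.search(line)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


# The three COLD repetitions of Query 0, copied from the log excerpt above.
sample_log = """\
207998 [main] INFO  GenericExprerimentResultsCollector  - \t\tRep 0\t 13393855 + 5298249279 = 5311643134 nsecs, 80500 results, 0 scan errors
207998 [main] INFO  GenericExprerimentResultsCollector  - \t\tRep 1\t 522743 + 2507703570 = 2508226313 nsecs, 80500 results, 0 scan errors
207998 [main] INFO  GenericExprerimentResultsCollector  - \t\tRep 2\t 560117 + 2723106278 = 2723666395 nsecs, 80500 results, 0 scan errors
"""

rows = [row for row in map(parse_rep_line, sample_log.splitlines()) if row]
for rep, evaluation_ns, scan_ns, total_ns, results, scan_errors in rows:
    # The total should be the sum of the two partial times.
    assert evaluation_ns + scan_ns == total_ns
    print(f"Rep {rep}: {total_ns / 1e9:.3f} s, {results} results")
```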

Verify that the repositories have been created

root@NUC8i7BEH:/data# tree -L 1 RDF4J_3.7.7_Repos/server/repositories/
RDF4J_3.7.7_Repos/server/repositories/
|-- scalability_100K
|-- scalability_10K
`-- scalability_1M

root@NUC8i7BEH:/data# tree -L 1 graphdb-free-9.11.1/data/repositories/
graphdb-free-9.11.1/data/repositories/
|-- scalability_100K
|-- scalability_10K
`-- scalability_1M

root@NUC8i7BEH:/data# tree -L 1 JenaGeoSPARQL_3.17.0_Repos/           
JenaGeoSPARQL_3.17.0_Repos/
|-- scalability_100K
|-- scalability_10K
`-- scalability_1M

Verify that the repository creation logs have been generated

root@NUC8i7BEH:/data# ls -lsa geordfbench/RDF4JSUT/scripts/CreateRepos/*.log
8 -rw-r--r-- 1 root root 5247 Nov 29 20:32 geordfbench/RDF4JSUT/scripts/CreateRepos/logCreateRepos_Scal10K_1M_RDF4J.log
root@NUC8i7BEH:/data# ls -lsa geordfbench/GraphDBSUT/scripts/CreateRepos/*.log
32 -rw-r--r-- 1 root root 31527 Nov 29 20:49 geordfbench/GraphDBSUT/scripts/CreateRepos/logCreateRepo_Scal10K_1M_GraphDB.log
root@NUC8i7BEH:/data# ls -lsa geordfbench/JenaGeoSPARQLSUT/scripts/CreateRepos/*.log
8 -rw-r--r-- 1 root root 6338 Nov 29 21:53 geordfbench/JenaGeoSPARQLSUT/scripts/CreateRepos/logCreateRepo_Scal10K_1M_JenaGeoSPARQL.log

Verify that the experiment logs have been generated

Experiment run logs may be quite long, therefore the user can click the links below to view the details of the execution.

root@NUC8i7BEH:/data# ls -lsa geordfbench/RDF4JSUT/scripts/RunTests3/*.log
76 -rw-r--r-- 1 root root 73065 Nov 29 20:39 geordfbench/RDF4JSUT/scripts/RunTests3/RunWLRDF4JExp_Scal100K.log
76 -rw-r--r-- 1 root root 70318 Nov 29 20:35 geordfbench/RDF4JSUT/scripts/RunTests3/RunWLRDF4JExp_Scal10K.log
76 -rw-r--r-- 1 root root 73128 Nov 29 20:48 geordfbench/RDF4JSUT/scripts/RunTests3/RunWLRDF4JExp_Scal1M.log
root@NUC8i7BEH:/data# ls -lsa geordfbench/GraphDBSUT/scripts/RunTests3/*.log
84 -rw-r--r-- 1 root root 81535 Nov 29 21:01 geordfbench/GraphDBSUT/scripts/RunTests3/RunWLGraphDBExp_Scal100K.log
84 -rw-r--r-- 1 root root 78751 Nov 29 20:52 geordfbench/GraphDBSUT/scripts/RunTests3/RunWLGraphDBExp_Scal10K.log
84 -rw-r--r-- 1 root root 81893 Nov 29 21:52 geordfbench/GraphDBSUT/scripts/RunTests3/RunWLGraphDBExp_Scal1M.log
root@NUC8i7BEH:/data# ls -lsa geordfbench/JenaGeoSPARQLSUT/scripts/RunTests3/*.log
80 -rw-r--r-- 1 root root 74404 Nov 29 21:57 geordfbench/JenaGeoSPARQLSUT/scripts/RunTests3/RunWLJenaGeoSPARQLExp_Scal100K.log
76 -rw-r--r-- 1 root root 71780 Nov 29 21:55 geordfbench/JenaGeoSPARQLSUT/scripts/RunTests3/RunWLJenaGeoSPARQLExp_Scal10K.log
80 -rw-r--r-- 1 root root 74489 Nov 29 22:01 geordfbench/JenaGeoSPARQLSUT/scripts/RunTests3/RunWLJenaGeoSPARQLExp_Scal1M.log

Verify that the experiment results have been generated in the Default Location

Experiment results are stored in the file system by default. For this demonstration, the base location for all results was /data/Results_Store. The output of the following command has been edited so that only the last entry is fully expanded, using the result of tree -L 4 Results_Store/RDF4JSUT/2023-11-24_RDF4JSUT_RunWL_Scal1M/:

root@NUC8i7BEH:/data# tree -L 2 Results_Store/
Results_Store/
|-- GraphDBSUT
|   |-- 2023-11-24_GraphDBSUT_RunWL_Scal100K
|   |-- 2023-11-24_GraphDBSUT_RunWL_Scal10K
|   `-- 2023-11-24_GraphDBSUT_RunWL_Scal1M
|-- JenaGeoSPARQLSUT
|   |-- 2023-11-24_JenaGeoSPARQL_RunWL_Scal100K
|   |-- 2023-11-24_JenaGeoSPARQL_RunWL_Scal10K
|   `-- 2023-11-24_JenaGeoSPARQL_RunWL_Scal1M
`-- RDF4JSUT
    |-- 2023-11-24_RDF4JSUT_RunWL_Scal100K
    |-- 2023-11-24_RDF4JSUT_RunWL_Scal10K
    `-- 2023-11-24_RDF4JSUT_RunWL_Scal1M
        `-- Scalability
            `-- 1M
                `-- RDF4JSUT-ExperimentWorkload
                    |-- 00-SC1_Geometries_Intersects_GivenPolygon-cold
                    |-- 00-SC1_Geometries_Intersects_GivenPolygon-cold-long
                    |-- 00-SC1_Geometries_Intersects_GivenPolygon-warm
                    |-- 00-SC1_Geometries_Intersects_GivenPolygon-warm-long
                    |-- 01-SC2_Intensive_Geometries_Intersect_Geometries-cold
                    |-- 01-SC2_Intensive_Geometries_Intersect_Geometries-cold-long
                    |-- 01-SC2_Intensive_Geometries_Intersect_Geometries-warm
                    |-- 01-SC2_Intensive_Geometries_Intersect_Geometries-warm-long
                    |-- 02-SC3_Relaxed_Geometries_Intersect_Geometries-cold
                    |-- 02-SC3_Relaxed_Geometries_Intersect_Geometries-cold-long
                    |-- 02-SC3_Relaxed_Geometries_Intersect_Geometries-warm
                    `-- 02-SC3_Relaxed_Geometries_Intersect_Geometries-warm-long

root@NUC8i7BEH:/data# more Results_Store/RDF4JSUT/2023-11-24_RDF4JSUT_RunWL_Scal1M/Scalability/1M/RDF4JSUT-ExperimentWorkload/00-SC1_Geometries_Intersects_GivenPolygon-cold
80500 6745899026
root@NUC8i7BEH:/data# more Results_Store/RDF4JSUT/2023-11-24_RDF4JSUT_RunWL_Scal1M/Scalability/1M/RDF4JSUT-ExperimentWorkload/00-SC1_Geometries_Intersects_GivenPolygon-cold-long 
80500 42063774 7517612106 7559675880
80500 2500775 6743398251 6745899026
80500 7657479 5442562737 5450220216

For each query and execution type (warm, cold) there are two files, a short and a long version. The long version has 4 columns (noOfResults, evaluationTime, scanTime, totalTime) and one row for each execution iteration performed. The short version has 2 columns (noOfResults, totalTime) and a single row, which holds the average or median totalTime of the execution iterations from the long version. In the files shown above, the short value 6745899026 is the median of the three totalTime values in the long version. All times are in nanoseconds.
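
The relation between the two file versions can be sketched as follows. The example parses the long file contents shown above and derives the short row from them. The function names parse_long and summarize are made up for this illustration and are not part of GeoRDFBench.

```python
from statistics import mean, median

# Contents of the "-cold-long" file shown above:
# noOfResults evaluationTime scanTime totalTime (times in nanoseconds).
LONG_FILE_TEXT = """\
80500 42063774 7517612106 7559675880
80500 2500775 6743398251 6745899026
80500 7657479 5442562737 5450220216
"""


def parse_long(text: str):
    """Parse the long version into a list of (results, eval_ns, scan_ns, total_ns)."""
    return [tuple(int(field) for field in line.split())
            for line in text.splitlines() if line.strip()]


def summarize(rows, aggregate=median):
    """Build the short-version row: (noOfResults, aggregated totalTime)."""
    totals = [total for _, _, _, total in rows]
    return rows[0][0], int(aggregate(totals))


rows = parse_long(LONG_FILE_TEXT)
# Each totalTime should equal evaluationTime + scanTime.
assert all(ev + scan == total for _, ev, scan, total in rows)
print(*summarize(rows))        # median-based short row: 80500 6745899026
print(*summarize(rows, mean))  # mean-based alternative
```

The median-based row reproduces the contents of the short file shown above for this query.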

Verify that the experiment results have been generated in the PostgreSQL database

Experiment results are also stored in a custom location, a PostgreSQL database. The details of each experiment are recorded, with a unique ID, in a row of the EXPERIMENTS table. The details of each query execution iteration are recorded in a similar manner in the QUERYEXECUTIONS table. A set of views provides aggregation of the totalTime and calculates its Average and Median value for each query and execution type (warm, cold). All times are in milliseconds. The figures below show actual snapshots from the PgAdmin v4 interface.
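
The kind of aggregation these views perform can be approximated outside the database. The sketch below groups execution iterations by query and cache type and reports the average and median totalTime in milliseconds. The record layout (query, cache type, total nanoseconds) is hypothetical, since the actual table columns are not shown on this page; the sample values are taken from the JenaGeoSPARQL Scalability-1M log excerpt earlier on this page.

```python
from collections import defaultdict
from statistics import mean, median

NS_PER_MS = 1_000_000

# Hypothetical flat records (query, cache_type, total_ns); the real table
# columns are not shown on this page. Values come from the log excerpt.
executions = [
    ("Query 0", "COLD", 5311643134),
    ("Query 0", "COLD", 2508226313),
    ("Query 0", "COLD", 2723666395),
    ("Query 0", "WARM", 564346051),
    ("Query 0", "WARM", 543670467),
    ("Query 0", "WARM", 462920756),
]


def aggregate_ms(records):
    """Group by (query, cache type) and return {key: (average_ms, median_ms)}."""
    groups = defaultdict(list)
    for query, cache, total_ns in records:
        groups[(query, cache)].append(total_ns / NS_PER_MS)
    return {key: (mean(values), median(values)) for key, values in groups.items()}


for (query, cache), (avg_ms, med_ms) in aggregate_ms(executions).items():
    print(f"{query} {cache}: avg={avg_ms:.1f} ms, median={med_ms:.1f} ms")
```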

Fig.1 - Experiment entries
Fig.2 - Experiment aggregate results

An example line chart with the results of 3 experiments with Scalability-100K workload is shown below:

Fig.3 - Scalability 100K chart

The above chart clearly shows that, for the Scalability-100K workload, JenaGeoSPARQL performed best, followed by RDF4J. In this chart we can also see that, for all 3 systems, the COLD and WARM cache response times are almost identical. This is because clearing the caches failed inside the container, as mentioned in the Caution section of this page, which effectively made all runs behave as WARM.

Check if Accuracy checking was performed

For the Scalability-10K workload we used a modified copy of the default generated JSON specification and created a Gold Standard. The differences between the two files are minimal and are shown in the following diff output:

root@NUC8i7BEH:/data/geordfbench/json_defs/workloads# diff scalabilityFunc10K_WLoriginal_GOLD_STANDARD.json scalabilityFunc10K_WLoriginal.json
45c45
<         "expectedResults" : 554
---
>         "expectedResults" : -1
51c51
<         "expectedResults" : 2
---
>         "expectedResults" : -1
57c57
<         "expectedResults" : 2
---
>         "expectedResults" : -1
79c79
< }
---

We replaced the -1 value with the expected number of results for each query on the specific dataset of the Scalability-10K workload. For example, we can search for the ACCURATE token in the RDF4J experiment log file /data/geordfbench/RDF4JSUT/scripts/RunTests3/RunWLRDF4JExp_Scal10K.log and verify that it is found 18 times (3 queries * 3 COLD repetitions + 3 queries * 3 WARM repetitions). The same holds for the corresponding GraphDB and JenaGeoSPARQL logs for the Scalability-10K workload. For the non-Gold-Standard workloads, Scalability-{100K | 1M}, the logs instead report ACCURACY NOT DETERMINED, indicating that no expected result value was provided for the queries in those experiments.
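
The token count described above can be automated with a small script. The sketch below counts whole-word occurrences of ACCURATE and compares the count with the expected number of accuracy checks. The sample log lines are made up for illustration, because the exact layout of the real log lines is not shown here; the function names are likewise hypothetical.

```python
import re

# Match ACCURATE as a whole word so that other tokens (for example
# ACCURACY NOT DETERMINED, or a hypothetical INACCURATE) are not counted.
ACCURATE = re.compile(r"\bACCURATE\b")


def count_accurate(log_text: str) -> int:
    """Count whole-word occurrences of the ACCURATE token in a log."""
    return len(ACCURATE.findall(log_text))


def expected_checks(queries: int, cold_reps: int, warm_reps: int) -> int:
    """Number of accuracy checks expected: one per query repetition."""
    return queries * cold_reps + queries * warm_reps


# Made-up log lines for illustration; real log lines have a different layout.
sample = "\n".join(
    ["rep finished: ACCURATE"] * 18 + ["rep finished: ACCURACY NOT DETERMINED"]
)
assert count_accurate(sample) == expected_checks(3, 3, 3) == 18
```

To apply this to a real run, the contents of a log file such as RunWLRDF4JExp_Scal10K.log would be read and passed to count_accurate.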

Explanation of what happened?

The more interested user can look at the simple Bash script /data/startUpScript.sh, which launched the experiments. The actions taken by this script are: