Docker Container Example : RDF4J, GraphDB and JenaGeoSPARQL
against Scalability-{10K, 100K, 1M} Workloads

By Theofilos Ioannidis (tioannid [at] di [dot] uoa [dot] gr)


Caution

Containers are not ideal for benchmarks such as those the GeoRDFBench Framework performs, because they do not allow clearing the system caches. The reason for this is that:

Therefore, in the following example, although the user can verify that the experiments run properly and results are correctly calculated and reported, the COLD cache response times will not be accurate. However, for experiments that do not require COLD cache response time measurements, e.g., macro benchmark scenarios, response times should be accurate enough for drawing basic conclusions.
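For context, on a Linux system the page cache is commonly dropped by writing to /proc/sys/vm/drop_caches as root, and inside a default (unprivileged) Docker container /proc/sys is normally mounted read-only, so that write fails. The sketch below only tests whether the control file is writable. Whether GeoRDFBench clears caches through exactly this file is an assumption, and can_drop_caches is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Report whether the kernel's drop_caches control file is writable.
# The optional path argument exists only to make the check testable;
# by default the standard Linux location is used.
can_drop_caches() {
    local ctl="${1:-/proc/sys/vm/drop_caches}"
    [ -w "$ctl" ]
}

if can_drop_caches; then
    echo "drop_caches is writable: COLD cache runs can clear the page cache"
else
    echo "drop_caches is not writable: COLD cache timings may behave like WARM ones"
fi
```

Running this inside the container before launching the experiments gives a quick indication of whether the COLD measurements can be trusted.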

Key Features

This example features:

Docker Image

For this example, we will use a Windows 10 host machine with 16 GB of memory, an Intel i7-9700 CPU, and a 1 TB 2.5" SATA III 7200 rpm HDD as the data disk.

Pull the image

The Docker image is stored in the GitHub container registry under tioannid/geordfbench/multistore/scal10k_1m. We assume that the current directory is D:\TEMP. We then issue the following commands, which pull the image from the registry and verify its presence in the Docker image list:

D:\TEMP>docker pull ghcr.io/tioannid/geordfbench/multistore/scal10k_1m:latest
latest: Pulling from tioannid/geordfbench/multistore/scal10k_1m
9b857f539cb1: Already exists 
e34aa86df3c8: Pull complete 
80caa81c86dd: Pull complete 
6b7a96575964: Pull complete 
7e94ba62ec79: Pull complete 
af72a3f9c8c8: Pull complete 
2245696f7a7c: Pull complete 
c14138bcc0b5: Pull complete 
b1cbd9bb3233: Pull complete 
Digest: sha256:b30c0edf140991ac2227b5d2f6f5fcd480757c2200c5a87e63143b3153073074
Status: Downloaded newer image for ghcr.io/tioannid/geordfbench/multistore/scal10k_1m:latest
ghcr.io/tioannid/geordfbench/multistore/scal10k_1m:latest

D:\TEMP>docker images
REPOSITORY                                           TAG       IMAGE ID       CREATED        SIZE
ghcr.io/tioannid/geordfbench/multistore/scal10k_1m   latest    583e20b24d99   17 hours ago   2.21GB

Start a container

We start a container named multiscal10k1m from this image with the following command:

D:\TEMP>docker run -e POSTGRES_PASSWORD=postgres -p 5430:5432/tcp --hostname NUC8i7BEH --cpus="4" --memory="11g" --memory-swap="11g" ^
--mount "type=bind,src=%cd%,target=/src" --name multiscal10k1m -d ghcr.io/tioannid/geordfbench/multistore/scal10k_1m
caf4182604dcf3accf3a0ab8a9aefb47ba81f26631e444b113277b72becba08c

This command launches the multiscal10k1m container, while defining:
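As a hedged reading of the options in the command above, the sketch below rewrites it for a bash host shell and annotates each option according to the general semantics of the Docker CLI. The annotations are not the author's own list, the array name run_args is hypothetical, and $PWD stands in for the Windows %cd% variable.

```shell
#!/usr/bin/env bash
# The same docker run options as above, collected in a bash array so that
# each one can carry a comment.
run_args=(
    -e POSTGRES_PASSWORD=postgres             # environment variable set inside the container
    -p 5430:5432/tcp                          # publish container port 5432 on host port 5430
    --hostname NUC8i7BEH                      # hostname seen inside the container
    --cpus=4                                  # limit the container to 4 CPUs
    --memory=11g                              # hard memory limit of 11 GB
    --memory-swap=11g                         # equal to --memory, so the container gets no swap
    --mount "type=bind,src=$PWD,target=/src"  # bind-mount the current host directory at /src
    --name multiscal10k1m                     # container name used by later docker exec calls
    -d                                        # run detached and print the container ID
    ghcr.io/tioannid/geordfbench/multistore/scal10k_1m
)

# Print the assembled command for review instead of running it.
echo docker run "${run_args[@]}"
```

To actually start the container from such a script, the last line would be replaced by docker run "${run_args[@]}".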

Launch the experiments through a terminal

We start the experiments by connecting to the running multiscal10k1m container with a terminal and issuing the command:

D:\TEMP>docker exec -it multiscal10k1m /bin/bash
root@NUC8i7BEH:/data# ./startUpScript.sh 
 * Starting PostgreSQL 14 database server [ OK ]
...
CREATE DATABASE
ALTER DATABASE
You are now connected to database "geographica3" as user "postgres".
...
CREATE TABLE
ALTER TABLE
CREATE SEQUENCE
...
GRANT
GRANT
...

The terminal acts as a log window, and after some time all 9 experiments (3 stores * 3 workloads) finish, ending with:

...
188420 [main] INFO  JDBCRepSrc  - Deferred mode for JDBCRepSrc was enabled. 18 records were flushed
188420 [main] INFO  GenericExprerimentResultsCollector  - Export statistics in "/data/Results_Store/JenaGeoSPARQLSUT/2024-08-11_JenaGeoSPARQL_RunWL_Scal1M/Scalability/1M/JenaGeoSPARQLSUT-ExperimentWorkload"
188422 [main] INFO  GenericExprerimentResultsCollector  - Created non existing directory
188424 [main] INFO  GenericExprerimentResultsCollector  - Statistiscs printed: /data/Results_Store/JenaGeoSPARQLSUT/2024-08-11_JenaGeoSPARQL_RunWL_Scal1M/Scalability/1M/JenaGeoSPARQLSUT-ExperimentWorkload/00-SC1_Geometries_Intersects_GivenPolygon-cold
188425 [main] INFO  GenericExprerimentResultsCollector  - Statistiscs printed: /data/Results_Store/JenaGeoSPARQLSUT/2024-08-11_JenaGeoSPARQL_RunWL_Scal1M/Scalability/1M/JenaGeoSPARQLSUT-ExperimentWorkload/00-SC1_Geometries_Intersects_GivenPolygon-cold-long
188426 [main] INFO  GenericExprerimentResultsCollector  - Statistiscs printed: /data/Results_Store/JenaGeoSPARQLSUT/2024-08-11_JenaGeoSPARQL_RunWL_Scal1M/Scalability/1M/JenaGeoSPARQLSUT-ExperimentWorkload/01-SC2_Intensive_Geometries_Intersect_Geometries-cold
188427 [main] INFO  GenericExprerimentResultsCollector  - Statistiscs printed: /data/Results_Store/JenaGeoSPARQLSUT/2024-08-11_JenaGeoSPARQL_RunWL_Scal1M/Scalability/1M/JenaGeoSPARQLSUT-ExperimentWorkload/01-SC2_Intensive_Geometries_Intersect_Geometries-cold-long
188427 [main] INFO  GenericExprerimentResultsCollector  - Statistiscs printed: /data/Results_Store/JenaGeoSPARQLSUT/2024-08-11_JenaGeoSPARQL_RunWL_Scal1M/Scalability/1M/JenaGeoSPARQLSUT-ExperimentWorkload/02-SC3_Relaxed_Geometries_Intersect_Geometries-cold
188428 [main] INFO  GenericExprerimentResultsCollector  - Statistiscs printed: /data/Results_Store/JenaGeoSPARQLSUT/2024-08-11_JenaGeoSPARQL_RunWL_Scal1M/Scalability/1M/JenaGeoSPARQLSUT-ExperimentWorkload/02-SC3_Relaxed_Geometries_Intersect_Geometries-cold-long
188429 [main] INFO  GenericExprerimentResultsCollector  - Statistiscs printed: /data/Results_Store/JenaGeoSPARQLSUT/2024-08-11_JenaGeoSPARQL_RunWL_Scal1M/Scalability/1M/JenaGeoSPARQLSUT-ExperimentWorkload/00-SC1_Geometries_Intersects_GivenPolygon-warm
188429 [main] INFO  GenericExprerimentResultsCollector  - Statistiscs printed: /data/Results_Store/JenaGeoSPARQLSUT/2024-08-11_JenaGeoSPARQL_RunWL_Scal1M/Scalability/1M/JenaGeoSPARQLSUT-ExperimentWorkload/00-SC1_Geometries_Intersects_GivenPolygon-warm-long
188430 [main] INFO  GenericExprerimentResultsCollector  - Statistiscs printed: /data/Results_Store/JenaGeoSPARQLSUT/2024-08-11_JenaGeoSPARQL_RunWL_Scal1M/Scalability/1M/JenaGeoSPARQLSUT-ExperimentWorkload/01-SC2_Intensive_Geometries_Intersect_Geometries-warm
188431 [main] INFO  GenericExprerimentResultsCollector  - Statistiscs printed: /data/Results_Store/JenaGeoSPARQLSUT/2024-08-11_JenaGeoSPARQL_RunWL_Scal1M/Scalability/1M/JenaGeoSPARQLSUT-ExperimentWorkload/01-SC2_Intensive_Geometries_Intersect_Geometries-warm-long
188432 [main] INFO  GenericExprerimentResultsCollector  - Statistiscs printed: /data/Results_Store/JenaGeoSPARQLSUT/2024-08-11_JenaGeoSPARQL_RunWL_Scal1M/Scalability/1M/JenaGeoSPARQLSUT-ExperimentWorkload/02-SC3_Relaxed_Geometries_Intersect_Geometries-warm
188432 [main] INFO  GenericExprerimentResultsCollector  - Statistiscs printed: /data/Results_Store/JenaGeoSPARQLSUT/2024-08-11_JenaGeoSPARQL_RunWL_Scal1M/Scalability/1M/JenaGeoSPARQLSUT-ExperimentWorkload/02-SC3_Relaxed_Geometries_Intersect_Geometries-warm-long
188432 [main] INFO  GenericExprerimentResultsCollector  - Cache COLD
188432 [main] INFO  GenericExprerimentResultsCollector  - 	Query 0
188433 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 0	 9552778 + 4132877922 = 4142430700 nsecs, 80500 results, 0 scan errors
188433 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 1	 672930 + 1880973419 = 1881646349 nsecs, 80500 results, 0 scan errors
188433 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 2	 472974 + 1846138520 = 1846611494 nsecs, 80500 results, 0 scan errors
188433 [main] INFO  GenericExprerimentResultsCollector  - 	Query 1
188433 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 0	 1225036 + 4561265290 = 4562490326 nsecs, 813 results, 0 scan errors
188433 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 1	 1449089 + 4030943534 = 4032392623 nsecs, 813 results, 0 scan errors
188433 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 2	 306012 + 3933005699 = 3933311711 nsecs, 813 results, 0 scan errors
188434 [main] INFO  GenericExprerimentResultsCollector  - 	Query 2
188434 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 0	 507441 + 4277727968 = 4278235409 nsecs, 239 results, 0 scan errors
188434 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 1	 266341 + 3952868847 = 3953135188 nsecs, 239 results, 0 scan errors
188434 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 2	 942322 + 3985496877 = 3986439199 nsecs, 239 results, 0 scan errors
188434 [main] INFO  GenericExprerimentResultsCollector  - Cache WARM
188434 [main] INFO  GenericExprerimentResultsCollector  - 	Query 0
188434 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 0	 283616 + 396030572 = 396314188 nsecs, 80500 results, 0 scan errors
188434 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 1	 279001 + 377293492 = 377572493 nsecs, 80500 results, 0 scan errors
188434 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 2	 266834 + 380899139 = 381165973 nsecs, 80500 results, 0 scan errors
188434 [main] INFO  GenericExprerimentResultsCollector  - 	Query 1
188435 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 0	 277018 + 3029702564 = 3029979582 nsecs, 813 results, 0 scan errors
188435 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 1	 326171 + 3018712786 = 3019038957 nsecs, 813 results, 0 scan errors
188435 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 2	 294436 + 3169451975 = 3169746411 nsecs, 813 results, 0 scan errors
188435 [main] INFO  GenericExprerimentResultsCollector  - 	Query 2
188435 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 0	 276563 + 3060291695 = 3060568258 nsecs, 239 results, 0 scan errors
188435 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 1	 279986 + 3196104160 = 3196384146 nsecs, 239 results, 0 scan errors
188435 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 2	 266747 + 3032419928 = 3032686675 nsecs, 239 results, 0 scan errors
188435 [main] INFO  RunJenaGeoSPARQLExperimentWorkload  - End ScalabilityFunc
Start time = Sun Aug 11 11:32:54 UTC 2024
End time = Sun Aug 11 11:36:02 UTC 2024
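The Start time and End time lines give the wall-clock duration of the last experiment run. A minimal sketch of that arithmetic follows, assuming GNU date is available (BSD and macOS date use different flags); elapsed_seconds is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Elapsed seconds between two timestamps in the format printed above.
# Relies on the -d option of GNU date to parse the timestamp strings.
elapsed_seconds() {
    local start end
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "$2" +%s) || return 1
    echo $(( end - start ))
}

elapsed_seconds "Sun Aug 11 11:32:54 UTC 2024" "Sun Aug 11 11:36:02 UTC 2024"
# prints 188
```

The result of 188 seconds agrees with the leading counter of the final log lines (188435), if that counter is read as milliseconds since the program started, which is an assumption about the log layout.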

Verify that the repositories have been created

root@NUC8i7BEH:/data# tree -L 1 RDF4J_3.7.7_Repos/server/repositories/
RDF4J_3.7.7_Repos/server/repositories/
|-- scalability_100K
|-- scalability_10K
`-- scalability_1M

root@NUC8i7BEH:/data# tree -L 1 graphdb-free-9.11.1/data/repositories/
graphdb-free-9.11.1/data/repositories/
|-- scalability_100K
|-- scalability_10K
`-- scalability_1M

root@NUC8i7BEH:/data# tree -L 1 JenaGeoSPARQL_3.17.0_Repos/           
JenaGeoSPARQL_3.17.0_Repos/
|-- scalability_100K
|-- scalability_10K
`-- scalability_1M
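The three listings above can also be checked in one step. The sketch below assumes the directory layout shown in those listings; check_repos is a hypothetical helper, not part of the GeoRDFBench scripts.

```shell
#!/usr/bin/env bash
# Check that every given base directory contains the three expected
# scalability repositories. Prints each missing path and returns
# non-zero if at least one repository directory is absent.
check_repos() {
    local base name missing=0
    for base in "$@"; do
        for name in scalability_10K scalability_100K scalability_1M; do
            if [ ! -d "$base/$name" ]; then
                echo "missing: $base/$name"
                missing=1
            fi
        done
    done
    return "$missing"
}

# Intended use from /data inside the container:
#   check_repos RDF4J_3.7.7_Repos/server/repositories \
#               graphdb-free-9.11.1/data/repositories \
#               JenaGeoSPARQL_3.17.0_Repos && echo "all 9 repositories present"
```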

Verify that the repository creation logs have been generated

root@NUC8i7BEH:/data# ls -lsa geordfbench/RDF4JSUT/scripts/CreateRepos/*.log
8 -rw-r--r-- 1 root root 5247 Aug 11 10:29 geordfbench/RDF4JSUT/scripts/CreateRepos/logCreateRepos_Scal10K_1M_RDF4J.log
root@NUC8i7BEH:/data# ls -lsa geordfbench/GraphDBSUT/scripts/CreateRepos/*.log
32 -rw-r--r-- 1 root root 31532 Aug 11 10:43 geordfbench/GraphDBSUT/scripts/CreateRepos/logCreateRepo_Scal10K_1M_GraphDB.log
root@NUC8i7BEH:/data# ls -lsa geordfbench/JenaGeoSPARQLSUT/scripts/CreateRepos/*.log
8 -rw-r--r-- 1 root root 6345 Aug 11 11:28 geordfbench/JenaGeoSPARQLSUT/scripts/CreateRepos/logCreateRepo_Scal10K_1M_JenaGeoSPARQL.log

Verify that the experiment logs have been generated

Experiment run logs can be quite long, so only the log files themselves are listed here. Each file can be opened to view the details of the corresponding execution.

root@NUC8i7BEH:/data# ls -lsa geordfbench/RDF4JSUT/scripts/RunTests3/*.log
76 -rw-r--r-- 1 root root 73065 Nov 29 20:39 geordfbench/RDF4JSUT/scripts/RunTests3/RunWLRDF4JExp_Scal100K.log
76 -rw-r--r-- 1 root root 70318 Nov 29 20:35 geordfbench/RDF4JSUT/scripts/RunTests3/RunWLRDF4JExp_Scal10K.log
76 -rw-r--r-- 1 root root 73128 Nov 29 20:48 geordfbench/RDF4JSUT/scripts/RunTests3/RunWLRDF4JExp_Scal1M.log
root@NUC8i7BEH:/data# ls -lsa geordfbench/GraphDBSUT/scripts/RunTests3/*.log
84 -rw-r--r-- 1 root root 81535 Nov 29 21:01 geordfbench/GraphDBSUT/scripts/RunTests3/RunWLGraphDBExp_Scal100K.log
84 -rw-r--r-- 1 root root 78751 Nov 29 20:52 geordfbench/GraphDBSUT/scripts/RunTests3/RunWLGraphDBExp_Scal10K.log
84 -rw-r--r-- 1 root root 81893 Nov 29 21:52 geordfbench/GraphDBSUT/scripts/RunTests3/RunWLGraphDBExp_Scal1M.log
root@NUC8i7BEH:/data# ls -lsa geordfbench/JenaGeoSPARQLSUT/scripts/RunTests3/*.log
80 -rw-r--r-- 1 root root 74404 Nov 29 21:57 geordfbench/JenaGeoSPARQLSUT/scripts/RunTests3/RunWLJenaGeoSPARQLExp_Scal100K.log
76 -rw-r--r-- 1 root root 71780 Nov 29 21:55 geordfbench/JenaGeoSPARQLSUT/scripts/RunTests3/RunWLJenaGeoSPARQLExp_Scal10K.log
80 -rw-r--r-- 1 root root 74489 Nov 29 22:01 geordfbench/JenaGeoSPARQLSUT/scripts/RunTests3/RunWLJenaGeoSPARQLExp_Scal1M.log

Verify that the experiment results have been generated in the Default Location

Experiment results are stored by default in the file system. For this demonstration, the base location for all results was /data/Results_Store. The output of the following command has been edited so that only the last entry is fully expanded, using the output of tree -L 4 Results_Store/RDF4JSUT/2024-08-11_RDF4JSUT_RunWL_Scal1M/ :

root@NUC8i7BEH:/data# tree -L 2 Results_Store/
Results_Store/
|-- GraphDBSUT
|   |-- 2024-08-11_GraphDBSUT_RunWL_Scal100K
|   |-- 2024-08-11_GraphDBSUT_RunWL_Scal10K
|   `-- 2024-08-11_GraphDBSUT_RunWL_Scal1M
|-- JenaGeoSPARQLSUT
|   |-- 2024-08-11_JenaGeoSPARQL_RunWL_Scal100K
|   |-- 2024-08-11_JenaGeoSPARQL_RunWL_Scal10K
|   `-- 2024-08-11_JenaGeoSPARQL_RunWL_Scal1M
`-- RDF4JSUT
    |-- 2024-08-11_RDF4JSUT_RunWL_Scal100K
    |-- 2024-08-11_RDF4JSUT_RunWL_Scal10K
    `-- 2024-08-11_RDF4JSUT_RunWL_Scal1M
        `-- Scalability
            `-- 1M
                `-- RDF4JSUT-ExperimentWorkload
                    |-- 00-SC1_Geometries_Intersects_GivenPolygon-cold
                    |-- 00-SC1_Geometries_Intersects_GivenPolygon-cold-long
                    |-- 00-SC1_Geometries_Intersects_GivenPolygon-warm
                    |-- 00-SC1_Geometries_Intersects_GivenPolygon-warm-long
                    |-- 01-SC2_Intensive_Geometries_Intersect_Geometries-cold
                    |-- 01-SC2_Intensive_Geometries_Intersect_Geometries-cold-long
                    |-- 01-SC2_Intensive_Geometries_Intersect_Geometries-warm
                    |-- 01-SC2_Intensive_Geometries_Intersect_Geometries-warm-long
                    |-- 02-SC3_Relaxed_Geometries_Intersect_Geometries-cold
                    |-- 02-SC3_Relaxed_Geometries_Intersect_Geometries-cold-long
                    |-- 02-SC3_Relaxed_Geometries_Intersect_Geometries-warm
                    `-- 02-SC3_Relaxed_Geometries_Intersect_Geometries-warm-long

root@NUC8i7BEH:/data# more Results_Store/RDF4JSUT/2024-08-11_RDF4JSUT_RunWL_Scal1M/Scalability/1M/RDF4JSUT-ExperimentWorkload/00-SC1_Geometries_Intersects_GivenPolygon-cold
80500 4165400820
root@NUC8i7BEH:/data# more Results_Store/RDF4JSUT/2024-08-11_RDF4JSUT_RunWL_Scal1M/Scalability/1M/RDF4JSUT-ExperimentWorkload/00-SC1_Geometries_Intersects_GivenPolygon-cold-long 
80500 34588783 7085045629 7119634412
80500 2422466 4162978354 4165400820
80500 5623270 3946238038 3951861308

For each query and execution type (warm, cold) there are two files, a short and a long version. The long version has 4 columns (noOfResults, evaluationTime, scanTime, totalTime) and one row for each execution iteration performed. The short version has 2 columns (noOfResults, totalTime) and a single row, which holds the average or median totalTime of the execution iterations from the long version. In the listing above, the short file value 4165400820 is the median of the three totalTime values in the long file. All times are in nanoseconds.
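The relationship between the two file versions can be recomputed from the long file. The sketch below assumes the four-column layout described above and that the median is the statistic in use; median_total_time and check_row_sums are hypothetical helper names, and the sample rows are the ones printed in the listing above.

```shell
#!/usr/bin/env bash
# Median of the totalTime column (4th) of a "-long" result file, in nanoseconds.
# For an even number of rows the lower of the two middle values is returned,
# a simplification that avoids fractional nanoseconds.
median_total_time() {
    awk '{ print $4 }' "$1" | sort -n | awk '
        { v[NR] = $1 }
        END { if (NR > 0) print v[int((NR + 1) / 2)] }'
}

# Succeeds only if evaluationTime + scanTime equals totalTime on every row.
check_row_sums() {
    awk '$2 + $3 != $4 { bad = 1 } END { exit bad + 0 }' "$1"
}

# Sample data copied from the cold-long file shown above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
80500 34588783 7085045629 7119634412
80500 2422466 4162978354 4165400820
80500 5623270 3946238038 3951861308
EOF

median_total_time "$sample"    # prints 4165400820, the value in the short file
check_row_sums "$sample" && echo "row sums consistent"
rm -f "$sample"
```

Inside the container, the same functions can be pointed at any file ending in -long under /data/Results_Store.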

Verify that the experiment results have been generated in the PostgreSQL database

Experiment results are also stored in a custom location, a PostgreSQL database. The details of each experiment are recorded, with a unique ID, in a row of the EXPERIMENTS table. The details of each query execution iteration are recorded in a similar manner in the QUERYEXECUTIONS table. A set of views aggregates totalTime and calculates its average and median value for each query and execution type (warm, cold). All times are in milliseconds. The figures below are snapshots from the pgAdmin 4 interface.

Fig.1 - Experiment entries

Fig.2 - Experiment aggregate results

An example bar chart with the median results of the 3 experiments with the Scalability-100K workload is shown below:

Fig.3 - Scalability 100K chart

The above chart clearly shows that, for the Scalability 100K workload, JenaGeoSPARQL performed best, followed by RDF4J. The chart also shows that for RDF4J and GraphDB the COLD and WARM cache response times are almost identical. This is attributed to the fact that clearing the caches failed inside the container, as mentioned in the Caution section of this page, which effectively makes all runs behave as WARM. JenaGeoSPARQL, on the other hand, is not affected by this, because indexing and caching of spatial objects and relations is performed on demand during query execution.

Check if Accuracy checking was performed

For the Scalability-10K workload we used a modified copy of the default generated JSON specification and created a Gold Standard. The differences between the two files are minimal and are shown in the following diff output:

root@NUC8i7BEH:/data/geordfbench/json_defs/workloads# diff scalabilityFunc10K_WLoriginal_GOLD_STANDARD.json scalabilityFunc10K_WLoriginal.json
45c45
<         "expectedResults" : 554
---
>         "expectedResults" : -1
51c51
<         "expectedResults" : 2
---
>         "expectedResults" : -1
57c57
<         "expectedResults" : 2
---
>         "expectedResults" : -1
79c79
< }
---

We replaced the -1 value with the expected number of results for each of the queries for the specific dataset of the Scalability-10K workload. To verify that accuracy checking took place, we can search for the ACCURATE token in the RDF4J experiment log file /data/geordfbench/RDF4JSUT/scripts/RunTests3/RunWLRDF4JExp_Scal10K.log and confirm that it is found 18 times (3 queries * 3 COLD repetitions + 3 queries * 3 WARM repetitions). The same holds for the corresponding GraphDB and JenaGeoSPARQL logs for the Scalability-10K workload. For the non-Gold-Standard workloads, Scalability-{100K | 1M}, the response is different: we receive ACCURACY NOT DETERMINED, indicating that no expected result value was provided for the queries in those experiments.
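One way to perform this token count is sketched below with standard grep options. The helper name count_accurate is hypothetical, and the exact layout of the log lines is not assumed beyond the presence of the ACCURATE token.

```shell
#!/usr/bin/env bash
# Count occurrences of the whole word ACCURATE in a log file.
# grep -o prints every match on its own line, so several matches on one
# line are all counted; -w keeps longer words from matching.
count_accurate() {
    grep -ow 'ACCURATE' "$1" | wc -l | tr -d ' '
}

# Intended use inside the container (the text above expects 18):
#   count_accurate /data/geordfbench/RDF4JSUT/scripts/RunTests3/RunWLRDF4JExp_Scal10K.log
```

Because ACCURACY NOT DETERMINED does not contain the word ACCURATE, the same function should report 0 for the Scalability-100K and Scalability-1M logs if they contain only that response.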

Explanation of what happened?

The more interested user can look at the simple Bash script, /data/startUpScript.sh, which launched the experiments. The actions taken by this script are: