Docker Container Example: Scalability 10K Workload with RDF4J

By Theofilos Ioannidis (tioannid [at] di [dot] uoa [dot] gr)


Caution

Containers are not ideal for benchmarking of the kind the GeoRDFBench Framework performs, because they do not allow clearing system caches. The reason for this is that:

Therefore, in the following example, the user can verify that the experiments run properly and that the results are correctly calculated and reported, but the COLD cache response times will not be accurate. For experiments that do not require COLD cache response time measurements, e.g., macro benchmark scenarios, response times should be accurate enough for drawing basic conclusions.
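
On a Linux host, a common way to clear the page cache before a COLD run is to write to /proc/sys/vm/drop_caches as root. The sketch below only checks whether that file is writable in the current environment. In a default (unprivileged) Docker container /proc/sys is normally mounted read-only, so the check is expected to fail there. Whether GeoRDFBench uses exactly this mechanism is an assumption, and can_drop_caches is a hypothetical helper name.

```shell
# Sketch: report whether this environment permits dropping the Linux page cache.
# The optional argument overrides the standard kernel interface path; it exists
# only so the check can be tried against an ordinary file.
can_drop_caches() {
    [ -w "${1:-/proc/sys/vm/drop_caches}" ]
}

if can_drop_caches; then
    echo "drop_caches is writable: the page cache can be cleared before COLD runs"
else
    echo "drop_caches is not writable: COLD cache timings will not be reliable"
fi
```

Running the same check on the host as root and inside the container gives a quick indication of whether COLD measurements from that environment can be trusted.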

Key Features

This example features:

Docker Image

The geordfbench_nuc_rdf4j.zip archive contains everything needed to build the Docker image. When run, the image creates a container that executes the Scalability-10K workload with RDF4J on the NUC8i7BEH host (for simplicity, this example does not allocate memory, CPUs, an IP address or a hostname to the container). We assume that the archive is available as ~/Downloads/geordfbench_nuc_rdf4j.zip. We uncompress it in /data, build the image and start a container named test:

/data$ unzip ~/Downloads/geordfbench_nuc_rdf4j.zip
/data$ cd geordfbench_nuc_rdf4j
/data/geordfbench_nuc_rdf4j$ docker build -t geordfbench_nuc_rdf4j .
/data/geordfbench_nuc_rdf4j$ docker run -it --name test geordfbench_nuc_rdf4j /bin/bash

The terminal that started the container acts as a log window, and after some time the experiment ends with:

...
144431 [main] INFO  GenericExprerimentResultsCollector  - Cache COLD
144431 [main] INFO  GenericExprerimentResultsCollector  - 	Query 0
144431 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 0	 28734018 + 693243213 = 721977231 nsecs, 554 results, 0 scan errors
144431 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 1	 4034276 + 309748342 = 313782618 nsecs, 554 results, 0 scan errors
144431 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 2	 3005777 + 243356481 = 246362258 nsecs, 554 results, 0 scan errors
144431 [main] INFO  GenericExprerimentResultsCollector  - 	Query 1
144432 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 0	 17586203 + 239027786 = 256613989 nsecs, 2 results, 0 scan errors
144432 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 1	 2819656 + 154320599 = 157140255 nsecs, 2 results, 0 scan errors
144432 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 2	 5752366 + 166884988 = 172637354 nsecs, 2 results, 0 scan errors
144432 [main] INFO  GenericExprerimentResultsCollector  - 	Query 2
144432 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 0	 2609175 + 155153038 = 157762213 nsecs, 2 results, 0 scan errors
144432 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 1	 2548615 + 154570164 = 157118779 nsecs, 2 results, 0 scan errors
144432 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 2	 3796632 + 149082904 = 152879536 nsecs, 2 results, 0 scan errors
144432 [main] INFO  GenericExprerimentResultsCollector  - Cache WARM
144432 [main] INFO  GenericExprerimentResultsCollector  - 	Query 0
144432 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 0	 520773 + 220539046 = 221059819 nsecs, 554 results, 0 scan errors
144432 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 1	 423052 + 173751455 = 174174507 nsecs, 554 results, 0 scan errors
144432 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 2	 498243 + 185245195 = 185743438 nsecs, 554 results, 0 scan errors
144432 [main] INFO  GenericExprerimentResultsCollector  - 	Query 1
144432 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 0	 1227311 + 121353249 = 122580560 nsecs, 2 results, 0 scan errors
144432 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 1	 1387298 + 116242473 = 117629771 nsecs, 2 results, 0 scan errors
144432 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 2	 1145577 + 120575113 = 121720690 nsecs, 2 results, 0 scan errors
144432 [main] INFO  GenericExprerimentResultsCollector  - 	Query 2
144432 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 0	 1015724 + 137564542 = 138580266 nsecs, 2 results, 0 scan errors
144432 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 1	 1135437 + 145388424 = 146523861 nsecs, 2 results, 0 scan errors
144432 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 2	 963440 + 144746807 = 145710247 nsecs, 2 results, 0 scan errors
144432 [main] INFO  RunRDF4JExperimentWorkload  - End ScalabilityFunc
Start time = Fri Jun 23 20:45:21 UTC 2023
End time = Fri Jun 23 20:47:46 UTC 2023
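
Each Rep line in the log reports two time components and their sum in nanoseconds. As a rough cross-check of the aggregates stored in the database, the sketch below computes the mean and median total per cache type and query, in seconds. It assumes the whitespace-separated field layout of the log excerpt above (keyword in field 6, total in field 12), and summarize_log is a hypothetical helper name, not part of GeoRDFBench.

```shell
# summarize_log FILE
# Prints one line per (cache, query): "<cache> <query> <mean_secs> <median_secs>".
summarize_log() {
    awk '
    function emit(   i, j, t, sum, med) {
        if (n == 0) return
        # insertion sort of the totals collected for the current query
        for (i = 2; i <= n; i++) {
            t = v[i]
            for (j = i - 1; j >= 1 && v[j] > t; j--) v[j + 1] = v[j]
            v[j + 1] = t
        }
        sum = 0
        for (i = 1; i <= n; i++) sum += v[i]
        med = (n % 2) ? v[(n + 1) / 2] : (v[n / 2] + v[n / 2 + 1]) / 2
        printf "%s %s %.3f %.3f\n", cache, query, sum / n / 1e9, med / 1e9
        n = 0
    }
    $6 == "Cache" { emit(); cache = $7 }      # e.g. "Cache COLD"
    $6 == "Query" { emit(); query = $7 }      # e.g. "Query 0"
    $6 == "Rep"   { v[++n] = $12 + 0 }        # total nsecs of one repetition
    END           { emit() }
    ' "$1"
}
```

For the three COLD repetitions of Query 0 shown above, this yields a mean of 0.427 and a median of 0.314 seconds. Because the script rounds only at the end, other values may differ in the last digit from figures that were rounded at an earlier stage.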

From another terminal we can connect to the test container and check the results in the geographica3 database in PostgreSQL:

/data$  docker exec -it test /bin/bash
root@9185155c6a9c:/data# su postgres
postgres@9185155c6a9c:/data$ psql
psql (14.8 (Ubuntu 14.8-0ubuntu0.22.04.1))
Type "help" for help.

postgres=# \l
                                 List of databases
     Name     |    Owner     | Encoding | Collate |  Ctype  |   Access privileges   
--------------+--------------+----------+---------+---------+-----------------------
 geographica3 | geographica3 | UTF8     | C.UTF-8 | C.UTF-8 | 
 postgres     | postgres     | UTF8     | C.UTF-8 | C.UTF-8 | 
 template0    | postgres     | UTF8     | C.UTF-8 | C.UTF-8 | =c/postgres          +
              |              |          |         |         | postgres=CTc/postgres
 template1    | postgres     | UTF8     | C.UTF-8 | C.UTF-8 | =c/postgres          +
              |              |          |         |         | postgres=CTc/postgres
(4 rows)

postgres=# \c geographica3
You are now connected to database "geographica3" as user "postgres".
geographica3=# \d+
                                                      List of relations
 Schema |               Name               |   Type   |    Owner     | Persistence | Access method |    Size    | Description 
--------+----------------------------------+----------+--------------+-------------+---------------+------------+-------------
 public | EXPERIMENT                       | table    | geographica3 | permanent   | heap          | 16 kB      | 
 public | EXPERIMENT_id_seq                | sequence | geographica3 | permanent   |               | 8192 bytes | 
 public | QUERYEXECUTION                   | table    | geographica3 | permanent   | heap          | 8192 bytes | 
 public | QUERYEXECUTION_experiment_id_seq | sequence | geographica3 | permanent   |               | 8192 bytes | 
 public | QUERYEXECUTION_id_seq            | sequence | geographica3 | permanent   |               | 8192 bytes | 
 public | vquery_ordered_aggrs             | view     | postgres     | permanent   |               | 0 bytes    | 
 public | vquery_ordered_aggrs2            | view     | postgres     | permanent   |               | 0 bytes    | 
 public | vqueryexecution                  | view     | postgres     | permanent   |               | 0 bytes    | 
 public | vqueryexecution2                 | view     | postgres     | permanent   |               | 0 bytes    | 
(9 rows)

geographica3=# select * from "EXPERIMENT";
 id |          instime           |          exectime          |            description            |                                    host                                     |               os               | 
  sut    |    queryset     |     dataset     |                                                 executionspec                                                 |    reportspec    |      type       
----+----------------------------+----------------------------+-----------------------------------+-----------------------------------------------------------------------------+--------------------------------+-
---------+-----------------+-----------------+---------------------------------------------------------------------------------------------------------------+------------------+-----------------
  1 | 2023-06-23 20:45:31.172+00 | 2023-06-23 20:45:31.154+00 | 2023-08-05_RDF4JSUT_RunWL_Scal10K | SimpleHost{ NUC8i7BEH, 192.168.1.44, 32GB, GenericLinuxOS{ Ubuntu-jammy } } | GenericLinuxOS{ Ubuntu-jammy } | 
RDF4JSUT | scalabilityFunc | scalability_10K | SimpleES{ COLD=3, WARM=3, action=RUN, maxduration=604800 secs, repmaxduration=86400 secs, func=QUERY_MEDIAN } | SimpleReportSpec | ScalabilityFunc
(1 row)

geographica3=# select * from vquery_ordered_aggrs;
 experiment_id | query_no | cache_type | no_iterations | mean  | median 
---------------+----------+------------+---------------+-------+--------
             1 |        0 | COLD       |             3 | 0.427 |  0.314
             1 |        0 | WARM       |             3 | 0.194 |  0.186
             1 |        1 | COLD       |             3 | 0.196 |  0.173
             1 |        1 | WARM       |             3 | 0.121 |  0.122
             1 |        2 | COLD       |             3 | 0.156 |  0.157
             1 |        2 | WARM       |             3 | 0.144 |  0.146
(6 rows)

Afterwards we can verify the result files generated in the filesystem:

geographica3=# \q
postgres@f9a1d01d4750:/data$ exit
exit
root@f9a1d01d4750:/data# tree /data/Results_Store 
/data/Results_Store
`-- RDF4JSUT
    `-- 2023-08-05_RDF4JSUT_RunWL_Scal10K
        `-- Scalability
            `-- 10K
                `-- RDF4JSUT-ExperimentWorkload
                    |-- 00-SC1_Geometries_Intersects_GivenPolygon-cold
                    |-- 00-SC1_Geometries_Intersects_GivenPolygon-cold-long
                    |-- 00-SC1_Geometries_Intersects_GivenPolygon-warm
                    |-- 00-SC1_Geometries_Intersects_GivenPolygon-warm-long
                    |-- 01-SC2_Intensive_Geometries_Intersect_Geometries-cold
                    |-- 01-SC2_Intensive_Geometries_Intersect_Geometries-cold-long
                    |-- 01-SC2_Intensive_Geometries_Intersect_Geometries-warm
                    |-- 01-SC2_Intensive_Geometries_Intersect_Geometries-warm-long
                    |-- 02-SC3_Relaxed_Geometries_Intersect_Geometries-cold
                    |-- 02-SC3_Relaxed_Geometries_Intersect_Geometries-cold-long
                    |-- 02-SC3_Relaxed_Geometries_Intersect_Geometries-warm
                    `-- 02-SC3_Relaxed_Geometries_Intersect_Geometries-warm-long

5 directories, 12 files
root@f9a1d01d4750:/data# more /data/Results_Store/RDF4JSUT/2023-08-05_RDF4JSUT_RunWL_Scal10K/Scalability/10K/RDF4JSUT-ExperimentWorkload/00-SC1_Geometries_Intersects_GivenPolygon-cold
554 341338164
root@f9a1d01d4750:/data# more /data/Results_Store/RDF4JSUT/2023-08-05_RDF4JSUT_RunWL_Scal10K/Scalability/10K/RDF4JSUT-ExperimentWorkload/00-SC1_Geometries_Intersects_GivenPolygon-cold-long 
554 37442204 739450536 776892740
554 1730537 339607627 341338164
554 8631253 258019229 266650482
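
Judging from the sample above, each line of a -long file holds the result count, two time components and their sum for one repetition, while the short file holds the result count and the median total, which is consistent with func=QUERY_MEDIAN in the execution specification. A minimal sketch for checking this relationship follows; the column interpretation is inferred from the sample rather than from a format specification, and median_total is a hypothetical helper name.

```shell
# median_total LONG_FILE
# Prints the median of the 4th column (total nsecs per repetition).
# For an even number of repetitions it prints the lower of the two middle values.
median_total() {
    awk '{ print $4 }' "$1" \
        | sort -n \
        | awk '{ v[NR] = $1 } END { if (NR) print v[int((NR + 1) / 2)] }'
}
```

Applied to the 00-SC1_Geometries_Intersects_GivenPolygon-cold-long file shown above, the function prints 341338164, the value recorded in the corresponding -cold file.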

When you are done with the container, you can stop and remove it with:

/data$ docker ps -a
CONTAINER ID   IMAGE                   COMMAND                  CREATED          STATUS          PORTS      NAMES
f9a1d01d4750   geordfbench_nuc_rdf4j   "/bin/sh -c '/data/s…"   12 minutes ago   Up 12 minutes   5432/tcp   test
/data$ docker rm -f test
test       

Explanation of what happened

The more interested user can look at the simple Bash script, /data/startUpScript.sh, which is the entry point defined in the Dockerfile. The actions it takes are: