Docker Container Example: Scalability 10K Workload with RDF4J

By Theofilos Ioannidis (tioannid [at] di [dot] uoa [dot] gr)


Caution

Containers are not ideal for benchmarking of the kind the GeoRDFBench Framework performs, because they do not allow clearing the system caches. The reason for this is that:

Therefore, in the following example, although the user can verify that the experiments run properly and results are correctly calculated and reported, the COLD cache response times will not be accurate. However, for experiments that do not require COLD cache response time measurements, e.g., macro benchmark scenarios, response times should be accurate enough for drawing basic conclusions.
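
As a quick way to see the limitation, the sketch below checks whether the usual Linux cache-dropping control file is writable in the current environment. It assumes the standard path /proc/sys/vm/drop_caches; whether GeoRDFBench itself uses this exact mechanism is an assumption here, and the function name can_drop_caches is hypothetical. On a bare host the file is normally writable only by root, while in a default (unprivileged) Docker container /proc/sys is typically mounted read-only.

```shell
# Report whether the cache-dropping control file is writable here.
# The path can be overridden, e.g. to try the function on an ordinary file.
can_drop_caches() {
  local ctl="${1:-/proc/sys/vm/drop_caches}"
  if [ -w "$ctl" ]; then
    echo "writable: $ctl"
  else
    echo "not writable: $ctl"
    return 1
  fi
}
```

Calling can_drop_caches as root on the host and again inside the container should make the difference visible.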

Key Features

This example features:

Docker Image

The geordfbench_nuc_rdf4j.zip file is a zip archive that contains a docker image. When run, the image creates a container which executes the Scalability-10K workload with RDF4J on the NUC8i7BEH host (for simplicity, in this example we do not allocate memory, CPUs, an IP address or a hostname for the container). We assume that the archive has been downloaded to ~/Downloads/geordfbench_nuc_rdf4j.zip. We then uncompress it in /data, build the image and run the container:

/data$ unzip ~/Downloads/geordfbench_nuc_rdf4j.zip; cd geordfbench_nuc_rdf4j
/data/geordfbench_nuc_rdf4j$ docker build -t geordfbench_nuc_rdf4j .
/data/geordfbench_nuc_rdf4j$ docker run -it --name rdf4jscal10k geordfbench_nuc_rdf4j
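
If you do want to allocate resources to the container, a minimal sketch could look like the following. All values (16g, 4 CPUs, the hostname, the network name benchnet, the subnet and the IP address) are hypothetical and should be adapted to your host; the function name run_with_limits and the DOCKER override variable are illustrative only. A fixed --ip requires a user-defined network with a subnet, which is why one is created first.

```shell
# Set DOCKER=echo to preview the commands without running them.
DOCKER="${DOCKER:-docker}"

run_with_limits() {
  # Hypothetical user-defined network, needed for a static container IP.
  "$DOCKER" network create --subnet=172.28.0.0/16 benchnet
  # Same image and container name as above, plus hypothetical limits.
  "$DOCKER" run -it --name rdf4jscal10k \
    --memory=16g --cpus=4 \
    --hostname=geordfbench-rdf4j \
    --network=benchnet --ip=172.28.0.10 \
    geordfbench_nuc_rdf4j
}
```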

The default terminal will act as a log window and after some time the experiment will end with:

...
142034 [main] INFO  GenericExprerimentResultsCollector  - Cache COLD
142035 [main] INFO  GenericExprerimentResultsCollector  - 	Query 0
142035 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 0	 24420555 + 484377200 = 508797755 nsecs, 554 results, 0 scan errors
142035 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 1	 1410756 + 214240542 = 215651298 nsecs, 554 results, 0 scan errors
142035 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 2	 7386783 + 164673770 = 172060553 nsecs, 554 results, 0 scan errors
142036 [main] INFO  GenericExprerimentResultsCollector  - 	Query 1
142036 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 0	 9819585 + 224390970 = 234210555 nsecs, 2 results, 0 scan errors
142036 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 1	 18176936 + 184655973 = 202832909 nsecs, 2 results, 0 scan errors
142036 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 2	 4340932 + 140167593 = 144508525 nsecs, 2 results, 0 scan errors
142037 [main] INFO  GenericExprerimentResultsCollector  - 	Query 2
142037 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 0	 4651682 + 187169885 = 191821567 nsecs, 2 results, 0 scan errors
142037 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 1	 2443206 + 196298644 = 198741850 nsecs, 2 results, 0 scan errors
142037 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 2	 14501298 + 169481425 = 183982723 nsecs, 2 results, 0 scan errors
142037 [main] INFO  GenericExprerimentResultsCollector  - Cache WARM
142037 [main] INFO  GenericExprerimentResultsCollector  - 	Query 0
142038 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 0	 433401 + 149580823 = 150014224 nsecs, 554 results, 0 scan errors
142038 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 1	 489047 + 136297258 = 136786305 nsecs, 554 results, 0 scan errors
142038 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 2	 399087 + 130589982 = 130989069 nsecs, 554 results, 0 scan errors
142038 [main] INFO  GenericExprerimentResultsCollector  - 	Query 1
142038 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 0	 1123667 + 122082444 = 123206111 nsecs, 2 results, 0 scan errors
142038 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 1	 1208589 + 117647937 = 118856526 nsecs, 2 results, 0 scan errors
142039 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 2	 1034712 + 114634712 = 115669424 nsecs, 2 results, 0 scan errors
142039 [main] INFO  GenericExprerimentResultsCollector  - 	Query 2
142039 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 0	 1239245 + 124528086 = 125767331 nsecs, 2 results, 0 scan errors
142039 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 1	 1217343 + 123254940 = 124472283 nsecs, 2 results, 0 scan errors
142039 [main] INFO  GenericExprerimentResultsCollector  - 		Rep 2	 875293 + 114946356 = 115821649 nsecs, 2 results, 0 scan errors
142039 [main] INFO  RunRDF4JExperimentWorkload  - End ScalabilityFunc
Start time = Sat Aug 10 15:43:35 UTC 2024
End time = Sat Aug 10 15:45:58 UTC 2024
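
Each "Rep" line reports two partial times and their total in nanoseconds. To relate the log to the per-query aggregates in seconds, the following sketch summarises a saved copy of the log with awk. It assumes the whitespace-separated layout shown above (the total is the twelfth field); the function name summarise_reps and the log file name are hypothetical.

```shell
# Print mean and median total time (in seconds) per cache type and query.
summarise_reps() {
  awk '
    $6 == "Cache" { cache = $7 }            # "Cache COLD" or "Cache WARM"
    $6 == "Query" { query = $7 }            # "Query 0", "Query 1", ...
    $6 == "Rep" {
      key = cache " " query
      if (!(key in n)) order[++k] = key     # remember first-seen order
      n[key]++
      t[key, n[key]] = $12 + 0              # total nanoseconds of this repetition
      sum[key] += $12
    }
    END {
      for (i = 1; i <= k; i++) {
        key = order[i]; c = n[key]
        for (a = 1; a <= c; a++) v[a] = t[key, a]
        # insertion sort of the c totals
        for (a = 2; a <= c; a++) {
          x = v[a]; b = a - 1
          while (b >= 1 && v[b] > x) { v[b + 1] = v[b]; b-- }
          v[b + 1] = x
        }
        med = (c % 2) ? v[(c + 1) / 2] : (v[c / 2] + v[c / 2 + 1]) / 2
        printf "%s mean=%.3f median=%.3f\n", key, sum[key] / c / 1e9, med / 1e9
      }
    }
  ' "$1"
}
```

For example, summarise_reps experiment.log, where experiment.log is a saved copy of the terminal output.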

From another terminal we can connect to the rdf4jscal10k container and check the results in the geographica3 database in PostgreSQL:

/data$ docker exec -it rdf4jscal10k /bin/bash
root@0e8125946708:/data# su postgres
postgres@0e8125946708:/data$ psql
psql (14.12 (Ubuntu 14.12-0ubuntu0.22.04.1))
Type "help" for help.

postgres=# \l
                                 List of databases
     Name     |    Owner     | Encoding | Collate |  Ctype  |   Access privileges   
--------------+--------------+----------+---------+---------+-----------------------
 geographica3 | geographica3 | UTF8     | C.UTF-8 | C.UTF-8 | 
 postgres     | postgres     | UTF8     | C.UTF-8 | C.UTF-8 | 
 template0    | postgres     | UTF8     | C.UTF-8 | C.UTF-8 | =c/postgres          +
              |              |          |         |         | postgres=CTc/postgres
 template1    | postgres     | UTF8     | C.UTF-8 | C.UTF-8 | =c/postgres          +
              |              |          |         |         | postgres=CTc/postgres
(4 rows)

postgres=# \c geographica3
You are now connected to database "geographica3" as user "postgres".
geographica3=# \d+
                                                      List of relations
 Schema |               Name               |   Type   |    Owner     | Persistence | Access method |    Size    | Description 
--------+----------------------------------+----------+--------------+-------------+---------------+------------+-------------
 public | EXPERIMENT                       | table    | geographica3 | permanent   | heap          | 8192 bytes | 
 public | EXPERIMENT_id_seq                | sequence | geographica3 | permanent   |               | 8192 bytes | 
 public | QUERYEXECUTION                   | table    | geographica3 | permanent   | heap          | 0 bytes    | 
 public | QUERYEXECUTION_experiment_id_seq | sequence | geographica3 | permanent   |               | 8192 bytes | 
 public | QUERYEXECUTION_id_seq            | sequence | geographica3 | permanent   |               | 8192 bytes | 
 public | vquery_ordered_aggrs             | view     | postgres     | permanent   |               | 0 bytes    | 
 public | vquery_ordered_aggrs2            | view     | postgres     | permanent   |               | 0 bytes    | 
 public | vquery_ordered_aggrs_3           | view     | postgres     | permanent   |               | 0 bytes    | 
 public | vqueryexecution                  | view     | postgres     | permanent   |               | 0 bytes    | 
 public | vqueryexecution2                 | view     | postgres     | permanent   |               | 0 bytes    | 
 public | vqueryexecution3                 | view     | postgres     | permanent   |               | 0 bytes    | 
 public | vreport                          | view     | postgres     | permanent   |               | 0 bytes    | 
(12 rows)

geographica3=# select * from "EXPERIMENT";
 id |          instime           |          exectime          |            description            |                                    host                                     |               os               |   sut    |    queryset     |     dataset     |                                                 executionspec                                                 |    reportspec    |      type       
----+----------------------------+----------------------------+-----------------------------------+-----------------------------------------------------------------------------+--------------------------------+----------+-----------------+-----------------+---------------------------------------------------------------------------------------------------------------+------------------+-----------------
 1  | 2024-08-10 18:43:41.942+03 | 2024-08-10 18:43:41.923+03 | 2024-08-09_RDF4JSUT_RunWL_Scal10K | SimpleHost{ NUC8i7BEH, 192.168.1.44, 32GB, GenericLinuxOS{ Ubuntu-jammy } } | GenericLinuxOS{ Ubuntu-jammy } | RDF4JSUT | scalabilityFunc | scalability_10K | SimpleES{ COLD=3, WARM=3, action=RUN, maxduration=604800 secs, repmaxduration=86400 secs, func=QUERY_MEDIAN } | SimpleReportSpec | ScalabilityFunc
(1 row)

geographica3=# select * from vquery_ordered_aggrs;
 experiment_id | query_no | cache_type | no_iterations | mean  | median 
---------------+----------+------------+---------------+-------+--------
            92 |        0 | COLD       |             3 | 0.299 |  0.216
            92 |        0 | WARM       |             3 | 0.139 |  0.137
            92 |        1 | COLD       |             3 | 0.194 |  0.203
            92 |        1 | WARM       |             3 | 0.119 |  0.119
            92 |        2 | COLD       |             3 | 0.192 |  0.192
            92 |        2 | WARM       |             3 | 0.122 |  0.124
(6 rows)
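
The same view can also be queried without an interactive session. The sketch below assumes that the container is still running, that the postgres OS user can connect locally as in the session above, and that the psql version supports --csv (available since PostgreSQL 12). The function name export_aggrs and the DOCKER override variable are hypothetical.

```shell
# Set DOCKER=echo to preview the command without running it.
DOCKER="${DOCKER:-docker}"

export_aggrs() {
  local container="${1:-rdf4jscal10k}"
  # Run psql as the postgres user inside the container and emit CSV.
  "$DOCKER" exec -u postgres "$container" \
    psql -d geographica3 --csv -c 'select * from vquery_ordered_aggrs;'
}
```

Redirecting the output, e.g. export_aggrs > aggrs.csv, keeps a copy of the aggregates on the host.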

Afterwards we can verify the result files generated in the filesystem:

geographica3=# \q
postgres@f9a1d01d4750:/data$ exit
exit
root@f9a1d01d4750:/data# tree /data/Results_Store
/data/Results_Store
`-- RDF4JSUT
    `-- 2024-08-10_RDF4JSUT_RunWL_Scal10K
        `-- Scalability
            `-- 10K
                `-- RDF4JSUT-ExperimentWorkload
                    |-- 00-SC1_Geometries_Intersects_GivenPolygon-cold
                    |-- 00-SC1_Geometries_Intersects_GivenPolygon-cold-long
                    |-- 00-SC1_Geometries_Intersects_GivenPolygon-warm
                    |-- 00-SC1_Geometries_Intersects_GivenPolygon-warm-long
                    |-- 01-SC2_Intensive_Geometries_Intersect_Geometries-cold
                    |-- 01-SC2_Intensive_Geometries_Intersect_Geometries-cold-long
                    |-- 01-SC2_Intensive_Geometries_Intersect_Geometries-warm
                    |-- 01-SC2_Intensive_Geometries_Intersect_Geometries-warm-long
                    |-- 02-SC3_Relaxed_Geometries_Intersect_Geometries-cold
                    |-- 02-SC3_Relaxed_Geometries_Intersect_Geometries-cold-long
                    |-- 02-SC3_Relaxed_Geometries_Intersect_Geometries-warm
                    `-- 02-SC3_Relaxed_Geometries_Intersect_Geometries-warm-long

5 directories, 12 files
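
Since the result files live inside the container, you may want to copy them to the host before the container is removed. A minimal sketch using docker cp follows; the destination directory, the function name copy_results and the DOCKER override variable are hypothetical.

```shell
# Set DOCKER=echo to preview the command without running it.
DOCKER="${DOCKER:-docker}"

copy_results() {
  local container="${1:-rdf4jscal10k}"
  local dest="${2:-./Results_Store}"
  # docker cp also works on a stopped container.
  "$DOCKER" cp "$container:/data/Results_Store" "$dest"
}
```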

When you are done with the docker container, you can check its status and remove it with:

/data$ docker ps -a
CONTAINER ID   IMAGE                   COMMAND                  CREATED         STATUS                       PORTS     NAMES
b8753e0265ac   geordfbench_nuc_rdf4j   "/bin/sh -c '/data/s…"   5 minutes ago   Exited (130) 4 seconds ago             rdf4jscal10k
/data$ docker rm -f rdf4jscal10k
rdf4jscal10k

Explanation of what happened

Interested users can look at the simple Bash script /data/startUpScript.sh, which is the entry point of the docker description file. The actions it takes are: