By Theofilos Ioannidis (tioannid [at] di [dot] uoa [dot] gr)
Containers are not ideal for benchmarking of the kind that the GeoRDFBench Framework performs, because they do not allow clearing system caches. The reason for this is that:
Therefore, in the following example, although the user can verify that the experiments run properly and results are correctly calculated and reported, the COLD cache response times will not be accurate. However, for experiments that do not require COLD cache response time measurements, e.g., macro benchmark scenarios, response times should be accurate enough for drawing basic conclusions.
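Whether caches can be cleared in a given environment can be checked before running an experiment. The following is a minimal sketch, assuming a default unprivileged Docker container, in which /proc/sys is mounted read-only so the kernel control file /proc/sys/vm/drop_caches cannot be written. The function name can_drop_caches and its optional path argument are hypothetical and are not part of GeoRDFBench.

```shell
#!/bin/sh
# Return 0 if the control file used to drop kernel caches is writable
# by the current user, non-zero otherwise.
# The optional path argument is hypothetical and exists only for testing.
can_drop_caches() {
  ctl="${1:-/proc/sys/vm/drop_caches}"
  [ -w "$ctl" ]
}

if can_drop_caches; then
  echo "drop_caches is writable: system caches can be cleared before COLD runs"
else
  echo "drop_caches is not writable: COLD cache response times will not be accurate"
fi
```

On a bare-metal host the same check normally succeeds only for root, which is why it is worth running it as the user that will launch the experiment.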
This example features:
The geordfbench_nuc_rdf4j.zip file is a zip archive that contains the files needed to build a Docker image. When run, the image creates a container that executes the Scalability-10K workload with RDF4J on the NUC8i7BEH host (for simplicity, this example does not allocate memory, CPUs, an IP address, or a hostname for the container). We assume that the archive has been downloaded to ~/Downloads/geordfbench_nuc_rdf4j.zip. We then uncompress it in /data, build the image, and run the container:
/data$ unzip ~/Downloads/geordfbench_nuc_rdf4j.zip; cd geordfbench_nuc_rdf4j
/data/geordfbench_nuc_rdf4j$ docker build -t geordfbench_nuc_rdf4j .
/data/geordfbench_nuc_rdf4j$ docker run -it --name rdf4jscal10k geordfbench_nuc_rdf4j
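The run command above leaves memory, CPUs, IP address, and hostname unset. If you want to set them, the standard docker run options --memory, --cpus, --hostname, --network, and --ip can be used; a static --ip requires a user-defined network with a subnet. The sketch below only prints the commands so they can be reviewed first. The network name, subnet, address, memory limit, and CPU count are all hypothetical placeholders, and whether the image depends on a particular hostname is an assumption.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
# All values below are hypothetical placeholders, not settings shipped with the image.
NET_NAME="geordfbench_net"          # hypothetical user-defined network (needed for --ip)
SUBNET="172.28.0.0/16"              # hypothetical subnet for that network
CONTAINER_IP="172.28.0.44"          # hypothetical static address inside SUBNET
MEMORY="16g"                        # hypothetical memory limit
CPUS="4"                            # hypothetical CPU limit
CONTAINER_HOSTNAME="NUC8i7BEH"      # assumption: reuse the host name from this example

# Print the command that creates the user-defined network.
print_network_cmd() {
  echo "docker network create --subnet $SUBNET $NET_NAME"
}

# Print the docker run command with resource and network options added.
print_run_cmd() {
  echo "docker run -it --name rdf4jscal10k" \
       "--memory $MEMORY --cpus $CPUS" \
       "--hostname $CONTAINER_HOSTNAME" \
       "--network $NET_NAME --ip $CONTAINER_IP" \
       "geordfbench_nuc_rdf4j"
}

print_network_cmd
print_run_cmd
```

Copy the printed commands into the terminal once the values match your environment.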
The default terminal will act as a log window and after some time the experiment will end with:
...
142034 [main] INFO GenericExprerimentResultsCollector - Cache COLD
142035 [main] INFO GenericExprerimentResultsCollector - Query 0
142035 [main] INFO GenericExprerimentResultsCollector - Rep 0 24420555 + 484377200 = 508797755 nsecs, 554 results, 0 scan errors
142035 [main] INFO GenericExprerimentResultsCollector - Rep 1 1410756 + 214240542 = 215651298 nsecs, 554 results, 0 scan errors
142035 [main] INFO GenericExprerimentResultsCollector - Rep 2 7386783 + 164673770 = 172060553 nsecs, 554 results, 0 scan errors
142036 [main] INFO GenericExprerimentResultsCollector - Query 1
142036 [main] INFO GenericExprerimentResultsCollector - Rep 0 9819585 + 224390970 = 234210555 nsecs, 2 results, 0 scan errors
142036 [main] INFO GenericExprerimentResultsCollector - Rep 1 18176936 + 184655973 = 202832909 nsecs, 2 results, 0 scan errors
142036 [main] INFO GenericExprerimentResultsCollector - Rep 2 4340932 + 140167593 = 144508525 nsecs, 2 results, 0 scan errors
142037 [main] INFO GenericExprerimentResultsCollector - Query 2
142037 [main] INFO GenericExprerimentResultsCollector - Rep 0 4651682 + 187169885 = 191821567 nsecs, 2 results, 0 scan errors
142037 [main] INFO GenericExprerimentResultsCollector - Rep 1 2443206 + 196298644 = 198741850 nsecs, 2 results, 0 scan errors
142037 [main] INFO GenericExprerimentResultsCollector - Rep 2 14501298 + 169481425 = 183982723 nsecs, 2 results, 0 scan errors
142037 [main] INFO GenericExprerimentResultsCollector - Cache WARM
142037 [main] INFO GenericExprerimentResultsCollector - Query 0
142038 [main] INFO GenericExprerimentResultsCollector - Rep 0 433401 + 149580823 = 150014224 nsecs, 554 results, 0 scan errors
142038 [main] INFO GenericExprerimentResultsCollector - Rep 1 489047 + 136297258 = 136786305 nsecs, 554 results, 0 scan errors
142038 [main] INFO GenericExprerimentResultsCollector - Rep 2 399087 + 130589982 = 130989069 nsecs, 554 results, 0 scan errors
142038 [main] INFO GenericExprerimentResultsCollector - Query 1
142038 [main] INFO GenericExprerimentResultsCollector - Rep 0 1123667 + 122082444 = 123206111 nsecs, 2 results, 0 scan errors
142038 [main] INFO GenericExprerimentResultsCollector - Rep 1 1208589 + 117647937 = 118856526 nsecs, 2 results, 0 scan errors
142039 [main] INFO GenericExprerimentResultsCollector - Rep 2 1034712 + 114634712 = 115669424 nsecs, 2 results, 0 scan errors
142039 [main] INFO GenericExprerimentResultsCollector - Query 2
142039 [main] INFO GenericExprerimentResultsCollector - Rep 0 1239245 + 124528086 = 125767331 nsecs, 2 results, 0 scan errors
142039 [main] INFO GenericExprerimentResultsCollector - Rep 1 1217343 + 123254940 = 124472283 nsecs, 2 results, 0 scan errors
142039 [main] INFO GenericExprerimentResultsCollector - Rep 2 875293 + 114946356 = 115821649 nsecs, 2 results, 0 scan errors
142039 [main] INFO RunRDF4JExperimentWorkload - End ScalabilityFunc
Start time = Sat Aug 10 15:43:35 UTC 2024
End time = Sat Aug 10 15:45:58 UTC 2024
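Each "Rep" line in the log reports two timing components and their sum in nanoseconds. If you save the log to a file, the sums can be sanity-checked with a few lines of shell. This is a minimal sketch that relies only on the "a + b = total nsecs" pattern visible above; the function name check_rep_line is hypothetical.

```shell
#!/bin/sh
# Check that the two timing components of a "Rep" log line add up to the
# reported total. Prints "ok <total>" and returns 0 on success,
# "mismatch <total>" and 1 on failure, "no timing found" and 2 otherwise.
check_rep_line() {
  printf '%s\n' "$1" | awk '{
    # Locate the "+" and "=" tokens instead of relying on fixed field positions.
    for (i = 2; i <= NF - 3; i++) {
      if ($i == "+" && $(i + 2) == "=") {
        if ($(i - 1) + $(i + 1) == $(i + 3)) { print "ok " $(i + 3); exit 0 }
        print "mismatch " $(i + 3); exit 1
      }
    }
    print "no timing found"; exit 2
  }'
}

check_rep_line "142035 [main] INFO GenericExprerimentResultsCollector - Rep 0 24420555 + 484377200 = 508797755 nsecs, 554 results, 0 scan errors"
```

For the first COLD repetition of Query 0 this prints "ok 508797755".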
From another terminal we can connect to the rdf4jscal10k container and check the results in the geographica3 database in PostgreSQL:
/data$ docker exec -it rdf4jscal10k /bin/bash
root@0e8125946708:/data# su postgres
postgres@0e8125946708:/data$ psql
psql (14.12 (Ubuntu 14.12-0ubuntu0.22.04.1))
Type "help" for help.
postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
--------------+--------------+----------+---------+---------+-----------------------
geographica3 | geographica3 | UTF8 | C.UTF-8 | C.UTF-8 |
postgres | postgres | UTF8 | C.UTF-8 | C.UTF-8 |
template0 | postgres | UTF8 | C.UTF-8 | C.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | C.UTF-8 | C.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
(4 rows)
postgres=# \c geographica3
You are now connected to database "geographica3" as user "postgres".
geographica3=# \d+
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
--------+----------------------------------+----------+--------------+-------------+---------------+------------+-------------
public | EXPERIMENT | table | geographica3 | permanent | heap | 8192 bytes |
public | EXPERIMENT_id_seq | sequence | geographica3 | permanent | | 8192 bytes |
public | QUERYEXECUTION | table | geographica3 | permanent | heap | 0 bytes |
public | QUERYEXECUTION_experiment_id_seq | sequence | geographica3 | permanent | | 8192 bytes |
public | QUERYEXECUTION_id_seq | sequence | geographica3 | permanent | | 8192 bytes |
public | vquery_ordered_aggrs | view | postgres | permanent | | 0 bytes |
public | vquery_ordered_aggrs2 | view | postgres | permanent | | 0 bytes |
public | vquery_ordered_aggrs_3 | view | postgres | permanent | | 0 bytes |
public | vqueryexecution | view | postgres | permanent | | 0 bytes |
public | vqueryexecution2 | view | postgres | permanent | | 0 bytes |
public | vqueryexecution3 | view | postgres | permanent | | 0 bytes |
public | vreport | view | postgres | permanent | | 0 bytes |
(12 rows)
geographica3=# select * from "EXPERIMENT";
id | instime | exectime | description | host | os | sut | queryset | dataset | executionspec | reportspec | type
----+----------------------------+----------------------------+-----------------------------------+-----------------------------------------------------------------------------+--------------------------------+----------+-----------------+-----------------+---------------------------------------------------------------------------------------------------------------+------------------+-----------------
1 | 2024-08-10 18:43:41.942+03 | 2024-08-10 18:43:41.923+03 | 2024-08-09_RDF4JSUT_RunWL_Scal10K | SimpleHost{ NUC8i7BEH, 192.168.1.44, 32GB, GenericLinuxOS{ Ubuntu-jammy } } | GenericLinuxOS{ Ubuntu-jammy } | RDF4JSUT | scalabilityFunc | scalability_10K | SimpleES{ COLD=3, WARM=3, action=RUN, maxduration=604800 secs, repmaxduration=86400 secs, func=QUERY_MEDIAN } | SimpleReportSpec | ScalabilityFunc
(1 row)
geographica3=# select * from vquery_ordered_aggrs;
experiment_id | query_no | cache_type | no_iterations | mean | median
---------------+----------+------------+---------------+-------+--------
92 | 0 | COLD | 3 | 0.299 | 0.216
92 | 0 | WARM | 3 | 0.139 | 0.137
92 | 1 | COLD | 3 | 0.194 | 0.203
92 | 1 | WARM | 3 | 0.119 | 0.119
92 | 2 | COLD | 3 | 0.192 | 0.192
92 | 2 | WARM | 3 | 0.122 | 0.124
(6 rows)
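The mean and median columns of vquery_ordered_aggrs appear to be expressed in seconds, and they can be cross-checked against the nanosecond totals printed in the experiment log. The following is a minimal sketch of that arithmetic; the function name stats_secs is hypothetical, and the three input values are the COLD totals logged for Query 0.

```shell
#!/bin/sh
# Print the mean and median, in seconds with 3 decimals, of the
# nanosecond timings passed as arguments.
stats_secs() {
  [ "$#" -gt 0 ] || return 1
  printf '%s\n' "$@" | sort -n | awk '
    { v[NR] = $1; sum += $1 }
    END {
      # Middle value for an odd count, average of the two middle values otherwise.
      mid = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
      printf "mean=%.3f median=%.3f\n", sum / NR / 1000000000, mid / 1000000000
    }'
}

# COLD cache totals for Query 0, taken from the log shown earlier.
stats_secs 508797755 215651298 172060553
```

This prints "mean=0.299 median=0.216", which matches the COLD row for query 0 in the view.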
Afterwards we can verify the result files generated in the filesystem:
geographica3=# \q
postgres@f9a1d01d4750:/data$ exit
exit
root@f9a1d01d4750:/data# tree /data/Results_Store
/data/Results_Store
`-- RDF4JSUT
`-- 2024-08-10_RDF4JSUT_RunWL_Scal10K
`-- Scalability
`-- 10K
`-- RDF4JSUT-ExperimentWorkload
|-- 00-SC1_Geometries_Intersects_GivenPolygon-cold
|-- 00-SC1_Geometries_Intersects_GivenPolygon-cold-long
|-- 00-SC1_Geometries_Intersects_GivenPolygon-warm
|-- 00-SC1_Geometries_Intersects_GivenPolygon-warm-long
|-- 01-SC2_Intensive_Geometries_Intersect_Geometries-cold
|-- 01-SC2_Intensive_Geometries_Intersect_Geometries-cold-long
|-- 01-SC2_Intensive_Geometries_Intersect_Geometries-warm
|-- 01-SC2_Intensive_Geometries_Intersect_Geometries-warm-long
|-- 02-SC3_Relaxed_Geometries_Intersect_Geometries-cold
|-- 02-SC3_Relaxed_Geometries_Intersect_Geometries-cold-long
|-- 02-SC3_Relaxed_Geometries_Intersect_Geometries-warm
`-- 02-SC3_Relaxed_Geometries_Intersect_Geometries-warm-long
5 directories, 12 files
When you are done with the Docker container, you can list it and then remove it with:
/data$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b8753e0265ac geordfbench_nuc_rdf4j "/bin/sh -c '/data/s…" 5 minutes ago Exited (130) 4 seconds ago rdf4jscal10k
/data$ docker rm -f rdf4jscal10k
rdf4jscal10k
Interested users can look at the simple Bash script /data/startUpScript.sh, which is the entry point defined in the Dockerfile. The actions it takes are: