
Can't manage to get the Star-Schema DBMS benchmark data generator to run properly


One of the commonly (?) used DBMS benchmarks is SSB, the Star-Schema Benchmark. To run it, you need to generate your schema, i.e. your tables with the data in them. There's a generator program for this, which you can find in several places on GitHub and possibly elsewhere. I'm not sure they all have exactly the same code, but I seem to be experiencing the same problem with all of them. I'm using a 64-bit Linux system (Kubuntu 14.04, if that helps), and am trying to build and run the `dbgen` program from that package.

When building, I get type/size-related warnings:

me@myhost:~/src/ssb-dbgen$ make
... etc. etc. ...
gcc -O -DDBNAME=\"dss\" -DLINUX -DDB2 -DSSBM   -c -o varsub.o varsub.c
rnd.c: In function ‘row_stop’:
rnd.c:60:6: warning: format ‘%d’ expects argument of type ‘int’, but argument 4 has type ‘long int’ [-Wformat=]
      i, Seed[i].usage);
      ^
driver.c: In function ‘partial’:
driver.c:606:4: warning: format ‘%d’ expects argument of type ‘int’, but argument 4 has type ‘long int’ [-Wformat=]
... etc. etc. ...

Then, I make sure all the right files are in place, try to generate my tables, and only get two of them! I try to explicitly generate the LINEORDER table, and get a strange failure:

me@myhost:~/src/ssb-dbgen$ ls
bcd2.c      build.c    driver.c    HISTORY         makefile_win   print.c  rnd.c                      speed_seed.o      varsub.c
bcd2.h      build.o    driver.o    history.html    mkf.macos      print.o  rnd.h                      ssb-dbgen-master  varsub.o
bcd2.o      CHANGES    dss.ddl     load_stub.c     permute.c      qgen     rnd.o                      text.c
bm_utils.c  config.h   dss.h       load_stub.o     permute.h      qgen.c   rxin-ssb-dbgen-master.zip  text.o
bm_utils.o  dbgen      dss.ri      Makefile        permute.o      qgen.o   shared.h                   tpcd.h
BUGS        dists.dss  dsstypes.h  makefile.suite  PORTING.NOTES  README   speed_seed.c               TPCH_README
me@myhost:~/src/ssb-dbgen$ ./dbgen -vfF -s 1
SSBM (Star Schema Benchmark) Population Generator (Version 1.0.0)
Copyright Transaction Processing Performance Council 1994 - 2000
Generating data for suppliers table [pid: 32303]done.
Generating data for customers table [pid: 32303]done.
Generating data for (null) [pid: 32303]done.
Generating data for (null) [pid: 32303]done.
Generating data for (null) [pid: 32303]done.
Generating data for (null) [pid: 32303]done.
me@myhost:~/src/ssb-dbgen$ ls *.tbl
customer.tbl  supplier.tbl
me@myhost:~/src/ssb-dbgen$ ./dbgen -vfF -s 1 -T l
SSBM (Star Schema Benchmark) Population Generator (Version 1.0.0)
Copyright Transaction Processing Performance Council 1994 - 2000
Generating data for lineorder table [pid: 32305]*** buffer overflow detected ***: ./dbgen terminated
======= Backtrace: =========
... etc. etc. ...
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fcea1b79ec5]
./dbgen[0x401219]
======= Memory map: ========
... etc. etc. ...

Now, if I switch to a 32-bit Linux system, I don't get any of these warnings (although there are two warnings about pointer-to-non-pointer conversion), but running the generation again still produces only two tables. The other tables can be produced individually, but I would think they don't correspond to one another at all...

Has anyone encountered a similar problem? Am I doing something wrong? Am I using the wrong sources somehow?

(This is almost a dupe of SSB dbgen Linux - Segmentation Fault ... but I can't "take over" somebody else's question when they may have encountered other problems than mine. Also, that one has no answers...)


Solution

  • So, eventually, I ended up surveying all versions of ssb-dbgen on GitHub, and creating a unified repository:

    https://github.com/eyalroz/ssb-dbgen/

    This repository:

    1. Incorporates fixes for all bugs fixed in any of those versions, and a few others. In particular, the format mismatch due to different integer type sizes on 64-bit Linux and 64-bit Windows is resolved.
    2. Switches the build to CMake, so there is no need to manually edit Makefiles. Specifically, building on Windows and macOS is supported. Building on more exotic systems is theoretically supported.
    3. Has CI build testing of commits, to make sure that at least the build doesn't break.