Generate TPC datasets on Windows
Compiling the TPC-H and TPC-DS toolsets with MinGW-w64
Introduction
While the average data engineer works on a Unix-like platform, occasionally he or she may be on a Windows machine. In that case, the TPC-H and TPC-DS benchmark datasets are usually hard to get (the exceptions being DuckDB, or a Snowflake trial for the TPC-H dataset). The official TPC-H and TPC-DS toolsets are not distributed through a source-sharing platform such as Codeberg or Bitbucket. Instead, you apply with your personal information and receive a link to download the source archive. Unix-like operating systems are well supported, but on Windows you need Microsoft's compiler (MSVC) to build the provided Visual Studio solution. What if you would rather not take that route and want to use the latest and greatest the Free and Open Source Software (FOSS) camp has to offer? If that describes your use case, read on.
Getting ready
There is always the option of using Cygwin to compile these toolsets, and most of what follows should apply to that setup with few or no changes. However, we opt for the MSYS2 and MinGW-w64 combination for ease of use and a more native experience. So, let's get started. In my case, I downloaded the latest installer and installed it to the default location, following their excellent guide:
c:\msys64
We open the MSYS2 shell and update all packages with
pacman -Syu

Next, we install our toolchain. At a minimum we need GCC and mingw32-make for the x86_64 architecture, and we use the UCRT runtime builds: the install instructions for gcc are here and for make here (packages mingw-w64-ucrt-x86_64-gcc and mingw-w64-ucrt-x86_64-make).

We add the UCRT64 binaries to the PATH, and we are ready to compile:
export PATH=$PATH:/c/msys64/ucrt64/bin/
Since we will need to edit files, a good editor such as Notepad++ or VSCodium is handy.
Compiling the TPC-H toolset
Having downloaded the TPC-H and TPC-DS archives, I created a C:\datasets folder and unzipped them there. I also renamed the TPC-H folder to a name without spaces (e.g. tpc_h_v_3_0_1), since spaces in paths tend to trip up make.

We navigate to
C:\datasets\tpc_h_v_3_0_1\dbgen
Now we can change the build configuration.
In C:\datasets\tpc_h_v_3_0_1\dbgen\makefile.suite, we set the following variables:
CC = gcc
DATABASE= INFORMIX
MACHINE = WIN32
WORKLOAD = TPCH
EXE = .exe
In the MSYS2 shell, we navigate to
cd /c/datasets/tpc_h_v_3_0_1/dbgen
Let’s compile with
mingw32-make -f makefile.suite
The build fails. According to this, the uI64 suffix is Microsoft-specific and GCC rejects it, so in config.h we change
#define RNG_A 6364136223846793005uI64
#define RNG_C 1uI64
to
#define RNG_A 6364136223846793005ull
#define RNG_C 1ull
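The constants feed a 64-bit random number generator, so what matters is that they are unsigned 64-bit literals on every compiler. A minimal sketch, assuming a C99 compiler: the standard UINT64_C macro from stdint.h picks the right suffix for the target, which avoids compiler-specific suffixes altogether. The function lcg_next is hypothetical and only illustrates a linear congruential step with these constants; it is not the actual dbgen routine.

```c
#include <stdint.h>

/* Same values as config.h, written portably: UINT64_C expands to the
 * correct unsigned 64-bit suffix for the compiler in use, so neither
 * the Microsoft-specific uI64 nor a hard-coded ull is needed. */
#define RNG_A UINT64_C(6364136223846793005)
#define RNG_C UINT64_C(1)

/* Hypothetical illustration, not the dbgen implementation:
 * one linear congruential step, seed' = seed * A + C (mod 2^64).
 * Unsigned arithmetic wraps around, so the modulo is implicit. */
static uint64_t lcg_next(uint64_t seed)
{
    return seed * RNG_A + RNG_C;
}
```

For example, lcg_next(1) yields RNG_A + 1, and lcg_next(UINT64_MAX) wraps around to 2^64 - RNG_A + 1 rather than overflowing, which is exactly the behavior the literal's unsigned 64-bit type guarantees.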
One more attempt:
mingw32-make -f makefile.suite
and the build succeeds.

All that remains is to run it:
./dbgen.exe -vf -s 1
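which creates a scale factor 1 (roughly 1 GB) dataset as .tbl files in that folder. Here -v turns on verbose output, -f overwrites any existing files, and -s sets the scale factor.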
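A quick sanity check on the generated files is to count their rows: per the TPC-H specification, nation.tbl always has 25 rows and region.tbl 5, regardless of scale factor, while orders.tbl has 1,500,000 rows per unit of scale factor. The helpers below are a hypothetical sketch, assuming one record per line as dbgen writes them; files are opened in binary mode so the count is the same whether lines end in LF or CRLF.

```c
#include <stdio.h>

/* Count rows in an already-open .tbl stream by counting newline
 * characters (one '|'-delimited record per line).
 * Returns -1 if the stream is NULL or a read error occurs. */
static long count_rows(FILE *fp)
{
    long rows = 0;
    int ch;

    if (fp == NULL)
        return -1;
    while ((ch = fgetc(fp)) != EOF) {
        if (ch == '\n')
            rows++;
    }
    if (ferror(fp))
        return -1;
    return rows;
}

/* Hypothetical convenience wrapper: open a file by path, count its
 * rows, and close it again. Returns -1 if the file cannot be opened. */
static long count_rows_in_file(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long rows;

    if (fp == NULL)
        return -1;
    rows = count_rows(fp);
    fclose(fp);
    return rows;
}
```

Called as count_rows_in_file("nation.tbl") from the dbgen folder, a correct build should return 25.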

Easy?
Compiling the TPC-DS toolset
We follow a similar approach, with a few twists. We navigate to
C:\datasets\DSGen-software-code-3.2.0rc1\tools
It contains a similar Makefile.suite, which again needs an edit:
OS = WIN64
plus some OS-specific definitions alongside the existing ones for the other platforms:
WIN64_CC = gcc
WIN64_CFLAGS = -O3 -Wall -fcommon
WIN64_EXE = .exe
WIN64_LEX = flex
WIN64_LIBS = -lm -lws2_32
WIN64_YACC = bison
WIN64_YFLAGS = -d -v
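We definitely need bison and flex, both of which can be installed with pacman. The -fcommon flag is presumably needed because GCC 10 and later default to -fno-common, which turns global variables defined in more than one source file into multiple-definition errors at link time. After installing bison and flex, change to the tools directory in the MSYS2 shell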
cd /c/datasets/DSGen-software-code-3.2.0rc1/tools/
and we are ready to compile.
mingw32-make -f Makefile.suite
It fails again. After some trial and error, I found that a few extra changes are needed.
In r_params.c, change
#define OPTION_START '/'
to
#define OPTION_START '-'
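OPTION_START is the character that marks a command-line option, so after this change dsdgen expects options such as -SCALE instead of /SCALE. A plausible reason, not stated in the original fix, is that the MSYS2 shell rewrites arguments beginning with '/' into Windows paths when it launches a native program, which would mangle /SCALE before dsdgen sees it. The helpers below are a hypothetical sketch of how such a prefix check works; they are not the parser in r_params.c.

```c
#include <string.h>

/* Prefix that marks a command-line option (the value after the edit). */
#define OPTION_START '-'

/* Hypothetical helper: an argument is an option only if it starts with
 * OPTION_START and has at least one character after the prefix. */
static int is_option(const char *arg)
{
    return arg != NULL && arg[0] == OPTION_START && arg[1] != '\0';
}

/* Hypothetical helper: does the argument name the given option?
 * For example, "-DIR" matches "DIR", while "/DIR" no longer does. */
static int option_matches(const char *arg, const char *name)
{
    return is_option(arg) && strcmp(arg + 1, name) == 0;
}
```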
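Also, in Makefile.suite, change the yacc rules
y.tab.c: qgen.y
	$(YACC) $(YFLAGS) qgen.y
y.tab.o: y.tab.c
y.tab.h: qgen.y
	$(YACC) $(YFLAGS) qgen.y
to
y.tab.c: qgen.y
	$(YACC) $(YFLAGS) --file-prefix=y qgen.y
y.tab.o: y.tab.c
y.tab.h: qgen.y
	$(YACC) $(YFLAGS) --file-prefix=y qgen.y
By default, bison names its output after the grammar file (qgen.tab.c and qgen.tab.h), whereas the makefile expects the traditional yacc names y.tab.c and y.tab.h; --file-prefix=y restores them.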
Let’s try now.
mingw32-make -f Makefile.suite clean
mingw32-make -f Makefile.suite

and the build succeeds, so we can generate the TPC-DS data with the resulting dsdgen.exe.

Epilogue
In this article, I shared what I know about generating these well-known benchmarks. The instructions are quick-and-dirty solutions; the port may contain bugs, and my approach is provided on an AS-IS BASIS. I hope you find it useful. Please let me know in the comments whether it worked for you. Until next time, happy generating.