Generate TPC datasets in Windows

Vasileios Anagnostopoulos
4 min read · Apr 8, 2024

TPC-H and TPC-DS toolset compilation with Mingw64

Introduction

While the average data engineer works on a Unix-like platform, occasionally they might be on a Windows machine. In that case the TPC-H and TPC-DS benchmark datasets are usually hard to get (with the exception of DuckDB or a Snowflake trial for the TPC-H dataset). The original TPC-H and TPC-DS toolsets are not distributed through a source-sharing platform like Codeberg, Bitbucket or some other provider. Normally you apply with your personal information and a link to the source archive is shared with you. For Unix-like operating systems the support is extensive. For Windows, however, the archives ship a Visual Studio solution, so you really need the Microsoft compiler toolchain to build it. What if you are not interested in this approach and would like to use the latest and greatest that the Free and Open Source Software (FOSS) camp has to offer? If this describes your use case, read on.

Getting ready

There is always the option to use Cygwin to compile these toolsets. Pretty much everything you read here will probably apply to that setup with minimal to no changes. However, we opt for the Msys2 and Mingw64 combo for ease of use and a more native experience. So, let’s get started. In my case I downloaded the latest Msys2 installer and installed it to the default location, following their excellent guide:

c:\msys64

We open the msys2 shell and update all packages with

pacman -Syu

We now need to install our toolchain. For the x86_64 architecture we need, at minimum, the Mingw64 gcc and mingw32-make. Both come from the ucrt64 environment; at the time of writing the Msys2 package pages list them as mingw-w64-ucrt-x86_64-gcc and mingw-w64-ucrt-x86_64-make, and each can be installed with pacman -S.

We export everything to the PATH and we are ready to compile

export PATH=$PATH:/c/msys64/ucrt64/bin/
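Before compiling, it can help to confirm that the shell actually sees the toolchain. The snippet below is a minimal sketch and not part of the original setup: missing_tools is a hypothetical helper name, and it relies only on the standard command -v builtin.

```shell
# Hypothetical helper (not part of Msys2): print every tool name
# given as an argument that cannot be found on the current PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Usage in the msys2 shell, after the export above:
#   missing_tools gcc mingw32-make
# No output means both tools were found.
```

If a name is printed, the corresponding package is either not installed or its folder is not on the PATH yet.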

Since we may need to edit files, a good editor like Notepad++ or VSCodium can be handy.

Compiling the TPC-H toolset

Having downloaded the TPC-H and TPC-DS archives, I made a c:\datasets folder and unzipped them there. I also renamed the TPC-H folder to something without spaces in the name (e.g. tpc_h_v_3_0_1).

We navigate to

C:\datasets\tpc_h_v_3_0_1\dbgen

Now we are ready to change the configuration so as to compile our tools.

In C:\datasets\tpc_h_v_3_0_1\dbgen\makefile.suite we edit some variables to read

CC = gcc
DATABASE= INFORMIX
MACHINE = WIN32
WORKLOAD = TPCH
EXE = .exe
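If you prefer to script the edit instead of opening an editor, something along these lines could work. It is only a sketch: set_make_var is a hypothetical helper, and it assumes each variable is already assigned at the start of a line in makefile.suite (possibly with an empty value).

```shell
# Hypothetical helper: set NAME = VALUE on the line of a makefile
# that already assigns NAME. Assumes the assignment starts the line
# and that VALUE contains no '|' or '&' characters.
set_make_var() {
    file=$1; name=$2; value=$3
    sed -E "s|^${name}[[:space:]]*=.*|${name} = ${value}|" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Usage, from the dbgen folder (keep a backup of the original first):
#   cp makefile.suite makefile.suite.orig
#   set_make_var makefile.suite CC gcc
#   set_make_var makefile.suite DATABASE INFORMIX
#   set_make_var makefile.suite MACHINE WIN32
#   set_make_var makefile.suite WORKLOAD TPCH
#   set_make_var makefile.suite EXE .exe
```

The pattern is anchored at the start of the line, so setting CC leaves other variables such as CFLAGS alone.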

Through msys2 we navigate to

cd /c/datasets/tpc_h_v_3_0_1/dbgen

Let’s compile with

mingw32-make -f makefile.suite

We fail. The reason is that the uI64 integer-literal suffix is specific to the Microsoft compiler and gcc rejects it, so in config.h we need to change

#define RNG_A 6364136223846793005uI64
#define RNG_C 1uI64

to

#define RNG_A 6364136223846793005ull
#define RNG_C 1ull
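The same change can be applied from the msys2 shell. This is a sketch under the assumption that the two constants appear in config.h exactly as shown above; fix_rng_suffix is a hypothetical helper name.

```shell
# Hypothetical helper: rewrite the uI64 suffix on the RNG_A and
# RNG_C constants to the standard C suffix ull. Other lines,
# including other uses of I64, are left untouched.
fix_rng_suffix() {
    file=$1
    sed -E 's/^(#define[[:space:]]+RNG_[AC][[:space:]]+[0-9]+)uI64/\1ull/' \
        "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Usage, from the dbgen folder (keep a backup of the original first):
#   cp config.h config.h.orig
#   fix_rng_suffix config.h
```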

One more

mingw32-make -f makefile.suite

and we are done

All that is remaining is to execute it

./dbgen.exe -vf -s 1

and create a 1GB data set in that folder (-s 1 selects scale factor 1, -v turns on verbose output and -f overwrites existing files)

Easy?
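A quick sanity check on the result is to count the rows in each generated file. The sketch below assumes dbgen has written its usual .tbl files, one row per line, into the folder you pass in; count_tbl_rows is a hypothetical helper name.

```shell
# Hypothetical helper: print "<file> <rows>" for each .tbl file in
# the given folder (default: current folder). Each line of a .tbl
# file is one row, so the line count equals the row count.
count_tbl_rows() {
    dir=${1:-.}
    for f in "$dir"/*.tbl; do
        [ -e "$f" ] || continue   # no .tbl files: the glob stays unexpanded
        printf '%s %s\n' "$(basename "$f")" "$(wc -l < "$f" | tr -d ' ')"
    done
}

# Usage, from the dbgen folder:
#   count_tbl_rows .
```

The nation and region tables are fixed-size in TPC-H (25 and 5 rows), so they make a handy first check regardless of the scale factor.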

Compiling the TPC-DS toolset

With some twists we follow a similar approach. We navigate to

C:\datasets\DSGen-software-code-3.2.0rc1\tools

There is a similar Makefile.suite there. We again need an edit

OS = WIN64

and some OS specific additions in corresponding places

WIN64_CC = gcc
WIN64_CFLAGS = -O3 -Wall -fcommon
WIN64_EXE = .exe
WIN64_LEX = flex
WIN64_LIBS = -lm -lws2_32
WIN64_YACC = bison
WIN64_YFLAGS = -d -v

We definitely need bison and flex, so install them first (at the time of writing, Msys2 packages them simply as bison and flex). After installation, change through msys2 to the appropriate directory

cd /c/datasets/DSGen-software-code-3.2.0rc1/tools/

and we are ready to compile.

mingw32-make -f Makefile.suite

We fail. After some trial and error we find out that we need to make some extra changes.

In “r_params.c” change

#define OPTION_START '/'

to

#define OPTION_START '-'

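and in Makefile.suite (bison would otherwise name its output after the input file, qgen.tab.c and qgen.tab.h, not the y.tab.c and y.tab.h that the rules expect) change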

y.tab.c: qgen.y
	$(YACC) $(YFLAGS) qgen.y
y.tab.o: y.tab.c
y.tab.h: qgen.y
	$(YACC) $(YFLAGS) qgen.y

to

y.tab.c: qgen.y
	$(YACC) $(YFLAGS) --file-prefix=y qgen.y
y.tab.o: y.tab.c
y.tab.h: qgen.y
	$(YACC) $(YFLAGS) --file-prefix=y qgen.y
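For repeatable setups, the two edits above (r_params.c and the bison rules) can be scripted. The following is a sketch under the assumption that the lines look exactly as quoted in this article; patch_tpcds_tools is a hypothetical helper name.

```shell
# Hypothetical helper: apply the two edits described above to the
# files in the given tools folder. Running it twice is harmless,
# because the patterns no longer match once the edits are in place.
patch_tpcds_tools() {
    dir=$1
    # Switch the option prefix from '/' to '-'.
    sed -E "s|^(#define[[:space:]]+OPTION_START[[:space:]]+)'/'|\1'-'|" \
        "$dir/r_params.c" > "$dir/r_params.c.tmp" &&
        mv "$dir/r_params.c.tmp" "$dir/r_params.c" || return 1
    # Make bison write y.tab.c and y.tab.h.
    sed -E 's/(\$\(YACC\) \$\(YFLAGS\)) qgen\.y/\1 --file-prefix=y qgen.y/' \
        "$dir/Makefile.suite" > "$dir/Makefile.suite.tmp" &&
        mv "$dir/Makefile.suite.tmp" "$dir/Makefile.suite"
}

# Usage (back up both files first):
#   patch_tpcds_tools /c/datasets/DSGen-software-code-3.2.0rc1/tools
```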

Let’s try now.

mingw32-make -f Makefile.suite clean

mingw32-make -f Makefile.suite

and we can generate the data sets with the freshly built dsdgen.exe.

Epilogue

In this article I shared what I know about generating these well-known benchmark datasets on Windows. The instructions are quick and dirty solutions. The port may contain bugs and my approach is provided on an as-is basis. I hope you find it useful. Please let me know in the comments whether it worked for you or not. Until next time, happy generating.
