Parallel Database Query Processing System

This project implements a parallel query-processing engine designed to run SQL-like queries over large, structured datasets using high-performance computing techniques. The goal is to build a lightweight, command-line database system that supports fast data ingestion, indexing, and query execution using a combination of:

B+ tree based indexing
Serial, OpenMP, and MPI execution modes
Parallel query evaluation and parallel data scanning
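The sketch below illustrates parallel data scanning. It is not taken from the engine/ sources. It assumes a hypothetical in-memory Record array and a hypothetical count_matching() helper. A filter scan is split across OpenMP threads, and the per-thread match counts are combined with a reduction clause.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record layout, used only for this sketch. */
typedef struct {
    int id;
    int status;   /* e.g. a status code stored with each log row */
} Record;

/* Count rows where status == target. Each OpenMP thread scans a chunk of
 * the array with a private partial count; reduction(+:count) sums them.
 * Without -fopenmp the pragma is ignored and the loop runs serially. */
long count_matching(const Record *rows, long n, int target)
{
    assert(rows != NULL || n == 0);
    long count = 0;
    #pragma omp parallel for reduction(+:count) schedule(static)
    for (long i = 0; i < n; i++) {
        if (rows[i].status == target)
            count++;
    }
    return count;
}
```

Compiled with -fopenmp, the loop runs in parallel. Without that flag it computes the same answer serially, so the serial build can serve as a correctness baseline for the parallel one.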

The system provides a full pipeline—from data generation to query parsing to parallel execution—making it useful for system administrators who need a fast, embeddable tool for scanning large logs or structured records.
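Query parsing begins by splitting the query text into tokens. The following is a minimal sketch of such a tokenizer. The Token type, its token kinds, and tokenize() are hypothetical; the project's tokenizer/ module may use a different token set and API.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical token kinds for an SQL-like language. */
typedef enum { TOK_WORD, TOK_NUMBER, TOK_STRING, TOK_SYMBOL } TokKind;

typedef struct {
    TokKind kind;
    char text[64];
} Token;

/* Copy len bytes from src into tok->text, truncating to fit. */
static void set_text(Token *tok, const char *src, size_t len)
{
    if (len >= sizeof tok->text)
        len = sizeof tok->text - 1;
    memcpy(tok->text, src, len);
    tok->text[len] = '\0';
}

/* Split query into at most max tokens; returns the number written. */
int tokenize(const char *query, Token *out, int max)
{
    assert(query != NULL);
    int n = 0;
    const char *p = query;
    while (*p && n < max) {
        if (isspace((unsigned char)*p)) { p++; continue; }
        const char *start = p;
        Token *t = &out[n];
        if (isalpha((unsigned char)*p) || *p == '_') {         /* keyword or identifier */
            while (isalnum((unsigned char)*p) || *p == '_') p++;
            t->kind = TOK_WORD;
            set_text(t, start, (size_t)(p - start));
        } else if (isdigit((unsigned char)*p)) {               /* integer literal */
            while (isdigit((unsigned char)*p)) p++;
            t->kind = TOK_NUMBER;
            set_text(t, start, (size_t)(p - start));
        } else if (*p == '\'') {                               /* 'quoted string' */
            start = ++p;
            while (*p && *p != '\'') p++;
            t->kind = TOK_STRING;
            set_text(t, start, (size_t)(p - start));
            if (*p == '\'') p++;                               /* skip closing quote */
        } else if ((*p == '<' || *p == '>' || *p == '!') && p[1] == '=') {
            p += 2;                                            /* <=, >=, != */
            t->kind = TOK_SYMBOL;
            set_text(t, start, 2);
        } else {                                               /* single-char symbol */
            p++;
            t->kind = TOK_SYMBOL;
            set_text(t, start, 1);
        }
        n++;
    }
    return n;
}
```

For example, "SELECT id FROM logs WHERE status >= 400;" yields the words SELECT, id, FROM, logs, WHERE, and status, then the symbol >=, the number 400, and the symbol ;. A parser then walks this token stream to build a query plan.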

Compilation & Execution

Within this project, we have helpers for downloading dependencies, generating synthetic data, and executing the programs.

Downloading Dependencies

To ensure that all requirements are satisfied, run the convenience script requirements.sh:

bash requirements.sh

Generating synthetic data

Our data generation helper (generate_commands.py) randomly samples from a bank of known commands to generate synthetic data. It takes two parameters: a required number of tuples to generate (50,000 in the example below) and an optional filename to save the output to.

python generate_commands.py 50000
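The generator itself is a Python script. The hypothetical C sketch below illustrates the sampling idea in the engine's language. It draws commands uniformly from a small, made-up command bank using a seeded xorshift32 generator, so the same seed always reproduces the same dataset. The bank contents, the id,command,user_id row format, and the function names are assumptions, not the script's actual output format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical command bank; the real script's bank is not shown here. */
static const char *COMMAND_BANK[] = { "ls", "cd", "grep", "ssh", "sudo", "tar" };
#define BANK_SIZE (sizeof COMMAND_BANK / sizeof COMMAND_BANK[0])

/* xorshift32: a tiny PRNG; state must never be zero. */
static uint32_t next_rand(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Format one tuple "id,command,user_id" into buf; returns snprintf's length. */
int format_row(char *buf, size_t cap, long id, uint32_t *state)
{
    assert(cap > 0);
    const char *cmd = COMMAND_BANK[next_rand(state) % BANK_SIZE];
    unsigned user = (unsigned)(next_rand(state) % 1000u);
    return snprintf(buf, cap, "%ld,%s,%u", id, cmd, user);
}

/* Write count tuples to out, one per line; returns the rows written. */
long generate_rows(FILE *out, long count, uint32_t seed)
{
    uint32_t state = seed ? seed : 1u;   /* map seed 0 to 1 so xorshift never starts at zero */
    char line[128];
    for (long i = 0; i < count; i++) {
        format_row(line, sizeof line, i, &state);
        if (fprintf(out, "%s\n", line) < 0)
            return i;                    /* stop early on a write error */
    }
    return count;
}
```

Seeding explicitly makes benchmark runs repeatable. The serial, OpenMP, and MPI builds can then be compared on byte-identical input.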

Executing the QPE Files

To run the full test suite, use the predefined targets in the makefile.

Firstly, to compile all relevant .c files:

# Compile and link all relevant files
make

Once all files are compiled, we can use other makefile helpers to execute each version of our QPE testing functions:

Serial:

# Serial version
make run-omp

OpenMP Parallel Version:

# OpenMP Version
make run-omp

OpenMPI Parallel Version:

# OpenMPI Version
make run-mpi
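In an MPI run, each process typically scans only its own slice of the data. Below is a minimal sketch of a block distribution using a hypothetical block_range() helper. The project's MPI engine may partition rows differently.

```c
#include <assert.h>

/* Compute the half-open row range [*start, *end) owned by rank when n rows
 * are block-distributed over size processes. The first n % size ranks get
 * one extra row, so slice sizes differ by at most one. */
void block_range(long n, int size, int rank, long *start, long *end)
{
    assert(size > 0 && rank >= 0 && rank < size);
    long base = n / size;
    long extra = n % size;
    long r = rank;
    *start = r * base + (r < extra ? r : extra);
    *end = *start + base + (r < extra ? 1 : 0);
}
```

In an MPI program, size and rank would come from MPI_Comm_size and MPI_Comm_rank. Each rank's partial result would then be combined with a collective such as MPI_Reduce.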

Once testing is complete, run make clean to remove all build artifacts and object files.

File Structure

project-root/
├── build/                      # Compiled binaries and test executables
├── data-generation/            # Scripts for generating synthetic datasets
├── docs/                       # Project documentation, diagrams, design notes
├── engine/                     # Core database engine implementation
│   ├── mpi/                    # MPI-specific build + execution logic
│   ├── omp/                    # OpenMP-specific build + execution logic
│   ├── serial/                 # Serial build + execution logic
│   └── bplus.c                 # B+ Tree data structure implementation
├── include/                    # Shared headers across modules
├── tests/                      # Unit test cases + verification utilities
├── tokenizer/                  # SQL parsing + tokenization logic
├── connectEngine.c             # Bridge between parser and execution engines
├── makefile                    # Build rules and compiler instructions
├── QPEMPI.c                    # Main entry for MPI execution engine
├── QPEOMP.c                    # Main entry for OpenMP execution engine
├── QPESeq.c                    # Main entry for serial execution engine
├── requirements.sh             # Environment + dependency setup script
└── sample-queries.txt          # Example queries for debugging + validation
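engine/bplus.c holds the B+ tree used for indexing. Its node layout is not shown here, so the following is a self-contained sketch under stated assumptions. It uses a hypothetical Node struct with a fan-out of 4, bulk-loaded bottom-up from already sorted, unique keys. It supports point lookups and range scans over linked leaves. Real implementations usually also support incremental insertion with node splits, which this sketch omits.

```c
#include <assert.h>
#include <stdlib.h>

#define FANOUT 4   /* hypothetical: max keys per leaf / children per internal node */

typedef struct Node {
    int is_leaf;
    int n;                       /* keys in a leaf, children in an internal node */
    int keys[FANOUT];            /* leaf: sorted keys; internal: min key under child[i] */
    long rows[FANOUT];           /* leaf only: row id stored with each key */
    struct Node *child[FANOUT];  /* internal only */
    struct Node *next;           /* leaf only: right sibling for range scans */
} Node;

/* Group count nodes under new parents, spreading children evenly. */
static Node **build_level(Node **nodes, int count, int *out_count)
{
    int groups = (count + FANOUT - 1) / FANOUT, idx = 0;
    Node **parents = malloc(sizeof *parents * (size_t)groups);
    for (int g = 0; g < groups; g++) {
        int take = count / groups + (g < count % groups);
        assert(take <= FANOUT);
        Node *p = calloc(1, sizeof *p);
        for (int j = 0; j < take; j++, idx++) {
            p->child[j] = nodes[idx];
            p->keys[j] = nodes[idx]->keys[0];   /* smallest key in that subtree */
        }
        p->n = take;
        parents[g] = p;
    }
    *out_count = groups;
    return parents;
}

/* Build a tree from n sorted, unique keys. Allocation checks omitted for brevity. */
Node *bpt_bulk_load(const int *keys, const long *rows, int n)
{
    if (n <= 0)
        return NULL;
    int count = (n + FANOUT - 1) / FANOUT, idx = 0;
    Node **level = malloc(sizeof *level * (size_t)count);
    for (int g = 0; g < count; g++) {
        int take = n / count + (g < n % count);
        assert(take <= FANOUT);
        Node *leaf = calloc(1, sizeof *leaf);
        leaf->is_leaf = 1;
        for (int j = 0; j < take; j++, idx++) {
            leaf->keys[j] = keys[idx];
            leaf->rows[j] = rows[idx];
        }
        leaf->n = take;
        if (g > 0)
            level[g - 1]->next = leaf;          /* chain leaves left to right */
        level[g] = leaf;
    }
    while (count > 1) {                         /* add levels until one root remains */
        Node **up = build_level(level, count, &count);
        free(level);
        level = up;
    }
    Node *root = level[0];
    free(level);
    return root;
}

/* Descend to the leaf whose key range covers key. */
static const Node *find_leaf(const Node *node, int key)
{
    while (node && !node->is_leaf) {
        int i = 0;
        while (i + 1 < node->n && node->keys[i + 1] <= key)
            i++;
        node = node->child[i];
    }
    return node;
}

/* Point lookup: returns the row id stored with key, or -1 if absent. */
long bpt_search(const Node *root, int key)
{
    const Node *leaf = find_leaf(root, key);
    for (int i = 0; leaf && i < leaf->n; i++)
        if (leaf->keys[i] == key)
            return leaf->rows[i];
    return -1;
}

/* Range scan: count keys in [lo, hi] by walking the linked leaves. */
int bpt_range_count(const Node *root, int lo, int hi)
{
    int count = 0;
    for (const Node *leaf = find_leaf(root, lo); leaf; leaf = leaf->next)
        for (int i = 0; i < leaf->n; i++) {
            if (leaf->keys[i] > hi)
                return count;                   /* sorted: nothing further qualifies */
            if (leaf->keys[i] >= lo)
                count++;
        }
    return count;
}

void bpt_free(Node *node)
{
    if (!node)
        return;
    for (int i = 0; !node->is_leaf && i < node->n; i++)
        bpt_free(node->child[i]);
    free(node);
}
```

Because the leaves are chained, a range query descends the tree once to reach the first qualifying leaf and then walks sideways. This is why B+ trees suit WHERE clauses with range predicates.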

Report & Analysis

See Proj2.pdf for:

Speedup and efficiency using Amdahl’s Law
Scalability with increased problem size
Optimal thread and process count for performance
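For reference, here is Amdahl's Law in its standard form. T_1 is the serial runtime, T_p is the runtime on p threads or processes, and f is the fraction of the serial runtime that can be parallelized. These symbols are chosen here and are not taken from Proj2.pdf.

```latex
S(p) = \frac{T_1}{T_p} = \frac{1}{(1 - f) + \dfrac{f}{p}}, \qquad
E(p) = \frac{S(p)}{p}, \qquad
\lim_{p \to \infty} S(p) = \frac{1}{1 - f}
```

Take an illustrative f = 0.9 (not a measured value from this project) and p = 8. Then S(8) = 1/(0.1 + 0.1125) ≈ 4.71 and E(8) ≈ 0.59. No thread or process count can push the speedup past 1/(1 - f) = 10, which is why efficiency falls as p grows.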

Ensuring Correctness Without Sacrificing Performance

We verified accuracy through targeted testing and edge-case checks, while profiling and optimizing critical paths to keep execution fast. This continual cycle of testing and refinement ensured the system remained both correct and efficient.

Contributors

JJ McCauley: Serial engines, makefiles/testing, docs, & QPE testing files
Ian Davis:
Anthony Czerwinski: Sample queries & Select parallelizations
Sam Dickerson: Parser & Insert parallelizations
Logan Kelsch: Data generation &

Jairik/Parallel-Query-Processing-System

Folders and files

Latest commit

History

Repository files navigation

Parallel Database Query Processing System

Compilation & Execution

Downloading Dependencies

Generating synthetic data

Executing the QPE Files

File Structure

Report & Analysis

Ensuring Correctness Without Sacrificing Performance

Contributors

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Contributors 4

Uh oh!

Languages