This project implements a parallel query-processing engine designed to run SQL-like queries over large, structured datasets using high-performance computing techniques. The goal is to build a lightweight, command-line database system that supports fast data ingestion, indexing, and query execution using a combination of:

-
- B+ tree based storage
+
- B+ tree based indexing
- Serial, OpenMP, and MPI execution modes
- Parallel query evaluation and parallel data scanning

The system provides a full pipeline—from data generation to query parsing to parallel execution—making it useful for system administrators who need a fast, embeddable tool for scanning large logs or structured records.

-
<!-- Should modify this later
-
## Expected Components
+
---
+

+
<!-- How to compile and run your programs (including how to generate the data) (makefile and python file) -->
+
## Compilation & Execution

-
* **`QPESeq.c`** — Serial query processing engine.
-
* **`QPEOMP.c`** — Parallel version using **OpenMP**.
-
* **`QPEMPI.c`** — Parallel version using **MPI**.
-
* **`Proj2.pdf`** — Documentation and runtime analysis.
Within this project, we have helpers for downloading dependencies, generating synthetic data, and executing the programs.

-
-->
+
### Downloading Dependencies
+
To ensure that all requirements are satisfied, run the convenience script `requirements.sh`:

-
## Current File Structure
+
```bash
+
bash requirements.sh
+
```

-
* **data-generation/** - Schema and scripts for generating log data
-
* **engine/** - B+ tree implementation and query functionality (serial/parallel)
-
* **include/** - Header files
-
* **tokenizer/** - Command tokenizing functionality for main program
-
* **docs/** - Various MD documentation on design choices and architectural motivation, as well as reports
-
* **QPESeq.c** - Main serial implementation, using the serial engine
-
* **QPEOMP.c** - Main parallel implementation, using the OpenMP engine
-
* **QPEMPI.c** - Main parallel implementation, using the OpenMPI engine
+
### Generating synthetic data

-
## Compilation & Execution
+
Our data generation helper (`generate_commands.py`) draws from a bank of known commands to randomly generate the requested amount of data. The script takes two parameters: a required number of tuples to generate (50,000 in the example below) and an optional filename to save to.

```bash
-
# Serial execution
-
gcc QPESeq.c -o QPESeq
-
./QPESeq db.txt sql.txt
+
python generate_commands.py 50000
+
```

-
w/ makefile: make run
+
### Executing the QPE Files

-
# OpenMP version
-
gcc -fopenmp QPEOMP.c -o QPEOMP
-
./QPEOMP db.txt sql.txt
+
To execute our full tests, we can utilize predefined configs in the `makefile`.
└── sample-queries.txt # Example queries for debugging + validation
+

+
---
+

## Report & Analysis

See **Proj2.pdf** for:
@@ -64,11 +98,15 @@ See **Proj2.pdf** for:
* Scalability with increased problem size
* Optimal thread and process count for performance

-
## Contributors
+
## Ensuring Correctness Without Sacrificing Performance

-
* *Name A*: Data generation & serial QPE
-
* *Name B*: OpenMP implementation
-
* *Name C*: MPI implementation & runtime analysis
-
-->
-
---
+
We verified accuracy through targeted testing and edge-case checks, while profiling and optimizing critical paths to keep execution fast. This continual cycle of testing and refinement ensured the system remained both correct and efficient.
+

+
<!-- TODO Update these with finished deliverables -->
+
## Contributors

+
* *JJ McCauley*: Serial engines, makefiles/testing, docs, & QPE testing files
docs/engine.md
5 additions & 3 deletions
@@ -1,6 +1,6 @@
# Engine Documentation

-
This document is a comprehensive technical reference for the Serial engine used by the Parallel-Query-Processing-System repository. It describes the B+ tree index implementation, build utilities that load CSV data and construct indexes, and the execute engine that implements SQL-like operations (SELECT, INSERT, DELETE). The goal is to make the codebase easy to understand for contributors who need to maintain, extend, or benchmark the engine.
+
This document is a comprehensive technical reference for the engines used by the Parallel-Query-Processing-System repository. It describes the B+ tree index implementation, build utilities that load CSV data and construct indexes, and the execute engines that implement SQL-like operations (SELECT, INSERT, DELETE). While the Serial engine is described in detail, the OpenMP and MPI implementations follow a similar structure.

Table of Contents
- Section 1 — B+ Tree (structure, public API, internals)
- Load CSV data into an in-memory `record **` representation and provide helpers to build B+ tree indexes from those records.
+
- Note: Each implementation (Serial, OpenMP, MPI) has its own build engine file (e.g., `buildEngine-serial.c`, `buildEngine-omp.c`, `buildEngine-mpi.c`).

Key functions and behavior
- `record **getAllRecordsFromFile(const char *filepath, int *num_records)`
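
To make the `record **` layout concrete, here is a minimal C sketch of a loader with the same shape as `getAllRecordsFromFile`. The two-field `record` struct, the `id,message` line format, and the names `load_records_sketch` and `free_records_sketch` are assumptions made for illustration; the repository's actual schema and parsing may differ.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical two-field record; the real schema lives in the repository. */
typedef struct {
    int id;
    char message[64];
} record;

/* Reads "id,message" lines from filepath into a heap-allocated array of
 * record pointers. Lines that do not match the format are skipped.
 * Returns NULL and sets *num_records to 0 if the file cannot be opened. */
record **load_records_sketch(const char *filepath, int *num_records) {
    *num_records = 0;
    FILE *fp = fopen(filepath, "r");
    if (fp == NULL) return NULL;

    int capacity = 16;
    int count = 0;
    record **records = malloc((size_t)capacity * sizeof *records);
    if (records == NULL) { fclose(fp); return NULL; }

    char line[256];
    while (fgets(line, sizeof line, fp) != NULL) {
        record parsed;
        /* %63[^\r\n] copies the rest of the line, stopping at the newline. */
        if (sscanf(line, "%d,%63[^\r\n]", &parsed.id, parsed.message) != 2) {
            continue;
        }
        if (count == capacity) {
            /* Double the pointer array when it fills up. */
            record **grown = realloc(records, (size_t)capacity * 2 * sizeof *records);
            if (grown == NULL) break;
            records = grown;
            capacity *= 2;
        }
        records[count] = malloc(sizeof **records);
        if (records[count] == NULL) break;
        *records[count] = parsed;
        count++;
    }
    fclose(fp);
    *num_records = count;
    return records;
}

/* Releases every record and then the pointer array itself. */
void free_records_sketch(record **records, int num_records) {
    if (records == NULL) return;
    for (int i = 0; i < num_records; i++) free(records[i]);
    free(records);
}
```

Storing pointers rather than structs in the array means an index can hold a `record *` that stays valid when the array is resized, which is one plausible reason for a `record **` representation.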
docs/fileStructure.md
12 additions & 3 deletions
@@ -14,7 +14,16 @@ Documentation and reports. Used mostly for developer experience and to help expl

## `/engine`

-
This serves as the **main powerhouse** of the program. This holds the B+ tree implementation (*bplus-x.c*), utility functions for *building* the different trees that will be used for indexing (*buildEngine-x.c*), and various functions that can be used for executing specific commands (*executeEngine-x.c*). The execute file will hold the specific commands for things such as SELECT, WHERE, INSERT, DELETE, etc. This should be used as a means to abstract the lower-level functionality to be used in the root-level files (*QPEx.c*). For more of an explanation of how this will work, see the **bplus.md** md file in the docs folder.
+
This serves as the **main powerhouse** of the program. It contains:
+
- `bplus.c`: The core B+ tree implementation.
+
- `recordSchema.c`: Schema definitions and helpers.
+
- `serial/`, `omp/`, `mpi/`: Subdirectories containing specific implementations for Serial, OpenMP, and MPI execution engines.
+

+
Each subdirectory contains:
+
- `buildEngine-*.c`: Utility functions for *building* the indexes.
+
- `executeEngine-*.c`: Functions for executing specific commands (SELECT, INSERT, DELETE).
+

+
This structure abstracts lower-level functionality for use in the root-level files (`QPE*.c`). For more explanation, see `bplus.md` and `engine.md`.

## `/include`

@@ -28,6 +37,6 @@ Any basic tests ran during development to verify the functionality of any utilit

Given a string SQL command, will parse it to determine the actual functionality desired by the user.

-
## `QPE.c`
+
## `QPE*.c`

-
The *QPEMPI.c*, *QPEOMP.c*, and *QPESeq.c* uses the wrapper functions in the `/engine` directory to perform high-level queries. For now, these files should read in each command in the `sample-queries.txt` file, use the parser to determine which specific functionality is desired, then run it through the engine to get the specific results. This will also perform high-level benchmarking.
+
The `QPEMPI.c`, `QPEOMP.c`, and `QPESeq.c` files use the wrapper functions in the `/engine` directory to perform high-level queries. These files read commands from `sample-queries.txt`, use the tokenizer to parse them, and then run them through the appropriate engine (Serial, OpenMP, or MPI) to get results. They also handle high-level benchmarking.
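
The read, parse, execute, and benchmark loop described above might look roughly like this in C. Everything here is hypothetical: `classify_command` stands in for the tokenizer, the per-kind tally stands in for the engine calls, and timing uses the standard `clock()`, which reports CPU time. An OpenMP or MPI build would more likely use `omp_get_wtime()` or `MPI_Wtime()` to measure wall-clock time.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical command kinds; CMD_KIND_COUNT is the number of kinds. */
typedef enum {
    CMD_SELECT, CMD_INSERT, CMD_DELETE, CMD_UNKNOWN, CMD_KIND_COUNT
} command_kind;

/* Case-insensitive check that text begins with the upper-case keyword,
 * followed by whitespace or the end of the string. */
static int starts_with_keyword(const char *text, const char *keyword) {
    size_t i = 0;
    for (; keyword[i] != '\0'; i++) {
        if (toupper((unsigned char)text[i]) != keyword[i]) return 0;
    }
    return text[i] == '\0' || isspace((unsigned char)text[i]);
}

/* Stand-in for the tokenizer: looks only at the leading keyword. */
command_kind classify_command(const char *command) {
    while (isspace((unsigned char)*command)) command++;
    if (starts_with_keyword(command, "SELECT")) return CMD_SELECT;
    if (starts_with_keyword(command, "INSERT")) return CMD_INSERT;
    if (starts_with_keyword(command, "DELETE")) return CMD_DELETE;
    return CMD_UNKNOWN;
}

/* Reads one command per line from path, tallies each kind in counts, and
 * returns the CPU seconds spent in the loop, or -1.0 if the file cannot
 * be opened. Blank lines are tallied as CMD_UNKNOWN. */
double run_commands_sketch(const char *path, int counts[CMD_KIND_COUNT]) {
    memset(counts, 0, CMD_KIND_COUNT * sizeof counts[0]);
    FILE *fp = fopen(path, "r");
    if (fp == NULL) return -1.0;

    char line[512];
    clock_t start = clock();
    while (fgets(line, sizeof line, fp) != NULL) {
        /* A real driver would dispatch to the engine here. */
        counts[classify_command(line)]++;
    }
    clock_t end = clock();

    fclose(fp);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

A driver would call `run_commands_sketch("sample-queries.txt", counts)` and print the elapsed time next to the tallies.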
docs/schema.md
5 additions & 1 deletion
@@ -23,7 +23,11 @@

There should be minimal default row indexes, as each one will require a separate B+ tree to be stored in memory. Below are the currently chosen *default* indexes: