Commit 9b3371e

Updating documentation
Still need to finish up contributors, but other than that all documentation should be finalized and ready to go.
1 parent: 81acf48

6 files changed: 103 additions & 52 deletions

README.md

Lines changed: 79 additions & 41 deletions
````diff
@@ -1,61 +1,95 @@
 # Parallel Database Query Processing System

-## Overview
-
 This project implements a parallel query-processing engine designed to run SQL-like queries over large, structured datasets using high-performance computing techniques. The goal is to build a lightweight, command-line database system that supports fast data ingestion, indexing, and query execution using a combination of:

-- B+ tree based storage
+- B+ tree based indexing
 - Serial, OpenMP, and MPI execution modes
 - Parallel query evaluation and parallel data scanning

 The system provides a full pipeline—from data generation to query parsing to parallel execution—making it useful for system administrators who need a fast, embeddable tool for scanning large logs or structured records.

-<!-- Should modify this later
-## Expected Components
+---
+
+<!-- How to compile and run your programs (including how to generate the data) (makefile and python file) -->
+## Compilation & Execution

-* **`QPESeq.c`** — Serial query processing engine.
-* **`QPEOMP.c`** — Parallel version using **OpenMP**.
-* **`QPEMPI.c`** — Parallel version using **MPI**.
-* **`Proj2.pdf`** — Documentation and runtime analysis.
-* **`db.txt`** — Sample generated dataset.
-* **`sample-queries.txt`** — Sample SQL-like queries.
+Within this project, we have helpers for downloading dependencies, generating synthetic data, and executing the programs.

--->
+### Downloading Dependencies
+To ensure that all requirements are satisfied, run the convenience script `requirements.sh`:

-## Current File Structure
+```bash
+bash requirements.sh
+```

-* **data-generation/** - Schema and scripts for generating log data
-* **engine/** - B+ tree implementation and query functionality (serial/parallel)
-* **include/** - Header files
-* **tokenizer/** - Command tokenizing functionality for main program
-* **docs/** - Various MD documentation on design choices and architectural motivation, as well as reports
-* **QPESeq.c** - Main serial implementation, using the serial engine
-* **QPEOMP.c** - Main parallel implementation, using the OpenMP engine
-* **QPEMPI.c** - Main parallel implementation, using the OpenMPI engine
+### Generating synthetic data

-## Compilation & Execution
+Our data generation helper (`generate_commands.py`) draws on a bank of known commands to randomly generate the requested number of records. The script takes two parameters: a required number of tuples to generate (50,000 in the example below) and an optional filename to save the output to.

 ```bash
-# Serial execution
-gcc QPESeq.c -o QPESeq
-./QPESeq db.txt sql.txt
+python generate_commands.py 50000
+```

-w/ makefile: make run
+### Executing the QPE Files

-# OpenMP version
-gcc -fopenmp QPEOMP.c -o QPEOMP
-./QPEOMP db.txt sql.txt
+To run the full tests, use the predefined targets in the `makefile`.

-w/ makefile: make run-omp
+First, to **compile** all relevant `.c` files:

-# MPI version
-mpicc QPEMPI.c -o QPEMPI
-mpirun -np <num_processes> ./QPEMPI db.txt sql.txt
+```bash
+# Compile and link all relevant files
+make
+```
+
+Once all files are compiled, we can use the other makefile targets to run each version of our QPE testing programs:
+
+**Serial**:

-w/ makefile: make run-mpi
+```bash
+# Serial version
+make run
 ```

-<!--
+**OpenMP Parallel Version**:
+```bash
+# OpenMP Version
+make run-omp
+```
+
+**OpenMPI Parallel Version**:
+```bash
+# OpenMPI Version
+make run-mpi
+```
+
+Once testing is complete, run `make clean` to remove all build artifacts and object files.
+
+---
+
+## File Structure
+
+project-root/
+├── build/ # Compiled binaries and test executables
+├── data-generation/ # Scripts for generating synthetic datasets
+├── docs/ # Project documentation, diagrams, design notes
+├── engine/ # Core database engine implementation
+│ ├── mpi/ # MPI-specific build + execution logic
+│ ├── omp/ # OpenMP-specific build + execution logic
+│ ├── serial/ # Serial build + execution logic
+│ └── bplus.c # B+ Tree data structure implementation
+├── include/ # Shared headers across modules
+├── tests/ # Unit test cases + verification utilities
+├── tokenizer/ # SQL parsing + tokenization logic
+├── connectEngine.c # Bridge between parser and execution engines
+├── makefile # Build rules and compiler instructions
+├── QPEMPI.c # Main entry for MPI execution engine
+├── QPEOMP.c # Main entry for OpenMP execution engine
+├── QPESeq.c # Main entry for serial execution engine
+├── requirements.sh # Environment + dependency setup script
+└── sample-queries.txt # Example queries for debugging + validation
+
+---
+
 ## Report & Analysis

 See **Proj2.pdf** for:
@@ -64,11 +98,15 @@ See **Proj2.pdf** for:
 * Scalability with increased problem size
 * Optimal thread and process count for performance

-## Contributors
+## Ensuring Correctness Without Sacrificing Performance

-* *Name A*: Data generation & serial QPE
-* *Name B*: OpenMP implementation
-* *Name C*: MPI implementation & runtime analysis
--->
----
+We verified accuracy through targeted testing and edge-case checks, while profiling and optimizing critical paths to keep execution fast. This continual cycle of testing and refinement ensured the system remained both correct and efficient.
+
+<!-- TODO Update these with finished deliverables -->
+## Contributors

+* *JJ McCauley*: Serial engines, makefiles/testing, docs, & QPE testing files
+* *Ian Davis*:
+* *Anthony Czerwinski*: Sample queries & Select parallelizations
+* *Sam Dickerson*: Parser & Insert parallelizations
+* *Logan Kelsch*: Data generation &
````
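The README lists "parallel query evaluation and parallel data scanning" under the OpenMP execution mode. As a minimal sketch of what a parallel scan can look like (assuming a hypothetical, simplified `scan_row` type and a `risk_level >= N` predicate, neither taken from the project's engine), the two functions below count matching rows serially and with an OpenMP reduction. Keeping the serial version next to the parallel one also gives a reference result to check the parallel path against.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified row type: the real record layout lives in the
 * project's engine and may differ. */
typedef struct {
    int user_id;
    int risk_level;
} scan_row;

/* Serial reference scan: count rows with risk_level >= min_risk. */
size_t count_risky_serial(const scan_row *rows, size_t n, int min_risk)
{
    assert(rows != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (rows[i].risk_level >= min_risk)
            count++;
    return count;
}

/* Parallel scan: OpenMP splits the loop iterations across threads, each
 * thread keeps a private count, and reduction(+:count) sums them at the end.
 * Compiled without -fopenmp, the pragma is ignored and this runs serially. */
size_t count_risky_omp(const scan_row *rows, size_t n, int min_risk)
{
    assert(rows != NULL || n == 0);
    size_t count = 0;
    #pragma omp parallel for reduction(+:count) schedule(static)
    for (size_t i = 0; i < n; i++)
        if (rows[i].risk_level >= min_risk)
            count++;
    return count;
}
```

Build with `gcc -fopenmp` to get the threaded version. The thread count can be varied with the standard `OMP_NUM_THREADS` environment variable, which is one way to probe the thread-count question the report discusses.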

db.csv

Lines changed: 0 additions & 2 deletions
This file was deleted.

docs/bplus.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -47,10 +47,10 @@ The engine is structured into three core components:
 1. **bplus.c**
 Implements the B+ tree storage structure itself.

-2. **buildEngine-serial.c**
+2. **buildEngine-*.c** (serial, omp, mpi)
 Builds tables and creates B+ tree indexes over chosen attributes.

-3. **queryEngine-serial.c**
+3. **executeEngine-*.c** (serial, omp, mpi)
 Executes commands (SELECT, WHERE) and uses the B+ tree for fast lookups.

 ### Index Lifecycle
```
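bplus.md describes `bplus.c` as the storage structure and the execute engines as using it for fast lookups. The sketch below illustrates why a B+ tree suits SELECT/WHERE workloads: a point lookup descends from the root to a single leaf, and a range predicate finds its first leaf once and then follows the linked leaf chain. The node layout, `BPT_ORDER`, and all function names are hypothetical and not taken from `bplus.c`; insertion and node splitting are omitted.

```c
#include <assert.h>
#include <stddef.h>

#define BPT_ORDER 4 /* hypothetical fan-out: up to 4 children, 3 keys per node */

typedef struct bpt_node {
    int is_leaf;
    int num_keys;
    long keys[BPT_ORDER - 1];
    struct bpt_node *children[BPT_ORDER]; /* internal nodes: num_keys + 1 children */
    int values[BPT_ORDER - 1];            /* leaves: e.g. row positions in the table */
    struct bpt_node *next;                /* leaves: right sibling, for range scans */
} bpt_node;

/* Descend from the root to the leaf that would hold `key`.
 * Convention: keys[i] in an internal node is the smallest key in children[i + 1]. */
static const bpt_node *bpt_find_leaf(const bpt_node *node, long key)
{
    assert(node != NULL);
    while (!node->is_leaf) {
        int i = 0;
        while (i < node->num_keys && key >= node->keys[i])
            i++;
        node = node->children[i];
    }
    return node;
}

/* Point lookup (e.g. WHERE id = key). Returns 1 and sets *value if found. */
int bpt_lookup(const bpt_node *root, long key, int *value)
{
    const bpt_node *leaf = bpt_find_leaf(root, key);
    for (int i = 0; i < leaf->num_keys; i++) {
        if (leaf->keys[i] == key) {
            *value = leaf->values[i];
            return 1;
        }
    }
    return 0;
}

/* Range count (e.g. WHERE id BETWEEN lo AND hi): locate the first leaf,
 * then walk the sibling chain instead of re-descending the tree. */
size_t bpt_count_range(const bpt_node *root, long lo, long hi)
{
    size_t count = 0;
    for (const bpt_node *leaf = bpt_find_leaf(root, lo); leaf != NULL; leaf = leaf->next) {
        for (int i = 0; i < leaf->num_keys; i++) {
            if (leaf->keys[i] > hi)
                return count; /* keys are sorted, so nothing further can match */
            if (leaf->keys[i] >= lo)
                count++;
        }
    }
    return count;
}
```

A complete implementation would also keep leaves at least half full by splitting and merging nodes on INSERT and DELETE; that bookkeeping is what keeps every lookup path the same short length.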

docs/engine.md

Lines changed: 5 additions & 3 deletions
```diff
@@ -1,6 +1,6 @@
 # Engine Documentation

-This document is a comprehensive technical reference for the Serial engine used by the Parallel-Query-Processing-System repository. It describes the B+ tree index implementation, build utilities that load CSV data and construct indexes, and the execute engine that implements SQL-like operations (SELECT, INSERT, DELETE). The goal is to make the codebase easy to understand for contributors who need to maintain, extend, or benchmark the engine.
+This document is a comprehensive technical reference for the engines used by the Parallel-Query-Processing-System repository. It describes the B+ tree index implementation, build utilities that load CSV data and construct indexes, and the execute engines that implement SQL-like operations (SELECT, INSERT, DELETE). While the Serial engine is described in detail, the OpenMP and MPI implementations follow a similar structure.

 Table of Contents
 - Section 1 — B+ Tree (structure, public API, internals)
@@ -62,10 +62,11 @@ Design notes and caveats

 ## Section 2 — Build Engine

-Files: `engine/serial/buildEngine-serial.c`, `include/buildEngine-serial.h`, `engine/recordSchema.c`, `include/recordSchema.h`
+Files: `engine/*/buildEngine-*.c`, `include/buildEngine-*.h`, `engine/recordSchema.c`, `include/recordSchema.h`

 Purpose
 - Load CSV data into an in-memory `record **` representation and provide helpers to build B+ tree indexes from those records.
+- Note: Each implementation (Serial, OpenMP, MPI) has its own build engine file (e.g., `buildEngine-serial.c`, `buildEngine-omp.c`, `buildEngine-mpi.c`).

 Key functions and behavior
 - `record **getAllRecordsFromFile(const char *filepath, int *num_records)`
@@ -92,10 +93,11 @@ Design notes

 ## Section 3 — Execute Engine

-Files: `engine/serial/executeEngine-serial.c`, `include/executeEngine-serial.h`
+Files: `engine/*/executeEngine-*.c`, `include/executeEngine-*.h`

 Purpose
 - Implements application-level query execution (SELECT / INSERT / DELETE) and query predicate evaluation. Connects in-memory records, persistent CSV storage, and B+ tree indexes.
+- Note: Each implementation (Serial, OpenMP, MPI) has its own execute engine file.

 Core types
 - `struct engineS` — engine state with fields for in-memory records, index roots, index metadata, and the CSV datafile path.
```
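The build engine's first job is loading CSV rows into an in-memory `record **` array. The following is a minimal sketch of that step under stated assumptions: the five-column layout, the column order, the presence of a header row, and the names `load_records_csv`, `load_records_from_stream`, and `free_records` are all hypothetical. Only the overall shape (an array of record pointers plus an `int *num_records` out-parameter) mirrors the documented `getAllRecordsFromFile` signature.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical record layout and CSV column order (assumptions, not the
 * project's recordSchema.h):
 * command_id,user_id,risk_level,exit_code,sudo_used */
typedef struct {
    uint64_t command_id;
    int user_id;
    int risk_level;
    int exit_code;
    int sudo_used; /* 0 or 1 */
} record;

/* Free every record and then the pointer array itself. */
void free_records(record **rows, int num_records)
{
    for (int i = 0; i < num_records; i++)
        free(rows[i]);
    free(rows);
}

/* Parse CSV rows from an open stream into a heap array of record pointers.
 * Skips the first (header) line and any malformed rows.
 * Returns NULL on allocation failure; *num_records receives the row count. */
record **load_records_from_stream(FILE *fp, int *num_records)
{
    assert(fp != NULL && num_records != NULL);
    *num_records = 0;
    int capacity = 64, count = 0;
    record **rows = malloc((size_t)capacity * sizeof *rows);
    if (rows == NULL)
        return NULL;

    char line[512];
    int is_header = 1;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (is_header) { /* assumed header row */
            is_header = 0;
            continue;
        }
        record tmp;
        unsigned long long id;
        if (sscanf(line, "%llu,%d,%d,%d,%d", &id, &tmp.user_id, &tmp.risk_level,
                   &tmp.exit_code, &tmp.sudo_used) != 5)
            continue; /* malformed row */
        tmp.command_id = (uint64_t)id;

        if (count == capacity) { /* double the pointer array when full */
            record **grown = realloc(rows, (size_t)capacity * 2 * sizeof *grown);
            if (grown == NULL) {
                free_records(rows, count);
                return NULL;
            }
            rows = grown;
            capacity *= 2;
        }
        rows[count] = malloc(sizeof *rows[count]);
        if (rows[count] == NULL) {
            free_records(rows, count);
            return NULL;
        }
        *rows[count] = tmp;
        count++;
    }
    *num_records = count;
    return rows;
}

/* Path-based wrapper with the same shape as getAllRecordsFromFile(). */
record **load_records_csv(const char *filepath, int *num_records)
{
    assert(filepath != NULL && num_records != NULL);
    *num_records = 0;
    FILE *fp = fopen(filepath, "r");
    if (fp == NULL)
        return NULL;
    record **rows = load_records_from_stream(fp, num_records);
    fclose(fp);
    return rows;
}
```

One practical property of a `record **` layout: when the pointer array is reallocated to grow, the individual records stay where they are, so any index leaf that stores a `record *` remains valid. A flat `record *` array would not give that guarantee.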

docs/fileStructure.md

Lines changed: 12 additions & 3 deletions
```diff
@@ -14,7 +14,16 @@ Documentation and reports. Used mostly for developer experience and to help expl

 ## `/engine`

-This serves as the **main powerhouse** of the program. This holds the B+ tree implementation (*bplus-x.c*), utility functions for *building* the different trees that will be used for indexing (*buildEngine-x.c*), and various functions that can be used for executing specific commands (*executeEngine-x.c*). The execute file will hold the specific commands for things such as SELECT, WHERE, INSERT, DELETE, etc. This should be used as a means to abstract the lower-level functionality to be used in the root-level files (*QPEx.c*). For more of an explanation of how this will work, see the **bplus.md** md file in the docs folder.
+This serves as the **main powerhouse** of the program. It contains:
+- `bplus.c`: The core B+ tree implementation.
+- `recordSchema.c`: Schema definitions and helpers.
+- `serial/`, `omp/`, `mpi/`: Subdirectories containing specific implementations for Serial, OpenMP, and MPI execution engines.
+
+Each subdirectory contains:
+- `buildEngine-*.c`: Utility functions for *building* the indexes.
+- `executeEngine-*.c`: Functions for executing specific commands (SELECT, INSERT, DELETE).
+
+This structure abstracts lower-level functionality for use in the root-level files (`QPE*.c`). For more explanation, see `bplus.md` and `engine.md`.

 ## `/include`

@@ -28,6 +37,6 @@ Any basic tests ran during development to verify the functionality of any utilit

 Given a string SQL command, will parse it to determine the actual functionality desired by the user.

-## `QPE.c`
+## `QPE*.c`

-The *QPEMPI.c*, *QPEOMP.c*, and *QPESeq.c* uses the wrapper functions in the `/engine` directory to perform high-level queries. For now, these files should read in each command in the `sample-queries.txt` file, use the parser to determine which specific functionality is desired, then run it through the engine to get the specific results. This will also perform high-level benchmarking.
+The `QPEMPI.c`, `QPEOMP.c`, and `QPESeq.c` files use the wrapper functions in the `/engine` directory to perform high-level queries. These files read commands from `sample-queries.txt`, use the tokenizer to parse them, and then run them through the appropriate engine (Serial, OpenMP, or MPI) to get results. They also handle high-level benchmarking.
```
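fileStructure.md says the `QPE*.c` drivers read commands from `sample-queries.txt`, parse them, dispatch them to an engine, and benchmark the run. A minimal sketch of that driver loop follows. The keyword-prefix classifier is a hypothetical stand-in for the real tokenizer, the engine call is left as a comment, and the names `cmd_kind`, `classify_command`, and `run_query_stream` are illustrative only.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical command categories; the real tokenizer produces richer output. */
typedef enum { CMD_SELECT, CMD_INSERT, CMD_DELETE, CMD_UNKNOWN, CMD_KIND_COUNT } cmd_kind;

/* Case-insensitive match of `kw` (uppercase) at the start of `line`,
 * followed by whitespace or the end of the string. */
static int starts_with_keyword(const char *line, const char *kw)
{
    size_t i = 0;
    for (; kw[i] != '\0'; i++)
        if (toupper((unsigned char)line[i]) != kw[i])
            return 0;
    return line[i] == '\0' || isspace((unsigned char)line[i]);
}

/* Stand-in for the tokenizer: classify a command by its leading keyword. */
cmd_kind classify_command(const char *line)
{
    assert(line != NULL);
    while (isspace((unsigned char)*line))
        line++;
    if (starts_with_keyword(line, "SELECT")) return CMD_SELECT;
    if (starts_with_keyword(line, "INSERT")) return CMD_INSERT;
    if (starts_with_keyword(line, "DELETE")) return CMD_DELETE;
    return CMD_UNKNOWN;
}

/* Read one command per line, skip blank lines, tally each command kind,
 * and return the processor time (in seconds) spent on the batch. */
double run_query_stream(FILE *in, int counts[CMD_KIND_COUNT])
{
    assert(in != NULL && counts != NULL);
    char line[1024];
    memset(counts, 0, CMD_KIND_COUNT * sizeof counts[0]);
    clock_t start = clock();
    while (fgets(line, sizeof line, in)) {
        line[strcspn(line, "\r\n")] = '\0';   /* strip the newline */
        if (line[strspn(line, " \t")] == '\0') /* blank line */
            continue;
        cmd_kind kind = classify_command(line);
        counts[kind]++;
        /* A real driver would hand the parsed command to the serial,
         * OpenMP, or MPI engine here. */
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

The sketch uses `clock()` only because it is portable C. It reports processor time, which on common platforms is summed across all threads of the process, so for the OpenMP and MPI builds a wall-clock timer such as `omp_get_wtime()` or `MPI_Wtime()` is the more appropriate benchmark measure.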

docs/schema.md

Lines changed: 5 additions & 1 deletion
```diff
@@ -23,7 +23,11 @@

 There should be minimal default row indexes, as each one will require a separate B+ tree to be stored in memory. Below are the currently chosen *default* indexes:

-- TODO
+- `command_id` (UINT64)
+- `user_id` (INT)
+- `risk_level` (INT)
+- `exit_code` (INT)
+- `sudo_used` (BOOL)

 ## Generation

```
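schema.md lists five default indexes of three different types, each backed by its own B+ tree. One possible way to drive that from a single table is sketched below: each descriptor records a column's name, type, and offset inside the row, and a type-aware comparator orders two rows on that column. The `cmd_row` layout, `index_desc`, `DEFAULT_INDEXES`, and the function names are hypothetical, not the project's schema code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { COL_UINT64, COL_INT, COL_BOOL } col_type;

/* Hypothetical in-memory layout holding just the indexed columns. */
typedef struct {
    uint64_t command_id;
    int user_id;
    int risk_level;
    int exit_code;
    bool sudo_used;
} cmd_row;

/* One descriptor per default index: column name, type, and location in cmd_row. */
typedef struct {
    const char *column;
    col_type type;
    size_t offset;
} index_desc;

/* The default indexes from schema.md; each would back its own B+ tree. */
static const index_desc DEFAULT_INDEXES[] = {
    {"command_id", COL_UINT64, offsetof(cmd_row, command_id)},
    {"user_id",    COL_INT,    offsetof(cmd_row, user_id)},
    {"risk_level", COL_INT,    offsetof(cmd_row, risk_level)},
    {"exit_code",  COL_INT,    offsetof(cmd_row, exit_code)},
    {"sudo_used",  COL_BOOL,   offsetof(cmd_row, sudo_used)},
};
enum { NUM_DEFAULT_INDEXES = sizeof DEFAULT_INDEXES / sizeof DEFAULT_INDEXES[0] };

/* Look up a column's default index; NULL means no index (full scan instead). */
const index_desc *find_default_index(const char *column)
{
    assert(column != NULL);
    for (size_t i = 0; i < NUM_DEFAULT_INDEXES; i++)
        if (strcmp(DEFAULT_INDEXES[i].column, column) == 0)
            return &DEFAULT_INDEXES[i];
    return NULL;
}

/* Three-way comparison of two rows on one indexed column, using the column's
 * declared type so large UINT64 ids and negative ints order correctly. */
int compare_on_index(const cmd_row *a, const cmd_row *b, const index_desc *idx)
{
    assert(a != NULL && b != NULL && idx != NULL);
    const char *pa = (const char *)a + idx->offset;
    const char *pb = (const char *)b + idx->offset;
    switch (idx->type) {
    case COL_UINT64: {
        uint64_t x = *(const uint64_t *)pa, y = *(const uint64_t *)pb;
        return (x > y) - (x < y);
    }
    case COL_INT: {
        int x = *(const int *)pa, y = *(const int *)pb;
        return (x > y) - (x < y);
    }
    case COL_BOOL: {
        bool x = *(const bool *)pa, y = *(const bool *)pb;
        return (int)x - (int)y;
    }
    }
    return 0;
}
```

Under this design, a build step could loop over `DEFAULT_INDEXES` and insert every row into one tree per descriptor, using `compare_on_index` as the key ordering. Comparing through the declared type avoids overflow and sign surprises, for example a `command_id` above `INT64_MAX` or a negative `exit_code`.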