
Commit 5120cfe

Bugfix of build command
1 parent 261787e commit 5120cfe

4 files changed: +20 −5 lines changed

Dockerfile.spark

Lines changed: 3 additions & 0 deletions
@@ -16,5 +16,8 @@ RUN java -version
 # Install pandas
 RUN pip install pandas
 
+# Install build
+RUN pip install build
+
 # Install PySpark
 RUN pip install pyspark
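The `build` package installed here is the PyPA build frontend that Dockerfile.test invokes below. A minimal sketch of what it does, assuming a `pyproject.toml` in the working directory:

```bash
# Install the PEP 517 build frontend, then build the package.
# With no path argument, "python -m build" reads pyproject.toml in the
# current directory and writes an sdist and a wheel to dist/.
pip install build
python -m build
```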

Dockerfile.test

Lines changed: 6 additions & 3 deletions
@@ -1,12 +1,15 @@
-FROM spark_docker_v2
+FROM spark_docker_base
 
 # Build the package
-RUN python setup.py sdist bdist_wheel
+#RUN python setup.py sdist bdist_wheel
 
 # Add the distribution
 COPY src src
 
-RUN python -m build src
+# Add the config files
+ADD pyproject.toml pyproject.toml
+
+RUN python -m build
 
 # Install the package
 RUN pip install dist/*.whl
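The bugfix itself is the last hunk: `python -m build src` looked for build configuration inside `src/`, but `pyproject.toml` is added at the working-directory root, so the command now builds from there instead. A sketch of the layout the fixed Dockerfile assumes:

```bash
# Layout inside the image's working directory:
# .
# ├── pyproject.toml   # ADDed above; read by python -m build
# └── src/             # COPYed package sources
python -m build           # build from ., not from src/
pip install dist/*.whl    # install the freshly built wheel
```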

readme.md

Lines changed: 7 additions & 0 deletions
@@ -24,6 +24,13 @@ Run tests by building the Dockerfile.test file using
 ```bash
 docker build -f Dockerfile.test -t test_package .
 ```
+
+If you are running the tests for the first time, you first have to build the base Dockerfile containing PySpark:
+
+```bash
+docker build -f Dockerfile.spark -t spark_docker_base .
+```
+
 ### Usage
 First import the required function
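Taken together, the readme now prescribes a two-step build on a fresh checkout:

```bash
# First build the base image with Java, pandas, build, and PySpark,
docker build -f Dockerfile.spark -t spark_docker_base .
# then the test image, which starts FROM spark_docker_base.
docker build -f Dockerfile.test -t test_package .
```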

tests/test_text.py

Lines changed: 4 additions & 2 deletions
@@ -12,11 +12,13 @@
 class TestVectorDataFrame(unittest.TestCase):
     def test_vector_dataframe(self):
         spark = SparkSession.builder.master("local").getOrCreate()
-        with open(os.path.join(DATA_SET_PATH, "text/example_annotation.json"), "r") as f:
+        with open(os.path.join(DATA_SET_PATH, "text/example_annotation.json"),
+                  "r") as f:
             data = json.load(f)
 
         actual_df = get_text_dataframe([data], spark)
 
-        expected_df = spark.read.parquet(os.path.join(DATA_SET_PATH, "text/expected_df.parquet"))
+        expected_df = spark.read.parquet(os.path.join(
+            DATA_SET_PATH, "text/expected_df.parquet"))
         self.assertEqual(sorted(actual_df.collect()),
                          sorted(expected_df.collect()))
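The test changes are purely cosmetic line wrapping. For reference, a hypothetical way to run this test directly, assuming PySpark and the package are installed in the local environment rather than the Docker image:

```bash
python -m unittest tests.test_text
```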
