4 files changed +20 -5 lines changed
@@ -16,5 +16,8 @@ RUN java -version
 # Install pandas
 RUN pip install pandas
 
+# Install build
+RUN pip install build
+
 # Install PySpark
 RUN pip install pyspark

-FROM spark_docker_v2
+FROM spark_docker_base
 
 # Build the package
-RUN python setup.py sdist bdist_wheel
+# RUN python setup.py sdist bdist_wheel
 
 # Add the distribution
 COPY src src
 
-RUN python -m build src
+# Add the config files
+ADD pyproject.toml pyproject.toml
+
+RUN python -m build
 
 # Install the package
 RUN pip install dist/*.whl

@@ -24,6 +24,13 @@ Run tests by building the Dockerfile.test file using
 ```bash
 docker build -f Dockerfile.test -t test_package .
 ```
+
+If you are running the tests for the first time, you must first build the base Dockerfile containing PySpark.
+
+```bash
+docker build -f Dockerfile.spark -t spark_docker_base .
+```
+
 ### Usage
 First import the required function
 

 class TestVectorDataFrame(unittest.TestCase):
     def test_vector_dataframe(self):
         spark = SparkSession.builder.master("local").getOrCreate()
-        with open(os.path.join(DATA_SET_PATH, "text/example_annotation.json"), "r") as f:
+        with open(os.path.join(DATA_SET_PATH, "text/example_annotation.json"),
+                  "r") as f:
             data = json.load(f)
 
         actual_df = get_text_dataframe([data], spark)
 
-        expected_df = spark.read.parquet(os.path.join(DATA_SET_PATH, "text/expected_df.parquet"))
+        expected_df = spark.read.parquet(os.path.join(
+            DATA_SET_PATH, "text/expected_df.parquet"))
         self.assertEqual(sorted(actual_df.collect()),
                          sorted(expected_df.collect()))
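
Note on the assertion above: the test compares the two DataFrames without regard to row order by sorting the collected rows before calling `assertEqual`. A minimal sketch of that pattern follows. It uses a `namedtuple` as a stand-in for `pyspark.sql.Row` (itself a tuple subclass) so that it runs without Spark; the field names `doc_id` and `text` and the helper `rows_match` are hypothetical and not taken from the package.

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row; the field names are hypothetical.
Row = namedtuple("Row", ["doc_id", "text"])


def rows_match(actual_rows, expected_rows):
    """Return True when both collections hold the same rows, ignoring order."""
    return sorted(actual_rows) == sorted(expected_rows)


# Same rows in a different order, as collect() may return them.
actual = [Row(2, "b"), Row(1, "a")]
expected = [Row(1, "a"), Row(2, "b")]

print(rows_match(actual, expected))  # True
```

Sorting assumes every column holds mutually comparable values. In Python 3, comparing `None` with a string or number raises `TypeError`, so rows with nullable columns may need a sort key instead.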