
Commit f04c7e9

Fix Sphinx docs + GH pages deployment

- Update artifact for github pages deployment workflow, triggers on push/PR to main
- Restore working .ipynb for Genz bcs example
- Update docs makefile

1 parent 1f6bd77 commit f04c7e9

17 files changed: +474 additions, -242 deletions

.github/workflows/documentation.yml

Lines changed: 9 additions & 2 deletions

@@ -1,6 +1,12 @@
 name: Deploy to GitHub Pages
 
-on: [push, pull_request, workflow_dispatch]
+on:
+  pull_request:
+    branches:
+      - main
+  push:
+    branches:
+      - main # Change this to your main branch if different
 
 permissions:
   contents: write
@@ -18,6 +24,7 @@ jobs:
       run: |
         pip install sphinx sphinx_rtd_theme myst_parser sphinx-autoapi ipython sphinx-gallery
         pip install .[dev]
+        pip install .[all]
 
     - name: List files
       run: |
@@ -50,4 +57,4 @@ jobs:
       uses: actions/upload-artifact@v4
       with:
         name: sphinx-html
-        path: docs/_build/html
+        path: docs/_build/html/index.html
docs/Makefile

Lines changed: 2 additions & 1 deletion

@@ -36,4 +36,5 @@ clean:
 	rm -rf $(BUILDDIR)/*
 	# rm -rf auto_examples/
 	rm -rf auto_tutorials/
-	rm -rf api/
+	rm -rf api/
+	rm sg_execution_times.rst
-136 Bytes: Binary file not shown.
-129 Bytes: Binary file not shown.

docs/auto_examples/ex_genz_bcs.ipynb

Lines changed: 1 addition & 1 deletion

@@ -15,7 +15,7 @@
    },
    "outputs": [],
    "source": [
-    "import os\nimport sys\n\nimport numpy as np\nimport copy\nimport math\nimport pytuq.utils.funcbank as fcb\nfrom matplotlib import pyplot as plt\nfrom sklearn.metrics import root_mean_squared_error\n\nfrom pytuq.surrogates.pce import PCE\nfrom pytuq.utils.maps import scaleDomTo01\nfrom pytuq.func.genz import GenzOscillatory"
+    "import numpy as np\nimport copy\nimport math\nfrom matplotlib import pyplot as plt\nfrom sklearn.metrics import root_mean_squared_error\n\nfrom pytuq.surrogates.pce import PCE\nfrom pytuq.utils.maps import scaleDomTo01\nfrom pytuq.func.genz import GenzOscillatory"
    ]
   },
   {
docs/auto_examples/ex_genz_bcs.py

Lines changed: 0 additions & 3 deletions

@@ -28,13 +28,10 @@
 """
 # %%
 
-import os
-import sys
 
 import numpy as np
 import copy
 import math
-import pytuq.utils.funcbank as fcb
 from matplotlib import pyplot as plt
 from sklearn.metrics import root_mean_squared_error
 
Lines changed: 1 addition & 1 deletion

@@ -1 +1 @@
-5470448baa785fa7c8ecd0b4922b46bd
+a25d30a5101f142688b3c516dc28d1f2

docs/auto_examples/ex_genz_bcs.rst

Lines changed: 35 additions & 38 deletions

@@ -45,18 +45,15 @@ with parity plots and Root Mean Square Error (RMSE) values used to compare their
 To follow along with the cross-validation algorithm for selecting the optimal eta, see section "Functions for cross-validation algorithm" in the second half of the notebook.
 These methods have been implemented under-the-hood in PyTUQ. Refer to example "Polynomial Chaos Expansion Construction" (``ex_pce.py``) for a demonstration of how to use these methods through a direct call to the PCE class.
 
-.. GENERATED FROM PYTHON SOURCE LINES 30-45
+.. GENERATED FROM PYTHON SOURCE LINES 30-42
 
 .. code-block:: Python
 
 
-    import os
-    import sys
 
     import numpy as np
     import copy
     import math
-    import pytuq.utils.funcbank as fcb
     from matplotlib import pyplot as plt
     from sklearn.metrics import root_mean_squared_error
 
@@ -71,7 +68,7 @@ These methods have been implemented under-the-hood in PyTUQ. Refer to example "P
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 46-54
+.. GENERATED FROM PYTHON SOURCE LINES 43-51
 
 Constructing PC surrogate and generating data
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -82,7 +79,7 @@ along with training data and testing data with output noise. This data and the c
 will be used to create the same PC surrogate fitted in all three examples: first with linear regression,
 next using BCS with a given eta, and third using BCS with the most optimal eta.
 
-.. GENERATED FROM PYTHON SOURCE LINES 56-61
+.. GENERATED FROM PYTHON SOURCE LINES 53-58
 
 .. code-block:: Python
 
@@ -98,7 +95,7 @@ next using BCS with a given eta, and third using BCS with the most optimal eta.
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 62-87
+.. GENERATED FROM PYTHON SOURCE LINES 59-84
 
 .. code-block:: Python
 
@@ -134,13 +131,13 @@ next using BCS with a given eta, and third using BCS with the most optimal eta.
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 88-91
+.. GENERATED FROM PYTHON SOURCE LINES 85-88
 
 With a stochastic dimensionality of 4 (defined above) and a chosen polynomial order of 4, we construct the PC surrogate that
 will be used in both builds. By calling the ``printInfo()`` method from the PCRV variable, you can print the PC surrogate's
 full basis and current coefficients, before BCS selects and retains the most significant PC terms to reduce the basis.
 
-.. GENERATED FROM PYTHON SOURCE LINES 91-104
+.. GENERATED FROM PYTHON SOURCE LINES 88-101
 
 .. code-block:: Python
 
@@ -246,19 +243,19 @@ full basis and current coefficients, before BCS selects and retains the most sig
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 105-108
+.. GENERATED FROM PYTHON SOURCE LINES 102-105
 
 From the input parameters of our PC surrogate, we have 70 basis terms in our PCE. With 70 training points and no noise, having 70 basis terms would mean that we have a fully determined system, as the number of training points is the same as the number of basis terms. However, with the addition of noise in our training data, it becomes harder for the model to accurately fit all basis terms, leading to potential overfitting. This demonstrates the helpful role BCS might play as a choice for our regression build. As a sparse regression approach, BCS uses regularization to select only the most relevant basis terms, making it particularly effective in situations like this, where we do not have enough clear information to fit all basis terms without overfitting.
 
 In the next sections, we will explore the effects of overfitting in more detail.
 
-.. GENERATED FROM PYTHON SOURCE LINES 110-113
+.. GENERATED FROM PYTHON SOURCE LINES 107-110
 
 Least Squares Regression
 ^^^^^^^^^^^^^^^^^^^^^^^^^
 To start, we call the PCE class method of ``build()`` with no arguments to use the default regression option of least squares. Then, through ``evaluate()``, we can generate model predictions for our training and testing data.
 
-.. GENERATED FROM PYTHON SOURCE LINES 115-123
+.. GENERATED FROM PYTHON SOURCE LINES 112-120
 
 .. code-block:: Python
 
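To make the least-squares step concrete, here is a minimal sketch of the calls this section describes. Only PCE, build(), evaluate(), and root_mean_squared_error appear in the example itself; the surrogate construction and the data variables below are placeholders, not PyTUQ's actual API.

    # Sketch under assumptions: how the surrogate is constructed and where the
    # training data enters are not shown in this diff, so pce_surr, x_trn,
    # y_trn, x_tst, and y_tst are hypothetical placeholders.
    from sklearn.metrics import root_mean_squared_error

    pce_surr.build()                         # no arguments: default least-squares fit
    y_trn_pred = pce_surr.evaluate(x_trn)    # predictions at the training inputs
    y_tst_pred = pce_surr.evaluate(x_tst)    # predictions at the testing inputs
    print(root_mean_squared_error(y_trn, y_trn_pred))   # training RMSE
    print(root_mean_squared_error(y_tst, y_tst_pred))   # testing RMSE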
@@ -283,7 +280,7 @@ To start, we call the PCE class method of ``build()`` with no arguments to use t
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 124-137
+.. GENERATED FROM PYTHON SOURCE LINES 121-134
 
 .. code-block:: Python
 
@@ -318,7 +315,7 @@ To start, we call the PCE class method of ``build()`` with no arguments to use t
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 138-154
+.. GENERATED FROM PYTHON SOURCE LINES 135-151
 
 .. code-block:: Python
 
@@ -356,7 +353,7 @@ To start, we call the PCE class method of ``build()`` with no arguments to use t
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 155-163
+.. GENERATED FROM PYTHON SOURCE LINES 152-160
 
 .. code-block:: Python
 
@@ -382,19 +379,19 @@ To start, we call the PCE class method of ``build()`` with no arguments to use t
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 164-167
+.. GENERATED FROM PYTHON SOURCE LINES 161-164
 
 The results above show us the limitations of using least squares regression to construct our surrogate. From the parity plots, we can see how the testing predictions from the LSQ regression are more spread out from the parity line, while the training predictions are extremely close to the line. Because LSQ fits all the basis terms to the training data, the model fits too closely to the noisy training dataset, and the true underlying pattern of the function is not effectively captured. Our RMSE values align with this as well: while the training RMSE is extremely low, the testing RMSE is significantly higher, as the model struggles to generalize to the unseen test data.
 
 To improve our model's generalization, we can build our model with BCS instead. As a sparse regression method, BCS reduces the number of basis terms with which we can fit our data to, reducing the risk of overfitting.
 
-.. GENERATED FROM PYTHON SOURCE LINES 169-172
+.. GENERATED FROM PYTHON SOURCE LINES 166-169
 
 BCS with default settings (default eta)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 In this section, we use the same PC surrogate, ``pce_surr``, for the second build. With the flag ``regression='bcs'``, we choose the BCS method for the fitting. A user-defined eta of 1e-10 is also passed in.
 
-.. GENERATED FROM PYTHON SOURCE LINES 172-181
+.. GENERATED FROM PYTHON SOURCE LINES 169-178
 
 .. code-block:: Python
 
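A hedged sketch of the second build described above; regression='bcs' comes from the text, while the keyword used to pass the eta of 1e-10 is an assumption:

    # Same surrogate, rebuilt with BCS; the eta keyword name is assumed here.
    pce_surr.build(regression='bcs', eta=1e-10)
    y_trn_pred = pce_surr.evaluate(x_trn)    # re-evaluate with the sparser basis
    y_tst_pred = pce_surr.evaluate(x_tst)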
@@ -441,11 +438,11 @@ In this section, we use the same PC surrogate, ``pce_surr``, for the second buil
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 182-183
+.. GENERATED FROM PYTHON SOURCE LINES 179-180
 
 After fitting, we evaluate the PCE using our training and testing data. To analyze the model's goodness of fit, we first plot the surrogate predictions against the training and testing data respectively.
 
-.. GENERATED FROM PYTHON SOURCE LINES 183-188
+.. GENERATED FROM PYTHON SOURCE LINES 180-185
 
 .. code-block:: Python
 
@@ -461,7 +458,7 @@ After fitting, we evaluate the PCE using our training and testing data. To analy
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 189-202
+.. GENERATED FROM PYTHON SOURCE LINES 186-199
 
 .. code-block:: Python
 
@@ -496,7 +493,7 @@ After fitting, we evaluate the PCE using our training and testing data. To analy
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 203-219
+.. GENERATED FROM PYTHON SOURCE LINES 200-216
 
 .. code-block:: Python
 
@@ -534,7 +531,7 @@ After fitting, we evaluate the PCE using our training and testing data. To analy
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 220-228
+.. GENERATED FROM PYTHON SOURCE LINES 217-225
 
 .. code-block:: Python
 
@@ -560,13 +557,13 @@ After fitting, we evaluate the PCE using our training and testing data. To analy
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 229-232
+.. GENERATED FROM PYTHON SOURCE LINES 226-229
 
 From our parity plots, we can see how BCS already generalizes better to unseen data as compared to LSQ, with reduced error in our testing data predictions. In our RMSE calculations, notice how the training error is smaller than the testing error. Though the difference in value is small, this amount is still significant as we have noise in our training data yet no noise in our testing data. That the testing error is higher than the training error suggests that overfitting is still happening within our model.
 
 In the next section, we explore how finding the optimal value of eta -- the stopping criterion for the BCS parameter of gamma, determined through a Bayesian evidence maximization approach -- can impact model sparsity and accuracy to avoid overfitting.
 
-.. GENERATED FROM PYTHON SOURCE LINES 235-241
+.. GENERATED FROM PYTHON SOURCE LINES 232-238
 
 BCS with optimal eta (found through cross-validation)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -575,7 +572,7 @@ Before we build our PC surrogate again with the most optimal eta, we first expos
 Functions for cross-validation algorithm
 +++++++++++++++++++++++++++++++++++++++++
 
-.. GENERATED FROM PYTHON SOURCE LINES 243-285
+.. GENERATED FROM PYTHON SOURCE LINES 240-282
 
 .. code-block:: Python
 
@@ -628,7 +625,7 @@ Functions for cross-validation algorithm
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 286-330
+.. GENERATED FROM PYTHON SOURCE LINES 283-327
 
 .. code-block:: Python
 
@@ -683,7 +680,7 @@
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 331-448
+.. GENERATED FROM PYTHON SOURCE LINES 328-445
 
 .. code-block:: Python
 
@@ -811,15 +808,15 @@
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 449-454
+.. GENERATED FROM PYTHON SOURCE LINES 446-451
 
 BCS build with the most optimal eta
 +++++++++++++++++++++++++++++++++++++
 Instead of using a default eta, here we call the cross-validation algorithm, ``optimize_eta()``, to choose the most optimal eta from a range of etas given below.
 
 - With the flag ``plot=True``, the CV algorithm produces a graph of the training and testing (validation) data's RMSE values for each eta. The eta with the smallest RMSE for the validation data is the one chosen as the optimal eta.
 
-.. GENERATED FROM PYTHON SOURCE LINES 454-461
+.. GENERATED FROM PYTHON SOURCE LINES 451-458
 
 .. code-block:: Python
 
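As a sketch of the call this section describes (optimize_eta() and plot=True are named by the text; the argument list and the candidate range of etas below are assumptions for illustration):

    # Hypothetical signature for the cross-validation helper defined earlier
    # in the example; the true argument order and names may differ.
    etas = [1e-12, 1e-10, 1e-8, 1e-6, 1e-4, 1e-2]    # assumed candidate range
    eta_opt = optimize_eta(pce_surr, x_trn, y_trn, etas, plot=True)
    pce_surr.build(regression='bcs', eta=eta_opt)    # rebuild with the optimal eta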
@@ -1007,15 +1004,15 @@ Instead of using a default eta, here we call the cross-validation algorithm, ``o
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 462-467
+.. GENERATED FROM PYTHON SOURCE LINES 459-464
 
 From our eta plot above, we can see that our most optimal eta falls at :math:`1 \times 10^{-4}`, where the validation error is the lowest. While this indicates that the model performs well at this eta value, we can still observe a tendency towards overfitting in the model. For larger eta values, the training and validation RMSE lines are close together, suggesting that the model is performing similarly on both seen and unseen datasets, as would be desired. However, as eta decreases, the training RMSE falls while the validation RMSE rises, highlighting a region where overfitting occurs.
 
 This behavior is expected because smaller eta values retain more basis terms, increasing the model's degrees of freedom. While this added flexibility allows the model to fit the training data more closely, it also makes the model more prone to fitting noise rather than capturing the true underlying function. Selecting the most optimal eta of :math:`1 \times 10^{-4}`, as compared to the earlier user-defined eta of :math:`1 \times 10^{-10}`, allows us to balance model complexity and generalization.
 
 Now, with the optimum eta obtained, we can run the fitting again and produce parity plots for our predicted output.
 
-.. GENERATED FROM PYTHON SOURCE LINES 467-476
+.. GENERATED FROM PYTHON SOURCE LINES 464-473
 
 .. code-block:: Python
 
@@ -1052,7 +1049,7 @@ Now, with the optimum eta obtained, we can run the fitting again and produce par
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 477-482
+.. GENERATED FROM PYTHON SOURCE LINES 474-479
 
 .. code-block:: Python
 
@@ -1068,7 +1065,7 @@ Now, with the optimum eta obtained, we can run the fitting again and produce par
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 483-496
+.. GENERATED FROM PYTHON SOURCE LINES 480-493
 
 .. code-block:: Python
 
@@ -1103,7 +1100,7 @@ Now, with the optimum eta obtained, we can run the fitting again and produce par
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 497-510
+.. GENERATED FROM PYTHON SOURCE LINES 494-507
 
 .. code-block:: Python
 
@@ -1138,7 +1135,7 @@ Now, with the optimum eta obtained, we can run the fitting again and produce par
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 511-519
+.. GENERATED FROM PYTHON SOURCE LINES 508-516
 
 .. code-block:: Python
 
@@ -1164,7 +1161,7 @@
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 520-523
+.. GENERATED FROM PYTHON SOURCE LINES 517-520
 
 In these final RMSE calculations, we can see how our testing RMSE has decreased from 1.80e-02 to 1.21e-02 by building with the most optimal eta. This indicates that our model has improved in generalization and is performing better on unseen data. Though our training error is still larger than our testing error, this can be attributed to the lack of noise in our testing data, while noise is present in our training data. While the optimal eta reduces overfitting and improves generalization, the noise in our training data still impacts the training error and remains an important consideration during our evaluation of the model performance.
 
@@ -1173,7 +1170,7 @@ While this demonstration calls the cross-validation algorithm as a function outs
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** (0 minutes 6.413 seconds)
+   **Total running time of the script:** (0 minutes 10.810 seconds)
 
 
 .. _sphx_glr_download_auto_examples_ex_genz_bcs.py:

docs/auto_examples/ex_genz_bcs.zip

-117 Bytes: Binary file not shown.
