Merged
12 changes: 11 additions & 1 deletion ChangeLog
@@ -1,11 +1,21 @@

0.5: August 21, 2025
- Use array-api for cross-framework compatibility
- Fix rounding issue for large bfloat16s by @blissb-positron in #49
- Replaced microxscaling with torchao
- Update IEEE P3109 implementation to Interim Report v3

0.4: Nov 13, 2024
- Add stochastic rounding
- Add vectorized versions of round/encode/decode for near-JAX speed
- Update IEEE P3109 implementation to Interim report v2

0.3: Jun 10, 2024
- Use python ints throughout, adding float64 to test
- Simplify round, fix directed rounding
- Rename "ival" to "code" in FloatValue
- Shorten format names from "format_info_*" to "*"


0.2: May 21, 2024
- Add MX Formats
- Improved CI
45 changes: 28 additions & 17 deletions README.md
@@ -19,24 +19,35 @@ See https://gfloat.readthedocs.io for documentation, or dive into the notebooks

For example, here's a table from the [02-value-stats](docs/source/02-value-stats.ipynb) notebook:

|name|B: Bits in the format|P: Precision in bits|E: Exponent field width in bits|0<x<1|1<x<Inf|Exact in float16?|maxFinite|minFinite|maxNormal|minNormal|minSubnormal|maxSubnormal|
|name|B: Bits in the format|P: Precision in bits|E: Exponent field width in bits|Exact in float16?|Exact in float32?|0<x<1|1<x<Inf|minSubnormal|maxSubnormal|minNormal|maxNormal| |
|--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |
|ocp_e2m1|4|2|2|1|5|True|6|-6|6|1|0.5|0.5|
|ocp_e2m3|6|4|2|7|23|True|7.5|-7.5|7.5|1|0.125|0.875|
|ocp_e3m2|6|3|3|11|19|True|28|-28|28|0.25|0.0625|0.1875|
|ocp_e4m3|8|4|4|55|70|True|448|-448|448|0.015625|1*2^-9|7/4*2^-7|
|ocp_e5m2|8|3|5|59|63|True|57344|-57344|57344|1*2^-14|1*2^-16|3/2*2^-15|
|p3109_8p1|8|1|7|62|63|False|1*2^63|-1*2^63|1*2^63|1*2^-62|nan|nan|
|p3109_8p2|8|2|6|63|62|False|1*2^31|-1*2^31|1*2^31|1*2^-31|1*2^-32|1*2^-32|
|p3109_8p3|8|3|5|63|62|True|49152|-49152|49152|1*2^-15|1*2^-17|3/2*2^-16|
|p3109_8p4|8|4|4|63|62|True|224|-224|224|0.0078125|1*2^-10|7/4*2^-8|
|p3109_8p5|8|5|3|63|62|True|15|-15|15|0.125|0.0078125|15/8*2^-4|
|p3109_8p6|8|6|2|63|62|True|3.875|-3.875|3.875|0.5|0.015625|31/16*2^-2|
|bfloat16|16|8|8|16255|16383|False|255/128*2^127|-255/128*2^127|255/128*2^127|1*2^-126|1*2^-133|127/64*2^-127|
|ocp_int8|8|8|0|63|63|True|127/64*2^0|-2|nan|nan|0.015625|127/64*2^0|
|ocp_e8m0|8|1|8|127|127|False|1*2^127|1*2^-127|1*2^127|1*2^-127|nan|nan|


| name | B | P | E | rt16 | rt32 | lt1 | gt1 | minSubnormal | maxSubnormal | minNormal | maxNormal |
|--------------|-----|-----|-----|--------|--------|-------|-------|----------------|----------------|-------------|---------------|
| p3109_k3p2sf | 3 | 2 | 1 | True | True | 1 | 1 | 0.5 | 0.5 | 1 | 1.5 |
| ocp_e2m1 | 4 | 2 | 2 | True | True | 1 | 5 | 0.5 | 0.5 | 1 | 6 |
| p3109_k4p2sf | 4 | 2 | 2 | True | True | 3 | 3 | 0.25 | 0.25 | 0.5 | 3 |
| ocp_e2m3 | 6 | 4 | 2 | True | True | 7 | 23 | 0.125 | 0.875 | 1 | 7.5 |
| ocp_e3m2 | 6 | 3 | 3 | True | True | 11 | 19 | 0.0625 | 0.1875 | 0.25 | 28 |
| p3109_k6p3sf | 6 | 3 | 3 | True | True | 15 | 15 | 0.03125 | 0.09375 | 0.125 | 14 |
| p3109_k6p4sf | 6 | 4 | 2 | True | True | 15 | 15 | 0.0625 | 0.4375 | 0.5 | 3.75 |
| ocp_e4m3 | 8 | 4 | 4 | True | True | 55 | 70 | 2^-9 | 7/4*2^-7 | 0.015625 | 448 |
| ocp_e5m2 | 8 | 3 | 5 | True | True | 59 | 63 | 2^-16 | 3/2*2^-15 | 2^-14 | 57344 |
| p3109_k8p1se | 8 | 1 | 7 | False | True | 63 | 62 | n/a | n/a | 2^-63 | 2^62 |
| p3109_k8p1ue | 8 | 1 | 8 | False | True | 127 | 125 | n/a | n/a | 2^-127 | 2^125 |
| p3109_k8p3se | 8 | 3 | 5 | True | True | 63 | 62 | 2^-17 | 3/2*2^-16 | 2^-15 | 49152 |
| p3109_k8p3sf | 8 | 3 | 5 | True | True | 63 | 63 | 2^-17 | 3/2*2^-16 | 2^-15 | 57344 |
| p3109_k8p3ue | 8 | 3 | 6 | False | True | 127 | 125 | 2^-33 | 3/2*2^-32 | 2^-31 | 5/4*2^31 |
| p3109_k8p3uf | 8 | 3 | 6 | False | True | 127 | 126 | 2^-33 | 3/2*2^-32 | 2^-31 | 3/2*2^31 |
| p3109_k8p4se | 8 | 4 | 4 | True | True | 63 | 62 | 2^-10 | 7/4*2^-8 | 0.0078125 | 224 |
| p3109_k8p4sf | 8 | 4 | 4 | True | True | 63 | 63 | 2^-10 | 7/4*2^-8 | 0.0078125 | 240 |
| p3109_k8p4ue | 8 | 4 | 5 | True | True | 127 | 125 | 2^-18 | 7/4*2^-16 | 2^-15 | 53248 |
| p3109_k8p4uf | 8 | 4 | 5 | True | True | 127 | 126 | 2^-18 | 7/4*2^-16 | 2^-15 | 57344 |
| p3109_k8p7sf | 8 | 7 | 1 | True | True | 63 | 63 | 0.015625 | 63/32*2^-1 | 1 | 127/64*2^0 |
| p3109_k8p8uf | 8 | 8 | 1 | True | True | 127 | 126 | 0.0078125 | 127/64*2^-1 | 1 | 127/64*2^0 |
| binary16 | 16 | 11 | 5 | True | True | 15359 | 16383 | 2^-24 | 1023/512*2^-15 | 2^-14 | 65504 |
| bfloat16 | 16 | 8 | 8 | False | True | 16255 | 16383 | 2^-133 | 127/64*2^-127 | 2^-126 | 255/128*2^127 |
| ocp_e8m0 | 8 | 1 | 8 | False | True | 127 | 127 | n/a | n/a | 2^-127 | 2^127 |
| ocp_int8 | 8 | 8 | 0 | True | True | 63 | 63 | 0.015625 | 127/64*2^0 | n/a | n/a |
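
As a cross-check on the table, the `binary16` and `bfloat16` rows can be reproduced from `P` and `E` alone. The sketch below is not part of gfloat's API: `ieee_style_stats` is a hypothetical helper, and it assumes an IEEE-754-style layout (bias of `2**(E-1) - 1`, subnormals present, top exponent code reserved for Inf/NaN). Several other rows reserve encodings differently (for example `ocp_e4m3`, whose `maxNormal` is 448), so this formula is not expected to match them.

```python
from fractions import Fraction


def ieee_style_stats(precision: int, exp_bits: int) -> dict:
    """Extreme positive values of an IEEE-754-style binary format.

    Hypothetical helper for illustration only. Assumes
    bias = 2**(exp_bits - 1) - 1, subnormals are present, and the
    all-ones exponent code is reserved for Inf/NaN.
    """
    bias = 2 ** (exp_bits - 1) - 1
    emin = 1 - bias                       # exponent of the smallest normal
    emax = bias                           # exponent of the largest normal
    ulp = Fraction(1, 2 ** (precision - 1))  # spacing of the significand
    two = Fraction(2)
    return {
        "minSubnormal": ulp * two ** emin,
        "maxSubnormal": (1 - ulp) * two ** emin,
        "minNormal": two ** emin,
        "maxNormal": (2 - ulp) * two ** emax,
    }


# P and E taken from the binary16 and bfloat16 rows of the table.
for name, p, e in [("binary16", 11, 5), ("bfloat16", 8, 8)]:
    stats = ieee_style_stats(p, e)
    print(name, {k: str(v) for k, v in stats.items()})
```

Exact rational arithmetic via `fractions.Fraction` avoids any rounding in the check itself, which matters for values such as `2^-133` that underflow to subnormals or zero in narrower float types.
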
#### Notes

All NaNs are the same, with no distinction between signalling or quiet,
10 changes: 5 additions & 5 deletions docs/source/01-decode.ipynb

Large diffs are not rendered by default.