feat(rust/sedona-pointcloud) Initial LAZ format support #471

b4l · 2025-12-30T18:57:52Z

This provides basic listing table provider functionality for LAZ files, with partitioned/parallel reads and a low memory footprint.

paleolimbot · 2025-12-31T03:19:24Z

Looking forward to this!

Just a note that the packaging CI is mad because your files don't have Apache license headers (you can copy them from any other .rs file in the repo).

paleolimbot · 2026-01-06T04:02:35Z

Sorry for being slow here...I spent today catching up on reviews that had accumulated over the holidays and will take a look tomorrow!

paleolimbot

This is awesome! I left some comments inline but most of them can be follow-ups.

The main thing this needs before it can be merged is test coverage. I didn't spot any tests but feel free to let me know if I missed them! I would suggest:

Adding .laz file(s) that cover the range of input options you expose here to apache/sedona-testing, maybe with scripts to generate them if that's possible.
Reading the test files and checking the output with assert_batches_eq!(). There are some tests in sedona-geoparquet that do this kind of thing, except yours can be easier because you can use your Plain geometry output to avoid having to check WKB.

sedona-db/rust/sedona-geoparquet/src/format.rs

Lines 564 to 573 in 68970b0

    
    fn setup_context() -> SessionContext {
        let mut state = SessionStateBuilder::new().build();
        state
            .register_file_format(Arc::new(GeoParquetFormatFactory::new()), true)
            .unwrap();
        SessionContext::new_with_state(state).enable_url_table()
    }

    #[tokio::test]
    async fn format_from_url_table() {

For some lower-level components you could also hard-code some byte ranges (e.g., the header bytes or the bytes for a record). For the builder you could load the builders with some data and check the output of finish().

Cargo.toml

examples/sedonadb-rust-pointcloud/src/main.rs

rust/sedona-pointcloud/src/laz/builder.rs

rust/sedona-pointcloud/src/laz/format.rs

rust/sedona-pointcloud/src/laz/opener.rs

rust/sedona-pointcloud/src/laz/schema.rs

paleolimbot · 2026-01-06T16:57:26Z

rust/sedona-pointcloud/src/laz/metadata.rs

+        // TODO: proper size
+        std::mem::size_of_val(self)


chunk_table capacity in bytes + extra_attributes capacity in bytes?

Improved in cd2b8ba

We may also need to account for the size of the header, which can blow up with arbitrarily sized (E)VLRs.
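For illustration, the size accounting discussed in this thread could look roughly like the sketch below. The struct and field names (`LazMetadata`, `ChunkMeta`, `ExtraAttribute`, `vlr_payloads`) are hypothetical stand-ins rather than the PR's actual types, and the result is only an approximation of the heap usage.

```rust
use std::mem::size_of;

// Hypothetical stand-ins for the PR's metadata types; the real fields may differ.
#[allow(dead_code)]
struct ChunkMeta {
    num_points: u64,
    byte_offset: u64,
    byte_len: u64,
}

#[allow(dead_code)]
struct ExtraAttribute {
    name: String,
    scale: Option<f64>,
    offset: Option<f64>,
}

#[allow(dead_code)]
struct LazMetadata {
    chunk_table: Vec<ChunkMeta>,
    extra_attributes: Vec<ExtraAttribute>,
    // Assumed: raw (E)VLR payloads kept from the header, which can be arbitrarily large.
    vlr_payloads: Vec<Vec<u8>>,
}

#[allow(dead_code)]
impl LazMetadata {
    /// Approximate footprint in bytes: the struct itself plus the heap
    /// allocations it owns (counted by capacity, not length).
    fn memory_size(&self) -> usize {
        let chunk_table = self.chunk_table.capacity() * size_of::<ChunkMeta>();
        let extra_attributes = self.extra_attributes.capacity() * size_of::<ExtraAttribute>()
            + self
                .extra_attributes
                .iter()
                .map(|a| a.name.capacity())
                .sum::<usize>();
        let vlrs = self.vlr_payloads.capacity() * size_of::<Vec<u8>>()
            + self.vlr_payloads.iter().map(|v| v.capacity()).sum::<usize>();
        size_of::<Self>() + chunk_table + extra_attributes + vlrs
    }
}
```

Counting capacity rather than length matches what the allocator has actually reserved, which is usually the figure a memory-limited cache cares about.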

rust/sedona-pointcloud/src/laz/metadata.rs

paleolimbot

Thank you for this (and for handling the Arrow/DataFusion update!)

I left some inline comments about ways to test these files. I am mostly worried that a future contributor or an LLM acting on their behalf will roll through and break something and there won't be a failing test to detect a regression for (e.g.) Int8s with a nodata value.

I think probably a golden file that exercises the full matrix of extra attribute data types / with or without offset / with or without scale / with or without nodata value + assertions that we read the file's content correctly would be sufficient. If it's easy to build that file using the Builder on demand we can do that too.
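As a rough sketch of what exercising that matrix could mean, the code below enumerates the with/without scale, offset, and nodata combinations and decodes a raw value under each. `AttributeOptions`, `decode`, and `option_matrix` are hypothetical names, the data-type dimension is left out (raw values are plain `i64`), and it assumes the usual convention that nodata is compared against the raw value and that a decoded value is `raw * scale + offset`.

```rust
/// Hypothetical description of one extra attribute's options.
#[derive(Debug, Clone, Copy, PartialEq)]
struct AttributeOptions {
    scale: Option<f64>,
    offset: Option<f64>,
    nodata: Option<i64>,
}

/// Decode a raw integer value: a nodata match becomes `None` (a null),
/// otherwise the result is `raw * scale + offset` with defaults 1.0 and 0.0.
#[allow(dead_code)]
fn decode(raw: i64, opts: AttributeOptions) -> Option<f64> {
    if opts.nodata == Some(raw) {
        return None;
    }
    Some(raw as f64 * opts.scale.unwrap_or(1.0) + opts.offset.unwrap_or(0.0))
}

/// Enumerate with/without scale x with/without offset x with/without nodata.
#[allow(dead_code)]
fn option_matrix() -> Vec<AttributeOptions> {
    let mut cases = Vec::new();
    for scale in [None, Some(0.5)] {
        for offset in [None, Some(10.0)] {
            for nodata in [None, Some(-1)] {
                cases.push(AttributeOptions { scale, offset, nodata });
            }
        }
    }
    cases
}
```

A test could loop over `option_matrix()` for each supported data type and compare the values read back from a file against `decode`.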

rust/sedona-pointcloud/src/laz/opener.rs

paleolimbot · 2026-01-15T15:52:35Z

rust/sedona-pointcloud/src/laz/reader.rs

+    #[allow(static_mut_refs)]
+    #[tokio::test]
+    async fn reader_basic_e2e() {
+        // create laz file
+        static mut LAZ: Vec<u8> = Vec::new();


Thanks for adding this!

Does this really need to be static mut? For some other tests we use a temporary directory, which would also let you check a multi file read.

sedona-db/rust/sedona/src/context.rs

Line 622 in 9065867

let tmpdir = tempdir().unwrap();
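A minimal sketch of the directory-based approach follows, using only the standard library so that it stands alone. The repository's own tests use `tempdir()` as shown above, which also cleans up on drop. The helpers `scratch_dir` and `write_test_files` are hypothetical, and the file contents are placeholders rather than valid LAZ data.

```rust
use std::fs;
use std::io::Write;
use std::path::{Path, PathBuf};

/// Create a scratch directory under the system temp dir, named after the
/// label and the current process id (hypothetical helper).
#[allow(dead_code)]
fn scratch_dir(label: &str) -> std::io::Result<PathBuf> {
    let dir = std::env::temp_dir().join(format!("{}-{}", label, std::process::id()));
    fs::create_dir_all(&dir)?;
    Ok(dir)
}

/// Write `n` placeholder files into `dir` and return their paths.
#[allow(dead_code)]
fn write_test_files(dir: &Path, n: usize) -> std::io::Result<Vec<PathBuf>> {
    let mut paths = Vec::with_capacity(n);
    for i in 0..n {
        let path = dir.join(format!("part-{}.laz", i));
        let mut file = fs::File::create(&path)?;
        // Placeholder bytes; a real test would write a valid LAZ file here.
        file.write_all(b"not a real laz file")?;
        paths.push(path);
    }
    Ok(paths)
}
```

Because each file is an ordinary path on disk, the same directory can be handed to the listing table to check a multi-file read, with no mutable global state involved.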

paleolimbot · 2026-01-15T16:01:01Z

rust/sedona-pointcloud/src/laz/metadata.rs

+    #[allow(static_mut_refs)]
+    #[tokio::test]
+    async fn header_basic_e2e() {
+        // create laz file
+        static mut LAZ: Vec<u8> = Vec::new();


Maybe also just a tempfile unless this really needs to be static/mutable?

paleolimbot · 2026-01-15T16:03:02Z

rust/sedona-pointcloud/src/laz/reader.rs

+            .await
+            .unwrap();
+
+        assert_eq!(batch.num_rows(), 1);


This level of granularity is OK if all the lower-level pieces are tested, although in this case they are mostly not.

paleolimbot · 2026-01-15T16:10:45Z

rust/sedona-pointcloud/src/laz/builder.rs

+    }
+
+    Ok(width)
+}


Some suggestions for tests that should live in this file:

Create a builder with zero rows, check all of the output options to ensure they give you the schema (or at least number of columns) you are expecting

The building of attributes (tests should cover each branch here). One nice consequence of refactoring this to use GATs, if you can, would be fewer branches to test (although it is probably easier to use an rstest parameterized test like #[values(DataType::Int8, DataType::Int16, ...)]).

I don't think you'll need a test file for any of that (but you might need to create a mock header and mock attributes with/without offset, scale, and nodata).

paleolimbot · 2026-01-15T16:23:15Z

rust/sedona-pointcloud/src/laz/metadata.rs

+
+pub(crate) fn extra_bytes_attributes(
+    header: &Header,
+) -> Result<Vec<ExtraAttribute>, Box<dyn Error + Send + Sync>> {


Maybe?

Suggested change

-) -> Result<Vec<ExtraAttribute>, Box<dyn Error + Send + Sync>> {
+) -> Result<Vec<ExtraAttribute>, DataFusionError> {

paleolimbot · 2026-01-15T16:28:03Z

rust/sedona-pointcloud/src/laz/metadata.rs

+    store: &(impl ObjectStore + ?Sized),
+    object_meta: &ObjectMeta,
+    header: &Header,
+) -> Result<Vec<ChunkMeta>, Box<dyn Error + Send + Sync>> {


Suggested change

-) -> Result<Vec<ChunkMeta>, Box<dyn Error + Send + Sync>> {
+) -> Result<Vec<ChunkMeta>, DataFusionError> {

Probably use plan_err!() for these?
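To illustrate why a concrete error type can be nicer than `Box<dyn Error + Send + Sync>`, here is a standalone sketch. `SketchError` is a stand-in, not the real `DataFusionError`, though it mirrors that type's `Plan` and `External` variants, and `parse_point_count` is a hypothetical function invented for the example.

```rust
use std::error::Error;
use std::fmt;

/// Stand-in for a concrete error type such as DataFusionError (not the real type).
#[derive(Debug)]
enum SketchError {
    Plan(String),
    External(Box<dyn Error + Send + Sync>),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::Plan(msg) => write!(f, "plan error: {}", msg),
            SketchError::External(e) => write!(f, "external error: {}", e),
        }
    }
}

impl Error for SketchError {}

// Lets `?` wrap a foreign error into the concrete type.
impl From<std::num::ParseIntError> for SketchError {
    fn from(e: std::num::ParseIntError) -> Self {
        SketchError::External(Box::new(e))
    }
}

/// Hypothetical parser: rejects bad input with a typed error.
#[allow(dead_code)]
fn parse_point_count(text: &str) -> Result<u64, SketchError> {
    let count: u64 = text.trim().parse()?;
    if count == 0 {
        // In DataFusion this early return is roughly what plan_err!() expands to.
        return Err(SketchError::Plan("file contains no points".to_string()));
    }
    Ok(count)
}
```

Callers can then match on the variant instead of downcasting a boxed error, and the function composes directly with other code that already returns the same error type.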

paleolimbot · 2026-01-15T16:30:47Z

rust/sedona-pointcloud/src/laz/metadata.rs

+            reader.header(),
+            &metadata_reader.fetch_header().await.unwrap()
+        );
+    }


This could be tested with a dummy header. Perhaps you can inline the bytes of a known test file, or create one using the Builder that exercises the code path for the full matrix of data types by offset/scale/nodata, or write a function that accepts a DataType, offset, scale, and nodata value and outputs a file with exactly one extra attribute, which we can use in a parameterized test to check that it roundtrips.

b4l force-pushed the laz branch from 7ec7560 to e7414eb Compare December 30, 2025 19:03

b4l force-pushed the laz branch from e7414eb to 200d8f0 Compare January 2, 2026 10:44

b4l marked this pull request as ready for review January 2, 2026 10:50

b4l changed the title ~~[WIP] LAZ format support~~ feat(rust/sedona-pointcloud) Initial LAZ format support Jan 2, 2026

paleolimbot reviewed Jan 6, 2026

b4l added 9 commits January 7, 2026 10:09

Basic LAZ listing table implementation

e4fae6d

Add license header

82fbcb4

Configurable point encoding

19d4bad

Session context integration

19e657b

Add example

0e3c073

Improve options handling

558aa50

Rebase and address some review comments

a3211f8

Add spatial filter files pruning

aa66953

Improve metadata size and crs handling

cd2b8ba

b4l force-pushed the laz branch from cc0e006 to 2b1084a Compare January 15, 2026 08:42

Add minimal e2e tests

527fecd

b4l force-pushed the laz branch from 2b1084a to 527fecd Compare January 15, 2026 09:21

paleolimbot reviewed Jan 15, 2026

