added predictors #86
Conversation
Force-pushed from 19f36f8 to e0a80af
src/predictor.rs (Outdated)
```rust
    predictor_info: &PredictorInfo,
    tile_x: u32,
    tile_y: u32,
) -> AsyncTiffResult<Bytes>; // having this Bytes will give alignment issues later on
```
Why? `Bytes` is pretty much just `Arc<Vec<u8>>`.
So if:
- the tiff is something with alignment > 1, e.g. `f32`
- the global allocator gives out a misaligned `Vec` (which it doesn't often do)

then the user has two options:
- copy the data over into a `Vec` using `f32::from_ne_bytes()`
- use bytemuck and hope for the best

I looked into this quite deeply, and afaik most "standard" allocators allocate with alignment 8; it would just save a copy in my mind?
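The two options could be sketched like this (hypothetical std-only helpers; `bytes_to_f32_view` stands in for the bytemuck route, and only succeeds when the allocation happens to be aligned):

```rust
/// Option 1, always safe: copy each 4-byte group out with f32::from_ne_bytes.
fn bytes_to_f32_copy(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(4)
        .map(|c| f32::from_ne_bytes(c.try_into().unwrap()))
        .collect()
}

/// Option 2, zero-copy, but it only works when the buffer happens to be
/// 4-byte aligned: align_to reports a non-empty head/tail otherwise.
fn bytes_to_f32_view(bytes: &[u8]) -> Option<&[f32]> {
    // SAFETY: every initialized bit pattern is a valid f32, so
    // reinterpreting aligned initialized bytes is sound.
    let (head, body, tail) = unsafe { bytes.align_to::<f32>() };
    (head.is_empty() && tail.is_empty()).then_some(body)
}
```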
I think instead we should discuss: at what points we should be storing `Vec<u8>` and at what points structured array types.
I think to add to my previous comment: the question is where we convert out of `Bytes`. I think the core networking trait (`AsyncFileReader`) should remain as it is, where `get_bytes` returns `bytes::Bytes`. Most networking code, at least `reqwest` and `object_store`, returns buffers as `Bytes`.

There's no way to convert from a `Bytes` to a `Vec<u8>` zero-copy. (You can sometimes convert from a `Bytes` to a `BytesMut` zero-copy.) So that implies at some point we make a data copy from a `Bytes` into a `Vec<T>` in order to be safe. Or we could use similar code as the Arrow project and build typed interfaces on top of an `Arc<Bytes>`, such as their `Buffer` type.
> I think to add to my previous comment: the question is where we convert out of `Bytes`. I think the core networking trait (`AsyncFileReader`) should remain as it is, where `get_bytes` returns `bytes::Bytes`. Most networking code, at least `reqwest` and `object_store`, returns buffers as `Bytes`.
I'm all up for not changing `AsyncFileReader`. I would say after the decompression step is where we can have typed arrays, since decompression itself is not zero-copy (except for no compression). In contrast to `reqwest` and `object_store`, we actually know the underlying datatype.

Then for the endianness fixing and the horizontal predictor, it is also nice to have exclusive access to the underlying buffer (`&mut [u8]`), since that is zero-copy (not the float predictor). Passing typed arrays over to the predictor doesn't make much sense there imho either, since the operations don't differentiate between e.g. `f64` and `u64` and it's already quite complex as-is.
so I thought something like:

```rust
// pseudocode
impl Tile {
    fn decode(&self, decoder: &Decoder) -> AsyncTiffResult<DecodingResult> {
        // tiff2's DecodingResult
        let mut res: DecodingResult = todo!(); // smart buffer sizing
        decoder.decode(&self.compressed_bytes, res.buf_mut()); // bytemuck casting
        match self.predictor_info.predictor {
            // mutates in-place
            Predictor::Horizontal => unpredict_horizontal(res.buf_mut(), &self.predictor_info, self.x),
            // also mutates in-place, but uses a copy
            Predictor::Float => unpredict_float(res.buf_mut(), &self.predictor_info, self.x, self.y),
        }
        Ok(res)
    }
}
```
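For concreteness, the in-place horizontal pass could look something like this (a minimal std-only sketch, assuming 8-bit samples; `unpredict_hdiff_u8` is a hypothetical helper, not the crate's API). Each stored sample is a delta from the sample `samples_per_pixel` positions earlier in the row:

```rust
/// In-place horizontal un-prediction for 8-bit samples: accumulate the
/// per-channel deltas with wrapping addition, one row at a time.
fn unpredict_hdiff_u8(row: &mut [u8], samples_per_pixel: usize) {
    for i in samples_per_pixel..row.len() {
        row[i] = row[i].wrapping_add(row[i - samples_per_pixel]);
    }
}
```

Note that this is exactly the kind of operation that wants `&mut [u8]` rather than a typed array: it only cares about the byte width of a sample, not its type.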
But maybe put this in a separate issue/PR? I didn't put it in here, because this PR was already medium-large and "my ideal situation^" would change some unrelated things.
> There's no way to convert from a `Bytes` to a `Vec<u8>` zero-copy. (You can sometimes convert from a `Bytes` to a `BytesMut` zero-copy.) So that implies at some point we make a data copy from a `Bytes` into a `Vec<T>` in order to be safe. Or we could use similar code as the Arrow project and build typed interfaces on top of an `Arc<Bytes>`, such as their `Buffer` type.
In the current (this PR) implementation, I already did a `BytesMut::from()` quite a few times, even though we were still in the same decoding step.
Sidenote: there is also some discussion going on over at image-tiff, where they want to just output a `Bytes` or `&[u8]`, but there the alignment issue was also raised and somewhat ignored.
but... separate issue/PR? I think this discussion is more about optimization/API than about predictors? (even though they overlap quite a bit, it could also be here?) #87
src/tile.rs (Outdated)

```rust
/// The number of chunks in the horizontal (x) direction
pub fn chunks_across(&self) -> u32 {
```
To add to my previous comment on removing the `PredictorInfo` struct: both `chunks_across` and `chunks_down` are useful outside of handling predictors. It would make sense to include these on the `IFD` struct (ex. `IFD.chunks_across`).
Even if we keep the `PredictorInfo`, as I think we probably should, we can put this functionality in a shared `TileMath` trait which both the `IFD` and `PredictorInfo` implement.
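The shared chunk math in question is mostly ceiling division; a minimal sketch (hypothetical free functions, not the crate's API, assuming the image dimensions and chunk dimensions in pixels):

```rust
/// Number of chunks in the horizontal (x) direction: ceiling division of
/// image width by chunk width, so a partial trailing chunk still counts.
fn chunks_across(image_width: u32, chunk_width: u32) -> u32 {
    image_width.div_ceil(chunk_width)
}

/// Same for the vertical (y) direction.
fn chunks_down(image_height: u32, chunk_height: u32) -> u32 {
    image_height.div_ceil(chunk_height)
}
```

A shared trait would let both `IFD` and `PredictorInfo` expose these without duplicating the arithmetic.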
Also, having `PredictorInfo` separate made tests easier than having to build a full IFD with partially unused required data.

One other option still is to add all `PredictorInfo` attributes to `Tile` and then pass a `&Tile` into the `unpredict_...` functions for a somewhat flatter struct layout. Then implement the `TileMath` trait for `Tile` and `IFD`, which makes a lot of sense API-wise.
I made a PR with suggested changes into this branch. See feefladder#1

temporary sadness: "Ah noo... I just finished incorporating feedback :'("
src/predictor.rs (Outdated)

```rust
) -> AsyncTiffResult<Bytes> {
    let output_row_stride = predictor_info.output_row_stride(tile_x)?;
    let mut res: BytesMut =
        BytesMut::zeroed(output_row_stride * predictor_info.output_rows(tile_y)?);
```
Here I now did an `output_rows`, which is larger if we have the `Planar` planar config. I thought then at least we can give the user the data and they can split it themselves?
So now the `Planar` output will be:

```
[
    Red...,
    Green.,
    Blue..,
]
```

where each band is `chunk_width_pixels() * chunk_height_pixels()`
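Indexing into that band-separated layout can be sketched as (hypothetical helper; `width`/`height` are the chunk dimensions in pixels):

```rust
/// Flat index of sample (band, row, col) in a planar chunk: bands are
/// concatenated, each occupying width * height contiguous samples.
fn planar_index(band: usize, row: usize, col: usize, width: usize, height: usize) -> usize {
    band * width * height + row * width + col
}
```

So for a 4x2 RGB chunk, the green band starts at offset 8, which is what lets the user split the buffer into bands themselves.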
I thought this made sense, because the decoder (decompression) step decodes the entire data, which is structured like this:
```rust
/// The number of rows the output has, taking padding and PlanarConfiguration into account.
fn output_rows(&self, y: u32) -> AsyncTiffResult<usize> {
    match self.planar_configuration {
        PlanarConfiguration::Chunky => Ok(self.chunk_height_pixels(y)? as usize),
        PlanarConfiguration::Planar => {
            Ok((self.chunk_height_pixels(y)? as usize)
                .saturating_mul(self.samples_per_pixel as _))
        }
    }
}

fn bits_per_pixel(&self) -> usize {
    match self.planar_configuration {
        PlanarConfiguration::Chunky => {
            self.bits_per_sample as usize * self.samples_per_pixel as usize
        }
        PlanarConfiguration::Planar => self.bits_per_sample as usize,
    }
}
```
These depend on `planar_config`.
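A std-only standalone model of the two methods, to make the `Chunky` vs `Planar` difference concrete (assumed field names mirror `PredictorInfo`; illustrative only, with the error handling and per-chunk padding dropped):

```rust
enum PlanarConfiguration {
    Chunky,
    Planar,
}

struct Info {
    planar_configuration: PlanarConfiguration,
    samples_per_pixel: u16,
    bits_per_sample: u16,
    chunk_height: u32,
}

impl Info {
    /// Planar output stacks one band per chunk-height, so it has
    /// samples_per_pixel times as many rows.
    fn output_rows(&self) -> usize {
        match self.planar_configuration {
            PlanarConfiguration::Chunky => self.chunk_height as usize,
            PlanarConfiguration::Planar => {
                (self.chunk_height as usize) * self.samples_per_pixel as usize
            }
        }
    }

    /// Conversely, a planar row only holds one sample per pixel.
    fn bits_per_pixel(&self) -> usize {
        match self.planar_configuration {
            PlanarConfiguration::Chunky => {
                self.bits_per_sample as usize * self.samples_per_pixel as usize
            }
            PlanarConfiguration::Planar => self.bits_per_sample as usize,
        }
    }
}
```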
Thanks @kylebarron for the changes!
```rust
#[cfg(target_endian = "little")]
if let Endianness::LittleEndian = byte_order {
    return buffer;
}
#[cfg(target_endian = "big")]
if let Endianness::BigEndian = byte_order {
    return buffer;
}
```
I thought splitting the cfg no-op out up here was a bit clearer; before, the cfgs were mixed inside the match statement.
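When neither early-return fires (file and host byte order differ), the remaining work is a per-sample byte swap. A minimal sketch of that fallback (`swap_bytes_in_place` is a hypothetical stand-in for the body of `fix_endianness`, with the sample width in bytes rather than bits):

```rust
/// Reverse the bytes of every sample in place; chunks_exact_mut skips any
/// trailing bytes that don't form a whole sample.
fn swap_bytes_in_place(buf: &mut [u8], bytes_per_sample: usize) {
    for sample in buf.chunks_exact_mut(bytes_per_sample) {
        sample.reverse();
    }
}
```

This is another operation that works on `&mut [u8]` without caring whether the samples are `u16`, `f32`, etc.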
```rust
Predictor::None => Ok(fix_endianness(
    decoded_tile,
    self.predictor_info.endianness(),
    self.predictor_info.bits_per_sample(),
)),
Predictor::Horizontal => {
    unpredict_hdiff(decoded_tile, &self.predictor_info, self.x as _)
}
Predictor::FloatingPoint => {
    unpredict_float(decoded_tile, &self.predictor_info, self.x as _, self.y as _)
}
```
I removed the trait and only have crate-public functions now
This should also help with fewer copies, as mocked up in #87, since the float predictor doesn't do in-place modification and a shared trait doesn't allow that differentiation.
resolved:
Added predictors and tests.

Since floating point predictors shuffle horizontal padding into the output, quite a lot more information is needed, so I made a public `PredictorInfo` struct with public methods that give tile/chunk info.

Summary:
- `unpredict_float/hdiff` functions: first do the horizontal differencing (on bytes) and then fix endianness together with the shuffling; always order bytes BE, like the spec pdf
- `Predictor::None` -> use `fix_endianness`
- `Predictor::Horizontal` -> horizontal prediction and endianness, inside `unpredict_hdiff`
- `Predictor::Float` -> floating point prediction and endianness, based on this comment
- `from_tags` function
- `PredictorInfo` struct, inspired by tiff2
- `PlanarConfiguration`, even though no decoding actually supports `Planar`, only `Chunky`
- `SampleFormat`, even though we don't test for them in predictors
- `Planar`, except if `bits_per_sample` is non-homogeneous
- `chunk_width=image_width`
- in `image-tiff` and `tiff2`, strips and tiles are kept separate, where the end result is that the same calculations are done through different functions with different implementations
- tiff2, but realized it too late
- `PredictorInfo` on non-tiled tiff

Some notes:
- `&mut [u8]` function input: the `&mut [u8]` would be preferred by me, because then it can be directly read into a user-provided buffer that has also ensured the alignment of the buffer (e.g. initializing a `Vec<f32>` buffer and then bytemucking to `&mut [u8]`)
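The user-provided aligned buffer idea from that last note can be sketched std-only (`as_byte_slice_mut` is a hypothetical stand-in for `bytemuck::cast_slice_mut`): allocate a `Vec<f32>`, which guarantees 4-byte alignment, and hand its storage to the decoder as `&mut [u8]`.

```rust
/// View a f32 slice as mutable bytes so a decoder can fill it directly.
fn as_byte_slice_mut(samples: &mut [f32]) -> &mut [u8] {
    // SAFETY: a f32-aligned pointer is always u8-aligned, the lengths
    // match (4 bytes per f32), and any byte pattern is a valid f32.
    unsafe {
        std::slice::from_raw_parts_mut(samples.as_mut_ptr() as *mut u8, samples.len() * 4)
    }
}
```

After the decode, the caller just keeps using the `Vec<f32>`, with no copy and no alignment gamble.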