Investigate possible issues with partial views #244

Closed
vbarrielle opened this issue Oct 22, 2020 · 20 comments · Fixed by #245

@vbarrielle
Collaborator

As evidenced by #242 and #243, middle_outer_views is not tested enough, and edge cases can surface in numerous algorithms. We need to write tests for the many functions involving these partial views, to reveal any remaining bugs.

vbarrielle self-assigned this Oct 22, 2020
@mulimoen
Collaborator

What are the requirements of data and indices? Do they have to be the same length, or is it enough for them to be as long as indptr requires? Do indices outside of the range given by indptr have to be convertible to usize?

@mulimoen
Collaborator

Minor optimisation: to_owned of a view could adjust indptr to start at 0 and keep only the minimum number of entries in indices and data.

@vbarrielle
Collaborator Author

> What are the requirements of data and indices? Do they have to be the same length, or is it enough for them to be as long as indptr requires? Do indices outside of the range given by indptr have to be convertible to usize?

In most algorithms, the only requirement is that for all i < indptr.len() - 1, indices[indptr[i]..indptr[i + 1]] is not out of bounds, and the same for data. This imposes that data and indices have the same length, but does not impose anything outside that range. However, in the current state, the structure checks of CsMat are much more restrictive, and it's very probable that some implementations depend on these stricter guarantees.
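
Concretely, a minimal sketch of that relaxed check (a hypothetical helper, not the actual CsMat structure check) could look like this:

```rust
/// Sketch of the relaxed structure check described above: every window of
/// `indptr` must be a valid, in-bounds range into `indices` and `data`, but
/// nothing is required of the elements outside those windows.
fn is_valid_partial_view<N>(indptr: &[usize], indices: &[usize], data: &[N]) -> bool {
    indices.len() == data.len()
        && indptr
            .windows(2)
            .all(|w| w[0] <= w[1] && w[1] <= indices.len())
}
```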

I see two possible lines of action:

  • commit to the possibility to have zero-copy partial views of a matrix, relax structure checks and increase test coverage
  • take a shortcut and allocate when trying to slice a matrix. This seems to be scipy's approach.

I lean towards the first approach, as I don't like the performance penalty of the second one.

> Minor optimisation: to_owned of a view could adjust indptr to start at 0 and keep only the minimum number of entries in indices and data.

Yes, that looks like a good idea.
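
A rough sketch of what that could do for a CSR-style view (a hypothetical free function, not the actual to_owned):

```rust
/// Rebase `indptr` so it starts at 0 and copy only the entries of `indices`
/// and `data` that the view actually references.
fn view_to_owned<N: Clone>(
    indptr: &[usize],
    indices: &[usize],
    data: &[N],
) -> (Vec<usize>, Vec<usize>, Vec<N>) {
    let start = indptr.first().copied().unwrap_or(0);
    let end = indptr.last().copied().unwrap_or(0);
    let rebased: Vec<usize> = indptr.iter().map(|&p| p - start).collect();
    (rebased, indices[start..end].to_vec(), data[start..end].to_vec())
}
```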

@vbarrielle
Collaborator Author

A drawback of the zero-copy approach is that equality between two sparse matrices is no longer trivial, because the non-indexed elements are arbitrary. I'm pretty sure the current implementation has this bug.

@vbarrielle
Collaborator Author

There is a way to have a zero-copy approach without having unused/invalid elements. The idea is to use another type for the indptr storage, which would store a slice and an offset to apply to the values. To make this fit into the API there would need to be a trait defining everything that can be done with an indptr, and all algorithms would be implemented using this trait. This way the type system would help to prove correct usage.

That would technically be a breaking change, but not one that would affect much code in practice, I guess.
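
A rough sketch of the kind of trait and offset-carrying indptr type I have in mind (all names hypothetical):

```rust
use std::ops::Range;

/// Everything an algorithm needs to ask of an indptr storage.
trait IndPtrStorage {
    /// Number of outer dimensions (rows for CSR).
    fn outer_dims(&self) -> usize;
    /// Range into `indices`/`data` for outer dimension `i`, already
    /// corrected by the view's offset.
    fn outer_range(&self, i: usize) -> Range<usize>;
}

/// Zero-copy indptr of a partial view: a slice into the parent matrix's
/// indptr plus the offset to subtract from every stored value.
struct OffsetIndPtr<'a> {
    raw: &'a [usize],
    offset: usize,
}

impl IndPtrStorage for OffsetIndPtr<'_> {
    fn outer_dims(&self) -> usize {
        self.raw.len().saturating_sub(1)
    }
    fn outer_range(&self, i: usize) -> Range<usize> {
        (self.raw[i] - self.offset)..(self.raw[i + 1] - self.offset)
    }
}
```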

@mulimoen
Collaborator

Equality is currently defined as structural equality, so I don't think this is a big problem. The approx trait could be used for semantic equality.

@mulimoen
Collaborator

Scipy ensures indptr always starts at 0; what does SuiteSparse do?

@vbarrielle
Collaborator Author

vbarrielle commented Oct 23, 2020

It looks like SuiteSparse requires indptr to start at 0, for instance in ldl_valid_matrix.

@vbarrielle
Collaborator Author

> There is a way to have a zero-copy approach without having unused/invalid elements. The idea is to use another type for the indptr storage, which would store a slice and an offset to apply to the values. To make this fit into the API there would need to be a trait defining everything that can be done with an indptr, and all algorithms would be implemented using this trait. This way the type system would help to prove correct usage.
>
> That would technically be a breaking change, but not one that would affect much code in practice, I guess.

I've started prototyping a trait around that idea; any feedback is appreciated!

@mulimoen
Collaborator

How about this approach: https://gist.github.com/mulimoen/c476ee6a2daf5ba9e341d87dfbc46e14? data and indices would always be part of the matrix; indptr[0] is just a remapping down to zero.
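
Roughly, every lookup just subtracts indptr[0] (a minimal sketch, not the exact contents of the gist):

```rust
/// Range into `indices`/`data` for outer dimension `i` of a matrix (or
/// view) whose indptr does not necessarily start at 0.
fn outer_range(indptr: &[usize], i: usize) -> std::ops::Range<usize> {
    let offset = indptr[0];
    (indptr[i] - offset)..(indptr[i + 1] - offset)
}
```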

@vbarrielle
Collaborator Author

That's a nice insight, no need for a trait with your approach!

@mulimoen
Collaborator

Another alternative: indptr could return a Cow.
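
Something along these lines (a sketch with a hypothetical signature):

```rust
use std::borrow::Cow;

/// Borrow the stored indptr when it already starts at 0, otherwise
/// allocate a rebased copy.
fn indptr(raw: &[usize]) -> Cow<'_, [usize]> {
    match raw.first() {
        None | Some(&0) => Cow::Borrowed(raw),
        Some(&offset) => Cow::Owned(raw.iter().map(|&p| p - offset).collect()),
    }
}
```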

vbarrielle added a commit that referenced this issue Oct 25, 2020

As suggested by @mulimoen (#244 (comment)), it's doable to have a concrete type abstracting the classic indptr case with the first element at 0, and the sliced matrix case, by simply performing the subtraction everywhere.
@vbarrielle
Collaborator Author

I really like the concrete type approach with indptr[0] used as an offset: it makes the type a guarantee of valid data, and it can take its own share of the structure checks. I've started implementing it; it looks like this for now: https://github.com/vbarrielle/sprs/blob/99663dcad21e34e3260b64e37b6e5fa33d14ff5a/src/sparse/indptr.rs

@mulimoen
Collaborator

mulimoen commented Oct 26, 2020

Looking great! I have a couple of ideas below:

  • I'm missing a form of canonical indptr, which can be used for FFI, maybe expressed as a Cow for an improper indptr?
  • ndarray calls the slice operation as_slice, should we follow the same naming?
  • new consumes storage on failure, could we return this on failure?
  • Internal API idea: a method ranges which returns an iterator of std::ops::Range<usize> which can be directly consumed (see the sketch after this list)
  • A lot of the maps can be removed and we could use unwrap, since we should always have a first and last element
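
For the ranges idea, roughly (a sketch as a free function rather than a method):

```rust
use std::ops::Range;

/// Iterate over the `indices`/`data` range of each outer dimension,
/// correcting for an indptr that does not necessarily start at 0.
fn ranges(indptr: &[usize]) -> impl Iterator<Item = Range<usize>> + '_ {
    let offset = indptr.first().copied().unwrap_or(0);
    indptr.windows(2).map(move |w| (w[0] - offset)..(w[1] - offset))
}
```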

@vbarrielle
Collaborator Author

> * I'm missing a form of canonical `indptr`, which can be used for FFI, maybe expressed as a `Cow` for an improper indptr?

Yes this looks like a good idea.

> * `ndarray` calls the `slice` operation `as_slice`, should we follow the same naming?

I'll change that for consistency. But it's fallible here, so maybe try_as_slice?

> * `new` consumes `storage` on failure, could we return this on failure?

I'll see if it can be done.

> * Internal API idea: a method `ranges` which returns an iterator of `std::ops::Range<usize>` which can be directly consumed

That's interesting indeed, maybe it should be what's returned by iter_outer? There's one downside to that, as Range is not Copy AFAIR, but other than that I like this idea, even for a public API.

> * A lot of the `map`s can be removed and we could use `unwrap`, since we should always have a `first` and `last` element

I don't think we will always have a first and last element; for instance, I don't think we should disallow creating a view around 0 rows. I know that in dense linear algebra this kind of edge case can be very useful.

@mulimoen
Collaborator

ndarray's as_slice returns an Option; try_as_slice seems a bit verbose.

A size-one indptr has two elements, so it would be "natural" for a zero-sized indptr to have one element. Although a length of zero would mean we could implement Default.

@vbarrielle
Collaborator Author

@mulimoen, was that the kind of Cow-returning API you had in mind?

@mulimoen
Collaborator

That was exactly what I had in mind; we can now use this in to_owned on CsMatBase.

@vbarrielle
Collaborator Author

Hello, just an update to let you know that my work on this is going to be a bit slow: my second son was born this week and I have much less free time. I still hope to be able to keep working on this, though.

@vbarrielle
Collaborator Author

My current state is shown in #245; it's not without issues, but I think most of the work is there now.
