-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
.Net: Introduce new record model #11264
Conversation
dotnet/src/Connectors/Connectors.Memory.Pinecone/PineconeVectorStoreRecordCollection.cs
Show resolved
Hide resolved
dotnet/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordModelBuilder.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reviewed 102 out of 168 files, so far LGTM. PTAL at my comments, I'll try to finish the review later today.
dotnet/src/Connectors/Connectors.Memory.AzureAISearch/AzureAISearchGenericDataModelMapper.cs
Outdated
Show resolved
Hide resolved
...rs/Connectors.Memory.AzureCosmosDBMongoDB/AzureCosmosDBMongoDBVectorStoreRecordCollection.cs
Outdated
Show resolved
Hide resolved
...nectors/Connectors.Memory.AzureCosmosDBMongoDB/Connectors.Memory.AzureCosmosDBMongoDB.csproj
Outdated
Show resolved
Hide resolved
.../Connectors/Connectors.Memory.AzureCosmosDBNoSQL/AzureCosmosDBNoSQLGenericDataModelMapper.cs
Show resolved
Hide resolved
...ectors/Connectors.Memory.AzureCosmosDBNoSQL/AzureCosmosDBNoSQLVectorStoreRecordCollection.cs
Outdated
Show resolved
Hide resolved
dotnet/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordModel.cs
Outdated
Show resolved
Hide resolved
...et/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordPropertyModel.cs
Show resolved
Hide resolved
...et/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordPropertyModel.cs
Show resolved
Hide resolved
...et/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordPropertyModel.cs
Outdated
Show resolved
Hide resolved
...et/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordPropertyModel.cs
Outdated
Show resolved
Hide resolved
...et/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordPropertyModel.cs
Show resolved
Hide resolved
...et/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordPropertyModel.cs
Outdated
Show resolved
Hide resolved
.../Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordVectorPropertyModel.cs
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...rc/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordDataPropertyModel.cs
Outdated
Show resolved
Hide resolved
dotnet/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordModel.cs
Show resolved
Hide resolved
dotnet/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordModel.cs
Show resolved
Hide resolved
dotnet/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordModelBuilder.cs
Show resolved
Hide resolved
dotnet/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordModelBuilder.cs
Outdated
Show resolved
Hide resolved
dotnet/src/Connectors/VectorData.Abstractions/VectorData.Abstractions.csproj
Show resolved
Hide resolved
dotnet/src/Connectors/VectorData.Abstractions/VectorStoreGenericDataModel.cs
Show resolved
Hide resolved
...ntegrationTests/Connectors/Memory/AzureCosmosDBNoSQL/AzureCosmosDBNoSQLVectorStoreFixture.cs
Show resolved
Hide resolved
dotnet/src/InternalUtilities/connectors/Memory/MongoDB/MongoDBGenericDataModelMapper.cs
Show resolved
Hide resolved
dotnet/src/InternalUtilities/src/Data/VectorStoreRecordMapping.cs
Outdated
Show resolved
Hide resolved
dotnet/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordModel.cs
Outdated
Show resolved
Hide resolved
dotnet/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordModel.cs
Show resolved
Hide resolved
...Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordModelBuildingOptions.cs
Outdated
Show resolved
Hide resolved
this.Customize(); | ||
this.Finalize(clrType); | ||
|
||
return new(clrType, this.KeyProperties, this.DataProperties, this.VectorProperties, this.PropertyMap); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this class be reused to process multiple types and definitions?
If not, I think we should have some very explicit warnings in the xml comments around this, so that consumers do not accidentally use this incorrectly. Or preferably, just make the class thread safe and reusable, so that warnings are unnecessary.
Or make the constructor internal and force invocations via a static method on the class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently it isn't reusable or thread-safe. We could easily make it reusable, but making it thread safety would mean we can't use any members (e.g. the various lists and dictionaries) and must constantly pass everything as parameters.
The way the model builder (currently) works is that a collection creates one in its constructor, uses it to create the model, and then discards the builder - it's a one-time, startup operation. Performance doesn't seem very important here - the couple extra allocations on instantiating a new collection are generally negligible, unless we really want to start optimizing for super-efficient creation of collections (which would likely imply much more important/heavy things first).
So I basically don't currently see a need for making one-time thing reusable/thread-safe... For now I've added a doc remark that the type is single-use and not thread-safe. Let me know if you disagree and think we should do something else.
Note that all the new connector-facing types under ConnectorSupport should be considered a first version and not yet completely fleshed out (that's why they're [Experimental]). We're very likely still going to have to iterate and mature all this (NativeAOT, hierarchical types, and various other thing). So I'm not (yet) applying the same level of design rigor as we need to on the user-facing public surface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do this after merging, but I'd definitely recommend switching this design to something that matches it's intended usage pattern.
This is after all intended to be used by other implementers too, and considering that the constructor param values are essentially static to a connector, if I saw the design of this class I would think I can define a static instance in my connector and then just invoke it multiple times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noting that putting something as static implies not just invoking multiple times (reusability), but also thread-safety, since collections might get created concurrently... At that point I'm really hoping connector implementors don't assume a type is thread-safe without carefully checking the docs etc. (and am also assuming other connector implementors basically copy-paste our code).
FWIW there's a similar question for the fillter translators (though those are internal to each connector rather than a general MEVD component). Expression visitors may need to save some state in members to communicate things between layers, which would be incompatible with concurrent use; and since IVectorStoreRecordCollection is thread-safe, a new translator needs to be instantiated for each query (though we can pool them if we think that's really necessary for perf).
In the interest of moving fast, let's defer this for now - maybe open an issue with possible design evolution around this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, we can do this in a follow up. I think it would be a fairly trivial change. I did note that the filter translators are using a similar pattern, but they are not something we are shipping publicly. Also, the developer who wrote the filter will also be the one who uses the filter, so not the same level of risk or scrutiny required.
dotnet/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordModelBuilder.cs
Outdated
Show resolved
Hide resolved
IsFilterable = dataAttribute.IsFilterable, | ||
IsFullTextSearchable = dataAttribute.IsFullTextSearchable, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider making it possible to just pass the attribute or definition as needed to the VectorStoreRecordDataPropertyModel so that it can do cloning of settings. That way, if we add new properties, cloning is just happening there instead of each point of consumption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean that you'd like e.g. VectorStoreRecordDataPropertyModel to have direct references to VectorStoreRecordDataAttribute and to VectorStoreRecordDataProperty? Wouldn't that just move the logic of having to access properties on these two types from the model builder to the PropertyModels?
(I think I may be misunderstanding what you have in mind)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's the goal. When adding a new property to e.g. VectorStoreRecordDataPropertyModel the copying from VectorStoreRecordDataAttribute and VectorStoreRecordDataProperty can be updated once within the VectorStoreRecordDataPropertyModel class, instead of having to find every place in the code where we currently instantiate new instances of VectorStoreRecordDataPropertyModel from VectorStoreRecordDataAttribute VectorStoreRecordDataProperty and update each of those locations to copy all required properties across.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Instantiating and populating a PropertyModel is something that should only ever happen in the ModelBuilder (or possibly a subclass), never elsewhere - so the scope here is a bit limited; but I get what you're saying. Plus it's worth mentioning that we don't only instantiate a PropertyModel from an attribute/definition, we also populate it (e.g. we create a PropertyModel from an attribute, and then there's also a definition that needs to be added (an possible overwrite) stuff on that PropertyModel.
I'd rather avoid having PropertyModel directly reference the attribute/definition (e.g. in a constructor), which are just sources for it, so how about this: given this all happens within the model builder only, we can just have a Populate() method on VectorStoreRecordModelBuilder which does the copying, this way this all stays inside that scope?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, sounds good. Ultimately this is a nice to have thing, so I'm fine whatever direction you take.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a look at implementing this, and we currently have 3 places inside the model builder where we transfer information from an attribute/definition to a property model:
- When traversing a CLR type's properties, we discover an attribute. Here we create a property model, copying over the information from the attribute.
- When processing the record definition, we find a definition for a property we don't already have (i.e. we didn't encounter it in step 1 above, e.g. because of dynamic mapping). Here we just create the property model, but don't populate it from the definition (only the mandatory constructor stuff - name+Type).
- After the above, regardless of whether the property was instantiated in step 1 or 2, we copy over the information from the definition to the property model. This allows the definition to clobber the attribute (which is the design I was going for).
So there's one piece of code that creates and populates from an attribute, one that just creates, and one that just populates from a record definition - these look like they're going to have to stay distinct (e.g. there need to be two distinct code paths for attribute vs. definition as these are two different types)... I can move that code out to live elsewhere (in special methods inside the builder, or in the PropertyModel itself, which I dislike since it's only part of model building); but that just seems like moving code around and making it more complex to understand the program flow...
Maybe I'm still missing what you have in mind though... Unless it's important to you to do this now, maybe simply submit a PR later proposing what you'd like to have?
See #11140 for the overall approach here. In a nutshell:
Closes #11140