.Net: Introduce new record model #11264

roji · 2025-03-28T18:59:25Z

See #11140 for the overall approach here. In a nutshell:

Evolve VectorStoreRecordPropertyReader into a proper VectorStoreRecordModel, which gets built from the user's CLR type and/or VectorStoreRecordDefinition. This is now the centralized place for all information about properties and information about them, as well as validation etc. The model and model building logic is exposed publicly, so that external connectors can be built; these types are in a separate ConnectorSupport namespace (they're connector-facing and not user-facing), and are also marked as experimental so we can evolve them.
Properties in the model are also represented by their own types, and are capable of reading/writing themselves to/from the user's record type. This abstracts away the specific method of accessing properties (reflection, dictionary lookup, source generated delegate...) away from connector code.
As a result, connectors which handle serialization themselves no longer need separate mappers for POCOs and for VectorStoreGenericDataModel. Connectors which use an external serializer (JSON) still need different mappers, but once we transition from VectorStoreGenericDataModel to a plain dictionary, that will likely go away too.
Note that we now apply the user's VectorStoreRecordDefinition in addition to (and overriding) the attributes (we were previously doing one or the other).
- One motivation was that an embedding generator cannot be configured via an attribute, since it typically requires endpoint/auth configuration.
- We see EF users doing this, i.e. expressing what they can via attributes, and falling back to the Fluent API for the rest.
- This also allows the same CLR type - with some general attributes - to be used with varying VectorStoreRecordDefinitions.

Closes #11140

dotnet/docs/EXPERIMENTS.md

dotnet/src/Connectors/Connectors.Memory.Pinecone/PineconeVectorStoreRecordCollection.cs

dotnet/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordModelBuilder.cs

adamsitnik

I've reviewed 102 out of 168 files, so far LGTM. PTAL at my comments, I'll try to finish the review later today.

dotnet/src/Connectors/Connectors.Memory.AzureAISearch/AzureAISearchGenericDataModelMapper.cs

...rs/Connectors.Memory.AzureCosmosDBMongoDB/AzureCosmosDBMongoDBVectorStoreRecordCollection.cs

...nectors/Connectors.Memory.AzureCosmosDBMongoDB/Connectors.Memory.AzureCosmosDBMongoDB.csproj

.../Connectors/Connectors.Memory.AzureCosmosDBNoSQL/AzureCosmosDBNoSQLGenericDataModelMapper.cs

...ectors/Connectors.Memory.AzureCosmosDBNoSQL/AzureCosmosDBNoSQLVectorStoreRecordCollection.cs

dotnet/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordModel.cs

...et/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordPropertyModel.cs

.../Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordVectorPropertyModel.cs

adamsitnik

Thank you for all the hard work you have put into this refactoring @roji !

As soon as you rename those few public properties that me and @westey-m has commented on, I am going to hit the approve button.

Thanks!

...rc/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordDataPropertyModel.cs

dotnet/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordModel.cs

dotnet/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordModelBuilder.cs

dotnet/src/Connectors/VectorData.Abstractions/VectorData.Abstractions.csproj

dotnet/src/Connectors/VectorData.Abstractions/VectorStoreGenericDataModel.cs

...ntegrationTests/Connectors/Memory/AzureCosmosDBNoSQL/AzureCosmosDBNoSQLVectorStoreFixture.cs

dotnet/src/InternalUtilities/connectors/Memory/MongoDB/MongoDBGenericDataModelMapper.cs

dotnet/src/InternalUtilities/src/Data/VectorStoreRecordMapping.cs

dotnet/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordModel.cs

...Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordModelBuildingOptions.cs

westey-m · 2025-04-01T15:57:30Z

dotnet/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordModelBuilder.cs

+        this.Customize();
+        this.Finalize(clrType);
+
+        return new(clrType, this.KeyProperties, this.DataProperties, this.VectorProperties, this.PropertyMap);


Can this class be reused to process multiple types and definitions?
If not, I think we should have some very explicit warnings in the xml comments around this, so that consumers do not accidentally use this incorrectly. Or preferably, just make the class thread safe and reusable, so that warnings are unnecessary.
Or make the constructor internal and force invocations via a static method on the class.

Currently it isn't reusable or thread-safe. We could easily make it reusable, but making it thread safety would mean we can't use any members (e.g. the various lists and dictionaries) and must constantly pass everything as parameters.

The way the model builder (currently) works is that a collection creates one in its constructor, uses it to create the model, and then discards the builder - it's a one-time, startup operation. Performance doesn't seem very important here - the couple extra allocations on instantiating a new collection are generally negligible, unless we really want to start optimizing for super-efficient creation of collections (which would likely imply much more important/heavy things first).

So I basically don't currently see a need for making one-time thing reusable/thread-safe... For now I've added a doc remark that the type is single-use and not thread-safe. Let me know if you disagree and think we should do something else.

Note that all the new connector-facing types under ConnectorSupport should be considered a first version and not yet completely fleshed out (that's why they're [Experimental]). We're very likely still going to have to iterate and mature all this (NativeAOT, hierarchical types, and various other thing). So I'm not (yet) applying the same level of design rigor as we need to on the user-facing public surface.

We can do this after merging, but I'd definitely recommend switching this design to something that matches it's intended usage pattern.
This is after all intended to be used by other implementers too, and considering that the constructor param values are essentially static to a connector, if I saw the design of this class I would think I can define a static instance in my connector and then just invoke it multiple times.

Just noting that putting something as static implies not just invoking multiple times (reusability), but also thread-safety, since collections might get created concurrently... At that point I'm really hoping connector implementors don't assume a type is thread-safe without carefully checking the docs etc. (and am also assuming other connector implementors basically copy-paste our code).

FWIW there's a similar question for the fillter translators (though those are internal to each connector rather than a general MEVD component). Expression visitors may need to save some state in members to communicate things between layers, which would be incompatible with concurrent use; and since IVectorStoreRecordCollection is thread-safe, a new translator needs to be instantiated for each query (though we can pool them if we think that's really necessary for perf).

In the interest of moving fast, let's defer this for now - maybe open an issue with possible design evolution around this?

Sure, we can do this in a follow up. I think it would be a fairly trivial change. I did note that the filter translators are using a similar pattern, but they are not something we are shipping publicly. Also, the developer who wrote the filter will also be the one who uses the filter, so not the same level of risk or scrutiny required.

dotnet/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordModelBuilder.cs

westey-m · 2025-04-01T16:03:50Z

dotnet/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordModelBuilder.cs

+                    IsFilterable = dataAttribute.IsFilterable,
+                    IsFullTextSearchable = dataAttribute.IsFullTextSearchable,


Consider making it possible to just pass the attribute or definition as needed to the VectorStoreRecordDataPropertyModel so that it can do cloning of settings. That way, if we add new properties, cloning is just happening there instead of each point of consumption.

Do you mean that you'd like e.g. VectorStoreRecordDataPropertyModel to have direct references to VectorStoreRecordDataAttribute and to VectorStoreRecordDataProperty? Wouldn't that just move the logic of having to access properties on these two types from the model builder to the PropertyModels?

(I think I may be misunderstanding what you have in mind)

Yeah, that's the goal. When adding a new property to e.g. VectorStoreRecordDataPropertyModel the copying from VectorStoreRecordDataAttribute and VectorStoreRecordDataProperty can be updated once within the VectorStoreRecordDataPropertyModel class, instead of having to find every place in the code where we currently instantiate new instances of VectorStoreRecordDataPropertyModel from VectorStoreRecordDataAttribute VectorStoreRecordDataProperty and update each of those locations to copy all required properties across.

OK. Instantiating and populating a PropertyModel is something that should only ever happen in the ModelBuilder (or possibly a subclass), never elsewhere - so the scope here is a bit limited; but I get what you're saying. Plus it's worth mentioning that we don't only instantiate a PropertyModel from an attribute/definition, we also populate it (e.g. we create a PropertyModel from an attribute, and then there's also a definition that needs to be added (an possible overwrite) stuff on that PropertyModel.

I'd rather avoid having PropertyModel directly reference the attribute/definition (e.g. in a constructor), which are just sources for it, so how about this: given this all happens within the model builder only, we can just have a Populate() method on VectorStoreRecordModelBuilder which does the copying, this way this all stays inside that scope?

Sure, sounds good. Ultimately this is a nice to have thing, so I'm fine whatever direction you take.

I took a look at implementing this, and we currently have 3 places inside the model builder where we transfer information from an attribute/definition to a property model:

When traversing a CLR type's properties, we discover an attribute. Here we create a property model, copying over the information from the attribute.

When processing the record definition, we find a definition for a property we don't already have (i.e. we didn't encounter it in step 1 above, e.g. because of dynamic mapping). Here we just create the property model, but don't populate it from the definition (only the mandatory constructor stuff - name+Type).

After the above, regardless of whether the property was instantiated in step 1 or 2, we copy over the information from the definition to the property model. This allows the definition to clobber the attribute (which is the design I was going for).

So there's one piece of code that creates and populates from an attribute, one that just creates, and one that just populates from a record definition - these look like they're going to have to stay distinct (e.g. there need to be two distinct code paths for attribute vs. definition as these are two different types)... I can move that code out to live elsewhere (in special methods inside the builder, or in the PropertyModel itself, which I dislike since it's only part of model building); but that just seems like moving code around and making it more complex to understand the program flow...

Maybe I'm still missing what you have in mind though... Unless it's important to you to do this now, maybe simply submit a PR later proposing what you'd like to have?

dotnet/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordModelBuilder.cs

...zureCosmosDBMongoDB.UnitTests/AzureCosmosDBMongoDBVectorStoreCollectionSearchMappingTests.cs

...rc/Connectors/Connectors.MongoDB.UnitTests/MongoDBVectorStoreCollectionSearchMappingTests.cs

.../src/Connectors/Connectors.Qdrant.UnitTests/QdrantVectorStoreCollectionSearchMappingTests.cs

...et/src/Connectors/Connectors.Redis.UnitTests/RedisVectorStoreCollectionSearchMappingTests.cs

...ectors/Connectors.Weaviate.UnitTests/WeaviateVectorStoreRecordCollectionQueryBuilderTests.cs

dotnet/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordModelBuilder.cs

...rc/Connectors/Connectors.MongoDB.UnitTests/MongoDBVectorStoreCollectionSearchMappingTests.cs

Closes microsoft#11140

markwallace-microsoft added .NET Issue or Pull requests regarding .NET code kernel Issues or pull requests impacting the core kernel documentation memory labels Mar 28, 2025

github-actions bot changed the title ~~Introduce new record model~~ .Net: Introduce new record model Mar 28, 2025

roji temporarily deployed to integration March 28, 2025 18:59 — with GitHub Actions Inactive

roji force-pushed the MEVDModel branch from 655e5b7 to 6e52f60 Compare March 28, 2025 20:41

roji had a problem deploying to integration March 28, 2025 20:42 — with GitHub Actions Error

roji force-pushed the MEVDModel branch from 6e52f60 to 493a9b2 Compare March 28, 2025 20:44

roji temporarily deployed to integration March 28, 2025 20:45 — with GitHub Actions Inactive

roji force-pushed the MEVDModel branch from 493a9b2 to c7c1a7e Compare March 28, 2025 22:00

roji temporarily deployed to integration March 28, 2025 22:00 — with GitHub Actions Inactive

roji commented Mar 28, 2025

View reviewed changes

roji marked this pull request as ready for review March 28, 2025 22:20

roji requested a review from a team as a code owner March 28, 2025 22:20

roji force-pushed the MEVDModel branch from c7c1a7e to c8b4752 Compare March 30, 2025 15:12

roji temporarily deployed to integration March 30, 2025 15:12 — with GitHub Actions Inactive

adamsitnik reviewed Mar 31, 2025

View reviewed changes

westey-m reviewed Apr 1, 2025

View reviewed changes

...et/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordPropertyModel.cs Show resolved Hide resolved

westey-m reviewed Apr 1, 2025

View reviewed changes

...et/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordPropertyModel.cs Outdated Show resolved Hide resolved

westey-m reviewed Apr 1, 2025

View reviewed changes

.../Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordVectorPropertyModel.cs Show resolved Hide resolved

adamsitnik reviewed Apr 1, 2025

View reviewed changes

westey-m reviewed Apr 1, 2025

View reviewed changes

dotnet/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordModel.cs Outdated Show resolved Hide resolved

westey-m reviewed Apr 1, 2025

View reviewed changes

dotnet/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordModel.cs Show resolved Hide resolved

westey-m reviewed Apr 1, 2025

View reviewed changes

...Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordModelBuildingOptions.cs Outdated Show resolved Hide resolved

westey-m reviewed Apr 1, 2025

View reviewed changes

dotnet/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordModelBuilder.cs Outdated Show resolved Hide resolved

westey-m reviewed Apr 1, 2025

View reviewed changes

dotnet/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordModelBuilder.cs Outdated Show resolved Hide resolved

westey-m reviewed Apr 1, 2025

View reviewed changes

dotnet/src/Connectors/VectorData.Abstractions/ConnectorSupport/VectorStoreRecordModelBuilder.cs Outdated Show resolved Hide resolved