This repository was archived by the owner on Nov 12, 2024. It is now read-only.

Commit 76185e2

1 parent e153e4a commit 76185e2

6 files changed, +791 -91 lines changed

Diff for: docs/navigation/standard.md

+14
Original file line number | Diff line number | Diff line change
@@ -227,6 +227,20 @@
227227
- label: 'Creating sequence tables'
228228
slug: '/docs/sharding/sequence-tables'
229229

230+
- label: 'Vectors'
231+
icon: 'vectors'
232+
items:
233+
- label: 'Vectors overview'
234+
slug: '/docs/vectors/overview'
235+
- label: 'Concepts and terminology'
236+
slug: '/docs/vectors/terminology-and-concepts'
237+
- label: 'Use cases'
238+
slug: '/docs/vectors/use-cases'
239+
- label: 'Using with an ORM'
240+
slug: '/docs/vectors/using-with-an-orm'
241+
- label: 'Reference'
242+
slug: '/docs/vectors/reference'
243+
230244
- label: 'Security and access'
231245
icon: 'security'
232246
items:
Original file line number | Diff line number | Diff line change
@@ -1,32 +1,22 @@
11
---
22
title: 'Vector search and storage'
33
subtitle: 'Learn how to use PlanetScale vector search and storage.'
4-
date: '2024-09-30'
4+
date: '2024-10-29'
55
---
66

7-
Welcome to the PlanetScale vectors beta! The goal of this private beta period is to get the product in the hands of our customers so you can build alongside us while we continue to improve the feature — with your feedback.
7+
Welcome to the PlanetScale vectors beta! The goal of this beta period is to get the product in the hands of our customers so you can build alongside us while we continue to improve the feature — with your feedback.
88

9-
If at any point you experience issues with vectors while using the beta, we highly encourage you to get in touch. Your feedback is extremely valuable during this beta period, so don’t hesitate to reach out. You can [submit a support ticket](/contact) to relay any feedback or issues. We also have a private [Discord](https://discord.com/invite/pDUGAAFEJx) channel for the vectors beta. If you'd like to added, fill out our [contact form](/contact).
9+
If at any point you experience issues with vectors while using the beta, we highly encourage you to get in touch. Your feedback is extremely valuable during this beta period, so don’t hesitate to reach out. You can [submit a support ticket](/contact) to relay any feedback or issues. We also have a vectors channel in our [Discord](https://discord.com/invite/pDUGAAFEJx) where you can ask questions and share feedback.
1010

1111
{% callout type="warning" %}
1212
This feature is still beta quality and is not intended for use in production workloads. We recommend limiting use of PlanetScale vector search and storage to testing and evaluation purposes only. PlanetScale vectors is considered a Beta Feature as noted in our Agreement with you, and any use of PlanetScale vectors is in accordance with the Agreement.
1313
{% /callout %}
1414

15-
This documentation outlines how to get started with vectors, known issues and limitations, some example usage, and how to share feedback.
16-
17-
## Known issues and limitations
18-
19-
- Building a **one-shot** **index** (an index built in bulk on an existing set of vectors) requires enough RAM to fit all of the vectors in memory simultaneously. This limitation will be lifted by the time the beta is complete.
20-
- **Incremental indexes** (indexes that begin empty and update as new vectors are added) function correctly, but are significantly slower to build compared to a one-shot index. Disk usage is much higher due to potentially very high InnoDB blob fragmentation issues, so it's much easier to run out of disk space.
21-
- **Online DDL** and deploy requests do not work well yet, because they build incremental indexes. Please use direct DDL for now. We plan to improve this significantly during the beta.
22-
- Since this is a beta, there may be bugs, performance, and security issues that have not yet been uncovered. We also may need to change query or DDL syntax before the feature is generally available. Don’t run this on a production database.
23-
- Once you opt a branch into the vectors feature, that branch must continue to run a vectors-enabled version of MySQL. You can remove your vector columns/tables, but you cannot downgrade that branch to its prior version of MySQL.
24-
2515
## Overview
2616

27-
Vectors are a data structure that captures opaque semantic meaning about something and allows a database to search for resources by similarity based on this opaque meaning. As a data type, a vector is just an array of floating-point numbers. Those numbers are generated by submitting some resource — a word, a string, a document, an image, audio, etc — to an *embedding*¹, which converts the resource to a vector.
17+
Vectors are a data structure that captures opaque semantic meaning about something and allows a database to search for resources by similarity based on this opaque meaning. As a data type, a vector is just an array of floating-point numbers. Those numbers are generated by submitting some resource (a word, a string, a document, an image, audio, etc.) to an _embedding model_, which converts the resource to a vector.
2818

29-
A vector database stores those vector embeddings alongside other relational data. In practice, that might look like a table with columns for ID (a primary key), content (as a BLOB or VARCHAR), and a vector. Then it becomes possible to perform queries that find content similar to a search query, like so:
19+
A vector database stores those vector embeddings alongside other relational data. In practice, that might look like a table with columns for ID (a primary key), content (as a BLOB or VARCHAR), and a vector. Then it becomes possible to perform queries that find content similar to a search vector, like so:
3020

3121
```sql
3222
SELECT id
@@ -35,17 +25,15 @@ SELECT id
3525
LIMIT 10;
3626
```
3727

38-
Possible applications include recommendation engines that show products similar to a user's purchase history, or search engines that find documents or other resources based on natural-language queries.
28+
Possible applications include recommendation engines that show products similar to a user's purchase history, or search engines that find documents or other resources based on natural-language queries. Read our [applications of vector databases](/docs/vectors/use-cases) docs to learn more about how vector databases can be applied in the real world.
3929

4030
PlanetScale has added support for vector columns, vector distance functions, and vector indexes, as described below.
4131

42-
[¹]: PlanetScale does not currently provide an embedding service. You can find several good cloud-based options like OpenAI or AWS Titan, or local options like Python sentence_transformers.
32+
[¹]: PlanetScale does not currently provide an embedding service. You can find several good cloud-based options like OpenAI or AWS Titan, or local options like Python `sentence_transformers`.
4333

4434
## Enrolling in the PlanetScale vectors beta
4535

46-
This is currently a closed beta. To access the beta, you must have received an invite. If you did not receive an invite and wish to join, or you would like to enroll a different organization, please fill out our [contact form](/contact).
47-
48-
PlanetScale has a series of MySQL images that have been extended with vector support. Vector support can be enabled on a per-branch basis, however, you have to first opt-in to the beta from your database settings page. After that, you will choose which branch(es) you’d like to opt-in to the vectors beta. The branch will be updated to the vectors-enabled version of MySQL at the time of opting the branch in.
36+
PlanetScale has a custom version of MySQL that has been extended with vector support. Vector support can be enabled on a per-branch basis; however, you first have to opt in to the beta from your database settings page. After that, you will choose which branch(es) you’d like to opt in to the vectors beta. The branch will be updated to the vectors-enabled version of MySQL at the time of opting the branch in.
4937
To enable the vector support on a branch:
5038

5139
1. Click on the database that you’d like to enroll in the vectors beta.
@@ -56,7 +44,7 @@ To enable the vector support on a branch:
5644
6. Click on the small gear icon underneath the “Connect” button on the right.
5745
7. Click the toggle next to “Enable vectors”.
5846
8. Click “Save branch settings”.
59-
9. The branch will upgrade asynchronously to the correct version of MySQL, which may take 30-60 minutes. You can confirm when this process is complete by executing a “SELECT @@version” query. The vector-enabled version is 8.0.37.
47+
9. The branch will upgrade asynchronously to the correct version of MySQL, which may take 30-60 minutes. While this happens, the database dashboard will show an "Enabling vectors" badge, which changes to a "Vector-enabled" badge when the upgrade is complete.
6048

6149
## Adding vector columns
6250

@@ -88,12 +76,12 @@ SELECT id, DISTANCE(TO_VECTOR('[3, 3, 3, 3]'), embedding, 'L2_SQUARED') AS d
8876

8977
Use an `EXPLAIN` query to confirm that the query uses the new index. This query actually won’t use the index until the table has around 50 rows in it.
9078

91-
Note that vector indexes provide approximate results. An unindexed query with LIMIT 100 returns exactly the 100 rows closest to the reference vector, after performing a full table scan and a sort. An indexed query returns, on average, about 100 of the top 105 (around 95%) of the rows closest to the reference vector, but much faster than a full table scan. This is expected, because all efficient vector indexes, including PlanetScale’s vector indexes, perform approximate nearest neighbor (ANN) searches.
79+
Vector indexes provide approximate results. An unindexed query with LIMIT 100 returns exactly the 100 rows closest to the reference vector, after performing a full table scan and a sort. An indexed query returns, on average, about 100 of the top 105 (around 95%) of the rows closest to the reference vector, but much faster than a full table scan. This is expected, because all efficient vector indexes, including PlanetScale’s vector indexes, perform approximate nearest neighbor (ANN) searches.
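The "about 95%" figure is a recall measurement: the fraction of the true nearest neighbors that the indexed query also returns. A minimal sketch of how it could be computed, assuming you have already fetched two lists of ids for the same reference vector, one from an exact (unindexed) query and one from an indexed query. The function name and the sample ids are hypothetical and not part of PlanetScale's tooling.

```python
def recall_at_k(exact_ids, approx_ids):
    """Fraction of the exact nearest-neighbor ids that the approximate result also contains."""
    exact = set(exact_ids)
    if not exact:
        raise ValueError("exact_ids must not be empty")
    return len(exact & set(approx_ids)) / len(exact)


# Hypothetical example: the indexed query found 95 of the true top 100
# and returned 5 rows that fall just outside the true top 100.
exact_top_100 = list(range(100))
indexed_top_100 = list(range(95)) + [100, 101, 102, 103, 104]

print(recall_at_k(exact_top_100, indexed_top_100))  # 0.95
```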
9280

93-
If you are adding vectors from an app, you may want to use prepared statements, although we do not recommend it. `TO_VECTOR` works in that setting, but serializing the vectors on the client side and uploading them as binary is faster. The serialized format is IEEE-754 32-bit floats, which you can serialize with code like this:
81+
If you are adding vectors to your database from an application, you may want to use prepared statements, although we do not recommend it. `TO_VECTOR` works in that setting, but serializing the vectors on the client side and uploading them as binary is faster. The serialized format is IEEE-754 32-bit floats, which you can serialize with code like this:
9482

95-
- Python: `struct.pack('ffff', *float_array)`
96-
- Ruby: `float_array.pack(“ffff”)`
83+
- Python: `struct.pack(f'{len(float_array)}f', *float_array)`
84+
- Ruby: `float_array.pack("f*")`
9785
- Rust: `float_array.map(|f| f.to_ne_bytes()).flatten().collect()`
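The one-liners above use the platform's native byte order. The sample output in the function reference removed by this commit (`TO_VECTOR('[1, 2.78, 3.14]')` giving `0x0000803F85EB3140C3F54840`) is what little-endian packing produces, so a sketch that pins the byte order explicitly might look like the following. The helper names are hypothetical, and little-endian is an assumption inferred from that sample, not a documented guarantee.

```python
import struct


def serialize_vector(values):
    """Pack floats as consecutive little-endian IEEE-754 32-bit values (assumed byte order)."""
    return struct.pack(f"<{len(values)}f", *values)


def deserialize_vector(blob):
    """Inverse of serialize_vector: every 4 bytes decode to one dimension."""
    if len(blob) % 4 != 0:
        raise ValueError("blob length must be a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


blob = serialize_vector([1, 2.78, 3.14])
print(blob.hex().upper())  # 0000803F85EB3140C3F54840
print(len(blob))           # 12, i.e. 4 bytes times 3 dimensions
```

Values are rounded to 32-bit precision when packed, so a round trip returns the nearest float32 to each input rather than the original 64-bit Python float.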
9886

9987
You can use the resulting blob (which will be 4 bytes times the number of dimensions in the vector) in an `INSERT` statement like this:
@@ -147,76 +135,17 @@ This query selects the ten products from a given seller that are closest to some
147135

148136
The MySQL query planner chooses whether to use the vector index or some other index automatically based on the query and based on the contents of the table, to maximize query performance. Use `EXPLAIN` on any given query to see how it will execute.
149137

150-
As part of the private beta, we’re looking for feedback on how well MySQL plans vector queries. If you believe you’ve hit an edge case or something looks wrong, please [open a support ticket](/contact) and let us know.
151-
152-
## Vector function reference
153-
154-
**`TO_VECTOR(string)`**
155-
Converts a text string to a binary vector value. The text string is an array of floating point numbers in JSON format.
156-
157-
- alias `STRING_TO_VECTOR(string)`
158-
- Example: `SELECT TO_VECTOR('[1, 2.78, 3.14]');`
159-
160-
`-> 0x0000803F85EB3140C3F54840`
161-
162-
**`FROM_VECTOR(string)`**
163-
Converts a binary vector to a human-readable string.
164-
165-
- alias `VECTOR_TO_STRING(vector)`
166-
- Example: `SELECT FROM_VECTOR(0x0000803F85EB3140C3F54840);`
167-
168-
`-> [1.00000e+00,2.78000e+00,3.14000e+00]`
169-
170-
**`VECTOR_DIM(string)`**
171-
Calculates the dimension of a vector
172-
173-
- Example: `SELECT VECTOR_DIM(TO_VECTOR('[1,2,3]')); -> 3`
174-
175-
**`DISTANCE(vector1, vector2, [metric])`**
176-
Calculates the distance between two vectors. The optional third parameter specifies which distance metric is to be used: `DOT`, `COSINE`, L2 (`EUCLIDEAN)`, or L2_SQUARED (`EUCLIDEAN_SQUARED)`. If the distance metric is omitted, it defaults to `DOT`.
177-
178-
- `DOT` means the dot product. Example:
138+
As part of the beta, we’re looking for feedback on how well MySQL plans vector queries. If you believe you’ve hit an edge case or something looks wrong, please [open a support ticket](/contact) and let us know.
179139

180-
`SELECT DISTANCE(TO_VECTOR('[1,2]'), TO_VECTOR('[5,4]'), 'DOT');`
140+
## Known issues and limitations for the beta
181141

182-
`-> 13`
183-
184-
- `COSINE` means the cosine of the angle between the two vectors, which is the same as the dot product divided by the magnitude of the two vectors. Example:
185-
186-
`SELECT DISTANCE(TO_VECTOR('[1,2]'), TO_VECTOR('[5,4]'), 'COSINE');`
187-
188-
`-> 0.09204061549954834`
189-
190-
- `L2` (or `EUCLIDEAN`) means the length of a line between the ends of the vectors. Example:
191-
192-
`SELECT DISTANCE(TO_VECTOR('[1,2]'), TO_VECTOR('[5,4]'), 'L2');`
193-
194-
`-> 4.47213595499958`
195-
196-
- `L2_SQUARED` (or `EUCLIDEAN_SQUARED`) is the square of the Euclidean distance
197-
198-
`SELECT DISTANCE(TO_VECTOR('[1,2]'), TO_VECTOR('[5,4]'), 'L2_SQUARED');`
199-
200-
`-> 20`
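The sample results listed above can be reproduced with plain arithmetic. The sketch below is an illustrative reimplementation, not PlanetScale's code, and the function names are hypothetical. Note that the `COSINE` sample value (about 0.092) matches one minus the cosine similarity, so that is what the sketch computes.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_squared(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def l2(a, b):
    """Euclidean distance."""
    return math.sqrt(l2_squared(a, b))


def cosine_distance(a, b):
    """One minus the cosine similarity (assumption based on the sample output)."""
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [1, 2], [5, 4]
print(dot(a, b))              # 13
print(l2_squared(a, b))       # 20
print(l2(a, b))               # approximately 4.4721
print(cosine_distance(a, b))  # approximately 0.0920
```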
201-
202-
**`DISTANCE_DOT(vector1, vector2)`**
203-
Is the same as `DISTANCE(vector1, vector2, 'DOT')`
204-
205-
**`DISTANCE_COSINE(vector1, vector2)`**
206-
Is the same as `DISTANCE(vector1, vector2, 'COSINE')`
207-
208-
**`DISTANCE_L2(vector1, vector2)`**
209-
Is the same as `DISTANCE(vector1, vector2, 'L2')`
210-
211-
- alias: `DISTANCE_EUCLIDEAN(vector1, vector2)`
212-
213-
**`DISTANCE_L2_SQUARED(vector1, vector2)`**
214-
Is the same as `DISTANCE(vector1, vector2, 'L2_SQUARED')`
215-
216-
- alias: `DISTANCE_EUCLIDEAN_SQUARED(vector1, vector2)`
142+
- Building a **one-shot index** (an index built in bulk on an existing set of vectors) requires enough RAM to fit roughly half of the vector dataset in memory simultaneously. This will be improved throughout the beta.
143+
- **Incremental indexes** (indexes that begin empty and update as new vectors are added) function correctly, but are significantly slower to build compared to a one-shot index. Disk usage is much higher due to potentially very high InnoDB blob fragmentation issues, so it's much easier to run out of disk space.
144+
- Since this is a beta, there may be bugs, performance problems, and security issues that have not yet been uncovered. We also may need to change query or DDL syntax before the feature is generally available. Don’t run this on a production database.
145+
- Once you opt a branch into the vectors feature, that branch must continue to run a vectors-enabled version of MySQL. You can remove your vector columns/tables, but you cannot downgrade that branch to its prior version of MySQL.
217146

218147
## Feedback
219148

220149
We want to make our vectors offering as reliable, fast, and feature-rich as possible. Feedback from our early users will help make this possible. If you encounter any issues, crashes, unexpected errors or poor performance, please [submit a support ticket](/contact). You are also welcome to reach out with general feedback and suggestions.
221150

222-
We also have a private [Discord](https://discord.com/invite/pDUGAAFEJx) channel for the vectors beta where you can ask questions, share feedback, and discuss what you’re working on. If you'd like to added, please fill out the [contact form](/contact).
151+
We also have a [Discord](https://discord.com/invite/pDUGAAFEJx) channel for the vectors beta where you can ask questions, share feedback, and discuss what you’re working on.
