CarbonJsSdk.EmbeddingsApi

All URIs are relative to https://api.carbon.ai

| Method | HTTP request | Description |
| ------ | ------------ | ----------- |
| embeddingsEmbeddingsPost | POST /embeddings | Embeddings |
| retrieveEmbeddingsAndContentTextChunksPost | POST /text_chunks | Retrieve Embeddings And Content |
| uploadChunksAndEmbeddingsUploadChunksAndEmbeddingsPost | POST /upload_chunks_and_embeddings | Upload Chunks And Embeddings |

embeddingsEmbeddingsPost

DocumentResponseList embeddingsEmbeddingsPost(authorization, getEmbeddingDocumentsBody, opts)

Embeddings

For pre-filtering documents, using `tags_v2` is preferred to using `tags` (which is now deprecated). If both `tags_v2` and `tags` are specified, `tags` is ignored. `tags_v2` enables building complex filters through the use of "AND", "OR", and negation logic. Take the following input as an example:

```json
{
  "OR": [
    { "key": "subject", "value": "holy-bible", "negate": false },
    { "key": "person-of-interest", "value": "jesus christ", "negate": false },
    { "key": "genre", "value": "religion", "negate": true },
    {
      "AND": [
        { "key": "subject", "value": "tao-te-ching", "negate": false },
        { "key": "author", "value": "lao-tzu", "negate": false }
      ]
    }
  ]
}
```

In this case, files will be filtered such that:

1. "subject" = "holy-bible" OR
2. "person-of-interest" = "jesus christ" OR
3. "genre" != "religion" OR
4. "subject" = "tao-te-ching" AND "author" = "lao-tzu"

Note that the top level of the query must be either an "OR" or an "AND" array. Currently, nesting is limited to 3 levels.

For tag blocks (those with "key", "value", and "negate" keys), the following typing rules apply:

1. "key" is required and must be a `string`.
2. "value" is required and can be `any` or `list[any]`.
3. "negate" is optional and must be `true` or `false`. If present and `true`, the filter block is negated in the resulting query. It is `false` by default.

When querying embeddings, you can optionally specify the `media_type` parameter in your request. By default (if not set), it is equal to "TEXT". This means that the query will be performed over files that have been parsed as text (for now, this covers all files except image files). If it is equal to "IMAGE", the query will be performed over image files (for now, `.jpg` and `.png` files). You can think of this field as an additional filter on top of any filters set in `file_ids` and …

When `hybrid_search` is set to true, a combination of keyword search and semantic search is used to rank and select candidate embeddings during information retrieval.
By default, these search methods are weighted equally during the ranking process. To adjust the weight (or "importance") of each search method, use the `hybrid_search_tuning_parameters` property. The tuning parameters are:

- `weight_a`: weight assigned to semantic search
- `weight_b`: weight assigned to keyword search

The weights must sum to 1, that is, `sum(weight_a, weight_b, ..., weight_n)` over all n weights must equal 1. The equality has an error tolerance of 0.001 to account for floating-point error.

To use hybrid search for a customer across a set of documents, two flags need to be enabled:

1. Use the `/modify_user_configuration` endpoint to enable `sparse_vectors` for the customer. The payload body for this request is:

   ```json
   { "configuration_key_name": "sparse_vectors", "value": { "enabled": true } }
   ```

2. Make sure hybrid search is enabled for the documents across which you want to perform the search. For the `/uploadfile` endpoint, this can be done by setting the query parameter `generate_sparse_vectors=true`.

Carbon supports multiple models for generating embeddings for files. For images, we support Vertex AI's multimodal model; for text, we support OpenAI's `text-embedding-ada-002` and Cohere's `embed-multilingual-v3.0`. The model can be specified via the `embedding_model` parameter (in the POST body for `/embeddings`, and as a query parameter for `/uploadfile`). If no model is supplied, `text-embedding-ada-002` is used by default. When performing embedding queries, only embeddings from files that used the specified model are considered. For example, if files A and B have embeddings generated with `OPENAI`, and files C and D have embeddings generated with `COHERE_MULTILINGUAL_V3`, then by default, queries will only consider files A and B. If `COHERE_MULTILINGUAL_V3` is specified as the `embedding_model` in `/embeddings`, then only files C and D will be considered.
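The weight rule for `hybrid_search_tuning_parameters` and the `embedding_model` choice can be checked client-side before a request is sent. The sketch below is a hypothetical helper, not part of carbon-js-sdk. It assumes the weights travel as an object with `weight_a` and `weight_b` keys under `hybrid_search_tuning_parameters`, and that `OPENAI` and `COHERE_MULTILINGUAL_V3` are the model names accepted for text queries (taken from the examples in this section); verify both against the `GetEmbeddingDocumentsBody` model.

```javascript
// Hypothetical helper, not part of carbon-js-sdk.
// ASSUMPTION: weights are sent as { weight_a, weight_b } under hybrid_search_tuning_parameters.
const WEIGHT_TOLERANCE = 0.001; // tolerance stated in the docs
const SUPPORTED_QUERY_MODELS = new Set(['OPENAI', 'COHERE_MULTILINGUAL_V3']);

// Throws unless every weight is a finite number and the weights sum to 1 (within tolerance).
function validateHybridWeights(weights) {
  const values = Object.values(weights);
  if (values.length === 0) {
    throw new Error('at least one weight is required');
  }
  for (const v of values) {
    if (typeof v !== 'number' || !Number.isFinite(v)) {
      throw new Error(`weight must be a finite number, got ${v}`);
    }
  }
  const total = values.reduce((sum, v) => sum + v, 0);
  if (Math.abs(total - 1) > WEIGHT_TOLERANCE) {
    throw new Error(`weights must sum to 1 (within ${WEIGHT_TOLERANCE}), got ${total}`);
  }
  return weights;
}

// Builds the hybrid-search fields of a request body; defaults to equal weighting.
function buildHybridSearchOptions({ weightA = 0.5, weightB = 0.5, embeddingModel } = {}) {
  const options = {
    hybrid_search: true,
    hybrid_search_tuning_parameters: validateHybridWeights({ weight_a: weightA, weight_b: weightB }),
  };
  // Leaving embedding_model out lets the server use its default (text-embedding-ada-002).
  if (embeddingModel !== undefined) {
    // VERTEX_MULTIMODAL is deliberately excluded: the docs say not to set it for now.
    if (!SUPPORTED_QUERY_MODELS.has(embeddingModel)) {
      throw new Error(`unsupported embedding_model for queries: ${embeddingModel}`);
    }
    options.embedding_model = embeddingModel;
  }
  return options;
}
```

For example, `buildHybridSearchOptions({ weightA: 0.7, weightB: 0.3 })` favors semantic search, while an unbalanced pair such as `0.6`/`0.6` is rejected before any request is made.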
Make sure that all files you want considered for a query have embeddings generated by the same model. For now, do not set `VERTEX_MULTIMODAL` as the `embedding_model`; Carbon uses this model automatically when it detects an image file.
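To make the `tags_v2` semantics described in this section concrete, here is a small local evaluator that applies a filter to one file's tags. It is only an illustration (Carbon applies filters server-side), and it rests on assumptions the text does not settle: each "AND"/"OR" array counts as one nesting level, a list `value` matches when the file's tag equals any element, and a file missing the key does not match the block (so a negated block does match it). `matchesFilter` and `evaluateNode` are hypothetical names.

```javascript
// Hypothetical local evaluator for tags_v2 filters (Carbon evaluates filters server-side).
// ASSUMPTIONS, not confirmed by the docs:
//   - every "AND"/"OR" array counts as one nesting level, with at most 3 levels;
//   - a list "value" matches when the file's tag equals any element of the list;
//   - a file without the key does not match the block (so a negated block matches).
const MAX_NESTING = 3;

function isTagBlock(node) {
  return node !== null && typeof node === 'object' && 'key' in node;
}

function evaluateNode(node, fileTags, depth) {
  if (node === null || typeof node !== 'object') {
    throw new Error('filter nodes must be objects');
  }
  if (isTagBlock(node)) {
    // Typing rules from the docs: key is a string, value is required, negate is boolean.
    if (typeof node.key !== 'string') throw new Error('"key" must be a string');
    if (!('value' in node)) throw new Error('"value" is required');
    if ('negate' in node && typeof node.negate !== 'boolean') {
      throw new Error('"negate" must be true or false');
    }
    const actual = fileTags[node.key];
    const matches = Array.isArray(node.value) ? node.value.includes(actual) : actual === node.value;
    return node.negate === true ? !matches : matches;
  }
  // Otherwise the node must be a group: exactly one "AND" or "OR" key holding an array.
  const ops = Object.keys(node);
  if (ops.length !== 1 || (ops[0] !== 'AND' && ops[0] !== 'OR') || !Array.isArray(node[ops[0]])) {
    throw new Error('a group must be a single "AND" or "OR" array');
  }
  if (depth + 1 > MAX_NESTING) throw new Error(`nesting is limited to ${MAX_NESTING} levels`);
  const results = node[ops[0]].map((child) => evaluateNode(child, fileTags, depth + 1));
  return ops[0] === 'AND' ? results.every(Boolean) : results.some(Boolean);
}

// Entry point: the top level must be an "AND" or "OR" array, not a bare tag block.
function matchesFilter(filter, fileTags) {
  if (isTagBlock(filter)) throw new Error('the top level must be an "AND" or "OR" array');
  return evaluateNode(filter, fileTags, 0);
}
```

With the example filter from this section, a file tagged `{ subject: 'tao-te-ching', author: 'lao-tzu', genre: 'religion' }` matches through the nested "AND" branch, while `{ subject: 'other', genre: 'religion' }` matches none of the four conditions.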

Example

```javascript
import CarbonJsSdk from 'carbon-js-sdk';

const apiInstance = new CarbonJsSdk.EmbeddingsApi();
const authorization = "authorization_example"; // String (required)
const getEmbeddingDocumentsBody = new CarbonJsSdk.GetEmbeddingDocumentsBody(); // GetEmbeddingDocumentsBody (required)
const opts = {
  'customerId': "customerId_example" // String (optional)
};
// Callback-style client: the callback receives (error, data, response).
apiInstance.embeddingsEmbeddingsPost(authorization, getEmbeddingDocumentsBody, opts, (error, data, response) => {
  if (error) {
    console.error(error);
  } else {
    // Log data as its own argument so the object is not coerced to "[object Object]".
    console.log('API called successfully. Returned data:', data);
  }
});
```
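The generated client reports results through a Node-style callback `(error, data, response)`, as in the example above. If you prefer `async`/`await`, a thin wrapper like the following is one option. `callAsync` is a hypothetical helper name; it resolves with both `data` and `response` because Node's `util.promisify` would keep only the first success argument (`data`).

```javascript
// Hypothetical helper: adapts a callback-style SDK method to a Promise.
// Assumes the method's last argument is a callback called as (error, data, response).
function callAsync(apiInstance, methodName, ...args) {
  return new Promise((resolve, reject) => {
    // Calling through apiInstance[methodName](...) keeps `this` bound to the client.
    // A synchronous throw here also rejects the promise, since it happens in the executor.
    apiInstance[methodName](...args, (error, data, response) => {
      if (error) {
        reject(error);
      } else {
        resolve({ data, response });
      }
    });
  });
}

// Usage inside an async function, reusing the variables from the example above:
// const { data } = await callAsync(apiInstance, 'embeddingsEmbeddingsPost',
//   authorization, getEmbeddingDocumentsBody, opts);
```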

Parameters

| Name | Type | Description | Notes |
| ---- | ---- | ----------- | ----- |
| authorization | String | | |
| getEmbeddingDocumentsBody | GetEmbeddingDocumentsBody | | |
| customerId | String | | [optional] |

Return type

DocumentResponseList

Authorization

No authorization required

HTTP request headers

  • Content-Type: application/json
  • Accept: application/json

retrieveEmbeddingsAndContentTextChunksPost

EmbeddingsAndChunksResponse retrieveEmbeddingsAndContentTextChunksPost(authorization, embeddingsAndChunksQueryInput, opts)

Retrieve Embeddings And Content

Example

```javascript
import CarbonJsSdk from 'carbon-js-sdk';

const apiInstance = new CarbonJsSdk.EmbeddingsApi();
const authorization = "authorization_example"; // String (required)
const embeddingsAndChunksQueryInput = new CarbonJsSdk.EmbeddingsAndChunksQueryInput(); // EmbeddingsAndChunksQueryInput (required)
const opts = {
  'customerId': "customerId_example" // String (optional)
};
// Callback-style client: the callback receives (error, data, response).
apiInstance.retrieveEmbeddingsAndContentTextChunksPost(authorization, embeddingsAndChunksQueryInput, opts, (error, data, response) => {
  if (error) {
    console.error(error);
  } else {
    // Log data as its own argument so the object is not coerced to "[object Object]".
    console.log('API called successfully. Returned data:', data);
  }
});
```

Parameters

| Name | Type | Description | Notes |
| ---- | ---- | ----------- | ----- |
| authorization | String | | |
| embeddingsAndChunksQueryInput | EmbeddingsAndChunksQueryInput | | |
| customerId | String | | [optional] |

Return type

EmbeddingsAndChunksResponse

Authorization

No authorization required

HTTP request headers

  • Content-Type: application/json
  • Accept: application/json
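Where the SDK is not available, the same `/text_chunks` call can be sketched as a raw HTTP request. The base URL, path, method, and `Content-Type`/`Accept` headers below come from this page; the header names used to carry `authorization` and `customerId` (`authorization` and `customer-id`) are assumptions, so check them against the Carbon API reference. The body is whatever JSON your `EmbeddingsAndChunksQueryInput` serializes to, and `buildTextChunksRequest` is a hypothetical name.

```javascript
// Hypothetical sketch of the raw HTTP call behind retrieveEmbeddingsAndContentTextChunksPost.
// Base URL, path, method, and Content-Type/Accept come from this page.
// ASSUMPTION: the header names "authorization" and "customer-id".
const BASE_URL = 'https://api.carbon.ai';

function buildTextChunksRequest(authorization, queryInput, customerId) {
  const headers = {
    'Content-Type': 'application/json',
    Accept: 'application/json',
    authorization, // assumed header name
  };
  if (customerId !== undefined) {
    headers['customer-id'] = customerId; // assumed header name
  }
  return {
    url: new URL('/text_chunks', BASE_URL).toString(),
    init: { method: 'POST', headers, body: JSON.stringify(queryInput) },
  };
}

// Usage (Node 18+ provides a global fetch):
// const { url, init } = buildTextChunksRequest(apiKey, queryInput, 'customer-123');
// const res = await fetch(url, init);
```

Returning a plain `{ url, init }` pair keeps the request inspectable and testable without touching the network.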

uploadChunksAndEmbeddingsUploadChunksAndEmbeddingsPost

GenericSuccessResponse uploadChunksAndEmbeddingsUploadChunksAndEmbeddingsPost(authorization, chunksAndEmbeddingsUploadInput, opts)

Upload Chunks And Embeddings

Example

```javascript
import CarbonJsSdk from 'carbon-js-sdk';

const apiInstance = new CarbonJsSdk.EmbeddingsApi();
const authorization = "authorization_example"; // String (required)
const chunksAndEmbeddingsUploadInput = new CarbonJsSdk.ChunksAndEmbeddingsUploadInput(); // ChunksAndEmbeddingsUploadInput (required)
const opts = {
  'customerId': "customerId_example" // String (optional)
};
// Callback-style client: the callback receives (error, data, response).
apiInstance.uploadChunksAndEmbeddingsUploadChunksAndEmbeddingsPost(authorization, chunksAndEmbeddingsUploadInput, opts, (error, data, response) => {
  if (error) {
    console.error(error);
  } else {
    // Log data as its own argument so the object is not coerced to "[object Object]".
    console.log('API called successfully. Returned data:', data);
  }
});
```
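Before uploading, it can help to confirm that every chunk has exactly one embedding and that all vectors have the same length, which is what you would expect when they come from a single model. The `{ chunk, embedding }` pairs produced here are a hypothetical intermediate shape, not the `ChunksAndEmbeddingsUploadInput` schema, and `pairChunksWithEmbeddings` is an illustrative name; mapping the pairs onto the real model is left to your code.

```javascript
// Hypothetical pre-upload sanity check. The { chunk, embedding } shape is NOT the
// ChunksAndEmbeddingsUploadInput schema and must be mapped onto it separately.
function pairChunksWithEmbeddings(chunks, embeddings) {
  if (chunks.length !== embeddings.length) {
    throw new Error(`got ${chunks.length} chunks but ${embeddings.length} embeddings`);
  }
  // Use the first vector's length as the expected dimension for all vectors.
  const dim = embeddings.length > 0 ? embeddings[0].length : 0;
  return chunks.map((chunk, i) => {
    const vector = embeddings[i];
    if (!Array.isArray(vector) || vector.length !== dim) {
      throw new Error(`embedding ${i} is not a vector of dimension ${dim}`);
    }
    if (!vector.every(Number.isFinite)) {
      throw new Error(`embedding ${i} contains a non-finite value`);
    }
    return { chunk, embedding: vector };
  });
}
```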

Parameters

| Name | Type | Description | Notes |
| ---- | ---- | ----------- | ----- |
| authorization | String | | |
| chunksAndEmbeddingsUploadInput | ChunksAndEmbeddingsUploadInput | | |
| customerId | String | | [optional] |

Return type

GenericSuccessResponse

Authorization

No authorization required

HTTP request headers

  • Content-Type: application/json
  • Accept: application/json