import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(
        cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(
                language="markdown", chunk_size=300, chunk_overlap=100))

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.storages.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_index=[("embedding", cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])
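To make the `chunk_size=300, chunk_overlap=100` settings in the flow above more concrete, here is a rough sketch of what overlapping chunks look like. It is not how `SplitRecursively` works internally (the source does not describe that, nor the unit it counts in); `chunk_with_overlap` is a hypothetical helper that simply slides a fixed-size character window over the text.

```python
def chunk_with_overlap(text: str, chunk_size: int = 300, chunk_overlap: int = 100):
    """Hypothetical helper: split `text` into fixed-size character windows.

    Each window starts `chunk_size - chunk_overlap` characters after the
    previous one, so neighbouring chunks share `chunk_overlap` characters.
    Returns a list of (start_offset, chunk_text) pairs.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With the defaults, a 700-character document yields windows starting at offsets 0, 200, and 400, each sharing 100 characters with its neighbour. The overlap gives a sentence that straddles a boundary a chance to appear whole in at least one chunk.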
Screenshots or demo video
No response
Project URL
https://github.com/cocoindex-io/cocoindex
Category
Python
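Project title
The world's first data indexing framework that supports custom logic and ships with built-in incremental updates
Project description
CocoIndex is the world's first data framework that supports custom logic and ships with built-in incremental updates. It helps you prepare data for AI workloads such as RAG and semantic search. In its simplest form, you assemble your ETL pipeline like LEGO bricks, and CocoIndex keeps the output up to date incrementally.
CocoIndex is a framework plus an engine into which you can plug any custom module: PDF parsing, chunking, and embedding steps can all be dropped in.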
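As a rough illustration of what "incremental update" means for an indexing pipeline, the sketch below reprocesses only documents whose content has changed since the last run. It is not CocoIndex's actual mechanism, which this description does not spell out; `fingerprint`, `incremental_update`, and the in-memory `index` dictionary are hypothetical names used only for this example.

```python
import hashlib


def fingerprint(content: str) -> str:
    """Content hash used to detect whether a document has changed."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def incremental_update(sources, index, transform):
    """Bring `index` in line with `sources`, redoing as little work as possible.

    sources:   dict mapping filename -> current document text
    index:     dict mapping filename -> (fingerprint, transformed_output);
               updated in place (a stand-in for a real index store)
    transform: the custom logic to run per document (chunking, embedding, ...)

    Returns the list of filenames that were actually (re)processed.
    """
    reprocessed = []
    for name, content in sources.items():
        fp = fingerprint(content)
        cached = index.get(name)
        if cached is not None and cached[0] == fp:
            continue  # unchanged since the last run: keep the cached output
        index[name] = (fp, transform(content))
        reprocessed.append(name)
    # Drop entries whose source document no longer exists.
    for name in list(index):
        if name not in sources:
            del index[name]
    return reprocessed
```

Calling `incremental_update` twice with the same `sources` processes every document on the first run and none on the second. That saving is what makes incremental updates attractive when `transform` is an expensive step such as embedding.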
🔥 Core features:
Highlights
Thorough documentation and beginner-friendly. Build your RAG pipeline from modular pieces and get started in five minutes 🚀.
Sample code