Document doc mapping update bug #5739

Merged
merged 3 commits into from
Apr 8, 2025
6 changes: 6 additions & 0 deletions docs/reference/rest-api.md
@@ -334,6 +334,12 @@ Updates the configurations of an index. This endpoint follows PUT semantics, whi
- The indexing settings update is automatically picked up by the indexer nodes once the control plane emits a new indexing plan.
- The doc mapping update is automatically picked up by the indexer nodes once the control plane emits a new indexing plan.

:::warning

If you use the ingest or ES bulk API (V2), the old doc mapping will still be used to validate new documents that end up being persisted on existing shards (see [#5738](https://github.com/quickwit-oss/quickwit/issues/5738)).

:::

Updating the doc mapping doesn't reindex existing data. Queries and results are mapped on a best-effort basis when querying older splits. For more details, see [the reference](updating-mapper.md).

#### PUT payload
6 changes: 6 additions & 0 deletions docs/reference/updating-mapper.md
@@ -6,6 +6,12 @@ Quickwit allows updating the mapping it uses to add more fields to an existing i

When you update a doc mapping for an index, Quickwit will restart indexing pipelines to take the changes into account. As both this operation and the document ingestion are asynchronous, there is no strict happens-before relationship between ingestion and update. This means a document ingested just before the update may be indexed according to the newer doc mapper, and a document ingested just after the update may be indexed with the older doc mapper.

:::warning

If you use the ingest or ES bulk API (V2), the old doc mapping will still be used to validate new documents that end up being persisted on existing shards (see [#5738](https://github.com/quickwit-oss/quickwit/issues/5738)).

:::
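
The behavior described in the warning can be pictured with a toy model. The sketch below is not Quickwit code: `FieldType`, `Shard`, and `num_rejected` are hypothetical names, and the model simply assumes that a shard keeps validating against the mapping it was opened with, which is how the warning describes the bug.

```rust
/// Hypothetical field types for a single `body` field (illustration only).
#[derive(Clone, Copy)]
enum FieldType {
    U64,
    I64,
}

/// Hypothetical shard that remembers the mapping it was opened with.
struct Shard {
    body_type: FieldType,
}

impl Shard {
    /// Validates a `body` value against the shard's own, possibly stale, mapping.
    fn validate(&self, body: i64) -> bool {
        match self.body_type {
            // A u64 field cannot hold negative values.
            FieldType::U64 => body >= 0,
            FieldType::I64 => true,
        }
    }
}

/// Counts the documents that the given shard would reject.
fn num_rejected(shard: &Shard, docs: &[i64]) -> usize {
    docs.iter().filter(|&&body| !shard.validate(body)).count()
}
```

In this model, a shard opened while `body` was mapped as `u64` rejects all 20 documents with values in `-20..0`, even after the index mapping has changed to `i64`, whereas a shard opened with the `i64` mapping rejects none of them.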

## Querying

Quickwit always validates queries against the most recent mapping.
@@ -12,9 +12,12 @@
// See the License for the specific language governing permissions and
// limitations under the License.

use std::fmt::Write;
use std::time::Duration;

use quickwit_config::service::QuickwitService;
use quickwit_rest_client::models::IngestSource;
use quickwit_rest_client::rest_client::CommitType;
use serde_json::{json, Value};

use super::assert_hits_unordered;
@@ -30,7 +33,6 @@ async fn validate_search_across_doc_mapping_updates(
    ingest_after_update: &[Value],
    query_and_expect: &[(&str, Result<&[Value], ()>)],
) {
    quickwit_common::setup_logging_for_tests();
    let sandbox = ClusterSandboxBuilder::build_and_start_standalone().await;

    {
@@ -579,3 +581,131 @@ async fn test_update_doc_mapping_add_field_on_strict() {
    )
    .await;
}

#[tokio::test]
#[ignore]
// TODO(#5738)
async fn test_update_doc_validation() {
    quickwit_common::setup_logging_for_tests();
    let index_id = "update-doc-validation";
    let sandbox = ClusterSandboxBuilder::default()
        .add_node([
            QuickwitService::Searcher,
            QuickwitService::Metastore,
            QuickwitService::Indexer,
            QuickwitService::ControlPlane,
            QuickwitService::Janitor,
        ])
        .build_and_start()
        .await;

    {
        // Wait for indexer to fully start.
        // The starting time is a bit long for a cluster.
        tokio::time::sleep(Duration::from_secs(3)).await;
        let indexing_service_counters = sandbox
            .rest_client(QuickwitService::Indexer)
            .node_stats()
            .indexing()
            .await
            .unwrap();
        assert_eq!(indexing_service_counters.num_running_pipelines, 0);
    }

    // Create index with `body` mapped as an unsigned integer.
    sandbox
        .rest_client(QuickwitService::Indexer)
        .indexes()
        .create(
            json!({
                "version": "0.8",
                "index_id": index_id,
                "doc_mapping": {
                    "field_mappings": [
                        {"name": "body", "type": "u64"}
                    ]
                },
                "indexing_settings": {
                    "commit_timeout_secs": 1
                },
            })
            .to_string(),
            quickwit_config::ConfigFormat::Json,
            false,
        )
        .await
        .unwrap();

    assert!(sandbox
        .rest_client(QuickwitService::Indexer)
        .node_health()
        .is_live()
        .await
        .unwrap());

    // Wait until indexing pipelines are started.
    sandbox.wait_for_indexing_pipelines(1).await.unwrap();

    // Ingest non-negative values, which are valid under the original u64 mapping.
    let unsigned_payload = (0..20).fold(String::new(), |mut buffer, id| {
        writeln!(&mut buffer, "{{\"body\": {id}}}").unwrap();
        buffer
    });

    let unsigned_response = sandbox
        .rest_client(QuickwitService::Indexer)
        .ingest(
            index_id,
            IngestSource::Str(unsigned_payload.clone()),
            None,
            None,
            CommitType::Auto,
        )
        .await
        .unwrap();

    assert_eq!(unsigned_response.num_rejected_docs.unwrap(), 0);

    // Update the doc mapping so that `body` becomes a signed integer.
    sandbox
        .rest_client(QuickwitService::Searcher)
        .indexes()
        .update(
            index_id,
            json!({
                "version": "0.8",
                "index_id": index_id,
                "doc_mapping": {
                    "field_mappings": [
                        {"name": "body", "type": "i64"}
                    ]
                },
                "indexing_settings": {
                    "commit_timeout_secs": 1,
                },
            })
            .to_string(),
            quickwit_config::ConfigFormat::Json,
        )
        .await
        .unwrap();

    // Ingest negative values, which are only valid under the updated i64 mapping.
    let signed_payload = (-20..0).fold(String::new(), |mut buffer, id| {
        writeln!(&mut buffer, "{{\"body\": {id}}}").unwrap();
        buffer
    });

    let signed_response = sandbox
        .rest_client(QuickwitService::Indexer)
        .ingest(
            index_id,
            IngestSource::Str(signed_payload.clone()),
            None,
            None,
            CommitType::Auto,
        )
        .await
        .unwrap();

    // No document should be rejected once the new mapping is in effect.
    assert_eq!(signed_response.num_rejected_docs.unwrap(), 0);

    sandbox.shutdown().await.unwrap();
}
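
For readers unfamiliar with the `fold` + `writeln!` idiom used to build the ingest payloads in the test above, here is a minimal standalone sketch of the same pattern. The helper name `ndjson_payload` is hypothetical and does not appear in the PR; it only isolates the payload construction so its output is easy to inspect.

```rust
use std::fmt::Write;
use std::ops::Range;

/// Builds a newline-delimited JSON payload with one `{"body": <id>}` document per line.
/// Hypothetical helper mirroring the payload construction in the test.
fn ndjson_payload(ids: Range<i64>) -> String {
    ids.fold(String::new(), |mut buffer, id| {
        // `{{` and `}}` are escaped braces; `{id}` interpolates the loop value.
        // Writing to a `String` cannot fail, so the `unwrap` never panics.
        writeln!(&mut buffer, "{{\"body\": {id}}}").unwrap();
        buffer
    })
}
```

Calling `ndjson_payload(0..2)` yields two lines, `{"body": 0}` and `{"body": 1}`, each terminated by a newline, while `ndjson_payload(-20..0)` produces the 20 negative-valued documents that the test ingests after the mapping update.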