DRIVERS-2884 Avoid connection churn when operations timeout #1675

Open
wants to merge 43 commits into
base: master
43 commits
0f12706
DRIVERS-2884 Add connection churn spec tests
prestonvasquez Oct 14, 2024
fe18120
DRIVERS-2884 Update json
prestonvasquez Oct 14, 2024
05cc88b
DRIVERS-2884 Clean up spec tests
prestonvasquez Oct 30, 2024
98c2a73
Update CMAP to include foreground read
prestonvasquez Oct 30, 2024
4827995
Update changelog
prestonvasquez Oct 30, 2024
234b729
Add justification for CMAP update
prestonvasquez Oct 30, 2024
ccfbcf1
Remove unecessary example
prestonvasquez Oct 30, 2024
fed567b
Use consistent keys
prestonvasquez Oct 30, 2024
8840be4
Update timeouts
prestonvasquez Nov 7, 2024
c1bee3b
DRIVERS-2884 Resolve merge conflicts
prestonvasquez Apr 22, 2025
c0e5aee
DRIVERS-2884 Update pending response unified spec tests
prestonvasquez Apr 22, 2025
dde9e22
DRIVERS-2884 Add UML and update wording
prestonvasquez Apr 23, 2025
5e0305a
DRIVERS-2884 Remove uneeded text from code snippet
prestonvasquez Apr 23, 2025
496724c
DRIVERS-2884 Add prose tests
prestonvasquez Apr 23, 2025
258edf8
DRIVERS-2884 Clean up presentation
prestonvasquez Apr 23, 2025
d217d10
DRIVERS-2884 Add logs and events
prestonvasquez Apr 24, 2025
cc8aec0
DRIVERS-2884 Add log part
prestonvasquez Apr 24, 2025
3d98039
DRIVERS-2884 Add Q&A section
prestonvasquez Apr 25, 2025
07e75bd
DRIVERS-2884 Add changelog
prestonvasquez Apr 25, 2025
8d9e71b
DRIVERS-2884 Fix Markdown failures
prestonvasquez Apr 25, 2025
5c68f77
DRIVERS-2884 Update schema
prestonvasquez Apr 25, 2025
e2653cb
DRIVERS-2884 Update schema w/ new connection events
prestonvasquez Apr 25, 2025
00aa620
DRIVERS-2884 Remove additional properties
prestonvasquez Apr 25, 2025
b29d6cc
DRIVERS-2884 Remove ignoring extra events
prestonvasquez Apr 25, 2025
b04b340
DRIVERS-2884 Clean up tests
prestonvasquez Apr 25, 2025
40b302c
DRIVERS-2884 Uncapitalize D in ID
prestonvasquez Apr 25, 2025
4348b44
DRIVERS-2884 Another ID cleanup
prestonvasquez Apr 25, 2025
dd0dbe9
DRIVERS-2884 Remove the word write
prestonvasquez Apr 25, 2025
e43c466
DRIVERS-2884 Remove the word write
prestonvasquez Apr 25, 2025
89754ce
Add punctuation
prestonvasquez Apr 28, 2025
bc893c6
Merge branch 'master' into DRIVERS-2884
prestonvasquez Apr 30, 2025
a3c00a4
DRIVERS-2884 Update schema latest
prestonvasquez Apr 30, 2025
6a663eb
DRIVERS-2884 Clarify schema bump
prestonvasquez Apr 30, 2025
9500fd5
DRIVERS-2884 Add pending response state
prestonvasquez May 5, 2025
5f3726b
DRIVERS-2884 Move logging tests to csot-specific file
prestonvasquez May 5, 2025
d286786
DRIVERS-2884 Generate connection-logging-csot.json
prestonvasquez May 5, 2025
9c5b33a
DRIVERS-2884 Fix bug; add test super section
prestonvasquez May 5, 2025
7bd5c00
DRIVERS-2884 Add commandName: ping
prestonvasquez May 8, 2025
cc9ec5c
DRIVERS-2884 Clarify behavior for exhaust cursors
prestonvasquez May 8, 2025
a397306
DRIVERS-2884 Account for both pull and push i/o patterns
prestonvasquez May 9, 2025
ef75645
DRIVERS-2884 Update duration commentary
prestonvasquez May 9, 2025
b5c1202
DRIVERS-2884 Ensure all branches are tested
prestonvasquez May 9, 2025
c785d0a
DRIVERS-2884 Make Q&A read/receive agnostic
prestonvasquez May 9, 2025
642 changes: 642 additions & 0 deletions source/client-side-operations-timeout/tests/pending-response.json

Large diffs are not rendered by default.

344 changes: 344 additions & 0 deletions source/client-side-operations-timeout/tests/pending-response.yml
@@ -0,0 +1,344 @@
description: "Connection churn is prevented by reading pending responses during connection checkout"
schemaVersion: "1.24"
runOnRequirements:
- minServerVersion: "4.4"
# TODO(SERVER-96344): When using failpoints, mongos returns MaxTimeMSExpired
# after maxTimeMS, whereas mongod returns it after
# max(blockTimeMS, maxTimeMS). Until this ticket is resolved, these tests
# will not pass on sharded clusters.
topologies: ["single", "replicaset"]
createEntities:
- client:
id: &failPointClient failPointClient
useMultipleMongoses: false
- client:
id: &client client
uriOptions:
maxPoolSize: 1
useMultipleMongoses: false
observeEvents:
- commandFailedEvent
- commandSucceededEvent
- connectionCheckedOutEvent
- connectionCheckedInEvent
- connectionClosedEvent
- connectionPendingResponseSucceeded
- connectionPendingResponseStarted
- connectionPendingResponseFailed
- database:
id: &database test
client: *client
databaseName: *database
- collection:
id: &collection coll
database: *database
collectionName: *collection
initialData:
- collectionName: *collection
databaseName: *database
documents: []
tests:
# Attempting a pending response read on a non-timeout operation that can
# immediately read from the TCP buffer should complete the pending read and
# the connection should be checked out.
- description: "non-timeout op with response and no error"
operations:
# Run a ping command to pre-load the pool with a connection.
- name: runCommand
object: *database
arguments:
command:
ping: 1
commandName: ping
# Create a failpoint to block the first operation.
- name: failPoint
object: testRunner
arguments:
client: *failPointClient
failPoint:
configureFailPoint: failCommand
mode: {times: 1}
data:
failCommands: ["insert"]
blockConnection: true
blockTimeMS: 100
# Execute operation with timeout less than block time.
- name: insertOne
object: *collection
arguments:
timeoutMS: 75
document: {_id: 3, x: 1}
expectError:
isTimeoutError: true
# Execute a subsequent operation to complete the read.
- name: findOne
object: *collection
arguments:
filter: {_id: 1}
expectEvents:
- client: *client
events:
- commandSucceededEvent:
commandName: ping # Pre-loading the connection pool.
- commandFailedEvent:
commandName: insert
- commandSucceededEvent:
commandName: find
- client: *client
eventType: cmap
events:
- connectionCheckedOutEvent: {}
- connectionCheckedInEvent: {} # Ping finishes.
- connectionCheckedOutEvent: {}
- connectionCheckedInEvent: {} # Insert fails.
- connectionPendingResponseStarted: {}
- connectionPendingResponseSucceeded: {} # Find operation drains connection.
- connectionCheckedOutEvent: {}
- connectionCheckedInEvent: {} # Find succeeds.
# Attempting a pending response read on a non-timeout operation that gets no
# response from the server after 3s should close the connection.
- description: "non-timeout op with no response"
operations:
# Run a ping command to pre-load the pool with a connection.
- name: runCommand
object: *database
arguments:
command:
ping: 1
commandName: ping
# Create a failpoint to block the first operation.
- name: failPoint
object: testRunner
arguments:
client: *failPointClient
failPoint:
configureFailPoint: failCommand
mode: {times: 1}
data:
failCommands: ["insert"]
blockConnection: true
blockTimeMS: 3100
# Execute operation with timeout less than block time.
- name: insertOne
object: *collection
arguments:
timeoutMS: 50
document: {_id: 3, x: 1}
expectError:
isTimeoutError: true
# Execute a subsequent operation to complete the read.
- name: findOne
object: *collection
arguments:
filter: {_id: 1}
expectError:
isTimeoutError: true
expectEvents:
- client: *client
events:
- commandSucceededEvent:
commandName: ping # Pre-loading the connection pool.
- commandFailedEvent:
commandName: insert
# No second failed event since we timed out attempting to check out
# the connection for the second operation.
- client: *client
eventType: cmap
events:
- connectionCheckedOutEvent: {}
- connectionCheckedInEvent: {} # Ping finishes.
- connectionCheckedOutEvent: {}
- connectionCheckedInEvent: {} # Insert fails.
- connectionPendingResponseStarted: {}
- connectionPendingResponseFailed:
reason: timeout
- connectionClosedEvent:
reason: error
# Attempting a pending response read on an operation whose timeout is long
# enough to cover the remaining block time should complete the pending read
# and the connection should be checked out.
- description: "timeout op with response and no error"
operations:
# Run a ping command to pre-load the pool with a connection.
- name: runCommand
object: *database
arguments:
command:
ping: 1
commandName: ping
# Create a failpoint to block the first operation.
- name: failPoint
object: testRunner
arguments:
client: *failPointClient
failPoint:
configureFailPoint: failCommand
mode: {times: 1}
data:
failCommands: ["insert"]
blockConnection: true
blockTimeMS: 250
# Execute operation with timeout less than block time.
- name: insertOne
object: *collection
arguments:
timeoutMS: 75
document: {_id: 3, x: 1}
expectError:
isTimeoutError: true
# Execute a subsequent operation to complete the read.
- name: findOne
object: *collection
arguments:
timeoutMS: 200
filter: {_id: 1}
expectEvents:
- client: *client
events:
- commandSucceededEvent:
commandName: ping # Pre-loading the connection pool.
- commandFailedEvent:
commandName: insert
- commandSucceededEvent:
commandName: find
- client: *client
eventType: cmap
events:
- connectionCheckedOutEvent: {}
- connectionCheckedInEvent: {} # Ping finishes.
- connectionCheckedOutEvent: {}
- connectionCheckedInEvent: {} # Insert fails.
- connectionPendingResponseStarted: {}
- connectionPendingResponseSucceeded: {}
- connectionCheckedOutEvent: {}
- connectionCheckedInEvent: {} # Find succeeds.
# It may take multiple calls to the pending response handler to drain the
# inbound buffer.
- description: "multiple calls to drain buffer"
operations:
# Run a ping command to pre-load the pool with a connection.
- name: runCommand
object: *database
arguments:
command:
ping: 1
commandName: ping
# Create a failpoint that blocks the insert for longer than the combined timeouts of the first and second operations.
- name: failPoint
object: testRunner
arguments:
client: *failPointClient
failPoint:
configureFailPoint: failCommand
mode: {times: 1}
data:
failCommands: ["insert"]
blockConnection: true
blockTimeMS: 500
# Execute operation with timeout less than block time.
- name: insertOne
object: *collection
arguments:
timeoutMS: 50
document: {_id: 3, x: 1}
expectError:
isTimeoutError: true
# Execute a subsequent operation with a timeout less than the block time.
- name: findOne
object: *collection
arguments:
timeoutMS: 50
filter: {_id: 1}
expectError:
isTimeoutError: true
# Execute a final operation to drain the buffer.
- name: findOne
object: *collection
arguments:
filter: {_id: 1}
expectEvents:
- client: *client
events:
- commandSucceededEvent:
commandName: ping # Pre-loading the connection pool.
- commandFailedEvent:
commandName: insert
- commandSucceededEvent:
commandName: find
- client: *client
eventType: cmap
events:
- connectionCheckedOutEvent: {}
- connectionCheckedInEvent: {} # Ping finishes.
- connectionCheckedOutEvent: {}
- connectionCheckedInEvent: {} # Insert fails.
- connectionPendingResponseStarted: {} # First find fails
- connectionPendingResponseFailed:
reason: timeout
- connectionPendingResponseStarted: {} # Second find drains the buffer.
- connectionPendingResponseSucceeded: {}
- connectionCheckedOutEvent: {}
- connectionCheckedInEvent: {} # Second find succeeds.
# If the connection is closed server-side while draining the response, the
# driver must close the connection.
- description: "connection closed server-side while draining response"
Review comment (Member Author):

This seems to be a sufficient check that if the awaitPendingResponse function fails with a non-timeout error the connection should be closed.

operations:
# Run a ping command to pre-load the pool with a connection.
- name: runCommand
object: *database
arguments:
command:
ping: 1
commandName: ping
# Create a failpoint that blocks the insert and then closes the connection.
- name: failPoint
object: testRunner
arguments:
client: *failPointClient
failPoint:
configureFailPoint: failCommand
mode: {times: 1}
data:
failCommands: ["insert"]
blockConnection: true
blockTimeMS: 500
closeConnection: true
# Execute operation with timeout less than block time.
- name: insertOne
object: *collection
arguments:
timeoutMS: 50
document: {_id: 3, x: 1}
expectError:
isTimeoutError: true
- name: wait
object: testRunner
arguments:
ms: 500
# Execute a subsequent operation with a timeout less than the block time.
- name: findOne
object: *collection
arguments:
timeoutMS: 50
filter: {_id: 1}
expectError:
isTimeoutError: false
expectEvents:
- client: *client
events:
- commandSucceededEvent:
commandName: ping # Pre-loading the connection pool.
- commandFailedEvent:
commandName: insert
- client: *client
eventType: cmap
events:
- connectionCheckedOutEvent: {}
- connectionCheckedInEvent: {} # Ping finishes.
- connectionCheckedOutEvent: {}
- connectionCheckedInEvent: {} # Insert fails.
- connectionPendingResponseStarted: {} # Find fails.
- connectionPendingResponseFailed:
reason: error
- connectionClosedEvent:
reason: error
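
The timing in the first four tests above can be replayed with a small model. The sketch below is not the driver's or the spec's algorithm; it is a toy simulation in virtual time under two assumptions: the pending response has a 3000 ms drain window (taken from the "3s" in the test comments), and each checkout attempt may wait for the smaller of the operation's timeout and what is left of that window. The names `PendingConnection`, `drain_on_checkout`, and `PENDING_WINDOW_MS` are hypothetical, and the event strings are plain labels rather than real event objects. The server-side close in the last test is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

PENDING_WINDOW_MS = 3000  # assumption: the "3s" mentioned in the test comments


@dataclass
class PendingConnection:
    """Hypothetical stand-in for a connection with an unread server reply."""
    response_ready_at_ms: int  # virtual time at which the blocked reply arrives
    window_left_ms: int = PENDING_WINDOW_MS
    closed: bool = False


def drain_on_checkout(conn: PendingConnection, now_ms: int,
                      timeout_ms: Optional[int] = None) -> Tuple[List[str], int]:
    """Simulate one checkout attempt; return (event labels, new virtual time)."""
    events = ["connectionPendingResponseStarted"]
    # An operation without a timeout may use the whole remaining window.
    if timeout_ms is None:
        budget = conn.window_left_ms
    else:
        budget = min(timeout_ms, conn.window_left_ms)
    wait_ms = max(0, conn.response_ready_at_ms - now_ms)
    if wait_ms <= budget:
        # The reply arrives in time, so the connection can be checked out.
        events.append("connectionPendingResponseSucceeded")
        return events, now_ms + wait_ms
    # The attempt timed out; charge the time spent against the window.
    conn.window_left_ms -= budget
    events.append("connectionPendingResponseFailed(timeout)")
    if conn.window_left_ms <= 0:
        # Window exhausted: the model closes the connection.
        conn.closed = True
        events.append("connectionClosedEvent(error)")
    return events, now_ms + budget


# "multiple calls to drain buffer": blockTimeMS 500, insertOne timeoutMS 50.
conn = PendingConnection(response_ready_at_ms=500)
first, now = drain_on_checkout(conn, now_ms=50, timeout_ms=50)  # first findOne
second, now = drain_on_checkout(conn, now_ms=now)               # final findOne
print(first)
print(second)

# "non-timeout op with no response": blockTimeMS 3100, insertOne timeoutMS 50.
stuck = PendingConnection(response_ready_at_ms=3100)
stuck_events, _ = drain_on_checkout(stuck, now_ms=50)
print(stuck_events)
```

In this model the first `findOne` fails with a timeout but leaves the connection open, the final `findOne` drains it, and the 3100 ms block exhausts the window and closes the connection, which matches the order of CMAP events those tests expect.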