11622 : OutlierDetection should use Ticker, not TimeProvider #12110
base: master
ec44974
8f513b7
debbd7a
3735a6c
c831f1e
b8c0e0c
aaf5016
@@ -22,6 +22,7 @@
 import static java.util.concurrent.TimeUnit.NANOSECONDS;
 
 import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Ticker;
 import com.google.common.collect.ForwardingMap;
 import com.google.common.collect.ImmutableList;
 import com.google.common.collect.ImmutableSet;
@@ -39,7 +40,6 @@
 import io.grpc.Status;
 import io.grpc.SynchronizationContext;
 import io.grpc.SynchronizationContext.ScheduledHandle;
-import io.grpc.internal.TimeProvider;
 import java.net.SocketAddress;
 import java.util.ArrayList;
 import java.util.Collection;
@@ -82,7 +82,7 @@ public final class OutlierDetectionLoadBalancer extends LoadBalancer {
 private final SynchronizationContext syncContext;
 private final Helper childHelper;
 private final GracefulSwitchLoadBalancer switchLb;
-private TimeProvider timeProvider;
+private Ticker ticker;
 private final ScheduledExecutorService timeService;
 private ScheduledHandle detectionTimerHandle;
 private Long detectionTimerStartNanos;
@@ -95,14 +95,14 @@ public final class OutlierDetectionLoadBalancer extends LoadBalancer {
 /**
 * Creates a new instance of {@link OutlierDetectionLoadBalancer}.
 */
-public OutlierDetectionLoadBalancer(Helper helper, TimeProvider timeProvider) {
+public OutlierDetectionLoadBalancer(Helper helper, Ticker ticker) {
 logger = helper.getChannelLogger();
 childHelper = new ChildHelper(checkNotNull(helper, "helper"));
 switchLb = new GracefulSwitchLoadBalancer(childHelper);
 endpointTrackerMap = new EndpointTrackerMap();
 this.syncContext = checkNotNull(helper.getSynchronizationContext(), "syncContext");
 this.timeService = checkNotNull(helper.getScheduledExecutorService(), "timeService");
-this.timeProvider = timeProvider;
+this.ticker = ticker;
 logger.log(ChannelLogLevel.DEBUG, "OutlierDetection lb created.");
 }
 
@@ -151,7 +151,7 @@ public Status acceptResolvedAddresses(ResolvedAddresses resolvedAddresses) {
 // If a timer has started earlier we cancel it and use the difference between the start
 // time and now as the interval.
 initialDelayNanos = Math.max(0L,
-config.intervalNanos - (timeProvider.currentTimeNanos() - detectionTimerStartNanos));
+config.intervalNanos - (ticker.read() - detectionTimerStartNanos));
 }
 
 // If a timer has been previously created we need to cancel it and reset all the call counters
@@ -201,7 +201,7 @@ class DetectionTimer implements Runnable {
 
 @Override
 public void run() {
-detectionTimerStartNanos = timeProvider.currentTimeNanos();
+detectionTimerStartNanos = ticker.read();
 
 endpointTrackerMap.swapCounters();
 
@@ -638,7 +638,7 @@ public boolean maxEjectionTimeElapsed(long currentTimeNanos) {
 config.baseEjectionTimeNanos * ejectionTimeMultiplier,
 maxEjectionDurationSecs);
 
-return currentTimeNanos > maxEjectionTimeNanos;
+return currentTimeNanos - maxEjectionTimeNanos > 0;
No, this still doesn't solve the complete problem. As pointed out before, you just need to follow the variable in which you store ticker.read(), and you will find where it is used incorrectly. Oh, I see you've reached this point; it's just two lines above where you need to look. There could be more places, I don't know; you need to check.

I haven't really looked at all the changes yet, but this does have the correct shape and is the case I had noticed. @shivaspeaks, what's happening here is that the subtraction may overflow or underflow in order to produce the correct result, because nanoTime itself is allowed to overflow during the execution of the process. The difference will still be accurate. Doing the subtraction essentially "removes" nanoTime from the result ((nanoTime + A) - (nanoTime + B) = A - B), independent of any overflow or underflow, as long as A - B itself isn't too large or too small to fit in a long.

Yes, but that subtraction needs to be done in one expression. Storing the intermediate in a variable can itself overflow; that's what I was referring to by algebraic manipulation. We are just trying to avoid any addition whose result is stored in a long (which can potentially overflow or underflow). We can do something like this:
Both work fine. The first one will also work since it's in one expression.

Oh, the second one will not work; we should avoid it. If we go the second way, we make the same mistake as the original.

Okay, let me put it this way. There are two overlapping problems here.
I overlooked the first one. I had it at first, but this time while reviewing I completely overlooked what the javadoc says. So yes, Eric is right about it, and this change does solve that. But should we also try to solve the second concern here? @ejona86

Storing it in a variable has no impact on the result, as long as the variable is not smaller than the original operands. Floating point can sometimes get messy with the order of operations, but integer addition and subtraction are very safe in Java. Try it out:
Signed addition/subtraction in two's complement is identical to unsigned addition/subtraction. So no data is actually lost in the above; some of the result is just encoded in the sign bit. The same holds even when the high bits are truncated: if you reverse the operation, no information is lost.
The … (Note that C++ allows unsigned integer overflow/underflow but considers signed integer overflow/underflow undefined behavior, so in other languages you might have to be careful. The process is fine with all this, but there can be per-language details.) You may be somewhat confused by Java auto-promoting types to int (or long, or double). Those rules are confusing, but they aren't relevant in this case because the intermediate results are …
The first shows …
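A minimal sketch of the "Try it out" point above, using only plain Java `long` arithmetic; the class and method names are hypothetical and not from the PR. It compares computing (nanoTime + A) - (nanoTime + B) in one expression against storing each possibly-wrapped sum in a `long` variable first, with the base reading placed just below `Long.MAX_VALUE`.

```java
/**
 * Shows that Java long addition/subtraction wraps in two's complement, so storing
 * an overflowed intermediate in a long loses nothing needed to recover A - B.
 * Names are illustrative only.
 */
final class WrapAroundDemo {

  /** Difference computed in a single expression. */
  static long diffInline(long base, long a, long b) {
    return (base + a) - (base + b);
  }

  /** Same difference, but each intermediate sum is stored in a long first. */
  static long diffViaVariables(long base, long a, long b) {
    long first = base + a;  // may overflow and wrap to a negative value
    long second = base + b; // may overflow and wrap as well
    return first - second;  // wraps back to exactly A - B
  }

  public static void main(String[] args) {
    long base = Long.MAX_VALUE - 3; // a "nanoTime" reading just below the wrap point
    long a = 10;
    long b = 2;
    System.out.println(diffInline(base, a, b));       // 8
    System.out.println(diffViaVariables(base, a, b)); // 8
    System.out.println(base + a < 0);                 // true: the sum itself wrapped
  }
}
```

Both methods agree because the JLS (section 15.18.2) defines integer overflow as keeping the low-order bits of the two's-complement result, which is exactly what makes the later subtraction recover A - B.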
 }
 
 /** Tracks both successful and failed call counts. */
So you need to do a full audit to check whether it is used somewhere in the inner methods in a way the javadoc warns against. The comparison should always be done by subtraction.
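A minimal illustration of the rule the reviewer cites, which the `System.nanoTime()` javadoc states as "use `t1 - t0 < 0`, not `t1 < t0`" because of possible numerical overflow. The helper names below are hypothetical; the two readings are chosen to straddle `Long.MAX_VALUE` so the raw comparison gives the wrong answer.

```java
/**
 * Compares two readings from a monotonic nanosecond source (System.nanoTime() or a
 * Ticker) by subtracting them, as the System.nanoTime() javadoc recommends.
 */
final class NanoTimeCompare {

  /** Wrong across wraparound: compares the raw readings directly. */
  static boolean isAfterNaive(long t1, long t0) {
    return t1 > t0;
  }

  /** Correct as long as the readings are less than about 292 years (2^63 ns) apart. */
  static boolean isAfter(long t1, long t0) {
    return t1 - t0 > 0;
  }

  public static void main(String[] args) {
    long t0 = Long.MAX_VALUE - 5; // reading taken just before the counter wraps
    long t1 = t0 + 10;            // reading 10 ns later; wraps to a negative value
    System.out.println(isAfterNaive(t1, t0)); // false (wrong)
    System.out.println(isAfter(t1, t0));      // true
  }
}
```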
I generated a report using grep and verified it with an IDE search for ticker.read(), but I did not find any place where ticker.read() values are compared directly instead of by subtraction. We use ticker.read() frequently in JUnit tests, but I haven't seen it used in an unwanted way there.
ticker.read() also appears in the implementation classes below, and I haven't seen it used in an unwanted way there either. Please find the attached Audit_TR.txt for your reference.
Aduit_ticker_read.txt
AbstractNettyHandler.java
NettyServerHandler.java
CachingRlsLbClient.java
LinkedHashLruCache.java
AdaptiveThrottler.java
PingTracker.java
OutlierDetectionLoadBalancer.java
No need to audit all the files. This single file has such a bug. I'd outright tell you, but then I have to audit it for any other missed cases, because clearly you aren't finding them, and that is literally the only interesting part of this PR.
Thanks for the review and the guidance. I did a careful audit of the OutlierDetectionLoadBalancer method calls and noticed that ticker.read() values are used by two APIs.
We have already discussed the API below, and the existing code is fine: it already uses subtraction to find the time difference instead of comparing with > or < directly, as the System.nanoTime() documentation intends.
initialDelayNanos = Math.max(0L,
config.intervalNanos - (ticker.read() - detectionTimerStartNanos));
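A sketch of the quoted `initialDelayNanos` computation with an injected, manually advanced clock. `Ticker` below is a minimal stand-in that mirrors the `read()` method of Guava's `com.google.common.base.Ticker` so the example compiles without Guava; `ManualTicker`, `DetectionDelaySketch`, and the method signature are hypothetical. The start reading is placed just below `Long.MAX_VALUE` to show that the subtraction stays correct when the counter wraps.

```java
/** Remaining-delay computation driven by a fake clock (illustrative names only). */
final class DetectionDelaySketch {

  /** Stand-in for Guava's Ticker: nanoseconds from an arbitrary, fixed origin. */
  abstract static class Ticker {
    abstract long read();
  }

  /** Hypothetical fake ticker: time only moves when advance() is called. */
  static final class ManualTicker extends Ticker {
    private long nanos;

    ManualTicker(long startNanos) {
      this.nanos = startNanos;
    }

    void advance(long deltaNanos) {
      nanos += deltaNanos; // may wrap; readings are only ever subtracted
    }

    @Override
    long read() {
      return nanos;
    }
  }

  /** Time left before the next detection run, clamped at zero. */
  static long initialDelayNanos(Ticker ticker, long intervalNanos, long detectionTimerStartNanos) {
    return Math.max(0L, intervalNanos - (ticker.read() - detectionTimerStartNanos));
  }

  public static void main(String[] args) {
    ManualTicker ticker = new ManualTicker(Long.MAX_VALUE - 1_000); // near the wrap point
    long start = ticker.read();
    long interval = 10_000;
    ticker.advance(4_000); // crosses Long.MAX_VALUE
    System.out.println(initialDelayNanos(ticker, interval, start)); // 6000
    ticker.advance(20_000);
    System.out.println(initialDelayNanos(ticker, interval, start)); // 0
  }
}
```

Accepting the clock as a constructor argument is what makes a test like this possible; Guava's testlib also provides a similar `com.google.common.testing.FakeTicker`.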
I believe the bug our long discussion was pointing at is in this API. In run(), the maybeUnejectOutliers method is invoked with detectionTimerStartNanos, which is assigned from ticker.read(). That value is then checked in maxEjectionTimeElapsed, which used the expression currentTimeNanos > maxEjectionTimeNanos to return whether currentTimeNanos is after maxEjectionTimeNanos.
endpointTrackerMap.maybeUnejectOutliers(detectionTimerStartNanos);
grpc-java/util/src/main/java/io/grpc/util/OutlierDetectionLoadBalancer.java
Line 212 in b8c0e0c
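A hypothetical model of the `maxEjectionTimeElapsed` change described above. It assumes the deadline is computed as the ejection-time reading plus the capped ejection duration (the addition itself is not shown in the diff), so the sum can wrap past `Long.MAX_VALUE`. The class and method names are illustrative, not the grpc-java code; only the two comparisons mirror the before and after lines of the diff.

```java
/** Before/after comparison for an ejection deadline that wraps (illustrative only). */
final class EjectionTimeSketch {

  /** Assumed deadline: ejection time plus capped duration. This sum may wrap. */
  static long maxEjectionTimeNanos(long ejectionTimeNanos, long ejectionDurationNanos) {
    return ejectionTimeNanos + ejectionDurationNanos;
  }

  /** Before the change: raw comparison, wrong once the deadline has wrapped. */
  static boolean elapsedBefore(long currentTimeNanos, long maxEjectionTimeNanos) {
    return currentTimeNanos > maxEjectionTimeNanos;
  }

  /** After the change: subtract first, then compare the difference with zero. */
  static boolean elapsedAfter(long currentTimeNanos, long maxEjectionTimeNanos) {
    return currentTimeNanos - maxEjectionTimeNanos > 0;
  }

  public static void main(String[] args) {
    long ejectedAt = Long.MAX_VALUE - 50;                 // ticker reading at ejection
    long deadline = maxEjectionTimeNanos(ejectedAt, 100); // wraps to a negative value
    long now = ejectedAt + 10;                            // 10 ns later, still before the deadline
    System.out.println(elapsedBefore(now, deadline)); // true  (wrong: ejection would end early)
    System.out.println(elapsedAfter(now, deadline));  // false (correct)
  }
}
```

With the raw comparison, an endpoint ejected just before the counter wraps would be un-ejected on the very next detection run; the subtraction form keeps it ejected until the full duration has passed.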
I have addressed this issue in the latest commit. Please review it and let me know if I have missed any other bugs in OutlierDetectionLoadBalancer.