Memory leak as a result of no cleanup for ThreadLocal in CodedOutputStream?? #7083
Looking at this class, one of the tricky things we need to get right in order to fix it is that the thread that created the object and its thread-local variable must also be the thread that cleans the thread-local up. That is, suppose you expose a new method on the class to allow proper cleanup:
public void cleanup() {
  THREAD_LOCAL_CODED_OUTPUT_STREAM.remove();
}
In summary, thread1 has to do the cleanup for the CodedOutputStream that was created and used within thread1; other threads cannot clean up a thread-local value that lives on another thread.
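To illustrate the ownership constraint with plain JDK classes (a minimal sketch, not OpenTelemetry code, with hypothetical names): a ThreadLocal value set on a worker thread can only be removed from within that same thread, and each thread's copy is independent.

```java
public class ThreadLocalOwnership {
  static final ThreadLocal<StringBuilder> BUFFER =
      ThreadLocal.withInitial(StringBuilder::new);

  // Returns the length of the MAIN thread's value after a worker has used
  // (and cleaned up) its own copy: demonstrates per-thread isolation.
  static int mainThreadLengthAfterWorkerCleanup() throws InterruptedException {
    Thread worker = new Thread(() -> {
      BUFFER.get().append("data"); // populates the value on the worker thread
      BUFFER.remove();             // must run on the worker: removes only the worker's copy
    });
    worker.start();
    worker.join();
    return BUFFER.get().length(); // the main thread gets its own fresh value
  }
}
```

The worker's remove() has no effect on any other thread's copy, which is why a cleanup() method on CodedOutputStream only helps if it is called from the thread that did the writing.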
Looking at the code, it appears both CodedOutputStream and ProtoSerializer use thread-locals without cleanup. Therefore the area of interest is Marshaler.writeBinaryTo(). Once this is called, you have (technically) two memory leaks, because this method creates a ProtoSerializer (which sets the thread-local in that class, which has no cleanup), and that in turn creates a CodedOutputStream (which sets the thread-local in that class, which also has no cleanup). Marshaler.writeBinaryTo() is the only place where a ProtoSerializer is created, and the ProtoSerializer is the only place the CodedOutputStream is created. The simplest fix is to make use of the fact that ProtoSerializer implements AutoCloseable.
public void cleanup() {
  THREAD_LOCAL_CODED_OUTPUT_STREAM.remove();
}

@Override
public void close() throws IOException {
  try {
    output.flush();
    output.cleanup(); // NEW: fixes memory leak 1 by removing the thread-local from the thread when "done"
    idCache.clear();  // see my next github comment about removing the threadlocal in ProtoSerializer class
  } catch (IOException e) {
    // If close is called automatically as part of try-with-resources, it's possible that
    // output.flush() will throw the same exception. Re-throwing the same exception in a finally
    // block triggers an IllegalArgumentException indicating illegal self suppression. To avoid
    // this, we wrap the exception so a different instance is thrown.
    throw new IOException(e);
  }
}
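A minimal, self-contained sketch of the pattern (hypothetical Serializer class, not the actual ProtoSerializer): because the class is AutoCloseable, try-with-resources calls close() on the same thread that did the writing, which is exactly the thread whose thread-local needs removing.

```java
public class Serializer implements AutoCloseable {
  private static final ThreadLocal<byte[]> THREAD_LOCAL_BUFFER = new ThreadLocal<>();

  void write(String data) {
    byte[] buf = THREAD_LOCAL_BUFFER.get();
    if (buf == null) {
      buf = new byte[4096];          // lazily allocate one buffer per thread
      THREAD_LOCAL_BUFFER.set(buf);
    }
    // ... encode data into buf ...
  }

  static boolean hasThreadLocalValue() { // probe: is a value present on this thread?
    return THREAD_LOCAL_BUFFER.get() != null;
  }

  @Override
  public void close() {
    THREAD_LOCAL_BUFFER.remove(); // cleanup runs on the same thread that called write()
  }
}
```

Used as `try (Serializer s = new Serializer()) { s.write(...); }`, the thread-local is removed even if write() throws, because close() is guaranteed to run.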
Here's a way to replace the THREAD_LOCAL_ID_CACHE in ProtoSerializer: use a "global" ConcurrentHashMap to store id conversions. The main advantage is that there is no per-thread state at all, so no thread-local cleanup is ever needed:
import java.util.concurrent.ConcurrentHashMap;

private static final int CACHE_MAX_SIZE = 10_000; // Adjust as needed
private static final ConcurrentHashMap<String, byte[]> GLOBAL_ID_CACHE = new ConcurrentHashMap<>();

private static byte[] getCachedOrCompute(String id, int length) {
  if (GLOBAL_ID_CACHE.size() > CACHE_MAX_SIZE) {
    GLOBAL_ID_CACHE.clear(); // crude eviction: wipe the whole cache when it grows too large
  }
  return GLOBAL_ID_CACHE.computeIfAbsent(id, key -> OtelEncodingUtils.bytesFromBase16(key, length));
}
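A self-contained sketch of the same caching pattern, with a stand-in decode function replacing OtelEncodingUtils.bytesFromBase16 (which is internal to OpenTelemetry): computeIfAbsent runs the mapping function at most once per key, so a repeated id is decoded only once no matter how many times it is looked up.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class IdCache {
  static final int CACHE_MAX_SIZE = 10_000;
  static final ConcurrentHashMap<String, byte[]> CACHE = new ConcurrentHashMap<>();
  static final AtomicInteger computes = new AtomicInteger(); // counts actual decodes

  static byte[] getCachedOrCompute(String id) {
    if (CACHE.size() > CACHE_MAX_SIZE) {
      CACHE.clear(); // crude eviction, as in the sketch above
    }
    return CACHE.computeIfAbsent(id, key -> {
      computes.incrementAndGet();
      return key.getBytes(); // stand-in for OtelEncodingUtils.bytesFromBase16(key, length)
    });
  }
}
```

One trade-off versus the thread-local design: the map is shared, so hot ids are decoded once globally instead of once per thread, at the cost of some contention on the shared map.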
Yes, I think this is indeed a (small) memory leak. OTLP exporters initialize with a thread pool, which is shut down upon close, but the thread-local variables are not cleaned up. I'm curious how you came across this, though. The number of threads is small, and the memory allocated to this thread-local is small. I would think you would need to start and shut down many SDK exporter instances before this problem becomes noticeable. Is this the case? (BTW, I do think we should fix this. Just curious how you managed to find it 🙂)
I think it's fairly common to use ThreadLocal for this kind of optimization; I suspect the issue here is that the object stored in the ThreadLocal is not a base […]
Yes... so, the TL;DR is that the specific application I work on is a Spring-based Java web application, and the configuration for the web application becomes available as a Spring bean early in the Spring life cycle. That configuration mentions things like "ProductName", which is necessary in order to create the underlying OpenTelemetry Resource object. This is somewhat limiting for observability, because it means you can't observe any execution of statically defined code, nor observe Spring activities prior to the configuration bean becoming available. So, what I'm doing is creating a "fake" resource and OpenTelemetry object, using that to observe startup activities, emitting the observed data into an in-memory "catcher", and then, when the configuration bean becomes available, rebuilding the OpenTelemetry object with the new product details and exporting the data that was caught during startup. I am "rebuilding" the OpenTelemetry object because it's immutable, so to achieve what I want it seems I need to create a new OpenTelemetry object and shut down the old one. So, at this point, yes, I am destroying three-ish "real" exporters (that export to an in-memory data structure) and creating three new ones (that export to a real external location).
Even if you don't do what I do above (i.e. you only create the exporters once for the entire life-cycle of the application), you may still be at risk of the memory leak if the thread that executes the java code is not owned by the application. For example, if the java code is executed by a thread owned by the long-running Tomcat application server, then it creates a threadlocal on the application server's thread, and you've created the memory leak because that's a thread that lives beyond the lifecycle of the application. If the code is executed by an ExecutorService created by the application (with its own threads), then I think if you issue a proper shutdown call then you won't get a leak because the shutdown call also destroys the threads. I think. |
It sounds like the question to ask is: "Is the thread pool we are creating re-using existing threads, or is it creating its own brand-new threads?" If it's the first case, then you get a leak if you don't clean up the thread-locals. If you're creating new threads, then all you need to do is shut down the thread pool properly.
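The two cases can be sketched with plain JDK classes (hypothetical names, not SDK code): on an executor the application owns, shutting the pool down lets its threads die, and their thread-local maps become collectable with them; on a borrowed (container-owned) thread, only an explicit remove() helps, because the thread outlives the application.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolShutdown {
  static final ThreadLocal<byte[]> SCRATCH = ThreadLocal.withInitial(() -> new byte[1024]);

  // Case 1: a pool the application owns. Its threads die at shutdown, so
  // their ThreadLocal maps are reclaimed automatically; no remove() needed.
  static boolean useOwnedPool() throws InterruptedException {
    ExecutorService owned = Executors.newFixedThreadPool(2);
    owned.submit(() -> { SCRATCH.get()[0] = 1; }); // thread-local lands on a pool thread
    owned.shutdown();
    return owned.awaitTermination(5, TimeUnit.SECONDS); // threads stop; values go with them
  }

  // Case 2: a borrowed thread (the caller's thread, standing in for a Tomcat
  // worker). It outlives the "application", so cleanup must be explicit.
  static void useBorrowedThread() {
    SCRATCH.get()[0] = 1;
    SCRATCH.remove(); // without this, the value pins application classes for the thread's lifetime
  }
}
```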
After a decent amount of work I was able to reproduce this. ThreadLocals are released when threads stop, and so as long as all the threads that use CodedOutputStream are stopped, then the ThreadLocals are released and GC'd. The OTLP exporters all create and manage their own thread pools and clean them up upon shutdown, so what gives? It turns out that BatchSpanProcessor is the problem! There's a race condition where BatchSpanProcessor#shutdown()'s CompletableResultCode returns and resolves before its worker thread is shut down and (possibly?) before its exporter shutdown resolves. When I fixed this (in a scrappy way) locally, the error went away. Will work on getting a proper fix out. |
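A stdlib-only sketch of the fix's shape (CompletableFuture standing in for CompletableResultCode; this is not the actual BatchSpanProcessor code): the shutdown result should resolve only after the worker thread has been joined, rather than racing with it.

```java
import java.util.concurrent.CompletableFuture;

public class ProcessorShutdown {
  final Thread worker;
  private volatile boolean running = true;

  ProcessorShutdown() {
    worker = new Thread(() -> {
      while (running) {
        // ... drain and export batches ...
        Thread.onSpinWait();
      }
    });
    worker.start();
  }

  // Completes only after the worker has actually stopped; analogous to making
  // BatchSpanProcessor#shutdown()'s result resolve after the worker thread joins.
  CompletableFuture<Void> shutdown() {
    return CompletableFuture.runAsync(() -> {
      running = false;
      try {
        worker.join(); // wait for the worker to die before resolving
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
  }
}
```

With this ordering, by the time the shutdown result resolves, the worker thread (and any thread-local state it held) is gone, closing the race described above.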
I am indeed using the batch span processor. When you commit the changes to your main branch, can you comment here? I'll see if patching my local jars with those changes will solve the problem for me too.
Your analysis sounds right. If you shut down the top-level thing that spawned the threads (ExecutorService, thread pool, etc.), and that thing isn't re-using existing threads (such as from an application server like Tomcat), then it is technically okay for those threads to have used thread-locals. Shutting down the top-level thing lets the threads it spawned stop, which allows GC of their thread-local storage, and thus of the references to the class loader that loaded your application classes, and so on; everything can be GC'd. But if you forget to shut down the top-level constructs (ExecutorService, thread pools) AND the spawned threads DID use thread-local storage, then you're hit with a bigger problem: every web-application restart will produce a class-loader memory leak preventing a lot from being GC'd. A single reference to any application class in thread-local storage will prevent the GC of everything loaded by that class loader. In other words, thread-locals stab you in the back in this case because they take something very possible (forgetting to explicitly shut down an ExecutorService on application restart) and turn it into a real problem (a memory leak).
(I've written a long block below of something I thought was causing a leak, but now I am not entirely sure it was a cause for a leak.) I am beginning to suspect my specific memory leak has to do with the following, but it surfaces an important issue. When I construct an OpenTelemetry object, I am injecting an exporter, but in addition I am asking that exporter to do an export call with the OpenTelemetry data caught during startup. I create the OpenTelemetry exporters (and the overall OpenTelemetry object) in an application thread, and on the same application thread I ask the exporter(s) to export the data that was caught during startup. This is the key detail: because I am invoking a method on your exporter from an application thread, your exporter creates a thread-local on my application thread, which pollutes my application's thread. The reason the exporter works fine in your BatchSpanProcessor is that all of its usage is contained in a thread you created, which is eventually GC'd. However, if any developer (in their application thread) invokes a method on your classes that uses underlying thread-locals, you will end up putting thread-locals on their threads (unless you internally invoke that logic on a new thread you create). Although it is dirty, one workaround I have in mind (trying it now) is to make my direct invocation of export (which happens approximately once in an application lifecycle) run on a new ExecutorService I create and immediately shut down, so that I do not directly invoke methods on your classes that use thread-local storage from an application thread that might be owned by Tomcat/WebSphere/etc.
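The workaround described above can be sketched like this (hypothetical Exporter interface and names; a one-shot, application-owned executor keeps the exporter's thread-locals off container-owned threads):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class OneShotExport {
  interface Exporter {          // stand-in for the SDK exporter interface
    void export(Object batch);  // may touch ThreadLocal state internally
  }

  // Run the one-time export on a short-lived thread we own, then tear it down.
  // Any ThreadLocal the exporter sets lands on our throwaway thread, not on a
  // container-owned (Tomcat/WebSphere) thread.
  static void exportOnPrivateThread(Exporter exporter, Object caughtData)
      throws InterruptedException {
    ExecutorService oneShot = Executors.newSingleThreadExecutor();
    try {
      oneShot.submit(() -> exporter.export(caughtData));
    } finally {
      oneShot.shutdown();
      oneShot.awaitTermination(10, TimeUnit.SECONDS); // the thread dies, and its thread-locals with it
    }
  }
}
```

This trades one extra short-lived thread for the guarantee that no thread-local state outlives the export call.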
In summary, I think this hints that any classes you have that use thread-local storage should remain as hidden as possible from external users, so that you can ensure the methods creating thread-locals are invoked only on threads you create; or you refactor to incorporate cleanup of the thread-locals (so it is always safe to invoke those methods from any thread). I'm spitballing out loud though.
Random follow-up: after making some changes above, I get fewer class-loader leak warnings from Tomcat at shutdown, but I still get one leak. If I take a heap dump right at application shutdown, it appears the thread-local is held on BatchSpanProcessor_WorkerThread-1. I am fairly certain I am shutting it down via the top-level OpenTelemetry shutdown call. Will look more closely still.
Discussed in #7082
Originally posted by asclark109 February 7, 2025
I am using the io.opentelemetry:opentelemetry-exporter-common:1.38.0 jar in my Java web application project running on Tomcat 10. I am getting memory leaks at application shutdown (one is an io.netty.util.internal.InternalThreadLocalMap that is tracked in Netty). The other appears below.
I have looked at your class in release 1.38.0 and on main: CodedOutputStream.java
I notice that a ThreadLocal is created and updated but never cleaned up (i.e. there is no call to
THREAD_LOCAL_CODED_OUTPUT_STREAM.remove()
).
opentelemetry-java/exporters/common/src/main/java/io/opentelemetry/exporter/internal/marshal/CodedOutputStream.java
Lines 85 to 104 in 30d16eb
If someone can offer help to get around this (or patch a fix), it would be appreciated. Thanks.
The only discussion page I could find on ThreadLocals: #6584