
Memory leak as a result of no cleanup for ThreadLocal in CodedOutputStream?? #7083


Open
asclark109 opened this issue Feb 7, 2025 · 14 comments


@asclark109

asclark109 commented Feb 7, 2025

Discussed in #7082

Originally posted by asclark109 February 7, 2025
I am using the io.opentelemetry:opentelemetry-exporter-common:1.38.0 jar in my java web application project running on Tomcat 10. I am getting memory leaks at application shutdown (one is a io.netty.util.internal.InternalThreadLocalMap that is tracked in Netty). The other appears below.

07-Feb-2025 12:36:08.735 SEVERE [main] org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks The web application [agtest-trunk] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@658c5a19]) and a value of type [io.opentelemetry.exporter.internal.marshal.CodedOutputStream.OutputStreamEncoder] (value [io.opentelemetry.exporter.internal.marshal.CodedOutputStream$OutputStreamEncoder@421e361]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.

I have looked at your class in release 1.38.0 and on main: CodedOutputStream.java

I notice that a ThreadLocal is created and updated but never cleaned up (i.e. there is no call to THREAD_LOCAL_CODED_OUTPUT_STREAM.remove()).

private static final ThreadLocal<OutputStreamEncoder> THREAD_LOCAL_CODED_OUTPUT_STREAM =
    new ThreadLocal<>();

/**
 * Create a new {@code CodedOutputStream} wrapping the given {@code OutputStream}.
 *
 * <p>NOTE: The provided {@link OutputStream} <strong>MUST NOT</strong> retain access or modify
 * the provided byte arrays. Doing so may result in corrupted data, which would be difficult to
 * debug.
 */
static CodedOutputStream newInstance(final OutputStream output) {
  OutputStreamEncoder cos = THREAD_LOCAL_CODED_OUTPUT_STREAM.get();
  if (cos == null) {
    cos = new OutputStreamEncoder(output);
    THREAD_LOCAL_CODED_OUTPUT_STREAM.set(cos);
  } else {
    cos.reset(output);
  }
  return cos;
}

If someone can offer help to get around this (or patch a fix), it would be appreciated. Thanks.

The only discussion I could find on ThreadLocals is #6584.

@asclark109
Author

asclark109 commented Feb 7, 2025

Looking at this class, one of the tricky things we need to ensure in order to fix it is that the thread that created the object and its ThreadLocal variable must also be the thread that cleans up that ThreadLocal variable.

That is, suppose you create and expose a new method on the class to allow proper cleanup:

public void cleanup()
{
  THREAD_LOCAL_CODED_OUTPUT_STREAM.remove();
}

If thread1 creates and uses a CodedOutputStream cos, it creates a ThreadLocal entry on thread1. If thread2 then comes along and invokes cos.cleanup(), this will NOT actually clean up the ThreadLocal entry on thread1, because that entry only exists on thread1--not thread2. That's important to point out, I think.

In summary, thread1 has to do the cleanup for the CodedOutputStream that was created and used within thread1; other threads cannot clean up ThreadLocal entries created on another thread.
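
To make that concrete, here's a minimal, self-contained sketch with a plain java.lang.ThreadLocal (nothing OpenTelemetry-specific; the single-thread pool stands in for "thread1" and the main thread for "thread2"):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalCleanupDemo {
  private static final ThreadLocal<Object> TL = new ThreadLocal<>();

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newSingleThreadExecutor();

    // "thread1" (the pool thread) sets the value; the entry lives in ITS ThreadLocalMap.
    pool.submit(() -> TL.set(new Object())).get();

    // "thread2" (the main thread) calling remove() only touches the main thread's map;
    // the pool thread's entry is untouched and lingers until that thread is stopped.
    TL.remove();

    // Only cleanup performed on the owning thread (or stopping that thread) actually helps.
    pool.submit(TL::remove).get();
    pool.shutdown();
  }
}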

@asclark109
Author

asclark109 commented Feb 10, 2025

Looking at the code, it appears both CodedOutputStream and ProtoSerializer use ThreadLocals without cleanup. Therefore the area of interest is:

Marshaler.writeBinaryTo(). Once this is called, you technically have two memory leaks, because this method creates a ProtoSerializer (which sets the ThreadLocal in that class, which has no cleanup), and that in turn creates a CodedOutputStream (which sets the ThreadLocal in that class, which also has no cleanup).

Marshaler.writeBinaryTo() is the only place where a ProtoSerializer is created, and ProtoSerializer is the only place where a CodedOutputStream is created.

The simplest fix is to make use of the fact that ProtoSerializer implements AutoCloseable.

  1. Add the method CodedOutputStream.cleanup(), which does:
public void cleanup()
{
  THREAD_LOCAL_CODED_OUTPUT_STREAM.remove();
}
  2. In ProtoSerializer.close(), invoke output.cleanup(). This will fix the memory leak in CodedOutputStream. If you also want to remove the leak in ProtoSerializer (which I saw via inspection but have not detected in practice yet), you also invoke idCache.remove(). This effectively removes the caching mechanism until you change the implementation to not use ThreadLocals (i.e. a different kind of caching) or to use ThreadLocals very carefully.
  @Override
  public void close() throws IOException {
    try {
      output.flush();
      output.cleanup(); // NEW: fixes memory leak 1 by removing threadlocal from thread when "done"
      idCache.clear(); // see my next github comment about removing the threadlocal in ProtoSerializer class
    } catch (IOException e) {
      // If close is called automatically as part of try-with-resources, it's possible that
      // output.flush() will throw the same exception. Re-throwing the same exception in a finally
      // block triggers an IllegalArgumentException indicating illegal self suppression. To avoid
      // this, we wrap the exception so a different instance is thrown.
      throw new IOException(e);
    }
  }
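
For what it's worth, the reason this placement works: writeBinaryTo() uses try-with-resources (per the comment in close() above), so close() runs on the same thread that performed the serialization and populated the ThreadLocals. A rough, simplified sketch of that call path (not the exact library source):

public final void writeBinaryTo(OutputStream output) throws IOException {
  try (ProtoSerializer serializer = new ProtoSerializer(output)) {
    writeTo(serializer);
  }
  // close() runs here, on the serializing thread, so the cleanup() / remove() calls
  // above remove the ThreadLocal entries from the thread that actually owns them.
}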

@asclark109
Author

asclark109 commented Feb 10, 2025

Here's a way to replace the THREAD_LOCAL_ID_CACHE in ProtoSerializer. Use a "global" ConcurrentHashMap to store id conversions. Pros:

  1. no ThreadLocals (avoids class loader memory leaks), therefore does not pollute threads
  2. works well in multi-threaded environments
  3. has an eviction policy to prevent unbounded growth
import java.util.concurrent.ConcurrentHashMap;

private static final int CACHE_MAX_SIZE = 10_000;  // Adjust as needed
private static final ConcurrentHashMap<String, byte[]> GLOBAL_ID_CACHE = new ConcurrentHashMap<>();

private static byte[] getCachedOrCompute(String id, int length) {
    if (GLOBAL_ID_CACHE.size() > CACHE_MAX_SIZE) {
        GLOBAL_ID_CACHE.clear();
    }

    return GLOBAL_ID_CACHE.computeIfAbsent(id, key -> 
        OtelEncodingUtils.bytesFromBase16(key, length)
    );
}
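
As a hypothetical alternative to the clear-everything eviction above, a small synchronized LRU would evict one entry at a time instead of dropping the whole cache when the threshold is crossed (a sketch only, same CACHE_MAX_SIZE idea; not something the SDK currently does):

import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

private static final int CACHE_MAX_SIZE = 10_000;
// Access-ordered LinkedHashMap evicts the least-recently-used entry once the size
// limit is exceeded; synchronizedMap makes it safe for concurrent exporter threads.
private static final Map<String, byte[]> GLOBAL_ID_CACHE =
    Collections.synchronizedMap(
        new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
          @Override
          protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
            return size() > CACHE_MAX_SIZE;
          }
        });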

@jack-berg
Member

> I am getting memory leaks at application shutdown (one is an io.netty.util.internal.InternalThreadLocalMap that is tracked in Netty). The other appears below.

Yes, I think this is indeed a (small) memory leak. OTLP exporters initialize with a thread pool, which is shut down upon close. But the thread local variables are not cleaned up. But I'm curious how you came across this? The number of threads is small and the memory allocated to this thread local is small. I would think you would need to start / shut down many SDK exporter instances before this problem becomes noticeable. Is this the case?

(BTW, I do think we should fix this. Just curious how you managed to find it 🙂)

@trask
Member

trask commented Feb 18, 2025

I think it's fairly common to use ThreadLocal for this kind of optimization. I suspect the issue here is that the object stored in the ThreadLocal is not a base java.* class (and so not in the bootstrap class loader), and so it prevents unloading of the class loader that our library is in.

@asclark109
Author

asclark109 commented Feb 18, 2025

> I am getting memory leaks at application shutdown (one is an io.netty.util.internal.InternalThreadLocalMap that is tracked in Netty). The other appears below.

> Yes, I think this is indeed a (small) memory leak. OTLP exporters initialize with a thread pool, which is shut down upon close. But the thread local variables are not cleaned up. But I'm curious how you came across this? The number of threads is small and the memory allocated to this thread local is small. I would think you would need to start / shut down many SDK exporter instances before this problem becomes noticeable. Is this the case?

> (BTW, I do think we should fix this. Just curious how you managed to find it 🙂)

Yes...so, the TL;DR is that the specific application I work on is a Spring-based Java web application, and the configuration for the web application becomes available as a Spring bean early in the Spring lifecycle. That configuration mentions things like "ProductName", which is necessary in order to create the underlying OpenTelemetry Resource object. This is somewhat limiting for observability because it means you can't observe any execution of statically defined code, nor observe Spring activity prior to the configuration bean becoming available.

So, what I'm doing is creating a "fake" Resource and OpenTelemetry object, using that to observe startup activities, emitting the observed data into an in-memory "catcher", and then, when the configuration bean becomes available, rebuilding the OpenTelemetry object with the real product details and exporting the data that was caught during startup. I am "rebuilding" the OpenTelemetry object because it's immutable, so to achieve what I want it seems I need to create a new OpenTelemetry object and shut down the old one.

So, at this point, yes, I am destroying 3-ish "real" exporters (that export to an in-memory data structure) and creating 3 new ones (that export to a real external location).

@asclark109
Author

asclark109 commented Feb 18, 2025

Even if you don't do what I do above (i.e. you only create the exporters once for the entire lifecycle of the application), you may still be at risk of the memory leak if the thread that executes the Java code is not owned by the application.

For example, if the Java code is executed by a thread owned by the long-running Tomcat application server, then it creates a ThreadLocal entry on the application server's thread, and you've created the memory leak, because that's a thread that lives beyond the lifecycle of the application. If the code is executed by an ExecutorService created by the application (with its own threads), then I think that if you issue a proper shutdown call you won't get a leak, because the shutdown call also destroys the threads.
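
Here's a small, self-contained sketch of that distinction with plain JDK classes (nothing OpenTelemetry-specific): ThreadLocals set on threads the application owns die with those threads when the pool is shut down, while the container-owned-thread case is exactly what triggers Tomcat's warning.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class OwnedThreadsDemo {
  private static final ThreadLocal<byte[]> TL = new ThreadLocal<>();

  public static void main(String[] args) throws Exception {
    // Case 1: threads owned by the application. Shutting the pool down stops its
    // threads, and their ThreadLocalMaps become unreachable and are GC'd.
    ExecutorService ownedPool = Executors.newFixedThreadPool(2);
    ownedPool.submit(() -> TL.set(new byte[1024])).get();
    ownedPool.shutdown();
    ownedPool.awaitTermination(10, TimeUnit.SECONDS); // threads die here, entries with them

    // Case 2 (can't be shown in a standalone program): the same TL.set(...) executed on a
    // thread owned by the application server (e.g. a Tomcat request thread) leaves the
    // entry alive for as long as that thread lives, which is what Tomcat's
    // checkThreadLocalMapForLeaks warns about at webapp shutdown.
  }
}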

@asclark109
Author

It sounds like the question to ask is:

"Is the ThreadPool we are creating re-using existing threads, or is it creating its own brand new threads?". If it's the first case then you get a leak if you don't clean up the threads. If you're creating new threads, then all you need to do is shutdown the threadpool properly.

@jack-berg
Member

After a decent amount of work I was able to reproduce this.

ThreadLocals are released when threads stop, and so as long as all the threads that use CodedOutputStream are stopped, then the ThreadLocals are released and GC'd. The OTLP exporters all create and manage their own thread pools and clean them up upon shutdown, so what gives?

It turns out that BatchSpanProcessor is the problem! There's a race condition where BatchSpanProcessor#shutdown()'s CompletableResultCode returns and resolves before its worker thread is shut down and (possibly?) before its exporter shutdown resolves. When I fixed this (in a scrappy way) locally, the error went away.

Will work on getting a proper fix out.

@asclark109
Author

asclark109 commented Apr 9, 2025

I am indeed using the batch span processor. When you commit the changes to your main branch, can you comment here? I'll see if patching my local jars with those changes will solve the problem for me too.

@asclark109
Author

Your analysis sounds right. If you shut down the top-level thing that spawned the threads (ExecutorService, thread pool, etc.), and that thing isn't re-using existing threads (such as ones from an application server like Tomcat), then it is technically okay if those threads used ThreadLocals. This is because shutting down the top-level thing allows the threads spawned by it to be stopped, which allows the GC of the ThreadLocal storage, and thus of the references to the class loader that loaded your application classes (and so on)--allowing everything to be GC'd.

But! If you forget to shut down the top-level constructs (ExecutorService, thread pools) AND the spawned threads DID use ThreadLocal storage, then you're hit with a bigger problem: every web application restart will result in a class loader memory leak preventing a lot from being GC'd. A single reference to any application class in ThreadLocal storage will prevent the GC of everything loaded by that class loader.

In other words, using ThreadLocals stabs you in the back in this case because it takes something very possible (forgetting to explicitly shut down an ExecutorService on application restart) and turns it into a real problem (a memory leak).

@asclark109
Author

asclark109 commented Apr 15, 2025

(I've written a long block below of something I thought was causing a leak, but now I am not entirely sure it was a cause for a leak).

I am beginning to suspect my specific memory leak has to do with the following, but it surfaces an important issue.
So, because the OpenTelemetry object is an immutable, create-once-at-application-start type object, I have a limitation whereby, in order to observe before and during Spring startup, I create an OpenTelemetry object with minimal configuration, catching all the collected data in memory (I have in-memory data structures that catch the emitted data). When Spring is fully up, and the beans that define configuration information are available, I then have enough information to create a new OpenTelemetry object (with all the legit application-level configuration information). I create a new OpenTelemetry object and shut down the old one. With all the data caught during startup, I re-export that data using the new exporters injected into the new OpenTelemetry object. This leads to the memory leak described below:

So when I construct an OpenTelemetry object, I am injecting an exporter but also asking that exporter to do an export call with the OpenTelemetry data caught during startup. I create the OpenTelemetry exporters (and the overall OpenTelemetry object) on an application thread, and on that same application thread I ask the exporter(s) to export the data that was caught during startup. This is the key detail: because I am invoking a method on your exporter on an application thread, your exporter creates a ThreadLocal on my application thread, which pollutes my application's thread. The reason the exporter works fine in your BatchSpanProcessor is that all of its usage is contained in a thread you created, which is eventually GC'd. However, if any developer (on their application thread) invokes a method on your classes that uses underlying ThreadLocals, you will end up putting ThreadLocals on their threads (unless you internally invoke that logic on a new thread you create).

Although it is dirty, one workaround I have in mind (trying it now) is to have my direct invocation of export (which happens approximately once in an application lifecycle) run on a new ExecutorService that I create and immediately shut down, to make sure I do not directly invoke methods on your classes that use ThreadLocal storage on an application thread that might be owned by Tomcat/WebSphere/etc.; see the sketch after this paragraph. In summary, I think this hints that any classes you have that use ThreadLocal storage should remain as hidden as possible from external users--either so you can ensure the methods that create ThreadLocals are only invoked on threads you create, or you refactor to incorporate cleanup of the ThreadLocals (so it becomes always safe to invoke the methods of the classes from any thread). I'm spitballing out loud though.
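
Here is roughly what that workaround looks like. This is a sketch under my own assumptions: exporter is the SDK SpanExporter, caughtSpans is my in-memory collection of captured SpanData, and the wiring around them is mine, not part of the SDK.

import io.opentelemetry.sdk.trace.data.SpanData;
import io.opentelemetry.sdk.trace.export.SpanExporter;
import java.util.Collection;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// One-off export on a throwaway thread I own, so any ThreadLocals the exporter
// creates land on that short-lived thread instead of a Tomcat/WebSphere thread.
void exportCaughtData(SpanExporter exporter, Collection<SpanData> caughtSpans) {
  ExecutorService oneShot = Executors.newSingleThreadExecutor();
  try {
    oneShot.submit(() -> exporter.export(caughtSpans).join(30, TimeUnit.SECONDS));
  } finally {
    // shutdown() lets the submitted task finish, then the thread dies,
    // taking its ThreadLocal entries with it.
    oneShot.shutdown();
  }
}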

@asclark109
Author

asclark109 commented Apr 15, 2025

Random follow-up: after making some changes above, I get fewer class loader leak warnings from Tomcat at shutdown, but I still get 1 leak. If I take a heap dump right at application shutdown, it appears the ThreadLocal is held on BatchSpanProcessor_WorkerThread-1. I am fairly certain I am shutting it down via the top-level OpenTelemetry shutdown call. Will look more closely still.

[Image: heap dump showing a ThreadLocal entry held by BatchSpanProcessor_WorkerThread-1]

@asclark109
Author

asclark109 commented Apr 15, 2025

[Image: screenshot related to BatchSpanProcessor.shutdown(), referenced below]

feels like it could have to do with BatchSpanProcessor.shutdown() merely returning worker.shutdown() (and not also waiting for the worker thread to complete)?
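
If that's the case, a fix would presumably need shutdown() to resolve only after the worker thread has actually terminated. Purely as an illustration of that idea (workerThread and exporterShutdown below are placeholder names, not the real BatchSpanProcessor fields), something along these lines:

import io.opentelemetry.sdk.common.CompletableResultCode;

static CompletableResultCode shutdownAfterWorkerStops(
    Thread workerThread, CompletableResultCode exporterShutdown) {
  CompletableResultCode result = new CompletableResultCode();
  exporterShutdown.whenComplete(
      () -> {
        try {
          // Only resolve the returned result once the worker thread has terminated,
          // so its ThreadLocals are releasable before shutdown() reports completion.
          workerThread.join(10_000);
          result.succeed();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          result.fail();
        }
      });
  return result;
}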
