[GR-45250] [GR-45734] Strict Dynamic Access Inference #11079

Open: wants to merge 15 commits into base: master

@graalvmbot graalvmbot commented Apr 24, 2025

Problem description

Currently, the constant reflection analysis used by Native Image is optimization-dependent. This can lead to unexpected results at image run time when using reflection. For example, the Class.forName call in the following snippet will be folded by the analysis, because both wrappers can be inlined before analysis:

public static void main(String[] args) throws Throwable {
    System.out.println(wrapperDepthOne("SomeClass").getName());
}
 
private static Class<?> wrapperDepthOne(String className) throws Throwable {
    return wrapperDepthTwo(className);
}
 
private static Class<?> wrapperDepthTwo(String className) throws Throwable {
    return Class.forName(className);
}

However, adding a simple print statement to either of the wrappers, or toggling different optimizations at build time (for example, -H:InlineBeforeAnalysisAllowedDepth=1), can make the method non-inlinable, in which case the Class.forName call is no longer folded:

public static void main(String[] args) throws Throwable {
    System.out.println(wrapperDepthOne("SomeClass").getName());
}
 
private static Class<?> wrapperDepthOne(String className) throws Throwable {
    System.out.println("A bit of logging..."); // Added logging
    return wrapperDepthTwo(className);
}
 
private static Class<?> wrapperDepthTwo(String className) throws Throwable {
    return Class.forName(className); // Throws ClassNotFoundException / MissingReflectionRegistrationError
}

This is only a simple example of this kind of behavior. The inlining context can potentially be multiple levels deep, making it virtually impossible to reason about when it's safe to use reflective methods without specifying additional metadata. This also opens up the possibility of code breaking upon toggling different optimizations or seemingly unrelated changes in user code.

In short, the main issues with the current approach are:

  • Hard or impossible to reason about safe reflection usage without additional metadata;
  • Non-deterministic image run-time behavior w.r.t. reflection;
  • Blocks other projects which require a deterministic analysis for constant reflection.

Additions from this PR

This PR introduces a bytecode-level inference scheme for invocations which would otherwise require a reachability metadata entry. Some of the properties of this scheme are:

  • It is not susceptible to optimizations - the run-time behavior of the image has to be the same with any set of toggled optimizations;
  • A formal specification in terms of bytecode is possible;
  • It follows relatively simple, easy to explain rules in terms of Java source code;
  • It covers a large percentage of currently folded cases (219 of 231 cases in Spring PetClinic, about 95%).

Two new hosted options are introduced:

  • -H:StrictDynamicAccessInference=Disable|Warn|Enforce
    • Disable: Disable the strict mode and fall back to the optimization-dependent inference for dynamic invocations.
    • Warn: Use the optimization-dependent inference for dynamic invocations, but print a warning for invocations inferred outside of the strict mode.
    • Enforce: Infer only dynamic invocations proven to be constant in the strict inference mode.
  • -H:+ReportDynamicAccessInference
    • Generates a .json report of inferred dynamic access invocations.

Strict dynamic access inference scheme specification / description

A method is composed of a sequence of instructions. The analysis records the abstract frame state before the abstract interpretation of each instruction in that sequence. The abstract frame state is composed of a representation of the operand stack and of the local variable table, both containing abstract representations of values.

An abstract value can be one of the following:

  • Not a compile-time constant;
  • A compile-time constant, represented by a tuple (source BCI, inferred value);
    • The source BCI represents the BCI of the instruction which put that abstract value on the operand stack or in the local variable table;
    • The inferred value represents the actual value which would be observed during run-time if that method were executed;
    • We differentiate between array typed and non-array typed compile time constants.
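
A minimal sketch of how such abstract values could be modeled, assuming Java 16 or newer for records. The names AbstractValue, NOT_CONSTANT and CompileTimeConstant are hypothetical and chosen for illustration; they are not taken from the PR's implementation.

```java
/** Hypothetical model of an abstract value; names are illustrative, not from the PR. */
interface AbstractValue {
    /** Singleton marker for "not a compile-time constant". */
    AbstractValue NOT_CONSTANT = new AbstractValue() {
        @Override
        public String toString() {
            return "NOT_CONSTANT";
        }
    };
}

/** A compile-time constant: the BCI of the producing instruction plus the inferred value. */
record CompileTimeConstant(int sourceBci, Object value) implements AbstractValue {
    /** Array-typed constants are told apart because arrays are mutable. */
    boolean isArray() {
        return value != null && value.getClass().isArray();
    }
}

class AbstractValueDemo {
    public static void main(String[] args) {
        // An LDC at BCI 6 pushing "SomeClassName", as in the specification's example.
        AbstractValue name = new CompileTimeConstant(6, "SomeClassName");
        // An array constant, assumed here to originate from an ANEWARRAY at BCI 12.
        CompileTimeConstant params = new CompileTimeConstant(12, new Class<?>[] {String.class});
        System.out.println(name);
        System.out.println(params.isArray());
        System.out.println(AbstractValue.NOT_CONSTANT);
    }
}
```

Keeping the source BCI next to the value is what later allows merges to compare BCIs instead of values.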

A data flow analysis is then employed on the method until a fixed point is reached (no changes in the abstract frames are observed). The abstract frames are computed in the following way:

  • Each instruction has an input state, the abstract frame before its execution, which it transforms into an output state (which will become part of the input for all of its successors);
    • Propagation of compile time constants usually starts with constant pushing instructions (e.g., ACONST_NULL, ICONST_1, LDC, ...);
      • For example, an LDC instruction with BCI 6 and referencing a string "SomeClassName" in the constant pool would push a (6, "SomeClassName") compile-time constant onto the operand stack;
    • To also cover primitive type "class references" (e.g., int.class), GETSTATIC instructions referencing the appropriate boxed type's TYPE field push the corresponding Class object as a compile-time constant;
      • For example, a GETSTATIC instruction with BCI 42 referencing the java.lang.Boolean.TYPE field pushes a (42, boolean.class) compile-time constant on the operand stack;
    • If the instruction is a variable store (ASTORE, ...) or a variable load (ALOAD, ...) instruction, and the value it targets is a non-array compile-time constant, then that value is propagated, with its source BCI replaced by the BCI of that instruction;
      • For example, an ASTORE with BCI 8 and an operand which is a compile time constant (6, "SomeClassName") would put a compile-time constant (8, "SomeClassName") into the local variable table;
    • In order to cover as many cases as possible, we also propagate compile time constants through certain method invocations;
      • For example, if an INVOKESTATIC instruction calling java.lang.Class.forName(String className) has a compile time constant operand and the call can be executed without throwing an exception, a compile time constant holding the result can be pushed on the operand stack.
    • Since arrays are always mutable, we have to pay special attention when inferring array typed compile time constants;
      • Instructions creating a new array (NEWARRAY, ANEWARRAY, MULTIANEWARRAY) push a compile time constant on the operand stack if and only if all the count operands are compile time constants;
        • In that case, an abstract compile time array constant of the appropriate size with all default values is pushed onto the stack;
      • Instructions which modify the array by storing elements into it (e.g., AASTORE) and which have a compile time constant array reference operand (array source BCI, array value) have to modify all of the compile time constants corresponding to that array reference which are currently in the abstract frame (for which the source BCI is equal to the array source BCI), and not only the operand itself. The updating is done in the following way:
        • If the index and value operands are also compile time constants, then all matching compile time constants are updated by setting the element in the underlying array value and updating their source BCI to the BCI of the store instruction;
        • If either of those operands is not a compile-time constant, then all the matching compile-time constants are replaced with a not a compile time constant marker;
      • To prevent possible escaping of array references and their modification in other methods, having a compile time constant array (array source BCI, array value) as an operand to method invocation, variable store or field store instructions replaces all the matching compile time constants (those whose source BCI is equal to the array source BCI) in the current abstract frame with a not a compile time constant marker;
  • If an instruction has multiple predecessor instructions (predecessors in the control flow graph of the method), their output states are merged into a single input state for the selected instruction;
    • Matching values on the operand stack and in the local variable table can be merged into a compile time constant if and only if they are themselves compile time constants and their source BCIs match;
    • By tracking and comparing source BCIs on every merge, there is no need to compare the actual inferred values - a matching source BCI implies a matching inferred value;
    • Otherwise, they are merged into a not a compile time constant abstract value.
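
The merge rule and the array store rule above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the PR's implementation: FrameSketch, Const, merge, mergeFrames and arrayStore are hypothetical names, null stands for the "not a compile time constant" marker, a frame is flattened into a single list of slots, and only reference arrays with in-bounds constant indices are handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the merge and array store rules; not the PR's actual classes. */
final class FrameSketch {
    /** A compile-time constant; null is used for "not a compile-time constant". */
    record Const(int sourceBci, Object value) {}

    /** Two slots stay constant only if both are constants with the same source BCI. */
    static Const merge(Const left, Const right) {
        if (left != null && right != null && left.sourceBci() == right.sourceBci()) {
            return left; // equal source BCI implies equal inferred value
        }
        return null;
    }

    /** Slot-wise merge of two predecessor output states of equal size. */
    static List<Const> mergeFrames(List<Const> a, List<Const> b) {
        List<Const> merged = new ArrayList<>();
        for (int i = 0; i < a.size(); i++) {
            merged.add(merge(a.get(i), b.get(i)));
        }
        return merged;
    }

    /**
     * Array store: every slot sharing the array's source BCI is updated, not only the operand.
     * A non-constant index or element turns all of those slots into non-constants instead.
     */
    static void arrayStore(List<Const> frame, Const array, Const index, Const element, int storeBci) {
        Const updated = null;
        if (index != null && element != null) {
            Object[] copy = ((Object[]) array.value()).clone();
            copy[(Integer) index.value()] = element.value();
            updated = new Const(storeBci, copy); // the source BCI moves to the store instruction
        }
        for (int i = 0; i < frame.size(); i++) {
            Const slot = frame.get(i);
            if (slot != null && slot.sourceBci() == array.sourceBci()) {
                frame.set(i, updated);
            }
        }
    }
}
```

Because the store rewrites the source BCI of every alias, two control flow paths that store into the same array at different instructions end up with different source BCIs and therefore merge into a non-constant.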

Strict dynamic access inference in terms of Java source code

The inference scheme is formally defined in terms of bytecode, but can roughly be translated to the following, relatively simple rules in terms of Java code - we define compile-time constants; if all arguments to the dynamic access invocation are compile-time constants, we can infer it:

  • Constant expressions, as defined by the JLS, are compile-time constants (primitive type literals, String literals, final fields initialized with constant expressions);
private static final String CLASS_NAME = "SomeClass";
...
Class.forName(CLASS_NAME);
Class.forName("SomeClass");
  • Class literals are compile-time constants;
SomeClass.class.getField("SomeField");
  • null literals and casts on them are compile-time constants;
SomeClass.class.getConstructor((Class[]) null);
  • Direct array initializations where every element is a compile time constant are compile time constants;
SomeClass.class.getMethod("someMethod", new Class<?>[] { String.class, Integer.class });
SomeClass.class.getMethod("someMethod", String.class, Integer.class);
  • Names referring to non-array type local variables are compile-time constants if their use is dominated by an assignment of a compile-time constant to that variable;
String className = "A";
if (someCondition()) {
    className = "B";
    Class.forName(className); // Can be inferred - returns Class B
}
Class.forName(className); // Cannot be inferred - throws MissingReflectionRegistrationError
  • Certain method invocations, if their arguments are compile-time constants, can propagate their value.
Class.forName("SomeClass").getField("SomeField");

Future work - annotations

The proposed analysis works within the bounds of a single method and only for simple patterns. In other cases, the user has to specify reachability metadata in external files. Those can be cumbersome to write and maintain, especially if done by hand (and not generated with the tracing agent). One alternative to this approach is specifying reachability metadata through annotations. One candidate structure for such hypothetical annotations mimics the structure of reachability .json files:

@ReflectiveAccess(
    type         = <String>,
    fields       = <@FieldAccess[]>,
    constructors = <@ConstructorAccess[]>,
    methods      = <@MethodAccess[]>,
 
    allPublicFields         = <boolean>,
    allDeclaredFields       = <boolean>,
    allPublicConstructors   = <boolean>,
    allDeclaredConstructors = <boolean>,
    allPublicMethods        = <boolean>,
    allDeclaredMethods      = <boolean>
)
 
@FieldAccess(
    name = <String>
)
 
@ConstructorAccess(
    parameterTypes = <String[]>
)
 
@MethodAccess(
    name           = <String>,
    parameterTypes = <String[]>
)
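
For reference, the schema above could be declared in Java roughly as follows. This is only a sketch of the hypothetical annotations: the retention policy, the targets and the default values are assumptions made for illustration and are not part of the proposal.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical declarations; retention, targets and defaults are assumptions.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.CONSTRUCTOR})
@interface ReflectiveAccess {
    String type();
    FieldAccess[] fields() default {};
    ConstructorAccess[] constructors() default {};
    MethodAccess[] methods() default {};

    boolean allPublicFields() default false;
    boolean allDeclaredFields() default false;
    boolean allPublicConstructors() default false;
    boolean allDeclaredConstructors() default false;
    boolean allPublicMethods() default false;
    boolean allDeclaredMethods() default false;
}

@Retention(RetentionPolicy.RUNTIME)
@interface FieldAccess {
    String name();
}

@Retention(RetentionPolicy.RUNTIME)
@interface ConstructorAccess {
    String[] parameterTypes() default {};
}

@Retention(RetentionPolicy.RUNTIME)
@interface MethodAccess {
    String name();
    String[] parameterTypes() default {};
}

class ReflectiveAccessDemo {
    @ReflectiveAccess(
        type = "LoadedClass",
        fields = {@FieldAccess(name = "SOME_FIELD")},
        methods = {@MethodAccess(name = "increment", parameterTypes = {"int"})})
    static void lookup() {
        // Reflective lookups on LoadedClass would go here.
    }

    /** Reads the annotation back, mainly to show that the declarations are well-formed. */
    static ReflectiveAccess describe() {
        try {
            return ReflectiveAccessDemo.class.getDeclaredMethod("lookup").getAnnotation(ReflectiveAccess.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A build-time tool would more likely read these annotations from bytecode than through reflection, in which case a weaker retention policy could also work.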

Usage example:

@ReflectiveAccess(
    type="LoadedClass",
    fields={
        @FieldAccess(name="SOME_FIELD")
    },
    methods={
        @MethodAccess(name="increment", parameterTypes={"int"})
    }
)
public static void main(String[] args) throws Throwable {
    Class<?> clazz = Class.forName(opaque("LoadedClass")); // Local analysis cannot deduce the value of the argument because of the opaque method call - we have to use annotations
    System.out.println(clazz);
 
    Field f = clazz.getField("SOME_FIELD");
    System.out.println(f.get(null));
 
    Method m = clazz.getMethod("increment", int.class);
    System.out.println(m.invoke(null, 3));
}

A more realistic scenario where the proposed analysis fails would be in cases where the arguments of the reflective call depend on the arguments of the containing method. As an example, the following method is used in the Spring framework (defined in org.springframework.util.ClassUtils):

public static Method getMethod(Class<?> clazz, String methodName, Class<?>... paramTypes) {
    Assert.notNull(clazz, "Class must not be null");
    Assert.notNull(methodName, "Method name must not be null");
    if (paramTypes != null) {
        try {
            return clazz.getMethod(methodName, paramTypes);
        }
        catch (NoSuchMethodException ex) {
            throw new IllegalStateException("Expected method not found: " + ex);
        }
    }
    else {
        Set<Method> candidates = findMethodCandidatesByName(clazz, methodName);
        if (candidates.size() == 1) {
            return candidates.iterator().next();
        }
        else if (candidates.isEmpty()) {
            throw new IllegalStateException("Expected method not found: " + clazz.getName() + '.' + methodName);
        }
        else {
            throw new IllegalStateException("No unique method found: " + clazz.getName() + '.' + methodName);
        }
    }
}

The method calls Class.getMethod, but the arguments are passed in from an outside context. One of its call sites is:

private static final HandlerMethod PREFLIGHT_AMBIGUOUS_MATCH =
        new HandlerMethod(new EmptyHandler(), ClassUtils.getMethod(EmptyHandler.class, "handle"));

The arguments of ClassUtils.getMethod are constant and could be detected, but it is a user defined method which isn't subject to our analysis. To get around that, we would either need to extend the analysis to be interprocedural, which would be too inefficient and complicated to reason about, or extend the previously proposed annotations to interoperate with our analysis. As an example, the annotation could then look something like this:

@ReflectiveAccess(type="#1", methods={@MethodAccess(name="#2", parameterTypes="#3")})
public static Method getMethod(Class<?> clazz, String methodName, Class<?>... paramTypes) {
    ...
}

The "#n" tags would bind the annotation parameter to the value of the n-th argument of that method, effectively turning a method annotated with "@ReflectiveAccess" into an ordinary reflective method from the point of view of our analysis. Upon detecting that an invocation of a method annotated with "@ReflectiveAccess" has constant arguments, the appropriate types, fields and methods would be registered for reflection. With this strategy, one annotation could potentially replace a large number of reflection metadata entries.
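
A minimal sketch of how the "#n" binding could be resolved at a call site whose arguments were all inferred as constants. PlaceholderBinding and bind are hypothetical names, the 1-based numbering follows the example above (type="#1" refers to the first parameter), and a malformed tag such as "#x" is simply left to fail with a NumberFormatException.

```java
import java.util.List;

/** Hypothetical resolution of "#n" tags against the constant arguments of a call. */
final class PlaceholderBinding {
    /** Returns the n-th (1-based) constant argument for "#n"; other strings pass through. */
    static Object bind(String template, List<Object> constantArgs) {
        if (template.length() > 1 && template.charAt(0) == '#') {
            int position = Integer.parseInt(template.substring(1));
            return constantArgs.get(position - 1);
        }
        return template;
    }

    public static void main(String[] args) {
        // Stand-in for ClassUtils.getMethod(EmptyHandler.class, "handle");
        // String.class is used here only so that the sketch is self-contained.
        List<Object> constantArgs = List.of(String.class, "handle", new Class<?>[0]);
        System.out.println(bind("#1", constantArgs)); // bound to the clazz argument
        System.out.println(bind("#2", constantArgs)); // bound to the methodName argument
    }
}
```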

Some benefits of annotations for reachability metadata in comparison to .json files are:

  • No need to maintain separate files - the metadata is specified directly in code, at the place where the reflective lookups actually occur;
  • One annotation potentially replaces a large number of metadata entries;
  • Implicit conditional registration - do not register for reflection if the call site isn't reachable;
  • Possible to warn users if an annotated (or base reflective) call is invoked without the required constant parameters - if there are no warnings, the library is proven to be safe for usage w.r.t. reflection.

@oracle-contributor-agreement oracle-contributor-agreement bot added the "OCA Verified" label (All contributors have signed the Oracle Contributor Agreement.) Apr 24, 2025
@graalvmbot graalvmbot force-pushed the alekstef/GR-45250-GR-45734-bytecode-level-reflection-analysis branch 3 times, most recently from 51bf4e1 to 50a8e02 Compare May 13, 2025 12:33
@graalvmbot graalvmbot force-pushed the alekstef/GR-45250-GR-45734-bytecode-level-reflection-analysis branch 14 times, most recently from db34a29 to 5a4d449 Compare May 22, 2025 10:53
@graalvmbot graalvmbot force-pushed the alekstef/GR-45250-GR-45734-bytecode-level-reflection-analysis branch 12 times, most recently from 4d9ed1b to 385e93e Compare May 27, 2025 13:33
@graalvmbot graalvmbot force-pushed the alekstef/GR-45250-GR-45734-bytecode-level-reflection-analysis branch 9 times, most recently from e236ee3 to d3b758d Compare June 5, 2025 06:07
@graalvmbot graalvmbot changed the title from "[GR-45250][GR-45734] Reachability proofs for reflective operations" to "[GR-45250] [GR-45734] Reachability proofs for reflective operations" Jun 5, 2025
@graalvmbot graalvmbot force-pushed the alekstef/GR-45250-GR-45734-bytecode-level-reflection-analysis branch 2 times, most recently from a2a6d67 to d19698f Compare June 6, 2025 09:55
@graalvmbot graalvmbot force-pushed the alekstef/GR-45250-GR-45734-bytecode-level-reflection-analysis branch from 6499016 to e6f31c8 Compare June 9, 2025 08:23
@graalvmbot graalvmbot force-pushed the alekstef/GR-45250-GR-45734-bytecode-level-reflection-analysis branch from 831bce9 to 9a5faec Compare June 25, 2025 12:04
@graalvmbot graalvmbot changed the title from "[GR-45250] [GR-45734] Reachability proofs for reflective operations" to "[GR-45250] [GR-45734] Strict Dynamic Access Inference" Jun 25, 2025