
Inconsistency in handling empty arguments #802

Open

@penelopeysm

When differentiating with respect to an empty array, the behaviour varies from backend to backend:

using DifferentiationInterface, ForwardDiff, ReverseDiff, Mooncake, Enzyme

ADTYPES = [
    AutoForwardDiff(),
    AutoReverseDiff(),
    AutoMooncake(; config=nothing),
    AutoEnzyme(; mode=Forward),
    AutoEnzyme(; mode=Reverse),
    # and more...
]

for adtype in ADTYPES
    DifferentiationInterface.value_and_gradient(sum, adtype, Float64[])
end

ReverseDiff, Mooncake, and reverse Enzyme all happily return (0.0, []) 😄

Forward Enzyme tries to use a batch size of 0 and errors:

function DI.pick_batchsize(::AutoEnzyme, N::Integer)
    B = DI.reasonable_batchsize(N, 16)
    return DI.BatchSizeSettings{B}(N)
end
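
Presumably reasonable_batchsize(0, 16) yields B = 0, so Enzyme is asked for a zero-width batch. A minimal sketch of a guard that would avoid this (hypothetical, not DI's actual code):

# Hypothetical guard: clamp the batch size to at least 1 so that an
# empty input never requests a zero-width batch from Enzyme.
pick_batchsize_guarded(N::Integer) = max(min(N, 16), 1)

pick_batchsize_guarded(0)    # 1 rather than 0
pick_batchsize_guarded(100)  # 16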

And ForwardDiff tries to construct a GradientResult, which errors:

fc = DI.fix_tail(f, map(DI.unwrap, contexts)...)
result = GradientResult(x)
result = gradient!(result, fc, x)
return DR.value(result), DR.gradient(result)

https://github.com/JuliaDiff/DiffResults.jl/blob/fcf7858d393f0597fc74e195ed46f7bcbe5ff66c/src/DiffResults.jl#L64-L65
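
Those lines seed the result's value slot with first(x), so the failure can be reproduced without doing any differentiation at all:

using DiffResults

# first(Float64[]) throws, so the result container cannot even be built
DiffResults.GradientResult(Float64[])
# ERROR: BoundsError: attempt to access 0-element Vector{Float64}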

Funnily enough, gradient with ForwardDiff (rather than value_and_gradient) is fine, because it doesn't try to construct the GradientResult; see below. I imagine the other operators would also have varying behaviour.
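
A quick check of the working path (which, presumably, returns an empty gradient):

using DifferentiationInterface, ForwardDiff

# no GradientResult is allocated here, so the empty input goes through
DifferentiationInterface.gradient(sum, AutoForwardDiff(), Float64[])  # Float64[]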

I suppose it is a bit of a trivial edge case, but would it be possible to unify the behaviour of the AD backends?
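
In the meantime, a hypothetical user-side workaround (the wrapper name and behaviour are mine, not DI API) is to short-circuit the empty case before calling the backend:

using DifferentiationInterface, ForwardDiff

# Hypothetical wrapper: return (f(x), empty gradient) for empty inputs,
# and defer to DifferentiationInterface otherwise.
function value_and_gradient_or_empty(f, backend, x)
    isempty(x) && return (f(x), zero(x))
    return DifferentiationInterface.value_and_gradient(f, backend, x)
end

value_and_gradient_or_empty(sum, AutoForwardDiff(), Float64[])  # (0.0, Float64[])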
