Make CommonSubexprEliminate faster by not copying so many strings #10426

Closed
@alamb

Description

Is your feature request related to a problem or challenge?

Part of #5637

One of the optimizer passes is "common subexpression elimination" (CSE), which removes redundant computation.

However, as @peter-toth noted on #10396, and as the CSE code itself says:

/// Identifier for each subexpression.
///
/// Note that the current implementation uses the `Display` of an expression
/// (a `String`) as `Identifier`.
///
/// An identifier should (ideally) be able to "hash", "accumulate", "equal" and "have no
/// collision (as low as possible)"
///
/// Since an identifier is likely to be copied many times, it is better that an identifier
/// is small or "copy". otherwise some kinds of reference count is needed. String description
/// here is not such a good choose.
type Identifier = String;

CSE currently tracks common subexpressions through string manipulation, which is not ideal for several reasons, including the cost of creating those strings.
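To make the cost concrete, here is a minimal sketch using a toy `Expr` type (an assumption for illustration, not DataFusion's actual `Expr`). The helper `count_subexprs` is hypothetical: it keys each subexpression by its `Display` string, as the comment above describes, and returns the total number of identifier bytes it had to build.

```rust
use std::collections::HashMap;
use std::fmt;

// Toy expression type for illustration only; NOT DataFusion's `Expr`.
#[derive(Debug, Clone)]
enum Expr {
    Column(String),
    Add(Box<Expr>, Box<Expr>),
}

impl fmt::Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Column(name) => write!(f, "{name}"),
            Expr::Add(l, r) => write!(f, "({l} + {r})"),
        }
    }
}

// Same shape as the alias quoted above: the identifier is an owned `String`.
type Identifier = String;

/// Hypothetical helper: counts every subexpression, keyed by its `Display`
/// string, and returns the total number of identifier bytes it built.
fn count_subexprs(expr: &Expr, stats: &mut HashMap<Identifier, usize>) -> usize {
    let mut bytes = 0;
    if let Expr::Add(l, r) = expr {
        bytes += count_subexprs(l, stats);
        bytes += count_subexprs(r, stats);
    }
    // Every node re-renders its whole subtree into a fresh `String`,
    // so deeper nodes repeat the work already done for their children.
    let id: Identifier = expr.to_string();
    bytes += id.len();
    *stats.entry(id).or_insert(0) += 1;
    bytes
}
```

Because each node renders its entire subtree again, the bytes written grow faster than the number of nodes as expressions get deeper.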

Describe the solution you'd like

Revisit the identifiers: using string identifiers as the keys of ExprStats is not the best choice. Note that CSE has worked this way since the feature was first added.
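One possible direction, sketched here purely as an assumption and not as the design DataFusion adopted, is an identifier that is small and `Copy`: a hash computed bottom-up from the children's hashes, paired with a reference to the node so that hash collisions can still be resolved by structural equality. The `Expr` type, the `Identifier` struct, and the `identify` function below are all hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Toy expression type for illustration only; NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Expr {
    Column(String),
    Add(Box<Expr>, Box<Expr>),
}

/// Hypothetical identifier: a precomputed hash plus a borrowed node.
/// It is `Copy`, so passing it around never allocates.
#[derive(Debug, Clone, Copy)]
struct Identifier<'n> {
    hash: u64,
    expr: &'n Expr,
}

impl PartialEq for Identifier<'_> {
    fn eq(&self, other: &Self) -> bool {
        // Cheap hash check first; structural comparison only on a match.
        self.hash == other.hash && self.expr == other.expr
    }
}

impl Eq for Identifier<'_> {}

impl Hash for Identifier<'_> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Reuse the precomputed hash instead of walking the subtree again.
        state.write_u64(self.hash);
    }
}

/// Hypothetical bottom-up pass: each node's hash is derived from its
/// children's hashes, so no subtree is rendered or re-hashed from scratch.
fn identify<'n>(
    expr: &'n Expr,
    stats: &mut HashMap<Identifier<'n>, usize>,
) -> Identifier<'n> {
    let mut hasher = DefaultHasher::new();
    match expr {
        Expr::Column(name) => {
            0u8.hash(&mut hasher); // tag distinguishing the variant
            name.hash(&mut hasher);
        }
        Expr::Add(l, r) => {
            1u8.hash(&mut hasher);
            identify(l, stats).hash.hash(&mut hasher);
            identify(r, stats).hash.hash(&mut hasher);
        }
    }
    let id = Identifier { hash: hasher.finish(), expr };
    *stats.entry(id).or_insert(0) += 1;
    id
}
```

The trade-off in this sketch is that the identifier borrows from the expression tree, so the map cannot outlive the expressions it indexes.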

Describe alternatives you've considered

No response

Additional context

No response

Labels

enhancement: New feature or request
