-
Notifications
You must be signed in to change notification settings - Fork 0
Updated ruleBlockIfNoExit to better select branch if both are noexit. #11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
WalkthroughThe changes introduce new methods to compute and query the nesting depth of control flow blocks in the decompiler's internal representation. Several block classes now override a virtual Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Decompiler
participant FlowBlock
participant BlockGraph
participant BlockList/BlockCondition/BlockSwitch/etc.
User->>Decompiler: Request block nesting depth
Decompiler->>FlowBlock: getBlockDepth()
FlowBlock-->>Decompiler: Return 0 or overridden value
Decompiler->>BlockGraph: getBlockDepth()
BlockGraph->>BlockGraph: getInnerBlockDepth()
BlockGraph->>Contained Blocks: getBlockDepth() (recursive)
Contained Blocks-->>BlockGraph: Return depth
BlockGraph-->>Decompiler: Return max depth + 1
Decompiler-->>User: Return overall block depth
sequenceDiagram
participant CollapseStructure
participant FlowBlock
participant selectBestNoExit
CollapseStructure->>FlowBlock: Examine outgoing edges for no-exit clauses
FlowBlock-->>CollapseStructure: Return candidate clauses
CollapseStructure->>selectBestNoExit: Compare two clauses
selectBestNoExit-->>CollapseStructure: Return best clause index
CollapseStructure->>FlowBlock: Apply selection logic and update condition
Poem
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @arvi18, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
This pull request addresses an issue in the decompiler where the ruleBlockIfNoExit
function would eagerly select the first no-exit branch when collapsing blocks. The changes introduce heuristics to improve the readability of the decompiled code when both branches are no-exit. These heuristics prioritize smaller block depths, placing returns last, and selecting branches containing only a return opcode. The heuristics are applied in order, falling back to the original behavior if none match. The changes involve modifications to block.cc
, block.hh
, blockaction.cc
, blockaction.hh
, and op.hh
to implement these heuristics and calculate block depths.
Highlights
- Heuristics for No-Exit Branch Selection: Introduces three heuristics to improve the selection of no-exit branches in
ruleBlockIfNoExit
: smallest block depth first, return last, and only return. - Block Depth Calculation: Adds methods to calculate the depth of code blocks (
getBlockDepth
) in various block types (BlockGraph
,BlockList
,BlockCondition
,BlockSwitch
) to support the first heuristic. - Return Statement Detection: Adds
isStandardReturn
method toPcodeOp
to identify return operations for the second and third heuristics.
Changelog
Click here to see the changelog
- Ghidra/Features/Decompiler/src/decompile/cpp/block.cc
- Added
getInnerBlockDepth
method toBlockGraph
to calculate the maximum depth of child blocks (lines 1326-1337). - Added
getOpSize
method toBlockBasic
to return the number of PcodeOps contained in the block (lines 2522-2526). - Added
getBlockDepth
method toBlockList
to return the inner block depth (lines 2770-2775). - Added
getBlockDepth
method toBlockCondition
to return the inner block depth (lines 2861-2866). - Added
getBlockDepth
method toBlockSwitch
to calculate block depth based on case block depths (lines 3222-3232).
- Added
- Ghidra/Features/Decompiler/src/decompile/cpp/block.hh
- Added virtual
getBlockDepth
method toFlowBlock
(line 184). - Added
getInnerBlockDepth
andgetBlockDepth
methods toBlockGraph
(lines 303-304). - Added
getOpSize
method toBlockBasic
(line 407). - Added
getBlockDepth
method toBlockList
(line 508). - Added
getBlockDepth
method toBlockCondition
(line 538). - Added
getBlockDepth
method toBlockSwitch
(line 684).
- Added virtual
- Ghidra/Features/Decompiler/src/decompile/cpp/blockaction.cc
- Modified
ruleBlockIfNoExit
to incorporate heuristics for selecting the best no-exit branch (lines 1484-1488). - Added
selectBestNoExit
method to compare two no-exit branches based on block depth, return statements, and block content (lines 1504-1534).
- Modified
- Ghidra/Features/Decompiler/src/decompile/cpp/blockaction.hh
- Added
selectBestNoExit
declaration toCollapseStructure
class (line 220).
- Added
- Ghidra/Features/Decompiler/src/decompile/cpp/op.hh
- Added
isStandardReturn
method toPcodeOp
to check if the op is a standard return (line 180).
- Added
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command>
or @gemini-code-assist <command>
. Below is a summary of the supported commands.
Feature | Command | Description |
---|---|---|
Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/
folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
When 'if's paths both end,
A choice must now be made,
Depth, return, or less.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
The pull request introduces heuristics to improve the readability of decompiled code when both branches of an if statement are noexit. The heuristics prioritize smaller block depth, placing returns last, and selecting branches with only return opcodes. The changes seem well-structured and the provided examples are helpful in understanding the intent. The before/after examples are very helpful to illustrate the impact of the changes.
Summary of Findings
- Missing null check: In
BlockBasic::getOpSize
, there is no check to ensure thatop
is not empty before callingop.size()
. This could lead to a crash if the block has no operations. While this might be an uncommon scenario, it's good to add a check for robustness. - Potential integer overflow: In
BlockSwitch::getBlockDepth
, the calculationmaxDepth + 2
could potentially lead to an integer overflow ifmaxDepth
is sufficiently large. While the likelihood of this occurring in practice might be low, it's worth considering adding a check to prevent this.
Merge Readiness
The pull request introduces valuable heuristics to improve decompiled code readability. While the changes are generally well-structured, there are a couple of potential issues identified that should be addressed before merging. I recommend addressing the critical
and high
severity issues before merging. I am unable to directly approve this pull request, and recommend that others review and approve this code before merging.
int4 BlockBasic::getOpSize(void) | ||
|
||
{ | ||
return op.size(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no check to ensure that op
is not empty before calling op.size()
. This could lead to a crash if the block has no operations. While this might be an uncommon scenario, it's good to add a check for robustness.
int4 BlockBasic::getOpSize(void) | |
{ | |
return op.size(); | |
} | |
{ | |
if (op.empty()) { | |
return 0; // Or some other appropriate value | |
} | |
return op.size(); | |
} |
maxDepth=depth; | ||
} | ||
} | ||
return maxDepth+2; // +1 for switch block and +1 for case/default block |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The calculation maxDepth + 2
could potentially lead to an integer overflow if maxDepth
is sufficiently large. While the likelihood of this occurring in practice might be low, it's worth considering adding a check to prevent this.
return maxDepth+2; // +1 for switch block and +1 for case/default block | |
{ | |
if (maxDepth > INT_MAX - 2) { // INT_MAX is defined in <limits.h> | |
return INT_MAX; // Or handle the overflow in some other way | |
} | |
return maxDepth + 2; // +1 for switch block and +1 for case/default block | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (2)
Ghidra/Features/Decompiler/src/decompile/cpp/blockaction.cc (1)
1464-1498
: Guard logic inruleBlockIfNoExit
is harder to reason aboutThe current two-pass approach (
bestIndex
initialised to-1
, then potentially overwritten in the second match) works but:
- Calls
selectBestNoExit()
even when only one branch qualifies, doing an unnecessary depth walk.- Relies on side-effects of the loop index to detect “second match”, which hampers readability.
A simpler, single-pass selection avoids both issues:
- for(i=0;i<2;++i) { - clauseblock = bl->getOut(i); - ... - if (bestIndex==-1){ - bestIndex=i; - }else{ // both match - bestIndex = selectBestNoExit(bl->getOut(0),bl->getOut(1)); - } - } + FlowBlock* candidates[2] = {nullptr,nullptr}; + int candCount = 0; + for(i=0;i<2;++i) { + clauseblock = bl->getOut(i); + if (!/* all the same qualification tests */) continue; + candidates[candCount++] = clauseblock; + } + + if (candCount == 0) return false; + bestIndex = (candCount == 1) + ? (candidates[0] == bl->getOut(0) ? 0 : 1) + : selectBestNoExit(bl->getOut(0), bl->getOut(1));This removes hidden coupling on
i
and prevents redundant work.Ghidra/Features/Decompiler/src/decompile/cpp/blockaction.hh (1)
220-221
: Minor: markselectBestNoExit
asstatic
The helper does not rely on
this
state; making itstatic
(or even moving it to an anonymous namespace inside the.cc
) clarifies that it is purely functional and avoids the accidental capture of member data in the future.- int4 selectBestNoExit(FlowBlock *clause0,FlowBlock *clause1); + static int4 selectBestNoExit(FlowBlock *clause0, FlowBlock *clause1);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
Ghidra/Features/Decompiler/src/decompile/cpp/block.cc
(5 hunks)Ghidra/Features/Decompiler/src/decompile/cpp/block.hh
(6 hunks)Ghidra/Features/Decompiler/src/decompile/cpp/blockaction.cc
(2 hunks)Ghidra/Features/Decompiler/src/decompile/cpp/blockaction.hh
(1 hunks)Ghidra/Features/Decompiler/src/decompile/cpp/op.hh
(1 hunks)
🔇 Additional comments (6)
Ghidra/Features/Decompiler/src/decompile/cpp/block.cc (5)
1326-1338
: Implemented getInnerBlockDepth method to support block depth queryingThis new method calculates the maximum block depth across all sub-blocks in a BlockGraph, which is a key part of the improved heuristics for selecting between no-exit branches.
2522-2526
: Added getOpSize method to count operations in a basic blockSimple but useful accessor that returns the number of operations in a basic block, which will help in determining if a branch contains "only return" for the new heuristics.
2770-2775
: Added getBlockDepth implementation for BlockListThis implementation specifies that list blocks join blocks together but don't increase depth themselves, maintaining a flat hierarchy when appropriate.
2861-2866
: Added getBlockDepth implementation for BlockConditionSimilar to BlockList, this implementation specifies that condition blocks also join blocks together without increasing depth.
3222-3232
: Added getBlockDepth implementation for BlockSwitchThis implementation correctly accounts for the nesting hierarchy of switch constructs by adding 2 to the maximum depth found in case blocks - one for the switch block itself and one for the case/default blocks.
Ghidra/Features/Decompiler/src/decompile/cpp/op.hh (1)
179-180
: Well-implemented helper method for identifying standard returns.The new
isStandardReturn()
method clearly distinguishes standard return operations from other return-like operations with special halt conditions. This addition supports the "Return last" heuristic mentioned in the PR objectives, allowing the decompiler to identify when a branch ends with a standard return operation.
virtual int4 getInnerBlockDepth(); ///< Return max depth of child blocks | ||
virtual int4 getBlockDepth() {return getInnerBlockDepth()+1;} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Propagate const
/override
to all implementations
The newly added overrides in BlockGraph
, BlockList
, BlockCondition
, and BlockSwitch
follow the non-const
prototype.
After adjusting the base declaration, remember to update these:
-virtual int4 getInnerBlockDepth();
-int4 getBlockDepth() { return getInnerBlockDepth() + 1; }
+int4 getInnerBlockDepth() const;
+int4 getBlockDepth() const override { return getInnerBlockDepth() + 1; }
and similarly for the other three classes.
Failing to add override
can silently create a new unrelated virtual instead of overriding, should the signature drift in the future.
Also applies to: 508-509, 538-539, 684-685
virtual int4 getBlockDepth(void) {return 0;} ///< Return the depth in code block of \b this | ||
void saveXmlEdges(ostream &s) const; ///< Save edge information to an XML stream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Declare getBlockDepth()
as const
and mark overrides explicitly
getBlockDepth()
is a query that does not mutate state, yet it’s non-const
in the base class and all overrides.
Keeping it non-const
prevents calling it on const FlowBlock
objects, forces const_cast
s elsewhere, and diverges from the surrounding accessors (e.g. getType()
).
- virtual int4 getBlockDepth(void) {return 0;} ///< Return the depth in code block of \b this
+ virtual int4 getBlockDepth(void) const { return 0; } ///< Return the depth in code block of \b this
Then, in every derived class add the const
qualifier and the override
specifier, e.g.
- virtual int4 getBlockDepth() { return getInnerBlockDepth() + 1; }
+ int4 getBlockDepth() const override { return getInnerBlockDepth() + 1; }
Besides catching accidental signature mismatches at compile-time, this makes the intent explicit and enables further optimisation by the compiler.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
virtual int4 getBlockDepth(void) {return 0;} ///< Return the depth in code block of \b this | |
void saveXmlEdges(ostream &s) const; ///< Save edge information to an XML stream | |
virtual int4 getBlockDepth(void) const { return 0; } ///< Return the depth in code block of \b this | |
void saveXmlEdges(ostream &s) const; ///< Save edge information to an XML stream |
int4 getOpSize(void); ///< Number of PcodeOps contained in \b this block | ||
static bool noInterveningStatement(PcodeOp *first,int4 path,PcodeOp *last); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
BlockBasic::getOpSize()
should be const
The helper only inspects the op
list; making it const
allows calling it on const-qualified FlowBlock
pointers (as done in selectBestNoExit
) without a const_cast
.
-int4 getOpSize(void);
+int4 getOpSize(void) const;
Remember to update the definition in block.cc
.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
int4 getOpSize(void); ///< Number of PcodeOps contained in \b this block | |
static bool noInterveningStatement(PcodeOp *first,int4 path,PcodeOp *last); | |
int4 getOpSize(void) const; ///< Number of PcodeOps contained in \b this block | |
static bool noInterveningStatement(PcodeOp *first,int4 path,PcodeOp *last); |
int4 CollapseStructure::selectBestNoExit(FlowBlock *clause0,FlowBlock *clause1) | ||
|
||
{ | ||
// select lowest block depth | ||
int4 depth0 = clause0->getBlockDepth(); | ||
int4 depth1 = clause1->getBlockDepth(); | ||
if (depth0<depth1)return 0; | ||
if (depth1<depth0)return 1; | ||
|
||
// same depth, prefer non return | ||
bool isRet0 = clause0->lastOp()!=(PcodeOp *)0 && clause0->lastOp()->isStandardReturn(); | ||
bool isRet1 = clause1->lastOp()!=(PcodeOp *)0 && clause1->lastOp()->isStandardReturn(); | ||
if(isRet0 && !isRet1) return 1; | ||
if(isRet1 && !isRet0) return 0; | ||
|
||
// prefer block containing only return op | ||
if(isRet0){ // both are return | ||
FlowBlock* fb; | ||
if(clause0->getType()==FlowBlock::t_copy){ | ||
fb = ((BlockCopy*)clause0)->subBlock(0); | ||
if(fb->getType()==FlowBlock::t_basic && ((BlockBasic*)fb)->getOpSize()==1) return 0; | ||
} | ||
if(clause1->getType()==FlowBlock::t_copy){ | ||
fb = ((BlockCopy*)clause1)->subBlock(0); | ||
if(fb->getType()==FlowBlock::t_basic && ((BlockBasic*)fb)->getOpSize()==1) return 1; | ||
} | ||
graph.newBlockIf(bl,clauseblock); | ||
return true; | ||
} | ||
return false; | ||
|
||
// fall back to previous behavior | ||
return 0; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
selectBestNoExit
– unchecked casts and missing headers
static_cast<BlockCopy*>(clauseX)
is used without verifying thatclauseX
really is at_copy
. If an unforeseenFlowBlock::block_type
slips through, the cast is UB. Add a guard:
if (clause0->getType() == FlowBlock::t_copy) {
auto *cpy = static_cast<BlockCopy*>(clause0);
...
}
BlockCopy
/BlockBasic
are referenced butblock.hh
is not included inblockaction.cc
. Today this works only because a chain of includes drags the header in indirectly; add an explicit include to avoid accidental breakage:
#include "block.hh"
-
Consider early-returning when the depths differ to avoid computing return properties needlessly.
-
A nit:
int4 depth0 = ...
can under-flow ifgetBlockDepth()
ever returns-1
to signal “unknown”. Guard withassert(depth >= 0)
or treat-1
as “worst” to prevent negative comparison surprises.
At the moment, when collapsing block, the
ruleBlockIfNoExit
eagerly selects the first noexit branch it find. This merge request introduces some basic heuristics when both branch are noexit to improve the decompiled code's readability.heuristic 1 : smallest block depth first
This first heuristic compares the depth in term of generated code block and select the smallest depth.
This tries to flatten the code by preferring small sequential "if return" to nested if blocks.
heuristic 2 : return last
If only one of the branch will end with a
return
in the final code, place it last.This makes to code closer to what a human would write.
heuristic 3 : only return
If both branch end with a
return
in the final code and one contains only the return op code, select it.This also tries to flatten the code as a block containing only a return is most probably a fast logic skip.
These heuristics are tried in order (1 to 3) stopping at the first that matches. If none match, we fallback to the previous behavior (selecting the first branch) to only introduce wanted changes.
Here is the full decompiled code of libc before and after this merge request to compare the diffs on a large binary.
Summary by CodeRabbit
New Features
Improvements