-
Notifications
You must be signed in to change notification settings - Fork 0
Reduce far pointers to near in segmented code #9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Proposal to convert far pointers to near pointers which are bound as far pointers in type parameters to functions for example which is otherwise not detected or properly translated. The casting time is a too late and not ideal time for this code. Possibly at function binding time would be better to detect it. However it is tested and working and reasonably generic using the default segment e.g. global RAM for its basis, checking the segment op, checking matching sizes, and finally that a near pointer already is present which can cause discarding of the segment and this awkward CONCAT structure that gets emitted in code. CALLOTHER or even SEGMENTOP is better but needs to be processed earlier or CONCAT just becomes SEGMENT in the code. Otherwise this direct simplification (which ruleaction segment reducer should deal with) also works. First proposal only. Sequences like this are now fixed: PUSH SS PUSH AX Fixes other half of NationalSecurityAgency#817
The real issue is this fix is too narrow. It needs to be applied everywhere a far data or function pointer is found - output of functions, local variables - especially these 2 cases would continue just dramatically enhancing the output. I also may have mixed up base and inner calculation - obviously for x86-16, 2 and 2 makes no difference. |
@caheckman PLEASE will/can you advise if this is okay (and correct) to still merge and hasn't been outdated by any subsequent changes you've made? |
@GregoryMorse if you don't mind, I have recently added this fix to my local repo and I'm finding that when an offset portion of a segmented address has been increased by 1 this change losses that detail. Eg, this assembly instruction
but after adding becomes
I wonder if you'd be willing to help me fix this? Edit: I think the common line ( |
WalkthroughThis change introduces a new method, Changes
Sequence Diagram(s)sequenceDiagram
participant Decompiler
participant Funcdata
participant Varnode
participant Architecture
Decompiler->>Funcdata: funcLinkInput/funcLinkOutput/apply()
Funcdata->>Funcdata: segmentizeFarPtr(type, locked, varnode, segmentize)
Funcdata->>Architecture: Get user-defined segment op (if needed)
Funcdata->>Varnode: Rewrite varnode as segmented pointer (if applicable)
Funcdata-->>Decompiler: Return (segmented pointer varnode or unchanged)
Poem
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @arvi18, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
This pull request focuses on improving the handling of far pointers in segmented code within the Ghidra decompiler. The primary goal is to ensure proper conversion of far pointers to near pointers, especially when these pointers are used as type parameters in function definitions. This enhancement aims to fix issues where far pointers were not being correctly translated, leading to suboptimal code generation. The changes involve modifying how the decompiler processes function inputs and outputs, as well as adding a mechanism to segmentize far pointers during the type inference stage. The author indicates that this change results in a significant improvement in the quality of generated code for segmented architectures.
Highlights
- Far Pointer Conversion: The pull request introduces a new
segmentizeFarPtr
method inFuncdata
class to handle the conversion of far pointers to near pointers. This method is called during function linking (input and output parameters) and type inference. - Function Input/Output Handling: The
ActionFuncLink
class is modified to callsegmentizeFarPtr
for both input and output parameters of functions. This ensures that far pointers are correctly handled when functions are called. - Type Inference Enhancement: The
ActionInferTypes
class is updated to include a loop that iterates through varnodes and callssegmentizeFarPtr
on each varnode to ensure that far pointers are correctly handled during type inference. - SegmentOp Handling: The changes ensure that segment operators are correctly identified and handled, particularly in cases where near pointers are already present, preventing the discarding of segment information.
Changelog
- Ghidra/Features/Decompiler/src/decompile/cpp/coreaction.cc
- Modified
ActionFuncLink::funcLinkInput
to calldata.segmentizeFarPtr
after inserting input varnodes. - Modified
ActionFuncLink::funcLinkOutput
to calldata.segmentizeFarPtr
after creating output varnodes. - Modified
ActionInferTypes::apply
to iterate through varnodes and calldata.segmentizeFarPtr
during type inference.
- Modified
- Ghidra/Features/Decompiler/src/decompile/cpp/funcdata.cc
- Added a new method
Funcdata::segmentizeFarPtr
to handle the conversion of far pointers to near pointers. - The
segmentizeFarPtr
method checks if a varnode is a locked pointer or code type and then inserts SUBPIECE and CALLOTHER operations to properly segmentize the pointer if necessary. It also handles the conversion to a SEGMENTOP ifsegmentize
is true.
- Added a new method
- Ghidra/Features/Decompiler/src/decompile/cpp/funcdata.hh
- Added the declaration of the
segmentizeFarPtr
method to theFuncdata
class.
- Added the declaration of the
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command>
or @gemini-code-assist <command>
. Below is a summary of the supported commands.
Feature | Command | Description |
---|---|---|
Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/
folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
A pointer's a guide,
Through memory's vast stride,
Far or near it be,
Correctness, we decree,
In code, where truths reside.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
The pull request aims to improve the handling of far pointers in segmented code by converting them to near pointers where appropriate. This should lead to better code generation and improved output quality. The changes seem well-structured and address a significant issue. However, some areas could benefit from additional clarification and refinement.
Summary of Findings
- Missing null check: In
Funcdata::segmentizeFarPtr
, the code does not check ifct
is null before dereferencing it withct->getMetatype()
. This could lead to a crash ifct
is ever null. - Potential infinite loop: In
ActionInferTypes::apply
, the iterator manipulation logic within thefor
loop could potentially lead to an infinite loop ifdata.beginLoc()
anddata.endLoc()
do not behave as expected or if the conditions for skipping Varnodes are not correctly handled. - Lack of comments: The
segmentizeFarPtr
function inFuncdata
lacks detailed comments explaining the purpose and functionality of each code block, making it difficult to understand the logic behind the pointer conversion process.
Merge Readiness
The pull request introduces significant improvements to segmented code handling. However, the potential null pointer dereference and potential infinite loop must be addressed before merging. Additionally, adding more comments to the segmentizeFarPtr
function would greatly improve its maintainability. I am unable to directly approve this pull request, and recommend that another reviewer also reviews this code. At a minimum, the critical and high severity issues should be addressed before merging.
SegmentOp* segdef = (SegmentOp*)0; | ||
if ((ct->getMetatype() == TYPE_PTR || | ||
ct->getMetatype() == TYPE_CODE) && locked) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's crucial to check if ct
is null before dereferencing it. If ct
is null, ct->getMetatype()
will cause a crash. Can you add a null check here?
SegmentOp* segdef = (SegmentOp*)0; | |
if ((ct->getMetatype() == TYPE_PTR || | |
ct->getMetatype() == TYPE_CODE) && locked) { | |
SegmentOp* segdef = (SegmentOp*)0; | |
if (ct == nullptr) { | |
return; // Or handle the null case appropriately | |
} | |
if ((ct->getMetatype() == TYPE_PTR || |
void Funcdata::segmentizeFarPtr(Datatype* ct, bool locked, Varnode* vn, bool segmentize) | ||
{ | ||
SegmentOp* segdef = (SegmentOp*)0; | ||
if ((ct->getMetatype() == TYPE_PTR || | ||
ct->getMetatype() == TYPE_CODE) && locked) { | ||
Architecture* glb = getArch(); | ||
AddrSpace* rspc = glb->getDefaultSpace(); | ||
bool arenearpointers = glb->hasNearPointers(rspc); | ||
if (arenearpointers && rspc->getAddrSize() == vn->getSize()) | ||
segdef = glb->userops.getSegmentOp(rspc->getIndex()); | ||
} | ||
//need to go through every single time this varnode is written to | ||
if (segdef != (SegmentOp*)0 && vn->getDef() != (PcodeOp*)0 && !vn->isPtrCheck() && | ||
vn->getDef()->code() != CPUI_SEGMENTOP && vn->getDef()->code() != CPUI_CALLOTHER) { | ||
PcodeOp* segroot = vn->getDef(); | ||
Varnode* outvn = newUnique(vn->getSize()); | ||
opSetOutput(segroot, outvn); | ||
PcodeOp* piece1 = newOp(2, segroot->getAddr()); | ||
opSetOpcode(piece1, CPUI_SUBPIECE); | ||
Varnode* vn1 = newUniqueOut(segdef->getBaseSize(), piece1); | ||
opSetInput(piece1, outvn, 0); | ||
opSetInput(piece1, newConstant(outvn->getSize(), segdef->getInnerSize()), 1); | ||
opInsertAfter(piece1, segroot); | ||
PcodeOp* piece2 = newOp(2, segroot->getAddr()); | ||
opSetOpcode(piece2, CPUI_SUBPIECE); | ||
Varnode* vn2 = newUniqueOut(segdef->getInnerSize(), piece2); | ||
opSetInput(piece2, outvn, 0); | ||
opSetInput(piece2, newConstant(outvn->getSize(), 0), 1); | ||
opInsertAfter(piece2, piece1); | ||
PcodeOp* newop = newOp(3, segroot->getAddr()); | ||
opSetOutput(newop, vn); | ||
opSetOpcode(newop, CPUI_CALLOTHER); | ||
//endianness could matter here - e.g. CALLF addr16 on x86 uses segment(*:2 (ptr+2),*:2 ptr) | ||
opSetInput(newop, newConstant(4, segdef->getIndex()), 0); | ||
opSetInput(newop, vn1, 1); //need to check size of segment | ||
opSetInput(newop, vn2, 2); //need to check size of offset | ||
opInsertAfter(newop, piece2); | ||
vn->setPtrCheck(); | ||
outvn->setPtrCheck(); | ||
|
||
if (segmentize) {//FuncLinkInput/FuncLinkOutput come before segmentize, the rest require at least Spacebase for type locks | ||
segroot = vn->getDef(); | ||
vector<Varnode*> bindlist; | ||
bindlist.push_back((Varnode*)0); | ||
bindlist.push_back((Varnode*)0); | ||
if (!segdef->unify(*this, segroot, bindlist)) { | ||
ostringstream err; | ||
err << "Segment op in wrong form at "; | ||
segroot->getAddr().printRaw(err); | ||
throw LowlevelError(err.str()); | ||
} | ||
|
||
if (segdef->getNumVariableTerms() == 1) | ||
bindlist[1] = newConstant(4, 0); | ||
// Redefine the op as a segmentop | ||
opSetOpcode(segroot, CPUI_SEGMENTOP); | ||
opSetInput(segroot, newVarnodeSpace(segdef->getSpace()), 0); | ||
opSetInput(segroot, bindlist[1], 1); | ||
opSetInput(segroot, bindlist[0], 2); | ||
for (int4 j = segroot->numInput() - 1; j > 2; --j) // Remove anything else | ||
opRemoveInput(segroot, j); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is quite complex and lacks detailed comments. Adding comments to explain the purpose and functionality of each code block would greatly improve its readability and maintainability. For example, what are the conditions under which the segmentize
block is executed, and what does it accomplish?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (4)
Ghidra/Features/Decompiler/src/decompile/cpp/funcdata.hh (1)
199-199
: Add documentation & guard clause for clarity and robustness
segmentizeFarPtr
is a non-trivial public utility, but it appears without any explanatory Doxygen and expectsct
to be non-null. A short description of its intent/params/side–effects and an early-return forct == nullptr
would make the interface safer and self-documenting.Example:
+ /// Transform a locked far pointer `vn` into an architecture-specific SEGMENTOP + /// representation (optionally unifying with the segment op template when + /// `segmentize == true`). + /// \param ct Datatype associated with \p vn (must be a pointer/code) + /// \param locked True if the type was locked by the prototype + /// \param vn The varnode to be rewritten + /// \param segmentize Perform SEGMENTOP unification (false for early-binding use-cases) + void segmentizeFarPtr(Datatype *ct,bool locked,Varnode *vn,bool segmentize);Ghidra/Features/Decompiler/src/decompile/cpp/funcdata.cc (2)
379-381
: Hard-coded 4-byte immediate may break on non-32-bit targets
newConstant(4, segdef->getIndex())
assumes the user-op index fits in 4 bytes and that 4 is a legal constant size for every architecture.Use the default integer size or
glb->getDefaultSize()
to stay portable:- opSetInput(newop, newConstant(4, segdef->getIndex()), 0); + opSetInput(newop, + newConstant(glb->getDefaultSize(), segdef->getIndex()), 0);
398-406
: Input list cleanup leaves orphaned constant varnodeAfter converting the
CALLOTHER
intoCPUI_SEGMENTOP
, the original constant index (input 0) is overwritten, but the varnode itself remains live until later dead-code elimination.
Not fatal, but you can remove it immediately for a cleaner IR.// After opSetInput(...) calls if (bindlist[0] != nullptr) opRemoveInput(segroot, 3); // remove stale constant (was old input 0)Ghidra/Features/Decompiler/src/decompile/cpp/coreaction.cc (1)
1224-1227
: Null-safety / defensive check missing beforesegmentizeFarPtr()
param->getType()
can legally benullptr
for K&R style or otherwise incomplete prototypes.
segmentizeFarPtr()
eventually dereferences theDatatype*
, so a guard will avoid a possible crash when the prototype information is incomplete:- data.segmentizeFarPtr(param->getType(), param->isTypeLocked(), op->getIn(op->numInput() - 1), false); + if (Datatype *pt = param->getType()) { + data.segmentizeFarPtr(pt, param->isTypeLocked(), + op->getIn(op->numInput() - 1), false); + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
Ghidra/Features/Decompiler/src/decompile/cpp/coreaction.cc
(3 hunks)Ghidra/Features/Decompiler/src/decompile/cpp/funcdata.cc
(1 hunks)Ghidra/Features/Decompiler/src/decompile/cpp/funcdata.hh
(1 hunks)
🔇 Additional comments (2)
Ghidra/Features/Decompiler/src/decompile/cpp/funcdata.cc (1)
361-368
: Potential data-flow inconsistency when rewriting the defining op
segroot
’s output is replaced with a freshoutvn
, but there is no assertion that
vn == segroot->getOut()
.
If earlier optimisation already changed the def-use chain, the rewrite could silently corrupt SSA form.Please add an explicit check:
if (segroot->getOut() != vn) throw LowlevelError("segmentizeFarPtr: unexpected varnode/output mismatch");Ghidra/Features/Decompiler/src/decompile/cpp/coreaction.cc (1)
1253-1256
: Output-side invocation looks correct 👍The same helper is now applied to the return-value Varnode at the right moment (immediately after it is created), so the near/far adjustment happens before any later passes read the node.
No issues spotted here.
opSetOpcode(piece1, CPUI_SUBPIECE); | ||
Varnode* vn1 = newUniqueOut(segdef->getBaseSize(), piece1); | ||
opSetInput(piece1, outvn, 0); | ||
opSetInput(piece1, newConstant(outvn->getSize(), segdef->getInnerSize()), 1); | ||
opInsertAfter(piece1, segroot); | ||
PcodeOp* piece2 = newOp(2, segroot->getAddr()); | ||
opSetOpcode(piece2, CPUI_SUBPIECE); | ||
Varnode* vn2 = newUniqueOut(segdef->getInnerSize(), piece2); | ||
opSetInput(piece2, outvn, 0); | ||
opSetInput(piece2, newConstant(outvn->getSize(), 0), 1); | ||
opInsertAfter(piece2, piece1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
SUBPIECE constants use pointer-size; should use arch default (usually 4/addrSize 1)
The second input of CPUI_SUBPIECE
is an offset constant whose value is important, not its size.
Creating it with outvn->getSize()
inflates the varnode unnecessarily (e.g., 8-byte constant on 64-bit).
- opSetInput(piece1, newConstant(outvn->getSize(), segdef->getInnerSize()), 1);
+ opSetInput(piece1,
+ newConstant(glb->getDefaultSize(), segdef->getInnerSize()), 1);
...
- opSetInput(piece2, newConstant(outvn->getSize(), 0), 1);
+ opSetInput(piece2,
+ newConstant(glb->getDefaultSize(), 0), 1);
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
opSetOpcode(piece1, CPUI_SUBPIECE); | |
Varnode* vn1 = newUniqueOut(segdef->getBaseSize(), piece1); | |
opSetInput(piece1, outvn, 0); | |
opSetInput(piece1, newConstant(outvn->getSize(), segdef->getInnerSize()), 1); | |
opInsertAfter(piece1, segroot); | |
PcodeOp* piece2 = newOp(2, segroot->getAddr()); | |
opSetOpcode(piece2, CPUI_SUBPIECE); | |
Varnode* vn2 = newUniqueOut(segdef->getInnerSize(), piece2); | |
opSetInput(piece2, outvn, 0); | |
opSetInput(piece2, newConstant(outvn->getSize(), 0), 1); | |
opInsertAfter(piece2, piece1); | |
opSetOpcode(piece1, CPUI_SUBPIECE); | |
Varnode* vn1 = newUniqueOut(segdef->getBaseSize(), piece1); | |
opSetInput(piece1, outvn, 0); | |
opSetInput(piece1, | |
newConstant(glb->getDefaultSize(), segdef->getInnerSize()), 1); | |
opInsertAfter(piece1, segroot); | |
PcodeOp* piece2 = newOp(2, segroot->getAddr()); | |
opSetOpcode(piece2, CPUI_SUBPIECE); | |
Varnode* vn2 = newUniqueOut(segdef->getInnerSize(), piece2); | |
opSetInput(piece2, outvn, 0); | |
opSetInput(piece2, | |
newConstant(glb->getDefaultSize(), 0), 1); | |
opInsertAfter(piece2, piece1); |
SegmentOp* segdef = (SegmentOp*)0; | ||
if ((ct->getMetatype() == TYPE_PTR || | ||
ct->getMetatype() == TYPE_CODE) && locked) { | ||
Architecture* glb = getArch(); | ||
AddrSpace* rspc = glb->getDefaultSpace(); | ||
bool arenearpointers = glb->hasNearPointers(rspc); | ||
if (arenearpointers && rspc->getAddrSize() == vn->getSize()) | ||
segdef = glb->userops.getSegmentOp(rspc->getIndex()); | ||
} | ||
//need to go through every single time this varnode is written to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Null-pointer check & type-safety missing
ct
is dereferenced without validation; if a caller ever passes a null Datatype*
, the decompiler will crash.
Similarly, you cast the result of glb->userops.getSegmentOp()
to SegmentOp *
without verifying it is non-null.
- SegmentOp* segdef = (SegmentOp*)0;
- if ((ct->getMetatype() == TYPE_PTR ||
+ if (ct == nullptr) // defensive guard
+ return;
+
+ SegmentOp* segdef = nullptr;
+ if ((ct->getMetatype() == TYPE_PTR ||
ct->getMetatype() == TYPE_CODE) && locked) {
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
SegmentOp* segdef = (SegmentOp*)0; | |
if ((ct->getMetatype() == TYPE_PTR || | |
ct->getMetatype() == TYPE_CODE) && locked) { | |
Architecture* glb = getArch(); | |
AddrSpace* rspc = glb->getDefaultSpace(); | |
bool arenearpointers = glb->hasNearPointers(rspc); | |
if (arenearpointers && rspc->getAddrSize() == vn->getSize()) | |
segdef = glb->userops.getSegmentOp(rspc->getIndex()); | |
} | |
//need to go through every single time this varnode is written to | |
if (ct == nullptr) // defensive guard | |
return; | |
SegmentOp* segdef = nullptr; | |
if ((ct->getMetatype() == TYPE_PTR || | |
ct->getMetatype() == TYPE_CODE) && locked) { | |
Architecture* glb = getArch(); | |
AddrSpace* rspc = glb->getDefaultSpace(); | |
bool arenearpointers = glb->hasNearPointers(rspc); | |
if (arenearpointers && rspc->getAddrSize() == vn->getSize()) | |
segdef = glb->userops.getSegmentOp(rspc->getIndex()); | |
} | |
//need to go through every single time this varnode is written to |
if (localcount == 0) { | ||
Datatype* ct; | ||
Varnode* vn; | ||
VarnodeLocSet::const_iterator iter; | ||
|
||
for (iter = data.beginLoc(); iter != data.endLoc(); ++iter) { | ||
vn = *iter; | ||
if (vn->isAnnotation()) continue; | ||
if ((!vn->isWritten()) && (vn->hasNoDescend())) continue; | ||
ct = vn->getLocalType(); | ||
bool bBegin = false; | ||
if (iter == data.beginLoc()) bBegin = true; else iter--; | ||
data.segmentizeFarPtr(ct, vn->isTypeLock(), vn, true); | ||
if (bBegin) iter = data.beginLoc(); else iter++; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Iterator manipulation is unsafe and may skip / duplicate elements
Inside the newly-added initialisation block you mutate the iter
variable inside the for-loop that also increments it.
Because VarnodeLocSet
is an ordered container, decrementing and re-incrementing the iterator (iter-- / iter++
) produces undefined behaviour and can lead to:
- Infinite loops when
iter == begin
(--iter
is UB). - Skipped elements (the
++iter
in the for-header still executes). - Re-visiting the same element twice.
A safer pattern is to take a copy of the container up-front (or cache std::next(iter)
before calling segmentizeFarPtr
) and never touch the running iterator.
- for (iter = data.beginLoc(); iter != data.endLoc(); ++iter) {
- vn = *iter;
- ...
- bool bBegin = false;
- if (iter == data.beginLoc()) bBegin = true; else iter--;
- data.segmentizeFarPtr(ct, vn->isTypeLock(), vn, true);
- if (bBegin) iter = data.beginLoc(); else iter++;
- }
+ std::vector<Varnode*> snapshot;
+ snapshot.reserve(std::distance(data.beginLoc(), data.endLoc()));
+
+ for (auto it = data.beginLoc(); it != data.endLoc(); ++it) {
+ Varnode* vn = *it;
+ if (vn->isAnnotation()) continue;
+ if (!vn->isWritten() && vn->hasNoDescend()) continue;
+ snapshot.push_back(vn);
+ }
+
+ for (Varnode* vn : snapshot) {
+ data.segmentizeFarPtr(vn->getLocalType(), vn->isTypeLock(), vn, true);
+ }
This avoids iterator invalidation and keeps the logic simple and deterministic.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if (localcount == 0) { | |
Datatype* ct; | |
Varnode* vn; | |
VarnodeLocSet::const_iterator iter; | |
for (iter = data.beginLoc(); iter != data.endLoc(); ++iter) { | |
vn = *iter; | |
if (vn->isAnnotation()) continue; | |
if ((!vn->isWritten()) && (vn->hasNoDescend())) continue; | |
ct = vn->getLocalType(); | |
bool bBegin = false; | |
if (iter == data.beginLoc()) bBegin = true; else iter--; | |
data.segmentizeFarPtr(ct, vn->isTypeLock(), vn, true); | |
if (bBegin) iter = data.beginLoc(); else iter++; | |
} | |
} | |
if (localcount == 0) { | |
Datatype* ct; | |
Varnode* vn; | |
VarnodeLocSet::const_iterator iter; | |
// Build a stable snapshot of the Varnodes we need to process | |
std::vector<Varnode*> snapshot; | |
snapshot.reserve(std::distance(data.beginLoc(), data.endLoc())); | |
for (auto it = data.beginLoc(); it != data.endLoc(); ++it) { | |
Varnode* vn = *it; | |
if (vn->isAnnotation()) continue; | |
if (!vn->isWritten() && vn->hasNoDescend()) continue; | |
snapshot.push_back(vn); | |
} | |
// Now safely iterate the snapshot without touching the original iterator | |
for (Varnode* vn : snapshot) { | |
data.segmentizeFarPtr(vn->getLocalType(), vn->isTypeLock(), vn, true); | |
} | |
} |
This will make proper conversion far pointers to near pointers which are bound as far pointers in type parameters to functions for example which is otherwise not detected or properly translated.
The casting time is a too late and not ideal time for this code. At function stack binding time is perfect as everything is present to detect it. However it is tested and working and reasonably generic using the default segment e.g. global RAM for its basis, checking the segment op, checking matching sizes, and finally that a near pointer already is present which can cause discarding of the segment and this awkward CONCAT structure that gets emitted in code.
CALLOTHER or (not SEGMENTOP which needs the appropriate resolver to be chosen and in such a generic case analysis needed) is better but needs to be processed earlier or CONCAT just becomes SEGMENT in the code. Otherwise this direct simplification (which ruleaction segment reducer should deal with) also works.
Sequences like this are now fixed:
PUSH SS
PUSH AX
Fixes other half of NationalSecurityAgency#817
This fixes a great deal of cases and is one of the greatest output quality improvements in segmented code to date. This is the rigorous and serious attempt at a generalized fix which appears to now work correctly and is versatile.
Summary by CodeRabbit