Skip to content

Reduce far pointers to near in segmented code #9

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 29 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
133c1cf
Reduce far pointers to near in segmented code
GregoryMorse Jul 27, 2019
019daaa
Update coreaction.cc
GregoryMorse Jul 27, 2019
1d975bb
Update coreaction.cc
GregoryMorse Jul 27, 2019
40ae483
Update coreaction.cc
GregoryMorse Aug 3, 2019
b56d0bb
Update coreaction.cc
GregoryMorse Aug 4, 2019
4aea5d6
Update funcdata.hh
GregoryMorse Aug 4, 2019
ed765ca
Update funcdata_varnode.cc
GregoryMorse Aug 4, 2019
3298e4b
Update funcdata_varnode.cc
GregoryMorse Aug 4, 2019
c92b7ad
Update funcdata.hh
GregoryMorse Aug 4, 2019
df639b5
Update coreaction.cc
GregoryMorse Aug 4, 2019
a018e19
Update funcdata_varnode.cc
GregoryMorse Aug 4, 2019
7318b17
Update coreaction.cc
GregoryMorse Aug 4, 2019
97a372d
Update funcdata.hh
GregoryMorse Aug 5, 2019
d72a2ee
Update funcdata_varnode.cc
GregoryMorse Aug 5, 2019
0288c7b
Update coreaction.cc
GregoryMorse Aug 5, 2019
e1245cb
Update coreaction.cc
GregoryMorse Aug 5, 2019
deb5261
Update coreaction.cc
GregoryMorse Aug 5, 2019
3f039d9
Update coreaction.cc
GregoryMorse Aug 5, 2019
2775aa1
Update funcdata.hh
GregoryMorse Aug 6, 2019
2e4cc70
Update funcdata.cc
GregoryMorse Aug 6, 2019
cc7e98b
Update coreaction.cc
GregoryMorse Aug 6, 2019
6a1b8ce
Update funcdata.hh
GregoryMorse Aug 6, 2019
d7ab5f6
Update funcdata.cc
GregoryMorse Aug 6, 2019
72923b9
Update coreaction.cc
GregoryMorse Aug 6, 2019
1dde9ec
Update funcdata.hh
GregoryMorse Aug 6, 2019
f0eb342
Update funcdata.cc
GregoryMorse Aug 6, 2019
fb2c20f
Update coreaction.cc
GregoryMorse Aug 6, 2019
3934d71
Update coreaction.cc
GregoryMorse Aug 6, 2019
26d65c8
Update coreaction.cc
GregoryMorse Aug 6, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 19 additions & 0 deletions Ghidra/Features/Decompiler/src/decompile/cpp/coreaction.cc
Original file line number Diff line number Diff line change
Expand Up @@ -1223,6 +1223,7 @@ void ActionFuncLink::funcLinkInput(FuncCallSpecs *fc,Funcdata &data)
}
else
data.opInsertInput(op,data.newVarnode(param->getSize(),param->getAddress()),op->numInput());
data.segmentizeFarPtr(param->getType(), param->isTypeLocked(), op->getIn(op->numInput() - 1), false);
}
}
if (spacebase != (AddrSpace *)0) { // If we need it, create the stackplaceholder
Expand Down Expand Up @@ -1251,6 +1252,7 @@ void ActionFuncLink::funcLinkOutput(FuncCallSpecs *fc,Funcdata &data)
int4 sz = outparam->getSize();
Address addr = outparam->getAddress();
data.newVarnodeOut(sz,addr,fc->getOp());
data.segmentizeFarPtr(outparam->getType(), outparam->isTypeLocked(), fc->getOp()->getOut(), false);
VarnodeData vdata;
OpCode res = fc->assumedOutputExtension(addr,sz,vdata);
if (res == CPUI_PIECE) { // Pick an extension based on type
Expand Down Expand Up @@ -4282,6 +4284,23 @@ int4 ActionInferTypes::apply(Funcdata &data)
}
return 0;
}
if (localcount == 0) {
Datatype* ct;
Varnode* vn;
VarnodeLocSet::const_iterator iter;

for (iter = data.beginLoc(); iter != data.endLoc(); ++iter) {
vn = *iter;
if (vn->isAnnotation()) continue;
if ((!vn->isWritten()) && (vn->hasNoDescend())) continue;
ct = vn->getLocalType();
bool bBegin = false;
if (iter == data.beginLoc()) bBegin = true; else iter--;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug risk: In the ActionInferTypes::apply method, there's a potentially dangerous iterator manipulation with if (iter == data.beginLoc()) bBegin = true; else iter--;. This pattern of decrementing and incrementing iterators can lead to subtle bugs. Consider refactoring to a clearer approach.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Readability: The variable name bBegin uses Hungarian notation which is inconsistent with the rest of the codebase. Consider renaming to something like isAtBeginning for clarity and consistency.

data.segmentizeFarPtr(ct, vn->isTypeLock(), vn, true);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimization: The iterator manipulation in the ActionInferTypes::apply method seems unnecessarily complex. Consider refactoring to avoid the need to decrement and increment iterators, which could improve both readability and performance.

if (bBegin) iter = data.beginLoc(); else iter++;
}
}
Comment on lines +4287 to +4302
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

⚠️ Potential issue

Iterator manipulation is unsafe and may skip / duplicate elements

Inside the newly-added initialisation block you mutate the iter variable inside the for-loop that also increments it.
Because VarnodeLocSet is an ordered container, decrementing and re-incrementing the iterator (iter-- / iter++) produces undefined behaviour and can lead to:

  • Infinite loops when iter == begin (--iter is UB).
  • Skipped elements (the ++iter in the for-header still executes).
  • Re-visiting the same element twice.

A safer pattern is to take a copy of the container up-front (or cache std::next(iter) before calling segmentizeFarPtr) and never touch the running iterator.

-  for (iter = data.beginLoc(); iter != data.endLoc(); ++iter) {
-      vn = *iter;
-      ...
-      bool bBegin = false;
-      if (iter == data.beginLoc()) bBegin = true; else iter--;
-      data.segmentizeFarPtr(ct, vn->isTypeLock(), vn, true);
-      if (bBegin) iter = data.beginLoc(); else iter++;
-  }
+  std::vector<Varnode*> snapshot;
+  snapshot.reserve(std::distance(data.beginLoc(), data.endLoc()));
+
+  for (auto it = data.beginLoc(); it != data.endLoc(); ++it) {
+      Varnode* vn = *it;
+      if (vn->isAnnotation()) continue;
+      if (!vn->isWritten() && vn->hasNoDescend()) continue;
+      snapshot.push_back(vn);
+  }
+
+  for (Varnode* vn : snapshot) {
+      data.segmentizeFarPtr(vn->getLocalType(), vn->isTypeLock(), vn, true);
+  }

This avoids iterator invalidation and keeps the logic simple and deterministic.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (localcount == 0) {
Datatype* ct;
Varnode* vn;
VarnodeLocSet::const_iterator iter;
for (iter = data.beginLoc(); iter != data.endLoc(); ++iter) {
vn = *iter;
if (vn->isAnnotation()) continue;
if ((!vn->isWritten()) && (vn->hasNoDescend())) continue;
ct = vn->getLocalType();
bool bBegin = false;
if (iter == data.beginLoc()) bBegin = true; else iter--;
data.segmentizeFarPtr(ct, vn->isTypeLock(), vn, true);
if (bBegin) iter = data.beginLoc(); else iter++;
}
}
if (localcount == 0) {
Datatype* ct;
Varnode* vn;
VarnodeLocSet::const_iterator iter;
// Build a stable snapshot of the Varnodes we need to process
std::vector<Varnode*> snapshot;
snapshot.reserve(std::distance(data.beginLoc(), data.endLoc()));
for (auto it = data.beginLoc(); it != data.endLoc(); ++it) {
Varnode* vn = *it;
if (vn->isAnnotation()) continue;
if (!vn->isWritten() && vn->hasNoDescend()) continue;
snapshot.push_back(vn);
}
// Now safely iterate the snapshot without touching the original iterator
for (Varnode* vn : snapshot) {
data.segmentizeFarPtr(vn->getLocalType(), vn->isTypeLock(), vn, true);
}
}


buildLocaltypes(data); // Set up initial types (based on local info)
for(iter=data.beginLoc();iter!=data.endLoc();++iter) {
vn = *iter;
Expand Down
65 changes: 65 additions & 0 deletions Ghidra/Features/Decompiler/src/decompile/cpp/funcdata.cc
Original file line number Diff line number Diff line change
Expand Up @@ -343,6 +343,71 @@ void Funcdata::spacebaseConstant(PcodeOp *op,int4 slot,SymbolEntry *entry,const
opSetInput(op,outvn,slot);
}

void Funcdata::segmentizeFarPtr(Datatype* ct, bool locked, Varnode* vn, bool segmentize)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: This new method segmentizeFarPtr is being called in multiple places, but it's not clear if there's any error handling or logging if the segmentation fails. Consider adding error handling or at least logging when segmentation isn't applicable.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Consider adding a detailed comment block above this new method explaining its purpose and how it handles far pointers. This would help future maintainers understand the intent and behavior of this complex operation.

{
SegmentOp* segdef = (SegmentOp*)0;
if ((ct->getMetatype() == TYPE_PTR ||
ct->getMetatype() == TYPE_CODE) && locked) {
Comment on lines +348 to +350

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

It's crucial to check if ct is null before dereferencing it. If ct is null, ct->getMetatype() will cause a crash. Can you add a null check here?

Suggested change
SegmentOp* segdef = (SegmentOp*)0;
if ((ct->getMetatype() == TYPE_PTR ||
ct->getMetatype() == TYPE_CODE) && locked) {
SegmentOp* segdef = (SegmentOp*)0;
if (ct == nullptr) {
return; // Or handle the null case appropriately
}
if ((ct->getMetatype() == TYPE_PTR ||

Architecture* glb = getArch();
AddrSpace* rspc = glb->getDefaultSpace();
bool arenearpointers = glb->hasNearPointers(rspc);
if (arenearpointers && rspc->getAddrSize() == vn->getSize())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential bug: The condition if (arenearpointers && rspc->getAddrSize() == vn->getSize()) might be inverted. If you're checking for far pointers, shouldn't you be looking for cases where !arenearpointers or where the sizes don't match?

segdef = glb->userops.getSegmentOp(rspc->getIndex());
}
//need to go through every single time this varnode is written to
Comment on lines +348 to +357
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Null-pointer check & type-safety missing

ct is dereferenced without validation; if a caller ever passes a null Datatype*, the decompiler will crash.
Similarly, you cast the result of glb->userops.getSegmentOp() to SegmentOp * without verifying it is non-null.

- SegmentOp* segdef = (SegmentOp*)0;
- if ((ct->getMetatype() == TYPE_PTR ||
+ if (ct == nullptr)            // defensive guard
+   return;
+
+ SegmentOp* segdef = nullptr;
+ if ((ct->getMetatype() == TYPE_PTR ||
       ct->getMetatype() == TYPE_CODE) && locked) {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
SegmentOp* segdef = (SegmentOp*)0;
if ((ct->getMetatype() == TYPE_PTR ||
ct->getMetatype() == TYPE_CODE) && locked) {
Architecture* glb = getArch();
AddrSpace* rspc = glb->getDefaultSpace();
bool arenearpointers = glb->hasNearPointers(rspc);
if (arenearpointers && rspc->getAddrSize() == vn->getSize())
segdef = glb->userops.getSegmentOp(rspc->getIndex());
}
//need to go through every single time this varnode is written to
if (ct == nullptr) // defensive guard
return;
SegmentOp* segdef = nullptr;
if ((ct->getMetatype() == TYPE_PTR ||
ct->getMetatype() == TYPE_CODE) && locked) {
Architecture* glb = getArch();
AddrSpace* rspc = glb->getDefaultSpace();
bool arenearpointers = glb->hasNearPointers(rspc);
if (arenearpointers && rspc->getAddrSize() == vn->getSize())
segdef = glb->userops.getSegmentOp(rspc->getIndex());
}
//need to go through every single time this varnode is written to

if (segdef != (SegmentOp*)0 && vn->getDef() != (PcodeOp*)0 && !vn->isPtrCheck() &&

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style suggestion: Consider breaking this complex conditional into multiple lines or separate checks for better readability. This long condition is difficult to parse at a glance.

vn->getDef()->code() != CPUI_SEGMENTOP && vn->getDef()->code() != CPUI_CALLOTHER) {
PcodeOp* segroot = vn->getDef();
Varnode* outvn = newUnique(vn->getSize());
opSetOutput(segroot, outvn);
PcodeOp* piece1 = newOp(2, segroot->getAddr());
opSetOpcode(piece1, CPUI_SUBPIECE);
Varnode* vn1 = newUniqueOut(segdef->getBaseSize(), piece1);
opSetInput(piece1, outvn, 0);
opSetInput(piece1, newConstant(outvn->getSize(), segdef->getInnerSize()), 1);
opInsertAfter(piece1, segroot);
PcodeOp* piece2 = newOp(2, segroot->getAddr());
opSetOpcode(piece2, CPUI_SUBPIECE);
Varnode* vn2 = newUniqueOut(segdef->getInnerSize(), piece2);
opSetInput(piece2, outvn, 0);
opSetInput(piece2, newConstant(outvn->getSize(), 0), 1);
opInsertAfter(piece2, piece1);
Comment on lines +364 to +374
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

SUBPIECE constants use pointer-size; should use arch default (usually 4/addrSize 1)

The second input of CPUI_SUBPIECE is an offset constant whose value is important, not its size.
Creating it with outvn->getSize() inflates the varnode unnecessarily (e.g., 8-byte constant on 64-bit).

- opSetInput(piece1, newConstant(outvn->getSize(), segdef->getInnerSize()), 1);
+ opSetInput(piece1,
+            newConstant(glb->getDefaultSize(), segdef->getInnerSize()), 1);
...
- opSetInput(piece2, newConstant(outvn->getSize(), 0), 1);
+ opSetInput(piece2,
+            newConstant(glb->getDefaultSize(), 0), 1);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
opSetOpcode(piece1, CPUI_SUBPIECE);
Varnode* vn1 = newUniqueOut(segdef->getBaseSize(), piece1);
opSetInput(piece1, outvn, 0);
opSetInput(piece1, newConstant(outvn->getSize(), segdef->getInnerSize()), 1);
opInsertAfter(piece1, segroot);
PcodeOp* piece2 = newOp(2, segroot->getAddr());
opSetOpcode(piece2, CPUI_SUBPIECE);
Varnode* vn2 = newUniqueOut(segdef->getInnerSize(), piece2);
opSetInput(piece2, outvn, 0);
opSetInput(piece2, newConstant(outvn->getSize(), 0), 1);
opInsertAfter(piece2, piece1);
opSetOpcode(piece1, CPUI_SUBPIECE);
Varnode* vn1 = newUniqueOut(segdef->getBaseSize(), piece1);
opSetInput(piece1, outvn, 0);
opSetInput(piece1,
newConstant(glb->getDefaultSize(), segdef->getInnerSize()), 1);
opInsertAfter(piece1, segroot);
PcodeOp* piece2 = newOp(2, segroot->getAddr());
opSetOpcode(piece2, CPUI_SUBPIECE);
Varnode* vn2 = newUniqueOut(segdef->getInnerSize(), piece2);
opSetInput(piece2, outvn, 0);
opSetInput(piece2,
newConstant(glb->getDefaultSize(), 0), 1);
opInsertAfter(piece2, piece1);

PcodeOp* newop = newOp(3, segroot->getAddr());
opSetOutput(newop, vn);
opSetOpcode(newop, CPUI_CALLOTHER);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improvement: The comment //need to check size of segment suggests there should be a validation check here, but I don't see any actual validation being performed. Consider adding the validation logic or clarifying the comment.

//endianness could matter here - e.g. CALLF addr16 on x86 uses segment(*:2 (ptr+2),*:2 ptr)
opSetInput(newop, newConstant(4, segdef->getIndex()), 0);
opSetInput(newop, vn1, 1); //need to check size of segment
opSetInput(newop, vn2, 2); //need to check size of offset
opInsertAfter(newop, piece2);
vn->setPtrCheck();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential issue: The comment //FuncLinkInput/FuncLinkOutput come before segmentize... seems to be describing implementation details that might be better documented in a method-level comment. Consider moving this explanation to the method documentation.

outvn->setPtrCheck();

if (segmentize) {//FuncLinkInput/FuncLinkOutput come before segmentize, the rest require at least Spacebase for type locks
segroot = vn->getDef();
vector<Varnode*> bindlist;
bindlist.push_back((Varnode*)0);
bindlist.push_back((Varnode*)0);
if (!segdef->unify(*this, segroot, bindlist)) {
ostringstream err;
err << "Segment op in wrong form at ";
segroot->getAddr().printRaw(err);
throw LowlevelError(err.str());
}

if (segdef->getNumVariableTerms() == 1)
bindlist[1] = newConstant(4, 0);
// Redefine the op as a segmentop
opSetOpcode(segroot, CPUI_SEGMENTOP);
opSetInput(segroot, newVarnodeSpace(segdef->getSpace()), 0);
opSetInput(segroot, bindlist[1], 1);
opSetInput(segroot, bindlist[0], 2);
for (int4 j = segroot->numInput() - 1; j > 2; --j) // Remove anything else
opRemoveInput(segroot, j);
}
}
}
Comment on lines +346 to +409

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This function is quite complex and lacks detailed comments. Adding comments to explain the purpose and functionality of each code block would greatly improve its readability and maintainability. For example, what are the conditions under which the segmentize block is executed, and what does it accomplish?


void Funcdata::clearCallSpecs(void)

{
Expand Down
1 change: 1 addition & 0 deletions Ghidra/Features/Decompiler/src/decompile/cpp/funcdata.hh
Original file line number Diff line number Diff line change
Expand Up @@ -196,6 +196,7 @@ public:
Varnode *newSpacebasePtr(AddrSpace *id); ///< Construct a new \e spacebase register for a given address space
Varnode *findSpacebaseInput(AddrSpace *id) const;
void spacebaseConstant(PcodeOp *op,int4 slot,SymbolEntry *entry,const Address &rampoint,uintb origval,int4 origsize);
void segmentizeFarPtr(Datatype* ct, bool locked, Varnode* vn, bool segmentize);

/// \brief Get the number of heritage passes performed for the given address space
///
Expand Down