Skip to content

Reduce far pointers to near in segmented code #9

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 29 commits into
base: master
Choose a base branch
from

Conversation

arvi18
Copy link

@arvi18 arvi18 commented Apr 26, 2025

This will make proper conversion far pointers to near pointers which are bound as far pointers in type parameters to functions for example which is otherwise not detected or properly translated.

The casting time is a too late and not ideal time for this code. At function stack binding time is perfect as everything is present to detect it. However it is tested and working and reasonably generic using the default segment e.g. global RAM for its basis, checking the segment op, checking matching sizes, and finally that a near pointer already is present which can cause discarding of the segment and this awkward CONCAT structure that gets emitted in code.

CALLOTHER or (not SEGMENTOP which needs the appropriate resolver to be chosen and in such a generic case analysis needed) is better but needs to be processed earlier or CONCAT just becomes SEGMENT in the code. Otherwise this direct simplification (which ruleaction segment reducer should deal with) also works.

Sequences like this are now fixed:
PUSH SS
PUSH AX

Fixes other half of NationalSecurityAgency#817

This fixes a great deal of cases and is one of the greatest output quality improvements in segmented code to date. This is the rigorous and serious attempt at a generalized fix which appears to now work correctly and is versatile.

Summary by CodeRabbit

  • New Features
    • Improved support for far pointer segment information during decompilation, enhancing handling of function parameters, return values, and local variables that use far pointers.
  • Bug Fixes
    • Ensured accurate propagation and adjustment of segment information for far pointers, reducing potential issues in type inference and pointer representation.

Proposal to convert far pointers to near pointers which are bound as far pointers in type parameters to functions for example which is otherwise not detected or properly translated.

The casting time is a too late and not ideal time for this code.  Possibly at function binding time would be better to detect it.  However it is tested and working and reasonably generic using the default segment e.g. global RAM for its basis, checking the segment op, checking matching sizes, and finally that a near pointer already is present which can cause discarding of the segment and this awkward CONCAT structure that gets emitted in code.

CALLOTHER or even SEGMENTOP is better but needs to be processed earlier or CONCAT just becomes SEGMENT in the code.  Otherwise this direct simplification (which ruleaction segment reducer should deal with) also works.  First proposal only.

Sequences like this are now fixed:
PUSH SS
PUSH AX

Fixes other half of NationalSecurityAgency#817
@arvi18
Copy link
Author

arvi18 commented Apr 26, 2025

The real issue is this fix is too narrow. It needs to be applied everywhere a far data or function pointer is found - output of functions, local variables - especially these 2 cases would continue just dramatically enhancing the output.

I also may have mixed up base and inner calculation - obviously for x86-16, 2 and 2 makes no difference.

@arvi18
Copy link
Author

arvi18 commented Apr 26, 2025

@caheckman PLEASE will/can you advise if this is okay (and correct) to still merge and hasn't been outdated by any subsequent changes you've made?

@arvi18
Copy link
Author

arvi18 commented Apr 26, 2025

@GregoryMorse if you don't mind, I have recently added this fix to my local repo and I'm finding that when an offset portion of a segmented address has been increased by 1 this change losses that detail. Eg, this assembly instruction INC word ptr SS:[BP + pData] orginally produces

   pbVar3 = (byte *)pData;
   pData = (byte *)CONCAT22(pData._2_2_,(byte *)pData + 1);

but after adding becomes

   pbVar3 = (byte *)pData;
   pbVar4 = pData._2_2_;
   pData = pData._2_2_;

I wonder if you'd be willing to help me fix this?


Edit: I think the common line (pbVar3 = (byte *)pData;) is a red herring and can be ignored. I realise now that pbVar3 is used a few instructions later in other assignments. This does not negate the issue though.

Copy link

coderabbitai bot commented Apr 26, 2025

Walkthrough

This change introduces a new method, segmentizeFarPtr, to the Funcdata class and integrates its invocation at key points in the decompiler's type inference and function linking processes. The method is called after inserting input and output Varnodes for function parameters and return values, as well as during the initial phase of type inference for local Varnodes. The segmentizeFarPtr method rewrites relevant Varnodes to represent far pointers using architecture-specific segment operations, ensuring proper handling of segmented pointer representations throughout the decompilation process.

Changes

File(s) Change Summary
Ghidra/Features/Decompiler/src/decompile/cpp/funcdata.hh Added public method declaration: void segmentizeFarPtr(Datatype* ct, bool locked, Varnode* vn, bool segmentize) to Funcdata.
Ghidra/Features/Decompiler/src/decompile/cpp/funcdata.cc Implemented Funcdata::segmentizeFarPtr, which rewrites varnodes to segmented pointer form using architecture-specific operations.
Ghidra/Features/Decompiler/src/decompile/cpp/coreaction.cc Invoked segmentizeFarPtr after adding function input/output varnodes and at the start of type inference for local varnodes.

Sequence Diagram(s)

sequenceDiagram
    participant Decompiler
    participant Funcdata
    participant Varnode
    participant Architecture

    Decompiler->>Funcdata: funcLinkInput/funcLinkOutput/apply()
    Funcdata->>Funcdata: segmentizeFarPtr(type, locked, varnode, segmentize)
    Funcdata->>Architecture: Get user-defined segment op (if needed)
    Funcdata->>Varnode: Rewrite varnode as segmented pointer (if applicable)
    Funcdata-->>Decompiler: Return (segmented pointer varnode or unchanged)
Loading

Poem

In the warren of code, a hop and a leap,
Far pointers now segment, no secrets they keep.
With segmentize magic, the Varnodes align,
Each function and type gets a segment by design.
The decompiler’s dance is now sharper and neat—
Hopping through memory with nimble, quick feet!
🐇✨

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @arvi18, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

This pull request focuses on improving the handling of far pointers in segmented code within the Ghidra decompiler. The primary goal is to ensure proper conversion of far pointers to near pointers, especially when these pointers are used as type parameters in function definitions. This enhancement aims to fix issues where far pointers were not being correctly translated, leading to suboptimal code generation. The changes involve modifying how the decompiler processes function inputs and outputs, as well as adding a mechanism to segmentize far pointers during the type inference stage. The author indicates that this change results in a significant improvement in the quality of generated code for segmented architectures.

Highlights

  • Far Pointer Conversion: The pull request introduces a new segmentizeFarPtr method in Funcdata class to handle the conversion of far pointers to near pointers. This method is called during function linking (input and output parameters) and type inference.
  • Function Input/Output Handling: The ActionFuncLink class is modified to call segmentizeFarPtr for both input and output parameters of functions. This ensures that far pointers are correctly handled when functions are called.
  • Type Inference Enhancement: The ActionInferTypes class is updated to include a loop that iterates through varnodes and calls segmentizeFarPtr on each varnode to ensure that far pointers are correctly handled during type inference.
  • SegmentOp Handling: The changes ensure that segment operators are correctly identified and handled, particularly in cases where near pointers are already present, preventing the discarding of segment information.

Changelog

  • Ghidra/Features/Decompiler/src/decompile/cpp/coreaction.cc
    • Modified ActionFuncLink::funcLinkInput to call data.segmentizeFarPtr after inserting input varnodes.
    • Modified ActionFuncLink::funcLinkOutput to call data.segmentizeFarPtr after creating output varnodes.
    • Modified ActionInferTypes::apply to iterate through varnodes and call data.segmentizeFarPtr during type inference.
  • Ghidra/Features/Decompiler/src/decompile/cpp/funcdata.cc
    • Added a new method Funcdata::segmentizeFarPtr to handle the conversion of far pointers to near pointers.
    • The segmentizeFarPtr method checks if a varnode is a locked pointer or code type and then inserts SUBPIECE and CALLOTHER operations to properly segmentize the pointer if necessary. It also handles the conversion to a SEGMENTOP if segmentize is true.
  • Ghidra/Features/Decompiler/src/decompile/cpp/funcdata.hh
    • Added the declaration of the segmentizeFarPtr method to the Funcdata class.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


A pointer's a guide,
Through memory's vast stride,
Far or near it be,
Correctness, we decree,
In code, where truths reside.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The pull request aims to improve the handling of far pointers in segmented code by converting them to near pointers where appropriate. This should lead to better code generation and improved output quality. The changes seem well-structured and address a significant issue. However, some areas could benefit from additional clarification and refinement.

Summary of Findings

  • Missing null check: In Funcdata::segmentizeFarPtr, the code does not check if ct is null before dereferencing it with ct->getMetatype(). This could lead to a crash if ct is ever null.
  • Potential infinite loop: In ActionInferTypes::apply, the iterator manipulation logic within the for loop could potentially lead to an infinite loop if data.beginLoc() and data.endLoc() do not behave as expected or if the conditions for skipping Varnodes are not correctly handled.
  • Lack of comments: The segmentizeFarPtr function in Funcdata lacks detailed comments explaining the purpose and functionality of each code block, making it difficult to understand the logic behind the pointer conversion process.

Merge Readiness

The pull request introduces significant improvements to segmented code handling. However, the potential null pointer dereference and potential infinite loop must be addressed before merging. Additionally, adding more comments to the segmentizeFarPtr function would greatly improve its maintainability. I am unable to directly approve this pull request, and recommend that another reviewer also reviews this code. At a minimum, the critical and high severity issues should be addressed before merging.

Comment on lines +348 to +350
SegmentOp* segdef = (SegmentOp*)0;
if ((ct->getMetatype() == TYPE_PTR ||
ct->getMetatype() == TYPE_CODE) && locked) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

It's crucial to check if ct is null before dereferencing it. If ct is null, ct->getMetatype() will cause a crash. Can you add a null check here?

Suggested change
SegmentOp* segdef = (SegmentOp*)0;
if ((ct->getMetatype() == TYPE_PTR ||
ct->getMetatype() == TYPE_CODE) && locked) {
SegmentOp* segdef = (SegmentOp*)0;
if (ct == nullptr) {
return; // Or handle the null case appropriately
}
if ((ct->getMetatype() == TYPE_PTR ||

Comment on lines +346 to +409
void Funcdata::segmentizeFarPtr(Datatype* ct, bool locked, Varnode* vn, bool segmentize)
{
SegmentOp* segdef = (SegmentOp*)0;
if ((ct->getMetatype() == TYPE_PTR ||
ct->getMetatype() == TYPE_CODE) && locked) {
Architecture* glb = getArch();
AddrSpace* rspc = glb->getDefaultSpace();
bool arenearpointers = glb->hasNearPointers(rspc);
if (arenearpointers && rspc->getAddrSize() == vn->getSize())
segdef = glb->userops.getSegmentOp(rspc->getIndex());
}
//need to go through every single time this varnode is written to
if (segdef != (SegmentOp*)0 && vn->getDef() != (PcodeOp*)0 && !vn->isPtrCheck() &&
vn->getDef()->code() != CPUI_SEGMENTOP && vn->getDef()->code() != CPUI_CALLOTHER) {
PcodeOp* segroot = vn->getDef();
Varnode* outvn = newUnique(vn->getSize());
opSetOutput(segroot, outvn);
PcodeOp* piece1 = newOp(2, segroot->getAddr());
opSetOpcode(piece1, CPUI_SUBPIECE);
Varnode* vn1 = newUniqueOut(segdef->getBaseSize(), piece1);
opSetInput(piece1, outvn, 0);
opSetInput(piece1, newConstant(outvn->getSize(), segdef->getInnerSize()), 1);
opInsertAfter(piece1, segroot);
PcodeOp* piece2 = newOp(2, segroot->getAddr());
opSetOpcode(piece2, CPUI_SUBPIECE);
Varnode* vn2 = newUniqueOut(segdef->getInnerSize(), piece2);
opSetInput(piece2, outvn, 0);
opSetInput(piece2, newConstant(outvn->getSize(), 0), 1);
opInsertAfter(piece2, piece1);
PcodeOp* newop = newOp(3, segroot->getAddr());
opSetOutput(newop, vn);
opSetOpcode(newop, CPUI_CALLOTHER);
//endianness could matter here - e.g. CALLF addr16 on x86 uses segment(*:2 (ptr+2),*:2 ptr)
opSetInput(newop, newConstant(4, segdef->getIndex()), 0);
opSetInput(newop, vn1, 1); //need to check size of segment
opSetInput(newop, vn2, 2); //need to check size of offset
opInsertAfter(newop, piece2);
vn->setPtrCheck();
outvn->setPtrCheck();

if (segmentize) {//FuncLinkInput/FuncLinkOutput come before segmentize, the rest require at least Spacebase for type locks
segroot = vn->getDef();
vector<Varnode*> bindlist;
bindlist.push_back((Varnode*)0);
bindlist.push_back((Varnode*)0);
if (!segdef->unify(*this, segroot, bindlist)) {
ostringstream err;
err << "Segment op in wrong form at ";
segroot->getAddr().printRaw(err);
throw LowlevelError(err.str());
}

if (segdef->getNumVariableTerms() == 1)
bindlist[1] = newConstant(4, 0);
// Redefine the op as a segmentop
opSetOpcode(segroot, CPUI_SEGMENTOP);
opSetInput(segroot, newVarnodeSpace(segdef->getSpace()), 0);
opSetInput(segroot, bindlist[1], 1);
opSetInput(segroot, bindlist[0], 2);
for (int4 j = segroot->numInput() - 1; j > 2; --j) // Remove anything else
opRemoveInput(segroot, j);
}
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This function is quite complex and lacks detailed comments. Adding comments to explain the purpose and functionality of each code block would greatly improve its readability and maintainability. For example, what are the conditions under which the segmentize block is executed, and what does it accomplish?

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (4)
Ghidra/Features/Decompiler/src/decompile/cpp/funcdata.hh (1)

199-199: Add documentation & guard clause for clarity and robustness

segmentizeFarPtr is a non-trivial public utility, but it appears without any explanatory Doxygen and expects ct to be non-null. A short description of its intent/params/side–effects and an early-return for ct == nullptr would make the interface safer and self-documenting.

Example:

+  /// Transform a locked far pointer `vn` into an architecture-specific SEGMENTOP
+  /// representation (optionally unifying with the segment op template when
+  /// `segmentize == true`).
+  /// \param ct         Datatype associated with \p vn (must be a pointer/code)
+  /// \param locked     True if the type was locked by the prototype
+  /// \param vn         The varnode to be rewritten
+  /// \param segmentize Perform SEGMENTOP unification (false for early-binding use-cases)
+  void segmentizeFarPtr(Datatype *ct,bool locked,Varnode *vn,bool segmentize);
Ghidra/Features/Decompiler/src/decompile/cpp/funcdata.cc (2)

379-381: Hard-coded 4-byte immediate may break on non-32-bit targets

newConstant(4, segdef->getIndex()) assumes the user-op index fits in 4 bytes and that 4 is a legal constant size for every architecture.

Use the default integer size or glb->getDefaultSize() to stay portable:

- opSetInput(newop, newConstant(4, segdef->getIndex()), 0);
+ opSetInput(newop,
+            newConstant(glb->getDefaultSize(), segdef->getIndex()), 0);

398-406: Input list cleanup leaves orphaned constant varnode

After converting the CALLOTHER into CPUI_SEGMENTOP, the original constant index (input 0) is overwritten, but the varnode itself remains live until later dead-code elimination.
Not fatal, but you can remove it immediately for a cleaner IR.

// After opSetInput(...) calls
if (bindlist[0] != nullptr)
    opRemoveInput(segroot, 3);   // remove stale constant (was old input 0)
Ghidra/Features/Decompiler/src/decompile/cpp/coreaction.cc (1)

1224-1227: Null-safety / defensive check missing before segmentizeFarPtr()

param->getType() can legally be nullptr for K&R style or otherwise incomplete prototypes.
segmentizeFarPtr() eventually dereferences the Datatype*, so a guard will avoid a possible crash when the prototype information is incomplete:

-      data.segmentizeFarPtr(param->getType(), param->isTypeLocked(), op->getIn(op->numInput() - 1), false);
+      if (Datatype *pt = param->getType()) {
+        data.segmentizeFarPtr(pt, param->isTypeLocked(),
+                              op->getIn(op->numInput() - 1), false);
+      }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2abfd4e and 26d65c8.

📒 Files selected for processing (3)
  • Ghidra/Features/Decompiler/src/decompile/cpp/coreaction.cc (3 hunks)
  • Ghidra/Features/Decompiler/src/decompile/cpp/funcdata.cc (1 hunks)
  • Ghidra/Features/Decompiler/src/decompile/cpp/funcdata.hh (1 hunks)
🔇 Additional comments (2)
Ghidra/Features/Decompiler/src/decompile/cpp/funcdata.cc (1)

361-368: Potential data-flow inconsistency when rewriting the defining op

segroot’s output is replaced with a fresh outvn, but there is no assertion that
vn == segroot->getOut().
If earlier optimisation already changed the def-use chain, the rewrite could silently corrupt SSA form.

Please add an explicit check:

if (segroot->getOut() != vn)
  throw LowlevelError("segmentizeFarPtr: unexpected varnode/output mismatch");
Ghidra/Features/Decompiler/src/decompile/cpp/coreaction.cc (1)

1253-1256: Output-side invocation looks correct 👍

The same helper is now applied to the return-value Varnode at the right moment (immediately after it is created), so the near/far adjustment happens before any later passes read the node.
No issues spotted here.

Comment on lines +364 to +374
opSetOpcode(piece1, CPUI_SUBPIECE);
Varnode* vn1 = newUniqueOut(segdef->getBaseSize(), piece1);
opSetInput(piece1, outvn, 0);
opSetInput(piece1, newConstant(outvn->getSize(), segdef->getInnerSize()), 1);
opInsertAfter(piece1, segroot);
PcodeOp* piece2 = newOp(2, segroot->getAddr());
opSetOpcode(piece2, CPUI_SUBPIECE);
Varnode* vn2 = newUniqueOut(segdef->getInnerSize(), piece2);
opSetInput(piece2, outvn, 0);
opSetInput(piece2, newConstant(outvn->getSize(), 0), 1);
opInsertAfter(piece2, piece1);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

SUBPIECE constants use pointer-size; should use arch default (usually 4/addrSize 1)

The second input of CPUI_SUBPIECE is an offset constant whose value is important, not its size.
Creating it with outvn->getSize() inflates the varnode unnecessarily (e.g., 8-byte constant on 64-bit).

- opSetInput(piece1, newConstant(outvn->getSize(), segdef->getInnerSize()), 1);
+ opSetInput(piece1,
+            newConstant(glb->getDefaultSize(), segdef->getInnerSize()), 1);
...
- opSetInput(piece2, newConstant(outvn->getSize(), 0), 1);
+ opSetInput(piece2,
+            newConstant(glb->getDefaultSize(), 0), 1);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
opSetOpcode(piece1, CPUI_SUBPIECE);
Varnode* vn1 = newUniqueOut(segdef->getBaseSize(), piece1);
opSetInput(piece1, outvn, 0);
opSetInput(piece1, newConstant(outvn->getSize(), segdef->getInnerSize()), 1);
opInsertAfter(piece1, segroot);
PcodeOp* piece2 = newOp(2, segroot->getAddr());
opSetOpcode(piece2, CPUI_SUBPIECE);
Varnode* vn2 = newUniqueOut(segdef->getInnerSize(), piece2);
opSetInput(piece2, outvn, 0);
opSetInput(piece2, newConstant(outvn->getSize(), 0), 1);
opInsertAfter(piece2, piece1);
opSetOpcode(piece1, CPUI_SUBPIECE);
Varnode* vn1 = newUniqueOut(segdef->getBaseSize(), piece1);
opSetInput(piece1, outvn, 0);
opSetInput(piece1,
newConstant(glb->getDefaultSize(), segdef->getInnerSize()), 1);
opInsertAfter(piece1, segroot);
PcodeOp* piece2 = newOp(2, segroot->getAddr());
opSetOpcode(piece2, CPUI_SUBPIECE);
Varnode* vn2 = newUniqueOut(segdef->getInnerSize(), piece2);
opSetInput(piece2, outvn, 0);
opSetInput(piece2,
newConstant(glb->getDefaultSize(), 0), 1);
opInsertAfter(piece2, piece1);

Comment on lines +348 to +357
SegmentOp* segdef = (SegmentOp*)0;
if ((ct->getMetatype() == TYPE_PTR ||
ct->getMetatype() == TYPE_CODE) && locked) {
Architecture* glb = getArch();
AddrSpace* rspc = glb->getDefaultSpace();
bool arenearpointers = glb->hasNearPointers(rspc);
if (arenearpointers && rspc->getAddrSize() == vn->getSize())
segdef = glb->userops.getSegmentOp(rspc->getIndex());
}
//need to go through every single time this varnode is written to
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Null-pointer check & type-safety missing

ct is dereferenced without validation; if a caller ever passes a null Datatype*, the decompiler will crash.
Similarly, you cast the result of glb->userops.getSegmentOp() to SegmentOp * without verifying it is non-null.

- SegmentOp* segdef = (SegmentOp*)0;
- if ((ct->getMetatype() == TYPE_PTR ||
+ if (ct == nullptr)            // defensive guard
+   return;
+
+ SegmentOp* segdef = nullptr;
+ if ((ct->getMetatype() == TYPE_PTR ||
       ct->getMetatype() == TYPE_CODE) && locked) {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
SegmentOp* segdef = (SegmentOp*)0;
if ((ct->getMetatype() == TYPE_PTR ||
ct->getMetatype() == TYPE_CODE) && locked) {
Architecture* glb = getArch();
AddrSpace* rspc = glb->getDefaultSpace();
bool arenearpointers = glb->hasNearPointers(rspc);
if (arenearpointers && rspc->getAddrSize() == vn->getSize())
segdef = glb->userops.getSegmentOp(rspc->getIndex());
}
//need to go through every single time this varnode is written to
if (ct == nullptr) // defensive guard
return;
SegmentOp* segdef = nullptr;
if ((ct->getMetatype() == TYPE_PTR ||
ct->getMetatype() == TYPE_CODE) && locked) {
Architecture* glb = getArch();
AddrSpace* rspc = glb->getDefaultSpace();
bool arenearpointers = glb->hasNearPointers(rspc);
if (arenearpointers && rspc->getAddrSize() == vn->getSize())
segdef = glb->userops.getSegmentOp(rspc->getIndex());
}
//need to go through every single time this varnode is written to

Comment on lines +4287 to +4302
if (localcount == 0) {
Datatype* ct;
Varnode* vn;
VarnodeLocSet::const_iterator iter;

for (iter = data.beginLoc(); iter != data.endLoc(); ++iter) {
vn = *iter;
if (vn->isAnnotation()) continue;
if ((!vn->isWritten()) && (vn->hasNoDescend())) continue;
ct = vn->getLocalType();
bool bBegin = false;
if (iter == data.beginLoc()) bBegin = true; else iter--;
data.segmentizeFarPtr(ct, vn->isTypeLock(), vn, true);
if (bBegin) iter = data.beginLoc(); else iter++;
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

⚠️ Potential issue

Iterator manipulation is unsafe and may skip / duplicate elements

Inside the newly-added initialisation block you mutate the iter variable inside the for-loop that also increments it.
Because VarnodeLocSet is an ordered container, decrementing and re-incrementing the iterator (iter-- / iter++) produces undefined behaviour and can lead to:

  • Infinite loops when iter == begin (--iter is UB).
  • Skipped elements (the ++iter in the for-header still executes).
  • Re-visiting the same element twice.

A safer pattern is to take a copy of the container up-front (or cache std::next(iter) before calling segmentizeFarPtr) and never touch the running iterator.

-  for (iter = data.beginLoc(); iter != data.endLoc(); ++iter) {
-      vn = *iter;
-      ...
-      bool bBegin = false;
-      if (iter == data.beginLoc()) bBegin = true; else iter--;
-      data.segmentizeFarPtr(ct, vn->isTypeLock(), vn, true);
-      if (bBegin) iter = data.beginLoc(); else iter++;
-  }
+  std::vector<Varnode*> snapshot;
+  snapshot.reserve(std::distance(data.beginLoc(), data.endLoc()));
+
+  for (auto it = data.beginLoc(); it != data.endLoc(); ++it) {
+      Varnode* vn = *it;
+      if (vn->isAnnotation()) continue;
+      if (!vn->isWritten() && vn->hasNoDescend()) continue;
+      snapshot.push_back(vn);
+  }
+
+  for (Varnode* vn : snapshot) {
+      data.segmentizeFarPtr(vn->getLocalType(), vn->isTypeLock(), vn, true);
+  }

This avoids iterator invalidation and keeps the logic simple and deterministic.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (localcount == 0) {
Datatype* ct;
Varnode* vn;
VarnodeLocSet::const_iterator iter;
for (iter = data.beginLoc(); iter != data.endLoc(); ++iter) {
vn = *iter;
if (vn->isAnnotation()) continue;
if ((!vn->isWritten()) && (vn->hasNoDescend())) continue;
ct = vn->getLocalType();
bool bBegin = false;
if (iter == data.beginLoc()) bBegin = true; else iter--;
data.segmentizeFarPtr(ct, vn->isTypeLock(), vn, true);
if (bBegin) iter = data.beginLoc(); else iter++;
}
}
if (localcount == 0) {
Datatype* ct;
Varnode* vn;
VarnodeLocSet::const_iterator iter;
// Build a stable snapshot of the Varnodes we need to process
std::vector<Varnode*> snapshot;
snapshot.reserve(std::distance(data.beginLoc(), data.endLoc()));
for (auto it = data.beginLoc(); it != data.endLoc(); ++it) {
Varnode* vn = *it;
if (vn->isAnnotation()) continue;
if (!vn->isWritten() && vn->hasNoDescend()) continue;
snapshot.push_back(vn);
}
// Now safely iterate the snapshot without touching the original iterator
for (Varnode* vn : snapshot) {
data.segmentizeFarPtr(vn->getLocalType(), vn->isTypeLock(), vn, true);
}
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants