Skip to content

[lldb] Fix object format of some mach-o files by using vendor info in getDefaultFormat() #143633

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

royitaqi
Copy link
Contributor

@royitaqi royitaqi commented Jun 10, 2025

Context: See previous attempt #142704

TL;DR: If a Mach-O file (including those yaml data in lldb unit tests) doesn't have load commands like LC_BUILD_VERSION or LC_VERSION_MIN_*, they are mistakenly reported as ELF (as in their Triple) when they are parsed by ObjectFileMachO. If the load commands existed, they would have caused ObjectFileMachO to set the correct OS name into the Triple here and here, which would have led to a correct default MachO object format. Previously, #142704 tries to fix this by setting the correct object format in ObjectFileMachO::GetAllArchSpecs. However, it seems such explicit assignment cause the Triple to report "-macho" in its string form, causing change in UI and a test failure which depends on the UI.

So we are dealing with the following constraints:

  1. The UI cannot be changed. IIUC, this means the fix cannot set the object format explicitly. I.e. the fix has to be in Triple's getDefaultFormat(), changing the returned default format based on the Triple's other fields.
  2. In getDefaultFormat(), at least in the case that I tested (see added unit test), the OS is not set.
  3. The Vendor is set to Apple. This is because ObjectFileMachO::GetAllArchSpecs() calls ArchSpec::SetArchitecture() (code), which sets the vendor (code).
  4. Since the above code will always set the arch type to eArchTypeMachO, which will always lead to the vendor to be set as Apple, it seems to be a consistent way to detect mach-o object format.

This patch doesn't change the main logic flow of getDefaultFormat() - it still first look at CPU arch, then OS. The change is that, when we look at OS and it's UnknownOS, then we further look at vendor and return MachO if the vendor is Apple.


An alternative approach is to:

  1. Claim that LLDB don't support Mach-O files which don't have the said version load commands (i.e. undetermined behavior).
  2. In all LLDB tests which uses Mach-O, make sure they have version load commands.

@royitaqi royitaqi requested a review from JDevlieghere as a code owner June 10, 2025 23:49
@royitaqi royitaqi requested review from dmpots and removed request for JDevlieghere June 10, 2025 23:49
@llvmbot llvmbot added the lldb label Jun 10, 2025
@llvmbot
Copy link
Member

llvmbot commented Jun 10, 2025

@llvm/pr-subscribers-lldb

Author: None (royitaqi)

Changes

Context: See previous attempt #142704

TL;DR: Some mach-o files (including some of those in lldb tests) are mistakenly reported as ELF (as in their Triple) by ObjectFileMachO. The reason is that those files don't have load commands like LC_BUILD_VERSION or LC_VERSION_MIN_* (which would cause the Triple code to set the correct MachO object format). Previously, #142704 tries to fix this by setting the correct object format ObjectFileMachO::GetAllArchSpecs. However, it seems such explicit assignment cause the Triple to report "-macho" in its string form, causing change in UI and a test failure which depends on the UI.

So we are dealing with the following constraints:

  1. The UI cannot be changed. IIUC, this means the fix cannot set the object format explicitly. I.e. the fix has to be in Triple's getDefaultFormat(), changing the returned default format based on the Triple's other fields.
  2. In getDefaultFormat(), at least in the case that I tested (see added unit test), the OS is not set.
  3. The Vendor is set to Apple. This is because ObjectFileMachO::GetAllArchSpecs()callsArchSpec::SetArchitecture()` (code), which sets the vendor (code).
  4. Since the above code will always set the arch type to eArchTypeMachO, which will always lead to the vendor to be set as Apple, it seems to be a consistent way to detect mach-o object format.

Full diff: https://github.com/llvm/llvm-project/pull/143633.diff

2 Files Affected:

  • (modified) lldb/unittests/ObjectFile/MachO/TestObjectFileMachO.cpp (+55)
  • (modified) llvm/lib/TargetParser/Triple.cpp (+3)
diff --git a/lldb/unittests/ObjectFile/MachO/TestObjectFileMachO.cpp b/lldb/unittests/ObjectFile/MachO/TestObjectFileMachO.cpp
index 0ef2d0b85fd36..71ff866abb352 100644
--- a/lldb/unittests/ObjectFile/MachO/TestObjectFileMachO.cpp
+++ b/lldb/unittests/ObjectFile/MachO/TestObjectFileMachO.cpp
@@ -94,4 +94,59 @@ TEST_F(ObjectFileMachOTest, IndirectSymbolsInTheSharedCache) {
   for (size_t i = 0; i < 10; i++)
     OF->ParseSymtab(symtab);
 }
+
+TEST_F(ObjectFileMachOTest, ObjectFormatWithoutVersionLoadCommand) {
+  // A Mach-O file without the load command LC_BUILD_VERSION.
+  const char *yamldata = R"(
+--- !mach-o
+FileHeader:
+  magic:           0xFEEDFACF
+  cputype:         0x0100000C
+  cpusubtype:      0x00000000
+  filetype:        0x00000001
+  ncmds:           1
+  sizeofcmds:      152
+  flags:           0x00002000
+  reserved:        0x00000000
+LoadCommands:
+  - cmd:             LC_SEGMENT_64
+    cmdsize:         152
+    segname:         __TEXT
+    vmaddr:          0
+    vmsize:          4
+    fileoff:         184
+    filesize:        4
+    maxprot:         7
+    initprot:        7
+    nsects:          1
+    flags:           0
+    Sections:
+      - sectname:        __text
+        segname:         __TEXT
+        addr:            0x0000000000000000
+        content:         'AABBCCDD'
+        size:            4
+        offset:          184
+        align:           0
+        reloff:          0x00000000
+        nreloc:          0
+        flags:           0x80000400
+        reserved1:       0x00000000
+        reserved2:       0x00000000
+        reserved3:       0x00000000
+...
+)";
+
+  // Perform setup.
+  llvm::Expected<TestFile> file = TestFile::fromYaml(yamldata);
+  EXPECT_THAT_EXPECTED(file, llvm::Succeeded());
+  auto module_sp = std::make_shared<Module>(file->moduleSpec());
+  ASSERT_NE(module_sp, nullptr);
+  auto object_file = module_sp->GetObjectFile();
+  ASSERT_NE(object_file, nullptr);
+
+  // Verify that the object file is recognized as Mach-O.
+  ASSERT_EQ(object_file->GetArchitecture().GetTriple().getObjectFormat(),
+            llvm::Triple::MachO);
+}
 #endif
diff --git a/llvm/lib/TargetParser/Triple.cpp b/llvm/lib/TargetParser/Triple.cpp
index 6a559ff023caa..19ccd44322314 100644
--- a/llvm/lib/TargetParser/Triple.cpp
+++ b/llvm/lib/TargetParser/Triple.cpp
@@ -933,6 +933,9 @@ static Triple::ObjectFormatType getDefaultFormat(const Triple &T) {
     case Triple::Win32:
     case Triple::UEFI:
       return Triple::COFF;
+    case Triple::UnknownOS:
+      return T.getVendor() == Triple::Apple ? Triple::MachO : Triple::ELF;
+      // Intentional leak into the default case for additional logic.
     default:
       return T.isOSDarwin() ? Triple::MachO : Triple::ELF;
     }

@@ -933,6 +933,9 @@ static Triple::ObjectFormatType getDefaultFormat(const Triple &T) {
case Triple::Win32:
case Triple::UEFI:
return Triple::COFF;
case Triple::UnknownOS:
return T.getVendor() == Triple::Apple ? Triple::MachO : Triple::ELF;
// Intentional leak into the default case for additional logic.
default:
return T.isOSDarwin() ? Triple::MachO : Triple::ELF;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we fold the above check into this line?

Suggested change
return T.isOSDarwin() ? Triple::MachO : Triple::ELF;
return T.isOSDarwin() || (T.getVendor() == Triple::Apple) ? Triple::MachO : Triple::ELF;

Or do we really only want it when the OS is unknown?

Copy link
Contributor Author

@royitaqi royitaqi Jun 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off the top of my head: For the sake of future proof, e.g. say Apple ships their own Linux, I assume their Linux will use ELF format, and so for case where "vendor == Apple && os == Linux", we probably want to return ELF.

WDYT?

(BTW the command of "Intentional leak" is outdated. Will remove in next update.)

@labath
Copy link
Collaborator

labath commented Jun 11, 2025

I don't have an opinion on the patch, but I want to point out that, despite it being motivated by lldb, this is really an change to one of the foundational llvm classes, so you should get someone from the llvm side to review this and also add an (llvm) test to demonstrate/test the behavior you expect.

@dmpots
Copy link
Contributor

dmpots commented Jun 11, 2025

this is really an change to one of the foundational llvm classes, so you should get someone from the llvm side to review this

Tagging a few people involved in the last PR (#75469) that modified this code path: @arsenm, @MaskRay, @lhames, @dsandersllvm, @rudkx

and also add an (llvm) test to demonstrate/test the behavior you expect.

Great call out. Looks like this would be a good place to add a test:
https://github.com/llvm/llvm-project/blob/main/llvm/unittests/TargetParser/TripleTest.cpp

Copy link
Collaborator

@clayborg clayborg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems ok, but we should get some ok from the owners or contributors to the llvm/lib/TargetParser/Triple.cpp file.

@clayborg
Copy link
Collaborator

The reason is that those files don't have load commands like LC_BUILD_VERSION or LC_VERSION_MIN_*

What kind of files don't have this LC_VERSION_MIN_* set? That seems like the bug we should be fixing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants