Skip to content

[MC] Use StringTable to reduce dynamic relocations #144202

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

rnk
Copy link
Collaborator

@rnk rnk commented Jun 14, 2025

Dynamic relocations are expensive on ELF/Linux platforms because they are applied in userspace on process startup. Therefore, it is worth optimizing them to make PIE and PIC dylib builds faster. In +asserts builds (non-NDEBUG), nikic identified these schedule class name string pointers as the leading source of dynamic relocations. [1]

This change uses llvm::StringTable and the StringToOffsetTable TableGen helper to turn the string pointers into 32-bit offsets into a separate character array.

The number of dynamic relocations is reduced by ~60%:
❯ llvm-readelf --dyn-relocations lib/libLLVM.so | wc -l
381376 # before
155156 # after

The test suite time is modestly affected, but I'm running on a shared noisy workstation VM with a ton of cores:
https://gist.github.com/rnk/f38882c2fe2e63d0eb58b8fffeab69de
Testing Time: 100.88s # before
Testing Time: 78.50s. # after
Testing Time: 96.25s. # before again

I haven't used any fancy hyperfine/denoising tools, but I think the result is clearly visible and IMO we should just ship it.

[1] https://gist.github.com/nikic/554f0a544ca15d5219788f1030f78c5a

rnk added 2 commits June 13, 2025 20:05
Linux PIC/PIE process startup is more expensive than other OSs because
the kernel doesn't collude to share a single copy of the pre-relocated
image pages. Instead, ld.so has to apply relative relocations at
startup.

According to @nikic , these tables were the leading source of dynamic
relocations in +asserts (non-NDEBUG) builds.

TODO: Data, check-llvm before & after
@llvmbot
Copy link
Member

llvmbot commented Jun 14, 2025

@llvm/pr-subscribers-tools-llvm-mca
@llvm/pr-subscribers-tablegen

@llvm/pr-subscribers-mc

Author: Reid Kleckner (rnk)

Changes

Dynamic relocations are expensive on ELF/Linux platforms because they are applied in userspace on process startup. Therefore, it is worth optimizing them to make PIE and PIC dylib builds faster. In +asserts builds (non-NDEBUG), @nikic identified these schedule class name string pointers as the leading source of dynamic relocations. [1]

This change uses llvm::StringTable and the StringToOffsetTable TableGen helper to turn the string pointers into 32-bit offsets into a separate character array.

The number of dynamic relocations is reduced by ~60%:
❯ llvm-readelf --dyn-relocations lib/libLLVM.so | wc -l
381376 # before
155156 # after

The test suite time is modestly affected, but I'm running on a shared noisy workstation VM with a ton of cores:
https://gist.github.com/rnk/f38882c2fe2e63d0eb58b8fffeab69de
Testing Time: 100.88s # before
Testing Time: 78.50s. # after
Testing Time: 96.25s. # before again

I haven't used any fancy hyperfine/denoising tools, but I think the result is clearly visible and IMO we should just ship it.

[1] https://gist.github.com/nikic/554f0a544ca15d5219788f1030f78c5a


Full diff: https://github.com/llvm/llvm-project/pull/144202.diff

7 Files Affected:

  • (modified) llvm/include/llvm/MC/MCSchedule.h (+13-1)
  • (modified) llvm/lib/MC/MCSchedule.cpp (+1)
  • (modified) llvm/lib/MCA/InstrBuilder.cpp (+2-1)
  • (modified) llvm/test/TableGen/CompressWriteLatencyEntry.td (+4-4)
  • (modified) llvm/test/TableGen/InvalidMCSchedClassDesc.td (+11-11)
  • (modified) llvm/tools/llvm-exegesis/lib/Analysis.cpp (+5-4)
  • (modified) llvm/utils/TableGen/SubtargetEmitter.cpp (+17-4)
diff --git a/llvm/include/llvm/MC/MCSchedule.h b/llvm/include/llvm/MC/MCSchedule.h
index eb71f4581bd61..ba5ceae71aaaf 100644
--- a/llvm/include/llvm/MC/MCSchedule.h
+++ b/llvm/include/llvm/MC/MCSchedule.h
@@ -15,6 +15,7 @@
 #define LLVM_MC_MCSCHEDULE_H
 
 #include "llvm/ADT/StringRef.h"
+#include "llvm/ADT/StringTable.h"
 #include "llvm/MC/MCInstrDesc.h"
 #include "llvm/Support/Compiler.h"
 #include "llvm/Support/ErrorHandling.h"
@@ -124,7 +125,7 @@ struct MCSchedClassDesc {
   static const unsigned short VariantNumMicroOps = InvalidNumMicroOps - 1;
 
 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
-  const char* Name;
+  uint32_t NameOff;
 #endif
   uint16_t NumMicroOps : 13;
   uint16_t BeginGroup : 1;
@@ -324,6 +325,9 @@ struct MCSchedModel {
   const MCSchedClassDesc *SchedClassTable;
   unsigned NumProcResourceKinds;
   unsigned NumSchedClasses;
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+  const StringTable *SchedClassNames;
+#endif
   // Instruction itinerary tables used by InstrItineraryData.
   friend class InstrItineraryData;
   const InstrItinerary *InstrItineraries;
@@ -368,6 +372,14 @@ struct MCSchedModel {
     return &SchedClassTable[SchedClassIdx];
   }
 
+  StringRef getSchedClassName(unsigned SchedClassIdx) const {
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+    return (*SchedClassNames)[SchedClassTable[SchedClassIdx].NameOff];
+#else
+    return "<unknown>";
+#endif
+  }
+
   /// Returns the latency value for the scheduling class.
   LLVM_ABI static int computeInstrLatency(const MCSubtargetInfo &STI,
                                           const MCSchedClassDesc &SCDesc);
diff --git a/llvm/lib/MC/MCSchedule.cpp b/llvm/lib/MC/MCSchedule.cpp
index 8aea08919f469..527ccf3fc36e0 100644
--- a/llvm/lib/MC/MCSchedule.cpp
+++ b/llvm/lib/MC/MCSchedule.cpp
@@ -37,6 +37,7 @@ const MCSchedModel MCSchedModel::Default = {DefaultIssueWidth,
                                             0,
                                             0,
                                             nullptr,
+                                            nullptr,
                                             nullptr};
 
 int MCSchedModel::computeInstrLatency(const MCSubtargetInfo &STI,
diff --git a/llvm/lib/MCA/InstrBuilder.cpp b/llvm/lib/MCA/InstrBuilder.cpp
index 2bac99b6309af..cad25a6ddd3f5 100644
--- a/llvm/lib/MCA/InstrBuilder.cpp
+++ b/llvm/lib/MCA/InstrBuilder.cpp
@@ -75,7 +75,8 @@ static void initializeUsedResources(InstrDesc &ID,
       WithColor::warning()
           << "Ignoring invalid write of zero cycles on processor resource "
           << PR.Name << "\n";
-      WithColor::note() << "found in scheduling class " << SCDesc.Name
+      WithColor::note() << "found in scheduling class "
+                        << SM.getSchedClassName(ID.SchedClassID)
                         << " (write index #" << I << ")\n";
 #endif
       continue;
diff --git a/llvm/test/TableGen/CompressWriteLatencyEntry.td b/llvm/test/TableGen/CompressWriteLatencyEntry.td
index 88273e8858448..d6a9f0ac0dd76 100644
--- a/llvm/test/TableGen/CompressWriteLatencyEntry.td
+++ b/llvm/test/TableGen/CompressWriteLatencyEntry.td
@@ -33,10 +33,10 @@ def Read_D : SchedRead;
 // CHECK-NEXT: }; // MyTargetReadAdvanceTable
 
 // CHECK:  static const llvm::MCSchedClassDesc SchedModel_ASchedClasses[] = {
-// CHECK-NEXT:  {DBGFIELD("InvalidSchedClass")  8191, false, false, false, 0, 0,  0, 0,  0, 0},
-// CHECK-NEXT:  {DBGFIELD("Inst_A")             1, false, false, false,  0, 0,  1, 1,  0, 0}, // #1
-// CHECK-NEXT:  {DBGFIELD("Inst_B")             1, false, false, false,  0, 0,  2, 1,  0, 0}, // #2
-// CHECK-NEXT:  {DBGFIELD("Inst_C")             1, false, false, false,  0, 0,  1, 1,  1, 1}, // #3
+// CHECK-NEXT:  {DBGFIELD(1)  8191, false, false, false, 0, 0,  0, 0,  0, 0},
+// CHECK-NEXT:  {DBGFIELD(/*Inst_A*/ {{[0-9]+}})             1, false, false, false,  0, 0,  1, 1,  0, 0}, // #1
+// CHECK-NEXT:  {DBGFIELD(/*Inst_B*/ {{[0-9]+}})             1, false, false, false,  0, 0,  2, 1,  0, 0}, // #2
+// CHECK-NEXT:  {DBGFIELD(/*Inst_C*/ {{[0-9]+}})             1, false, false, false,  0, 0,  1, 1,  1, 1}, // #3
 // CHECK-NEXT: }; // SchedModel_ASchedClasses
 
 let SchedModel = SchedModel_A in {
diff --git a/llvm/test/TableGen/InvalidMCSchedClassDesc.td b/llvm/test/TableGen/InvalidMCSchedClassDesc.td
index de5392237a84c..e43edd4174589 100644
--- a/llvm/test/TableGen/InvalidMCSchedClassDesc.td
+++ b/llvm/test/TableGen/InvalidMCSchedClassDesc.td
@@ -1,13 +1,13 @@
 // RUN: llvm-tblgen -gen-subtarget -I %p/../../include %s 2>&1 | FileCheck %s
-// Check if it is valid MCSchedClassDesc if didn't have the resources. 
+// Check if it is valid MCSchedClassDesc if didn't have the resources.
 
 include "llvm/Target/Target.td"
 
 def MyTarget : Target;
 
 let OutOperandList = (outs), InOperandList = (ins) in {
-  def Inst_A : Instruction; 
-  def Inst_B : Instruction; 
+  def Inst_A : Instruction;
+  def Inst_B : Instruction;
 }
 
 let CompleteModel = 0 in {
@@ -18,8 +18,8 @@ let CompleteModel = 0 in {
 
 // Inst_B didn't have the resoures, and it is invalid.
 // CHECK: SchedModel_ASchedClasses[] = {
-// CHECK: {DBGFIELD("Inst_A")             1
-// CHECK-NEXT: {DBGFIELD("Inst_B")             8191
+// CHECK: {DBGFIELD(/*Inst_A*/ 19)             1
+// CHECK-NEXT: {DBGFIELD(/*Inst_B*/ 26)             8191
 let SchedModel = SchedModel_A in {
   def Write_A : SchedWriteRes<[]>;
   def : InstRW<[Write_A], (instrs Inst_A)>;
@@ -27,18 +27,18 @@ let SchedModel = SchedModel_A in {
 
 // Inst_A didn't have the resoures, and it is invalid.
 // CHECK: SchedModel_BSchedClasses[] = {
-// CHECK: {DBGFIELD("Inst_A")             8191
-// CHECK-NEXT: {DBGFIELD("Inst_B")             1 
+// CHECK: {DBGFIELD(/*Inst_A*/ 19)             8191
+// CHECK-NEXT: {DBGFIELD(/*Inst_B*/ 26)             1
 let SchedModel = SchedModel_B in {
-  def Write_B: SchedWriteRes<[]>; 
+  def Write_B: SchedWriteRes<[]>;
   def : InstRW<[Write_B], (instrs Inst_B)>;
 }
 
 // CHECK: SchedModel_CSchedClasses[] = {
-// CHECK: {DBGFIELD("Inst_A")             1
-// CHECK-NEXT: {DBGFIELD("Inst_B")             1
+// CHECK: {DBGFIELD(/*Inst_A*/ 19)             1
+// CHECK-NEXT: {DBGFIELD(/*Inst_B*/ 26)             1
 let SchedModel = SchedModel_C in {
-  def Write_C: SchedWriteRes<[]>; 
+  def Write_C: SchedWriteRes<[]>;
   def : InstRW<[Write_C], (instrs Inst_A, Inst_B)>;
 }
 
diff --git a/llvm/tools/llvm-exegesis/lib/Analysis.cpp b/llvm/tools/llvm-exegesis/lib/Analysis.cpp
index be10c32cf08d5..fb843285ada2a 100644
--- a/llvm/tools/llvm-exegesis/lib/Analysis.cpp
+++ b/llvm/tools/llvm-exegesis/lib/Analysis.cpp
@@ -137,9 +137,9 @@ void Analysis::printInstructionRowCsv(const size_t PointId,
   std::tie(SchedClassId, std::ignore) = ResolvedSchedClass::resolveSchedClassId(
       State_.getSubtargetInfo(), State_.getInstrInfo(), MCI);
 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
-  const MCSchedClassDesc *const SCDesc =
-      State_.getSubtargetInfo().getSchedModel().getSchedClassDesc(SchedClassId);
-  writeEscaped<kEscapeCsv>(OS, SCDesc->Name);
+  StringRef SCDescName =
+      State_.getSubtargetInfo().getSchedModel().getSchedClassName(SchedClassId);
+  writeEscaped<kEscapeCsv>(OS, SCDescName);
 #else
   OS << SchedClassId;
 #endif
@@ -563,7 +563,8 @@ Error Analysis::run<Analysis::PrintSchedClassInconsistencies>(
     OS << "<div class=\"inconsistency\"><p>Sched Class <span "
           "class=\"sched-class-name\">";
 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
-    writeEscaped<kEscapeHtml>(OS, RSCAndPoints.RSC.SCDesc->Name);
+    writeEscaped<kEscapeHtml>(OS, SI.getSchedModel().getSchedClassName(
+                                      RSCAndPoints.RSC.SchedClassId));
 #else
     OS << RSCAndPoints.RSC.SchedClassId;
 #endif
diff --git a/llvm/utils/TableGen/SubtargetEmitter.cpp b/llvm/utils/TableGen/SubtargetEmitter.cpp
index ca008e256a70f..fb36890f547b2 100644
--- a/llvm/utils/TableGen/SubtargetEmitter.cpp
+++ b/llvm/utils/TableGen/SubtargetEmitter.cpp
@@ -28,6 +28,7 @@
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/TableGen/Error.h"
 #include "llvm/TableGen/Record.h"
+#include "llvm/TableGen/StringToOffsetTable.h"
 #include "llvm/TableGen/TableGenBackend.h"
 #include "llvm/TargetParser/SubtargetFeature.h"
 #include <algorithm>
@@ -1456,6 +1457,10 @@ void SubtargetEmitter::emitSchedClassTables(SchedClassTables &SchedTables,
   }
   OS << "}; // " << Target << "ReadAdvanceTable\n";
 
+  // Pool all SchedClass names in a string table.
+  StringToOffsetTable StrTab;
+  unsigned InvalidNameOff = StrTab.GetOrAddStringOffset("InvalidSchedClass");
+
   // Emit a SchedClass table for each processor.
   for (const auto &[Idx, Proc] : enumerate(SchedModels.procModels())) {
     if (!Proc.hasInstrSchedModel())
@@ -1473,14 +1478,15 @@ void SubtargetEmitter::emitSchedClassTables(SchedClassTables &SchedTables,
     // name and position.
     assert(SchedModels.getSchedClass(0).Name == "NoInstrModel" &&
            "invalid class not first");
-    OS << "  {DBGFIELD(\"InvalidSchedClass\")  "
+    OS << "  {DBGFIELD(" << InvalidNameOff << ")  "
        << MCSchedClassDesc::InvalidNumMicroOps
        << ", false, false, false, 0, 0,  0, 0,  0, 0},\n";
 
     for (unsigned SCIdx = 1, SCEnd = SCTab.size(); SCIdx != SCEnd; ++SCIdx) {
       MCSchedClassDesc &MCDesc = SCTab[SCIdx];
       const CodeGenSchedClass &SchedClass = SchedModels.getSchedClass(SCIdx);
-      OS << "  {DBGFIELD(\"" << SchedClass.Name << "\") ";
+      unsigned NameOff = StrTab.GetOrAddStringOffset(SchedClass.Name);
+      OS << "  {DBGFIELD(/*" << SchedClass.Name << "*/ " << NameOff << ") ";
       if (SchedClass.Name.size() < 18)
         OS.indent(18 - SchedClass.Name.size());
       OS << MCDesc.NumMicroOps << ", " << (MCDesc.BeginGroup ? "true" : "false")
@@ -1495,6 +1501,8 @@ void SubtargetEmitter::emitSchedClassTables(SchedClassTables &SchedTables,
     }
     OS << "}; // " << Proc.ModelName << "SchedClasses\n";
   }
+
+  StrTab.EmitStringTableDef(OS, Target + "SchedClassNames");
 }
 
 void SubtargetEmitter::emitProcessorModels(raw_ostream &OS) {
@@ -1548,6 +1556,8 @@ void SubtargetEmitter::emitProcessorModels(raw_ostream &OS) {
     else
       OS << "  nullptr, nullptr, 0, 0,"
          << " // No instruction-level machine model.\n";
+    OS << "  DBGVAL_OR_NULLPTR(&" << Target
+       << "SchedClassNames), // SchedClassNames\n";
     if (PM.hasItineraries())
       OS << "  " << PM.ItinsDef->getName() << ",\n";
     else
@@ -1569,8 +1579,10 @@ void SubtargetEmitter::emitSchedModel(raw_ostream &OS) {
      << "#endif\n"
      << "#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)\n"
      << "#define DBGFIELD(x) x,\n"
+     << "#define DBGVAL_OR_NULLPTR(x) x\n"
      << "#else\n"
      << "#define DBGFIELD(x)\n"
+     << "#define DBGVAL_OR_NULLPTR(x) nullptr\n"
      << "#endif\n";
 
   if (SchedModels.hasItineraries()) {
@@ -1588,10 +1600,11 @@ void SubtargetEmitter::emitSchedModel(raw_ostream &OS) {
   }
   emitSchedClassTables(SchedTables, OS);
 
-  OS << "\n#undef DBGFIELD\n";
-
   // Emit the processor machine model
   emitProcessorModels(OS);
+
+  OS << "\n#undef DBGFIELD\n";
+  OS << "\n#undef DBGVAL_OR_NULLPTR\n";
 }
 
 static void emitPredicateProlog(const RecordKeeper &Records, raw_ostream &OS) {

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good, but should confirm this works in release builds.

@@ -324,6 +325,9 @@ struct MCSchedModel {
const MCSchedClassDesc *SchedClassTable;
unsigned NumProcResourceKinds;
unsigned NumSchedClasses;
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
const StringTable *SchedClassNames;
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the field is declared conditionally here, but then initialized unconditionally both via TableGen and in the test file?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems pine a concern - hopefully a -Asserts build would just fail and catch this?

@@ -1495,6 +1501,8 @@ void SubtargetEmitter::emitSchedClassTables(SchedClassTables &SchedTables,
}
OS << "}; // " << Proc.ModelName << "SchedClasses\n";
}

StrTab.EmitStringTableDef(OS, Target + "SchedClassNames");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move this behind the NDEBUG check as well, or do we just rely on it being DCEd? (Do we need to suppress a warning about an unused static in release builds possibly?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think that DCE would occur in the dynamic build. This is an exposed interface, and unless it was marked as infused or static, the linker cannot know it it’s safe to remove.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants