@jmmartinez jmmartinez commented Oct 9, 2025

The address space for the pointers stored in the llvm.compiler.used/llvm.used should be 0, but this was not always respected.

After #164432, Clang always emits addrspace 0 for the elements of llvm.compiler.used/llvm.used.

To avoid inconsistencies in the future, move the helpers to GlobalValue and use them in the 2 other places where these variables are manually updated: BitcodeWriter and CodeGenModule.

llvmbot commented Oct 9, 2025

@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-llvm-transforms

Author: Juan Manuel Martinez Caamaño (jmmartinez)

Changes

At the moment, the address space for the pointer stored in the llvm.compiler.used/llvm.used is not well defined:

  1. Clang uses the language default address space,
  2. LLVM helpers from ModuleUtils use the 0 address space,
  3. and BitcodeWriter (for -fembed-bitcode) tries to preserve the element type if the variable already exists, and uses address space 0 otherwise.

This PR doesn't solve this issue. It only adds a test that documents one of the resulting problems, and makes the BitcodeWriter use the LLVM helpers so that fewer places modify these variables.
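
The three behaviours listed above can be modelled in a few lines of standalone C++. This is only an illustrative sketch: `ExistingEltAS`, `clangPolicy`, `moduleUtilsPolicy`, and `bitcodeWriterPolicy` are hypothetical names, not LLVM APIs, and address spaces are reduced to plain integers.

```cpp
#include <cassert>
#include <optional>

// Hypothetical model: the address space of the elements of an existing
// llvm.compiler.used / llvm.used array, or nullopt if the array is absent.
using ExistingEltAS = std::optional<unsigned>;

// 1. Clang: the language default address space.
unsigned clangPolicy(unsigned LangDefaultAS) { return LangDefaultAS; }

// 2. ModuleUtils helpers: always address space 0.
unsigned moduleUtilsPolicy() { return 0; }

// 3. BitcodeWriter (-fembed-bitcode): preserve the existing element type,
//    fall back to address space 0 when the array does not exist yet.
unsigned bitcodeWriterPolicy(ExistingEltAS Existing) {
  return Existing.value_or(0);
}
```

Whenever the language default address space is non-zero, `clangPolicy` and `moduleUtilsPolicy` disagree, so a list created by one producer and extended by another can end up with elements of two different pointer types.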

Without this patch, the test added in the PR fails with an assertion:

UtilsTests: /home/juamarti/llvm/_llvm/llvm/lib/IR/Constants.cpp:1327:
    static Constant *llvm::ConstantArray::getImpl(ArrayType *, ArrayRef<Constant *>):
    Assertion `C->getType() == Ty->getElementType() && "Wrong type in array element initializer"' failed.
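
The assertion checks that every initializer of a constant array has exactly the array's element type, and pointer types in different address spaces are distinct types. A minimal standalone model of that condition, assuming pointer types differ only by address space (`PtrTy`, `PtrConstant`, and `allElementsMatch` are hypothetical names, not LLVM APIs):

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an opaque pointer type: only the address space
// distinguishes one pointer type from another.
struct PtrTy {
  unsigned AddrSpace;
  bool operator==(const PtrTy &O) const { return AddrSpace == O.AddrSpace; }
};

// Hypothetical stand-in for a constant of pointer type.
struct PtrConstant {
  PtrTy Ty;
};

// Mirrors the condition in the assertion message:
// C->getType() == Ty->getElementType() for every element C.
bool allElementsMatch(PtrTy EltTy, const std::vector<PtrConstant> &Elts) {
  for (const PtrConstant &C : Elts)
    if (!(C.Ty == EltTy))
      return false;
  return true;
}
```

In this model, rebuilding an array with address space 0 elements while keeping an old `ptr addrspace(3)` entry makes `allElementsMatch` return false, which is presumably the situation the new test puts the helper in.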

Full diff: https://github.com/llvm/llvm-project/pull/162660.diff

4 Files Affected:

  • (modified) llvm/lib/Bitcode/Writer/BitcodeWriter.cpp (+30-40)
  • (modified) llvm/lib/Bitcode/Writer/CMakeLists.txt (+1)
  • (modified) llvm/lib/Transforms/Utils/ModuleUtils.cpp (+2-1)
  • (modified) llvm/unittests/Transforms/Utils/ModuleUtilsTest.cpp (+17)
diff --git a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
index 7ed140d392fca..f7afa393f3e00 100644
--- a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
+++ b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
@@ -75,6 +75,7 @@
 #include "llvm/Support/SHA1.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/TargetParser/Triple.h"
+#include "llvm/Transforms/Utils/ModuleUtils.h"
 #include <algorithm>
 #include <cassert>
 #include <cstddef>
@@ -5850,25 +5851,25 @@ static const char *getSectionNameForCommandline(const Triple &T) {
 void llvm::embedBitcodeInModule(llvm::Module &M, llvm::MemoryBufferRef Buf,
                                 bool EmbedBitcode, bool EmbedCmdline,
                                 const std::vector<uint8_t> &CmdArgs) {
-  // Save llvm.compiler.used and remove it.
-  SmallVector<Constant *, 2> UsedArray;
-  SmallVector<GlobalValue *, 4> UsedGlobals;
-  GlobalVariable *Used = collectUsedGlobalVariables(M, UsedGlobals, true);
-  Type *UsedElementType = Used ? Used->getValueType()->getArrayElementType()
-                               : PointerType::getUnqual(M.getContext());
-  for (auto *GV : UsedGlobals) {
-    if (GV->getName() != "llvm.embedded.module" &&
-        GV->getName() != "llvm.cmdline")
-      UsedArray.push_back(
-          ConstantExpr::getPointerBitCastOrAddrSpaceCast(GV, UsedElementType));
-  }
-  if (Used)
-    Used->eraseFromParent();
 
   // Embed the bitcode for the llvm module.
   std::string Data;
   ArrayRef<uint8_t> ModuleData;
   Triple T(M.getTargetTriple());
+  SmallVector<GlobalValue *, 2> NewGlobals;
+
+  auto IsCmdOrBitcode = [&](Constant *C) {
+    GlobalVariable *GV = dyn_cast<GlobalVariable>(C);
+    StringRef Name = GV ? GV->getName() : "";
+    if (EmbedBitcode && Name == "llvm.embedded.module")
+      return true;
+    if (EmbedCmdline && Name == "llvm.cmdline")
+      return true;
+    return false;
+  };
+
+  if (EmbedBitcode || EmbedCmdline)
+    removeFromUsedLists(M, IsCmdOrBitcode);
 
   if (EmbedBitcode) {
     if (Buf.getBufferSize() == 0 ||
@@ -5887,23 +5888,22 @@ void llvm::embedBitcodeInModule(llvm::Module &M, llvm::MemoryBufferRef Buf,
   }
   llvm::Constant *ModuleConstant =
       llvm::ConstantDataArray::get(M.getContext(), ModuleData);
-  llvm::GlobalVariable *GV = new llvm::GlobalVariable(
+  llvm::GlobalVariable *EmbeddedModule = new llvm::GlobalVariable(
       M, ModuleConstant->getType(), true, llvm::GlobalValue::PrivateLinkage,
       ModuleConstant);
-  GV->setSection(getSectionNameForBitcode(T));
+  EmbeddedModule->setSection(getSectionNameForBitcode(T));
   // Set alignment to 1 to prevent padding between two contributions from input
   // sections after linking.
-  GV->setAlignment(Align(1));
-  UsedArray.push_back(
-      ConstantExpr::getPointerBitCastOrAddrSpaceCast(GV, UsedElementType));
+  EmbeddedModule->setAlignment(Align(1));
+  NewGlobals.push_back(EmbeddedModule);
   if (llvm::GlobalVariable *Old =
           M.getGlobalVariable("llvm.embedded.module", true)) {
     assert(Old->hasZeroLiveUses() &&
            "llvm.embedded.module can only be used once in llvm.compiler.used");
-    GV->takeName(Old);
+    EmbeddedModule->takeName(Old);
     Old->eraseFromParent();
   } else {
-    GV->setName("llvm.embedded.module");
+    EmbeddedModule->setName("llvm.embedded.module");
   }
 
   // Skip if only bitcode needs to be embedded.
@@ -5913,30 +5913,20 @@ void llvm::embedBitcodeInModule(llvm::Module &M, llvm::MemoryBufferRef Buf,
                               CmdArgs.size());
     llvm::Constant *CmdConstant =
         llvm::ConstantDataArray::get(M.getContext(), CmdData);
-    GV = new llvm::GlobalVariable(M, CmdConstant->getType(), true,
-                                  llvm::GlobalValue::PrivateLinkage,
-                                  CmdConstant);
-    GV->setSection(getSectionNameForCommandline(T));
-    GV->setAlignment(Align(1));
-    UsedArray.push_back(
-        ConstantExpr::getPointerBitCastOrAddrSpaceCast(GV, UsedElementType));
+    GlobalVariable *CmdLine = new llvm::GlobalVariable(
+        M, CmdConstant->getType(), true, llvm::GlobalValue::PrivateLinkage,
+        CmdConstant);
+    CmdLine->setSection(getSectionNameForCommandline(T));
+    CmdLine->setAlignment(Align(1));
     if (llvm::GlobalVariable *Old = M.getGlobalVariable("llvm.cmdline", true)) {
       assert(Old->hasZeroLiveUses() &&
              "llvm.cmdline can only be used once in llvm.compiler.used");
-      GV->takeName(Old);
+      CmdLine->takeName(Old);
       Old->eraseFromParent();
     } else {
-      GV->setName("llvm.cmdline");
+      CmdLine->setName("llvm.cmdline");
     }
+    NewGlobals.push_back(CmdLine);
+    appendToCompilerUsed(M, NewGlobals);
   }
-
-  if (UsedArray.empty())
-    return;
-
-  // Recreate llvm.compiler.used.
-  ArrayType *ATy = ArrayType::get(UsedElementType, UsedArray.size());
-  auto *NewUsed = new GlobalVariable(
-      M, ATy, false, llvm::GlobalValue::AppendingLinkage,
-      llvm::ConstantArray::get(ATy, UsedArray), "llvm.compiler.used");
-  NewUsed->setSection("llvm.metadata");
 }
diff --git a/llvm/lib/Bitcode/Writer/CMakeLists.txt b/llvm/lib/Bitcode/Writer/CMakeLists.txt
index 2c508ca9fae95..5bbb872a90341 100644
--- a/llvm/lib/Bitcode/Writer/CMakeLists.txt
+++ b/llvm/lib/Bitcode/Writer/CMakeLists.txt
@@ -15,4 +15,5 @@ add_llvm_component_library(LLVMBitWriter
   ProfileData
   Support
   TargetParser
+  TransformUtils
   )
diff --git a/llvm/lib/Transforms/Utils/ModuleUtils.cpp b/llvm/lib/Transforms/Utils/ModuleUtils.cpp
index 596849ecab742..d1acb0ff1ad6b 100644
--- a/llvm/lib/Transforms/Utils/ModuleUtils.cpp
+++ b/llvm/lib/Transforms/Utils/ModuleUtils.cpp
@@ -138,10 +138,11 @@ static void appendToUsedList(Module &M, StringRef Name, ArrayRef<GlobalValue *>
 
   SmallSetVector<Constant *, 16> Init;
   collectUsedGlobals(GV, Init);
+  Type *ArrayEltTy = GV ? GV->getValueType()->getArrayElementType()
+                        : PointerType::getUnqual(M.getContext());
   if (GV)
     GV->eraseFromParent();
 
-  Type *ArrayEltTy = llvm::PointerType::getUnqual(M.getContext());
   for (auto *V : Values)
     Init.insert(ConstantExpr::getPointerBitCastOrAddrSpaceCast(V, ArrayEltTy));
 
diff --git a/llvm/unittests/Transforms/Utils/ModuleUtilsTest.cpp b/llvm/unittests/Transforms/Utils/ModuleUtilsTest.cpp
index d4094c5307060..0cc408af43bc5 100644
--- a/llvm/unittests/Transforms/Utils/ModuleUtilsTest.cpp
+++ b/llvm/unittests/Transforms/Utils/ModuleUtilsTest.cpp
@@ -69,6 +69,23 @@ TEST(ModuleUtils, AppendToUsedList2) {
   EXPECT_EQ(1, getListSize(*M, "llvm.used"));
 }
 
+TEST(ModuleUtils, AppendToUsedList3) {
+  LLVMContext C;
+
+  std::unique_ptr<Module> M = parseIR(C, R"(
+          @x = addrspace(1) global [2 x i32] zeroinitializer, align 4
+          @y = addrspace(2) global [2 x i32] zeroinitializer, align 4
+          @llvm.compiler.used = appending global [1 x ptr addrspace (3)] [ptr addrspace(3) addrspacecast (ptr addrspace (1) @x to ptr addrspace(3))]
+      )");
+  GlobalVariable *X = M->getNamedGlobal("x");
+  GlobalVariable *Y = M->getNamedGlobal("y");
+  EXPECT_EQ(1, getListSize(*M, "llvm.compiler.used"));
+  appendToCompilerUsed(*M, X);
+  EXPECT_EQ(1, getListSize(*M, "llvm.compiler.used"));
+  appendToCompilerUsed(*M, Y);
+  EXPECT_EQ(2, getListSize(*M, "llvm.compiler.used"));
+}
+
 using AppendFnType = decltype(&appendToGlobalCtors);
 using TransformFnType = decltype(&transformGlobalCtors);
 using ParamType = std::tuple<StringRef, AppendFnType, TransformFnType>;

@jmmartinez jmmartinez requested a review from shiltian October 9, 2025 14:06

arsenm commented Oct 10, 2025

> At the moment, the address space for the pointer stored in the llvm.compiler.used/llvm.used is not well defined:

It should be defined to 0

@jmmartinez

> > At the moment, the address space for the pointer stored in the llvm.compiler.used/llvm.used is not well defined:
>
> It should be defined to 0

In that case, should the LLVM-IR verifier (or at least the linter) reject bitcode where these variables have the wrong type?

I still have to check what breaks exactly when clang emits an addrspace(0) ptr in llvm.used/llvm.compiler.used. AFAIK it is the case for SPIRV, which should be solved by #162678 .

@jmmartinez jmmartinez force-pushed the fix/workaround_for_llvm_compiler_used branch from 3d08e3a to 7b445e0 Compare October 10, 2025 08:14

github-actions bot commented Oct 10, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

arsenm commented Oct 10, 2025

> In that case, should the LLVM-IR verifier (or at least the linter) reject bitcode where these variables have the wrong type?

Yes. verifier failure.
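
Such a rule could look roughly like the following standalone sketch. It is not the LLVM Verifier: `UsedListDecl` and `verifyUsedList` are hypothetical, the diagnostic text is made up, and the only property modelled is the element address space.

```cpp
#include <cassert>
#include <string>

// Hypothetical description of a module-level variable.
struct UsedListDecl {
  std::string Name;
  unsigned ElementAddrSpace;
};

// Returns an empty string when the declaration is acceptable, otherwise a
// diagnostic. Only llvm.used and llvm.compiler.used are constrained.
std::string verifyUsedList(const UsedListDecl &D) {
  bool IsUsedList = D.Name == "llvm.used" || D.Name == "llvm.compiler.used";
  if (IsUsedList && D.ElementAddrSpace != 0)
    return "invalid " + D.Name +
           ": elements must be pointers in address space 0";
  return "";
}
```

Other globals are left alone in this model; only the two special lists are required to use address space 0.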

> I still have to check what breaks exactly when clang emits an addrspace(0) ptr in llvm.used/llvm.compiler.used. AFAIK it is the case for SPIRV, which should be solved by #162678 .

SPIR-V probably shouldn't just drop these on the floor. It probably will need special case code to stripPointerCasts and re-package the references into whatever SPIRV type it prefers
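
The idea of stripping the casts and re-packaging the references can be sketched in standalone C++. This is a toy model under stated assumptions: `Ref`, `stripCasts`, and `repackage` are hypothetical and do not correspond to any SPIR-V backend code; a reference is reduced to a global name, the address space the global lives in, and the address space of the pointer stored in the list.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical reference stored in a used list. A cast is implied whenever
// ResultAS differs from GlobalAS.
struct Ref {
  std::string Global; // name of the underlying global
  unsigned GlobalAS;  // address space the global lives in
  unsigned ResultAS;  // address space of the pointer stored in the list
};

// Models stripPointerCasts: drop any cast and return the bare global.
Ref stripCasts(Ref R) {
  R.ResultAS = R.GlobalAS;
  return R;
}

// Re-wrap every reference so that it has the address space the backend
// prefers, regardless of how the frontend originally cast it.
std::vector<Ref> repackage(const std::vector<Ref> &List, unsigned PreferredAS) {
  std::vector<Ref> Out;
  Out.reserve(List.size());
  for (const Ref &R : List) {
    Ref Bare = stripCasts(R);
    Bare.ResultAS = PreferredAS;
    Out.push_back(Bare);
  }
  return Out;
}
```

The underlying globals are preserved; only the cast wrapped around each one changes, so no reference is dropped.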

@jmmartinez

Ping!

@jmmartinez jmmartinez force-pushed the fix/workaround_for_llvm_compiler_used branch from e9a6253 to b076e53 Compare October 21, 2025 14:40
@jmmartinez jmmartinez requested a review from nikic October 22, 2025 10:04
jmmartinez added a commit that referenced this pull request Nov 3, 2025
…ace(0) (#164432)

By convention the AS of the elements of `llvm.compiler.used` &
`llvm.used` is 0. However, the AS of `CGM.Int8PtrTy` is not always 0.

This leaves some LLVM helpers
(`appendToUsed/appendToCompilerUsed/removeFromUsedLists`) unusable.

This patch makes the AS of the elements of these variables 0.

This PR is related to #162660
…@llvm.used if it already exists

This new test fails with:

    /home/juamarti/llvm/_llvm/llvm/lib/IR/Constants.cpp:1327:
    static Constant *llvm::ConstantArray::getImpl(ArrayType *, ArrayRef<Constant *>):
    Assertion `C->getType() == Ty->getElementType()
      && "Wrong type in array element initializer"'
      failed.
…already exists

At the moment, the pointer type stored in
llvm.compiler.used/llvm.used is not well defined.

The frontend uses a pointer to the default address space (which may not
be 0; for example, it is 4 for SPIRV).

This patch makes `appendToUsed/appendToCompilerUsed` match the behaviour
in BitcodeWriter.cpp: if the variable already exists, preserve its
element type; otherwise use `ptr addrspace(0)`.
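
A standalone sketch of that append behaviour, including the de-duplication the new test relies on. It is a toy model, not the ModuleUtils implementation: `Elt`, `UsedList`, and `appendToUsedModel` are hypothetical, and a cast is modelled as pairing the global's name with the list's element address space.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element: a global's name plus the address space it was cast to.
using Elt = std::pair<std::string, unsigned>;

struct UsedList {
  unsigned EltAS;        // address space of the array's element type
  std::vector<Elt> Elts; // insertion-ordered, no duplicates
};

// Preserve the element address space of an existing list; start a new list
// in address space 0 otherwise. Entries already present are not duplicated.
void appendToUsedModel(std::optional<UsedList> &List,
                       const std::string &Global) {
  if (!List)
    List = UsedList{0, {}};
  Elt New{Global, List->EltAS}; // "cast" the global to the element type
  if (std::find(List->Elts.begin(), List->Elts.end(), New) == List->Elts.end())
    List->Elts.push_back(New);
}
```

Starting from a list with `ptr addrspace(3)` elements that already holds `x`, appending `x` again leaves the size at 1 and appending `y` raises it to 2, matching the expectations in `AppendToUsedList3`.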

This fixes the following error in the newly added test:

  UtilsTests: /home/juamarti/llvm/_llvm/llvm/lib/IR/Constants.cpp:1327:
    static Constant *llvm::ConstantArray::getImpl(ArrayType *, ArrayRef<Constant *>):
    Assertion `C->getType() == Ty->getElementType() && "Wrong type in array element initializer"' failed.
GlobalValue; and remove dependency between BitcodeWriter &
TransformUtils
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Nov 3, 2025
…ents addrspace(0) (#164432)

@jmmartinez jmmartinez force-pushed the fix/workaround_for_llvm_compiler_used branch from b076e53 to 0195220 Compare November 3, 2025 15:39
@jmmartinez jmmartinez changed the title [LLVM] Maintain element type of @llvm.compiler.used/@llvm.used if it already exists [LLVM] Use @llvm.compiler.used/@llvm.used helpers in Clang and BitcodeWriter Nov 3, 2025
@jmmartinez

I've updated the PR after #164432 was merged.

The PR now only moves around the helpers and updates BitcodeWriter/CodeGenModule to use them.

@jmmartinez jmmartinez changed the title [LLVM] Use @llvm.compiler.used/@llvm.used helpers in Clang and BitcodeWriter [NFC][LLVM] Use @llvm.compiler.used/@llvm.used helpers in Clang and BitcodeWriter Nov 3, 2025
jmmartinez added a commit to ROCm/llvm-project that referenced this pull request Nov 3, 2025
…ace(0) (llvm#164432)

}

static int getListSize(Module &M, StringRef Name) {
auto *List = M.getGlobalVariable(Name);

Suggested change
-  auto *List = M.getGlobalVariable(Name);
+  const GlobalVariable *List = M.getGlobalVariable(Name);

return Mod;
}

static int getListSize(Module &M, StringRef Name) {

Suggested change
-static int getListSize(Module &M, StringRef Name) {
+static size_t getListSize(Module &M, StringRef Name) {
