Improper handling of strings explicitly ending with null #1538

Open
DanielELog opened this issue Mar 31, 2025 · 3 comments

@DanielELog

Consider a string literal that explicitly ends with a null character: "null\0".
When code containing it is compiled to CIR, it turns into something like this (less important parts omitted):
cir.global <...> = #cir.const_array<"null" : !cir.array<!s8i x 4>, trailing_zeros> : !cir.array<!s8i x 6> ....
Notice that the array length in the inner const_array type (4) is shorter than that of the global containing it (6).
When this global is later used, get_global states its type as !cir.array<!s8i x 6>.
The compiler emits no warnings or errors at any point in this process.

However, the issue arises when the compiled file is later parsed by MLIR tools, namely mlir::parseSourceFile. Instead of a reference to an object representing the parsed code, it returns nullptr and writes the following text to stderr:
error: 'cir.get_global' op result type pointee type ''!cir.array<!cir.int<s, 8> x 6>'' does not match type '!cir.array<!cir.int<s, 8> x 4>' of the global @.str.
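
The two lengths in that message (6 and 4) can be reproduced with plain C++. The sketch below is not ClangIR code: the helper name lengthWithoutTrailingZeros is hypothetical. It assumes that the trailing_zeros form keeps only the bytes before the final run of zero bytes, which is what the dump above suggests.

```cpp
#include <cstddef>

// Hypothetical helper (not part of ClangIR): length of a char array once the
// run of zero bytes at its end has been dropped.
template <std::size_t N>
constexpr std::size_t lengthWithoutTrailingZeros(const char (&s)[N]) {
  std::size_t n = N;
  while (n > 0 && s[n - 1] == '\0')
    --n;
  return n;
}

// "null\0" occupies 6 bytes: 'n', 'u', 'l', 'l', the explicit '\0',
// and the terminator the compiler appends to every string literal.
static_assert(sizeof("null\0") == 6, "full array length, as in the global's type");

// Dropping both trailing zero bytes leaves the 4 bytes of "null".
static_assert(lengthWithoutTrailingZeros("null\0") == 4,
              "length of the inner const_array type");
```

If that assumption holds, the 6 is the size of the whole literal and the 4 is the size of its non-zero prefix, so the parser appears to be comparing the get_global result type against the inner array type rather than the outer one.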

Steps to reproduce:

  • Create a .c file containing the following code:
const char *funnyThing() {
  return "null\0";
}
  • Compile it to ClangIR:
    clang -S -Xclang -emit-cir-flat <just created file>.c

  • Create another program that tries to parse CIR files:

#include <clang/CIR/Dialect/IR/CIRDialect.h>
#include <mlir/Parser/Parser.h>

int main(int argc, char *argv[]) {
  mlir::MLIRContext context;
  mlir::DialectRegistry registry;
  registry.insert<cir::CIRDialect>();
  context.appendDialectRegistry(registry);
  context.allowUnregisteredDialects();

  mlir::ParserConfig parseConfig(&context);
  auto module =
      mlir::parseSourceFile<mlir::ModuleOp>("<compiled file>", parseConfig);
  if (module.get() == nullptr) {
    return -1;
  }
  return 0;
}
  • Compile and run it

Expected behaviour:

The program exits successfully.

Actual behaviour:

The program returns -1 from main and prints the aforementioned error message to the console.

Additional notes:

  • The same happens if you initialise a char array with a string literal shorter than the array:
    char notNull[25] = "null";

  • Everything is parsed just fine if the null-char is placed in the middle of the string, as in: null\0null

  • This behaviour was observed on a clangir checkout that was slightly modified for our own needs; none of the original source code was explicitly changed. The checkout is, however, behind upstream.
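
For reference, the byte layout of the two extra cases above can be checked at compile time. This is only a sketch: trailingZeroCount is a hypothetical helper, and the arrays are declared constexpr here (unlike the plain char array in the note) so that static_assert can inspect them. It shows how many zero bytes end each array; it does not establish why one case parses and the other does not.

```cpp
#include <cstddef>

// Hypothetical helper: number of consecutive zero bytes at the end of a char array.
template <std::size_t N>
constexpr std::size_t trailingZeroCount(const char (&s)[N]) {
  std::size_t count = 0;
  while (count < N && s[N - 1 - count] == '\0')
    ++count;
  return count;
}

// 4 characters followed by 21 bytes of zero padding.
constexpr char notNull[25] = "null";
// 10 bytes in total; the embedded '\0' is not trailing, only the implicit terminator is.
constexpr char embedded[] = "null\0null";

static_assert(sizeof(notNull) == 25 && trailingZeroCount(notNull) == 21,
              "short literal in a larger array is padded with zeros");
static_assert(sizeof(embedded) == 10 && trailingZeroCount(embedded) == 1,
              "embedded null leaves a single trailing zero byte");
```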

@shrikardongre
Contributor

Hi, I would definitely like to give it a try!

@shrikardongre
Contributor

Sorry, I was not able to pick this up immediately due to health reasons. I will start working on it now.

@shrikardongre
Contributor

@el-ev, please feel free to take it up if you want to. I was busy with #1560 😃
