Remove atos for backtrace generation. #4930

Open: wants to merge 4 commits into base: master

LunaTheFoxgirl
Contributor

Using atos is not portable, given that:

  1. It's a developer tool that may or may not exist on a given system
  2. Said tool relies on being able to execute external applications at runtime; something the hardened runtime heavily restricts
  3. Apple does not support this use case.

As an alternative, a weak rt_dwarfSymbolicate symbol has been added, allowing dub libraries to implement this functionality instead, so that the developer can opt in to this non-portable behaviour.

Down the line I can investigate making a naïve dSYM lookup system, but without OS help (which Apple will reject in App Store review) full coverage will be unlikely.

@rikkimax
Contributor

Feedback that I gave on Discord, replicating here as requested.

The reasons for using dladdr and not using atos ext. should be documented as a comment in the code, so future people know why it was done this way, and when to trigger a replacement.

location.file = info.dli_fname[0..strlen(info.dli_fname)];
location.procedure = info.dli_sname[0..strlen(info.dli_sname)];
}
}
Member

This is more or less already handled earlier:

// https://code.woboq.org/userspace/glibc/debug/backtracesyms.c.html
// The logic that glibc's backtrace uses is to check for `dli_fname`,
// the file name, and error if not present, then check for `dli_sname`.
// In case `dli_fname` is present but not `dli_sname`, the address is
// printed related to the file. We just print the file.
static const(char)[] getFrameName (const(void)* ptr)
{
    import core.stdc.string : strlen;
    import core.sys.posix.dlfcn;

    Dl_info info = void;
    // Note: See the module documentation about `-L--export-dynamic`
    if (dladdr(ptr, &info))
    {
        // Return symbol name if possible
        if (info.dli_sname !is null && info.dli_sname[0] != '\0')
            return info.dli_sname[0 .. strlen(info.dli_sname)];

        // Fall back to file name
        if (info.dli_fname !is null && info.dli_fname[0] != '\0')
            return info.dli_fname[0 .. strlen(info.dli_fname)];
    }

    // `dladdr` failed
    return "<ERROR: Unable to retrieve function name>";
}
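
For comparison, the glibc-style behaviour mentioned in the comment above (printing the address relative to the file when no symbol name is available) could be sketched roughly like this. `describeFrame` and its parameters are hypothetical names used for illustration only; they stand in for the `Dl_info` fields and are not part of druntime.

```d
import core.stdc.stdio : snprintf;

/// Hypothetical helper (not druntime API): picks the text shown for one frame.
/// `symbol` and `file` stand in for `dli_sname` and `dli_fname`;
/// `address` and `fileBase` stand in for the frame address and `dli_fbase`.
const(char)[] describeFrame(const(char)[] symbol, const(char)[] file,
                            size_t address, size_t fileBase, char[] buffer)
{
    // Prefer the symbol name when one is available
    if (symbol.length)
        return symbol;

    if (file.length)
    {
        // No symbol: format "file(+0xoffset)" into the caller's buffer
        const written = snprintf(buffer.ptr, buffer.length, "%.*s(+0x%zx)",
                                 cast(int) file.length, file.ptr,
                                 address - fileBase);
        if (written > 0 && written < buffer.length)
            return buffer[0 .. written];
        return file; // buffer too small: fall back to the bare file name
    }

    // Neither a symbol nor a file name is known
    return "<ERROR: Unable to retrieve function name>";
}
```

The caller owns `buffer`, so the sketch itself does not allocate.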

Contributor Author

@LunaTheFoxgirl LunaTheFoxgirl May 15, 2025

Ah, good to know; then this can just be a no-op stub. I am calling it for the day (spent 6 hours getting ldc2 compiling due to fighting with LLVM and CMake for some time; did get it working eventually).

Member

@kinke kinke May 15, 2025

Right. The hook's job is to populate the file and line fields of the Locations, if possible. So the name could reflect that (resolveSourceLocs() or so).

Edit: Well, that's the primary job at least. There's nothing standing in the way of improving/adding the symbol names too. E.g., to make it work without -L--export-dynamic too.

Contributor Author

@LunaTheFoxgirl LunaTheFoxgirl May 16, 2025

I've called it symbolicate since it may, on macOS, also end up handling symbolication of Objective-C, which requires another kind of roundtrip not related to DWARF. So in general it's supposed to be a hook that pre-symbolicates stuff (including the name of the procedure if relevant) and then lets the default implementation handle anything left over.
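
A minimal sketch of that "hook first, default for the leftovers" ordering, under stated assumptions: `Location` here is a simplified stand-in for the runtime's type, and `SymbolicateHook`, `preSymbolicate`, `defaultResolve`, `resolveAll` and `demoHook` are hypothetical names. The PR itself uses a weak `rt_dwarfSymbolicate` symbol; a plain function pointer is swapped in here to keep the example compiler-independent.

```d
/// Simplified stand-in for the runtime's `Location` (assumption).
struct Location
{
    const(void)* address;
    const(char)[] procedure;
    const(char)[] file;
    int line = -1; // -1 means "not resolved yet"
}

/// Hypothetical hook type; a null pointer means no hook is installed.
alias SymbolicateHook = void function(Location[] locations) nothrow @nogc;
__gshared SymbolicateHook preSymbolicate;

/// Stand-in for the built-in pass: only touches unresolved entries.
void defaultResolve(Location[] locations) nothrow @nogc
{
    foreach (ref loc; locations)
    {
        if (loc.line != -1)
            continue; // already filled in by the hook
        loc.file = "<unknown>";
        loc.line = 0;
    }
}

/// Runs the optional hook first, then the default pass on what is left.
void resolveAll(Location[] locations) nothrow @nogc
{
    if (preSymbolicate !is null)
        preSymbolicate(locations);
    defaultResolve(locations);
}

/// Example hook: pretends to know only the first frame.
void demoHook(Location[] locations) nothrow @nogc
{
    if (locations.length)
    {
        locations[0].procedure = "-[AppDelegate run]";
        locations[0].file = "AppDelegate.m";
        locations[0].line = 42;
    }
}
```

Installing the hook is a single assignment (`preSymbolicate = &demoHook;`). Frames the hook leaves at `line == -1` are picked up by the default pass.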

@schveiguy
Contributor

Needs a cleanup function in case you need to allocate and free.

Do we need to have a function that processes the entire array at once? Why not a function that works on a single stack frame?

@kinke
Member

kinke commented May 15, 2025

> Do we need to have a function that processes the entire array at once? Why not a function that works on a single stack frame?

That gives greatest flexibility and reduces the need for some state/cache somewhere. The implementation will most likely have to read files etc., so processing the whole batch at once makes sense.

Edit: See e.g. the default implementation, processing the debug_line section until all addresses have been resolved:

/**
 * Resolve the addresses of `locations` using `debugLineSectionData`
 *
 * Runs the DWARF state machine on `debugLineSectionData`,
 * assuming it represents a debugging program describing the addresses
 * in a continuous and increasing manner.
 *
 * After this function successfully completes, `locations` will contain
 * file / line information.
 *
 * Note that the lifetime of the `Location` data is bound to the lifetime
 * of `debugLineSectionData`.
 *
 * Params:
 *   debugLineSectionData = A DWARF program to feed the state machine
 *   locations = The locations to resolve
 *   baseAddress = The offset to apply to every address
 */
void resolveAddresses(const(ubyte)[] debugLineSectionData, Location[] locations, size_t baseAddress) @nogc nothrow
{
    size_t numberOfLocationsFound = 0;

    const(ubyte)[] dbg = debugLineSectionData;
    while (dbg.length > 0)
    {
        debug(DwarfDebugMachine) printf("new debug program\n");
        const lp = readLineNumberProgram(dbg);

        LocationInfo lastLoc = LocationInfo(-1, -1);
        const(void)* lastAddress;

        debug(DwarfDebugMachine) printf("program:\n");
        runStateMachine(lp,
            (const(void)* address, LocationInfo locInfo, bool isEndSequence)
            {
                // adjust to ASLR offset
                address += baseAddress;
                debug (DwarfDebugMachine)
                    printf("-- offsetting %p to %p\n", address - baseAddress, address);

                foreach (ref loc; locations)
                {
                    // If loc.line != -1, then it has been set previously.
                    // Some implementations (eg. dmd) write an address to
                    // the debug data multiple times, but so far I have found
                    // that the first occurrence to be the correct one.
                    if (loc.line != -1)
                        continue;

                    // Can be called with either `locInfo` or `lastLoc`
                    void update(const ref LocationInfo match)
                    {
                        // File indices are 1-based for DWARF < 5
                        const fileIndex = match.file - (lp.dwarfVersion < 5 ? 1 : 0);
                        const sourceFile = lp.sourceFiles[fileIndex];
                        debug (DwarfDebugMachine)
                        {
                            printf("-- found for [%p]:\n", loc.address);
                            printf("-- file: %.*s\n",
                                   cast(int) sourceFile.file.length, sourceFile.file.ptr);
                            printf("-- line: %d\n", match.line);
                        }
                        // DMD emits entries with FQN, but other implementations
                        // (e.g. LDC) make use of directories
                        // See https://github.com/dlang/druntime/pull/2945
                        if (sourceFile.dirIndex != 0)
                            loc.directory = lp.includeDirectories[sourceFile.dirIndex - 1];

                        loc.file = sourceFile.file;
                        loc.line = match.line;
                        numberOfLocationsFound++;
                    }

                    // The state machine will not contain an entry for each
                    // address, as consecutive addresses with the same file/line
                    // are merged together to save on space, so we need to
                    // check if our address is within two addresses we get
                    // called with.
                    //
                    // Specs (DWARF v4, Section 6.2, PDF p.109) says:
                    // "We shrink it with two techniques. First, we delete from
                    //  the matrix each row whose file, line, source column and
                    //  discriminator information is identical with that of its
                    //  predecessors."
                    if (loc.address == address)
                        update(locInfo);
                    else if (lastAddress &&
                             loc.address > lastAddress && loc.address < address)
                        update(lastLoc);
                }

                if (isEndSequence)
                {
                    lastAddress = null;
                }
                else
                {
                    lastAddress = address;
                    lastLoc = locInfo;
                }

                return numberOfLocationsFound < locations.length;
            }
        );

        if (numberOfLocationsFound == locations.length) return;
    }
}
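
The range check at the heart of that callback can be isolated into a small standalone sketch. This assumes the line table has already been decoded into rows; `Row` and `lineFor` are hypothetical names for illustration and do not exist in druntime.

```d
/// One row of a hypothetical, already decoded line table.
struct Row
{
    size_t address;
    int line;
    bool endSequence;
}

/// Returns the line covering `target`, or -1 if no row covers it.
int lineFor(const(Row)[] rows, size_t target) pure nothrow @nogc @safe
{
    bool havePrev = false;
    Row prev;
    foreach (row; rows)
    {
        // Exact hit on a row's address
        if (row.address == target)
            return row.line;
        // Rows with identical file/line were merged away, so an address
        // strictly between two rows belongs to the earlier row.
        if (havePrev && target > prev.address && target < row.address)
            return prev.line;
        // An end-of-sequence row closes the range; nothing carries over.
        havePrev = !row.endSequence;
        prev = row;
    }
    return -1;
}
```

With rows at 0x100 (line 10), 0x120 (line 12), an end-of-sequence row at 0x140, and a new sequence at 0x200, an address such as 0x110 maps to line 10, while 0x180 falls in the gap after the end-of-sequence row and stays unresolved.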

@JohanEngelen
Member

I did not test this, but it looks like a regression to me (no more line info). I'm OK with making the atos thing opt-in, but I don't understand why it must be removed. Basically again we are left with backtraces without line info? Having to resort to a (yet non-existing) external implementation is very bad imo.

@LunaTheFoxgirl
Contributor Author

dladdr has been removed and a cleanup hook added. Also moved them to be run before and after the rest of the symbol resolution.

@LunaTheFoxgirl
Contributor Author

LunaTheFoxgirl commented May 16, 2025

> I did not test this, but it looks like a regression to me (no more line info). I'm OK with making the atos thing opt-in, but I don't understand why it must be removed. Basically again we are left with backtraces without line info? Having to resort to a (yet non-existing) external implementation is very bad imo.

As said in #4895, having good stack traces is important, but those stack traces should not come at the cost of it being practically infeasible to use D to release stuff on the App Store. At this point D is already capable of being used for that, and at least in my business it's a part of my future strategy to release i(Pad)OS versions of my software.

While yes, we could make the runtime only compile in atos stuff in debug mode, that does also mean there's extra friction for developers using DLang to develop apps; and also that it's likely to be useless anyways on Darwin derived OSes that don't have atos.

@JohanEngelen
Member

> While yes, we could make the runtime only compile in atos stuff in debug mode, that does also mean there's extra friction for developers using DLang to develop apps;

What is that extra friction? Do people put debug builds in the app store?

> and also that it's likely to be useless anyways on Darwin derived OSes that don't have atos.

This is not a valid argument to make things worse for macOS.

How about providing a separate small library with rt_dwarfSymbolicate overrides that call into atos, so that people can at least have the old (better ;)) behavior by adding -lsomelib to the compiler invoke? (something similar to the ldc_rt.dso.obj lib on windows)

@LunaTheFoxgirl
Contributor Author

LunaTheFoxgirl commented May 16, 2025

> > While yes, we could make the runtime only compile in atos stuff in debug mode, that does also mean there's extra friction for developers using DLang to develop apps;
>
> What is that extra friction? Do people put debug builds in the app store?
>
> > and also that it's likely to be useless anyways on Darwin derived OSes that don't have atos.
>
> This is not a valid argument to make things worse for macOS.
>
> How about providing a separate small library with rt_dwarfSymbolicate overrides that call into atos, so that people can at least have the old (better ;)) behavior by adding -lsomelib to the compiler invoke? (something similar to the ldc_rt.dso.obj lib on windows)

This is not only about the App Store; it also affects things such as making builds with the hardened runtime, which can end up making your life a headache if you want to sign and notarize your apps as well (including for macOS). Overall, I don't agree that this is the correct approach here.

And yeah, some may want to put release builds with debug info, for example, onto the App Store, and may also not know to pass in extra LDC-specific flags to link to the non-debug version of druntime, etc. While those can be somewhat solved by making dub more intelligent on the matter, it's just overall a massive ugly hack, and your application freezing for a couple of seconds while atos runs every time you generate a stack trace is a little excessive.

Also if you only care about macOS, just use lldb, it'll do the symbolication for you when something crashes. It comes with the dev tools.

@LunaTheFoxgirl
Contributor Author

Also, as a side note, it's the default behaviour to generate debug symbols for releases with i(Pad)OS apps and the like; Xcode does it for you on compile when using Swift or Objective-C, so it is overall expected that you will in fact have the debug symbols there even if it's technically in release mode. So if we, for example, just detected whether the auto-generated dSYMs were there (which Apple also forcefully does for you once you publish an app) instead of tying it to release-debug, then atos would still possibly be run. Additionally, having strings in druntime relating to atos might be enough to trigger a review rejection.

@schveiguy
Contributor

My opinion is that the easy experience for D should have stack traces with file/line. You need them when you are learning D, so that is what should be the default.

I'm OK with making D easier to release on the app store, and I'm hoping there's a way we can make it easy to do (it's OK to require some extra effort for this, with documentation). But it would be bad for the first experience with D to be unreadable stack traces.

@LunaTheFoxgirl
Contributor Author

LunaTheFoxgirl commented May 16, 2025

> My opinion is that the easy experience for D should have stack traces with file/line. You need them when you are learning D, so that is what should be the default.
>
> I'm OK with making D easier to release on the app store, and I'm hoping there's a way we can make it easy to do (it's OK to require some extra effort for this, with documentation). But it would be bad for the first experience with D to be unreadable stack traces.

As I said in the other thread, besides providing these hooks, making a new utility that embeds dSYMs into the DWARF section might be the way to go. That means LDC avoids ugly hacks in its source tree that just make it more difficult for existing D users to get their work done. My main goal right now is to ensure we root out the ugly hacks that break production software for businesses like mine or Auburn Sounds, who rely on D and LDC to make a living.

Once that's done, it'll be easier to take a holistic look at how best to approach the shortcomings that removing these hacks creates, through better tooling or writing custom implementations where needed.

@kinke
Member

kinke commented May 16, 2025

> My opinion is that the easy experience for D should have stack traces with file/line. You need them when you are learning D, so that is what should be the default.

Yeah my main concern is exactly that - the default experience on a dev box at least should be stack traces with resolved source Locs. Ideally with an acceptable runtime overhead etc., but that's secondary.

So a solution that depends on libatos (dynamically, i.e., copes with it not being available) would be fine to me as well - we depend on poorly documented stuff on Darwin already, that whole TLS disaster with macOS 15.4 was caused by Apple removing an API in macOS 10.15, and Jacob having to re-implement those finicky details in upstream druntime again. This stuff breaking was just a question of time I guess, and might happen again anytime.
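
A rough sketch of what "depends on it dynamically and copes with it not being available" could look like on Posix. The library path, the symbol name and the `SymbolicateFn` signature are placeholders invented for this example; nothing here describes a real Apple API or a library that is known to exist.

```d
import core.sys.posix.dlfcn : dlopen, dlsym, dlclose, RTLD_LAZY;

/// Hypothetical C entry point of an optional symbolication library.
extern (C) alias SymbolicateFn =
    int function(const(void)* address, char* buffer, size_t length) nothrow @nogc;

struct OptionalSymbolicator
{
    void* handle;          // null when the library is unavailable
    SymbolicateFn resolve; // null when the symbol is missing

    bool available() const { return resolve !is null; }
}

/// Tries to load the library; returns an empty struct when it is absent,
/// so the caller can fall back to the built-in resolution.
OptionalSymbolicator loadSymbolicator(const(char)* libraryPath,
                                      const(char)* symbolName)
{
    OptionalSymbolicator result;
    result.handle = dlopen(libraryPath, RTLD_LAZY);
    if (result.handle is null)
        return result; // library not present on this system

    result.resolve = cast(SymbolicateFn) dlsym(result.handle, symbolName);
    if (result.resolve is null)
    {
        // Library exists but lacks the entry point: treat as unavailable
        dlclose(result.handle);
        result.handle = null;
    }
    return result;
}
```

A caller would check `available` once and keep using the default resolver when it returns false, so a missing library degrades the backtrace instead of breaking it.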

@rikkimax
Contributor

One option here might be to add a dedicated object file that is linked automatically for executables on Posix. Pass a switch and it won't do this.

That object file can contain the atos stuff, giving the desired default.

However, this is unnecessary if the dlopen approach can ship.

@LunaTheFoxgirl
Contributor Author

> One option here might be to add a dedicated object file that is linked automatically for executables on Posix. Pass a switch and it won't do this.
>
> That object file can contain the atos stuff, giving the desired default.
>
> However, this is unnecessary if the dlopen approach can ship.

I'm not sure it can; Apple also scans strings. They'd easily find out that you are trying to load atos, or that you do so at some point in the application's life cycle.

If you try to hide this kind of stuff from them, they might revoke your dev and signing access.

@LunaTheFoxgirl
Contributor Author

On second thought, I'm currently down with a bit of a cold; I think I should have a harder think about what kind of tooling could be made to make things work out once I'm feeling better. So do hold off on merging this.

@dnadlinger
Member

dnadlinger commented May 16, 2025

Note that compiler-rt itself, in the sanitizer runtime, also shells out to atos on macOS for symbolizing. Has this caused any concrete issues with D projects being submitted to the App Store? Without any receipts to that end, I'm not sure whether there is something to fix here (though of course the implementation might have issues; #4895).

Using private frameworks directly is another issue, but we are not doing that. Of course, a more elegant solution could always be nice, but might be quite a bit of engineering effort.
