Some ETLs and views may be silently unioning data incorrectly #7461
Comments
➤ Sean Rose commented: #6916 (https://github.com/mozilla/bigquery-etl/pull/6916) will allow us to programmatically check for possibly incorrect unions. Here are the results of a test run of that code:
➤ Ben Wu commented: Did you run this on the generated SQL? I would expect more of these errors in the generated SQL.
➤ Sean Rose commented: Yes, I ran it on everything in the sql directory on the private-generated-sql (https://github.com/mozilla/private-bigquery-etl/tree/private-generated-sql/sql) branch. The only blind spot I’m aware of is that there were 26 cases where I didn’t have the necessary permissions on the tables/views being selected from and got a 403 Forbidden error from BigQuery.
While investigating an error relating to unioning ping data, [~accountid:6047cd5cd7f56e0071965b2d] noticed that the `type` and `enrollment` columns in the `ping_info.experiments[].value.extra` struct are in different orders in various pings. When such pings are unioned together as-is, BigQuery won’t complain because their column types are compatible, which could result in data silently ending up in the wrong column in the union output for some pings.
This has been manually worked around in a couple of cases recently (bigquery-etl#6878, bigquery-etl#6887), but there may be other such cases we don’t yet know about.
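
To make the failure mode concrete, here is a minimal sketch of how positional matching can go wrong. It is not the check from #6916. The `Field` type, the `positional_union_mismatches` helper, and the assumption that both struct fields are `STRING` are hypothetical and only for illustration.

```python
from typing import NamedTuple


class Field(NamedTuple):
    name: str
    type: str  # e.g. "STRING" (assumed here purely for illustration)


def positional_union_mismatches(first: list[Field], other: list[Field]) -> list[str]:
    """Describe positions where a positional union would silently mix up fields.

    Fields are paired by position. A pair is reported when the types are the
    same (so no type error would be raised) but the names differ.
    """
    problems = []
    for position, (a, b) in enumerate(zip(first, other)):
        if a.type == b.type and a.name != b.name:
            problems.append(
                f"position {position}: `{b.name}` values would land in `{a.name}`"
            )
    return problems


# Hypothetical layouts of the `extra` struct in two different pings.
ping_a_extra = [Field("type", "STRING"), Field("enrollment", "STRING")]
ping_b_extra = [Field("enrollment", "STRING"), Field("type", "STRING")]

for problem in positional_union_mismatches(ping_a_extra, ping_b_extra):
    print(problem)
```

In this toy case both positions are reported, because each field of the second ping would end up under the other field's name in the union output.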
It’s possible the `Schema.generate_compatible_select_expression()` method (code) could be used to help with this situation (it’s currently used for unioning pings in the Glean app ping views).
Issue is synchronized with this Jira Bug
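
As a rough illustration of the general idea of selecting struct fields by name rather than by position, the sketch below builds a SQL expression that rebuilds a struct in a fixed field order. It is not the implementation of `Schema.generate_compatible_select_expression()`. The `reorder_struct_expression` helper and the `value.extra` path are hypothetical, and identifier quoting and the surrounding array handling are left out.

```python
def reorder_struct_expression(struct_path: str, target_field_order: list[str]) -> str:
    """Build a SQL expression that rebuilds a struct with fields in a fixed order.

    Each field is selected by name, so the result no longer depends on the
    order in which the source table declares the fields.
    """
    # Use the last path component as the output column alias.
    alias = struct_path.rsplit(".", 1)[-1]
    fields = ", ".join(
        f"{struct_path}.{name} AS {name}" for name in target_field_order
    )
    return f"STRUCT({fields}) AS {alias}"


# Hypothetical usage: force the same field order in every branch of a union.
expression = reorder_struct_expression("value.extra", ["type", "enrollment"])
print(expression)
```

Applying the same target order to every query being unioned would make each branch emit the struct fields in the same positions, regardless of how the underlying ping table orders them.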