[SPARK-51695][SQL] Introduce Parser Changes for Table Constraints (CHECK, UNIQUE, PK, FK) #50496
Conversation
```scala
case class UniqueConstraint(
    columns: Seq[String],
    override val name: String = null,
```
Have you considered making `name` an `Option[String]`, so we can use `None` instead of `null` for the case without a name?
It seems more consistent (e.g., `ConstraintCharacteristic` uses `Option` to represent unspecified `enforced` and `rely`).
This is by design. The implementations of `withName` and `withCharacteristic` are simpler and more consistent this way.
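To illustrate the trade-off, here is a minimal, hypothetical sketch (simplified shapes, not the PR's actual definitions): with a plain `String` field defaulting to `null`, `withName` is a uniform one-line `copy` on every subclass, whereas an `Option[String]` field would force wrapping at each call site.

```scala
// Hypothetical simplified shapes, for illustration only.
trait TableConstraint {
  def name: String
  def withName(name: String): TableConstraint
}

case class UniqueConstraint(
    columns: Seq[String],
    override val name: String = null) extends TableConstraint {
  // Uniform across constraint types; with Option[String] this would
  // become copy(name = Some(name)) and callers would unwrap elsewhere.
  override def withName(name: String): TableConstraint = copy(name = name)
}
```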
s"${tableName}_chk_${base}_$rand" | ||
} | ||
|
||
override def sql: String = s"CONSTRAINT $name CHECK ($condition)" |
Hmm, if `name` is null, what does this `sql` produce? `CONSTRAINT null CHECK ...`? Is that valid?
If the constraint name is not provided, all the constraints will generate names. See the method `generateConstraintNameIfNeeded` for details.
Yea, I saw there is `generateConstraintNameIfNeeded`, but I am not sure when it will be used. So if no `name` is provided (i.e., `name = null`), Spark will generate a name for it.
Yes, the name is always non-null.
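For reference, a minimal sketch of that generation scheme, following the `${tableName}_chk_${base}_$rand` pattern visible in the diff above (helper shape and exact inputs are illustrative, not the PR's actual code):

```scala
import scala.util.Random

// Illustrative only: returns the user-supplied name if present, otherwise
// builds one from the table name, a constraint-type tag, and a random suffix.
def generateConstraintNameIfNeeded(name: String, tableName: String, base: String): String = {
  if (name != null) {
    name
  } else {
    val rand = Random.alphanumeric.take(7).mkString.toLowerCase
    s"${tableName}_chk_${base}_$rand"
  }
}
```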
```scala
import org.apache.spark.sql.catalyst.plans.logical.DropConstraint
import org.apache.spark.sql.test.SharedSparkSession

class AlterTableDropConstraintParseSuite extends AnalysisTest with SharedSparkSession {
```
This is for `DropConstraint`. Is there another test suite for `AddConstraint` too? I can't seem to find it.
It is covered in the other test suites, such as `CheckConstraintParseSuite`, `PrimaryKeyConstraintParseSuite`, etc.
```
    | PRIMARY KEY
    ;

uniqueConstraint
```
Minor: it took me a while to find PK; it wasn't obvious that it's part of `uniqueSpec`. I would consider splitting them for clarity, but I also see that you probably wanted to cut down on the number of rules.
Either way should work. This follows the ANSI SQL syntax, BTW.
```scala
  }
}

// Generate a constraint name based on the table name if the name is not specified
```
What if the table is renamed later? It seems we can simplify this logic quite a bit without the need to include the table name. I understand it is done to distinguish constraints, but I wonder if we can leverage the catalog, namespace, and table identifier in that code rather than attempting to generate a unique-enough name here.
I do understand we replicate Postgres behavior here, but we don't guarantee there are no duplicates. Let's discuss a bit more how we can implement things like the `pg_constraint` table.
> I wonder if we can leverage the catalog, namespace, and table identifier in that code rather than attempting to generate a unique enough name here?

This is possible for PK & FK, but hard for CHECK and UNIQUE constraints. I am OK with including the catalog and namespace, probably in the analyzer rule `ResolveTableSpec`. However, it is a bit tricky to get the table name from the current V2 `CreateTable` plan:
```scala
case class CreateTable(
    name: LogicalPlan,
    columns: Seq[ColumnDefinition],
    partitioning: Seq[Transform],
    tableSpec: TableSpecBase,
    ignoreIfExists: Boolean)
```
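One possible way to get at the identifier, assuming `name` still holds the parser-produced `UnresolvedIdentifier` at that point (a sketch only; the actual resolution path in `ResolveTableSpec` may differ):

```scala
import org.apache.spark.sql.catalyst.analysis.UnresolvedIdentifier

// Sketch: recover the catalog/namespace/table name parts from the V2
// CreateTable plan shown above, if the identifier is still unresolved.
def tableNameParts(create: CreateTable): Option[Seq[String]] =
  create.name match {
    case UnresolvedIdentifier(nameParts, _) => Some(nameParts)
    case _ => None
  }
```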
```scala
   * @param ctx Parser context for error reporting
   * @return New TableConstraint instance
   */
  def withCharacteristic(c: ConstraintCharacteristic, ctx: ParserRuleContext): TableConstraint
```
I wonder if it is a good idea to depend on `ParserRuleContext` here. It feels like `TableConstraint`, which is mixed into expressions, also handles parsing aspects. Can we handle name generation and characteristics in the code that instantiates these constraints?
This is for better error messages. If `ParseException` could accept the current origin without specifying the start/stop `Origin`, we could remove this parameter.
Since these are internal classes, I suggest we improve this in a follow-up.
```scala
      c: ConstraintCharacteristic,
      ctx: ParserRuleContext): TableConstraint = {
    if (c.enforced.contains(true)) {
      throw new ParseException(
```
I feel we have to parse the characteristics and validate them prior to constructing this class.
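Something along those lines could look like this: a sketch under the assumption that validation moves into the `AstBuilder` and runs before the constraint object is constructed, so the constraint classes never see a `ParserRuleContext`. All names, the stand-in class, and the error message are illustrative, not the PR's code.

```scala
import org.antlr.v4.runtime.ParserRuleContext
import org.apache.spark.sql.catalyst.parser.ParserUtils.operationNotAllowed

// Stand-in for the PR's class; per the discussion above it models
// unspecified values with Option.
case class ConstraintCharacteristic(enforced: Option[Boolean], rely: Option[Boolean])

// Illustrative pre-construction validation in the builder.
def validateConstraintCharacteristic(
    c: ConstraintCharacteristic,
    ctx: ParserRuleContext): Unit = {
  if (c.enforced.contains(true)) {
    operationNotAllowed("ENFORCED is not supported for this constraint type", ctx)
  }
}
```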
```scala
}

case class UniqueConstraint(
    columns: Seq[String],
```
Question: If I remember correctly, PK can't reference nested columns. What about UNIQUE? Asking to confirm whether this should be a sequence of name parts.
I don't think nested sub-columns (e.g. `col_1.col_2`) are supported in unique constraints. The syntax for UNIQUE is mostly the same as for PK.
cc @srielau for confirmation
@aokolnychyi @viirya All the tests passed. Any further comments on this one?
I have no more comments. See if others still have any.
@srielau @aokolnychyi @viirya Thanks for the review. I am merging this one to master.
What changes were proposed in this pull request?
This PR introduces parser support for ANSI SQL-compatible table constraints: CHECK, UNIQUE, PRIMARY KEY, and FOREIGN KEY.
The updated parser supports these constraints in CREATE TABLE and ALTER TABLE (ADD/DROP CONSTRAINT) statements.
Key Features
Table Constraint Characteristics
Constraints can carry optional characteristics for enforcement (`ENFORCED`) and reliance (`RELY`). Spark does not rely on a constraint for optimizations unless `RELY` is specified.
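For example (illustrative statements only; this assumes a `SparkSession` named `spark`, and the table/constraint names are made up; the exact accepted combinations are defined by the updated grammar):

```scala
// CHECK constraint with an explicit ENFORCED characteristic.
spark.sql("CREATE TABLE t (id INT, CONSTRAINT id_gt_0 CHECK (id > 0) ENFORCED)")

// RELY signals that the optimizer may rely on the constraint.
spark.sql("ALTER TABLE t ADD CONSTRAINT pk PRIMARY KEY (id) RELY")
```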
✅ CHECK Constraints
🔑 PRIMARY KEY Constraints
🔐 UNIQUE Constraints
🔗 FOREIGN KEY Constraints
⚙️ ALTER TABLE
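Illustrative end-to-end examples of the new syntax (sketches: table and column names are made up, connector support is assumed, and the exact accepted forms are defined by the updated `SqlBaseParser.g4`):

```scala
spark.sql("""
  CREATE TABLE orders (
    id INT,
    customer_id INT,
    amount DOUBLE,
    CONSTRAINT orders_pk PRIMARY KEY (id),
    CONSTRAINT orders_fk FOREIGN KEY (customer_id) REFERENCES customers (id),
    UNIQUE (customer_id, id),
    CHECK (amount >= 0)
  )
""")

spark.sql("ALTER TABLE orders ADD CONSTRAINT amount_positive CHECK (amount > 0)")
spark.sql("ALTER TABLE orders DROP CONSTRAINT amount_positive")
```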
Why are the changes needed?
Allow users to define, modify, and enforce table constraints in connectors that support them. This will improve data accuracy, ensure consistency, and enable performance optimizations in Spark.
Does this PR introduce any user-facing change?
Yes, it introduces parser changes for table constraints (CHECK, UNIQUE, PRIMARY KEY, FOREIGN KEY).
How was this patch tested?
New tests
Was this patch authored or co-authored using generative AI tooling?
No