
[SPARK-51695][SQL] Introduce Parser Changes for Table Constraints (CHECK, UNIQUE, PK, FK) #50496

Closed · 23 commits

Conversation

gengliangwang (Member) commented Apr 2, 2025

What changes were proposed in this pull request?


This PR introduces parser support for ANSI SQL-compatible table constraints, including:

  • CHECK
  • UNIQUE
  • PRIMARY KEY
  • FOREIGN KEY

The updated parser supports these constraints in the following statements:

  • CREATE TABLE
  • REPLACE TABLE
  • ALTER TABLE ... ADD CONSTRAINT

Key Features

  • Constraints can be named or unnamed.
  • Constraints can appear as:
    • Column constraints, at the end of a column definition.
    • Table constraints, declared among table elements (in any order).
    • ALTER TABLE … ADD CONSTRAINT statements.
  • Named constraints can be dropped via ALTER TABLE … DROP CONSTRAINT.

Table Constraint Characteristics

  • ENFORCED: The constraint is validated by the Spark engine during write operations. If data violates the constraint, Spark will raise an error.
  • NOT ENFORCED: Spark does not validate the constraint during data writes; it’s treated as metadata only.
  • RELY: A user-provided hint that the constraint is known to be valid. This allows Spark to apply query optimizations based on the assumption that the constraint holds.
  • NORELY: The default.

Spark does not rely on the constraint for optimizations unless:

  • It is explicitly marked as RELY, or
  • It is ENFORCED and has been validated by Spark.
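
For illustration, the characteristic clause trails the constraint definition, as in the examples below. A minimal sketch (assuming ENFORCED is accepted for CHECK constraints; the table names t5/t6 are made up):

-- CHECK constraint validated on write
CREATE TABLE t5 (
  age INT,
  CONSTRAINT ck_age CHECK (age > 0) ENFORCED
);

-- Metadata-only PRIMARY KEY that the optimizer may trust
CREATE TABLE t6 (
  id INT,
  CONSTRAINT pk_id PRIMARY KEY (id) NOT ENFORCED RELY
);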

✅ CHECK Constraints

-- Column-level, unnamed
CREATE TABLE t1 (
  age INT CHECK (age > 0)
);

-- Column-level, named
CREATE TABLE t2 (
  age INT CONSTRAINT ck_age CHECK (age > 0)
);

-- Table-level, unnamed
CREATE TABLE t3 (
  age INT,
  CHECK (age > 0)
);

-- Table-level, named
CREATE TABLE t4 (
  age INT,
  CONSTRAINT ck_age CHECK (age > 0)
);

🔑 PRIMARY KEY Constraints

-- Column-level, unnamed
CREATE TABLE t1 (
  id INT PRIMARY KEY
);

-- Column-level, named, RELY
CREATE TABLE t2 (
  id INT CONSTRAINT pk_id PRIMARY KEY RELY
);

-- Table-level, unnamed, NORELY
CREATE TABLE t3 (
  id INT,
  PRIMARY KEY (id) NORELY
);

-- Table-level, named
CREATE TABLE t4 (
  id INT,
  CONSTRAINT pk_id PRIMARY KEY (id)
);

🔐 UNIQUE Constraints

-- Column-level, unnamed
CREATE TABLE t1 (
  email STRING UNIQUE
);

-- Column-level, named
CREATE TABLE t2 (
  email STRING CONSTRAINT uq_email UNIQUE
);

-- Table-level, unnamed
CREATE TABLE t3 (
  email STRING,
  UNIQUE (email)
);

-- Table-level, named
CREATE TABLE t4 (
  email STRING,
  CONSTRAINT uq_email UNIQUE (email)
);

🔗 FOREIGN KEY Constraints

CREATE TABLE dept (id INT PRIMARY KEY);

-- Column-level, unnamed
CREATE TABLE emp1 (
  dept_id INT REFERENCES dept(id)
);

-- Column-level, named
CREATE TABLE emp2 (
  dept_id INT CONSTRAINT fk_dept_col REFERENCES dept(id)
);

-- Table-level, unnamed
CREATE TABLE emp3 (
  dept_id INT,
  FOREIGN KEY (dept_id) REFERENCES dept(id)
);

-- Table-level, named
CREATE TABLE emp4 (
  dept_id INT,
  CONSTRAINT fk_dept_tbl FOREIGN KEY (dept_id) REFERENCES dept(id)
);

⚙️ ALTER TABLE

-- Add named constraint
ALTER TABLE t ADD CONSTRAINT ck_positive CHECK (amount > 0);

-- Add unnamed constraint
ALTER TABLE t ADD UNIQUE (email);
ALTER TABLE t ADD PRIMARY KEY (id);
ALTER TABLE t ADD FOREIGN KEY (dept_id) REFERENCES dept(id);
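
-- A sketch of characteristics on ALTER TABLE ... ADD, assuming the same
-- trailing-clause syntax as in the CREATE TABLE examples (not taken
-- verbatim from this PR):
ALTER TABLE t ADD CONSTRAINT pk_id PRIMARY KEY (id) RELY;
ALTER TABLE t ADD FOREIGN KEY (dept_id) REFERENCES dept(id) NOT ENFORCED;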

-- Drop named constraint
ALTER TABLE t DROP CONSTRAINT ck_positive;

Why are the changes needed?

Allow users to define, modify, and enforce table constraints in connectors that support them. This will facilitate data accuracy, ensure consistency, and enable performance optimizations in Spark.

Does this PR introduce any user-facing change?

Yes. It introduces parser changes for table constraints (CHECK, UNIQUE, PRIMARY KEY, FOREIGN KEY).

How was this patch tested?

New parser unit tests.

Was this patch authored or co-authored using generative AI tooling?

No

gengliangwang (Member Author)

cc @aokolnychyi @srielau


case class UniqueConstraint(
    columns: Seq[String],
    override val name: String = null,
Member

Did you consider making name an Option[String], so we can use None instead of null for the case without a name?

viirya (Member) commented Apr 2, 2025

It seems more consistent (e.g., ConstraintCharacteristic uses Option to represent unspecified enforced and rely).

gengliangwang (Member Author)

This is by design. The implementations of withName and withCharacteristic are simpler and more consistent this way.

s"${tableName}_chk_${base}_$rand"
}

override def sql: String = s"CONSTRAINT $name CHECK ($condition)"
viirya (Member) commented Apr 2, 2025

Hmm, if name is null, what does this sql produce? CONSTRAINT null CHECK ...? Is that valid?

gengliangwang (Member Author)

If the constraint name is not provided, all constraints get generated names. See the method generateConstraintNameIfNeeded for details.

Member

Yeah, I saw there is generateConstraintNameIfNeeded, but I was not sure when it would be used. So if no name is provided (i.e., name = null), Spark will generate a name for it.

gengliangwang (Member Author)

Yes, the name is never null.
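
To make this concrete, a hypothetical example of the generated-name scheme (the random suffix and the base part are made up; only the format follows the s"${tableName}_chk_${base}_$rand" snippet above):

-- Unnamed CHECK constraint: Spark generates a name on the user's behalf
CREATE TABLE t (age INT, CHECK (age > 0));

-- The generated name (hypothetically t_chk_age_x9q2f) can then be used:
-- ALTER TABLE t DROP CONSTRAINT t_chk_age_x9q2f;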

import org.apache.spark.sql.catalyst.plans.logical.DropConstraint
import org.apache.spark.sql.test.SharedSparkSession

class AlterTableDropConstraintParseSuite extends AnalysisTest with SharedSparkSession {
Member

This is for DropConstraint. Is there another test suite for AddConstraint too? It seems I can't find it.

gengliangwang (Member Author)

It is covered in the other test suites, such as CheckConstraintParseSuite, PrimaryKeyConstraintParseSuite, etc.

| PRIMARY KEY
;

uniqueConstraint
Contributor

Minor: it took me a while to find PK; it wasn't obvious that it is part of uniqueSpec. I would consider splitting them for clarity, but I also see that you probably wanted to cut down on the number of rules.

gengliangwang (Member Author)

Either way should work. This is from the ANSI SQL syntax, BTW.

}
}

// Generate a constraint name based on the table name if the name is not specified
aokolnychyi (Contributor) commented Apr 3, 2025

What if the table is renamed later? It seems we can simplify this logic quite a bit without the need to include the table name. I understand it is done to distinguish constraints but I wonder if we can leverage the catalog, namespace, table identifier in that code rather than attempting to generate a unique enough name here?

Contributor

I do understand we replicate Postgres behavior here, but we don't guarantee there are no duplicates. Let's discuss a bit more how we can implement something like the pg_constraint table.

gengliangwang (Member Author)

> I wonder if we can leverage the catalog, namespace, table identifier in that code rather than attempting to generate a unique enough name here?

This is possible for PK & FK, but hard for CHECK and UNIQUE constraints.

I am OK with including catalog and namespace, probably in the analyzer rule ResolveTableSpec. However, it is a bit tricky to get the table name from the current V2 CreateTable plan:

case class CreateTable(
    name: LogicalPlan,
    columns: Seq[ColumnDefinition],
    partitioning: Seq[Transform],
    tableSpec: TableSpecBase,
    ignoreIfExists: Boolean)

* @param ctx Parser context for error reporting
* @return New TableConstraint instance
*/
def withCharacteristic(c: ConstraintCharacteristic, ctx: ParserRuleContext): TableConstraint
aokolnychyi (Contributor) commented Apr 3, 2025

I wonder if it is a good idea to depend on ParserRuleContext here. It feels like TableConstraint, which is mixed into expressions, also handles parsing aspects. Can we handle name generation and characteristics in the code that instantiates these constraints?

gengliangwang (Member Author)

This is for better error messages. If ParseException could accept the current origin without specifying the start/stop Origin, we could remove this parameter.
Since these are internal classes, I suggest a follow-up to improve it.

    c: ConstraintCharacteristic,
    ctx: ParserRuleContext): TableConstraint = {
  if (c.enforced.contains(true)) {
    throw new ParseException(
Contributor

I feel we have to parse the characteristics and validate them prior to constructing this class.

gengliangwang (Member Author) commented Apr 3, 2025

The idea is to parse characteristics and constraints separately.

[screenshot: grammar rules showing the constraint and characteristic clauses parsed separately]

Otherwise, we would need duplicate syntax rules for each constraint and would have to put the characteristics checks in the AstBuilder (which has over 6k lines of code).

}

case class UniqueConstraint(
    columns: Seq[String],
Contributor

Question: If I remember correctly, PK can't reference nested columns. What about UNIQUE? Asking to confirm whether this should be a sequence of name parts.

gengliangwang (Member Author)

I don't think nested sub-columns (e.g., col_1.col_2) are supported in unique constraints. The syntax for UNIQUE is mostly the same as for PK.
cc @srielau for confirmation
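
For reference, a sketch of what the columns: Seq[String] shape implies (multi-column keys over top-level columns; the STRUCT example is hypothetical):

-- Top-level columns, including multi-column keys
CREATE TABLE t (a INT, b INT, UNIQUE (a, b));

-- Nested sub-columns are not expected to parse, per this thread:
-- CREATE TABLE t2 (s STRUCT<x: INT>, UNIQUE (s.x));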

gengliangwang (Member Author) commented Apr 4, 2025

@aokolnychyi @viirya All the tests passed. Any further comments on this one?
I will have follow-ups to revisit the points discussed above (e.g., the ParserRuleContext dependency).

viirya (Member) left a comment

I have no more comments. See if others still have some comments.

gengliangwang (Member Author)

@srielau @aokolnychyi @viirya Thanks for the review. I am merging this one to master.
