[SPARK-51562][SQL] Add the time() function #50479

Open. Wants to merge 8 commits into master.

Conversation

@senthh (Contributor) commented Apr 1, 2025

What changes were proposed in this pull request?

This PR adds a new function, time(), which casts an expr to TIME. The function is a synonym for CAST(expr AS TIME).

Examples:

```sql
SELECT time('12:25:13.45');
-- 12:25:13.45
SELECT time(timestamp'2020-04-30 12:25:13.45');
-- 12:25:13.45
SELECT time(123);
-- 00:02:03
```

Why are the changes needed?

The function gives users a convenient shorthand: time(expr) is a synonym for CAST(expr AS TIME).
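
As a quick illustration of the equivalence (a minimal spark-shell sketch; the output assumes the TIME support introduced by this PR series):

```scala
// Hypothetical check: time(expr) should give the same result as an
// explicit CAST(expr AS TIME).
scala> spark.sql("SELECT time('12:25:13.45') = CAST('12:25:13.45' AS TIME) AS same").show()
+----+
|same|
+----+
|true|
+----+
```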

Does this PR introduce any user-facing change?

No

How was this patch tested?

By running the related test suites:

```
$ build/sbt "test:testOnly *ExpressionInfoSuite"
$ build/sbt "test:testOnly *TimeExpressionsSuite"
$ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z time.sql"
```

and also manually with spark-shell:

```scala
scala> spark.sql("select time(123) as col").show()
+--------+
|     col|
+--------+
|00:02:03|
+--------+

scala> spark.sql("select time(timestamp'2020-04-30 00:25:13.45') as col").show()
+-----------+
|        col|
+-----------+
|00:25:13.45|
+-----------+

scala> spark.sql("select time('12:25:13.45') as col").show()
+-----------+
|        col|
+-----------+
|12:25:13.45|
+-----------+
```

Was this patch authored or co-authored using generative AI tooling?

No

The github-actions bot added the SQL label on Apr 1, 2025.
@senthh (Contributor, Author) commented Apr 1, 2025

@MaxGekk Could you please review this PR?

@MaxGekk (Member) left a comment

> This function is a synonym for CAST(expr AS TIME)

As you wrote in the PR description, the function is just a synonym of CAST. Let's just implement that precisely, and map time() to the Cast expression.
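
For reference, a minimal sketch of that mapping (hedged: `Cast` and `TimeType` are taken from Spark's Catalyst as of the TIME-type work; the builder function below is illustrative, not the exact registration code in FunctionRegistry):

```scala
import org.apache.spark.sql.catalyst.expressions.{Cast, Expression, Literal}
import org.apache.spark.sql.types.TimeType

// Sketch: resolve time(expr) to a plain Cast-to-TIME, with no
// dedicated expression class for the function itself.
def timeBuilder(args: Seq[Expression]): Expression =
  Cast(args.head, TimeType())

// time('12:25:13.45') would then resolve to Cast(Literal(...), TimeType())
val resolved = timeBuilder(Seq(Literal("12:25:13.45")))
```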

@senthh (Contributor, Author) commented Apr 2, 2025

> This function is a synonym for CAST(expr AS TIME)
>
> As you wrote in the PR description, the function is just a synonym of CAST. Let's just implement that precisely, and map time() to the Cast expression.

Sure @MaxGekk, I will change the implementation so that time() maps to the Cast expression.

MaxGekk and others added 7 commits April 2, 2025 12:41
### What changes were proposed in this pull request?
In the PR, I propose to modify the parquet schema converter for the TIME data type, and convert Catalyst's `TimeType(n)` to parquet physical type `INT64` annotated by the logical type:
```
TimeType(isAdjustedToUTC = false, unit = MICROS)
```
in the parquet writer.
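
For illustration, the converter case could look roughly like this (a sketch assuming the parquet-mr `Types` builder and `LogicalTypeAnnotation` API; the method name is illustrative, not the exact code in Spark's schema converter):

```scala
import org.apache.parquet.schema.{LogicalTypeAnnotation, Type, Types}
import org.apache.parquet.schema.LogicalTypeAnnotation.TimeUnit
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.INT64
import org.apache.parquet.schema.Type.Repetition

// Sketch: map Catalyst's TimeType(n) to a Parquet INT64 annotated with
// TIME(isAdjustedToUTC = false, unit = MICROS).
def convertTimeField(name: String, repetition: Repetition): Type =
  Types.primitive(INT64, repetition)
    .as(LogicalTypeAnnotation.timeType(false, TimeUnit.MICROS))
    .named(name)
```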

### Why are the changes needed?
To fix a failure of the non-vectorized reader. The code below demonstrates the issue:
```scala
scala> spark.conf.set("spark.sql.parquet.enableVectorizedReader", false)
scala> spark.read.parquet("/Users/maxim.gekk/tmp/time_parquet3").show()

org.apache.spark.SparkRuntimeException: [PARQUET_CONVERSION_FAILURE.UNSUPPORTED] Unable to create a Parquet converter for the data type "TIME(6)" whose Parquet type is optional int64 col. Please modify the conversion making sure it is supported. SQLSTATE: 42846
  at org.apache.spark.sql.errors.QueryExecutionErrors$.cannotCreateParquetConverterForDataTypeError(QueryExecutionErrors.scala:2000)
```

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By running the modified test:
```
$ build/sbt "test:testOnly *ParquetFileFormatV1Suite"
$ build/sbt "test:testOnly *ParquetFileFormatV2Suite"
```

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#50476 from MaxGekk/parquet-time-nested.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
### What changes were proposed in this pull request?

This PR adds DSv2 APIs for constraints as per SPIP [doc](https://docs.google.com/document/d/1EHjB4W1LjiXxsK_G7067j9pPX0y15LUF1Z5DlUPoPIo/).
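
Purely as an illustration of the shape such APIs take (the names below are hypothetical placeholders, not the interfaces merged in apache#50253; see the SPIP doc for the real design):

```scala
// Hypothetical sketch of a DSv2-style constraint API.
trait Constraint {
  def name: String
  /** SQL text of the constraint, e.g. "CHECK (price > 0)". */
  def toDDL: String
}

// A table implementing this trait could report its constraints to Spark.
trait SupportsConstraints {
  def constraints: Array[Constraint]
}
```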

### Why are the changes needed?

These changes are the first step for constraints support in Spark.

### Does this PR introduce _any_ user-facing change?

This PR adds new public interfaces that will be supported in the future.

### How was this patch tested?

This PR comes with tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#50253 from aokolnychyi/spark-51441.

Authored-by: Anton Okolnychyi <aokolnychyi@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
…ith_state

### What changes were proposed in this pull request?

Fixed a test failure in test_pandas_transform_with_state, mentioned in apache#50349 (comment)

### Why are the changes needed?

To unblock Spark 4.0.0 release

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Test-only change.

### Was this patch authored or co-authored using generative AI tooling?

Closes apache#50482 from bogao007/test-fix-list-state.

Authored-by: bogao007 <bo.gao@databricks.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
… thrown in `MLUtils.loadOperator`

### What changes were proposed in this pull request?

Currently, if model loading fails in `MLUtils.loadOperator`, it throws an [InvocationTargetException](https://docs.oracle.com/javase/8/docs/api/java/lang/reflect/InvocationTargetException.html), which wraps an exception thrown by the method invoked.

As a follow-up of apache#50098, this PR unwraps the InvocationTargetException thrown in `MLUtils.loadOperator`, so that the correct underlying exception is thrown.
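
The unwrapping itself is small; a minimal sketch (the helper name is illustrative, not the exact code in `MLUtils.loadOperator`):

```scala
import java.lang.reflect.InvocationTargetException

// Sketch: rethrow the underlying cause instead of the reflective wrapper.
def invokeUnwrapped[T](call: => T): T =
  try call
  catch {
    case e: InvocationTargetException if e.getCause != null =>
      throw e.getCause
  }
```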

### Why are the changes needed?

The existing error message is not useful for debugging.

### Does this PR introduce _any_ user-facing change?

Yes. For example,

```
from pyspark.ml.classification import LogisticRegression
LogisticRegression.load("invalid_location")
```

Before this PR, this code throws:
```
java.lang.reflect.InvocationTargetException
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
	at org.apache.spark.sql.connect.ml.MLUtils$.loadOperator(MLUtils.scala:417)
	at org.apache.spark.sql.connect.ml.MLUtils$.loadTransformer(MLUtils.scala:438)
	at org.apache.spark.sql.connect.ml.MLHandler$.handleMlCommand(MLHandler.scala:247)
```

With this PR, it will throw the correct exception:
```
java.io.FileNotFoundException: File invalid_location/metadata does not exist
	at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:917)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1238)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:907)
	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:462)
	at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:56)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:381)
```

### How was this patch tested?

New test.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#50478 from xi-db/ml-connect-unwrap-InvocationTargetException.

Authored-by: Xi Lyu <xi.lyu@databricks.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
…ugin` configuration from connect related modules

### What changes were proposed in this pull request?
This PR aims to remove the redundant `build-helper-maven-plugin` configuration from the connect-related modules, because the directory it specifies does not exist in the current codebase.

### Why are the changes needed?
To remove the redundant `build-helper-maven-plugin` configuration from the connect-related modules.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#50477 from LuciferYang/minor-client-jvm-plugin.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
…ahead of commit log correctly

### What changes were proposed in this pull request?
When State Store Checkpoint format V2 is used, always read the checkpoint ID back from the commit log, rather than only when the commit log matches the offset log.

### Why are the changes needed?
Right now, there is a bug in reading the checkpoint ID from the commit log when the query restarts: if the offset log is ahead of the commit log, the ID is not read, so the tasks have no checkpoint ID to recover from and the query fails.
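
To make the control-flow change concrete, a sketch (all names here are hypothetical stand-ins, not Spark's actual internals):

```scala
// Hypothetical commit log entry carrying per-store checkpoint IDs.
case class CommitLogEntry(batchId: Long, checkpointIds: Map[Long, String])

def checkpointIdsToRecover(
    latestCommit: Option[CommitLogEntry],
    latestOffsetBatchId: Long): Map[Long, String] = {
  // Before the fix: IDs were read only when the commit log had caught up
  // with the offset log, so a restart with the offset log one batch ahead
  // found nothing to recover from.
  // After the fix: read them from the latest commit log entry regardless.
  latestCommit.map(_.checkpointIds).getOrElse(Map.empty)
}
```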

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Add a unit test that will fail without the fix.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#50480 from siying/checkpoint_v2_commit_read.

Authored-by: Siying Dong <dong.sy@gmail.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
@senthh (Contributor, Author) commented Apr 2, 2025

@MaxGekk Closing this PR as the commits got messed up; I will create a new PR.
