[SPARK-51562][SQL] Add the time() function #50479

Open. Wants to merge 8 commits into master.

Conversation

@senthh (Contributor) commented Apr 1, 2025

What changes were proposed in this pull request?

This PR adds a new function, time(), which casts an expr to TIME. The function is a synonym for CAST(expr AS TIME).

Examples:

```sql
SELECT time('12:25:13.45');
-- 12:25:13.45
SELECT time(timestamp'2020-04-30 12:25:13.45');
-- 12:25:13.45
SELECT time(123);
-- 00:02:03
```

Why are the changes needed?

The function gives users a convenient shorthand: time(expr) is a synonym for CAST(expr AS TIME).
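
As a quick illustration of the equivalence (a minimal spark-shell sketch; the output assumes the TIME support introduced by this PR series):

```scala
// Hypothetical check: time(expr) should give the same result as an
// explicit CAST(expr AS TIME).
scala> spark.sql("SELECT time('12:25:13.45') = CAST('12:25:13.45' AS TIME) AS same").show()
+----+
|same|
+----+
|true|
+----+
```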

Does this PR introduce any user-facing change?

No

How was this patch tested?

By running the related test suites:

```
$ build/sbt "test:testOnly *ExpressionInfoSuite"
$ build/sbt "test:testOnly *TimeExpressionsSuite"
$ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z time.sql"
```

and also manually with spark-shell:

```scala
scala> spark.sql("select time(123) as col").show()
+--------+
|     col|
+--------+
|00:02:03|
+--------+

scala> spark.sql("select time(timestamp'2020-04-30 00:25:13.45') as col").show()
+-----------+
|        col|
+-----------+
|00:25:13.45|
+-----------+

scala> spark.sql("select time('12:25:13.45') as col").show()
+-----------+
|        col|
+-----------+
|12:25:13.45|
+-----------+
```

Was this patch authored or co-authored using generative AI tooling?

No

The github-actions bot added the SQL label on Apr 1, 2025.
@senthh (Contributor, Author) commented Apr 1, 2025

@MaxGekk Could you please review this PR?

@MaxGekk (Member) left a comment

> This function is a synonym for CAST(expr AS TIME)

As you wrote in the PR description, the function is just a synonym of CAST. Let's just implement that precisely, and map time() to the Cast expression.
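
For reference, a minimal sketch of that mapping (hedged: `Cast` and `TimeType` are taken from Spark's Catalyst as of the TIME-type work; the builder function below is illustrative, not the exact registration code in FunctionRegistry):

```scala
import org.apache.spark.sql.catalyst.expressions.{Cast, Expression, Literal}
import org.apache.spark.sql.types.TimeType

// Sketch: resolve time(expr) to a plain Cast-to-TIME, with no
// dedicated expression class for the function itself.
def timeBuilder(args: Seq[Expression]): Expression =
  Cast(args.head, TimeType())

// time('12:25:13.45') would then resolve to Cast(Literal(...), TimeType())
val resolved = timeBuilder(Seq(Literal("12:25:13.45")))
```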

@senthh (Contributor, Author) commented Apr 2, 2025

> This function is a synonym for CAST(expr AS TIME)
>
> As you wrote in the PR description, the function is just a synonym of CAST. Let's just implement that precisely, and map time() to the Cast expression.

Sure @MaxGekk, I will change the implementation so that time() maps to the Cast expression.

MaxGekk and others added 7 commits April 2, 2025 12:41
### What changes were proposed in this pull request?
In the PR, I propose to modify the parquet schema converter for the TIME data type, and convert Catalyst's `TimeType(n)` to parquet physical type `INT64` annotated by the logical type:
```
TimeType(isAdjustedToUTC = false, unit = MICROS)
```
in the parquet writer.
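
For illustration, the converter case could look roughly like this (a sketch assuming the parquet-mr `Types` builder and `LogicalTypeAnnotation` API; the method name is illustrative, not the exact code in Spark's schema converter):

```scala
import org.apache.parquet.schema.{LogicalTypeAnnotation, Type, Types}
import org.apache.parquet.schema.LogicalTypeAnnotation.TimeUnit
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.INT64
import org.apache.parquet.schema.Type.Repetition

// Sketch: map Catalyst's TimeType(n) to a Parquet INT64 annotated with
// TIME(isAdjustedToUTC = false, unit = MICROS).
def convertTimeField(name: String, repetition: Repetition): Type =
  Types.primitive(INT64, repetition)
    .as(LogicalTypeAnnotation.timeType(false, TimeUnit.MICROS))
    .named(name)
```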

### Why are the changes needed?
To fix a failure of the non-vectorized reader. The code below demonstrates the issue:
```scala
scala> spark.conf.set("spark.sql.parquet.enableVectorizedReader", false)
scala> spark.read.parquet("/Users/maxim.gekk/tmp/time_parquet3").show()

org.apache.spark.SparkRuntimeException: [PARQUET_CONVERSION_FAILURE.UNSUPPORTED] Unable to create a Parquet converter for the data type "TIME(6)" whose Parquet type is optional int64 col. Please modify the conversion making sure it is supported. SQLSTATE: 42846
  at org.apache.spark.sql.errors.QueryExecutionErrors$.cannotCreateParquetConverterForDataTypeError(QueryExecutionErrors.scala:2000)
```

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By running the modified test:
```
$ build/sbt "test:testOnly *ParquetFileFormatV1Suite"
$ build/sbt "test:testOnly *ParquetFileFormatV2Suite"
```

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#50476 from MaxGekk/parquet-time-nested.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
### What changes were proposed in this pull request?

This PR adds DSv2 APIs for constraints as per SPIP [doc](https://docs.google.com/document/d/1EHjB4W1LjiXxsK_G7067j9pPX0y15LUF1Z5DlUPoPIo/).
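
Purely as an illustration of the shape such APIs take (the names below are hypothetical placeholders, not the interfaces merged in apache#50253; see the SPIP doc for the real design):

```scala
// Hypothetical sketch of a DSv2-style constraint API.
trait Constraint {
  def name: String
  /** SQL text of the constraint, e.g. "CHECK (price > 0)". */
  def toDDL: String
}

// A table implementing this trait could report its constraints to Spark.
trait SupportsConstraints {
  def constraints: Array[Constraint]
}
```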

### Why are the changes needed?

These changes are the first step for constraints support in Spark.

### Does this PR introduce _any_ user-facing change?

This PR adds new public interfaces that will be supported in the future.

### How was this patch tested?

This PR comes with tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#50253 from aokolnychyi/spark-51441.

Authored-by: Anton Okolnychyi <aokolnychyi@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
…ith_state

### What changes were proposed in this pull request?

Fixed a test failure in test_pandas_transform_with_state, mentioned in apache#50349 (comment)

### Why are the changes needed?

To unblock Spark 4.0.0 release

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Test-only change.

### Was this patch authored or co-authored using generative AI tooling?

Closes apache#50482 from bogao007/test-fix-list-state.

Authored-by: bogao007 <bo.gao@databricks.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
… thrown in `MLUtils.loadOperator`

### What changes were proposed in this pull request?

Currently, if model loading fails in `MLUtils.loadOperator`, it throws an [InvocationTargetException](https://docs.oracle.com/javase/8/docs/api/java/lang/reflect/InvocationTargetException.html), which wraps an exception thrown by the method invoked.

As a follow-up of apache#50098, this PR unwraps the InvocationTargetException thrown in `MLUtils.loadOperator`, so that the correct underlying exception is thrown.
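
The unwrapping itself is small; a minimal sketch (the helper name is illustrative, not the exact code in `MLUtils.loadOperator`):

```scala
import java.lang.reflect.InvocationTargetException

// Sketch: rethrow the underlying cause instead of the reflective wrapper.
def invokeUnwrapped[T](call: => T): T =
  try call
  catch {
    case e: InvocationTargetException if e.getCause != null =>
      throw e.getCause
  }
```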

### Why are the changes needed?

The existing error message is not useful for debugging.

### Does this PR introduce _any_ user-facing change?

Yes. For example,

```
from pyspark.ml.classification import LogisticRegression
LogisticRegression.load("invalid_location")
```

Before this PR, this code throws:
```
java.lang.reflect.InvocationTargetException
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
	at org.apache.spark.sql.connect.ml.MLUtils$.loadOperator(MLUtils.scala:417)
	at org.apache.spark.sql.connect.ml.MLUtils$.loadTransformer(MLUtils.scala:438)
	at org.apache.spark.sql.connect.ml.MLHandler$.handleMlCommand(MLHandler.scala:247)
```

With this PR, it will throw the correct exception:
```
java.io.FileNotFoundException: File invalid_location/metadata does not exist
	at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:917)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1238)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:907)
	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:462)
	at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:56)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:381)
```

### How was this patch tested?

New test.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#50478 from xi-db/ml-connect-unwrap-InvocationTargetException.

Authored-by: Xi Lyu <xi.lyu@databricks.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
…ugin` configuration from connect related modules

### What changes were proposed in this pull request?
This PR aims to remove the redundant `build-helper-maven-plugin` configuration from the connect-related modules, because the directory it specifies does not exist in the current codebase.

### Why are the changes needed?
To remove the redundant `build-helper-maven-plugin` configuration from the connect-related modules.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#50477 from LuciferYang/minor-client-jvm-plugin.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
…ahead of commit log correctly

### What changes were proposed in this pull request?
When State Store Checkpoint format V2 is used, always read the checkpoint ID back from the commit log, rather than only when the commit log matches the offset log.

### Why are the changes needed?
Right now, there is a bug in reading the checkpoint ID from the commit log when the query restarts: if the offset log is ahead of the commit log, the ID is not read, so the tasks have no checkpoint ID to recover from and the query fails.
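
To make the control-flow change concrete, a sketch (all names here are hypothetical stand-ins, not Spark's actual internals):

```scala
// Hypothetical commit log entry carrying per-store checkpoint IDs.
case class CommitLogEntry(batchId: Long, checkpointIds: Map[Long, String])

def checkpointIdsToRecover(
    latestCommit: Option[CommitLogEntry],
    latestOffsetBatchId: Long): Map[Long, String] = {
  // Before the fix: IDs were read only when the commit log had caught up
  // with the offset log, so a restart with the offset log one batch ahead
  // found nothing to recover from.
  // After the fix: read them from the latest commit log entry regardless.
  latestCommit.map(_.checkpointIds).getOrElse(Map.empty)
}
```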

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Add a unit test that will fail without the fix.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#50480 from siying/checkpoint_v2_commit_read.

Authored-by: Siying Dong <dong.sy@gmail.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
@senthh (Contributor, Author) commented Apr 2, 2025

@MaxGekk Closing this PR as the commits got messed up; I will create a new PR.
