[SPARK-51562][SQL] Add the time() function #50479
base: master
Conversation
@MaxGekk Could you please review this PR?
> This function is a synonym for CAST(expr AS TIME)

As you wrote in the PR description, the function is just a synonym of CAST. Let's implement exactly that, and map time() to the Cast expression.
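A minimal sketch of what the reviewer suggests: resolving time() directly to a Cast expression rather than a dedicated expression class. The object name `TimeExpressionBuilder` and the exact registration mechanism are assumptions for illustration, not the actual Spark code:

```scala
// Hypothetical sketch only: resolve time(expr) to Cast(expr, TimeType()).
// The builder name and error handling are illustrative; Spark's real
// FunctionRegistry wiring differs.
import org.apache.spark.sql.catalyst.expressions.{Cast, Expression}
import org.apache.spark.sql.types.TimeType

object TimeExpressionBuilder {
  def build(args: Seq[Expression]): Expression = args match {
    case Seq(child) => Cast(child, TimeType())
    case _ => throw new IllegalArgumentException(
      s"time() expects exactly one argument, got ${args.length}")
  }
}
```

Mapping to Cast this way keeps time() consistent with CAST(expr AS TIME) for free: null handling, ANSI-mode behavior, and error messages all come from the existing Cast implementation.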
Sure @MaxGekk, I will change the implementation so that time() maps to the Cast expression.
### What changes were proposed in this pull request?
In the PR, I propose to modify the parquet schema converter for the TIME data type, and convert Catalyst's `TimeType(n)` to the parquet physical type `INT64` annotated by the logical type:
```
TimeType(isAdjustedToUTC = false, unit = MICROS)
```
in the parquet writer.

### Why are the changes needed?
To fix a failure of the non-vectorized reader. The code below portrays the issue:
```scala
scala> spark.conf.set("spark.sql.parquet.enableVectorizedReader", false)
scala> spark.read.parquet("/Users/maxim.gekk/tmp/time_parquet3").show()
org.apache.spark.SparkRuntimeException: [PARQUET_CONVERSION_FAILURE.UNSUPPORTED] Unable to create a Parquet converter for the data type "TIME(6)" whose Parquet type is optional int64 col. Please modify the conversion making sure it is supported. SQLSTATE: 42846
  at org.apache.spark.sql.errors.QueryExecutionErrors$.cannotCreateParquetConverterForDataTypeError(QueryExecutionErrors.scala:2000)
```

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By running the modified tests:
```
$ build/sbt "test:testOnly *ParquetFileFormatV1Suite"
$ build/sbt "test:testOnly *ParquetFileFormatV2Suite"
```

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#50476 from MaxGekk/parquet-time-nested.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
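The parquet shape described above can be sketched with parquet-java's schema builder. This is an illustrative sketch of the target schema, not Spark's actual converter code; the field name `col` matches the error message above:

```scala
import org.apache.parquet.schema.{LogicalTypeAnnotation, PrimitiveType, Types}
import org.apache.parquet.schema.LogicalTypeAnnotation.TimeUnit
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName
import org.apache.parquet.schema.Type.Repetition

// Build an optional INT64 field annotated with the TIME logical type
// (isAdjustedToUTC = false, unit = MICROS) -- the mapping this commit
// uses for Catalyst's TimeType(n).
val timeField: PrimitiveType = Types
  .primitive(PrimitiveTypeName.INT64, Repetition.OPTIONAL)
  .as(LogicalTypeAnnotation.timeType(false, TimeUnit.MICROS))
  .named("col")
```

Annotating the physical INT64 with the TIME logical type is what lets the non-vectorized reader recognize the column and build the right converter instead of failing with PARQUET_CONVERSION_FAILURE.UNSUPPORTED.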
### What changes were proposed in this pull request?
This PR adds DSv2 APIs for constraints as per the SPIP [doc](https://docs.google.com/document/d/1EHjB4W1LjiXxsK_G7067j9pPX0y15LUF1Z5DlUPoPIo/).

### Why are the changes needed?
These changes are the first step for constraints support in Spark.

### Does this PR introduce _any_ user-facing change?
This PR adds new public interfaces that will be supported in the future.

### How was this patch tested?
This PR comes with tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#50253 from aokolnychyi/spark-51441.

Authored-by: Anton Okolnychyi <aokolnychyi@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
…ith_state

### What changes were proposed in this pull request?
Fixed a test failure in test_pandas_transform_with_state mentioned in apache#50349 (comment).

### Why are the changes needed?
To unblock the Spark 4.0.0 release.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Test-only change.

### Was this patch authored or co-authored using generative AI tooling?

Closes apache#50482 from bogao007/test-fix-list-state.

Authored-by: bogao007 <bo.gao@databricks.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
… thrown in `MLUtils.loadOperator`

### What changes were proposed in this pull request?
Currently, if model loading fails in `MLUtils.loadOperator`, it throws an [InvocationTargetException](https://docs.oracle.com/javase/8/docs/api/java/lang/reflect/InvocationTargetException.html), which wraps the exception thrown by the invoked method. As a follow-up of apache#50098, this PR unwraps the InvocationTargetException thrown in `MLUtils.loadOperator`, so that it throws the correct exception.

### Why are the changes needed?
The existing error message is useless for debugging.

### Does this PR introduce _any_ user-facing change?
Yes. For example:
```
from pyspark.ml.classification import LogisticRegression
LogisticRegression.load("invalid_location")
```
Before this PR, this code throws:
```
java.lang.reflect.InvocationTargetException
  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.base/java.lang.reflect.Method.invoke(Method.java:569)
  at org.apache.spark.sql.connect.ml.MLUtils$.loadOperator(MLUtils.scala:417)
  at org.apache.spark.sql.connect.ml.MLUtils$.loadTransformer(MLUtils.scala:438)
  at org.apache.spark.sql.connect.ml.MLHandler$.handleMlCommand(MLHandler.scala:247)
```
With this PR, it will throw the correct exception:
```
java.io.FileNotFoundException: File invalid_location/metadata does not exist
  at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:917)
  at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1238)
  at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:907)
  at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:462)
  at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:56)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:381)
```

### How was this patch tested?
New test.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#50478 from xi-db/ml-connect-unwrap-InvocationTargetException.

Authored-by: Xi Lyu <xi.lyu@databricks.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
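The unwrapping pattern this commit describes can be sketched with the standard reflection API. The helper name `invokeUnwrapped` is illustrative, not the actual method in `MLUtils`:

```scala
import java.lang.reflect.{InvocationTargetException, Method}

// Invoke a method reflectively, but rethrow the underlying cause instead
// of the reflective InvocationTargetException wrapper, so callers see
// e.g. the original FileNotFoundException rather than reflection noise.
def invokeUnwrapped(method: Method, target: AnyRef, args: AnyRef*): AnyRef = {
  try {
    method.invoke(target, args: _*)
  } catch {
    case e: InvocationTargetException if e.getCause != null =>
      throw e.getCause
  }
}
```

`InvocationTargetException.getCause` returns the exception thrown by the invoked method itself, which is exactly what a user debugging a bad model path needs to see.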
…ugin` configuration from connect related modules

### What changes were proposed in this pull request?
This PR aims to remove the redundant `build-helper-maven-plugin` configuration from connect-related modules, because the current codebase no longer has the specified directory.

### Why are the changes needed?
To remove redundant `build-helper-maven-plugin` configuration from connect-related modules.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#50477 from LuciferYang/minor-client-jvm-plugin.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
…ahead of commit log correctly

### What changes were proposed in this pull request?
When State Store Checkpoint format V2 is used, always read back the checkpoint ID from the commit log, rather than only when the commit log matches the offset log.

### Why are the changes needed?
Right now, there is a bug in reading the checkpoint ID from the commit log when the query restarts. If the offset log is ahead of the commit log, the ID is not read, the tasks have no checkpoint ID to recover from, and the query fails.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added a unit test that fails without the fix.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#50480 from siying/checkpoint_v2_commit_read.

Authored-by: Siying Dong <dong.sy@gmail.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
@MaxGekk Closing this PR as the commits got messed up; I will create a new PR.
What changes were proposed in this pull request?
This PR adds a new function time(), which casts an expression to the TIME type. This function is a synonym for CAST(expr AS TIME).
Examples
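The examples are elided in this copy; a plausible spark-shell session for the proposed function might look like the following. This is an illustrative sketch assuming the synonym semantics described above, and the outputs are deliberately omitted rather than guessed:

```scala
// time() proposed in this PR is a synonym for CAST(expr AS TIME),
// so both queries below should produce the same TIME value.
spark.sql("SELECT time('13:10:15')").show()
spark.sql("SELECT CAST('13:10:15' AS TIME)").show()
```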
Why are the changes needed?
This function is a synonym for CAST(expr AS TIME)
Does this PR introduce any user-facing change?
No
How was this patch tested?
By running the related test suites:
and also with spark-shell
Was this patch authored or co-authored using generative AI tooling?
No