-
Notifications
You must be signed in to change notification settings - Fork 28.5k
[SPARK-51421][SQL] Get seconds of TIME datatype #50525
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@MaxGekk Could you please review this PR? |
-- !query output | ||
1 | ||
java.lang.NoClassDefFoundError | ||
Could not initialize class org.apache.datasketches.memory.internal.BaseWritableMemoryImpl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the output is invalid ..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HyukjinKwon I have regenerated hll.sql.out
test("Second with TIME type") { | ||
// A few test times in microseconds since midnight: | ||
// time in microseconds -> expected second | ||
val testTimes = Seq( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add these tests to test the output based on precision?
val time = "13:10:15.987654321"
Seq(
0 -> 15.toDouble,
1 -> 15.9,
2 -> 15.98,
3 -> 15.987,
4 -> 15.9876,
5 -> 15.98765,
6 -> 15.987654).foreach { case (precision, expected) =>
checkEvaluation(
SecondsOfTime(Literal.create(time, TimeType(precision))),
BigDecimal(expected))
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vinodkc Sure Vinod. I will create a function for SecondsOfTimeWithFraction and also include tests for the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@senthh , I updated the above comment; please check
|
||
override def replacement: Expression = StaticInvoke( | ||
classOf[DateTimeUtils.type], | ||
IntegerType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it be DecimalType with a precision and scale that matches the precision of TimeType?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @vinodkc ,
MaxxGek has responded in jira that we need to return second without fraction. So as per Maxx requirement this PR will work without Fraction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And also test failed with "java.lang.OutOfMemoryError: Java heap space" is not relevant to our changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HyukjinKwon @vinodkc and @MaxGekk It will be helpful if you re-review this PR and provide your input
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, the type should be IntergerType
|
||
override def replacement: Expression = StaticInvoke( | ||
classOf[DateTimeUtils.type], | ||
IntegerType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, the type should be IntergerType
Seq(child.dataType) | ||
) | ||
|
||
override def inputTypes: Seq[AbstractDataType] = Seq(TimeType()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You allow any precision of TimeType
in SecondExpressionBuilder
, but here expectsonly TimeType(6)
. It should work for any valid precision. For now, see:
spark-sql (default)> select second(cast('12:00:01.123' as time(3)));
[DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "second(CAST(12:00:01.123 AS TIME(3)))" due to data type mismatch: The first parameter requires the "TIME(6)" type, however "CAST(12:00:01.123 AS TIME(3))" has the type "TIME(3)". SQLSTATE: 42K09; line 1 pos 7;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now we are able to accept any precision of TimeType.
Query:
spark.sql("select second(cast('12:00:01.123' as time(3)))").show(false)
output:
+-------------------------------------+
|second(CAST(12:00:01.123 AS TIME(3)))|
+-------------------------------------+
|1 |
+-------------------------------------+
Examples: | ||
> SELECT _FUNC_('2018-02-14 12:58:59'); | ||
59 | ||
> SELECT _FUNC_(TIME'13:59:59.999999'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to demonstrate different numbers for seconds and minutes, let's say:
> SELECT _FUNC_(TIME'13:10:59.999999');
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MaxGekk Yes good one. I have modified the usage section as below
> SELECT _FUNC_(TIME'13:25:59.999999');
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MaxGekk Could you please re-review the changes?
@senthh Could you double check PR's description and replace |
@HyukjinKwon Could you please re-review this PR? |
Thanks. +1, LGTM. Merging to master. |
What changes were proposed in this pull request?
This PR adds support for extracting the second component from TIME (TimeType) values in Spark SQL. For example:
Why are the changes needed?
Spark previously supported second() for only TIMESTAMP type values. TIME support was missing, leading to implicit casting attempt to TIMESTAMP, which was incorrect. This PR ensures that second(TIME'HH:MM:SS.######') behaves correctly without unnecessary type coercion.
Does this PR introduce any user-facing change?
Yes
How was this patch tested?
By running new tests:
$ build/sbt "test:testOnly *TimeExpressionsSuite"
Was this patch authored or co-authored using generative AI tooling?
No