[SPARK-51421][SQL] Get seconds of TIME datatype #50525

Closed. senthh wants to merge 3 commits.

@senthh senthh (Contributor) commented Apr 6, 2025

What changes were proposed in this pull request?

This PR adds support for extracting the second component from TIME (TimeType) values in Spark SQL. For example:

scala> spark.sql("SELECT SECOND(TIME'13:59:45.99')").show()
+--------------------------+
|second(TIME '13:59:45.99')|
+--------------------------+
|                        45|
+--------------------------+


scala> spark.sql("select second(cast('12:00:01.123' as time(4)))").show(false)
+-------------------------------------+
|second(CAST(12:00:01.123 AS TIME(4)))|
+-------------------------------------+
|1                                    |
+-------------------------------------+

Why are the changes needed?

Spark previously supported second() only for TIMESTAMP values. TIME support was missing, so Spark attempted an implicit cast to TIMESTAMP, which was incorrect. This PR ensures that second(TIME'HH:MM:SS.######') works without unnecessary type coercion.
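As a rough illustration of what extracting the second from a TIME value involves, here is a minimal sketch in plain Scala. It assumes the value is held as microseconds since midnight, as the test comment in this PR suggests. `SecondOfTimeSketch`, `toMicros`, and `secondOfTime` are hypothetical names used only for illustration; they are not Spark APIs and not the code in this PR.

```scala
object SecondOfTimeSketch {
  private val MicrosPerSecond = 1000000L
  private val SecondsPerMinute = 60L

  // Hypothetical helper: build microseconds since midnight from clock fields.
  def toMicros(hour: Int, minute: Int, second: Int, micros: Int): Long =
    ((hour * 60L + minute) * 60L + second) * MicrosPerSecond + micros

  // Hypothetical helper: drop the fractional part, then keep the
  // second-of-minute (0..59). No TIMESTAMP conversion is involved.
  def secondOfTime(microsSinceMidnight: Long): Int =
    ((microsSinceMidnight / MicrosPerSecond) % SecondsPerMinute).toInt
}
```

Under that assumption, `SecondOfTimeSketch.secondOfTime(SecondOfTimeSketch.toMicros(13, 59, 45, 990000))` corresponds to the `SECOND(TIME'13:59:45.99')` query in the description: the fractional part is discarded and only the whole second remains.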

Does this PR introduce any user-facing change?

Yes

  • Before this PR, calling second(TIME'HH:MM:SS.######') resulted in a type mismatch error or an implicit cast attempt to TIMESTAMP, which was incorrect.
  • With this PR, second(TIME'HH:MM:SS.######') now works correctly for TIME values without implicit casting.
  • Users can now extract the second component from TIME values natively.

How was this patch tested?

By running new tests:

$ build/sbt "test:testOnly *TimeExpressionsSuite"

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Apr 6, 2025
@senthh senthh (Contributor Author) commented Apr 6, 2025

@MaxGekk Could you please review this PR?

-- !query output
1
java.lang.NoClassDefFoundError
Could not initialize class org.apache.datasketches.memory.internal.BaseWritableMemoryImpl

Member commented

Seems like the output is invalid.

Contributor Author commented

@HyukjinKwon I have regenerated hll.sql.out

test("Second with TIME type") {
// A few test times in microseconds since midnight:
// time in microseconds -> expected second
val testTimes = Seq(

@vinodkc vinodkc (Contributor) commented Apr 7, 2025

Could you please add tests like these to check the output for each precision?

val time = "13:10:15.987654321"
Seq(
  0 -> 15.toDouble,
  1 -> 15.9,
  2 -> 15.98,
  3 -> 15.987,
  4 -> 15.9876,
  5 -> 15.98765,
  6 -> 15.987654).foreach { case (precision, expected) =>
  checkEvaluation(
    SecondsOfTime(Literal.create(time, TimeType(precision))),
    BigDecimal(expected))
}
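To make the expected values in this suggestion concrete, the following is a minimal plain-Scala sketch of seconds with a fractional part at a given precision. It assumes a microseconds-since-midnight representation and a precision range of 0 to 6, and it truncates rather than rounds, which is what the expected values above imply (15.98 rather than 15.99 at precision 2). `SecondsWithFractionSketch` and `secondsWithFraction` are hypothetical names, not Spark APIs.

```scala
object SecondsWithFractionSketch {
  private val MicrosPerMinute = 60000000L

  // Hypothetical helper: seconds within the minute, keeping `precision`
  // fractional digits and truncating the rest.
  def secondsWithFraction(microsSinceMidnight: Long, precision: Int): BigDecimal = {
    require(precision >= 0 && precision <= 6, "precision must be in 0..6")
    val microsOfMinute = microsSinceMidnight % MicrosPerMinute
    // BigDecimal(unscaled, 6) interprets the Long as a value with 6 decimals.
    BigDecimal(microsOfMinute, 6).setScale(precision, BigDecimal.RoundingMode.DOWN)
  }
}
```

For 13:10:15.987654 (47415987654 microseconds since midnight), precision 3 gives 15.987 and precision 0 gives 15 under these assumptions.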

Contributor Author commented

@vinodkc Sure, Vinod. I will add a SecondsOfTimeWithFraction function and include tests for it.

Contributor commented

@senthh, I updated the comment above; please check.


override def replacement: Expression = StaticInvoke(
classOf[DateTimeUtils.type],
IntegerType,

Contributor commented

Shouldn't it be DecimalType with a precision and scale that matches the precision of TimeType?

@senthh senthh (Contributor Author) commented Apr 7, 2025

@vinodkc In the JIRA ticket, @MaxGekk gave only an IntegerType result for seconds as an example, so I assumed the requirement was to return IntegerType only. I can modify the implementation to handle seconds both with and without the fractional part.

Contributor Author commented

Hi @vinodkc ,

@MaxGekk has responded in JIRA that we need to return the second without the fraction. So, per Max's requirement, this PR returns seconds without the fractional part.

Contributor Author commented

Also, the test failure with "java.lang.OutOfMemoryError: Java heap space" is not related to our changes.

Contributor Author commented

@HyukjinKwon, @vinodkc, and @MaxGekk: it would be helpful if you could re-review this PR and share your input.

Member commented

Yep, the type should be IntegerType.

@senthh senthh requested review from HyukjinKwon and vinodkc April 8, 2025 07:28

Seq(child.dataType)
)

override def inputTypes: Seq[AbstractDataType] = Seq(TimeType())

Member commented

You allow any precision of TimeType in SecondExpressionBuilder, but here you expect only TimeType(6). It should work for any valid precision. Currently it fails:

spark-sql (default)> select second(cast('12:00:01.123' as time(3)));
[DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "second(CAST(12:00:01.123 AS TIME(3)))" due to data type mismatch: The first parameter requires the "TIME(6)" type, however "CAST(12:00:01.123 AS TIME(3))" has the type "TIME(3)". SQLSTATE: 42K09; line 1 pos 7;
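The mismatch reported here comes down to an exact-type check versus a check that admits every valid precision. The toy model below, in plain Scala, sketches that difference only. `TimeT`, `acceptsExact`, and `acceptsAnyPrecision` are hypothetical stand-ins, not Spark classes, and the 0 to 6 precision range is an assumption taken from the precisions exercised in this review.

```scala
object InputTypeCheckSketch {
  // Hypothetical stand-in for a TIME type with fractional-second precision.
  final case class TimeT(precision: Int)

  // Exact-match check: only the one declared type passes, so declaring
  // TimeT(6) rejects TimeT(3), as in the error above.
  def acceptsExact(expected: TimeT, actual: TimeT): Boolean =
    expected == actual

  // Any-precision check: every precision in the assumed valid range passes.
  def acceptsAnyPrecision(actual: TimeT): Boolean =
    actual.precision >= 0 && actual.precision <= 6
}
```

With the exact check, `acceptsExact(TimeT(6), TimeT(3))` is false, mirroring the reported failure; the any-precision check accepts `TimeT(3)`.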

Contributor Author commented

Now we are able to accept any precision of TimeType.

Query:

spark.sql("select second(cast('12:00:01.123' as time(3)))").show(false)

output:

+-------------------------------------+
|second(CAST(12:00:01.123 AS TIME(3)))|
+-------------------------------------+
|1                                    |
+-------------------------------------+

Examples:
> SELECT _FUNC_('2018-02-14 12:58:59');
59
> SELECT _FUNC_(TIME'13:59:59.999999');

Member commented

It would be better to demonstrate different numbers for seconds and minutes, let's say:

      > SELECT _FUNC_(TIME'13:10:59.999999');

Contributor Author commented

@MaxGekk Yes, good point. I have modified the usage section as follows:

> SELECT _FUNC_(TIME'13:25:59.999999');

Contributor Author commented

@MaxGekk Could you please re-review the changes?

@senthh senthh requested a review from MaxGekk April 8, 2025 13:37
@MaxGekk MaxGekk (Member) commented Apr 9, 2025

@senthh Could you double-check the PR description and replace "hour" with "second"? See:
"This PR adds support for extracting the hour component from TIME (TimeType) values in Spark SQL."

@senthh senthh (Contributor Author) commented Apr 9, 2025

> @senthh Could you double check PR's description and replace hour by second, see "This PR adds support for extracting the hour component from TIME (TimeType) values in Spark SQL."

@MaxGekk Yes, Max, I have corrected the PR description.

@senthh senthh (Contributor Author) commented Apr 9, 2025

@HyukjinKwon Could you please re-review this PR?

@MaxGekk MaxGekk (Member) commented Apr 9, 2025

> @MaxGekk Yes Maxx, I have corrected PR description

Thanks.

+1, LGTM. Merging to master.
Thank you, @senthh, and @vinodkc and @HyukjinKwon for the review.

@MaxGekk MaxGekk closed this in 5602fbf Apr 9, 2025

4 participants