@@ -708,6 +708,80 @@ class CountingFunction extends ProcessTableFunction<String> {
{{< /tab >}}
{{< /tabs >}}
+ ### Large State
+
+ Flink's state backends provide different types of state to efficiently handle large state.
+
+ Currently, PTFs support three types of state:
+
+ - **Value state**: Represents a single value.
+ - **List state**: Represents a list of values, supporting operations like appending, removing, and iterating.
+ - **Map state**: Represents a map (key-value pairs) for efficient lookups, modifications, and removal of individual entries.
+
+ By default, state entries in a PTF are represented as value state. This means that every state entry is fully read from
+ the state backend when the evaluation method is called, and the value is written back to the state backend once the
+ evaluation method finishes.
+
+ To optimize state access and avoid unnecessary (de)serialization, state entries can be declared as:
+ - `org.apache.flink.table.api.dataview.ListView` (for list state)
+ - `org.apache.flink.table.api.dataview.MapView` (for map state)
+
+ These provide direct views to the underlying Flink state backend.
+
+ For example, when using a `MapView`, accessing a value via `MapView#get` will only deserialize the value associated with
+ the specified key. This allows for efficient access to individual entries without needing to load the entire map. This
+ approach is particularly useful when the map does not fit entirely into memory.
+
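The difference between the two access patterns can be sketched in plain Java, without any Flink dependency. This is only a toy model under stated assumptions: `ToyBackend`, its methods, and the `entriesDeserialized` counter are hypothetical names invented for illustration and are not Flink APIs. The sketch merely counts how many entries a single lookup has to touch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of a state backend; NOT a Flink API.
// It only counts how many map entries a lookup has to "deserialize".
class ToyBackend {
    private final Map<String, Integer> stored = new HashMap<>();
    long entriesDeserialized = 0;

    void put(String key, int value) {
        stored.put(key, value);
    }

    // Value-state style: the whole map is materialized before the lookup.
    Integer getViaValueState(String key) {
        Map<String, Integer> copy = new HashMap<>();
        for (Map.Entry<String, Integer> entry : stored.entrySet()) {
            copy.put(entry.getKey(), entry.getValue());
            entriesDeserialized++;
        }
        return copy.get(key);
    }

    // Map-view style: only the requested entry is touched.
    Integer getViaMapView(String key) {
        Integer value = stored.get(key);
        if (value != null) {
            entriesDeserialized++;
        }
        return value;
    }
}
```

In this model, a lookup through `getViaValueState` touches every stored entry, while `getViaMapView` touches at most one, which is the property the paragraph above attributes to `MapView#get`.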
+ {{< hint info >}}
+ State TTL is applied individually to each entry in a list or map, allowing for fine-grained expiration control over state
+ elements.
+ {{< /hint >}}
+
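To illustrate what per-entry expiration means in practice, here is a minimal plain-Java sketch. It assumes that an entry expires once a fixed TTL has elapsed since its last write; `TtlMapSketch` and its explicit `nowMillis` parameter are hypothetical and are not part of Flink, whose actual TTL behavior is configured on the state backend.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of per-entry TTL; NOT a Flink API.
// Assumption: an entry expires once ttlMillis have passed since its last write.
class TtlMapSketch {
    private final long ttlMillis;
    private final Map<String, Integer> values = new HashMap<>();
    private final Map<String, Long> lastWrite = new HashMap<>();

    TtlMapSketch(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Writing an entry refreshes only that entry's timestamp.
    void put(String key, int value, long nowMillis) {
        values.put(key, value);
        lastWrite.put(key, nowMillis);
    }

    // Returns null if the entry is absent or has expired; expired entries are dropped.
    Integer get(String key, long nowMillis) {
        Long written = lastWrite.get(key);
        if (written == null || nowMillis - written >= ttlMillis) {
            values.remove(key);
            lastWrite.remove(key);
            return null;
        }
        return values.get(key);
    }
}
```

With a TTL of 1000 ms, an entry written at time 0 is gone at time 1200, while an entry written at time 500 in the same map is still readable. Each entry ages independently rather than the map expiring as a whole.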
+ {{< tabs "1837eeed-3d13-455c-8e2f-5e164da9f844" >}}
+ {{< tab "Java" >}}
+ ```java
+ // Function that uses a map view for storing a large map for an event history per user
+ class LargeHistoryFunction extends ProcessTableFunction<String> {
+   // MapView#get and MapView#put declare checked exceptions
+   public void eval(
+     @StateHint MapView<String, Integer> largeMemory,
+     @ArgumentHint(TABLE_AS_SET) Row input
+   ) throws Exception {
+     String eventId = input.getFieldAs("eventId");
+     // Only the entry for this key is read from the state backend
+     Integer count = largeMemory.get(eventId);
+     if (count == null) {
+       largeMemory.put(eventId, 1);
+     } else {
+       if (count > 1000) {
+         collect("Anomaly detected: " + eventId);
+       }
+       largeMemory.put(eventId, count + 1);
+     }
+   }
+ }
+ ```
+ {{< /tab >}}
+ {{< /tabs >}}
+
+ Similar to other data types, reflection is used to extract the necessary type information. If reflection is not
+ feasible, such as when a `Row` object is involved, type hints can be provided. Use the `ARRAY` data type for list views
+ and the `MAP` data type for map views.
+
+ {{< tabs "1937eeed-3d13-455c-8e2f-5e164da9f844" >}}
+ {{< tab "Java" >}}
+ ```java
+ // Function that uses a list view of rows
+ class LargeHistoryFunction extends ProcessTableFunction<String> {
+   public void eval(
+     @StateHint(type = @DataTypeHint("ARRAY<ROW<s STRING, i INT>>")) ListView<Row> largeMemory,
+     @ArgumentHint(TABLE_AS_SET) Row input
+   ) {
+     ...
+   }
+ }
+ ```
+ {{< /tab >}}
+ {{< /tabs >}}
+
### Efficiency and Design Principles
A stateful function also means that data layout and data retention should be well thought