@@ -600,9 +600,11 @@ def read_parquet(
There are two batching strategies on awswrangler:

- - If **chunked=True**, a new DataFrame will be returned for each file in your path/dataset.
+ - If **chunked=True**, depending on the size of the data, one or more data frames will be
+ returned for each file in the path/dataset.
+ Unlike **chunked=INTEGER**, rows from different files will not be mixed in the resulting data frames.

- - If **chunked=INTEGER**, awswrangler will iterate on the data by number of rows igual the received INTEGER.
+ - If **chunked=INTEGER**, awswrangler will iterate on the data by number of rows equal to the received INTEGER.

`P.S.` `chunked=True` if faster and uses less memory while `chunked=INTEGER` is more precise
in number of rows for each Dataframe.
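As a rough illustration of the two strategies described in this hunk, the sketch below iterates over `wr.s3.read_parquet` with both forms of `chunked`; the S3 path is a hypothetical placeholder:

```python
import awswrangler as wr

# chunked=True: one or more DataFrames per Parquet file; chunk sizes vary,
# but rows from different files are never mixed in the same DataFrame.
for df in wr.s3.read_parquet(path="s3://my-bucket/my-prefix/", dataset=True, chunked=True):
    print(len(df))  # row count differs from chunk to chunk

# chunked=INTEGER: DataFrames with the requested number of rows
# (the final chunk may hold only the remainder).
for df in wr.s3.read_parquet(path="s3://my-bucket/my-prefix/", dataset=True, chunked=100_000):
    print(len(df))
```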
@@ -652,7 +654,7 @@ def read_parquet(
chunked : Union[int, bool]
If passed will split the data in a Iterable of DataFrames (Memory friendly).
If `True` awswrangler iterates on the data by files in the most efficient way without guarantee of chunksize.
- If an `INTEGER` is passed awswrangler will iterate on the data by number of rows igual the received INTEGER.
+ If an `INTEGER` is passed awswrangler will iterate on the data by number of rows equal to the received INTEGER.
dataset: bool
If `True` read a parquet dataset instead of simple file(s) loading all the related partitions as columns.
categories: Optional[List[str]], optional
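A hedged sketch of the `dataset` and `categories` parameters listed in this hunk, with a made-up bucket and column names:

```python
import awswrangler as wr

# dataset=True reads the prefix as a partitioned Parquet dataset and brings
# the partition values back as columns; categories asks for the listed
# columns to be returned as memory-friendly pandas Categorical dtypes.
df = wr.s3.read_parquet(
    path="s3://my-bucket/sales/",
    dataset=True,
    categories=["country", "payment_type"],
)
print(df.dtypes)
```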
@@ -830,10 +832,12 @@ def read_parquet_table(
There are two batching strategies on awswrangler:

- - If **chunked=True**, a new DataFrame will be returned for each file in your path/dataset.
+ - If **chunked=True**, depending on the size of the data, one or more data frames will be
+ returned for each file in the path/dataset.
+ Unlike **chunked=INTEGER**, rows from different files will not be mixed in the resulting data frames.

- - If **chunked=INTEGER**, awswrangler will paginate through files slicing and concatenating
- to return DataFrames with the number of row igual the received INTEGER.
+ to return DataFrames with the number of rows equal to the received INTEGER.

`P.S.` `chunked=True` if faster and uses less memory while `chunked=INTEGER` is more precise
in number of rows for each Dataframe.
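And a similarly hedged sketch for the `read_parquet_table` batching described in this hunk; the Glue database and table names are placeholders:

```python
import awswrangler as wr

# chunked=INTEGER paginates through the table's files, slicing and
# concatenating so each yielded DataFrame carries the requested row count.
for df in wr.s3.read_parquet_table(
    database="my_database",
    table="my_table",
    chunked=50_000,
):
    print(df.shape)
```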