Commit 9b5db44

Correct documentation for chunksize=True (#2087)
1 parent a3d9e9c commit 9b5db44

3 files changed (+20 -10 lines)

awswrangler/athena/_read.py (+8 -4)

@@ -807,9 +807,11 @@ def read_sql_query( # pylint: disable=too-many-arguments,too-many-locals
 
     There are two batching strategies:
 
-    - If **chunksize=True**, a new DataFrame will be returned for each file in the query result.
+    - If **chunksize=True**, depending on the size of the data, one or more data frames will be
+      returned per each file in the query result.
+      Unlike **chunksize=INTEGER**, rows from different files will not be mixed in the resulting data frames.
 
-    - If **chunksize=INTEGER**, awswrangler will iterate on the data by number of rows igual the received INTEGER.
+    - If **chunksize=INTEGER**, awswrangler will iterate on the data by number of rows equal to the received INTEGER.
 
     `P.S.` `chunksize=True` is faster and uses less memory while `chunksize=INTEGER` is more precise
     in number of rows for each Dataframe.
@@ -1110,9 +1112,11 @@ def read_sql_table(
 
     There are two batching strategies:
 
-    - If **chunksize=True**, a new DataFrame will be returned for each file in the query result.
+    - If **chunksize=True**, depending on the size of the data, one or more data frames will be
+      returned per each file in the query result.
+      Unlike **chunksize=INTEGER**, rows from different files will not be mixed in the resulting data frames.
 
-    - If **chunksize=INTEGER**, awswrangler will iterate on the data by number of rows igual the received INTEGER.
+    - If **chunksize=INTEGER**, awswrangler will iterate on the data by number of rows equal to the received INTEGER.
 
     `P.S.` `chunksize=True` is faster and uses less memory while `chunksize=INTEGER` is more precise
     in number of rows for each Dataframe.
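
A minimal sketch of how the two `chunksize` modes documented above behave in practice; the database and table names (`my_db`, `my_table`) and the query are placeholders, not part of this commit:

```python
import awswrangler as wr

# chunksize=True: stream the query result file by file. Each yielded DataFrame
# holds rows from a single result file (never mixed across files), but the
# number of rows per DataFrame is not guaranteed.
for df in wr.athena.read_sql_query(
    sql="SELECT * FROM my_table",  # placeholder query
    database="my_db",              # placeholder Glue database
    chunksize=True,
):
    print(df.shape)

# chunksize=INTEGER: iterate in row batches of (up to) the requested size;
# more precise per-chunk row counts, at the cost of speed and memory.
for df in wr.athena.read_sql_query(
    sql="SELECT * FROM my_table",
    database="my_db",
    chunksize=64,
):
    print(df.shape)
```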

awswrangler/redshift.py (+3 -1)

@@ -1122,7 +1122,9 @@ def unload(
 
     There are two batching strategies on awswrangler:
 
-    - If **chunked=True**, a new DataFrame will be returned for each file in your path/dataset.
+    - If **chunked=True**, depending on the size of the data, one or more data frames will be
+      returned per each file in the path/dataset.
+      Unlike **chunked=INTEGER**, rows from different files will not be mixed in the resulting data frames.
 
     - If **chunked=INTEGER**, awswrangler will iterate on the data by number of rows (equal to the received INTEGER).
 
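
The same idea for `unload`, as a sketch; the connection name, S3 path, IAM role ARN, and query below are placeholders:

```python
import awswrangler as wr

con = wr.redshift.connect("my-glue-connection")  # placeholder Glue connection name

# chunked=True: one or more DataFrames per unloaded file, rows never mixed
# across files; chunked=INTEGER would instead yield fixed-size row batches.
dfs = wr.redshift.unload(
    sql="SELECT * FROM public.my_table",  # placeholder query
    path="s3://my-bucket/unload/",        # placeholder staging path
    con=con,
    iam_role="arn:aws:iam::123456789012:role/MyRedshiftRole",  # placeholder
    chunked=True,
)
for df in dfs:
    print(df.shape)

con.close()
```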

awswrangler/s3/_read_parquet.py (+9 -5)

@@ -600,9 +600,11 @@ def read_parquet(
 
     There are two batching strategies on awswrangler:
 
-    - If **chunked=True**, a new DataFrame will be returned for each file in your path/dataset.
+    - If **chunked=True**, depending on the size of the data, one or more data frames will be
+      returned per each file in the path/dataset.
+      Unlike **chunked=INTEGER**, rows from different files will not be mixed in the resulting data frames.
 
-    - If **chunked=INTEGER**, awswrangler will iterate on the data by number of rows igual the received INTEGER.
+    - If **chunked=INTEGER**, awswrangler will iterate on the data by number of rows equal to the received INTEGER.
 
     `P.S.` `chunked=True` if faster and uses less memory while `chunked=INTEGER` is more precise
     in number of rows for each Dataframe.
@@ -652,7 +654,7 @@ def read_parquet(
     chunked : Union[int, bool]
         If passed will split the data in a Iterable of DataFrames (Memory friendly).
         If `True` awswrangler iterates on the data by files in the most efficient way without guarantee of chunksize.
-        If an `INTEGER` is passed awswrangler will iterate on the data by number of rows igual the received INTEGER.
+        If an `INTEGER` is passed awswrangler will iterate on the data by number of rows equal to the received INTEGER.
     dataset: bool
         If `True` read a parquet dataset instead of simple file(s) loading all the related partitions as columns.
     categories: Optional[List[str]], optional
@@ -830,10 +832,12 @@ def read_parquet_table(
 
     There are two batching strategies on awswrangler:
 
-    - If **chunked=True**, a new DataFrame will be returned for each file in your path/dataset.
+    - If **chunked=True**, depending on the size of the data, one or more data frames will be
+      returned per each file in the path/dataset.
+      Unlike **chunked=INTEGER**, rows from different files will not be mixed in the resulting data frames.
 
     - If **chunked=INTEGER**, awswrangler will paginate through files slicing and concatenating
-      to return DataFrames with the number of row igual the received INTEGER.
+      to return DataFrames with the number of rows equal to the received INTEGER.
 
     `P.S.` `chunked=True` if faster and uses less memory while `chunked=INTEGER` is more precise
     in number of rows for each Dataframe.
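
And a sketch for the two S3 parquet readers touched above; the S3 path, Glue database, and table names are placeholders:

```python
import awswrangler as wr

# chunked=True: iterate file by file (possibly several pieces for a large file);
# faster and lighter on memory, but chunk sizes are not guaranteed.
for df in wr.s3.read_parquet(path="s3://my-bucket/dataset/", dataset=True, chunked=True):
    print(df.shape)

# chunked=INTEGER: slice and concatenate across files so each DataFrame has
# (up to) the requested number of rows; more precise but slower.
for df in wr.s3.read_parquet_table(
    table="my_table",  # placeholder Glue table
    database="my_db",  # placeholder Glue database
    chunked=1_000,
):
    print(df.shape)
```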
