@@ -28,6 +28,7 @@ tensorflow 2's ``tf.data.Dataset``.
It provides a simple solution to oversampling / stratification, weighted
sampling, and finally converting to a ``torch.utils.data.DataLoader``.

+
Install
=======
@@ -41,6 +42,7 @@ Or, for the old-timers:
    pip install pytorch-datastream

+
Usage
=====
@@ -72,6 +74,43 @@ a more extensive list on API and usage.
.state_dict
.load_state_dict

+
+Simple image dataset example
+----------------------------
+Here's a basic example of loading images from a directory:
+
+.. code-block:: python
+
+    from datastream import Dataset
+    from pathlib import Path
+    from PIL import Image
+
+    # Assuming images are in a directory structure like:
+    # images/
+    #     class1/
+    #         image1.jpg
+    #         image2.jpg
+    #     class2/
+    #         image3.jpg
+    #         image4.jpg
+
+    image_dir = Path("images")
+    image_paths = list(image_dir.glob("**/*.jpg"))
+
+    dataset = (
+        Dataset.from_paths(image_paths, pattern=r".*/(?P<class_name>\w+)/(?P<image_name>\w+).jpg")
+        .map(lambda row: dict(
+            image=Image.open(row["path"]),
+            class_name=row["class_name"],
+            image_name=row["image_name"],
+        ))
+    )
+
+    # Access an item from the dataset
+    first_item = dataset[0]
+    print(f"Class: {first_item['class_name']}, Image name: {first_item['image_name']}")
+
+
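
The named groups in the pattern above can be checked in isolation with the standard-library ``re`` module. This is a minimal sketch, assuming ``Dataset.from_paths`` applies the pattern to each path string in a comparable way; the helper name ``parse_image_path`` and the sample paths are hypothetical.

```python
import re

# Same pattern as in the example above. The "." before "jpg" is unescaped,
# so it matches any single character, not only a literal dot.
PATTERN = re.compile(r".*/(?P<class_name>\w+)/(?P<image_name>\w+).jpg")


def parse_image_path(path):
    """Hypothetical helper: return the named groups for a path, or None."""
    match = PATTERN.search(str(path))
    return match.groupdict() if match else None


# A path with a directory above the class directory matches.
print(parse_image_path("images/class1/image1.jpg"))

# The leading ".*/" needs at least one "/" before the class directory,
# so a path that starts at the class directory does not match.
print(parse_image_path("class1/image1.jpg"))
```

Running the pattern against a handful of real paths before building the dataset is a cheap way to catch file names that ``\w+`` does not cover, such as names containing hyphens.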
Merge / stratify / oversample datastreams
-----------------------------------------
Each of the fruit datastreams given below repeatedly yields the string of its fruit

    >>> next(iter(datastream.data_loader(batch_size=8)))
    ['apple', 'apple', 'pear', 'banana', 'apple', 'apple', 'pear', 'banana']
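
The batch above follows a repeating pattern of two apples, one pear, and one banana. The pure-Python sketch below reproduces that ordering with plain generators. It is not how the library implements merging, and the 2 : 1 : 1 counts and the ``merge_in_turn`` helper are assumptions inferred from the output shown.

```python
from itertools import cycle, islice


def merge_in_turn(streams_with_counts):
    """Hypothetical helper: take `count` items from each stream in turn, forever."""
    iterators = [(iter(stream), count) for stream, count in streams_with_counts]
    while True:
        for iterator, count in iterators:
            for _ in range(count):
                yield next(iterator)


# Stand-ins for the fruit datastreams: each endlessly repeats its fruit type.
apples = cycle(["apple"])
pears = cycle(["pear"])
bananas = cycle(["banana"])

# Assumed counts: two apples for every pear and banana (oversampling apples).
merged = merge_in_turn([(apples, 2), (pears, 1), (bananas, 1)])
batch = list(islice(merged, 8))
print(batch)
```

Raising one stream's count relative to the others is what oversamples that stream in the merged result.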

+
Zip independently sampled datastreams
-------------------------------------
Each of the fruit datastreams given below repeatedly yields the string of its fruit
@@ -101,12 +141,8 @@ type.
    >>> next(iter(datastream.data_loader(batch_size=4)))
    [('apple', 'pear'), ('apple', 'banana'), ('apple', 'pear'), ('apple', 'banana')]
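
Each element of the batch above pairs an apple with an item that alternates between pear and banana. A rough pure-Python analogue follows, assuming the second stream alternates between pear and banana as the output suggests. The stand-in iterators are hypothetical and do not use the library's own sampling.

```python
from itertools import cycle, islice

# Stand-ins for the datastreams; the real ones sample from datasets.
apples = cycle(["apple"])
pears_and_bananas = cycle(["pear", "banana"])  # assumed alternation

# zip advances both streams independently, taking one item from each per pair.
zipped = zip(apples, pears_and_bananas)
batch = list(islice(zipped, 4))
print(batch)
```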

+
More usage examples
-------------------
See the `documentation <https://pytorch-datastream.readthedocs.io/en/latest/>`_
for more usage examples.
-
-Install from source
-===================
-
-.. pip install -e .