- [1. Download ScanNet++ Data](#1-download-scannet-data-1)
- [2. Data Processing](#2-data-processing)
- [3. Data Structure](#3-data-structure)
- [Test Dataset](#test-dataset)
  - [Download Test Dataset](#download-test-dataset)
  - [Data Structure](#data-structure-1)
  - [Test Set Selection Criteria](#test-set-selection-criteria)
  - [Test Category Label Selection](#test-category-label-selection)
  - [Test Data Loading and Evaluation Workflow](#test-data-loading-and-evaluation-workflow)

## Overview
This document provides instructions for preparing the ScanNet and ScanNet++ datasets for training and evaluation.
Each directory contains:
- `extrinsic`: 4x4 camera-to-world transformation matrix
- `render_depth/`: Rendered depth maps stored as 16-bit PNG files (depth values * 1000)
- `rgb_resized_undistorted/`: Undistorted and resized RGB images
- `mask_resized_undistorted/`: Undistorted and resized binary mask images (255 for valid pixels, 0 for invalid)

## Test Dataset

### Download Test Dataset
```bash
# Download and extract the test dataset
wget https://huggingface.co/datasets/Journey9ni/LSM/resolve/main/scannet_test.tar
mkdir -p ./data                       # create the target directory if it does not already exist
tar -xf scannet_test.tar -C ./data/   # extract into the data directory
```
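
After extraction, a quick sanity check can confirm the download matches the curation described below (the final test set contains 40 scenes, each with the layout shown in the next section). A minimal sketch in Python; the path and expected entries are taken from this document, not from the repository's code:

```python
from pathlib import Path

# Sanity check, assuming the archive extracted to data/scannet_test/
# with the per-scene layout documented in the "Data Structure" section below.
root = Path("data/scannet_test")
assert root.is_dir(), "run the download/extract commands above first"

scenes = sorted(p for p in root.iterdir() if p.is_dir())
print(f"Found {len(scenes)} scenes")  # the curated test set contains 40 scenes

for scene in scenes:
    # Each scene should provide these entries per the documented layout.
    for entry in ("depth", "images", "labels", "selected_seqs_test.json"):
        assert (scene / entry).exists(), f"{scene.name} is missing {entry}"
```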

### Data Structure
The test dataset is expected to have the following structure:
```bash
data/scannet_test/
└── {scene_id}/
    ├── depth/                    # Depth maps
    ├── images/                   # RGB images
    ├── labels/                   # Semantic labels
    ├── selected_seqs_test.json   # Test sequence parameters
    └── selected_seqs_train.json  # Train sequence parameters
```

### Test Set Selection Criteria
The test set was curated as follows:
1. **Initial Selection**: The last 50 scenes from the alphabetically sorted list of original ScanNet scans were selected.
2. **Frame Sampling**: 30 frames were sampled at regular intervals from each selected scene (see the sketch after this list).
3. **Pose Validation**: Each frame's pose was checked for NaN values, which occur in some scans of the original ScanNet release; scenes containing frames with invalid poses were excluded (7 scenes removed).
4. **Compatibility Check**: Scenes that caused errors during testing with NeRF-DFF and Feature-3DGS were filtered out as well.
5. **Final Set**: This process resulted in a final test set of 40 scenes.
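
A minimal sketch of steps 2 and 3, assuming ScanNet's standard per-frame pose files (`pose/{frame_id}.txt`, each a 4x4 camera-to-world matrix); the helper names are illustrative, not the repository's actual code:

```python
import numpy as np
from pathlib import Path

NUM_FRAMES = 30  # frames sampled per scene, per the criteria above

def sample_frames(scene_dir: Path, num_frames: int = NUM_FRAMES) -> list[Path]:
    """Sample pose files at regular intervals across the scene (step 2)."""
    poses = sorted((scene_dir / "pose").glob("*.txt"))
    idx = np.linspace(0, len(poses) - 1, num_frames).astype(int)
    return [poses[i] for i in idx]

def scene_has_valid_poses(scene_dir: Path) -> bool:
    """Reject scenes whose sampled frames contain NaN poses (step 3)."""
    for pose_file in sample_frames(scene_dir):
        pose = np.loadtxt(pose_file)  # 4x4 camera-to-world matrix
        if np.isnan(pose).any():
            return False
    return True
```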

### Test Category Label Selection
Instead of the 20 categories of the ScanNetV2 benchmark, we use a predefined set of common indoor categories: `['wall', 'floor', 'ceiling', 'chair', 'table', 'sofa', 'bed', 'other']`.
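
The exact correspondence lives in the dataset's `map_func` (see the workflow below); the dictionary here only sketches the idea, with the source label names and merges assumed rather than taken from the repository:

```python
# Hypothetical mapping from ScanNetV2 category names to the 8 test categories.
# The real correspondence is defined by `map_func` in the repository; the
# entries below are assumptions for illustration only.
TEST_CATEGORIES = ['wall', 'floor', 'ceiling', 'chair', 'table', 'sofa', 'bed', 'other']

SCANNET_TO_TEST = {
    'wall': 'wall',
    'floor': 'floor',
    'ceiling': 'ceiling',
    'chair': 'chair',
    'table': 'table',
    'desk': 'table',  # merged into a broader category (assumed)
    'sofa': 'sofa',
    'bed': 'bed',
    # every remaining ScanNet category falls back to 'other'
}

def map_label(name: str) -> int:
    """Map a ScanNet category name to an index into TEST_CATEGORIES."""
    return TEST_CATEGORIES.index(SCANNET_TO_TEST.get(name, 'other'))
```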

### Test Data Loading and Evaluation Workflow

The testing process relies on the `TestDataset` class in `large_spatial_model/datasets/testdata.py`, initialized with `split='test'` and `is_training=False`.

1. **View Selection**: The dataset selects test views for each scene based on the `llff_hold` and `test_ids` parameters. Typically, frames whose index modulo `llff_hold` falls within `test_ids` are chosen as the core test frames (`target_view`).
2. **View Grouping**: Each selected `target_view` is grouped with its immediate predecessor (`source_view1`) and successor (`source_view2`), forming a tuple of view indices: `(source_view2, target_view, source_view1)`. The test set comprises a series of these `(scene ID, view indices tuple)` pairs (see the sketch after this list).
3. **Data Loading**: When iterating through the dataset during testing:
   * The script loads RGB images (`.jpg`), depth maps (`.png`), semantic label maps (`.png`), and camera parameters (intrinsics and extrinsics from `.npz`) for each view index in the tuple.
   * Preprocessing includes validity checks (e.g., for NaN values in camera poses) and image cropping/resizing.
   * The `map_func` maps the original ScanNet semantic labels to the simplified category set defined above.
   * This yields a dictionary for each view containing the image, depth, pose, intrinsics, processed label map, etc.
4. **Model Inference and Evaluation**:
   * The model takes the `source_view1` and `source_view2` data as input and infers scene parameters (e.g., Gaussian parameters for 3D Gaussian Splatting).
   * Using these inferred parameters and the `target_view`'s camera pose and intrinsics, the model renders a semantic label map for the `target_view`.
   * The rendered semantic map is then compared against the `target_view`'s ground-truth semantic label map from the original ScanNet dataset to evaluate performance (a metric sketch follows the view-grouping sketch below).
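
A minimal sketch of steps 1 and 2, assuming `llff_hold` and `test_ids` behave exactly as described above; the function is illustrative, not the repository's actual implementation:

```python
def build_test_tuples(num_frames: int, llff_hold: int, test_ids: list[int]):
    """Pair each target view with its neighbours, per steps 1-2 above."""
    tuples = []
    for target in range(num_frames):
        # Step 1: a frame is a target view when its index modulo
        # `llff_hold` falls within `test_ids`.
        if target % llff_hold not in test_ids:
            continue
        # Step 2: group with the immediate predecessor and successor,
        # skipping targets at the sequence boundaries.
        if target == 0 or target == num_frames - 1:
            continue
        source_view1, source_view2 = target - 1, target + 1
        tuples.append((source_view2, target, source_view1))
    return tuples

# e.g. with 30 frames, llff_hold=8 and test_ids=[0],
# frames 8, 16, and 24 become target views:
print(build_test_tuples(30, 8, [0]))  # [(9, 8, 7), (17, 16, 15), (25, 24, 23)]
```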
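
This document does not name the exact comparison metric; mean intersection-over-union (mIoU) over the eight categories is a common choice for semantic maps, sketched here under that assumption:

```python
import numpy as np

def miou(pred: np.ndarray, gt: np.ndarray, num_classes: int = 8) -> float:
    """Mean IoU between a rendered label map and the ground truth.

    `pred` and `gt` are integer label maps of equal shape with values in
    [0, num_classes). The metric itself is an assumption; the repository
    may evaluate differently.
    """
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:  # skip categories absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))
```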