Citation:
Shangke Liu, Mohamed Amgad, Deeptej More, Muhammad A. Rathore, Roberto Salgado, Lee A. D. Cooper: A panoptic segmentation dataset and deep-learning approach for explainable scoring of tumor-infiltrating lymphocytes
npj Breast Cancer 10, 52 (2024). https://doi.org/10.1038/s41523-024-00663-1
Tumor-Infiltrating Lymphocytes (TILs) have strong prognostic and predictive value in breast cancer, but their visual assessment is subjective. To improve reproducibility, the International Immuno-oncology Working Group recently released recommendations for the computational assessment of TILs that build on visual scoring guidelines. However, existing resources do not adequately address these recommendations due to the lack of annotation datasets that enable joint, panoptic segmentation of tissue regions and cells. Moreover, existing deep-learning methods focus entirely on either tissue segmentation or cell nuclei detection, which complicates the process of TILs assessment by necessitating the use of multiple models and reconciling inconsistent predictions. We introduce PanopTILs, a region and cell-level annotation dataset containing 814,886 nuclei from 151 patients, openly accessible at: sites.google.com/view/panoptils. Using PanopTILs we developed MuTILs, a neural network optimized for assessing TILs in accordance with clinical recommendations. MuTILs is a concept bottleneck model designed to be interpretable and to encourage sensible predictions at multiple resolutions. Using a rigorous internal-external cross-validation procedure, MuTILs achieves an AUROC of 0.93 for lymphocyte detection and a DICE coefficient of 0.81 for tumor-associated stroma segmentation. Our computational score closely matched visual scores from 2 pathologists (Spearman R = 0.58–0.61, p < 0.001). Moreover, computational TILs scores had a higher prognostic value than visual scores, independent of TNM stage and patient age. In conclusion, we introduce a comprehensive open data resource and a modeling approach for detailed mapping of the breast tumor microenvironment.
We recommend using the szolgyen/mutils:v2 image from Docker Hub to perform inference with MuTILs. This image is based on Ubuntu 22.04 and includes a Python 3.10.12 virtual environment preconfigured with all the necessary packages for MuTILs. It is built on nvidia/cuda:12.0.0-base-ubuntu22.04, providing CUDA 12.0.0 compatibility.
For the list of dependencies and additional details, refer to the Dockerfile in this repository.
The container has a preinstalled version of MuTILs in the home folder. There is no need to clone this repository in the container.
docker pull szolgyen/mutils:v2
We recommend using a run_docker.sh file to start the container, for example:
docker run \
  --name Mutils \
  --gpus '"device=0,1,2,3,4"' \
  --rm \
  -it \
  -v /path/to/the/slides:/home/input \
  -v /path/to/the/output:/home/output \
  -v /path/to/the/mutils/models:/home/models \
  --ulimit core=0 \
  szolgyen/mutils:v2 \
  bash
The container expects the following mount points to be bound to the corresponding directories on the host:
- /home/input
- /home/output
- /home/models
Make sure these are set correctly in the run_docker.sh file before starting the container.
./run_docker.sh
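Once inside the container, it can help to confirm that the three mount points are actually present before starting a long run. The following is a minimal sketch, not part of MuTILs; the function name `missing_mounts` and the `root` parameter are assumptions introduced here so the check can also be tried outside the container.

```python
from pathlib import Path

# Mount points named in this README. `root` is a parameter only so the
# check can be exercised outside the container (hypothetical helper).
REQUIRED_MOUNTS = ("input", "output", "models")


def missing_mounts(root="/home"):
    """Return the required mount points that are not existing directories."""
    base = Path(root)
    return [
        str(base / name)
        for name in REQUIRED_MOUNTS
        if not (base / name).is_dir()
    ]
```

Calling `missing_mounts()` in a correctly started container should return an empty list; any path it returns points to a `-v` option that is missing or misspelled in run_docker.sh.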
Within the container, check and customize the configuration file at /home/MuTILs_Panoptic/configs/MuTILsWSIRunConfigs.yaml.
If you do not change the parameters in the configuration file, MuTILs runs with the default parameters, which are defined in configs/MuTILsWSIRunConfigs.py.
The code also records the configuration parameters in the run's log file for reproducibility.
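The general pattern described here (defaults, optional overrides, and a logged record of the effective values) can be illustrated with a short sketch. This is not the MuTILs implementation: the helper names and the `param_a`/`param_b` keys are placeholders, and the overrides are assumed to have already been parsed from the YAML file into a dictionary.

```python
import json
import logging


def effective_config(defaults, overrides=None):
    """Overlay user-supplied values on the defaults, key by key."""
    merged = dict(defaults)          # copy so the defaults stay untouched
    merged.update(overrides or {})   # user values win where both are set
    return merged


def log_config(config, logger):
    """Write the effective parameters as one sorted JSON line."""
    line = json.dumps(config, sort_keys=True)
    logger.info("run configuration: %s", line)
    return line
```

For example, `effective_config({"param_a": 1, "param_b": 2}, {"param_b": 3})` keeps `param_a` at its default and takes `param_b` from the override. Logging the merged result, rather than only the overrides, is what makes two runs comparable afterwards.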
python MuTILs_Panoptic/mutils_panoptic/MuTILsWSIRunner.py
Host (recommended)                      Container (default)
.                                       home
├── models                              ├── models
│   ├── fold_1                          │   ├── fold_1
│   │   └── mutils_06022021_fold1.pt    │   │   └── mutils_06022021_fold1.pt
│   ├── fold_2                          │   ├── fold_2
│   │   └── mutils_06022021_fold2.pt    │   │   └── mutils_06022021_fold2.pt
│   ├── fold_3                          │   ├── fold_3
│   │   └── mutils_06022021_fold3.pt    │   │   └── mutils_06022021_fold3.pt
│   ├── fold_4                          │   ├── fold_4
│   │   └── mutils_06022021_fold4.pt    │   │   └── mutils_06022021_fold4.pt
│   └── fold_5                          │   └── fold_5
│       └── mutils_06022021_fold5.pt    │       └── mutils_06022021_fold5.pt
├── input                               ├── input
├── output                              ├── output
└── run_docker.sh                       ├── MuTILs_Panoptic
                                        └── venv
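The layout above implies one weight file per fold under the models directory. A small sketch like the following can list any files that are missing before a run; the function names are hypothetical, and the file naming is taken directly from the tree shown here.

```python
from pathlib import Path


def expected_weight_paths(models_dir="/home/models", n_folds=5):
    """List the weight files in the layout shown in the directory tree."""
    base = Path(models_dir)
    return [
        base / f"fold_{i}" / f"mutils_06022021_fold{i}.pt"
        for i in range(1, n_folds + 1)
    ]


def missing_weights(models_dir="/home/models", n_folds=5):
    """Return the expected weight files that do not exist on disk."""
    return [p for p in expected_weight_paths(models_dir, n_folds) if not p.is_file()]
```

On the host, pass the directory that run_docker.sh mounts to /home/models; inside the container the default argument should suffice.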
The code creates two folders in the mounted output directory:
- LOGS - the terminal output of the running code, saved as a log file. It contains information about:
  - the configuration parameters of the run,
  - GPU availability and which model is loaded on which GPU,
  - slide name and slide ROI scoring steps,
  - ROI processing progress,
  - duration of the process per slide and overall.
- perSlideResults - the results of the segmentation and feature extraction are stored here.
  - each slide has its own folder
  - each slide folder has the following subfolders:
    - nucleiMeta - nucleus metadata per ROI in CSV files
    - nucleiProps - nucleus features per ROI in CSV files
    - roiMasks - segmentation masks per ROI
    - roiMeta - aggregated features per ROI in a JSON file
  - each slide folder also contains the following files:
    - slidename_RoiLocs.csv - ROI coordinates and scores
    - slidename_RoiLocs.png - visualization of ROI locations
    - slidename.json - aggregated features of the whole slide
    - slidename.tif - combined segmentation mask
      - MuTILsMaskVisualizer.py can be used to convert it to a color-coded image
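To collect results across slides after a run, the folder structure above can be walked with the standard library alone. The sketch below is illustrative rather than part of MuTILs. It assumes that "slidename" in the file names equals the name of the slide folder and that the RoiLocs CSV has a header row; it makes no assumption about the keys inside the JSON file and simply returns its parsed content.

```python
import csv
import json
from pathlib import Path


def summarize_slide(slide_dir):
    """Read the per-slide CSV and JSON outputs, if they are present."""
    slide_dir = Path(slide_dir)
    name = slide_dir.name  # assumed to match the "slidename" file prefix
    summary = {"slide": name, "n_rois": None, "slide_features": None}

    roi_csv = slide_dir / f"{name}_RoiLocs.csv"
    if roi_csv.is_file():
        with roi_csv.open(newline="") as f:
            # One data row per ROI, assuming a header row.
            summary["n_rois"] = sum(1 for _ in csv.DictReader(f))

    slide_json = slide_dir / f"{name}.json"
    if slide_json.is_file():
        summary["slide_features"] = json.loads(slide_json.read_text())
    return summary


def summarize_run(output_dir="/home/output"):
    """Summarize every slide folder under perSlideResults."""
    results = Path(output_dir) / "perSlideResults"
    return [summarize_slide(d) for d in sorted(results.iterdir()) if d.is_dir()]
```

Slides whose files are missing keep `None` in the corresponding field, which makes incomplete runs easy to spot in the returned list.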