How Researchers Are Preprocessing Gigabyte-Sized WSIs for Deep Learning | HackerNoon
Briefly

Whole slide images (WSIs) are often too large to be processed directly by deep learning models, typically exceeding 6 GB. To keep storage manageable, the researchers work at a single magnification level, which allows model training without storing excess data. A dedicated pipeline fetches, processes, and filters WSI tiles. Metadata extraction captures essential slide information and saves it to a CSV file. The pipeline also avoids unnecessary patches by exploiting the multi-scale nature of WSIs, processing only relevant tissue areas after the initial filtering steps.
Whole slide images (WSIs) can be too large to be processed directly by deep learning models, often exceeding 6 GB in size.
A preprocessing pipeline manages WSI tiles, processing and filtering them so that embeddings and metadata can be efficiently encoded and saved to HDF5 files; the final sketch below illustrates this step.
Metadata extraction gathers slide details, including the slide ID, labels, slide type, and microns per pixel (MPP), and saves them to a CSV file for further processing; the first sketch below illustrates this step.
The approach exploits the multi-scale structure of WSIs, first fetching thumbnail-level tiles and applying Otsu's thresholding to filter out background patches; the second sketch below illustrates this step.
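
A minimal sketch of the metadata-extraction step might look like the following, assuming OpenSlide-readable slides and Python's csv module; the label lookup, the slide_type field, and the column names are illustrative placeholders rather than the authors' exact schema.

```python
import csv
from pathlib import Path

import openslide


def extract_metadata(slide_dir, labels, out_csv):
    """Collect per-slide details and save them to a CSV file for later stages."""
    rows = []
    for path in sorted(Path(slide_dir).glob("*.svs")):
        slide = openslide.OpenSlide(str(path))
        rows.append({
            "slide_id": path.stem,
            "label": labels.get(path.stem, "unknown"),  # label lookup is an assumed input
            "slide_type": "NA",  # dataset-specific (e.g. FFPE vs. frozen); placeholder here
            "mpp": slide.properties.get(openslide.PROPERTY_NAME_MPP_X, "NA"),
        })
        slide.close()

    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["slide_id", "label", "slide_type", "mpp"])
        writer.writeheader()
        writer.writerows(rows)
```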
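
The thumbnail-level background filtering could be sketched roughly as below, assuming OpenSlide and scikit-image; the thumbnail size, tile size, and tissue-fraction threshold are assumed values, not figures from the article.

```python
import numpy as np
import openslide
from skimage.color import rgb2gray
from skimage.filters import threshold_otsu


def tissue_tile_coords(slide_path, tile_size=256, min_tissue=0.1):
    """Return level-0 coordinates of tiles that contain enough tissue."""
    slide = openslide.OpenSlide(slide_path)

    # Work on a small thumbnail instead of the gigapixel full-resolution image.
    thumb = np.asarray(slide.get_thumbnail((1024, 1024)))
    gray = rgb2gray(thumb)
    mask = gray < threshold_otsu(gray)  # tissue is darker than the bright background

    # Map thumbnail pixels back to full-resolution tile positions.
    scale_x = slide.dimensions[0] / mask.shape[1]
    scale_y = slide.dimensions[1] / mask.shape[0]

    coords = []
    for y in range(0, slide.dimensions[1], tile_size):
        for x in range(0, slide.dimensions[0], tile_size):
            mx0, my0 = int(x / scale_x), int(y / scale_y)
            mx1, my1 = int((x + tile_size) / scale_x), int((y + tile_size) / scale_y)
            patch_mask = mask[my0:max(my1, my0 + 1), mx0:max(mx1, mx0 + 1)]
            if patch_mask.mean() >= min_tissue:
                coords.append((x, y))

    slide.close()
    return coords
```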
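
Finally, a hedged sketch of encoding tiles and writing the embeddings plus metadata to HDF5, assuming a PyTorch encoder and h5py; the dataset names, attributes, and overall HDF5 layout are assumptions, not the authors' exact format.

```python
import h5py
import numpy as np
import torch


def save_embeddings(encoder, tiles, coords, slide_id, out_path):
    """Encode a stack of tiles and write embeddings plus tile metadata to an HDF5 file."""
    encoder.eval()
    with torch.no_grad():
        embeddings = encoder(tiles).cpu().numpy()  # shape: (n_tiles, embed_dim)

    with h5py.File(out_path, "w") as f:
        f.create_dataset("embeddings", data=embeddings, compression="gzip")
        f.create_dataset("coords", data=np.asarray(coords), compression="gzip")
        f.attrs["slide_id"] = slide_id
        f.attrs["num_tiles"] = embeddings.shape[0]
```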
Read at Hackernoon