Paradigm shift in mathematics through deep learning-based solution priors

Most 3D medical imaging techniques, such as CT and MRI, employ voxel-based representations to provide a spatial mapping of anatomical structures and tissue characteristics, offering clinical relevance and utility. In the process of image reconstruction, it is necessary to establish a correspondence between the value assigned to each voxel and the measurable quantities. For instance, in CT, the measured data take the form of X-ray projections, which can be modeled as the Radon transform of the voxel values within the CT image. In MRI, the measured data consist of k-space samples, which are the Fourier transform of the MR image. In this setting, reconstruction amounts to applying the inverse of the forward operator (e.g., the inverse Radon transform or inverse Fourier transform) to determine the value of each voxel from the measurement data. Consequently, to enhance image resolution, the number of measurements must increase proportionally: in simple terms, the number of equations (measurements) should not be less than the number of unknowns (individual voxels within the image volume).
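To make this correspondence concrete, here is a minimal Python sketch of the fully sampled MRI case, using NumPy's 2D FFT as a stand-in for the forward operator; the random image and its size are illustrative assumptions, not actual MR data.

```python
import numpy as np

# A minimal sketch of the fully sampled MRI forward model: the measured
# k-space is the 2D discrete Fourier transform of the image, and with a
# complete set of samples the inverse transform recovers every voxel.
rng = np.random.default_rng(0)
image = rng.standard_normal((64, 64))        # illustrative stand-in for an MR image

kspace = np.fft.fft2(image)                  # forward operator: Fourier transform
reconstruction = np.fft.ifft2(kspace).real   # inverse operator: voxel-by-voxel recovery

# Fully sampled: #equations == #unknowns, so recovery is exact (up to float error).
assert np.allclose(reconstruction, image)
```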


The recent trend in medical imaging is shifting away from traditional sampling conditions, exemplified by the Nyquist criterion, and now places a strong emphasis on significantly reducing the number of measurements while preserving spatial resolution. In other words, the objective is to minimize the ratio between the number of measurements and the number of voxels while maintaining image reconstruction performance. This shift towards data minimization is primarily driven by the need to shorten data acquisition times in MRI and to reduce radiation exposure in CT. In traditional mathematical terms, it can be described as a transition from well-posed problems to ill-posed problems. In simpler terms, these ill-posed problems involve solving the following linear system:
$$ A \mathbf{x} = \mathbf{b},$$
where the dimension of the data vector $\mathbf{b}$ is significantly lower than that of the unknown vector $\mathbf{x}$. Specifically, $\mathbf{b}$ represents the sampled data vector (e.g., k-space data for MRI or a sinogram for CT), $\mathbf{x}$ represents the image vector (e.g., the MRI or CT image), and $A$ stands for the forward matrix, representing the discrete Fourier transform for MRI or the discrete Radon transform for CT.
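The sketch below illustrates this ill-posed regime under the same hypothetical setup as before: retaining only a quarter of the k-space lines makes the dimension of $\mathbf{b}$ far smaller than that of $\mathbf{x}$, and naive inversion (zero-filling the missing samples before the inverse FFT) no longer recovers the image.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
image = rng.standard_normal((n, n))

# Keep only 25% of the k-space lines: b = M F x, with dim(b) << dim(x).
mask = np.zeros((n, n), dtype=bool)
mask[rng.choice(n, size=n // 4, replace=False), :] = True

kspace = np.fft.fft2(image)
b = kspace[mask]                                           # sampled data vector b
print(b.size, "measurements for", image.size, "unknowns")  # 1024 vs 4096

# Naive inversion: zero-fill the missing samples and invert the FFT.
zero_filled = np.zeros_like(kspace)
zero_filled[mask] = b
x_naive = np.fft.ifft2(zero_filled).real
print("relative error:", np.linalg.norm(x_naive - image) / np.linalg.norm(image))
```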

Tackling this ill-posed problem is not achievable through the traditional mathematical approach of determining the elements of the vector $\mathbf{x}$ one by one. Traditional approaches, including norm-based regularization, are limited in their ability to incorporate prior knowledge that captures both local and global interconnections among the elements of $\mathbf{x}$ into the problem-solving process.
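As a point of reference, the following sketch applies classical Tikhonov (L2-norm) regularization to a small, randomly generated underdetermined system; the problem sizes and the sparse ground truth are assumptions chosen purely for illustration. The closed-form solution shows what a generic norm penalty can and cannot do: it selects the minimum-energy solution, which encodes no structural prior about $\mathbf{x}$.

```python
import numpy as np

# Tikhonov (L2-norm) regularization on an underdetermined system:
# x_hat = argmin ||Ax - b||^2 + lam ||x||^2 = A^T (A A^T + lam I)^{-1} b.
rng = np.random.default_rng(0)
m, n = 64, 256                        # far fewer equations than unknowns
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, 8, replace=False)] = 1.0   # assumed sparse ground truth
b = A @ x_true

lam = 1e-3
x_hat = A.T @ np.linalg.solve(A @ A.T + lam * np.eye(m), b)

# The norm penalty smears energy across all coordinates instead of
# exploiting the structure of x_true, so the error stays large.
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```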

The well-posedness of an inverse problem often relies on our ability to succinctly represent solutions within the expansive solution space defined by the voxel dimensions. Many ill-posed problems can be transformed into well-posed ones by confining the set of permissible solutions to a compactly defined subset denoted as $\mathcal{S}$. This constrained set $\mathcal{S}$ ensures that the near-isometry condition $\|A\mathbf{x} - A\mathbf{x}'\| \approx \|\mathbf{x} - \mathbf{x}'\|$ holds for all $\mathbf{x}, \mathbf{x}' \in \mathcal{S}$. With this condition in place, it becomes possible to define a mapping $f: \mathbf{b} \mapsto \mathbf{x} \in \mathcal{S}$, where the function $f$ maps observations $\mathbf{b}$ to solutions $\mathbf{x}$ within the constrained set $\mathcal{S}$.
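The following sketch gives an empirical feel for this condition in a simple, assumed setting: taking $\mathcal{S}$ to be the set of 8-sparse vectors and $A$ a random Gaussian matrix with far fewer rows than columns, the ratio $\|A\mathbf{x} - A\mathbf{x}'\| / \|\mathbf{x} - \mathbf{x}'\|$ concentrates near one, so distinct elements of $\mathcal{S}$ remain distinguishable in measurement space.

```python
import numpy as np

# Empirical check of the near-isometry ||Ax - Ax'|| ~ ||x - x'|| when x, x'
# are confined to a constrained set S -- here, 8-sparse vectors in R^256 --
# and A is a random Gaussian matrix with only m = 64 rows.
rng = np.random.default_rng(0)
m, n, k = 64, 256, 8
A = rng.standard_normal((m, n)) / np.sqrt(m)

def sparse_vector():
    x = np.zeros(n)
    x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
    return x

ratios = []
for _ in range(1000):
    x, xp = sparse_vector(), sparse_vector()
    ratios.append(np.linalg.norm(A @ (x - xp)) / np.linalg.norm(x - xp))

# Ratios cluster near 1: the condition that makes f: b -> x well defined on S.
print(f"min {min(ratios):.2f}, max {max(ratios):.2f}")
```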

In the context of high-resolution medical imaging, accurately capturing the set $\mathcal{S}$, which represents a constrained and meaningful subset of solutions, poses a significant challenge. This difficulty arises from the complex and high-dimensional nature of medical images, where essential features and variations are deeply intertwined. Despite advances in deep learning, including techniques like Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs), achieving effective dimensionality reduction through disentangled representation learning remains a formidable task. 

In contrast, empirical evidence suggests that architectures like U-Net, which enhances traditional autoencoders by incorporating skip connections, along with similar methodologies, are adept at navigating this complexity. Specifically, these approaches have demonstrated a remarkable capacity for identifying and implementing the reconstruction mapping $f: \mathbf{b} \mapsto \mathbf{x} \in \mathcal{S}$. The success of U-Net and analogous strategies lies in their ability to maintain essential information across the encoding-decoding process, thereby effectively bridging the gap between the high-dimensional input space and the constrained solution set $\mathcal{S}$.
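As a rough illustration of the architecture (not any particular published model), here is a minimal one-level U-Net-style network in PyTorch; the channel counts, depth, and input shape are hypothetical. The defining feature is the skip connection, which concatenates encoder features onto the decoder path so that high-resolution detail survives the downsampling bottleneck.

```python
import torch
import torch.nn as nn

# A minimal one-level U-Net sketch: encoder -> downsample -> upsample -> decoder,
# with a skip connection concatenating encoder features onto the decoder path.
class TinyUNet(nn.Module):
    def __init__(self, ch=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU())
        self.down = nn.Conv2d(ch, 2 * ch, 3, stride=2, padding=1)
        self.up = nn.ConvTranspose2d(2 * ch, ch, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(ch, 1, 3, padding=1))

    def forward(self, x):
        e = self.enc(x)                             # encoder features (skip source)
        d = self.up(torch.relu(self.down(e)))       # bottleneck and upsampling
        return self.dec(torch.cat([e, d], dim=1))   # skip connection: concatenation

# Training on pairs (naive reconstruction, ground truth) would teach the
# network the mapping f: b -> x in S; here we only check that shapes match.
net = TinyUNet()
out = net(torch.randn(1, 1, 64, 64))
print(out.shape)  # torch.Size([1, 1, 64, 64])
```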
