A new, fully automated, rapid method, referred to as kernel principal component analysis residual diagnosis (KPCARD), is proposed for removing cosmic ray artifacts (CRAs) from Raman spectra, particularly in large Raman imaging datasets. KPCARD identifies CRAs via a statistical analysis of the residuals obtained at each wavenumber in the spectra. The method exploits the stochastic nature of CRAs: because they occur at random, the most significant components in a principal component analysis (PCA) of a large number of Raman spectra should not contain any CRAs. The process first applied kernel PCA (kPCA) to all the Raman mapping data and then estimated the inter- and intra-spectrum noise to generate two threshold values. CRA identification was then achieved by evaluating the residuals of each spectrum against these threshold values to assess whether a CRA was present.
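The identification step can be illustrated with a minimal sketch. It is not the KPCARD implementation: linear PCA via SVD is used as a stand-in for kPCA, the noise is estimated with a median absolute deviation (an assumed estimator), and the names `detect_cra`, `n_components`, and `k` are hypothetical. A point is flagged only if its residual exceeds both an inter-spectrum threshold (per wavenumber, across spectra) and an intra-spectrum threshold (per spectrum, across wavenumbers).

```python
import numpy as np


def detect_cra(spectra, n_components=2, k=6.0):
    """Flag likely cosmic-ray points from PCA reconstruction residuals.

    spectra: array of shape (n_spectra, n_wavenumbers).
    Returns a boolean mask of the same shape (True = suspected CRA).
    """
    centered = spectra - spectra.mean(axis=0)

    # Linear PCA via SVD (assumption: stand-in for the kernel PCA step).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pcs = vt[:n_components]

    # Residual = data minus its projection onto the leading components.
    residuals = centered - centered @ pcs.T @ pcs

    def robust_sigma(r, axis):
        # MAD-based noise estimate (assumed; not the authors' estimator).
        med = np.median(r, axis=axis, keepdims=True)
        return 1.4826 * np.median(np.abs(r - med), axis=axis, keepdims=True)

    inter = k * robust_sigma(residuals, axis=0)  # one value per wavenumber
    intra = k * robust_sigma(residuals, axis=1)  # one value per spectrum

    # CRAs are positive spikes, so only positive residuals are tested.
    return (residuals > inter) & (residuals > intra)
```

The number of retained components matters in this sketch: if too many are kept, a strong spike can itself be captured as a component and vanish from the residuals.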
CRA correction was achieved by spectral replacement: the nearest neighbor (NN) spectrum, i.e., the spectrum most spectroscopically similar to the CRA-contaminated spectrum, and the principal components (PCs) obtained by kPCA were used together to generate a robust best-fit curve to the CRA-contaminated spectrum. This best-fit spectrum then replaced the CRA-contaminated spectrum in the dataset. The efficacy of KPCARD was demonstrated using simulated data and real Raman spectra collected from solid-state materials. The results showed that KPCARD was fast (<1 min per 8400 spectra), accurate, precise, and suitable for the automated correction of very large (>1 million spectra) Raman datasets.
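The replacement step can be sketched in the same spirit. The following is an illustration under stated assumptions, not the authors' procedure: the NN spectrum is chosen by Euclidean distance over the unflagged channels (an assumed similarity measure), the fit is an ordinary least-squares fit rather than a specifically robust one, and the name `replace_cra_spectrum` and its parameters are hypothetical.

```python
import numpy as np


def replace_cra_spectrum(spectra, idx, flagged, pcs):
    """Return a best-fit replacement for the contaminated spectrum spectra[idx].

    spectra: array of shape (n_spectra, n_wavenumbers).
    flagged: boolean array (n_wavenumbers,), True at suspected CRA channels.
    pcs:     array (n_components, n_wavenumbers) of principal components.
    """
    clean = ~flagged
    target = spectra[idx]

    # Nearest neighbor: smallest Euclidean distance over unflagged channels
    # (assumed metric for "most spectroscopically similar").
    dist = np.linalg.norm(spectra[:, clean] - target[clean], axis=1)
    dist[idx] = np.inf  # exclude the contaminated spectrum itself
    nn = spectra[np.argmin(dist)]

    # Basis for the fit: the NN spectrum plus the principal components.
    basis = np.vstack([nn, pcs]).T  # shape (n_wavenumbers, 1 + n_components)

    # Least-squares fit using only unflagged channels, so the spike
    # cannot influence the coefficients.
    coef, *_ = np.linalg.lstsq(basis[clean], target[clean], rcond=None)

    # Evaluate the fit at every channel, including the flagged ones.
    return basis @ coef
```

In this sketch the returned curve replaces the whole spectrum, as described above; the components passed in should be estimated so that the spike has little influence on them, for example by excluding the contaminated spectrum from the decomposition.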