SEMANTIC DIMENSION NAMING (SDN): A PROCESS FOR NAMING DIMENSIONS IN A SEMANTIC SPACE
A common approach to text mining converts text into a numeric representation by performing a singular value decomposition (SVD) on a weighted term-frequency (TF) matrix. The SVD factors the weighted matrix into three matrices that map both the terms and the text units of the original matrix into a new semantic space. Interpreting the results depends critically on labeling the dimensions of that space. Our work provides a simple approach to labeling that greatly reduces the subjectivity of the process.
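As a minimal sketch of the decomposition described above, the following Python snippet factors a small weighted term-by-document matrix with NumPy. The toy counts, the term list, and the smoothed TF-IDF weighting are illustrative assumptions; the text does not specify which weighting scheme is used.

```python
import numpy as np

# Hypothetical toy data: rows are terms, columns are text units (documents).
terms = ["price", "cost", "cheap", "screen", "display", "pixel"]
counts = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
    [0, 1, 1, 1],
], dtype=float)

# Assumed weighting (not taken from the source): term frequency times a smoothed IDF.
n_docs = counts.shape[1]
doc_freq = (counts > 0).sum(axis=1)              # number of documents containing each term
idf = np.log((1 + n_docs) / (1 + doc_freq)) + 1.0
weighted = counts * idf[:, None]

# Thin SVD: weighted = U @ diag(s) @ Vt.
# Rows of U place the terms in the new space; columns of Vt place the text units.
U, s, Vt = np.linalg.svd(weighted, full_matrices=False)
```

With terms as rows, the left singular vectors (columns of `U`) carry the term loadings, and the singular values in `s` are returned in descending order.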
In this paper, we apply a varimax rotation to the left singular vectors of the SVD, weighted by the square roots of the corresponding singular values. The goal is to produce rotated vectors in which each term loading is either large in magnitude (positive or negative) or close to zero, which makes the dimensions easier to interpret. As a second step, we construct scree plots of the loadings on each dimension to indicate which words should be considered in interpreting that dimension. Experts in a field can then examine the scree plots to name the dimensions.
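The two steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the toy matrix, the choice of two retained dimensions, and the helper names `varimax` and `scree_order` are hypothetical, and the rotation uses a commonly used SVD-based iteration for the varimax criterion.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-8):
    """Orthogonally rotate a (terms x k) loading matrix toward the varimax criterion."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient-like matrix of the varimax criterion (gamma = 1 gives varimax).
        target = rotated**3 - (gamma / p) * rotated @ np.diag((rotated**2).sum(axis=0))
        u, sv, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                       # closest orthogonal matrix
        new_criterion = sv.sum()
        if criterion != 0 and new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation

def scree_order(rotated, terms, dim):
    """Return (term, loading) pairs for one dimension, sorted by decreasing |loading|."""
    order = np.argsort(-np.abs(rotated[:, dim]))
    return [(terms[i], float(rotated[i, dim])) for i in order]

# Hypothetical weighted term-by-document matrix (rows = terms).
terms = ["price", "cost", "cheap", "screen", "display", "pixel"]
weighted = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 1.0, 1.0, 1.0],
])

U, s, Vt = np.linalg.svd(weighted, full_matrices=False)
k = 2                                            # assumed number of retained dimensions
loadings = U[:, :k] * np.sqrt(s[:k])             # left singular vectors weighted by sqrt(singular values)
rotated, rotation = varimax(loadings)

# Sorted loadings for each dimension; plotting |loading| against rank gives a scree-style plot.
for dim in range(k):
    print(f"dimension {dim}:", scree_order(rotated, terms, dim))
```

Plotting the sorted absolute loadings for a dimension and looking for the elbow suggests which terms to hand to a domain expert when naming that dimension. The cutoff itself remains a judgment call that this sketch does not automate.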
Keywords: text mining, singular value decomposition, dimension naming, exploratory factor analysis.