The concept of sparsity was discussed as a more appropriate characteristic of the data representation than the number of features used. A feature ranking and feature selection method based on linear support vector machines (SVMs) was also proposed and used in conjunction with the SVM classifier. The method can also be combined with other classification algorithms. The results show that, at the same level of vector sparsity, feature selection based on SVM normals yields better classification performance than feature selection based on odds ratio or information gain when linear SVM classifiers are used.
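
To make the two ideas concrete, the following is a minimal sketch rather than the implementation used in the experiments. It assumes that features are ranked by the absolute value of their weight in the normal of a trained linear SVM, and that sparsity is measured as the average number of non-zero components per vector. The normal vector, the documents, and all function names are hypothetical and chosen only for illustration; no SVM is trained here.

```python
from typing import Dict, List

# A document is a sparse vector: feature index -> feature value.
SparseVector = Dict[int, float]


def rank_features_by_normal(normal: List[float]) -> List[int]:
    """Order feature indices by decreasing absolute weight in the SVM normal.

    Assumption: a larger |weight| means a more important feature.
    """
    return sorted(range(len(normal)), key=lambda j: abs(normal[j]), reverse=True)


def select_features(vectors: List[SparseVector], kept: List[int]) -> List[SparseVector]:
    """Drop every component whose feature index is not in `kept`."""
    kept_set = set(kept)
    return [{j: v for j, v in vec.items() if j in kept_set} for vec in vectors]


def sparsity(vectors: List[SparseVector]) -> float:
    """Average number of non-zero components per vector (assumed definition)."""
    if not vectors:
        return 0.0
    nonzero = sum(sum(1 for v in vec.values() if v != 0) for vec in vectors)
    return nonzero / len(vectors)


# Hypothetical normal of an already trained linear SVM over five features.
normal = [0.05, -1.20, 0.40, 0.00, 0.90]

# Hypothetical training documents.
docs = [
    {0: 1.0, 1: 2.0, 3: 1.0},
    {1: 1.0, 2: 3.0, 4: 1.0},
    {0: 2.0, 4: 1.0},
]

ranking = rank_features_by_normal(normal)
reduced = select_features(docs, ranking[:2])  # keep the two top-ranked features

print("ranking:", ranking)
print("sparsity before:", sparsity(docs))
print("sparsity after:", sparsity(reduced))
```

Comparing selection methods at equal sparsity, rather than at an equal number of retained features, would then amount to choosing the cutoff for each ranking so that `sparsity(reduced)` matches across methods.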