Support Vector Machines (SVMs), like most learning machines, can perform poorly on the minority class when trained on imbalanced datasets, because they were designed to induce a model based on the overall error. To improve their performance on this kind of problem, a low-cost post-processing strategy is proposed that calculates a new bias to adjust the function learned by the SVM. The proposed bias takes into account the proportional size of the classes in order to improve performance on the minority class. This solution avoids both introducing and tuning new parameters and modifying the standard optimization problem for SVM training. Experimental results on 34 datasets with different degrees of imbalance, evaluated with standardized error measures based on sensitivity and g-means, show that the proposed method improves classification on imbalanced datasets. Furthermore, its performance is comparable to that of well-known cost-sensitive and Synthetic Minority Over-sampling Technique (SMOTE) schemes, without adding complexity or computational cost.
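The idea of adjusting the bias after training can be illustrated with a minimal sketch. The abstract does not state the bias formula, so the shift rule below is a hypothetical stand-in and not the proposed bias: it assumes the trained SVM puts the class margins at scores -1 and +1, and moves the threshold from 0 toward the majority margin in proportion to the class sizes. The function names and the toy scores are illustrative assumptions. The sensitivity and g-mean computations follow their standard definitions.

```python
import math

def size_proportional_shift(labels):
    """Hypothetical rule (an assumption, not the paper's formula).

    Returns a value to add to every SVM score. It is 0 for balanced
    classes and approaches 1 as the minority class (+1) shrinks.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == -1)
    return (n_neg - n_pos) / (n_pos + n_neg)

def predict(scores, shift=0.0):
    """Label +1 (minority) when the shifted score is non-negative."""
    return [1 if s + shift >= 0 else -1 for s in scores]

def sensitivity_specificity(y_true, y_pred):
    """True-positive rate and true-negative rate, with +1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity."""
    sens, spec = sensitivity_specificity(y_true, y_pred)
    return math.sqrt(sens * spec)

# Toy decision scores f(x) = w.x + b from an already trained SVM (made-up values).
scores = [1.2, 0.3, -0.4, -0.2,                      # minority class (+1)
          -1.5, -1.1, -1.3, -0.9, -2.0, -1.0,
          -1.2, -0.8, -1.7, -1.4, -0.7, 0.1]         # majority class (-1)
labels = [1] * 4 + [-1] * 12

delta = size_proportional_shift(labels)
base_pred = predict(scores)          # original bias
new_pred = predict(scores, delta)    # adjusted bias

print("shift:", delta)
print("g-mean before:", g_mean(labels, base_pred))
print("g-mean after: ", g_mean(labels, new_pred))
```

Because the shift is computed from the class counts alone, the sketch adds no tunable parameter and leaves training untouched, which is the property the abstract emphasizes. Whether a given shift helps on real data depends on the score distribution.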
Rhetorical strategy matters in the legal domain, where language is a vital instrument, and textual statistics have much to offer for uncovering such a strategy. We propose a methodology that starts from an unstructured text. First, the breakpoints are automatically detected and lexically homogeneous parts are identified; then, the shape of the text is uncovered through the trajectory of these parts and their hierarchical structure; finally, the flow of the argument is tracked. Several methods are combined: chronological clustering of multidimensional count series detects the breakpoints, correspondence analysis applied to the parts × words table reveals the shape of the text, and labelled time-constrained hierarchical clustering describes the progression of the argument. The methodology is illustrated on a forensic rhetoric application, namely a closing speech delivered by a prosecutor at the Barcelona Criminal Court. The approach could also be useful in politics, communication and professional writing.
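The time constraint in the clustering steps can be illustrated with a minimal sketch: only segments that are adjacent in the text may be merged, so every cluster is a contiguous part and the cluster boundaries act as breakpoints. This is a simplified stand-in under stated assumptions, not the methodology itself. It uses squared Euclidean distance between relative word-frequency profiles (rather than the chi-square distance usual in correspondence analysis), it stops at a fixed number of parts instead of testing breakpoints statistically, and the function names and toy counts are hypothetical.

```python
def profile(counts):
    """Relative word frequencies of one segment (row profile)."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)

def distance(a, b):
    """Squared Euclidean distance between two row profiles (a simplifying assumption)."""
    return sum((x - y) ** 2 for x, y in zip(profile(a), profile(b)))

def time_constrained_clustering(rows, n_parts):
    """Agglomerate text-ordered count vectors, merging adjacent clusters only.

    rows    : list of word-count vectors, one per consecutive text segment
    n_parts : number of contiguous parts to keep
    Returns a list of (first_index, last_index) ranges, inclusive.
    """
    # Each cluster stores its index range and its summed word counts.
    clusters = [(i, i, list(r)) for i, r in enumerate(rows)]
    while len(clusters) > n_parts:
        # Find the pair of neighbouring clusters with the closest profiles.
        k = min(range(len(clusters) - 1),
                key=lambda j: distance(clusters[j][2], clusters[j + 1][2]))
        start, _, left = clusters[k]
        _, end, right = clusters[k + 1]
        merged = (start, end, [x + y for x, y in zip(left, right)])
        clusters[k:k + 2] = [merged]
    return [(s, e) for s, e, _ in clusters]

# Toy segments x words count table (made-up values, three-word vocabulary).
rows = [
    [5, 1, 0],
    [4, 2, 0],
    [6, 1, 0],
    [0, 1, 6],
    [1, 0, 5],
    [0, 5, 1],
    [1, 6, 0],
]

parts = time_constrained_clustering(rows, n_parts=3)
print(parts)
```

Recording the order of the merges, rather than only the final ranges, would give the hierarchy of parts; labelling each node with its most characteristic words is a separate step not shown here.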