By Matthias Renz, Cyrus Shahabi, Xiaofang Zhou, Muhammad Aamir Cheema

This two-volume set, LNCS 9049 and LNCS 9050, constitutes the refereed proceedings of the 20th International Conference on Database Systems for Advanced Applications, DASFAA 2015, held in Hanoi, Vietnam, in April 2015.

The 63 full papers presented were carefully reviewed and selected from a total of 287 submissions. The papers cover the following topics: data mining; data streams and time series; database storage and index; spatio-temporal data; modern computing platform; social networks; information integration and data quality; information retrieval and summarization; security and privacy; outlier and imbalanced data analysis; probabilistic and uncertain data; query processing.

In our method, we use a simple clustering algorithm called Density Peak Clustering (DPCluster for short) [15]. DPCluster assumes that cluster centers are local maxima in the density of data points; in other words, each cluster center is surrounded by neighbors with lower density. It also assumes that cluster centers lie at a relatively large distance from any point with a higher local density. Based on these two assumptions, DPCluster calculates two quantities for each data point: its local density, and its distance from points of higher density. Both quantities play important roles in the clustering solution and are defined as follows [15].

Because the clustering is applied only to minority class samples, the local density is also called the local minority density in this paper.

Definition 2. (Distance from points of higher density) For any data point $x_i$, its distance from points of higher density, $\delta_i$, is measured as the minimum distance between the point and any other point with higher density:

$$\delta_i = \min_{j:\ \rho_j > \rho_i} d_{ij} \qquad (7)$$

where $\rho_i$ is the local density of $x_i$ and $d_{ij}$ is the distance between $x_i$ and $x_j$. For the data point with the highest local density, $\delta_i$ is defined to be its maximal distance from any other point, that is, $\delta_i = \max_j d_{ij}$.

Based on those quantities, the clustering process consists of two steps.
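The distance from points of higher density can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the local densities are taken as given input (their definition is not reproduced here), Euclidean distance is assumed, and the function name `distance_to_higher_density` is hypothetical rather than taken from the paper or from [15].

```python
import math


def distance_to_higher_density(points, rho):
    """Return, for each point, its distance from points of higher density.

    points: list of coordinate tuples (Euclidean distance is an assumption).
    rho:    list of local densities, one per point, supplied by the caller.
    """
    n = len(points)
    delta = []
    for i in range(n):
        # Distances to every point whose density is strictly higher.
        higher = [
            math.dist(points[i], points[j])
            for j in range(n)
            if rho[j] > rho[i]
        ]
        if higher:
            delta.append(min(higher))
        else:
            # No denser point exists: use the maximal distance to any
            # other point (0.0 if the data set has a single point).
            others = [math.dist(points[i], points[j]) for j in range(n) if j != i]
            delta.append(max(others, default=0.0))
    return delta


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
rho = [3, 2, 1]
print(distance_to_higher_density(points, rho))
```

Note that if several points tie for the highest density, this sketch assigns each of them the maximal distance, since none has a strictly denser neighbor. The double loop is quadratic in the number of points, which is usually acceptable when only the minority class is clustered.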

5 Generation of Synthetic Minority Samples

Before we describe the generation of synthetic samples, we first transform the importance weights of minority samples into a probability distribution that indicates the probability that a minority sample is selected as the seed sample:

$$p(x_i) = \frac{w(x_i)}{\sum_{k} w(x_k)} \qquad (9)$$

where $w(x_i)$ is the importance weight of minority sample $x_i$ and the sum runs over all minority samples.

To generate a synthetic minority sample, a minority sample $x_i$ is selected randomly as the seed sample according to the probability distribution. Let $C$ denote the cluster that contains $x_i$. We then select a second minority sample $x_j$ that belongs to the minority cluster $C$.
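The seed selection described above might look roughly like the following Python sketch. Several details are assumptions made for illustration only: the function names are hypothetical, the second sample is drawn uniformly from the other members of the seed's cluster (the text does not say how it is chosen), and a seed that is alone in its cluster is paired with itself.

```python
import random


def selection_probabilities(weights):
    """Normalize importance weights into a probability distribution."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("importance weights must have a positive sum")
    return [w / total for w in weights]


def pick_seed_and_partner(weights, cluster_labels, rng):
    """Pick a seed index by weight, then a second index from the same cluster.

    weights:        importance weight of each minority sample.
    cluster_labels: cluster id of each minority sample.
    rng:            a random.Random instance, passed in for reproducibility.
    """
    probs = selection_probabilities(weights)
    seed = rng.choices(range(len(weights)), weights=probs, k=1)[0]
    # Candidate partners: other samples in the seed's cluster.
    mates = [
        j
        for j, label in enumerate(cluster_labels)
        if label == cluster_labels[seed] and j != seed
    ]
    # Assumption: uniform choice among mates; fall back to the seed itself
    # when the cluster contains no other sample.
    partner = rng.choice(mates) if mates else seed
    return seed, partner


weights = [1.0, 1.0, 2.0, 4.0]
labels = [0, 0, 1, 1]
print(selection_probabilities(weights))
print(pick_seed_and_partner(weights, labels, random.Random(0)))
```

Passing the random generator explicitly keeps the sampling reproducible, which makes it easier to compare oversampling runs.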

