SimilaR_fromTwoFunctions {SimilaR} | R Documentation
An implementation of the SimilaR algorithm: a novel method to quantify the similarity of R functions based on program dependence graphs. Possible use cases include detecting code clones to improve software quality and detecting plagiarism in students' homework assignments.
SimilaR_fromTwoFunctions compares the source code of two function objects.
SimilaR_fromTwoFunctions(function1, function2, functionNames, returnType = c("data.frame", "matrix"), aggregation = c("tnorm", "sym", "both"))
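function1 | the first function object to compare |
function2 | the second function object to compare |
functionNames | optional names of the two functions, to be used in the output (e.g., c("first", "second")) |
returnType | "data.frame" or "matrix": the type of the returned object (see below) |
aggregation | "tnorm", "sym", or "both": whether a single symmetric similarity value ("tnorm" or "sym") or two non-symmetric subsethood degrees ("both") are returned (see below) |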
Note that, depending on the aggregation argument, the method returns either a single value representing the overall (symmetric) similarity between a pair of functions, or two values measuring the (non-symmetric) degrees of "subsethood". In the latter case, the user may wish to combine these two values by means of a custom aggregation function.
If returnType is equal to "data.frame", a data frame with one row describing the similarity of the given pair of functions is returned (for compatibility with SimilaR_fromDirectory).
Columns of the data frame are as follows:
name1 - the name of the first function in the pair;
name2 - the name of the second function in the pair;
SimilaR - a value in the [0,1] interval as returned by the SimilaR algorithm; 1 denotes that the functions are equivalent, while 0 means that they are totally dissimilar; if aggregation is equal to "both", two similarity values are given instead: the one with suffix "12" measures how much the first function is a subset of the second, and the one with suffix "21" measures how much the second function is a subset of the first;
decision - 0 or 1; 1 means that the two functions are classified as similar, and 0 otherwise.
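When aggregation = "both", the two subsethood degrees can be combined by hand with a custom aggregation function. The sketch below builds a one-row data frame that only mimics the documented layout (name1, name2, two similarity columns, decision). The column names SimilaR12 and SimilaR21 are an assumption based on the suffixes described above, and all values are made up. The minimum and the arithmetic mean are illustrative choices, not the aggregations SimilaR itself uses for "tnorm" or "sym".

```r
# Mock result mimicking the documented layout for aggregation = "both".
# Column names SimilaR12/SimilaR21 are assumed; all values are hypothetical.
res <- data.frame(
  name1     = "f1",
  name2     = "f2",
  SimilaR12 = 0.8,   # how much f1 is a subset of f2 (hypothetical)
  SimilaR21 = 0.4,   # how much f2 is a subset of f1 (hypothetical)
  decision  = 0,     # hypothetical classification
  stringsAsFactors = FALSE
)

# Two illustrative symmetric aggregation functions of the subsethood degrees.
agg_min  <- function(a, b) pmin(a, b)    # minimum: both directions must agree
agg_mean <- function(a, b) (a + b) / 2   # arithmetic mean

# Add the combined scores as new columns.
res$sim_min  <- agg_min(res$SimilaR12, res$SimilaR21)
res$sim_mean <- agg_mean(res$SimilaR12, res$SimilaR21)
res
```

Both aggregation functions are symmetric in their arguments, so swapping the order of the two compared functions does not change the combined score.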
If returnType is equal to "matrix", a square 2x2 matrix is returned. The element at index (i,j) equals the similarity degree between the i-th and the j-th function. When aggregation is equal to "sym" or "tnorm", the matrix is symmetric. The row and column names of the matrix are the names of the compared functions.
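For aggregation = "both", the returned matrix need not be symmetric. The following minimal base-R sketch inspects such a matrix and derives a symmetric version with an element-wise minimum. The matrix is hand-made with hypothetical values (including the diagonal), and the minimum is just one possible choice of symmetric aggregation, not necessarily what "tnorm" or "sym" compute.

```r
# Hand-made 2x2 matrix shaped like the "matrix" output for aggregation = "both".
# All entries are hypothetical; rows and columns are named after the functions.
m <- matrix(c(1.0, 0.8,
              0.4, 1.0),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("f1", "f2"), c("f1", "f2")))

isSymmetric(m)          # FALSE: m["f1", "f2"] differs from m["f2", "f1"]

# Element-wise minimum of m and its transpose yields a symmetric matrix;
# pmin() copies dim and dimnames from its first argument.
m_sym <- pmin(m, t(m))
isSymmetric(m_sym)      # TRUE
m_sym["f1", "f2"]       # 0.4
```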
Bartoszuk M., A source code similarity assessment system for functional programming languages based on machine learning and data aggregation methods, Ph.D. thesis, Warsaw University of Technology, Warsaw, Poland, 2018.
Bartoszuk M., Gagolewski M., Binary aggregation functions in software plagiarism detection, In: Proc. FUZZ-IEEE'17, IEEE, 2017.
Bartoszuk M., Beliakov G., Gagolewski M., James S., Fitting aggregation functions to data: Part II - Idempotentization, In: Carvalho J.P. et al. (Eds.), Information Processing and Management of Uncertainty in Knowledge-Based Systems, Part II (Communications in Computer and Information Science 611), Springer, 2016, pp. 780-789. doi:10.1007/978-3-319-40581-0_63.
Bartoszuk M., Beliakov G., Gagolewski M., James S., Fitting aggregation functions to data: Part I - Linearization and regularization, In: Carvalho J.P. et al. (Eds.), Information Processing and Management of Uncertainty in Knowledge-Based Systems, Part II (Communications in Computer and Information Science 611), Springer, 2016, pp. 767-779. doi:10.1007/978-3-319-40581-0_62.
Bartoszuk M., Gagolewski M., Detecting similarity of R functions via a fusion of multiple heuristic methods, In: Alonso J.M., Bustince H., Reformat M. (Eds.), Proc. IFSA/EUSFLAT 2015, Atlantis Press, 2015, pp. 419-426.
Bartoszuk M., Gagolewski M., A fuzzy R code similarity detection algorithm, In: Laurent A. et al. (Eds.), Information Processing and Management of Uncertainty in Knowledge-Based Systems, Part III (Communications in Computer and Information Science 444), Springer-Verlag, Heidelberg, 2014, pp. 21-30.
Other SimilaR: SimilaR_fromDirectory
## Attach the package first
library("SimilaR")

f1 <- function(x) {x*x}
f2 <- function(x,y) {x+y}

## A data frame is returned: 1 row, 4 columns
SimilaR_fromTwoFunctions(f1, f2, returnType = "data.frame", aggregation = "tnorm")

## Custom names in the returned data frame
SimilaR_fromTwoFunctions(f1, f2, functionNames = c("first", "second"),
                         returnType = "data.frame", aggregation = "tnorm")

## A data frame is returned: 1 row, 5 columns
SimilaR_fromTwoFunctions(f1, f2, returnType = "data.frame", aggregation = "both")

## A non-symmetric square matrix is returned,
## with 2 rows and 2 columns
SimilaR_fromTwoFunctions(f1, f2, returnType = "matrix", aggregation = "both")