1 Introduction
Due to recent advances in machine learning (ML), ML models are used more and more in practice and applied to real-world scenarios [1, 2, 3]. In order to be accepted by users and by society in general, it is important to be able to explain the output and behavior of ML models. Especially because of legal regulations like the EU "General Data Protection Regulation" (GDPR) [4], which contains a "right to an explanation", it is nowadays indispensable to explain the output and behavior of artificial intelligence (AI) systems in a comprehensible way.
Recently, the research community has become increasingly interested in explainability and transparency in machine learning [5, 6, 7, 8], and several methods for explaining and understanding ML models exist [9]. One family of methods are model-agnostic methods [6, 10]. Model-agnostic methods are flexible in the sense that they are not tailored to a particular model or representation, which makes them (in theory) applicable to many different types of ML models. In particular, "truly" model-agnostic methods do not need access to the training data or model internals. It is sufficient to have an interface for passing data points to the model and obtaining its output/predictions; the underlying model is viewed as a black box.
Examples of model-agnostic methods are feature interaction methods [11], feature importance methods [12], partial dependence plots [13] and local methods that approximate the model locally by an explainable model (e.g. a decision tree) [14, 15]. These methods explain models by using features as their vocabulary. A different class of model-agnostic explanations are example-based explanations, where a prediction or behavior is explained by a (set of) data points [16]. Instances of example-based explanations are prototypes & criticisms [17] and influential instances [18]. Another instance of example-based explanations are counterfactual explanations [19]. A counterfactual explanation is a change of the original input that leads to a different (specific) prediction/behavior of the ML model: what has to be different in order to change the prediction of the model? Such an explanation is considered to be fairly intuitive, human-friendly and useful because it tells people what to do in order to achieve a desired outcome [19, 9].
Model-agnostic methods are universal in the sense that they are applicable to all models. However, they are usually computationally expensive because they have to make many predictions using the model. Model-specific methods are therefore relevant because they can be computed more efficiently, and it is desirable to have model-specific techniques for computing explanations (e.g. constraining the architecture of a neural network [20]). Learning vector quantization (LVQ) models [21] are prototype-based models that represent data by a set of prototypes. LVQ models can be combined with metric learning, thereby increasing the effectiveness of the model in the case of few prototypes [22, 23]. Furthermore, LVQ models can be used for incremental learning [24] and lifelong learning [25].
Therefore, it is highly relevant to be able to efficiently compute counterfactual explanations of LVQ models. In this paper we investigate how to efficiently compute counterfactual explanations of LVQ models by exploiting the special structure of LVQ models. More precisely, our contributions are:
- We propose model- and regularization-dependent methods for efficiently computing counterfactual explanations of LVQ models.
- We empirically evaluate our methods and compare them with traditional methods.
The remainder of this paper is structured as follows: First, we briefly review counterfactual explanations (section 2.1) and learning vector quantization models (section 2.2). Then, in section 3 we introduce our convex and non-convex programming framework for efficiently computing counterfactual explanations of different types of LVQ models; note that all derivations can be found in the appendix (section 5). Furthermore, we empirically evaluate the efficiency of our methods. Finally, section 4 summarizes the results of this paper.
2 Review
2.1 Counterfactual explanations
Counterfactual explanations [19] (often just called counterfactuals) are an instance of example-based explanations [16]. Other instances of example-based explanations [9] are influential instances [18] and prototypes & criticisms [17].
A counterfactual states a change to some features/dimensions of a given input such that the resulting data point (called the counterfactual) has a different (specified) prediction than the original input. Using a counterfactual instance for explaining the prediction of the original input is considered to be fairly intuitive, human-friendly and useful because it tells people what to do in order to achieve a desired outcome [19, 9].
A classical use case of counterfactual explanations is loan application [3, 9]: Imagine you applied for a loan at a bank. Unfortunately, the bank rejects your application. Now, you would like to know why. In particular, you would like to know what would have to be different so that your application would have been accepted. A possible explanation might be that you would have been accepted if you earned $500 more per month and did not have a second credit card.
Although counterfactuals constitute a very intuitive explanation mechanism, they come with a couple of problems.
One problem is that there often exists more than one counterfactual; this is called the Rashomon effect [9]. If there is more than one possible explanation (counterfactual), it is not clear which one should be selected.
Another disadvantage of counterfactuals is that there does not exist a general and efficient method for computing counterfactuals of categorical inputs; there are first attempts for bag-of-words models [26], but these methods work on a particular model in a particular domain only. Recently, however, CERTIFAI [27] was proposed. In CERTIFAI, a genetic algorithm is used for computing counterfactuals. A benefit of using a genetic algorithm is that it can deal with categorical variables and allows adding domain knowledge about feature (interaction) constraints, whereas the downside is time complexity and complex hyperparameter tuning (e.g. size of the population, how to do crossover, mutation, etc.).
An alternative to counterfactuals [19], but very similar in spirit, is the Growing Spheres method [28]. However, this method suffers from the curse of dimensionality because it has to draw samples from the input space, which can become difficult if the input space is high-dimensional.
Now, we first formally define the problem of finding counterfactuals in general form: Assume a prediction function $h : \mathbb{R}^d \to \mathcal{Y}$ is given. Computing a counterfactual $\vec{x}' \in \mathbb{R}^d$ of a given input $\vec{x} \in \mathbb{R}^d$ (we restrict ourselves to $\mathbb{R}^d$, but in theory one could use an arbitrary domain) can be interpreted as an optimization problem [19]:

$$\underset{\vec{x}' \,\in\, \mathbb{R}^d}{\arg\min}\; \ell\big(h(\vec{x}'), y'\big) + C \cdot \theta(\vec{x}', \vec{x}) \quad (1)$$

where $\ell(\cdot,\cdot)$ denotes a loss function that penalizes deviation of the prediction $h(\vec{x}')$ from the requested prediction $y'$, $\theta(\cdot,\cdot)$ denotes a regularization that penalizes deviations from the original input $\vec{x}$, and the hyperparameter $C$ denotes the regularization strength. Two common regularizations are the weighted Manhattan distance and the generalized L2 distance. The weighted Manhattan distance is defined as

$$\theta(\vec{x}', \vec{x}) = \sum_j \alpha_j \cdot |x_j - x'_j| \quad (2)$$

where $\alpha_j \ge 0$ denote the feature-wise weights. A popular choice [19] for $\alpha_j$ is the inverse median absolute deviation (MAD) of the $j$-th feature in the training data set $\mathcal{D}$:

$$\alpha_j = \frac{1}{\mathrm{MAD}_j}, \quad \mathrm{MAD}_j = \underset{\vec{x} \in \mathcal{D}}{\mathrm{median}}\Big(\big|x_j - \underset{\vec{x} \in \mathcal{D}}{\mathrm{median}}(x_j)\big|\Big) \quad (3)$$
The benefit of this choice is that it takes the (potentially) different variability of the features into account. However, because we need access to the training data set $\mathcal{D}$, this regularization is not truly model-agnostic; it is not usable if we only have access to a prediction interface of a black-box model.
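The MAD-based weights of Eq. 3 can be sketched in a few lines of numpy; the helper name `mad_weights` and the small `eps` guard against constant features are our own additions, not part of the paper.

```python
import numpy as np

def mad_weights(X, eps=1e-8):
    """Feature-wise weights for the weighted Manhattan distance (Eq. 2/3):
    the inverse median absolute deviation (MAD) of each feature."""
    med = np.median(X, axis=0)                # per-feature median
    mad = np.median(np.abs(X - med), axis=0)  # median absolute deviation
    return 1.0 / (mad + eps)                  # eps guards constant features

# Toy data set: feature 0 varies much more than feature 1, so it receives
# a smaller weight (deviations in it are penalized less).
X = np.array([[0.0, 0.0], [10.0, 1.0], [20.0, 2.0], [30.0, 3.0]])
w = mad_weights(X)  # approximately [0.1, 1.0]
```

Note how the weighting automatically scales the penalty to the spread of each feature, which is exactly the variability argument made above.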
The generalized L2 distance (also called Mahalanobis distance) is defined as

$$\theta(\vec{x}', \vec{x}) = \|\vec{x} - \vec{x}'\|^2_{\Omega} = (\vec{x} - \vec{x}')^\top \Omega\, (\vec{x} - \vec{x}') \quad (4)$$

where $\Omega$ denotes a symmetric positive semi-definite (s.p.s.d.) matrix. Note that the standard L2 distance can be recovered by setting $\Omega = \mathbb{I}$. The generalized L2 distance can be interpreted as the Euclidean distance in a linearly distorted space.
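A minimal numpy sketch of Eq. 4 (the function name is ours, for illustration only):

```python
import numpy as np

def generalized_l2(x, x_prime, Omega):
    """Generalized L2 (Mahalanobis) distance of Eq. 4:
    (x - x')^T Omega (x - x') for a s.p.s.d. matrix Omega."""
    d = x - x_prime
    return float(d @ Omega @ d)

x = np.array([1.0, 2.0])
x_prime = np.array([0.0, 0.0])

# Omega = I recovers the standard squared L2 distance: 1^2 + 2^2 = 5.
d_euclidean = generalized_l2(x, x_prime, np.eye(2))

# A distorted space: Omega = diag(4, 1) stretches the first axis,
# so deviations along it cost more: 4*1 + 1*4 = 8.
d_distorted = generalized_l2(x, x_prime, np.diag([4.0, 1.0]))
```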
Depending on the model and the choice of $\ell$ and $\theta$, the final optimization problem might be differentiable or not. If it is differentiable, we can use a gradient-based optimization algorithm like conjugate gradients, gradient descent or (L-)BFGS. Otherwise, we have to use a black-box optimization algorithm like Downhill-Simplex (also called Nelder-Mead) or Powell. However, the best choice is a model- and regularization-specific optimizer. Unfortunately, model-specific optimizers exist for few ML models only.
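As a rough sketch of the black-box route, the following minimizes an Eq. 1-style objective with scipy's Nelder-Mead on a toy nearest-prototype classifier. The classifier, the loss (squared distance to the nearest prototype carrying the requested label) and the helper names are our own assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize

# Toy nearest-prototype classifier: two prototypes, labels 0 and 1.
prototypes = np.array([[0.0, 0.0], [4.0, 0.0]])
labels = np.array([0, 1])

def predict(x):
    return labels[np.argmin(((prototypes - x) ** 2).sum(axis=1))]

def counterfactual_blackbox(x, y_target, C=1.0):
    """Minimize Eq. 1 with Downhill-Simplex (Nelder-Mead): the loss is the
    squared distance to the nearest prototype with the requested label,
    regularized by the Manhattan distance to the original input."""
    target_protos = prototypes[labels == y_target]
    def objective(xp):
        loss = (((target_protos - xp) ** 2).sum(axis=1)).min()
        return loss + C * np.abs(xp - x).sum()
    return minimize(objective, x, method="Nelder-Mead").x

x = np.array([0.0, 0.0])            # currently classified as 0
x_cf = counterfactual_blackbox(x, y_target=1)  # moved towards (4, 0)
```

Note that each objective evaluation queries the model, which is why this black-box approach is expensive compared to the dedicated programs derived in section 3.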
2.2 Learning vector quantization
In learning vector quantization (LVQ) models [21] we compute a set of labeled prototypes $\{(\vec{p}_i, o_i)\}$ from a training data set of labeled real-valued vectors; we refer to the $i$-th prototype as $\vec{p}_i \in \mathbb{R}^d$ and the corresponding label as $o_i \in \mathcal{Y}$.
A new data point $\vec{x} \in \mathbb{R}^d$ is classified according to the label of the nearest prototype:

$$h(\vec{x}) = o_i \quad \text{where} \quad i = \underset{j}{\arg\min}\; d(\vec{x}, \vec{p}_j) \quad (5)$$

where $d(\cdot,\cdot)$ denotes a function for computing the distance between a data point and a prototype; usually this is the squared Euclidean distance (other distance functions are possible, however):

$$d(\vec{x}, \vec{p}) = (\vec{x} - \vec{p})^\top (\vec{x} - \vec{p}) \quad (6)$$
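The classification rule Eq. 5/6 is a one-liner in numpy; the prototype values below are a toy stand-in for a trained LVQ model.

```python
import numpy as np

# Prototypes and their labels (three prototypes, two classes).
P = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]])
o = np.array([0, 0, 1])

def lvq_predict(x, P, o):
    """Eq. 5/6: label of the prototype with the smallest
    squared Euclidean distance to x."""
    d = ((P - x) ** 2).sum(axis=1)
    return o[np.argmin(d)]

y0 = lvq_predict(np.array([0.4, 0.1]), P, o)  # near the class-0 prototypes
y1 = lvq_predict(np.array([3.5, 0.2]), P, o)  # near the class-1 prototype
```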
There exist LVQ models like GMLVQ, LGMLVQ [22], MRSLVQ and LMRSLVQ [23] that learn a custom (class- or prototype-specific) distance matrix $\Omega_j$ that is used instead of the identity when computing the distance between a data point and a prototype. This gives rise to the generalized L2 distance of Eq. 4:

$$d(\vec{x}, \vec{p}_j) = (\vec{x} - \vec{p}_j)^\top \Omega_j\, (\vec{x} - \vec{p}_j) \quad (7)$$

Because $\Omega_j$ must be a s.p.s.d. matrix, instead of learning $\Omega_j$ directly, these LVQ variants learn a matrix $\widetilde{\Omega}_j$ and compute the final distance matrix as:

$$\Omega_j = \widetilde{\Omega}_j^\top \widetilde{\Omega}_j \quad (8)$$

By this, we can guarantee that the matrix $\Omega_j$ is s.p.s.d., whereas the model only has to learn an arbitrary matrix, which is much easier than enforcing restrictions on the definiteness. Training usually takes place by optimizing suitable cost functions with respect to prototype locations and metric parameters. For counterfactual reasoning, the specific training method is irrelevant and we refer to the final model only.
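The s.p.s.d. guarantee of Eq. 8 is easy to verify numerically for an arbitrary matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
Omega_tilde = rng.normal(size=(3, 3))   # arbitrary, unconstrained matrix

# Eq. 8: the induced distance matrix is symmetric positive semi-definite
# by construction, no matter what Omega_tilde is.
Omega = Omega_tilde.T @ Omega_tilde

symmetric = np.allclose(Omega, Omega.T)
psd = bool(np.all(np.linalg.eigvalsh(Omega) >= -1e-10))
```

This is exactly why learning $\widetilde{\Omega}_j$ is unconstrained while the resulting metric stays valid.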
3 Counterfactual explanations of LVQ models
We aim for an efficient explicit formulation of how to find counterfactuals, given an input and a specific LVQ model.
3.1 General approach
Because an LVQ model assigns the label of the nearest prototype to a given input, we know that the nearest prototype of a counterfactual $\vec{x}'$ with requested prediction $y'$ must be a prototype $\vec{p}_i$ with $o_i = y'$. In order to compute a counterfactual of a given input $\vec{x}$, it is sufficient to solve the following optimization problem for each prototype $\vec{p}_i$ with $o_i = y'$ and select the counterfactual yielding the smallest value of $\theta(\vec{x}', \vec{x})$:

$$\underset{\vec{x}' \,\in\, \mathbb{R}^d}{\min}\; \theta(\vec{x}', \vec{x}) \quad \text{s.t.} \quad d(\vec{x}', \vec{p}_i) < d(\vec{x}', \vec{p}_j) \;\; \forall\, \vec{p}_j \in P(y')^c \quad (9)$$

where $P(y')^c$ denotes the set of all prototypes not labeled as $y'$. Note that the feasible region of Eq. 9 is always non-empty: the prototype $\vec{p}_i$ itself is always a feasible solution.
The pseudocode for computing a counterfactual of an LVQ model is described in Algorithm 1.
Note that the for-loop in line 3 of Algorithm 1 can easily be parallelized. Furthermore, and in contrast to Eq. 1, we do not have any hyperparameters that need to be chosen.
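The structure of Algorithm 1 can be sketched as follows. For concreteness we assume plain GLVQ with the squared Euclidean distance and a squared L2 regularizer, and use scipy's generic SLSQP solver as a stand-in for the dedicated LP/QP solvers derived in section 3; all names here are our own.

```python
import numpy as np
from scipy.optimize import minimize

def counterfactual_lvq(x, y_target, P, o, eps=1e-3):
    """Algorithm 1 (sketch): solve Eq. 9 once per prototype carrying the
    requested label and keep the solution closest to x. Here theta is the
    squared L2 distance; SLSQP stands in for a dedicated QP solver."""
    others = P[o != y_target]
    best = None
    for p_i in P[o == y_target]:
        # Constraints: x' must be closer to p_i than to every other prototype.
        cons = [{"type": "ineq",
                 "fun": lambda xp, pj=pj, pi=p_i:
                        ((xp - pj) ** 2).sum() - ((xp - pi) ** 2).sum() - eps}
                for pj in others]
        res = minimize(lambda xp: ((xp - x) ** 2).sum(), p_i,
                       method="SLSQP", constraints=cons)
        if res.success and (best is None or res.fun < best.fun):
            best = res
    return None if best is None else best.x

P = np.array([[0.0, 0.0], [4.0, 0.0]])
o = np.array([0, 1])
# The cheapest feasible point lies just past the decision boundary at x1 = 2.
x_cf = counterfactual_lvq(np.array([0.0, 0.0]), 1, P, o)
```

Starting each subproblem at the target prototype exploits the observation above that $\vec{p}_i$ is always feasible.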
In the subsequent sections we explore Eq. 9 for different regularizations and LVQ models, and investigate how to solve it efficiently. Note that for the purpose of better readability and due to space constraints, we put all derivations in the appendix (section 5).
3.2 (Generalized matrix) LVQ
When using (generalized matrix) LVQ models, i.e. no class- or prototype-specific distance matrices, the optimization problem Eq. 9 becomes either a linear program (LP) or a convex quadratic program (QP).
When using the weighted Manhattan distance Eq. 2 as regularization $\theta$, Eq. 9 can be written as an LP:
$$\underset{\vec{x}' \in \mathbb{R}^d,\, \vec{\beta} \in \mathbb{R}^d}{\min}\; \vec{\alpha}^\top \vec{\beta} \quad \text{s.t.} \quad \vec{x}'^\top \vec{q}_{ij} + c_{ij} < 0 \;\; \forall\, \vec{p}_j \in P(y')^c, \quad -\vec{\beta} \le \vec{x}' - \vec{x} \le \vec{\beta}, \quad \vec{\beta} \ge \vec{0} \quad (10)$$

where

$$\vec{q}_{ij} = 2\,\Omega\,(\vec{p}_j - \vec{p}_i) \quad (11)$$

$$c_{ij} = \vec{p}_i^\top \Omega\, \vec{p}_i - \vec{p}_j^\top \Omega\, \vec{p}_j \quad (12)$$

$$\Omega = \widetilde{\Omega}^\top \widetilde{\Omega} \quad (13)$$

We set $\Omega = \mathbb{I}$ if the LVQ model uses the Euclidean distance.
When using the generalized L2 distance Eq. 4 as regularization $\theta$, Eq. 9 can be written as a convex quadratic program with linear constraints:

$$\underset{\vec{x}' \in \mathbb{R}^d}{\min}\; (\vec{x}' - \vec{x})^\top \hat{\Omega}\, (\vec{x}' - \vec{x}) \quad \text{s.t.} \quad \vec{x}'^\top \vec{q}_{ij} + c_{ij} < 0 \;\; \forall\, \vec{p}_j \in P(y')^c \quad (14)$$

where $\hat{\Omega}$ denotes the s.p.s.d. matrix of the regularization. Note that $\vec{q}_{ij}$ and $c_{ij}$ are the same as in Eq. 10.
Eq. 10 and Eq. 14 both contain strict inequalities ($<$). Unfortunately, strict inequalities are not allowed in linear/quadratic programming because the feasible region would become an open set. However, we can turn each $<$ into a $\le$ by subtracting a small number $\epsilon > 0$ from the right-hand side of the inequality. In practice, when implementing our methods, we found that we can safely replace all $<$ by $\le$ without changing anything else; this might be due to the numerics (like round-off errors) of fixed-size floating-point numbers.
Both a linear and a convex quadratic program can be solved efficiently [29].
Another benefit is that we can easily add additional linear constraints like box constraints, restrictions on the value range of linear interactions of features, or freezing some features (these features are then not allowed to be different in the counterfactual).
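The LP of Eq. 10 can be assembled directly from the prototypes. The sketch below uses scipy's `linprog` for the plain-GLVQ case ($\Omega = \mathbb{I}$, uniform weights $\alpha$, strict inequality relaxed by a small $\epsilon$); the function name and the toy prototypes are our own, and additional box or freeze constraints would just be extra rows/bounds.

```python
import numpy as np
from scipy.optimize import linprog

def counterfactual_lp(x, p_i, others, alpha=None, eps=1e-4):
    """Eq. 10 as an LP over z = (x', beta): minimize alpha^T beta subject
    to 2 (p_j - p_i)^T x' <= ||p_j||^2 - ||p_i||^2 - eps for every
    prototype p_j with a different label, and -beta <= x' - x <= beta."""
    d = len(x)
    alpha = np.ones(d) if alpha is None else alpha
    c = np.concatenate([np.zeros(d), alpha])   # objective: sum_k alpha_k beta_k
    A_rows = [np.concatenate([2.0 * (p_j - p_i), np.zeros(d)]) for p_j in others]
    b_vals = [float(p_j @ p_j - p_i @ p_i) - eps for p_j in others]
    I = np.eye(d)
    A_ub = np.vstack(A_rows + [np.hstack([I, -I]), np.hstack([-I, -I])])
    b_ub = np.concatenate([b_vals, x, -x])
    bounds = [(None, None)] * d + [(0, None)] * d   # x' is free, beta >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:d]

# x sits on the prototype at the origin; the target prototype is at (4, 0),
# so the cheapest change moves x just past the midpoint x1 = 2.
x_cf = counterfactual_lp(x=np.array([0.0, 0.0]),
                         p_i=np.array([4.0, 0.0]),
                         others=[np.array([0.0, 0.0])])
```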
3.3 (Localized generalized matrix) LVQ
In the case of LVQ models that learn a class- or prototype-specific distance matrix $\Omega_j$, the optimization problem Eq. 9 becomes a quadratically constrained quadratic program (QCQP).
When using the weighted Manhattan distance Eq. 2 as regularization $\theta$, the optimization problem Eq. 9 becomes:

$$\underset{\vec{x}' \in \mathbb{R}^d,\, \vec{\beta} \in \mathbb{R}^d}{\min}\; \vec{\alpha}^\top \vec{\beta} \quad \text{s.t.} \quad \vec{x}'^\top A_{ij}\, \vec{x}' + \vec{x}'^\top \vec{q}_{ij} + c_{ij} < 0 \;\; \forall\, \vec{p}_j \in P(y')^c, \quad -\vec{\beta} \le \vec{x}' - \vec{x} \le \vec{\beta}, \quad \vec{\beta} \ge \vec{0} \quad (15)$$

where

$$A_{ij} = \Omega_i - \Omega_j \quad (16)$$

$$\vec{q}_{ij} = 2\,(\Omega_j \vec{p}_j - \Omega_i \vec{p}_i) \quad (17)$$

$$c_{ij} = \vec{p}_i^\top \Omega_i\, \vec{p}_i - \vec{p}_j^\top \Omega_j\, \vec{p}_j \quad (18)$$
When using the generalized L2 distance Eq. 4 as regularization $\theta$, Eq. 9 can be written as:

$$\underset{\vec{x}' \in \mathbb{R}^d}{\min}\; (\vec{x}' - \vec{x})^\top \hat{\Omega}\, (\vec{x}' - \vec{x}) \quad \text{s.t.} \quad \vec{x}'^\top A_{ij}\, \vec{x}' + \vec{x}'^\top \vec{q}_{ij} + c_{ij} < 0 \;\; \forall\, \vec{p}_j \in P(y')^c \quad (19)$$

Unfortunately, we cannot make any statement about the definiteness of $A_{ij}$. Because $A_{ij}$ is the difference of two s.p.s.d. matrices, all we know is that it is a symmetric matrix. Therefore, Eq. 15 and Eq. 19 are both non-convex QCQPs, and solving a non-convex QCQP is known to be NP-hard [30]. However, there exist methods like the Suggest-and-Improve framework [30] that can approximately solve a non-convex QCQP very efficiently; more details on how to apply this to Eq. 15 and Eq. 19 can be found in the appendix (section 5).
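A tiny example of why the difference of two s.p.s.d. matrices can be indefinite (which is what makes Eq. 15 and Eq. 19 non-convex); the matrices are hand-picked for illustration.

```python
import numpy as np

# Two s.p.s.d. matrices whose difference is indefinite.
Omega_i = np.diag([3.0, 0.0])
Omega_j = np.diag([0.0, 2.0])
A_ij = Omega_i - Omega_j        # = diag(3, -2)

eig = np.linalg.eigvalsh(A_ij)  # sorted ascending: one negative, one positive
```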
3.4 Experiments
In order to empirically confirm the efficiency of our proposed methods, we conducted some experiments:
We fitted GLVQ, GMLVQ and LGMLVQ models (always using 3 prototypes per class) to the "Breast Cancer Wisconsin (Diagnostic) Data Set" [31] and the "Optical Recognition of Handwritten Digits Data Set" [32]. We used PCA to reduce the dimensionality of both data sets.
We implemented our proposed methods (Algorithm 1) for computing counterfactual explanations (available on GitHub: https://github.com/andreArtelt/efficient_computation_counterfactuals_lvq). We used the SCS solver [33, 34] for solving linear/quadratic programs and the Suggest-and-Improve framework [30] for approximately solving non-convex QCQPs; we simply picked the target prototype as an initial feasible solution in the Suggest-step and used the penalty convex-concave procedure (CCP) [35, 30] in the Improve-step (as a subroutine, we used the MOSEK solver, https://www.mosek.com; we gratefully acknowledge the academic license provided by MOSEK ApS).
For comparison, we used the optimizer for computing counterfactual explanations of LVQ models as implemented in ceml [36]: the optimization problem Eq. 1, where the loss function $\ell$ is the distance to the nearest prototype with the requested label $y'$, is minimized by the Downhill-Simplex algorithm. In all cases (including our methods), we used the Manhattan distance as a regularizer.
For each possible combination of model, data set and method, we performed a 4-fold cross validation and recorded, for each sample, the computation time as well as the Manhattan distance between the counterfactual and the original data point.
Table: Manhattan distance between the counterfactual and the original data point (smaller is better).

Data set         Breast cancer          Handwritten digits
Model \ Method   Black-box   Ours       Black-box   Ours
GLVQ             3.26        1.96       6.51        3.99
GMLVQ            2.71        2.46       21.34       4.40
LGMLVQ           2.00        1.57       8.12        7.53
Table: Computation time (smaller is better).

Data set         Breast cancer          Handwritten digits
Model \ Method   Black-box   Ours       Black-box   Ours
GLVQ             0.95        0.01       3.15        0.02
GMLVQ            0.25        0.01       2.12        0.02
LGMLVQ           0.91        0.65       3.35        2.09
In all cases, our method yields counterfactuals that are closer to the original data point and is always (much) faster than optimizing the original cost function Eq. 1 with the Downhill-Simplex algorithm.
All experiments were implemented in Python 3.6 [37] using the packages cvxpy [38], qcqp [39], sklearn-lvq [40], numpy [41], scipy [42], scikit-learn [43] and ceml [36].
4 Conclusion
We proposed, and empirically evaluated, model- and regularization-dependent convex and non-convex programs for efficiently computing counterfactual explanations of LVQ models. We found that in many cases we obtain either a set of linear or convex quadratic programs, both of which can be solved efficiently. Only in the case of localized LVQ models do we have to solve a set of non-convex quadratically constrained quadratic programs; we found that these can be approximately solved efficiently by using the Suggest-and-Improve framework.
5 Appendix
5.1 Minimizing the Euclidean distance
First, we expand the generalized L2 distance (Eq. 4):

$$(\vec{x} - \vec{x}')^\top \Omega\, (\vec{x} - \vec{x}') = \vec{x}'^\top \Omega\, \vec{x}' - 2\,\vec{x}^\top \Omega\, \vec{x}' + \vec{x}^\top \Omega\, \vec{x} \quad (20)$$

Next, we note that we can drop the constant $\vec{x}^\top \Omega\, \vec{x}$ when optimizing with respect to $\vec{x}'$:

$$\underset{\vec{x}' \in \mathbb{R}^d}{\min}\; \vec{x}'^\top \Omega\, \vec{x}' - 2\,(\Omega\, \vec{x})^\top \vec{x}' \quad (21)$$
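The expansion in Eq. 20 can be checked numerically for a random s.p.s.d. matrix (the check itself is our addition):

```python
import numpy as np

rng = np.random.default_rng(1)
x, xp = rng.normal(size=3), rng.normal(size=3)
L = rng.normal(size=(3, 3))
Omega = L.T @ L                      # s.p.s.d. via Eq. 8

# Eq. 20: the distance splits into a quadratic term, a linear term,
# and a constant x^T Omega x that does not depend on x'.
lhs = (x - xp) @ Omega @ (x - xp)
rhs = xp @ Omega @ xp - 2 * (Omega @ x) @ xp + x @ Omega @ x
```

Since only the constant is dropped, the minimizer of Eq. 21 coincides with that of Eq. 20.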
5.2 Minimizing the weighted Manhattan distance
First, we transform the problem of minimizing the weighted Manhattan distance (Eq. 2) into epigraph form:

$$\underset{\vec{x}' \in \mathbb{R}^d,\, \beta \in \mathbb{R}}{\min}\; \beta \quad \text{s.t.} \quad \sum_j \alpha_j\, |x_j - x'_j| \le \beta \quad (22)$$

Next, we separate the dimensions:

$$\underset{\vec{x}' \in \mathbb{R}^d,\, \vec{\beta} \in \mathbb{R}^d}{\min}\; \sum_j \alpha_j \beta_j \quad \text{s.t.} \quad |x_j - x'_j| \le \beta_j \;\; \forall\, j \quad (23)$$

After that, we remove the absolute value function:

$$\underset{\vec{x}' \in \mathbb{R}^d,\, \vec{\beta} \in \mathbb{R}^d}{\min}\; \sum_j \alpha_j \beta_j \quad \text{s.t.} \quad x'_j - \beta_j \le x_j, \;\; -x'_j - \beta_j \le -x_j, \;\; \beta_j \ge 0 \;\; \forall\, j \quad (24)$$

Finally, we rewrite everything in matrix-vector notation:

$$\underset{\vec{z} \in \mathbb{R}^{2d}}{\min}\; \vec{c}^\top \vec{z} \quad \text{s.t.} \quad A\vec{z} \le \vec{b} \quad (25)$$

where

$$\vec{z} = \begin{pmatrix} \vec{x}' \\ \vec{\beta} \end{pmatrix}, \quad \vec{c} = \begin{pmatrix} \vec{0} \\ \vec{\alpha} \end{pmatrix}, \quad A = \begin{pmatrix} \mathbb{I} & -\mathbb{I} \\ -\mathbb{I} & -\mathbb{I} \\ \mathbf{0} & -\mathbb{I} \end{pmatrix}, \quad \vec{b} = \begin{pmatrix} \vec{x} \\ -\vec{x} \\ \vec{0} \end{pmatrix} \quad (26)$$
5.3 Enforcing a specific prototype as the nearest neighbor
By using the following set of inequalities, we can force the prototype $\vec{p}_i$ to be the nearest neighbor of the counterfactual $\vec{x}'$, which would cause $\vec{x}'$ to be classified as $o_i = y'$:

$$d(\vec{x}', \vec{p}_i) < d(\vec{x}', \vec{p}_j) \;\; \forall\, \vec{p}_j \in P(y')^c \quad (27)$$

We consider a fixed pair of $\vec{p}_i$ and $\vec{p}_j$:

$$\vec{x}'^\top A_{ij}\, \vec{x}' + \vec{x}'^\top \vec{q}_{ij} + c_{ij} < 0 \quad (28)$$

where

$$A_{ij} = \Omega_i - \Omega_j \quad (29)$$

$$\vec{q}_{ij} = 2\,(\Omega_j \vec{p}_j - \Omega_i \vec{p}_i) \quad (30)$$

$$c_{ij} = \vec{p}_i^\top \Omega_i\, \vec{p}_i - \vec{p}_j^\top \Omega_j\, \vec{p}_j \quad (31)$$
If we only have one global distance matrix $\Omega$, we find that $A_{ij} = \mathbf{0}$ and the inequality Eq. 28 simplifies to:

$$\vec{x}'^\top \vec{q}_{ij} + c_{ij} < 0 \quad (32)$$

where

$$\vec{q}_{ij} = 2\,\Omega\,(\vec{p}_j - \vec{p}_i) \quad (33)$$

$$c_{ij} = \vec{p}_i^\top \Omega\, \vec{p}_i - \vec{p}_j^\top \Omega\, \vec{p}_j \quad (34)$$
If we do not use a custom distance matrix, we have $\Omega = \mathbb{I}$ and Eq. 28 becomes:

$$\vec{x}'^\top \vec{q}_{ij} + c_{ij} < 0 \quad (35)$$

where

$$\vec{q}_{ij} = 2\,(\vec{p}_j - \vec{p}_i) \quad (36)$$

$$c_{ij} = \vec{p}_i^\top \vec{p}_i - \vec{p}_j^\top \vec{p}_j \quad (37)$$
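The identity behind Eq. 28-31, namely that the quadratic form equals the difference of the two prototype distances, can be verified numerically (the random instances are our own test fixture):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
p_i, p_j, xp = rng.normal(size=d), rng.normal(size=d), rng.normal(size=d)
Li, Lj = rng.normal(size=(d, d)), rng.normal(size=(d, d))
Omega_i, Omega_j = Li.T @ Li, Lj.T @ Lj   # s.p.s.d. matrices (Eq. 8)

# Eq. 29-31
A_ij = Omega_i - Omega_j
q_ij = 2.0 * (Omega_j @ p_j - Omega_i @ p_i)
c_ij = p_i @ Omega_i @ p_i - p_j @ Omega_j @ p_j

# Eq. 28 is exactly d(x', p_i) - d(x', p_j):
lhs = xp @ A_ij @ xp + xp @ q_ij + c_ij
rhs = (xp - p_i) @ Omega_i @ (xp - p_i) - (xp - p_j) @ Omega_j @ (xp - p_j)
```

So requiring the expression in Eq. 28 to be negative is the same as requiring $\vec{p}_i$ to be strictly closer than $\vec{p}_j$.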
5.4 Approximately solving the nonconvex QCQP
Recall the non-convex quadratic constraint from Eq. 28:

$$\vec{x}'^\top A_{ij}\, \vec{x}' + \vec{x}'^\top \vec{q}_{ij} + c_{ij} < 0 \quad (38)$$

where the matrix $A_{ij}$ is defined as the difference of two s.p.s.d. matrices:

$$A_{ij} = \Omega_i - \Omega_j \quad (39)$$

By making use of Eq. 39, we can rewrite Eq. 38 as a difference of two convex functions:

$$f_{ij}(\vec{x}') - g_{ij}(\vec{x}') < 0 \quad (40)$$

where

$$f_{ij}(\vec{x}') = \vec{x}'^\top \Omega_i\, \vec{x}' + \vec{x}'^\top \vec{q}_{ij} + c_{ij} \quad (41)$$

$$g_{ij}(\vec{x}') = \vec{x}'^\top \Omega_j\, \vec{x}' \quad (42)$$

Under the assumption that our regularization function $\theta$ is a convex function (the weighted Manhattan distance and the generalized L2 distance are convex functions!), we can rewrite a generic version of the non-convex QCQP (Eq. 15 and Eq. 19) as follows:

$$\underset{\vec{x}' \in \mathbb{R}^d}{\min}\; \theta(\vec{x}', \vec{x}) \quad \text{s.t.} \quad f_{ij}(\vec{x}') - g_{ij}(\vec{x}') \le 0 \;\; \forall\, \vec{p}_j \in P(y')^c \quad (43)$$

Note that we relaxed the strict inequality. Because $\Omega_i$ and $\Omega_j$ are s.p.s.d. matrices, we know that $f_{ij}$ and $g_{ij}$ are convex functions. Therefore, Eq. 43 is a difference-of-convex program (DCP).
This allows us to use the penalty convex-concave procedure (CCP) [30] for computing an approximate solution of Eq. 43 that is equivalent to the original non-convex QCQPs Eq. 15 and Eq. 19. For using the penalty CCP, we need the first-order Taylor approximation of $g_{ij}$ around a current point $\vec{x}_0$:

$$\hat{g}_{ij}(\vec{x}'; \vec{x}_0) = g_{ij}(\vec{x}_0) + \nabla g_{ij}(\vec{x}_0)^\top (\vec{x}' - \vec{x}_0) \quad (44)$$

where

$$g_{ij}(\vec{x}_0) = \vec{x}_0^\top \Omega_j\, \vec{x}_0 \quad (45)$$

$$\nabla g_{ij}(\vec{x}_0) = 2\,\Omega_j\, \vec{x}_0 \quad (46)$$
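The key property the penalty CCP exploits is that the linearization in Eq. 44 is a global under-estimator of the convex function $g_{ij}$, so replacing $g_{ij}$ by $\hat{g}_{ij}$ only shrinks the feasible region. A quick numerical check (our addition):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 3
L = rng.normal(size=(d, d))
Omega_j = L.T @ L                                   # s.p.s.d.

def g(x):                                           # Eq. 42, convex
    return x @ Omega_j @ x

def g_hat(x, x0):                                   # Eq. 44-46, linearization
    return x0 @ Omega_j @ x0 + (2.0 * Omega_j @ x0) @ (x - x0)

x0 = rng.normal(size=d)
# Tangent plane of a convex function never exceeds the function itself.
under_estimator = all(g_hat(x, x0) <= g(x) + 1e-10
                      for x in rng.normal(size=(100, d)))
```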
(46) 
References
 [1] Sharad Goel, Justin M. Rao, and Ravi Shroff. Precinct or prejudice? Understanding racial disparities in New York City's stop-and-frisk policy. 2016.
 [2] Kaveh Waddell. How algorithms can bring down minorities’ credit scores. The Atlantic, 2016.
 [3] Amir E. Khandani, Adlar J. Kim, and Andrew Lo. Consumer creditrisk models via machinelearning algorithms. Journal of Banking & Finance, 34(11):2767–2787, 2010.
 [4] European Parliament and Council. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). https://eur-lex.europa.eu/eli/reg/2016/679/oj, 2016.
 [5] Leilani H. Gilpin, David Bau, Ben Z. Yuan, Ayesha Bajwa, Michael Specter, and Lalana Kagal. Explaining explanations: An overview of interpretability of machine learning. In 5th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2018, Turin, Italy, October 1-3, 2018, pages 80–89, 2018.
 [6] Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi. A survey of methods for explaining black box models. ACM Comput. Surv., 51(5):93:1–93:42, August 2018.
 [7] Erico Tjoa and Cuntai Guan. A survey on explainable artificial intelligence (XAI): towards medical XAI. CoRR, abs/1907.07374, 2019.
 [8] Wojciech Samek, Thomas Wiegand, and Klaus-Robert Müller. Explainable artificial intelligence: Understanding, visualizing and interpreting deep learning models. CoRR, abs/1708.08296, 2017.
 [9] Christoph Molnar. Interpretable Machine Learning. 2019. https://christophm.github.io/interpretablemlbook/.
 [10] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Modelagnostic interpretability of machine learning. In ICML Workshop on Human Interpretability in Machine Learning (WHI), 2016.
 [11] Brandon M. Greenwell, Bradley C. Boehmke, and Andrew J. McCarthy. A simple and effective model-based variable importance measure. CoRR, abs/1805.04755, 2018.
 [12] Aaron Fisher, Cynthia Rudin, and Francesca Dominici. All Models are Wrong but many are Useful: Variable Importance for BlackBox, Proprietary, or Misspecified Prediction Models, using Model Class Reliance. arXiv eprints, page arXiv:1801.01489, Jan 2018.
 [13] Qingyuan Zhao and Trevor Hastie. Causal interpretations of blackbox models. Journal of Business & Economic Statistics, 0(ja):1–19, 2019.
 [14] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16, pages 1135–1144, New York, NY, USA, 2016. ACM.
 [15] Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Dino Pedreschi, Franco Turini, and Fosca Giannotti. Local rule-based explanations of black box decision systems. CoRR, abs/1805.10820, 2018.
 [16] A. Aamodt and E. Plaza. Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI Communications, 1994.
 [17] Been Kim, Oluwasanmi Koyejo, and Rajiv Khanna. Examples are not enough, learn to criticize! Criticism for interpretability. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, pages 2280–2288, 2016.
 [18] Pang Wei Koh and Percy Liang. Understanding black-box predictions via influence functions. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, pages 1885–1894, 2017.
 [19] Sandra Wachter, Brent D. Mittelstadt, and Chris Russell. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. CoRR, abs/1711.00399, 2017.
 [20] Zebin Yang, Aijun Zhang, and Agus Sudjianto. Enhancing explainability of neural networks through architecture constraints. CoRR, abs/1901.03838, 2019.
 [21] David Nova and Pablo A. Estévez. A review of learning vector quantization classifiers. Neural Comput. Appl., 25(3-4):511–524, September 2014.
 [22] Petra Schneider, Michael Biehl, and Barbara Hammer. Adaptive relevance matrices in learning vector quantization. Neural Computation, 21(12):3532–3561, 2009. PMID: 19764875.
 [23] Petra Schneider, Michael Biehl, and Barbara Hammer. Distance learning in discriminative vector quantization. Neural Computation, 21(10):2942–2969, 2009. PMID: 19635012.
 [24] Ye Xu, Shen Furao, Osamu Hasegawa, and Jinxi Zhao. An online incremental learning vector quantization. In Advances in Knowledge Discovery and Data Mining, 13th Pacific-Asia Conference, PAKDD 2009, Bangkok, Thailand, April 27-30, 2009, Proceedings, pages 1046–1053, 2009.
 [25] Stephan Kirstein, Heiko Wersing, Horst-Michael Gross, and Edgar Körner. A lifelong learning vector quantization approach for interactive learning of multiple categories. Neural Networks, 28:90–105, April 2012.
 [26] David Martens and Foster Provost. Explaining data-driven document classifications. MIS Q., 38(1):73–100, March 2014.
 [27] Shubham Sharma, Jette Henderson, and Joydeep Ghosh. CERTIFAI: counterfactual explanations for robustness, transparency, interpretability, and fairness of artificial intelligence models. CoRR, abs/1905.07857, 2019.
 [28] Thibault Laugel, Marie-Jeanne Lesot, Christophe Marsala, Xavier Renard, and Marcin Detyniecki. Comparison-based inverse classification for interpretability in machine learning. In Information Processing and Management of Uncertainty in Knowledge-Based Systems. Theory and Foundations, 17th International Conference, IPMU 2018, Cádiz, Spain, June 11-15, 2018, Proceedings, Part I, pages 100–111, 2018.
 [29] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, New York, NY, USA, 2004.
 [30] Jaehyun Park and Stephen Boyd. General heuristics for nonconvex quadratically constrained quadratic programming. arXiv preprint arXiv:1703.07870, 2017.
 [31] William H. Wolberg, W. Nick Street, and Olvi L. Mangasarian. Breast cancer wisconsin (diagnostic) data set. https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic), 1995.
 [32] E. Alpaydin and C. Kaynak. Optical recognition of handwritten digits data set. https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits, 1998.
 [33] B. O’Donoghue, E. Chu, N. Parikh, and S. Boyd. Conic optimization via operator splitting and homogeneous selfdual embedding. Journal of Optimization Theory and Applications, 169(3):1042–1068, June 2016.
 [34] B. O’Donoghue, E. Chu, N. Parikh, and S. Boyd. SCS: Splitting conic solver, version 2.1.1. https://github.com/cvxgrp/scs, November 2017.
 [35] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn., 3(1):1–122, January 2011.
 [36] André Artelt. Ceml: Counterfactuals for explaining machine learning models, a Python toolbox. https://www.github.com/andreArtelt/ceml, 2019.
 [37] Guido Van Rossum and Fred L Drake Jr. Python tutorial. Centrum voor Wiskunde en Informatica Amsterdam, The Netherlands, 1995.
 [38] Steven Diamond and Stephen Boyd. CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research, 17(83):1–5, 2016.
 [39] Jaehyun Park and Stephen Boyd. A cvxpy extension for handling nonconvex qcqp via suggestandimprove framework. https://github.com/cvxgrp/qcqp, 2017.
 [40] Joris Jensen. sklearnlvq. https://github.com/MrNuggelz/sklearnlvq, 2017.
 [41] Stéfan van der Walt, S. Chris Colbert, and Gaël Varoquaux. The numpy array: A structure for efficient numerical computation. Computing in Science and Engineering, 13(2):22–30, 2011.
 [42] Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientific tools for Python, 2001–.
 [43] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikitlearn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.