Title: | Differentially Private Regularized Logistic Regression |
---|---|
Description: | Implements two differentially private algorithms for estimating L2-regularized logistic regression coefficients. A randomized algorithm F is epsilon-differentially private (C. Dwork, Differential Privacy, ICALP 2006 <DOI:10.1007/11681878_14>) if |log(P(F(D) in S)) - log(P(F(D') in S))| <= epsilon for any pair D, D' of datasets that differ in exactly one record and any measurable set S, where the randomness is taken over the choices F makes. |
Authors: | Staal A. Vinterbo <[email protected]> |
Maintainer: | Staal A. Vinterbo <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.2-22 |
Built: | 2025-03-15 03:24:59 UTC |
Source: | https://github.com/cran/PrivateLR |
PrivateLR implements two randomized algorithms for estimating L2-regularized logistic regression coefficients that allow specifying the maximal effect a single point change in the training data is allowed to have. Specifically, the algorithms take as a parameter the maximum allowed change in log-likelihood of producing particular coefficients resulting from any single training data point substitution.
    dplr(object, ...)

    ## S3 method for class 'formula'
    dplr(object, data, lambda=NA, eps=1, verbose=0, rp.dim = 0,
         threshold='fixed', do.scale=FALSE, ...)

    ## S3 method for class 'numeric'
    dplr(object, x, ...)

    ## S3 method for class 'logical'
    dplr(object, x, ...)

    ## S3 method for class 'factor'
    dplr(object, x, ...)

    ## S3 method for class 'data.frame'
    dplr(object, target=ncol(object),...)

    ## S3 method for class 'matrix'
    dplr(object, target=ncol(object),...)

    ## S3 method for class 'dplr'
    predict(object, data, type = "probabilities", ...)

    ## S3 method for class 'dplr'
    summary(object, ...)

    ## S3 method for class 'dplr'
    print.summary(x, ...)

    ## S3 method for class 'dplr'
    print(x, ...)

    scaled(fml, data)
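object |
can be given as a formula, a data.frame, a matrix, or a numeric, logical, or factor vector. If given as a data.frame or matrix, target selects the response column. If given as a vector, it holds the class labels and x holds the covariates. |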
data |
a data frame or matrix containing the variables in the model described by object. |
lambda |
the regularization parameter. If |
eps |
the privacy level. The coefficients of the model are computed by a method that guarantees eps-differential privacy. |
verbose |
regulates how much information is printed: 0 prints nothing, 1 a little, 2 more. |
rp.dim |
if |
threshold |
|
do.scale |
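The privacy guarantees are for data where the covariate vectors lie within the unit ball. If TRUE, the data is scaled into the unit ball before fitting; this built-in scaling is itself not covered by the privacy guarantee. |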
type |
|
x |
In the |
target |
the index of the column in object that contains the target values; defaults to the last column. |
fml |
A formula that describes the dimensions of the data that should be scaled into the unit ball. |
... |
|
The function dplr implements logistic regression using the differentially private methods by Chaudhuri, Monteleoni, and Sarwate.
The interface is similar but not identical to that of lm, with the added possibility of supplying a data matrix or data.frame together with a target column index (defaults to ncol(data)).
The returned model instance has a convenience function model$pred that takes a data matrix or data frame to be classified as input.
The print function currently prints the summary.
The scaled function scales data such that covariate vectors lie within the unit ball. Note that the response variable is placed in the last column of the returned data frame data. Also, the response column name might have changed, depending on the left side of the formula given.
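As a rough illustration of what scaling into the unit ball means, the sketch below divides every covariate row by the largest row L2 norm, using only base R. The helper name scale_to_unit_ball is hypothetical and not part of PrivateLR; the package's own scaled function may differ in detail, for example in how factors are expanded or in what exactly it reports as the scaling factor.

```r
# Hypothetical helper (not part of PrivateLR): scale numeric covariates so
# that every row has L2 norm <= 1.
scale_to_unit_ball <- function(x) {
  x <- as.matrix(x)
  row_norms <- sqrt(rowSums(x^2))
  s <- max(row_norms)            # largest row norm in the data
  if (s == 0) s <- 1             # all-zero data: nothing to scale
  list(data = x / s, scale = s)  # here 'scale' is the divisor that was used
}

res <- scale_to_unit_ball(iris[, 1:4])
max(sqrt(rowSums(res$data^2)))   # largest row norm after scaling
```

Because the divisor is computed from the data, a scaling step of this kind depends on the individual records.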
A randomized algorithm $F$, taking a dataset as input, is said to be $\epsilon$-differentially private if it holds that
$$|\log P(F(D) \in S) - \log P(F(D') \in S)| \le \epsilon$$
for any pair $D, D'$ of datasets that differ in exactly one element, and any set $S$. We now turn to the algorithms implemented by dplr.
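Before doing so, the definition can be illustrated on a much simpler mechanism than logistic regression. The sketch below uses randomized response on a single bit, which is not something dplr implements; it is only meant to show the log-probability ratio from the definition being bounded by the privacy level. All variable names are illustrative.

```r
# Randomized response on a single bit: report the true bit with probability
# p = exp(eps) / (1 + exp(eps)), otherwise report the flipped bit.
eps <- 1
p <- exp(eps) / (1 + exp(eps))

# P(output = 1 | true bit = 1) and P(output = 1 | true bit = 0):
# the two "datasets" differ in exactly one record.
p1 <- p
p0 <- 1 - p

# Absolute log-ratio of output probabilities, for each possible output.
ratio_out1 <- abs(log(p1) - log(p0))
ratio_out0 <- abs(log(1 - p1) - log(1 - p0))

# Both ratios are bounded by eps (here they match it up to rounding).
c(ratio_out1, ratio_out0)
```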
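Let $\|v\|$ denote the L2 norm of a vector $v$, and let
$$J(w, \lambda) = \frac{1}{n}\sum_{i=1}^{n} \log\left(1 + e^{-y_i w^T x_i}\right) + \frac{\lambda}{2}\|w\|^2$$
where $\frac{1}{n}\sum_{i=1}^{n} \log\left(1 + e^{-y_i w^T x_i}\right)$ is the average logistic loss over the training data of size $n$ and dimension $d$ with labels $y$ and covariates $x$. L2-regularized logistic regression computes
$$w^* = \arg\min_w J(w, \lambda)$$
for a given $\lambda$.
The function dplr implements two approaches to $\epsilon$-differentially private L2-regularized logistic regression (see the ... argument op above).
The first is output perturbation, where we compute
$$w^* + b$$
where $b$ is a $d$-dimensional real vector sampled with probability proportional to $\exp(-\epsilon n \lambda \|b\| / 2)$.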
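The output perturbation step can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: labels are coded as -1/+1, there is no intercept, the function names J and sample_b are hypothetical, and the noise is drawn by picking a uniformly random direction and a Gamma-distributed norm (shape d), which gives a density proportional to exp(-rate * ||b||).

```r
set.seed(1)

# Regularized average logistic loss J(w, lambda); y is coded as -1 / +1.
J <- function(w, x, y, lambda) {
  mean(log1p(exp(-y * drop(x %*% w)))) + lambda / 2 * sum(w^2)
}

# Draw b with density proportional to exp(-rate * ||b||):
# uniform direction on the sphere, norm from Gamma(shape = d, rate).
sample_b <- function(d, rate) {
  dir <- rnorm(d)
  dir <- dir / sqrt(sum(dir^2))
  rgamma(1, shape = d, rate = rate) * dir
}

# Toy data: two iris features scaled into the unit ball, setosa vs. rest.
x <- as.matrix(iris[, 1:2])
x <- x / max(sqrt(rowSums(x^2)))
y <- ifelse(iris$Species == "setosa", -1, 1)

n <- nrow(x); d <- ncol(x); lambda <- 0.01; eps <- 1

# Non-private minimizer, then add noise calibrated to n, eps and lambda.
w_star <- optim(rep(0, d), J, x = x, y = y, lambda = lambda,
                method = "BFGS")$par
w_priv <- w_star + sample_b(d, rate = n * eps * lambda / 2)
```

A smaller eps or lambda lowers the rate and therefore produces larger noise, which is the price paid for a stronger guarantee.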
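The second is objective perturbation. Let
$$F(w, \epsilon, \lambda) = J(w, \lambda) + \frac{1}{n} b^T w$$
where $J$ is as above and $b$ is a $d$-dimensional real vector sampled with probability proportional to $\exp(-\epsilon \|b\| / 2)$. Let $c = 0.25$ and let
$$\epsilon' = \epsilon - \log\left(1 + \frac{2c}{n\lambda} + \frac{c^2}{n^2\lambda^2}\right),$$
then if $\epsilon' > 0$ we compute
$$\arg\min_w F(w, \epsilon', \lambda),$$
otherwise we compute an adjusted lambda version
$$\arg\min_w F\left(w, \frac{\epsilon}{2}, \frac{c}{n(e^{\epsilon/4} - 1)}\right).$$
The logistic regression model coefficients are then $\epsilon$-differentially private.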
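The branching in objective perturbation can be sketched in base R as below. The sketch follows Algorithm 2 of the cited Chaudhuri, Monteleoni, and Sarwate paper with c = 1/4, and is not taken from the package source. The names objective_params and F_obj are hypothetical, labels are assumed to be coded as -1/+1, and no intercept is fitted.

```r
set.seed(2)

# Hypothetical helper: pick the noise budget and the regularizer used in the
# perturbed objective, with c_const = 1/4 for logistic loss.
objective_params <- function(n, eps, lambda, c_const = 0.25) {
  eps_prime <- eps - log(1 + 2 * c_const / (n * lambda) +
                           c_const^2 / (n^2 * lambda^2))
  if (eps_prime > 0) {
    list(eps = eps_prime, lambda = lambda)
  } else {
    # adjusted lambda version: spend eps / 2 on the noise
    list(eps = eps / 2, lambda = c_const / (n * (exp(eps / 4) - 1)))
  }
}

# Perturbed objective: J(w, lambda) + b'w / n.
F_obj <- function(w, x, y, lambda, b) {
  mean(log1p(exp(-y * drop(x %*% w)))) + lambda / 2 * sum(w^2) +
    sum(b * w) / nrow(x)
}

# Toy data: two iris features scaled into the unit ball, setosa vs. rest.
x <- as.matrix(iris[, 1:2])
x <- x / max(sqrt(rowSums(x^2)))
y <- ifelse(iris$Species == "setosa", -1, 1)
n <- nrow(x); d <- ncol(x)

p <- objective_params(n, eps = 1, lambda = 0.01)

# b has density proportional to exp(-(eps' / 2) * ||b||).
dir <- rnorm(d); dir <- dir / sqrt(sum(dir^2))
b <- rgamma(1, shape = d, rate = p$eps / 2) * dir

w_priv <- optim(rep(0, d), F_obj, x = x, y = y, lambda = p$lambda, b = b,
                method = "BFGS")$par
```

With n = 150, eps = 1 and lambda = 0.01 the first branch is taken. A very small lambda such as 1e-4 makes the corrected budget negative and triggers the adjusted lambda branch.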
The dplr function returns a list object of class "dplr" with elements including:
par |
the coefficients of the logistic model. |
coefficients |
same as par. |
value, counts, convergence, message |
these are as returned by the optim function. |
CIndex |
the area under the ROC curve (a.k.a. the C-index) of the model on its training data. |
eps |
the supplied privacy level. |
lambda |
the regularization parameter used |
n |
the number of data points |
d |
the dimensionality of the data points |
pred |
a convenience function such that model$pred(data) gives the same predicted probabilities as predict(model, data). |
p.tr |
this is the classification probability threshold. |
did.rp |
TRUE if random projection was performed. |
rp.dim |
if random projection was performed this contains the number of dimensions projected onto. Only present if random projection was performed. |
rp.p |
the projection matrix used for random projection. Only present if random projection was performed. |
scaled |
TRUE if data was scaled by providing do.scale=TRUE. |
status |
a text string indicating the status of the computations. |
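The CIndex element listed above is the area under the ROC curve. As an illustration of that quantity, the following base R sketch computes it from scores and 0/1 labels through the Mann-Whitney rank-sum identity. The function name c_index is hypothetical, and the package may compute the value differently.

```r
# C-index (area under the ROC curve) from predicted scores and 0/1 labels,
# using the Mann-Whitney rank-sum identity; tied scores get average ranks.
c_index <- function(score, label) {
  r <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Four cases: three of the four positive/negative pairs are ordered correctly.
auc <- c_index(score = c(0.1, 0.4, 0.35, 0.8), label = c(0, 0, 1, 1))
auc   # 0.75
```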
The scaled function returns a list of the following:
data |
the scaled data frame |
scale |
the scaling factor used. |
The privacy level is only guaranteed for the coefficients of the model, not for the other returned values, and only when the input data points (potentially after expansion of factors) have L2 norm <= 1. In particular, the guarantee does not cover prediction thresholds estimated from data (methods 'youden' and 'topleft') or the built-in scaling of data. Both of these are turned off by default.
This implementation was in part supported by NIH NLM grant 7R01LM007273-07 and NIH Roadmap for Medical Research grant U54 HL108460.
Staal A. Vinterbo <[email protected]>
Chaudhuri K., Monteleoni C., and Sarwate, A. Differentially Private Empirical Risk Minimization. JMLR, 2011, 12, 1069-1109
glm and predict
    data(iris)

    # the following two are equivalent
    # and predict Species being any
    # but the first factor level.
    model <- dplr(iris)
    model <- dplr(Species ~ ., iris)

    # pick a particular factor level and privacy level 2
    model <- dplr(I(Species != 'setosa') ~ ., iris, eps=2)

    # The following is again equivalent to the two first
    # examples. Note that we need to remove 'Species' from the
    # covariate matrix/data frame, and
    # that the class reported by summary will now
    # not be 'Species' but 'dplr.class'.
    model <- dplr(iris$Species, iris[,-5])

    # two equivalent methods to get at the predicted
    # probabilities
    p <- model$pred(iris)
    p <- predict(model, iris)

    # print a summary of the model. Note that
    # only the coefficients are guaranteed
    # to be generated in an eps-differentially
    # private manner.
    summary(model)