diff --git a/docs/source/counts.rst b/docs/source/counts.rst index e3e56b611..db059d594 100644 --- a/docs/source/counts.rst +++ b/docs/source/counts.rst @@ -8,7 +8,7 @@ Counting over an Integer Range .. function:: counts(x, a:b[, wv]) - Count the number of times (or total weights if a weight vector ``wv`` is given) values in ``a:b`` appear in array ``x``. Here, the optional argument ``wv`` should be a weight vector of type ``WeightVec`` (see :ref:`weightvec`). + Count the number of times (or total weights if a weight vector ``wv`` is given) values in ``a:b`` appear in array ``x``. Here, the optional argument ``wv`` should be a weight vector of type ``AbstractWeights`` (see :ref:`weightvec`). This function returns a vector ``r`` of length ``n``, with ``n = length(a:b) = b-a+1``. In particular, we have diff --git a/docs/source/cov.rst b/docs/source/cov.rst index 4745a0469..4e4339f34 100644 --- a/docs/source/cov.rst +++ b/docs/source/cov.rst @@ -22,14 +22,21 @@ This package implements functions for computing scatter matrix, as well as weigh .. function:: scatter(X, wv[; vardim=..., mean=...]) - Weighted scatter matrix. The weights are given by a weight vector ``wv`` of type ``WeightVec`` (see :ref:`weightvec`). + Weighted scatter matrix. The weights are given by a weight vector ``wv`` of type ``AbstractWeights`` (see :ref:`weightvec`). -.. function:: cov(X, wv[; vardim=..., mean=...]) +.. function:: cov(X, w[; vardim=..., mean=..., corrected=...]) - Weighted covariance matrix. + Compute the weighted covariance matrix. Similar to ``var`` and ``std`` the biased covariance matrix (``corrected=false``) is computed by multiplying ``scattermat(X, w)`` by :math:`\frac{1}{\sum{w}}` to normalize. + However, the unbiased covariance matrix (``corrected=true``) is dependent on the type of weights used: - **Note:** By default, the covariance is normalized by the sum of weights, that is, ``cov(X, wv)`` is equal to ``scatter(X, wv) / sum(wv)``. + * ``AnalyticWeights``: :math:`\frac{1}{\sum w - \sum {w^2} / \sum w}` + * ``FrequencyWeights``: :math:`\frac{1}{\sum{w} - 1}` + * ``ProbabilityWeights``: :math:`\frac{n}{(n - 1) \sum w}` where ``n`` equals ``count(!iszero, w)`` + * ``Weights``: ``ArgumentError`` (bias correction not supported) -.. function:: mean_and_cov(x[, wv][; vardim=...]) +.. function:: mean_and_cov(x[, wv][; vardim=..., corrected=...]) - Jointly compute the mean and covariance of ``x``. + Jointly compute the mean and covariance matrix as a tuple. + A weighting vector ``wv`` can be specified. ``vardim`` that designates whether the variables are columns in the matrix (``1``) or rows (``2``). + Finally, bias correction is applied to the covariance calculation if ``corrected=true``. + See ``cov`` documentation for more details. diff --git a/docs/source/empirical.rst b/docs/source/empirical.rst index b9b265020..b7aa067f4 100644 --- a/docs/source/empirical.rst +++ b/docs/source/empirical.rst @@ -14,12 +14,12 @@ Histograms can be fitted to data using the ``fit`` method. **Arguments:** -``data`` +``data`` is either a vector (for a 1-dimensional histogram), or a tuple of vectors of equal length (for an *n*-dimensional histogram). ``weight`` - is an optional ``:ref:`weightvec` WeightVec``` (of the same length as the + is an optional ``:ref:`weightvec` AbstractWeights``` (of the same length as the data vectors), denoting the weight each observation contributes to the bin. If no weight vector is supples, each observation has weight 1. @@ -30,7 +30,7 @@ Histograms can be fitted to data using the ``fit`` method. **Keyword arguments:** -``closed=:left/:right`` +``closed=:left/:right`` determines whether the bin intervals are left-closed [a,b), or right-closed (a,b] (default = ``:right``). @@ -48,7 +48,7 @@ Histograms can be fitted to data using the ``fit`` method. h = fit(Histogram, rand(100), weights(rand(100)), 0:0.1:1.0) h = fit(Histogram, [20], 0:20:100) h = fit(Histogram, [20], 0:20:100, closed=:left) - + # Multivariate h = fit(Histogram, (rand(100),rand(100))) h = fit(Histogram, (rand(100),rand(100)),nbins=10) @@ -60,7 +60,6 @@ Empirical Cumulative Distribution Function .. function:: ecdf(x) - Return an empirical cumulative distribution function based on a vector of samples given in ``x``. + Return an empirical cumulative distribution function based on a vector of samples given in ``x``. **Note:** this is a higher-level function that returns a function, which can then be applied to evaluate CDF values on other samples. - diff --git a/docs/source/means.rst b/docs/source/means.rst index 53961f222..210ac5e88 100644 --- a/docs/source/means.rst +++ b/docs/source/means.rst @@ -32,7 +32,7 @@ The package provides functions to compute means of different kinds. .. function:: mean(x, w) - The ``mean`` function is also extended to accept a weight vector of type ``WeightVec`` (see :ref:`weightvec`) to compute weighted mean. + The ``mean`` function is also extended to accept a weight vector of type ``AbstractWeights`` (see :ref:`weightvec`) to compute weighted mean. **Examples:** @@ -43,7 +43,7 @@ The package provides functions to compute means of different kinds. .. function:: mean(x, w, dim) - Compute weighted means of ``x`` along a certain dimension (specified by an integer ``dim``). The weights are given by a weight vector ``w`` (of type ``WeightVec``). + Compute weighted means of ``x`` along a certain dimension (specified by an integer ``dim``). The weights are given by a weight vector ``w`` (of type ``AbstractWeights``). .. function:: mean!(dst, x, w, dim) diff --git a/docs/source/sampling.rst b/docs/source/sampling.rst index 4c558e12b..963dc2b20 100644 --- a/docs/source/sampling.rst +++ b/docs/source/sampling.rst @@ -9,12 +9,11 @@ The package provides functions for sampling from a given population (with or wit .. function:: sample([rng], a) Randomly draw an element from an array ``a``. - Optionally specify a random number generator ``rng`` as the first argument (defaults to ``Base.GLOBAL_RNG``). -.. function:: sample([rng], a, n[; replace=true, ordered=false]) +.. function:: sample([rng], a, n[; replace=true, ordered=false]) - Randomly draw ``n`` elements from ``a``. + Randomly draw ``n`` elements from ``a``. Optionally specify a random number generator ``rng`` as the first argument (defaults to ``Base.GLOBAL_RNG``). @@ -26,14 +25,13 @@ The package provides functions for sampling from a given population (with or wit .. function:: sample!([rng], a, x[; replace=true, ordered=false]) Draw ``length(x)`` elements from ``a`` and write them to a pre-allocated array ``x``. - Optionally specify a random number generator ``rng`` as the first argument (defaults to ``Base.GLOBAL_RNG``). -.. function:: sample([rng], wv) +.. function:: sample([rng], wv) - Draw an integer in ``1:length(wv)`` with probabilities proportional to the weights given in ``wv``. + Draw an integer in ``1:length(wv)`` with probabilities proportional to the weights given in ``wv``. - Here, ``wv`` should be a weight vector of type ``WeightVec`` (see :ref:`weightvec`). + Here, ``wv`` should be a weight vector of type ``AbstractWeights`` (see :ref:`weightvec`). Optionally specify a random number generator ``rng`` as the first argument (defaults to ``Base.GLOBAL_RNG``). @@ -52,7 +50,7 @@ The package provides functions for sampling from a given population (with or wit **Keyword arguments** - ``replace``: indicates whether to have replacement (default = ``true``). - - ``ordered``: indicates whether to arrange the samples in ascending order (default = ``false``). + - ``ordered``: indicates whether to arrange the samples in ascending order (default = ``false``). .. function:: sample!([rng], a, wv, x[; replace=true, ordered=false]) @@ -74,7 +72,7 @@ Here are a list of algorithms implemented in the package. The functions below ar - ``a``: source array representing the population - ``x``: the destination array -- ``wv``: the weight vector (of type ``WeightVec``), for weighted sampling +- ``wv``: the weight vector (of type ``AbstractWeights``), for weighted sampling - ``n``: the length of ``a`` - ``k``: the length of ``x``. For sampling without replacement, ``k`` must not exceed ``n``. - ``rng``: optional random number generator (defaults to ``Base.GLOBAL_RNG``) @@ -108,7 +106,7 @@ All following functions write results to ``x`` (pre-allocated) and return ``x``. .. function:: fisher_yates_sample!([rng], a, x) - *Fisher-Yates shuffling* (with early termination). + *Fisher-Yates shuffling* (with early termination). Pseudo-code :: @@ -118,15 +116,15 @@ All following functions write results to ``x`` (pre-allocated) and return ``x``. swap inds[i] with a random one in inds[i:n] set x[i] = a[inds[i]] end - + This algorithm consumes ``k`` random numbers. It uses an integer array of length ``n`` internally to maintain the shuffled indices. It is considerably faster than Knuth's algorithm especially when ``n`` is greater than ``k``. .. function:: self_avoid_sample!([rng], a, x) - Use a set to maintain the index that has been sampled. Each time draw a new index, if the index has already been sampled, redraw until it draws an unsampled one. + Use a set to maintain the index that has been sampled. Each time draw a new index, if the index has already been sampled, redraw until it draws an unsampled one. - This algorithm consumes about (or slightly more than) ``k`` random numbers, and requires ``O(k)`` memory to store the set of sampled indices. Very fast when ``n >> k``. + This algorithm consumes about (or slightly more than) ``k`` random numbers, and requires ``O(k)`` memory to store the set of sampled indices. Very fast when ``n >> k``. However, if ``k`` is large and approaches ``n``, the rejection rate would increase drastically, resulting in poorer performance. @@ -153,7 +151,7 @@ All following functions write results to ``x`` (pre-allocated) and return ``x``. *Direct sampling.* - Draw each sample by scanning the weight vector. + Draw each sample by scanning the weight vector. This algorithm: (1) consumes ``k`` random numbers; (2) has time complexity ``O(n k)``, as scanning the weight vector each time takes ``O(n)``; and (3) requires no additional memory space. @@ -173,5 +171,4 @@ All following functions write results to ``x`` (pre-allocated) and return ``x``. It makes a copy of the weight vector at initialization, and sets the weight to zero when the corresponding sample is picked. - This algorithm consumes ``O(k)`` random numbers, and has overall time complexity ``O(n k)``. - + This algorithm consumes ``O(k)`` random numbers, and has overall time complexity ``O(n k)``. diff --git a/docs/source/scalarstats.rst b/docs/source/scalarstats.rst index 382296d75..82100f54b 100644 --- a/docs/source/scalarstats.rst +++ b/docs/source/scalarstats.rst @@ -6,54 +6,73 @@ The package implements functions for computing various statistics over an array Moments --------- -.. function:: var(x, wv[; mean=...]) +.. function:: var(x, w, [dim][; mean=..., corrected=...]) - Compute weighted variance. + Compute the variance of a real-valued array ``x``, optionally over a dimension ``dim``. + Observations in ``x`` are weighted using weight vector ``w``. + The uncorrected (when ``corrected=false``) sample variance is defined as: - One can set the keyword argument ``mean``, which can be either ``nothing`` (to compute the mean value within the function), ``0``, or a pre-computed mean value. + :math:`\frac{1}{\sum{w}} \sum_{i=1}^n {w_i\left({x_i - m}\right)^2 }` - **Note:** the result is normalized by ``sum(wv)`` without correction. + where ``n`` is the length of the input and ``m`` is the mean. + The unbiased estimate (when ``corrected=true``) of the population variance is computed by + replacing :math:`\frac{1}{\sum{w}}` with a factor dependent on the type of weights used: -.. function:: var(x, wv, dim[; mean=...]) + * ``AnalyticWeights``: :math:`\frac{1}{\sum w - \sum {w^2} / \sum w}` + * ``FrequencyWeights``: :math:`\frac{1}{\sum{w} - 1}` + * ``ProbabilityWeights``: :math:`\frac{n}{(n - 1) \sum w}` where ``n`` equals ``count(!iszero, w)`` + * ``Weights``: ``ArgumentError`` (bias correction not supported) - Weighted variance along a specific dimension. +.. function:: std(v, w, [dim][; mean=..., corrected=...]) -.. function:: std(x, wv[; mean=...]) + Compute the standard deviation of a real-valued array ``x``, optionally over a dimension ``dim``. + Observations in ``x`` are weighted using weight vector ``w``. + The uncorrected (when ``corrected=false``) sample standard deviation is defined as: - Compute weighted standard deviation. + :math:`\sqrt{\frac{1}{\sum{w}} \sum_{i=1}^n {w_i\left({x_i - m}\right)^2 }}` - One can set the keyword argument ``mean``, which can be either ``nothing`` (to compute the mean value within the function), ``0``, or a pre-computed mean value. + where ``n`` is the length of the input and ``m`` is the mean. + The unbiased estimate (when ``corrected=true``) of the population standard deviation is + computed by replacing :math:`\frac{1}{\sum{w}}` with a factor dependent on the type of + weights used: -.. function:: std(x, wv, dim[; mean=...]) + * ``AnalyticWeights``: :math:`\frac{1}{\sum w - \sum {w^2} / \sum w}` + * ``FrequencyWeights``: :math:`\frac{1}{\sum{w} - 1}` + * ``ProbabilityWeights``: :math:`\frac{n}{(n - 1) \sum w}` where ``n`` equals ``count(!iszero, w)`` + * ``Weights``: ``ArgumentError`` (bias correction not supported) - Weighted standard deviation along a specific dimension. +.. function:: mean_and_var(x[, w][, dim][; corrected=...]) -.. function:: mean_and_var(x[, wv][, dim]) + Jointly compute the mean and variance of a real-valued array ``x``, optionally over a dimension ``dim``, as a tuple. + Observations in ``x`` can be weighted using weight vector ``w``. + Finally, bias correction is be applied to the variance calculation if ``corrected=true``. + See ``var`` documentation for more details. - Jointly compute the mean and variance of ``x``. +.. function:: mean_and_std(x[, w][, dim][; corrected=...]) -.. function:: mean_and_std(x[, wv][, dim]) - - Jointly compute the mean and standard deviation of ``x``. + Jointly compute the mean and standard deviation of a real-valued array ``x``, optionally over a dimension ``dim``, as a tuple. + A weighting vector ``w`` can be specified to weight the estimates. + Finally, bias correction is applied to the standard deviation calculation if ``corrected=true``. + See ``std`` documentation for more details. .. function:: skewness(x[, wv]) Compute the (standardized) `skewness `_ of ``x``. - One can optionally supply a weight vector of type ``WeightVec`` (see :ref:`weightvec`). + One can optionally supply a weight vector of type ``AbstractWeights`` (see :ref:`weightvec`). .. function:: kurtosis(x[, wv]) Compute the (excessive) `kurtosis `_ of ``x``. - One can optionally supply a weight vector of type ``WeightVec`` (see :ref:`weightvec`). + One can optionally supply a weight vector of type ``AbstractWeights`` (see :ref:`weightvec`). .. function:: moment(x, k[, m][, wv]) Compute the ``k``-th order central moment of the values in `x`. It is the sample mean of ``(x - mean(x)).^k``. - One can optionally supply the center ``m``, and/or a weight vector of type ``WeightVec`` (see :ref:`weightvec`). + One can optionally supply the center ``m``, and/or a weight vector of type ``AbstractWeights`` (see :ref:`weightvec`). Measurements of Variation @@ -160,7 +179,7 @@ Quantile and Friends .. function:: median(x, w) - Compute the weighted median of ``x``, using weights given by a weight vector ``w`` (of type ``WeightVec``). The weight and data vectors must have the same length. The weighted median :math:`x_k` is the element of ``x`` that satisfies :math:`\sum_{x_i < x_k} w_i \le \frac{1}{2} \sum_{j} w_j` and :math:`\sum_{x_i > x_k} w_i \le \frac{1}{2} \sum_{j} w_j`. If a weight has value zero, then its associated data point is ignored. If none of the weights are positive, an error is thrown. ``NaN`` is returned if ``x`` contains any ``NaN`` values. An error is raised if ``w`` contains any ``NaN`` values. + Compute the weighted median of ``x``, using weights given by a weight vector ``w`` (of type ``AbstractWeights``). The weight and data vectors must have the same length. The weighted median :math:`x_k` is the element of ``x`` that satisfies :math:`\sum_{x_i < x_k} w_i \le \frac{1}{2} \sum_{j} w_j` and :math:`\sum_{x_i > x_k} w_i \le \frac{1}{2} \sum_{j} w_j`. If a weight has value zero, then its associated data point is ignored. If none of the weights are positive, an error is thrown. ``NaN`` is returned if ``x`` contains any ``NaN`` values. An error is raised if ``w`` contains any ``NaN`` values. **Examples:** @@ -171,8 +190,8 @@ Quantile and Friends .. function:: quantile(x, w, p) - Compute the weighted quantiles of a vector ``x`` at a specified set of probability values ``p``, using weights given by a weight vector ``w`` (of type ``WeightVec``). Weights must not be negative. The weights and data vectors must have the same length. The quantile for :math:`p` is defined as follows. Denoting :math:`S_k = (k-1)w_k + (n-1) \sum_{i (mean, cov) + mean_and_cov(x, [wv::AbstractWeights]; vardim=1, corrected=false) -> (mean, cov) Return the mean and covariance matrix as a tuple. A weighting vector `wv` can be specified. `vardim` that designates whether the variables are columns in the matrix (`1`) or rows (`2`). +Finally, bias correction is applied to the covariance calculation if +`corrected=true`. See [`cov`](@ref) documentation for more details. """ function mean_and_cov end -@static if VERSION < v"0.5.0-dev+679" - function scattermat(x::DenseMatrix; mean=nothing, vardim::Int=1) - mean == 0 ? scattermat_zm(x, vardim) : - mean == nothing ? scattermat_zm(x .- Base.mean(x, vardim), vardim) : - scattermat_zm(x .- mean, vardim) - end - - function scattermat(x::DenseMatrix, wv::WeightVec; mean=nothing, vardim::Int=1) - mean == 0 ? scattermat_zm(x, wv, vardim) : - mean == nothing ? scattermat_zm(x .- Base.mean(x, wv, vardim), wv, vardim) : - scattermat_zm(x .- mean, wv, vardim) - end +scattermatm(x::DenseMatrix, mean, vardim::Int=1) = + scattermat_zm(x .- mean, vardim) - ## weighted cov - Base.cov(x::DenseMatrix, wv::WeightVec; mean=nothing, vardim::Int=1) = - scale!(scattermat(x, wv; mean=mean, vardim=vardim), inv(sum(wv))) +scattermatm(x::DenseMatrix, mean, wv::AbstractWeights, vardim::Int=1) = + scattermat_zm(x .- mean, wv, vardim) - function mean_and_cov(x::DenseMatrix; vardim::Int=1) - m = mean(x, vardim) - return m, Base.covm(x, m; vardim=vardim) - end - function mean_and_cov(x::DenseMatrix, wv::WeightVec; vardim::Int=1) - m = mean(x, wv, vardim) - return m, Base.cov(x, wv; mean=m, vardim=vardim) - end -else - scattermatm(x::DenseMatrix, mean, vardim::Int=1) = - scattermat_zm(x .- mean, vardim) +scattermat(x::DenseMatrix, vardim::Int=1) = + scattermatm(x, Base.mean(x, vardim), vardim) - scattermatm(x::DenseMatrix, mean, wv::WeightVec, vardim::Int=1) = - scattermat_zm(x .- mean, wv, vardim) +scattermat(x::DenseMatrix, wv::AbstractWeights, vardim::Int=1) = + scattermatm(x, Base.mean(x, wv, vardim), wv, vardim) - scattermat(x::DenseMatrix, vardim::Int=1) = - scattermatm(x, Base.mean(x, vardim), vardim) +## weighted cov +Base.covm(x::DenseMatrix, mean, w::AbstractWeights, vardim::Int=1; + corrected::DepBool=nothing) = + scale!(scattermatm(x, mean, w, vardim), varcorrection(w, depcheck(:covm, corrected))) - scattermat(x::DenseMatrix, wv::WeightVec, vardim::Int=1) = - scattermatm(x, Base.mean(x, wv, vardim), wv, vardim) - ## weighted cov - Base.covm(x::DenseMatrix, mean, wv::WeightVec, vardim::Int=1) = - scale!(scattermatm(x, mean, wv, vardim), inv(sum(wv))) +Base.cov(x::DenseMatrix, w::AbstractWeights, vardim::Int=1; corrected::DepBool=nothing) = + Base.covm(x, Base.mean(x, w, vardim), w, vardim; corrected=depcheck(:cov, corrected)) - Base.cov(x::DenseMatrix, wv::WeightVec, vardim::Int=1) = - Base.covm(x, Base.mean(x, wv, vardim), wv, vardim) - function mean_and_cov(x::DenseMatrix, vardim::Int=1) - m = mean(x, vardim) - return m, Base.covm(x, m, vardim) - end - function mean_and_cov(x::DenseMatrix, wv::WeightVec, vardim::Int=1) - m = mean(x, wv, vardim) - return m, Base.cov(x, wv, vardim) - end +function mean_and_cov(x::DenseMatrix, vardim::Int=1; corrected::Bool=true) + m = mean(x, vardim) + return m, Base.covm(x, m, vardim, corrected) +end +function mean_and_cov(x::DenseMatrix, wv::AbstractWeights, vardim::Int=1; + corrected::DepBool=nothing) + m = mean(x, wv, vardim) + return m, Base.cov(x, wv, vardim; corrected=depcheck(:mean_and_cov, corrected)) end diff --git a/src/deprecates.jl b/src/deprecates.jl index 92cb047de..2336e4a8a 100644 --- a/src/deprecates.jl +++ b/src/deprecates.jl @@ -1,17 +1,18 @@ import Base.@deprecate import Base.depwarn +import Base.@deprecate_binding import Base.varm, Base.stdm -@deprecate varm(v::RealArray, m::Real, wv::WeightVec) varm(v, wv, m) -@deprecate varm(A::RealArray, M::RealArray, wv::WeightVec, dim::Int) varm(v, wv, m, dim) -@deprecate stdm(v::RealArray, m::Real, wv::WeightVec) stdm(v, wv, m) -@deprecate stdm(v::RealArray, m::RealArray, wv::WeightVec, dim::Int) stdm(v, wv, m, dim) +@deprecate varm(v::RealArray, m::Real, wv::AbstractWeights) varm(v, wv, m) +@deprecate varm(A::RealArray, M::RealArray, wv::AbstractWeights, dim::Int) varm(v, wv, m, dim) +@deprecate stdm(v::RealArray, m::Real, wv::AbstractWeights) stdm(v, wv, m) +@deprecate stdm(v::RealArray, m::RealArray, wv::AbstractWeights, dim::Int) stdm(v, wv, m, dim) -@deprecate _moment2(v::RealArray, m::Real, wv::WeightVec) _moment2(v, wv, m) -@deprecate _moment3(v::RealArray, m::Real, wv::WeightVec) _moment3(v, wv, m) -@deprecate _moment4(v::RealArray, m::Real, wv::WeightVec) _moment4(v, wv, m) -@deprecate _momentk(v::RealArray, k::Int, m::Real, wv::WeightVec) _momentk(v, k, wv, m) -@deprecate moment(v::RealArray, k::Int, m::Real, wv::WeightVec) moment(v, k, wv, m) +@deprecate _moment2(v::RealArray, m::Real, wv::AbstractWeights) _moment2(v, wv, m) +@deprecate _moment3(v::RealArray, m::Real, wv::AbstractWeights) _moment3(v, wv, m) +@deprecate _moment4(v::RealArray, m::Real, wv::AbstractWeights) _moment4(v, wv, m) +@deprecate _momentk(v::RealArray, k::Int, m::Real, wv::AbstractWeights) _momentk(v, k, wv, m) +@deprecate moment(v::RealArray, k::Int, m::Real, wv::AbstractWeights) moment(v, k, wv, m) @deprecate AIC(obj::StatisticalModel) aic(obj) @deprecate AICc(obj::StatisticalModel) aicc(obj) @@ -43,3 +44,5 @@ findat(a::AbstractArray, b::AbstractArray) = findat!(Array{Int}(size(b)), a, b) @deprecate df(obj::StatisticalModel) dof(obj) @deprecate df_residual(obj::StatisticalModel) dof_residual(obj) + +@deprecate_binding WeightVec Weights diff --git a/src/hist.jl b/src/hist.jl index 73b6a1f37..22fcb3a2f 100644 --- a/src/hist.jl +++ b/src/hist.jl @@ -218,20 +218,18 @@ Histogram(edge::AbstractVector, closed::Symbol=:default_left, isdensity::Bool=fa push!{T,E}(h::AbstractHistogram{T,1,E}, x::Real, w::Real) = push!(h, (x,), w) push!{T,E}(h::AbstractHistogram{T,1,E}, x::Real) = push!(h,x,one(T)) append!{T}(h::AbstractHistogram{T,1}, v::AbstractVector) = append!(h, (v,)) -append!{T}(h::AbstractHistogram{T,1}, v::AbstractVector, wv::Union{AbstractVector,WeightVec}) = append!(h, (v,), wv) - +append!{T}(h::AbstractHistogram{T,1}, v::AbstractVector, wv::Union{AbstractVector,AbstractWeights}) = append!(h, (v,), wv) fit{T}(::Type{Histogram{T}},v::AbstractVector, edg::AbstractVector; closed::Symbol=:default_left) = fit(Histogram{T},(v,), (edg,), closed=closed) fit{T}(::Type{Histogram{T}},v::AbstractVector; closed::Symbol=:default_left, nbins=sturges(length(v))) = fit(Histogram{T},(v,); closed=closed, nbins=nbins) -fit{T}(::Type{Histogram{T}},v::AbstractVector, wv::WeightVec, edg::AbstractVector; closed::Symbol=:default_left) = +fit{T}(::Type{Histogram{T}},v::AbstractVector, wv::AbstractWeights, edg::AbstractVector; closed::Symbol=:default_left) = fit(Histogram{T},(v,), wv, (edg,), closed=closed) -fit{T}(::Type{Histogram{T}},v::AbstractVector, wv::WeightVec; closed::Symbol=:default_left, nbins=sturges(length(v))) = +fit{T}(::Type{Histogram{T}},v::AbstractVector, wv::AbstractWeights; closed::Symbol=:default_left, nbins=sturges(length(v))) = fit(Histogram{T}, (v,), wv; closed=closed, nbins=nbins) -fit{W}(::Type{Histogram}, v::AbstractVector, wv::WeightVec{W}, args...; kwargs...) = fit(Histogram{W}, v, wv, args...; kwargs...) - +fit{W}(::Type{Histogram}, v::AbstractVector, wv::AbstractWeights{W}, args...; kwargs...) = fit(Histogram{W}, v, wv, args...; kwargs...) # N-dimensional @@ -269,7 +267,7 @@ function append!{T,N}(h::AbstractHistogram{T,N}, vs::NTuple{N,AbstractVector}, w end h end -append!{T,N}(h::AbstractHistogram{T,N}, vs::NTuple{N,AbstractVector}, wv::WeightVec) = append!(h, vs, values(wv)) +append!{T,N}(h::AbstractHistogram{T,N}, vs::NTuple{N,AbstractVector}, wv::AbstractWeights) = append!(h, vs, values(wv)) # Turn kwargs nbins into a type-stable tuple of integers: @@ -299,16 +297,16 @@ fit{T,N}(::Type{Histogram{T}}, vs::NTuple{N,AbstractVector}; closed::Symbol=:def fit(Histogram{T}, vs, histrange(vs,_nbins_tuple(vs, nbins),closed); closed=closed) end -fit{T,N,W}(::Type{Histogram{T}}, vs::NTuple{N,AbstractVector}, wv::WeightVec{W}, edges::NTuple{N,AbstractVector}; closed::Symbol=:default_left) = +fit{T,N,W}(::Type{Histogram{T}}, vs::NTuple{N,AbstractVector}, wv::AbstractWeights{W}, edges::NTuple{N,AbstractVector}; closed::Symbol=:default_left) = append!(Histogram(edges, T, _check_closed_arg(closed,:fit), false), vs, wv) -fit{T,N}(::Type{Histogram{T}}, vs::NTuple{N,AbstractVector}, wv::WeightVec; closed::Symbol=:default_left, nbins=sturges(length(vs[1]))) = begin +fit{T,N}(::Type{Histogram{T}}, vs::NTuple{N,AbstractVector}, wv::AbstractWeights; closed::Symbol=:default_left, nbins=sturges(length(vs[1]))) = begin closed = _check_closed_arg(closed,:fit) fit(Histogram{T}, vs, wv, histrange(vs,_nbins_tuple(vs, nbins),closed); closed=closed) end fit(::Type{Histogram}, args...; kwargs...) = fit(Histogram{Int}, args...; kwargs...) -fit{N,W}(::Type{Histogram}, vs::NTuple{N,AbstractVector}, wv::WeightVec{W}, args...; kwargs...) = fit(Histogram{W}, vs, wv, args...; kwargs...) +fit{N,W}(::Type{Histogram}, vs::NTuple{N,AbstractVector}, wv::AbstractWeights{W}, args...; kwargs...) = fit(Histogram{W}, vs, wv, args...; kwargs...) # Get a suitable high-precision type for the norm of a histogram. diff --git a/src/moments.jl b/src/moments.jl index f18b1fd02..de9d89666 100644 --- a/src/moments.jl +++ b/src/moments.jl @@ -1,52 +1,72 @@ ##### Weighted var & std ## var - """ - varm(x, wv::WeightVec, m, [dim]) - -Return the variance of a real-valued array `x` with a known mean `m`, optionally -over a dimension `dim`. The weighting vector `wv` specifies frequency weights -(also called case weights) for the result. - -This function differs from its counterpart in Base in that Bessel's correction -is not used. That is, here the denominator for the variance is `sum(wv)`, -whereas it's `length(x)-1` in `Base.varm`. The impact is that this is not a -weighted estimate of the population variance based on the sample; it's the weighted -variance of the sample. + varm(x, w::AbstractWeights, m, [dim]; corrected=false) + +Compute the variance of a real-valued array `x` with a known mean `m`, optionally +over a dimension `dim`. Observations in `x` are weighted using weight vector `w`. +The uncorrected (when `corrected=false`) sample variance is defined as: +```math +\\frac{1}{\\sum{w}} \\sum_{i=1}^n {w_i\\left({x_i - m}\\right)^2 } +``` +where ``n`` is the length of the input. The unbiased estimate (when `corrected=true`) of +the population variance is computed by replacing +``\\frac{1}{\\sum{w}}`` with a factor dependent on the type of weights used: +* `AnalyticWeights`: ``\\frac{1}{\\sum w - \\sum {w^2} / \\sum w}`` +* `FrequencyWeights`: ``\\frac{1}{\\sum{w} - 1}`` +* `ProbabilityWeights`: ``\\frac{n}{(n - 1) \\sum w}`` where ``n`` equals `count(!iszero, w)` +* `Weights`: `ArgumentError` (bias correction not supported) """ -Base.varm(v::RealArray, wv::WeightVec, m::Real) = _moment2(v, wv, m) +Base.varm(v::RealArray, w::AbstractWeights, m::Real; corrected::DepBool=nothing) = + _moment2(v, w, m; corrected=depcheck(:varm, corrected)) """ - var(x, wv::WeightVec, [dim]; mean=nothing) - -Return the variance of a real-valued array `x`, optionally over a dimension `dim`. -The weighting vector `wv` specifies frequency weights (also called case weights) -for the estimate. - -This function differs from its counterpart in Base in that Bessel's correction -is not used. That is, here the denominator for the variance is `sum(wv)`, -whereas it's `length(x)-1` in `Base.var`. The impact is that this is not a -weighted estimate of the population variance based on the sample; it's the weighted -variance of the sample. + var(x, w::AbstractWeights, [dim]; mean=nothing, corrected=false) + +Compute the variance of a real-valued array `x`, optionally over a dimension `dim`. +Observations in `x` are weighted using weight vector `w`. +The uncorrected (when `corrected=false`) sample variance is defined as: +```math +\\frac{1}{\\sum{w}} \\sum_{i=1}^n {w_i\\left({x_i - μ}\\right)^2 } +``` +where ``n`` is the length of the input and ``μ`` is the mean. +The unbiased estimate (when `corrected=true`) of the population variance is computed by +replacing ``\\frac{1}{\\sum{w}}`` with a factor dependent on the type of weights used: +* `AnalyticWeights`: ``\\frac{1}{\\sum w - \\sum {w^2} / \\sum w}`` +* `FrequencyWeights`: ``\\frac{1}{\\sum{w} - 1}`` +* `ProbabilityWeights`: ``\\frac{n}{(n - 1) \\sum w}`` where ``n`` equals `count(!iszero, w)` +* `Weights`: `ArgumentError` (bias correction not supported) """ -function Base.var(v::RealArray, wv::WeightVec; mean=nothing) - mean == 0 ? Base.varm(v, wv, 0) : - mean == nothing ? varm(v, wv, Base.mean(v, wv)) : - varm(v, wv, mean) +function Base.var(v::RealArray, w::AbstractWeights; mean=nothing, + corrected::DepBool=nothing) + corrected = depcheck(:var, corrected) + + if mean == nothing + varm(v, w, Base.mean(v, w); corrected=corrected) + else + varm(v, w, mean; corrected=corrected) + end end ## var along dim -Base.varm!(R::AbstractArray, A::RealArray, wv::WeightVec, M::RealArray, dim::Int) = - scale!(_wsum_centralize!(R, @functorize(abs2), A, values(wv), M, dim, true), inv(sum(wv))) +function Base.varm!(R::AbstractArray, A::RealArray, w::AbstractWeights, M::RealArray, + dim::Int; corrected::DepBool=nothing) + corrected = depcheck(:varm!, corrected) + scale!(_wsum_centralize!(R, abs2, A, values(w), M, dim, true), + varcorrection(w, corrected)) +end + +function var!(R::AbstractArray, A::RealArray, w::AbstractWeights, dim::Int; + mean=nothing, corrected::DepBool=nothing) + corrected = depcheck(:var!, corrected) -function var!(R::AbstractArray, A::RealArray, wv::WeightVec, dim::Int; mean=nothing) if mean == 0 - Base.varm!(R, A, wv, - Base.reducedim_initarray(A, dim, 0, eltype(R)), dim) + Base.varm!(R, A, w, Base.reducedim_initarray(A, dim, 0, eltype(R)), dim; + corrected=corrected) elseif mean == nothing - Base.varm!(R, A, wv, Base.mean(A, wv, dim), dim) + Base.varm!(R, A, w, Base.mean(A, w, dim), dim; corrected=corrected) else # check size of mean for i = 1:ndims(A) @@ -58,91 +78,170 @@ function var!(R::AbstractArray, A::RealArray, wv::WeightVec, dim::Int; mean=noth dM == dA || throw(DimensionMismatch("Incorrect size of mean.")) end end - Base.varm!(R, A, wv, mean, dim) + Base.varm!(R, A, w, mean, dim; corrected=corrected) end end -Base.varm(A::RealArray, wv::WeightVec, M::RealArray, dim::Int) = +function Base.varm(A::RealArray, w::AbstractWeights, M::RealArray, dim::Int; + corrected::DepBool=nothing) + corrected = depcheck(:varm, corrected) + @static if VERSION < v"0.6.0-dev.1121" - Base.varm!(similar(A, Float64, Base.reduced_dims(size(A), dim)), A, wv, M, dim) + Base.varm!(similar(A, Float64, Base.reduced_dims(size(A), dim)), A, w, M, dim; + corrected=corrected) else - Base.varm!(similar(A, Float64, Base.reduced_indices(indices(A), dim)), A, wv, M, dim) + Base.varm!(similar(A, Float64, Base.reduced_indices(indices(A), dim)), A, w, M, + dim; corrected=corrected) end +end + +function Base.var(A::RealArray, w::AbstractWeights, dim::Int; mean=nothing, + corrected::DepBool=nothing) + corrected = depcheck(:var, corrected) -Base.var(A::RealArray, wv::WeightVec, dim::Int; mean=nothing) = @static if VERSION < v"0.6.0-dev.1121" - var!(similar(A, Float64, Base.reduced_dims(size(A), dim)), A, wv, dim; mean=mean) + var!(similar(A, Float64, Base.reduced_dims(size(A), dim)), A, w, dim; mean=mean, + corrected=corrected) else - var!(similar(A, Float64, Base.reduced_indices(indices(A), dim)), A, wv, dim; mean=mean) + var!(similar(A, Float64, Base.reduced_indices(indices(A), dim)), A, w, dim; + mean=mean, corrected=corrected) end +end ## std """ - stdm(v, wv::WeightVec, m, [dim]) - -Return the standard deviation of a real-valued array `v` with a known mean `m`, -optionally over a dimension `dim`. The weighting vector `wv` specifies frequency -weights (also called case weights) for the estimate. + stdm(v, w::AbstractWeights, m, [dim]; corrected=false) + +Compute the standard deviation of a real-valued array `x` with a known mean `m`, +optionally over a dimension `dim`. Observations in `x` are weighted using weight vector `w`. +The uncorrected (when `corrected=false`) sample standard deviation is defined as: +```math +\\sqrt{\\frac{1}{\\sum{w}} \\sum_{i=1}^n {w_i\\left({x_i - m}\\right)^2 }} +``` +where ``n`` is the length of the input. The unbiased estimate (when `corrected=true`) of the +population standard deviation is computed by replacing ``\\frac{1}{\\sum{w}}`` with a factor +dependent on the type of weights used: +* `AnalyticWeights`: ``\\frac{1}{\\sum w - \\sum {w^2} / \\sum w}`` +* `FrequencyWeights`: ``\\frac{1}{\\sum{w} - 1}`` +* `ProbabilityWeights`: ``\\frac{n}{(n - 1) \\sum w}`` where ``n`` equals `count(!iszero, w)` +* `Weights`: `ArgumentError` (bias correction not supported) """ -Base.stdm(v::RealArray, wv::WeightVec, m::Real) = sqrt(varm(v, wv, m)) +Base.stdm(v::RealArray, w::AbstractWeights, m::Real; corrected::DepBool=nothing) = + sqrt(varm(v, w, m, corrected=depcheck(:stdm, corrected))) """ - std(v, wv::WeightVec, [dim]; mean=nothing) - -Return the standard deviation of a real-valued array `v`, optionally over a -dimension `dim`. The weighting vector `wv` specifies frequency weights (also -called case weights) for the estimate. + std(v, w::AbstractWeights, [dim]; mean=nothing, corrected=false) + +Compute the standard deviation of a real-valued array `x`, +optionally over a dimension `dim`. Observations in `x` are weighted using weight vector `w`. +The uncorrected (when `corrected=false`) sample standard deviation is defined as: +```math +\\sqrt{\\frac{1}{\\sum{w}} \\sum_{i=1}^n {w_i\\left({x_i - μ}\\right)^2 }} +``` +where ``n`` is the length of the input and ``μ`` is the mean. +The unbiased estimate (when `corrected=true`) of the population standard deviation is +computed by replacing ``\\frac{1}{\\sum{w}}`` with a factor dependent on the type of +weights used: +* `AnalyticWeights`: ``\\frac{1}{\\sum w - \\sum {w^2} / \\sum w}`` +* `FrequencyWeights`: ``\\frac{1}{\\sum{w} - 1}`` +* `ProbabilityWeights`: ``\\frac{n}{(n - 1) \\sum w}`` where ``n`` equals `count(!iszero, w)` +* `Weights`: `ArgumentError` (bias correction not supported) """ -Base.std(v::RealArray, wv::WeightVec; mean=nothing) = sqrt.(var(v, wv; mean=mean)) +Base.std(v::RealArray, w::AbstractWeights; mean=nothing, corrected::DepBool=nothing) = + sqrt.(var(v, w; mean=mean, corrected=depcheck(:std, corrected))) + +Base.stdm(v::RealArray, m::RealArray, dim::Int; corrected::DepBool=nothing) = + Base.sqrt!(varm(v, m, dim; corrected=depcheck(:stdm, corrected))) -Base.stdm(v::RealArray, m::RealArray, dim::Int) = Base.sqrt!(varm(v, m, dim)) -Base.stdm(v::RealArray, wv::WeightVec, m::RealArray, dim::Int) = sqrt.(varm(v, wv, m, dim)) -Base.std(v::RealArray, wv::WeightVec, dim::Int; mean=nothing) = sqrt.(var(v, wv, dim; mean=mean)) +Base.stdm(v::RealArray, w::AbstractWeights, m::RealArray, dim::Int; + corrected::DepBool=nothing) = + sqrt.(varm(v, w, m, dim; corrected=depcheck(:stdm, corrected))) +Base.std(v::RealArray, w::AbstractWeights, dim::Int; mean=nothing, + corrected::DepBool=nothing) = + sqrt.(var(v, w, dim; mean=mean, corrected=depcheck(:std, corrected))) ##### Fused statistics """ - mean_and_var(x, [wv::WeightVec], [dim]) -> (mean, var) + mean_and_var(x, [w::AbstractWeights], [dim]; corrected=false) -> (mean, var) Return the mean and variance of a real-valued array `x`, optionally over a dimension -`dim`, as a tuple. A weighting vector `wv` can be specified to weight the estimates. -The weights are assumed to be frequency weights, also called case weights. +`dim`, as a tuple. Observations in `x` can be weighted using weight vector `w`. +Finally, bias correction is be applied to the variance calculation if `corrected=true`. +See [`var`](@ref) documentation for more details. """ -mean_and_var(A::RealArray) = (m = mean(A); (m, varm(A, m))) +function mean_and_var(A::RealArray; corrected::Bool=true) + m = mean(A) + v = varm(A, m; corrected=corrected) + m, v +end """ - mean_and_std(x, [wv::WeightVec], [dim]) -> (mean, std) + mean_and_std(x, [w::AbstractWeights], [dim]; corrected=false) -> (mean, std) Return the mean and standard deviation of a real-valued array `x`, optionally -over a dimension `dim`, as a tuple. A weighting vector `wv` can be specified -to weight the estimates. The weights are assumed to be frequency weights, also -called case weights. +over a dimension `dim`, as a tuple. A weighting vector `w` can be specified +to weight the estimates. Finally, bias correction is applied to the +standard deviation calculation if `corrected=true`. +See [`std`](@ref) documentation for more details. """ -mean_and_std(A::RealArray) = (m = mean(A); (m, stdm(A, m))) +function mean_and_std(A::RealArray; corrected::Bool=true) + m = mean(A) + s = stdm(A, m; corrected=corrected) + m, s +end -mean_and_var(A::RealArray, wv::WeightVec) = (m = mean(A, wv); (m, varm(A, wv, m))) -mean_and_std(A::RealArray, wv::WeightVec) = (m = mean(A, wv); (m, stdm(A, wv, m))) +function mean_and_var(A::RealArray, w::AbstractWeights; corrected::DepBool=nothing) + m = mean(A, w) + v = varm(A, w, m; corrected=depcheck(:mean_and_var, corrected)) + m, v +end +function mean_and_std(A::RealArray, w::AbstractWeights; corrected::DepBool=nothing) + m = mean(A, w) + s = stdm(A, w, m; corrected=depcheck(:mean_and_std, corrected)) + m, s +end -mean_and_var(A::RealArray, dim::Int) = (m = mean(A, dim); (m, varm(A, m, dim))) -mean_and_std(A::RealArray, dim::Int) = (m = mean(A, dim); (m, stdm(A, m, dim))) -mean_and_var(A::RealArray, wv::WeightVec, dim::Int) = (m = mean(A, wv, dim); (m, varm(A, wv, m, dim))) -mean_and_std(A::RealArray, wv::WeightVec, dim::Int) = (m = mean(A, wv, dim); (m, stdm(A, wv, m, dim))) +function mean_and_var(A::RealArray, dim::Int; corrected::Bool=true) + m = mean(A, dim) + v = varm(A, m, dim; corrected=corrected) + m, v +end +function mean_and_std(A::RealArray, dim::Int; corrected::Bool=true) + m = mean(A, dim) + s = stdm(A, m, dim; corrected=corrected) + m, s +end -##### General central moment +function mean_and_var(A::RealArray, w::AbstractWeights, dim::Int; + corrected::DepBool=nothing) + m = mean(A, w, dim) + v = varm(A, w, m, dim; corrected=depcheck(:mean_and_var, corrected)) + m, v +end +function mean_and_std(A::RealArray, w::AbstractWeights, dim::Int; + corrected::DepBool=nothing) + m = mean(A, w, dim) + s = stdm(A, w, m, dim; corrected=depcheck(:mean_and_std, corrected)) + m, s +end -function _moment2(v::RealArray, m::Real) + + +##### General central moment +function _moment2(v::RealArray, m::Real; corrected=false) n = length(v) s = 0.0 for i = 1:n @inbounds z = v[i] - m s += z * z end - s / n + varcorrection(n, corrected) * s end -function _moment2(v::RealArray, wv::WeightVec, m::Real) +function _moment2(v::RealArray, wv::AbstractWeights, m::Real; corrected=false) n = length(v) s = 0.0 w = values(wv) @@ -150,7 +249,8 @@ function _moment2(v::RealArray, wv::WeightVec, m::Real) @inbounds z = v[i] - m @inbounds s += (z * z) * w[i] end - s / sum(wv) + + varcorrection(wv, corrected) * s end function _moment3(v::RealArray, m::Real) @@ -163,7 +263,7 @@ function _moment3(v::RealArray, m::Real) s / n end -function _moment3(v::RealArray, wv::WeightVec, m::Real) +function _moment3(v::RealArray, wv::AbstractWeights, m::Real) n = length(v) s = 0.0 w = values(wv) @@ -184,7 +284,7 @@ function _moment4(v::RealArray, m::Real) s / n end -function _moment4(v::RealArray, wv::WeightVec, m::Real) +function _moment4(v::RealArray, wv::AbstractWeights, m::Real) n = length(v) s = 0.0 w = values(wv) @@ -205,7 +305,7 @@ function _momentk(v::RealArray, k::Int, m::Real) s / n end -function _momentk(v::RealArray, k::Int, wv::WeightVec, m::Real) +function _momentk(v::RealArray, k::Int, wv::AbstractWeights, m::Real) n = length(v) s = 0.0 w = values(wv) @@ -218,7 +318,7 @@ end """ - moment(v, k, [wv::WeightVec], m=mean(v)) + moment(v, k, [wv::AbstractWeights], m=mean(v)) Return the `k`th order central moment of a real-valued array `v`, optionally specifying a weighting vector `wv` and a center `m`. @@ -230,7 +330,7 @@ function moment(v::RealArray, k::Int, m::Real) _momentk(v, k, m) end -function moment(v::RealArray, k::Int, wv::WeightVec, m::Real) +function moment(v::RealArray, k::Int, wv::AbstractWeights, m::Real) k == 2 ? _moment2(v, wv, m) : k == 3 ? _moment3(v, wv, m) : k == 4 ? _moment4(v, wv, m) : @@ -238,7 +338,9 @@ function moment(v::RealArray, k::Int, wv::WeightVec, m::Real) end moment(v::RealArray, k::Int) = moment(v, k, mean(v)) -moment(v::RealArray, k::Int, wv::WeightVec) = moment(v, k, wv, mean(v, wv)) +function moment(v::RealArray, k::Int, wv::AbstractWeights) + moment(v, k, wv, mean(v, wv)) +end ##### Skewness and Kurtosis @@ -246,7 +348,7 @@ moment(v::RealArray, k::Int, wv::WeightVec) = moment(v, k, wv, mean(v, wv)) # Skewness # This is Type 1 definition according to Joanes and Gill (1998) """ - skewness(v, [wv::WeightVec], m=mean(v)) + skewness(v, [wv::AbstractWeights], m=mean(v)) Compute the standardized skewness of a real-valued array `v`, optionally specifying a weighting vector `wv` and a center `m`. @@ -267,7 +369,7 @@ function skewness(v::RealArray, m::Real) return cm3 / sqrt(cm2 * cm2 * cm2) # this is much faster than cm2^1.5 end -function skewness(v::RealArray, wv::WeightVec, m::Real) +function skewness(v::RealArray, wv::AbstractWeights, m::Real) n = length(v) length(wv) == n || throw(DimensionMismatch("Inconsistent array lengths.")) cm2 = 0.0 # empirical 2nd centered moment (variance) @@ -289,12 +391,12 @@ function skewness(v::RealArray, wv::WeightVec, m::Real) end skewness(v::RealArray) = skewness(v, mean(v)) -skewness(v::RealArray, wv::WeightVec) = skewness(v, wv, mean(v, wv)) +skewness(v::RealArray, wv::AbstractWeights) = skewness(v, wv, mean(v, wv)) # (excessive) Kurtosis # This is Type 1 definition according to Joanes and Gill (1998) """ - kurtosis(v, [wv::WeightVec], m=mean(v)) + kurtosis(v, [wv::AbstractWeights], m=mean(v)) Compute the excess kurtosis of a real-valued array `v`, optionally specifying a weighting vector `wv` and a center `m`. @@ -314,7 +416,7 @@ function kurtosis(v::RealArray, m::Real) return (cm4 / (cm2 * cm2)) - 3.0 end -function kurtosis(v::RealArray, wv::WeightVec, m::Real) +function kurtosis(v::RealArray, wv::AbstractWeights, m::Real) n = length(v) length(wv) == n || throw(DimensionMismatch("Inconsistent array lengths.")) cm2 = 0.0 # empirical 2nd centered moment (variance) @@ -337,5 +439,4 @@ function kurtosis(v::RealArray, wv::WeightVec, m::Real) end kurtosis(v::RealArray) = kurtosis(v, mean(v)) -kurtosis(v::RealArray, wv::WeightVec) = kurtosis(v, wv, mean(v, wv)) - +kurtosis(v::RealArray, wv::AbstractWeights) = kurtosis(v, wv, mean(v, wv)) diff --git a/src/sampling.jl b/src/sampling.jl index 4a134472a..c2eacb2c0 100644 --- a/src/sampling.jl +++ b/src/sampling.jl @@ -266,7 +266,7 @@ seqsample_c!(a::AbstractArray, x::AbstractArray) = seqsample_c!(Base.GLOBAL_RNG, ### Interface functions (poly-algorithms) """ - sample([rng], a, [wv::WeightVec]) + sample([rng], a, [wv::AbstractWeights]) Select a single random element of `a`. Sampling probabilities are proportional to the weights given in `wv`, if provided. @@ -279,7 +279,7 @@ sample(a::AbstractArray) = sample(Base.GLOBAL_RNG, a) """ - sample!([rng], a, [wv::WeightVec], x; replace=true, ordered=false) + sample!([rng], a, [wv::AbstractWeights], x; replace=true, ordered=false) Draw a random sample of `length(x)` elements from an array `a` and store the result in `x`. A polyalgorithm is used for sampling. @@ -332,7 +332,7 @@ sample!(a::AbstractArray, x::AbstractArray; replace::Bool=true, ordered::Bool=fa """ - sample([rng], a, [wv::WeightVec], n::Integer; replace=true, ordered=false) + sample([rng], a, [wv::AbstractWeights], n::Integer; replace=true, ordered=false) Select a random, optionally weighted sample of size `n` from an array `a` using a polyalgorithm. Sampling probabilities are proportional to the weights @@ -352,7 +352,7 @@ sample(a::AbstractArray, n::Integer; replace::Bool=true, ordered::Bool=false) = """ - sample([rng], a, [wv::WeightVec], dims::Dims; replace=true, ordered=false) + sample([rng], a, [wv::AbstractWeights], dims::Dims; replace=true, ordered=false) Select a random, optionally weighted sample from an array `a` specifying the dimensions `dims` of the output array. Sampling probabilities are @@ -377,7 +377,7 @@ sample(a::AbstractArray, dims::Dims; replace::Bool=true, ordered::Bool=false) = ################################################################ """ - sample([rng], wv::WeightVec) + sample([rng], wv::AbstractWeights) Select a single random integer in `1:length(wv)` with probabilities proportional to the weights given in `wv`. @@ -385,7 +385,7 @@ proportional to the weights given in `wv`. Optionally specify a random number generator ``rng`` as the first argument (defaults to ``Base.GLOBAL_RNG``). """ -function sample(rng::AbstractRNG, wv::WeightVec) +function sample(rng::AbstractRNG, wv::AbstractWeights) t = rand(rng) * sum(wv) w = values(wv) n = length(w) @@ -397,13 +397,13 @@ function sample(rng::AbstractRNG, wv::WeightVec) end return i end -sample(wv::WeightVec) = sample(Base.GLOBAL_RNG, wv) +sample(wv::AbstractWeights) = sample(Base.GLOBAL_RNG, wv) -sample(rng::AbstractRNG, a::AbstractArray, wv::WeightVec) = a[sample(rng, wv)] -sample(a::AbstractArray, wv::WeightVec) = sample(Base.GLOBAL_RNG, a, wv) +sample(rng::AbstractRNG, a::AbstractArray, wv::AbstractWeights) = a[sample(rng, wv)] +sample(a::AbstractArray, wv::AbstractWeights) = sample(Base.GLOBAL_RNG, a, wv) function direct_sample!(rng::AbstractRNG, a::AbstractArray, - wv::WeightVec, x::AbstractArray) + wv::AbstractWeights, x::AbstractArray) n = length(a) length(wv) == n || throw(DimensionMismatch("Inconsistent lengths.")) for i = 1:length(x) @@ -411,7 +411,7 @@ function direct_sample!(rng::AbstractRNG, a::AbstractArray, end return x end -direct_sample!(a::AbstractArray, wv::WeightVec, x::AbstractArray) = +direct_sample!(a::AbstractArray, wv::AbstractWeights, x::AbstractArray) = direct_sample!(Base.GLOBAL_RNG, a, wv, x) function make_alias_table!(w::AbstractVector{Float64}, wsum::Float64, @@ -473,7 +473,7 @@ function make_alias_table!(w::AbstractVector{Float64}, wsum::Float64, nothing end -function alias_sample!(rng::AbstractRNG, a::AbstractArray, wv::WeightVec, x::AbstractArray) +function alias_sample!(rng::AbstractRNG, a::AbstractArray, wv::AbstractWeights, x::AbstractArray) n = length(a) length(wv) == n || throw(DimensionMismatch("Inconsistent lengths.")) @@ -490,11 +490,11 @@ function alias_sample!(rng::AbstractRNG, a::AbstractArray, wv::WeightVec, x::Abs end return x end -alias_sample!(a::AbstractArray, wv::WeightVec, x::AbstractArray) = +alias_sample!(a::AbstractArray, wv::AbstractWeights, x::AbstractArray) = alias_sample!(Base.GLOBAL_RNG, a, wv, x) function naive_wsample_norep!(rng::AbstractRNG, a::AbstractArray, - wv::WeightVec, x::AbstractArray) + wv::AbstractWeights, x::AbstractArray) n = length(a) length(wv) == n || throw(DimensionMismatch("Inconsistent lengths.")) k = length(x) @@ -517,7 +517,7 @@ function naive_wsample_norep!(rng::AbstractRNG, a::AbstractArray, end return x end -naive_wsample_norep!(a::AbstractArray, wv::WeightVec, x::AbstractArray) = +naive_wsample_norep!(a::AbstractArray, wv::AbstractWeights, x::AbstractArray) = naive_wsample_norep!(Base.GLOBAL_RNG, a, wv, x) # Weighted sampling without replacement @@ -530,7 +530,7 @@ naive_wsample_norep!(a::AbstractArray, wv::WeightVec, x::AbstractArray) = # # Instead of keys u^(1/w) where u = random(0,1) keys w/v where v = randexp(1) are used. function efraimidis_a_wsample_norep!(rng::AbstractRNG, a::AbstractArray, - wv::WeightVec, x::AbstractArray) + wv::AbstractWeights, x::AbstractArray) n = length(a) length(wv) == n || throw(DimensionMismatch("a and wv must be of same length (got $n and $(length(wv))).")) k = length(x) @@ -548,7 +548,7 @@ function efraimidis_a_wsample_norep!(rng::AbstractRNG, a::AbstractArray, end return x end -efraimidis_a_wsample_norep!(a::AbstractArray, wv::WeightVec, x::AbstractArray) = +efraimidis_a_wsample_norep!(a::AbstractArray, wv::AbstractWeights, x::AbstractArray) = efraimidis_a_wsample_norep!(Base.GLOBAL_RNG, a, wv, x) # Weighted sampling without replacement @@ -561,7 +561,7 @@ efraimidis_a_wsample_norep!(a::AbstractArray, wv::WeightVec, x::AbstractArray) = # # Instead of keys u^(1/w) where u = random(0,1) keys w/v where v = randexp(1) are used. function efraimidis_ares_wsample_norep!(rng::AbstractRNG, a::AbstractArray, - wv::WeightVec, x::AbstractArray) + wv::AbstractWeights, x::AbstractArray) n = length(a) length(wv) == n || throw(DimensionMismatch("a and wv must be of same length (got $n and $(length(wv))).")) k = length(x) @@ -609,7 +609,7 @@ function efraimidis_ares_wsample_norep!(rng::AbstractRNG, a::AbstractArray, end return x end -efraimidis_ares_wsample_norep!(a::AbstractArray, wv::WeightVec, x::AbstractArray) = +efraimidis_ares_wsample_norep!(a::AbstractArray, wv::AbstractWeights, x::AbstractArray) = efraimidis_ares_wsample_norep!(Base.GLOBAL_RNG, a, wv, x) # Weighted sampling without replacement @@ -622,7 +622,7 @@ efraimidis_ares_wsample_norep!(a::AbstractArray, wv::WeightVec, x::AbstractArray # # Instead of keys u^(1/w) where u = random(0,1) keys w/v where v = randexp(1) are used. function efraimidis_aexpj_wsample_norep!(rng::AbstractRNG, a::AbstractArray, - wv::WeightVec, x::AbstractArray) + wv::AbstractWeights, x::AbstractArray) n = length(a) length(wv) == n || throw(DimensionMismatch("a and wv must be of same length (got $n and $(length(wv))).")) k = length(x) @@ -671,10 +671,10 @@ function efraimidis_aexpj_wsample_norep!(rng::AbstractRNG, a::AbstractArray, end return x end -efraimidis_aexpj_wsample_norep!(a::AbstractArray, wv::WeightVec, x::AbstractArray) = +efraimidis_aexpj_wsample_norep!(a::AbstractArray, wv::AbstractWeights, x::AbstractArray) = efraimidis_aexpj_wsample_norep!(Base.GLOBAL_RNG, a, wv, x) -function sample!(rng::AbstractRNG, a::AbstractArray, wv::WeightVec, x::AbstractArray; +function sample!(rng::AbstractRNG, a::AbstractArray, wv::AbstractWeights, x::AbstractArray; replace::Bool=true, ordered::Bool=false) n = length(a) k = length(x) @@ -704,20 +704,20 @@ function sample!(rng::AbstractRNG, a::AbstractArray, wv::WeightVec, x::AbstractA end return x end -sample!(a::AbstractArray, wv::WeightVec, x::AbstractArray) = +sample!(a::AbstractArray, wv::AbstractWeights, x::AbstractArray) = sample!(Base.GLOBAL_RNG, a, wv, x) -sample{T}(rng::AbstractRNG, a::AbstractArray{T}, wv::WeightVec, n::Integer; +sample{T}(rng::AbstractRNG, a::AbstractArray{T}, wv::AbstractWeights, n::Integer; replace::Bool=true, ordered::Bool=false) = sample!(rng, a, wv, Vector{T}(n); replace=replace, ordered=ordered) -sample(a::AbstractArray, wv::WeightVec, n::Integer; +sample(a::AbstractArray, wv::AbstractWeights, n::Integer; replace::Bool=true, ordered::Bool=false) = sample(Base.GLOBAL_RNG, a, wv, n; replace=replace, ordered=ordered) -sample{T}(rng::AbstractRNG, a::AbstractArray{T}, wv::WeightVec, dims::Dims; +sample{T}(rng::AbstractRNG, a::AbstractArray{T}, wv::AbstractWeights, dims::Dims; replace::Bool=true, ordered::Bool=false) = sample!(rng, a, wv, Array{T}(dims); replace=replace, ordered=ordered) -sample(a::AbstractArray, wv::WeightVec, dims::Dims; +sample(a::AbstractArray, wv::AbstractWeights, dims::Dims; replace::Bool=true, ordered::Bool=false) = sample!(Base.GLOBAL_RNG, a, wv, Array{T}(dims); replace=replace, ordered=ordered) diff --git a/src/weights.jl b/src/weights.jl index b542942b3..f59dbec94 100644 --- a/src/weights.jl +++ b/src/weights.jl @@ -2,44 +2,208 @@ ###### Weight vector ##### if VERSION < v"0.6.0-dev.2123" - immutable WeightVec{S<:Real, T<:Real, V<:RealVector} <: RealVector{T} - values::V - sum::S - end + @compat abstract type AbstractWeights{S<:Real, T<:Real, V<:RealVector} <: RealVector{T} end else - immutable WeightVec{S<:Real, T<:Real, V<:AbstractVector{T}} <: AbstractVector{T} - values::V - sum::S - end + @compat abstract type AbstractWeights{S<:Real, T<:Real, V<:AbstractVector{T}} <: AbstractVector{T} end end """ - WeightVec(vs, [wsum]) + @weights name -Construct a `WeightVec` with weight values `vs` and sum of weights `wsum`. -If omitted, `wsum` is computed. +Generates a new generic weight type with specified `name`, which subtypes `AbstractWeights` +and stores the `values` (`V<:RealVector`) and `sum` (`S<:Real`). """ -function WeightVec{S<:Real, V<:RealVector}(vs::V, s::S=sum(vs)) - return WeightVec{S, eltype(vs), V}(vs, s) +macro weights(name) + return quote + if VERSION < v"0.6.0-dev.2123" + immutable $name{S<:Real, T<:Real, V<:RealVector} <: AbstractWeights{S, T, V} + values::V + sum::S + end + else + immutable $name{S<:Real, T<:Real, V<:AbstractVector{T}} <: AbstractWeights{S, T, V} + values::V + sum::S + end + end + end end +eltype(wv::AbstractWeights) = eltype(wv.values) +length(wv::AbstractWeights) = length(wv.values) +values(wv::AbstractWeights) = wv.values +sum(wv::AbstractWeights) = wv.sum +isempty(wv::AbstractWeights) = isempty(wv.values) + +Base.getindex(wv::AbstractWeights, i) = getindex(wv.values, i) +Base.size(wv::AbstractWeights) = size(wv.values) + +""" + varcorrection(n::Integer, corrected=false) + +Compute a bias correction factor for calculating `var`, `std` and `cov` with +`n` observations. Returns ``\\frac{1}{n - 1}`` when `corrected=true` +(i.e. [Bessel's correction](https://en.wikipedia.org/wiki/Bessel's_correction)), +otherwise returns ``\\frac{1}{n}`` (i.e. no correction). +""" +@inline varcorrection(n::Integer, corrected::Bool=false) = 1 / (n - Int(corrected)) + +@weights Weights + +""" + Weights(vs, wsum=sum(vs)) + +Construct a `Weights` vector with weight values `vs`. +A precomputed sum may be provided as `wsum`. + +The `Weights` type describes a generic weights vector which does not support +all operations possible for [`FrequencyWeights`](@ref), [`AnalyticWeights`](@ref) +and [`ProbabilityWeights`](@ref). +""" +Weights{S<:Real, V<:RealVector}(vs::V, s::S=sum(vs)) = Weights{S, eltype(vs), V}(vs, s) + """ weights(vs) -Construct a `WeightVec` from a given array. +Construct a `Weights` vector from array `vs`. +See the documentation for [`Weights`](@ref) for more details. """ -weights(vs::RealVector) = WeightVec(vs) -weights(vs::RealArray) = WeightVec(vec(vs)) +weights(vs::RealVector) = Weights(vs) +weights(vs::RealArray) = Weights(vec(vs)) -eltype(wv::WeightVec) = eltype(wv.values) -length(wv::WeightVec) = length(wv.values) -values(wv::WeightVec) = wv.values -sum(wv::WeightVec) = wv.sum -isempty(wv::WeightVec) = isempty(wv.values) +""" + varcorrection(w::Weights, corrected=false) + +Returns ``\\frac{1}{\\sum w}`` when `corrected=false` and throws an `ArgumentError` +if `corrected=true`. +""" +@inline function varcorrection(w::Weights, corrected::Bool=false) + corrected && throw(ArgumentError("Weights type does not support bias correction: " * + "use FrequencyWeights, AnalyticWeights or ProbabilityWeights if applicable.")) + 1 / w.sum +end + +@weights AnalyticWeights -Base.getindex(wv::WeightVec, i) = getindex(wv.values, i) -Base.size(wv::WeightVec) = size(wv.values) +""" + AnalyticWeights(vs, wsum=sum(vs)) + +Construct an `AnalyticWeights` vector with weight values `vs`. +A precomputed sum may be provided as `wsum`. +Analytic weights describe a non-random relative importance (usually between 0 and 1) +for each observation. These weights may also be referred to as reliability weights, +precision weights or inverse variance weights. These are typically used when the observations +being weighted are aggregate values (e.g., averages) with differing variances. +""" +AnalyticWeights{S<:Real, V<:RealVector}(vs::V, s::S=sum(vs)) = + AnalyticWeights{S, eltype(vs), V}(vs, s) + +""" + aweights(vs) + +Construct an `AnalyticWeights` vector from array `vs`. +See the documentation for [`AnalyticWeights`](@ref) for more details. +""" +aweights(vs::RealVector) = AnalyticWeights(vs) +aweights(vs::RealArray) = AnalyticWeights(vec(vs)) + +""" + varcorrection(w::AnalyticWeights, corrected=false) + +* `corrected=true`: ``\\frac{1}{\\sum w - \\sum {w^2} / \\sum w}`` +* `corrected=false`: ``\\frac{1}{\\sum w}`` +""" +@inline function varcorrection(w::AnalyticWeights, corrected::Bool=false) + s = w.sum + + if corrected + sum_sn = sum(x -> (x / s) ^ 2, w) + 1 / (s * (1 - sum_sn)) + else + 1 / s + end +end + +@weights FrequencyWeights + +""" + FrequencyWeights(vs, wsum=sum(vs)) + +Construct a `FrequencyWeights` vector with weight values `vs`. +A precomputed sum may be provided as `wsum`. + +Frequency weights describe the number of times (or frequency) each observation +was observed. These weights may also be referred to as case weights or repeat weights. +""" +FrequencyWeights{S<:Real, V<:RealVector}(vs::V, s::S=sum(vs)) = + FrequencyWeights{S, eltype(vs), V}(vs, s) + +""" + fweights(vs) + +Construct a `FrequencyWeights` vector from a given array. +See the documentation for [`FrequencyWeights`](@ref) for more details. +""" +fweights(vs::RealVector) = FrequencyWeights(vs) +fweights(vs::RealArray) = FrequencyWeights(vec(vs)) + +""" + varcorrection(w::FrequencyWeights, corrected=false) + +* `corrected=true`: ``\\frac{1}{\\sum{w} - 1}`` +* `corrected=false`: ``\\frac{1}{\\sum w}`` +""" +@inline function varcorrection(w::FrequencyWeights, corrected::Bool=false) + s = w.sum + + if corrected + 1 / (s - 1) + else + 1 / s + end +end + +@weights ProbabilityWeights + +""" + ProbabilityWeights(vs, wsum=sum(vs)) + +Construct a `ProbabilityWeights` vector with weight values `vs`. +A precomputed sum may be provided as `wsum`. + +Probability weights represent the inverse of the sampling probability for each observation, +providing a correction mechanism for under- or over-sampling certain population groups. +These weights may also be referred to as sampling weights. +""" +ProbabilityWeights{S<:Real, V<:RealVector}(vs::V, s::S=sum(vs)) = + ProbabilityWeights{S, eltype(vs), V}(vs, s) + +""" + pweights(vs) + +Construct a `ProbabilityWeights` vector from a given array. +See the documentation for [`ProbabilityWeights`](@ref) for more details. +""" +pweights(vs::RealVector) = ProbabilityWeights(vs) +pweights(vs::RealArray) = ProbabilityWeights(vec(vs)) + +""" + varcorrection(w::ProbabilityWeights, corrected=false) + +* `corrected=true`: ``\\frac{n}{(n - 1) \\sum w}`` where ``n`` equals `count(!iszero, w)` +* `corrected=false`: ``\\frac{1}{\\sum w}`` +""" +@inline function varcorrection(w::ProbabilityWeights, corrected::Bool=false) + s = w.sum + + if corrected + n = count(!iszero, w) + n / (s * (n - 1)) + else + 1 / s + end +end ##### Weighted sum ##### @@ -54,9 +218,9 @@ wsum(v::AbstractVector, w::AbstractVector) = dot(v, w) wsum(v::AbstractArray, w::AbstractVector) = dot(vec(v), w) # Note: the methods for BitArray and SparseMatrixCSC are to avoid ambiguities -Base.sum(v::BitArray, w::WeightVec) = wsum(v, values(w)) -Base.sum(v::SparseMatrixCSC, w::WeightVec) = wsum(v, values(w)) -Base.sum(v::AbstractArray, w::WeightVec) = dot(v, values(w)) +Base.sum(v::BitArray, w::AbstractWeights) = wsum(v, values(w)) +Base.sum(v::SparseMatrixCSC, w::AbstractWeights) = wsum(v, values(w)) +Base.sum(v::AbstractArray, w::AbstractWeights) = dot(v, values(w)) ## wsum along dimension # @@ -255,10 +419,10 @@ end # extended sum! and wsum -Base.sum!{W<:Real}(R::AbstractArray, A::AbstractArray, w::WeightVec{W}, dim::Int; init::Bool=true) = +Base.sum!{W<:Real}(R::AbstractArray, A::AbstractArray, w::AbstractWeights{W}, dim::Int; init::Bool=true) = wsum!(R, A, values(w), dim; init=init) -Base.sum{T<:Number,W<:Real}(A::AbstractArray{T}, w::WeightVec{W}, dim::Int) = wsum(A, values(w), dim) +Base.sum{T<:Number,W<:Real}(A::AbstractArray{T}, w::AbstractWeights{W}, dim::Int) = wsum(A, values(w), dim) ###### Weighted means ##### @@ -273,15 +437,15 @@ function wmean{T<:Number}(v::AbstractArray{T}, w::AbstractVector) mean(v, weights(w)) end -Base.mean(v::AbstractArray, w::WeightVec) = sum(v, w) / sum(w) +Base.mean(v::AbstractArray, w::AbstractWeights) = sum(v, w) / sum(w) -Base.mean!(R::AbstractArray, A::AbstractArray, w::WeightVec, dim::Int) = +Base.mean!(R::AbstractArray, A::AbstractArray, w::AbstractWeights, dim::Int) = scale!(Base.sum!(R, A, w, dim), inv(sum(w))) wmeantype{T,W}(::Type{T}, ::Type{W}) = typeof((zero(T)*zero(W) + zero(T)*zero(W)) / one(W)) wmeantype{T<:BlasReal}(::Type{T}, ::Type{T}) = T -Base.mean{T<:Number,W<:Real}(A::AbstractArray{T}, w::WeightVec{W}, dim::Int) = +Base.mean{T<:Number,W<:Real}(A::AbstractArray{T}, w::AbstractWeights{W}, dim::Int) = @static if VERSION < v"0.6.0-dev.1121" mean!(similar(A, wmeantype(T, W), Base.reduced_dims(size(A), dim)), A, w, dim) else @@ -290,11 +454,11 @@ Base.mean{T<:Number,W<:Real}(A::AbstractArray{T}, w::WeightVec{W}, dim::Int) = ###### Weighted median ##### -function Base.median(v::AbstractArray, w::WeightVec) +function Base.median(v::AbstractArray, w::AbstractWeights) throw(MethodError(median, (v, w))) end -function Base.median{W<:Real}(v::RealVector, w::WeightVec{W}) +function Base.median{W<:Real}(v::RealVector, w::AbstractWeights{W}) isempty(v) && error("median of an empty array is undefined") if length(v) != length(w) error("data and weight vectors must be the same size") @@ -345,10 +509,10 @@ end wmedian(v, w) Compute the weighted median of an array `v` with weights `w`, given as either a -vector or `WeightVec`. +vector or an `AbstractWeights` vector. """ wmedian(v::RealVector, w::RealVector) = median(v, weights(w)) -wmedian{W<:Real}(v::RealVector, w::WeightVec{W}) = median(v, w) +wmedian{W<:Real}(v::RealVector, w::AbstractWeights{W}) = median(v, w) ###### Weighted quantile ##### @@ -358,11 +522,11 @@ wmedian{W<:Real}(v::RealVector, w::WeightVec{W}) = median(v, w) # Here there is a supplementary function from index to weighted index k -> Sk """ - quantile(v, w::WeightVec, p) + quantile(v, w::AbstractWeights, p) Compute `p`th quantile(s) of `v` with weights `w`. """ -function quantile{V, W <: Real}(v::RealVector{V}, w::WeightVec{W}, p::RealVector) +function quantile{V, W <: Real}(v::RealVector{V}, w::AbstractWeights{W}, p::RealVector) # checks isempty(v) && error("quantile of an empty array is undefined") @@ -430,16 +594,16 @@ function bound_quantiles{T <: Real}(qs::AbstractVector{T}) T[min(one(T), max(zero(T), q)) for q = qs] end -quantile{W <: Real}(v::RealVector, w::WeightVec{W}, p::Number) = quantile(v, w, [p])[1] +quantile{W <: Real}(v::RealVector, w::AbstractWeights{W}, p::Number) = quantile(v, w, [p])[1] """ wquantile(v, w, p) Compute the `p`th quantile(s) of `v` with weights `w`, given as either a vector -or a `WeightVec`. +or an `AbstractWeights` vector. """ -wquantile{W <: Real}(v::RealVector, w::WeightVec{W}, p::RealVector) = quantile(v, w, p) -wquantile{W <: Real}(v::RealVector, w::WeightVec{W}, p::Number) = quantile(v, w, [p])[1] +wquantile{W <: Real}(v::RealVector, w::AbstractWeights{W}, p::RealVector) = quantile(v, w, p) +wquantile{W <: Real}(v::RealVector, w::AbstractWeights{W}, p::Number) = quantile(v, w, [p])[1] wquantile(v::RealVector, w::RealVector, p::RealVector) = quantile(v, weights(w), p) wquantile(v::RealVector, w::RealVector, p::Number) = quantile(v, weights(w), [p])[1] diff --git a/test/cov.jl b/test/cov.jl index de02651ea..bb2881917 100644 --- a/test/cov.jl +++ b/test/cov.jl @@ -1,149 +1,158 @@ using StatsBase using Base.Test -X = randn(3, 8) +@testset "StatsBase.Covariance" begin +weight_funcs = (weights, aweights, fweights, pweights) -Z1 = X .- mean(X, 1) -Z2 = X .- mean(X, 2) +@testset "$f" for f in weight_funcs + X = randn(3, 8) -w1 = rand(3) -w2 = rand(8) + Z1 = X .- mean(X, 1) + Z2 = X .- mean(X, 2) -wv1 = weights(w1) -wv2 = weights(w2) + w1 = rand(3) + w2 = rand(8) -Z1w = X .- mean(X, wv1, 1) -Z2w = X .- mean(X, wv2, 2) + wv1 = f(w1) + wv2 = f(w2) -## reference results + Z1w = X .- mean(X, wv1, 1) + Z2w = X .- mean(X, wv2, 2) -S1 = Z1'Z1 -S2 = Z2 * Z2' + ## reference results -Sz1 = X'X -Sz2 = X * X' + S1 = Z1'Z1 + S2 = Z2 * Z2' -S1w = Z1w' * diagm(w1) * Z1w -S2w = Z2w * diagm(w2) * Z2w' + Sz1 = X'X + Sz2 = X * X' -Sz1w = X' * diagm(w1) * X -Sz2w = X * diagm(w2) * X' + S1w = Z1w' * diagm(w1) * Z1w + S2w = Z2w * diagm(w2) * Z2w' -## scattermat + Sz1w = X' * diagm(w1) * X + Sz2w = X * diagm(w2) * X' -if VERSION < v"0.5.0-dev+679" - @test scattermat(X) ≈ S1 - @test scattermat(X; vardim=2) ≈ S2 + @testset "Scattermat" begin + @test scattermat(X) ≈ S1 + @test scattermat(X, 2) ≈ S2 - @test scattermat(X; mean=0) ≈ Sz1 - @test scattermat(X; mean=0, vardim=2) ≈ Sz2 + @test StatsBase.scattermatm(X, 0) ≈ Sz1 + @test StatsBase.scattermatm(X, 0, 2) ≈ Sz2 - @test scattermat(X; mean=mean(X,1)) ≈ S1 - @test scattermat(X; mean=mean(X,2), vardim=2) ≈ S2 + @test StatsBase.scattermatm(X, mean(X,1)) ≈ S1 + @test StatsBase.scattermatm(X, mean(X,2), 2) ≈ S2 - @test scattermat(X; mean=zeros(1,8)) ≈ Sz1 - @test scattermat(X; mean=zeros(3), vardim=2) ≈ Sz2 + @test StatsBase.scattermatm(X, zeros(1,8)) ≈ Sz1 + @test StatsBase.scattermatm(X, zeros(3), 2) ≈ Sz2 - ## weighted scatter mat + @testset "Weighted" begin + @test scattermat(X, wv1) ≈ S1w + @test scattermat(X, wv2, 2) ≈ S2w - @test scattermat(X, wv1) ≈ S1w - @test scattermat(X, wv2; vardim=2) ≈ S2w + @test StatsBase.scattermatm(X, 0, wv1) ≈ Sz1w + @test StatsBase.scattermatm(X, 0, wv2, 2) ≈ Sz2w - @test scattermat(X, wv1; mean=0) ≈ Sz1w - @test scattermat(X, wv2; mean=0, vardim=2) ≈ Sz2w + @test StatsBase.scattermatm(X, mean(X, wv1, 1), wv1) ≈ S1w + @test StatsBase.scattermatm(X, mean(X, wv2, 2), wv2, 2) ≈ S2w - @test scattermat(X, wv1; mean=mean(X, wv1, 1)) ≈ S1w - @test scattermat(X, wv2; mean=mean(X, wv2, 2), vardim=2) ≈ S2w + @test StatsBase.scattermatm(X, zeros(1,8), wv1) ≈ Sz1w + @test StatsBase.scattermatm(X, zeros(3), wv2, 2) ≈ Sz2w + end + end - @test scattermat(X, wv1; mean=zeros(1,8)) ≈ Sz1w - @test scattermat(X, wv2; mean=zeros(3), vardim=2) ≈ Sz2w -else - @test scattermat(X) ≈ S1 - @test scattermat(X, 2) ≈ S2 + @testset "Uncorrected" begin + @testset "Weighted Covariance" begin + @test cov(X, wv1; corrected=false) ≈ S1w ./ sum(wv1) + @test cov(X, wv2, 2; corrected=false) ≈ S2w ./ sum(wv2) - @test StatsBase.scattermatm(X, 0) ≈ Sz1 - @test StatsBase.scattermatm(X, 0, 2) ≈ Sz2 + @test Base.covm(X, 0, wv1, 1; corrected=false) ≈ Sz1w ./ sum(wv1) + @test Base.covm(X, 0, wv2, 2; corrected=false) ≈ Sz2w ./ sum(wv2) - @test StatsBase.scattermatm(X, mean(X,1)) ≈ S1 - @test StatsBase.scattermatm(X, mean(X,2), 2) ≈ S2 + @test Base.covm(X, mean(X, wv1, 1), wv1, 1; corrected=false) ≈ S1w ./ sum(wv1) + @test Base.covm(X, mean(X, wv2, 2), wv2, 2; corrected=false) ≈ S2w ./ sum(wv2) - @test StatsBase.scattermatm(X, zeros(1,8)) ≈ Sz1 - @test StatsBase.scattermatm(X, zeros(3), 2) ≈ Sz2 + @test Base.covm(X, zeros(1,8), wv1, 1; corrected=false) ≈ Sz1w ./ sum(wv1) + @test Base.covm(X, zeros(3), wv2, 2; corrected=false) ≈ Sz2w ./ sum(wv2) + end - ## weighted scatter mat + @testset "Mean and covariance" begin + (m, C) = mean_and_cov(X; corrected=false) + @test m == mean(X, 1) + @test C == cov(X, 1, false) - @test scattermat(X, wv1) ≈ S1w - @test scattermat(X, wv2, 2) ≈ S2w + (m, C) = mean_and_cov(X, 1; corrected=false) + @test m == mean(X, 1) + @test C == cov(X, 1, false) - @test StatsBase.scattermatm(X, 0, wv1) ≈ Sz1w - @test StatsBase.scattermatm(X, 0, wv2, 2) ≈ Sz2w + (m, C) = mean_and_cov(X, 2; corrected=false) + @test m == mean(X, 2) + @test C == cov(X, 2, false) - @test StatsBase.scattermatm(X, mean(X, wv1, 1), wv1) ≈ S1w - @test StatsBase.scattermatm(X, mean(X, wv2, 2), wv2, 2) ≈ S2w + (m, C) = mean_and_cov(X, wv1; corrected=false) + @test m == mean(X, wv1, 1) + @test C == cov(X, wv1, 1; corrected=false) - @test StatsBase.scattermatm(X, zeros(1,8), wv1) ≈ Sz1w - @test StatsBase.scattermatm(X, zeros(3), wv2, 2) ≈ Sz2w -end - -# weighted covariance - -if VERSION < v"0.5.0-dev+679" - @test cov(X, wv1) ≈ S1w ./ sum(wv1) - @test cov(X, wv2; vardim=2) ≈ S2w ./ sum(wv2) - - @test cov(X, wv1; mean=0) ≈ Sz1w ./ sum(wv1) - @test cov(X, wv2; mean=0, vardim=2) ≈ Sz2w ./ sum(wv2) - - @test cov(X, wv1; mean=mean(X, wv1, 1)) ≈ S1w ./ sum(wv1) - @test cov(X, wv2; mean=mean(X, wv2, 2), vardim=2) ≈ S2w ./ sum(wv2) + (m, C) = mean_and_cov(X, wv1, 1; corrected=false) + @test m == mean(X, wv1, 1) + @test C == cov(X, wv1, 1; corrected=false) - @test cov(X, wv1; mean=zeros(1,8)) ≈ Sz1w ./ sum(wv1) - @test cov(X, wv2; mean=zeros(3), vardim=2) ≈ Sz2w ./ sum(wv2) -else - @test cov(X, wv1) ≈ S1w ./ sum(wv1) - @test cov(X, wv2, 2) ≈ S2w ./ sum(wv2) - - @test Base.covm(X, 0, wv1) ≈ Sz1w ./ sum(wv1) - @test Base.covm(X, 0, wv2, 2) ≈ Sz2w ./ sum(wv2) - - @test Base.covm(X, mean(X, wv1, 1), wv1) ≈ S1w ./ sum(wv1) - @test Base.covm(X, mean(X, wv2, 2), wv2, 2) ≈ S2w ./ sum(wv2) - - @test Base.covm(X, zeros(1,8), wv1) ≈ Sz1w ./ sum(wv1) - @test Base.covm(X, zeros(3), wv2, 2) ≈ Sz2w ./ sum(wv2) -end + (m, C) = mean_and_cov(X, wv2, 2; corrected=false) + @test m == mean(X, wv2, 2) + @test C == cov(X, wv2, 2; corrected=false) + end + end -# mean_and_cov -if VERSION < v"0.5.0-dev+679" - (m, C) = mean_and_cov(X; vardim=1) - @test m == mean(X, 1) - @test C == cov(X; vardim=1) - - (m, C) = mean_and_cov(X; vardim=2) - @test m == mean(X, 2) - @test C == cov(X; vardim=2) - - (m, C) = mean_and_cov(X, wv1; vardim=1) - @test m == mean(X, wv1, 1) - @test C == cov(X, wv1; vardim=1) - - (m, C) = mean_and_cov(X, wv2; vardim=2) - @test m == mean(X, wv2, 2) - @test C == cov(X, wv2; vardim=2) -else - (m, C) = mean_and_cov(X, 1) - @test m == mean(X, 1) - @test C == cov(X, 1) - - (m, C) = mean_and_cov(X, 2) - @test m == mean(X, 2) - @test C == cov(X, 2) - - (m, C) = mean_and_cov(X, wv1, 1) - @test m == mean(X, wv1, 1) - @test C == cov(X, wv1, 1) - - (m, C) = mean_and_cov(X, wv2, 2) - @test m == mean(X, wv2, 2) - @test C == cov(X, wv2, 2) + @testset "Corrected" begin + @testset "Weighted Covariance" begin + if isa(wv1, Weights) + @test_throws ArgumentError cov(X, wv1; corrected=true) + else + var_corr1 = StatsBase.varcorrection(wv1, true) + var_corr2 = StatsBase.varcorrection(wv2, true) + + @test cov(X, wv1; corrected=true) ≈ S1w .* var_corr1 + @test cov(X, wv2, 2; corrected=true) ≈ S2w .* var_corr2 + + @test Base.covm(X, 0, wv1, 1; corrected=true) ≈ Sz1w .* var_corr1 + @test Base.covm(X, 0, wv2, 2; corrected=true) ≈ Sz2w .* var_corr2 + + @test Base.covm(X, mean(X, wv1, 1), wv1, 1; corrected=true) ≈ S1w .* var_corr1 + @test Base.covm(X, mean(X, wv2, 2), wv2, 2; corrected=true) ≈ S2w .* var_corr2 + + @test Base.covm(X, zeros(1,8), wv1, 1; corrected=true) ≈ Sz1w .* var_corr1 + @test Base.covm(X, zeros(3), wv2, 2; corrected=true) ≈ Sz2w .* var_corr2 + end + end + @testset "Mean and covariance" begin + (m, C) = mean_and_cov(X; corrected=true) + @test m == mean(X, 1) + @test C == cov(X, 1, true) + + (m, C) = mean_and_cov(X, 1; corrected=true) + @test m == mean(X, 1) + @test C == cov(X, 1, true) + + (m, C) = mean_and_cov(X, 2; corrected=true) + @test m == mean(X, 2) + @test C == cov(X, 2, true) + + if isa(wv1, Weights) + @test_throws ArgumentError mean_and_cov(X, wv1; corrected=true) + else + (m, C) = mean_and_cov(X, wv1; corrected=true) + @test m == mean(X, wv1, 1) + @test C == cov(X, wv1, 1; corrected=true) + + (m, C) = mean_and_cov(X, wv1, 1; corrected=true) + @test m == mean(X, wv1, 1) + @test C == cov(X, wv1, 1; corrected=true) + + (m, C) = mean_and_cov(X, wv2, 2; corrected=true) + @test m == mean(X, wv2, 2) + @test C == cov(X, wv2, 2; corrected=true) + end + end + end end +end # @testset "StatsBase.Covariance" diff --git a/test/moments.jl b/test/moments.jl index 014a3a1a5..ca1794198 100644 --- a/test/moments.jl +++ b/test/moments.jl @@ -1,122 +1,281 @@ using StatsBase using Base.Test -##### weighted var & std - -x = rand(10) -wv = weights(rand(10)) -m = mean(x, wv) - -@test var(x, wv) ≈ sum(abs2.(x .- m), wv) ./ sum(wv) -@test var(x, wv; mean=0) ≈ sum(abs2.(x), wv) ./ sum(wv) -@test var(x, wv; mean=1.0) ≈ sum(abs2.(x .- 1.0), wv) ./ sum(wv) +@testset "StatsBase.Moments" begin +weight_funcs = (weights, aweights, fweights, pweights) -@test std(x, wv) ≈ sqrt(var(x, wv)) -@test std(x, wv; mean=0) ≈ sqrt(var(x, wv; mean=0)) -@test std(x, wv; mean=1.0) ≈ sqrt(var(x, wv; mean=1.0)) - -(m, v) = mean_and_var(x) -@test m == mean(x) -@test v == var(x) - -(m, s) = mean_and_std(x) -@test m == mean(x) -@test s == std(x) +##### weighted var & std -(m, v) = mean_and_var(x, wv) -@test m == mean(x, wv) -@test v == var(x, wv) +x = [0.57, 0.10, 0.91, 0.72, 0.46, 0.0] +w = [3.84, 2.70, 8.29, 8.91, 9.71, 0.0] + +@testset "Uncorrected with $f" for f in weight_funcs + wv = f(w) + m = mean(x, wv) + + # expected uncorrected output + expected_var = sum(abs2.(x .- m), wv) / sum(wv) + expected_std = sqrt(expected_var) + + @testset "Variance" begin + @test var(x, wv; corrected=false) ≈ expected_var + @test var(x, wv; mean=m, corrected=false) ≈ expected_var + end + + @testset "Standard Deviation" begin + @test std(x, wv; corrected=false) ≈ expected_std + @test std(x, wv; mean=m, corrected=false) ≈ expected_std + end + + @testset "Mean and Variance" begin + (m, v) = mean_and_var(x; corrected=false) + @test m == mean(x) + @test v == var(x; corrected=corrected=false) + + (m, v) = mean_and_var(x, wv; corrected=false) + @test m == mean(x, wv) + @test v == var(x, wv; corrected=false) + end + + @testset "Mean and Standard Deviation" begin + (m, s) = mean_and_std(x; corrected=false) + @test m == mean(x) + @test s == std(x; corrected=false) + + (m, s) = mean_and_std(x, wv; corrected=false) + @test m == mean(x, wv) + @test s == std(x, wv; corrected=false) + end +end -(m, s) = mean_and_std(x, wv) -@test m == mean(x, wv) -@test s == std(x, wv) +# expected corrected output for (weights, aweights, fweights, pweights) +expected_var = [NaN, 0.0694434191182236, 0.05466601256158146, 0.06628969012045285] +expected_std = sqrt(expected_var) + +@testset "Corrected with $(weight_funcs[i])" for i in eachindex(weight_funcs) + wv = weight_funcs[i](w) + m = mean(x, wv) + + @testset "Variance" begin + if isa(wv, Weights) + @test_throws ArgumentError var(x, wv; corrected=true) + else + @test var(x, wv; corrected=true) ≈ expected_var[i] + @test var(x, wv; mean=m, corrected=true) ≈ expected_var[i] + end + end + + @testset "Standard Deviation" begin + if isa(wv, Weights) + @test_throws ArgumentError std(x, wv; corrected=true) + else + @test std(x, wv; corrected=true) ≈ expected_std[i] + @test std(x, wv; mean=m, corrected=true) ≈ expected_std[i] + end + end + + @testset "Mean and Variance" begin + (m, v) = mean_and_var(x; corrected=true) + @test m == mean(x) + @test v == var(x; corrected=true) + + if isa(wv, Weights) + @test_throws ArgumentError mean_and_var(x, wv; corrected=true) + else + (m, v) = mean_and_var(x, wv; corrected=true) + @test m == mean(x, wv) + @test v == var(x, wv; corrected=true) + end + end + + @testset "Mean and Standard Deviation" begin + (m, s) = mean_and_std(x; corrected=true) + @test m == mean(x) + @test s == std(x; corrected=true) + + if isa(wv, Weights) + @test_throws ArgumentError mean_and_std(x, wv; corrected=true) + else + (m, s) = mean_and_std(x, wv; corrected=true) + @test m == mean(x, wv) + @test s == std(x, wv; corrected=true) + end + end +end x = rand(5, 6) w1 = rand(5) w2 = rand(6) -wv1 = weights(w1) -wv2 = weights(w2) -m1 = mean(x, wv1, 1) -m2 = mean(x, wv2, 2) - -@test var(x, wv1, 1; mean=0) ≈ sum(abs2.(x) .* w1, 1) ./ sum(wv1) -@test var(x, wv2, 2; mean=0) ≈ sum(abs2.(x) .* w2', 2) ./ sum(wv2) - -@test var(x, wv1, 1; mean=m1) ≈ sum(abs2.(x .- m1) .* w1, 1) ./ sum(wv1) -@test var(x, wv2, 2; mean=m2) ≈ sum(abs2.(x .- m2) .* w2', 2) ./ sum(wv2) - -@test var(x, wv1, 1) ≈ sum(abs2.(x .- m1) .* w1, 1) ./ sum(wv1) -@test var(x, wv2, 2) ≈ sum(abs2.(x .- m2) .* w2', 2) ./ sum(wv2) - -@test std(x, wv1, 1) ≈ sqrt.(var(x, wv1, 1)) -@test std(x, wv2, 2) ≈ sqrt.(var(x, wv2, 2)) -@test std(x, wv1, 1; mean=0) ≈ sqrt.(var(x, wv1, 1; mean=0)) -@test std(x, wv2, 2; mean=0) ≈ sqrt.(var(x, wv2, 2; mean=0)) -@test std(x, wv1, 1; mean=m1) ≈ sqrt.(var(x, wv1, 1; mean=m1)) -@test std(x, wv2, 2; mean=m2) ≈ sqrt.(var(x, wv2, 2; mean=m2)) - -for d in 1:2 - (m, v) = mean_and_var(x, d) - @test m == mean(x, d) - @test v == var(x, d) - - (m, s) = mean_and_std(x, d) - @test m == mean(x, d) - @test s == std(x, d) -end - -(m, v) = mean_and_var(x, wv1, 1) -@test m == mean(x, wv1, 1) -@test v == var(x, wv1, 1) - -(m, v) = mean_and_var(x, wv2, 2) -@test m == mean(x, wv2, 2) -@test v == var(x, wv2, 2) -(m, s) = mean_and_std(x, wv1, 1) -@test m == mean(x, wv1, 1) -@test s == std(x, wv1, 1) - -(m, s) = mean_and_std(x, wv2, 2) -@test m == mean(x, wv2, 2) -@test s == std(x, wv2, 2) - - - -##### skewness & kurtosis - -wv = weights(ones(5) * 2.0) - -@test skewness(1:5) ≈ 0.0 -@test skewness([1, 2, 3, 4, 5]) ≈ 0.0 -@test skewness([1, 2, 2, 2, 5]) ≈ 1.1731251294063556 -@test skewness([1, 4, 4, 4, 5]) ≈ -1.1731251294063556 +@testset "Uncorrected with $f" for f in weight_funcs + wv1 = f(w1) + wv2 = f(w2) + m1 = mean(x, wv1, 1) + m2 = mean(x, wv2, 2) + + expected_var1 = sum(abs2.(x .- m1) .* w1, 1) ./ sum(wv1) + expected_var2 = sum(abs2.(x .- m2) .* w2', 2) ./ sum(wv2) + expected_std1 = sqrt.(expected_var1) + expected_std2 = sqrt.(expected_var2) + + @testset "Variance" begin + @test var(x, wv1, 1; corrected=false) ≈ expected_var1 + @test var(x, wv2, 2; corrected=false) ≈ expected_var2 + @test var(x, wv1, 1; mean=m1, corrected=false) ≈ expected_var1 + @test var(x, wv2, 2; mean=m2, corrected=false) ≈ expected_var2 + end + + @testset "Standard Deviation" begin + @test std(x, wv1, 1; corrected=false) ≈ expected_std1 + @test std(x, wv2, 2; corrected=false) ≈ expected_std2 + @test std(x, wv1, 1; mean=m1, corrected=false) ≈ expected_std1 + @test std(x, wv2, 2; mean=m2, corrected=false) ≈ expected_std2 + end + + @testset "Mean and Variance" begin + for d in 1:2 + (m, v) = mean_and_var(x, d; corrected=false) + @test m == mean(x, d) + @test v == var(x, d; corrected=false) + end + + (m, v) = mean_and_var(x, wv1, 1; corrected=false) + @test m == mean(x, wv1, 1) + @test v == var(x, wv1, 1; corrected=false) + + (m, v) = mean_and_var(x, wv2, 2; corrected=false) + @test m == mean(x, wv2, 2) + @test v == var(x, wv2, 2; corrected=false) + end + + @testset "Mean and Standard Deviation" begin + for d in 1:2 + (m, s) = mean_and_std(x, d; corrected=false) + @test m == mean(x, d) + @test s == std(x, d; corrected=false) + end + + (m, s) = mean_and_std(x, wv1, 1; corrected=false) + @test m == mean(x, wv1, 1) + @test s == std(x, wv1, 1; corrected=false) + + (m, s) = mean_and_std(x, wv2, 2; corrected=false) + @test m == mean(x, wv2, 2) + @test s == std(x, wv2, 2; corrected=false) + end +end -@test skewness([1, 2, 2, 2, 5], wv) ≈ 1.1731251294063556 +@testset "Corrected with $f" for f in weight_funcs + wv1 = f(w1) + wv2 = f(w2) + m1 = mean(x, wv1, 1) + m2 = mean(x, wv2, 2) + + if !isa(wv1, Weights) + expected_var1 = sum(abs2.(x .- m1) .* w1, 1) .* StatsBase.varcorrection(wv1, true) + expected_var2 = sum(abs2.(x .- m2) .* w2', 2) .* StatsBase.varcorrection(wv2, true) + expected_std1 = sqrt.(expected_var1) + expected_std2 = sqrt.(expected_var2) + end + + @testset "Variance" begin + if isa(wv1, Weights) + @test_throws ArgumentError var(x, wv1, 1; corrected=true) + else + @test var(x, wv1, 1; corrected=true) ≈ expected_var1 + @test var(x, wv2, 2; corrected=true) ≈ expected_var2 + @test var(x, wv1, 1; mean=m1, corrected=true) ≈ expected_var1 + @test var(x, wv2, 2; mean=m2, corrected=true) ≈ expected_var2 + end + end + + @testset "Standard Deviation" begin + if isa(wv1, Weights) + @test_throws ArgumentError std(x, wv1, 1; corrected=true) + else + @test std(x, wv1, 1; corrected=true) ≈ expected_std1 + @test std(x, wv2, 2; corrected=true) ≈ expected_std2 + @test std(x, wv1, 1; mean=m1, corrected=true) ≈ expected_std1 + @test std(x, wv2, 2; mean=m2, corrected=true) ≈ expected_std2 + end + end + + @testset "Mean and Variance" begin + for d in 1:2 + (m, v) = mean_and_var(x, d; corrected=true) + @test m == mean(x, d) + @test v == var(x, d; corrected=true) + end + + if isa(wv1, Weights) + @test_throws ArgumentError mean_and_var(x, wv1, 1; corrected=true) + else + (m, v) = mean_and_var(x, wv1, 1; corrected=true) + @test m == mean(x, wv1, 1) + @test v == var(x, wv1, 1; corrected=true) + + (m, v) = mean_and_var(x, wv2, 2; corrected=true) + @test m == mean(x, wv2, 2) + @test v == var(x, wv2, 2; corrected=true) + end + end + + @testset "Mean and Standard Deviation" begin + for d in 1:2 + (m, s) = mean_and_std(x, d; corrected=true) + @test m == mean(x, d) + @test s == std(x, d; corrected=true) + end + + if isa(wv1, Weights) + @test_throws ArgumentError mean_and_std(x, wv1, 1; corrected=true) + else + (m, s) = mean_and_std(x, wv1, 1; corrected=true) + @test m == mean(x, wv1, 1) + @test s == std(x, wv1, 1; corrected=true) + + (m, s) = mean_and_std(x, wv2, 2; corrected=true) + @test m == mean(x, wv2, 2) + @test s == std(x, wv2, 2; corrected=true) + end + end +end -@test kurtosis(1:5) ≈ -1.3 -@test kurtosis([1, 2, 3, 4, 5]) ≈ -1.3 -@test kurtosis([1, 2, 3, 3, 2]) ≈ -1.1530612244897953 +@testset "Skewness and Kurtosis with $f" for f in weight_funcs + wv = f(ones(5) * 2.0) -@test kurtosis([1, 2, 3, 4, 5], wv) ≈ -1.3 + @test skewness(1:5) ≈ 0.0 + @test skewness([1, 2, 3, 4, 5]) ≈ 0.0 + @test skewness([1, 2, 2, 2, 5]) ≈ 1.1731251294063556 + @test skewness([1, 4, 4, 4, 5]) ≈ -1.1731251294063556 + @test skewness([1, 2, 2, 2, 5], wv) ≈ 1.1731251294063556 -##### general moments + @test kurtosis(1:5) ≈ -1.3 + @test kurtosis([1, 2, 3, 4, 5]) ≈ -1.3 + @test kurtosis([1, 2, 3, 3, 2]) ≈ -1.1530612244897953 -x = collect(2.0:8.0) -@test moment(x, 2) ≈ sum((x .- 5).^2) / length(x) -@test moment(x, 3) ≈ sum((x .- 5).^3) / length(x) -@test moment(x, 4) ≈ sum((x .- 5).^4) / length(x) -@test moment(x, 5) ≈ sum((x .- 5).^5) / length(x) + @test kurtosis([1, 2, 3, 4, 5], wv) ≈ -1.3 +end -@test moment(x, 2, 4.0) ≈ sum((x .- 4).^2) / length(x) -@test moment(x, 3, 4.0) ≈ sum((x .- 4).^3) / length(x) -@test moment(x, 4, 4.0) ≈ sum((x .- 4).^4) / length(x) -@test moment(x, 5, 4.0) ≈ sum((x .- 4).^5) / length(x) +@testset "General Moments with $f" for f in weight_funcs + x = collect(2.0:8.0) + @test moment(x, 2) ≈ sum((x .- 5).^2) / length(x) + @test moment(x, 3) ≈ sum((x .- 5).^3) / length(x) + @test moment(x, 4) ≈ sum((x .- 5).^4) / length(x) + @test moment(x, 5) ≈ sum((x .- 5).^5) / length(x) + + @test moment(x, 2, 4.0) ≈ sum((x .- 4).^2) / length(x) + @test moment(x, 3, 4.0) ≈ sum((x .- 4).^3) / length(x) + @test moment(x, 4, 4.0) ≈ sum((x .- 4).^4) / length(x) + @test moment(x, 5, 4.0) ≈ sum((x .- 4).^5) / length(x) + + w = f([1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0]) + x2 = collect(2.0:6.0) + @test moment(x, 2, w) ≈ sum((x2 .- 4).^2) / 5 + @test moment(x, 3, w) ≈ sum((x2 .- 4).^3) / 5 + @test moment(x, 4, w) ≈ sum((x2 .- 4).^4) / 5 + @test moment(x, 5, w) ≈ sum((x2 .- 4).^5) / 5 +end -w = weights([1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0]) -x2 = collect(2.0:6.0) -@test moment(x, 2, w) ≈ sum((x2 .- 4).^2) / 5 -@test moment(x, 3, w) ≈ sum((x2 .- 4).^3) / 5 -@test moment(x, 4, w) ≈ sum((x2 .- 4).^4) / 5 -@test moment(x, 5, w) ≈ sum((x2 .- 4).^5) / 5 +end # @testset "StatsBase.Moments" diff --git a/test/sampling.jl b/test/sampling.jl index b0e9ce8cf..453996f46 100644 --- a/test/sampling.jl +++ b/test/sampling.jl @@ -187,7 +187,7 @@ check_sample_norep(a, (3, 12), 0; ordered=true) # test of weighted sampling without replacement a = [1:10;] -wv = WeightVec([zeros(6); 1:4]) +wv = Weights([zeros(6); 1:4]) x = vcat([sample(a, wv, 1, replace=false) for j in 1:100000]...) @test minimum(x) == 7 @test maximum(x) == 10 @@ -206,6 +206,5 @@ x = vcat([sample(a, wv, 4, replace=false) for j in 1:10000]...) @test_throws DimensionMismatch sample(a, wv, 5, replace=false) -wv = WeightVec([zeros(5); 1:4; -1]) +wv = Weights([zeros(5); 1:4; -1]) @test_throws ErrorException sample(a, wv, 1, replace=false) - diff --git a/test/weights.jl b/test/weights.jl index f712dc01f..da590d4f1 100644 --- a/test/weights.jl +++ b/test/weights.jl @@ -3,39 +3,42 @@ using Base.Test using Compat import Compat: view -@test isa(weights([1, 2, 3]), WeightVec{Int}) -@test isa(weights([1., 2., 3.]), WeightVec{Float64}) -@test isa(weights([1 2 3; 4 5 6]), WeightVec{Int}) - -@test isa(WeightVec([1, 2, 3], 6), WeightVec{Int}) - -@test isempty(weights(Float64[])) -@test size(weights([1, 2, 3])) == (3,) - -w = [1., 2., 3.] -wv = weights(w) -@test eltype(wv) === Float64 -@test length(wv) === 3 -@test values(wv) === w -@test sum(wv) === 6.0 -@test !isempty(wv) - -b = trues(3) -bv = weights(b) -@test eltype(bv) === Bool -@test length(bv) === 3 -@test values(bv) === b -@test sum(bv) === 3 -@test !isempty(bv) - -ba = BitArray([true, false, true]) -sa = sparsevec([1., 0., 2.]) - -@test sum(ba, wv) === 4.0 -@test sum(sa, wv) === 7.0 +@testset "StatsBase.Weights" begin +weight_funcs = (weights, aweights, fweights, pweights) + +# Construction +@testset "$f" for f in weight_funcs + @test isa(f([1, 2, 3]), AbstractWeights{Int}) + @test isa(f([1., 2., 3.]), AbstractWeights{Float64}) + @test isa(f([1 2 3; 4 5 6]), AbstractWeights{Int}) + + @test isempty(f(Float64[])) + @test size(f([1, 2, 3])) == (3,) + + w = [1., 2., 3.] + wv = f(w) + @test eltype(wv) === Float64 + @test length(wv) === 3 + @test values(wv) === w + @test sum(wv) === 6.0 + @test !isempty(wv) + + b = trues(3) + bv = f(b) + @test eltype(bv) === Bool + @test length(bv) === 3 + @test values(bv) === b + @test sum(bv) === 3 + @test !isempty(bv) + + ba = BitArray([true, false, true]) + sa = sparsevec([1., 0., 2.]) + + @test sum(ba, wv) === 4.0 + @test sum(sa, wv) === 7.0 +end ## wsum - x = [6., 8., 9.] w = [2., 3., 4.] p = [1. 2. ; 3. 4.] @@ -45,8 +48,7 @@ q = [1., 2., 3., 4.] @test wsum(x, w) === 72.0 @test wsum(p, q) === 29.0 -## wsum along dimensions - +## wsum along dimension @test wsum(x, w, 1) == [72.0] x = rand(6, 8) @@ -79,7 +81,6 @@ v = view(x, 2:4, :, :) @test wsum(v, w3, 3) ≈ sum(v .* reshape(w3, 1, 1, 4), 3) ## wsum for Arrays with non-BlasReal elements - x = rand(1:100, 6, 8) w1 = rand(6) w2 = rand(8) @@ -88,7 +89,6 @@ w2 = rand(8) @test wsum(x, w2, 2) ≈ sum(x .* w2', 2) ## wsum! - x = rand(6) w = rand(6) @@ -149,209 +149,218 @@ r = ones(8, 6) @test wsum!(r, x, w3, 3; init=false) === r @test r ≈ sum(x .* reshape(w3, (1, 1, 5)), 3) .+ 1.0 - ## the sum and mean syntax +a = reshape(1.0:27.0, 3, 3, 3) -@test sum([1.0, 2.0, 3.0], weights([1.0, 0.5, 0.5])) ≈ 3.5 -@test sum(1:3, weights([1.0, 1.0, 0.5])) ≈ 4.5 - -@test mean([1:3;], weights([1.0, 1.0, 0.5])) ≈ 1.8 -@test mean(1:3, weights([1.0, 1.0, 0.5])) ≈ 1.8 +@testset "Sum $f" for f in weight_funcs + @test sum([1.0, 2.0, 3.0], f([1.0, 0.5, 0.5])) ≈ 3.5 + @test sum(1:3, f([1.0, 1.0, 0.5])) ≈ 4.5 -a = reshape(1.0:27.0, 3, 3, 3) -for wt in ([1.0, 1.0, 1.0], [1.0, 0.2, 0.0], [0.2, 0.0, 1.0]) - @test sum(a, weights(wt), 1) ≈ sum(a.*reshape(wt, length(wt), 1, 1), 1) - @test sum(a, weights(wt), 2) ≈ sum(a.*reshape(wt, 1, length(wt), 1), 2) - @test sum(a, weights(wt), 3) ≈ sum(a.*reshape(wt, 1, 1, length(wt)), 3) - @test mean(a, weights(wt), 1) ≈ sum(a.*reshape(wt, length(wt), 1, 1), 1)/sum(wt) - @test mean(a, weights(wt), 2) ≈ sum(a.*reshape(wt, 1, length(wt), 1), 2)/sum(wt) - @test mean(a, weights(wt), 3) ≈ sum(a.*reshape(wt, 1, 1, length(wt)), 3)/sum(wt) - @test_throws ErrorException mean(a, weights(wt), 4) + for wt in ([1.0, 1.0, 1.0], [1.0, 0.2, 0.0], [0.2, 0.0, 1.0]) + @test sum(a, f(wt), 1) ≈ sum(a.*reshape(wt, length(wt), 1, 1), 1) + @test sum(a, f(wt), 2) ≈ sum(a.*reshape(wt, 1, length(wt), 1), 2) + @test sum(a, f(wt), 3) ≈ sum(a.*reshape(wt, 1, 1, length(wt)), 3) + end end -# Weighted median tests -data = ( - [7, 1, 2, 4, 10], - [7, 1, 2, 4, 10], - [7, 1, 2, 4, 10, 15], - [1, 2, 4, 7, 10, 15], - [0, 10, 20, 30], - [1, 2, 3, 4, 5], - [1, 2, 3, 4, 5], - [30, 40, 50, 60, 35], - [2, 0.6, 1.3, 0.3, 0.3, 1.7, 0.7, 1.7, 0.4], - [3.7, 3.3, 3.5, 2.8], - [100, 125, 123, 60, 45, 56, 66], - [2, 2, 2, 2, 2, 2], - [2.3], - [-2, -3, 1, 2, -10], - [1, 2, 3, 4, 5], - [5, 4, 3, 2, 1], - [-2, 2, -1, 3, 6], - [-10, 1, 1, -10, -10], - [2, 4], - [2, 2, 4, 4], - [2, 2, 2, 4] -) -wt = ( - [1, 1/3, 1/3, 1/3, 1], - [1, 1, 1, 1, 1], - [1, 1/3, 1/3, 1/3, 1, 1], - [1/3, 1/3, 1/3, 1, 1, 1], - [30, 191, 9, 0], - [10, 1, 1, 1, 9], - [10, 1, 1, 1, 900], - [1, 3, 5, 4, 2], - [2, 2, 0, 1, 2, 2, 1, 6, 0], - [5, 5, 4, 1], - [30, 56, 144, 24, 55, 43, 67], - [0.1, 0.2, 0.3, 0.4, 0.5, 0.6], - [12], - [7, 1, 1, 1, 6], - [1, 0, 0, 0, 2], - [1, 2, -3, 4, -5], - [0.1, 0.2, 0.3, -0.2, 0.1], - [-1, -1, -1, -1, 1], - [1, 1], - [1, 1, 1, 1], - [1, 1, 1, 1] -) -median_answers = (7.0, 4.0, 8.5, - 8.5, 10.0, 2.5, - 5.0, 50.0, 1.7, - 3.5, 100.0, 2.0, - 2.3, -2.0, 5.0, - 2.0, -1.0, -10.0, - 3.0, 3.0, 2.0) -num_tests = length(data) -for i = 1:num_tests - @test wmedian(data[i], wt[i]) == median_answers[i] - @test wmedian(data[i], weights(wt[i])) == median_answers[i] - @test median(data[i], weights(wt[i])) == median_answers[i] - for j = 1:100 - # Make sure the weighted median does not change if the data - # and weights are reordered. - reorder = sortperm(rand(length(data[i]))) - @test median(data[i][reorder], weights(wt[i][reorder])) == median_answers[i] +@testset "Mean $f" for f in weight_funcs + @test mean([1:3;], f([1.0, 1.0, 0.5])) ≈ 1.8 + @test mean(1:3, f([1.0, 1.0, 0.5])) ≈ 1.8 + + for wt in ([1.0, 1.0, 1.0], [1.0, 0.2, 0.0], [0.2, 0.0, 1.0]) + @test mean(a, f(wt), 1) ≈ sum(a.*reshape(wt, length(wt), 1, 1), 1)/sum(wt) + @test mean(a, f(wt), 2) ≈ sum(a.*reshape(wt, 1, length(wt), 1), 2)/sum(wt) + @test mean(a, f(wt), 3) ≈ sum(a.*reshape(wt, 1, 1, length(wt)), 3)/sum(wt) + @test_throws ErrorException mean(a, f(wt), 4) end end -data = [4, 3, 2, 1] -wt = [0, 0, 0, 0] -@test_throws MethodError wmedian(data[1]) -@test_throws ErrorException median(data, weights(wt)) -@test_throws ErrorException wmedian(data, wt) -@test_throws ErrorException median((Float64)[], weights((Float64)[])) -wt = [1, 2, 3, 4, 5] -@test_throws ErrorException median(data, weights(wt)) -@test_throws MethodError median([4 3 2 1 0], weights(wt)) -@test_throws MethodError median([[1 2];[4 5];[7 8];[10 11];[13 14]], weights(wt)) -data = [1, 3, 2, NaN, 2] -@test isnan(median(data, weights(wt))) -wt = [1, 2, NaN, 4, 5] -@test_throws ErrorException median(data, weights(wt)) -data = [1, 3, 2, 1, 2] -@test_throws ErrorException median(data, weights(wt)) -wt = [-1, -1, -1, -1, -1] -@test_throws ErrorException median(data, weights(wt)) -wt = [-1, -1, -1, 0, 0] -@test_throws ErrorException median(data, weights(wt)) - - -# Weighted quantile tests -data = ( - [7, 1, 2, 4, 10], - [7, 1, 2, 4, 10], - [7, 1, 2, 4, 10, 15], - [1, 2, 4, 7, 10, 15], - [0, 10, 20, 30], - [1, 2, 3, 4, 5], - [1, 2, 3, 4, 5], - [30, 40, 50, 60, 35], - [2, 0.6, 1.3, 0.3, 0.3, 1.7, 0.7, 1.7], - [1, 2, 2], - [3.7, 3.3, 3.5, 2.8], - [100, 125, 123, 60, 45, 56, 66], - [2, 2, 2, 2, 2, 2], - [2.3], - [-2, -3, 1, 2, -10], - [1, 2, 3, 4, 5], - [5, 4, 3, 2, 1], - [-2, 2, -1, 3, 6], - [-10, 1, 1, -10, -10], -) -wt = ( - weights([1, 1/3, 1/3, 1/3, 1]), - weights([1, 1, 1, 1, 1]), - weights([1, 1/3, 1/3, 1/3, 1, 1]), - weights([1/3, 1/3, 1/3, 1, 1, 1]), - weights([30, 191, 9, 0]), - weights([10, 1, 1, 1, 9]), - weights([10, 1, 1, 1, 900]), - weights([1, 3, 5, 4, 2]), - weights([2, 2, 5, 1, 2, 2, 1, 6]), - weights([0.1, 0.1, 0.8]), - weights([5, 5, 4, 1]), - weights([30, 56, 144, 24, 55, 43, 67]), - weights([0.1, 0.2, 0.3, 0.4, 0.5, 0.6]), - weights([12]), - weights([7, 1, 1, 1, 6]), - weights([1, 0, 0, 0, 2]), - weights([1, 2, 3, 4, 5]), - weights([0.1, 0.2, 0.3, 0.2, 0.1]), - weights([1, 1, 1, 1, 1]), -) -quantile_answers = ( - [1.0,3.6000000000000005,6.181818181818182,8.2,10.0], - [1.0,2.0,4.0,7.0,10.0], - [1.0,4.75,8.0,10.833333333333334,15.0], - [1.0,4.75,8.0,10.833333333333334,15.0], - [0.0,6.1387900355871885,11.600000000000001,15.912500000000001,30.0], - [1.0,1.5365853658536586,2.5999999999999996,4.405405405405405,5.0], - [1.0,4.239377950569287,4.492918633712858,4.746459316856429,5.0], - [30.0,38.75,45.714285714285715,52.85714285714286,60.0], - [0.3,0.6903846153846154,1.484,1.7,2.0], - [1.0,2.0,2.0,2.0,2.0], - [2.8,3.3361111111111112,3.4611111111111112,3.581578947368421,3.7], - [45.0,59.88593155893536,100.08846153846153,118.62115384615385,125.0], - [2.0,2.0,2.0,2.0,2.0], - [2.3,2.3,2.3,2.3,2.3], - [-10.0,-5.52,-2.5882352941176467,-0.9411764705882351,2.0], - [1.0,1.75,4.25,4.625,5.0], - [1.0,1.625,2.3333333333333335,3.25,5.0], - [-2.0,-0.5384615384615388,1.5384615384615383,2.6999999999999997,6.0], - [-10.0,-10.0,-10.0,1.0,1.0] -) -p = [0.0, 0.25, 0.5, 0.75, 1.0] - -srand(10) -for i = 1:length(data) - @test quantile(data[i], wt[i], p) ≈ quantile_answers[i] - for j = 1:10 - # order of p does not matter - reorder = sortperm(rand(length(p))) - @test quantile(data[i], wt[i], p[reorder]) ≈ quantile_answers[i][reorder] - end - for j = 1:10 - # order of w does not matter - reorder = sortperm(rand(length(data[i]))) - @test quantile(data[i][reorder], weights(wt[i][reorder]), p) ≈ quantile_answers[i] + +@testset "Median $f" for f in weight_funcs + data = ( + [7, 1, 2, 4, 10], + [7, 1, 2, 4, 10], + [7, 1, 2, 4, 10, 15], + [1, 2, 4, 7, 10, 15], + [0, 10, 20, 30], + [1, 2, 3, 4, 5], + [1, 2, 3, 4, 5], + [30, 40, 50, 60, 35], + [2, 0.6, 1.3, 0.3, 0.3, 1.7, 0.7, 1.7, 0.4], + [3.7, 3.3, 3.5, 2.8], + [100, 125, 123, 60, 45, 56, 66], + [2, 2, 2, 2, 2, 2], + [2.3], + [-2, -3, 1, 2, -10], + [1, 2, 3, 4, 5], + [5, 4, 3, 2, 1], + [-2, 2, -1, 3, 6], + [-10, 1, 1, -10, -10], + [2, 4], + [2, 2, 4, 4], + [2, 2, 2, 4] + ) + wt = ( + [1, 1/3, 1/3, 1/3, 1], + [1, 1, 1, 1, 1], + [1, 1/3, 1/3, 1/3, 1, 1], + [1/3, 1/3, 1/3, 1, 1, 1], + [30, 191, 9, 0], + [10, 1, 1, 1, 9], + [10, 1, 1, 1, 900], + [1, 3, 5, 4, 2], + [2, 2, 0, 1, 2, 2, 1, 6, 0], + [5, 5, 4, 1], + [30, 56, 144, 24, 55, 43, 67], + [0.1, 0.2, 0.3, 0.4, 0.5, 0.6], + [12], + [7, 1, 1, 1, 6], + [1, 0, 0, 0, 2], + [1, 2, -3, 4, -5], + [0.1, 0.2, 0.3, -0.2, 0.1], + [-1, -1, -1, -1, 1], + [1, 1], + [1, 1, 1, 1], + [1, 1, 1, 1] + ) + median_answers = (7.0, 4.0, 8.5, + 8.5, 10.0, 2.5, + 5.0, 50.0, 1.7, + 3.5, 100.0, 2.0, + 2.3, -2.0, 5.0, + 2.0, -1.0, -10.0, + 3.0, 3.0, 2.0) + num_tests = length(data) + for i = 1:num_tests + @test wmedian(data[i], wt[i]) == median_answers[i] + @test wmedian(data[i], f(wt[i])) == median_answers[i] + @test median(data[i], f(wt[i])) == median_answers[i] + for j = 1:100 + # Make sure the weighted median does not change if the data + # and weights are reordered. + reorder = sortperm(rand(length(data[i]))) + @test median(data[i][reorder], f(wt[i][reorder])) == median_answers[i] + end end + data = [4, 3, 2, 1] + wt = [0, 0, 0, 0] + @test_throws MethodError wmedian(data[1]) + @test_throws ErrorException median(data, f(wt)) + @test_throws ErrorException wmedian(data, wt) + @test_throws ErrorException median((Float64)[], f((Float64)[])) + wt = [1, 2, 3, 4, 5] + @test_throws ErrorException median(data, f(wt)) + @test_throws MethodError median([4 3 2 1 0], f(wt)) + @test_throws MethodError median([[1 2];[4 5];[7 8];[10 11];[13 14]], f(wt)) + data = [1, 3, 2, NaN, 2] + @test isnan(median(data, f(wt))) + wt = [1, 2, NaN, 4, 5] + @test_throws ErrorException median(data, f(wt)) + data = [1, 3, 2, 1, 2] + @test_throws ErrorException median(data, f(wt)) + wt = [-1, -1, -1, -1, -1] + @test_throws ErrorException median(data, f(wt)) + wt = [-1, -1, -1, 0, 0] + @test_throws ErrorException median(data, f(wt)) end -# w = 1 corresponds to base quantile -for i = 1:length(data) - @test quantile(data[i], weights(ones(Int64, length(data[i]))), p) ≈ quantile(data[i], p) - for j = 1:10 - prandom = rand(4) - @test quantile(data[i], weights(ones(Int64, length(data[i]))), prandom) ≈ quantile(data[i], prandom) + +@testset "Quantile $f" for f in weight_funcs + data = ( + [7, 1, 2, 4, 10], + [7, 1, 2, 4, 10], + [7, 1, 2, 4, 10, 15], + [1, 2, 4, 7, 10, 15], + [0, 10, 20, 30], + [1, 2, 3, 4, 5], + [1, 2, 3, 4, 5], + [30, 40, 50, 60, 35], + [2, 0.6, 1.3, 0.3, 0.3, 1.7, 0.7, 1.7], + [1, 2, 2], + [3.7, 3.3, 3.5, 2.8], + [100, 125, 123, 60, 45, 56, 66], + [2, 2, 2, 2, 2, 2], + [2.3], + [-2, -3, 1, 2, -10], + [1, 2, 3, 4, 5], + [5, 4, 3, 2, 1], + [-2, 2, -1, 3, 6], + [-10, 1, 1, -10, -10], + ) + wt = ( + f([1, 1/3, 1/3, 1/3, 1]), + f([1, 1, 1, 1, 1]), + f([1, 1/3, 1/3, 1/3, 1, 1]), + f([1/3, 1/3, 1/3, 1, 1, 1]), + f([30, 191, 9, 0]), + f([10, 1, 1, 1, 9]), + f([10, 1, 1, 1, 900]), + f([1, 3, 5, 4, 2]), + f([2, 2, 5, 1, 2, 2, 1, 6]), + f([0.1, 0.1, 0.8]), + f([5, 5, 4, 1]), + f([30, 56, 144, 24, 55, 43, 67]), + f([0.1, 0.2, 0.3, 0.4, 0.5, 0.6]), + f([12]), + f([7, 1, 1, 1, 6]), + f([1, 0, 0, 0, 2]), + f([1, 2, 3, 4, 5]), + f([0.1, 0.2, 0.3, 0.2, 0.1]), + f([1, 1, 1, 1, 1]), + ) + quantile_answers = ( + [1.0,3.6000000000000005,6.181818181818182,8.2,10.0], + [1.0,2.0,4.0,7.0,10.0], + [1.0,4.75,8.0,10.833333333333334,15.0], + [1.0,4.75,8.0,10.833333333333334,15.0], + [0.0,6.1387900355871885,11.600000000000001,15.912500000000001,30.0], + [1.0,1.5365853658536586,2.5999999999999996,4.405405405405405,5.0], + [1.0,4.239377950569287,4.492918633712858,4.746459316856429,5.0], + [30.0,38.75,45.714285714285715,52.85714285714286,60.0], + [0.3,0.6903846153846154,1.484,1.7,2.0], + [1.0,2.0,2.0,2.0,2.0], + [2.8,3.3361111111111112,3.4611111111111112,3.581578947368421,3.7], + [45.0,59.88593155893536,100.08846153846153,118.62115384615385,125.0], + [2.0,2.0,2.0,2.0,2.0], + [2.3,2.3,2.3,2.3,2.3], + [-10.0,-5.52,-2.5882352941176467,-0.9411764705882351,2.0], + [1.0,1.75,4.25,4.625,5.0], + [1.0,1.625,2.3333333333333335,3.25,5.0], + [-2.0,-0.5384615384615388,1.5384615384615383,2.6999999999999997,6.0], + [-10.0,-10.0,-10.0,1.0,1.0] + ) + p = [0.0, 0.25, 0.5, 0.75, 1.0] + + srand(10) + for i = 1:length(data) + @test quantile(data[i], wt[i], p) ≈ quantile_answers[i] + for j = 1:10 + # order of p does not matter + reorder = sortperm(rand(length(p))) + @test quantile(data[i], wt[i], p[reorder]) ≈ quantile_answers[i][reorder] + end + for j = 1:10 + # order of w does not matter + reorder = sortperm(rand(length(data[i]))) + @test quantile(data[i][reorder], f(wt[i][reorder]), p) ≈ quantile_answers[i] + end end + # w = 1 corresponds to base quantile + for i = 1:length(data) + @test quantile(data[i], f(ones(Int64, length(data[i]))), p) ≈ quantile(data[i], p) + for j = 1:10 + prandom = rand(4) + @test quantile(data[i], f(ones(Int64, length(data[i]))), prandom) ≈ quantile(data[i], prandom) + end + end + + # other syntaxes + v = [7, 1, 2, 4, 10] + w = [1, 1/3, 1/3, 1/3, 1] + answer = 6.181818181818182 + @test quantile(data[1], f(w), 0.5) ≈ answer + @test wquantile(data[1], f(w), [0.5]) ≈ [answer] + @test wquantile(data[1], f(w), 0.5) ≈ answer + @test wquantile(data[1], w, [0.5]) ≈ [answer] + @test wquantile(data[1], w, 0.5) ≈ answer end -# other syntaxes -v = [7, 1, 2, 4, 10] -w = [1, 1/3, 1/3, 1/3, 1] -answer = 6.181818181818182 -@test quantile(data[1], weights(w), 0.5) ≈ answer -@test wquantile(data[1], weights(w), [0.5]) ≈ [answer] -@test wquantile(data[1], weights(w), 0.5) ≈ answer -@test wquantile(data[1], w, [0.5]) ≈ [answer] -@test wquantile(data[1], w, 0.5) ≈ answer +end # @testset StatsBase.Weights diff --git a/test/wsampling.jl b/test/wsampling.jl index a0091382d..31012438a 100644 --- a/test/wsampling.jl +++ b/test/wsampling.jl @@ -7,7 +7,7 @@ srand(1234) #### weighted sample with replacement -function check_wsample_wrep(a::AbstractArray, vrgn, wv::WeightVec, ptol::Real; ordered::Bool=false) +function check_wsample_wrep(a::AbstractArray, vrgn, wv::AbstractWeights, ptol::Real; ordered::Bool=false) K = length(wv) (vmin, vmax) = vrgn (amin, amax) = extrema(a) @@ -53,7 +53,7 @@ check_wsample_wrep(a, (4, 7), wv, 5.0e-3; ordered=true) #### weighted sampling without replacement -function check_wsample_norep(a::AbstractArray, vrgn, wv::WeightVec, ptol::Real; ordered::Bool=false) +function check_wsample_norep(a::AbstractArray, vrgn, wv::AbstractWeights, ptol::Real; ordered::Bool=false) # each column of a for one run vmin, vmax = vrgn