Commit 8d99281

Update docs (#951)
* Use sources path attribute for docs/Project.toml
* Fix markdown lists in docstrings
* Add doctests
* Change to markdown admonition block
* Change preformat list to markdown list
* Add syntax highlight tags
* Center block-level LaTeX math
* Use markdown URL link syntax
* Use unicode characters in TeX sections: they're supported by KaTeX for web rendering, and take up less space when viewing the docstring in the REPL
* Fix up spacing in docstrings
* Add type header to docstrings
* Fix doctests in weights.md
1 parent 58780c9 commit 8d99281

13 files changed (+83 -63 lines)

docs/Project.toml

Lines changed: 3 additions & 0 deletions
@@ -5,3 +5,6 @@ StatsAPI = "82ae8749-77ed-4fe6-ae5f-f523153014b0"
 
 [compat]
 Documenter = "1"
+
+[sources.StatsBase]
+path = ".."

docs/make.jl

Lines changed: 2 additions & 0 deletions
@@ -5,6 +5,8 @@ if Base.HOME_PROJECT[] !== nothing
     Base.HOME_PROJECT[] = abspath(Base.HOME_PROJECT[])
 end
 
+DocMeta.setdocmeta!(StatsBase, :DocTestSetup, :(using StatsBase))
+
 makedocs(
     sitename = "StatsBase.jl",
     modules = [StatsBase, StatsAPI],
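
The `setdocmeta!` call is what lets the `jldoctest` blocks added elsewhere in this commit run without repeating `using StatsBase` in every example. A sketch of exercising those doctests outside a full docs build, assuming Documenter's exported `doctest` entry point:

```julia
# Run StatsBase's doctests standalone (illustration, not part of this commit);
# `recursive=true` applies the setup to submodules as well.
using Documenter, StatsBase
DocMeta.setdocmeta!(StatsBase, :DocTestSetup, :(using StatsBase); recursive=true)
doctest(StatsBase)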

docs/src/index.md

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ Pkg.add("StatsBase")
 ```
 
 To load the package, use the command:
-```
+```julia
 using StatsBase
 ```
 

docs/src/weights.md

Lines changed: 23 additions & 17 deletions
@@ -1,3 +1,8 @@
+```@meta
+DocTestSetup = quote
+    using StatsBase
+end
+```
 # Weight Vectors
 
 In statistical applications, it is not uncommon to assign weights to samples. To facilitate the use of weight vectors, we introduce the abstract type `AbstractWeights` for the purpose of representing weight vectors, which has two advantages:
@@ -68,40 +73,42 @@ weights to past observations.
 
 If `t` is a vector of temporal indices then for each index `i` we compute the weight as:
 
-``λ (1 - λ)^{1 - i}``
+```math
+λ (1 - λ)^{1 - i}
+```
 
 ``λ`` is a smoothing factor or rate parameter such that ``0 < λ ≤ 1``.
 As this value approaches 0, the resulting weights will be almost equal,
 while values closer to 1 will put greater weight on the tail elements of the vector.
 
 For example, the following call generates exponential weights for ten observations with ``λ = 0.3``.
-```julia-repl
+```jldoctest
 julia> eweights(1:10, 0.3)
-10-element Weights{Float64,Float64,Array{Float64,1}}:
+10-element Weights{Float64, Float64, Vector{Float64}}:
  0.3
  0.42857142857142855
  0.6122448979591837
  0.8746355685131197
  1.249479383590171
  1.7849705479859588
- 2.549957925694227
+ 2.5499579256942266
  3.642797036706039
  5.203995766722913
  7.434279666747019
 ```
 
 Simply passing the number of observations `n` is equivalent to passing in `1:n`.
 
-```julia-repl
+```jldoctest
 julia> eweights(10, 0.3)
-10-element Weights{Float64,Float64,Array{Float64,1}}:
+10-element Weights{Float64, Float64, Vector{Float64}}:
  0.3
  0.42857142857142855
  0.6122448979591837
  0.8746355685131197
  1.249479383590171
  1.7849705479859588
- 2.549957925694227
+ 2.5499579256942266
  3.642797036706039
  5.203995766722913
  7.434279666747019
@@ -117,25 +124,24 @@ julia> r
 2019-01-01T01:00:00:1 hour:2019-01-02T01:00:00
 
 julia> eweights(t, r, 0.3)
-3-element Weights{Float64,Float64,Array{Float64,1}}:
+3-element Weights{Float64, Float64, Vector{Float64}}:
  0.3
  0.6122448979591837
  1.249479383590171
 ```
 
-NOTE: This is equivalent to `eweights(something.(indexin(t, r)), 0.3)`, which is saying that for each value in `t` return the corresponding index for that value in `r`.
-Since `indexin` returns `nothing` if there is no corresponding value from `t` in `r` we use `something` to eliminate that possibility.
+!!! note
+    This is equivalent to `eweights(something.(indexin(t, r)), 0.3)`, which is saying that for each value in `t` return the corresponding index for that value in `r`.
+    Since `indexin` returns `nothing` if there is no corresponding value from `t` in `r` we use `something` to eliminate that possibility.
 
 ## Methods
 
 `AbstractWeights` implements the following methods:
-```
-eltype
-length
-isempty
-values
-sum
-```
+- `eltype`
+- `length`
+- `isempty`
+- `values`
+- `sum`
 
 The following constructors are provided:
 ```@docs
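
As a small sanity check on the documented formula (my own snippet, not part of the commit), the weights above can be reproduced by evaluating λ(1 - λ)^(1 - i) directly:

```julia
using StatsBase
λ = 0.3
w = [λ * (1 - λ)^(1 - i) for i in 1:10]   # the formula from the docs, spelled out
w ≈ eweights(1:10, λ)                     # should hold up to floating-point rounding
```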

src/cov.jl

Lines changed: 4 additions & 5 deletions
@@ -188,7 +188,8 @@ cov(ce::CovarianceEstimator, x::AbstractVector, y::AbstractVector) =
     error("cov is not defined for $(typeof(ce)), $(typeof(x)) and $(typeof(y))")
 
 """
-    cov(ce::CovarianceEstimator, X::AbstractMatrix, [w::AbstractWeights]; mean=nothing, dims::Int=1)
+    cov(ce::CovarianceEstimator, X::AbstractMatrix, [w::AbstractWeights];
+        mean=nothing, dims::Int=1)
 
 Compute the covariance matrix of the matrix `X` along dimension `dims`
 using estimator `ce`. A weighting vector `w` can be specified.
@@ -238,10 +239,8 @@ function cor(ce::CovarianceEstimator, x::AbstractVector, y::AbstractVector)
 end
 
 """
-    cor(
-        ce::CovarianceEstimator, X::AbstractMatrix, [w::AbstractWeights];
-        mean=nothing, dims::Int=1
-    )
+    cor(ce::CovarianceEstimator, X::AbstractMatrix, [w::AbstractWeights];
+        mean=nothing, dims::Int=1)
 
 Compute the correlation matrix of the matrix `X` along dimension `dims`
 using estimator `ce`. A weighting vector `w` can be specified.
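
For reference, a hedged sketch of the call shape these docstrings describe, using the `SimpleCovariance` estimator that StatsBase exports (the data and weights below are made up):

```julia
# Weighted covariance via the CovarianceEstimator interface (illustration only).
using StatsBase
X = randn(100, 3)                        # 100 observations of 3 variables
w = FrequencyWeights(rand(1:5, 100))     # made-up integer weights
C = cov(SimpleCovariance(), X, w; dims=1)   # 3×3 weighted covariance matrix
```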

src/deviation.jl

Lines changed: 1 addition & 1 deletion
@@ -90,7 +90,7 @@ end
     Linfdist(a, b)
 
 Compute the L∞ distance, also called the Chebyshev distance, between
-two arrays: ``\\max_{i\\in1:n} |a_i - b_i|``.
+two arrays: ``\\max_{1≤i≤n} |a_i - b_i|``.
 Efficient equivalent of `maxabs(a - b)`.
 """
 function Linfdist(a::AbstractArray{T}, b::AbstractArray{T}) where T<:Number
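
A quick illustration (not from the diff) of the quantity the docstring describes, the largest absolute componentwise difference:

```julia
using StatsBase
Linfdist([1, 2, 3], [2, 2, 5])   # max(|1-2|, |2-2|, |3-5|) == 2
```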

src/hist.jl

Lines changed: 5 additions & 4 deletions
@@ -155,15 +155,15 @@ closed: right
 isdensity: false
 ```
 ## Example illustrating `isdensity`
-```julia
+```jldoctest
 julia> using StatsBase, LinearAlgebra
 
 julia> bins = [0,1,7]; # a small and a large bin
 
 julia> obs = [0.5, 1.5, 1.5, 2.5]; # one observation in the small bin and three in the large
 
 julia> h = fit(Histogram, obs, bins)
-Histogram{Int64,1,Tuple{Array{Int64,1}}}
+Histogram{Int64, 1, Tuple{Vector{Int64}}}
 edges:
   [0, 1, 7]
 weights: [1, 3]
@@ -173,7 +173,7 @@ isdensity: false
 julia> # observe isdensity = false and the weights field records the number of observations in each bin
 
 julia> normalize(h, mode=:density)
-Histogram{Float64,1,Tuple{Array{Int64,1}}}
+Histogram{Float64, 1, Tuple{Vector{Int64}}}
 edges:
   [0, 1, 7]
 weights: [1.0, 0.5]
@@ -459,7 +459,8 @@ float(h::Histogram{T,N}) where {T,N} = Histogram(h.edges, float(h.weights), h.cl
 
 
 """
-    normalize!(h::Histogram{T,N}, aux_weights::Array{T,N}...; mode::Symbol=:pdf) where {T<:AbstractFloat,N}
+    normalize!(h::Histogram{T,N}, aux_weights::Array{T,N}...;
+               mode::Symbol=:pdf) where {T<:AbstractFloat,N}
 
 Normalize the histogram `h` and optionally scale one or more auxiliary weight
 arrays appropriately. See description of `normalize` for details. Returns `h`.
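
A hand-computed version of what `mode=:density` does for the example above (my arithmetic, not library code): each bin count is divided by the bin's width.

```julia
counts    = [1, 3]           # histogram weights from fit(Histogram, obs, bins)
binwidths = [1 - 0, 7 - 1]   # widths of the bins [0, 1) and [1, 7)
counts ./ binwidths          # [1.0, 0.5], matching normalize(h, mode=:density)
```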

src/reliability.jl

Lines changed: 3 additions & 3 deletions
@@ -19,11 +19,11 @@ Calculate Cronbach's alpha (1951) from a covariance matrix `covmatrix` according
 the [formula](https://en.wikipedia.org/wiki/Cronbach%27s_alpha):
 
 ```math
-\\rho = \\frac{k}{k-1} (1 - \\frac{\\sum^k_{i=1} \\sigma^2_i}{\\sum_{i=1}^k \\sum_{j=1}^k \\sigma_{ij}})
+ρ = \\frac{k}{k-1} \\left(1 - \\frac{\\sum^k_{i=1} σ^2_i}{\\sum_{i=1}^k \\sum_{j=1}^k σ_{ij}}\\right)
 ```
 
-where ``k`` is the number of items, i.e. columns, ``\\sigma_i^2`` the item variance,
-and ``\\sigma_{ij}`` the inter-item covariance.
+where ``k`` is the number of items, i.e. columns, ``σ_i^2`` the item variance,
+and ``σ_{ij}`` the inter-item covariance.
 
 Returns a `CronbachAlpha` object that holds:
 
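
As a sanity check on the formula (an illustration of mine, not from the commit), the same quantity can be computed directly from a made-up covariance matrix: the sum of item variances is the trace of the matrix, and the double sum is simply `sum(C)`. The result can then be compared against `cronbachalpha(C)`.

```julia
using LinearAlgebra
C = [1.0 0.6 0.5;
     0.6 1.0 0.4;
     0.5 0.4 1.0]                        # made-up 3-item covariance matrix
k = size(C, 1)
ρ = k / (k - 1) * (1 - tr(C) / sum(C))   # 1.5 * (1 - 3/6) = 0.75
```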

src/robust.jl

Lines changed: 4 additions & 4 deletions
@@ -41,9 +41,9 @@ To compute the trimmed mean of `x` use `mean(trim(x))`;
 to compute the variance use `trimvar(x)` (see [`trimvar`](@ref)).
 
 # Example
-```julia
+```jldoctest
 julia> collect(trim([5,2,4,3,1], prop=0.2))
-3-element Array{Int64,1}:
+3-element Vector{Int64}:
  2
  4
  3
@@ -80,9 +80,9 @@ elements equal the lower or upper bound.
 To compute the Winsorized mean of `x` use `mean(winsor(x))`.
 
 # Example
-```julia
+```jldoctest
 julia> collect(winsor([5,2,3,4,1], prop=0.2))
-5-element Array{Int64,1}:
+5-element Vector{Int64}:
  4
  2
  3
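
The second hunk cuts off after the third element of the output. Worked out by hand (my reconstruction, easy to verify in a REPL): with `prop=0.2` on five elements one value is clamped at each end, so the `5` becomes a `4` and the `1` becomes a `2`, and nothing is dropped.

```julia
using StatsBase
collect(winsor([5, 2, 3, 4, 1], prop=0.2))   # expected: [4, 2, 3, 4, 2]
```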

src/sampling.jl

Lines changed: 1 addition & 1 deletion
@@ -188,7 +188,7 @@ knuths_sample!(a::AbstractArray, x::AbstractArray; initshuffle::Bool=true) =
 Fisher-Yates shuffling (with early termination).
 
 Pseudo-code:
-```
+```julia
 n = length(a)
 k = length(x)
 
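
The hunk only shows the first lines of the pseudo-code. As a self-contained sketch of the scheme it names, Fisher-Yates with early termination (my own code under that assumption, not the library's implementation):

```julia
# Draw k elements from a without replacement by shuffling only the first k slots.
function fisher_yates_sample(a::AbstractVector, k::Integer)
    n = length(a)
    inds = collect(1:n)
    out = similar(a, k)
    for i in 1:k                      # early termination: stop after k swaps
        j = rand(i:n)
        inds[i], inds[j] = inds[j], inds[i]
        out[i] = a[inds[i]]
    end
    return out
end

fisher_yates_sample(collect(1:10), 3)   # e.g. [7, 2, 9]
```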

src/scalarstats.jl

Lines changed: 15 additions & 15 deletions
@@ -240,30 +240,30 @@ Let `count_less` be the number of elements of `itr` that are less than `value`,
 Then `method` supports the following definitions:
 
 - `:inc` (default): Return a value in the range 0 to 1 inclusive.
-Return `count_less / (n - 1)` if `value ∈ itr`, otherwise apply interpolation based on
-definition 7 of quantile in Hyndman and Fan (1996)
-(equivalent to Excel `PERCENTRANK` and `PERCENTRANK.INC`).
-This definition corresponds to the lower semi-continuous inverse of
-[`quantile`](@ref) with its default parameters.
+  Return `count_less / (n - 1)` if `value ∈ itr`, otherwise apply interpolation based on
+  definition 7 of quantile in Hyndman and Fan (1996)
+  (equivalent to Excel `PERCENTRANK` and `PERCENTRANK.INC`).
+  This definition corresponds to the lower semi-continuous inverse of
+  [`quantile`](@ref) with its default parameters.
 
 - `:exc`: Return a value in the range 0 to 1 exclusive.
-Return `(count_less + 1) / (n + 1)` if `value ∈ itr` otherwise apply interpolation
-based on definition 6 of quantile in Hyndman and Fan (1996)
-(equivalent to Excel `PERCENTRANK.EXC`).
+  Return `(count_less + 1) / (n + 1)` if `value ∈ itr` otherwise apply interpolation
+  based on definition 6 of quantile in Hyndman and Fan (1996)
+  (equivalent to Excel `PERCENTRANK.EXC`).
 
 - `:compete`: Return `count_less / (n - 1)` if `value ∈ itr`, otherwise
-return `(count_less - 1) / (n - 1)`, without interpolation
-(equivalent to MariaDB `PERCENT_RANK`, dplyr `percent_rank`).
+  return `(count_less - 1) / (n - 1)`, without interpolation
+  (equivalent to MariaDB `PERCENT_RANK`, dplyr `percent_rank`).
 
 - `:tied`: Return `(count_less + count_equal/2) / n`, without interpolation.
-Based on the definition in Roscoe, J. T. (1975)
-(equivalent to `"mean"` kind of SciPy `percentileofscore`).
+  Based on the definition in Roscoe, J. T. (1975)
+  (equivalent to `"mean"` kind of SciPy `percentileofscore`).
 
 - `:strict`: Return `count_less / n`, without interpolation
-(equivalent to `"strict"` kind of SciPy `percentileofscore`).
+  (equivalent to `"strict"` kind of SciPy `percentileofscore`).
 
 - `:weak`: Return `(count_less + count_equal) / n`, without interpolation
-(equivalent to `"weak"` kind of SciPy `percentileofscore`).
+  (equivalent to `"weak"` kind of SciPy `percentileofscore`).
 
 !!! note
     An `ArgumentError` is thrown if `itr` contains `NaN` or `missing` values
@@ -279,7 +279,7 @@ Hyndman, R.J and Fan, Y. (1996) "[Sample Quantiles in Statistical Packages]
 *The American Statistician*, Vol. 50, No. 4, pp. 361-365.
 
 # Examples
-```julia
+```julia-repl
 julia> using StatsBase
 
 julia> v1 = [1, 1, 1, 2, 3, 4, 8, 11, 12, 13];
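
To make the `:tied` definition above concrete, a worked check of my own using the example vector from this hunk, for `value = 2`:

```julia
v1 = [1, 1, 1, 2, 3, 4, 8, 11, 12, 13]
count_less  = count(<(2), v1)     # 3
count_equal = count(==(2), v1)    # 1
(count_less + count_equal / 2) / length(v1)   # (3 + 0.5) / 10 = 0.35
```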

src/transformations.jl

Lines changed: 7 additions & 3 deletions
@@ -47,6 +47,8 @@ reconstruct(t::AbstractDataTransform, y::AbstractVector{<:Real}) =
     vec(reconstruct(t, reshape(y, :, 1)))
 
 """
+    ZScoreTransform <: AbstractDataTransform
+
 Standardization (Z-score transformation)
 """
 struct ZScoreTransform{T<:Real, U<:AbstractVector{T}} <: AbstractDataTransform
@@ -201,6 +203,8 @@ function reconstruct!(x::AbstractMatrix{<:Real}, t::ZScoreTransform, y::Abstract
 end
 
 """
+    UnitRangeTransform <: AbstractDataTransform
+
 Unit range normalization
 """
 struct UnitRangeTransform{T<:Real, U<:AbstractVector} <: AbstractDataTransform
@@ -237,7 +241,7 @@ and return a `UnitRangeTransform` transformation object.
 # Keyword arguments
 
 * `dims`: if `1` fit standardization parameters in column-wise fashion;
-if `2` fit in row-wise fashion. The default is `nothing`.
+  if `2` fit in row-wise fashion. The default is `nothing`.
 
 * `unit`: if `true` (the default) shift the minimum data to zero.
 
@@ -341,8 +345,8 @@ end
 """
     standardize(DT, X; dims=nothing, kwargs...)
 
-Return a standardized copy of vector or matrix `X` along dimensions `dims`
-using transformation `DT` which is a subtype of `AbstractDataTransform`:
+Return a standardized copy of vector or matrix `X` along dimensions `dims`
+using transformation `DT` which is a subtype of `AbstractDataTransform`:
 
 - `ZScoreTransform`
 - `UnitRangeTransform`
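
A hedged usage sketch of the `standardize` entry point these docstrings describe (the data matrix is made up; with `dims=1` each column is treated as a variable):

```julia
using StatsBase
X = [1.0 2.0;
     3.0 4.0;
     5.0 6.0]
Z = standardize(ZScoreTransform, X; dims=1)    # columns rescaled to mean 0, std 1
U = standardize(UnitRangeTransform, X; dims=1) # columns rescaled to the unit range [0, 1]
```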

src/weights.jl

Lines changed: 14 additions & 9 deletions
@@ -230,11 +230,15 @@ If `n` is explicitly passed instead of `t`, `t` defaults to `1:n`.
 
 If `scale` is `true` then for each element `i` in `t` the weight value is computed as:
 
-``(1 - λ)^{n - i}``
+```math
+(1 - λ)^{n - i}
+```
 
 If `scale` is `false` then each value is computed as:
 
-``λ (1 - λ)^{1 - i}``
+```math
+λ (1 - λ)^{1 - i}
+```
 
 # Arguments
 
@@ -250,9 +254,9 @@ If `scale` is `false` then each value is computed as:
 - `scale::Bool`: Return the weights scaled to between 0 and 1 (default: false)
 
 # Examples
-```julia-repl
+```jldoctest
 julia> eweights(1:10, 0.3; scale=true)
-10-element Weights{Float64,Float64,Array{Float64,1}}:
+10-element Weights{Float64, Float64, Vector{Float64}}:
  0.04035360699999998
  0.05764800999999997
  0.08235429999999996
@@ -265,8 +269,8 @@ julia> eweights(1:10, 0.3; scale=true)
  1.0
 ```
 # Links
-- https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average
-- https://en.wikipedia.org/wiki/Exponential_smoothing
+- <https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average>
+- <https://en.wikipedia.org/wiki/Exponential_smoothing>
 """
 function eweights(t::AbstractArray{<:Integer}, λ::Real; kwargs...)
     isempty(t) && return Weights(copy(t), 0)
@@ -594,6 +598,7 @@ wsumtype(::Type{T}, ::Type{T}) where {T<:BlasReal} = T
     wsum!(R::AbstractArray, A::AbstractArray,
           w::AbstractVector, dim::Int;
          init::Bool=true)
+
 Compute the weighted sum of `A` with weights `w` over the dimension `dim` and store
 the result in `R`. If `init=false`, the sum is added to `R` rather than starting
 from zero.
@@ -705,11 +710,11 @@ With [`FrequencyWeights`](@ref), the function returns the same result as
 `quantile` for a vector with repeated values. Weights must be integers.
 
 With non `FrequencyWeights`, denote ``N`` the length of the vector, ``w`` the vector of weights,
-``h = p (\\sum_{i \\leq N} w_i - w_1) + w_1`` the cumulative weight corresponding to the
+``h = p (\\sum_{i ≤ N} w_i - w_1) + w_1`` the cumulative weight corresponding to the
 probability ``p`` and ``S_k = \\sum_{i \\leq k} w_i`` the cumulative weight for each
 observation, define ``v_{k+1}`` the smallest element of `v` such that ``S_{k+1}``
-is strictly superior to ``h``. The weighted ``p`` quantile is given by ``v_k + \\gamma (v_{k+1} - v_k)``
-with ``\\gamma = (h - S_k)/(S_{k+1} - S_k)``. In particular, when all weights are equal,
+is strictly superior to ``h``. The weighted ``p`` quantile is given by ``v_k + γ (v_{k+1} - v_k)``
+with ``γ = (h - S_k)/(S_{k+1} - S_k)``. In particular, when all weights are equal,
 the function returns the same result as the unweighted `quantile`.
 """
 function quantile(v::AbstractVector{V}, w::AbstractWeights{W}, p::AbstractVector{<:Real}) where {V,W<:Real}
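
A quick check (mine, not from the diff) of the docstring's closing claim that equal weights reproduce the unweighted quantile:

```julia
using StatsBase
v = [1.0, 3.0, 7.0, 10.0]
quantile(v, Weights(ones(4)), [0.25, 0.5, 0.75]) ≈ quantile(v, [0.25, 0.5, 0.75])  # true
```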
