feat: Add random state feature. #150
@@ -4,8 +4,79 @@
 class SNMFOptimizer:
-    def __init__(self, MM, Y0=None, X0=None, A=None, rho=1e12, eta=610, max_iter=500, tol=5e-7, components=None):
-        print("Initializing SNMF Optimizer")
+    def __init__(
+        self,
+        MM,
+        Y0=None,
+        X0=None,
+        A=None,
+        rho=1e12,
+        eta=610,
+        max_iter=500,
+        tol=5e-7,
+        n_components=None,
+        random_state=None,
+    ):
+        """Run sNMF based on an ndarray, parameters, and either a number
+        of components or a set of initial guess matrices.
+
+        Currently, instantiating the SNMFOptimizer class runs all the analysis
+        immediately. The results can then be accessed as instance attributes
+        of the class (X, Y, and A). Eventually, this will be changed so that
+        __init__ only prepares for the optimization, which can then be run
+        using fit_transform.

Reviewer comment (sbillinge): This is fantastic! Thanks for this. Please see here for our docstring standards; I am not sure if you looked at it. For classes it is a bit tricky, because we must decide what info we put in the "Class" docstring and what info we put in the "constructor" (i.e., the __init__) docstring. By way of example, I would probably do it like this in this case. Put future development plans into issues, not in the docstring; just describe the current behavior. Try to keep it brief but highly informational. To conform to PEP8 standards I lower-cased the variables. I know they correspond to matrices, but we should decide which standard to break. The tie-breaker should probably be ... Conditions on instantiation will go in the constructor docstring. That one describes the __init__ method, so it should look more like a function docstring. It would look something like ...

Review comment: I think there was some text before about how ...

Review comment: Either Y0 or n_components needs to be provided. Currently, Y0.shape overrides n_components if both are provided, and an error is raised if neither is provided.
+        Parameters
+        ----------
+        MM: ndarray
+            A numpy array containing the data to be decomposed. Rows correspond
+            to different samples/angles, while columns correspond to different
+            conditions with different stretching. Currently, there is no option
+            to treat the first column (commonly containing 2theta angles, sample
+            index, etc.) differently, so if present it must be stripped in advance.
+        Y0: ndarray
+            A numpy array containing initial guesses for the component weights
+            at each stretching condition, with number of rows equal to the assumed
+            number of components and number of columns equal to the number of
+            conditions (same number of columns as MM). Must be provided if
+            n_components is not provided. Will override n_components if both are
+            provided.
+        X0: ndarray
+            A numpy array containing initial guesses for the intensities of each
+            component per row/sample/angle. Has rows equal to the rows of MM and
+            columns equal to n_components or the number of rows of Y0.
+        A: ndarray
+            A numpy array containing initial guesses for the stretching factor for
+            each component at each condition. Has number of rows equal to n_components
+            or the number of rows of Y0, and columns equal to the number of conditions
+            (columns of MM).
+        rho: float
+            A stretching factor that influences the decomposition. Zero corresponds to
+            no stretching present. Relatively insensitive and typically adjusted in
+            powers of 10.
+        eta: float
+            A sparsity factor that influences the decomposition. Should be set to zero
+            for non-sparse data such as PDF. Can be used to improve results for sparse
+            data such as XRD, but due to instability it should be used only after first
+            selecting the best value for rho.
+        max_iter: int
+            The maximum number of times to update each of A, X, and Y before stopping
+            the optimization.
+        tol: float
+            The minimum fractional improvement in the objective function required to
+            continue the optimization. Note that a minimum of 20 updates are run
+            before this parameter is checked.
+        n_components: int
+            The number of components to attempt to extract from MM. Note that this will
+            be overridden by Y0 if that is provided, but must be provided if no Y0 is
+            provided.
+        random_state: int
+            Used to set a reproducible seed for the initial matrices used in the
+            optimization. Due to the non-convex nature of the problem, results may vary
+            even with the same initial guesses, so this does not make the program
+            deterministic.
+        """
         self.MM = MM
         self.X0 = X0
         self.Y0 = Y0
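The Y0/n_components precedence described in the review thread (Y0.shape overrides n_components; neither being provided is an error) can be captured in a small standalone helper. This is an illustrative sketch, not code from the PR; the name `resolve_n_components` is hypothetical:

```python
import numpy as np


def resolve_n_components(Y0=None, n_components=None):
    """Return the number of components K, giving Y0 precedence.

    Hypothetical helper mirroring the constructor logic: Y0.shape[0]
    wins if Y0 is given, n_components is used otherwise, and providing
    neither raises a ValueError.
    """
    if Y0 is not None:
        return Y0.shape[0]  # Y0 overrides n_components when both are given
    if n_components is None:
        raise ValueError("Must provide either Y0 or n_components.")
    return n_components
```

For example, `resolve_n_components(Y0=np.zeros((3, 10)), n_components=5)` returns 3, because the shape of Y0 takes precedence.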
@@ -15,23 +86,22 @@ def __init__(self, MM, Y0=None, X0=None, A=None, rho=1e12, eta=610, max_iter=500
         # Capture matrix dimensions
         self.N, self.M = MM.shape
         self.num_updates = 0
+        self.rng = np.random.default_rng(random_state)

         if Y0 is None:
-            if components is None:
-                raise ValueError("Must provide either Y0 or a number of components.")
+            if n_components is None:
+                raise ValueError("Must provide either Y0 or n_components.")
             else:
-                self.K = components
-                self.Y0 = np.random.beta(a=2.5, b=1.5, size=(self.K, self.M))  # This is untested
+                self.K = n_components
+                self.Y0 = self.rng.beta(a=2.5, b=1.5, size=(self.K, self.M))
         else:
             self.K = Y0.shape[0]

         # Initialize A, X0 if not provided
         if self.A is None:
-            self.A = np.ones((self.K, self.M)) + np.random.randn(self.K, self.M) * 1e-3  # Small perturbation
+            self.A = np.ones((self.K, self.M)) + self.rng.normal(0, 1e-3, size=(self.K, self.M))
         if self.X0 is None:
-            self.X0 = np.random.rand(self.N, self.K)  # Ensures values in [0,1]
+            self.X0 = self.rng.random((self.N, self.K))

         # Initialize solution matrices to be iterated on
         self.X = np.maximum(0, self.X0)
         self.Y = np.maximum(0, self.Y0)
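To illustrate what routing all draws through a single `np.random.default_rng(random_state)` generator buys, here is a short standalone check using the same distributions as the diff (the shapes are arbitrary example values):

```python
import numpy as np

# Example dimensions only: K components, M conditions.
K, M = 2, 5

# Two generators seeded with the same random_state produce identical draws,
# which is what the new random_state argument enables for the initial matrices.
rng1 = np.random.default_rng(42)
rng2 = np.random.default_rng(42)

Y0_a = rng1.beta(a=2.5, b=1.5, size=(K, M))
Y0_b = rng2.beta(a=2.5, b=1.5, size=(K, M))
assert np.array_equal(Y0_a, Y0_b)  # same seed -> same initial guess for Y0

# The generators have advanced identically, so the next draws also match.
A_a = np.ones((K, M)) + rng1.normal(0, 1e-3, size=(K, M))
A_b = np.ones((K, M)) + rng2.normal(0, 1e-3, size=(K, M))
assert np.array_equal(A_a, A_b)
```

With `random_state=None`, `default_rng` seeds from the OS entropy pool, so the initial matrices generally differ from run to run, matching the previous `np.random.*` behavior.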
Reviewer comment (sbillinge): Looking at this, it seems that we have already instantiated some kind of SNMF class, and then this is doing the optimization. Is there a particular reason why we make this a class and not a function? It feels much more like a function to me. Could you think about what the downsides are of making it a function? Is scikit-learn doing something like this too?

Author reply (john-halloran): What scikit-learn has is an NMF class which instantiates, and then has the standard set of model class members like fit_transform and such. But to actually "do the math", NMF internally makes a call to non_negative_factorization. So the math should probably be a function, but unless (and until) we are literally merging this into scikit-learn, it's worth having a class. Instantiating the class should not be running the math as it does currently, though. That's going to be the next thing I tackle, but this PR is getting a little long, so I'm holding off.

Reviewer comment (sbillinge): I don't see the value in having this as a class, to be honest. Can you justify why it is worth having a class for this, when we have already instantiated a higher-level class (I can't easily see in the browser what this class is called) as snmf_class?
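The split described in this thread (a thin class API that delegates the math to a function, as scikit-learn's NMF does with non_negative_factorization) might look like the following minimal sketch. All names and the placeholder "math" are illustrative assumptions, not the actual PR code:

```python
import numpy as np


def snmf_factorize(MM, n_components, rng):
    """Stand-in for the math function; returns placeholder factors.

    In the scikit-learn layout all numerical work lives in a function
    like this, so it can be called directly or via the class wrapper.
    """
    N, M = MM.shape
    X = rng.random((N, n_components))       # placeholder for real optimization
    Y = rng.random((n_components, M))
    return X, Y


class SNMFOptimizerSketch:
    """Thin wrapper: __init__ stores configuration, fit_transform runs the math."""

    def __init__(self, n_components, random_state=None):
        self.n_components = n_components
        self.rng = np.random.default_rng(random_state)

    def fit_transform(self, MM):
        # Delegate to the function, then expose results as attributes.
        self.X, self.Y = snmf_factorize(MM, self.n_components, self.rng)
        return self.X
```

Under this layout, instantiation is cheap and side-effect-free, and the expensive computation only happens on an explicit `fit_transform(MM)` call, which addresses the "instantiating should not run math" point above.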