# doc-cache created by Octave 11.3.0
# name: cache
# type: cell
# rows: 3
# columns: 140
# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
adtest


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 4625
 -- statistics: H = adtest (X)
 -- statistics: H = adtest (X, NAME, VALUE)
 -- statistics: [H, PVAL] = adtest (...)
 -- statistics: [H, PVAL, ADSTAT, CV] = adtest (...)

     Anderson-Darling goodness-of-fit hypothesis test.

     ‘H = adtest (X)’ returns a test decision for the null hypothesis that the
     data in vector X is from a population with a normal distribution, using the
     Anderson-Darling test.  The alternative hypothesis is that X is not from a
     population with a normal distribution.  The result H is 1 if the test
     rejects the null hypothesis at the 5% significance level, or 0 otherwise.

     ‘H = adtest (X, NAME, VALUE)’ returns a test decision for the
     Anderson-Darling test with additional options specified by one or more
     Name-Value pair arguments.  For example, you can specify a null
     distribution other than normal, or select an alternative method for
     calculating the p-value, such as a Monte Carlo simulation.

     The following parameters can be passed as Name-Value pair arguments.

     Name                 Description
     ----------------------------------------------------------------------------------
     "Distribution"       The distribution being tested for.  It tests whether X
                          could have come from the specified distribution.  There
                          are two choices available for passing distribution
                          parameters:

        • One of the following char strings: "norm", "exp", "ev", "logn",
          "weibull", for defining the 'normal', 'exponential', 'extreme value',
          'lognormal', or 'Weibull' distribution family, respectively.  In
          this case, X is tested against a composite hypothesis for the
          specified distribution family and the required distribution parameters
          are estimated from the data in X.  The default is "norm".

        • A cell array defining a distribution in which the first cell contains
          a char string with the distribution name, as mentioned above, and the
          subsequent cells contain all the parameters of the null distribution.
          In this case, X is tested against a simple hypothesis.

     "Alpha"              Significance level alpha for the test.  Any scalar numeric
                          value between 0 and 1.  The default is 0.05 corresponding
                          to the 5% significance level.
                          
     "MCTol"              Monte-Carlo standard error for the p-value, PVAL, which
                          must be a positive scalar value.  In this case, an
                          approximation for the p-value is computed directly, using
                          Monte-Carlo simulations.
                          
     "Asymptotic"         Method for calculating the p-value of the Anderson-Darling
                          test, which can be either a true or false logical value.  If
                          you specify 'true', adtest estimates the p-value using the
                          limiting distribution of the Anderson-Darling test
                          statistic.  If you specify 'false', adtest calculates the
                          p-value based on an analytical formula.  For sample sizes
                          greater than 120, the limiting distribution estimate is
                          likely to be more accurate than the small sample size
                          approximation method.

        • If you specify a distribution family with unknown parameters for the
          distribution Name-Value pair (i.e.  composite distribution hypothesis
          test), the "Asymptotic" option must be false.
        • If you use "MCTol" to calculate the p-value using a Monte Carlo
          simulation, the "Asymptotic" option must be false.

     ‘[H, PVAL] = adtest (...)’ also returns the p-value, PVAL, of the
     Anderson-Darling test, using any of the input arguments from the previous
     syntaxes.

     ‘[H, PVAL, ADSTAT, CV] = adtest (...)’ also returns the test statistic,
     ADSTAT, and the critical value, CV, for the Anderson-Darling test.

     The Anderson-Darling test statistic belongs to the family of Quadratic
     Empirical Distribution Function statistics, which are based on the weighted
     sum of the difference [Fn(x)-F(x)]^2 over the ordered sample values X1 < X2
     < ... < Xn, where F is the hypothesized continuous distribution and Fn is
     the empirical CDF based on the data sample with n sample points.
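
     The calling forms described above can be sketched as follows.  This is an
     illustrative example only: the data vector is made up, the statistics
     package is assumed to be installed, and no particular output values are
     implied.

```octave
pkg load statistics

# Illustrative data (made up for this sketch): 12 positive measurements.
x = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4, 5.6, 4.9, 5.2, 4.6];

# Default call: composite test against the normal family at the 5% level.
[h, pval, adstat, cv] = adtest (x);

# Composite test against the exponential family at the 1% level.
[h_exp, pval_exp] = adtest (x, "Distribution", "exp", "Alpha", 0.01);

printf ("normal:      h = %d, p = %.4f, A^2 = %.4f, cv = %.4f\n", ...
        h, pval, adstat, cv);
printf ("exponential: h = %d, p = %.4f\n", h_exp, pval_exp);
```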

     See also: kstest.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 49
Anderson-Darling goodness-of-fit hypothesis test.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
anova1


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2986
 -- statistics: P = anova1 (X)
 -- statistics: P = anova1 (X, GROUP)
 -- statistics: P = anova1 (X, GROUP, DISPLAYOPT)
 -- statistics: P = anova1 (X, GROUP, DISPLAYOPT, VARTYPE)
 -- statistics: [P, ATAB] = anova1 (X, ...)
 -- statistics: [P, ATAB, STATS] = anova1 (X, ...)

     Perform a one-way analysis of variance (ANOVA) for comparing the means of
     two or more groups of data under the null hypothesis that the groups are
     drawn from distributions with the same mean.  For planned contrasts and/or
     diagnostic plots, use anovan instead.

     anova1 can take up to four input arguments:

        • X contains the data and it can either be a vector or matrix.  If X is
          a matrix, then each column is treated as a separate group.  If X is a
          vector, then the GROUP argument is mandatory.

        • GROUP contains the names for each group.  If X is a matrix, then GROUP
          can either be a cell array of strings or a character array, with one
          row per column of X.  If you want to omit this argument, enter an
          empty array ([]).  If X is a vector, then GROUP must be a vector of
          the same length, or a string array or cell array of strings with one
          row for each element of X.  X values corresponding to the same value
          of GROUP are placed in the same group.

        • DISPLAYOPT is an optional parameter for displaying the groups
          contained in the data in a boxplot.  If omitted, it is 'on' by
          default.  If group names are defined in GROUP, these are used to
          identify the groups in the boxplot.  Use 'off' to omit displaying this
          figure.

        • VARTYPE is an optional parameter used to indicate whether the groups
          can be assumed to come from populations with equal variance.  When
          VARTYPE is "equal" the variances are assumed to be equal (this is the
          default).  When VARTYPE is "unequal" the population variances are not
          assumed to be equal and Welch's ANOVA test is used instead.

     anova1 can return up to three output arguments:

        • P is the p-value of the null hypothesis that all group means are
          equal.

        • ATAB is a cell array containing the results in a standard ANOVA table.

        • STATS is a structure containing statistics useful for performing a
          multiple comparison of means with the MULTCOMPARE function.

     If anova1 is called without any output arguments, then it prints the
     results in a one-way ANOVA table to the standard output.  It is also
     printed when DISPLAYOPT is 'on'.

     Examples:

          x = meshgrid (1:6);
          x = x + normrnd (0, 1, 6, 6);
          anova1 (x, [], 'off');
          [p, atab] = anova1 (x);

          x = ones (50, 4) .* [-2, 0, 1, 5];
          x = x + normrnd (0, 2, 50, 4);
          groups = {"A", "B", "C", "D"};
          anova1 (x, groups);

     See also: anova2, anovan, multcompare.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Perform a one-way analysis of variance (ANOVA) for comparing the means of two...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
anova2


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2465
 -- statistics: P = anova2 (X, REPS)
 -- statistics: P = anova2 (X, REPS, DISPLAYOPT)
 -- statistics: P = anova2 (X, REPS, DISPLAYOPT, MODEL)
 -- statistics: [P, ATAB] = anova2 (...)
 -- statistics: [P, ATAB, STATS] = anova2 (...)

     Performs two-way factorial (crossed) or a nested analysis of variance
     (ANOVA) for balanced designs.  For unbalanced factorial designs, diagnostic
     plots and/or planned contrasts, use anovan instead.

     anova2 requires two input arguments with an optional third and fourth:

        • X contains the data and it must be a matrix of at least two columns
          and two rows.

        • REPS is the number of replicates for each combination of factor
          groups.

        • DISPLAYOPT is an optional parameter for displaying the ANOVA table,
          when it is 'on' (default) and suppressing the display when it is
          'off'.

        • MODEL is an optional parameter to specify the model type as either:

             • "interaction" or "full" (default): compute both main effects and
               their interaction

             • "linear": compute both main effects without an interaction.  When
               REPS > 1 the test is suitable for a balanced randomized block
               design.  When REPS == 1, the test becomes a One-way Repeated
               Measures (RM)-ANOVA with Greenhouse-Geisser correction to the
               column factor degrees of freedom to make the test robust to
               violations of sphericity

             • "nested": treat the row factor as nested within columns.  Note
               that the row factor is considered a random factor in the
               calculation of the statistics.

     anova2 returns up to three output arguments:

        • P is the p-value of the null hypothesis that all group means are
          equal.

        • ATAB is a cell array containing the results in a standard ANOVA table.

        • STATS is a structure containing statistics useful for performing a
          multiple comparison of means with the MULTCOMPARE function.

     If anova2 is called without any output arguments, then it prints the
     results in a two-way ANOVA table to the standard output as if DISPLAYOPT is
     'on'.

     Examples:

          load popcorn;
          anova2 (popcorn, 3);

          [p, anovatab, stats] = anova2 (popcorn, 3, "off");
          disp (p);

     See also: anova1, anovan, multcompare.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Performs two-way factorial (crossed) or a nested analysis of variance (ANOVA)...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
anovan


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8513
 -- statistics: P = anovan (Y, GROUP)
 -- statistics: P = anovan (Y, GROUP, NAME, VALUE)
 -- statistics: [P, ATAB] = anovan (...)
 -- statistics: [P, ATAB, STATS] = anovan (...)
 -- statistics: [P, ATAB, STATS, TERMS] = anovan (...)

     Perform a multi (N)-way analysis of (co)variance (ANOVA or ANCOVA) to
     evaluate the effect of one or more categorical or continuous predictors
     (i.e.  independent variables) on a continuous outcome (i.e.  dependent
     variable).  The algorithms used make ‘anovan’ suitable for balanced or
     unbalanced factorial (crossed) designs.  By default, ‘anovan’ treats all
     factors as fixed.  Examples of function usage can be found by entering the
     command ‘demo anovan’.  A bootstrap resampling variant of this function,
     ‘bootlm’, is available in the statistics-resampling package and has similar
     usage.

     Data is a single vector Y with groups specified by a corresponding matrix
     or cell array of group labels GROUP, where each column of GROUP has the
     same number of rows as Y.  For example, if ‘Y = [23; 27; 31; 29; 30; 32];
     GROUP = [1, 2; 1, 3; 1, 2; 2, 3; 2, 3; 3, 2];’ then observation 23 was
     measured under conditions 1,2; observation 27 was measured under conditions
     1,3; and so on.  If the GROUP provided is empty, then the linear model is
     fit with just the intercept (no predictors).
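
     The data layout described above can be sketched as follows, using the Y
     and GROUP values from the text.  The statistics package is assumed to be
     installed, and the "display" option (described further down) is used only
     to keep the sketch quiet.

```octave
pkg load statistics

# Data and group labels from the paragraph above: one row of GROUP per
# observation in Y, one column per factor.
Y = [23; 27; 31; 29; 30; 32];
GROUP = [1, 2; 1, 3; 1, 2; 2, 3; 2, 3; 3, 2];

# Default main-effects ("linear") model; "display", "off" suppresses the
# printed tables and the diagnostic plots.
p = anovan (Y, GROUP, "display", "off");
disp (p);   # one p-value per factor
```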

     ‘anovan’ can take a number of optional parameters as name-value pairs.

     ‘[...] = anovan (Y, GROUP, "continuous", CONTINUOUS)’

        • CONTINUOUS is a vector of indices indicating which of the columns
          (i.e.  factors) in GROUP should be treated as continuous predictors
          rather than as categorical predictors.  The relationship between
          continuous predictors and the outcome should be linear.

     ‘[...] = anovan (Y, GROUP, "random", RANDOM)’

        • RANDOM is a vector of indices indicating which of the columns (i.e.
          factors) in GROUP should be treated as random effects rather than
          fixed effects.  Octave ‘anovan’ provides only basic support for random
          effects.  Specifically, since all F-statistics in ‘anovan’ are
          calculated using the mean-squared error (MSE), any interaction terms
          containing a random effect are dropped from the model term definitions
          and their associated variance is pooled with the residual, unexplained
          variance making up the MSE. In effect, the model then fitted equates
          to a linear mixed model with random intercept(s).  Variable names for
          random factors are appended with a ' symbol.

     ‘[...] = anovan (Y, GROUP, "model", MODELTYPE)’

        • MODELTYPE can be specified as one of the following:

             • "linear" (default) : compute N main effects with no interactions.

             • "interaction" : compute N main effects and N*(N-1)/2 two-factor
               interactions

             • "full" : compute the N main effects and interactions at all
               levels

             • a scalar integer : representing the maximum interaction order

             • a matrix of term definitions : each row is a term and each column
               is a factor

               -- Example:
               A two-way ANOVA with interaction would be: [1 0; 0 1; 1 1]
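
     A matrix of term definitions like the one above might be used as in the
     following sketch.  The balanced 2x2 data set is made up for illustration
     and the statistics package is assumed to be installed.

```octave
pkg load statistics

# Made-up balanced 2x2 design with two replicates per cell.
Y = [10; 12; 15; 17; 20; 21; 30; 33];
GROUP = [1, 1; 1, 1; 1, 2; 1, 2; 2, 1; 2, 1; 2, 2; 2, 2];

# Rows are terms, columns are factors: factor 1, factor 2, and their
# interaction.
terms = [1, 0; 0, 1; 1, 1];
p = anovan (Y, GROUP, "model", terms, "display", "off");
disp (p);   # one p-value per term
```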

     ‘[...] = anovan (Y, GROUP, "sstype", SSTYPE)’

        • SSTYPE can be specified as one of the following:

             • 1 : Type I sequential sums-of-squares.

             • 2 or "h" : Type II partially sequential (or hierarchical)
               sums-of-squares

             • 3 (default) : Type III partial, constrained or marginal
               sums-of-squares

     ‘[...] = anovan (Y, GROUP, "varnames", VARNAMES)’

        • VARNAMES must be a cell array of strings with each element containing
          a factor name for each column of GROUP.  By default (if not passed as
          an optional argument), VARNAMES are "X1","X2","X3", etc.

     ‘[...] = anovan (Y, GROUP, "alpha", ALPHA)’

        • ALPHA must be a scalar value between 0 and 1 requesting 100*(1-ALPHA)%
          confidence bounds for the regression coefficients returned in
          STATS.coeffs (default 0.05 for 95% confidence).

     ‘[...] = anovan (Y, GROUP, "display", DISPOPT)’

        • DISPOPT can be either "on" (default) or "off" and controls the display
          of the model formula, table of model parameters, the ANOVA table and
          the diagnostic plots.  The F-statistic and p-values are formatted in
          APA-style.  To avoid p-hacking, the table of model parameters is only
          displayed if we set planned contrasts (see below).

     ‘[...] = anovan (Y, GROUP, "contrasts", CONTRASTS)’

        • CONTRASTS can be specified as one of the following:

             • A string corresponding to one of the built-in contrasts listed
               below:

                  • "simple" or "anova" (default): Simple (ANOVA) contrast
                    coding.  (The first level appearing in the GROUP column is
                    the reference level)

                  • "poly": Polynomial contrast coding for trend analysis.

                  • "helmert": Helmert contrast coding: the difference between
                    each level and the mean of the subsequent levels.

                  • "effect": Deviation effect coding.  (The first level
                    appearing in the GROUP column is omitted).

                  • "sdif" or "sdiff": Successive differences contrast coding:
                    the difference between each level and the previous level.

                  • "treatment": Treatment contrast (or dummy) coding.  (The
                    first level appearing in the GROUP column is the reference
                    level).  These contrasts are not compatible with SSTYPE = 3.

             • A matrix containing a custom contrast coding scheme (i.e.  the
               generalized inverse of contrast weights).  Rows in the contrast
               matrices correspond to factor levels in the order that they first
               appear in the GROUP column.  The matrix must have as many columns
               as there are factor levels minus one.

          If the anovan model contains more than one factor and a built-in
          contrast coding scheme was specified, then those contrasts are applied
          to all factors.  To specify different contrasts for different factors
          in the model, CONTRASTS should be a cell array with the same number of
          cells as there are columns in GROUP.  Each cell should define
          contrasts for the respective column in GROUP by one of the methods
          described above.  If cells are left empty, then the default contrasts
          are applied.  Contrasts for cells corresponding to continuous factors
          are ignored.

     ‘[...] = anovan (Y, GROUP, "weights", WEIGHTS)’

        • WEIGHTS is an optional vector of weights to be used when fitting the
          linear model.  If weights are supplied, weighted least squares (WLS)
          is used (that is, minimizing ‘sum (WEIGHTS .* RESIDUALS .^ 2)’);
          otherwise ordinary least squares (OLS) is used (default is empty for
          OLS).

     ‘anovan’ can return up to four output arguments:

     ‘P = anovan (...)’ returns a vector of p-values, one for each term.

     ‘[P, ATAB] = anovan (...)’ returns a cell array containing the ANOVA table.

     ‘[P, ATAB, STATS] = anovan (...)’ returns a structure containing additional
     statistics, including degrees of freedom and effect sizes for each term in
     the linear model, the design matrix, the variance-covariance matrix,
     (weighted) model residuals, and the mean squared error.  The columns of
     STATS.coeffs (from left-to-right) report the model coefficients, standard
     errors, lower and upper 100*(1-alpha)% confidence interval bounds,
     t-statistics, and p-values relating to the contrasts.  The number appended
     to each term name in STATS.coeffnames corresponds to the column number in
     the relevant contrast matrix for that factor.  The STATS structure can be
     used as input for ‘multcompare’.

     ‘[P, ATAB, STATS, TERMS] = anovan (...)’ returns the model term
     definitions.

     See also: anova1, anova2, multcompare, fitlm.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Perform a multi (N)-way analysis of (co)variance (ANOVA or ANCOVA) to evaluat...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 4
bar3


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 4335
 -- statistics: bar3 (Z)
 -- statistics: bar3 (Y, Z)
 -- statistics: bar3 (..., WIDTH)
 -- statistics: bar3 (..., STYLE)
 -- statistics: bar3 (..., COLOR)
 -- statistics: bar3 (..., NAME, VALUE)
 -- statistics: bar3 (AX, ...)
 -- statistics: P = bar3 (...)

     Plot a 3D bar graph.

     ‘bar3 (Z)’ plots a 3D bar graph for the elements of Z.  Each bar
     corresponds to an element in Z, which can be a scalar, vector, or 2D
     matrix.  By default, each column in Z is handled as a distinct series of
     bars.  When Z is a vector, unlike MATLAB, which plots it as a single series
     of bars, Octave distinguishes between a row and a column vector.  Hence,
     when Z is a column vector, it is plotted as a single series of bars (same
     color), whereas when Z is a row vector, each bar is plotted as a different
     group (different colors).  For an MxN matrix, the function plots the bars
     corresponding to each row on the y-axis ranging from 1 to M and each column
     on the x-axis ranging from 1 to N.

     ‘bar3 (Y, Z)’ plots a 3D bar graph of the elements in Z at the y-values
     specified in Y.  It should be noted that Y only affects the tick names
     along the y-axis rather than the actual values.  If you want to specify
     non-numerical values for Y, you can specify it with the paired NAME/VALUE
     syntax shown below.

     ‘bar3 (..., WIDTH)’ sets the width of the bars along the x- and y-axes and
     controls the separation of bars among each other.  WIDTH can take any value
     in the range (0,1].  By default, WIDTH is 0.8 and the bars have a small
     separation.  If WIDTH is 1, the bars touch one another.  Alternatively, you
     can define WIDTH as a two-element vector using the paired NAME/VALUE
     syntax shown below, in which case you can control the bar separation along
     each axis independently.

     ‘bar3 (..., STYLE)’ specifies the style of the bars, where STYLE can be
     'detached', 'grouped', or 'stacked'.  The default style is 'detached'.

     ‘bar3 (..., COLOR)’ displays all bars using the color specified by COLOR.
     For example, use 'red' or 'r' to specify all red bars.  When you want to
     specify colors for several groups, COLOR can be a cellstr vector with each
     element specifying the color of each group.  COLOR can also be specified as
     a numerical Mx3 matrix, where each row corresponds to an RGB value with its
     elements in the range [0,1].  If only one color is specified, then it
     applies to all bars.  If the number of colors equals the number of groups,
     then each color is applied to each group.  If the number of colors equals
     the number of elements in Z, then each individual bar is assigned the
     particular color.  You can also define COLOR using the paired NAME/VALUE
     syntax shown below.

     ‘bar3 (..., NAME, VALUE)’ specifies one or more of the following name/value
     pairs:

          Name             Value
     -----------------------------------------------------------------------------------
          "width"          A two-element vector specifying the width of the bars
                           along the x- and y-axes, respectively.  Each element must
                           be in the range (0,1].
                           
          "color"          A character or a cellstr vector, or a numerical Mx3 matrix
                           following the same conventions as the COLOR input
                           argument.
                           
          "xlabel"         A cellstr vector specifying the group names along the
                           x-axis.
                           
          "ylabel"         A cellstr vector specifying the names of the bars in the
                           same series along the y-axis.

     ‘bar3 (AX, ...)’ can also take an axes handle AX as a first argument in
     which case it plots into the axes specified by AX instead of into the
     current axes specified by ‘gca ()’.  The optional argument AX can precede
     any of the input argument combinations in the previous syntaxes.

     ‘P = bar3 (...)’ returns a patch handle P, which can be used to set
     properties of the bars after displaying the 3D bar graph.

     See also: boxplot, hist3.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 20
Plot a 3D bar graph.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 5
bar3h


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 4362
 -- statistics: bar3h (Y)
 -- statistics: bar3h (Z, Y)
 -- statistics: bar3h (..., WIDTH)
 -- statistics: bar3h (..., STYLE)
 -- statistics: bar3h (..., COLOR)
 -- statistics: bar3h (..., NAME, VALUE)
 -- statistics: bar3h (AX, ...)
 -- statistics: P = bar3h (...)

     Plot a horizontal 3D bar graph.

     ‘bar3h (Y)’ plots a 3D bar graph for the elements of Y.  Each bar
     corresponds to an element in Y, which can be a scalar, vector, or 2D
     matrix.  By default, each column in Y is handled as a distinct series of
     bars.  When Y is a vector, unlike MATLAB, which plots it as a single series
     of bars, Octave distinguishes between a row and a column vector.  Hence,
     when Y is a column vector, it is plotted as a single series of bars (same
     color), whereas when Y is a row vector, each bar is plotted as a different
     group (different colors).  For an MxN matrix, the function plots the bars
     corresponding to each row on the z-axis ranging from 1 to M and each column
     on the x-axis ranging from 1 to N.

     ‘bar3h (Z, Y)’ plots a 3D bar graph of the elements in Y at the z-values
     specified in Z.  It should be noted that Z only affects the tick names
     along the z-axis rather than the actual values.  If you want to specify
     non-numerical values for Z, you can specify it with the paired NAME/VALUE
     syntax shown below.

     ‘bar3h (..., WIDTH)’ sets the width of the bars along the x- and z-axes and
     controls the separation of bars among each other.  WIDTH can take any value
     in the range (0,1].  By default, WIDTH is 0.8 and the bars have a small
     separation.  If WIDTH is 1, the bars touch one another.  Alternatively, you
     can define WIDTH as a two-element vector using the paired NAME/VALUE
     syntax shown below, in which case you can control the bar separation along
     each axis independently.

     ‘bar3h (..., STYLE)’ specifies the style of the bars, where STYLE can be
     'detached', 'grouped', or 'stacked'.  The default style is 'detached'.

     ‘bar3h (..., COLOR)’ displays all bars using the color specified by COLOR.
     For example, use 'red' or 'r' to specify all red bars.  When you want to
     specify colors for several groups, COLOR can be a cellstr vector with each
     element specifying the color of each group.  COLOR can also be specified as
     a numerical Mx3 matrix, where each row corresponds to an RGB value with its
     elements in the range [0,1].  If only one color is specified, then it
     applies to all bars.  If the number of colors equals the number of groups,
     then each color is applied to each group.  If the number of colors equals
     the number of elements in Y, then each individual bar is assigned the
     particular color.  You can also define COLOR using the paired NAME/VALUE
     syntax shown below.

     ‘bar3h (..., NAME, VALUE)’ specifies one or more of the following
     name/value pairs:

          Name             Value
     -----------------------------------------------------------------------------------
          "width"          A two-element vector specifying the width of the bars
                           along the x- and z-axes, respectively.  Each element must
                           be in the range (0,1].
                           
          "color"          A character or a cellstr vector, or a numerical Mx3 matrix
                           following the same conventions as the COLOR input
                           argument.
                           
          "xlabel"         A cellstr vector specifying the group names along the
                           x-axis.
                           
          "zlabel"         A cellstr vector specifying the names of the bars in the
                           same series along the z-axis.

     ‘bar3h (AX, ...)’ can also take an axes handle AX as a first argument in
     which case it plots into the axes specified by AX instead of into the
     current axes specified by ‘gca ()’.  The optional argument AX can precede
     any of the input argument combinations in the previous syntaxes.

     ‘P = bar3h (...)’ returns a patch handle P, which can be used to set
     properties of the bars after displaying the 3D bar graph.

     See also: boxplot, hist3.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 31
Plot a horizontal 3D bar graph.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 13
bartlett_test


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2031
 -- statistics: H = bartlett_test (X)
 -- statistics: H = bartlett_test (X, GROUP)
 -- statistics: H = bartlett_test (X, ALPHA)
 -- statistics: H = bartlett_test (X, GROUP, ALPHA)
 -- statistics: [H, PVAL] = bartlett_test (...)
 -- statistics: [H, PVAL, CHISQ] = bartlett_test (...)
 -- statistics: [H, PVAL, CHISQ, DF] = bartlett_test (...)

     Perform a Bartlett test for the homogeneity of variances.

     Under the null hypothesis of equal variances, the test statistic CHISQ
     approximately follows a chi-square distribution with DF degrees of freedom.

     The p-value (1 minus the CDF of this distribution at CHISQ) is returned in
     PVAL.  H = 1 if the null hypothesis is rejected at the significance level
     of ALPHA.  Otherwise H = 0.

     Input Arguments:

        • X contains the data and it can either be a vector or matrix.  If X is
          a matrix, then each column is treated as a separate group.  If X is a
          vector, then the GROUP argument is mandatory.  NaN values are omitted.

        • GROUP contains the names for each group.  If X is a vector, then GROUP
          must be a vector of the same length, or a string array or cell array
          of strings with one row for each element of X.  X values corresponding
          to the same value of GROUP are placed in the same group.  If X is a
          matrix, then GROUP can either be a cell array of strings or a
          character array, with one row per column of X in the same way it is
          used in the ‘anova1’ function.  If X is a matrix, then GROUP can be
          omitted either by entering an empty array ([]) or by passing only
          ALPHA as a second argument (if required to change its default value).

        • ALPHA is the statistical significance value at which the null
          hypothesis is rejected.  Its default value is 0.05 and it can be
          passed either as a second argument (when GROUP is omitted) or as a
          third argument.
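
     The matrix-input forms described above can be sketched as follows.  The
     data are made up for illustration and the statistics package is assumed
     to be installed.

```octave
pkg load statistics

# Made-up data: three groups (columns) with five observations each.
x = [10.2,  9.8, 11.5;
     11.1, 10.4,  8.9;
      9.7, 10.1, 12.8;
     10.5,  9.9,  7.6;
     10.0, 10.3, 11.0];

# Matrix input: each column is a group, so GROUP can be omitted.
[h, pval, chisq, df] = bartlett_test (x);

# Same test at the 1% level, passing ALPHA as the second argument.
h01 = bartlett_test (x, 0.01);

printf ("h = %d, p = %.4f, chisq = %.4f, df = %d\n", h, pval, chisq, df);
```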

     See also: levene_test, vartest2, vartestn.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 57
Perform a Bartlett test for the homogeneity of variances.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
barttest


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1088
 -- statistics: NDIM = barttest (X)
 -- statistics: NDIM = barttest (X, ALPHA)
 -- statistics: [NDIM, PVAL] = barttest (X, ALPHA)
 -- statistics: [NDIM, PVAL, CHISQ] = barttest (X, ALPHA)

     Bartlett's test of sphericity for correlation.

     It compares an observed correlation matrix to the identity matrix in order
     to check if there is a certain redundancy between the variables that we can
     summarize with a small number of factors.  A statistically significant test
     shows that the variables (columns) in X are correlated, thus it makes sense
     to perform some dimensionality reduction of the data in X.

     ‘NDIM = barttest (X, ALPHA)’ returns the number of dimensions necessary to
     explain the nonrandom variation in the data matrix X at the ALPHA
     significance level.  ALPHA is an optional input argument and, when not
     provided, it is 0.05 by default.

     ‘[NDIM, PVAL, CHISQ] = barttest (...)’ also returns the significance values
     PVAL for the hypothesis test for each dimension as well as the associated
     chi^2 values in CHISQ.
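
     A minimal usage sketch, assuming the statistics package is installed.  The
     data matrix is hypothetical and is built so that two of its columns are
     strongly correlated:

          pkg load statistics
          # hypothetical data: 100 observations of 4 variables
          x = randn (100, 4);
          # make column 2 nearly redundant with column 1
          x(:,2) = x(:,1) + 0.1 * randn (100, 1);
          [ndim, pval, chisq] = barttest (x, 0.05);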


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 46
Bartlett's test of sphericity for correlation.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
binotest


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1178
 -- statistics: [H, PVAL, CI] = binotest (POS, N, P0)
 -- statistics: [H, PVAL, CI] = binotest (POS, N, P0, NAME, VALUE)

     Test for probability P of a binomial sample

     Perform a test of the null hypothesis P == P0 for a sample of size N with
     POS positive results.

     Name-Value pair arguments can be used to set various options.  "alpha" can
     be used to specify the significance level of the test (the default value is
     0.05).  The option "tail" can be used to select the desired alternative
     hypothesis.  If the value is "both" (default) the null is tested against
     the two-sided alternative ‘P != P0’.  In this case, the value of PVAL is
     determined by adding the probabilities of all events that are as likely as
     or less likely than the observed number POS of positive events.  If the
     value of "tail" is "right" the one-sided alternative ‘P > P0’ is
     considered.  Similarly for "left", the one-sided alternative ‘P < P0’ is
     considered.

     If H is 0 the null hypothesis cannot be rejected at the chosen significance
     level; if it is 1 the null hypothesis is rejected.  The p-value of the test
     is returned in PVAL.  A 100(1-alpha)% confidence interval is returned in
     CI.
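
     The two-sided rule described above can be sketched as follows.  The sample
     values are hypothetical, and the manual computation (including the small
     relative tolerance used to treat equally likely outcomes as ties) is only
     an illustration of the rule, not necessarily how ‘binotest’ implements it:

          pkg load statistics
          pos = 7;  n = 20;  p0 = 0.5;      # hypothetical sample
          [h, pval, ci] = binotest (pos, n, p0, "tail", "both", "alpha", 0.05);

          # probability of every possible number of positive results under P0
          probs = binopdf (0:n, n, p0);
          # add up all outcomes that are no more likely than the observed one
          limit = probs(pos + 1) * (1 + 1e-12);
          pval_manual = sum (probs(probs <= limit));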


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 43
Test for probability P of a binomial sample



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
boxplot


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 11421
 -- statistics: S = boxplot (DATA)
 -- statistics: S = boxplot (DATA, GROUP)
 -- statistics: S = boxplot (DATA, NOTCHED, SYMBOL, ORIENTATION, WHISKER, ...)
 -- statistics: S = boxplot (DATA, GROUP, NOTCHED, SYMBOL, ORIENTATION, WHISKER,
          ...)
 -- statistics: S = boxplot (DATA, OPTIONS)
 -- statistics: S = boxplot (DATA, GROUP, OPTIONS, ...)
 -- statistics: [..., H] = boxplot (DATA, ...)

     Produce a box plot.

     A box plot is a graphical display that simultaneously describes several
     important features of a data set, such as center, spread, departure from
     symmetry, and identification of observations that lie unusually far from
     the bulk of the data.

     Input arguments (case-insensitive) recognized by boxplot are:

        • DATA is a matrix with one column for each data set, or a cell vector
          with one cell for each data set.  Each cell must contain a numerical
          row or column vector (NaN and NA are ignored) and not a nested vector
          of cells.

        • NOTCHED = 1 produces a notched-box plot.  Notches represent a robust
          estimate of the uncertainty about the median.

          NOTCHED = 0 (default) produces a rectangular box plot.

          NOTCHED within the interval (0,1) produces a notch of the specified
          depth.  Notched values outside (0,1) are amusing if not exactly
          impractical.

        • SYMBOL sets the symbol for the outlier values.  The default symbol for
          points that lie outside 3 times the interquartile range is 'o'; the
          default symbol for points between 1.5 and 3 times the interquartile
          range is '+'.
          Alternative SYMBOL settings:

          SYMBOL = '.': points between 1.5 and 3 times the IQR are marked with
          '.'  and points outside 3 times IQR with 'o'.

          SYMBOL = ['x','*']: points between 1.5 and 3 times the IQR are marked
          with 'x' and points outside 3 times IQR with '*'.

        • ORIENTATION = 0 plots the boxes horizontally.
          ORIENTATION = 1 plots the boxes vertically (default).  Alternatively,
          orientation can be passed as a string, e.g., 'vertical' or
          'horizontal'.

        • WHISKER defines the length of the whiskers as a function of the IQR
          (default = 1.5).  If WHISKER = 0 then ‘boxplot’ displays all data
          values outside the box using the plotting symbol for points that lie
          outside 3 times the IQR.

        • GROUP may be passed as an optional argument only in the second
          position after DATA.  GROUP can be a numeric, character, string, or
          categorical vector defining separate categories.  To group by multiple
          variables simultaneously, pass a cell array of grouping vectors (e.g.,
          ‘{group1, group2}’).  A separate box is plotted for each unique
          combination of group values.  All grouping variables must have the
          same length as DATA.

        • OPTIONS are additional arguments passed as (Name, Value) pairs that
          provide extra functionality as listed below.  OPTIONS can be passed
          in any order after the initial arguments and are case-insensitive.

          Name           Value           Description
          ----------------------------------------------------------------------
          'Notch'        'on'            Notched by 0.25 of the boxes width.
                         'off'           Produces a straight box.
                         scalar          Proportional width of the notch.

          'Symbol'       '.'             Defines only outliers between 1.5 and
                                         3 IQR.
                         ['x','*']       2nd character defines outliers > 3
                                         IQR.

          'Orientation'  'vertical'      Default value, can also be defined
                                         with numerical 1.
                         'horizontal'    Can also be defined with numerical 0.

          'Whisker'      scalar          Multiplier of IQR (default is 1.5).

          'OutlierTags'  'on' or 1       Plot the vector index of the outlier
                                         value next to its point.
                         'off' or 0      No tags are plotted (default value).

          'Sample_IDs'   'cell'          A cell vector with one cell for each
                                         data set containing a nested cell
                                         vector with each sample's ID (should
                                         be a string).  If this option is
                                         passed, then all outliers are tagged
                                         with their respective sample's ID
                                         string instead of their vector's
                                         index.

          'BoxWidth'     'proportional'  Create boxes with their width
                                         proportional to the number of samples
                                         in their respective dataset (default
                                         value).
                         'fixed'         Make all boxes with equal width.

          'Widths'       scalar          Scaling factor for box widths (default
                                         value is 0.4).

          'CapWidths'    scalar          Scaling factor for whisker cap widths
                                         (default value is 1, which results to
                                         'Widths'/8 halflength)

          'BoxStyle'     'outline'       Draw boxes as outlines (default
                                         value).
                         'filled'        Fill boxes with a color (outlines are
                                         still plotted).

          'Positions'    vector          Numerical vector that defines the
                                         position of each data set.  It must
                                         have the same length as the number of
                                         groups in a desired manner.  This
                                         vector merely defines the points along
                                         the group axis, which by default is
                                         [1:number of groups].

          'Labels'       cell            A cell vector of strings containing
                                         the names of each group.  By default
                                         each group is labeled numerically.  If
                                         multiple grouping variables are
                                         provided, default labels are
                                         automatically generated by joining the
                                         category names and stacked
                                         hierarchically.

          'Colors'       character       If just one character or 1x3 vector of
                         string or Nx3   RGB values, specify the fill color of
                         numerical       all boxes when BoxStyle = 'filled'.  If
                         matrix          a character string or Nx3 matrix is
                                         entered, box #1's fill color
                                         corresponds to the first character or
                                         first matrix row, and the next boxes'
                                         fill colors corresponds to the next
                                         characters or rows.  If the char string
                                         or Nx3 array is exhausted the color
                                         selection wraps around.

     Supplemental arguments not described above (...) are concatenated and
     passed to the plot() function.

     The returned matrix S has one column for each data set, with the following
     rows:

     1        Minimum
     2        1st quartile
     3        2nd quartile (median)
     4        3rd quartile
     5        Maximum
     6        Lower confidence limit for median
     7        Upper confidence limit for median

     The returned structure H contains handles to the plot elements, allowing
     customization of the visualization using set/get functions.

     Example

          title ("Grade 3 heights");
          axis ([0,3]);
          set(gca (), "xtick", [1 2], "xticklabel", {"girls", "boys"});
          boxplot ({randn(10,1)*5+140, randn(13,1)*8+135});


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 19
Produce a box plot.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 9
canoncorr


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 535
 -- statistics: [A, B, R, U, V] = canoncorr (X, Y)

     Canonical correlation analysis.

     Given X (size K*M) and Y (K*N), returns projection matrices of canonical
     coefficients A (size M*D, where D is the smaller of M and N) and B (size
     N*D); the canonical correlations R (1*D, arranged in decreasing order); the
     canonical variables U, V (both K*D, with orthonormal columns); and STATS, a
     structure containing results from Bartlett's chi-square and Rao's F tests
     of significance.
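
     A minimal sketch of the output dimensions, assuming the statistics package
     is installed.  The random data are hypothetical, with K = 50, M = 3, and
     N = 2, so that D = 2:

          pkg load statistics
          x = randn (50, 3);                # K = 50 observations, M = 3
          y = randn (50, 2);                # K = 50 observations, N = 2
          [A, B, r, U, V] = canoncorr (x, y);
          size_A = size (A);                # M*D
          size_B = size (B);                # N*D
          size_U = size (U);                # K*D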

     See also: princomp.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 31
Canonical correlation analysis.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
cdfcalc


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 812
 -- statistics: [YCDF, XCDF, N, EMSG, EID] = cdfcalc (X)

     Calculate an empirical cumulative distribution function.

     ‘[YCDF, XCDF] = cdfcalc (X)’ calculates an empirical cumulative
     distribution function (CDF) of the observations in the data sample vector
     X.  X may be a row or column vector, and represents a random sample of
     observations from some underlying distribution.  On return XCDF is the set
     of X values at which the CDF increases.  At XCDF(i), the function increases
     from YCDF(i) to YCDF(i+1).

     ‘[YCDF, XCDF, N] = cdfcalc (X)’ also returns N, the sample size.

     ‘[YCDF, XCDF, N, EMSG, EID] = cdfcalc (X)’ also returns an error message
     and error id if X is not a vector or if it contains no values other than
     NaN.
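
     A minimal sketch of how the outputs relate to each other, assuming the
     statistics package is installed.  The sample is hypothetical and contains
     one tie:

          pkg load statistics
          x = [3 1 2 2 5];                  # hypothetical sample
          [ycdf, xcdf, n] = cdfcalc (x);
          # height of the CDF step at each XCDF(i), i.e. YCDF(i+1) - YCDF(i)
          jumps = diff (ycdf);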

     See also: cdfplot.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 56
Calculate an empirical cumulative distribution function.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
cdfplot


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1235
 -- statistics: HCDF = cdfplot (X)
 -- statistics: [HCDF, STATS] = cdfplot (X)

     Display an empirical cumulative distribution function.

     ‘HCDF = cdfplot (X)’ plots an empirical cumulative distribution function
     (CDF) of the observations in the data sample vector X.  X may be a row or
     column vector, and represents a random sample of observations from some
     underlying distribution.

     ‘cdfplot’ plots F(x), the empirical (or sample) CDF versus the observations
     in X.  The empirical CDF, F(x), is defined as follows:

     F(x) = (Number of observations <= x) / (Total number of observations)

     for all values in the sample vector X.  NaNs are ignored.  HCDF is the
     handle of the empirical CDF curve (a handle graphics 'line' object).

     ‘[HCDF, STATS] = cdfplot (X)’ also returns a structure with the following
     fields as a statistical summary.

          STATS.min                minimum value of X
          STATS.max                maximum value of X
          STATS.mean               sample mean of X
          STATS.median             sample median (50th percentile) of X
          STATS.std                sample standard deviation of X

     See also: qqplot, cdfcalc.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 54
Display an empirical cumulative distribution function.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
chi2gof


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 4603
 -- statistics: H = chi2gof (X)
 -- statistics: [H, P] = chi2gof (X)
 -- statistics: [H, P, STATS] = chi2gof (X)
 -- statistics: [...] = chi2gof (X, NAME, VALUE, ...)

     Chi-square goodness-of-fit test.

     ‘chi2gof’ performs a chi-square goodness-of-fit test for discrete or
     continuous distributions.  The test is performed by grouping the data into
     bins, calculating the observed and expected counts for those bins, and
     computing the chi-square test statistic SUM((O-E).^2./E), where O is the
     observed counts and E is the expected counts.  This test statistic has an
     approximate chi-square distribution when the counts are sufficiently large.

     Bins in either tail with an expected count less than 5 are pooled with
     neighboring bins until the count in each extreme bin is at least 5.  If
     bins remain in the interior with counts less than 5, ‘chi2gof’ displays a
     warning.  In that case, you should use fewer bins, or provide bin centers
     or bin edges, to increase the expected counts in all bins.

     ‘H = chi2gof (X)’ performs a chi-square goodness-of-fit test that the data
     in the vector X are a random sample from a normal distribution with mean
     and variance estimated from X.  The result is H = 0 if the null hypothesis
     (that X is a random sample from a normal distribution) cannot be rejected
     at the 5% significance level, or H = 1 if the null hypothesis can be
     rejected at the 5% level.  ‘chi2gof’ uses by default 10 bins ("nbins"), and
     compares the test statistic to a chi-square distribution with NBINS - 3
     degrees of freedom, to take into account that two parameters were
     estimated.

     ‘[H, P] = chi2gof (X)’ also returns the p-value P, which is the probability
     of observing the given result, or one more extreme, by chance if the null
     hypothesis is true.  If there are not enough degrees of freedom to carry
     out the test, P is NaN.

     ‘[H, P, STATS] = chi2gof (X)’ also returns a STATS structure with the
     following fields:

          "chi2stat"               Chi-square statistic
          "df"                     Degrees of freedom
          "binedges"               Vector of bin edges after pooling
          "O"                      Observed count in each bin
          "E"                      Expected count in each bin

     ‘[...] = chi2gof (X, NAME, VALUE, ...)’ specifies optional Name/Value pair
     arguments chosen from the following list.

          Name             Value
     -----------------------------------------------------------------------------------
          "nbins"          The number of bins to use.  Default is 10.
          "binctrs"        A vector of bin centers.
          "binedges"       A vector of bin edges.
          "cdf"            A fully specified cumulative distribution function or a
                           function handle provided in a cell array whose first
                           element is a function handle, and all later elements are
                           its parameter values.  The function must take X values as
                           its first argument, and other parameters as later
                           arguments.
          "expected"       A vector with one element per bin specifying the expected
                           counts for each bin.
          "nparams"        The number of estimated parameters; used to adjust the
                           degrees of freedom to be NBINS - 1 - NPARAMS, where NBINS
                           is the number of bins.
          "emin"           The minimum allowed expected value for a bin; any bin in
                           either tail having an expected value less than this amount
                           is pooled with a neighboring bin.  Use the value 0 to
                           prevent pooling.  Default is 5.
          "frequency"      A vector of the same length as X containing the frequency
                           of the corresponding X values.
          "alpha"          An ALPHA value such that the hypothesis is rejected if P <
                           ALPHA.  Default is ALPHA = 0.05.

     You should specify either "cdf" or "expected" parameters, but not both.  If
     your "cdf" input contains extra parameters, these are accounted for
     automatically and there is no need to specify "nparams".  If your
     "expected" input depends on estimated parameters, you should use the
     "nparams" parameter to ensure that the degrees of freedom for the test are
     correct.
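
     A minimal usage sketch, assuming the statistics package is installed.  The
     sample and the parameters passed to ‘normcdf’ are hypothetical.  The manual
     statistic simply applies SUM((O-E).^2./E) to the pooled counts returned in
     STATS:

          pkg load statistics
          x = 5 + 2 * randn (200, 1);       # hypothetical sample
          # normal null hypothesis with mean and variance estimated from x
          [h, p, stats] = chi2gof (x);
          # recompute the statistic from the observed and expected counts
          chi2_manual = sum ((stats.O - stats.E) .^ 2 ./ stats.E);
          # null distribution given as a cell array: handle, then parameters
          [h2, p2] = chi2gof (x, "cdf", {@normcdf, 5, 2}, "nbins", 8);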


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 32
Chi-square goodness-of-fit test.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
chi2test


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 4458
 -- statistics: PVAL = chi2test (X)
 -- statistics: [PVAL, CHISQ] = chi2test (X)
 -- statistics: [PVAL, CHISQ, DF] = chi2test (X)
 -- statistics: [PVAL, CHISQ, DF, E] = chi2test (X)
 -- statistics: [...] = chi2test (X, NAME, VALUE)

     Perform a chi-squared test (for independence or homogeneity).

     For 2-way contingency tables, ‘chi2test’ performs a chi-squared test for
     independence or homogeneity, according to the sampling scheme and related
     question.  Independence means that the two variables forming the 2-way
     table are not associated, hence you cannot predict one from the other.
     Homogeneity refers to the concept of similarity, i.e., that all groups
     come from the same distribution.

     Both tests are computationally identical and will produce the same result.
     Nevertheless, they answer different questions.  Consider two variables,
     one for gender and another for smoking.  To test independence (whether
     gender and smoking are associated), we would randomly sample from the
     general population and break the sample down into categories in the table.
     To test homogeneity (whether men and women share the same smoking habits),
     we would sample individuals from within each gender, and then measure
     their smoking habits (e.g., smokers vs non-smokers).

     When ‘chi2test’ is called without any output arguments, it will print the
     result in the terminal including p-value, chi^2 statistic, and degrees of
     freedom.  Otherwise it can return the following output arguments:

          PVAL     the p-value of the relevant test.
          CHISQ    the chi^2 statistic of the relevant test.
          DF       the degrees of freedom of the relevant test.
          E        the EXPECTED values of the original contingency table.

     Unlike MATLAB, in GNU Octave ‘chi2test’ also supports 3-way tables, which
     involve three categorical variables (each in a different dimension of X).
     In its simplest form, ‘[...] = chi2test (X)’ will test for mutual
     independence among the three variables.  Alternatively, when called in the
     form ‘[...] = chi2test (X, NAME, VALUE)’, it can perform the following
     tests:

     NAME             VALUE    Description
     -----------------------------------------------------------------------------------
     "mutual"         []       Mutual independence.  All variables are independent
                               from each other, (A, B, C). Value must be an empty
                               matrix.
     "joint"          scalar   Joint independence.  Two variables are jointly
                               independent of the third, (AB, C). The scalar value
                               corresponds to the dimension of the independent
                               variable (i.e.  3 for C).
     "marginal"       scalar   Marginal independence.  Two variables are independent
                               if you ignore the third, (A, C). The scalar value
                               corresponds to the dimension of the variable to be
                               ignored (i.e.  2 for B).
     "conditional"    scalar   Conditional independence.  Two variables are
                               independent given the third, (AC, BC). The scalar
                               value corresponds to the dimension of the variable
                               that forms the conditional dependence (i.e.  3 for C).
     "homogeneous"    []       Homogeneous associations.  Conditional (partial)
                               odds-ratios do not depend on the value of the
                               third, (AB, AC, BC). Value must be an empty matrix.

     When testing for homogeneous associations in 3-way tables, the iterative
     proportional fitting procedure is used.  For small samples it is better to
     use the Cochran-Mantel-Haenszel Test.  K-way tables for k > 3 are supported
     only for testing mutual independence.  Similar to 2-way tables, no optional
     parameters are required for k > 3 multi-way tables.

     ‘chi2test’ produces a warning if any cell of a 2x2 table has an expected
     frequency less than 5, or if, in larger 2-way tables, more than 20% of the
     cells have expected frequencies less than 5 or any cell has an expected
     frequency less than 1.  In such cases, use ‘fishertest’.
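
     A minimal usage sketch, assuming the statistics package is installed.  Both
     contingency tables are hypothetical.  The manual expected counts use the
     standard (row total * column total) / N rule for a 2-way table:

          pkg load statistics
          x = [30, 20; 15, 35];             # hypothetical 2x2 table
          [pval, chisq, df, E] = chi2test (x);
          # expected counts under independence
          E_manual = sum (x, 2) * sum (x, 1) / sum (x(:));

          # hypothetical 2x2x2 table: test whether the variable in dimension 3
          # is jointly independent of the other two
          x3 = cat (3, [20 15; 10 25], [18 12; 14 22]);
          pval3 = chi2test (x3, "joint", 3);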

     See also: crosstab, fishertest, mcnemar_test.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 61
Perform a chi-squared test (for independence or homogeneity).



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
cholcov


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1399
 -- statistics: T = cholcov (SIGMA)
 -- statistics: [T, P] = cholcov (SIGMA)
 -- statistics: [...] = cholcov (SIGMA, FLAG)

     Cholesky-like decomposition for covariance matrix.

     ‘T = cholcov (SIGMA)’ computes matrix T such that SIGMA = T' T.  SIGMA must
     be square, symmetric, and positive semi-definite.

     If SIGMA is positive definite, then T is the square, upper triangular
     Cholesky factor.  If SIGMA is not positive definite, T is computed with an
     eigenvalue decomposition of SIGMA, but in this case T is not necessarily
     triangular or square.  Any eigenvectors whose corresponding eigenvalue is
     close to zero (within a tolerance) are omitted.  If any remaining
     eigenvalues are negative, T is empty.

     The tolerance is calculated as ‘10 * eps (max (abs (diag (sigma))))’.

     ‘[T, P] = cholcov (SIGMA)’ returns in P the number of negative eigenvalues
     of SIGMA.  If P > 0, then T is empty, whereas if P = 0, SIGMA is positive
     semi-definite.

     If SIGMA is not square and symmetric, P is NaN and T is empty.

     ‘[T, P] = cholcov (SIGMA, 0)’ returns P = 0 if SIGMA is positive definite,
     in which case T is the Cholesky factor.  If SIGMA is not positive definite,
     P is a positive integer and T is empty.

     ‘[...] = cholcov (SIGMA, 1)’ is equivalent to ‘[...] = cholcov (SIGMA)’.
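
     A minimal sketch of both cases, assuming the statistics package is
     installed.  The two matrices are hypothetical: the first is positive
     definite, the second is rank-deficient but positive semi-definite, so its
     factor is not expected to be square:

          pkg load statistics
          sigma = [2 1 1; 1 2 1; 1 1 2];    # positive definite
          [T, p] = cholcov (sigma);
          err = norm (T' * T - sigma);      # reconstruction error

          sigma_psd = [1 1; 1 1];           # positive semi-definite, rank 1
          [T2, p2] = cholcov (sigma_psd);
          err2 = norm (T2' * T2 - sigma_psd);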

     See also: chol.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 50
Cholesky-like decomposition for covariance matrix.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 11
cl_multinom


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 3330
 -- statistics: CL = cl_multinom (X, N, B)
 -- statistics: CL = cl_multinom (X, N, B, METHOD)

     Confidence level of multinomial portions.

     ‘cl_multinom’ returns the confidence level of multinomial parameters
     estimated as p = X / sum(X) with a predefined confidence interval B.  A
     finite population is also taken into account.

     This function calculates the level of confidence at which the samples
     represent the true distribution, given that there is a predefined tolerance
     (confidence interval).  This is the inverse of the typical exercise, in
     which we want to get the confidence interval given the confidence level
     (and the estimated parameters of the underlying distribution).  But once we
     accept (let's say in elections) that we have a standard predefined maximal
     acceptable error rate (e.g., B = 0.02) in the estimation and we just want
     to know how sure we can be that the measured proportions are the same as in
     the entire population (i.e., that the expected value and the mean of the
     samples are roughly the same), we need to use this function.

     Arguments
     ---------

     Variable   Type       Description
     -------------------------------------------------------------------------------------
     X          int        Sample frequencies (bin counts).
                vector
     N          int        Population size that was sampled by X.  If N < sum (X),
                scalar     an infinite population is assumed.
     B          real       Confidence interval.  If a vector, it must have the same
                vector     size as X and contain the confidence interval for each
                           cell.  If a scalar, every cell gets the same value of B
                           unless it is 0 or -1.  If the value is 0, B = 0.02 is
                           assumed, which is a standard choice for elections;
                           otherwise it is calculated so that altering one sample in
                           a cell defines the confidence interval.
     METHOD     string     An optional argument for defining the calculation method.
                           Available choices are "bromaghin" (default), "cochran", and
                           "agresti_cull".

     Note!  The "agresti_cull" method is not exactly the solution in the
     reference given below but an adjustment of the solutions above.

     Returns
     -------

     Confidence level.

     Example
     -------

     CL = cl_multinom ([27; 43; 19; 11], 10000, 0.05) returns 0.69 confidence
     level.

     References
     ----------

       1. "bromaghin" calculation type (default) is based on the article:

          Jeffrey F. Bromaghin, "Sample Size Determination for Interval
          Estimation of Multinomial Probabilities", The American Statistician,
          Vol. 47, 1993, pp. 203-206.

       2. "cochran" calculation type is based on the article:

          Robert T. Tortora, "A Note on Sample Size Estimation for Multinomial
          Populations", The American Statistician, Vol. 32, 1978, pp. 100-102.

       3. "agresti_cull" calculation type is based on the article:

          A. Agresti and B.A. Coull, "Approximate is better than 'exact' for
          interval estimation of binomial proportions", The American
          Statistician, Vol. 52, 1998, pp. 119-126.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 41
Confidence level of multinomial portions.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
cluster


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1214
 -- statistics: T = cluster (Z, "Cutoff", C)
 -- statistics: T = cluster (Z, "Cutoff", C, "Depth", D)
 -- statistics: T = cluster (Z, "Cutoff", C, "Criterion", CRITERION)
 -- statistics: T = cluster (Z, "MaxClust", N)

     Define clusters from an agglomerative hierarchical cluster tree.

     Given a hierarchical cluster tree Z generated by the ‘linkage’ function,
     ‘cluster’ defines clusters, using a threshold value C to identify new
     clusters ('Cutoff') or according to a maximum number of desired clusters N
     ('MaxClust').

     CRITERION is used to choose the criterion for defining clusters, which can
     be either "inconsistent" (default) or "distance".  When using
     "inconsistent", ‘cluster’ compares the threshold value C to the
     inconsistency coefficient of each link; when using "distance", ‘cluster’
     compares the threshold value C to the height of each link.  D is the depth
     used to evaluate the inconsistency coefficient; its default value is 2.

     ‘cluster’ uses "distance" as a criterion for defining new clusters when it
     is used with the 'MaxClust' method.

     See also: clusterdata, dendrogram, inconsistent, kmeans, linkage, pdist.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 64
Define clusters from an agglomerative hierarchical cluster tree.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 11
clusterdata


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 870
 -- statistics: T = clusterdata (X, CUTOFF)
 -- statistics: T = clusterdata (X, NAME, VALUE)

     Wrapper function for ‘linkage’ and ‘cluster’.

     If CUTOFF is used, then ‘clusterdata’ calls ‘linkage’ and ‘cluster’ with
     default values, using CUTOFF as a threshold value for ‘cluster’.  If
     CUTOFF is an integer greater than or equal to 2, then CUTOFF is
     interpreted as the maximum number of clusters desired and the "MaxClust"
     option is used for ‘cluster’.

     If CUTOFF is not used, then ‘clusterdata’ expects a list of pair arguments.
     Then you must specify either the "Cutoff" or "MaxClust" option for
     ‘cluster’.  The method and metric used by ‘linkage’ are defined through
     the "linkage" and "distance" arguments.

     See also: cluster, dendrogram, inconsistent, kmeans, linkage, pdist.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 53
Wrapper function for ‘linkage’ and ‘cluster’.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
cmdscale


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2701
 -- statistics: Y = cmdscale (D)
 -- statistics: [Y, E] = cmdscale (D)

     Classical multidimensional scaling of a matrix.

     Takes an N by N distance (or difference, similarity, or dissimilarity)
     matrix D.  Returns Y, a matrix of N points with coordinates in P
     dimensional space which approximate those distances (or differences,
     similarities, or dissimilarities).  Also returns the eigenvalues E of ‘B =
     -1/2 * J * (D.^2) * J’, where ‘J = eye(N) - ones(N,N)/N’.  P, the number of
     columns of Y, is equal to the number of positive real eigenvalues of B.

     D can be a full or sparse matrix or a vector of length ‘N*(N-1)/2’
     containing the upper triangular elements (like the output of the ‘pdist’
     function).  It must be symmetric with non-negative entries whose values are
     further restricted by the type of matrix being represented:

     * If D is either a distance, dissimilarity, or difference matrix, then it
     must have zero entries along the main diagonal.  In this case the points Y
     equal or approximate the distances given by D.

     * If D is a similarity matrix, the elements must all be less than or equal
     to one, with ones along the main diagonal.  In this case the points Y equal
     or approximate the distances given by ‘D = sqrt(ones(N,N)-D)’.

     D is a Euclidean matrix if and only if B is positive semi-definite.  When
     this is the case, then Y is an exact representation of the distances given
     in D.  If D is non-Euclidean, Y only approximates the distance given in D.
     The approximation used by ‘cmdscale’ minimizes the statistical loss
     function known as STRAIN.

     The returned Y is an N by P matrix showing possible coordinates of the
     points in P dimensional space (‘P < N’).  The columns correspond to the
     positive eigenvalues of B in descending order.  A translation, rotation, or
     reflection of the coordinates given by Y will satisfy the same distance
     matrix up to the limits of machine precision.

     For any ‘K <= P’, if the largest K positive eigenvalues of B are
     significantly greater in absolute magnitude than its other eigenvalues, the
     first K columns of Y provide a K-dimensional reduction of Y which
     approximates the distances given by D.  The optional return E can be used
     to consider various values of K, or to evaluate the accuracy of specific
     dimension reductions (e.g., ‘K = 2’).

     Reference: Ingwer Borg and Patrick J.F. Groenen (2005), Modern
     Multidimensional Scaling, Second Edition, Springer, ISBN: 978-0-387-25150-9
     (Print) 978-0-387-28981-6 (Online)

     See also: pdist.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 47
Classical multidimensional scaling of a matrix.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
combnk


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 89
 -- statistics: C = combnk (DATA, K)

     Return all combinations of K elements in DATA.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 46
Return all combinations of K elements in DATA.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 14
confusionchart


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1892
 -- statistics: confusionchart (TRUELABELS, PREDICTEDLABELS)
 -- statistics: confusionchart (M)
 -- statistics: confusionchart (M, CLASSLABELS)
 -- statistics: confusionchart (PARENT, ...)
 -- statistics: confusionchart (..., PROP, VAL, ...)
 -- statistics: CM = confusionchart (...)

     Display a chart of a confusion matrix.

     The two vectors of values TRUELABELS and PREDICTEDLABELS, which are used to
     compute the confusion matrix, must be defined with the same format as the
     inputs of ‘confusionmat’.  Otherwise a confusion matrix M as computed by
     ‘confusionmat’ can be given.

     CLASSLABELS is an array of labels, i.e.  the list of the class names.

     If the first argument is a handle to a ‘figure’ or to a ‘uipanel’, then the
     confusion matrix chart is displayed inside that object.

     Optional property/value pairs are passed directly to the underlying
     objects, e.g. "xlabel", "ylabel", "title", "fontname", "fontsize" etc.

     The optional return value CM is a ‘ConfusionMatrixChart’ object.  Specific
     properties of a ‘ConfusionMatrixChart’ object are:
        • "DiagonalColor" The color of the patches on the diagonal, default is
          [0.0, 0.4471, 0.7412].

        • "OffDiagonalColor" The color of the patches off the diagonal, default
          is [0.851, 0.3255, 0.098].

        • "GridVisible" Available values: on (default), off.

        • "Normalization" Available values: absolute (default),
          column-normalized, row-normalized, total-normalized.

        • "ColumnSummary" Available values: off (default), absolute,
          column-normalized, total-normalized.

        • "RowSummary" Available values: off (default), absolute,
          row-normalized, total-normalized.

     Run ‘demo confusionchart’ to see some examples.

     See also: confusionmat, sortClasses.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 38
Display a chart of a confusion matrix.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 12
confusionmat


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1354
 -- statistics: C = confusionmat (GROUP, GROUPHAT)
 -- statistics: C = confusionmat (GROUP, GROUPHAT, "Order", GROUPORDER)
 -- statistics: [C, ORDER] = confusionmat (GROUP, GROUPHAT)

     Compute a confusion matrix for classification problems.

     ‘confusionmat’ returns the confusion matrix C for the group of actual
     values GROUP and the group of predicted values GROUPHAT.  The row indices
     of the confusion matrix represent actual values, while the column indices
     represent predicted values.  The indices are the same for both actual and
     predicted values, so the confusion matrix is a square matrix.  Each element
     of the matrix represents the number of matches between a given actual value
     (row index) and a given predicted value (column index), hence correct
     matches lie on the main diagonal of the matrix.  The order of the rows and
     columns is returned in ORDER.

     GROUP and GROUPHAT must have the same number of observations and the same
     data type.  Valid data types are numeric vectors, logical vectors,
     character arrays, string arrays, cell arrays of strings, and categorical
     arrays.

     The order of the rows and columns can be specified by setting the
     GROUPORDER variable.  The data type of GROUPORDER must be the same as GROUP
     and GROUPHAT.

     See also: crosstab.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 55
Compute a confusion matrix for classification problems.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
cophenet


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1099
 -- statistics: [C, D] = cophenet (Z, Y)

     Compute the cophenetic correlation coefficient.

     The cophenetic correlation coefficient C of a hierarchical cluster tree Z
     is the linear correlation coefficient between the cophenetic distances D
     and the Euclidean distances Y.

     It is a measure of the similarity between the distance of the leaves, as
     seen in the tree, and the distance of the original data points, which were
     used to build the tree.  When this similarity is greater, that is the
     coefficient is closer to 1, the tree renders an accurate representation of
     the distances between the original data points.

     Z is a hierarchical cluster tree, as the output of ‘linkage’.  Y is a
     vector of Euclidean distances, as the output of ‘pdist’.

     The optional output D is a vector of cophenetic distances, in the same
     lower triangular format as Y.  The cophenetic distance between two data
     points is the height of the lowest common node of the tree.

     See also: cluster, dendrogram, inconsistent, linkage, pdist, squareform.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 47
Compute the cophenetic correlation coefficient.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 16
correlation_test


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2394
 -- statistics: H = correlation_test (Y, X)
 -- statistics: [H, PVAL] = correlation_test (Y, X)
 -- statistics: [H, PVAL, STATS] = correlation_test (Y, X)
 -- statistics: [...] = correlation_test (Y, X, NAME, VALUE)

     Perform a correlation coefficient test to determine whether two samples X
     and Y come from uncorrelated populations.

     ‘H = correlation_test (Y, X)’ tests the null hypothesis that the two
     samples X and Y come from uncorrelated populations.  The result is H = 0 if
     the null hypothesis cannot be rejected at the 5% significance level, or H =
     1 if the null hypothesis can be rejected at the 5% level.  Y and X must be
     vectors of equal length with finite real numbers.

     The p-value of the test is returned in PVAL.  STATS is a structure with the
     following fields:
          Field                 Value
     ------------------------------------------------------------------------------------
          method                the type of correlation coefficient used for the test
          df                    the degrees of freedom (where applicable)
          corrcoef              the correlation coefficient
          stat                  the test's statistic
          dist                  the respective distribution for the test
          alt                   the alternative hypothesis for the test

     ‘[...] = correlation_test (..., NAME, VALUE)’ specifies one or more of the
     following name/value pairs:

          Name             Value
     -----------------------------------------------------------------------------------
          "alpha"          the significance level.  Default is 0.05.
                           
          "tail"           a string specifying the alternative hypothesis
              "both"               corrcoef is not 0 (two-tailed, default)
              "left"               corrcoef is less than 0 (left-tailed)
              "right"              corrcoef is greater than 0 (right-tailed)

          "method"         a string specifying the correlation coefficient used for
                           the test
              "pearson"            Pearson's product moment correlation (default)
              "kendall"            Kendall's rank correlation tau
              "spearman"           Spearman's rank correlation rho

     See also: regression_ftest, regression_ttest.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Perform a correlation coefficient test to determine whether two samples X and...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
createns


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2721
 -- Function File: OBJ = createns (X)
 -- Function File: OBJ = createns (X, NAME, VALUE, ...)

     Create a nearest neighbor searcher object.

     ‘OBJ = createns (X)’ creates a nearest neighbor searcher object using the
     training data X.  By default, it constructs an ‘ExhaustiveSearcher’ object
     with the Euclidean distance metric.

     ‘OBJ = createns (X, NAME, VALUE, ...)’ allows customization of the searcher
     type and its properties through name-value pairs.  The following name-value
     pair is supported to specify the searcher type:

     NAME              VALUE
                       
     -----------------------------------------------------------------------------------
     "NSMethod"        Specifies the nearest neighbor search method.  Possible values
                       are:
                          • "exhaustive": Creates an ‘ExhaustiveSearcher’ object.
                          • "kdtree": Creates a ‘KDTreeSearcher’ object.
                          • "hnsw": Creates an ‘hnswSearcher’ object.
                       Default is "exhaustive".
                       

     Additional name-value pairs depend on the selected "NSMethod" and are
     passed directly to the constructor of the corresponding class:

        • For "exhaustive", see ‘ExhaustiveSearcher’ documentation for
          parameters like "Distance", "P", "Scale", and "Cov".
        • For "kdtree", see ‘KDTreeSearcher’ documentation for parameters like
          "Distance", "P", and "BucketSize".
        • For "hnsw", see ‘hnswSearcher’ documentation for parameters like
          "Distance", "P", "Scale", "Cov", "MaxNumLinksPerNode", and
          "TrainSetSize".

     *Input Arguments:*
        • X - Training data, specified as an NxP numeric matrix where rows
          represent observations and columns represent features.  Must be finite
          and numeric.

     *Output:*
        • OBJ - A nearest neighbor searcher object of type ‘ExhaustiveSearcher’,
          ‘KDTreeSearcher’, or ‘hnswSearcher’, depending on the specified
          "NSMethod".

     *Examples:*

          ## Create an ExhaustiveSearcher with default parameters
          X = [1, 2; 3, 4; 5, 6];
          obj = createns (X);

          ## Create a KDTreeSearcher with Euclidean distance
          obj = createns (X, "NSMethod", "kdtree", "Distance", "euclidean");

          ## Create an hnswSearcher with Minkowski distance and custom parameters
          obj = createns (X, "NSMethod", "hnsw", "Distance", "minkowski", "P", 3, "MaxNumLinksPerNode", 2);

     See also: ExhaustiveSearcher, KDTreeSearcher, hnswSearcher, knnsearch,
     rangesearch.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 42
Create a nearest neighbor searcher object.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
crosstab


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 611
 -- statistics: T = crosstab (X1, X2)
 -- statistics: T = crosstab (X1, ..., XN)
 -- statistics: [T, CHISQ, P, LABELS] = crosstab (...)

     Create a cross-tabulation (contingency table) T from data vectors.

     The inputs X1, X2, ..., XN must be vectors of equal length with a data type
     of numeric, logical, char array, categorical, strings, or cell array of
     character vectors.

     As additional return values, ‘crosstab’ returns the chi-square statistic
     CHISQ, its p-value P, and a cell array LABELS containing the labels of each
     input argument.

     See also: grp2idx, tabulate.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 66
Create a cross-tabulation (contingency table) T from data vectors.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
crossval


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2166
 -- statistics: RESULTS = crossval (F, X, Y)
 -- statistics: RESULTS = crossval (F, X, Y, NAME, VALUE)

     Perform cross validation on given data.

     F should be a function that takes 4 inputs XTRAIN, YTRAIN, XTEST, YTEST,
     fits a model based on XTRAIN, YTRAIN, applies the fitted model to XTEST,
     and returns a goodness of fit measure based on comparing the predicted and
     actual YTEST.  ‘crossval’ returns an array containing the values returned
     by F for every cross-validation fold or resampling applied to the given
     data.

     X should be an N by M matrix of predictor values.

     Y should be an N by 1 vector of predicand values.

     Optional arguments may include name-value pairs as follows:

     "KFold"
          Divide set into K equal-size subsets, using each one successively for
          validation.

     "HoldOut"
          Divide set into two subsets, training and validation.  If the value K
          is a fraction, that is the fraction of values put in the validation
          subset (by default K=0.1); if it is a positive integer, that is the
          number of values in the validation subset.

     "LeaveOut"
          Leave-one-out partition (each element is placed in its own subset).
          The value is ignored, but it is required.

     "Partition"
          The value should be a CVPARTITION object.

     "Given"
          The value should be an N by 1 vector specifying in which partition to
          put each element.

     "stratify"
          The value should be an N by 1 vector containing class designations for
          the elements, in which case the "KFold" and "HoldOut" partitionings
          attempt to ensure each partition represents the classes
          proportionately.

     "mcreps"
          The value should be a positive integer specifying the number of times
          to resample based on different partitionings.  Currently only works
          with the partition type "HoldOut".

     Only one of "KFold", "HoldOut", "LeaveOut", "Given", "Partition" should be
     specified.  If none is specified, the default is "KFold" with K = 10.

     See also: cvpartition.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 39
Perform cross validation on given data.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 10
datasample


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1168
 -- statistics: Y = datasample (DATA, K)
 -- statistics: Y = datasample (DATA, K, DIM)
 -- statistics: Y = datasample (..., NAME, VALUE)
 -- statistics: [Y, IDCS] = datasample (...)

     Randomly sample data.

     Return K observations randomly sampled from DATA.  DATA can be a vector or
     a matrix of any data.  When DATA is a matrix or an n-dimensional array, the
     samples are the subarrays of size n - 1, taken along the dimension DIM.
     The default value for DIM is 1, that is the row vectors when sampling a
     matrix.

     Output Y is the returned sampled data.  Optional output IDCS is the vector
     of the indices to build Y from DATA.

     Additional options are set through pairs of parameter name and value.
     Available parameters are:

     ‘Replace’
          a logical value that can be ‘true’ (default) or ‘false’: when set to
          ‘true’, ‘datasample’ returns data sampled with replacement.

     ‘Weights’
          a vector of positive numbers that sets the probability of each
          element.  It must have the same size as DATA along dimension DIM.

See also: rand, randi, randperm, randsample.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 21
Randomly sample data.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 4
dcov


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 713
 -- statistics: [DCOR, DCOV, DVARX, DVARY] = dcov (X, Y)

     Distance correlation, covariance and correlation statistics.

     It returns the distance correlation (DCOR) and the distance covariance
     (DCOV) between X and Y, the distance variance of X in (DVARX) and the
     distance variance of Y in (DVARY).

     X and Y must have the same number of observations (rows) but they can have
     different number of dimensions (columns).  Rows with missing values (NaN)
     in either X or Y are omitted.

     The Brownian covariance is the same as the distance covariance:

     cov_W (X, Y) = dCov (X, Y)

     and thus Brownian correlation is the same as distance correlation.

     See also: corr, cov.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 60
Distance correlation, covariance and correlation statistics.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 10
dendrogram


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2276
 -- statistics: dendrogram (TREE)
 -- statistics: dendrogram (TREE, P)
 -- statistics: dendrogram (TREE, PROP, VAL)
 -- statistics: dendrogram (TREE, P, PROP, VAL)
 -- statistics: H = dendrogram (...)
 -- statistics: [H, T, PERM] = dendrogram (...)

     Plot a dendrogram of a hierarchical binary cluster tree.

     Given TREE, a hierarchical binary cluster tree as the output of ‘linkage’,
     plot a dendrogram of the tree.  The number of leaves shown by the
     dendrogram plot is limited to P.  The default value for P is 30.  Set P to
     0 to plot all leaves.

     The optional outputs are H, T and PERM:
        • H is a handle to the lines of the plot.

        • T is the vector with the numbers assigned to each leaf.  Each element
          of T is a leaf of TREE and its value is the number shown in the plot.
          When the dendrogram plot is collapsed, that is, when the number of
          shown leaves P is lower than the total number of leaves, a single
          leaf of the plot can represent more than one leaf of TREE: in that
          case multiple elements of T share the same value, that is the same
          leaf of the plot.  When the dendrogram plot is not collapsed, each
          leaf of the plot is the leaf of TREE with the same number.

        • PERM is the vector of the leaves in the order shown in the plot.

     Additional input properties can be specified by pairs of properties and
     values.  Known properties are:
        • "Reorder" Reorder the leaves of the dendrogram plot using a numerical
          vector of size N, the number of leaves.  When P is smaller than N, the
          reordering cannot break the P groups of leaves.

        • "Orientation" Change the orientation of the plot.  Available values:
          top (default), bottom, left, right.

        • "CheckCrossing" Check if the lines of a reordered dendrogram cross
          each other.  Available values: true (default), false.

        • "ColorThreshold" Not implemented.

        • "Labels" Use a char, string or cellstr array of size N to set the
          label for each leaf; the label is displayed only for nodes with just
          one leaf.

     See also: cluster, clusterdata, cophenet, inconsistent, linkage, pdist.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 56
Plot a dendrogram of a hierarchical binary cluster tree.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
dummyvar


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1542
 -- statistics: D = dummyvar (GROUP)

     Create dummy variables.

     ‘D = dummyvar (GROUP)’ returns a matrix D containing the dummy variables
     associated with the grouping variables in GROUP.  Each row in D corresponds
     to the same observation across all variables in GROUP and each column in D
     corresponds to a separate dummy variable.  D is a numeric matrix of double
     data type containing ones and zeros.

     The grouping variable in GROUP can be specified in one of the following
     options:

        • a positive integer vector representing the different group levels in
          the ordered range ‘1:max (GROUP)’.

        • a positive integer matrix with each column corresponding to a separate
          grouping variable and the integer values representing the group levels
          within that grouping variable in the ordered range ‘1:max (GROUP)’.

        • a categorical column vector, in which case the number and order of
          columns in D correspond to the categories returned by ‘categories
          (GROUP)’.  Categories that are defined but not present in GROUP
          produce columns of zeros.  Elements of GROUP that are ‘<undefined>’
          result in rows of ‘NaN’ values in D.

        • a cell array with its elements containing grouping variables specified
          as any of the above options.  Note that all grouping variables in the
          cell array must have the same number of observations.

     See also: tabulate, grp2idx, grpstats.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 23
Create dummy variables.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 4
ecdf


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2357
 -- statistics: [F, X] = ecdf (Y)
 -- statistics: [F, X, FLO, FUP] = ecdf (Y)
 -- statistics: ecdf (...)
 -- statistics: ecdf (AX, ...)
 -- statistics: [...] = ecdf (Y, NAME, VALUE, ...)
 -- statistics: [...] = ecdf (AX, Y, NAME, VALUE, ...)

     Empirical (Kaplan-Meier) cumulative distribution function.

     ‘[F, X] = ecdf (Y)’ calculates the Kaplan-Meier estimate of the cumulative
     distribution function (cdf), also known as the empirical cdf.  Y is a
     vector of data values.  F is a vector of values of the empirical cdf
     evaluated at X.

     ‘[F, X, FLO, FUP] = ecdf (Y)’ also returns lower and upper confidence
     bounds for the cdf.  These bounds are calculated using Greenwood's formula,
     and are not simultaneous confidence bounds.

     ‘ecdf (...)’ without output arguments produces a plot of the empirical cdf.

     ‘ecdf (AX, ...)’ plots into existing axes AX.

     ‘[...] = ecdf (Y, NAME, VALUE, ...)’ specifies additional parameter
     name/value pairs chosen from the following:

     NAME             VALUE
     ----------------------------------------------------------------------------------
     "censoring"      A boolean vector of the same size as Y that is 1 for
                      observations that are right-censored and 0 for observations
                      that are observed exactly.  Default is all observations
                      observed exactly.
                      
     "frequency"      A vector of the same size as Y containing non-negative integer
                      counts.  The jth element of this vector gives the number of
                      times the jth element of Y was observed.  Default is 1
                      observation per Y element.
                      
     "alpha"          A value ALPHA between 0 and 1 specifying the significance
                      level.  Default is 0.05 for 5% significance.
                      
     "function"       The type of function returned as the F output argument, chosen
                      from "cdf" (the default), "survivor", or "cumulative hazard".
                      
     "bounds"         Either "on" to include bounds or "off" (the default) to omit
                      them.  Used only for plotting.

     Type ‘demo ecdf’ to see examples of usage.

     See also: cdfplot, ecdfhist.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 58
Empirical (Kaplan-Meier) cumulative distribution function.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
einstein


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1436
 -- statistics: einstein ()
 -- statistics: TILES = einstein (A, B)
 -- statistics: [TILES, RHAT] = einstein (A, B)
 -- statistics: [TILES, RHAT, THAT] = einstein (A, B)
 -- statistics: [TILES, RHAT, THAT, SHAT] = einstein (A, B)
 -- statistics: [TILES, RHAT, THAT, SHAT, PHAT] = einstein (A, B)
 -- statistics: [TILES, RHAT, THAT, SHAT, PHAT, FHAT] = einstein (A, B)

     Plots the tiling of the basic clusters of einstein tiles.

     Scalars A and B define the shape of the einstein tile.  See Smith et al
     (2023) for details: <https://arxiv.org/abs/2303.10798>

        • TILES is a structure containing the coordinates of the einstein tiles
          that are tiled on the plot.  Each field contains the tile coordinates
          of the corresponding clusters.
             • TILES.rhat contains the reflected einstein tiles
             • TILES.that contains the three-hat shells
             • TILES.shat contains the single-hat clusters
             • TILES.phat contains the paired-hat clusters
             • TILES.fhat contains the fylfot clusters

        • RHAT contains the coordinates of the first reflected tile
        • THAT contains the coordinates of the first three-hat shell
        • SHAT contains the coordinates of the first single-hat cluster
        • PHAT contains the coordinates of the first paired-hat cluster
        • FHAT contains the coordinates of the first fylfot cluster


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 57
Plots the tiling of the basic clusters of einstein tiles.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 12
evalclusters


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 4758
 -- statistics: EVA = evalclusters (X, CLUST, CRITERION)
 -- statistics: EVA = evalclusters (..., Name, Value)

     Create a clustering evaluation object to find the optimal number of
     clusters.

     ‘evalclusters’ creates a clustering evaluation object to evaluate the
     optimal number of clusters for data X, using criterion CRITERION.  The
     input data X is a matrix with ‘n’ observations of ‘p’ variables.  The
     evaluation criterion CRITERION is one of the following:
     ‘CalinskiHarabasz’
          to create a ‘CalinskiHarabaszEvaluation’ object.

     ‘DaviesBouldin’
          to create a ‘DaviesBouldinEvaluation’ object.

     ‘gap’
          to create a ‘GapEvaluation’ object.

     ‘silhouette’
          to create a ‘SilhouetteEvaluation’ object.

     The clustering algorithm CLUST is one of the following:
     ‘kmeans’
          to cluster the data using ‘kmeans’ with ‘EmptyAction’ set to
          ‘singleton’ and ‘Replicates’ set to 5.

     ‘linkage’
          to cluster the data using ‘clusterdata’ with ‘linkage’ set to ‘Ward’.

     ‘gmdistribution’
          to cluster the data using ‘fitgmdist’ with ‘SharedCov’ set to ‘true’
          and ‘Replicates’ set to 5.

     If the CRITERION is ‘CalinskiHarabasz’, ‘DaviesBouldin’, or ‘silhouette’,
     CLUST can also be a function handle to a function of the form ‘c = clust(x,
     k)’, where X is the input data, K the number of clusters to evaluate and C
     the clustering result.  The clustering result can be either an array of
     size ‘n’ with ‘k’ different integer values, or a matrix of size ‘n’ by ‘k’
     with a likelihood value assigned to each one of the ‘n’ observations for
     each one of the K clusters.  In the latter case, each observation is
     assigned to the cluster with the higher value.  If the CRITERION is
     ‘CalinskiHarabasz’, ‘DaviesBouldin’, or ‘silhouette’, CLUST can also be a
     matrix of size ‘n’ by ‘k’, where ‘k’ is the number of proposed clustering
     solutions, so that each column of CLUST is a clustering solution.

     In addition to the obligatory X, CLUST and CRITERION inputs there is a
     number of optional arguments, specified as pairs of ‘Name’ and ‘Value’
     options.  The known ‘Name’ arguments are:
     ‘KList’
          a vector of positive integer numbers, that is the cluster sizes to
          evaluate.  This option is necessary, unless CLUST is a matrix of
          proposed clustering solutions.

     ‘Distance’
          a distance metric as accepted by the chosen CLUST.  It can be the name
          of the distance metric as a string or a function handle.  When
          CRITERION is ‘silhouette’, it can be a vector as created by function
          ‘pdist’.  Valid distance metric strings are: ‘sqEuclidean’ (default),
          ‘Euclidean’, ‘cityblock’, ‘cosine’, ‘correlation’, ‘Hamming’,
          ‘Jaccard’.  Only used by ‘silhouette’ and ‘gap’ evaluation.

     ‘ClusterPriors’
          the prior probabilities of each cluster, which can be either
          ‘empirical’ (default), or ‘equal’.  When ‘empirical’ the silhouette
          value is the average of the silhouette values of all points; when
          ‘equal’ the silhouette value is the average of the average silhouette
          value of each cluster.  Only used by ‘silhouette’ evaluation.

     ‘B’
          the number of reference datasets generated from the reference
          distribution.  Only used by ‘gap’ evaluation.

     ‘ReferenceDistribution’
          the reference distribution used to create the reference data.  It can
          be ‘PCA’ (default) for a distribution based on the principal
          components of X, or ‘uniform’ for a uniform distribution based on the
          range of the observed data.  ‘PCA’ is currently not implemented.  Only
          used by ‘gap’ evaluation.

     ‘SearchMethod’
          the method for selecting the optimal value with a ‘gap’ evaluation.
          It can be either ‘globalMaxSE’ (default) for selecting the smallest
          number of clusters which is inside the standard error of the maximum
          gap value, or ‘firstMaxSE’ for selecting the first number of clusters
          which is inside the standard error of the following cluster number.
          Only used by ‘gap’ evaluation.

     Output EVA is a clustering evaluation object.

See also: CalinskiHarabaszEvaluation, DaviesBouldinEvaluation, GapEvaluation,
SilhouetteEvaluation.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 77
Create a clustering evaluation object to find the optimal number of clusters.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
factoran


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1446
 -- statistics: LOADINGS = factoran (X, NFAC)
 -- statistics: [LOADINGS, SPECVAR] = factoran (X, NFAC)
 -- statistics: [LOADINGS, SPECVAR, FSCORES] = factoran (X, NFAC)

     Perform principal axis factor analysis on data matrix.

     ‘LOADINGS = factoran (X, NFAC)’ performs principal axis factoring to
     extract NFAC factors from the N x P data matrix X, where rows correspond to
     observations and columns to variables.  The output LOADINGS is a P x NFAC
     matrix whose columns contain the loadings on each factor, in decreasing
     order of importance.

     ‘[LOADINGS, SPECVAR] = factoran (...)’ also returns a P x 1 vector SPECVAR
     containing the specific variances (unique variances) for each variable.

     ‘[LOADINGS, SPECVAR, FSCORES] = factoran (...)’ also returns the N x NFAC
     matrix FSCORES of estimated factor scores, computed using the regression
     method.

     The analysis is performed on the correlation matrix of the standardized X.
     Initial communalities are set to 1.  Iterations continue until the maximum
     change in communality is less than 1e-4 or 50 iterations are reached.  The
     sign of each loading vector is chosen so that the element with largest
     absolute value is positive.

     References
     ----------

       1. Harman, H. H., Modern Factor Analysis, 3rd Edition, University of
          Chicago Press, 1976.

     See also: barttest, pca, pcacov, pcares.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 54
Perform principal axis factor analysis on data matrix.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 4
ff2n


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 434
 -- statistics: DFF2 = ff2n (N)

     Two-level full factorial design.

     ‘DFF2 = ff2n (N)’ gives factor settings DFF2 for a two-level full factorial
     design with N factors.  DFF2 is m-by-N, where m is the number of treatments
     in the full-factorial design.  Each row of DFF2 corresponds to a single
     treatment.  Each column contains the settings for a single factor, with
     values of 0 and 1 for the two levels.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 32
Two-level full factorial design.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 11
fillmissing


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8310
 -- statistics: B = fillmissing (A, "constant", V)
 -- statistics: B = fillmissing (A, METHOD)
 -- statistics: B = fillmissing (A, MOVE_METHOD, WINDOW_SIZE)
 -- statistics: B = fillmissing (A, FILL_FUNCTION, WINDOW_SIZE)
 -- statistics: B = fillmissing (..., DIM)
 -- statistics: B = fillmissing (..., PROPERTYNAME, PROPERTYVALUE)
 -- statistics: [B, IDX] = fillmissing (...)

     Fill missing data in arrays.

     Replace missing entries of array A either with values in V or as determined
     by other specified methods.  'missing' values are determined by the data
     type of A as identified by the function ismissing, currently defined as:

     Standard missing values and their corresponding data types are:

        • NaN - for double, single, duration, and calendarDuration arrays.
        • NaT - for datetime arrays.
        • <missing> - for string arrays.
        • <undefined> - for categorical arrays.
        • {0x0 char} - for cell arrays of character vectors.

     For any data types that do not support missing values, ‘ismissing’ returns
     ‘TF = false (size (A))’.

     A can be a numeric scalar or array, a character vector or array, or a cell
     array of character vectors (a.k.a.  string cells).

     V can be a scalar or an array containing values for replacing the missing
     values in A with a compatible data type for insertion into A.  The shape of
     V must be a scalar or an array with number of elements in V equal to the
     number of elements orthogonal to the operating dimension.  E.g., if
     ‘size(A)’ = [3 5 4], operating along ‘dim’ = 2 requires V to contain either
     1 or 3x4=12 elements.

     If requested, the optional output IDX will contain a logical array the same
     shape as A indicating with 1's which locations in A were filled.

     Alternate Input Arguments and Values:
        • METHOD - replace missing values with:

          ‘next’
          ‘previous’
          ‘nearest’
               next, previous, or nearest non-missing value (nearest defaults to
               next when equidistant as determined by ‘SamplePoints’.)

          ‘linear’
               linear interpolation of neighboring, non-missing values

          ‘spline’
               piecewise cubic spline interpolation of neighboring, non-missing
               values

          ‘pchip’
               'shape preserving' piecewise cubic spline interpolation of
               neighboring, non-missing values

        • MOVE_METHOD - moving window calculated replacement values:

          ‘movmean’
          ‘movmedian’
               moving average or median using a window determined by
               WINDOW_SIZE.  WINDOW_SIZE must be either a positive scalar value
               or a two element positive vector of sizes ‘[NB, NA]’ measured in
               the same units as ‘SamplePoints’.  For scalar values, the window
               is centered on the missing element and includes all data points
               within a distance of half of WINDOW_SIZE on either side of the
               window center point.  Note that for compatibility, when using a
               scalar value, the backward window limit is inclusive and the
               forward limit is exclusive.  If a two-element WINDOW_SIZE vector
               is specified, the window includes all points within a distance of
               NB backward and NA forward from the current element at the window
               center (both limits inclusive).

        • FILL_FUNCTION - custom method specified as a function handle.  The
          supplied fill function must accept three inputs in the following order
          for each missing gap in the data:
          A_VALUES -
               elements of A within the window on either side of the gap as
               determined by WINDOW_SIZE.  (Note these elements can include
               missing values from other nearby gaps.)
          A_LOCS -
               locations of the reference data, A_VALUES, in terms of the
               default or specified ‘SamplePoints’.
          GAP_LOCS -
               location of the gap data points that need to be filled in terms
               of the default or specified ‘SamplePoints’.

          The supplied function must return a scalar or vector with the same
          number of elements in GAP_LOCS.  The required WINDOW_SIZE parameter
          follows similar rules as for the moving average and median methods
          described above, with the two exceptions that (1) each gap is
          processed as a single element, rather than gap elements being
          processed individually, and (2) the window extended on either side of
          the gap has inclusive endpoints regardless of how WINDOW_SIZE is
          specified.

        • DIM - specify a dimension for vector operation (default: first
          non-singleton dimension)

        • PROPERTYNAME-PROPERTYVALUE pairs
          ‘SamplePoints’
               PROPERTYVALUE is a vector of sample point values representing the
               sorted and unique x-axis values of the data in A.  If
               unspecified, the default is assumed to be the vector [1 : SIZE
               (A, DIM)].  The values in ‘SamplePoints’ will affect methods and
               properties that rely on the effective distance between data
               points in A, such as interpolants and moving window functions
               where the WINDOW_SIZE specified for moving window functions is
               measured relative to the ‘SamplePoints’.

          ‘EndValues’
               Apply a separate handling method for missing values at the front
               or back of the array.  PROPERTYVALUE can be:
                  • A constant scalar or array with the same shape requirements
                    as V.
                  • ‘none’ - Do not fill end gap values.
                  • ‘extrap’ - Use the same procedure as METHOD to fill the end
                    gap values.
                  • Any valid METHOD listed above except for ‘movmean’,
                    ‘movmedian’, and ‘fill_function’.  Those methods can only be
                    applied to end gap values with ‘extrap’.

          ‘MissingLocations’
               PROPERTYVALUE must be a logical array the same size as A
               indicating locations of known missing data with a value of
               ‘true’.  (cannot be combined with MaxGap)

          ‘MaxGap’
               PROPERTYVALUE is a numeric scalar indicating the maximum gap
               length to fill, and assumes the same distance scale as the sample
               points.  Gap length is calculated by the difference in locations
               of the sample points on either side of the gap, and gaps larger
               than MaxGap are ignored by FILLMISSING.  (cannot be combined with
               MissingLocations)

     Compatibility Notes:
        • Numerical and logical inputs for A and V may be specified in any
          combination.  The output will be the same class as A, with the V
          converted to that data type for filling.  Only ‘single’ and ‘double’
          have defined 'missing' values, so except for when the
          ‘missinglocations’ option specifies the missing value identification
          of logical and other numeric data types, the output will always be ‘B
          = A’ with ‘IDX = false(size(A))’.
        • All interpolation methods can be individually applied to ‘EndValues’.
        • MATLAB's FILL_FUNCTION method currently has several inconsistencies
          with the other methods (tested against version 2022a), and Octave's
          implementation has chosen the following consistent behavior over
          compatibility: (1) a column full of missing data is considered part of
          ‘EndValues’, (2) such columns are then excluded from FILL_FUNCTION
          processing because the moving window is always empty, (3) operations
          in dimensions higher than 2 perform identically to operations in dims
          1 and 2, most notably on vectors.

     See also: ismissing, rmmissing, standardizeMissing.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 28
Fill missing data in arrays.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 10
fishertest


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2640
 -- statistics: H = fishertest (X)
 -- statistics: H = fishertest (X, PARAM1, VALUE1, ...)
 -- statistics: [H, PVAL] = fishertest (...)
 -- statistics: [H, PVAL, STATS] = fishertest (...)

     Fisher's exact test.

     ‘H = fishertest (X)’ performs Fisher's exact test on a 2x2 contingency
     table given in matrix X.  This is a test of the hypothesis that there are
     no non-random associations between the two 2-level categorical variables in
     X.  ‘fishertest’ returns the result of the tested hypothesis in H.  H = 0
     indicates that the null hypothesis (of no association) cannot be rejected
     at the 5% significance level.  H = 1 indicates that the null hypothesis can
     be rejected at the 5% level.  X must contain only non-negative integers.
     Use the ‘crosstab’ function to generate the contingency table from samples
     of two categorical variables.  Fisher's exact test is not suitable when all
     integers in X are very large.  Instead, use the Chi-square test in this
     case.

     ‘[H, PVAL] = fishertest (X)’ returns the p-value in PVAL.  That is the
     probability of observing the given result, or one more extreme, by chance
     if the null hypothesis is true.  Small values of PVAL cast doubt on the
     validity of the null hypothesis.

     ‘[H, PVAL, STATS] = fishertest (...)’ returns the structure STATS with the
     following fields:

          OddsRatio                - the odds ratio
          ConfidenceInterval       - the asymptotic confidence interval for the odds
                                   ratio.  If any of the four entries in the
                                   contingency table X is zero, the confidence
                                   interval will not be computed, and [-Inf Inf] will
                                   be displayed.

     ‘[...] = fishertest (..., NAME, VALUE, ...)’ specifies one or more of the
     following name/value pairs:

          Name             Value
     -----------------------------------------------------------------------------------
          "alpha"          the significance level.  Default is 0.05.
                           
          "tail"           a string specifying the alternative hypothesis
              "both"               odds ratio not equal to 1, indicating association
                                   between two variables (two-tailed test, default)
              "left"               odds ratio is less than 1 (left-tailed test)
              "right"              odds ratio greater than 1 (right-tailed test)

     See also: crosstab, chi2test, mcnemar_test, ztest2.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 20
Fisher's exact test.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 9
fitcdiscr


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 3812
 -- statistics: MDL = fitcdiscr (X, Y)
 -- statistics: MDL = fitcdiscr (..., NAME, VALUE)

     Fit a Linear Discriminant Analysis classification model.

     ‘MDL = fitcdiscr (X, Y)’ returns a Linear Discriminant Analysis (LDA)
     classification model, MDL, with X being the predictor data, and Y the class
     labels of observations in X.

        • ‘X’ must be a NxP numeric matrix of predictor data where rows
          correspond to observations and columns correspond to features or
          variables.
        • ‘Y’ is Nx1 matrix or cell matrix containing the class labels of
          corresponding predictor data in X.  Y can be numerical, logical, char
          array or cell array of character vectors.  Y must have same number of
          rows as X.

     ‘MDL = fitcdiscr (..., NAME, VALUE)’ returns a Linear Discriminant Analysis
     model with additional options specified by Name-Value pair arguments listed
     below.

     Model Parameters
     ----------------

     NAME              VALUE
                       
     -----------------------------------------------------------------------------------
     "PredictorNames"  A cell array of character vectors specifying the names of the
                       predictors.  The length of this array must match the number of
                       columns in X.
                       
     "ResponseName"    A character vector specifying the name of the response
                       variable.
                       
     "ClassNames"      Names of the classes in the class labels, Y, used for fitting
                       the Discriminant model.  ClassNames are of the same type as
                       the class labels in Y.
                       
     "Prior"           A numeric vector specifying the prior probabilities for each
                       class.  The order of the elements in Prior corresponds to the
                       order of the classes in ClassNames.  Alternatively, you can
                       specify "empirical" to use the empirical class probabilities
                       or "uniform" to assume equal class probabilities.
                       
     "Cost"            A NxR numeric matrix containing misclassification cost for the
                       corresponding instances in X where R is the number of unique
                       categories in Y.  If an instance is correctly classified into
                       its category the cost is calculated to be 1, otherwise 0.
                       Cost matrix can be altered via ‘MDL.COST = somecost’.  Default
                       value COST = ones(rows(X),numel(unique(Y))).
                       
     "DiscrimType"     A character vector or string scalar specifying the type of
                       discriminant analysis to perform.  The only supported value is
                       "linear".
                       
     "FillCoeffs"      A character vector or string scalar with values "on" or "off"
                       specifying whether to fill the coefficients after fitting.  If
                       set to "on", the coefficients are computed during model
                       fitting, which can be useful for prediction.
                       
     "Gamma"           A numeric scalar specifying the regularization parameter for
                       the covariance matrix.  It adjusts the linear discriminant
                       analysis to make the model more stable in the presence of
                       multicollinearity or small sample sizes.  A value of 0
                       corresponds to no regularization, while a value of 1
                       corresponds to a completely regularized model.
                       

     See also: ClassificationDiscriminant.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 56
Fit a Linear Discriminant Analysis classification model.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
fitcgam


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6423
 -- statistics: MDL = fitcgam (X, Y)
 -- statistics: MDL = fitcgam (..., NAME, VALUE)

     Fit a Generalized Additive Model (GAM) for binary classification.

     ‘MDL = fitcgam (X, Y)’ returns a GAM classification model, MDL, with X
     being the predictor data, and Y the binary class labels of observations in
     X.

        • ‘X’ must be a NxP numeric matrix of predictor data where rows
          correspond to observations and columns correspond to features or
          variables.
        • ‘Y’ is Nx1 numeric vector containing binary class labels, typically 0
          or 1.

     ‘MDL = fitcgam (..., NAME, VALUE)’ returns a GAM classification model with
     additional options specified by Name-Value pair arguments listed below.

     Model Parameters
     ----------------

     NAME              VALUE
                       
     -----------------------------------------------------------------------------------
     "PredictorNames"  A cell array of character vectors specifying the names of the
                       predictors.  The length of this array must match the number of
                       columns in X.
                       
     "ResponseName"    A character vector specifying the name of the response
                       variable.
                       
     "ClassNames"      Names of the classes in the class labels, Y, used for fitting
                       the GAM model.  ClassNames must have the same type as
                       the class labels given in Y.
                       
     "Cost"            A NxR numeric matrix containing misclassification cost for the
                       corresponding instances in X where R is the number of unique
                       categories in Y.  If an instance is correctly classified into
                       its category the cost is calculated to be 1, otherwise 0.
                       Cost matrix can be altered via ‘MDL.COST = somecost’.  Default
                       value COST = ones(rows(X),numel(unique(Y))).
                       
     "Formula"         A model specification given as a string in the form "Y ~
                       terms" where Y represents the response variable and terms the
                       predictor variables.  The formula can be used to specify a
                       subset of variables for training model.  For example: "Y ~ x1
                       + x2 + x3 + x4 + x1:x2 + x2:x3" specifies four linear terms
                       for the first four columns of the predictor data, and x1:x2
                       and x2:x3 specify the two interaction terms for 1st-2nd and
                       2nd-3rd columns respectively.  Only these terms will be used
                       for training the model, but X must have at least as many
                       columns as referenced in the formula.  If Predictor Variable
                       names have been defined, then the terms in the formula must
                       refer to the names.  When "formula" is specified, all terms
                       used for training the model are referenced in the IntMatrix
                       field of the OBJ class object as a matrix containing the
                       column indexes for each term including both the predictors and
                       the interactions used.
                       
     "Interactions"    A logical matrix, a positive integer scalar, or the string
                       "all" for defining the interactions between predictor
                       variables.  When given a logical matrix, it must have the same
                       number of columns as X and each row corresponds to a different
                       interaction term combining the predictors indexed as true.
                       Each interaction term is appended as a column vector after the
                       available predictor column in X.  When "all" is defined, then
                       all possible combinations of interactions are appended in X
                       before training.  At the moment, parsing a positive integer
                       has the same effect as the "all" option.  When "interactions"
                       is specified, only the interaction terms appended to X are
                       referenced in the IntMatrix field of the OBJ class object.
                       
     "Knots"           A scalar or a row vector with the same columns as X.  It
                       defines the knots for fitting a polynomial when training the
                       GAM.  As a scalar, it is expanded to a row vector.  The
                       default value is 5, hence expanded to ones (1, columns (X)) *
                       5.  You can parse a row vector with different number of knots
                       for each predictor variable to be fitted with, although not
                       recommended.
                       
     "Order"           A scalar or a row vector with the same columns as X.  It
                       defines the order of the polynomial when training the GAM.  As
                       a scalar, it is expanded to a row vector.  The default value
                       is 3, hence expanded to ones (1, columns (X)) * 3.  You can
                       parse a row vector with different number of polynomial order
                       for each predictor variable to be fitted with, although not
                       recommended.
                       
     "DoF"             A scalar or a row vector with the same columns as X.  It
                       defines the degrees of freedom for fitting a polynomial when
                       training the GAM. As a scalar, it is expanded to a row vector.
                       The default value is 8, hence expanded to ones (1, columns
                       (X)) * 8.  You can parse a row vector with different degrees
                       of freedom for each predictor variable to be fitted with,
                       although not recommended.
                       
     You can pass either a "Formula" or an "Interactions" optional parameter.
     Passing both parameters will result in an error.  Similarly, you can only
     pass up to two parameters among "Knots", "Order", and "DoF" to define the
     required polynomial for training the GAM model.

     See also: ClassificationGAM.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 65
Fit a Generalized Additive Model (GAM) for binary classification.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
fitcknn


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 15459
 -- statistics: MDL = fitcknn (X, Y)
 -- statistics: MDL = fitcknn (..., NAME, VALUE)

     Fit a k-Nearest Neighbor classification model.

     ‘MDL = fitcknn (X, Y)’ returns a k-Nearest Neighbor classification model,
     MDL, with X being the predictor data, and Y the class labels of
     observations in X.

        • ‘X’ must be a NxP numeric matrix of predictor data where rows
          correspond to observations and columns correspond to features or
          variables.
        • ‘Y’ is Nx1 matrix or cell matrix containing the class labels of
          corresponding predictor data in X.  Y can be numerical, logical, char
          array or cell array of character vectors.  Y must have same number of
          rows as X.

     ‘MDL = fitcknn (..., NAME, VALUE)’ returns a k-Nearest Neighbor
     classification model with additional options specified by Name-Value pair
     arguments listed below.

     Model Parameters
     ----------------

     NAME              VALUE
                       
     -----------------------------------------------------------------------------------
     "Standardize"     A boolean flag indicating whether the data in X should be
                       standardized prior to training.
                       
     "PredictorNames"  A cell array of character vectors specifying the predictor
                       variable names.  The variable names are assumed to be in the
                       same order as they appear in the training data X.
                       
     "ResponseName"    A character vector specifying the name of the response
                       variable.
                       
     "ClassNames"      Names of the classes in the class labels, Y, used for fitting
                       the kNN model.  ClassNames are of the same type as the class
                       labels in Y.
                       
     "Prior"           A numeric vector specifying the prior probabilities for each
                       class.  The order of the elements in Prior corresponds to the
                       order of the classes in ClassNames.
                       
     "Cost"            A NxR numeric matrix containing misclassification cost for the
                       corresponding instances in X where R is the number of unique
                       categories in Y.  If an instance is correctly classified into
                       its category the cost is calculated to be 1, otherwise 0.
                       Cost matrix can be altered via ‘MDL.COST = somecost’.  Default
                       value COST = ones(rows(X),numel(unique(Y))).
                       
     "ScoreTransform"  A character vector defining one of the following functions or
                       a user defined function handle, which is used for transforming
                       the prediction scores returned by the ‘predict’ and
                       ‘resubPredict’ methods.  Default value is 'none'.

          VALUE            DESCRIPTION
     -----------------------------------------------------------------------------------
          "doublelogit"    1 ./ (1 + exp (-2 * x))
          "invlogit"       log (x ./ (1 - x))
          "ismax"          Sets the score for the class with the largest score to 1,
                           and sets the scores for all other classes to 0
          "logit"          1 ./ (1 + exp (-x))
          "none"           x (no transformation)
          "identity"       x (no transformation)
          "sign"           -1 for x < 0, 0 for x = 0, 1 for x > 0
          "symmetric"      2 * x - 1
          "symmetricismax" Sets the score for the class with the largest score to 1,
                           and sets the scores for all other classes to -1
          "symmetriclogit" 2 ./ (1 + exp (-x)) - 1

     NAME              VALUE
                       
     -----------------------------------------------------------------------------------
     "BreakTies"       Tie-breaking algorithm used by predict when multiple classes
                       have the same smallest cost.  By default, ties occur when
                       multiple classes have the same number of nearest points among
                       the k nearest neighbors.  The available options are specified
                       by the following character arrays:

          VALUE            DESCRIPTION
                           
     -----------------------------------------------------------------------------------
          "smallest"       This is the default and it favors the class with the
                           smallest index among the tied groups, i.e.  the one that
                           appears first in the training labelled data.
          "nearest"        This favors the class with the nearest neighbor among the
                           tied groups, i.e.  the class with the closest member point
                           according to the distance metric used.
          "random"         This randomly picks one class among the tied groups.

     NAME              VALUE
                       
     -----------------------------------------------------------------------------------
     "BucketSize"      The maximum number of data points in the leaf node of the
                       Kd-tree and it must be a positive integer.  By default, it is
                       50.  This argument is meaningful only when the selected search
                       method is "kdtree".
                       
     "NumNeighbors"    A positive integer value specifying the number of nearest
                       neighbors to be found in the kNN search.  By default, it is 1.
                       
     "Exponent"        A positive scalar (usually an integer) specifying the
                       Minkowski distance exponent.  This argument is only valid when
                       the selected distance metric is "minkowski".  By default it is
                       2.
                       
     "Scale"           A nonnegative numeric vector specifying the scale parameters
                       for the standardized Euclidean distance.  The vector length
                       must be equal to the number of columns in X.  This argument is
                       only valid when the selected distance metric is "seuclidean",
                       in which case each coordinate of X is scaled by the
                       corresponding element of "scale", as is each query point in Y.
                       By default, the scale parameter is the standard deviation of
                       each coordinate in X.  If a variable in X is constant, i.e.
                       zero variance, this value is forced to 1 to avoid division by
                       zero.  This is the equivalent of this variable not being
                       standardized.
                       
     "Cov"             A square matrix with the same number of columns as X
                       specifying the covariance matrix for computing the Mahalanobis
                       distance.  This must be a positive definite matrix, as well.
                       This argument is only valid when the selected distance metric
                       is "mahalanobis".
                       
     "Distance"        is the distance metric used by ‘knnsearch’ as specified below:

          VALUE            DESCRIPTION
                           
     -----------------------------------------------------------------------------------
          "euclidean"      Euclidean distance.
          "seuclidean"     standardized Euclidean distance.  Each coordinate
                           difference between the rows in X and the query matrix Y is
                           scaled by dividing by the corresponding element of the
                           standard deviation computed from X.  To specify a
                           different scaling, use the "Scale" name-value argument.
          "cityblock"      City block distance.
          "chebychev"      Chebychev distance (maximum coordinate difference).
          "minkowski"      Minkowski distance.  The default exponent is 2.  To
                           specify another value, use the "Exponent" name-value
                           argument.
          "mahalanobis"    Mahalanobis distance, computed using a positive definite
                           covariance matrix.  To change the value of the covariance
                           matrix, use the "Cov" name-value argument.
          "cosine"         Cosine distance.
          "correlation"    One minus the sample linear correlation between
                           observations (treated as sequences of values).
          "spearman"       One minus the sample Spearman's rank correlation between
                           observations (treated as sequences of values).
          "hamming"        Hamming distance, which is the percentage of coordinates
                           that differ.
          "jaccard"        One minus the Jaccard coefficient, which is the percentage
                           of nonzero coordinates that differ.
          @DISTFUN         Custom distance function handle.  A distance function of
                           the form ‘function D2 = distfun (XI, YI)’, where XI is a
                           1xP vector containing a single observation in
                           P-dimensional space, YI is an NxP matrix containing an
                           arbitrary number of observations in the same P-dimensional
                           space, and D2 is an Nx1 vector of distances, where D2(k)
                           is the distance between observations XI and YI(k,:).

     NAME              VALUE
                       
     -----------------------------------------------------------------------------------
     "DistanceWeight"  A distance weighting function, specified either as a function
                       handle, which accepts a matrix of nonnegative distances and
                       returns a matrix the same size containing nonnegative distance
                       weights, or one of the following values: "equal", which
                       corresponds to no weighting; "inverse", which corresponds to a
                       weight equal to 1/distance; "squaredinverse", which
                       corresponds to a weight equal to 1/distance^2.
                       
     "IncludeTies"     A boolean flag to indicate if the returned values should
                       contain indices that have the same distance as the K^th
                       neighbor.  When false, ‘knnsearch’ chooses the observation
                       with the smallest index among the observations that have the
                       same distance from a query point.  When true, ‘knnsearch’
                       includes all nearest neighbors whose distances are equal to
                       the K^th smallest distance in the output arguments.  To
                       set K, use the "NumNeighbors" name-value option.
                       
     "NSMethod"        is the nearest neighbor search method used by ‘knnsearch’ as
                       specified below.

          VALUE            DESCRIPTION
                           
     -----------------------------------------------------------------------------------
          "kdtree"         Creates and uses a Kd-tree to find nearest neighbors.
                           "kdtree" is the default value when the number of columns
                           in X is less than or equal to 10, X is not sparse, and the
                           distance metric is "euclidean", "cityblock", "manhattan",
                           "chebychev", or "minkowski".  Otherwise, the default value
                           is "exhaustive".  This argument is only valid when the
                           distance metric is one of the four aforementioned metrics.
          "exhaustive"     Uses the exhaustive search algorithm by computing the
                           distance values from all the points in X to each point in
                           Y.

     Cross Validation Options
     ------------------------

     NAME              VALUE
                       
     -----------------------------------------------------------------------------------
     "Crossval"        Cross-validation flag specified as 'on' or 'off'.  If 'on' is
                       specified, a 10-fold cross validation is performed and a
                       ‘ClassificationPartitionedModel’ is returned in MDL.  To
                       override this cross-validation setting, use only one of the
                       following Name-Value pair arguments.
                       
     "CVPartition"     A ‘cvpartition’ object that specifies the type of
                       cross-validation and the indexing for the training and
                       validation sets.  A ‘ClassificationPartitionedModel’ is
                       returned in MDL and the trained model is stored in the
                       ‘Trained’ property.
                       
     "Holdout"         Fraction of the data used for holdout validation, specified as
                       a scalar value in the range [0,1].  When specified, a randomly
                       selected percentage is reserved as validation data and the
                       remaining set is used for training.  The trained model is
                       stored in the ‘Trained’ property of the
                       ‘ClassificationPartitionedModel’ returned in MDL.  "Holdout"
                       partitioning attempts to ensure that each partition represents
                       the classes proportionately.
                       
     "KFold"           Number of folds to use in the cross-validated model, specified
                       as a positive integer value greater than 1.  When specified,
                       the data are randomly partitioned into k sets and, for each
                       set, that set is reserved as validation data while the
                       remaining k-1 sets are used for training.  The trained models
                       are stored in the ‘Trained’ property of the
                       ‘ClassificationPartitionedModel’ returned in MDL.  "KFold"
                       partitioning attempts to ensure that each partition represents
                       the classes proportionately.
                       
     "Leaveout"        Leave-one-out cross-validation flag specified as 'on' or
                       'off'.  If 'on' is specified, then for each of the n
                       observations (where n is the number of observations, excluding
                       missing observations, specified in the ‘NumObservations’
                       property of the model), one observation is reserved as
                       validation data while the remaining observations are used for
                       training.  The trained models are stored in the ‘Trained’
                       property of the ‘ClassificationPartitionedModel’ returned in
                       MDL.

     See also: ClassificationKNN, ClassificationPartitionedModel, knnsearch,
     rangesearch, pdist2.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 46
Fit a k-Nearest Neighbor classification model.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
fitcnet


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 5743
 -- statistics: MDL = fitcnet (X, Y)
 -- statistics: MDL = fitcnet (..., NAME, VALUE)

     Fit a Neural Network classification model.

     ‘MDL = fitcnet (X, Y)’ returns a Neural Network classification model, MDL,
     with X being the predictor data, and Y the class labels of observations in
     X.

        • ‘X’ must be a NxP numeric matrix of predictor data where rows
          correspond to observations and columns correspond to features or
          variables.
        • ‘Y’ is an Nx1 matrix or cell array with the class labels of the
          corresponding predictor data in X.  Y can hold any type of
          categorical data.  Y must have the same number of rows as X.

     ‘MDL = fitcnet (..., NAME, VALUE)’ returns a Neural Network classification
     model with additional options specified by Name-Value pair arguments listed
     below.

     Model Parameters
     ----------------

     NAME                          VALUE
                                   
     -----------------------------------------------------------------------------------------------
     "Standardize"                 A boolean flag indicating whether the data in X should be
                                   standardized prior to training.
                                   
     "PredictorNames"              A cell array of character vectors specifying the predictor
                                   variable names.  The variable names are assumed to be in the
                                   same order as they appear in the training data X.
                                   
     "ResponseName"                A character vector specifying the name of the response
                                   variable.
                                   
     "ClassNames"                  Names of the classes in the class labels, Y, used for fitting
                                   the Neural Network model.  ClassNames are of the same type as
                                   the class labels in Y.
                                   
     "Prior"                       A numeric vector specifying the prior probabilities for each
                                   class.  The order of the elements in Prior corresponds to the
                                   order of the classes in ClassNames.
                                   
     "LayerSizes"                  A vector of positive integers that defines the sizes of the
                                   fully connected layers in the neural network model.  Each
                                   element in LayerSizes corresponds to the number of outputs for
                                   the respective fully connected layer in the neural network
                                   model.  The default value is 10.
                                   
     "LearningRate"                A positive scalar value that defines the learning rate during
                                   the gradient descent.  Default value is 0.01.
                                   
     "Activations"                 A character vector or a cellstr vector specifying the
                                   activation functions for the hidden layers of the neural
                                   network (excluding the output layer).  The valid options for
                                   activation functions are 'linear', 'sigmoid', 'tanh', and
                                   'none'.  The default function is 'sigmoid'.
                                   
     "OutputLayerActivation"       A character vector specifying the activation function for the
                                   output layer of the neural network.  The valid options for
                                   activation functions are 'linear', 'sigmoid', 'tanh', and
                                   'none'.  The default function is 'sigmoid'.
                                   
     "IterationLimit"              A positive integer scalar that specifies the maximum number of
                                   training iterations.  The default value is 1000.
                                   
     "DisplayInfo"                 A boolean flag indicating whether to print information during
                                   training.  Default is false.
                                   
     "ScoreTransform"              A character vector defining one of the following functions or
                                   a user defined function handle, which is used for transforming
                                   the prediction scores returned by the ‘predict’ and
                                   ‘resubPredict’ methods.  Default value is 'none'.

          VALUE                    DESCRIPTION
     -------------------------------------------------------------------------------------------
          "doublelogit"            1 ./ (1 + exp (-2 * x))
          "invlogit"               log (x ./ (1 - x))
          "ismax"                  Sets the score for the class with the largest score to 1,
                                   and sets the scores for all other classes to 0
          "logit"                  1 ./ (1 + exp (-x))
          "none"                   x (no transformation)
          "identity"               x (no transformation)
          "sign"                   -1 for x < 0, 0 for x = 0, 1 for x > 0
          "symmetric"              2 * x - 1
          "symmetricismax"         Sets the score for the class with the largest score to 1,
                                   and sets the scores for all other classes to -1
          "symmetriclogit"         2 ./ (1 + exp (-x)) - 1

     See also: ClassificationNeuralNetwork.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 42
Fit a Neural Network classification model.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
fitcsvm


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 11100
 -- statistics: MDL = fitcsvm (X, Y)
 -- statistics: MDL = fitcsvm (..., NAME, VALUE)

     Fit a Support Vector Machine classification model.

     ‘MDL = fitcsvm (X, Y)’ returns a Support Vector Machine classification
     model, MDL, with X being the predictor data, and Y the class labels of
     observations in X.

        • ‘X’ must be a NxP numeric matrix of predictor data where rows
          correspond to observations and columns correspond to features or
          variables.
        • ‘Y’ is an Nx1 matrix or cell array with the class labels of
          corresponding predictor data in X.  Y can be numerical, logical, char
          array or cell array of character vectors.  Y must have the same number
          of rows as X.

     ‘MDL = fitcsvm (..., NAME, VALUE)’ returns a Support Vector Machine model
     with additional options specified by Name-Value pair arguments listed
     below.

     Model Parameters
     ----------------

     NAME              VALUE
                       
     -----------------------------------------------------------------------------------
     "Standardize"     A boolean flag indicating whether the data in X should be
                       standardized prior to training.
                       
     "PredictorNames"  A cell array of character vectors specifying the predictor
                       variable names.  The variable names are assumed to be in the
                       same order as they appear in the training data X.
                       
     "ResponseName"    A character vector specifying the name of the response
                       variable.
                       
     "ClassNames"      Names of the classes in the class labels, Y, used for fitting
                       the SVM model.  ClassNames are of the same type as the class
                       labels in Y.
                       
     "SVMtype"         Specifies the type of SVM used for training the
                       ‘ClassificationSVM’ model.  By default, the type of SVM is
                       defined by setting other parameters and/or by the data itself.
                       Setting the "SVMtype" parameter overrides the default behavior
                       and it accepts the following options:

          VALUE            DESCRIPTION
     -----------------------------------------------------------------------------------
          "C_SVC"          It is the standard SVM formulation for classification
                           tasks.  It aims to find the optimal hyperplane that
                           separates different classes by maximizing the margin
                           between them while allowing some misclassifications.  The
                           parameter "C" controls the trade-off between maximizing
                           the margin and minimizing the classification error.  It is
                           the default type, unless otherwise specified.
          "nu_SVC"         It is a variation of the standard SVM that introduces a
                           parameter ν (nu) as an upper bound on the fraction of
                           margin errors and a lower bound on the fraction of support
                           vectors.  This formulation provides more control over the
                           number of support vectors and the margin errors, making it
                           useful for specific classification scenarios.  It is the
                           default type, when the "OutlierFraction" parameter is set.
          "one_class_SVM"  It is used for anomaly detection and novelty detection
                           tasks.  It aims to separate the data points of a single
                           class from the origin in a high-dimensional feature space.
                           This method is particularly useful for identifying
                           outliers or unusual patterns in the data.  It is the
                           default type, when the "Nu" parameter is set or when there
                           is a single class in Y.  When "one_class_SVM" is set by
                           the "SVMtype" pair argument, Y has no effect and any
                           classes are ignored.

     NAME              VALUE
                       
     -----------------------------------------------------------------------------------
     "OutlierFraction" The expected proportion of outliers in the training data,
                       specified as a scalar value in the range [0,1].  When
                       specified, the type of SVM model is switched to "nu_SVC" and
                       "OutlierFraction" defines the ν (nu) parameter.
                       
     "KernelFunction"  A character vector specifying the method for computing
                       elements of the Gram matrix.  The available kernel functions
                       are 'gaussian' or 'rbf', 'linear', 'polynomial', and
                       'sigmoid'.  For one-class learning, the default kernel
                       function is 'rbf'.  For two-class learning the default is
                       'linear'.
                       
     "PolynomialOrder" A positive integer that specifies the order of polynomial in
                       the kernel function.  Default value is 3.  Unless the
                       "KernelFunction" is set to 'polynomial', this parameter is
                       ignored.
                       
     "KernelScale"     A positive scalar that specifies a scaling factor for the γ
                       (gamma) parameter, which can be seen as the inverse of the
                       radius of influence of samples selected by the model as
                       support vectors.  The γ (gamma) parameter is computed as gamma
                       = KernelScale / (number of features).  The default value for
                       "KernelScale" is 1.
                       
     "KernelOffset"    A nonnegative scalar that specifies the coef0 in kernel
                       function.  For the polynomial kernel, it influences the
                       polynomial's shift, and for the sigmoid kernel, it affects the
                       hyperbolic tangent's shift.  The default value for
                       "KernelOffset" is 0.
                       
     "BoxConstraint"   A positive scalar that specifies the upper bound of the
                       Lagrange multipliers, i.e.  the parameter C, which is used for
                       training "C_SVC" and "one_class_SVM" type of models.  It
                       determines the trade-off between maximizing the margin and
                       minimizing the classification error.  The default value for
                       "BoxConstraint" is 1.
                       
     "Nu"              A positive scalar, in the range (0,1] that specifies the
                       parameter ν (nu) for training "nu_SVC" and "one_class_SVM"
                       type of models.  Unless overridden by setting the "SVMtype"
                       parameter, setting the "Nu" parameter always forces the
                       training model type to "one_class_SVM", in which case, the
                       number of classes in Y is ignored.  The default value for "Nu"
                       is 1.
                       
     "CacheSize"       A positive scalar that specifies the memory requirements (in
                       MB) for storing the Gram matrix.  The default is 1000.
                       
     "Tolerance"       A nonnegative scalar that specifies the tolerance of
                       the termination criterion.  Default value is 1e-6.
                       
     "Shrinking"       Specifies whether to use shrinking heuristics.  It accepts
                       either 0 or 1.  The default value is 1.

     Cross Validation Options
     ------------------------

     NAME              VALUE
                       
     -----------------------------------------------------------------------------------
     "Crossval"        Cross-validation flag specified as 'on' or 'off'.  If 'on' is
                       specified, a 10-fold cross validation is performed and a
                       ‘ClassificationPartitionedModel’ is returned in MDL.  To
                       override this cross-validation setting, use only one of the
                       following Name-Value pair arguments.
                       
     "CVPartition"     A ‘cvpartition’ object that specifies the type of
                       cross-validation and the indexing for the training and
                       validation sets.  A ‘ClassificationPartitionedModel’ is
                       returned in MDL and the trained model is stored in the
                       ‘Trained’ property.
                       
     "Holdout"         Fraction of the data used for holdout validation, specified as
                       a scalar value in the range [0,1].  When specified, a randomly
                       selected percentage is reserved as validation data and the
                       remaining set is used for training.  The trained model is
                       stored in the ‘Trained’ property of the
                       ‘ClassificationPartitionedModel’ returned in MDL.  "Holdout"
                       partitioning attempts to ensure that each partition represents
                       the classes proportionately.
                       
     "KFold"           Number of folds to use in the cross-validated model, specified
                       as a positive integer value greater than 1.  When specified,
                       the data are randomly partitioned into k sets and, for each
                       set, that set is reserved as validation data while the
                       remaining k-1 sets are used for training.  The trained models
                       are stored in the ‘Trained’ property of the
                       ‘ClassificationPartitionedModel’ returned in MDL.  "KFold"
                       partitioning attempts to ensure that each partition represents
                       the classes proportionately.
                       
     "Leaveout"        Leave-one-out cross-validation flag specified as 'on' or
                       'off'.  If 'on' is specified, then for each of the n
                       observations (where n is the number of observations, excluding
                       missing observations, specified in the ‘NumObservations’
                       property of the model), one observation is reserved as
                       validation data while the remaining observations are used for
                       training.  The trained models are stored in the ‘Trained’
                       property of the ‘ClassificationPartitionedModel’ returned in
                       MDL.

     See also: ClassificationSVM, ClassificationPartitionedModel, svmtrain,
     svmpredict.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 50
Fit a Support Vector Machine classification model.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 9
fitgmdist


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2994
 -- statistics: GMDIST = fitgmdist (DATA, K, PARAM1, VALUE1, ...)

     Fit a Gaussian mixture model with K components to DATA.  Each row of DATA
     is a data sample.  Each column is a variable.

     Optional parameters are:
        • "start": Initialization conditions.  Possible values are:
             • "randSample" (default) Takes means uniformly from rows of data.
             • "plus" Use k-means++ to initialize means.
             • "cluster" Performs an initial clustering with 10% of the data.
             • VECTOR A vector whose length is the number of rows in data, and
               whose values from 1 to k specify the component each row is
               initially allocated to.  The mean, variance, and weight of each
               component are estimated from that.
             • STRUCTURE A structure with fields mu, Sigma and
               ComponentProportion.
          For "randSample", "plus", and "cluster", the initial variance of each
          component is the variance of the entire data sample.

        • "Replicates": Number of random restarts to perform.

        • "RegularizationValue" or "Regularize": A small number added to the
          diagonal entries of the covariance to prevent singular covariances.

        • "SharedCovariance" or "SharedCov" (logical).  True if all components
          must share the same variance, reducing the number of free parameters.

        • "CovarianceType" or "CovType" (string).  Possible values are:
             • "full" (default) Allow arbitrary covariance matrices.
             • "diagonal" Force covariances to be diagonal, to reduce the number
               of free parameters.

        • "Options": A structure with all of the following fields:
             • MaxIter Maximum number of EM iterations (default 100).
             • TolFun Threshold increase in likelihood to terminate EM (default
               1e-6).
             • Display Possible values are:
                  • "off" (default): Display nothing.
                  • "final": Display the total number of iterations and
                    likelihood once the execution completes.
                  • "iter": Display the iteration count and the likelihood after
                    each iteration.
        • "Weight": A column vector or Nx2 matrix.  The first column consists of
          non-negative weights given to the samples.  If these are all integers,
          this is equivalent to specifying WEIGHT(i) copies of row i of DATA,
          but potentially faster.  If a row of DATA is used to represent samples
          that are similar but not identical, then the second column of WEIGHT
          indicates the variance of those original samples.  Specifically, in
          the EM algorithm, the contribution of row i towards the variance is
          set to at least WEIGHT(i,2), to prevent spurious components with zero
          variance.

     See also: gmdistribution, kmeans.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 55
Fit a Gaussian mixture model with K components to DATA.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 5
fitlm


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 4120
 -- statistics: TAB = fitlm (X, Y)
 -- statistics: TAB = fitlm (X, Y, NAME, VALUE)
 -- statistics: TAB = fitlm (X, Y, MODELSPEC)
 -- statistics: TAB = fitlm (X, Y, MODELSPEC, NAME, VALUE)
 -- statistics: [TAB] = fitlm (...)
 -- statistics: [TAB, STATS] = fitlm (...)

     Regress the continuous outcome (i.e.  dependent variable) Y on continuous
     or categorical predictors (i.e.  independent variables) X by minimizing the
     sum-of-squared residuals.  Unless requested otherwise, fitlm prints the
     model formula, the regression coefficients (i.e.  parameters/contrasts) and
     an ANOVA table.  Note that unlike anovan, fitlm treats all factors as
     continuous by default.  A bootstrap resampling variant of this function,
     ‘bootlm’, is available in the statistics-resampling package and has similar
     usage.

     X must be a column major matrix or cell array consisting of the predictors.
     A constant term (intercept) should not be included in X - it is
     automatically added to the model.  Y must be a column vector corresponding
     to the outcome variable.  MODELSPEC can be specified as one of the
     following:

        • "constant" : model contains only a constant (intercept) term.

        • "linear" (default) : model contains an intercept and linear term for
          each predictor.

        • "interactions" : model contains an intercept, linear term for each
          predictor and all products of pairs of distinct predictors.

        • "full" : model contains an intercept, linear term for each predictor
          and all combinations of the predictors.

        • a matrix of term definitions : a t-by-(N+1) matrix specifying terms
          in a model, where t is the number of terms, N is the number of
          predictor variables, and +1 accounts for the outcome variable.  The
          outcome variable is the last column in the terms matrix and must be a
          column of zeros.  An intercept must be specified in the first row of
          the terms matrix and must be a row of zeros.
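
     As an illustration of the terms-matrix form, the sketch below builds a
     matrix for N = 2 predictors.  It assumes that a 1 in column j marks
     predictor j as taking part in a term, which is not stated in this help
     text.  The fitlm call is shown only as a comment and relies on
     hypothetical variables X and Y.

```octave
% Terms matrix for N = 2 predictors, hence N + 1 = 3 columns.
% Row 1: intercept (all zeros, must come first).
% Row 2: linear term for predictor 1.
% Row 3: linear term for predictor 2.
% Row 4: product of predictors 1 and 2 (assumed 0/1 inclusion convention).
terms = [0 0 0; 1 0 0; 0 1 0; 1 1 0];

% The last column stands for the outcome variable and must be all zeros.
outcome_ok = all (terms(:, end) == 0);

% The first row must be the intercept, i.e. a row of zeros.
intercept_ok = all (terms(1, :) == 0);

% Hypothetical call, assuming X is n-by-2, Y is n-by-1, and the statistics
% package is loaded:
%   [tab, stats] = fitlm (X, Y, terms);
```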

     fitlm can take a number of optional parameters as name-value pairs.

     ‘[...] = fitlm (..., "CategoricalVars", CATEGORICAL)’

        • CATEGORICAL is a vector of indices indicating which of the columns
          (i.e.  variables) in X should be treated as categorical predictors
          rather than as continuous predictors.

     fitlm also accepts optional anovan parameters as name-value pairs (except
     for the "model" parameter).  The accepted parameter names from anovan and
     their default values in fitlm are:

        • CONTRASTS : "treatment"

        • SSTYPE: 2

        • ALPHA: 0.05

        • DISPLAY: "on"

        • WEIGHTS: [] (empty)

        • RANDOM: [] (empty)

        • CONTINUOUS: [1:N]

        • VARNAMES: [] (empty)

     Type 'help anovan' to find out more about what these options do.

     fitlm can return up to two output arguments:

     [TAB] = fitlm (...) returns a cell array containing a table of model
     parameters

     [TAB, STATS] = fitlm (...) returns a structure containing additional
     statistics, including degrees of freedom and effect sizes for each term in
     the linear model, the design matrix, the variance-covariance matrix,
     (weighted) model residuals, and the mean squared error.  The columns of
     STATS.coeffs (from left-to-right) report the model coefficients, standard
     errors, lower and upper 100*(1-alpha)% confidence interval bounds,
     t-statistics, and p-values relating to the contrasts.  The number appended
     to each term name in STATS.coeffnames corresponds to the column number in
     the relevant contrast matrix for that factor.  The STATS structure can be
     used as input for multcompare.  Note that if the model contains a
     continuous variable and you wish to use the STATS output as input to
     multcompare, then the model needs to be refit with the "contrasts"
     parameter set to a sum-to-zero contrast coding scheme, e.g. "simple".

     See also: anovan, multcompare.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Regress the continuous outcome (i.e.  dependent variable) Y on continuous or
...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
fitrgam


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6341
 -- statistics: OBJ = fitrgam (X, Y)
 -- statistics: OBJ = fitrgam (X, Y, NAME, VALUE)

     Fit a Generalized Additive Model (GAM) for regression.

     ‘OBJ = fitrgam (X, Y)’ returns an object of class RegressionGAM, with
     matrix X containing the predictor data and vector Y containing the
     continuous response data.

        • X must be a NxP numeric matrix of input data where rows correspond to
          observations and columns correspond to features or variables.  X will
          be used to train the GAM model.
        • Y must be Nx1 numeric vector containing the response data
          corresponding to the predictor data in X.  Y must have same number of
          rows as X.

     ‘OBJ = fitrgam (..., NAME, VALUE)’ returns an object of class RegressionGAM
     with additional properties specified by Name-Value pair arguments listed
     below.

          NAME             VALUE
                           
     -----------------------------------------------------------------------------------
          "predictors"     Predictor Variable names, specified as a row vector cell
                           of strings with the same length as the columns in X.  If
                           omitted, the program will generate default variable names
                           (x1, x2, ..., xn) for each column in X.
                           
          "responsename"   Response Variable Name, specified as a string.  If
                           omitted, the default value is "Y".
                           
          "formula"        a model specification given as a string in the form "Y ~
                           terms" where Y represents the response variable and terms
                           the predictor variables.  The formula can be used to
                           specify a subset of variables for training model.  For
                           example: "Y ~ x1 + x2 + x3 + x4 + x1:x2 + x2:x3" specifies
                           four linear terms for the first four columns of the
                           predictor data, and x1:x2 and x2:x3 specify the two
                           interaction terms for the 1st-2nd and 2nd-3rd columns,
                           respectively.  Only these terms will be used for training
                           the model, but X must have at least as many columns as
                           referenced in the formula.  If Predictor Variable names
                           have been defined, then the terms in the formula must
                           refer to those.  When "formula" is specified, all
                           terms used for training the model are referenced in the
                           IntMatrix field of the OBJ class object as a matrix
                           containing the column indexes for each term including both
                           the predictors and the interactions used.
                           
          "interactions"   a logical matrix, a positive integer scalar, or the string
                           "all" for defining the interactions between predictor
                           variables.  When given a logical matrix, it must have the
                           same number of columns as X and each row corresponds to a
                           different interaction term combining the predictors
                           indexed as true.  Each interaction term is appended as a
                           column vector after the available predictor column in X.
                           When "all" is defined, then all possible combinations of
                           interactions are appended in X before training.  At the
                           moment, passing a positive integer has the same effect as
                           the "all" option.  When "interactions" is specified, only
                           the interaction terms appended to X are referenced in the
                           IntMatrix field of the OBJ class object.
                           
          "knots"          a scalar or a row vector with the same columns as X.  It
                           defines the knots for fitting a polynomial when training
                           the GAM.  As a scalar, it is expanded to a row vector.  The
                           default value is 5, hence expanded to ones (1, columns
                           (X)) * 5.  You can pass a row vector with a different
                           number of knots for each predictor variable, although
                           this is not recommended.
                           
          "order"          a scalar or a row vector with the same columns as X.  It
                           defines the order of the polynomial when training the GAM.
                           As a scalar, it is expanded to a row vector.  The default
                           value is 3, hence expanded to ones (1, columns (X)) * 3.
                           You can pass a row vector with a different polynomial
                           order for each predictor variable, although this is not
                           recommended.
                           
          "dof"            a scalar or a row vector with the same columns as X.  It
                           defines the degrees of freedom for fitting a polynomial
                           when training the GAM.  As a scalar, it is expanded to a
                           row vector.  The default value is 8, hence expanded to
                           ones (1, columns (X)) * 8.  You can pass a row vector
                           with different degrees of freedom for each predictor
                           variable, although this is not recommended.
                           
          "tol"            a positive scalar to set the tolerance for convergence
                           during training.  By default, it is set to 1e-3.

     You can pass either a "formula" or an "interactions" optional parameter.
     Passing both parameters will result in an error.  Similarly, you can only
     pass up to two parameters among "knots", "order", and "dof" to define the
     required polynomial for training the GAM model.

     See also: RegressionGAM, regress, regress_gp.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 54
Fit a Generalized Additive Model (GAM) for regression.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
friedman


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1838
 -- statistics: P = friedman (X)
 -- statistics: P = friedman (X, REPS)
 -- statistics: P = friedman (X, REPS, DISPLAYOPT)
 -- statistics: [P, TBL] = friedman (...)
 -- statistics: [P, TBL, STATS] = friedman (...)

     Performs the nonparametric Friedman's test to compare column effects in a
     two-way layout.  friedman tests the null hypothesis that the column effects
     are all the same against the alternative that they are not all the same.

     friedman accepts one to three input arguments:

        • X contains the data and it must be a matrix of at least two columns
          and two rows.
        • REPS is the number of replicates for each combination of factor
          groups.  If not provided, no replicates are assumed.
        • DISPLAYOPT is an optional parameter for displaying the Friedman's
          ANOVA table, when it is 'on' and suppressing the display when it is
          'off' (default).

     friedman returns up to three output arguments:

        • P is the p-value for the null hypothesis that the column effects are
          all the same.
        • TBL is a table containing the results of the Friedman's test in ANOVA
          table format.  The table includes columns for Source, SS, df, MS,
          Chi-sq, and Prob>Chi-sq with rows for Columns, [Interaction], Error,
          and Total.
        • STATS is a structure containing statistics useful for performing a
          multiple comparison of medians with the MULTCOMPARE function.

     If friedman is called without any output arguments, then it prints the
     results in a Friedman's ANOVA table to the standard output.

     Examples:

          load popcorn;
          friedman (popcorn, 3);

          [p, anovatab, stats] = friedman (popcorn, 3);
          disp (p);

     See also: anova2, kruskalwallis, multcompare.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Performs the nonparametric Friedman's test to compare column effects in a
two...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
fullfact


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 494
 -- statistics: A = fullfact (LEVELS)

     Full factorial design.

     ‘A = fullfact (LEVELS)’ returns a numeric matrix A with the treatments of a
     full factorial design specified by LEVELS, which must be a numeric vector
     of real positive integer values with each value specifying the number of
     levels of each individual factor.

     Each row of A corresponds to a single treatment and each column to a single
     factor.  For binary full factorial design, use ‘ff2n’.
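
     The structure of such a design can be sketched with core functions only.
     The snippet below is not the implementation of fullfact, and the row order
     it produces is not claimed to match the order returned by fullfact; it
     only shows that a design for LEVELS = [2 3] has prod (LEVELS) treatments,
     one per row, with one column per factor.

```octave
levels = [2 3];                      % factor 1 has 2 levels, factor 2 has 3

% Enumerate every combination of levels with ndgrid.
[f1, f2] = ndgrid (1:levels(1), 1:levels(2));

% One row per treatment, one column per factor.
A = [f1(:), f2(:)];

n_treatments = rows (A);             % equals prod (levels)
```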


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 22
Full factorial design.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
geomean


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1979
 -- statistics: M = geomean (X)
 -- statistics: M = geomean (X, "all")
 -- statistics: M = geomean (X, DIM)
 -- statistics: M = geomean (X, VECDIM)
 -- statistics: M = geomean (..., NANFLAG)

     Compute the geometric mean of X.

        • If X is a vector, then ‘geomean(X)’ returns the geometric mean of the
          elements in X defined as

               geomean (X) = PROD_i X(i) ^ (1/N)

          where N is the length of the X vector.

        • If X is a matrix, then ‘geomean(X)’ returns a row vector with the
          geometric mean of each column in X.

        • If X is a multidimensional array, then ‘geomean(X)’ operates along the
          first nonsingleton dimension of X.

        • X must not contain any negative or complex values.

     ‘geomean(X, "all")’ returns the geometric mean of all the elements in X.
     If X contains any 0, then the returned value is 0.

     ‘geomean(X, DIM)’ returns the geometric mean along the operating dimension
     DIM of X.  Calculating the geometric mean of any subarray containing any 0
     will return 0.

     ‘geomean(X, VECDIM)’ returns the geometric mean over the dimensions
     specified in the vector VECDIM.  For example, if X is a 2-by-3-by-4 array,
     then ‘geomean(X, [1 2])’ returns a 1-by-1-by-4 array.  Each element of the
     output array is the geometric mean of the elements on the corresponding
     page of X.  If VECDIM indexes all dimensions of X, then it is equivalent to
     ‘geomean (X, "all")’.  Any dimension in VECDIM greater than ‘ndims (X)’ is
     ignored.

     ‘geomean(..., NANFLAG)’ specifies whether to exclude NaN values from the
     calculation, using any of the input argument combinations in previous
     syntaxes.  By default, geomean includes NaN values in the calculation
     (NANFLAG has the value "includenan").  To exclude NaN values, set the value
     of NANFLAG to "omitnan".
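
     The defining formula can be checked with core functions alone.  The sketch
     below evaluates the product form given above and the equivalent log form
     for a small vector; it does not call geomean itself, and the variable
     names are illustrative.

```octave
x = [1 2 4 8];
n = numel (x);

% Product form: N-th root of the product of the elements.
gm_prod = prod (x) ^ (1 / n);

% Equivalent log form: exponential of the mean of the logarithms.
gm_log = exp (mean (log (x)));

% Both evaluate to sqrt (8), approximately 2.8284.
```

     The log form avoids overflow of the intermediate product for long vectors,
     which is one reason it is a common way to evaluate a geometric mean.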

     See also: harmmean, mean.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 32
Compute the geometric mean of X.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
glmfit


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 5565
 -- statistics: B = glmfit (X, Y, DISTRIBUTION)
 -- statistics: B = glmfit (X, Y, DISTRIBUTION, NAME, VALUE)
 -- statistics: [B, DEV] = glmfit (...)
 -- statistics: [B, DEV, STATS] = glmfit (...)

     Perform generalized linear model fitting.

     ‘B = glmfit (X, Y, DISTRIBUTION)’ returns a vector B of coefficient
     estimates for a generalized linear regression model of the responses in Y
     on the predictors in X, using the distribution defined in DISTRIBUTION.

        • X is an nxp numeric matrix of predictor variables with n observations
          and p predictors.
        • Y is an nx1 numeric vector of responses for all supported
          distributions, except for the 'binomial' distribution in which case Y
          can be either a numeric or logical nx1 vector or an nx2 matrix, where
          the first column contains the number of successes and the second
          column contains the number of trials.
        • DISTRIBUTION is a character vector specifying the distribution of the
          response variable.  Supported distributions are "normal", "binomial",
          "poisson", "gamma", and "inverse gaussian".

     ‘B = glmfit (..., NAME, VALUE)’ specifies additional options using
     Name-Value pair arguments.

     NAME              VALUE
                       
     -----------------------------------------------------------------------------------
     "B0"              A numeric vector specifying initial values for the coefficient
                       estimates.  By default, the initial values are fitted from
                       the data.
                       
     "Constant"        A character vector specifying whether to include a constant
                       term in the model.  Valid options are "ON" (default) and
                       "OFF".
                       
     "EstDisp"         A character vector specifying whether to compute dispersion
                       parameter.  Valid options are "ON" and "OFF".  For "binomial"
                       and "poisson" distributions the default is "OFF", whereas for
                       the "normal", "gamma", and "inverse gaussian" distributions
                       the default is "ON".
                       
     "link"            A character vector specifying the name of a canonical link
                       function or a numeric scalar for specifying a "power" link
                       function.  Supported canonical link functions include
                       "identity" (default for "normal" distribution), "log" (default
                       for "poisson" distribution), "logit" (default for "binomial"
                       distribution), "probit", "loglog", "comploglog", and
                       "reciprocal" (default for the "gamma" distribution).  The
                       "power" link function is the default for the "inverse
                       gaussian" distribution with p = -2.  For custom link
                       functions, the user can provide a cell array with three
                       function handles: the link function, its derivative, and
                       its inverse, or alternatively a structure S with three
                       fields: S.Link, S.Derivative, and S.Inverse.  Each field can
                       either contain a function handle or a character vector with
                       the name of an existing function.  All custom link functions
                       must accept a vector of inputs and return a vector of the
                       same size.
                       
     "Offset"          A numeric vector of the same length as the response Y
                       specifying an offset variable in the fit.  It is used as an
                       additional predictor with a coefficient value fixed at 1.
                       
     "Options"         A scalar structure containing the fields MaxIter and TolX.
                       MaxIter must be a scalar positive integer specifying the
                       maximum number of iterations allowed for fitting the model,
                       and TolX must be a positive scalar value specifying the
                       termination tolerance.
                       
     "Weights"         An nx1 numeric vector of nonnegative values, where n is the
                       number of observations in X.  By default, it is ‘ones (n, 1)’.
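
     As a sketch of the custom "link" form described in the table, the snippet
     below assembles the three function handles (link, derivative, inverse)
     for a logit link and checks that they are mutually consistent.  The
     handle names are illustrative, and the glmfit call is shown only as a
     comment with hypothetical data X and Y.

```octave
% Logit link, its derivative with respect to mu, and its inverse.
link  = @(mu) log (mu ./ (1 - mu));
dlink = @(mu) 1 ./ (mu .* (1 - mu));
ilink = @(eta) 1 ./ (1 + exp (-eta));

% Cell array in the order stated above: link, derivative, inverse.
custom_link = {link, dlink, ilink};

% Consistency checks on a few probabilities.
mu = [0.1; 0.5; 0.9];
roundtrip_err = max (abs (ilink (link (mu)) - mu));

% Central finite difference as a check of the derivative handle.
h = 1e-6;
fd = (link (mu + h) - link (mu - h)) / (2 * h);
deriv_err = max (abs (fd - dlink (mu)));

% Hypothetical call, assuming X and Y exist and the statistics package is
% loaded:
%   b = glmfit (X, Y, "binomial", "link", custom_link);
```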

     ‘[B, DEV] = glmfit (...)’ also returns the deviance of the fit as a numeric
     value in DEV.  Deviance is a generalization of the residual sum of squares.
     It measures the goodness of fit compared to a saturated model.

     ‘[B, DEV, STATS] = glmfit (...)’ also returns the structure STATS, which
     contains the model statistics in the following fields:

        • beta - Coefficient estimates B
        • dfe - Degrees of freedom for error
        • sfit - Estimated dispersion parameter
        • s - Theoretical or estimated dispersion parameter
        • estdisp - ‘false’ when "EstDisp" is "off" and ‘true’ when "EstDisp" is
          "on"
        • covb - Estimated covariance matrix for B
        • se - Vector of standard errors of the coefficient estimates B
        • coeffcorr - Correlation matrix for B
        • t - t statistics for B
        • p - p-values for B
        • resid - Vector of residuals
        • residp - Vector of Pearson residuals
        • residd - Vector of deviance residuals
        • resida - Vector of Anscombe residuals

     See also: glmval.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 41
Perform generalized linear model fitting.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
glmval


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2043
 -- statistics: YHAT = glmval (B, X, LINK)
 -- statistics: [YHAT, Y_LO, Y_HI] = glmval (B, X, LINK, STATS)
 -- statistics: [...] = glmval (..., NAME, VALUE)

     Predict values for a generalized linear model.

     ‘YHAT = glmval (B, X, LINK)’ returns the predicted values for the
     generalized linear model with a vector of coefficient estimates B, a matrix
     of predictors X, in which each column corresponds to a distinct predictor
     variable, and a link function LINK, which can be any of the character
     vectors, numeric scalar, or custom-defined link functions used as values
     for the "link" name-value pair argument in the ‘glmfit’ function.

     ‘[YHAT, Y_LO, Y_HI] = glmval (B, X, LINK, STATS)’ also returns the 95%
     confidence intervals for the predicted values according to the model's
     statistics contained in the STATS structure, which is the output of the
     ‘glmfit’ function.  By default, the confidence intervals are
     nonsimultaneous, and apply to the fitted curve instead of new observations.

     ‘[...] = glmval (..., NAME, VALUE)’ specifies additional options using
     Name-Value pair arguments.

     NAME              VALUE
                       
     -----------------------------------------------------------------------------------
     "confidence"      A scalar value between 0 and 1 specifying the confidence level
                       for the confidence bounds.
                       
     "Constant"        A character vector specifying whether to include a constant
                       term in the model.  Valid options are "ON" (default) and
                       "OFF".
                       
     "simultaneous"    Specifies whether to compute simultaneous confidence
                       intervals.  Options are "ON" or "OFF" (default).
                       
     "size"            A numeric scalar or a vector with one value for each row of X
                       specifying the size parameter N for a binomial model.

     See also: glmfit.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 46
Predict values for a generalized linear model.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 14
gmdistribution


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1185
 -- statistics: GMDIST = gmdistribution (MU, SIGMA)
 -- statistics: GMDIST = gmdistribution (MU, SIGMA, P)
 -- statistics: GMDIST = gmdistribution (MU, SIGMA, P, EXTRA)

     Create an object of the gmdistribution class which represents a Gaussian
     mixture model with k components of n-dimensional Gaussians.

     Input MU is a k-by-n matrix specifying the n-dimensional mean of each of
     the k components of the distribution.

     Input SIGMA is an array that specifies the variances of the distributions,
     in one of four forms depending on its dimension.
        • n-by-n-by-k: Slice SIGMA(:,:,i) is the variance of the i'th component
        • 1-by-n-by-k: Slice diag(SIGMA(1,:,i)) is the variance of the i'th
          component
        • n-by-n: SIGMA is the variance of every component
        • 1-by-n: diag(SIGMA) is the variance of every component
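
     To illustrate the diagonal per-component form, the sketch below builds a
     1-by-n-by-k SIGMA for k = 2 components in n = 2 dimensions and expands
     each slice to the full matrix it stands for.  The numeric values are
     arbitrary, and the gmdistribution call is shown only as a comment.

```octave
k = 2;                               % number of components
n = 2;                               % dimension of each Gaussian
mu = [0 0; 3 3];                     % k-by-n matrix of component means

% 1-by-n-by-k form: one row of diagonal variances per component.
sigma = zeros (1, n, k);
sigma(1, :, 1) = [1.0 0.5];
sigma(1, :, 2) = [2.0 0.2];

% Equivalent n-by-n-by-k form: slice i is diag (sigma(1, :, i)).
full_sigma = zeros (n, n, k);
for i = 1:k
  full_sigma(:, :, i) = diag (sigma(1, :, i));
end

% Hypothetical call, assuming the statistics package is loaded:
%   gm = gmdistribution (mu, sigma);
```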

     If P is specified, it is a vector of length k specifying the proportion of
     each component.  If it is omitted or empty, each component has an equal
     proportion.

     Input EXTRA is used by fitgmdist to indicate the parameters of the fitting
     process.

     See also: fitgmdist.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Create an object of the gmdistribution class which represents a Gaussian mixt...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
grp2idx


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1781
 -- statistics: G = grp2idx (S)
 -- statistics: [G, GN] = grp2idx (S)
 -- statistics: [G, GN, GL] = grp2idx (S)

     Get index for grouping variable.

     ‘G = grp2idx (S)’ returns a numeric column vector of integer values G
     indexing the distinct groups in the grouping variable S.  S can be
     specified as any of the following data types:

        • categorical vector
        • cell array of character vectors
        • character array
        • duration vector
        • logical vector
        • numeric vector

     S must be a vector, unless it is a 2-D character array.  In the case of
     numerical and logical data types, the group indices are ordered in sorted
     order of S.  In the case of categorical arrays, the group indices are
     allocated by the order of the categories in S.  For the rest of the data
     types, the group indices are allocated by order of first appearance in S.
     Note that in case of a categorical grouping variable, the indexing integer
     values might not be continuous, since S may contain unassigned categories.
     For every other data type, G will contain integer values in the range
     [1:K], where K is the number of distinct groups in S.

     ‘[G, GN] = grp2idx (S)’ also returns a cell array of character vectors GN
     representing the list of group names.  The order of the group names in GN
     follows the same pattern as the group indices in G according to the data
     type of S, as described above.

     ‘[G, GN, GL] = grp2idx (S)’ further returns a column vector GL representing
     the list of the group levels with the same data type as S.

     Note that standard missing values in S appear as NaN in G and are not
     present in either GN or GL.
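
     For a numeric grouping variable, the sorted ordering described above can
     be mimicked with the core function unique.  This is only a sketch of the
     expected outputs for numeric input without missing values, not the
     implementation of grp2idx; the use of num2str for the group names is an
     assumption.

```octave
s = [30; 10; 20; 10; 30];            % numeric grouping variable

% gl holds the sorted distinct values; g indexes into gl for each element.
[gl, ~, g] = unique (s);
g = g(:);                            % force a column vector of indices

% Group names as a cell array of character vectors (assumed formatting).
gn = arrayfun (@num2str, gl, "UniformOutput", false);
```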

     See also: grpstats.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 32
Get index for grouping variable.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
grpstats


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6946
 -- statistics: STATS = grpstats (X)
 -- statistics: STATS = grpstats (X, GROUP)
 -- statistics: [STATS1, ..., STATSN] = grpstats (X, GROUP, WHICHSTATS)
 -- statistics: [STATS1, ..., STATSN] = grpstats (X, GROUP, WHICHSTATS, 'Alpha',
          ALPHA)
 -- statistics: TBLSTATS = grpstats (TBL, GROUPVARS)
 -- statistics: TBLSTATS = grpstats (TBL, GROUPVARS, WHICHSTATS)
 -- statistics: TBLSTATS = grpstats (TBL, GROUPVARS, WHICHSTATS, NAME, VALUE)
 -- statistics: grpstats (X, GROUP, ALPHA)
 -- statistics: H = grpstats (X, GROUP, ALPHA)

     Summary statistics by group.

     ‘grpstats’ computes groupwise summary statistics for the data in X, which
     can be a numeric matrix or a table.  Numeric vectors are treated as a
     single column matrix.  NaNs are treated as missing values and removed from
     calculations.

     Syntax for Numeric Input
     ------------------------

     ‘STATS = grpstats (X)’ calculates the mean statistic for each column in X
     and returns it as a row vector in STATS.

     ‘STATS = grpstats (X, GROUP)’ calculates the mean statistic for each column
     in X grouped by GROUP.  The returned argument, STATS, is a matrix with the
     same number of columns as X and one row for each group specified by GROUP.
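
     The default groupwise mean can be sketched with core functions.  The
     snippet below is not the implementation of grpstats and ignores NaN
     handling; it only shows the shape of the result for a single-column X,
     namely one row per group.

```octave
x = [1; 2; 3; 4; 5; 6];              % single-column data
group = [1; 1; 2; 2; 2; 1];          % numeric grouping variable

% Map each row to a group index in sorted order of the group values.
[labels, ~, idx] = unique (group);

% Mean of x within each group, one row per group.
group_means = accumarray (idx(:), x, [], @mean);

% Hypothetical equivalent, assuming the statistics package is loaded:
%   stats = grpstats (x, group);
```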

     The grouping variable, GROUP can be a vector of any data type supported by
     the ‘grp2idx’ function.  Alternatively, it can be a cell vector specifying
     multiple grouping variables with each cell element containing any of the
     aforementioned supported grouping vectors.  If GROUP is empty (‘[]’), then
     input X is treated as a single group.

     ‘[STATS1, ..., STATSN] = grpstats (X, GROUP, WHICHSTATS)’ calculates the
     summary statistics specified by the WHICHSTATS argument, which can include
     any of the available statistics shown below.  The number of output
     arguments must match the number of requested statistics specified in
     WHICHSTATS.

     X must be a numeric vector or a 2-D matrix.  Vectors are treated as a
     single-column matrix.

     GROUP is a grouping variable that defines the groups for the rows of X.  It
     can be a categorical variable, numeric vector, string array, or cell array
     of strings.  GROUP can also be a cell array containing multiple grouping
     variables.  If GROUP is empty (‘[]’) or omitted, all of X is treated as a
     single group.

     WHICHSTATS specifies the statistics to compute.  It can be either a string
     array or a cell array of strings specifying any of the following builtin
     statistics.  If omitted, the default is "mean".  WHICHSTATS can also
     contain function handles for custom statistics.

     The available statistics are:
          "mean"           Mean of each group.
          "median"         Median of each group.
          "sem"            Standard error of the mean for each group.
          "std"            Standard deviation of each group.
          "var"            Variance of each group.
          "min"            Minimum value in each group.
          "max"            Maximum value in each group.
          "range"          Difference between max and min in each group.
          "numel"          Number of elements (count) in each group.
          "meanci"         Confidence interval for the mean.
          "predci"         Prediction interval for a new observation.
          "gname"          Group names.

     ‘[...] = grpstats (..., 'Alpha', ALPHA)’ specifies the significance level
     for the confidence intervals ("meanci" and "predci") as ‘100 *
     (1-ALPHA)%’.  ALPHA must be a scalar between 0 and 1.  When not specified,
     it defaults to 0.05.  Note that this paired input argument is also valid
     for table input.

     Syntax for Table Input
     ----------------------

     ‘TBLSTATS = grpstats (TBL, GROUPVARS)’ computes the summary statistics for
     the data in table TBL, grouped by the variables specified in GROUPVARS.  If
     GROUPVARS is empty or omitted, then all of TBL is treated as a single
     group.  GROUPVARS can be a cell array of character vectors or a string
     array specifying one or more variable names in TBL to be used as grouping
     variables.  Alternatively, all valid methods for indexing table variables
     are supported (e.g.  ‘vartype’ object, logical vector, function handle).

     The output TBLSTATS is a table with one row for each group.  It contains
     the grouping variables, an additional "GroupCount" variable, and the
     specified summary statistics for the variables in TBL, except for those
     specified as grouping variables.  When input is a table, only a single
     output variable, TBLSTATS, can be returned.  The output TBLSTATS also
     contains RowNames, which are the unique combinations of the specified
     groups, for which data are available in TBL.  When no groups are specified,
     the row name of the single row output table defaults to 'All'.

     ‘TBLSTATS = grpstats (TBL, GROUPVARS, WHICHSTATS)’ specifies which
     statistics to calculate for the variables in TBL.  Unless specified, the
     mean is calculated for each variable.  When specifying more than one
     statistic, TBLSTATS contains multiple variables for each variable in TBL
     and each is named by combining the applied statistic with the name of the
     original variable.  When a function handle is applied, its string
     representation is used instead.

     For table input specifically, ‘grpstats’ also accepts the following paired
     arguments.

     Name                  Value
     -----------------------------------------------------------------------------------
     'DataVars'            A vector specifying the variables in TBL, for which to
                           calculate the specified statistics.  The vector can be any
                           of the valid options for indexing table variables.
                           
     'VarNames'            A cell array of character vectors or a string array
                           specifying the names of the variables in the output table.
                           The number of specified names must match the number of
                           expected variables in the output table.

     Plotting Syntax
     ---------------

     The syntax ‘grpstats (X, GROUP, ALPHA)’ generates an ‘errorbar’ plot with
     the group means and their respective confidence intervals.  X must be a
     numeric vector or matrix.  ALPHA is a scalar between 0 and 1 that
     determines the confidence level.  This syntax is an alternative to calling
     ‘errorbar’ after computing "mean" and "meanci" statistics.  The optional
     output H is a handle to the hggroup object representing the data plot and
     errorbars.

     See also: grp2idx.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 28
Summary statistics by group.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
gscatter


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1791
 -- statistics: gscatter (X, Y, G)
 -- statistics: gscatter (X, Y, G, CLR, SYM, SIZ)
 -- statistics: gscatter (..., DOLEG, XNAM, YNAM)
 -- statistics: H = gscatter (...)

     Draw a scatter plot with grouped data.

     ‘gscatter’ is a utility function to draw a scatter plot of X and Y,
     according to the groups defined by G.  Input X and Y are numeric vectors of
     the same size, while G is either a vector of the same size as X or a
     character matrix with the same number of rows as the size of X.  As a
     vector G can be numeric, logical, a character array, a string array (not
     implemented), a cell string or cell array.

     A number of optional inputs change the appearance of the plot:
        • "CLR" defines the color for each group; if not enough colors are
          defined by "CLR", ‘gscatter’ cycles through the specified colors.
          Colors can be defined as named colors, as rgb triplets or as indices
          for the current ‘colormap’.  The default value is a different color
          for each group, according to the current ‘colormap’.

        • "SYM" is a char array of symbols for each group; if not enough symbols
          are defined by "SYM", ‘gscatter’ cycles through the specified symbols.

        • "SIZ" is a numeric array of sizes for each group; if not enough sizes
          are defined by "SIZ", ‘gscatter’ cycles through the specified sizes.

        • "DOLEG" is a boolean value to show the legend; it can be either on
          (default) or off.

        • "XNAM" is a character array, the name for the x axis.

        • "YNAM" is a character array, the name for the y axis.

     Output H is an array of graphics handles to the ‘line’ object of each
     group.

See also: scatter.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 38
Draw a scatter plot with grouped data.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
harmmean


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1985
 -- statistics: M = harmmean (X)
 -- statistics: M = harmmean (X, "all")
 -- statistics: M = harmmean (X, DIM)
 -- statistics: M = harmmean (X, VECDIM)
 -- statistics: M = harmmean (..., NANFLAG)

     Compute the harmonic mean of X.

        • If X is a vector, then ‘harmmean(X)’ returns the harmonic mean of the
          elements in X defined as

               harmmean (X) = N / SUM_i X(i)^-1

          where N is the length of the X vector.

        • If X is a matrix, then ‘harmmean(X)’ returns a row vector with the
          harmonic mean of every column in X.

        • If X is a multidimensional array, then ‘harmmean(X)’ operates along
          the first nonsingleton dimension of X.

        • X must not contain any negative or complex values.

     ‘harmmean(X, "all")’ returns the harmonic mean of all the elements in X.
     If X contains any 0, then the returned value is 0.

     ‘harmmean(X, DIM)’ returns the harmonic mean along the operating dimension
     DIM of X.  Calculating the harmonic mean of any subarray containing any 0
     will return 0.

     ‘harmmean(X, VECDIM)’ returns the harmonic mean over the dimensions
     specified in the vector VECDIM.  For example, if X is a 2-by-3-by-4 array,
     then ‘harmmean(X, [1 2])’ returns a 1-by-1-by-4 array.  Each element of the
     output array is the harmonic mean of the elements on the corresponding page
     of X.  If VECDIM indexes all dimensions of X, then it is equivalent to
     ‘harmmean (X, "all")’.  Any dimension in VECDIM greater than ‘ndims (X)’ is
     ignored.

     ‘harmmean(..., NANFLAG)’ specifies whether to exclude NaN values from the
     calculation, using any of the input argument combinations in previous
     syntaxes.  By default, harmmean includes NaN values in the calculation
     (NANFLAG has the value "includenan").  To exclude NaN values, set the value
     of NANFLAG to "omitnan".

     See also: geomean, mean.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 31
Compute the harmonic mean of X.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 5
hist3


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2098
 -- statistics: hist3 (X)
 -- statistics: hist3 (X, NBINS)
 -- statistics: hist3 (X, "Nbins", NBINS)
 -- statistics: hist3 (X, CENTERS)
 -- statistics: hist3 (X, "Ctrs", CENTERS)
 -- statistics: hist3 (X, "Edges", EDGES)
 -- statistics: [N, C] = hist3 (...)
 -- statistics: hist3 (..., PROP, VAL, ...)
 -- statistics: hist3 (HAX, ...)

     Produce bivariate (2D) histogram counts or plots.

     The elements to produce the histogram are taken from the Nx2 matrix X.  Any
     rows with NaN values are ignored.  The actual bins can be configured in
     three ways: number, centers, or edges of the bins:

     Number of bins (default)
          Produces equally spaced bins between the minimum and maximum values of
          X.  Defined as a 2 element vector, NBINS, one for each dimension.
          Defaults to ‘[10 10]’.

     Center of bins
          Defined as a cell array of 2 monotonically increasing vectors,
          CENTERS.  The width of each bin is determined from the adjacent values
          in the vector with the initial and final bins extending to infinity.

     Edge of bins
          Defined as a cell array of 2 monotonically increasing vectors, EDGES.
          ‘N(i,j)’ contains the number of elements in X for which:

               EDGES{1}(i) <= X(:,1) < EDGES{1}(i+1)
               EDGES{2}(j) <= X(:,2) < EDGES{2}(j+1)

          The consequence of this definition is that values outside the initial
          and final edge values are ignored, and that the final bin only
          contains the number of elements exactly equal to the final edge.

     The return values, N and C, are the bin counts and centers respectively.
     They are especially useful to produce intensity maps:

          [counts, centers] = hist3 (data);
          imagesc (centers{1}, centers{2}, counts)

     If there is no output argument, or if the axes graphics handle HAX is
     defined, the function will plot a 3 dimensional bar graph.  Any extra
     property/value pairs are passed directly to the underlying surface object.

     See also: hist, histc, lookup, mesh.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 49
Produce bivariate (2D) histogram counts or plots.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
histfit


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1432
 -- statistics: histfit (X)
 -- statistics: histfit (X, NBINS)
 -- statistics: histfit (X, NBINS, DISTNAME)
 -- statistics: histfit (AX, ...)
 -- statistics: H = histfit (...)

     Plot histogram with superimposed distribution fit.

     ‘histfit (X)’ plots a histogram of the values in the vector X using the
     number of bins equal to the square root of the number of non-missing
     elements in X and superimposes a fitted normal density function.

     ‘histfit (X, NBINS)’ plots a histogram of the values in the vector X using
     NBINS number of bins in the histogram and superimposes a fitted normal
     density function.

     ‘histfit (X, NBINS, DISTNAME)’ plots a histogram of the values in the
     vector X using NBINS number of bins in the histogram and superimposes a
     fitted density function from the distribution specified by DISTNAME.

     ‘histfit (AX, ...)’ uses the axes handle AX to plot the histogram and the
     fitted density function onto, followed by any of the input argument
     combinations outlined in the previous syntaxes.

     ‘H = histfit (...)’ returns a vector of handles H, where H(1) is the handle
     to the histogram and H(2) is the handle to the density curve.

     Note: calling ‘histfit’ without any input arguments will return a cell
     array of character vectors listing all supported distributions.

     See also: bar, hist, normplot, fitdist.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 50
Plot histogram with superimposed distribution fit.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 11
hmmestimate


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 4535
 -- statistics: [TRANSPROBEST, OUTPROBEST] = hmmestimate (SEQUENCE, STATES)
 -- statistics: [...] = hmmestimate (..., "statenames", STATENAMES)
 -- statistics: [...] = hmmestimate (..., "symbols", SYMBOLS)
 -- statistics: [...] = hmmestimate (..., "pseudotransitions",
          PSEUDOTRANSITIONS)
 -- statistics: [...] = hmmestimate (..., "pseudoemissions", PSEUDOEMISSIONS)

     Estimation of a hidden Markov model for a given sequence.

     Estimate the matrix of transition probabilities and the matrix of output
     probabilities of a given sequence of outputs and states generated by a
     hidden Markov model.  The model assumes that the generation starts in state
     ‘1’ at step ‘0’ but does not include step ‘0’ in the generated states and
     sequence.

     Arguments
     ---------

        • SEQUENCE is a vector of a sequence of given outputs.  The outputs must
          be integers ranging from ‘1’ to the number of outputs of the hidden
          Markov model.

        • STATES is a vector of the same length as SEQUENCE of given states.
          The states must be integers ranging from ‘1’ to the number of states
          of the hidden Markov model.

     Return values
     -------------

        • TRANSPROBEST is the matrix of the estimated transition probabilities
          of the states.  ‘transprobest(i, j)’ is the estimated probability of a
          transition to state ‘j’ given state ‘i’.

        • OUTPROBEST is the matrix of the estimated output probabilities.
          ‘outprobest(i, j)’ is the estimated probability of generating output
          ‘j’ given state ‘i’.

     If ‘'symbols'’ is specified, then SEQUENCE is expected to be a sequence of
     the elements of SYMBOLS instead of integers.  SYMBOLS can be a cell array.

     If ‘'statenames'’ is specified, then STATES is expected to be a sequence of
     the elements of STATENAMES instead of integers.  STATENAMES can be a cell
     array.

     If ‘'pseudotransitions'’ is specified then the integer matrix
     PSEUDOTRANSITIONS is used as an initial number of counted transitions.
     ‘pseudotransitions(i, j)’ is the initial number of counted transitions from
     state ‘i’ to state ‘j’.  TRANSPROBEST will have the same size as
     PSEUDOTRANSITIONS.  Use this if you have transitions that are very unlikely
     to occur.

     If ‘'pseudoemissions'’ is specified then the integer matrix PSEUDOEMISSIONS
     is used as an initial number of counted outputs.  ‘pseudoemissions(i, j)’
     is the initial number of counted outputs ‘j’ given state ‘i’.  If
     ‘'pseudotransitions'’ is also passed, then the number of rows of
     PSEUDOEMISSIONS must be the same as the number of rows of
     PSEUDOTRANSITIONS.  OUTPROBEST will have the same size as PSEUDOEMISSIONS.
     Use this if you have outputs or states that are very unlikely to occur.

     Examples
     --------

          transprob = [0.8, 0.2; 0.4, 0.6];
          outprob = [0.2, 0.4, 0.4; 0.7, 0.2, 0.1];
          [sequence, states] = hmmgenerate (25, transprob, outprob);
          [transprobest, outprobest] = hmmestimate (sequence, states)

          symbols = {"A", "B", "C"};
          statenames = {"One", "Two"};
          [sequence, states] = hmmgenerate (25, transprob, outprob, ...
                                            "symbols", symbols, ...
                                            "statenames", statenames);
          [transprobest, outprobest] = hmmestimate (sequence, states, ...
                                            "symbols", symbols, ...
                                            "statenames", statenames)

          pseudotransitions = [8, 2; 4, 6];
          pseudoemissions = [2, 4, 4; 7, 2, 1];
          [sequence, states] = hmmgenerate (25, transprob, outprob);
          [transprobest, outprobest] = hmmestimate (sequence, states, ...
                                       "pseudotransitions", pseudotransitions, ...
                                       "pseudoemissions", pseudoemissions)

     References
     ----------

       1. Wendy L. Martinez and Angel R. Martinez.  ‘Computational Statistics
          Handbook with MATLAB’. Appendix E, pages 547-557, Chapman & Hall/CRC,
          2001.

       2. Lawrence R. Rabiner.  A Tutorial on Hidden Markov Models and Selected
          Applications in Speech Recognition.  ‘Proceedings of the IEEE’, 77(2),
          pages 257-286, February 1989.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 57
Estimation of a hidden Markov model for a given sequence.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 11
hmmgenerate


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2661
 -- statistics: [SEQUENCE, STATES] = hmmgenerate (LEN, TRANSPROB, OUTPROB)
 -- statistics: [...] = hmmgenerate (..., "symbols", SYMBOLS)
 -- statistics: [...] = hmmgenerate (..., "statenames", STATENAMES)

     Output sequence and hidden states of a hidden Markov model.

     Generate an output sequence and hidden states of a hidden Markov model.
     The model starts in state ‘1’ at step ‘0’ but will not include step ‘0’ in
     the generated states and sequence.

     Arguments
     ---------

        • LEN is the number of steps to generate.  SEQUENCE and STATES will have
          LEN entries each.

        • TRANSPROB is the matrix of transition probabilities of the states.
          ‘transprob(i, j)’ is the probability of a transition to state ‘j’
          given state ‘i’.

        • OUTPROB is the matrix of output probabilities.  ‘outprob(i, j)’ is the
          probability of generating output ‘j’ given state ‘i’.

     Return values
     -------------

        • SEQUENCE is a vector of length LEN of the generated outputs.  The
          outputs are integers ranging from ‘1’ to ‘columns (outprob)’.

        • STATES is a vector of length LEN of the generated hidden states.  The
          states are integers ranging from ‘1’ to ‘columns (transprob)’.

     If ‘"symbols"’ is specified, then the elements of SYMBOLS are used for the
     output sequence instead of integers ranging from ‘1’ to ‘columns
     (outprob)’.  SYMBOLS can be a cell array.

     If ‘"statenames"’ is specified, then the elements of STATENAMES are used
     for the states instead of integers ranging from ‘1’ to ‘columns
     (transprob)’.  STATENAMES can be a cell array.

     Examples
     --------

          transprob = [0.8, 0.2; 0.4, 0.6];
          outprob = [0.2, 0.4, 0.4; 0.7, 0.2, 0.1];
          [sequence, states] = hmmgenerate (25, transprob, outprob)

          symbols = {"A", "B", "C"};
          statenames = {"One", "Two"};
          [sequence, states] = hmmgenerate (25, transprob, outprob, ...
                                            "symbols", symbols, ...
                                            "statenames", statenames)

     References
     ----------

       1. Wendy L. Martinez and Angel R. Martinez.  ‘Computational Statistics
          Handbook with MATLAB’. Appendix E, pages 547-557, Chapman & Hall/CRC,
          2001.

       2. Lawrence R. Rabiner.  A Tutorial on Hidden Markov Models and Selected
          Applications in Speech Recognition.  ‘Proceedings of the IEEE’, 77(2),
          pages 257-286, February 1989.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 59
Output sequence and hidden states of a hidden Markov model.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 10
hmmviterbi


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2758
 -- statistics: VPATH = hmmviterbi (SEQUENCE, TRANSPROB, OUTPROB)
 -- statistics: VPATH = hmmviterbi (..., "symbols", SYMBOLS)
 -- statistics: VPATH = hmmviterbi (..., "statenames", STATENAMES)

     Viterbi path of a hidden Markov model.

     Use the Viterbi algorithm to find the Viterbi path of a hidden Markov model
     given a sequence of outputs.  The model assumes that the generation starts
     in state ‘1’ at step ‘0’ but does not include step ‘0’ in the generated
     states and sequence.

     Arguments
     ---------

        • SEQUENCE is the vector of length LEN of given outputs.  The outputs
          must be integers ranging from ‘1’ to ‘columns (outprob)’.

        • TRANSPROB is the matrix of transition probabilities of the states.
          ‘transprob(i, j)’ is the probability of a transition to state ‘j’
          given state ‘i’.

        • OUTPROB is the matrix of output probabilities.  ‘outprob(i, j)’ is the
          probability of generating output ‘j’ given state ‘i’.

     Return values
     -------------

        • VPATH is the vector of the same length as SEQUENCE of the estimated
          hidden states.  The states are integers ranging from ‘1’ to ‘columns
          (transprob)’.

     If ‘"symbols"’ is specified, then SEQUENCE is expected to be a sequence of
     the elements of SYMBOLS instead of integers ranging from ‘1’ to ‘columns
     (outprob)’.  SYMBOLS can be a cell array.

     If ‘"statenames"’ is specified, then the elements of STATENAMES are used
     for the states in VPATH instead of integers ranging from ‘1’ to ‘columns
     (transprob)’.  STATENAMES can be a cell array.

     Examples
     --------

          transprob = [0.8, 0.2; 0.4, 0.6];
          outprob = [0.2, 0.4, 0.4; 0.7, 0.2, 0.1];
          [sequence, states] = hmmgenerate (25, transprob, outprob);
          vpath = hmmviterbi (sequence, transprob, outprob);

          symbols = {"A", "B", "C"};
          statenames = {"One", "Two"};
          [sequence, states] = hmmgenerate (25, transprob, outprob, ...
                               "symbols", symbols, "statenames", statenames);
          vpath = hmmviterbi (sequence, transprob, outprob, ...
                  "symbols", symbols, "statenames", statenames);

     References
     ----------

       1. Wendy L. Martinez and Angel R. Martinez.  ‘Computational Statistics
          Handbook with MATLAB’. Appendix E, pages 547-557, Chapman & Hall/CRC,
          2001.

       2. Lawrence R. Rabiner.  A Tutorial on Hidden Markov Models and Selected
          Applications in Speech Recognition.  ‘Proceedings of the IEEE’, 77(2),
          pages 257-286, February 1989.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 38
Viterbi path of a hidden Markov model.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 16
hotelling_t2test


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1845
 -- statistics: [H, PVAL, STATS] = hotelling_t2test (X)
 -- statistics: [...] = hotelling_t2test (X, M)
 -- statistics: [...] = hotelling_t2test (X, Y)
 -- statistics: [...] = hotelling_t2test (X, M, NAME, VALUE)
 -- statistics: [...] = hotelling_t2test (X, Y, NAME, VALUE)

     Compute Hotelling's T^2 ("T-squared") test for a single sample or two
     dependent samples (paired-samples).

     For a sample X from a multivariate normal distribution with unknown mean
     and covariance matrix, test the null hypothesis that ‘mean (X) == M’.

     For two dependent samples, X and Y, from multivariate normal distributions
     with unknown means and covariance matrices, test the null hypothesis that
     ‘mean (X - Y) == 0’.

     hotelling_t2test treats NaNs as missing values, and ignores the
     corresponding rows.

     Name-Value pair arguments can be used to set statistical significance.
     "alpha" can be used to specify the significance level of the test (the
     default value is 0.05).

     If H is 1 the null hypothesis is rejected, meaning that the tested sample
     does not come from a multivariate distribution with mean M, or in case of
     two dependent samples that they do not come from the same multivariate
     distribution.  If H is 0, then the null hypothesis cannot be rejected and
     it can be assumed that it holds true.

     The p-value of the test is returned in PVAL.

     STATS is a structure containing the value of the Hotelling's T^2 test
     statistic in the field "Tsq", and the degrees of freedom of the F
     distribution in the fields "df1" and "df2".  Under the null hypothesis,
     (n-p) T^2 / (p(n-1)) has an F distribution with p and n-p degrees of
     freedom, where n and p are the numbers of samples and variables,
     respectively.

     See also: hotelling_t2test2.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Compute Hotelling's T^2 ("T-squared") test for a single sample or two depende...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 17
hotelling_t2test2


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1526
 -- statistics: [H, PVAL, STATS] = hotelling_t2test2 (X, Y)
 -- statistics: [...] = hotelling_t2test2 (X, Y, NAME, VALUE)

     Compute Hotelling's T^2 ("T-squared") test for two independent samples.

     For samples X and Y from multivariate normal distributions with equal
     numbers of variables (columns), unknown means and unknown equal covariance
     matrices, test the null hypothesis ‘mean (X) == mean (Y)’.

     hotelling_t2test2 treats NaNs as missing values, and ignores the
     corresponding rows for each sample independently.

     Name-Value pair arguments can be used to set statistical significance.
     "alpha" can be used to specify the significance level of the test (the
     default value is 0.05).

     If H is 1 the null hypothesis is rejected, meaning that the tested samples
     do not come from the same multivariate distribution.  If H is 0, then the
     null hypothesis cannot be rejected and it can be assumed that both samples
     come from the same multivariate distribution.

     The p-value of the test is returned in PVAL.

     STATS is a structure containing the value of the Hotelling's T^2 test
     statistic in the field "Tsq", and the degrees of freedom of the F
     distribution in the fields "df1" and "df2".  Under the null hypothesis,

          (n_x+n_y-p-1) T^2 / (p(n_x+n_y-2))

     has an F distribution with p and n_x+n_y-p-1 degrees of freedom, where n_x
     and n_y are the sample sizes and p is the number of variables.

     See also: hotelling_t2test.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 71
Compute Hotelling's T^2 ("T-squared") test for two independent samples.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 12
inconsistent


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1039
 -- statistics: Y = inconsistent (Z)
 -- statistics: Y = inconsistent (Z, D)

     Compute the inconsistency coefficient for each link of a hierarchical
     cluster tree.

     Given a hierarchical cluster tree Z generated by the ‘linkage’ function,
     ‘inconsistent’ computes the inconsistency coefficient for each link of the
     tree, using all the links down to the D-th level below that link.

     The default depth D is 2, which means that only two levels are considered:
     the level of the computed link and the level below that.

     Each row of Y corresponds to the same-numbered row of Z.  The columns of Y
     are respectively: the mean of the heights of the links used for the
     calculation, the standard deviation of the heights of those links, the
     number of links used, the inconsistency coefficient.

     *Reference* Jain, A., and R. Dubes.  Algorithms for Clustering Data.  Upper
     Saddle River, NJ: Prentice-Hall, 1988.

See also: cluster, clusterdata, dendrogram, linkage, pdist, squareform.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Compute the inconsistency coefficient for each link of a hierarchical cluster...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 9
ismissing


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2417
 -- statistics: TF = ismissing (A)
 -- statistics: TF = ismissing (A, INDICATOR)

     Find missing data in arrays.

     ‘TF = ismissing (A)’ returns a logical array, TF, with the same dimensions
     as A, where ‘true’ values match the standard missing values in the input
     data according to their data type.

     Standard missing values and their corresponding data types are:

        • NaN - for double, single, duration, and calendarDuration arrays.
        • NaT - for datetime arrays.
        • <missing> - for string arrays.
        • <undefined> - for categorical arrays.
        • {0x0 char} - for cell arrays of character vectors.

     For any data types that do not support missing values, ‘ismissing’ returns
     ‘TF = false (size (A))’.

     Note: the generic ‘ismissing’ function from the statistics package only
     operates on core Octave datatypes and it explicitly identifies missing
     values in double and single arrays, as well as in cell arrays of character
     vectors.  All other data types are handled by the overloaded methods from
     their respective data class from the datatypes package.  Use ‘help
     class_name.ismissing’ to find more information about the functional
     specialization of their respective class implementation.

     The optional input INDICATOR can be a scalar or a vector, of the same type
     as the input data A, specifying alternative missing values in the input
     data.  When specifying INDICATOR values, the standard missing values are
     ignored, unless explicitly stated in the INDICATOR.

     Additional data type matches between INDICATOR and A are:

        • double indicators also match single, all integer types, and logical
          data in A.

        • string and char indicators also match categorical data in A.

        • char and cellstr indicators also match string data in A.

     Note: the generic ‘ismissing’ function from the statistics package only
     accepts INDICATOR argument for numeric, logical, and char arrays, as well
     as for cell arrays of character vectors.  For all other core Octave data
     types, ‘ismissing’ produces an error.  However, INDICATOR is supported for
     data classes from the datatypes package through their respective class
     implementation of overloaded methods.

     See also: fillmissing, rmmissing, standardizeMissing.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 28
Find missing data in arrays.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 9
isoutlier


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7852
 -- statistics: TF = isoutlier (X)
 -- statistics: TF = isoutlier (X, METHOD)
 -- statistics: TF = isoutlier (X, "percentiles", THRESHOLD)
 -- statistics: TF = isoutlier (X, MOVMETHOD, WINDOW)
 -- statistics: TF = isoutlier (..., DIM)
 -- statistics: TF = isoutlier (..., NAME, VALUE)
 -- statistics: [TF, L, U, C] = isoutlier (...)

     Find outliers in data

     ‘isoutlier (X)’ returns a logical array whose elements are true when an
     outlier is detected in the corresponding element of X.  ‘isoutlier’ treats
     NaNs as missing values and removes them.

        • If X is a matrix, then ‘isoutlier’ operates on each column of X
          separately.
        • If X is a multidimensional array, then ‘isoutlier’ operates along the
          first dimension of X whose size does not equal 1.

     By default, an outlier is a value that is more than three scaled median
     absolute deviations (MAD) from the median.  The scaled MAD is calculated as
     ‘c*median(abs(A-median(A)))’, where ‘c=-1/(sqrt(2)*erfcinv(3/2))’.

     ‘isoutlier (X, METHOD)’ specifies a method for detecting outliers.  The
     following methods are available:

     Method        Description
     -------------------------------------------------------------------------------
     "median"      Outliers are defined as elements more than three scaled MAD
                   from the median.
     "mean"        Outliers are defined as elements more than three standard
                   deviations from the mean.
     "quartiles"   Outliers are defined as elements more than 1.5 interquartile
                   ranges above the upper quartile (75 percent) or below the
                   lower quartile (25 percent).  This method is useful when the
                   data in X is not normally distributed.
     "grubbs"      Outliers are detected using Grubbs’ test for outliers, which
                   removes one outlier per iteration based on hypothesis testing.
                   This method assumes that the data in X is normally
                   distributed.
     "gesd"        Outliers are detected using the generalized extreme
                   Studentized deviate test for outliers.  This iterative method
                   is similar to "grubbs", but can perform better when there are
                   multiple outliers masking each other.

     ‘isoutlier (X, "percentiles", THRESHOLD)’ detects outliers based upon
     percentile thresholds, specified as a two-element row vector whose elements
     are in the interval [0, 100].  The first element indicates the lower
     percentile threshold, and the second element indicates the upper percentile
     threshold.  The first element of THRESHOLD must be less than the second
     element.

     ‘isoutlier (X, MOVMETHOD, WINDOW)’ specifies a moving method for detecting
     outliers.  The following methods are available:

     Method        Description
     -------------------------------------------------------------------------------
     "movmedian"   Outliers are defined as elements more than three local scaled
                   MAD from the local median over a window length specified by
                   WINDOW.
     "movmean"     Outliers are defined as elements more than three local
                   standard deviations from the local mean over a window length
                   specified by WINDOW.

     WINDOW must be a positive integer scalar or a two-element vector of
     positive integers.  When WINDOW is a scalar, if it is an odd number, the
     window is centered about the current element and contains WINDOW - 1
     neighboring elements.  If even, then the window is centered about the
     current and previous elements.  When WINDOW is a two-element vector of
     positive integers [nb, na], the window contains the current element, nb
     elements before the current element, and na elements after the current
     element.  When "SamplePoints" are also specified, WINDOW can take any real
     positive values (either as a scalar or a two-element vector) and in this
     case, the windows are computed relative to the sample points.

     DIM specifies the operating dimension and it must be a positive integer
     scalar.  If not specified, then, by default, ‘isoutlier’ operates along the
     first non-singleton dimension of X.

     The following optional parameters can be specified as NAME/VALUE paired
     arguments.

        • "SamplePoints" can be specified as a vector of sample points with
          equal length as the operating dimension.  The sample points represent
          the x-axis location of the data and must be sorted and contain unique
          elements.  Sample points do not need to be uniformly sampled.  By
          default, the vector is [1, 2, 3, ..., N], where N = size (X, DIM).
          You can use unequally spaced "SamplePoints" to define a
          variable-length window for one of the moving methods available.

        • "ThresholdFactor" can be specified as a nonnegative scalar.  For
          methods "median" and "movmedian", the detection threshold factor
          replaces the number of scaled MAD, which is 3 by default.  For methods
          "mean" and "movmean", the detection threshold factor replaces the
          number of standard deviations, which is 3 by default.  For methods
          "grubbs" and "gesd", the detection threshold factor ranges from 0 to
          1, specifying the critical alpha-value of the respective test, and it
          is 0.05 by default.  For the "quartiles" method, the detection
          threshold factor replaces the number of interquartile ranges, which is
          1.5 by default.  "ThresholdFactor" is not supported for the
          "percentiles" method.

        • "MaxNumOutliers" is only relevant to the "gesd" method and it must be
          a positive integer scalar specifying the maximum number of outliers
          returned by the "gesd" method.  By default, it is the integer nearest
          to 10% of the number of elements along the operating dimension in X.
          The "gesd" method assumes the nonoutlier input data is sampled
          from an approximate normal distribution.  When the data is not sampled
          in this way, the number of returned outliers might exceed the
          MaxNumOutliers value.

     ‘[TF, L, U, C] = isoutlier (...)’ returns up to 4 output arguments as
     described below.

        • TF is the outlier indicator with the same size as X.

        • L is the lower threshold used by the outlier detection method.  If
          METHOD is used for outlier detection, then L has the same size as X in
          all dimensions except for the operating dimension where the length is
          1.  If MOVMETHOD is used, then L has the same size as X.

        • U is the upper threshold used by the outlier detection method.  If
          METHOD is used for outlier detection, then U has the same size as X in
          all dimensions except for the operating dimension where the length is
          1.  If MOVMETHOD is used, then U has the same size as X.

        • C is the center value used by the outlier detection method.  If METHOD
          is used for outlier detection, then C has the same size as X in all
          dimensions except for the operating dimension where the length is 1.
          If MOVMETHOD is used, then C has the same size as X.  For "median",
          "movmedian", "mean", and "movmean" methods, C is computed by taking
          into account the outlier values.  For "grubbs" and "gesd" methods, C
          is computed by excluding the outliers.  For the "percentiles" method,
          C is the average between U and L thresholds.
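
     A minimal usage sketch, assuming made-up data with one obvious outlier;
     the variable names are illustrative only:

          x = [1, 2, 3, 4, 100, 5, 6];
          [tf, L, U, C] = isoutlier (x);               # default "median" method
          tf = isoutlier (x, "mean", "ThresholdFactor", 2);
          tf = isoutlier (x, "percentiles", [5, 95]);
          tf = isoutlier (x, "movmedian", 3);          # window of 3 elements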

     See also: filloutliers, rmoutliers, ismissing.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 21
Find outliers in data



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 9
jackknife


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2000
 -- statistics: JACKSTAT = jackknife (E, X)
 -- statistics: JACKSTAT = jackknife (E, X, ...)

     Compute jackknife estimates of a parameter taking one or more given samples
     as parameters.

     In particular, E is the estimator to be jackknifed as a function name,
     handle, or inline function, and X is the sample for which the estimate is
     to be taken.  The I-th entry of JACKSTAT will contain the value of the
     estimator on the sample X with its I-th row omitted.

          jackstat (I) = E(X([1 : I - 1, I + 1 : length(X)]))

     Depending on the number of samples to be used, the estimator must have the
     appropriate form:
        • If only one sample is used, then the estimator need not be concerned
          with cell arrays, for example jackknifing the standard deviation of a
          sample can be performed with ‘JACKSTAT = jackknife (@std, rand (100,
          1))’.
        • If, however, more than one sample is to be used, the samples must all
          be of equal size, and the estimator must address them as elements of a
          cell-array, in which they are aggregated in their order of appearance:

          JACKSTAT = jackknife (@(x) std(x{1})/var(x{2}),
          rand (100, 1), randn (100, 1))

     Suppose a theoretical value P for the parameter is already known and N is
     the sample size.  If all goes well, with

     ‘T = N * E(X) - (N - 1) * mean(JACKSTAT)’

     and

     ‘V = sumsq(N * E(X) - (N - 1) * JACKSTAT - T) / (N * (N - 1))’

     then

     ‘(T-P)/sqrt(V)’ should follow a t-distribution with N-1 degrees of freedom.
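
     The quantities above can be sketched as follows, assuming a made-up
     normal sample and the standard deviation as the estimator; the lowercase
     variable names are illustrative only:

          x = randn (50, 1);
          n = numel (x);
          jackstat = jackknife (@std, x);
          t = n * std (x) - (n - 1) * mean (jackstat);
          v = sumsq (n * std (x) - (n - 1) * jackstat - t) / (n * (n - 1));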

     Jackknifing is a well-known method to reduce bias.  Further details can be
     found in:

     References
     ----------

       1. Rupert G. Miller.  The jackknife - a review.  Biometrika (1974),
          61(1):1-15.  doi:10.1093/biomet/61.1.1
       2. Rupert G. Miller.  Jackknifing Variances.  Ann.  Math.  Statist.
          (1968), Volume 39, Number 2, 567-582.  doi:10.1214/aoms/1177698418


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Compute jackknife estimates of a parameter taking one or more given samples a...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
kmeans


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7195
 -- statistics: IDX = kmeans (DATA, K)
 -- statistics: [IDX, CENTERS] = kmeans (DATA, K)
 -- statistics: [IDX, CENTERS, SUMD] = kmeans (DATA, K)
 -- statistics: [IDX, CENTERS, SUMD, DIST] = kmeans (DATA, K)
 -- statistics: [...] = kmeans (DATA, K, PARAM1, VALUE1, ...)
 -- statistics: [...] = kmeans (DATA, [], "start", START, ...)

     Perform a K-means clustering of the NxD matrix DATA.

     If parameter "start" is specified, then K may be empty in which case K is
     set to the number of rows of START.

     The outputs are:

     IDX               An Nx1 vector whose i-th element is the class to which row i
                       of DATA is assigned.
                       
     CENTERS           A KxD array whose i-th row is the centroid of cluster i.
                       
     SUMD              A kx1 vector whose i-th entry is the sum of the distances from
                       samples in cluster i to centroid i.
                       
     DIST              An Nxk matrix whose ij-th element is the distance from sample
                       i to centroid j.

     The following parameters may be placed in any order.  Each parameter must
     be followed by its value, as in Name-Value pairs.

     Name            Description
     -----------------------------------------------------------------------------------
     "Start"         The initialization method for the centroids.

         Value              Description
     ------------------------------------------------------------------------------------
         "plus"             The k-means++ algorithm.  (Default)
         "sample"           A subset of k rows from DATA, sampled uniformly without
                            replacement.
         "cluster"          Perform a pilot clustering on 10% of the rows of DATA.
         "uniform"          Each component of each centroid is drawn uniformly from
                            the interval between the maximum and minimum values of
                            that component within DATA.  This performs poorly and is
                            implemented only for Matlab compatibility.
         NUMERIC            A kxD matrix of centroid starting locations.  The rows
         MATRIX             correspond to seeds.
         NUMERIC ARRAY      A kxDxr array of centroid starting locations.  The third
                            dimension invokes replication of the clustering routine.
                            Page r contains the set of seeds for replicate r.  kmeans
                            infers the number of replicates (specified by the
                            "Replicates" Name-Value pair argument) from the size of
                            the third dimension.

     Name            Description
     ------------------------------------------------------------------------------------
     "Distance"      The distance measure used for partitioning and calculating
                     centroids.

         Value              Description
     ------------------------------------------------------------------------------------
         "sqeuclidean"      The squared Euclidean distance.  i.e.  the sum of the
                            squares of the differences between corresponding
                            components.  In this case, the centroid is the arithmetic
                            mean of all samples in its cluster.  This is the only
                            distance for which this algorithm is truly "k-means".
         "cityblock"        The sum metric, or L1 distance, i.e.  the sum of the
                            absolute differences between corresponding components.  In
                            this case, the centroid is the median of all samples in
                            its cluster.  This gives the k-medians algorithm.
         "cosine"           One minus the cosine of the included angle between points
                            (treated as vectors).  Each centroid is the mean of the
                            points in that cluster, after normalizing those points to
                            unit Euclidean length.
         "correlation"      One minus the sample correlation between points (treated
                            as sequences of values).  Each centroid is the
                            component-wise mean of the points in that cluster, after
                            centering and normalizing those points to zero mean and
                            unit standard deviation.
         "hamming"          The number of components in which the sample and the
                            centroid differ.  In this case, the centroid is the median
                            of all samples in its cluster.  Unlike Matlab, Octave
                            allows non-logical DATA.

     Name            Description
     ------------------------------------------------------------------------------------
     "EmptyAction"   What to do when a centroid is not the closest to any data sample.

         Value              Description
     ------------------------------------------------------------------------------------
         "error"            Throw an error.
         "singleton"        (Default) Select the row of DATA that has the highest
                            error and use that as the new centroid.
         "drop"             Remove the centroid, and continue computation with one
                            fewer centroid.  The dimensions of the outputs CENTERS
                            and DIST are unchanged, with values for omitted
                            centroids replaced by NaN.

     Name            Description
     ------------------------------------------------------------------------------------
     "Display"       Display a text summary.

         Value              Description
     ------------------------------------------------------------------------------------
         "off"              (Default) Display no summary.
         "final"            Display a summary for each clustering operation.
         "iter"             Display a summary for each iteration of a clustering
                            operation.

     Name            Value
     ------------------------------------------------------------------------------------
     "Replicates"    A positive integer specifying the number of independent
                     clusterings to perform.  The output values are the values for the
                     best clustering, i.e., the one with the smallest value of SUMD.
                     If START is numeric, then REPLICATES defaults to (and must equal)
                     the size of the third dimension of START.  Otherwise it defaults
                     to 1.
     "MaxIter"       The maximum number of iterations to perform for each replicate.
                     If the maximum change of any centroid is less than 0.001, then
                     the replicate terminates even if MAXITER iterations have not
                     occurred.  The default is 100.

     Example:

     [~,c] = kmeans (rand(10, 3), 2, "emptyaction", "singleton");
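
     A further sketch, assuming made-up two-cluster data and hypothetical
     starting centroids, in which K is left empty and inferred from the number
     of rows of START:

          data = [randn(20, 2); randn(20, 2) + 5];
          start = [0, 0; 5, 5];
          [idx, centers, sumd] = kmeans (data, [], "start", start);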

     See also: linkage.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 52
Perform a K-means clustering of the NxD matrix DATA.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 9
knnsearch


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6966
 -- statistics: IDX = knnsearch (X, Y)
 -- statistics: [IDX, D] = knnsearch (X, Y)
 -- statistics: [...] = knnsearch (..., NAME, VALUE)

     Find k-nearest neighbors from input data.

     ‘IDX = knnsearch (X, Y)’ finds K nearest neighbors in X for Y.  It returns
     IDX which contains indices of K nearest neighbors of each row of Y.  If not
     specified, K = 1.  X must be an NxP numeric matrix of input data, where
     rows correspond to observations and columns correspond to features or
     variables.  Y is an MxP numeric matrix with query points, which must have
     the same number of columns as X.

     ‘[IDX, D] = knnsearch (X, Y)’ also returns the distances, D, which
     correspond to the K nearest neighbors in X for each row of Y.

     Additional parameters can be specified by Name-Value pair arguments.

     NAME              VALUE
                       
     -----------------------------------------------------------------------------------
     "K"               is the number of nearest neighbors to be found in the kNN
                       search.  It must be a positive integer value and by default it
                       is 1.
                       
     "P"               is the Minkowski distance exponent and it must be a positive
                       scalar.  This argument is only valid when the selected
                       distance metric is "minkowski".  By default it is 2.
                       
     "Scale"           is the scale parameter for the standardized Euclidean distance
                       and it must be a nonnegative numeric vector of equal length to
                       the number of columns in X.  This argument is only valid when
                       the selected distance metric is "seuclidean", in which case
                       each coordinate of X is scaled by the corresponding element of
                       "scale", as is each query point in Y.  By default, the scale
                       parameter is the standard deviation of each coordinate in X.
                       
     "Cov"             is the covariance matrix for computing the mahalanobis
                       distance and it must be a positive definite matrix matching
                       the number of columns in X.  This argument is only valid
                       when the selected distance metric is "mahalanobis".
                       
     "BucketSize"      is the maximum number of data points in the leaf node of the
                       Kd-tree and it must be a positive integer.  This argument is
                       only valid when the selected search method is "kdtree".
                       
     "SortIndices"     is a boolean flag to sort the returned indices in ascending
                       order by distance and it is true by default.  When the
                       selected search method is "exhaustive" or the "IncludeTies"
                       flag is true, ‘knnsearch’ always sorts the returned indices.
                       
     "Distance"        is the distance metric used by ‘knnsearch’ as specified below:

          "euclidean"      Euclidean distance.
          "seuclidean"     standardized Euclidean distance.  Each coordinate
                           difference between the rows in X and the query matrix Y is
                           scaled by dividing by the corresponding element of the
                           standard deviation computed from X.  To specify a
                           different scaling, use the "Scale" name-value argument.
          "cityblock"      City block distance.
          "chebychev"      Chebychev distance (maximum coordinate difference).
          "minkowski"      Minkowski distance.  The default exponent is 2.  To
                           specify a different exponent, use the "P" name-value
                           argument.
          "mahalanobis"    Mahalanobis distance, computed using a positive definite
                           covariance matrix.  To change the value of the covariance
                           matrix, use the "Cov" name-value argument.
          "cosine"         Cosine distance.
          "correlation"    One minus the sample linear correlation between
                           observations (treated as sequences of values).
          "spearman"       One minus the sample Spearman's rank correlation between
                           observations (treated as sequences of values).
          "hamming"        Hamming distance, which is the percentage of coordinates
                           that differ.
          "jaccard"        One minus the Jaccard coefficient, which is the percentage
                           of nonzero coordinates that differ.
          @DISTFUN         Custom distance function handle.  A distance function of
                           the form ‘function D2 = distfun (XI, YI)’, where XI is a
                           1xP vector containing a single observation in
                           P-dimensional space, YI is an NxP matrix containing an
                           arbitrary number of observations in the same P-dimensional
                           space, and D2 is an Nx1 vector of distances, where D2(k)
                           is the distance between observations XI and YI(k,:).

     "NSMethod"        is the nearest neighbor search method used by ‘knnsearch’ as
                       specified below.

          "kdtree"         Creates and uses a Kd-tree to find nearest neighbors.
                           "kdtree" is the default value when the number of columns
                           in X is less than or equal to 10, X is not sparse, and the
                           distance metric is "euclidean", "cityblock", "manhattan",
                           "chebychev", or "minkowski".  Otherwise, the default value
                           is "exhaustive".  This argument is only valid when the
                           distance metric is one of the aforementioned metrics.
          "exhaustive"     Uses the exhaustive search algorithm by computing the
                           distance values from all the points in X to each point in
                           Y.

     "IncludeTies"     is a boolean flag to indicate if the returned values should
                       contain the indices that have same distance as the K^th
                       neighbor.  When false, ‘knnsearch’ chooses the observation
                       with the smallest index among the observations that have the
                       same distance from a query point.  When true, ‘knnsearch’
                       includes all nearest neighbors whose distances are equal to
                       the K^th smallest distance in the output arguments.  To
                       specify K, use the "K" name-value pair argument.
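
     A minimal usage sketch, assuming small made-up matrices; the handle name
     distfun is hypothetical and simply reimplements the Euclidean distance in
     the custom-function form described above:

          x = [1, 2; 3, 4; 5, 6; 7, 8];
          y = [2, 3; 6, 7];
          [idx, d] = knnsearch (x, y, "K", 2, "Distance", "cityblock");

          ## XI is 1xP, YI is NxP, and the result is an Nx1 vector
          distfun = @(xi, yi) sqrt (sum ((yi - xi) .^ 2, 2));
          [idx, d] = knnsearch (x, y, "K", 2, "Distance", distfun);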

     See also: rangesearch, pdist2, fitcknn.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 41
Find k-nearest neighbors from input data.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 13
kruskalwallis


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2529
 -- statistics: P = kruskalwallis (X)
 -- statistics: P = kruskalwallis (X, GROUP)
 -- statistics: P = kruskalwallis (X, GROUP, DISPLAYOPT)
 -- statistics: [P, TBL] = kruskalwallis (X, ...)
 -- statistics: [P, TBL, STATS] = kruskalwallis (X, ...)

     Perform a Kruskal-Wallis test, the non-parametric alternative of a one-way
     analysis of variance (ANOVA), for comparing the means of two or more groups
     of data under the null hypothesis that the groups are drawn from the same
     population, i.e., the group means are equal.

     kruskalwallis can take up to three input arguments:

        • X contains the data and it can either be a vector or matrix.  If X is
          a matrix, then each column is treated as a separate group.  If X is a
          vector, then the GROUP argument is mandatory.
        • GROUP contains the names for each group.  If X is a matrix, then GROUP
          can either be a cell array of strings or a character array, with one
          row per column of X.  If you want to omit this argument, enter an
          empty array ([]).  If X is a vector, then GROUP must be a vector of
          the same length, or a string array or cell array of strings with one
          row for each element of X.  X values corresponding to the same value
          of GROUP are placed in the same group.
        • DISPLAYOPT is an optional parameter for displaying the groups
          contained in the data in a boxplot.  If omitted, it is 'on' by
          default.  If group names are defined in GROUP, these are used to
          identify the groups in the boxplot.  Use 'off' to omit displaying this
          figure.

     kruskalwallis can return up to three output arguments:

        • P is the p-value of the null hypothesis that all group means are
          equal.
        • TBL is a cell array containing the results in a standard ANOVA table.
        • STATS is a structure containing statistics useful for performing a
          multiple comparison of means with the MULTCOMPARE function.

     If kruskalwallis is called without any output arguments, then it prints the
     results in a one-way ANOVA table to the standard output.  It is also
     printed when DISPLAYOPT is 'on'.

     Examples:

          x = meshgrid (1:6);
          x = x + normrnd (0, 1, 6, 6);
          [p, atab] = kruskalwallis(x);

          x = ones (50, 4) .* [-2, 0, 1, 5];
          x = x + normrnd (0, 2, 50, 4);
          group = {"A", "B", "C", "D"};
          kruskalwallis (x, group);


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Perform a Kruskal-Wallis test, the non-parametric alternative of a one-way
an...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
kstest


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 4344
 -- statistics: H = kstest (X)
 -- statistics: H = kstest (X, NAME, VALUE)
 -- statistics: [H, P] = kstest (...)
 -- statistics: [H, P, KSSTAT, CV] = kstest (...)

     Single sample Kolmogorov-Smirnov (K-S) goodness-of-fit hypothesis test.

     ‘H = kstest (X)’ performs a Kolmogorov-Smirnov (K-S) test to determine if a
     random sample X could have come from a standard normal distribution.  H
     indicates the results of the null hypothesis test.

        • H = 0 => Do not reject the null hypothesis at the 5% significance
          level.
        • H = 1 => Reject the null hypothesis at the 5% significance level.

     X is a vector representing a random sample from some unknown distribution
     with a cumulative distribution function F(X). Missing values declared as
     NaNs in X are ignored.

     ‘H = kstest (X, NAME, VALUE)’ returns a test decision for a single-sample
     K-S test with additional options specified by one or more NAME-VALUE pair
     arguments as shown below.

     Name              Value
     -----------------------------------------------------------------------------------
     "alpha"           A numeric scalar between 0 and 1 specifying the
                       significance level.  Default is 0.05 for 5% significance.
                       
     "CDF"             The hypothesized CDF under the null hypothesis.  It can be
                       specified as a function handle of an existing cdf function, a
                       character vector defining a probability distribution with
                       default parameters, a probability distribution object, or a
                       two-column matrix.  If not provided, the default is the
                       standard normal, N(0,1).  The one-sample Kolmogorov-Smirnov
                       test is only valid for continuous cumulative distribution
                       functions, and requires the CDF to be predetermined.  The
                       result is not accurate if CDF is estimated from the data.
                       
     "tail"            A string indicating the type of test:
                      "unequal"         "F(X) not equal to CDF(X)" (two-sided)
                                        (Default)
                                        
                      "larger"          "F(X) > CDF(X)" (one-sided)
                                        
                      "smaller"         "F(X) < CDF(X)" (one-sided)

     Let S(X) be the empirical c.d.f.  estimated from the sample vector X, F(X)
     be the corresponding true (but unknown) population c.d.f., and CDF be the
     known input c.d.f.  specified under the null hypothesis.  For ‘tail’ =
     "unequal", "larger", and "smaller", the test statistics are max|S(X) -
     CDF(X)|, max[S(X) - CDF(X)], and max[CDF(X) - S(X)], respectively.

     ‘[H, P] = kstest (...)’ also returns the asymptotic p-value P.

     ‘[H, P, KSSTAT] = kstest (...)’ returns the K-S test statistic KSSTAT
     defined above for the test type indicated by the "tail" option.

     In the matrix version of CDF, column 1 contains the x-axis data and column
     2 the corresponding y-axis c.d.f. data.  Since the K-S test statistic will
     occur at one of the observations in X, the calculation is most efficient
     when CDF is only specified at the observations in X.  When column 1 of CDF
     represents x-axis points independent of X, CDF is linearly interpolated at
     the observations found in the vector X.  In this case, the interval along
     the x-axis (the column 1 spread of CDF) must span the observations in X for
     successful interpolation.
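
     A sketch of the matrix form of CDF, assuming a made-up standard normal
     sample and an illustrative grid xi chosen to span the observations so
     that the interpolation described above can succeed:

          x = randn (100, 1);
          xi = linspace (min (x) - 1, max (x) + 1, 200)';
          [h, p, ksstat] = kstest (x, "CDF", [xi, normcdf(xi)]);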

     The decision to reject the null hypothesis is based on comparing the
     p-value P with the "alpha" value, not by comparing the statistic KSSTAT
     with the critical value CV.  CV is computed separately using an approximate
     formula or by interpolation using Miller's approximation table.  The
     formula and table cover the range 0.01 <= "alpha" <= 0.2 for two-sided
     tests and 0.005 <= "alpha" <= 0.1 for one-sided tests.  CV is returned as
     NaN if "alpha" is outside this range.  Since CV is approximate, a
     comparison of KSSTAT with CV may occasionally lead to a different
     conclusion than a comparison of P with "alpha".

     See also: kstest2, cdfplot.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 71
Single sample Kolmogorov-Smirnov (K-S) goodness-of-fit hypothesis test.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
kstest2


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2255
 -- statistics: H = kstest2 (X1, X2)
 -- statistics: H = kstest2 (X1, X2, NAME, VALUE)
 -- statistics: [H, P] = kstest2 (...)
 -- statistics: [H, P, KS2STAT] = kstest2 (...)

     Two-sample Kolmogorov-Smirnov goodness-of-fit hypothesis test.

     ‘H = kstest2 (X1, X2)’ returns a test decision for the null hypothesis that
     the data in vectors X1 and X2 are from the same continuous distribution,
     using the two-sample Kolmogorov-Smirnov test.  The alternative hypothesis
     is that X1 and X2 are from different continuous distributions.  The result
     H is 1 if the test rejects the null hypothesis at the 5% significance
     level, and 0 otherwise.

     ‘H = kstest2 (X1, X2, NAME, VALUE)’ returns a test decision for a
     two-sample Kolmogorov-Smirnov test with additional options specified by one
     or more name-value pair arguments as shown below.

     "alpha"          A value ALPHA between 0 and 1 specifying the significance
                      level.  Default is 0.05 for 5% significance.
                      
     "tail"           A string indicating the type of test:

        "unequal"        "F(X1) not equal to F(X2)" (two-sided) [Default]
                         
        "larger"         "F(X1) > F(X2)" (one-sided)
                         
        "smaller"        "F(X1) < F(X2)" (one-sided)

     The two-sided test uses the maximum absolute difference between the cdfs of
     the distributions of the two data vectors.  The test statistic is ‘D* =
     max(|F1(x) - F2(x)|)’, where F1(x) is the proportion of X1 values less than
     or equal to x and F2(x) is the proportion of X2 values less than or equal
     to x.  The one-sided test uses the actual value of the difference between
     the cdfs of the distributions of the two data vectors rather than the
     absolute value.  The test statistic is ‘D* = max(F1(x) - F2(x))’ or ‘D* =
     max(F2(x) - F1(x))’ for ‘tail’ = "larger" or "smaller", respectively.

     ‘[H, P] = kstest2 (...)’ also returns the asymptotic p-value P.

     ‘[H, P, KS2STAT] = kstest2 (...)’ also returns the Kolmogorov-Smirnov test
     statistic KS2STAT defined above for the test type indicated by ‘tail’.
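
     A minimal usage sketch, assuming two made-up samples where the second is
     shifted by 0.5; the significance level is chosen only for illustration:

          x1 = randn (100, 1);
          x2 = randn (100, 1) + 0.5;
          [h, p, ks2stat] = kstest2 (x1, x2, "alpha", 0.01, "tail", "larger");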

     See also: kstest, cdfplot.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 62
Two-sample Kolmogorov-Smirnov goodness-of-fit hypothesis test.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 11
levene_test


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2672
 -- statistics: H = levene_test (X)
 -- statistics: H = levene_test (X, GROUP)
 -- statistics: H = levene_test (X, ALPHA)
 -- statistics: H = levene_test (X, TESTTYPE)
 -- statistics: H = levene_test (X, GROUP, ALPHA)
 -- statistics: H = levene_test (X, GROUP, TESTTYPE)
 -- statistics: H = levene_test (X, GROUP, ALPHA, TESTTYPE)
 -- statistics: [H, PVAL] = levene_test (...)
 -- statistics: [H, PVAL, W] = levene_test (...)
 -- statistics: [H, PVAL, W, DF] = levene_test (...)

     Perform a Levene's test for the homogeneity of variances.

     Under the null hypothesis of equal variances, the test statistic W
     approximately follows an F distribution with DF degrees of freedom being a
     vector ([k-1, N-k]).

     The p-value (1 minus the CDF of this distribution at W) is returned in
     PVAL.  H = 1 if the null hypothesis is rejected at the significance level
     of ALPHA.  Otherwise H = 0.

     Input Arguments:

        • X contains the data and it can either be a vector or matrix.  If X is
          a matrix, then each column is treated as a separate group.  If X is a
          vector, then the GROUP argument is mandatory.  NaN values are omitted.

        • GROUP contains the names for each group.  If X is a vector, then GROUP
          must be a vector of the same length, or a string array or cell array
          of strings with one row for each element of X.  X values corresponding
          to the same value of GROUP are placed in the same group.  If X is a
          matrix, then GROUP can either be a cell array of strings or a
          character array, with one row per column of X in the same way it is
          used in ‘anova1’ function.  If X is a matrix, then GROUP can be
          omitted either by entering an empty array ([]) or by passing only
          ALPHA as a second argument (if required to change its default value).

        • ALPHA is the statistical significance value at which the null
          hypothesis is rejected.  Its default value is 0.05 and it can be
          passed either as a second argument (when GROUP is omitted) or as a
          third argument.

        • TESTTYPE is a string determining the type of Levene's test.  By
          default it is set to "absolute", but the user can also specify
          "quadratic" in order to perform Levene's Quadratic test for equal
          variances, or "median" in order to perform the Brown-Forsythe's
          test.  These options determine how the Z_ij values are computed.  If
          an invalid name is passed for TESTTYPE, then the Levene's Absolute
          test is performed.

     See also: bartlett_test, vartest2, vartestn.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 57
Perform a Levene's test for the homogeneity of variances.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
linkage


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 3010
 -- statistics: Y = linkage (D)
 -- statistics: Y = linkage (D, METHOD)
 -- statistics: Y = linkage (X)
 -- statistics: Y = linkage (X, METHOD)
 -- statistics: Y = linkage (X, METHOD, METRIC)
 -- statistics: Y = linkage (X, METHOD, ARGLIST)

     Produce a hierarchical clustering dendrogram.

     D is the dissimilarity matrix relative to n observations, formatted as a
     (n-1)*n/2x1 vector as produced by ‘pdist’.  Alternatively, X contains data
     formatted for input to ‘pdist’, METRIC is a metric for ‘pdist’ and ARGLIST
     is a cell array containing arguments that are passed to ‘pdist’.

     ‘linkage’ starts by putting each observation into a singleton cluster and
     numbering those from 1 to n.  Then it merges two clusters, chosen according
     to METHOD, to create a new cluster numbered n+1, and so on until all
     observations are grouped into a single cluster numbered 2n - 1.  Row k of
     the (n-1)x3 output matrix relates to cluster n+k: the first two columns are
     the numbers of the two component clusters and column 3 contains their
     distance.

     METHOD defines the way the distance between two clusters is computed and
     how they are recomputed when two clusters are merged:

     ‘"single" (default)’
          Distance between two clusters is the minimum distance between two
          elements belonging each to one cluster.  Produces a cluster tree known
          as minimum spanning tree.

     ‘"complete"’
          Furthest distance between two elements belonging each to one cluster.

     ‘"average"’
          Unweighted pair group method with averaging (UPGMA). Average distance
          between all pairs of elements each belonging to one cluster.

     ‘"weighted"’
          Weighted pair group method with averaging (WPGMA). When two clusters A
          and B are joined together, the new distance to a cluster C is the mean
          between distances A-C and B-C.

     ‘"centroid"’
          Unweighted Pair-Group Method using Centroids (UPGMC). Assumes
          Euclidean metric.  The distance between cluster centroids, each
          centroid being the center of mass of a cluster.

     ‘"median"’
          Weighted pair-group method using centroids (WPGMC). Assumes Euclidean
          metric.  Distance between cluster centroids.  When two clusters are
          joined together, the new centroid is the midpoint between the joined
          centroids.

     ‘"ward"’
          Ward's sum of squared deviations about the group mean (ESS). Also
          known as minimum variance or inner squared distance.  Assumes
          Euclidean metric.  How much the moment of inertia of the merged
          cluster exceeds the sum of those of the individual clusters.

     *Reference* Ward, J. H. Hierarchical Grouping to Optimize an Objective
     Function J. Am.  Statist.  Assoc.  1963, 58, 236-244,
     <http://iv.slis.indiana.edu/sw/data/ward.pdf>.

     See also: pdist,squareform.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 45
Produce a hierarchical clustering dendrogram.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 9
loadmodel


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 435
 -- ClassificationSVM: OBJ = loadmodel (FILENAME)

     Load a Classification or Regression model from a file.

     ‘OBJ = loadmodel (FILENAME)’ loads a Classification or Regression object,
     OBJ, from a file defined in FILENAME.

     See also: savemodel, ClassificationDiscriminant, ClassificationGAM,
     ClassificationKNN, ClassificationNeuralNetwork,
     ClassificationPartitionedModel, ClassificationSVM, RegressionGAM.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 54
Load a Classification or Regression model from a file.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 19
logistic_regression


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2217
 -- statistics: [INTERCEPT, SLOPE, DEV, DL, D2L, P, STATS] = logistic_regression
          (Y, X, PRINT, INTERCEPT, SLOPE)

     Perform ordinal logistic regression.

     Suppose Y takes values in k ordered categories, and let ‘P_i (X)’ be the
     cumulative probability that Y falls in one of the first i categories given
     the covariate X.  Then

          [INTERCEPT, SLOPE] = logistic_regression (Y, X)

     fits the model

          logit (P_i (X)) = X * SLOPE + INTERCEPT_i,   i = 1 ... k-1

     The number of ordinal categories, k, is taken to be the number of distinct
     values of ‘round (Y)’.  If k equals 2, Y is binary and the model is
     ordinary logistic regression.  The matrix X is assumed to have full column
     rank.

     Given Y only, ‘INTERCEPT = logistic_regression (Y)’ fits the model with
     baseline logit odds only.

     The full form is

          [INTERCEPT, SLOPE, DEV, DL, D2L, P, STATS]
             = logistic_regression (Y, X, PRINT, INTERCEPT, SLOPE)

     in which all output arguments and all input arguments except Y are
     optional.

     Setting PRINT to 1 requests summary information about the fitted model to
     be displayed.  Setting PRINT to 2 requests information about convergence at
     each iteration.  Other values request no information to be displayed.  The
     input arguments INTERCEPT and SLOPE give initial estimates for INTERCEPT
     and SLOPE.

     The returned value DEV holds minus twice the log-likelihood.

     The returned values DL and D2L are the vector of first and the matrix of
     second derivatives of the log-likelihood with respect to INTERCEPT and
     SLOPE.

     P holds estimates for the conditional distribution of Y given X.

     STATS returns a structure that contains the following fields:
        • "intercept": intercept coefficients
        • "slope": slope coefficients
        • "coeff": estimated coefficients (intercepts and slopes)
        • "covb": estimated covariance matrix for coefficients (coeff)
        • "coeffcorr": correlation matrix for coeff
        • "se": standard errors of the coeff
        • "z": z statistics for coeff
        • "pval": p-values for coeff


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 36
Perform ordinal logistic regression.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 5
logit


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 176
 -- statistics: X = logit (P)

     Compute the logit for each value of P.

     The logit is defined as

          logit (P) = log (P / (1-P))

     See also: probit, logicdf.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 38
Compute the logit for each value of P.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 5
mahal


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 493
 -- statistics: D = mahal (Y, X)

     Mahalanobis' D-square distance.

     Return the Mahalanobis' D-square distance of the points in Y from the
     distribution implied by points X.

     Specifically, it uses a Cholesky decomposition to set

           answer(i) = (Y(i,:) - mean (X)) * inv (A) * (Y(i,:)-mean (X))'

     where A is the covariance of X.

     The data X and Y must have the same number of components (columns), but may
     have a different number of observations (rows).


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 31
Mahalanobis' D-square distance.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
makima


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1778
 -- statistics: YI = makima (X, Y, XQ)
 -- statistics: YI = makima (Y, XQ)
 -- statistics: YI = makima (..., "extrap")

     Compute the 1-D Modified Akima piecewise cubic Hermite interpolant of
     sample data X and Y.

     The Modified Akima (MAKIMA) algorithm generates a shape-preserving
     piecewise cubic interpolant.  It differs from standard splines by avoiding
     excessive local undulations and overshoots, and it connects collinear
     points (flat regions) with straight lines.  It is particularly well-suited
     for oscillatory data where ‘pchip’ might aggressively flatten local
     extrema.

     The sample points X must be a vector of unique values.  If X is not sorted,
     the function will automatically sort it and rearrange Y accordingly.

     The sample values Y can be a scalar, vector, or an N-dimensional array.  If
     Y is an N-dimensional array, the interpolation is performed along its last
     dimension, which must have the same length as X.  Complex values for Y are
     supported.

     If query points XQ are provided, the function evaluates the interpolant and
     returns the interpolated values YI.  By default, ‘makima’ uses the boundary
     polynomials to extrapolate for points outside the range of X.  The optional
     string argument "extrap" is accepted for compatibility with other
     interpolation functions.

     If only X and Y are provided, the function returns a piecewise polynomial
     structure PP that represents the interpolant.  This structure can be
     evaluated later at specific query points using ‘ppval’.

     Evaluating the interpolant at query points outside the domain of X
     automatically extrapolates using the boundary polynomials.

     See also: interp1, pchip, spline.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Compute the 1-D Modified Akima piecewise cubic Hermite interpolant of sample
...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
manova1


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 3054
 -- statistics: D = manova1 (X, GROUP)
 -- statistics: D = manova1 (X, GROUP, ALPHA)
 -- statistics: [D, P] = manova1 (...)
 -- statistics: [D, P, STATS] = manova1 (...)

     One-way multivariate analysis of variance (MANOVA).

     ‘D = manova1 (X, GROUP, ALPHA)’ performs a one-way MANOVA for comparing the
     mean vectors of two or more groups of multivariate data.

     X is a matrix with each row representing a multivariate observation, and
     each column representing a variable.

     GROUP is a numeric vector, string array, or cell array of strings with the
     same number of rows as X.  X values are in the same group if they
     correspond to the same value of GROUP.

     ALPHA is the scalar significance level and is 0.05 by default.

     D is an estimate of the dimension of the group means.  It is the smallest
     dimension such that a test of the hypothesis that the means lie on a space
     of that dimension is not rejected.  If D = 0 for example, we cannot reject
     the hypothesis that the means are the same.  If D = 1, we reject the
     hypothesis that the means are the same but we cannot reject the hypothesis
     that they lie on a line.

     ‘[D, P] = manova1 (...)’ returns P, a vector of p-values for testing the
     null hypothesis that the mean vectors of the groups lie on various
     dimensions.  P(1) is the p-value for a test of dimension 0, P(2) for
     dimension 1, etc.

     ‘[D, P, STATS] = manova1 (...)’ returns a STATS structure with the
     following fields:

          "W"              within-group sum of squares and products matrix
          "B"              between-group sum of squares and products matrix
          "T"              total sum of squares and products matrix
          "dfW"            degrees of freedom for WSSP matrix
          "dfB"            degrees of freedom for BSSP matrix
          "dfT"            degrees of freedom for TSSP matrix
          "lambda"         value of Wilks' lambda (the test statistic)
          "chisq"          transformation of lambda to a chi-square distribution
          "chisqdf"        degrees of freedom for chisq
          "eigenval"       eigenvalues of (WSSP^-1) * BSSP
          "eigenvec"       eigenvectors of (WSSP^-1) * BSSP; these are the
                           coefficients for canonical variables, and they are scaled
                           so the within-group variance of C is 1
          "canon"          canonical variables, equal to XC*eigenvec, where XC is X
                           with columns centered by subtracting their means
          "mdist"          Mahalanobis distance from each point to its group mean
          "gmdist"         Mahalanobis distances between each pair of group means
          "gnames"         Group names

     The canonical variables C have the property that C(:,1) is the linear
     combination of the X columns that has the maximum separation between
     groups, C(:,2) has the maximum separation subject to it being orthogonal to
     C(:,1), and so on.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 51
One-way multivariate analysis of variance (MANOVA).



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 13
manovacluster


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 984
 -- statistics: manovacluster (STATS)
 -- statistics: manovacluster (STATS, METHOD)
 -- statistics: H = manovacluster (STATS)
 -- statistics: H = manovacluster (STATS, METHOD)

     Cluster group means using manova1 output.

     ‘manovacluster (STATS)’ draws a dendrogram showing the clustering of group
     means, calculated using the output STATS structure from ‘manova1’ and
     applying the single linkage algorithm.  See the ‘dendrogram’ function for
     more information about the figure.

     ‘manovacluster (STATS, METHOD)’ uses the METHOD algorithm in place of
     single linkage.  The available methods are:

          "single"         -- nearest distance
          "complete"       -- furthest distance
          "average"        -- average distance
          "centroid"       -- center of mass distance
          "ward"           -- inner squared distance

     ‘H = manovacluster (...)’ returns a vector of line handles.

     See also: manova1.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 41
Cluster group means using manova1 output.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 12
mcnemar_test


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1796
 -- statistics: [H, PVAL, CHISQ] = mcnemar_test (X)
 -- statistics: [H, PVAL, CHISQ] = mcnemar_test (X, ALPHA)
 -- statistics: [H, PVAL, CHISQ] = mcnemar_test (X, TESTTYPE)
 -- statistics: [H, PVAL, CHISQ] = mcnemar_test (X, ALPHA, TESTTYPE)

     Perform a McNemar's test on paired nominal data.

     McNemar's test is applied to a 2x2 contingency table X with a dichotomous
     trait, with matched pairs of subjects, of data cross-classified on the row
     and column variables to test the null hypothesis of symmetry of the
     classification probabilities.  More formally, the null hypothesis of
     marginal homogeneity states that the two marginal probabilities for each
     outcome are the same.

     Under the null, with a sufficiently large number of discordants (X(1,2) +
     X(2,1) >= 25), the test statistic, CHISQ, follows a chi-squared
     distribution with 1 degree of freedom.  When the number of discordants is
     less than 25, then the mid-P exact McNemar test is used.

     TESTTYPE will force ‘mcnemar_test’ to apply a particular method for testing
     the null hypothesis independently of the number of discordants.  Valid
     options for TESTTYPE:
        • "asymptotic" Original McNemar test statistic
        • "corrected" Edwards' version with continuity correction
        • "exact" An exact binomial test
        • "mid-p" The mid-P McNemar test (mid-p binomial test)

     The test decision is returned in H, which is 1 when the null hypothesis is
     rejected (PVAL < ALPHA) or 0 otherwise.  ALPHA defines the critical value
     of statistical significance for the test.

     Further information about the McNemar's test can be found at
     <https://en.wikipedia.org/wiki/McNemar%27s_test>

     See also: crosstab, chi2test, fishertest.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 48
Perform a McNemar's test on paired nominal data.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
mhsample


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 3374
 -- statistics: [SMPL, ACCEPT] = mhsample (START, NSAMPLES, PROPERTY, VALUE,
          ...)

     Draws NSAMPLES samples from a target stationary distribution PDF using the
     Metropolis-Hastings algorithm.

     Inputs:

        • START is a NCHAIN by DIM matrix of starting points for each Markov
          chain.  Each row is the starting point of a different chain and each
          column corresponds to a different dimension.

        • NSAMPLES is the number of samples, the length of each Markov chain.

     Some property-value pairs can or must be specified; they are:

     (Required) One of:

        • "pdf" PDF: a function handle of the target stationary distribution to
          be sampled.  The function should accept different locations in each
          row and each column corresponds to a different dimension.

          or

        • "logpdf" LOGPDF: a function handle of the log of the target stationary
          distribution to be sampled.  The function should accept different
          locations in each row and each column corresponds to a different
          dimension.

     In case optional argument SYMMETRIC is set to false (the default), one of:

        • "proppdf" PROPPDF: a function handle of the proposal distribution that
          is sampled from with PROPRND to give the next point in the chain.  The
          function should take two inputs, the random variable and the current
          location; each input should accept different locations in each row,
          and each column corresponds to a different dimension.

          or

        • "logproppdf" LOGPROPPDF: the log of "proppdf".

     The following input property/value pairs may be needed depending on the
     desired output:

        • "proprnd" PROPRND: (Required) a function handle which generates random
          numbers from PROPPDF.  The function should accept different locations
          in each row and each column corresponds to a different dimension
          corresponding with the current location.

        • "symmetric" SYMMETRIC: true or false based on whether PROPPDF is a
          symmetric distribution.  If true, PROPPDF (or LOGPROPPDF) need not be
          specified.  The default is false.

        • "burnin" BURNIN: the number of points discarded at the beginning; the
          default is 0.

        • "thin" THIN: omits THIN-1 of every THIN points in the generated Markov
          chain.  The default is 1.

        • "nchain" NCHAIN: the number of Markov chains to generate.  The default
          is 1.

     Outputs:

        • SMPL: a NSAMPLES x DIM x NCHAIN tensor of random values drawn from
          PDF, where the rows are different random values, the columns
          correspond to the dimensions of PDF, and the third dimension
          corresponds to different Markov chains.

        • ACCEPT is a vector of the acceptance rate for each chain.

     Example : Sampling from a normal distribution

          start = 1;
          nsamples = 1e3;
          pdf = @(x) exp (-.5 * x .^ 2) / (pi ^ .5 * 2 ^ .5);
          proppdf = @(x,y) 1 / 6;
          proprnd = @(x) 6 * (rand (size (x)) - .5) + x;
          [smpl, accept] = mhsample (start, nsamples, "pdf", pdf, "proppdf", ...
          proppdf, "proprnd", proprnd, "thin", 4);
          histfit (smpl);

     See also: rand, slicesample.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Draws NSAMPLES samples from a target stationary distribution PDF using
Metrop...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
mnrfit


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2484
 -- statistics: B = mnrfit (X, Y)
 -- statistics: B = mnrfit (X, Y, NAME, VALUE)
 -- statistics: [B, DEV] = mnrfit (...)
 -- statistics: [B, DEV, STATS] = mnrfit (...)

     Perform logistic regression for binomial responses or multiple ordinal
     responses.

     Note: This function is currently a wrapper for the ‘logistic_regression’
     function.  It can only be used for fitting an ordinal logistic model and a
     nominal model with 2 categories (which is an ordinal case).  Hierarchical
     models as well as nominal models with more than two classes are not
     currently supported.  This function is a work in progress.

     ‘B = mnrfit (X, Y)’ returns a matrix, B, of coefficient estimates for a
     multinomial logistic regression of the nominal responses in Y on the
     predictors in X.  X is an NxP numeric matrix of observations on predictor
     variables, where N corresponds to the number of observations and P
     corresponds to predictor variables.  Y contains the response category
     labels and it can be an NxP categorical or numerical matrix (containing
     only 1s and 0s), an Nx1 numeric vector with positive integer values, a
     cell array of character vectors, or a logical vector.  Y can also be
     defined as a character matrix with each row corresponding to an observation
     of X.

     ‘B = mnrfit (X, Y, NAME, VALUE)’ returns a matrix, B, of coefficient
     estimates for a multinomial model fit with additional parameters specified
     by Name-Value pair arguments.

     NAME              VALUE
                       
     -----------------------------------------------------------------------------------
     "model"           Specifies the type of model to fit.  Currently, only "ordinal"
                       is fully supported.  "nominal" is only supported for 2 classes
                       in Y.
                       
     "display"         A flag to enable/disable displaying information about the
                       fitted model.  Default is "off".

     ‘[B, DEV, STATS] = mnrfit (...)’ also returns the deviance of the fit, DEV,
     and the structure STATS for any of the previous input syntaxes.  STATS
     currently only returns values for the fields "beta", same as B,
     "coeffcorr", the estimated correlation matrix for B, "covd", the estimated
     covariance matrix for B, and "se", the standard errors of the coefficient
     estimates B.

     See also: logistic_regression.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Perform logistic regression for binomial responses or multiple ordinal
respon...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 15
monotone_smooth


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1790
 -- statistics: YY = monotone_smooth (X, Y, H)

     Produce a smooth monotone increasing approximation to a sampled functional
     dependence.

     A kernel method is used (an Epanechnikov smoothing kernel is applied to
     y(x); this is integrated to yield the monotone increasing form.  See
     Reference 1 for details.)

     Arguments
     ---------

        • X is a vector of values of the independent variable.

        • Y is a vector of values of the dependent variable, of the same size as
          X.  For best performance, it is recommended that the Y already be
          fairly smooth, e.g.  by applying a kernel smoothing to the original
          values if they are noisy.

        • H is the kernel bandwidth to use.  If H is not given, a "reasonable"
          value is computed.

     Return values
     -------------

        • YY is the vector of smooth monotone increasing function values at X.

     Examples
     --------

          x = 0:0.1:10;
          y = (x .^ 2) + 3 * randn(size(x)); # typically non-monotonic due to added
          # noise
          ys = ([y(1) y(1:(end-1))] + y + [y(2:end) y(end)])/3; # crudely smoothed by
          # moving average but still typically non-monotonic
          yy = monotone_smooth(x, ys); # yy is monotone increasing in x
          plot(x, y, '+', x, ys, x, yy)

     References
     ----------

       1. Holger Dette, Natalie Neumeyer and Kay F. Pilz (2006), A simple
          nonparametric estimator of a strictly monotone regression function,
          ‘Bernoulli’, 12:469-490
       2. Regine Scheder (2007), R Package 'monoProc', Version 1.0-6,
          <http://cran.r-project.org/web/packages/monoProc/monoProc.pdf> (The
          implementation here is based on the monoProc function mono.1d)


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Produce a smooth monotone increasing approximation to a sampled functional
de...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 11
multcompare


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7650
 -- statistics: C = multcompare (STATS)
 -- statistics: C = multcompare (STATS, "name", VALUE)
 -- statistics: [C, M] = multcompare (...)
 -- statistics: [C, M, H] = multcompare (...)
 -- statistics: [C, M, H, GNAMES] = multcompare (...)
 -- statistics: PADJ = multcompare (P)
 -- statistics: PADJ = multcompare (P, "ctype", CTYPE)

     Perform posthoc multiple comparison tests or p-value adjustments to control
     the family-wise error rate (FWER) or false discovery rate (FDR).

     ‘C = multcompare (STATS)’ performs a multiple comparison using a STATS
     structure that is obtained as output from any of the following functions:
     anova1, anova2, anovan, kruskalwallis, and friedman.  The return value C is
     a matrix with one row per comparison and eight columns.  Columns 1-2 are
     the indices of the two samples being compared.  Columns 3-5 are a lower
     bound, estimate, and upper bound for their difference, where the bounds are
     for 95% confidence intervals.  Columns 6-8 are the multiplicity adjusted
     p-values for each individual comparison, the test statistic and the degrees
     of freedom.  All multcompare tests are two-tailed.

     multcompare can take a number of optional parameters as name-value pairs.

     ‘[...] = multcompare (STATS, "alpha", ALPHA)’

        • ALPHA sets the significance level of null hypothesis significance
          tests to ALPHA, and the central coverage of two-sided confidence
          intervals to 100*(1-ALPHA)%.  (Default ALPHA is 0.05).

     ‘[...] = multcompare (STATS, "ControlGroup", REF)’

        • REF is the index of the control group to limit comparisons to.  The
          index must be a positive integer scalar value.  For each dimension (d)
          listed in DIM, multcompare uses STATS.grpnames{d}(idx) as the control
          group.  (Default is empty, i.e.  [], for full pairwise comparisons)

     ‘[...] = multcompare (STATS, "ctype", CTYPE)’

        • CTYPE is the type of comparison test to use.  In order of increasing
          power, the choices are: "bonferroni", "scheffe", "mvt", "holm"
          (default), "hochberg", "fdr", or "lsd".  The first five methods
          control the family-wise error rate.  The "fdr" method controls false
          discovery rate (by the original Benjamini-Hochberg step-up procedure).
          The final method, "lsd" (or "none"), makes no attempt to control the
          Type 1 error rate of multiple comparisons.  The coverage of confidence
          intervals is only corrected for multiple comparisons in the cases
          where CTYPE is "bonferroni", "scheffe", or "mvt", which control the
          Type 1 error rate for simultaneous inference.

          The "mvt" method uses the multivariate t distribution to assess the
          probability or critical value of the maximum statistic across the
          tests, thereby accounting for correlations among comparisons in the
          control of the family-wise error rate with simultaneous inference.  In
          the case of pairwise comparisons, it simulates Tukey's (or the
          Games-Howell) test; in the case of comparisons with a single control
          group, it simulates Dunnett's test.  CTYPE values "tukey-kramer" and
          "hsd" are recognised but set the value of CTYPE and REF to "mvt" and
          empty respectively.  A CTYPE value "dunnett" is recognised but sets
          the value of CTYPE to "mvt", and if REF is empty, sets REF to 1.
          Since the algorithm uses a Monte Carlo method (of 1e+06 random
          samples), you can expect the results to fluctuate slightly with each
          call to multcompare and the calculations may be slow to complete for a
          large number of comparisons.  If the parallel package is installed and
          loaded, multcompare will automatically accelerate computations by
          parallel processing.  Note that p-values calculated by the "mvt" are
          truncated at 1e-06.

     ‘[...] = multcompare (STATS, "df", DF)’

        • DF is an optional scalar value to set the number of degrees of freedom
          in the calculation of p-values for the multiple comparison tests.  By
          default, this value is extracted from the STATS structure of the ANOVA
          test, but setting DF may be needed to approximate a Satterthwaite
          correction if anovan was performed using weights.

     ‘[...] = multcompare (STATS, "dim", DIM)’

        • DIM is a vector specifying the dimension or dimensions over which the
          estimated marginal means are to be calculated.  Used only if STATS
          comes from anovan.  The value [1 3], for example, computes the
          estimated marginal mean for each combination of the first and third
          predictor values.  The default is to compute over the first dimension
          (i.e.  1).  If the specified dimension is, or includes, a continuous
          factor then multcompare will return an error.

     ‘[...] = multcompare (STATS, "estimate", ESTIMATE)’

        • ESTIMATE is a string specifying the estimates to be compared when
          computing multiple comparisons after anova2; this argument is ignored
          by anovan and anova1.  Accepted values for ESTIMATE are either
          "column" (default) to compare column means, or "row" to compare row
          means.  If the model type in anova2 was "linear" or "nested" then only
          "column" is accepted for ESTIMATE since the row factor is assumed to
          be a random effect.

     ‘[...] = multcompare (STATS, "display", DISPLAY)’

        • DISPLAY is either "on" (the default): to display a table and graph of
          the comparisons (e.g.  difference between means), their 100*(1-ALPHA)%
          intervals and multiplicity adjusted p-values in APA style; or "off":
          to omit the table and graph.  On the graph, markers and error bars
          colored red have multiplicity adjusted p-values < ALPHA, otherwise the
          markers and error bars are blue.

     ‘[...] = multcompare (STATS, "seed", SEED)’

        • SEED is a scalar value used to initialize the random number generator
          so that CTYPE "mvt" produces reproducible results.

     ‘[C, M, H, GNAMES] = multcompare (...)’ returns additional outputs.  M is a
     matrix where columns 1-2 are the estimated marginal means and their
     standard errors, and columns 3-4 are lower and upper bounds of the
     confidence intervals for the means; the critical value of the test
     statistic is scaled by a factor of 2^(-0.5) before multiplying by the
     standard errors of the group means so that the intervals overlap when the
     difference in means becomes significant at approximately the level ALPHA.
     When ALPHA is 0.05, this corresponds to confidence intervals with 83.4%
     central coverage.  H is a handle to the figure containing the graph.
     GNAMES is a cell array with one row for each group, containing the names of
     the groups.

     ‘PADJ = multcompare (P)’ calculates and returns adjusted p-values (PADJ)
     using the Holm step-down Bonferroni procedure to control the family-wise
     error rate.

     ‘PADJ = multcompare (P, "ctype", CTYPE)’ calculates and returns adjusted
     p-values (PADJ) computed using the method CTYPE.  In order of increasing
     power, CTYPE for p-value adjustment can be either "bonferroni", "holm"
     (default), "hochberg", or "fdr".  See above for further information about
     the CTYPE methods.
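
     For instance, a minimal sketch of p-value adjustment (the p-values are
     made up for illustration):
          p = [0.01, 0.04, 0.03];
          padj = multcompare (p, "ctype", "bonferroni");
          # Bonferroni multiplies each p-value by the number of tests (3 here),
          # capping at 1, so the adjusted values are 0.03, 0.12 and 0.09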

     See also: anova1, anova2, anovan, kruskalwallis, friedman, fitlm.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Perform posthoc multiple comparison tests or p-value adjustments to control t...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
multiway


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2099
 -- statistics: GROUPINDEX = multiway (NUMBERS, NUM_PARTS)
 -- statistics: GROUPINDEX = multiway (NUMBERS, NUM_PARTS, METHOD)
 -- statistics: [GROUPINDEX, PARTITION] = multiway (...)
 -- statistics: [GROUPINDEX, PARTITION, GROUPSIZES] = multiway (...)

     Solve the multiway number partitioning problem.

     ‘GROUPINDEX = multiway (NUMBERS, NUM_PARTS)’ splits the set of numbers in
     NUMBERS into the number of subsets specified by NUM_PARTS such that the
     sums of the subsets are as nearly equal as possible.  It returns a vector
     of group indices in GROUPINDEX, with each index identifying the subset
     assigned to the corresponding element of NUMBERS.

        • NUMBERS is a vector of positive real numbers to be partitioned.
        • NUM_PARTS is a positive integer scalar specifying the number of
          partitions (subsets) to split the numbers into.

     ‘GROUPINDEX = multiway (NUMBERS, NUM_PARTS, METHOD)’ also specifies the
     algorithm used for partitioning the set of numbers.  By default, ‘multiway’
     uses the complete Karmarkar-Karp algorithm when the set of numbers contains
     up to 10 elements and the requested number of subsets does not exceed 5;
     otherwise, it defaults to the greedy algorithm, which is optimized for
     speed but may not return the optimal partitioning.  The following
     methods are supported:

        • 'greedy' (Greedy algorithm)
        • 'completeKK' (Complete Karmarkar-Karp algorithm)

     The ‘multiway’ function may return up to three output arguments described
     below:

        • GROUPINDEX: A vector of the same length as NUMBERS containing the
          group index (from 1 to NUM_PARTS) for each number.
        • PARTITION: A cell array of length NUM_PARTS with each cell containing
          the numbers assigned to that partition.
        • GROUPSIZES: A vector of the sums of the numbers in each partition.

     Example:
          numbers = [4, 5, 6, 7, 8];
          num_parts = 2;
          [groupindex, partition, groupsizes] = multiway (numbers, num_parts);

     See also: cvpartition.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 47
Solve the multiway number partitioning problem.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
nanmax


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1935
 -- statistics: V = nanmax (X)
 -- statistics: V = nanmax (X, [], DIM)
 -- statistics: [V, IDX] = nanmax (...)
 -- statistics: V = nanmax (X, [], 'all')
 -- statistics: V = nanmax (X, [], VECDIM)
 -- statistics: V = nanmax (X, Y)

     Find the maximum while ignoring NaN values.

     ‘V = nanmax (X)’ returns the maximum of X, after removing NaN values.  If X
     is a vector, a scalar maximum value is returned.  If X is a matrix, a row
     vector of column maxima is returned.  If X is a multidimensional array,
     ‘nanmax’ operates along the first nonsingleton dimension.  If all values in
     a column are NaN, the maximum is returned as NaN rather than [].

     ‘V = nanmax (X, [], DIM)’ operates along the dimension DIM of X.

     ‘[V, IDX] = nanmax (...)’ also returns the row indices of the maximum
     values for each column in the vector IDX.  When X is a vector, IDX is a
     scalar, as is V.

     ‘V = nanmax (X, [], 'all')’ returns the maximum of all elements of X, after
     removing NaN values.  It is the equivalent of ‘nanmax (X(:))’.  The
     optional flag 'all' cannot be used together with DIM or VECDIM input
     arguments.

     ‘V = nanmax (X, [], VECDIM)’ returns the maximum over the dimensions
     specified in the vector VECDIM.  Each element of VECDIM represents a
     dimension of the input array X and the output V has length 1 in the
     specified operating dimensions.  The lengths of the other dimensions are
     the same for X and V.  For example, if X is a 2-by-3-by-4 array, then
     ‘nanmax (X, [], [1 2])’ returns a 1-by-1-by-4 array.  Each element of the
     output array is the maximum of the elements on the corresponding page of X.
     If VECDIM indexes all dimensions of X, then it is equivalent to ‘nanmax (X,
     [], 'all')’.  Any dimension in VECDIM greater than ‘ndims (X)’ is ignored.
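
     Example (an illustrative sketch, not taken from the package sources; the
     matrix values are arbitrary):
          x = [1, NaN, 3; 4, 5, NaN];
          v = nanmax (x);             # column maxima, NaN ignored: [4, 5, 3]
          a = nanmax (x, [], 'all');  # maximum over all elements: 5
          r = nanmax (x, [], 2);      # row maxima: [3; 5]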

     See also: max, nanmin, nansum.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 43
Find the maximum while ignoring NaN values.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
nanmean


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1926
 -- statistics: S = nanmean (X)
 -- statistics: S = nanmean (X, 'all')
 -- statistics: S = nanmean (X, DIM)
 -- statistics: S = nanmean (X, VECDIM)

     Compute the mean while ignoring NaN values.

     ‘S = nanmean (X)’ returns the mean of X after removing NaN values.  If X is
     a vector, a scalar value is returned.  If X is a matrix, a row vector of
     column means is returned.  If X is a multidimensional array, ‘nanmean’
     operates along the first nonsingleton dimension.  If all values along a
     dimension are NaN, the mean is returned as NaN.

     ‘S = nanmean (X, 'all')’ returns the mean of all elements of X, after
     removing NaN values.  It is the equivalent of ‘nanmean (X(:))’.

     ‘S = nanmean (X, DIM)’ operates along the dimension DIM of X.

     ‘S = nanmean (X, VECDIM)’ returns the mean over the dimensions specified in
     the vector VECDIM.  Each element of VECDIM represents a dimension of the
     input array X and the output S has length 1 in the specified operating
     dimensions.  The lengths of the other dimensions are the same for X and S.
     For example, if X is a 2-by-3-by-4 array, then ‘nanmean (X, [1 2])’ returns
     a 1-by-1-by-4 array.  Each element of the output array is the mean of the
     elements on the corresponding page of X.  If VECDIM indexes all dimensions
     of X, then it is equivalent to ‘nanmean (X, 'all')’.  Any dimension in
     VECDIM greater than ‘ndims (X)’ is ignored.

     ‘nanmean’ primarily operates on single and double numeric types, since they
     support NaN values, while preserving the data type.  Nevertheless, it can
     also operate on integer types by treating them as double types.  To avoid
     overflow on very large int64 and uint64 values, use the ‘mean’ function,
     which applies special handling for such cases.

     See also: mean, nansum, nanmin, nanmax.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 43
Compute the mean while ignoring NaN values.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
nanmin


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1935
 -- statistics: V = nanmin (X)
 -- statistics: V = nanmin (X, [], DIM)
 -- statistics: [V, IDX] = nanmin (...)
 -- statistics: V = nanmin (X, [], 'all')
 -- statistics: V = nanmin (X, [], VECDIM)
 -- statistics: V = nanmin (X, Y)

     Find the minimum while ignoring NaN values.

     ‘V = nanmin (X)’ returns the minimum of X, after removing NaN values.  If X
     is a vector, a scalar minimum value is returned.  If X is a matrix, a row
     vector of column minima is returned.  If X is a multidimensional array,
     ‘nanmin’ operates along the first nonsingleton dimension.  If all values in
     a column are NaN, the minimum is returned as NaN rather than [].

     ‘V = nanmin (X, [], DIM)’ operates along the dimension DIM of X.

     ‘[V, IDX] = nanmin (...)’ also returns the row indices of the minimum
     values for each column in the vector IDX.  When X is a vector, IDX is a
     scalar, as is V.

     ‘V = nanmin (X, [], 'all')’ returns the minimum of all elements of X, after
     removing NaN values.  It is the equivalent of ‘nanmin (X(:))’.  The
     optional flag 'all' cannot be used together with DIM or VECDIM input
     arguments.

     ‘V = nanmin (X, [], VECDIM)’ returns the minimum over the dimensions
     specified in the vector VECDIM.  Each element of VECDIM represents a
     dimension of the input array X and the output V has length 1 in the
     specified operating dimensions.  The lengths of the other dimensions are
     the same for X and V.  For example, if X is a 2-by-3-by-4 array, then
     ‘nanmin (X, [], [1 2])’ returns a 1-by-1-by-4 array.  Each element of the
     output array is the minimum of the elements on the corresponding page of X.
     If VECDIM indexes all dimensions of X, then it is equivalent to ‘nanmin (X,
     [], 'all')’.  Any dimension in VECDIM greater than ‘ndims (X)’ is ignored.

     See also: min, nanmax, nansum.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 43
Find the minimum while ignoring NaN values.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
nansum


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1527
 -- statistics: S = nansum (X)
 -- statistics: S = nansum (X, 'all')
 -- statistics: S = nansum (X, DIM)
 -- statistics: S = nansum (X, VECDIM)

     Compute the sum while ignoring NaN values.

     ‘S = nansum (X)’ returns the sum of X, after removing NaN values.  If X is
     a vector, a scalar value is returned.  If X is a matrix, a row vector of
     column sums is returned.  If X is a multidimensional array, ‘nansum’
     operates along the first nonsingleton dimension.  If all values along a
     dimension are NaN, the sum is returned as 0.

     ‘S = nansum (X, 'all')’ returns the sum of all elements of X, after
     removing NaN values.  It is the equivalent of ‘nansum (X(:))’.

     ‘S = nansum (X, DIM)’ operates along the dimension DIM of X.

     ‘S = nansum (X, VECDIM)’ returns the sum over the dimensions specified in
     the vector VECDIM.  Each element of VECDIM represents a dimension of the
     input array X and the output S has length 1 in the specified operating
     dimensions.  The lengths of the other dimensions are the same for X and S.
     For example, if X is a 2-by-3-by-4 array, then ‘nansum (X, [1 2])’ returns
     a 1-by-1-by-4 array.  Each element of the output array is the sum of the
     elements on the corresponding page of X.  If VECDIM indexes all dimensions
     of X, then it is equivalent to ‘nansum (X, 'all')’.  Any dimension in
     VECDIM greater than ‘ndims (X)’ is ignored.
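
     Example (an illustrative sketch with arbitrary values, showing that an
     all-NaN column sums to 0):
          x = [1, NaN, NaN; 4, 5, NaN];
          s = nansum (x);         # column sums, NaN ignored: [5, 5, 0]
          a = nansum (x, 'all');  # sum over all elements: 10
          r = nansum (x, 2);      # row sums: [1; 9]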

     See also: sum, nanmin, nanmax.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 42
Compute the sum while ignoring NaN values.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 22
normalise_distribution


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2070
 -- statistics: NORMALISED = normalise_distribution (DATA)
 -- statistics: NORMALISED = normalise_distribution (DATA, DISTRIBUTION)
 -- statistics: NORMALISED = normalise_distribution (DATA, DISTRIBUTION,
          DIMENSION)

     Transform a set of data so as to be N(0,1) distributed according to an idea
     by van Albada and Robinson.

     This is achieved by first passing it through its own cumulative
     distribution function (CDF) in order to get a uniform distribution, and
     then mapping the uniform to a normal distribution.

     The data must be passed as a vector or matrix in DATA.  If the CDF is
     unknown, then [] can be passed in DISTRIBUTION, and in this case the
     empirical CDF will be used.  Otherwise, if the CDFs for all data are known,
     they can be passed in DISTRIBUTION, either in the form of a single function
     name as a string, or a single function handle, or a cell array consisting
     of either all function names as strings, or all function handles.  In the
     latter case, the number of CDFs passed must match the number of rows, or
     columns respectively, to normalise.  If the data are passed as a matrix,
     then the transformation will operate either along the first non-singleton
     dimension, or along DIMENSION if present.

     Notes: The empirical CDF will map any two sets of data having the same size
     and their ties in the same places after sorting to some permutation of the
     same normalised data:
          normalise_distribution([1 2 2 3 4])
          ⇒ -1.28  0.00  0.00  0.52  1.28

          normalise_distribution([1 10 100 10 1000])
          ⇒ -1.28  0.00  0.52  0.00  1.28

     Original source: S.J. van Albada, P.A. Robinson "Transformation of
     arbitrary distributions to the normal distribution with application to EEG
     test-retest reliability" Journal of Neuroscience Methods, Volume 161, Issue
     2, 15 April 2007, Pages 205-211 ISSN 0165-0270,
     10.1016/j.jneumeth.2006.11.004.
     (http://www.sciencedirect.com/science/article/pii/S0165027006005668)


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Transform a set of data so as to be N(0,1) distributed according to an idea b...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
normplot


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 770
 -- Function File: normplot (X)
 -- Function File: normplot (AX, X)
 -- Function File: H = normplot (...)

     Produce normal probability plot of the data in X.  If X is a matrix,
     ‘normplot’ plots the data for each column.  NaN values are ignored.

     ‘H = normplot (AX, X)’ takes an axes handle AX in addition to the data in
     X and plots into those axes.  The handle of the current axes can be
     obtained with ‘gca’.

     The line joining the first and third quartiles is drawn solid, whereas its
     extensions to both ends are dotted.  If the underlying distribution is
     normal, the points will cluster around the solid part of the line.  Other
     distribution types will introduce curvature in the plot.

     See also: cdfplot, wblplot.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 49
Produce normal probability plot of the data in X.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 16
optimalleaforder


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1307
 -- statistics: LEAFORDER = optimalleaforder (TREE, D)
 -- statistics: LEAFORDER = optimalleaforder (..., NAME, VALUE)

     Compute the optimal leaf ordering of a hierarchical binary cluster tree.

     The optimal leaf ordering of a tree is the ordering which minimizes the sum
     of the distances between each leaf and its adjacent leaves, without
     altering the structure of the tree, that is without redefining the clusters
     of the tree.

     Required inputs:
        • TREE: a hierarchical cluster tree TREE generated by the ‘linkage’
          function.

        • D: a matrix of distances as computed by ‘pdist’.

     Optional inputs can be the following property/value pairs:
        • property 'Criteria' at the moment can only have the value 'adjacent',
          for minimizing the distances between leaves.

        • property 'Transformation' can have one of the values 'linear',
          'inverse' or a handle to a custom function which computes the
          similarity matrix S.

     optimalleaforder's output LEAFORDER is the optimal leaf ordering.

     *Reference* Bar-Joseph, Z., Gifford, D.K., and Jaakkola, T.S. Fast optimal
     leaf ordering for hierarchical clustering.  Bioinformatics vol.  17 suppl.
     1, 2001.

     See also: dendrogram, linkage, pdist.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 72
Compute the optimal leaf ordering of a hierarchical binary cluster tree.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 21
parseWilkinsonFormula


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 5639
 -- statistics: TERMS = parseWilkinsonFormula (FORMULA)
 -- statistics: RESULT = parseWilkinsonFormula (FORMULA, MODE)
 -- statistics: [X, Y, NAMES] = parseWilkinsonFormula (FORMULA, "model_matrix",
          DATA)

     Parse and expand statistical model formulae using the Wilkinson notation.

     This function implements the recursive-descent parser and expansion logic
     described by Wilkinson & Rogers (1973) for factorial models.  It allows the
     symbolic specification of analysis of variance and regression models,
     converting strings into computational schemas or design matrices.  It also
     supports multi-variable response specification on the Left-Hand Side (LHS)
     using lists or ranges.

     ‘parseWilkinsonFormula’ accepts as its first input argument a Wilkinson
     notation string specified by FORMULA either as a character vector or a
     string scalar with the following list of valid symbols:

     *Right-Hand Side (Model) Operators* The RHS specifies the independent
     variables (predictors) and the structural relationships between them, such
     as interactions and nesting.  The parser expands these expressions into
     fundamental model terms following the standard statistical rules of
     marginality.  Additionally, explicit nesting notation (e.g., ‘B(A)’) is
     supported to denote that factor B is nested within A.

     Operator     Description                  Expansion Example
     -----------------------------------------------------------------------------------
     ‘+’          Addition (Union)             ‘A + B’ expands to A, B
     ‘*’          Crossing                     ‘A * B’ expands to A, B, A:B
     ‘-’          Deletion                     ‘A*B - A:B’ expands to A, B
     ‘/’          Nesting                      ‘A / B’ expands to A, A:B
     ‘:’          Interaction                  ‘A : B’ expands to A:B
     ‘^’          Power (Limit)                ‘(A+B)^2’ expands to A, B, A:B
     ‘1’          Intercept                    ‘y ~ A - 1’ removes intercept

     *Left-Hand Side (Response) Operators* The LHS, separated by the ‘~’
     operator, defines the dependent variables.  It natively supports
     multi-response syntaxes.

     Operator     Description                  Usage Example
     -----------------------------------------------------------------------------------
     ‘~’          Formula separator            ‘y ~ x’
     ‘,’          List separator               ‘y1, y2 ~ x’
     ‘-’          Range operator               ‘T1 - T3 ~ x’

     *Processing Modes* ‘parseWilkinsonFormula (FORMULA, MODE)’ evaluates the
     formula string based on the selected MODE:

        • ‘'expand'’ (default) - Returns a structure containing ‘response’ and
          ‘model’ fields.  Each field contains cell arrays of the expanded,
          fundamental terms.

        • ‘'equation'’ - Generates a string representing the mathematical
          equation of the fitted model.  Coefficients are represented
          generically as ‘c1, c2, ...’.  If multiple responses are specified, it
          returns a string array of equations.

          Formula String                   Equation Output
          ----------------------------------------------------------------------------------
          ‘y ~ x’                          ‘"y = c1 + c2*x"’
          ‘y ~ A * B’                      ‘"y = c1 + c2*A + c3*B + c4*A*B"’
          ‘y ~ School / Class’             ‘"y = c1 + c2*School + c3*Class*School"’
          ‘y ~ x^2’                        ‘"y = c1 + c2*x + c3*x^2"’
          ‘y1 - y2 ~ Trt’                  ‘["y1 = c1 + c2*Trt", "y2 = ..."]’

        • ‘'matrix'’ - Returns a schema structure containing a binary matrix
          defining term membership, useful for internal algorithmic processing.

        • ‘'model_matrix'’ - Constructs the numeric Design Matrix (X) and
          Response Matrix (Y) directly from a provided data table.

        • ‘'parse'’ - Returns the raw Abstract Syntax Tree (AST) structure.

        • ‘'tokenize'’ - Returns the array of tokens generated by the lexer.

     *Data Handling ('model_matrix' mode)* When using the ‘'model_matrix'’ mode,
     a DATA argument must be provided as an Octave ‘table’.
        • *Categorical Variables:* Cell arrays of strings in the table are
          automatically detected as categorical factors and undergo corner-point
          (reference) dummy coding.
        • *Numeric Variables:* Standard numeric vectors are treated as
          continuous predictors or responses.
        • *Missing Data:* Rows containing ‘NaN’ values in any of the active
          variables are automatically omitted from the final matrices.

     *Outputs*
     TERMS / RESULT
          The processed model structure, string array, or cell array depending
          on the selected MODE.
     X
          The generated numeric design matrix (Observations x Parameters).
          Includes a column of ones for the intercept unless ‘- 1’ is in the
          formula.
     Y
          The numeric response matrix (Observations x K responses).
     NAMES
          A cell array of character vectors containing the column names
          corresponding to the generated design matrix X.

     *References*

     Wilkinson, G. N. and Rogers, C. E. (1973).  Symbolic Description of
     Factorial Models for Analysis of Variance.  Applied Statistics, 22,
     392-399.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 73
Parse and expand statistical model formulae using the Wilkinson notation.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 3
pca


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 3394
 -- statistics: COEFF = pca (X)
 -- statistics: COEFF = pca (X, NAME, VALUE)
 -- statistics: [COEFF, SCORE, LATENT] = pca (...)
 -- statistics: [COEFF, SCORE, LATENT, TSQUARED] = pca (...)
 -- statistics: [COEFF, SCORE, LATENT, TSQUARED, EXPLAINED, MU] = pca (...)

     Performs a principal component analysis on a data matrix.

     A principal component analysis of a data matrix of N observations in a D
     dimensional space returns a DxD transformation matrix, to perform a change
     of basis on the data.  The first component of the new basis is the
     direction that maximizes the variance of the projected data.

     Input argument:
        • X : an NxD data matrix

     The following NAME, VALUE pair arguments can be used:
        • "Algorithm" defines the algorithm to use:
             • "svd" (default), for singular value decomposition
             • "eig" for eigenvalue decomposition

        • "Centered" is a boolean indicator for centering the observation data.
          It is ‘true’ by default.
        • "Economy" is a boolean indicator for the economy size output.  It is
          ‘true’ by default.  Hence, ‘pca’ returns only the elements of LATENT
          that are not necessarily zero, and the corresponding columns of COEFF
          and SCORE, that is, when N <= D, only the first N - 1.

        • "NumComponents" defines the number of components k to return.  If k <
          D, then only the first k columns of COEFF and SCORE are returned.

        • "Rows" defines how to handle missing values:
             • "complete" (default), missing values are removed before
               computation.
             • "pairwise" (only valid when "Algorithm" is "eig"), the covariance
               of rows with missing data is computed using the available data,
               but the covariance matrix might not be positive definite, which
               triggers the termination of ‘pca’.
             • "all", missing values are not allowed, ‘pca’ terminates with
               an error if there are any.

        • "Weights" defines observation weights as a vector of positive values
          of length N.

        • "VariableWeights" defines variable weights:
             • a VECTOR of positive values of length D.
             • the string "variance" to use the sample variance as weights.

     Return values:
        • COEFF : the principal component coefficients, a DxD transformation
          matrix
        • SCORE : the principal component scores, the representation of X in the
          principal component space
        • LATENT : the principal component variances, i.e., the eigenvalues of
          the covariance matrix of X
        • TSQUARED : Hotelling's T-squared Statistic for each observation in X
        • EXPLAINED : the percentage of the variance explained by each principal
          component
        • MU : the estimated mean of each variable of X, it is zero if the data
          are not centered

     Matlab compatibility note: the alternating least squares method 'als' and
     associated options 'Coeff0', 'Score0', and 'Options' are not yet
     implemented.

     References
     ----------

       1. Jolliffe, I. T., Principal Component Analysis, 2nd Edition, Springer,
          2002

     See also: barttest, factoran, pcacov, pcares.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 57
Performs a principal component analysis on a data matrix.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
pcacov


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1482
 -- statistics: COEFF = pcacov (K)
 -- statistics: [COEFF, LATENT] = pcacov (K)
 -- statistics: [COEFF, LATENT, EXPLAINED] = pcacov (K)

     Perform principal component analysis on covariance matrix

     ‘COEFF = pcacov (K)’ performs principal component analysis on the square
     covariance matrix K and returns the principal component coefficients, also
     known as loadings.  The columns are in order of decreasing component
     variance.

     ‘[COEFF, LATENT] = pcacov (K)’ also returns a vector with the principal
     component variances, i.e.  the eigenvalues of K.  LATENT has a length of
     size (COEFF, 1).

     ‘[COEFF, LATENT, EXPLAINED] = pcacov (K)’ also returns a vector with the
     percentage of the total variance explained by each principal component.
     EXPLAINED has the same size as LATENT.  The entries in EXPLAINED range from
     0 (none of the variance is explained) to 100 (all of the variance is
     explained).

     ‘pcacov’ does not standardize K to have unit variances.  In order to
     perform principal component analysis on standardized variables, use the
     correlation matrix R = K ./ (SD * SD'), where SD = sqrt (diag (K)), in
     place of K.  To perform principal component analysis directly on the data
     matrix, use ‘pca’.
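
     As an illustrative sketch of the standardization described above (the
     covariance matrix here is made up for the example):
          K = [4, 2; 2, 9];           # hypothetical covariance matrix
          SD = sqrt (diag (K));       # standard deviations, here [2; 3]
          R = K ./ (SD * SD');        # correlation matrix, here [1, 1/3; 1/3, 1]
          [COEFF, LATENT, EXPLAINED] = pcacov (R);
          # LATENT holds the eigenvalues of R, here 4/3 and 2/3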

     References
     ----------

       1. Jolliffe, I. T., Principal Component Analysis, 2nd Edition, Springer,
          2002

     See also: barttest, factoran, pcares, pca.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 57
Perform principal component analysis on covariance matrix



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
pcares


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1299
 -- statistics: RESIDUALS = pcares (X, NDIM)
 -- statistics: [RESIDUALS, RECONSTRUCTED] = pcares (X, NDIM)

     Calculate residuals from principal component analysis.

     ‘RESIDUALS = pcares (X, NDIM)’ returns the residuals obtained by retaining
     NDIM principal components of the NxD matrix X.  Rows of X correspond to
     observations, columns of X correspond to variables.  NDIM is a scalar and
     must be less than or equal to D.  RESIDUALS is a matrix of the same size as
     X.  Use the data matrix, not the covariance matrix, with this function.

     ‘[RESIDUALS, RECONSTRUCTED] = pcares (X, NDIM)’ returns the reconstructed
     observations, i.e.  the approximation to X obtained by retaining its first
     NDIM principal components.

     ‘pcares’ does not normalize the columns of X.  Use pcares (zscore (X),
     NDIM) in order to perform the principal components analysis based on
     standardized variables, i.e.  based on correlations.  Use ‘pcacov’ in order
     to perform principal components analysis directly on a covariance or
     correlation matrix without constructing residuals.

     References
     ----------

       1. Jolliffe, I. T., Principal Component Analysis, 2nd Edition, Springer,
          2002

     See also: factoran, pcacov, pca.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 54
Calculate residuals from principal component analysis.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 5
pdist


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 4084
 -- statistics: D = pdist (X)
 -- statistics: D = pdist (X, DISTANCE)
 -- statistics: D = pdist (X, DISTANCE, DISTPARAMETER)

     Return the distance between any two rows in X.

     ‘D = pdist (X)’ calculates the Euclidean distance between pairs of
     observations in X.  X must be an MxP numeric matrix representing M points
     in P-dimensional space.  This function computes the pairwise distances
     returned in D as a row vector of length M*(M-1)/2.  Use ‘Z = squareform
     (D)’ to convert the row vector D into an MxM symmetric matrix Z, where
     Z(i,j) corresponds to the pairwise distance between points i and j.

     ‘D = pdist (X, Y, DISTANCE)’ returns the distance between pairs of
     observations in X using the metric specified by DISTANCE, which can be any
     of the following options.

     "euclidean"           Euclidean distance.
     "fasteuclidean"       Euclidean distance computed with an alternative
                           algorithm which may be faster but might reduce
                           accuracy.
     "squaredeuclidean"    Squared Euclidean distance.
     "fastsquaredeuclidean"Euclidean distance computed with an alternative
                           algorithm which may be faster but might reduce
                           accuracy.
     "seuclidean"          standardized Euclidean distance.  Each coordinate
                           difference between the rows in X and the query
                           matrix Y is scaled by dividing by the
                           corresponding element of the standard deviation
                           computed from X.  A different scaling vector can
                           be specified with the subsequent DISTPARAMETER
                           input argument.
     "mahalanobis"         Mahalanobis distance, computed using a positive
                           definite covariance matrix.  A different
                           covariance matrix can be specified with the
                           subsequent DISTPARAMETER input argument.
     "cityblock"           City block distance.
     "minkowski"           Minkowski distance.  The default exponent is 2.  A
                           different exponent can be specified with the
                           subsequent DISTPARAMETER input argument.
     "chebychev"           Chebychev distance (maximum coordinate
                           difference).
     "cosine"              One minus the cosine of the included angle between
                           points (treated as vectors).
     "correlation"         One minus the sample linear correlation between
                           observations (treated as sequences of values).
     "hamming"             Hamming distance, which is the percentage of
                           coordinates that differ.
     "jaccard"             One minus the Jaccard coefficient, which is the
                           percentage of nonzero coordinates that differ.
     "spearman"            One minus the sample Spearman's rank correlation
                           between observations (treated as sequences of
                           values).
     @DISTFUN              Custom distance function handle.  A distance
                           function of the form ‘function D2 = distfun (XI,
                           YI)’, where XI is a 1xP vector containing a single
                           observation in P-dimensional space, YI is an NxP
                           matrix containing an arbitrary number of
                           observations in the same P-dimensional space, and
                           D2 is an NxP vector of distances, where (D2k) is
                           the distance between observations XI and (YIk,:).

     ‘D = pdist (X, Y, DISTANCE, DISTPARAMETER)’ returns the distance using the
     metric specified by DISTANCE and DISTPARAMETER.  The latter one can only be
     specified when the selected DISTANCE is "seuclidean", "minkowski", and
     "mahalanobis".

     See also: pdist2, squareform, linkage.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 46
Return the distance between any two rows in X.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
pdist2


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 5088
 -- statistics: D = pdist2 (X, Y)
 -- statistics: D = pdist2 (X, Y, DISTANCE)
 -- statistics: D = pdist2 (X, Y, DISTANCE, DISTPARAMETER)
 -- statistics: D = pdist2 (..., NAME, VALUE)
 -- statistics: [D, I] = pdist2 (..., NAME, VALUE)

     Compute pairwise distance between two sets of vectors.

     ‘D = pdist2 (X, Y)’ calculates the Euclidean distance between each pair of
     observations in X and Y.  Let X be an MxP matrix representing M points in
     P-dimensional space and Y be an NxP matrix representing another set of
     points in the same space.  This function computes the MxN distance matrix
     D, where D(i,j) is the distance between X(i,:) and Y(j,:).

     ‘D = pdist2 (X, Y, DISTANCE)’ returns the distance between each pair of
     observations in X and Y using the metric specified by DISTANCE, which can
     be any of the following options.

     "euclidean"           Euclidean distance.
     "fasteuclidean"       Euclidean distance computed with an alternative
                           algorithm which may be faster but might reduce
                           accuracy.
     "squaredeuclidean"    Squared Euclidean distance.
     "fastsquaredeuclidean" Squared Euclidean distance computed with an
                           alternative algorithm which may be faster but
                           might reduce accuracy.
     "seuclidean"          Standardized Euclidean distance.  Each coordinate
                           difference between the rows in X and the query
                           matrix Y is scaled by dividing by the
                           corresponding element of the standard deviation
                           computed from X.  A different scaling vector can
                           be specified with the subsequent DISTPARAMETER
                           input argument.
     "mahalanobis"         Mahalanobis distance, computed using a positive
                           definite covariance matrix.  A different
                           covariance matrix can be specified with the
                           subsequent DISTPARAMETER input argument.
     "cityblock"           City block distance.
     "minkowski"           Minkowski distance.  The default exponent is 2.  A
                           different exponent can be specified with the
                           subsequent DISTPARAMETER input argument.
     "chebychev"           Chebychev distance (maximum coordinate
                           difference).
     "cosine"              One minus the cosine of the included angle between
                           points (treated as vectors).
     "correlation"         One minus the sample linear correlation between
                           observations (treated as sequences of values).
     "hamming"             Hamming distance, which is the percentage of
                           coordinates that differ.
     "jaccard"             One minus the Jaccard coefficient, which is the
                           percentage of nonzero coordinates that differ.
     "spearman"            One minus the sample Spearman's rank correlation
                           between observations (treated as sequences of
                           values).
     @DISTFUN              Custom distance function handle.  A distance
                           function of the form ‘function D2 = distfun (XI,
                           YI)’, where XI is a 1xP vector containing a single
                           observation in P-dimensional space, YI is an NxP
                           matrix containing an arbitrary number of
                           observations in the same P-dimensional space, and
                           D2 is an Nx1 vector of distances, where D2(k) is
                           the distance between observations XI and YI(k,:).

     ‘D = pdist2 (X, Y, DISTANCE, DISTPARAMETER)’ returns the distance using the
     metric specified by DISTANCE and DISTPARAMETER.  The latter can only be
     specified when the selected DISTANCE is "seuclidean", "minkowski", or
     "mahalanobis".

     ‘D = pdist2 (..., NAME, VALUE)’ for any previous arguments, modifies the
     computation using NAME-VALUE parameters.
        • ‘D = pdist2 (X, Y, DISTANCE, "Smallest", K)’ computes the distance
          using the metric specified by DISTANCE and returns the K smallest
          pairwise distances to observations in X for each observation in Y in
          ascending order.
        • ‘D = pdist2 (X, Y, DISTANCE, DISTPARAMETER, "Largest", K)’ computes
          the distance using the metric specified by DISTANCE and DISTPARAMETER
          and returns the K largest pairwise distances in descending order.

     ‘[D, I] = pdist2 (..., NAME, VALUE)’ also returns the matrix I, which
     contains the indices of the observations in X corresponding to the
     distances in D.  You must specify either "Smallest" or "Largest" as an
     optional NAME-VALUE pair argument to compute the second output argument.

     See also: pdist, knnsearch, rangesearch.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 54
Compute pairwise distance between two sets of vectors.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 10
plsregress


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 5337
 -- statistics: [XLOAD, YLOAD] = plsregress (X, Y)
 -- statistics: [XLOAD, YLOAD] = plsregress (X, Y, NCOMP)
 -- statistics: [XLOAD, YLOAD, XSCORE, YSCORE, COEF, PCTVAR, MSE, STATS] =
          plsregress (X, Y, NCOMP)
 -- statistics: [XLOAD, YLOAD, XSCORE, YSCORE, COEF, PCTVAR, MSE, STATS] =
          plsregress (..., NAME, VALUE)

     Calculate partial least squares regression using SIMPLS algorithm.

     ‘plsregress’ uses the SIMPLS algorithm, and first centers X and Y by
     subtracting off column means to get centered variables.  However, it does
     not rescale the columns.  To perform partial least squares regression with
     standardized variables, use ‘zscore’ to normalize X and Y.

     ‘[XLOAD, YLOAD] = plsregress (X, Y)’ computes a partial least squares
     regression of Y on X, using NCOMP PLS components, which by default are
     calculated as min (size (X, 1) - 1, size (X, 2)), and returns the
     predictor and response loadings in XLOAD and YLOAD, respectively.
        • X is an NxP matrix of predictor variables, with rows corresponding to
          observations, and columns corresponding to variables.
        • Y is an NxM response matrix.
        • XLOAD is a PxNCOMP matrix of predictor loadings, where each row of
          XLOAD contains coefficients that define a linear combination of PLS
          components that approximate the original predictor variables.
        • YLOAD is an MxNCOMP matrix of response loadings, where each row of
          YLOAD contains coefficients that define a linear combination of PLS
          components that approximate the original response variables.

     ‘[XLOAD, YLOAD] = plsregress (X, Y, NCOMP)’ defines the desired number of
     PLS components to use in the regression.  NCOMP, a scalar positive integer,
     must not exceed the default calculated value.

     ‘[XLOAD, YLOAD, XSCORE, YSCORE, COEF, PCTVAR, MSE, STATS] = plsregress (X,
     Y, NCOMP)’ also returns the following arguments:
        • XSCORE is an NxNCOMP orthonormal matrix with the predictor scores,
          i.e., the PLS components that are linear combinations of the variables
          in X, with rows corresponding to observations and columns
          corresponding to components.
        • YSCORE is an NxNCOMP orthonormal matrix with the response scores,
          i.e., the linear combinations of the responses with which the PLS
          components XSCORE have maximum covariance, with rows corresponding to
          observations and columns corresponding to components.
        • COEF is a (P+1)xM matrix with the PLS regression coefficients,
          containing the intercepts in the first row.
        • PCTVAR is a 2xNCOMP matrix containing the percentage of the variance
          explained by the model with the first row containing the percentage of
          explained variance in X by each PLS component and the second row
          containing the percentage of explained variance in Y.
        • MSE is a 2x(NCOMP+1) matrix containing the estimated mean squared
          errors for PLS models with 0:NCOMP components with the first row
          containing the mean squared errors for the predictor variables in X
          and the second row containing the mean squared errors for the response
          variable(s) in Y.
        • STATS is a structure with the following fields:
             • STATS.W is a PxNCOMP matrix of PLS weights.
             • STATS.T2 is the T^2 statistics for each point in XSCORE.
             • STATS.Xresiduals is an NxP matrix with the predictor residuals.
             • STATS.Yresiduals is an NxM matrix with the response residuals.

     ‘[...] = plsregress (..., NAME, VALUE, ...)’ specifies one or more of the
     following NAME/VALUE pairs:

          NAME             VALUE
     -----------------------------------------------------------------------------------
          "CV"             The method used to compute MSE.  When VALUE is a positive
                           integer K, ‘plsregress’ uses K-fold cross-validation.  Set
                           VALUE to a cross-validation partition, created using
                           ‘cvpartition’, to use other forms of cross-validation.
                           Set VALUE to "resubstitution" to use both X and Y to fit
                           the model and to estimate the mean squared errors, without
                           cross-validation.  By default, VALUE = "resubstitution".
          "MCReps"         A positive integer indicating the number of Monte-Carlo
                           repetitions for cross-validation.  By default, VALUE = 1.
                           A different "MCReps" value is only meaningful when using
                           the "HoldOut" method for cross-validation, previously set
                           by a ‘cvpartition’ object.  If no cross-validation method
                           is used, then "MCReps" must be 1.

     Further information about the PLS regression can be found at
     <https://en.wikipedia.org/wiki/Partial_least_squares_regression>

     References
     ----------

       1. SIMPLS: An alternative approach to partial least squares regression.
          Chemometrics and Intelligent Laboratory Systems (1993)


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 66
Calculate partial least squares regression using SIMPLS algorithm.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
ppplot


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 950
 -- statistics: ppplot (X, DIST)
 -- statistics: ppplot (X, DIST, PARAMS)
 -- statistics: [P, Y] = ppplot (X, DIST, PARAMS)

     Perform a PP-plot (probability plot).

     If F is the CDF of the distribution DIST with parameters PARAMS and X a
     sample vector of length N, the PP-plot graphs ordinate Y(I) = F (I-th
     largest element of X) versus abscissa P(I) = (I - 0.5)/N.  If the sample
     comes from F, the pairs will approximately follow a straight line.

     The default for DIST is the standard normal distribution.

     The optional argument PARAMS contains a list of parameters of DIST.

     For example, for a probability plot of the uniform distribution on [2,4]
     and X, use

          ppplot (x, "unif", 2, 4)

     DIST can be any string for which a function DISTCDF that calculates the CDF
     of distribution DIST exists.

     If no output is requested then the data are plotted immediately.

     See also: qqplot.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 37
Perform a PP-plot (probability plot).



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
princomp


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1240
 -- statistics: COEFF = princomp (X)
 -- statistics: [COEFF, SCORE] = princomp (X)
 -- statistics: [COEFF, SCORE, LATENT] = princomp (X)
 -- statistics: [COEFF, SCORE, LATENT, TSQUARE] = princomp (X)
 -- statistics: [...] = princomp (X, "econ")

     Performs a principal component analysis on a NxP data matrix X.

        • COEFF : returns the principal component coefficients
        • SCORE : returns the principal component scores, the representation of
          X in the principal component space
        • LATENT : returns the principal component variances, i.e., the
          eigenvalues of the covariance matrix of X.
        • TSQUARE : returns Hotelling's T-squared Statistic for each observation
          in X
        • [...]  = princomp(X,'econ') returns only the elements of latent that
          are not necessarily zero, and the corresponding columns of COEFF and
          SCORE, that is, when n <= p, only the first n-1.  This can be
          significantly faster when p is much larger than n.  In this case the
          svd will be applied on the transpose of the data matrix X

     References
     ----------

       1. Jolliffe, I. T., Principal Component Analysis, 2nd Edition, Springer,
          2002


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 63
Performs a principal component analysis on a NxP data matrix X.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
probit


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 184
 -- statistics: X = probit (P)

     Probit transformation

     Return the probit (the quantile of the standard normal distribution) for
     each element of P.

     See also: logit.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 21
Probit transformation



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 10
procrustes


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2471
 -- statistics: D = procrustes (X, Y)
 -- statistics: D = procrustes (X, Y, PARAM1, VALUE1, ...)
 -- statistics: [D, Z] = procrustes (...)
 -- statistics: [D, Z, TRANSFORM] = procrustes (...)

     Procrustes Analysis.

     ‘D = procrustes (X, Y)’ computes a linear transformation of the points in
     the matrix Y to best conform them to the points in the matrix X by
     minimizing the sum of squared errors, as the goodness of fit criterion,
     which is returned in D as a dissimilarity measure.  D is standardized by a
     measure of the scale of X, given by
        • sum (sum ((X - repmat (mean (X, 1), size (X, 1), 1)) .^ 2, 1))
     i.e., the sum of squared elements of a centered version of X.  However, if
     X comprises repetitions of the same point, the sum of squared errors is not
     standardized.

     X and Y must have the same number of points (rows) and procrustes matches
     the i-th point in Y to the i-th point in X.  Points in Y can have smaller
     dimensions (columns) than those in X, but not the opposite.  Missing
     dimensions in Y are added with padding columns of zeros as necessary to
     match the dimensions in X.

     ‘[D, Z] = procrustes (X, Y)’ also returns the transformed values of Y in Z.

     ‘[D, Z, TRANSFORM] = procrustes (X, Y)’ also returns the transformation
     that maps Y to Z.

     TRANSFORM is a structure with fields:

          c             the translation component
          T             the orthogonal rotation and reflection component
          b             the scale component

     So that ‘Z = TRANSFORM.b * Y * TRANSFORM.T + TRANSFORM.c’

     procrustes can take two optional parameters as Name-Value pairs.

     ‘[...] = procrustes (..., "Scaling", false)’ computes a transformation that
     does not include scaling, that is TRANSFORM.b = 1.  Setting "Scaling" to
     true includes a scaling component, which is the default.

     ‘[...] = procrustes (..., "Reflection", false)’ computes a transformation
     that does not include a reflection component, that is det(TRANSFORM.T) = 1.
     Setting "Reflection" to true forces the solution to include a reflection
     component in the computed transformation, that is det(TRANSFORM.T) = -1.

     ‘[...] = procrustes (..., "Reflection", "best")’ computes the best fit
     procrustes solution, which may or may not include a reflection component,
     which is the default.

     See also: cmdscale.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 20
Procrustes Analysis.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
qqplot


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1193
 -- statistics: [Q, S] = qqplot (X)
 -- statistics: [Q, S] = qqplot (X, Y)
 -- statistics: [Q, S] = qqplot (X, DIST)
 -- statistics: [Q, S] = qqplot (X, DIST, PARAMS)
 -- statistics: qqplot (...)

     Perform a QQ-plot (quantile plot).

     If F is the CDF of the distribution DIST with parameters PARAMS and G its
     inverse, and X a sample vector of length N, the QQ-plot graphs ordinate
     S(I) = I-th largest element of X versus abscissa Q(I) = G((I - 0.5)/N).

     If the sample comes from F, except for a transformation of location and
     scale, the pairs will approximately follow a straight line.

     If the second argument is a vector Y the empirical CDF of Y is used as
     DIST.

     The default for DIST is the standard normal distribution.  The optional
     argument PARAMS contains a list of parameters of DIST.  For example, for a
     quantile plot of the uniform distribution on [2,4] and X, use

          qqplot (x, "unif", 2, 4)

     DIST can be any string for which a function DISTINV or DIST_INV exists that
     calculates the inverse CDF of distribution DIST.

     If no output arguments are given, the data are plotted directly.

     See also: ppplot.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 34
Perform a QQ-plot (quantile plot).



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
qrandn


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 504
 -- statistics: Z = qrandn (Q, R, C)
 -- statistics: Z = qrandn (Q, [R, C])

     Returns random deviates drawn from a q-Gaussian distribution.

     Parameter Q characterizes the q-Gaussian distribution.  The result has the
     size indicated by R and C.

     Reference: W. Thistleton, J. A. Marsh, K. Nelson, C. Tsallis (2006)
     "Generalized Box-Muller method for generating q-Gaussian random deviates"
     arXiv:cond-mat/0605570 http://arxiv.org/abs/cond-mat/0605570

     See also: rand, randn.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 61
Returns random deviates drawn from a q-Gaussian distribution.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 10
randsample


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 711
 -- statistics: Y = randsample (V, K)
 -- statistics: Y = randsample (V, K, REPLACEMENT=false)
 -- statistics: Y = randsample (V, K, REPLACEMENT=false, [W=[]])

     Sample elements from a vector.

     Returns K random elements from a vector V with N elements, sampled without
     or with REPLACEMENT, with an optional weight vector.

     If V is a scalar, samples from 1:V.

     If a weight vector W of the same size as V is specified, the probability of
     each element being sampled is proportional to W.  Unlike Matlab's function
     of the same name, this can be done for sampling with or without
     replacement.

     Randomization is performed using rand().

     See also: datasample, randperm.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 30
Sample elements from a vector.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 11
rangesearch


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6548
 -- statistics: IDX = rangesearch (X, Y, R)
 -- statistics: [IDX, D] = rangesearch (X, Y, R)
 -- statistics: [...] = rangesearch (..., NAME, VALUE)

     Find all neighbors within specified distance from input data.

     ‘IDX = rangesearch (X, Y, R)’ returns all the points in X that are within
     distance R from the points in Y.  X must be an NxP numeric matrix of input
     data, where rows correspond to observations and columns correspond to
     features or variables.  Y is an MxP numeric matrix with query points, which
     must have the same number of columns as X.  R must be a nonnegative scalar
     value.  IDX is an Mx1 cell array, where M is the number of observations in
     Y.  The vector IDX{j} contains the indices of observations (rows) in X
     whose distances to Y(j,:) are not greater than R.

     ‘[IDX, D] = rangesearch (X, Y, R)’ also returns the distances, D, which
     correspond to the points in X that are within distance R from the points in
     Y.  D is an Mx1 cell array, where M is the number of observations in Y.
     The vector D{j} contains the distances of observations (rows) in X whose
     distances to Y(j,:) are not greater than R.

     Additional parameters can be specified by Name-Value pair arguments.

     NAME              VALUE
                       
     -----------------------------------------------------------------------------------
     "P"               is the Minkowski distance exponent and it must be a positive
                       scalar.  This argument is only valid when the selected
                       distance metric is "minkowski".  By default it is 2.
                       
     "Scale"           is the scale parameter for the standardized Euclidean distance
                       and it must be a nonnegative numeric vector of equal length to
                       the number of columns in X.  This argument is only valid when
                       the selected distance metric is "seuclidean", in which case
                       each coordinate of X is scaled by the corresponding element of
                       "scale", as is each query point in Y.  By default, the scale
                       parameter is the standard deviation of each coordinate in X.
                       
     "Cov"             is the covariance matrix for computing the Mahalanobis
                       distance and it must be a positive definite matrix matching
                       the number of columns (P) in X.  This argument is only valid
                       when the selected distance metric is "mahalanobis".
                       
     "BucketSize"      is the maximum number of data points in the leaf node of the
                       Kd-tree and it must be a positive integer.  This argument is
                       only valid when the selected search method is "kdtree".
                       
     "SortIndices"     is a boolean flag to sort the returned indices in ascending
                       order by distance and it is true by default.  When the
                       selected search method is "exhaustive" or the "IncludeTies"
                       flag is true, ‘rangesearch’ always sorts the returned indices.
                       
     "Distance"        is the distance metric used by ‘rangesearch’ as specified
                       below:

          "euclidean"      Euclidean distance.
          "seuclidean"     standardized Euclidean distance.  Each coordinate
                           difference between the rows in X and the query matrix Y is
                           scaled by dividing by the corresponding element of the
                           standard deviation computed from X.  To specify a
                           different scaling, use the "Scale" name-value argument.
          "cityblock"      City block distance.
          "chebychev"      Chebychev distance (maximum coordinate difference).
          "minkowski"      Minkowski distance.  The default exponent is 2.  To
                           specify a different exponent, use the "P" name-value
                           argument.
          "mahalanobis"    Mahalanobis distance, computed using a positive definite
                           covariance matrix.  To change the value of the covariance
                           matrix, use the "Cov" name-value argument.
          "cosine"         Cosine distance.
          "correlation"    One minus the sample linear correlation between
                           observations (treated as sequences of values).
          "spearman"       One minus the sample Spearman's rank correlation between
                           observations (treated as sequences of values).
          "hamming"        Hamming distance, which is the percentage of coordinates
                           that differ.
          "jaccard"        One minus the Jaccard coefficient, which is the percentage
                           of nonzero coordinates that differ.
          @DISTFUN         Custom distance function handle.  A distance function of
                           the form ‘function D2 = distfun (XI, YI)’, where XI is a
                           1xP vector containing a single observation in
                           P-dimensional space, YI is an NxP matrix containing an
                           arbitrary number of observations in the same P-dimensional
                           space, and D2 is an Nx1 vector of distances, where D2(k)
                           is the distance between observations XI and YI(k,:).

     "NSMethod"        is the nearest neighbor search method used by ‘rangesearch’ as
                       specified below.

          "kdtree"         Creates and uses a Kd-tree to find nearest neighbors.
                           "kdtree" is the default value when the number of columns
                           in X is less than or equal to 10, X is not sparse, and the
                           distance metric is "euclidean", "cityblock", "manhattan",
                           "chebychev", or "minkowski".  Otherwise, the default value
                           is "exhaustive".  This argument is only valid when the
                           distance metric is one of the four aforementioned metrics.
          "exhaustive"     Uses the exhaustive search algorithm by computing the
                           distance values from all the points in X to each point in
                           Y.

     See also: knnsearch, pdist2.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 61
Find all neighbors within specified distance from input data.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
ranksum


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 3075
 -- statistics: P = ranksum (X, Y)
 -- statistics: P = ranksum (X, Y, ALPHA)
 -- statistics: P = ranksum (X, Y, ALPHA, NAME, VALUE)
 -- statistics: P = ranksum (X, Y, NAME, VALUE)
 -- statistics: [P, H] = ranksum (X, Y, ...)
 -- statistics: [P, H, STATS] = ranksum (X, Y, ...)

     Wilcoxon rank sum test for equal medians.  This test is equivalent to a
     Mann-Whitney U-test.

     ‘P = ranksum (X, Y)’ returns the p-value of a two-sided Wilcoxon rank sum
     test.  It tests the null hypothesis that two independent samples, in the
     vectors X and Y, come from continuous distributions with equal medians,
     against the alternative hypothesis that they are not.  X and Y can have
     different lengths and the test assumes that they are independent.

     ‘ranksum’ treats NaN in X, Y as missing values.  The two-sided p-value is
     computed by doubling the most significant one-sided value.

     ‘[P, H] = ranksum (X, Y)’ also returns the result of the hypothesis test
     with ‘H = 1’ indicating a rejection of the null hypothesis at the default
     alpha = 0.05 significance level, and ‘H = 0’ indicating a failure to reject
     the null hypothesis at the same significance level.

     ‘[P, H, STATS] = ranksum (X, Y)’ also returns the structure STATS with
     information about the test statistic.  It contains the field ‘ranksum’ with
     the value of the rank sum test statistic and if computed with the
     "approximate" method it also contains the value of the z-statistic in the
     field ‘zval’.

     ‘[...] = ranksum (X, Y, ALPHA)’ or alternatively ‘[...] = ranksum (X, Y,
     "alpha", ALPHA)’ returns the result of the hypothesis test performed at the
     significance level ALPHA.

     ‘[...] = ranksum (X, Y, "method", M)’ defines the computation method of the
     p-value specified in M, which can be "exact", "approximate", or "oldexact".
     M must be a single string.  When "method" is unspecified, the default is:
     "exact" when ‘min (length (X), length (Y)) < 10’ and ‘length (X) + length
     (Y) < 20’, otherwise the "approximate" method is used.

        • "exact" method uses full enumeration for small total sample size (<
          10), otherwise the network algorithm is used for larger samples.
        • "approximate" uses normal approximation method for computing the
          p-value.
        • "oldexact" uses full enumeration for any sample size.  Note, that this
          option may cause out-of-memory errors with large samples.  Use with
          caution!

     ‘[...] = ranksum (X, Y, "tail", TAIL)’ defines the type of test, which can
     be "both", "right", or "left".  TAIL must be a single string.

        • "both" - "medians are not equal" (two-tailed test, default)
        • "right" - "median of X is greater than median of Y" (right-tailed
          test)
        • "left" - "median of X is less than median of Y" (left-tailed test)

     Note: the rank sum statistic is based on the smaller sample of vectors X
     and Y.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 41
Wilcoxon rank sum test for equal medians.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
regress


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1449
 -- statistics: [B, BINT, R, RINT, STATS] = regress (Y, X, [ALPHA])

     Multiple Linear Regression using Least Squares Fit of Y on X with the model
     ‘y = X * beta + e’.

     Here,

        • ‘y’ is a column vector of observed values
        • ‘X’ is a matrix of regressors, with the first column filled with the
          constant value 1
        • ‘beta’ is a column vector of regression parameters
        • ‘e’ is a column vector of random errors

     Arguments are

        • Y is the ‘y’ in the model
        • X is the ‘X’ in the model
        • ALPHA is the significance level used to calculate the confidence
          intervals BINT and RINT (see 'Return values' below).  If not
          specified, ALPHA defaults to 0.05

     Return values are

        • B is the ‘beta’ in the model
        • BINT is the confidence interval for B
        • R is a column vector of residuals
        • RINT is the confidence interval for R
        • STATS is a row vector containing:

             • The R^2 statistic
             • The F statistic
             • The p value for the full model
             • The estimated error variance

     R and RINT can be passed to ‘rcoplot’ to visualize the residual intervals
     and identify outliers.

     NaN values in Y and X are removed before calculation begins.

     See also: regress_gp, regression_ftest, regression_ttest.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 82
Multiple Linear Regression using Least Squares Fit of Y on X with the model ‘...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 10
regress_gp


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2393
 -- statistics: [YFIT, YINT, M, K] = regress_gp (X, Y, XFIT)
 -- statistics: [YFIT, YINT, M, K] = regress_gp (X, Y, XFIT, "linear")
 -- statistics: [YFIT, YINT, YSD] = regress_gp (X, Y, XFIT, "rbf")
 -- statistics: [...] = regress_gp (X, Y, XFIT, "linear", SP)
 -- statistics: [...] = regress_gp (X, Y, XFIT, SP)
 -- statistics: [...] = regress_gp (X, Y, XFIT, "rbf", THETA)
 -- statistics: [...] = regress_gp (X, Y, XFIT, "rbf", THETA, G)
 -- statistics: [...] = regress_gp (X, Y, XFIT, "rbf", THETA, G, ALPHA)
 -- statistics: [...] = regress_gp (X, Y, XFIT, THETA)
 -- statistics: [...] = regress_gp (X, Y, XFIT, THETA, G)
 -- statistics: [...] = regress_gp (X, Y, XFIT, THETA, G, ALPHA)

     Regression using Gaussian Processes.

     ‘[YFIT, YINT, M, K] = regress_gp (X, Y, XFIT)’ will estimate a linear
     Gaussian Process model M in the form Y = X' * M, where X is an NxP matrix
     with N observations in P dimensional space and Y is an Nx1 column vector as
     the dependent variable.  The information about errors of the predictions
     (interpolation/extrapolation) is given by the covariance matrix K.  By
     default, the linear model defines the prior covariance of M as ‘SP = 100 *
     eye (size (X, 2) + 1)’.  A custom prior covariance matrix can be passed as
     SP, which must be a P+1xP+1 positive definite matrix.  The model is
     evaluated for input XFIT, which must have the same columns as X, and the
     estimates are returned in YFIT along with the estimated variation in YINT.
     YINT(:,1) contains the upper boundary estimate and YINT(:,2) contains the
     lower boundary estimate with respect to YFIT.

     ‘[YFIT, YINT, YSD, K] = regress_gp (X, Y, XFIT, "rbf")’ will estimate a
     Gaussian Process model with a Radial Basis Function (RBF) kernel with
     default parameters THETA = 5, which corresponds to the characteristic
     lengthscale, and G = 0.01, which corresponds to the nugget effect, and
     ALPHA = 0.05, which sets the significance level for the estimated intervals
     returned in YINT.  The function also returns the predictive covariance
     matrix in YSD.  For multidimensional predictors X the function will
     automatically normalize each column to a zero mean and a standard deviation
     of one.

     Run ‘demo regress_gp’ to see examples.

     See also: regress, regression_ftest, regression_ttest.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 36
Regression using Gaussian Processes.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 16
regression_ftest


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2390
 -- statistics: [H, PVAL, STATS] = regression_ftest (Y, X, FM)
 -- statistics: [...] = regression_ftest (Y, X, FM, RM)
 -- statistics: [...] = regression_ftest (Y, X, FM, RM, NAME, VALUE)
 -- statistics: [...] = regression_ftest (Y, X, FM, [], NAME, VALUE)

     F-test for General Linear Regression Analysis

     Perform a general linear regression F test for the null hypothesis that the
     full model of the form y = b_0 + b_1 * x_1 + b_2 * x_2 + ... + b_n * x_n +
     e, where n is the number of variables in X, does not perform better than a
     reduced model, such as y = b'_0 + b'_1 * x_1 + b'_2 * x_2 + ... + b'_k *
     x_k + e, where k < n and it corresponds to the first k variables in X.
     Response (dependent) variable Y and explanatory (independent) variables X
     must not contain any missing values (NaNs).

     The full model, FM, must be a vector of length equal to the columns of X,
     in which case the constant term b_0 is assumed 0, or equal to the columns
     of X plus one, in which case the first element is the constant b_0.

     The reduced model, RM, must include the constant term and a subset of the
     variables (columns) in X.  If RM is not given, then a constant term b'_0 is
     assumed equal to the constant term, b_0, of the full model or 0, if the
     full model, FM, does not have a constant term.  RM must be a vector or a
     scalar if only a constant term is passed into the function.

     Name-Value pair arguments can be used to set statistical significance.
     "alpha" can be used to specify the significance level of the test (the
     default value is 0.05).  If you are passing optional Name-Value pairs
     without a reduced model, make sure that the latter is passed as an empty
     variable.

     If H is 1 the null hypothesis is rejected, meaning that the full model
     explains the variance better than the restricted model.  If H is 0, it can
     be assumed that the full model does NOT explain the variance any better
     than the restricted model.

     The p-value (1 minus the CDF of this distribution at F) is returned in
     PVAL.

     Under the null, the test statistic F follows an F distribution with 'df1'
     and 'df2' degrees of freedom, which are returned as fields in the STATS
     structure along with the test's F-statistic, 'fstat'

     See also: regression_ttest, regress, regress_gp.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 45
F-test for General Linear Regression Analysis



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 16
regression_ttest


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1782
 -- statistics: H = regression_ttest (Y, X)
 -- statistics: [H, PVAL] = regression_ttest (Y, X)
 -- statistics: [H, PVAL, CI] = regression_ttest (Y, X)
 -- statistics: [H, PVAL, CI, STATS] = regression_ttest (Y, X)
 -- statistics: [...] = regression_ttest (Y, X, NAME, VALUE)

     Perform a linear regression t-test.

     ‘H = regression_ttest (Y, X)’ tests the null hypothesis that the slope
     beta1 of a simple linear regression equals 0.  The result is H = 0 if the
     null hypothesis cannot be rejected at the 5% significance level, or H = 1
     if the null hypothesis can be rejected at the 5% level.  Y and X must be
     vectors of equal length with finite real numbers.

     The p-value of the test is returned in PVAL.  A 100(1-alpha)% confidence
     interval for beta1 is returned in CI.  STATS is a structure containing the
     value of the test statistic (tstat), the degrees of freedom (df), the slope
     coefficient (beta1), and the intercept (beta0).  Under the null, the test
     statistic STATS.tstat follows a T-distribution with STATS.df degrees of
     freedom.

     ‘[...] = regression_ttest (..., NAME, VALUE)’ specifies one or more of the
     following name/value pairs:

          Name             Value
     -----------------------------------------------------------------------------------
          "alpha"          the significance level.  Default is 0.05.
                           
          "tail"           a string specifying the alternative hypothesis
              "both"               beta1 is not 0 (two-tailed, default)
              "left"               beta1 is less than 0 (left-tailed)
              "right"              beta1 is greater than 0 (right-tailed)

     See also: regression_ftest, regress, regress_gp.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 35
Perform a linear regression t-test.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 5
ridge


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1421
 -- statistics: B = ridge (Y, X, K)
 -- statistics: B = ridge (Y, X, K, SCALED)

     Ridge regression.

     ‘B = ridge (Y, X, K)’ returns the vector of coefficient estimates by
     applying ridge regression from the predictor matrix X to the response
     vector Y.  Each value of B is the coefficient for the respective ridge
     parameter given K.  By default, B is calculated after centering and scaling
     the predictors to have a zero mean and standard deviation 1.

     ‘B = ridge (Y, X, K, SCALED)’ performs the regression with the specified
     scaling of the coefficient estimates B.  When SCALED = 0, the function
     restores the coefficients to the scale of the original data thus is more
     useful for making predictions.  When SCALED = 1, the coefficient estimates
     correspond to the scaled centered data.

        • ‘y’ must be an Nx1 numeric vector with the response data.
        • ‘X’ must be an Nxp numeric matrix with the predictor data.
        • ‘k’ must be a numeric vector with the ridge parameters.
        • ‘scaled’ must be a numeric scalar indicating whether the coefficient
          estimates in B are restored to the scale of the original data.  By
          default, SCALED = 1.

     Further information about Ridge regression can be found at
     <https://en.wikipedia.org/wiki/Ridge_regression>

     See also: lasso, stepwisefit, regress.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 17
Ridge regression.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 9
rmmissing


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2435
 -- statistics: R = rmmissing (A)
 -- statistics: R = rmmissing (A, DIM)
 -- statistics: R = rmmissing (..., NAME, VALUE)
 -- statistics: [R, TF] = rmmissing (...)

     Remove missing data from arrays.

     Given an input vector or matrix (2-D array) A, ‘R = rmmissing (A)’ returns
     an output vector or matrix R of the same type as input A and any missing
     elements removed.  If A is a vector, missing elements are removed
     individually; if A is a matrix, then rows containing missing elements are
     removed.

     Standard missing values and their corresponding data types are:

        • NaN - for double, single, duration, and calendarDuration arrays.
        • NaT - for datetime arrays.
        • <missing> - for string arrays.
        • <undefined> - for categorical arrays.
        • {0x0 char} - for cell arrays of character vectors.

     For any data types that do not support missing values, ‘rmmissing’ returns
     ‘R == A’ and if a second output argument is requested it also returns ‘TF =
     false (size (A))’.

     Given an input matrix (2-D array) A, ‘R = rmmissing (A, DIM)’ further
     specifies whether rows or columns containing missing data are removed from
     the output R based on the value of DIM, which must be either 1 or 2.

        • 1: remove rows.

        • 2: remove columns.

     ‘R = rmmissing (..., NAME, VALUE)’ also accepts the following paired
     arguments.

     Name                  Value
     -----------------------------------------------------------------------------------
     'MinNumMissing'       A positive integer scalar value specifying the required
                           minimum number of missing values for removing any
                           particular row or column from a matrix input.  Note that
                           this argument is ignored if input A is a vector.
                           
     'MissingLocations'    A logical array of the same size as input A indexing the
                           locations of missing values in input array A.  Note that
                           specifying 'MissingLocations' overrides any standard
                           missing values in A.

     Optional return value TF is a logical array where ‘true’ values represent
     removed entries, rows or columns from the original data A.

     See also: fillmissing, ismissing, standardizeMissing.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 32
Remove missing data from arrays.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
runstest


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2336
 -- statistics: H = runstest (X)
 -- statistics: H = runstest (X, V)
 -- statistics: H = runstest (X, "ud")
 -- statistics: H = runstest (..., NAME, VALUE)
 -- statistics: [H, PVAL, STATS] = runstest (...)

     Run test for randomness in the vector X.

     ‘H = runstest (X)’ calculates the number of runs of consecutive values
     above or below the mean of X and tests the null hypothesis that the values
     in the data vector X come in random order.  H is 1 if the test rejects the
     null hypothesis at the 5% significance level, or 0 otherwise.

     ‘H = runstest (X, V)’ tests the null hypothesis based on the number of runs
     of consecutive values above or below the specified reference value V.
     Values exactly equal to V are omitted.

     ‘H = runstest (X, "ud")’ computes the number of runs up or down and tests
     the null hypothesis that the values in the data vector X exhibit no trend.
     Too few runs indicate a trend, while too many runs indicate an oscillation.
     Values exactly equal to the preceding value are omitted.

     ‘H = runstest (..., NAME, VALUE)’ specifies additional options to the above
     tests by one or more NAME-VALUE pair arguments.

     Name              Value
     -----------------------------------------------------------------------------------
     "alpha"           the significance level.  Default is 0.05.
                       
     "method"          a string specifying the method used to compute the p-value of
                       the test.  It can be either "exact" to use an exact algorithm,
                       or "approximate" to use a normal approximation.  The default
                       is "exact" for runs above/below, and for runs up/down when the
                       length of X is less than or equal to 50.  When testing for
                       runs up/down and the length of X is greater than 50, then the
                       default is "approximate", and the "exact" method is not
                       available.
                       
     "tail"            a string specifying the alternative hypothesis
                      "both"            two-tailed (default)
                      "left"            left-tailed
                      "right"           right-tailed

     See also: signrank, signtest.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 40
Run test for randomness in the vector X.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 11
sampsizepwr


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6294
 -- statistics: N = sampsizepwr (TESTTYPE, PARAMS, P1)
 -- statistics: N = sampsizepwr (TESTTYPE, PARAMS, P1, POWER)
 -- statistics: POWER = sampsizepwr (TESTTYPE, PARAMS, P1, [], N)
 -- statistics: P1 = sampsizepwr (TESTTYPE, PARAMS, [], POWER, N)
 -- statistics: [N1, N2] = sampsizepwr ("t2", PARAMS, P1, POWER)
 -- statistics: [...] = sampsizepwr (TESTTYPE, PARAMS, P1, POWER, N, NAME,
          VALUE)

     Sample size and power calculation for hypothesis test.

     ‘sampsizepwr’ computes the sample size, power, or alternative parameter
     value for a hypothesis test, given the other two values.  For example, you
     can compute the sample size required to obtain a particular power for a
     hypothesis test, given the parameter value of the alternative hypothesis.

     ‘N = sampsizepwr (TESTTYPE, PARAMS, P1)’ returns the sample size N required
     for a two-sided test of the specified type to have a power (probability of
     rejecting the null hypothesis when the alternative is true) of 0.90 when
     the significance level (probability of rejecting the null hypothesis when
     the null hypothesis is true) is 0.05.  PARAMS specifies the parameter
     values under the null hypothesis.  P1 specifies the value of the single
     parameter being tested under the alternative hypothesis.  For the
     two-sample t-test, N is the value of the equal sample size for both
     samples, PARAMS specifies the parameter values of the first sample under
     the null and alternative hypotheses, and P1 specifies the value of the
     single parameter from the other sample under the alternative hypothesis.

     The following TESTTYPE values are available:

          "z"      one-sample z-test for normally distributed data with known
                   standard deviation.  PARAMS is a two-element vector [MU0 SIGMA0]
                   of the mean and standard deviation, respectively, under the null
                   hypothesis.  P1 is the value of the mean under the alternative
                   hypothesis.
          "t"      one-sample t-test or paired t-test for normally distributed data
                   with unknown standard deviation.  PARAMS is a two-element vector
                   [MU0 SIGMA0] of the mean and standard deviation, respectively,
                   under the null hypothesis.  P1 is the value of the mean under the
                   alternative hypothesis.
          "t2"     two-sample pooled t-test (test for equal means) for normally
                   distributed data with equal unknown standard deviations.  PARAMS
                   is a two-element vector [MU0 SIGMA0] of the mean and standard
                   deviation of the first sample under the null and alternative
                   hypotheses.  P1 gives the mean for the second sample under the
                   alternative hypothesis.
          "var"    chi-square test of variance for normally distributed data.  PARAMS
                   is the variance under the null hypothesis.  P1 is the variance
                   under the alternative hypothesis.
          "p"      test of the P parameter (success probability) for a binomial
                   distribution.  PARAMS is the value of P under the null hypothesis.
                   P1 is the value of P under the alternative hypothesis.
          "r"      test of the correlation coefficient parameter for significance.
                   PARAMS is the value of r under the null hypothesis.  P1 is the
                   value of r under the alternative hypothesis.

     The "p" test for the binomial distribution is a discrete test for which
     increasing the sample size does not always increase the power.  For N
     values larger than 200, there may be values smaller than the returned N
     value that also produce the desired power.

     ‘N = sampsizepwr (TESTTYPE, PARAMS, P1, POWER)’ returns the sample size N
     such that the power is POWER for the parameter value P1.  For the
     two-sample t-test, N is the equal sample size of both samples.

     ‘[N1, N2] = sampsizepwr ("t2", PARAMS, P1, POWER)’ returns the sample sizes
     N1 and N2 for the two samples.  These values are the same unless the
     "ratio" parameter, ‘RATIO = N2 / N1’, is set to a value other than the
     default (see the name/value pair definition of ratio below).

     ‘POWER = sampsizepwr (TESTTYPE, PARAMS, P1, [], N)’ returns the power
     achieved for a sample size of N when the true parameter value is P1.  For
     the two-sample t-test, N is the smaller one of the two sample sizes.

     ‘P1 = sampsizepwr (TESTTYPE, PARAMS, [], POWER, N)’ returns the parameter
     value detectable with the specified sample size N and power POWER.  For the
     two-sample t-test, N is the smaller one of the two sample sizes.  When
     computing P1 for the "p" test, if no alternative can be rejected for a
     given PARAMS, N and POWER value, the function displays a warning message
     and returns NaN.

     ‘[...] = sampsizepwr (..., N, NAME, VALUE)’ specifies one or more of the
     following NAME / VALUE pairs:

          "alpha"      significance level of the test (default is 0.05)
          "tail"       the type of test which can be:

              "both"           two-sided test for an alternative P1 not equal to
                               PARAMS
                               
              "right"          one-sided test for an alternative P1 larger than
                               PARAMS
                               
              "left"           one-sided test for an alternative P1 smaller than
                               PARAMS

          "ratio"      desired ratio N2 / N1 of the larger sample size N2 to the
                       smaller sample size N1.  Used only for the two-sample t-test.
                       The value of ‘RATIO’ is greater than or equal to 1 (default is
                       1).

     ‘sampsizepwr’ computes the sample size, power, or alternative hypothesis
     value given values for the other two.  Specify one of these as [] to
     compute it.  The remaining parameters (and ALPHA, RATIO) can be scalars or
     arrays of the same size.

     See also: vartest, ttest, ttest2, ztest, binocdf.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 54
Sample size and power calculation for hypothesis test.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 9
sigma_pts


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1095
 -- statistics: PTS = sigma_pts (N)
 -- statistics: PTS = sigma_pts (N, M)
 -- statistics: PTS = sigma_pts (N, M, K)
 -- statistics: PTS = sigma_pts (N, M, K, L)

     Calculates 2*N+1 sigma points in N dimensions.

     Sigma points are used in the unscented transform to estimate the result of
     applying a given nonlinear transformation to a probability distribution
     that is characterized only in terms of a finite set of statistics.

     If only the dimension N is given the resulting points have zero mean and
     identity covariance matrix.  If the mean M or the covariance matrix K are
     given, then the resulting points will have those statistics.  The factor L
     scales the points away from the mean.  It is useful to tune the accuracy of
     the unscented transform.

     There is no unique way of computing sigma points; this function implements
     the algorithm described in section 2.6 "The New Filter" pages 40-41 of

     Uhlmann, Jeffrey (1995).  "Dynamic Map Building and Localization: New
     Theoretical Foundations".  Ph.D. thesis.  University of Oxford.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 46
Calculates 2*N+1 sigma points in N dimensions.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
signrank


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 4412
 -- statistics: PVAL = signrank (X)
 -- statistics: PVAL = signrank (X, MY)
 -- statistics: PVAL = signrank (X, MY, NAME, VALUE)
 -- statistics: [PVAL, H] = signrank (...)
 -- statistics: [PVAL, H, STATS] = signrank (...)

     Wilcoxon signed rank test for median.

     ‘PVAL = signrank (X)’ returns the p-value of a two-sided Wilcoxon signed
     rank test.  It tests the null hypothesis that data in X come from a
     distribution with zero median at the 5% significance level under the
     assumption that the distribution is symmetric about its median.  X must be
     a vector.

     If the second argument MY is a scalar, the null hypothesis is that X has
     median MY, whereas if MY is a vector, the null hypothesis is that the
     distribution of ‘X - MY’ has zero median.

     ‘PVAL = signrank (..., NAME, VALUE)’ performs the Wilcoxon signed rank test
     with additional options specified by one or more of the following NAME,
     VALUE pair arguments:

     NAME              VALUE
                       
     -----------------------------------------------------------------------------------
     "alpha"           A scalar value for the significance level of the test.
                       Default is 0.05.
                       
     "tail"            A character vector specifying the alternative hypothesis.  It
                       can take one of the following values:

          VALUE            DESCRIPTION
                           
     -----------------------------------------------------------------------------------
          "both"           For one-sample test (MY is empty or a scalar), the data in
                           X come from a continuous distribution with median
                           different than zero or MY.  For two-sample test (MY is a
                           vector), the data in X - MY come from a continuous
                           distribution with median different than zero.
                           
          "left"           For one-sample test (MY is empty or a scalar), the data in
                           X come from a continuous distribution with median less
                           than zero or MY.  For two-sample test (MY is a vector),
                           the data in X - MY come from a continuous distribution
                           with median less than zero.
                           
          "right"          For one-sample test (MY is empty or a scalar), the data in
                           X come from a continuous distribution with median greater
                           than zero or MY.  For two-sample test (MY is a vector),
                           the data in X - MY come from a continuous distribution
                           with median greater than zero.

     NAME              VALUE
                       
     -----------------------------------------------------------------------------------
     "method"          A character vector specifying the method for computing the
                       p-value.  It can take one of the following values:

          VALUE            DESCRIPTION
                           
     -----------------------------------------------------------------------------------
          "exact"          Exact computation of the p-value.  It is the default value
                           for 15 or fewer observations when "method" is not
                           specified.
                           
          "approximate"    Using normal approximation for computing the p-value.  It
                           is the default value for more than 15 observations when
                           "method" is not specified.

     ‘[PVAL, H] = signrank (...)’ also returns a logical value indicating the
     test decision.  If H is 0, the null hypothesis is not rejected, and if H is
     1, the null hypothesis is rejected.

     ‘[PVAL, H, STATS] = signrank (...)’ also returns the structure STATS
     containing the following fields:

     FIELD             VALUE
     -----------------------------------------------------------------------------------
     signedrank        Value of the sign rank test statistic.
                       
     zval              Value of the z-statistic (only computed when the "method" is
                       "approximate").

     See also: tiedrank, signtest, runstest.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 37
Wilcoxon signed rank test for median.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
signtest


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 4296
 -- statistics: PVAL = signtest (X)
 -- statistics: PVAL = signtest (X, MY)
 -- statistics: PVAL = signtest (X, MY, NAME, VALUE)
 -- statistics: [PVAL, H] = signtest (...)
 -- statistics: [PVAL, H, STATS] = signtest (...)

     Sign test for a median.

     ‘PVAL = signtest (X)’ returns the p-value of a two-sided sign test.  It
     tests the null hypothesis that data in X come from a distribution with zero
     median at the 5% significance level.  X must be a vector.

     If the second argument MY is a scalar, the null hypothesis is that X has
     median MY, whereas if MY is a vector, the null hypothesis is that the
     distribution of ‘X - MY’ has zero median.

     ‘PVAL = signtest (..., NAME, VALUE)’ performs the sign test described above
     with additional options specified by one or more of the following NAME,
     VALUE pair arguments:

     NAME              VALUE
                       
     -----------------------------------------------------------------------------------
     "alpha"           A scalar value for the significance level of the test.
                       Default is 0.05.
                       
     "tail"            A character vector specifying the alternative hypothesis.  It
                       can take one of the following values:

          VALUE            DESCRIPTION
                           
     -----------------------------------------------------------------------------------
          "both"           For one-sample test (MY is empty or a scalar), the data in
                           X come from a continuous distribution with median
                           different than zero or MY.  For two-sample test (MY is a
                           vector), the data in X - MY come from a continuous
                           distribution with median different than zero.
                           
          "left"           For one-sample test (MY is empty or a scalar), the data in
                           X come from a continuous distribution with median less
                           than zero or MY.  For two-sample test (MY is a vector),
                           the data in X - MY come from a continuous distribution
                           with median less than zero.
                           
          "right"          For one-sample test (MY is empty or a scalar), the data in
                           X come from a continuous distribution with median greater
                           than zero or MY.  For two-sample test (MY is a vector),
                           the data in X - MY come from a continuous distribution
                           with median greater than zero.

     NAME              VALUE
                       
     -----------------------------------------------------------------------------------
     "method"          A character vector specifying the method for computing the
                       p-value.  It can take one of the following values:

          VALUE            DESCRIPTION
                           
     -----------------------------------------------------------------------------------
          "exact"          Exact computation of the p-value.  It is the default value
                           for fewer than 100 observations when "method" is not
                           specified.
                           
          "approximate"    Using normal approximation for computing the p-value.  It
                           is the default value for 100 or more observations when
                           "method" is not specified.

     ‘[PVAL, H] = signtest (...)’ also returns a logical value indicating the
     test decision.  If H is 0, the null hypothesis is accepted, whereas if H is
     1, the null hypothesis is rejected.

     ‘[PVAL, H, STATS] = signtest (...)’ also returns the structure STATS
     containing the following fields:

     FIELD             VALUE
     -----------------------------------------------------------------------------------
     sign              Value of the sign test statistic.
                       
     zval              Value of the z-statistic (only computed when the "method" is
                       "approximate").

     See also: signrank, tiedrank, runstest.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 21
Sign test for median.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 10
silhouette


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1996
 -- statistics: silhouette (X, CLUST)
 -- statistics: [SI, H] = silhouette (X, CLUST)
 -- statistics: [SI, H] = silhouette (..., METRIC, METRICARG)

     Compute the silhouette values of clustered data and show them on a plot.

     X is a n-by-p matrix of n data points in a p-dimensional space.  Each
     datapoint is assigned to a cluster using CLUST, a vector of n elements, one
     cluster assignment for each data point.

     Each silhouette value of SI, a vector of size n, is a measure of the
     likelihood that a data point is accurately classified to the right cluster.
     Defining "a" as the mean distance between a point and the other points from
     its cluster, and "b" as the mean distance between that point and the points
     from other clusters, the silhouette value of the i-th point is:

              bi - ai
     Si =  ------------
            max(ai,bi)

     Each element of SI ranges from -1, minimum likelihood of a correct
     classification, to 1, maximum likelihood.

     Optional input value METRIC is the metric used to compute the distances
     between data points.  Since ‘silhouette’ uses ‘pdist’ to compute these
     distances, METRIC is similar to the DISTANCE input argument of ‘pdist’ and
     it can be:
        • A known distance metric defined as a string: euclidean,
          squaredeuclidean (default), seuclidean, mahalanobis, cityblock,
          minkowski, chebychev, cosine, correlation, hamming, jaccard, or
          spearman.

        • A vector as those created by ‘pdist’.  In this case X does nothing.

        • A function handle that is passed to ‘pdist’ with METRICARG as optional
          inputs.

     Optional return value H is a handle to the silhouette plot.

     *Reference* Peter J. Rousseeuw, Silhouettes: a Graphical Aid to the
     Interpretation and Validation of Cluster Analysis.  1987.
     doi:10.1016/0377-0427(87)90125-7

See also: dendrogram, evalclusters, kmeans, linkage, pdist.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 72
Compute the silhouette values of clustered data and show them on a plot.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 11
slicesample


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2095
 -- statistics: [SMPL, NEVAL] = slicesample (START, NSAMPLES, PROPERTY, VALUE,
          ...)

     Draws NSAMPLES samples from a target stationary distribution PDF using
     slice sampling of Radford M. Neal.

     Input:
        • START is a 1 by DIM vector of the starting point of the Markov chain.
          Each column corresponds to a different dimension.

        • NSAMPLES is the number of samples, the length of the Markov chain.

     Next, several property-value pairs can or must be specified, they are:

     (Required properties) One of:

        • "PDF": the value is a function handle of the target stationary
          distribution to be sampled.  The function should accept different
          locations in each row and each column corresponds to a different
          dimension.

          or

        • "LOGPDF": the value is a function handle of the log of the target
          stationary distribution to be sampled.  The function should accept
          different locations in each row and each column corresponds to a
          different dimension.

     The following input property/pair values may be needed depending on the
     desired output:

        • "burnin" BURNIN the number of points to discard at the beginning, the
          default is 0.

        • "thin" THIN omits M-1 of every M points in the generated Markov chain.
          The default is 1.

        • "width" WIDTH the maximum Manhattan distance between two samples.  The
          default is 10.

     Outputs:

        • SMPL is a NSAMPLES by DIM matrix of random values drawn from PDF where
          the rows are different random values, the columns correspond to the
          dimensions of PDF.

        • NEVAL is the number of function evaluations per sample.

     Example: Sampling from a normal distribution

          start = 1;
          nsamples = 1e3;
          pdf = @(x) exp (-.5 * x .^ 2) / (pi ^ .5 * 2 ^ .5);
          [smpl, neval] = slicesample (start, nsamples, "pdf", pdf, "thin", 4);
          histfit (smpl);
          histfit (smpl);

     See also: rand, mhsample, randsample.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Draws NSAMPLES samples from a target stationary distribution PDF using slice
...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 10
squareform


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1128
 -- statistics: Z = squareform (Y)
 -- statistics: Y = squareform (Z)
 -- statistics: Y = squareform (Z, "tovector")
 -- statistics: Z = squareform (Y, "tomatrix")

     Interchange between distance matrix and distance vector formats.

     Converts between a hollow (diagonal filled with zeros), square, and
     symmetric matrix and a vector of the lower triangular part.

     Its target application is the conversion of the vector returned by ‘pdist’
     into a distance matrix.  It performs the opposite operation if input is a
     matrix.

     If Y is a vector, its number of elements must fit into the triangular part
     of a matrix (main diagonal excluded).  In other words, ‘numel (Y) = N * (N
     - 1) / 2’ for some integer N.  The resulting matrix will be N by N.

     If Z is a distance matrix, it must be square and the diagonal entries of Z
     must all be zeros.  If Z is not symmetric, only the lower triangular part
     is used.

     The second argument is used to specify the output type in case there is a
     single element.  It will default to "tomatrix" otherwise.

     See also: pdist.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 64
Interchange between distance matrix and distance vector formats.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 18
standardizeMissing


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1766
 -- statistics: B = standardizeMissing (A, INDICATOR)

     Replace selected values by standard missing values.

     ‘Β = standardizeMissing (A, INDICATOR)’ returns a standardized array B of
     the same size and data type as the input array A and with all elements
     specified by INDICATOR replaced by the standard missing value corresponding
     to the data type of A.  INDICATOR can be either a scalar or a vector.

     Standard missing values and their corresponding data types are:

        • NaN - for double, single, duration, and calendarDuration arrays.
        • NaT - for datetime arrays.
        • <missing> - for string arrays.
        • <undefined> - for categorical arrays.
        • {0x0 char} - for cell arrays of character vectors.

     For any other data type input that does not support missing values,
     ‘standardizeMissing’ returns ‘B = A’ and any INDICATOR value is ignored.

     The nonstandard missing value INDICATOR must be of the same type as the
     data input A or have a compatible data type according to the following
     rules:

        • all numeric indicators match both double and single data types in A.
        • indicators specified as string arrays, char vectors, and ‘cell’ arrays
          of character vectors match categorical data type in A.
        • a char vector matches a cell array of character vectors in A.

     Note: the generic ‘standardizeMissing’ function from the statistics package
     does not operate on table inputs, which is handled by the overloaded method
     of the table class.  Use ‘help table.standardizeMissing’ to find more
     information about the functional specialization on tables.

     See also: fillmissing, ismissing, rmmissing.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 51
Replace selected values by standard missing values.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 11
stepwisefit


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 3654
 -- statistics: stepwisefit (X, Y)
 -- statistics: B = stepwisefit (X, Y)
 -- statistics: [B, SE, PVAL, FINALMODEL, STATS, NEXTSTEP, HISTORY] = stepwisefit
          (X, Y, VARARGIN)

     Perform stepwise linear regression using conditional p-value criteria.

     ‘stepwisefit’ fits a linear regression model to response vector Y using
     predictor matrix X and performs stepwise variable selection based on
     hypothesis tests for individual regression coefficients.

     At each iteration, predictors not currently in the model are tested for
     inclusion using partial F- or t-tests.  The predictor with the smallest
     p-value below the entry threshold is added.  Predictors currently in the
     model (excluding forced predictors) are then tested for removal, and the
     predictor with the largest p-value exceeding the removal threshold is
     removed.  The procedure repeats until the model stabilizes or the maximum
     number of iterations is reached.

     After variable selection, the final regression model is refit using
     ‘regress’ to compute coefficient estimates and inferential statistics for
     both included and excluded predictors.

     Arguments
     ---------

        • X is an N-by-P numeric matrix of predictor variables.

        • Y is an N-by-1 numeric response vector.

        • Optional Name–Value pairs may be supplied to control the stepwise
          selection procedure.

     Name–Value Arguments
     --------------------

     "InModel"
          Logical row vector of length P specifying predictors that are
          initially included in the model.

     "Keep"
          Logical row vector of length P specifying predictors that must remain
          in the model and are never removed during stepwise selection.

     "PEnter"
          Scalar significance level in the open interval (0,1) specifying the
          maximum p-value required for a predictor to enter the model.  Default
          is ‘0.05’.

     "PRemove"
          Scalar significance level in the open interval (0,1) specifying the
          minimum p-value required for a predictor to be removed from the model.
          If not specified, a default value greater than or equal to "PEnter" is
          used.

     "MaxIter"
          Positive integer specifying the maximum number of stepwise iterations.
          Default is ‘Inf’.

     "Scale"
          Either "on" or "off".  When enabled, predictors are standardized prior
          to stepwise selection only.  Final regression coefficients are always
          reported on the original data scale.

     "Display"
          Either "on" or "off".  Accepted for compatibility but currently does
          not affect output.

     Return Values
     -------------

        • B is a P-by-1 vector of regression coefficients.  Coefficients for
          excluded predictors are computed conditionally.

        • SE is a P-by-1 vector of standard errors.

        • PVAL is a P-by-1 vector of two-sided p-values.

        • FINALMODEL is a logical row vector indicating which predictors are
          included in the final model.

        • STATS is a structure containing regression diagnostics, including sums
          of squares, degrees of freedom, residuals, covariance estimates,
          F-statistic, and related quantities.

        • NEXTSTEP is a scalar indicating whether an additional stepwise
          iteration is recommended.  Currently always zero.

        • HISTORY is a structure summarizing the final model state, including
          selected predictors and coefficient history.

     See also: regress.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 70
Perform stepwise linear regression using conditional p-value criteria.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
tabulate


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1204
 -- statistics: tabulate (X)
 -- statistics: TBL = tabulate (X)

     Create a frequency table of unique values in vector X.

     ‘tabulate (X)’ displays a frequency table of the data in the vector X.  The
     input X can be a numeric vector, a logical vector, a character array, a
     cell array of strings, a categorical array, or a string array.

     The table displays the value, the number of instances (count), and the
     percentage of that value in X.  If no output argument is requested, the
     table is displayed in the command window.

     ‘TBL = tabulate (X)’ returns the frequency table, TBL, as a numeric matrix
     when X is numeric and as a cell array otherwise.

     If X is numeric, any missing values (NaNs) are ignored.  Similarly,
     undefined elements in categorical arrays and missing elements in string
     arrays are ignored.

     If all the elements of X are positive integers, then the frequency table
     includes 0 counts for the integers between 1 and max (X) that do not appear
     in X.

     For categorical arrays, the frequency table includes 0 counts for any
     categories that are defined but do not appear in X.

     See also: bar, pareto.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 54
Create a frequency table of unique values in vector X.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
tiedrank


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1045
 -- statistics: [R, TIEADJ] = tiedrank (X)
 -- statistics: [R, TIEADJ] = tiedrank (X, TIEFLAG)
 -- statistics: [R, TIEADJ] = tiedrank (X, TIEFLAG, BIDIR)

     Compute rank adjusted for ties.

     ‘[R, TIEADJ] = tiedrank (X)’ computes the ranks of the values in vector X.
     If any values in X are tied, ‘tiedrank’ computes their average rank.  The
     return value TIEADJ is an adjustment for ties required by the nonparametric
     tests ‘signrank’ and ‘ranksum’, and for the computation of Spearman's rank
     correlation.

     ‘[R, TIEADJ] = tiedrank (X, 1)’ computes the ranks of the values in the
     vector X.  TIEADJ is a vector of three adjustments for ties required in the
     computation of Kendall's tau.  ‘tiedrank (X, 0)’ is the same as ‘tiedrank
     (X)’.

     ‘[R, TIEADJ] = tiedrank (X, 0, 1)’ computes the ranks from each end, so
     that the smallest and largest values get rank 1, the next smallest and
     largest get rank 2, etc.  These ranks are used in the Ansari-Bradley test.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 31
Compute rank adjusted for ties.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
trimmean


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2980
 -- statistics: M = trimmean (X, P)
 -- statistics: M = trimmean (X, P, FLAG)
 -- statistics: M = trimmean (..., "all")
 -- statistics: M = trimmean (..., DIM)
 -- statistics: M = trimmean (..., VECDIM)

     Compute the trimmed mean.

     The trimmed mean of X is defined as the mean of X excluding the highest and
     lowest k data values of X, calculated as k = n * (P / 100) / 2, where n is
     the sample size.

     ‘M = trimmean (X, P)’ returns the mean of X after removing the outliers in
     X defined by P percent.
        • If X is a vector, then ‘trimmean (X, P)’ is the mean of all the values
          of X, computed after removing the outliers.
        • If X is a matrix, then ‘trimmean (X, P)’ is a row vector of column
          means, computed after removing the outliers.
        • If X is a multidimensional array, then ‘trimmean’ operates along the
          first nonsingleton dimension of X.

     To specify the operating dimension(s) when X is a matrix or a
     multidimensional array, use the DIM or VECDIM input argument.

     ‘trimmean’ treats NaN values in X as missing values and removes them.

     ‘M = trimmean (X, P, FLAG)’ specifies how to trim when k, i.e.  half the
     number of outliers, is not an integer.  FLAG can be specified as one of the
     following values:
     Value                 Description
     -----------------------------------------------------------------------------------
     "round"               Round k to the nearest integer.  This is the default.
     "floor"               Round k down to the next smaller integer.
     "weighted"            If k = i + f, where i is an integer and f is a fraction,
                           compute a weighted mean with weight (1 - f) for the (i +
                           1)-th and (n - i)-th values, and full weight for the
                           values between them.

     ‘M = trimmean (..., "all")’ returns the trimmed mean of all the values in X
     using any of the input argument combinations in the previous syntaxes.

     ‘M = trimmean (..., DIM)’ returns the trimmed mean along the operating
     dimension DIM specified as a positive integer scalar.  If not specified,
     then the default value is the first nonsingleton dimension of X, i.e.
     whose size does not equal 1.  If DIM is greater than ndims (X) or if size
     (X, DIM) is 1, then ‘trimmean’ returns X.

     ‘M = trimmean (..., VECDIM)’ returns the trimmed mean over the dimensions
     specified in the vector VECDIM.  For example, if X is a 2-by-3-by-4 array,
     then ‘trimmean (X, P, [1 2])’ returns a 1-by-1-by-4 array.  Each element of
     the output array is the trimmed mean of the elements on the corresponding
     page of X.  If VECDIM indexes all dimensions of X, then it is equivalent to
     ‘trimmean (X, P, "all")’.  Any dimension in VECDIM greater than ‘ndims (X)’
     is ignored.

     See also: mean.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 25
Compute the trimmed mean.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 5
ttest


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2372
 -- statistics: [H, PVAL, CI, STATS] = ttest (X)
 -- statistics: [H, PVAL, CI, STATS] = ttest (X, M)
 -- statistics: [H, PVAL, CI, STATS] = ttest (X, Y)
 -- statistics: [H, PVAL, CI, STATS] = ttest (X, M, NAME, VALUE)
 -- statistics: [H, PVAL, CI, STATS] = ttest (X, Y, NAME, VALUE)

     Test for mean of a normal sample with unknown variance.

     Perform a t-test of the null hypothesis ‘mean (X) == M’ for a sample X from
     a normal distribution with unknown mean and unknown standard deviation.
     Under the null, the test statistic T has a Student's t distribution.  The
     default value of M is 0.

     If the second argument Y is a vector, a paired-t test of the hypothesis
     ‘mean (X) == mean (Y)’ is performed.  If X and Y are vectors, they must have
     the same size and dimensions.

     X (and Y) can also be matrices.  For matrices, ttest performs separate
     t-tests along each column, and returns a vector of results.  X and Y must
     have the same number of columns.  The Type I error rate of the resulting
     vector of PVAL can be controlled by entering PVAL as input to the function
     multcompare.

     ttest treats NaNs as missing values, and ignores them.

     Name-Value pair arguments can be used to set various options.  "alpha" can
     be used to specify the significance level of the test (the default value is
     0.05).  "tail" can be used to select the desired alternative hypotheses.
     If the value is "both" (default) the null is tested against the two-sided
     alternative ‘mean (X) != M’.  If it is "right" the one-sided alternative
     ‘mean (X) > M’ is considered.  Similarly for "left", the one-sided
     alternative ‘mean (X) < M’ is considered.  When argument X is a matrix,
     "dim" can be used to select the dimension over which to perform the test.
     (The default is the first non-singleton dimension).

     If H is 1 the null hypothesis is rejected at the chosen significance level.
     If H is 0, then the null hypothesis cannot be rejected.  The p-value of the
     test is returned in PVAL.  A 100(1-alpha)% confidence interval is returned
     in CI.

     STATS is a structure containing the value of the test statistic (TSTAT),
     the degrees of freedom (DF) and the sample's standard deviation (SD).

     See also: hotelling_t2test, ttest2, hotelling_t2test2.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 55
Test for mean of a normal sample with unknown variance.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
ttest2


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2024
 -- statistics: [H, PVAL, CI, STATS] = ttest2 (X, Y)
 -- statistics: [H, PVAL, CI, STATS] = ttest2 (X, Y, NAME, VALUE)

     Perform a t-test to compare the means of two groups of data under the null
     hypothesis that the groups are drawn from distributions with the same mean.

     X and Y can be vectors or matrices.  For matrices, ttest2 performs separate
     t-tests along each column, and returns a vector of results.  X and Y must
     have the same number of columns.  The Type I error rate of the resulting
     vector of PVAL can be controlled by entering PVAL as input to the function
     multcompare.

     ttest2 treats NaNs as missing values, and ignores them.

     For a nested t-test, use anova2.

     The argument "alpha" can be used to specify the significance level of the
     test (the default value is 0.05).  The string argument "tail" can be used
     to select the desired alternative hypotheses.  If "tail" is "both"
     (default) the null is tested against the two-sided alternative ‘mean (X) !=
     mean (Y)’.  If "tail" is "right" the one-sided alternative
     ‘mean (X) > mean (Y)’ is considered.  Similarly for "left", the one-sided
     alternative ‘mean (X) < mean (Y)’ is considered.

     When "vartype" is "equal" the variances are assumed to be equal (this is
     the default).  When "vartype" is "unequal" the variances are not assumed
     equal.

     When argument X and Y are matrices the "dim" argument can be used to select
     the dimension over which to perform the test.  (The default is the first
     non-singleton dimension.)

     If H is 0 the null hypothesis is accepted, if it is 1 the null hypothesis
     is rejected.  The p-value of the test is returned in PVAL.  A 100(1-alpha)%
     confidence interval is returned in CI.  STATS is a structure containing the
     value of the test statistic (TSTAT), the degrees of freedom (DF) and the
     sample standard deviation (SD).

     See also: hotelling_t2test, anova1, hotelling_t2test2, ttest.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Perform a t-test to compare the means of two groups of data under the null
hy...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
vartest


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2330
 -- statistics: H = vartest (X, V)
 -- statistics: H = vartest (X, V, NAME, VALUE)
 -- statistics: [H, PVAL] = vartest (...)
 -- statistics: [H, PVAL, CI] = vartest (...)
 -- statistics: [H, PVAL, CI, STATS] = vartest (...)

     One-sample test of variance.

     ‘H = vartest (X, V)’ performs a chi-square test of the hypothesis that the
     data in the vector X come from a normal distribution with variance V,
     against the alternative that X comes from a normal distribution with a
     different variance.  The result is H = 0 if the null hypothesis ("variance
     is V") cannot be rejected at the 5% significance level, or H = 1 if the
     null hypothesis can be rejected at the 5% level.

     X may also be a matrix or an N-D array.  For matrices, ‘vartest’ performs
     separate tests along each column of X, and returns a vector of results.
     For N-D arrays, ‘vartest’ works along the first non-singleton dimension of
     X.  V must be a scalar.

     ‘vartest’ treats NaNs as missing values, and ignores them.

     ‘[H, PVAL] = vartest (...)’ returns the p-value.  That is the probability
     of observing the given result, or one more extreme, by chance if the null
     hypothesis is true.

     ‘[H, PVAL, CI] = vartest (...)’ returns a 100 * (1 - ALPHA)% confidence
     interval for the true variance.

     ‘[H, PVAL, CI, STATS] = vartest (...)’ returns a structure with the
     following fields:

          chisqstat        the value of the test statistic
          df               the degrees of freedom of the test

     ‘[...] = vartest (..., NAME, VALUE), ...’ specifies one or more of the
     following name/value pairs:

          Name             Value
     -----------------------------------------------------------------------------------
          "alpha"          the significance level.  Default is 0.05.
                           
          "dim"            dimension to work along a matrix or an N-D array.
                           
          "tail"           a string specifying the alternative hypothesis
              "both"       variance is not V (two-tailed, default)
              "left"       variance is less than V (left-tailed)
              "right"      variance is greater than V (right-tailed)

     See also: ttest, ztest, kstest.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 28
One-sample test of variance.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
vartest2


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2530
 -- statistics: H = vartest2 (X, Y)
 -- statistics: H = vartest2 (X, Y, NAME, VALUE)
 -- statistics: [H, PVAL] = vartest2 (...)
 -- statistics: [H, PVAL, CI] = vartest2 (...)
 -- statistics: [H, PVAL, CI, STATS] = vartest2 (...)

     Two-sample F test for equal variances.

     ‘H = vartest2 (X, Y)’ performs an F test of the hypothesis that the
     independent data in vectors X and Y come from normal distributions with
     equal variance, against the alternative that they come from normal
     distributions with different variances.  The result is H = 0 if the null
     hypothesis ("variances are equal") cannot be rejected at the 5% significance
     level, or H = 1 if the null hypothesis can be rejected at the 5% level.

     X and Y may also be matrices or N-D arrays.  For matrices, ‘vartest2’
     performs separate tests along each column and returns a vector of results.
     For N-D arrays, ‘vartest2’ works along the first non-singleton dimension
     and X and Y must have the same size along all the remaining dimensions.

     ‘vartest2’ treats NaNs as missing values, and ignores them.

     ‘[H, PVAL] = vartest2 (...)’ returns the p-value.  That is the probability
     of observing the given result, or one more extreme, by chance if the null
     hypothesis is true.

     ‘[H, PVAL, CI] = vartest2 (...)’ returns a 100 * (1 - ALPHA)% confidence
     interval for the true ratio var(X)/var(Y).

     ‘[H, PVAL, CI, STATS] = vartest2 (...)’ returns a structure with the
     following fields:

          fstat            the value of the test statistic
          df1              the numerator degrees of freedom of the test
          df2              the denominator degrees of freedom of the test

     ‘[...] = vartest2 (..., NAME, VALUE), ...’ specifies one or more of the
     following name/value pairs:

          Name             Value
     -----------------------------------------------------------------------------------
          "alpha"          the significance level.  Default is 0.05.
                           
          "dim"            dimension to work along a matrix or an N-D array.
                           
          "tail"           a string specifying the alternative hypothesis
              "both"       variances unequal (two-tailed, default)
              "left"       var(X) less than var(Y) (left-tailed)
              "right"      var(X) greater than var(Y) (right-tailed)

     See also: ttest2, kstest2, bartlett_test, levene_test.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 38
Two-sample F test for equal variances.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
vartestn


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 3303
 -- statistics: vartestn (X)
 -- statistics: vartestn (X, GROUP)
 -- statistics: vartestn (..., NAME, VALUE)
 -- statistics: P = vartestn (...)
 -- statistics: [P, STATS] = vartestn (...)
 -- statistics: [P, STATS] = vartestn (..., NAME, VALUE)

     Test for equal variances across multiple groups.

     ‘vartestn (X)’ performs Bartlett's test for equal variances for the
     columns of the matrix X.  This is a test of the null hypothesis that the
     columns of X come from normal distributions with the same variance, against
     the alternative that they come from normal distributions with different
     variances.  The result is displayed in a summary table of statistics as
     well as a box plot of the groups.

     ‘vartestn (X, GROUP)’ requires a vector X, and a GROUP argument that is a
     categorical variable, vector, string array, or cell array of strings with
     one row for each element of X.  Values of X corresponding to the same value
     of GROUP are placed in the same group.

     ‘vartestn’ treats NaNs as missing values, and ignores them.

     ‘P = vartestn (...)’ returns the probability of observing the given result,
     or one more extreme, by chance under the null hypothesis that all groups
     have equal variances.  Small values of P cast doubt on the validity of the
     null hypothesis.

     ‘[P, STATS] = vartestn (...)’ returns a structure with the following
     fields:

          chistat          - the value of the test statistic
          df               - the degrees of freedom of the test

     ‘[P, STATS] = vartestn (..., NAME, VALUE)’ specifies one or more of the
     following NAME/VALUE pairs:

     "display"        "on" to display a boxplot and table, or "off" to omit these
                      displays.  Default "on".
                      
     "testtype"       One of the following strings to control the type of test to
                      perform

        "Bartlett"           Bartlett's test (default).
                             
        "LeveneQuadratic"    Levene's test computed by performing anova on the
                             squared deviations of the data values from their group
                             means.
                             
        "LeveneAbsolute"     Levene's test computed by performing anova on the
                             absolute deviations of the data values from their group
                             means.
                             
        "BrownForsythe"      Brown-Forsythe test computed by performing anova on the
                             absolute deviations of the data values from the group
                             medians.
                             
        "OBrien"             O'Brien's modification of Levene's test with W=0.5.

     The classical Bartlett's test is sensitive to the assumption that the
     distribution in each group is normal.  The other test types are more robust
     to non-normal distributions, especially ones prone to outliers.  For these
     tests, the STATS output structure has a field named fstat containing the
     test statistic, and df1 and df2 containing its numerator and denominator
     degrees of freedom.

     See also: vartest, vartest2, anova1, bartlett_test, levene_test.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 48
Test for equal variances across multiple groups.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
violin


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2676
 -- statistics: violin (X)
 -- statistics: H = violin (X)
 -- statistics: H = violin (..., PROPERTY, VALUE, ...)
 -- statistics: H = violin (HAX, ...)
 -- statistics: H = violin (..., "horizontal")

     Produce a Violin plot of the data X.

     The input data X can be a N-by-m array containing N observations of m
     variables.  It can also be a cell with m elements, for the case in which
     the variables are not uniformly sampled.

     The following PROPERTY can be set using PROPERTY/VALUE pairs (default
     values in parentheses).  The value of the property can be a scalar
     indicating that it applies to all the variables in the data.  It can also
     be a cell/array, indicating the property for each variable.  In this case
     it should have m columns (as many as variables).

     Color
          ("y") Indicates the filling color of the violins.

     Nbins
          (50) Internally, the function calls ‘hist’ to compute the histogram of
          the data.  This property indicates how many bins to use.  See ‘help
          hist’ for more details.

     SmoothFactor
          (4) The function performs simple kernel density estimation and
          automatically finds the bandwidth of the kernel function that best
          approximates the histogram using optimization (‘sqp’).  The result is
          in general very noisy.  To smooth the result the bandwidth is
          multiplied by the value of this property.  The higher the value the
          smoother the violins, but values too high might remove features from
          the data distribution.

     Bandwidth
          (NA) If this property is given a value other than NA, it sets the
          bandwidth of the kernel function.  No optimization is performed and
          the property SmoothFactor is ignored.

     Width
          (0.5) Sets the maximum width of the violins.  Violins are centered at
          integer axis values.  The distance between two violin middle axes is
          1.  Setting a value higher than 1 in this property will cause the
          violins to overlap.

     If the string "Horizontal" is among the input arguments, the violin plot is
     rendered along the x axis with the variables in the y axis.

     The returned structure H has handles to the plot elements, allowing
     customization of the visualization using set/get functions.

     Example:

          title ("Grade 3 heights");
          axis ([0,3]);
          set (gca, "xtick", 1:2, "xticklabel", {"girls"; "boys"});
          h = violin ({randn(100,1)*5+140, randn(130,1)*8+135}, "Nbins", 10);
          set (h.violin, "linewidth", 2)

     See also: boxplot, hist.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 36
Produce a Violin plot of the data X.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
wblplot


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1741
 -- statistics: wblplot (DATA, ...)
 -- statistics: HANDLE = wblplot (DATA, ...)
 -- statistics: [HANDLE, PARAM] = wblplot (DATA)
 -- statistics: [HANDLE, PARAM] = wblplot (DATA, CENSOR)
 -- statistics: [HANDLE, PARAM] = wblplot (DATA, CENSOR, FREQ)
 -- statistics: [HANDLE, PARAM] = wblplot (DATA, CENSOR, FREQ, CONFINT)
 -- statistics: [HANDLE, PARAM] = wblplot (DATA, CENSOR, FREQ, CONFINT,
          FANCYGRID)
 -- statistics: [HANDLE, PARAM] = wblplot (DATA, CENSOR, FREQ, CONFINT,
          FANCYGRID, SHOWLEGEND)

     Plot a column vector DATA on a Weibull probability plot using rank
     regression.

     CENSOR: optional column vector of the same size as DATA with 1 for right
     censored data and 0 for exact observations.  Pass [] when no censor data
     are available.

     FREQ: optional vector of the same size as DATA with the number of
     occurrences for corresponding data.  Pass [] when no frequency data are
     available.

     CONFINT: optional confidence limits for plotting upper and lower confidence
     bands using beta binomial confidence bounds.  If a single value a is given,
     the limits are LOW = a and HIGH = 1 - a.  Pass [] if confidence bounds are
     not requested.

     FANCYGRID: optional parameter which if set to anything but 1 will turn off
     the fancy gridlines.

     SHOWLEGEND: optional parameter that when set to zero (0) turns off the
     legend.

     If one output argument is given, a HANDLE for the data marker and plotlines
     is returned, which can be used for further modification of line and marker
     style.

     If a second output argument is specified, a PARAM vector with scale, shape
     and correlation factor is returned.

     See also: normplot, wblpdf.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 78
Plot a column vector DATA on a Weibull probability plot using rank regression.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 4
x2fx


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2730
 -- statistics: [D, MODEL, TERMSTART, TERMEND] = x2fx (X)
 -- statistics: [D, MODEL, TERMSTART, TERMEND] = x2fx (X, MODEL)
 -- statistics: [D, MODEL, TERMSTART, TERMEND] = x2fx (X, MODEL, CATEG)
 -- statistics: [D, MODEL, TERMSTART, TERMEND] = x2fx (X, MODEL, CATEG,
          CATLEVELS)

     Convert predictors to design matrix.

     ‘D = x2fx (X, MODEL)’ converts a matrix of predictors X to a design matrix
     D for regression analysis.  Distinct predictor variables should appear in
     different columns of X.

     The optional input MODEL controls the regression model.  By default, ‘x2fx’
     returns the design matrix for a linear additive model with a constant term.
     MODEL can be any one of the following strings:

          "linear"         Constant and linear terms (the default)
          "interaction"    Constant, linear, and interaction terms
          "quadratic"      Constant, linear, interaction, and squared terms
          "purequadratic"  Constant, linear, and squared terms

     If X has n columns, the order of the columns of D for a full quadratic
     model is:

        • The constant term.
        • The linear terms (the columns of X, in order 1,2,...,n).
        • The interaction terms (pairwise products of columns of X, in order
          (1,2), (1,3), ..., (1,n), (2,3), ..., (n-1,n)).
        • The squared terms (in the order 1,2,...,n).

     Other models use a subset of these terms, in the same order.

     Alternatively, MODEL can be a matrix specifying polynomial terms of
     arbitrary order.  In this case, MODEL should have one column for each
     column in X and one row for each term in the model.  The entries in any
     row of MODEL are powers for the corresponding columns of X.  For example,
     if X has columns X1, X2, and X3, then a row [0 1 2] in MODEL would specify
     the term (X1.^0).*(X2.^1).*(X3.^2).  A row of all zeros in MODEL specifies
     a constant term, which you can omit.

     ‘D = x2fx (X, MODEL, CATEG)’ treats columns with numbers listed in the
     vector CATEG as categorical variables.  Terms involving categorical
     variables produce dummy variable columns in D.  Dummy variables are
     computed under the assumption that possible categorical levels are
     completely enumerated by the unique values that appear in the corresponding
     column of X.

     ‘D = x2fx (X, MODEL, CATEG, CATLEVELS)’ accepts a vector CATLEVELS the same
     length as CATEG, specifying the number of levels in each categorical
     variable.  In this case, values in the corresponding column of X must be
     integers in the range from 1 to the specified number of levels.  Not all of
     the levels need to appear in X.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 36
Convert predictors to design matrix.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 5
ztest


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2208
 -- statistics: H = ztest (X, M, SIGMA)
 -- statistics: H = ztest (X, M, SIGMA, NAME, VALUE)
 -- statistics: [H, PVAL] = ztest (...)
 -- statistics: [H, PVAL, CI] = ztest (...)
 -- statistics: [H, PVAL, CI, ZVALUE] = ztest (...)

     One-sample Z-test.

     ‘H = ztest (X, M, SIGMA)’ performs a Z-test of the hypothesis that the data
     in the vector X come from a normal distribution with mean M and standard
     deviation SIGMA, against the alternative that X comes from a normal
     distribution with a different mean.  The result is H = 0 if the null
     hypothesis ("mean is M") cannot be rejected at the 5% significance level,
     or H = 1 if the null hypothesis can be rejected at the 5% level.

     X may also be a matrix or an N-D array.  For matrices, ‘ztest’ performs
     separate tests along each column of X, and returns a vector of results.
     For N-D arrays, ‘ztest’ works along the first non-singleton dimension of X.
     M and SIGMA must be scalars.

     ‘ztest’ treats NaNs as missing values, and ignores them.

     ‘[H, PVAL] = ztest (...)’ returns the p-value.  That is the probability of
     observing the given result, or one more extreme, by chance if the null
     hypothesis is true.

     ‘[H, PVAL, CI] = ztest (...)’ returns a 100 * (1 - ALPHA)% confidence
     interval for the true mean.

     ‘[H, PVAL, CI, ZVALUE] = ztest (...)’ returns the value of the test
     statistic.

     ‘[...] = ztest (..., NAME, VALUE, ...)’ specifies one or more of the
     following NAME/VALUE pairs:

          NAME             VALUE
     -----------------------------------------------------------------------------------
          "alpha"          the significance level.  Default is 0.05.
                           
          "dim"            dimension to work along a matrix or an N-D array.
                           
          "tail"           a string specifying the alternative hypothesis:
              "both"       "mean is not M" (two-tailed, default)
              "left"       "mean is less than M" (left-tailed)
              "right"      "mean is greater than M" (right-tailed)

     See also: ttest, vartest, signtest, kstest.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 18
One-sample Z-test.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
ztest2


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1839
 -- statistics: H = ztest2 (X1, N1, X2, N2)
 -- statistics: H = ztest2 (X1, N1, X2, N2, NAME, VALUE)
 -- statistics: [H, PVAL] = ztest2 (...)
 -- statistics: [H, PVAL, ZVALUE] = ztest2 (...)

     Two proportions Z-test.

     If X1 and N1 are the counts of successes and trials in one sample, and X2
     and N2 those in a second one, test the null hypothesis that the success
     probabilities p1 and p2 are the same.  The result is H = 0 if the null
     hypothesis cannot be rejected at the 5% significance level, or H = 1 if the
     null hypothesis can be rejected at the 5% level.

     Under the null, the test statistic ZVALUE approximately follows a standard
     normal distribution.

     The size of H, PVAL, and ZVALUE is the common size of X1, N1, X2, and N2,
     which must be scalars or of common size.  A scalar input functions as a
     constant matrix of the same size as the other inputs.

     ‘[H, PVAL] = ztest2 (...)’ returns the p-value.  That is the probability of
     observing the given result, or one more extreme, by chance if the null
     hypothesis is true.

     ‘[H, PVAL, ZVALUE] = ztest2 (...)’ returns the value of the test statistic.

     ‘[...] = ztest2 (..., NAME, VALUE, ...)’ specifies one or more of the
     following NAME/VALUE pairs:

          NAME             VALUE
     -----------------------------------------------------------------------------------
          "alpha"          the significance level.  Default is 0.05.
                           
          "tail"           a string specifying the alternative hypothesis
              "both"               p1 is not p2 (two-tailed, default)
              "left"               p1 is less than p2 (left-tailed)
              "right"              p1 is greater than p2 (right-tailed)

     See also: chi2test, fishertest.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 23
Two proportions Z-test.





