Generalized linear regression training

Perform generalized linear regression to generate predictions or to model the relationship between a dependent variable and a set of explanatory variables. Identifying and measuring such relationships can lead to a better understanding of what is happening in a place, a prediction of where something might happen, or an explanation of why something happened where it did. The model extends the distribution of the dependent variable to the exponential family of distributions (Gaussian, Bernoulli, and Poisson), so it can handle regression on common discrete and continuous random variables, in particular attribute data and discrete data. This makes it well suited to dependent variables that are discontinuous or non-numeric.
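As an illustration of how the three distributions differ, the sketch below maps a linear predictor to the expected value of the dependent variable. It assumes the canonical link function for each distribution (identity, logit, log), which this page does not state. The function names are hypothetical and are not part of the service.

```python
import math


def linear_predictor(intercept, coefficients, values):
    """Compute eta = intercept + sum(coefficient_i * value_i)."""
    return intercept + sum(b * x for b, x in zip(coefficients, values))


def mean_response(model_type, eta):
    """Map the linear predictor eta to the expected value of the dependent variable.

    Assumption: each model type uses its canonical link function.
    """
    if model_type == "Gaussian":
        return eta                            # identity link: continuous values
    if model_type == "Logistic":
        return 1.0 / (1.0 + math.exp(-eta))   # logit link: probability in (0, 1)
    if model_type == "Poisson":
        return math.exp(eta)                  # log link: non-negative expected count
    raise ValueError(f"unknown model type: {model_type}")
```

With the same linear predictor, the Gaussian model returns the value unchanged, the logistic model returns a probability, and the Poisson model returns a non-negative expected count.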

The training procedure of the generalized linear regression method fits a model to the characteristics of the data; the resulting model can then be used for prediction.

When creating a generalized linear regression training task, you need to set the following parameters:

Training Dataset: required parameter. The connection information for the dataset to train on, including the data type, connection parameters, dataset name, and so on. HBase data, dsf data, and local data can be connected.
Data Query Conditions: optional parameter. Filters the data to be analyzed according to the query conditions; attribute conditions and spatial queries are supported. For example: SmID < 100 and BBOX(the_geom, 120, 30, 121, 31).
Explanatory Fields: required parameter. The list of fields for the explanatory variables, that is, the independent variables. Enter the names of one or more fields of the training dataset to use as the model's explanatory variables, which help predict the value or category.
Modeling Field: required parameter. The field for the dependent variable, that is, the value the model is trained to predict. This field holds known values of the variable, which are used for training so that predictions can be made at unknown locations.
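Model Type: required parameter. The regression type: "Gaussian" (Gaussian model), "Logistic" (logistic model), or "Poisson" (Poisson model). Select the model type based on how the dependent variable is measured and summarized and on the range of values it contains. Typically, a continuous variable calls for the Gaussian model, a binary variable for the logistic model, and a count variable for the Poisson model.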
Distance Explanatory Variable Dataset: optional parameter. Supports point, line, and region datasets. The closest distance between the features of the given dataset and the features of the training dataset is calculated, and a list of explanatory variables is created from it automatically.
Model Save Directory: optional parameter. The location to which a model with good training results is saved. If it is empty, the model is not saved.
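The distance explanatory variable described in the list above can be sketched as follows. This is a minimal illustration, not the service's implementation. It assumes point features only, planar coordinates, and Euclidean distance; line and region datasets would need point-to-segment or point-to-polygon distance instead. All function names are hypothetical.

```python
import math


def nearest_distance(point, features):
    """Return the smallest planar Euclidean distance from point to any feature."""
    return min(math.dist(point, feature) for feature in features)


def distance_variables(training_points, distance_datasets):
    """Build one row per training point with one distance column per dataset.

    Each column can then serve as an additional explanatory variable.
    """
    return [
        [nearest_distance(point, features) for features in distance_datasets]
        for point in training_points
    ]
```

Each distance dataset contributes one extra column, so two distance datasets add two explanatory variables to every training record.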

After the training task is executed, the following result parameters are output:

Variable: the array of fields in the generalized linear regression model, that is, the independent-variable fields used in the trained model.
coefficient: the regression coefficients.
Coefficient Standard Errors: the standard errors of the regression coefficients and the intercept.
TStatistic: the t-statistics of the regression coefficients and the intercept.
probability: the probabilities (p-values) of the regression coefficients and the intercept.
AIC: the Akaike information criterion (AIC) of the model. It can be used to assess model performance and to compare regression models. Taking model complexity into account, the model with the lower AIC value fits the data better. AIC is not an absolute measure of goodness of fit, but it is useful for comparing models that apply to the same dependent variable and have different explanatory variables.
dispersion: the dispersion of the generalized linear regression model.
degreesOfFreedom: the degrees of freedom.
residualDegreeOfFreedomNull: the residual degrees of freedom for the null model.
residualDegreeOfFreedom: the residual degrees of freedom.
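To show how two of these outputs are typically read, the sketch below uses the standard textbook definitions of AIC (2k - 2 ln L) and of the t-statistic (estimate divided by its standard error). Whether the service computes them in exactly this way is an assumption, and the function names are hypothetical.

```python
def aic(log_likelihood, num_parameters):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * num_parameters - 2 * log_likelihood


def t_statistics(estimates, standard_errors):
    """Divide each estimate (coefficient or intercept) by its standard error."""
    return [b / se for b, se in zip(estimates, standard_errors)]


def best_by_aic(aic_by_model):
    """Return the name of the model with the lowest AIC.

    Only meaningful when all models share the same dependent variable.
    """
    return min(aic_by_model, key=aic_by_model.get)
```

When comparing candidate models trained on the same dependent variable with different explanatory fields, pass their AIC values to best_by_aic to pick the preferred one.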