## Linear regression sign. different from zero

### Linear regression sign. different from zero

Hi,
I'm neither a statistics nor SPSS guru so this is probably a simple and
stupid question, but I have a data set of about 90 values and would like
to 1) generate a linear regression (I can do this) and 2) test if the
slope coefficient is significantly different from zero (I can't do
this!).
Any help appreciated!

Jerry

### Linear regression sign. different from zero

Jerry-

Two approaches:
(These both assume a single independent variable)

1) After running the regression, go to the coefficients table. You should have
two lines containing the estimates for B0 (constant) and B1. Examine the
t-statistic and p-value for B1. This is testing the null:
H0: B1 = 0
Hence, a large t-statistic (small p-value) implies rejection of the null.
This means that the slope is something other than zero.

2) After running the regression, go to the ANOVA table. You should have three
lines of data. This table is also a way of testing
H0: B1 = 0. Examine the F-statistic for the regression. Again, a large F-stat
(small p-value) implies rejection of the null, hence the slope is something
other than zero.

You may also want to note that, with a single predictor, F-stat = (t-stat)^2.
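The two approaches above can be sketched in Python (the thread is about SPSS; this is purely an illustration of the same arithmetic, on made-up data):

```python
# Testing H0: B1 = 0 for a simple linear regression on ~90 values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=90)              # ~90 values, as in the question
y = 2.0 * x + rng.normal(size=90)    # simulated data with true slope = 2

res = stats.linregress(x, y)
t = res.slope / res.stderr           # t-statistic for H0: B1 = 0
print("slope:", res.slope, "t:", t, "p:", res.pvalue)

# With a single predictor, the ANOVA F-statistic equals t squared,
# so both approaches give the same p-value:
F = t ** 2
```

A small t-statistic's p-value here comes from a two-sided t-test with n-2 = 88 degrees of freedom, which is exactly what the SPSS coefficient table reports.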

Hope this helps.

Todd

> Hi,
> I'm neither a statistics nor SPSS guru so this is probably a simple and
> stupid question, but I have a data set of about 90 values and would like
> to 1) generate a linear regression (I can do this) and 2) test if the
> slope coefficient is significantly different from zero (I can't do
> this!).
> Any help appreciated!

> Jerry

### Linear regression sign. different from zero

> Todd -- Your reply to Jerry's question below is very informative.  I have a
> question: how do you interpret these items in a multivariate linear
> regression?  For example, say one coefficient's p-value is .000 while another
> variable's is .500.  Does this imply that my entire model is at risk, or
> should I just focus my analysis on those variables with low p-values?
> Thanks for any replies...

Your model is more or less okay as long as the p-value for the
F-statistic is significant. After all, the p-values of the model
and of the coefficients depend on many factors, among them a) the
covariances with variables you've already included, b) the
sample size, and c) the type of regression you've chosen.

If you don't choose independent variables by divine guidance, but
use a theoretical model (e.g. in economics), it's perfectly okay
to include variables with p-values of 0.5, if you like to do so.

If you don't know which variables you should include, use backward
elimination. Starting with the saturated model (including
all candidate explanatory variables), this eliminates the least
significant variable step by step. This doesn't always yield
good results, so have a second look at the final model. Model
selection isn't easy, after all.

Bye, Jens

### Linear regression sign. different from zero

> > 2) After running the regression go to the ANOVA table. You should have three
> > lines of data. This table is also a way of testing
> > H0: B1 = 0. Examine the F-statistic for the regression. Again, a large F-stat
> > (small p-value) implies rejection of the null, hence the slope is something
> > other than zero.

A question: is the above exactly what the ANOVA is testing?  I
had thought from my reading that the ANOVA tests the
significance of the variation explained by the regression model.  In
other words, how 'good' (and I use the term good loosely) the model fits
the data.

later...

sean.
---------------------------------------------------------------------------
Sean Richard Clancy, BSc.                  "Objection,  evasion,  happy
Biology Department                          distrust, pleasure in mockery
Memorial University of Newfoundland         are signs of health,  everything
St. John's,  Newfoundland. A1B 3X9          unconditional belongs to
Phone: (709)737-8301                        pathology".

---------------------------------------------------------------------------

### Linear regression sign. different from zero

The questions here are not actually settled in any one or two lines of
reply.

: >Todd -- Your reply to Jerry's question below is very informative.  I have a
: >question: how do you interpret these items in a multivariate linear
: >regression?  For example, say one coefficient's p-value is .000 while
: >another variable's is .500.  Does this imply that my entire model is at
: >risk, or should I just focus my analysis on those variables with low
: >p-values?  Thanks for any replies...

- What model?  What purpose?  A coefficient's test is on the
contribution of that variable, when it is entered LAST.  Two great
variables, not quite redundant, may give you great robustness for
replication, but the p-value could be NS for both of them -- and
either one will look good once you remove the other.

: Your model is more or less okay as long as the p-value for the
: F-statistic is significant. After all, the p-values of the model
: and of the coefficients depend on many factors, among them a) the
: covariances with variables you've already included, b) the
: sample size, and c) the type of regression you've chosen.

: If you don't choose independent variables by divine guidance, but
: use a theoretical model (e.g. in economics), it's perfectly okay
: to include variables with p-values of 0.5, if you like to do so.

There should be a REASON for having a variable in a model, and that
should be more potent than the value of the p-value for one set of
data.

: If you don't know which variables you should include, use backward
: elimination. Starting with the saturated model (including
: all candidate explanatory variables), this eliminates the least
: significant variable step by step. This doesn't always yield
: good results, so have a second look at the final model. Model
: selection isn't easy, after all.

- Stepwise is generally a bad idea, whether it is step-up or
step-down.  With potent predictors, YES, it can get you a shorter
model.  Do you need a shorter model?   With hundreds of extra
variables being tested, which are not very potent, it can get you a
model that will perform worse than  *chance*  if it ignores
multiple, intercorrelated real predictors in favor of the variables
that reach  ".05"  by happenstance.  See articles in my FAQ for
related arguments.

--

http://www.pitt.edu/~wpilib/statfaq.html   Univ. of Pittsburgh

### Log-linear regression with structural and sampling zeros

I have a log-linear regression problem with both structural and sampling
zeros and with continuous predictors. I _think_ it is possible to estimate
this with either Catmod or Genmod. Any tips on how to do this, and which
procedure is more suitable for my problem?

Details:

The objective is to determine whether regional labour and/or housing market
conditions influence where people move to.

We have data on individuals at time t=1 and t=2. This includes their
geographic location in both periods (L1 and L2). There are about 500
different locations coded. We also have information about each of the
locations, in particular the unemployment rate (or some other measure of
labour demand) U(L) and housing prices H(L).

The plan is to estimate a log-linear model of the matrix of counts of people
found in each cell of the L1 x L2 table. (In Genmod notation, I think this is
described as a Poisson model with log link.)

It probably makes most sense to do this analysis conditional on the fact
that the person has moved somewhere (since the factors that influence
movement at all may be different from the factors that influence where they
move - and the latter is our main interest). This means that the diagonal of
the table will have structural zeros. Even though our sample is reasonably
large (tens of thousands), there will probably also be some sampling zeros.

For each person, we will calculate the variables
DU = U(L2) - U(L1) and DH = H(L2) - H(L1),
i.e. the difference in unemployment rates and housing prices between their
origin and destination regions.

The model we wish to estimate is thus
N = f(L1, L2, DU, DH)
That is, L1 and L2 are categorical variables (direct effects only, to
control for region size etc.), and DU and DH are continuous variables. In
other words, the interaction between L1 and L2 is assumed to be captured by
our continuous economic variables.

I think it is possible to estimate this using either Catmod or Genmod.
However, the existence of structural and sampling zeros complicates matters.
The Catmod documentation discusses how to handle these, but the Genmod
documentation does not. I get the impression, however (maybe I am wrong),
that Genmod is more robust in handling sampling zeros (if we extend the
model to have individual fixed effects such as sex, we will have even more
sampling zeros). I presume one could handle structural zeros by including
non-zero data in the table but introducing design effects which perfectly
fit those cells.

Any suggestions most welcome.