outliers: replacing with the mean


Post by Denni » Thu, 10 Jul 2003 18:22:04



Hi all

I would like to remove outliers from my repeated-measures design; however,
setting them to missing removes the subject's whole case.
I've heard that it's possible to replace outliers with the mean of the
group. Is this standard practice (I would use it for my thesis),
and are there any good references?
If it is acceptable, how should I compute the mean if there are several
outliers in one group/DV (or variable, in SPSS)?

Dennis

 
 
 


Post by Rich Ulric » Thu, 10 Jul 2003 22:54:17




> Hi all

> I would like to remove outliers from my repetitive measures design, however
> making it missing removes the whole case of the subject.

Your first issue here is "What do you do with outliers?", which
depends on what you can say about them.  Sometimes a simple
transformation is justified by the nature of the measurement:  square
root for counts, log for biological assays, reciprocal for distances.
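To see what such a transformation does to an extreme value, here is a minimal Python sketch (standard library only). The counts are invented for illustration, and the "robust z" used here (distance from the median in units of the median absolute deviation) is simply an assumed, convenient way to measure how far the largest value sticks out; it is not something the post prescribes.

```python
import math
import statistics

# The transforms mentioned above. Which one is justified depends on the
# measurement, not on which one makes the outlier look smallest.
TRANSFORMS = {
    "sqrt": math.sqrt,                # counts (values must be >= 0)
    "log": math.log,                  # e.g. biological assays (values must be > 0)
    "reciprocal": lambda v: 1.0 / v,  # values must be nonzero
}


def max_robust_z(values):
    """Largest |x - median| / MAD, where MAD is the median absolute deviation."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return max(abs(v - med) for v in values) / mad


counts = [2, 3, 3, 4, 5, 5, 6, 40]  # invented counts with one extreme value
print("raw", round(max_robust_z(counts), 2))
for name in ("sqrt", "log"):
    transformed = [TRANSFORMS[name](v) for v in counts]
    print(name, round(max_robust_z(transformed), 2))
```

On these made-up counts the value 40 sits about 23.7 robust units from the median on the raw scale, about 11.7 after the square root, and about 6.3 after the log.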

If you already have *good* scaling, the reasonable thing might
be to write an essay on each outlier and remove that subject (S) from
the sample.  If you have half-good measurements, like the ones
that I usually see, you might want to Winsorize: pull in the
most extreme values to whatever was at (say) the 95th percentile.
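A minimal Python sketch of Winsorizing at the 5th and 95th percentiles. Percentile definitions differ between packages; this sketch assumes linear interpolation via `statistics.quantiles(..., method="inclusive")` (Python 3.8+), so its cut-offs may not match SPSS exactly. The function name and the data are illustrative only.

```python
import statistics


def winsorize(values, lower_pct=5, upper_pct=95):
    """Pull values outside the given percentiles in to those percentiles.

    Percentiles are whole numbers from 1 to 99 and are computed by linear
    interpolation between order statistics (method="inclusive").
    """
    cuts = statistics.quantiles(values, n=100, method="inclusive")  # 99 cut points
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, lo), hi) for v in values]


scores = list(range(1, 20)) + [100]  # hypothetical data: 1..19 plus one extreme value
print(winsorize(scores))
```

Here the 100 is pulled in to the interpolated 95th percentile (23.05), the 1 is pulled up to the 5th percentile (1.95), and every value in between is left unchanged.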

> I've heard that it's possible to replace outliers with the mean of the
> group. I am wondering if it's a standard practice (to use for my thesis),
> and are there any good references?

Replacing the outlier with <some mean> is, IMHO, a
two-step procedure.
First, you justify that the outlier should be regarded as 'missing.'
Second, you figure out *which* mean is appropriate to stand in
for something missing.  The usual initial rationale is that you
use a mean that disturbs the statistical test the least.

 - You don't want to increase the tested mean square.
 - In repeated measures, that could be the subject's mean;
but that can raise a problem if much of the data is missing,
because you also don't want to decrease the mean square
of the error term.
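Filling a cell with the subject's own mean can be sketched in a few lines of Python. The wide layout (one row per subject, one column per condition, `None` for a cell already judged "missing"), the function name and the numbers are all hypothetical. The filled value is not a real measurement, yet a complete-data analysis counts it as one, which is one reason filling many cells this way can make the error term look smaller than it should.

```python
def impute_subject_mean(table):
    """Replace None cells with the mean of the same subject's observed cells.

    `table` is wide format: one row per subject, one column per condition.
    """
    filled = []
    for row in table:
        observed = [x for x in row if x is not None]
        if not observed:
            raise ValueError("subject has no observed values")
        subject_mean = sum(observed) / len(observed)
        filled.append([subject_mean if x is None else x for x in row])
    return filled


# hypothetical data: 3 subjects x 3 conditions, one cell judged 'missing'
scores = [
    [10.0, 12.0, 14.0],
    [9.0, None, 15.0],
    [11.0, 13.0, 16.0],
]
print(impute_subject_mean(scores))
```

The second subject's missing cell becomes (9 + 15) / 2 = 12.0; the other rows are copied unchanged.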

If you still want to do it, there are books written on missing data,
and I would use Google:  for instance, look for college courses
and what they list in their suggested references.

> If it is acceptable, how should I compute the mean if there are several
> outliers in one group/DV, (or variable in SPSS)?

--

http://www.pitt.edu/~wpilib/index.html
"Taxes are the price we pay for civilization."  Justice Holmes.

 
 
 


Post by Robert Ehrlic » Thu, 10 Jul 2003 23:54:47


Deleting or altering measured values from a data set is serious
business, involving m* as well as analytical aspects.  As previous
writers have said, you have to make a strong case for doing so in any
report on results.  Are the outliers so large that they are impossible
(e.g., a child's height = 30 m)?  Are they large (or small) with respect
to the Gaussian assumption?  After taking care to document your reasons,
you may substitute the mean, but only as the beginning of an imputation
procedure.  You may also reduce the effect of outliers on data summaries
by using the median rather than the mean.
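One way to use the mean only as "the beginning of an imputation procedure" in a subjects-by-conditions table is the classical iterative missing-value fill for two-way layouts: start each missing cell at the grand mean, then repeatedly replace it with its fitted value under an additive model (subject mean + condition mean - grand mean) until the values stop changing. This Python sketch illustrates that general technique with hypothetical data and a hypothetical function name; it is not taken from the post or from the book cited here.

```python
def additive_fill(table, tol=1e-10, max_iter=1000):
    """Iteratively impute None cells in a subjects x conditions table.

    Missing cells start at the grand mean of the observed values. Each sweep
    computes subject (row) and condition (column) means of the currently filled
    table and replaces every missing cell by its additive-model fitted value:
    row mean + column mean - grand mean.
    """
    n_rows, n_cols = len(table), len(table[0])
    missing = [(i, j) for i in range(n_rows) for j in range(n_cols)
               if table[i][j] is None]
    observed = [x for row in table for x in row if x is not None]
    start = sum(observed) / len(observed)
    filled = [[start if x is None else x for x in row] for row in table]

    for _ in range(max_iter):
        row_means = [sum(row) / n_cols for row in filled]
        col_means = [sum(filled[i][j] for i in range(n_rows)) / n_rows
                     for j in range(n_cols)]
        grand = sum(row_means) / n_rows
        change = 0.0
        for i, j in missing:
            fitted = row_means[i] + col_means[j] - grand
            change = max(change, abs(fitted - filled[i][j]))
            filled[i][j] = fitted
        if change < tol:
            return filled
    raise RuntimeError("imputation did not converge")


# hypothetical data: 3 subjects x 3 conditions, one cell judged 'missing'
scores = [
    [10.0, 12.0, 14.0],
    [9.0, None, 15.0],
    [11.0, 13.0, 16.0],
]
print(additive_fill(scores)[1][1])
```

For a single missing cell this converges to Yates's least-squares estimate (rR + cC - G) / ((r - 1)(c - 1)), where r and c are the numbers of subjects (rows) and conditions (columns), R and C are the observed totals of the cell's row and column, and G is the observed grand total. Here that is (3*24 + 3*25 - 100) / 4 = 11.75, compared with 12.0 from the subject's own mean. In the classical missing-plot analysis the error degrees of freedom are then reduced by one for each filled cell.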

Ref:  Roderick Little and Donald Rubin, Statistical Analysis with
Missing Data, 2nd ed.

This reference discusses how to handle empty cells in a data table; it looks
to me as though replacing "outliers" with the mean falls in the same
category.  Most statistical packages can impute data.

Good luck on your thesis; you have a golden opportunity to teach your
advisor some useful concepts.


>Hi all

>I would like to remove outliers from my repetitive measures design, however
>making it missing removes the whole case of the subject.
>I've heard that it's possible to replace outliers with the mean of the
>group. I am wondering if it's a standard practice (to use for my thesis),
>and are there any good references?
>If it is acceptable, how should I compute the mean if there are several
>outliers in one group/DV, (or variable in SPSS)?

>Dennis

 
 
 


Post by Russell Mart » Fri, 11 Jul 2003 00:19:38



> Hi all

> I would like to remove outliers from my repetitive measures design, however
> making it missing removes the whole case of the subject.
> I've heard that it's possible to replace outliers with the mean of the
> group. I am wondering if it's a standard practice (to use for my thesis),
> and are there any good references?
> If it is acceptable, how should I compute the mean if there are several
> outliers in one group/DV, (or variable in SPSS)?

> Dennis

You have not told us whether you have investigated why certain data points appear to be
outliers, as opposed to valid values that don't fit your theory very well,
or how you would justify replacing them with any other value, including
the mean.  Does your adviser know about this?  Exactly what field are you
working in?

Regards,
Russell

 
 
 


Post by Bruce Weave » Fri, 11 Jul 2003 03:31:26



> Hi all

> I would like to remove outliers from my repetitive measures design, however
> making it missing removes the whole case of the subject.
> I've heard that it's possible to replace outliers with the mean of the
> group. I am wondering if it's a standard practice (to use for my thesis),
> and are there any good references?
> If it is acceptable, how should I compute the mean if there are several
> outliers in one group/DV, (or variable in SPSS)?

> Dennis

Several other respondents have addressed the issue of
whether or not you can legitimately treat your outliers as
"missing".  Assuming you can legitimately treat some of them
as missing, rather than impute values (mean or otherwise), I
would reformat the data file from WIDE to LONG, i.e., from
one row per subject with the DVs in several columns to
several rows per subject with one column for the DVs.  Then,
rather than using GLM REPEATED measures, use UNIANOVA.
Here is a sample file that shows how:

http://www.angelfire.com/wv/bwhomedir/spss/repmeas_ANOVA_with_long_fi...

This approach is essentially the same as using regression,
which Donald Burrill suggested in his post.  (UNIANOVA will
in fact give you the regression coefficients if you add
PARAMETER to the /PRINT line.)
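The WIDE-to-LONG restructuring can be sketched in plain Python (standard library only; the dictionary layout, the function name, and the subject and condition labels are hypothetical). In SPSS itself this is the kind of job the VARSTOCASES command does; the sketch only illustrates why the long layout helps: a subject with one missing score keeps their remaining rows instead of being dropped as a whole case.

```python
def wide_to_long(wide, conditions):
    """Turn {subject: [score per condition]} into (subject, condition, score) rows.

    Cells that are None (already judged 'missing') are left out, so a subject
    with one missing score still contributes all of their other scores.
    """
    long_rows = []
    for subject, scores in wide.items():
        if len(scores) != len(conditions):
            raise ValueError(f"{subject}: expected {len(conditions)} scores")
        for condition, score in zip(conditions, scores):
            if score is not None:
                long_rows.append((subject, condition, score))
    return long_rows


# hypothetical wide file: one row per subject, one column per condition
wide = {
    "s1": [10.0, 12.0, 14.0],
    "s2": [9.0, None, 15.0],
}
long_rows = wide_to_long(wide, ["c1", "c2", "c3"])
for row in long_rows:
    print(row)
```

The resulting (subject, condition, score) rows are what a univariate model with subject and condition as factors would then be fitted to; that model fit is not shown here.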

Cheers,
Bruce
--
Bruce Weaver

www.angelfire.com/wv/bwhomedir/

 
 
 
