Subchapter 12a. The Wilcoxon Signed-Rank Test
In Subchapter 11a we examined a non-parametric alternative to the t-test for independent samples. We now turn to consider a somewhat analogous alternative to the t-test for correlated samples. As indicated in the main body of Chapter 12, the correlated-samples t-test makes certain assumptions and can be meaningfully applied only insofar as these assumptions are met. Namely,
1. that the scale of measurement for X_A and X_B has the properties of an equal-interval scale;
2. that the differences between the paired values of X_A and X_B have been randomly drawn from the source population; and
3. that the source population from which these differences have been drawn can be reasonably supposed to have a normal distribution.
Here again, it is not simply a question of good manners or good taste. If there is one or more of these assumptions that we cannot reasonably suppose to be satisfied, then the t-test for correlated samples cannot be legitimately applied.
Of all the correlated-samples situations that run afoul of these assumptions, I expect the most common are those in which the scale of measurement for X_A and X_B cannot be assumed to have the properties of an equal-interval scale. The most obvious example would be the case in which the measures for X_A and X_B derive from some sort of rating scale. In any event, when the data within two correlated samples fail to meet one or another of the assumptions of the t-test, an appropriate non-parametric alternative can often be found in the Wilcoxon Signed-Rank Test.
To illustrate, suppose that 16 students in an introductory statistics course are presented with a number of questions (of the sort you encountered in Chapters 5 and 6) concerning basic probabilities. In each instance, the question takes the form [...]. However, the students are not allowed to perform calculations. Their answers must be immediate, based only on their raw intuitions. They are instructed to frame each answer in terms of a zero to 100 percent rating scale, with 0% corresponding to P=0.0, 27% corresponding to P=0.27, and so forth. They are also told that they can give non-integer answers if they wish to make really fine-grained distinctions; for example, 49.0635...%. (As it turns out, none do.)
The instructor of the course is particularly interested in students' responses to two of the questions, which we will designate as question A and question B. He reasons that if students have developed a good, solid understanding of the basic concepts, they will tend to give higher probability ratings for question A than for question B; whereas, if they were sleeping through that portion of the course, their answers will be mere shots in the dark and there will be no overall tendency one way or the other. The instructor's hypothesis is of course directional: he expects that his students have mastered the concepts well enough to sense, if only intuitively, that the event described in question A has the higher probability. The following table shows the probability ratings of the 16 subjects for each of the two questions.

Subj.    X_A    X_B    X_A - X_B
  1       78     78        0
  2       24     24        0
  3       64     62       +2
  4       45     48       -3
  5       64     68       -4
  6       52     56       -4
  7       30     25       +5
  8       50     44       +6
  9       64     56       +8
 10       50     40      +10
 11       78     68      +10
 12       22     36      -14
 13       84     68      +16
 14       40     20      +20
 15       90     58      +32
 16       72     32      +40

mean difference = +7.75

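The arithmetic behind the last column and the mean difference can be reproduced with a few lines of Python. This is only a sketch for checking the table; the variable names (`x_a`, `x_b`, `diffs`, `mean_diff`) are illustrative choices and do not come from the text.

```python
# Probability ratings (in percent) for the 16 subjects, copied from the table.
x_a = [78, 24, 64, 45, 64, 52, 30, 50, 64, 50, 78, 22, 84, 40, 90, 72]
x_b = [78, 24, 62, 48, 68, 56, 25, 44, 56, 40, 68, 36, 68, 20, 58, 32]

# Paired differences X_A - X_B, one per subject.
diffs = [a - b for a, b in zip(x_a, x_b)]

# Mean of all 16 differences, zero differences included.
mean_diff = sum(diffs) / len(diffs)

print(diffs)
print(mean_diff)
```

A positive mean indicates that ratings ran higher on average for question A, which is the direction the instructor predicted.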
Voilà! The observed results are consistent with the hypothesis. The probability ratings do on average end up higher for question A than for question B. Now to determine whether the degree of the observed difference reflects anything more than some lucky guessing.
Mechanics
The Wilcoxon test begins by transforming each instance of X_A - X_B into its absolute value, which is accomplished simply by removing all the positive and negative signs. Thus the entries in column 4 of the table below become those of column 5. In most applications of the Wilcoxon procedure, the cases in which there is zero difference between X_A and X_B are at this point eliminated from consideration, since they provide no useful information, and the remaining absolute differences are then ranked from lowest to highest, with tied ranks included where appropriate. (The guidelines for assigning tied ranks are described in Subchapter 11a in connection with the Mann-Whitney test.) The result of this step is shown in column 6. The entries in column 7 will then give you the clue to why the Wilcoxon procedure is known as the signed-rank test. Here you see the same entries as in column 6, except now we have re-attached to each rank the positive or negative sign that was removed from the X_A - X_B difference in the transition from column 4 to column 5.

  1       2      3         4             5              6             7
                        original      absolute    rank of absolute   signed
Subj.    X_A    X_B    X_A - X_B     X_A - X_B       X_A - X_B        rank
  1       78     78        0             0              ---           ---
  2       24     24        0             0              ---           ---
  3       64     62       +2             2               1             +1
  4       45     48       -3             3               2             -2
  5       64     68       -4             4               3.5           -3.5
  6       52     56       -4             4               3.5           -3.5
  7       30     25       +5             5               5             +5
  8       50     44       +6             6               6             +6
  9       64     56       +8             8               7             +7
 10       50     40      +10            10               8.5           +8.5
 11       78     68      +10            10               8.5           +8.5
 12       22     36      -14            14              10            -10
 13       84     68      +16            16              11            +11
 14       40     20      +20            20              12            +12
 15       90     58      +32            32              13            +13
 16       72     32      +40            40              14            +14

W = 67.0
N = 14

The sum of the signed ranks in column 7 is a quantity symbolized as W, which for the present example is equal to 67. Two of the original 16 subjects were removed from consideration because of the zero difference they produced in columns 4 and 5, so our observed value of W is based on a sample of size N=14.
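The steps from column 4 through column 7 can be sketched in Python as follows. This is a minimal illustration written for this example, not a reference implementation: the helper names `average_ranks` and `signed_rank_w` are hypothetical, and tied values are assumed to receive the mean of the ranks they occupy, as in the table.

```python
def average_ranks(values):
    """Rank values from lowest to highest; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j are 0-based; the ranks they occupy are i+1..j+1.
        tied_rank = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def signed_rank_w(x_a, x_b):
    """Return (W, N, signed ranks) after dropping pairs with zero difference."""
    diffs = [a - b for a, b in zip(x_a, x_b) if a != b]   # column 4, zeros removed
    ranks = average_ranks([abs(d) for d in diffs])        # columns 5 and 6
    signed = [r if d > 0 else -r for d, r in zip(diffs, ranks)]  # column 7
    return sum(signed), len(diffs), signed


x_a = [78, 24, 64, 45, 64, 52, 30, 50, 64, 50, 78, 22, 84, 40, 90, 72]
x_b = [78, 24, 62, 48, 68, 56, 25, 44, 56, 40, 68, 36, 68, 20, 58, 32]

w, n, signed = signed_rank_w(x_a, x_b)
print(w, n)  # 67.0 14, matching the table
```

The ranking is written out by hand here so that the treatment of ties is visible; a library ranking routine with an "average" tie rule would serve the same purpose.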
Logic & Procedure
Here again, as with the Mann-Whitney test, the effect of replacing the original measures with ranks is two-fold. The first is that it brings us to focus only on the ordinal relationships among the measures ("greater than," "less than," and "equal to"), with no illusion that these measures have the properties of an equal-interval scale. And the second is that it transforms the data array into a kind of closed system whose properties can then be known by dint of sheer logic.
For openers, we know that the sum of the N unsigned ranks in column 6 will be equal to
\[ \text{sum} = \frac{N(N+1)}{2} = \frac{14(14+1)}{2} = 105 \qquad \text{(formula from Subchapter 11a)} \]
Thus the maximum possible positive value of W (in the case where all signs are positive) is W=+105, and the maximum possible negative value (in the case where all signs are negative) is W=-105. For the present example, a preponderance of positive signs among the signed ranks would suggest that subjects tend to rate the probability higher for question A than for question B. A preponderance of negative signs would suggest the opposite.
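As a quick check on these limits, the following sketch computes the rank sum for N=14 and confirms that the tied ranks of the present example add up to the same total. The function name `rank_sum` is an illustrative choice, not something defined in the text.

```python
def rank_sum(n):
    """Sum of the ranks 1..n, i.e. n(n+1)/2."""
    return n * (n + 1) // 2


# Unsigned ranks from column 6 of the example, tied ranks included.
ranks = [1, 2, 3.5, 3.5, 5, 6, 7, 8.5, 8.5, 10, 11, 12, 13, 14]

w_max = rank_sum(len(ranks))   # every sign positive
w_min = -w_max                 # every sign negative

print(w_max, w_min, sum(ranks))
```

Because tied values share the mean of the ranks they occupy, ties leave the total unchanged, so the limits of W depend only on N.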
The null hypothesis is that there is no tendency in either direction, hence that the numbers of positive and negative signs will be approximately equal. In that event, we would expect the value of W to approximate zero, within the limits of random variability.
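The expectation that W centers on zero can be checked by brute force for the ranks of the present example. The sketch below assumes that, under the null hypothesis, each rank is equally likely to carry a plus or a minus sign independently of the others; that assumption and the variable names are introduced here for illustration only.

```python
from itertools import product

# Unsigned ranks from column 6 of the example (N = 14).
ranks = [1, 2, 3.5, 3.5, 5, 6, 7, 8.5, 8.5, 10, 11, 12, 13, 14]

# Every possible assignment of + and - signs to the 14 ranks: 2**14 cases.
w_values = [
    sum(sign * rank for sign, rank in zip(signs, ranks))
    for signs in product((1, -1), repeat=len(ranks))
]

# With all sign patterns equally likely, the values of W average out to zero.
mean_w = sum(w_values) / len(w_values)

print(len(w_values), mean_w, min(w_values), max(w_values))
```

Every sign pattern has a mirror image with all signs reversed, so the values of W are symmetric about zero and range between the two limits noted earlier.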
For fairly small values of N, the properties of the sampling distribution of W can be figured out through simple (if tedious) enumeration of all the possibilities. Suppose, for example, that we had only N=3 subjects, whose absolute (unsigned) X_A - X_B differences produced the untied ranks 1, 2, and 3. The following table shows the possible combinations of plus and