Project 1b: Descriptive Statistics
By Jeff Royce
6/1/01 – 8/31/01
For this project, I chose to analyze the Dow Jones Industrial Average (DJIA) over a 3-month period. I looked at the daily highs, lows, closing values and trading volume in the hope of determining whether it is possible to reasonably predict the closing value for any particular day.
I collected data for the 3-month period, including each day's high, low and closing values, as well as the volume. From this data, I also calculated the difference between the closing value and both the high and the low value for that day. I then used descriptive statistics to look for a correlation between any of these values that might help in determining future closing values.
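As a rough illustration of the two derived columns and of how a correlation coefficient is computed, here is a minimal Python sketch. The rows are made-up placeholder values, not the DJIA data used in this project, and the sign convention (High minus Close, Close minus Low) is an assumption, since the original spreadsheet formulas are not given.

```python
from math import sqrt

# Placeholder rows (label, high, low, close). These are NOT actual DJIA figures.
rows = [
    ("day 1", 10500.0, 10400.0, 10450.0),
    ("day 2", 10600.0, 10450.0, 10580.0),
    ("day 3", 10550.0, 10380.0, 10400.0),
    ("day 4", 10480.0, 10300.0, 10470.0),
]

# Assumed sign convention: both differences are non-negative distances.
high_close_diff = [high - close for _, high, _, close in rows]
low_close_diff = [close - low for _, _, low, close in rows]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


closes = [close for _, _, _, close in rows]
print("High/Close Diff:", high_close_diff)
print("Low/Close Diff: ", low_close_diff)
print("r(Close, High/Close Diff):", round(pearson(closes, high_close_diff), 5))
```

Applying a function like `pearson` to every pair of columns produces a full correlation table, which is what a spreadsheet's correlation tool does in one step.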
Below is a correlation table that was developed from the various data collected.
| | Date | Close | Volume | High | Low | High/Close Diff | Low/Close Diff |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Date | 1 | | | | | | |
| Close | -0.81053 | 1 | | | | | |
| Volume | -0.17846 | 0.06030 | 1 | | | | |
| High | -0.84855 | 0.97612 | 0.11244 | 1 | | | |
| Low | -0.80409 | 0.97538 | 0.05355 | 0.98549 | 1 | | |
| High/Close Diff | 0.00970 | -0.32060 | 0.21426 | -0.10718 | -0.16706 | 1 | |
| Low/Close Diff | -0.15001 | 0.25815 | 0.03866 | 0.10565 | 0.03873 | -0.72083 | 1 |
The purpose of the correlation table is to see what linear relationships exist between the various numerical variables. To find the variable that best helps determine the closing value of the DJIA, I look under the column for “Close” and see which numerical variable has the strongest correlation with the Close, that is, the value closest to 1 or -1. As you can see, the best relationships exist between the Close and both the High and the Low.
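The same reading of the table can be sketched in a few lines of Python. The coefficients below are copied from the correlation table; ranking by absolute value is a choice made for this sketch, since a strongly negative coefficient is as informative as a strongly positive one.

```python
# Correlations with Close, copied from the correlation table.
corr_with_close = {
    "Date": -0.81053,
    "Volume": 0.06030,
    "High": 0.97612,
    "Low": 0.97538,
    "High/Close Diff": -0.32060,
    "Low/Close Diff": 0.25815,
}

# Rank by absolute value: strength matters, the sign only gives the direction.
ranked = sorted(corr_with_close.items(), key=lambda kv: abs(kv[1]), reverse=True)

for name, r in ranked:
    print(f"{name:16s} r = {r:+.5f}")
```

High and Low come out on top, with Volume last, which matches the reading of the table given above.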
Prior to developing the correlation table, I anticipated that the Close would correlate with either the volume traded per day or the numerical difference between the Close and the High and Low values for the day. However, the correlation between Close and volume is very low (0.06030), and the correlations between Close and the High/Close difference (-0.32060) and between Close and the Low/Close difference (0.25815) are too low to use.
Using the correlation between the High and Close of 0.97612 and the correlation between the Low and Close of 0.97538, I developed scatter plots to show graphical representations of the correlations.


Along with the scatter plots, the linear equation and the R-squared value were determined for each. The R-squared values for both Close vs. Low (0.9514) and Close vs. High (0.9528) are very high, so either the Low or the High can be used to help determine the closing value. The next step is to determine the mean and the standard deviation of the closing values. The purpose of calculating the standard deviation is to see how close we can actually come in predicting the Close.
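For a straight-line fit with a single predictor, R-squared is the square of the correlation coefficient, so the R-squared values quoted above can be cross-checked against the correlation table. A minimal sketch, using only figures reported in this document:

```python
# Correlation coefficients with Close, from the correlation table.
r_high = 0.97612
r_low = 0.97538

# For simple (one-predictor) linear regression, R-squared equals r squared.
r2_high = r_high ** 2
r2_low = r_low ** 2

print(f"Close vs. High: R-squared = {r2_high:.4f}")
print(f"Close vs. Low:  R-squared = {r2_low:.4f}")
```

Rounded to four places, these agree with the 0.9528 and 0.9514 reported for the scatter plots.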
The standard deviation has an established rule of thumb used in anticipating future observations (predictions) of data, which applies when the data are approximately normally distributed. Approximately 68% of observations are going to be within 1 standard deviation of the mean, approximately 95% within 2 standard deviations, and approximately 99.7% within 3 standard deviations.
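These percentages come from the normal distribution. As a quick check, and assuming the data are normally distributed (which real market data may not be), Python's standard library can reproduce them:

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.2%}")
```

The exact figures are slightly different from the rounded 68%, 95% and 99.7%, which is why the rule is described as approximate.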
Using Excel’s statistical data command, I generated the following information for the Close:
| Statistic | Close |
| --- | --- |
| Mean | 10502.82108 |
| Standard Error | 32.57585809 |
| Median | 10472.48 |
| Mode | #N/A |
| Standard Deviation | 262.6349643 |
| Sample Variance | 68977.12448 |
| Kurtosis | 0.563081387 |
| Skewness | 0.580358015 |
| Range | 1256.26 |
| Minimum | 9919.58 |
| Maximum | 11175.84 |
| Sum | 682683.37 |
| Count | 65 |
| Confidence Level (95.0%) | 65.07769894 |
From the table, we can see that the mean is 10502.821 and the standard deviation is 262.635. What this means is that, if the closes follow the rule of thumb, about 68% of them will fall within 262.635 of 10502.821, so a prediction for the Close inside that range lands where roughly 68% of closes are expected. Likewise, about 95% of closes will fall within 525.27 of 10502.821.
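The ranges implied by these numbers can be worked out directly. A small sketch using the rounded mean and standard deviation quoted above (the function name `band` is just an illustrative choice):

```python
mean_close = 10502.821
sd_close = 262.635


def band(k):
    """Lower and upper bounds of the range within k standard deviations of the mean."""
    return (mean_close - k * sd_close, mean_close + k * sd_close)


for k in (1, 2, 3):
    lower, upper = band(k)
    print(f"within {k} standard deviation(s): {lower:.3f} to {upper:.3f}")
```

So the 1-standard-deviation range runs from 10240.186 to 10765.456, and the 2-standard-deviation range from 9977.551 to 11028.091.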
Now, using the equation from the Close vs. High scatter plot, I can predict the closing price for any given day. The linear equation was determined to be y = 1.0246x - 392.08, with R-squared = 0.9528. If, for example, the high value is predicted to be 10,654.00 for the day, I can plug this number into the formula to estimate what the close is going to be. Substituting 10,654 for “x” in the formula gives y = 10,524.0084. Now, compare this value to the mean value of the Close, 10,502.82108. The difference between these two values is 21.18732. This difference is within 1 standard deviation of the mean, so my prediction of the close for the day falls in the range where about 68% of closes are expected.
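The worked example above can be reproduced with a short Python sketch. The slope, intercept, mean and standard deviation are the values reported in this document; the function name `predict_close` is only illustrative.

```python
# Fitted line from the Close vs. High scatter plot: y = 1.0246x - 392.08
SLOPE = 1.0246
INTERCEPT = -392.08

# Descriptive statistics for the Close, from the Excel output.
MEAN_CLOSE = 10502.82108
SD_CLOSE = 262.6349643


def predict_close(high):
    """Estimate the closing value from a given (predicted) high."""
    return SLOPE * high + INTERCEPT


predicted = predict_close(10654.0)
distance_from_mean = abs(predicted - MEAN_CLOSE)

print(f"predicted close:    {predicted:.4f}")
print(f"distance from mean: {distance_from_mean:.5f}")
print(f"within 1 standard deviation: {distance_from_mean <= SD_CLOSE}")
```

Note that the estimate is only as good as the predicted high that goes into it; the formula does not say how to obtain that high in advance.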
This provides a good example of how decision statistics can be applied to the DJIA. In order to understand the “rules of thumb” for the mean and standard deviation, I compared the highs and Closes for the three-month period against the ranges set by the “rules of thumb”. The following frequency table compares the actual percentages with the percentages that the standard deviation “rules of thumb” say we should get.
Frequency Table

Closing Mean: 10502.821
Standard Deviation: 262.635

| Category | Upper Limit | Frequency |
| --- | --- | --- |
| more than 3 std. dev. below mean | 9714.916 | 0 |
| between 2 and 3 std. dev. below mean | 9977.551 | 2 |
| between 1 and 2 std. dev. below mean | 10240.186 | 5 |
| between mean and 1 std. dev. below mean | 10502.821 | 29 |
| between mean and 1 std. dev. above mean | 10765.456 | 20 |
| between 1 and 2 std. dev. above mean | 11028.091 | 5 |
| between 2 and 3 std. dev. above mean | 11290.726 | 4 |
| more than 3 std. dev. above mean | 11553.361 | 0 |

Percentages within "n" std. dev. of mean

| n | 1 | 2 | 3 |
| --- | --- | --- | --- |
| Actual | 75.38% | 90.77% | 100.00% |
| Rule of Thumb | 68.00% | 95.00% | 99.70% |
As you can see, the actual percentages do not exactly match those dictated by the “rules of thumb”. Within +/- 1 standard deviation of the mean, the actual percentage is 7.38 percentage points higher than what the rules state should occur. Within +/- 2 standard deviations of the mean, it is 4.23 percentage points lower than what the rules show should occur. I think these percentages would be closer together if the original sample were larger and more encompassing than the 3-month period chosen. I do believe that the results are close enough to show that the rules are valid and can be applied.
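The actual percentages can be recomputed from the frequency counts. A minimal sketch, with the counts copied from the frequency table in order from "more than 3 below" to "more than 3 above":

```python
# Frequencies per category, ordered from lowest to highest range.
counts = [0, 2, 5, 29, 20, 5, 4, 0]
rule_of_thumb = {1: 68.0, 2: 95.0, 3: 99.7}

total = sum(counts)
middle = len(counts) // 2  # index of the first category above the mean


def pct_within(n):
    """Percentage of observations within n standard deviations of the mean."""
    # The n categories on each side of the mean make up the +/- n range.
    return 100.0 * sum(counts[middle - n:middle + n]) / total


for n, expected in rule_of_thumb.items():
    actual = pct_within(n)
    print(f"n={n}: actual {actual:.2f}%, rule {expected:.2f}%, gap {actual - expected:+.2f}")
```

With only 65 observations, each trading day shifts a percentage by about 1.5 points, so gaps of this size are not surprising.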
Unfortunately, I think much more goes into trying to predict the closing value of the DJIA. Obviously, if there were a way to predict the market, someone would have made millions selling it on late-night TV infomercials. I do think I have shown that the decision statistics I used work in analyzing the data and making a decision, but on something as large as the market, it is very difficult to get conclusive results.