Project 1b:  Descriptive Statistics

By Jeff Royce

 

 

Analysis of the DJIA for a 3 Month Period

6/1/01 – 8/31/01

 

 

 

For this project, I chose to analyze the Dow Jones Industrial Average (DJIA) over a three-month period.  I looked at the daily highs, lows, closing values and trading volume in the hope of determining whether it is possible to reasonably predict the closing value for any particular day.

 

I collected daily data for the three-month period, including the high, low and closing values, as well as the volume.  From this data, I also calculated the difference between the closing value and both the high and the low value for each day.  I then looked for a correlation between any of these values that might help in determining future closing values.

 

Below is a correlation table that was developed from the various data collected.

 

 

                      Date     Close    Volume      High       Low  High/Close Diff  Low/Close Diff
Date                     1
Close             -0.81053         1
Volume            -0.17846   0.06030         1
High              -0.84855   0.97612   0.11244         1
Low               -0.80409   0.97538   0.05355   0.98549         1
High/Close Diff    0.00970  -0.32060   0.21426  -0.10718  -0.16706                1
Low/Close Diff    -0.15001   0.25815   0.03866   0.10565   0.03873         -0.72083               1

 

The purpose of the correlation table is to show what linear relationships exist between the various numerical variables.  To find the best way to estimate the closing value of the DJIA, I look down the column for “Close” and see which numerical variable has the strongest correlation with the Close.  As you can see, the strongest relationships are between the Close and the High (0.97612) and between the Close and the Low (0.97538).
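As a rough illustration of how one entry in a correlation table is computed, the sketch below implements the Pearson correlation coefficient in plain Python.  The function name `pearson_r` and the sample values are assumptions made for the example; the numbers are made up and are not the DJIA data used in this project.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up illustrative values, NOT the project's DJIA data.
highs = [10700.0, 10650.0, 10580.0, 10620.0, 10540.0]
closes = [10610.0, 10590.0, 10500.0, 10570.0, 10480.0]

r = pearson_r(highs, closes)
print(f"correlation between High and Close: {r:.5f}")
```

A value near 1 or -1 indicates a strong linear relationship, and a value near 0 indicates a weak one, which is how the entries in the table are read.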

 

Before developing the correlation table, I anticipated that the Close would correlate either with the volume traded per day or with the numerical difference between the Close and the High or Low value for the day.  However, the correlation between Close and volume is very low (0.06030), and the correlations between the Close and the High/Close difference (-0.32060) and between the Close and the Low/Close difference (0.25815) are too low to use.

 

Using the correlation between the high and close of 0.97612 and the correlation of the low and close of 0.97538, I developed scatter plots to show graphical representations of the correlations.

 

 


 


 

 

 


Along with the scatter plots, I determined the linear equation and the R-squared value for each relationship.  As you can see, the R-squared values for both the Close vs. Low (0.9514) and the Close vs. High (0.9528) are very high.  Either relationship can be used to help determine the closing value.  The next step is to determine the mean and the standard deviation of the closing values.  The purpose of calculating the standard deviation is to see how close we can actually come in predicting the close.
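For a straight-line fit with a single predictor, R-squared equals the square of the correlation coefficient, so the R-squared values can be checked against the correlation table.  The short sketch below does that arithmetic; the variable names are chosen only for this example.

```python
# Correlations with the Close, taken from the correlation table above.
r_close_high = 0.97612
r_close_low = 0.97538

# In a one-predictor linear regression, R-squared is the correlation squared.
r2_high = r_close_high ** 2
r2_low = r_close_low ** 2

print(f"Close vs. High: R-squared = {r2_high:.4f}")  # 0.9528
print(f"Close vs. Low:  R-squared = {r2_low:.4f}")   # 0.9514
```

Both results agree with the R-squared values reported for the scatter plots.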

 

The standard deviation has an established rule of thumb that describes how observations are spread around the mean, and I use it here as a guide for future observations (predictions).  For data that are roughly bell-shaped, approximately 68% of observations are going to be within 1 standard deviation of the mean.  Approximately 95% of observations are going to be within 2 standard deviations of the mean.  And approximately 99.7% of observations are going to be within 3 standard deviations of the mean.
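These percentages come from the normal (bell-shaped) distribution.  As a minimal sketch, assuming the data are normally distributed, they can be reproduced with `NormalDist` from Python's standard `statistics` module; the helper name `share_within` is an assumption made for the example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Probability that a normal value lies within k standard deviations of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.2%}")
```

The results are close to 68%, 95% and 99.7%, which is where the rounded rule-of-thumb figures come from.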

 

Using Excel’s statistical data command, I generated the following information for the Close:

 

Close

Mean                       10502.82108
Standard Error             32.57585809
Median                     10472.48
Mode                       #N/A
Standard Deviation         262.6349643
Sample Variance            68977.12448
Kurtosis                   0.563081387
Skewness                   0.580358015
Range                      1256.26
Minimum                    9919.58
Maximum                    11175.84
Sum                        682683.37
Count                      65
Confidence Level(95.0%)    65.07769894
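Several of the figures in this summary are tied together by simple formulas.  The sketch below recomputes them from the other values, assuming the usual definitions (mean = sum / count, variance = standard deviation squared, standard error = standard deviation / square root of the count).  The variable names are chosen only for this example.

```python
from math import sqrt

# Values copied from the summary table for the Close.
count = 65
total = 682683.37
std_dev = 262.6349643
minimum = 9919.58
maximum = 11175.84

mean = total / count                    # Sum / Count
sample_variance = std_dev ** 2          # square of the standard deviation
standard_error = std_dev / sqrt(count)  # standard deviation / sqrt(Count)
value_range = maximum - minimum         # Maximum - Minimum

print(f"Mean            = {mean:.5f}")
print(f"Sample Variance = {sample_variance:.2f}")
print(f"Standard Error  = {standard_error:.5f}")
print(f"Range           = {value_range:.2f}")
```

Each recomputed value agrees with the corresponding line of the summary to the precision shown there.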

 

 

From the table, we can see that the mean is 10502.821 and the standard deviation is 262.635.  By the rule of thumb, this means that about 68% of closing values should fall within 262.635 of 10502.821, so a prediction for the Close in that range sits where roughly 68% of closes are expected to be.  Likewise, about 95% of closing values should fall within 525.27 of 10502.821.
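The ranges implied by the rule of thumb follow directly from the mean and the standard deviation.  As a small sketch (the function name `band` is an assumption made for the example), the limits for 1, 2 and 3 standard deviations can be computed as follows.

```python
# Mean and standard deviation of the Close, rounded as in the text.
mean_close = 10502.821
std_dev = 262.635

def band(k):
    """Return the (lower, upper) limits k standard deviations around the mean."""
    return (mean_close - k * std_dev, mean_close + k * std_dev)

for k in (1, 2, 3):
    lower, upper = band(k)
    print(f"{k} standard deviation(s): {lower:.3f} to {upper:.3f}")
```

For 1 standard deviation this gives 10240.186 to 10765.456, and for 2 standard deviations it gives 9977.551 to 11028.091.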

 

Now, using the equation from the Close vs. High scatter plot, I can predict the closing price for any given day.  The linear equation was determined to be y = 1.0246x - 392.08, with R-squared = 0.9528.  If, for example, the high value is predicted to be 10,654.00 for the day, I can put 10,654 in place of “x” in the formula, which gives y = 10,524.0084 as the predicted close.  Now, compare this value to the mean value of the Close, 10,502.82108.  The difference between these two values is 21.18732.  This difference is within 1 standard deviation (262.635), so the predicted close falls in the range where, by the rule of thumb, about 68% of closing values should fall.
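The arithmetic in this example can be laid out step by step.  The sketch below uses the slope and intercept quoted above and the mean and standard deviation from the summary table; the function name `predict_close` is an assumption made for the example.

```python
# Fitted line from the Close vs. High scatter plot: y = 1.0246x - 392.08
SLOPE = 1.0246
INTERCEPT = -392.08

# Mean and standard deviation of the Close from the summary table.
MEAN_CLOSE = 10502.82108
STD_DEV = 262.6349643

def predict_close(high):
    """Predicted close for a given daily high, using the fitted line."""
    return SLOPE * high + INTERCEPT

predicted = predict_close(10654.00)
distance = abs(predicted - MEAN_CLOSE)

print(f"predicted close:    {predicted:.4f}")
print(f"distance from mean: {distance:.5f}")
print(f"within 1 standard deviation: {distance <= STD_DEV}")
```

The predicted close is 10,524.0084 and its distance from the mean is 21.18732, well inside one standard deviation.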

 

This provides a good example of how decision statistics can be applied to the DJIA.  In order to understand the “rules of thumb” for the mean and standard deviation, I compared the highs and Closes for the three-month period against the ranges set by the “rules of thumb”.  The following frequency table compares the percentages of actual highs with the percentages that the standard deviation “rules of thumb” say we should get.

 

 

 

 

 

Frequency Table

Closing Mean          10502.821
Standard Deviation    262.635

Category                                  Upper Limit    Frequency
more than 3 st. dev. below mean           9714.916       0
between 2 and 3 st. dev. below mean       9977.551       2
between 1 and 2 st. dev. below mean       10240.186      5
between mean and 1 st. dev. below mean    10502.821      29
between mean and 1 st. dev. above mean    10765.456      20
between 1 and 2 st. dev. above mean       11028.091      5
between 2 and 3 st. dev. above mean       11290.726      4
more than 3 st. dev. above mean           11553.361      0

Percentages within "n" st. dev. of mean

n                1          2          3
Actual           75.38%     90.77%     100.00%
Rule of Thumb    68.00%     95.00%     99.70%

 

 

As you can see, the actual percentages do not exactly match those given by the “rules of thumb”.  Within 1 standard deviation of the mean, the actual percentage is 7.38 percentage points higher than what the rules state should occur.  Within 2 standard deviations of the mean, it is 4.23 percentage points lower than what the rules show should occur.  I think these percentages would be closer together if the original sample were larger and covered more than the 3-month period chosen.  I do believe that the results are close enough to show that the rules are valid and can be applied.
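The actual percentages follow from the frequency counts: the observations in the bands within "n" standard deviations of the mean are added up and divided by the 65 observations.  The sketch below shows that calculation; the function name `percent_within` is an assumption made for the example.

```python
# Frequencies from the frequency table, ordered from the lowest band to the highest.
frequencies = [0, 2, 5, 29, 20, 5, 4, 0]
total = sum(frequencies)  # 65 observations

def percent_within(n):
    """Percentage of observations within n standard deviations of the mean."""
    middle = len(frequencies) // 2  # index of the first band above the mean
    # Take n bands on each side of the mean.
    inside = sum(frequencies[middle - n : middle + n])
    return 100.0 * inside / total

rule_of_thumb = {1: 68.0, 2: 95.0, 3: 99.7}

for n in (1, 2, 3):
    actual = percent_within(n)
    difference = actual - rule_of_thumb[n]
    print(f"n = {n}: actual {actual:.2f}%, rule {rule_of_thumb[n]:.2f}%, "
          f"difference {difference:+.2f} points")
```

This reproduces the 75.38%, 90.77% and 100.00% figures, along with the differences of +7.38 and -4.23 percentage points.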

 

Unfortunately, I think much more goes into trying to predict the closing value of the DJIA.  Obviously, if there were a way to predict the market, someone would have made millions selling it on late-night TV infomercials.  I do think I have shown that the decision statistics I used work for analyzing the data and making a decision, but for something as large as the market, it is very difficult to get conclusive results.