1.6 A medical researcher wants to estimate the survival time of a patient after the onset of a particular type of cancer and after a particular regimen of radiotherapy.
a. What is the variable of interest to the medical researcher?
b. Is the variable in part a qualitative, quantitative discrete, or quantitative continuous?
c. Identify the population of interest to the medical researcher.
d. Describe how the researcher could select a sample from the population.
e. What problems might arise in sampling from this population?
1.42 Are some cities more windy than others? Does
| 8.9 | 7.1 | 9.1 | 8.9 | 10.2 | 12.4 | 11.8 | 10.9 | 12.8 | 10.4 | 
| 10.5 | 10.7 | 8.6 | 10.7 | 10.3 | 8.4 | 7.7 | 11.3 | 7.7 | 9.6 | 
| 7.9 | 10.6 | 9.3 | 9.1 | 7.8 | 6.0 | 8.3 | 8.8 | 9.2 | 11.5 | 
| 10.5 | 8.8 | 35.2 | 8.2 | 9.3 | 10.5 | 9.5 | 6.2 | 9.0 | 7.9 | 
| 9.6 | 9.7 | 8.8 | 7.0 | 8.7 | 8.9 | 8.9 | 9.4 |  |  | 
a. Construct a relative frequency histogram for the data. (HINT: Choose the class boundaries without including the value x = 35.2 in the range of values.)
b. The value x = 35.2 was recorded at
c. The average wind speed in
1.44 In July of 2000, 22.4 million teenagers and young adults worked, a substantial number more than in April when school was still in session. Many of these young people worked in amusement and theme parks, whose average number of employees jumps dramatically during the summer months. Here are the most common injuries suffered on the job by kids under 18:
| Most Common Injury | Percentage | 
| Bruises and contusions | 14% | 
| Cuts and lacerations | 13% | 
| Fractures | 8% | 
| Heat burns | 9% | 
| Sprains and strains | 33% | 
a. Are all possible injuries accounted for in the table? Is another category necessary?
b. Create a pie chart to describe the data.
c. Construct a relative frequency histogram for the data.
d. Rearrange the bars in part c so that the categories are ranked from the largest percentage to the smallest.
e. Which of the three methods of presentation – part b, c, or d – is the most effective?
1.50 A group of 50 biomedical students recorded their pulse rates by counting the number of beats for 30 seconds and multiplying by 2.
| 80 | 70 | 88 | 70 | 84 | 66 | 84 | 82 | 66 | 42 | 
| 52 | 72 | 90 | 70 | 96 | 84 | 96 | 86 | 62 | 78 | 
| 60 | 82 | 88 | 54 | 66 | 66 | 80 | 88 | 56 | 104 | 
| 84 | 84 | 60 | 84 | 88 | 58 | 72 | 84 | 68 | 74 | 
| 84 | 72 | 62 | 90 | 72 | 84 | 72 | 110 | 100 | 58 | 
a. Why are all of the measurements even numbers?
b. Draw a stem and leaf plot to describe the data, splitting each stem in two lines.
c. Construct a relative frequency histogram for the data.
d. Write a sentence to describe the distribution of the student pulse rates.
N 1. A scientist from the Environmental Protection Agency took samples of the toxic substance polychlorinated biphenyl (PCB) levels from the soil at 60 different waste disposal facilities located throughout the
| 57 | 53 | 51 | 55 | 54 | 47 | 47 | 45 | 58 | 54 | 
| 46 | 45 | 48 | 48 | 50 | 42 | 53 | 53 | 46 | 50 | 
| 54 | 53 | 47 | 56 | 41 | 58 | 51 | 44 | 53 | 53 | 
| 41 | 58 | 48 | 54 | 52 | 48 | 47 | 48 | 45 | 47 | 
| 53 | 52 | 54 | 46 | 46 | 55 | 42 | 49 | 42 | 49 | 
Draw a stem-and-leaf diagram for the data.
2.2 You are given n = 8 measurements: 3, 2, 5, 6, 4, 4, 3, 5.
a. Find $\bar{x}$.
b. Find the median m.
c. Based on the results of parts a and b, are the measurements symmetric or skewed? Draw a dotplot to confirm your answer.
2.14 You are given n = 8 measurements: 3, 1, 5, 6, 4, 4, 3, 5.
a. Calculate the range.
b. Calculate the sample mean.
c. Calculate the sample variance and standard deviation.
d. Compare the range and the standard deviation. The range is approximately how many standard deviations?
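The hand calculations in parts a through d can be checked with a short script. The sketch below is one possible check, not a prescribed method; it uses Python's standard `statistics` module, whose `variance` and `stdev` functions divide by n - 1, and the variable names are illustrative only.

```python
import statistics

# Measurements from Exercise 2.14
data = [3, 1, 5, 6, 4, 4, 3, 5]

data_range = max(data) - min(data)      # part a
mean = statistics.mean(data)            # part b
variance = statistics.variance(data)    # part c (sample variance, divides by n - 1)
std_dev = statistics.stdev(data)        # part c
ratio = data_range / std_dev            # part d: range measured in units of s

print(f"range = {data_range}, mean = {mean}")
print(f"s^2 = {variance:.4f}, s = {std_dev:.4f}, range/s = {ratio:.2f}")
```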
2.26 A group of experimental animals are infected with a particular form of bacteria, and their survival time is found to average 32 days, with a standard deviation of 36 days. You can use the Empirical Rule to see why the distribution of survival times could not be mound-shaped.
a. Find the value of x that is exactly one standard deviation below the mean.
b. If the distribution is in fact mound-shaped, approximately what percentage of the measurements should be less than the value of x found in part a?
c. Since the variable being measured is time, is it possible to find any measurements that are more than one standard deviation below the mean?
d. Use your answers in parts b and c to explain why the data distribution cannot be mound-shaped.
2.38 The weights (in pounds) of the 27 packages of ground beef in a supermarket meat display are listed here in order from smallest to largest:
| .75 | .83 | .87 | .89 | .89 | .89 | .92 | 
| .93 | .96 | .96 | .97 | .98 | .99 | 1.06 | 
| 1.08 | 1.08 | 1.12 | 1.12 | 1.14 | 1.14 | 1.17 | 
| 1.18 | 1.18 | 1.24 | 1.28 | 1.38 | 1.41 |  | 
a. Confirm the values of the mean and standard deviation, calculated in Exercise 2.20 as $\bar{x}$ = 1.05 and s = .17.
b. The two largest packages of meat weigh 1.38 and 1.41 pounds. Are these two packages unusually heavy? Explain.
c. Construct a box plot for the package weights. What does the position of the median line and the length of the whiskers tell you about the shape of the distribution?
2.44 The number of television viewing hours per household and the prime viewing times are two factors that affect television advertising income. A random sample of 25 households in a particular viewing area produced the following estimates of viewing hours per household:
| 3.0 | 6.0 | 7.5 | 15.0 | 12.0 | 
| 6.5 | 8.0 | 4.0 | 5.5 | 6.0 | 
| 5.0 | 12.0 | 1.0 | 3.5 | 3.0 | 
| 7.5 | 5.0 | 10.0 | 8.0 | 3.5 | 
| 9.0 | 2.0 | 6.5 | 1.0 | 5.0 | 
a. Scan the data and use the range to find an approximate value for s. Use this value to check your calculations in part b.
b. Calculate the sample mean $\bar{x}$ and the sample standard deviation s. Compare s with the approximate value obtained in part a.
c. Find the percentage of the viewing hours per household that falls into the interval . Compare with the corresponding percentage given by the Empirical Rule.
2.58 A random sample of 100 foxes was examined by a team of veterinarians to determine the prevalence of a particular type of parasite. Counting the number of parasites per fox, the veterinarians found that 69 foxes had no parasites, 17 had one parasite, and so on. A frequency tabulation of the data is given here:
| Number of Parasites, x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 
| Number of Foxes, f | 69 | 17 | 6 | 3 | 1 | 2 | 1 | 0 | 1 | 
a. Construct a relative frequency histogram for x, the number of parasites per fox.
b. Calculate $\bar{x}$ and s for the sample.
c. What fraction of parasite counts fall within two standard deviations of the mean? Within three standard deviations? Do these results agree with Tchebysheff’s Theorem? With the Empirical Rule?
3.14 Investors are becoming more and more concerned about securities fraud, especially involving initial public offerings (IPOs). During a 6-year period, the number of federal securities-fraud class action suits has continued to increase:
| Year | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | 
| Suits | 110 | 178 | 236 | 205 | 211 | 282 | 
a. Plot the data using a scatterplot. How would you describe the relationship between year and number of class action suits?
b. Find the least squares regression line relating the number of class action suits to the year being measured.
c. If you were to predict the number of class action suits in the year 2002, what problems might arise with your predictions?
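For part b, the slope and intercept follow from the usual least squares formulas b = Sxy / Sxx and a = ȳ - b·x̄. The sketch below applies them to the table above with plain Python; the variable names are illustrative, and using the calendar year itself as x (rather than a recoded 1 to 6) is an assumption that changes the intercept but not the slope.

```python
# Data from Exercise 3.14
years = [1996, 1997, 1998, 1999, 2000, 2001]
suits = [110, 178, 236, 205, 211, 282]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(suits) / n

# Sums of squares and cross-products about the means
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, suits))
s_xx = sum((x - x_bar) ** 2 for x in years)

slope = s_xy / s_xx                  # b = Sxy / Sxx
intercept = y_bar - slope * x_bar    # a = y_bar - b * x_bar

print(f"y-hat = {intercept:.1f} + {slope:.3f} * year")
```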
3.19 Using a chemical procedure called differential pulse polarography, a chemist measured the peak current generated (in microamperes) when a solution containing a given amount of nickel (in parts per billion) is added to a buffer. The data are shown here:
| x = Ni (ppb) | y = Peak Current (μA) | 
| 19.1 | .095 | 
| 38.2 | .174 | 
| 57.3 | .256 | 
| 76.2 | .348 | 
| 95 | .429 | 
| 114 | .500 | 
| 131 | .580 | 
| 150 | .651 | 
| 170 | .722 | 
Use a graph to describe the relationship between x and y. Add any numerical descriptive measures that are appropriate. Write a sentence summarizing your results.
N 2. It is suspected that the concentration of Pitocinase (units/ml) in a pregnant woman's blood is correlated non-linearly with the number of weeks of pregnancy according to the following function: y = a + c log(x), where y is the number of weeks of pregnancy and x is the concentration of Pitocinase (units/ml). Using the following data,
| concentration of Pitocinase (units/ml) | 0.06 | 0.6 | 1.4 | 4.3 | 13 | 
| number of weeks of pregnancy | 2 | 8 | 12 | 14.5 | 16.5 | 
find:
a. The coefficient of correlation (r).
b. The values of "a" and "c" in the regression equation.
c. For a woman whose blood has a concentration of Pitocinase of 0.85 units/ml, estimate the number of weeks of pregnancy.
d. What does the coefficient of determination tell us about the goodness of the fit, and based on its value, what do you conclude about the reliability of the regression equation to predict the number of weeks of pregnancy?
N 3. A computer scientist tests the lifetimes of 106 CPU computer chips and is interested in determining whether a significant correlation exists between the temperature of the CPU and the number of failures (i.e. chips that “burn out”). The following data was obtained:
| Temperature (°C), x | Failure Rate, y | 
| 85 | 820 | 
| 95 | 830 | 
| 98 | 840 | 
| 107 | 860 | 
| 111 | 880 | 
Draw a scatter diagram and then compute the coefficient of correlation.
4.6 On the first day of kindergarten, the teacher randomly selects 1 of his 25 students and records the student’s gender, as well as whether or not that student had gone to preschool.
b. Construct a tree diagram for this experiment. How many simple events are there?
c. The table below shows the distribution of the 25 students according to gender and preschool experience. Use the table to assign probabilities to the simple events in part b.
|  | Male | Female | 
| Preschool | 8 | 9 | 
| No preschool | 6 | 2 | 
d. What is the probability that the randomly selected student is male? What is the probability that the student is a female and did not go to preschool?
4.32 Five cards are selected from a 52-card deck for a poker hand.
a. How many possible poker hands can be dealt?
b. In how many ways can you receive four cards of the same face value and one card from the other 48 available cards?
c. What is the probability of being dealt four of a kind?
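The counts in parts a through c are combinations, so they can be checked with `math.comb` from the Python standard library. This is a minimal sketch; the variable names are illustrative and the counting argument in the comments follows the wording of part b.

```python
from math import comb

total_hands = comb(52, 5)                    # part a: unordered 5-card hands
four_of_a_kind = 13 * 48                     # part b: 13 face values, 48 choices for the fifth card
probability = four_of_a_kind / total_hands   # part c

print(total_hands, four_of_a_kind, f"{probability:.6f}")
```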
4.50 An experiment can result in one or both of events A and B with the probabilities shown in this probability table:
|  | A | A^C | 
| B | .34 | .46 | 
| B^C | .15 | .05 | 
Find the following probabilities:
a. P(A) b. P(B) c. P(A ∩ B)
d. P(A ∪ B) e. P(A | B) f. P(B | A)
4.56 Two people enter a room and their birthdays (ignoring years) are recorded.
a. Identify the nature of the simple events in S.
b. What is the probability that the two people have a specific pair of birthdates?
c. Identify the simple events in event A: Both people have the same birthday.
d. Find P(A).
e. Find P(A^C).
4.60 A survey of people in a given region showed that 20% were smokers. The probability of death due to lung cancer, given that a person smoked, was roughly 10 times the probability of death due to lung cancer, given that a person did not smoke. If the probability of death due to lung cancer in the region is .006, what is the probability of death due to lung cancer given that a person is a smoker?
4.88 Two tennis professionals, A and B, are scheduled to play a match; the winner is the first player to win three sets in a total that cannot exceed five sets. The event that A wins any one set is independent of the event that A wins any other, and the probability that A wins any one set is equal to .6. Let x equal the total number of sets in the match; that is, x = 3, 4, or 5. Find p(x).
4.112 A rental truck agency services its vehicles on a regular basis, routinely checking for mechanical problems. Suppose that the agency has six moving vans, two of which need to have new brakes. During a routine check, the vans are tested one at a time.
a. What is the probability that the last van with brake problems is the fourth van tested?
b. What is the probability that no more than four vans need to be tested before both brake problems are detected?
c. Given that one van with bad brakes is detected in the first two tests, what is the probability that the remaining van is found on the third or fourth test?
5.4 Use the formula for the binomial probability distribution to calculate the values of p(x), and construct the probability histogram for x when n = 6 and p = .2. [HINT: Calculate P(x = k) for seven different values of k.]
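The binomial formula P(x = k) = C(n, k) p^k (1 - p)^(n - k) can be evaluated for k = 0, 1, ..., 6 with a few lines of Python. The sketch below is one way to check hand calculations; the function name `binomial_pmf` and the text-based histogram are illustrative choices, not part of the exercise.

```python
from math import comb

n, p = 6, 0.2

def binomial_pmf(k, n, p):
    """P(x = k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

probabilities = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Text "histogram": one '#' per 0.01 of probability (an arbitrary display scale)
for k, prob in enumerate(probabilities):
    print(f"{k}  {prob:.4f}  {'#' * round(prob * 100)}")
```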
5.20 In a certain population, 85% of the people have Rh-positive blood. Suppose that two people from this population get married. What is the probability that they are both Rh-negative, thus making it inevitable that their children will be Rh-negative?
5.38 Increased research and discussion have focused on the number of illnesses involving the organism Escherichia coli (01257:H7), which causes a breakdown of red blood cells and intestinal hemorrhages in its victims. Sporadic outbreaks of E. coli have occurred in
a. What is the probability that at most five cases of E. coli per 100,000 are reported in
b. What is the probability that more than five cases of E. coli per 100,000 are reported in a given year?
c. Approximately 95% of occurrences of E. coli involve at most how many cases?
5.46 Seeds are often treated with a fungicide for protection in poor-draining, wet environments. In a small-scale trial prior to a large-scale experiment to determine what dilution of the fungicide to apply, five treated seeds and five untreated seeds were planted in clay soil and the number of plants emerging from the treated and untreated seeds were recorded. Suppose the dilution was not effective and only four plants emerged. Let x represent the number of plants that emerged from treated seeds.
a. Find the probability that x = 4.
b. Find P(x ≤ 3).
c. Find P(2 ≤ x ≤ 3).
5.62 Most weather forecasters protect themselves very well by attaching probabilities to their forecasts, such as “The probability of rain today is 40%.” Then, if a particular forecast is incorrect, you are expected to attribute the error to the random behaviour of the weather rather than to the inaccuracy of the forecaster. To check the accuracy of a particular forecaster, records were checked only for those days when the forecaster predicted rain “with 30% probability.” A check of 25 of those days indicated that it rained on 10 of the 25.
a. If the forecaster is accurate, what is the approximate value of p, the probability of rain on one of the 25 days?
b. What are the mean and standard deviation of x, the number of days on which it rained, assuming that the forecaster is accurate?
c. Calculate the z-score for the observed value, x = 10. [HINT: Recall from Section 2.6 that z-score = (x − μ)/σ.]
d. Do these data disagree with the forecast of a “30% probability of rain”? Explain.
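Parts b and c reduce to the binomial mean np, the standard deviation sqrt(npq), and the z-score (x - mean) / standard deviation. The sketch below carries out that arithmetic, assuming the forecaster is accurate so that p = .30; the variable names are illustrative.

```python
from math import sqrt

n, p = 25, 0.30      # 25 days, forecast probability of rain
observed = 10        # days on which it actually rained

mean = n * p                      # binomial mean, np
std_dev = sqrt(n * p * (1 - p))   # binomial standard deviation, sqrt(npq)
z_score = (observed - mean) / std_dev

print(f"mean = {mean}, sd = {std_dev:.3f}, z = {z_score:.2f}")
```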
5.68 Insulin-dependent diabetes (IDD) is a common chronic disorder of children. This disease occurs most frequently in persons of northern European descent but the incidence ranges from a low of 1-2 cases per 100,000 per year to a high of more than 40 per 100,000 in parts of
a. Can the distribution of the number of cases of IDD in this area be approximated by a Poisson distribution? If so, what is the mean?
b. What is the probability that the number of cases is less than or equal to 3 per 100,000?
c. What is the probability that the number of cases is greater than or equal to 3 but less than or equal to 7 per 100,000?
d. Would you expect to observe 10 or more cases of IDD per 100,000 in this area in a given year? Why or why not?
N 4. Many colleges nationwide find that not all applicants who are accepted for admission to a college will actually attend that college. Past experience at
6.4 Find these probabilities for the standard normal variable z. You can find tables of the Standard Normal Distribution here.
a. P(z < 2.33) b. P(z < 1.645)
c. P(z > 1.96) d. P(-2.58 < z < 2.58)
6.10 A normal random variable x has mean μ = 10 and standard deviation σ = 2. Find the probabilities of these x-values.
a. x > 13.5 b. x < 8.2 c. 9.4 < x < 10.6
6.20 For a car traveling 30 miles per hour (mph), the distance required to brake to a stop is normally distributed with a mean of 50 feet and a standard deviation of 8 feet. Suppose you are traveling 30 mph in a residential area and a car moves abruptly into your path at a distance of 60 feet.
a. If you apply your brakes, what is the probability that you will brake to a stop within 40 feet or less? Within 50 feet or less?
b. If the only way to avoid a collision is to brake to a stop, what is the probability that you will avoid the collision?
6.30 A stringer of tennis rackets has found that the actual string tension achieved for any individual racket stringing will vary as much as 6 pounds per square inch from the desired tension set on the stringing machine. If the stringer wishes to string at a tension lower than that specified by a customer only 5% of the time, how much above or below the customer’s specified tension should the stringer set the stringing machine? (NOTE: Assume that the distribution of string tensions produced by the stringing machine is normally distributed, with a mean equal to the tension set on the machine and a standard deviation equal to 2 pounds per square inch.)
6.34 Let x be a binomial random variable for n = 25, p = .2.
a. Use Table 1 in Appendix I to calculate P(4 ≤ x ≤ 6).
b. Find μ and σ for the binomial probability distribution, and use the normal distribution to approximate the probability P(4 ≤ x ≤ 6). Note that this value is a good approximation to the exact value of P(4 ≤ x ≤ 6) even though np = 5.
6.42 Compilation of large masses of data on lung cancer shows that approximately 1 of every 40 adults acquires the disease. Workers in a certain occupation are known to work in an air-polluted environment that may cause an increased rate of lung cancer. A random sample of n = 400 workers shows 19 with identifiable cases of lung cancer. Do the data provide sufficient evidence to indicate a higher rate of lung cancer for these workers than for the national average?
6.64 A manufacturing plant uses 3000 electric light bulbs whose life spans are normally distributed, with mean and standard deviation equal to 500 and 50 hours, respectively. In order to minimize the number of bulbs that burn out during operating hours, all the bulbs are replaced after a given period of operation. How often should the bulbs be replaced if we wish no more than 1% of the bulbs to burn out between replacement periods?
6.70 Is television dangerous to your diet? Psychologists believe that excessive eating may be associated with emotional states (being upset or bored) and environmental cues (watching television, reading, and so on). To test this theory, suppose you randomly selected 60 overweight persons and matched them by weight and gender in pairs. For a period of 2 weeks, one of each pair is required to spend evenings reading novels of interest to him or her. The other member of each pair spends each evening watching television. The calorie count for all snack and drink intake for the evenings is recorded for each person, and you record x = 19, the number of pairs for which the television watchers’ calorie intake exceeded the intake of the readers. If there is no difference in the effects of television and reading on calorie intake, the probability p that the calorie intake of one member of a pair exceeds that of the other member is .5. Do these data provide sufficient evidence to indicate a difference between the effects of television watching and reading on calorie intake? (HINT: Calculate the z-score for the observed value, x = 19.)
7.6 A questionnaire was mailed to 1000 registered municipal voters selected at random. Only 500 questionnaires were returned, and of the 500 returned, 360 respondents were strongly opposed to a surcharge proposed to support the
7.18 Suppose a random sample of n = 25 observations is selected from a population that is normally distributed, with mean equal to 106 and standard deviation equal to 12.
a. Give the mean and the standard deviation of the sampling distribution of the sample mean $\bar{x}$.
b. Find the probability that $\bar{x}$ exceeds 110.
c. Find the probability that the sample mean deviates from the population mean μ = 106 by no more than 4.
7.22 Suppose that college faculty with the rank of professor at 2-year institutions earn an average of $57,785 per year with a standard deviation of $4000. In an attempt to verify this salary level, a random sample of 60 professors was selected from a personnel database for all 2-year institutions in the
a. Describe the sampling distribution of the sample mean $\bar{x}$.
b. Within what limits would you expect the sample average to lie, with probability .95?
c. Calculate the probability that the sample mean $\bar{x}$ is greater than $60,000.
d. If your random sample actually produced a sample mean of $60,000, would you consider this unusual? What conclusion might you draw?
7.48 Studies indicate that drinking water supplied by some old lead-lined city piping systems may contain harmful levels of lead. An important study of the
a. Explain why you believe this distribution is or is not normally distributed.
b. Because the researchers were concerned about the shape of the distribution in part a, they calculated the average daily lead levels at 40 different locations on each of 23 randomly selected days. What can you say about the shape of the distribution of the average daily lead levels from which the sample of 23 days was taken?
7.53 A biology experiment was designed to determine whether sprouting radish seeds inhibit the germination of lettuce seeds. Three 10-centimeter Petri dishes were used. The first contained 26 lettuce seeds, the second contained 26 radish seeds, and the third contained 13 lettuce seeds and 13 radish seeds.
a. Assume that the experimenter had a package of 50 radish seeds and another of 50 lettuce seeds. Devise a plan for randomly assigning the radish and lettuce seeds to the three treatment groups.
b. What assumptions must the experimenter make about the packages of 50 seeds in order to assure randomness in the experiment?
7.56 The proportion of individuals with an Rh-positive blood type is 85%. You have a random sample of n = 500 individuals.
a. What are the mean and standard deviation of p-hat, the sample proportion with Rh-positive blood type?
b. Is the distribution of p-hat approximately normal? Justify your answer.
c. What is the probability that the sample proportion p-hat exceeds 82%?
d. What is the probability that the sample proportion lies between 83% and 88%?
e. 99% of the time, the sample proportion would lie between what two limits?
7.58 The maximum load (with a generous safety factor) for the elevator in an office building is 2000 pounds. The relative frequency distribution of the weights of all men and women using the elevator is mound-shaped (slightly skewed to the heavy weights), with mean μ equal to 150 pounds and standard deviation σ equal to 35 pounds. What is the largest number of people you can allow on an elevator if you want their total weight to exceed the maximum load with a small probability (say, near .01)? (HINT: If $x_1, x_2, \ldots, x_n$ are independent observations made on a random variable x, and if x has mean μ and variance $\sigma^2$, then the mean and variance of $\sum x_i$ are nμ and $n\sigma^2$, respectively. This result was given in Section 7.4.)
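The hint suggests treating the total weight of n riders as approximately normal with mean 150n and standard deviation 35·sqrt(n). The sketch below searches over n under that normal approximation. It is one possible reading of the problem: the helper name `overload_probability` and the rule "largest n whose overload probability does not exceed .01" are assumptions, since the exercise only asks for a probability "near .01".

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 150, 35      # mean and standard deviation of one person's weight
max_load = 2000          # elevator limit in pounds
target = 0.01            # acceptable probability of exceeding the limit

def overload_probability(n):
    """Normal approximation to P(total weight of n people > max_load)."""
    total = NormalDist(mu=n * mu, sigma=sigma * sqrt(n))
    return 1 - total.cdf(max_load)

# Largest n whose overload probability does not exceed the target
largest_n = max(n for n in range(1, 14) if overload_probability(n) <= target)

for n in range(10, 14):
    print(n, round(overload_probability(n), 4))
print("largest n:", largest_n)
```

Printing the probabilities for neighbouring values of n shows how sharply the risk rises with one extra rider, which is useful when judging what "near .01" should mean.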
N 5. Using the Java Applet called "Sampling Distribution Simulation" which can be found here ( http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.html ), carry out the following exercises (first click on the Begin button):
a. In the right-hand pull-down menu (click on the down arrow beside the word
b. Next go down to the "Distribution of Means" graph and go to the right-hand pull-down menu (click on the down arrow beside N=5) and choose the values N=10 and N=25 (these are the sample sizes). Each sample is drawn (randomly) from the parent population (i.e., the skewed distribution). For each case, click on the 10,000 samples (number of samples of size N) button and note the mean value and standard deviation of the "Distribution of Means" graph (data is at far left-hand side of graph). Is the mean of the sampling distribution approximately equal to the mean of the skewed parent population (write down the numbers)? Should it be? Is the standard deviation of the sampling distribution approximately equal to the standard deviation of the skewed parent population divided by the square root of the sample size N (write down the numbers)? Should it be? Is the sampling distribution approximately normal in shape? [You may want to click on the box entitled "Fit
c. Which sampling distribution should be more "normal", the one for N=10 or N=25? Explain.
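If the applet is unavailable, the same experiment can be approximated in Python. The sketch below is a stand-in, not a reproduction of the applet: it assumes an Exponential(1) parent population as the skewed distribution (the applet's own skewed distribution is not specified here), draws 10,000 samples of size N = 10 and N = 25, and compares the mean and standard deviation of the sample means with the parent mean and with σ/sqrt(N). The function name `simulate_means` is hypothetical.

```python
import random
import statistics
from math import sqrt

random.seed(1)  # fixed seed so the run is repeatable

def simulate_means(sample_size, num_samples=10_000):
    """Draw num_samples samples from an Exponential(1) parent and return their means."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(sample_size))
        for _ in range(num_samples)
    ]

parent_mean, parent_sd = 1.0, 1.0   # mean and sd of the assumed Exponential(1) parent

results = {}
for n in (10, 25):
    means = simulate_means(n)
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(f"N={n}: mean of means = {results[n][0]:.3f}, "
          f"sd of means = {results[n][1]:.3f}, "
          f"sigma/sqrt(N) = {parent_sd / sqrt(n):.3f}")
```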
8.22 Find a (1 − α)100% confidence interval for a population mean μ for these values:
a. α = .01, n = 38, $\bar{x}$ = 34, s² = 12
b. α = .10, n = 65, $\bar{x}$ = 1049, s² = 51
c. α = .05, n = 89, $\bar{x}$ = 66.3, s² = 2.48
8.34 In a report of why e-shoppers abandon their online sales transactions, Alison Stein Wellner found that “pages took too long to load” and “site was so confusing that I couldn’t find the product” were the two complaints heard most often. Based on customers’ responses, the average time to complete an online order form is 4.5 minutes. Suppose that n = 50 customers responded and that the standard deviation of the time to complete an online order is 2.7 minutes.
a. Do you think that x, the time to complete the online order form, has a mound-shaped distribution? If not, what shape would you expect?
b. If the distribution of the completion time is not normal, you can still use the standard normal distribution to construct a confidence interval for μ, the mean completion time for online shoppers. Why?
c. Construct a 95% confidence interval for μ, the mean completion time for online orders.
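For part c, the large-sample interval has the form mean ± z·s/sqrt(n). The sketch below evaluates it with `statistics.NormalDist` from the standard library, treating the reported 2.7 minutes as the sample standard deviation (an assumption, since the exercise does not say whether it is s or σ); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 50           # number of customers
x_bar = 4.5      # sample mean completion time (minutes)
s = 2.7          # standard deviation (minutes), assumed to be the sample value
confidence = 0.95

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96
margin = z * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"{confidence:.0%} CI: ({lower:.2f}, {upper:.2f})")
```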
8.40 An experiment was conducted to compare two diets A and B designed for weight reduction. Two groups of 30 overweight dieters each were randomly selected. One group was placed on diet A and the other on diet B, and their weight losses were recorded over a 30-day period. The means and standard deviations of the weight-loss measurements for the two groups are shown in the table. Find a 95% confidence interval for the difference in mean weight loss for the two diets. Interpret your confidence interval.
| Diet A | Diet B | 
|  |  | 
|  |  | 
8.52 Do you think that we should let Radio Shack film a commercial in outer space? The commercialism of our space program is a topic of great interest since Dennis Tito paid $20 million to ride along with the Russians on the space shuttle. In a survey of 500 men and 500 women, 20% of the men and 26% of the women responded that space should remain commercial-free.
a. Construct a 98% confidence interval for the difference in the proportions of men and women who think that space should remain commercial-free.
b. What does it mean to say that you are “98% confident”?
c. Based on the confidence interval in part a, can you conclude that there is a difference in the proportions of men and women who think space should remain commercial-free?
8.58 Independent random samples of n1 = n2 = n observations are to be selected from each of two populations 1 and 2. If you wish to estimate the difference between the two population means correct to within .17, with probability equal to .90, how large should n1 and n2 be? Assume that you know .
8.66 Suppose you wish to estimate the mean pH of rainfalls in an area that suffers heavy pollution due to the discharge of smoke from a power plant. You know that σ is in the neighbourhood of .5 pH, and you wish your estimate to lie within .1 of μ, with the probability near .95. Approximately how many rainfalls must be included in your sample (one pH reading per rainfall)? Would it be valid to select all of your water specimens from a single rainfall? Explain.
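The sample size question comes down to solving z·σ/sqrt(n) ≤ B for n, where B is the desired bound on the error. A minimal sketch of that arithmetic follows, with the standard deviation taken as .5 and the bound as .1 from the exercise; the variable names are illustrative, and rounding up to the next whole number is the usual convention rather than something the exercise states.

```python
from math import ceil
from statistics import NormalDist

sigma = 0.5      # approximate population standard deviation (pH)
bound = 0.1      # desired bound on the error of estimation
confidence = 0.95

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
n_required = ceil((z * sigma / bound) ** 2)   # round up to a whole rainfall

print(n_required)
```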
8.88 In an article in the Annals of Botany, a researcher reported the basal stem diameters of two groups of dicot sunflowers: those that were left to sway freely in the wind and those that were artificially supported. A similar experiment was conducted for monocot maize plants. Although the authors measured other variables in a more complicated experimental design, assume that each group consisted of 64 plants (a total of 128 sunflower and 128 maize plants). The values shown in the table are the sample means plus or minus the standard error.
|  | Sunflower | Maize | 
| Free-standing | 35.3 ± .72 | 16.2 ± .41 | 
| Supported | 32.1 ± .72 | 14.6 ± .40 | 
Use your knowledge of statistical estimation to compare the free-standing and supported basal diameters for the two plants. Write a sentence describing your conclusions, making sure to include a measure of the accuracy of your inference.
8.90 A dean of freshmen wishes to estimate the average cost of the freshman year at a particular college correct to within $500, with a probability of .95. If a random sample of freshmen is to be selected and each asked to keep financial data, how many must be included in the sample? Assume that the dean knows only that the range of expenditures will vary from approximately $4800 to $13,000.
9.2 Find the p-value for the following large-sample z tests:
a. A right-tailed test with observed z = 1.15
b. A two-tailed test with observed z = -2.78
c. A left-tailed test with observed z = -1.81
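Each p-value is a tail area under the standard normal curve, so table lookups can be checked with `statistics.NormalDist`. The sketch below is one way to do that; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

p_right = 1 - std_normal.cdf(1.15)          # part a: P(z > 1.15)
p_two = 2 * std_normal.cdf(-abs(-2.78))     # part b: 2 * P(z < -2.78)
p_left = std_normal.cdf(-1.81)              # part c: P(z < -1.81)

print(f"a: {p_right:.4f}  b: {p_two:.4f}  c: {p_left:.4f}")
```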
9.8 High airline occupancy rates on scheduled flights are essential to corporate profitability. Suppose a scheduled flight must average at least 60% occupancy in order to be profitable, and an examination of the occupancy rate for 120
a. If μ is the mean occupancy per flight and if the company wishes to determine whether or not this scheduled flight is unprofitable, give the alternative and the null hypothesis for the test.
b. Does the alternative hypothesis in part a imply a one- or two-tailed test? Explain.
c. Do the occupancy data for the 120 flights suggest that this scheduled flight is unprofitable? Test using α = .05.
9.16 Suppose you wish to detect a difference between μ1 and μ2 (either μ1 > μ2 or μ1 < μ2) and, instead of running a two-tailed test using α = .05, you use the following test procedure. You wait until you have collected the sample data and have calculated x̄1 and x̄2. If x̄1 is larger than x̄2, you choose the alternative hypothesis Ha : μ1 > μ2 and run a one-tailed test, placing α1 = .05 in the upper tail of the z distribution. If, on the other hand, x̄2 is larger than x̄1, you reverse the procedure and run a one-tailed test, placing α2 = .05 in the lower tail of the z distribution. If you use this procedure and μ1 actually equals μ2, what is the probability α that you will conclude that μ1 is not equal to μ2 (i.e., what is the probability α that you will incorrectly reject H0 when H0 is true)? This exercise demonstrates why statistical tests should be formulated prior to observing the data.
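One way to see what the procedure in Exercise 9.16 does to the error rate is to add up the two rejection regions under the null hypothesis. This is a minimal sketch that assumes the test statistic is exactly standard normal when the two means are equal; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()
upper_cut = z.inv_cdf(0.95)  # one-tailed critical value, about 1.645

# Under H0 the statistic is standard normal. The data-driven rule rejects
# when z > 1.645 (first sample mean larger) or z < -1.645 (second larger),
# so the overall rejection probability is the sum of both tails.
alpha_actual = (1 - z.cdf(upper_cut)) + z.cdf(-upper_cut)
print(round(alpha_actual, 4))
```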
9.32 Contact lenses, worn by about 26 million Americans, come in many styles and colours. Most Americans wear soft contact lenses, with the most popular colours being the blue varieties (25%), followed by greens (24%), and then hazel or brown. A random sample of 80 tinted contact lens wearers was checked for the colour of their lenses. Of these people, 22 wore blue lenses and only 15 wore green lenses.
a. Do the sample data provide sufficient evidence to indicate that the proportion of tinted contact lens wearers who wear blue lenses is different from 25%? Use α = .05.
b. Do the sample data provide sufficient evidence to indicate that the proportion of tinted contact lens wearers who wear green lenses is different from 24%? Use α = .05.
c. Is there any reason to conduct a one-tailed test for either part a or b? Explain.
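The test statistics for parts a and b of Exercise 9.32 can be sketched as follows, assuming the large-sample z test for a single proportion with the standard error computed under the null hypothesis. The helper `one_proportion_z` is a hypothetical name.

```python
import math
from statistics import NormalDist

def one_proportion_z(successes, n, p0):
    """Large-sample z statistic and two-tailed p-value for H0: p = p0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z_blue, p_blue = one_proportion_z(22, 80, 0.25)    # blue lenses
z_green, p_green = one_proportion_z(15, 80, 0.24)  # green lenses
print(f"blue:  z = {z_blue:.3f}, p = {p_blue:.3f}")
print(f"green: z = {z_green:.3f}, p = {p_green:.3f}")
```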
9.44 a. Define α and β for a statistical test of hypothesis.
b. For a fixed sample size n, if the value of α is decreased, what is the effect on β?
c. In order to decrease both α and β for a particular alternative value of μ, how must the sample size change?
9.50 The commercialism of our space program was the topic of Exercise 8.52. In a survey of 500 men and 500 women, 20% of the men and 26% of the women responded that space should remain commercial-free.
a. Is there a significant difference in the population proportions of men and women who think that space should remain commercial-free? Use α = .01.
b. Can you think of any reason why a statistically significant difference in these population proportions might be of practical importance to the administrators of the space program? To the advertisers? To the politicians?
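For part a of Exercise 9.50, a minimal sketch of the large-sample test for two proportions, assuming the pooled estimate of the common proportion is used in the standard error. The function name `two_proportion_z` is hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_z(p1_hat, n1, p2_hat, n2):
    """z statistic and two-tailed p-value for H0: p1 = p2 (pooled SE)."""
    pooled = (p1_hat * n1 + p2_hat * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z_stat, p_value = two_proportion_z(0.20, 500, 0.26, 500)  # men, women
print(f"z = {z_stat:.3f}, two-tailed p = {p_value:.4f}")
```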
9.62 The braking ability was compared for two 2002 automobile models. Random samples of 64 automobiles were tested for each type. The recorded measurement was the distance (in feet) required to stop when the brakes were applied at 40 miles per hour. These are the computed sample means and variances:
(The table of sample means and variances for the two models is missing from the source.)
Do the data provide sufficient evidence to indicate a difference between the mean stopping distance for the two models?
10.2 Find the critical value(s) of t that specify the rejection region in these situations (you can find tables of the Student t-Distribution here):
a. A two-tailed test with α = .01 and 12 df
b. A right-tailed test with α = .05 and 16 df
c. A two-tailed test with α = .05 and 25 df
d. A left-tailed test with α = .01 and 7 df
10.12 Organic chemists often purify organic compounds by a method known as fractional crystallization. An experimenter wanted to prepare and purify 4.85 grams (g) of aniline. Ten 4.85-g quantities of aniline were individually prepared and purified to acetanilide. The following dry yields were recorded:
| 3.85 | 3.80 | 3.88 | 3.85 | 3.90 | 
| 3.36 | 3.62 | 4.01 | 3.72 | 3.82 | 
Approximately how many 4.85-g specimens of aniline are required if you wish to estimate the mean number of grams of acetanilide correct to within .06 g with probability equal to .95?
10.24 Chronic anterior compartment syndrome is a condition characterized by exercise-induced pain in the lower leg. Swelling and impaired nerve and muscle function also accompany this pain, which is relieved by rest. Susan Beckham and colleagues conducted an experiment involving ten healthy runners and ten healthy cyclists to determine whether there are significant differences in pressure measurements within the anterior muscle compartment for runners and cyclists. The data summary – compartment pressure in millimeters of mercury (Hg) – is as follows:
|  | Runners |  | Cyclists |  |
| Condition | Mean | Standard Deviation | Mean | Standard Deviation |
| Rest | 14.5 | 3.92 | 11.1 | 3.98 | 
| 80% maximal O2 consumption | 12.2 | 3.49 | 11.5 | 4.95 | 
| Maximal O2 consumption | 19.1 | 16.9 | 12.2 | 4.47 | 
a. Test for a significant difference in compartment pressure between runners and cyclists under the resting condition. Use α = .05.
b. Construct a 95% confidence interval estimate of the difference in means for runners and cyclists under the condition of exercising at 80% of maximal oxygen consumption.
c. To test for a significant difference in compartment pressure at maximal oxygen consumption should you use the pooled or unpooled t test? Explain.
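Part a of Exercise 10.24 can be sketched from the summary statistics alone, assuming the pooled-variance two-sample t test is appropriate for the resting data. The helper `pooled_t` is a hypothetical name, and the critical value is copied from a standard t table rather than computed.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled two-sample t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

# Resting condition: runners versus cyclists, ten subjects in each group.
t_rest, df_rest = pooled_t(14.5, 3.92, 10, 11.1, 3.98, 10)
T_CRIT_025_18DF = 2.101  # table value: upper .025 point of t with 18 df
print(f"t = {t_rest:.3f}, df = {df_rest}, reject H0: {abs(t_rest) > T_CRIT_025_18DF}")
```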
10.40 The earth’s temperature (which affects seed germination, crop survival in bad weather, and many other aspects of agricultural production) can be measured using either ground-based sensors or infrared-sensing devices mounted in aircraft or space satellites. Ground-based sensoring is tedious, requiring many replications to obtain an accurate estimate of ground temperature. On the other hand, airplane or satellite sensoring of infrared waves appears to introduce a bias in the temperature readings. To determine the bias, readings were obtained at five different locations using both ground- and air-based temperature sensors. The readings (in degrees Celsius) are listed here:
| Location | Ground | Air | 
| 1 | 46.9 | 47.3 | 
| 2 | 45.4 | 48.1 | 
| 3 | 36.3 | 37.9 | 
| 4 | 31.0 | 32.7 | 
| 5 | 24.7 | 26.2 | 
How many paired observations are required to estimate the difference between mean temperatures for ground- versus air-based sensors correct to within .2°C, with probability approximately equal to .95?
10.76 An experiment was conducted to compare mean lengths of time required for the bodily absorption of two drugs A and B. Ten people were randomly selected and assigned to receive one of the drugs. The length of time (in minutes) for the drug to reach a specified level in the blood was recorded, and the data summary is given in the table:
| Drug A | Drug B | 
(The sample means and variances for the two drugs are missing from the source.)
a. Do the data provide sufficient evidence to indicate a difference in mean times to absorption for the two drugs? Test using α = .05.
b. Find the approximate p-value for the test. Does this confirm your conclusions?
10.81 Karl Niklas and T.G. Owens examined the differences in a particular plant, Plantago Major L., when grown in full sunlight versus shade conditions. In this study, shaded plants received direct sunlight for less than 2 hours each day, whereas full-sun plants were never shaded. A partial summary of the data based on n1 = 16 full-sun plants and n2 = 15 shade plants is shown here:
|  | Full Sun |  | Shade |  |
|  | x̄ | s | x̄ | s |
| Leaf area (cm²) | 128.00 | 43.00 | 78.70 | 41.70 |
| Overlap area (cm²) | 46.80 | 2.21 | 8.10 | 1.26 |
| Leaf number | 9.75 | 2.27 | 6.93 | 1.49 |
| Thickness (mm) | .90 | .03 | .50 | .02 |
| Length (cm) | 8.70 | 1.64 | 8.91 | 1.23 |
| Width (cm) | 5.24 | .98 | 3.41 | .61 |
a. What assumptions are required in order to use the small-sample procedures given in this chapter to compare full-sun versus shade plants? From the summary presented, do you think that any of these assumptions have been violated?
b. Do the data present sufficient evidence to indicate a difference in mean leaf area for full-sun versus shade plants?
c. Do the data present sufficient evidence to indicate a difference in mean overlap area for full-sun versus shade plants?
10.100 At a time when energy conservation is so important, some scientists think closer scrutiny should be given to the cost (in energy) of producing various forms of food. Suppose you wish to compare the mean amount of oil required to produce 1 acre of corn versus 1 acre of cauliflower. The readings (in barrels of oil per acre), based on 20-acre plots, seven for each crop, are shown in the table. Use these data to find a 90% confidence interval for the difference between the mean amounts of oil required to produce these two crops.
| Corn | Cauliflower | 
| 5.6 | 15.9 | 
| 7.1 | 13.4 | 
| 4.5 | 17.6 | 
| 6.0 | 16.8 | 
| 7.9 | 15.8 | 
| 4.8 | 16.3 | 
| 5.7 | 17.1 | 
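A sketch of the interval asked for in Exercise 10.100, assuming the pooled-variance (equal variances) small-sample procedure. The function name `pooled_ci` is hypothetical, and the t critical value is taken from a standard table.

```python
import math
from statistics import mean, variance

corn = [5.6, 7.1, 4.5, 6.0, 7.9, 4.8, 5.7]
cauliflower = [15.9, 13.4, 17.6, 16.8, 15.8, 16.3, 17.1]

def pooled_ci(sample1, sample2, t_crit):
    """Confidence interval for mu1 - mu2 using the pooled variance."""
    n1, n2 = len(sample1), len(sample2)
    sp2 = ((n1 - 1) * variance(sample1)
           + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    margin = t_crit * math.sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = mean(sample1) - mean(sample2)
    return diff - margin, diff + margin

T_05_12DF = 1.782  # table value: upper .05 point of t with 12 df (90% interval)
lower, upper = pooled_ci(corn, cauliflower, T_05_12DF)
print(f"90% CI for corn minus cauliflower: ({lower:.2f}, {upper:.2f})")
```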
10.104 The data shown here were collected on lost-time accidents (the figures given are mean work-hours lost per month over a period of 1 year) before and after an industrial safety program was put into effect. Data were recorded for six industrial plants. Do the data provide sufficient evidence to indicate whether the safety program was effective in reducing lost-time accidents? Test using α = .01.
|  | Plant Number | |||||
|  | 1 | 2 | 3 | 4 | 5 | 6 | 
| Before program | 38 | 64 | 42 | 70 | 58 | 30 | 
| After Program | 31 | 58 | 43 | 65 | 52 | 29 | 
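Because each plant is measured before and after the program, Exercise 10.104 fits a paired-difference analysis. The sketch below assumes a one-tailed paired t test on the before-minus-after differences; the critical value comes from a standard t table.

```python
import math
from statistics import mean, stdev

before = [38, 64, 42, 70, 58, 30]
after = [31, 58, 43, 65, 52, 29]

# Positive differences mean fewer hours lost after the program.
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
t_paired = mean(diffs) / (stdev(diffs) / math.sqrt(n))

T_CRIT_01_5DF = 3.365  # table value: upper .01 point of t with 5 df
print(f"t = {t_paired:.3f}, reject H0: {t_paired > T_CRIT_01_5DF}")
```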
10.44 A random sample of n = 25 observations from a normal population produced a sample variance equal to 21.4. Do these data provide sufficient evidence to indicate that σ² > 15? Test using α = .05.
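The statistic for Exercise 10.44 is short enough to check directly. This sketch assumes the usual chi-square test for a single variance, with the critical value read from a standard table.

```python
n = 25
sample_variance = 21.4
sigma0_squared = 15  # hypothesized variance under H0

# Test statistic (n - 1) s^2 / sigma0^2, chi-square with n - 1 df under H0.
chi_square = (n - 1) * sample_variance / sigma0_squared

CHI2_CRIT_05_24DF = 36.415  # table value: upper .05 point with 24 df
print(f"chi-square = {chi_square:.2f}, reject H0: {chi_square > CHI2_CRIT_05_24DF}")
```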
10.102 The closing prices of two common stocks were recorded for a period of 15 days. The means and variances are
(The table of means and variances for the two stocks is missing from the source.)
14.2 Use Table 5 in Appendix I or the tables of the Chi-Square Distribution given here to find the value of χ² with the following area α to its right:
a. α = .05, df = 3
b. α = .01, df = 8
14.12 Suppose you are interested in following two independent traits in snap peas – seed texture (S = smooth, s = wrinkled) and seed colour (Y = yellow, y = green) – in a second-generation cross of heterozygous parents. Mendelian theory states that the number of peas classified as smooth and yellow, wrinkled and yellow, smooth and green, wrinkled and green should be in the ratio 9:3:3:1. Suppose that 100 randomly selected snap peas have 56, 19, 17, and 8 in these respective categories. Do these data indicate that the 9:3:3:1 model is correct? Test using α = .01.
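The goodness-of-fit statistic for Exercise 14.12 can be sketched as below, assuming the standard Pearson chi-square statistic with expected counts taken from the 9:3:3:1 ratio. The critical value is a table value.

```python
observed = [56, 19, 17, 8]
ratio = [9, 3, 3, 1]

n = sum(observed)
expected = [n * r / sum(ratio) for r in ratio]  # 56.25, 18.75, 18.75, 6.25

# Pearson statistic: sum of (observed - expected)^2 / expected.
chi_square_fit = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

CHI2_CRIT_01_3DF = 11.345  # table value: upper .01 point with 3 df
print(f"chi-square = {chi_square_fit:.4f}, reject H0: {chi_square_fit > CHI2_CRIT_01_3DF}")
```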
14.18 Is there a generation gap? A sample of adult Americans of three different generations were asked to agree or disagree with this statement: If I had the chance to start over in life, I would do things differently. The results are given in the table. Do the data indicate a generation gap for this particular question? That is, does a person’s opinion change depending on the generation group from which he or she comes? If so, describe the nature of the differences. Use α = .05.
|  | GenXers | Boomers | Matures | 
|  | (born 1965-1976) | (born 1946-1964) | (born before 1946) | 
| Agree | 118 | 213 | 88 | 
| Disagree | 80 | 87 | 61 | 
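A minimal sketch of the contingency-table calculation for Exercise 14.18, assuming the Pearson chi-square test of independence. The function `chi_square_independence` is a hypothetical helper that would apply equally to the other two-way tables in this chapter; the critical value is a table value.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and df for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Rows: Agree, Disagree. Columns: GenXers, Boomers, Matures.
generation_table = [[118, 213, 88],
                    [80, 87, 61]]
chi_sq, df = chi_square_independence(generation_table)
CHI2_CRIT_05_2DF = 5.991  # table value: upper .05 point with 2 df
print(f"chi-square = {chi_sq:.3f}, df = {df}, reject H0: {chi_sq > CHI2_CRIT_05_2DF}")
```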
14.26 A particular poultry disease is thought to be non-communicable. To test this theory, 30,000 chickens were randomly partitioned into three groups of 10,000. One group had no contact with diseased chickens, one had moderate contact, and the third had heavy contact. After a 6-month period, data were collected on the number of diseased chickens in each group of 10,000. Do the data provide sufficient evidence to indicate a dependence between the amount of contact between diseased and non-diseased fowl and the incidence of the disease? Use α = .05.
|  | No Contact | Moderate Contact | Heavy Contact | 
| Disease | 87 | 89 | 124 | 
| No Disease | 9,913 | 9,911 | 9,876 | 
| Total | 10,000 | 10,000 | 10,000 | 
14.30 A survey was conducted to investigate the interest of middle-aged adults in physical fitness programs in 
|  |  |  |  |  | 
| Participate | 46 | 63 | 108 | 121 | 
| Do not participate | 149 | 178 | 192 | 179 | 
Do the data indicate a difference in adult participation in physical fitness programs from one state to another? If so, describe the nature of the differences.
14.40 A survey was conducted to determine student, faculty, and administration attitudes about a new university parking policy. The distribution of those favouring or opposing the policy is shown in the table. Do the data provide sufficient evidence to indicate that attitudes about the parking policy are independent of student, faculty, or administration status?
|  | Student | Faculty | Administration | 
| Favour | 252 | 107 | 43 | 
| Oppose | 139 | 81 | 40 | 
14.48 Although white has long been the most popular car colour, trends in fashion and home design have signaled the emergence of green as the colour of choice in recent years. The growth in the popularity of green hues stems partially from an increased interest in the environment and increased feelings of uncertainty. According to an article in The Press-Enterprise, “green symbolizes harmony and counteracts emotional stress.” The article cites the top five colours and the percentage of the market share for four different classes of cars. These data are for the truck-van category.
| Colour | White |  | Green | Red | Black | 
| Percent | 29.72 | 11.00 | 9.24 | 9.08 | 9.01 | 
In an attempt to verify the accuracy of these figures, we take a random sample of 250 trucks and vans and record their colour. Suppose that the number of vehicles that fall into each of the five categories are 82, 22, 27, 21, and 20, respectively.
a. Is any category missing in the classification? How many cars and trucks fell into that category?
b. Is there sufficient evidence to indicate that our percentages of trucks and vans differ from those given? Find the approximate p-value for the test.
12.8 Professor Isaac Asimov was one of the most prolific writers of all time. Prior to his death he wrote nearly 500 books during a 40-year career. In fact, as his career progressed, he became even more productive in terms of the number of books written within a given period of time. The data give the time in months required to write his books in increments of 100:
| Number of Books, x | 100 | 200 | 300 | 400 | 490 | 
| Time in Months, y | 237 | 350 | 419 | 465 | 507 | 
a. Assume that the number of books x and the time in months y are linearly related. Find the least-squares line relating y to x.
b. Plot the time as a function of the number of books written using a scatterplot, and graph the least-squares line on the same paper. Does it seem to provide a good fit to the data points?
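For part a of Exercise 12.8, the least-squares slope and intercept follow from the sums of squares Sxx and Sxy. The sketch below assumes the simple linear model y = a + bx; the function name `least_squares_line` is hypothetical.

```python
from statistics import mean

books = [100, 200, 300, 400, 490]   # x: number of books
months = [237, 350, 419, 465, 507]  # y: time in months

def least_squares_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

intercept, slope = least_squares_line(books, months)
print(f"y-hat = {intercept:.2f} + {slope:.4f} x")
```

The same helper could be reused for the scatterplot in part b by evaluating the fitted line at each x value.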
12.16 An experiment was designed to compare several different types of air pollution monitors. The monitor was set up, and then exposed to different concentrations of ozone, ranging between 15 and 230 parts per million (ppm) for periods of 8-72 hours. Filters on the monitor were then analyzed, and the amount (in micrograms) of sodium nitrate (NO3) recorded by the monitor was measured. The results for one type of monitor are given in the table.
| Ozone, x (ppm/hr) | .8 | 1.3 | 1.7 | 2.2 | 2.7 | 2.9 | 
| NO3, y (μg) | 2.44 | 5.21 | 6.07 | 8.98 | 10.82 | 12.16 | 
a. Find the least-squares regression line relating the monitor’s response to the ozone concentration.
b. Do the data provide sufficient evidence to indicate that there is a linear relationship between the ozone concentration and the amount of sodium nitrate detected?
c. Calculate r². What does this value tell you about the effectiveness of the linear regression analysis?
12.28 A marketing research experiment was conducted to study the relationship between the length of time necessary for a buyer to reach a decision and the number of alternative package designs of a product presented. Brand names were eliminated from the packages to reduce the effects of brand preferences. The buyers made their selections using the manufacturer’s product descriptions on the packages as the only buying guide. The length of time necessary to reach a decision was recorded for 15 participants in the marketing research study.
| Length of Decision Time, y (sec) | 5, 8, 8, 7, 9 | 7, 9, 8, 9, 10 | 10, 11, 10, 12, 9 | 
| Number of Alternatives, x | 2 | 3 | 4 | 
a. Find the least-squares line appropriate for these data.
b. Plot the points and graph the line as a check on your calculations.
c. Calculate s².
d. Do the data present sufficient evidence to indicate that the length of decision time is linearly related to the number of alternative package designs? (Test at the α = .05 level of significance.)
e. Find the appropriate p-value for the test and interpret its value.
g. Estimate the average length of time necessary to reach a decision when three alternatives are presented, using a 95% confidence interval.
12.40 G.W. Marino investigated the variables related to a hockey player’s ability to make a fast start from a stopped position. In the experiment, each skater started from a stopped position and attempted to move as rapidly as possible over a 6-meter distance. The correlation coefficient r between a skater’s stride rate (number of strides per second) and the length of time to cover the 6-meter distance for the sample of 69 skaters was -.37.
a. Do the data provide sufficient evidence to indicate a correlation between stride rate and time to cover the distance? Test using α = .05.
b. Find the approximate p-value for the test.
c. What are the practical implications of the test in part a?
12.48 Athletes and others suffering the same type of injury to the knee often require anterior and posterior ligament reconstruction. In order to determine the proper length of bone-patellar tendon-bone grafts, experiments were done using three imaging techniques to determine the required length of the grafts and these results were compared to the actual length required. A summary of the results of a simple linear regression analysis for each of these three methods is given in the following table.
| Imaging Technique | Coefficient of Determination, r² | Intercept | Slope | p-value | 
| Radiographs | 0.80 | -3.75 | 1.031 | <0.0001 | 
| Standard MRI | 0.43 | 20.29 | 0.497 | 0.011 | 
| 3-D MRI | 0.65 | 1.80 | 0.977 | <0.0001 | 
a. What can you say about the significance of each of the three regression analyses?
b. How would you rank the effectiveness of the three regression analyses? What is the basis of your decision?
c. How do the values of r² and the p-values compare in determining the best predictor of actual graft lengths of ligament required?
13.4 Suppose that you fit the model  to 15 data points and found F equal to 57.44.
The computer output for multiple regression analysis for the above (Exercise 13.3) provides this information:
| b0 = 1.04 | b1 = 1.29 | b2 = 2.72 | b3 = .41 | 
|  | SE(b1) = .42 | SE(b2) = .65 | SE(b3) = .17 | 
a. Which, if any, of the independent variables x1, x2, and x3 contribute information for the prediction of y?
b. Give the least-squares prediction equation.
c. On the same sheet of graph paper, graph y versus x1 when x2 = 1 and x3 = 0 and when x2 = 1 and x3 = .5. What relationship do the two lines have to each other?
d. What is the practical interpretation of the parameter b1?
13.12 You have a hot grill and an empty hamburger bun, but you have sworn off greasy hamburgers. Would a meatless hamburger do? The data in the table record a flavour and texture score (between 0 and 100) for 12 brands of meatless hamburgers along with the price, number of calories, amount of fat, and amount of sodium per burger. Some of these brands try to mimic the taste of meat, while others do not. The MINITAB printout shows the regression of the taste score y on the four predictor variables: price, calories, fat, and sodium.
| Brand | Score, y | Price, x1 | Calories, x2 | Fat, x3 | Sodium, x4 | 
| 1 | 70 | 91 | 110 | 4 | 310 | 
| 2 | 45 | 68 | 90 | 0 | 420 | 
| 3 | 43 | 92 | 80 | 1 | 280 | 
| 4 | 41 | 75 | 120 | 5 | 370 | 
| 5 | 39 | 88 | 90 | 0 | 410 | 
| 6 | 30 | 67 | 140 | 4 | 440 | 
| 7 | 68 | 73 | 120 | 4 | 430 | 
| 8 | 56 | 92 | 170 | 6 | 520 | 
| 9 | 40 | 71 | 130 | 4 | 180 | 
| 10 | 34 | 67 | 110 | 2 | 180 | 
| 11 | 30 | 92 | 100 | 1 | 330 | 
| 12 | 26 | 95 | 130 | 2 | 340 | 
MINITAB output for Exercise 13.12
Regression Analysis:
y versus x1, x2, x3, x4
The regression equation is
Y = 59.8 + 0.129 x1 – 0.580 x2 + 8.50 x3 + 0.0488 x4
| Predictor | Coef | SE Coef | T | P | 
| Constant | 59.85 | 35.68 | 1.68 | 0.137 | 
| x1 | 0.1287 | 0.3391 | 0.38 | 0.716 | 
| x2 | -0.5805 | 0.2888 | -2.01 | 0.084 | 
| x3 | 8.498 | 3.472 | 2.45 | 0.044 | 
| x4 | 0.04876 | 0.04062 | 1.20 | 0.269 | 
S = 12.72        R-Sq = 49.9%        R-Sq(adj) = 21.3%
Analysis of Variance
| Source | DF | SS | MS | F | P | 
| Regression | 4 | 1128.4 | 282.1 | 1.74 | 0.244 | 
| Residual Error | 7 | 1132.6 | 161.8 |  |  | 
| Total | 11 | 2261.0 |  |  |  | 
| Source | DF | Seq SS | 
| x1 | 1 | 11.2 | 
| x2 | 1 | 19.6 | 
| x3 | 1 | 864.5 | 
| x4 | 1 | 233.2 | 
a. Comment on the fit of the model using the statistical test for the overall fit and the coefficient of determination, R².
b. If you wanted to refit the model by eliminating one of the independent variables, which one would you eliminate? Why?
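The summary figures in the Exercise 13.12 printout can be reproduced from the analysis of variance table alone. This is a sketch using the standard definitions of F, R², and adjusted R²; the variable names are illustrative.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table.
ss_regression, df_regression = 1128.4, 4
ss_error, df_error = 1132.6, 7
ss_total, df_total = 2261.0, 11

ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

f_stat = ms_regression / ms_error                  # overall F test statistic
r_sq = ss_regression / ss_total                    # coefficient of determination
r_sq_adj = 1 - ms_error / (ss_total / df_total)    # penalizes extra predictors

print(f"F = {f_stat:.2f}, R-Sq = {100 * r_sq:.1f}%, R-Sq(adj) = {100 * r_sq_adj:.1f}%")
```

The gap between R² and adjusted R² is large here because four predictors are being fit to only 12 observations.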
13.20 The Academic Performance Index (API), described in Exercise 12.11, is a measure of school achievement based on the results of the Stanford 9 Achievement Test. The 2001 API scores for eight elementary schools in 
| School | API Score, y | Awards, x1 | % Meals, x2 | % ELL, x3 | % Emergency, x4 | 2000 API, x5 | 
| 1 | 588 | Yes | 58 | 34 | 16 | 533 | 
| 2 | 659 | No | 62 | 22 | 5 | 655 | 
| 3 | 710 | Yes | 66 | 14 | 19 | 695 | 
| 4 | 657 | No | 36 | 30 | 14 | 680 | 
| 5 | 669 | No | 40 | 11 | 13 | 670 | 
| 6 | 641 | No | 51 | 26 | 2 | 636 | 
| 7 | 557 | No | 73 | 39 | 14 | 532 | 
| 8 | 743 | Yes | 22 | 6 | 4 | 705 | 
The variables are defined as
x1 = 1 if the school was given a financial award for meeting goals, 0 if not.
x2 = % of students who qualify for free or reduced price meals
x3 = % of students who are English Language Learners
x4 = % of teachers on emergency credentials
x5 = API score in 2000
The MINITAB printout for a first-order regression model is given below.
            Regression Analysis
The regression equation is
y = 269 + 33.2 x1 – 0.003 x2 – 1.02 x3 – 1.00 x4 + 0.636 x5
| Predictor | Coef | STDev | T | P | 
| Constant | 269.03 | 41.55 | 6.48 | 0.023 | 
| x1 | 33.227 | 4.373 | 7.60 | 0.017 | 
| x2 | -0.0027 | 0.1396 | -0.02 | 0.987 | 
| x3 | -1.0159 | 0.3237 | -3.14 | 0.088 | 
| x4 | -1.0032 | 0.3391 | -2.96 | 0.098 | 
| x5 | 0.63560 | 0.05209 | 12.20 | 0.007 | 
S = 4.734        R-Sq = 99.8%        R-Sq(adj) = 99.4%
Analysis of Variance
| Source | DF | SS | MS | F | P | 
| Regression | 5 | 25197.2 | 5039.4 | 224.87 | .004 | 
| Residual Error | 2 | 44.8 | 22.4 |  |  | 
| Total | 7 | 25242.0 |  |  |  | 
a. What is the model that has been fit to this data? What is the least squares prediction equation?
b. How well does the model fit? Use any relevant statistics from the printout to answer this question.
c. Which, if any, of the independent variables are useful in predicting the 2001 API, given the other independent variables already in the model? Explain.
d. Use the values of R² and R²(adj) in the printout below to choose the best model for prediction. Would you be confident in using the chosen model for predicting the 2002 API score based on a model containing similar variables? Explain.
Best Subsets Regression
Response is y
| Vars | R-Sq | Adj. R-sq | C-p | s | x1 | x2 | x3 | x4 | x5 | 
|  |  |  |  |  |  |  |  |  |  | 
| 1 | 87.9 | 85.8 | 132.7 | 22.596 |  |  |  |  | X | 
| 1 | 84.5 | 81.9 | 170.7 | 25.544 |  |  | X |  |  | 
| 2 | 97.4 | 96.4 | 27.1 | 11.423 | X |  |  |  | X | 
| 2 | 94.6 | 92.4 | 58.8 | 16.512 |  |  | X |  | X | 
| 3 | 99.0 | 98.2 | 11.8 | 8.1361 | X |  | X |  | X | 
| 3 | 98.9 | 98.2 | 11.9 | 8.1654 | X |  |  | X | X | 
| 4 | 99.8 | 99.6 | 4.0 | 3.8656 | X |  | X | X | X | 
| 4 | 99.0 | 97.8 | 12.8 | 8.9626 | X | X | X |  | X | 
| 5 | 99.8 | 99.4 | 6.0 | 4.7339 | X | X | X | X | X | 
13.28 The tuna fish data from Exercise 11.16 were analyzed as a completely randomized design with four treatments. However, we could also view the experimental design as a 2 x 2 factorial experiment with unequal replications. The data are shown below.
|  | Oil |  | Water |  |
| Light tuna | 2.56 | .62 | .99 | 1.12 |
|  | 1.92 | .66 | 1.92 | .63 |
|  | 1.30 | .62 | 1.23 | .67 |
|  | 1.79 | .65 | .85 | .69 |
|  | 1.23 | .60 | .65 | .60 |
|  |  | .67 | .53 | .60 |
|  |  |  | 1.41 | .66 |
| White tuna | 1.27 |  | 1.49 | 1.29 |
|  | 1.22 |  | 1.29 | 1.00 |
|  | 1.19 |  | 1.27 | 1.27 |
|  | 1.22 |  | 1.35 | 1.28 |
The data can be analyzed using the model
y = β0 + β1x1 + β2x2 + β3x1x2 + ε
where
x1 = 0 if oil, 1 if water
x2 = 0 if light tuna, 1 if white tuna
b. The printout generated by MINITAB is shown below. What is the least-squares prediction equation?
MINITAB output for Exercise 13.28
Regression Analysis
The regression equation is y = 1.15 – 0.251 x1 + 0.078 x2 + 0.306 x1x2
| Predictor | Coef | StDev | T | P | 
| Constant | 1.1473 | 0.1370 | 8.38 | 0.000 | 
| x1 | -0.2508 | 0.1830 | -1.37 | 0.180 | 
| x2 | 0.0777 | 0.2652 | 0.29 | 0.771 | 
| x1x2 | 0.3058 | 0.3330 | 0.92 | 0.365 | 
S = 0.4543        R-Sq = 11.9%        R-Sq(adj) = 3.9%
Analysis of Variance
| Source | DF | SS | MS | F | P | 
| Regression | 3 | 0.9223 | 0.3074 | 1.49 | 0.235 | 
| Residual Error | 33 | 6.8104 | 0.2064 |  |  | 
| Total | 36 | 7.7328 |  |  |  | 
c. Is there any interaction between type of tuna and type of packing liquid?
d. Which, if any, of the main effects (type of tuna and type of packing liquid) contribute significant information for the prediction of y?
e. How well does the model fit the data? Explain.
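Because the model in Exercise 13.28 has one parameter for each of the four tuna-by-liquid cells, its least-squares coefficients can be recovered from the four cell means. The sketch below assumes the 0/1 coding given in the exercise (x1 = 1 for water, x2 = 1 for white tuna) and the cell assignment read from the data table; the variable names are illustrative.

```python
from statistics import mean

light_oil = [2.56, 1.92, 1.30, 1.79, 1.23, .62, .66, .62, .65, .60, .67]
light_water = [.99, 1.92, 1.23, .85, .65, .53, 1.41,
               1.12, .63, .67, .69, .60, .60, .66]
white_oil = [1.27, 1.22, 1.19, 1.22]
white_water = [1.49, 1.29, 1.27, 1.35, 1.29, 1.00, 1.27, 1.28]

# With dummy coding, each coefficient is a contrast of cell means.
b0 = mean(light_oil)                       # baseline cell: light tuna in oil
b1 = mean(light_water) - b0                # effect of water for light tuna
b2 = mean(white_oil) - b0                  # effect of white tuna in oil
b3 = mean(white_water) - mean(white_oil) - mean(light_water) + b0  # interaction

print(f"y-hat = {b0:.4f} + ({b1:.4f}) x1 + {b2:.4f} x2 + {b3:.4f} x1x2")
```

These values agree with the Coef column of the printout, which is a useful check that the table has been read correctly.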