1. A student wishes to draw a histogram for the following data:

| $x$ | Frequency |
|---|---|
| $0 < x \leqslant 5$ | $6$ |
| $5 < x \leqslant 15$ | $17$ |
| $15 < x \leqslant 25$ | $21$ |
| $25 < x \leqslant 50$ | $45$ |

  1. Explain the advantage of using a histogram rather than a bar chart.
  2. The student wishes to use a scale of 1 cm = 0.2 units on the frequency density axis. Calculate the height of each block in the histogram.
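
One way to check the block heights in question 1 is sketched below. It assumes the scale means that 1 cm on the axis represents 0.2 units of frequency density, and that frequency density is frequency divided by class width. The names `classes`, `UNITS_PER_CM` and `block_heights_cm` are illustrative only.

```python
# Class boundaries and frequencies copied from the table in question 1:
# (lower bound, upper bound, frequency).
classes = [(0, 5, 6), (5, 15, 17), (15, 25, 21), (25, 50, 45)]

# Assumed reading of the scale: 1 cm represents 0.2 units of frequency density.
UNITS_PER_CM = 0.2


def block_heights_cm(classes, units_per_cm):
    """Return the drawn height, in cm, of each histogram block."""
    heights = []
    for lower, upper, freq in classes:
        density = freq / (upper - lower)        # frequency density
        heights.append(density / units_per_cm)  # convert to cm on the page
    return heights


heights = block_heights_cm(classes, UNITS_PER_CM)
for (lower, upper, _), height in zip(classes, heights):
    print(f"{lower} < x <= {upper}: {height:.2f} cm")
```

Because the class widths differ, the tallest block is not the one with the largest frequency, which is the point of plotting frequency density.
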
2. The weights of some apples, in grams, are given in the following stem and leaf diagram. Find the mean weight of the apples.

| Stem | Leaf |
|---|---|
| $\mathbf{7}$ | $7\ 8$ |
| $\mathbf{8}$ | $2\ 7\ 7$ |
| $\mathbf{9}$ | $0\ 1\ 4\ 6$ |
| $\mathbf{10}$ | $5\ 5$ |

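
A minimal sketch for checking the mean in question 2. The diagram carries no key, so the code assumes the usual reading in which a stem of 7 with a leaf of 8 means 78 g; if the intended key differs, the expansion step needs changing. The names `stem_leaf` and `expand` are illustrative.

```python
# Stems and leaves copied from the diagram in question 2.
stem_leaf = {7: [7, 8], 8: [2, 7, 7], 9: [0, 1, 4, 6], 10: [5, 5]}


def expand(stem_leaf):
    """Turn each stem/leaf pair into a weight, assuming stem = tens, leaf = units."""
    return [10 * stem + leaf
            for stem, leaves in stem_leaf.items()
            for leaf in leaves]


weights = expand(stem_leaf)
mean_weight = sum(weights) / len(weights)
print(len(weights), sum(weights), round(mean_weight, 1))
```
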
3. Shown below is a box plot. An outlier is defined as any data point that lies more than 1.5 times the interquartile range above the upper quartile or below the lower quartile. Find the bounds beyond which data points are considered outliers.
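
The outlier rule in question 3 can be sketched as a small function. The box plot itself is not reproduced here, so the quartile values in the example call are hypothetical placeholders and must be replaced with the values read from the diagram.

```python
def outlier_bounds(lower_quartile, upper_quartile, multiple=1.5):
    """Return (lower bound, upper bound); points outside these are outliers."""
    iqr = upper_quartile - lower_quartile
    return lower_quartile - multiple * iqr, upper_quartile + multiple * iqr


# Hypothetical quartiles, for illustration only (not taken from the box plot).
low, high = outlier_bounds(20, 30)
print(low, high)
```
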
4. A student has access to the following diagrams for a set of data: histogram, frequency polygon, box plot, cumulative frequency plot, and stem and leaf diagram. Suggest which diagram would best allow the student to:
  1. find the median and quartiles;
  2. find the 90th percentile of the values.
5. The following data shows the number of sweets eaten by students in one evening, $x$, against the number of minutes they spent doing homework, $y$.

| $x$ | $3$ | $5$ | $8$ | $11$ | $15$ |
|---|---|---|---|---|---|
| $y$ | $15$ | $12$ | $10$ | $7$ | $2$ |

  1. Draw a scatter graph for the data.
  2. A new student was found to have eaten 10 sweets. Estimate the amount of time they spent doing homework.
  3. Another student ate 30 sweets. Suggest why it would be unreasonable to draw conclusions about the amount of time they spent doing homework based on the data collected so far.
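
The difference between parts 2 and 3 of question 5 can be illustrated numerically. The sketch below assumes a least-squares line of $y$ on $x$, which is only one way of drawing a line of best fit (a line drawn by eye will give a slightly different estimate). The helper names are illustrative.

```python
# Data copied from the table in question 5.
sweets = [3, 5, 8, 11, 15]      # x
minutes = [15, 12, 10, 7, 2]    # y


def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gradient = sxy / sxx
    return gradient, mean_y - gradient * mean_x


gradient, intercept = least_squares_line(sweets, minutes)


def predict(x):
    """Predicted homework time for a student who ate x sweets."""
    return intercept + gradient * x


def in_data_range(x):
    """True when x lies within the observed range (interpolation)."""
    return min(sweets) <= x <= max(sweets)


for x in (10, 30):
    label = "interpolation" if in_data_range(x) else "extrapolation"
    print(x, round(predict(x), 2), label)
```

At 10 sweets the prediction lies inside the observed range. At 30 sweets the same line returns a negative number of minutes, which shows how the trend stops being meaningful outside the range of the data.
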
6. The equations of two lines of best fit are given as follows: $$a = 0.2b + 3$$ $$c = 0.4 - 0.8d$$
  1. Which of the two pairs of variables is positively correlated?
  2. A student claims that $c$ and $d$ have a greater correlation coefficient between them because of these lines of best fit. Is the student correct?
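
For part 2 of question 6, it can help to see that the gradient of a line of best fit does not fix the correlation coefficient. The data below are invented purely for illustration: the first set lies exactly on $a = 0.2b + 3$, while the second is scattered about $c = 0.4 - 0.8d$ in a way chosen so that the least-squares line is unchanged.

```python
import math


def summary_sums(xs, ys):
    """Return (Sxx, Syy, Sxy) for the paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxx, syy, sxy


def gradient(xs, ys):
    """Gradient of the least-squares line of y on x."""
    sxx, _, sxy = summary_sums(xs, ys)
    return sxy / sxx


def pearson_r(xs, ys):
    """Product moment correlation coefficient."""
    sxx, syy, sxy = summary_sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)


# Invented data: b and a lie exactly on a = 0.2b + 3.
b = [0, 1, 2, 3]
a = [0.2 * v + 3 for v in b]

# Invented data: c is scattered about c = 0.4 - 0.8d. The scatter sums to
# zero and is uncorrelated with d, so the fitted line is still 0.4 - 0.8d.
d = [0, 1, 2, 3]
scatter = [1, -1, -1, 1]
c = [0.4 - 0.8 * v + e for v, e in zip(d, scatter)]

print(round(gradient(b, a), 3), round(pearson_r(b, a), 3))
print(round(gradient(d, c), 3), round(pearson_r(d, c), 3))
```

The steeper line belongs to the more weakly correlated data in this example, so the equations alone say nothing about the size of the coefficient, only its sign.
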
7. The following diagram shows two sets of data, one plotted as crosses and one as circles. Suggest the value of the correlation coefficient you would expect for the data shown as:
  1. crosses;
  2. circles.
8. The histogram below summarizes the number of hours people said they spend watching TV in a typical weekend. A total of 360 people said they spend between 5 and 8 hours watching TV.
  1. Estimate the number of people who spend between 1.5 and 3.5 hours watching TV.
  2. What assumption did you have to make in order to make this estimate?
9.

| $n$ | Frequency |
|---|---|
| $0 < n \leqslant 10$ | $60$ |
| $10 < n \leqslant 20$ | $150$ |
| $20 < n \leqslant 30$ | $250$ |
| $30 < n \leqslant 40$ | $330$ |
| $40 < n \leqslant 50$ | $270$ |
| $50 < n \leqslant 60$ | $110$ |
| $60 < n \leqslant 70$ | $30$ |

  1. Use linear interpolation to calculate the interquartile range of the data.
  2. A student calculates the interquartile range of the data by drawing a cumulative frequency graph and joining the points with a smooth curve. By drawing your own cumulative frequency graph, or otherwise, explain how this affects the result.
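
Linear interpolation for part 1 of question 9 can be sketched as follows. The code assumes the common grouped-data convention of placing the lower and upper quartiles at the $\tfrac{1}{4}N$-th and $\tfrac{3}{4}N$-th values, where $N$ is the total frequency. Other conventions shift the answer slightly. The function name is illustrative.

```python
# (lower bound, upper bound, frequency), copied from the table in question 9.
classes = [
    (0, 10, 60), (10, 20, 150), (20, 30, 250), (30, 40, 330),
    (40, 50, 270), (50, 60, 110), (60, 70, 30),
]


def interpolated_quantile(classes, proportion):
    """Estimate a quantile by assuming values are spread evenly in each class."""
    total = sum(freq for _, _, freq in classes)
    target = proportion * total          # position of the quantile
    cumulative = 0
    for lower, upper, freq in classes:
        if cumulative + freq >= target:
            fraction = (target - cumulative) / freq
            return lower + fraction * (upper - lower)
        cumulative += freq
    raise ValueError("proportion must lie between 0 and 1")


q1 = interpolated_quantile(classes, 0.25)
q3 = interpolated_quantile(classes, 0.75)
iqr = q3 - q1
print(round(q1, 2), round(q3, 2), round(iqr, 2))
```

Linear interpolation is equivalent to joining the cumulative frequency points with straight lines, which is why a smooth curve through the same points can give slightly different quartiles.
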
10. The number of apples, $x$, that each of 8 people eats in a week is plotted against the number of bananas, $y$, that the same people eat in a week in the following scatter diagram. The 8 people in the diagram are made up of some children and some adults. Suggest, with a reason, how many children and adults there are.
11. The line of best fit for both the crosses and circles is $$y = 0.9x + 1.1$$ and the correlation coefficient is $0.75$. A student decides the circles are outliers and removes them from the data.
  1. What happens to the gradient of the line of best fit?
  2. What happens to the correlation coefficient?
12. The amount of water, $y$, some plants take in over time, $x$, is given below:

| $x$ | $2$ | $3$ | $5$ | $9$ | $15$ | $25$ |
|---|---|---|---|---|---|---|
| $y$ | $9$ | $11$ | $15$ | $19$ | $21$ | $22$ |

  1. Draw a scatter graph for this data.
  2. Use the scatter graph to estimate the amount of water a plant takes in after 20 minutes.
  3. A student decides to calculate the amount of water a plant takes in after 20 minutes by using a line of best fit in the form $y = mx+c$. Explain why this estimate is likely to be significantly different from your estimate.
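
One way to examine whether a straight line suits the data in question 12 is to look at the residuals (observed value minus fitted value). The question does not say how the line $y = mx + c$ is obtained, so the sketch below assumes ordinary least squares. The helper names are illustrative.

```python
# Data copied from the table in question 12.
time = [2, 3, 5, 9, 15, 25]        # x
water = [9, 11, 15, 19, 21, 22]    # y


def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gradient = sxy / sxx
    return gradient, mean_y - gradient * mean_x


m, c = least_squares_line(time, water)

# Residual = observed y minus the value the line gives at the same x.
residuals = [y - (m * x + c) for x, y in zip(time, water)]
signs = ["+" if r > 0 else "-" for r in residuals]

for x, r, s in zip(time, residuals, signs):
    print(x, round(r, 2), s)
```

Residuals that are negative at both ends and positive in the middle indicate that the points curve away from any straight line, so a linear model describes this relationship poorly.
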