Do you think you can estimate the cover of plants as well as the experts?
I’m sure you can. When we all put our minds to it, we can do it really well.
A few years ago, sixteen experienced botanists were asked to estimate the cover of spinifex (Triodia) tussocks in a patch of Mallee. Each botanist was asked to estimate the cover to the closest 5% – e.g. 20%, 25%, 30% and so on – so they didn’t have to get it exactly right. Each did it privately and didn’t know what the others were going to say.
How variable do you think their estimates were? For example, if the real cover of spinifex was 35%, what do you think the lowest and highest estimates were? Would most estimates have been 30% or 40%, or would the range have been much wider, or perhaps narrower?
Don’t read on. Stop and guess. If the real cover was 35%, what do you think the lowest and highest estimates were?
Now look at the chart below.
The sixteen estimates ranged from 20% to 60%. Four botanists thought spinifex covered 20% of the area while two thought it covered 60%. The highest estimate was three times the lowest. That’s an enormous range.
I don’t want to imply that I’d do any better than the sixteen. From what the literature says about experts (and old white guys in particular), I’d be as bad as the next person and probably worse. Regardless of my personal abilities, the take-home message from the chart is: we don’t seem to be very good at our job, do we?
You might be wondering, ‘what was the right answer?’ The author of the paper that contained the graph didn’t say. For the moment, the right answer is somewhat irrelevant (although I’ll show you how to work it out below). The point is, visual estimates of plant cover – by experts and novices alike – are notoriously inaccurate.
When does it matter?
Field botanists are always estimating the cover of plants in quadrats. We estimate cover in thousands of surveys, experiments and long-term monitoring programs. Fortunately, it doesn’t always matter if our cover estimates are insanely bad. We’re often more interested in the presence and absence of different species than in the cover of each species.
For example, imagine we sampled ten quadrats in patches of spinifex mallee and ten quadrats in nearby wetlands for a vegetation survey. If we classified the data to describe the floristic vegetation types (or communities or associations), we’d get much the same result if we used cover values or presence/absence data, as the two communities are so distinct. Either way, we’d conclude that two distinct communities were present (Spinifex mallee and Wetlands), each containing a different group of species.
Errors in estimating cover create a much bigger problem when we monitor how vegetation changes over time. Let’s look at the Spinifex data again. Imagine the area was first surveyed by one of the botanists who thought that spinifex covered 60% of the area. Many years later it was surveyed again, but this time by one of the botanists who thought the cover was 20%. If the vegetation hadn’t changed at all, their results would suggest that spinifex cover plummeted from 60% to 20% during the period. Even if cover actually increased by 20%, they might still suggest it was in free-fall. Spinifex tussocks provide important habitat for many animals, so if two-thirds of the cover disappears it’s a big issue, provided it’s real.
The simple spinifex study (and many other studies conducted over decades) suggests that – if a single observer visually estimates plant cover at each point in time, and different observers assess cover at different points in time – then it’s pretty much a complete waste of time to calculate changes in cover over time, as apparent changes are as likely to be due to ‘observer errors’ as to anything real.
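To get a feel for how often observer error alone could masquerade as change, here is a minimal sketch in Python. It assumes the vegetation did not change at all, and that the first and second surveys were done by two different botanists drawn at random from the sixteen (their tallies are the ones reported later in this post). The 10-point threshold is a hypothetical cut-off chosen for illustration, not a value from the study.

```python
from itertools import permutations

# Tally of the sixteen estimates reported in this post: {cover %: number of botanists}
tallies = {20: 4, 25: 1, 30: 2, 35: 2, 40: 4, 50: 1, 60: 2}
estimates = [cover for cover, n in tallies.items() for _ in range(n)]

# Every ordered (first survey, second survey) pairing of two different botanists.
pairs = list(permutations(estimates, 2))

# Apparent change in percentage points, even though the real change is zero.
apparent_changes = [second - first for first, second in pairs]

threshold = 10  # hypothetical cut-off for a "noteworthy" change
share_spurious = sum(abs(c) >= threshold for c in apparent_changes) / len(pairs)

print(f"Pairings considered: {len(pairs)}")
print(f"Largest apparent change: {max(apparent_changes)} points")
print(f"Share showing a change of {threshold}+ points: {share_spurious:.1%}")
```

Under these assumptions, roughly seven in ten pairings would report a change of 10 percentage points or more in a patch where nothing had changed.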
Why are our estimates so bad?
We humans are terrible at estimating heaps of stuff, not just the cover of plants: how many lollies in the jar, how likely we are to win the lotto, how many beans make five, and how long is a piece of string, just for starters.
Freakonomics recently made a fabulous podcast on how bad experts (and novices) are at forecasting and predicting. The entertaining show includes: music songlists, crop forecasts, the stock market; witches, turkeys, and other experts. If you have an hour to spare, it’s a really good show.
It turns out that, in some fields, experts are much better than novices at estimating and predicting things, as they continue to refine their skills with practice. In these fields, experts really do refine their expertise. In other fields, experts think they are much better than novices, but they’re actually really bad at it; some are even worse than novices. The Freakonomics podcast gives lots of great examples.
An important factor that determines whether or not practice makes perfect (for all of us, not just experts) is the Triple F function: whether we get Fast & Frequent Feedback. Our estimates improve when we receive lots of accurate feedback to re-calibrate ourselves. Picture the conversation. Jim proclaims, ‘I reckon the cover is 20%.’ Julie replies, ‘Nup, cold.’ Jim: ‘30!’ Julie: ‘Getting warm.’ Jim: ‘50?’ Julie: ‘You’re frigid.’ Jim: ‘40?’ Julie: ‘Yep, perfect.’ Jim: ‘Wow, that was terrible. Now it’s your turn…’
Of course, to have such a conversation, someone has to measure the cover in the first place. It’s easy to measure cover more accurately using point quadrats, line transects and other methods, but accurate measurements are slower than eyeballing, which is why they aren’t used as often as they should be.
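For readers who haven’t used point-based methods, the arithmetic is simple: cover is the proportion of sample points that hit the plant. The sketch below is a hypothetical helper (`point_intercept_cover` is my own name, and the counts are made up), and it assumes the points are independent, which closely spaced points along a single transect may not be.

```python
import math

def point_intercept_cover(hits: int, points: int) -> tuple[float, float]:
    """Return (cover %, approximate standard error %) from point-intercept counts."""
    if points <= 0 or not 0 <= hits <= points:
        raise ValueError("need points > 0 and 0 <= hits <= points")
    p = hits / points
    se = math.sqrt(p * (1 - p) / points)  # binomial standard error of a proportion
    return 100 * p, 100 * se

# Hypothetical counts: 35 of 100 points landed on spinifex.
cover, se = point_intercept_cover(hits=35, points=100)
print(f"cover = {cover:.1f}% (standard error about {se:.1f} points)")
```

The standard error shrinks as more points are sampled, which is the trade-off mentioned above: more accurate numbers cost more time in the field.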
Cruisin’ without a speedo
We all need feedback to improve our estimates. That’s why cars have speedometers. To make important estimates without frequent feedback is like driving a stranger’s fast car at night without a speedo; and then insisting, ‘Officer, I’m very certain that I was doing only 97 km/hr.’
I wonder how often most field ecologists calibrate their estimates against accurate measurements? Not very often, I suspect. We rely on visual estimates because they’re quick and cheap, because no one checks our numbers, and because we were trained to (by people like me). Perhaps that’s why the sixteen estimates of spinifex cover ranged from 20% to 60%.
A collective solution
Humans are weird. We’re awful at estimating things on our own, but collective estimates made by groups of people can be extremely accurate. It’s one of the few times in life when the aphorism – Sh!@ In, Sh!@ Out – doesn’t apply. When you throw lots of Sh!@ into the mix, something awesome comes out the other end. It’s called the Truth, or something closer to the Truth than most individuals can reliably generate.
There are some freaky examples of how accurate the estimates made by groups can be, including this one from 1907…
the group average of multiple judgements tends to be very close to the truth, because random and systematic errors of individuals tend to cancel each other out. This statistical sampling phenomenon is remarkably robust. On examining 800 estimates of the weight of a fat ox at a country fair in England, Francis Galton (1907) marvelled that the median (and mean) was within 1% of the true value, outperforming most participants and even the best cattle experts in the crowd, a phenomenon known as the Wisdom of Crowds (Wintle et al. 2013, p. 55).
Of course, crowds aren’t always wise (as social media demonstrated after the Boston Marathon). So our question is: does the Wisdom of the Crowds apply to estimates of plant cover? Do groups of people generate more accurate estimates of plant cover than single observers? This simple question has practical repercussions. If you want to set up a long-term monitoring project, should you ask a group of people or a single observer to assess all of the plots?
A more confronting question is, do groups of relatively inexperienced observers generate more accurate estimates of plant cover than a single expert? Should you rent-a-crowd or employ a single ecologist to assess plant cover? (I’m not trying to diss my peers here; plant cover is intrinsically hard to estimate, and experienced ecologists have many other important skills, like identifying species correctly).
Fortunately, we do not require 800 people at a country fair to see an improvement in judgement. The average judgement from two people is better than one (Soll & Larrick 2009), and even the average of two judgements from a single person tends to be closer to the truth over the long run than adopting a single estimate (Herzog & Hertwig 2009) (Wintle et al. 2013, p. 55).
This answers the first question (groups beat individuals, as many estimates are better than one) but what about the second? Can an inexperienced group generate more accurate estimates of plant cover than a single expert? The evidence for this is more equivocal, as Wintle and colleagues compared the performance of groups against the best performing member within each group, rather than comparing groups against independently identified ‘experts’. Nevertheless, within this context, they again found that crowds rule:
[In an experiment estimating percentage cover] our results show that group averages perform better than the best performing member of the group over the long run, and averages were remarkably close to true values (Wintle et al. 2013, p. 61).
Over the long run, groups performed better than any individual, as no one was always ‘the best’; some people estimated cover well at some plots, while others performed better at other plots.
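Why averaging helps can be illustrated with a small simulation. This is a sketch under loud assumptions, not a re-analysis of anyone’s data: the true cover, the spread of individual errors and the number of trials are all hypothetical, and every observer is assumed to be unbiased and independent. Averaging cannot cancel a bias that the whole group shares.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

TRUE_COVER = 35.0   # hypothetical true cover (%)
OBSERVER_SD = 12.0  # hypothetical spread of individual errors (percentage points)
TRIALS = 10_000     # number of simulated plots

def one_estimate() -> float:
    """One observer's estimate: truth plus unbiased random error, clipped to 0-100%."""
    return min(100.0, max(0.0, random.gauss(TRUE_COVER, OBSERVER_SD)))

def mean_abs_error(group_size: int) -> float:
    """Average absolute error of the group mean across many simulated plots."""
    errors = []
    for _ in range(TRIALS):
        group_mean = statistics.fmean(one_estimate() for _ in range(group_size))
        errors.append(abs(group_mean - TRUE_COVER))
    return statistics.fmean(errors)

results = {n: mean_abs_error(n) for n in (1, 2, 16)}
for n, err in results.items():
    print(f"group of {n:>2}: typical error {err:.1f} points")
```

With these settings the typical error falls as the group grows, in line with the finding that two judgements beat one and a larger group does better still.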
The real spinifex story
Now it’s your turn, dear reader. No more passive blog consumption, it’s time to think.
Quiz Question #1: What was the real cover of spinifex in the mallee?
I know you weren’t there, neither was I. But that doesn’t matter. You can calculate a more reliable estimate of the real cover of spinifex than each botanist achieved in the field. Go for it.
How do you do it? Just call on the Wisdom of the Crowd: the crowd of expert botanists. Individually, their 16 estimates varied wildly. But the average of all of the estimates should land close to the true cover, as Galton found for the fair-ground ox.
To estimate the true cover of spinifex, simply calculate the average of all of the individual estimates: [(20% × 4) + (25% × 1) + (30% × 2) + (35% × 2) + (40% × 4) + (50% × 1) + (60% × 2)] divided by 16 botanists = 565% ÷ 16 ≈ 35% cover.
No one measured the real cover value on the day. In most monitoring activities, no one ever does; it’s just eyeballed. Each eyeballed estimate is dodgy, but the average from all the dodgy eyeballs tends to be ‘remarkably close to true values’, as Francis Galton, Bonnie Wintle and many others have demonstrated.
[Nerd alert: In large groups the median provides a more reliable estimate of the real value than does the mean, but the distinction is trifling here. By coincidence, the mean and median are both 35% in this example].
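If you’d rather not trust my arithmetic, the mean and median can be checked in a few lines of Python. The tallies are the ones listed in the calculation above; the variable names are just my own.

```python
import statistics

# Tally of the sixteen estimates: {cover %: number of botanists}
tallies = {20: 4, 25: 1, 30: 2, 35: 2, 40: 4, 50: 1, 60: 2}

# Expand the tally into one value per botanist.
estimates = [cover for cover, n in tallies.items() for _ in range(n)]

mean_cover = statistics.fmean(estimates)
median_cover = statistics.median(estimates)

print(len(estimates), sum(estimates))  # 16 565
print(mean_cover)                      # 35.3125
print(median_cover)                    # 35.0
```

The mean is 35.3%, which rounds to the 35% quoted above, and the median is exactly 35%.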
At this point I’m sure that everybody – bar the control freaks – is busting to ask: how can we make sure that group estimates aren’t hijacked by strong-minded individuals? After all, every group has a control freak, and the boss can’t be wrong.
The solution is simple. Kill consensus. In the Spinifex mallee, Galton’s fairground and Wintle’s experiments, every participant made their decision privately, without discussion and without disclosing their individual view. The anonymous scores were then averaged. The control freaks and bosses had the same influence as everybody else. The wisdom of the crowd emerged from a blind ballot, not mediated consensus.
For decades now, field ecologists – trained by educators like me – have been cruising in the dark without a speedo, every time we eyeball the cover of plants. We can do lots of things to lift our game, and we don’t have to dump visual estimates completely. We can:
- Hand in our licence. We can fix the problem by refusing to eyeball plant cover. Instead, we can use presence/absence data and compare the number of quadrats that species occur in. (This works best for common species and when lots of quadrats are sampled).
- Take a speed check. We can take regular check-ups, to compare our visual estimates against more accurate approaches such as point quadrats. Continual re-calibration can refine our visual estimates and make sure they don’t drift off the scale.
- Take the bus. We can re-calibrate our estimates by working in groups as often as possible, and learning from the average values that the groups provide. Bonnie Wintle’s paper provides more information on how to improve estimates based on feedback from groups.
Whichever option we choose, our first task is to acknowledge that we have a problem; we need a speedo. Our second task is to accept that we can improve. Many dodgy eyeballs see better than one. So no more covering for lone individualists.
Many thanks to Bonnie Wintle and John Morgan for providing feedback that improved the accuracy of earlier drafts, and to Dale Nimmo for providing Sarah Avitabile’s photo of spinifex in the Mallee.