Do you think you can estimate the cover of plants as well as the experts? I’m sure you can. When we all put our minds to it, we can do it really well.
A few years ago, sixteen experienced botanists were asked to estimate the cover of spinifex (Triodia) tussocks in a patch of Mallee. Each botanist was asked to estimate the cover to the closest 10% – e.g. 20%, 30%, 70% and so on – so they didn’t have to get it exactly right. Each did it privately and didn’t know what the others were going to say.
How variable do you think their estimates were? For example, if the real cover of spinifex was 35%, what do you think the lowest and highest estimates were? Would most estimates have been 30% or 40%, or would the range have been much wider, or perhaps narrower?
Don’t read on. Stop and guess. If the real cover was 35%, what do you think the lowest and highest estimates were? Now look at the chart below.
The sixteen estimates ranged from 20% to 60%. Four botanists thought spinifex covered 20% of the area while two thought it covered 60%. The highest estimate was three times the lowest. That’s an enormous range.
I don’t want to imply that I’d do any better than the sixteen. From what the literature says about experts (and old white guys in particular), I’d be as bad as the next person, and probably worse. Regardless of my personal abilities, the take-home message from the chart is: we don’t seem to be very good at our job, do we?
You might be wondering, ‘what was the right answer?’ The author of the paper that contained the graph didn’t say. For the moment, the right answer is somewhat irrelevant (although I’ll show you how to work it out below). The point is, visual estimates of plant cover – by experts and novices alike – are notoriously inaccurate.
When does it matter?
Field botanists are always estimating the cover of plants in quadrats. We estimate cover in thousands of surveys, experiments and long-term monitoring programs. Fortunately, it doesn’t always matter if our cover estimates are insanely bad. We’re often more interested in the presence and absence of different species than in the cover of each species.
For example, imagine we sampled ten quadrats in patches of spinifex mallee and ten quadrats in nearby wetlands for a vegetation survey. If we classified the data to describe the floristic vegetation types (or communities or associations), we’d get much the same result if we used cover values or presence / absence data, as the two communities are so distinct. Either way, we’d conclude that two distinct communities were present (Spinifex mallee and Wetlands), each containing a different group of species.
Errors in estimating cover create a much bigger problem when we monitor how vegetation changes over time. Let’s look at the spinifex data again. Imagine the area was first surveyed by one of the botanists who thought that spinifex covered 60% of the area. Many years later it was surveyed again, but this time by one of the botanists who thought the cover was 20%. If the vegetation hadn’t changed at all, their results would suggest that spinifex cover plummeted from 60% to 20% during the period. Even if cover had actually increased over the period, their estimates might still suggest it was in free-fall. Spinifex tussocks provide important habitat for many animals, so if two-thirds of the cover disappears it’s a big issue – if it’s real.
The simple spinifex study (and many other studies conducted over decades) suggests that – if a single observer visually estimates plant cover at each point in time, and different observers assess cover at different points in time – then it’s pretty much a complete waste of time to calculate changes in cover over time, as apparent changes are as likely to be due to ‘observer errors’ as to anything real.
Why are our estimates so bad?
We humans are terrible at estimating heaps of stuff, not just the cover of plants: how many lollies in the jar, how likely we are to win the lotto, how many beans make five, and how long is a piece of string, just for starters.
Freakonomics recently made a fabulous podcast on how bad experts (and novices) are at forecasting and predicting. The entertaining show covers music playlists, crop forecasts, the stock market, witches, turkeys and other experts. If you have an hour to spare, it’s a really good show.
It turns out that, in some fields, experts are much better than novices at estimating and predicting things, because they continue to refine their skills with practice. In other fields, experts think they are much better than novices, but they’re actually really bad at it; some are even worse than novices. The Freakonomics podcast gives lots of great examples.
An important factor that determines whether or not practice makes perfect (for all of us, not just experts) is the Triple F function: whether we get Fast & Frequent Feedback. Our estimates improve when we receive lots of accurate feedback to re-calibrate ourselves. Picture the conversation. Jim proclaims, ‘I reckon the cover is 20%.’ Julie replies, ‘Nup, cold.’ Jim: ‘30!’ Julie: ‘Getting warm.’ Jim: ‘50?’ Julie: ‘You’re frigid.’ Jim: ‘40?’ Julie: ‘Yep, perfect.’ Jim: ‘Wow, that was terrible. Now it’s your turn…’
Of course, to have such a conversation, someone has to measure the cover in the first place. It’s easy to measure cover more accurately using point quadrats, line transects and other methods, but accurate measurements are slower than eyeballing, which is why they aren’t used as often as they should be.
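To make the point-quadrat idea concrete, here is a minimal sketch in Python. Everything in it – the tussock layout, radii and function names – is invented for illustration, not drawn from any real survey: scatter pins at random over a plot and record the fraction that land on vegetation.

```python
import random

random.seed(1)

# Hypothetical plot: 20 tussocks, each a circle (x, y, radius) in a 1 x 1 plot.
# The layout and radii are invented for illustration only.
tussocks = [(random.random(), random.random(), 0.08) for _ in range(20)]

def covered(x, y):
    """True if the point (x, y) falls under at least one tussock canopy."""
    return any((x - cx) ** 2 + (y - cy) ** 2 <= r ** 2 for cx, cy, r in tussocks)

def point_quadrat_cover(n_pins):
    """Estimate cover as the fraction of randomly placed pins that hit vegetation."""
    hits = sum(covered(random.random(), random.random()) for _ in range(n_pins))
    return hits / n_pins

# More pins give a tighter estimate of the true cover, with no eyeballing involved.
print(point_quadrat_cover(10_000))
```

The same logic underlies a real point-quadrat frame: the precision of the estimate is set by the number of pins, not by the observer’s eye.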
Cruisin’ without a speedo
We all need feedback to improve our estimates. That’s why cars have speedometers. To make important estimates without frequent feedback is like driving a stranger’s fast car at night without a speedo; and then insisting, ‘Officer, I’m very certain that I was doing only 97 km/hr.’
I wonder how often most field ecologists calibrate their estimates against accurate measurements? Not very often, I suspect. We rely on visual estimates because they’re quick and cheap, no one checks our numbers, and we were trained to use them (by people like me). Perhaps that’s why the sixteen estimates of spinifex cover ranged from 20% to 60%.
A collective solution
Humans are weird. We’re awful at estimating things on our own, but collective estimates made by groups of people can be extremely accurate. It’s one of the few times in life when the aphorism – Sh!@ In, Sh!@ Out – doesn’t apply. When you throw lots of Sh!@ into the mix, something awesome comes out the other end. It’s called the Truth, or something closer to the Truth than most individuals can reliably generate.
There are some freaky examples of how accurate the estimates made by groups can be, including this one from 1907:
… the group average of multiple judgements tends to be very close to the truth, because random and systematic errors of individuals tend to cancel each other out. This statistical sampling phenomenon is remarkably robust. On examining 800 estimates of the weight of a fat ox at a country fair in England, Francis Galton (1907) marvelled that the median (and mean) was within 1% of the true value, outperforming most participants and even the best cattle experts in the crowd, a phenomenon known as the Wisdom of Crowds. (Wintle et al. 2013, p. 55)
Of course, crowds aren’t always wise (as social media demonstrated after the Boston Marathon). So our question is: does the Wisdom of the Crowds apply to estimates of plant cover? Do groups of people generate more accurate estimates of plant cover than single observers? This simple question has practical repercussions. If you want to set up a long-term monitoring project, should you ask a group of people or a single observer to assess all of the plots?
A more confronting question is, do groups of relatively inexperienced observers generate more accurate estimates of plant cover than a single expert? Should you rent-a-crowd or employ a single ecologist to assess plant cover? (I’m not trying to diss my peers here; plant cover is intrinsically hard to estimate, and experienced ecologists have many other important skills, like identifying species correctly).
In a new paper, Bonnie Wintle and colleagues from the University of Melbourne did two great experiments to answer these questions (Wintle et al. 2013). Their results reinforce earlier findings:
Fortunately, we do not require 800 people at a country fair to see an improvement in judgement. The average judgement from two people is better than one (Soll & Larrick 2009), and even the average of two judgements from a single person tends to be closer to the truth over the long run than adopting a single estimate (Herzog & Hertwig 2009). (Wintle et al. 2013, p. 55)
This answers the first question (groups beat individuals, as many estimates are better than one) but what about the second? Can an inexperienced group generate more accurate estimates of plant cover than a single expert? The evidence for this is more equivocal, as Wintle and colleagues compared the performance of groups against the best performing member within each group, rather than comparing groups against independently identified ‘experts’. Nevertheless, within this context, they again found that crowds rule:
[In an experiment estimating percentage cover] our results show that group averages perform better than the best performing member of the group over the long run, and averages were remarkably close to true values. (Wintle et al. 2013, p. 61)
Over the long run, groups performed better than any individual, as no one was always ‘the best’; some people estimated cover well at some plots, while others performed better at other plots.
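That ‘no one is always the best’ effect is easy to reproduce in a toy simulation. Everything below – the observer model, the bias and noise levels, the number of plots – is invented for illustration, not taken from Wintle et al.: give each observer a fixed personal bias plus plot-to-plot noise, and the group average’s mean error can never exceed the average individual’s (a triangle-inequality fact), and over many plots it tends to rival or beat even the best observer.

```python
import random

random.seed(42)

N_PLOTS, N_OBSERVERS = 50, 16
true_covers = [random.uniform(10, 70) for _ in range(N_PLOTS)]

# Toy observer model: a fixed personal bias plus independent plot-to-plot noise.
# Bias and noise magnitudes are illustrative assumptions only.
biases = [random.gauss(0, 8) for _ in range(N_OBSERVERS)]

def observe(true_cover, bias):
    return true_cover + bias + random.gauss(0, 5)

estimates = [[observe(t, b) for b in biases] for t in true_covers]

# Mean absolute error of the group average, and of each individual observer.
group_err = sum(abs(sum(row) / N_OBSERVERS - t)
                for row, t in zip(estimates, true_covers)) / N_PLOTS
indiv_errs = [sum(abs(estimates[p][i] - true_covers[p]) for p in range(N_PLOTS)) / N_PLOTS
              for i in range(N_OBSERVERS)]

print(f"group average error: {group_err:.1f} percentage points")
print(f"best individual:     {min(indiv_errs):.1f}")
print(f"worst individual:    {max(indiv_errs):.1f}")
```

Re-running with different seeds changes the numbers, but the pattern holds: the individual biases largely cancel in the average, which is exactly the mechanism Galton and Wintle describe.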
The real spinifex story
Now it’s your turn, dear reader. No more passive blog consumption, it’s time to think. Quiz Question #1: What was the real cover of spinifex in the mallee? I know you weren’t there, neither was I. But that doesn’t matter. You can calculate a more reliable estimate of the real cover of spinifex than each botanist achieved in the field. Go for it.
How do you do it? Just call on the Wisdom of the Crowd – the crowd of expert botanists. Individually, their 16 estimates varied wildly, but the average of all the estimates should converge on the true cover, as Galton found for the fairground ox.
To estimate the true cover of spinifex, simply calculate the average of all the individual estimates: ((20% × 4) + (25% × 1) + (30% × 2) + (35% × 2) + (40% × 4) + (50% × 1) + (60% × 2)) ÷ 16 botanists = 565% ÷ 16 ≈ 35% cover.
No one measured the real cover value on the day. In most monitoring activities, no one ever does, it’s just eyeballed. Each eyeballed estimate is dodgy, but the average from all the dodgy eyeballs is ‘remarkably close to true values’, as Francis Galton, Bonnie Wintle and many others have demonstrated.
[Nerd alert: In large groups the median provides a more reliable estimate of the real value than does the mean, but the distinction is trifling here. By coincidence, the mean and median are both 35% in this example].
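The arithmetic above is easy to check in a couple of lines of Python, with the 16 estimates read off the chart:

```python
from statistics import mean, median

# The sixteen botanists' estimates: four 20s, one 25, two 30s, two 35s,
# four 40s, one 50 and two 60s.
estimates = [20]*4 + [25] + [30]*2 + [35]*2 + [40]*4 + [50] + [60]*2

print(mean(estimates))    # 35.3125 -> roughly 35% cover
print(median(estimates))  # 35.0
```

For larger groups, swapping `mean` for `median` also guards against the odd wild outlier, as the nerd alert above notes.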
At this point I’m sure that everybody – bar the control freaks – is busting to ask: how can we make sure that group estimates aren’t hijacked by strong-minded individuals? After all, every group has a control freak, and the boss can’t be wrong.
The solution is simple. Kill consensus. In the Spinifex mallee, Galton’s fairground and Wintle’s experiments, every participant made their decision privately, without discussion and without disclosing their individual view. The anonymous scores were then averaged. The control freaks and bosses had the same influence as everybody else. The wisdom of the crowd emerged from a blind ballot, not mediated consensus.
For decades now, field ecologists – trained by educators like me – have been cruising in the dark without a speedo, every time we eyeball the cover of plants. We can do lots of things to lift our game, and we don’t have to dump visual estimates completely. We can:
- Hand in our licence. We can fix the problem by refusing to eyeball plant cover. Instead, we can use presence/absence data and compare the number of quadrats that species occur in. (This works best for common species and when lots of quadrats are sampled).
- Take a speed check. We can run regular check-ups to compare our visual estimates against more accurate approaches such as point quadrats. Continual re-calibration can refine our visual estimates and make sure they don’t drift off the scale.
- Take the bus. We can re-calibrate our estimates by working in groups as often as possible, and learning from the average values that the groups provide. Bonnie Wintle’s paper provides more information on how to improve estimates based on feedback from groups.
Whichever option we choose, our first task is to acknowledge that we have a problem: we need a speedo. Our second task is to accept that we can improve. Many dodgy eyeballs see better than one. So no more covering for lone individualists.
Many thanks to Bonnie Wintle and John Morgan for providing feedback that improved the accuracy of earlier drafts, and to Dale Nimmo for providing Sarah Avitabile’s photo of spinifex in the Mallee.
Thanks again Ian for another great read! This has really got me thinking on a slightly different trajectory – still related (I hope). Imagine a forum that could facilitate the “many dodgy eyeballs” for Environmental Assessments (EA)!
There must be hundreds of residential, commercial and infrastructure developments going on all around Australia every month, all requiring an EA in some form. Each EA requires one or two people to prepare it and another to approve it. Each EA presents a snapshot of the proposed site and identifies the possible impacts the development may have. In such cases, there are not many eyeballs evaluating the proposed impacts to the site.
Perhaps if there was some facility that could make use of the broader environmental assessment community (or just a big dumb crowd) and increase the number of analysts, a closer-to-the-truth evaluation may be made. I’m aware that some big developments go for public exhibition, but not all.
I’m sure my comments are like most ideas: they sound great until the details are worked out. However, it is a great point you raise about the “bigger problem when we monitor how vegetation changes over time”. If an EA is undertaken by one set of eyes at one point in time, then the impact occurs, then a different observer assesses that impact – it may be a waste of time to evaluate the changes. As you suggest, using presence/absence data may prove better, if anyone cares to identify the changes post-development.
If there is such a forum that could facilitate the “many dodgy eyeballs” or the “big dumb crowd” with regard to EAs, please let me know! I believe that if we don’t get it right, we may be approving death-by-a-thousand-cuts, with dodgy data to state otherwise! Regards, Rus
Hello Rus, thanks for your comment. I suspect it might be prudent to first demonstrate the feasibility and reliability of crowd-sourced estimates in areas that aren’t subject to legal challenges. I can imagine that developers’ lawyers would have a field day in court otherwise 🙂 More broadly, I wonder how often it would be practical to get groups to come together regularly to monitor vegetation? We’d need coordinators with great organisational and motivational skills to make it work well. You highlight a great untapped potential. I guess we need to work out how to make it work practically and socially. I’m sure there are lots of readers with extension skills who could provide great insights on this. Thanks again Ian
Hi Ian / Russ
Another great blog, with much food for thought. Here’s a related take on the value of citizen science http://decision-point.com.au/images/DPoint_files/DPoint_73/dp73%20p12%20butt%20citizens.pdf suggesting that lots of ‘average quality data’ are as good as a little high quality data. Looks like the Szabo ref is well worth a read also. Cheers
Hi Tim, thanks very much for the links to two great short articles on the values of citizen science projects. Best wishes Ian
I guess ecology faces all sorts of human bias related obstacles that make it a little like wading through quicksand when compared to certain other sciences. The most objective scientific tools, like double blind trials, probably aren’t that applicable in ecology.
Also, I wonder how many ecologists with access to aerial photos of vegetation would correctly identify which of these two dot plots represents a pattern http://www.empiricalzeal.com/2012/12/21/what-does-randomness-look-like/pinker-glow-worms-and-stars-plot/
Not really intuitive, is it. Cheers 🙂
Hello Mel, thanks for writing in and thanks for the great link. (Mel’s link goes to two pictures which are from the following post, which is well worth a read after you’ve followed the link to the pictures): http://www.empiricalzeal.com/2012/12/21/what-does-randomness-look-like/
That’s a really great blog post. I won’t comment on the patterns, as I don’t want to spoil the fun for everyone else. Thanks again and best wishes Ian
Hi Ian, great post! One that many ecology teachers would empathize with.
Student: Well, what’s the answer? Teacher: I don’t know!
Another thing to mention is that the requirement for accuracy and precision depends on the job. For the purpose of defining vegetation types and mapping, inaccuracy and imprecision may not matter too much. But for monitoring change over time, more sensitive measurement is needed. cheers, Peter
Thanks Peter, I’m glad you enjoyed it. Thanks also for contributing to Bonnie Wintle’s paper and giving me such a great paper to write about too. Best wishes Ian
When I did my “primary survey” of the veg. of the Brisbane Ranges here in Victoria back in 1967, I only recorded presence or absence in the quadrats, precisely because I had no trust in my ability to estimate plant cover, particularly in the dense grassy understorey of the woodlands. Later I regretted that decision, as my estimates would at least have helped to sort out dominant species from the others, even if rather crudely.
Hi Neville, it’s a dilemma, isn’t it? We want to collect more data, even if we don’t trust it. If you had collected cover data, I wonder what people would have done with it later on. They would either analyse it, assuming that it was accurate, that a later dataset was also accurate, and that both were similarly ‘calibrated’; or they’d analyse the presence/absence data and thereby eliminate the cover problem. Given the big problems in estimating cover, the latter approach would be less problematic. I was recently involved in a project that compared data from the early 1970s to a new dataset, and we could see big changes based on the presence/absence data alone, which we used because we didn’t have the original cover data. I think that was probably a more appropriate comparison anyway, as we avoided problems with errors in estimating cover. Best wishes Ian
Hi Ian, I did do understorey cover using vertical pins later on in the Brisbane Ranges project – a typical grassy woodland site versus a typical shrubby woodland site. I must say it was hellish picking which grass spp. had touched a pin, and more hellish in the shrubby area when a pin “hit” a Xanthorrhoea canopy – the slightest wind and the pin contacts would change. As for the Xanth. “skirts”, counting ‘dead cover’ in them was impossible. All the same it was an interesting exercise.
Hi Ian, Thanks. Someone had to say it. I know a PhD student at ANU whose response to the challenge you identify has been to photograph each (0.25 sq m) quadrat and use the free program Samplepoint to determine cover. Another issue is what counts as cover. People differ in whether (vegetation/plant) cover includes lower plants and fungi, both of which have a disproportionate effect on soil loss.
Hello Don, thanks for your message. I’ve had a number of people tell me they’ve either changed their approach, or have at least thought a lot more critically, about how they assess cover, since I wrote this story. It’s great to see the blog having a positive influence on practice. The real credit of course goes to Bonnie Wintle and colleagues for doing their great study. I just passed it on. I hadn’t heard of the Samplepoint program, so thanks very much for the tip, it looks really useful. I’ll pass it on to my students too. If anyone else is interested, it’s available at: http://www.samplepoint.org/ . Thanks again and best wishes Ian
Thanks for the article Ian – I have been going through this exact issue in the classroom and field with my ecology students. On reflection, I think the answer is in embracing technology – photo-point monitoring over time, and the use of GPS cameras and other tools, which are not only accurate but can be referenced in the future (and are not subjective!).
Thanks Angie, yes there are a lot of new ways we could improve estimates. The main issue will probably be how fast or slow each method is. Any new and accurate methods that are very fast in the field (or lab) will prove very valuable. Thanks for your comment, best wishes Ian
How much experience had the botanists had? I just find that in their first few years, botanists over-estimate cover, then as they get experience and develop technique their estimates come down. Looking at those estimates it just makes me think that maybe 20% was closer to the truth.