How accurate are crowdsourced morphometricians?
Previously: Building a web-based image markup system
One of the main goals of my Encyclopedia of Life project is to speed up the collection of phenotypic data through crowdsourcing. However, we cannot expect a typical crowdsourced worker to have the domain-specific knowledge of an expert scientist. Does that difference matter when digitizing the shape of fishes?
To find out, I constructed an experiment in which crowdsourced Amazon Mechanical Turk workers digitized the same set of 5 images 5 times each. I then asked some expert fish morphologists to digitize the same images using the same instructions. This setup allowed me to examine how consistently each group placed its marks, and to compare the two groups to see whether their marks differed on average. The results are below:
Can you spot the difference between the two images? The top image shows landmarks averaged across several MTurk workers, while the bottom image is from a fish morphologist following the same protocol. The length of each line indicates the amount of error in each x,y direction.
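The per-landmark error shown by those lines can be estimated as the spread of repeated marks around their average position, computed separately for x and y. Here is a minimal sketch of that calculation in Python; the worker names and coordinates are made up for illustration, not data from the actual study:

```python
from statistics import mean, stdev

# Hypothetical data: each worker's (x, y) pixel coordinates for ONE landmark
# on ONE image, across repeated digitizations. Values are illustrative only.
marks = {
    "worker_a": [(101.0, 54.0), (102.5, 53.0), (100.5, 55.0)],
    "worker_b": [(99.0, 56.0), (103.0, 54.5), (101.5, 55.5)],
}

def per_axis_error(points):
    """Standard deviation of the x and y coordinates of repeated marks."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return stdev(xs), stdev(ys)

def group_mean(marks_by_worker):
    """Average landmark position for a group, for between-group comparison."""
    pooled = [p for pts in marks_by_worker.values() for p in pts]
    return mean(p[0] for p in pooled), mean(p[1] for p in pooled)

def group_error(marks_by_worker):
    """Pool all of a group's marks and compute the per-axis spread."""
    pooled = [p for pts in marks_by_worker.values() for p in pts]
    return per_axis_error(pooled)

mx, my = group_mean(marks)
ex, ey = group_error(marks)
print(f"mean position: ({mx:.2f}, {my:.2f}); error: ({ex:.2f}, {ey:.2f})")
```

Comparing `group_mean` between Turkers and experts reveals systematic offsets (like the fin-ray difference discussed below), while `group_error` captures how consistent each group is with itself.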
Many landmarks are marked nearly identically by both groups. However, there is a difference, especially in the fin landmarks. The expert consistently locates the most anterior and posterior fin rays and marks them accordingly; Turkers instead tend toward the point that more intuitively defines the shape of the fin.
Both approaches are correct in a sense, though they capture very different aspects of fin morphology. The discrepancy stems in part from how Turkers interpret the protocol. I am currently refining the protocol to reduce this difference and produce results that are nearly indistinguishable from traditionally collected data sets.
Many thanks to the Mechanical Turk workers and Tina Marcroft for digitizing images, and Matt McGee, Adam Summers, and Brian Sidlauskas for helping to clarify the protocol.