This paper describes advanced accuracy metrics for entity extraction beyond precision, recall, and F-measure.
Entity extraction software is typically evaluated using the widely accepted accuracy metrics of precision, recall, and F-measure. These metrics are useful but limited in scope. Additional factors, including the types of errors, the cost of different error types, the ease of making changes to the system, and the efficiency of the system compared to human tagging, should also be incorporated when evaluating entity extraction software. This paper illustrates the need for these additional factors and demonstrates how they can be implemented in evaluation.
Natural language processing systems, including entity extraction tools, are typically evaluated using the metrics of precision, recall, and F-measure. Precision measures the proportion of extracted entities that are, in fact, entities; recall measures the proportion of actual entities in the data that are successfully extracted by the software; F-measure is a weighted harmonic mean of precision and recall.
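The three definitions above can be made concrete with a minimal sketch. It assumes exact-match scoring, where an extracted entity counts as correct only if its span and type both match a gold annotation; the paper does not specify a matching scheme at this point, so that choice, the function name `precision_recall_f`, the `beta` weighting parameter, and the sample spans are all illustrative assumptions.

```python
def precision_recall_f(gold, predicted, beta=1.0):
    """Return (precision, recall, F_beta) under exact-match scoring.

    gold and predicted are collections of hashable entity records,
    here hypothetical (start, end, type) tuples.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)

    # Precision: share of extracted entities that are real entities.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of real entities that were extracted.
    recall = true_positives / len(gold) if gold else 0.0

    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0

    # F_beta: weighted harmonic mean; beta > 1 favors recall,
    # beta < 1 favors precision, beta = 1 gives the familiar F1.
    b2 = beta ** 2
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical annotations, invented for illustration only.
gold = {(0, 5, "PERSON"), (10, 16, "ORG"), (20, 26, "LOC"), (30, 34, "PERSON")}
predicted = {(0, 5, "PERSON"), (10, 16, "ORG"), (40, 45, "LOC")}

p, r, f1 = precision_recall_f(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
# prints: precision=0.667 recall=0.500 F1=0.571
```

In this sketch, two of the three extracted entities are correct and two of the four gold entities are found. Note that the scores do not distinguish a missed entity from a spurious one, or a wrong type from a wrong boundary.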
While it is certainly useful to identify accuracy metrics that can be uniformly applied across systems, as Powers (2011) notes, there is bias inherent in these metrics.