• @Zachariah
    link
    126 months ago

    In a recent paper, computer scientists have described a new way to approximate the number of distinct entries in a long list, a method that requires remembering only a small number of entries. The algorithm will work for any list where the items come in one at a time — think words in a speech, goods on a conveyor belt or cars on the interstate.

    The CVM algorithm … is a significant step toward solving what’s called the distinct elements problem, which computer scientists have grappled with for more than 40 years. It asks for a way to efficiently monitor a stream of elements — the total number of which may exceed available memory — and then estimate the number of unique elements.

    “The new algorithm is astonishingly simple and easy to implement,” said Andrew McGregor of the University of Massachusetts, Amherst. “I wouldn’t be surprised if this became the default way the distinct elements problem is approached in practice.”