I’m reaching out to the community to gather your thoughts and suggestions on how we can enhance Lemmy’s search functionality, as discussed in Issue #846. Currently, the search options (new or top of a specific time) do not consistently deliver relevant or useful results, which creates difficulties for users trying to find posts based on specific keywords. While search engines like Google employ factors such as backlinks, freshness, keyword mentions, user experience, and topical authority, we need to strike a balance between improving search results and maintaining low complexity.

Please consider leaving a thumbs up on the GitHub issue.

  • @PriorProject
    English
    11 year ago

    While I agree that Lemmy search is not up to snuff, I think it will be a while before it meaningfully improves:

    • There are higher priorities that need work right now, and that work is not simple or quick. Lemmy desperately needs performance/scaling improvements like db query optimization and support for pg read replicas to weather the influx of reddit immigrants. It desperately needs improved mod tools so mods/admins can keep up with the torrent of bots and abuse. As useful as improved search would be, scaling and moderation are existential challenges that need attention first.
    • Proper multi-dimensional search weighting is complicated. The search boxes we’ve become accustomed to on the commercial internet are powered by incredibly complex backends with multiple different database-ish components, multiple async analysis/weighting pipelines, plus bits responding to queries. While these techniques can be scaled down to work on smaller deployments, they will definitely make Lemmy more complex to run… which is a very expensive tradeoff for an ecosystem that depends on amateur sysadmins to volunteer to run instances.
    • These search systems are also computationally expensive, much much more so than “simple” storage and fetching of posts/comments. Lemmy instances are already groaning under the weight of the reddit user influx, and I don’t see devs or admins signing up immediately to add resource hungry features to their setups.
    • It’s possible to improve search from outside Lemmy. Lemmy instances are not well indexed by external search engines yet, but searching for site:lemmy.* does return some results, and as Lemmy instances begin to fill up with high quality content I think we’ll see the “anchor instances” climb the rankings and crawl priority relatively quickly.

    All of which is to say… better search would be very useful but there are even more important features right now… and it won’t be easy when the time comes. A combination of making better use of type/community constraints and searching outside Lemmy is probably your best bet unless you’re a developer who has built multi-dimensionally weighted search tools before and can do some dev/testing to show how much better an alternative could be.
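
For anyone wondering what "multi-dimensionally weighted" might look like at the smallest possible scale, here is a rough sketch. It is not how Lemmy ranks anything today. The `Post` fields, the three signals (keyword match, freshness, votes), the weights, and the 72-hour half-life are all assumptions picked for illustration, and a real backend would need proper tokenization, indexing, and tuning.

```python
import math
from dataclasses import dataclass


@dataclass
class Post:
    # Hypothetical fields for illustration; not Lemmy's actual schema.
    title: str
    body: str
    score: int          # net votes
    age_hours: float    # time since posting


def keyword_relevance(post: Post, query: str) -> float:
    """Fraction of query terms matched, with title hits counted double. Range 0..1."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    title_words = set(post.title.lower().split())
    body_words = set(post.body.lower().split())
    hits = 0.0
    for term in terms:
        if term in title_words:
            hits += 2.0
        elif term in body_words:
            hits += 1.0
    return hits / (2.0 * len(terms))


def freshness(post: Post, half_life_hours: float = 72.0) -> float:
    """Exponential decay: a post loses half its freshness every half-life."""
    return 0.5 ** (post.age_hours / half_life_hours)


def popularity(post: Post, saturation: int = 1000) -> float:
    """Log-scaled vote score, capped at 1.0 so huge posts cannot dominate."""
    return min(math.log1p(max(post.score, 0)) / math.log1p(saturation), 1.0)


def rank(posts, query, w_rel=0.6, w_fresh=0.2, w_pop=0.2):
    """Return posts that match the query, best weighted score first."""
    scored = []
    for post in posts:
        rel = keyword_relevance(post, query)
        if rel == 0.0:
            continue  # never surface posts with no keyword match at all
        total = w_rel * rel + w_fresh * freshness(post) + w_pop * popularity(post)
        scored.append((total, post))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [post for _, post in scored]
```

Even this toy version recomputes every signal for every candidate post on every query, which hints at why the real thing tends to need separate indexes and async pipelines.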