cross-posted from: https://lemmy.ca/post/37011397

[email protected]

The popular open-source VLC video player was demonstrated on the floor of CES 2025 with automatic AI subtitling and translation, generated locally and offline in real time. Parent organization VideoLAN shared a video on Tuesday in which president Jean-Baptiste Kempf shows off the new feature, which uses open-source AI models to generate subtitles for videos in several languages.

  • @[email protected]
    link
    fedilink
    English
    136 minutes ago

    No such comment yet? I’ll be the first then.

    Oh no, AI bad, next thing they add is cryptocurrency mining!

  • @[email protected]
    link
    fedilink
    English
    242 minutes ago

    I will be impressed only when it can get through a single episode of Still Game without making a dozen mistakes

  • @renzev
    link
    English
    88
    9 hours ago

    This sounds like a great thing for deaf people and just in general, but I don’t think AI will ever replace anime fansub makers who have no problem throwing a wall of text on screen for a split second just to explain an obscure untranslatable pun.

    • @cley_faye
      link
      English
      12 hours ago

      It’s unlikely to even replace good subtitles, fan or not. It’s just a nice thing to have for a lot of content though.

    • @FordBeeblebrox
      link
      English
      9
      4 hours ago

      They are like the * in any Terry Pratchett (GNU) novel, sometimes a funny joke can have a little more spice added to make it even funnier

    • @FMT99
      link
      English
      14
      7 hours ago

      Translator’s note: keikaku means plan

  • @m8052
    link
    English
    118
    13 hours ago

    What’s important is that this is running on your machine locally, offline, without any cloud services. It runs directly inside the executable

    YES, thank you JB

  • TheRealKuni
    link
    English
    21
    10 hours ago

    And yet they turned down having thumbnails for seeking because it would be too resource intensive. 😐

    • @cley_faye
      link
      English
      6
      2 hours ago

      Video decoding is resource intensive. We’re used to it, we have hardware acceleration for some of it, but spewing something around 52 million pixels every second from a highly compressed data source is not cheap. I’m not sure how both compare, but small LLM models are not that costly to run if you don’t factor their creation in.
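
      For a sense of scale, the "around 52 million pixels" figure matches 1080p video at 25 frames per second. The resolution and frame rate below are assumptions, since the comment does not say which it has in mind. A rough sketch of the arithmetic:

```python
# Rough decoded-pixel throughput for a video stream.
# Assumption: 1920x1080 at 25 fps (the comment names neither value).
def pixels_per_second(width: int, height: int, fps: float) -> float:
    """Number of decoded pixels the player must produce each second."""
    return width * height * fps

rate_1080p25 = pixels_per_second(1920, 1080, 25)
print(f"{rate_1080p25 / 1e6:.1f} million pixels/s")  # 51.8 million pixels/s
```

      Higher resolutions or frame rates scale this linearly, so 4K at the same frame rate would be four times as much.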

    • @DreamlandLividity
      link
      English
      9
      2 hours ago

      I mean, it would. For example Jellyfin implements it, but it does so by extracting the pictures ahead of time and saving them. It takes days to do this for my library.

    • @serenissi
      link
      English
      25 hours ago

      It is useful for internet streams though, not really for local or LAN video.

  • Phoenixz
    link
    fedilink
    English
    37
    edit-2
    11 hours ago

    As VLC is open source, can we expect this technology to also be available for, say, Jellyfin, so that I can once and for all have subtitles done right?

    Edit: I think it’s great that vlc has this, but this sounds like something many other apps could benefit from

    • JustEnoughDucks
      link
      fedilink
      English
      156 minutes ago

      In the *arr suite, Bazarr has a plugin called Subgen which you can set to generate subtitles for your entire library, or only where subtitles are missing. The sync is spot on compared to 90% of what OpenSubtitles delivers. I sometimes re-gen them with this plugin just because OpenSubtitles is so constantly out of sync (e.g. in highly rated subtitles, 4 lines will go by at breakneck pace, the next 10 will be super slow, and then everything is 3 seconds off).

      It isn’t in-player but it works. The downside is it is a larger model and takes ~20 minutes to generate a movie’s worth of subtitles.

      • @Eagle0110
        link
        English
        13 hours ago

        Has there been any estimated minimal system requirements for this yet, since it runs locally?

        • @[email protected]
          link
          fedilink
          English
          2
          edit-2
          2 hours ago

          It’s actually using whisper.cpp

          From the README:

          Memory usage

          Model    Disk      Mem
          tiny     75 MiB    ~273 MB
          base     142 MiB   ~388 MB
          small    466 MiB   ~852 MB
          medium   1.5 GiB   ~2.1 GB
          large    2.9 GiB   ~3.9 GiB

          Those are the model sizes
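
          As a rough illustration of how those numbers answer the system-requirements question, here is a minimal sketch that picks the largest model fitting a given memory budget. The figures are copied from the table quoted above, with the "large" entry rounded to about 3900 MB as an assumption; the function name is hypothetical, and nothing here says which model VLC actually ships.

```python
from typing import Optional

# Approximate runtime memory per model in MB, taken from the quoted README table.
# Assumption: the "large" entry is rounded to roughly 3900 MB.
MODEL_MEMORY_MB = {"tiny": 273, "base": 388, "small": 852, "medium": 2100, "large": 3900}

def largest_model_that_fits(budget_mb: float) -> Optional[str]:
    """Return the most memory-hungry model whose estimate fits the budget, or None."""
    fitting = [(mem, name) for name, mem in MODEL_MEMORY_MB.items() if mem <= budget_mb]
    return max(fitting)[1] if fitting else None

print(largest_model_that_fits(1024))  # small
```

          Memory is only one constraint; real-time use also depends on how fast the machine can run the model, which this sketch does not estimate.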

    • @GreenKnight23
      link
      English
      16
      11 hours ago

      crunchyroll is currently using AI subtitles. it’s obvious because when someone says “mothra. Funky…” it captions “mother fucker”

      • @Alexstarfire
        link
        English
        14
        11 hours ago

        That explains why their subtitles have seemed worse to me lately. Every now and then I see something obviously wrong and wonder how it got by anyone who looked at it. Now I know why. No one looked at it.

        • @GreenKnight23
          link
          English
          14
          10 hours ago

          my wife and I love laughing at the dumbass mistakes it makes.

          some character’s name is Asura Halls?

          instead of “That’s Asura Halls!” you get “That asshole!”

          but if I was actually hearing impaired I’d be really pissed that I’m being treated as second class even though Sony still took my money like everyone else.

      • @dance_ninja
        link
        English
        3
        9 hours ago

        Malevolent Kitchen Intensifies

      • @NOT_RICK
        link
        English
        3
        10 hours ago

        ( ͡° ͜ʖ ͡°)

    • @asbestos
      link
      English
      211 hours ago

      Ooooh I like this

  • @asbestos
    link
    English
    222
    15 hours ago

    Finally, some good fucking AI

    • @shyguyblue
      link
      English
      132
      15 hours ago

      I was just thinking, this is exactly what AI should be used for. Pattern recognition, full stop.

      • snooggums
        link
        English
        55
        14 hours ago

        Yup, and if it isn’t perfect that is ok as long as it is close enough.

        Like getting name spellings wrong or mixing homophones is fine because it isn’t trying to be factually accurate.

        • @[email protected]
          link
          fedilink
          English
          9
          9 hours ago

          I’d like to see this fix the most annoying part about subtitles: timing. Find a transcript or any subs on the Internet and have the AI align them with the audio properly.

        • TJA!
          link
          fedilink
          English
          23
          14 hours ago

          The problem is that now people will say that they don’t need to create accurate subtitles because VLC is doing the job for them.

          Accessibility might suffer from that, because all subtitles are now just “good enough”

          • snooggums
            link
            English
            11
            9 hours ago

            Regular old live broadcast closed captioning is pretty much ‘good enough’ and that is the standard I’m comparing to.

            Actual subtitles created ahead of time should be perfect because they have the time to double check.

          • @[email protected]
            link
            fedilink
            English
            6
            13 hours ago

            Honestly though? If your audio is even half decent you’ll get like 95% accuracy. Considering a lot of media just wouldn’t have anything, that is a pretty fair trade off to me

            • @[email protected]
              link
              fedilink
              English
              5
              edit-2
              9 hours ago

              From experience, AI translation is still garbage, especially for languages like Chinese, Japanese, and Korean, but if it only subtitles in the actual language, such as creating English subtitles for English, then it is probably fine.

              • @[email protected]
                link
                fedilink
                English
                15 hours ago

                That’s probably more due to lack of training than anything else. Existing models are mostly made by American companies and trained on English-language material. Naturally, the further you get from the model, the worse the result.

                • @[email protected]
                  link
                  fedilink
                  English
                  24 hours ago

                  It is not the lack of training material that is the issue; it doesn’t understand context and cultural references. Someone commented here that Crunchyroll’s AI subtitles translated the name Asura Halls to “asshole”.

          • @[email protected]
            link
            fedilink
            English
            613 hours ago

            I have a feeling that if you care enough about subtitles you’re going to look for good ones, instead of using “ok” ai subs.

          • @shyguyblue
            link
            English
            2
            edit-2
            8 hours ago

            I imagine it would be not-exactly-simple-but-not-complicated to add a “threshold” feature. If the AI is less than X% certain, it can request human clarification.

            Edit: Derp. I forgot about the “real time” part. Still, as others have said, even a single botched word would still work well enough with context.
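
            One way to keep the threshold idea without stalling playback would be to flag uncertain lines for later review instead of pausing for input. A minimal sketch, assuming each generated line comes with a confidence score between 0 and 1; the `Segment` type, `flag_uncertain` function, and the 0.8 default are all hypothetical and not taken from VLC:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Segment:
    """One generated subtitle line; `confidence` is a hypothetical score in [0, 1]."""
    text: str
    confidence: float

def flag_uncertain(segments: List[Segment], threshold: float = 0.8) -> List[Segment]:
    """Return the segments whose confidence falls below the threshold."""
    return [s for s in segments if s.confidence < threshold]

segments = [
    Segment("That's Asura Halls!", 0.55),
    Segment("We have to go now.", 0.97),
]
# Subtitles are still shown immediately; flagged lines are only collected for review.
for s in flag_uncertain(segments):
    print(f"review later: {s.text!r} ({s.confidence:.2f})")
```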

            • snooggums
              link
              English
              1
              edit-2
              9 hours ago

              That defeats the purpose of doing it in real time as it would introduce a delay.

              • @shyguyblue
                link
                English
                18 hours ago

                Derp. You’re right, I’ve added an edit to my comment.

    • @[email protected]
      link
      fedilink
      English
      8
      edit-2
      13 hours ago

      Yeah, it’s pretty wonderful to see how far auto-generated transcription/captioning has come over the last couple of years. A wonderful victory for many communities with various disabilities.

  • m-p{3}
    link
    fedilink
    English
    59
    edit-2
    13 hours ago

    Now I want some AR glasses that display subtitles above someone’s head when they talk à la Cyberpunk that also auto-translates. Of course, it has to be done entirely locally.

    • @[email protected]
      link
      fedilink
      English
      15
      12 hours ago

      I guess we have most of the ingredients to make this happen. Software-wise we’re there, hardware wise I’m still waiting for AR glasses I can replace my normal glasses with (that I wear 24/7 except for sleep). I’d accept having to carry a spare in a charging case so I swap them out once a day or something but other than that I want them to be close enough in terms of weight and comfort to my regular glasses and just give me AR like overlaid GPS, notifications, etc, and indeed instant translation with subtitles would be a function that I could see having a massive impact on civilization tbh.

      • Midnight Wolf
        link
        English
        114 minutes ago

        soon

        Breaking news: “WW3 starts over an insult due to a mistranslated phrase at the G7 summit. We will be nuked in 37 seconds. Fuck like rabbits, it’s all we can do. Now over to Robert with traffic.”

      • @[email protected]
        link
        fedilink
        English
        12 hours ago

        It’d be incredible for deaf people being able to read captions for spoken conversations and to have the other person’s glasses translate from ASL to English.

        Honestly I’d be a bit shocked if the AI ASL -> English doesn’t exist already, there’s so much training data available, the Deaf community loves video for obvious reasons.

      • @[email protected]
        link
        fedilink
        English
        3
        9 hours ago

        I think we’re closer with hardware than software. the xreal/rokid category of hmds are comfortable enough to wear all day, and I don’t mind a cable running from behind my ear under a clothes layer to a phone or mini PC in my pocket. Unfortunately you still need to byo cameras to get the overlays appearing in the correct points in space, but cameras are cheap, I suspect these glasses will grow some cameras in the next couple of iterations.

      • m-p{3}
        link
        fedilink
        English
        511 hours ago

        I believe you can put prescription lenses in most AR glasses out there, but I suppose the battery is a concern…

        I’m in the same boat, I gotta wear my glasses 24/7.

  • @[email protected]
    link
    fedilink
    English
    25
    14 hours ago

    I hope Mozilla can benefit from a good local translation engine that could come out of it as well.

        • @[email protected]
          link
          fedilink
          English
          19 hours ago

          And it takes forever. I’m using the TWP plugin for Firefox (which uses external resources, configurable to google, bing and yandex translate respectively), and it’s near instantaneous. The local one from Mozilla often takes 30 seconds, and sometimes hangs until I refresh the page.

  • @[email protected]
    link
    fedilink
    English
    1013 hours ago

    Haven’t watched the video yet, but it makes a lot of sense that you could train an AI using already subtitled movies and their audio. There are times when official subtitles paraphrase the speech to make it easier to read quickly, so I wonder how that would work. There’s also just a lot of voice recognition everywhere nowadays, so maybe that’s all they need?

    • @[email protected]
      link
      fedilink
      English
      157 minutes ago

      This is already implemented on windows

      Tools > Preferences > Show settings=All > Video \ Subtitles/OSD: Text rendering module [Speech synthesis for Windows]