  • Completely depends on your laptop hardware, but generally:

    • TabbyAPI (exllamav2/exllamav3)
    • ik_llama.cpp and its OpenAI-compatible server
    • kobold.cpp (or kobold.cpp rocm, or croco.cpp, depends)
    • An MLX host with one of the new distillation quantizations
    • Text-gen-web-ui (slow, but supports a lot of samplers and some exotic quantizations)
    • SGLang (extremely fast for parallel calls, if that’s what you want).
    • Aphrodite Engine (lots of samplers, and fast at the expense of some VRAM usage).

    I use text-gen-web-ui at the moment only because TabbyAPI is a little broken with exllamav3 (which is utterly awesome for Qwen3); otherwise I’d almost always stick with TabbyAPI.

    Tell me (vaguely) what your system has, and I can be more specific.
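
    Whichever backend you pick, note that most of them expose the same OpenAI-compatible API, so the client side barely changes when you swap engines. A minimal sketch (assuming a TabbyAPI-style server on its default port 5000; model name and API key handling vary by server):

    ```python
    import requests

    # Assumed endpoint: a local OpenAI-compatible server (TabbyAPI, kobold.cpp,
    # ik_llama.cpp's server, etc. expose the same /v1 routes).
    API_URL = "http://localhost:5000/v1/chat/completions"

    payload = {
        "model": "local-model",  # placeholder; many local servers ignore this
        "messages": [
            {"role": "user", "content": "Explain 4-bit quantization in one paragraph."}
        ],
        "max_tokens": 256,
        "temperature": 0.7,
    }

    # Many local servers accept any key (or none); check your server's config.
    resp = requests.post(API_URL, json=payload,
                         headers={"Authorization": "Bearer dummy"})
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])
    ```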


  • The front end.

    Some UIs (like Open Web UI) have built-in “agents” or extensions that can fetch and parse search results as part of the context, allowing LLMs to “research.” There are in fact some finetunes specializing in this, though these days you are probably best off with regular Qwen3.

    This is sometimes called tool use.

    I also (sometimes) use a custom Python script (modified from another repo) for research, getting the LLM to search a bunch of sources and work through them.

    But fundamentally the LLM isn’t “searching” anything; you are just programmatically feeding it text (and maybe fetching its own requests for search terms).

    The backend for all this is a TabbyAPI server, with 2-4 parallel slots for fast processing.
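
    To make the “programmatically feeding” part concrete, here’s a rough sketch of that loop (not my actual script; it assumes a local OpenAI-compatible endpoint, and `web_search()` is a stand-in you’d wire to your own search backend):

    ```python
    import requests

    API_URL = "http://localhost:5000/v1/chat/completions"  # assumed local server

    def ask(messages, max_tokens=512):
        """One round trip to the local OpenAI-compatible server."""
        resp = requests.post(API_URL, json={
            "model": "local-model",  # placeholder; often ignored by local servers
            "messages": messages,
            "max_tokens": max_tokens,
        })
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]

    def web_search(query):
        """Stand-in: wire this to SearxNG, a search API, etc. Returns text snippets."""
        raise NotImplementedError("replace with your search backend")

    question = "What's new in exllamav3?"

    # Step 1: the LLM proposes a search query; it is not 'searching' anything itself.
    query = ask([{"role": "user",
                  "content": f"Write one short web search query for: {question}"}])

    # Step 2: we fetch the results ourselves and feed the text back into the context.
    snippets = web_search(query.strip())
    print(ask([
        {"role": "system", "content": "Answer using only the provided search results."},
        {"role": "user", "content": f"Results:\n{snippets}\n\nQuestion: {question}"},
    ]))
    ```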


  • “No amount of AI can replace professionalism.”

    This!

    “I don’t want to do that. Instead I let it suggest better phrasing, words, basically a better editor.”

    This!

    Locally, I actually keep a long-context LLM loaded that can fit all or most of the story, and sometimes use it as an “autocomplete.” For instance, when my brain freezes and I can’t finish a sentence, I see what it suggests. If I am thinking of the next word, I let it generate one token and look at the logprobs for all the words it considered, kind of like a thesaurus sorted by contextual relevance.

    This is only doable locally, as prompts are cached, so you get instant/free responses when ingesting (say) an 80K-word block of text.
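
    A sketch of that one-token “thesaurus” trick, assuming a TabbyAPI-style server with the legacy /v1/completions route (the logprobs field names vary a little between backends):

    ```python
    import math
    import requests

    API_URL = "http://localhost:5000/v1/completions"  # assumed local server

    prompt = "The old house at the end of the lane looked strangely"

    # Ask for exactly one token, plus the top candidates the model weighed.
    resp = requests.post(API_URL, json={
        "prompt": prompt,
        "max_tokens": 1,
        "logprobs": 10,  # legacy-completions-style field
        "temperature": 1.0,
    })
    resp.raise_for_status()

    # top_logprobs[0] maps each candidate token to its log-probability.
    candidates = resp.json()["choices"][0]["logprobs"]["top_logprobs"][0]

    # Sorted by probability: a thesaurus ordered by contextual relevance.
    for token, logprob in sorted(candidates.items(), key=lambda kv: -kv[1]):
        print(f"{token!r:>15}  {math.exp(logprob):6.1%}")
    ```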


  • Tools that work well out of the box? Honestly, I’m not sure.

    Back in the day, I’d turn to vapoursynth (or avisynth+) filters and a lot of hand editing: basically go through the trouble sections one by one and see which combination of VHS-specific correction and regeneration looked best.

    These days, we have far more powerful tools. I’d probably start by training a LoRA for Wan 2B or something, then use it to straight up regenerate damaged test sections with video-to-video. Then I’d write a script to detect the damaged sections, and mix in some “traditional” vapoursynth filters.

    …But this is all very manual, Python-dev level with some media/ML knowledge, unfortunately. I’m much less familiar with any GUI that could accomplish this, though I’m sure some paid services out there offer it.
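
    For the “script to detect them” part, a minimal vapoursynth sketch using frame-difference stats to flag suspect sections (the path, source plugin, and threshold are placeholders; real detection needs tuning per source):

    ```python
    import vapoursynth as vs

    core = vs.core

    # Placeholder path; ffms2 is one common source plugin.
    clip = core.ffms2.Source("capture.mkv")

    # Compare each frame to the previous one (frame 0 is compared to itself).
    # Dropouts and tracking errors tend to spike PlaneStatsDiff.
    prev = clip[0] + clip[:-1]
    stats = core.std.PlaneStats(clip, prev)

    THRESHOLD = 0.12  # assumed starting point; tune against known-bad sections

    suspect = [n for n, f in enumerate(stats.frames())
               if f.props["PlaneStatsDiff"] > THRESHOLD]

    print(f"{len(suspect)} suspect frames, e.g. {suspect[:20]}")
    ```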


  • Well, if I am going to push this into the project I envision, privacy is going to be key, so everything will be done locally.

    Reasonable! And yes, languagetool has some online AI thing, probably a junky wrapper around an LLM API, TBH.

    One thing I’d be wary of is… well, showing you’re using AI?

    As a random example, I helped clean up a TV show a long time ago, with mixed results. More recently, I brought the idea of making another attempt to the fandom, and got banned for even considering “AI,” with most of them clearly oblivious to how the original restoration was made… I have a little personal writing project going too, and wouldn’t dare bring it up to the fandom either.

    I don’t know what’s involved in your project, but be aware that you may get some very hostile interaction if it’s apparent you use diffusion models and LLMs as helpers.


  • You don’t strictly need a huge GPU. These days, there are a lot of places for free generations (like the AI Horde), and a lot of quantization/optimization that gets things running on small VRAM pools if you know where to look. Renting GPUs on vast.ai is pretty cheap.

    Also, I’d recommend languagetool as a locally runnable (and AI-free/CPU-only) grammar checker. It’s pretty good!
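
    If you want to drive it from code instead of a GUI, there’s a Python wrapper; a quick sketch (the language_tool_python package downloads and runs the local server for you, so it needs Java installed):

    ```python
    import language_tool_python

    # Runs LanguageTool locally; no text leaves your machine.
    tool = language_tool_python.LanguageTool("en-US")

    text = "He go to the store yesterday and buyed some milk."
    for match in tool.check(text):
        print(f"{match.ruleId}: {match.message}")
        print(f"  suggestions: {match.replacements[:3]}")

    tool.close()
    ```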

    As for online services, honestly Gemini Pro via the AI Studio web app is way better and way more generous than ChatGPT, or pretty much anything else. It can ingest an entire story for context, and stay coherent. I don’t like using Google, but if I’m not paying them a dime…


  • Before AI was hot, I used ESRGAN and some other stuff for restoring old TV. There was a niche community that finetuned models just to, say, restore classic SpongeBob or DBZ or whatever they were into.

    These days, I am less into media, but keep Qwen3 32B loaded on my desktop… pretty much all the time? For brainstorming, basic questions, making scripts, an agent to search the internet for me, a ‘dumb’ writing editor, whatever. It’s a part of my “degoogling” effort, and I find myself using it way more often since it’s A: totally free/unlimited, B: private and offline on an open source stack, and C: doesn’t support Big Tech at all. It’s kinda amazing how “logical” a 14GB file can be these days, and I can bounce really personal/sensitive ideas off it that I would hardly trust anyone with.

    …I’ve pondered getting back into video restoration, with all the shiny locally runnable tools we have now.