Lvxferre [he/him]

The catarrhine who invented a perpetual motion machine, by dreaming at night and devouring its own dreams through the day.

  • 0 Posts
  • 713 Comments
Joined 1 year ago
Cake day: January 12, 2024


  • I don’t see what the problem is with using AI for translations. If the translations are good enough and cheap enough, they should be used.

    Because machine translations for any large chunk of text are consistently awful: they don’t get references right, they often miss the point of the original utterance, they ignore cultural context, and so on. It’s like wiping your arse with an old sock - sure, you could do it in a pinch, but you definitely don’t want to do it regularly!

    Verbose example, using Portuguese to English

    I’ll give you an example, using PT→EN because I don’t speak JP. Let’s say Alice tells Bob “ma’ tu é uma nota de três pila, né?” (literally: “bu[t] you’re a three bucks bill, isn’t it?”). A human translator will immediately notice a few things:

    • It’s an informal and regional register. If Alice typically uses this register, it’s part of her characterisation; otherwise, the register shift is noteworthy. Either way, it’s meaningful.
    • There’s an idiom there: “nota de três pila” (three bucks bill). It conveys some[thing/one] is blatantly false.
    • There’s a rhetorical question, worded like an accusation. The scene dictates how it should be interpreted.

    So depending on the context, the translator might translate this as “ain’t ya full of shit…”, or perhaps “wow, you’re as fake as Monopoly money, arentcha?”. Now, check how chatbots do it:

    • GPT-4o mini: “But you’re a three-buck note, right?”
    • Llama 4 Scout: “But you are a three-dollar bill, aren’t you?”; or “You’re a three-dollar bill, right?” (it offers both alternatives)

    Both miss the mark. If you talk about three-dollar bills in English, lots of people associate it with gay people, creating an association that simply does not exist in the original. The extremely informal and regional register is gone, and so is the accusatory tone.

    And Claude shat out this pile of idiocy, which I had to screenshot because otherwise people wouldn’t believe me:


    [This is wrong on so many levels I don’t… I don’t even…]

    This is what you get for AI translations between two Indo-European languages in the same Sprachbund, which often do things in a similar way. It gets way worse for Japanese → English - languages from different families and different cultures that didn’t historically interact much. It’s like the dumb shit above, multiplied by ten.

    If they’re not good enough, another business can offer better translations as a differentiator.

    That “business” is called watching pirated anime with fan subs, made by people who genuinely enjoy anime and want others to enjoy it too.



  • When something similar happened in the UK, it was pretty much exclusively smaller/niche forums, run by volunteers and donations, that went offline.

    [Warning: IANAL] I’m really not sure if the experience is transposable, for two reasons:

    1. The UK follows common law, while Brazil follows Roman-style civil law. I’m not sure, but I believe the former requires both sides to dig up precedents, and that puts a heavier burden on the smaller side of a legal dispute. In the latter, if you show “ackshyually, in that older case the defendant was deemed guilty”, all the judge will say is “So? What is written is what matters: whether the defendant violated the law or not.”
    2. The Americas in general are notorious for sloppy law enforcement. Especially Brazil. Doubly so when both parties are random nobodies.

    So there’s still huge room for smaller forums to survive, or even thrive. It all depends on how the STF enforces it. For example, it might take into account that a team of volunteers bears less liability, because its ability to remove random junk from the internet is lower than that of some megacorp from the middle of nowhere.

    Additionally, it’s possible the legislature screeches at the judiciary and passes some additional law that does practically the same as Article 19, except worded so it leaves no room for the judiciary to claim it’s unconstitutional. Because, like I said, the judiciary is a bit too powerful, but the other branches can still fight back, especially the legislature.


  • For context:

    There’s an older law called Marco Civil da Internet (roughly “Internet Civil Framework”), from 2014. Article 19 of that law boils down to “if a third party posts content that violates the law on an internet service, the service provider isn’t legally responsible, unless there’s a specific judicial order telling it to remove that content.”

    So. The new ruling gets rid of that article, deeming it unconstitutional. In effect, this means service providers (mostly social media) need to proactively remove illegal content, even without a judicial order.

    I kind of like the direction this is going, but it raises three concerns:

    1. False positives becoming more common.
    2. The burden will be considerably bigger for smaller platforms than for bigger ones.
    3. It gives the STF yet another tool for vendetta. The judiciary is already a bit too strong in comparison with the other two powers, and this decision only feeds the beast further.

    On a lighter note, regardless of #2, I predict a lower impact on the Fediverse than on centralised social media.



  • It’s mostly fluff kept for sentimental value. Worst case scenario (complete data loss) would be annoying, but I can deal with it.

    That’s one of the two things the 3-2-1 rule of thumb (three copies, two different media, one off-site) doesn’t address - depending on the value of the data, you might need more backups, or the backup might be overkill. (The other is what you’re talking with smeg about: the reliability of each storage device in question.)

    I do have an internal hard disk drive (coincidentally 2TB)*; theoretically I could store a third copy of the backup there, since it’s just ~15GiB of data anyway. However:

    • HDDs tend to be a bit less reliable than flash memory, especially given that the stick and the SSD are relatively new while the HDD is a bit older.
    • Since the stick is powered on ~once a month (when I check if the backup needs to be updated), and I diff the most important bits of the data (sketched at the end of this comment), bit rot is not an issue.
    • those sticks tend to fail more from usage than from old age.
    • Any failure affecting my computer as a whole would affect both the HDD and the SSD, so the odds of dependent failure are not negligible.
    • I tend to accumulate a lot of junk on my HDD (like 490GiB of anime and shit like that), since I use it for my home LAN.

    That makes the benefit of a potential new backup in the HDD fairly low, in comparison with the bother (i.e. labour and opportunity cost) of keeping yet another backup.

    *I don’t recall how much I paid for it, but checking local hardware sites, a new one would be 475 reais, or roughly 75 euros… meh, if I’m buying a new HDD, I might as well use it to expand my LAN.
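
    Since I mentioned the monthly diff: here’s a minimal sketch of what that check amounts to in Python, with hypothetical paths - hash the important files on both sides and flag anything missing or silently changed.

    ```python
    # Minimal sketch of a monthly backup check: compare a source tree against
    # its mirror on the stick. Paths are hypothetical placeholders.
    import hashlib
    from pathlib import Path

    def sha256(path: Path) -> str:
        """Hash a file in chunks, so large files don't blow up memory."""
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def compare_trees(source: Path, backup: Path) -> None:
        """Flag files that are missing from the backup or whose contents
        differ (stale backup, or silent bit rot on either side)."""
        for file in source.rglob("*"):
            if not file.is_file():
                continue
            mirror = backup / file.relative_to(source)
            if not mirror.exists():
                print(f"MISSING  {mirror}")
            elif sha256(file) != sha256(mirror):
                print(f"CHANGED  {mirror}")

    compare_trees(Path("/home/me/important"), Path("/media/stick/backup"))
    ```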


  • Yes, it is expensive. But most of that cost is not because of simple applications, like in my example with grammar tables. It’s because those models have been scaled up to a bazillion parameters and “trained” with a gorillabyte of scraped data, in the hopes they’ll magically reach sentience and stop telling you to put glue on pizza. It’s because of meaning (semantics and pragmatics), not grammar.

    Also, natural languages don’t really have nonsensical rules; sure, sometimes you see some weird stuff (like Italian genderbending plurals, or English question formation), but even those are procedural: “if X, do Y”. LLMs are actually rather good at regenerating those procedural rules based on examples from the data.
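
    To show what I mean by “procedural”, here’s a hand-rolled toy of English yes/no question formation in Python - two explicit “if X, do Y” rules, nothing more. A sketch, not a parser; the auxiliary list is deliberately incomplete and do-support is simplified.

    ```python
    # Toy "if X, do Y" grammar rules for English yes/no questions.
    # Rule 1: if the clause has an auxiliary, invert it with the subject.
    # Rule 2: otherwise, use do-support ("do" only, for simplicity).
    AUXILIARIES = {"is", "are", "was", "were", "can", "could", "will", "would", "should"}

    def yes_no_question(sentence: str) -> str:
        words = sentence.rstrip(".").split()
        for i, word in enumerate(words):
            if word.lower() in AUXILIARIES:
                # "She can swim." -> "Can she swim?"
                inverted = [word.capitalize(), words[0].lower(), *words[1:i], *words[i + 1:]]
                return " ".join(inverted) + "?"
        # "They swim." -> "Do they swim?"
        return " ".join(["Do", words[0].lower(), *words[1:]]) + "?"

    print(yes_no_question("She can swim."))  # Can she swim?
    print(yes_no_question("They swim."))     # Do they swim?
    ```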

    But I wish it had some broader use, that would justify its cost.

    I wish they’d cut the costs down to match the current uses: small models for specific applications, dirt cheap in both training and running costs.

    (In both our cases, it’s about matching cost vs. use.)



  • Why not quanta? Don’t you believe in the power of the crystals? Quantum vibrations of the Universe from negative ions from the Himalayan salt lamps give you 153.7% better spiritual connection with the soul of the cosmic rays of the Unity!

    …what makes me sadder about the generative models is that the underlying tech is genuinely interesting. For example, for languages with a large presence online they get the grammar right, so stuff like “give me a [declension | conjugation] table for [noun | verb]” works great, and in any application where accuracy isn’t a big deal (like “give me ideas for [thing]”) you’ll probably get some interesting output. But it certainly won’t give you reliable info about most stuff, unless directly copied from elsewhere.
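
    That grammar-table use is basically a one-liner against any chat API. A minimal sketch with the OpenAI Python client - the model name, verb, and prompt are just illustrative, and the output still deserves a native speaker’s sanity check:

    ```python
    # Minimal sketch: ask a chat model for a conjugation table.
    # Assumes `pip install openai` and OPENAI_API_KEY set in the environment.
    from openai import OpenAI

    client = OpenAI()

    def conjugation_table(verb: str, language: str = "Portuguese") -> str:
        """Request a present-tense conjugation table. Works well for
        well-documented languages; a draft at best for low-resource ones."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative; any chat model works
            messages=[{
                "role": "user",
                "content": f"Give me a present-tense conjugation table "
                           f"for the {language} verb '{verb}'.",
            }],
        )
        return response.choices[0].message.content

    print(conjugation_table("falar"))
    ```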


  • The whole thing can be summed up as follows: they’re selling you a hammer and telling you to use it with screws. Once you hammer the screw, it trashes the wood really badly. Then they call the wood-trashing “hallucination”, and promise you better hammers that won’t do this. Except a hammer is not a tool to use with screws, dammit; you should be using a screwdriver.

    An AI leaderboard suggests the newest reasoning models used in chatbots are producing less accurate results because of higher hallucination rates.

    So he’s suggesting that the models are producing less accurate results… because they have higher rates of less accurate results? This is a tautological pseudo-explanation.

    AI chatbots from tech companies such as OpenAI and Google have been getting so-called reasoning upgrades over the past months

    When are people going to accept the fact that large “language” models are not general intelligence?

    ideally to make them better at giving us answers we can trust

    Those models are useful, but only a fool trusts (i.e. is gullible towards) their output.

    OpenAI says the reasoning process isn’t to blame.

    Just like my dog isn’t to blame for the holes in my garden. Because I don’t have a dog.

    This is sounding more and more like model collapse - models perform worse when trained on the output of other models.
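
    If you want to see the mechanism in miniature, here’s a toy simulation in Python, assuming the usual simplification that each generation underrepresents the tails of the previous one. The numbers are illustrative, not from the article.

    ```python
    # Toy model collapse: each "generation" is a Gaussian fitted to samples
    # from the previous one, with the tails cut off (models underrepresent
    # rare data). The fitted spread decays generation by generation.
    import random
    import statistics

    def sample_without_tails(mu: float, sigma: float, n: int) -> list[float]:
        """Draw n samples, rejecting anything beyond 2 sigma: the 'model'
        reproduces typical outputs and drops the rare ones."""
        out = []
        while len(out) < n:
            x = random.gauss(mu, sigma)
            if abs(x - mu) < 2 * sigma:
                out.append(x)
        return out

    mu, sigma = 0.0, 1.0  # generation 0: the original "human" data
    for generation in range(1, 11):
        data = sample_without_tails(mu, sigma, 1000)
        mu, sigma = statistics.fmean(data), statistics.stdev(data)
        print(f"gen {generation}: stdev = {sigma:.3f}")  # shrinks each time
    ```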

    inb4 sealions asking what’s my definition of reasoning in 3…2…1…