Lvxferre [he/him]

The catarrhine who invented a perpetual motion machine, by dreaming at night and devouring its own dreams through the day.

  • 0 Posts
  • 713 Comments
Joined 1 year ago
Cake day: January 12, 2024


  • I don’t see what the problem is with using AI for translations. If the translations are good enough and cheap enough, they should be used.

    Because machine translations for any large chunk of text are consistently awful: they don’t get references right, they often miss the point of the original utterance, they ignore cultural context, and so on. It’s like wiping your arse with an old sock - sure, you could do it in a pinch, but you definitely don’t want to do it regularly!

    Verbose example, using Portuguese to English

    I’ll give you an example, using PT→EN because I don’t speak JP. Let’s say Alice tells Bob “ma’ tu é uma nota de três pila, né?” (literally: “bu[t] you’re a three bucks bill, isn’t it?”). A human translator will immediately notice a few things:

    • It’s an informal and regional register. If Alice typically uses this register, it’s part of her characterisation; otherwise, the register shift is noteworthy. Either way, it’s meaningful.
    • There’s an idiom there: “nota de três pila” (three bucks bill). It conveys some[thing/one] is blatantly false.
    • There’s a rhetorical question, worded like an accusation. The scene dictates how it should be interpreted.

    So depending on the context, the translator might translate this as “ain’t ya full of shit…”, or perhaps “wow, you’re as fake as Monopoly money, arentcha?”. Now, check how chatbots do it:

    • GPT-4o mini: “But you’re a three-buck note, right?”
    • Llama 4 Scout: “But you are a three-dollar bill, aren’t you?”; or “You’re a three-dollar bill, right?” (it offers both alternatives)

    Both miss the mark. If you talk about three-dollar bills in English, lots of people associate it with gay people, creating an association that simply does not exist in the original. The extremely informal and regional register is gone, and so is the accusatory tone.

    And Claude shat out this pile of idiocy, which I had to screenshot because otherwise people wouldn’t believe me:


    [This is wrong on so many levels I don’t… I don’t even…]

    This is what you get for AI translations between two Indo-European languages in the same Sprachbund, which often do things in a similar way. It gets way worse for Japanese → English - languages from different families and different cultures that didn’t historically interact much. It’s like the dumb shit above, multiplied by ten.

    If they’re not good enough, another business can offer better translations as a differentiator.

    That “business” is called watching pirated anime with fan subs, made by people who genuinely enjoy anime and want others to enjoy it too.



  • When something similar happened in the UK, it was pretty much exclusively smaller/niche forums, run by volunteers and donations, that went offline.

    [Warning: IANAL] I’m really not sure if the experience is transposable, for two reasons:

    1. The UK follows common law, while Brazil follows Roman-style civil law. I’m not sure, but I believe the former requires both sides to dig up precedents, and that puts a heavier burden on the smaller side of a legal dispute. In the latter, if you show “ackshyually, in that older case the defendant was deemed guilty”, all the judge will say is “So? What is written is what matters: whether the defendant violated the law or not.”
    2. The Americas in general are notorious for sloppy law enforcement. Especially Brazil. Doubly so when both parties are random nobodies.

    So there’s still huge room for smaller forums to survive, or even thrive. It all depends on how the STF enforces it. For example, it might take into account that a team of volunteers bears less liability, because its ability to remove random junk from the internet is lower than that of some megacorp from the middle of nowhere.

    Additionally, it’s possible the legislature screeches at the judiciary and passes some additional law that does practically the same as Article 19, except worded so it leaves no room for the judiciary to claim it’s unconstitutional. Because, like I said, the judiciary is a bit too powerful, but the other branches can still fight back, especially the legislature.


  • For context:

    There’s an older law called Marco Civil da Internet (roughly “Internet Civil Framework”), from 2014. Article 19 of that law boils down to “if a third party posts content that violates the law on an internet service, the service provider isn’t legally responsible, unless there’s a specific judicial order telling it to remove that content.”

    So. The new ruling gets rid of that article, deeming it unconstitutional. In effect, this means service providers (mostly social media) need to proactively remove illegal content, even without a judicial order.

    I kind of like the direction this is going, but it raises three concerns:

    1. False positives becoming more common.
    2. The burden will be considerably bigger for smaller platforms than for bigger ones.
    3. It gives the STF yet another tool for vendetta. The judiciary is already a bit too strong in comparison with the other two powers, and this decision only feeds the beast further.

    On a lighter note, regardless of #2, I predict a lower impact on the Fediverse than on centralised social media.



  • It’s mostly fluff kept for sentimental value. Worst case scenario (complete data loss) would be annoying, but I can deal with it.

    That’s one of the two things the 3-2-1 rule of thumb (three copies, two different media, one off-site) doesn’t address - depending on the value of the data, you might need more backups, or the backup might be overkill. (The other is what you’re talking with smeg about: the reliability of each storage device in question.)

    I do have an internal hard disk drive (coincidentally 2TB)*; theoretically I could store a third copy of the backup there, since it’s just ~15GiB of data anyway. However:

    • HDDs tend to be a bit less reliable than flash memory, especially given that the stick and the SSD are relatively new while the HDD is a bit older.
    • Since the stick is powered on ~once a month (when I check if the backup needs to be updated), and I diff the most important bits of the data (sketched at the end of this comment), bit rot is not an issue.
    • those sticks tend to fail more from usage than from old age.
    • Any failure affecting my computer as a whole would affect both the HDD and the SSD, so the odds of dependent failure are not negligible.
    • I tend to accumulate a lot of junk on my HDD (like 490GiB of anime and shit like that), since I use it for my home LAN.

    That makes the benefit of a potential new backup in the HDD fairly low, in comparison with the bother (i.e. labour and opportunity cost) of keeping yet another backup.

    *I don’t recall how much I paid for it, but checking local hardware sites, a new one would be 475 reais, or roughly 75 euros… meh, if I’m buying a new HDD, I might as well use it to expand my LAN.
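
    Since I mentioned the monthly diff: here’s a minimal sketch of what that check amounts to in Python, with hypothetical paths - hash the important files on both sides and flag anything missing or silently changed.

    ```python
    # Minimal sketch of a monthly backup check: compare a source tree against
    # its mirror on the stick. Paths are hypothetical placeholders.
    import hashlib
    from pathlib import Path

    def sha256(path: Path) -> str:
        """Hash a file in chunks, so large files don't blow up memory."""
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def compare_trees(source: Path, backup: Path) -> None:
        """Flag files that are missing from the backup or whose contents
        differ (stale backup, or silent bit rot on either side)."""
        for file in source.rglob("*"):
            if not file.is_file():
                continue
            mirror = backup / file.relative_to(source)
            if not mirror.exists():
                print(f"MISSING  {mirror}")
            elif sha256(file) != sha256(mirror):
                print(f"CHANGED  {mirror}")

    compare_trees(Path("/home/me/important"), Path("/media/stick/backup"))
    ```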


  • Yes, it is expensive. But most of that cost is not because of simple applications, like in my example with grammar tables. It’s because those models have been scaled up to a bazillion parameters and “trained” with a gorillabyte of scraped data, in the hopes they’ll magically reach sentience and stop telling you to put glue on pizza. It’s because of meaning (semantics and pragmatics), not grammar.

    Also, natural languages don’t really have nonsensical rules; sure, sometimes you see some weird stuff (like Italian genderbending plurals, or English question formation), but even those are procedural: “if X, do Y”. LLMs are actually rather good at regenerating those procedural rules based on examples from the data.
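
    To show what I mean by “procedural”, here’s a hand-rolled toy of English yes/no question formation in Python - two explicit “if X, do Y” rules, nothing more. A sketch, not a parser; the auxiliary list is deliberately incomplete and do-support is simplified.

    ```python
    # Toy "if X, do Y" grammar rules for English yes/no questions.
    # Rule 1: if the clause has an auxiliary, invert it with the subject.
    # Rule 2: otherwise, use do-support ("do" only, for simplicity).
    AUXILIARIES = {"is", "are", "was", "were", "can", "could", "will", "would", "should"}

    def yes_no_question(sentence: str) -> str:
        words = sentence.rstrip(".").split()
        for i, word in enumerate(words):
            if word.lower() in AUXILIARIES:
                # "She can swim." -> "Can she swim?"
                inverted = [word.capitalize(), words[0].lower(), *words[1:i], *words[i + 1:]]
                return " ".join(inverted) + "?"
        # "They swim." -> "Do they swim?"
        return " ".join(["Do", words[0].lower(), *words[1:]]) + "?"

    print(yes_no_question("She can swim."))  # Can she swim?
    print(yes_no_question("They swim."))     # Do they swim?
    ```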

    But I wish it had some broader use, that would justify its cost.

    I wish they’d cut the costs down to match the current uses: small models for specific applications, dirt cheap in both training and running costs.

    (In both our cases, it’s about matching cost vs. use.)



  • Why not quanta? Don’t you believe in the power of the crystals? Quantum vibrations of the Universe from negative ions from the Himalayan salt lamps give you 153.7% better spiritual connection with the soul of the cosmic rays of the Unity!

    …what makes me sadder about the generative models is that the underlying tech is genuinely interesting. For example, for languages with a large presence online they get the grammar right, so stuff like “give me a [declension | conjugation] table for [noun | verb]” works great, and in any application where accuracy isn’t a big deal (like “give me ideas for [thing]”) you’ll probably get some interesting output. But it certainly won’t give you reliable info about most stuff, unless directly copied from elsewhere.
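
    That grammar-table use is basically a one-liner against any chat API. A minimal sketch with the OpenAI Python client - the model name, verb, and prompt are just illustrative, and the output still deserves a native speaker’s sanity check:

    ```python
    # Minimal sketch: ask a chat model for a conjugation table.
    # Assumes `pip install openai` and OPENAI_API_KEY set in the environment.
    from openai import OpenAI

    client = OpenAI()

    def conjugation_table(verb: str, language: str = "Portuguese") -> str:
        """Request a present-tense conjugation table. Works well for
        well-documented languages; a draft at best for low-resource ones."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative; any chat model works
            messages=[{
                "role": "user",
                "content": f"Give me a present-tense conjugation table "
                           f"for the {language} verb '{verb}'.",
            }],
        )
        return response.choices[0].message.content

    print(conjugation_table("falar"))
    ```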


  • The whole thing can be summed up as follows: they’re selling you a hammer and telling you to use it with screws. Once you hammer the screw, it trashes the wood really badly. Then they call the wood-trashing “hallucination”, and promise you better hammers that won’t do this. Except a hammer is not a tool to use with screws, dammit; you should be using a screwdriver.

    An AI leaderboard suggests the newest reasoning models used in chatbots are producing less accurate results because of higher hallucination rates.

    So he’s suggesting that the models are producing less accurate results… because they have higher rates of less accurate results? This is a tautological pseudo-explanation.

    AI chatbots from tech companies such as OpenAI and Google have been getting so-called reasoning upgrades over the past months

    When are people going to accept the fact that large “language” models are not general intelligence?

    ideally to make them better at giving us answers we can trust

    Those models are useful, but only a fool trusts (i.e. is gullible towards) their output.

    OpenAI says the reasoning process isn’t to blame.

    Just like my dog isn’t to blame for the holes in my garden. Because I don’t have a dog.

    This is sounding more and more like model collapse - models perform worse when trained on the output of other models.
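
    If you want to see the mechanism in miniature, here’s a toy simulation in Python, assuming the usual simplification that each generation underrepresents the tails of the previous one. The numbers are illustrative, not from the article.

    ```python
    # Toy model collapse: each "generation" is a Gaussian fitted to samples
    # from the previous one, with the tails cut off (models underrepresent
    # rare data). The fitted spread decays generation by generation.
    import random
    import statistics

    def sample_without_tails(mu: float, sigma: float, n: int) -> list[float]:
        """Draw n samples, rejecting anything beyond 2 sigma: the 'model'
        reproduces typical outputs and drops the rare ones."""
        out = []
        while len(out) < n:
            x = random.gauss(mu, sigma)
            if abs(x - mu) < 2 * sigma:
                out.append(x)
        return out

    mu, sigma = 0.0, 1.0  # generation 0: the original "human" data
    for generation in range(1, 11):
        data = sample_without_tails(mu, sigma, 1000)
        mu, sigma = statistics.fmean(data), statistics.stdev(data)
        print(f"gen {generation}: stdev = {sigma:.3f}")  # shrinks each time
    ```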

    inb4 sealions asking what’s my definition of reasoning in 3…2…1…