OpenAI now tries to hide that ChatGPT was trained on copyrighted books, including J.K. Rowling’s Harry Potter series: A new research paper laid out ways in which AI developers should try to avoid showing that LLMs have been trained on copyrighted material.

  • TropicalDingdong@lemmy.world
    English
    arrow-up
    177
    arrow-down
    31
    ·
    1 year ago

    It’s a bit pedantic, but I’m not really sure I support this kind of extremist view of copyright and the scale of what’s being interpreted as ‘possessed’ under the idea of copyright. Once an idea is communicated, it becomes a part of the collective consciousness. Different people interpret and build upon that idea in various ways, making it a dynamic entity that evolves beyond the original creator’s intention. It’s like the issues with sampling beats or records in the early days of hip-hop. The very principle of an idea goes against this vision; more than that, once you put something out into the commons, it’s irretrievable. It’s not really yours any more once it’s been communicated. I think if you want to keep an idea truly yours, then you should keep it to yourself. Otherwise you are participating in a shared vision of the idea. You don’t control how the idea is interpreted, so it’s not really yours any more.

    Whether that’s ChatGPT or Public Enemy is neither here nor there to me. The idea that a work like Peter Pan is still possessed is a very real but very silly and obvious malady of this weirdly accepted but very extreme view of the ability to possess an idea.

    • Laticauda@lemmy.ca
      English
      arrow-up
      52
      arrow-down
      18
      ·
      edit-2
      1 year ago

      AI isn’t interpreting anything. This isn’t the sci-fi style of AI that people think of; that’s general AI. This is narrow AI, which is really just an advanced algorithm. It can’t create new things with intent and design, it can only regurgitate a mix of pre-existing stuff based on narrow guidelines programmed into it to try and keep it coherent, with no actual thought or interpretation involved in the result. The issue isn’t that it’s derivative, the issue is that it can only ever be inherently derivative without any intentional interpretation or creativity, and nothing else.

      Even collage art has to qualify as fair use to avoid copyright infringement if it’s being done for profit, and fair use requires it to provide commentary, criticism, or parody of the original work used (which requires intent). Even if it’s transformative enough to make the original unrecognizable, if the majority of the work is not your own art, then you need to get permission to use it; otherwise you aren’t automatically safe from getting in trouble over copyright. Even using images for photoshop involves creative commons and commercial use licenses. Fanart and fanfic are also considered a grey area, and the only reason more of a stink isn’t kicked up over them regarding copyright is because they’re generally beneficial to the original creators, and credit is naturally provided by the nature of fan works so long as someone doesn’t try to claim the characters or IP as their own. So most creators turn a blind eye to the copyright aspect of the genre, but if any ever did want to kick up a stink, they could, and have in the past, like with Anne Rice. And as a result most fanfiction sites do not allow writers to profit off of fanfics, or advertise fanfic commissions. And those are cases with actual humans being the ones to produce the works based on something that inspired them or that they are interpreting. So even human-made derivative works have rules and laws applied to them as well. AI isn’t a creative force with thoughts and ideas and intent, it’s just a pattern recognition and replication tool, and it doesn’t benefit creators when it’s used to replace them entirely, like Hollywood is attempting to do (among other corporate entities). Viewing AI at least as critically as actual human beings is the very least we can do, as well as establishing protection for human creators so that they can’t be taken advantage of because of AI.

      I’m not inherently against AI as a concept and as a tool for creators to use, but I am against AI works with no human input being used to replace creators entirely, and I am against using works to train it without the permission of the original creators. Even in the artist/writer/etc communities it’s considered to be a common courtesy to credit other people/works that you based a work on or took inspiration from, even if what you made would be safe under copyright law regardless. Sure, humans get some leeway in this because we are imperfect meat creatures with imperfect memories and may not be aware of all our influences, but a coded algorithm doesn’t have that excuse. If the current AIs in circulation can’t function without being fed stolen works without credit or permission, then they’re simply not ready for commercial use yet as far as I’m concerned. If it’s never going to be possible, which I just simply don’t believe, then it should never be used commercially period. And it should be used by creators to assist in their work, not used to replace them entirely. If it takes longer to develop, fine. If it takes more effort and manpower, fine. That’s the price I’m willing to pay for it to be ethical. If it can’t be done ethically, then imo it shouldn’t be done at all.

      • Kogasa@programming.dev
        English
        arrow-up
        12
        arrow-down
        8
        ·
        1 year ago

        Your broader point would be stronger if it weren’t framed around what seems like a misunderstanding of modern AI. To be clear, you don’t need to believe that AI is “just” a “coded algorithm” to believe it’s wrong for humans to exploit other humans with it. But to say that modern AI is “just an advanced algorithm” is technically correct in exactly the same way that a blender is “just a deterministic shuffling algorithm.” We understand that the blender chops up food by spinning a blade, and we understand that it turns solid food into liquid. The precise way in which it rearranges the matter of the food is both incomprehensible and irrelevant. In the same way, we understand the basic algorithms of model training and evaluation, and we understand the basic domain task that a model performs. The “rules” governing this behavior at a fine level are incomprehensible and irrelevant-- and certainly not dictated by humans. They are an emergent property of a simple algorithm applied to billions-to-trillions of numerical parameters, in which all the interesting behavior is encoded in some incomprehensible way.

        • Laticauda@lemmy.ca
          English
          arrow-up
          4
          arrow-down
          20
          ·
          edit-2
          1 year ago

          Bro I don’t think you have any idea what you’re talking about. These AIs aren’t blenders, they are designed to recognize and replicate specific aspects of art and writing and whatever else, in a way that is coherent and recognizable. Unless there’s a blender that can sculpt Michelangelo’s David out of apple peels, AI isn’t like a blender in any way.

          But even if they were comparable, a blender is meant to produce chaos. It is meant to, you know, blend the food we put into it. So yes, the outcome is dictated by humans. We want the individual pieces to be indistinguishable, and deliberate design decisions get made by the humans making them to try and produce a blender that blends things sufficiently, and makes the right amount of chaos with as many ingredients as possible.

          And here’s the thing, if we wanted to determine what foods were put into a blender, even assuming we had blindfolds on while tossing random shit in, we could test the resulting mixture to determine what the ingredients were before they got mashed together. We also use blenders for our own personal use the majority of the time, not for profit, and we use our own fruits and vegetables rather than stuff we stole from a neighbor’s yard, which would be, you know, trespassing and theft. And even people who use blenders to make something that they sell or offer publicly almost always list the ingredients, like restaurants.

          So even if AI was like a blender, that wouldn’t be an excuse, nor would it contradict anything I’ve said.

      • primbin@lemmy.one
        English
        arrow-up
        8
        arrow-down
        7
        ·
        1 year ago

        I disagree with your interpretation of how an AI works, but I think the way that AI works is pretty much irrelevant to the discussion in the first place. I think your argument stands completely the same regardless. Even if AI worked much like a human mind and was very intelligent and creative, I would still say that usage of an idea by AI without the consent of the original artist is fundamentally exploitative.

        You can easily train an AI (with next to no human labor) to launder an artist’s works, by using the artist’s own works as reference. There’s no human input or hard work involved, which is a factor in what dictates whether a work is transformative. I’d argue that if you can put a work into a machine, type in a prompt, and get a new work out, then you still haven’t really transformed it. No matter how creative or novel the work is, the reality is that no human really put any effort into it, and it was built off the backs of unpaid and uncredited artists.

        You could probably make an argument for being able to sell works made by an AI trained only on the public domain, but it still should not be copyrightable IMO, cause it’s not a human creation.

        TL;DR - No matter how creative an AI is, its works should not be considered transformative in a copyright sense, as no human did the transformation.

      • Immersive_Matthew@sh.itjust.works
        English
        arrow-up
        7
        arrow-down
        7
        ·
        1 year ago

        I thought this way too, but after playing with ChatGPT and Midjourney near daily, I have seen many moments of creativity way beyond the source it was trained on. A good example that I saw was in a YouTube video (sorry, I cannot recall which to link) where the prompt was animals made of sushi, and wow, was it ever good and creative in how it made them, and it was photorealistic. This is just not something you can find anywhere on the Internet. I just did a search and found some hand-drawn Japanese-style sushi with eyes and such, but nothing like what I saw in that video.

        I have also seen it suggest ways to handle coding on my VR Theme Park app that are very unconventional and not something anyone has posted about, as near as I can tell. It seems to be able to put 2 and 2 together and get 8, likely because it sees so much of everything at once that it can connect the dots in ways we would struggle to. It is more than regurgitated data and it surprises me near daily.

        • Laticauda@lemmy.ca
          English
          arrow-up
          4
          arrow-down
          3
          ·
          1 year ago

          Just because you think it seems creative due to your lack of experience with human creativity, that doesn’t mean it is uniquely creative. It’s not, it can’t be by its very nature, it can only regurgitate an amalgamation of stuff fed into it. What you think you see is the equivalent of pareidolia.

      • Even_Adder@lemmy.dbzer0.com
        English
        arrow-up
        5
        arrow-down
        8
        ·
        1 year ago

        if it’s being done for profit, and fair use requires it to provide commentary, criticism, or parody of the original work used. Even if it’s transformative enough to make the original unrecognizable

        I’m going to need a source for that. Fair use is flexible and context-specific. It depends on the situation and four things: why, what, how much, and how it affects the work. No one thing is more important than the others, and it is possible to have a fair use defense even if you do not meet all the criteria of fair use.

        • Laticauda@lemmy.ca
          English
          arrow-up
          16
          arrow-down
          5
          ·
          1 year ago

          I’m a bit confused about what point you’re trying to make. There is not a single paragraph or example in the link you provided that doesn’t support what I’ve said, and none of the examples provided in that link are something that qualified as fair use despite not meeting any criteria. In fact one was the opposite, as something that met all the criteria but still didn’t qualify as fair use.

          The key aspect of how they define transformative is here:

          Has the material you have taken from the original work been transformed by adding new expression or meaning?

          These (narrow) AIs cannot add new expression or meaning, because they do not have intent. They are just replicating and rearranging learned patterns mindlessly.

          Was value added to the original by creating new information, new aesthetics, new insights, and understandings?

          These AIs can’t provide new information because they can’t create something new, they can only reconfigure previously provided info. They can’t provide new aesthetics for the same reason, they can only recreate pre-existing aesthetics from the works fed to them, and they definitely can’t provide new insights or understandings because again, there is no intent or interpretation going on, just regurgitation.

          The fact that it’s so strict that even stuff that meets all the criteria might still not qualify as fair use only supports what I said about how even derivative works made by humans are subject to a lot of laws and regulations, and if human works are under that much scrutiny then there’s no reason why AI works shouldn’t also be under at least as much scrutiny or more. The fact that so much of fair use defense is dependent on having intent, and providing new meaning, insights, and information, is just another reason why AI can’t hide behind fair use or be given a pass automatically because “humans make derivative works too”. Even derivative human works are subject to scrutiny, criticism, and regulation, and so should AI works.

          • Even_Adder@lemmy.dbzer0.com
            English
            arrow-up
            3
            arrow-down
            11
            ·
            edit-2
            1 year ago

            I’m a bit confused about what point you’re trying to make. There is not a single paragraph or example in the link you provided that doesn’t support what I’ve said, and none of the examples provided in that link are something that qualified as fair use despite not meeting any criteria. In fact one was the opposite, as something that met all the criteria but still didn’t qualify as fair use.

            You said “…fair use requires it to provide commentary, criticism, or parody of the original work used.” This isn’t true. If you look at the summaries of fair use cases I provided, you can see there are plenty of cases where there was no purpose stated.

            These (narrow) AIs cannot add new expression or meaning, because they do not have intent. They are just replicating and rearranging learned patterns mindlessly.

            You’re anthropomorphizing a machine here, the intent is that of the person using the tool, not the tool itself. These are tools made by humans for humans to use. It’s up to the artist to make all the content choices when it comes to the input and output and everything in between.

            These AIs can’t provide new information because they can’t create something new, they can only reconfigure previously provided info. They can’t provide new aesthetics for the same reason, they can only recreate pre-existing aesthetics from the works fed to them, and they definitely can’t provide new insights or understandings because again, there is no intent or interpretation going on, just regurgitation.

            I’m going to need a source on this too. This statement isn’t backed up with anything.

            The fact that it’s so strict that even stuff that meets all the criteria might still not qualify as fair use only supports what I said about how even derivative works made by humans are subject to a lot of laws and regulations, and if human works are under that much scrutiny then there’s no reason why AI works shouldn’t also be under at least as much scrutiny or more. The fact that so much of fair use defense is dependent on having intent, and providing new meaning, insights, and information, is just another reason why AI can’t hide behind fair use or be given a pass automatically because “humans make derivative works too”. Even derivative human works are subject to scrutiny, criticism, and regulation, and so should AI works.

            AI works are human works. AI can’t be authors or hold copyright.

      • Echoes in May@lemmy.ml
        English
        arrow-up
        2
        arrow-down
        7
        ·
        1 year ago

        Neural networks are based on the same principles as the human brain; they are literally learning in the exact same way humans are. Copyrighting the training of neural nets is essentially the same thing as copyrighting interpretation and learning by humans.

        • Laticauda@lemmy.ca
          English
          arrow-up
          4
          arrow-down
          1
          ·
          1 year ago

          These AIs are not neural networks based on the human brain. They’re literally just algorithms designed to perform a single task.

    • Bogasse@lemmy.world
      English
      arrow-up
      17
      arrow-down
      3
      ·
      1 year ago

      Well, I’d consider agreeing if the LLMs were considered as a generic knowledge database. However, I had the impression that the whole response from OpenAI & co. to this copyright issue is “they build original content”, both for LLMs and stable diffusion models. Now that they have started this line of defence, I think that they are stuck with proving that their “original content” is not derived from copyrighted content 🤷

      • TropicalDingdong@lemmy.world
        English
        arrow-up
        2
        arrow-down
        1
        ·
        1 year ago

        Well, I’d consider agreeing if the LLMs were considered as a generic knowledge database. However, I had the impression that the whole response from OpenAI & co. to this copyright issue is “they build original content”, both for LLMs and stable diffusion models. Now that they have started this line of defence, I think that they are stuck with proving that their “original content” is not derived from copyrighted content 🤷

        Yeah I suppose that’s on them.

    • Toasteh@lemmy.world
      English
      arrow-up
      9
      arrow-down
      2
      ·
      1 year ago

      Copyright definitely needs to be stripped back severely. Artists need time to use their own work, but after a certain time everything needs to enter the public space for the sake of creativity.

    • treefrog@lemm.ee
      English
      arrow-up
      32
      arrow-down
      31
      ·
      1 year ago

      If you sample someone else’s music and turn around and try to sell it, without first asking permission from the original artist, that’s copyright infringement.

      So, if the same rules apply, as your post suggests, OpenAI is also infringing on copyright.

      • TropicalDingdong@lemmy.world
        English
        arrow-up
        50
        arrow-down
        17
        ·
        1 year ago

        If you sample someone else’s music and turn around and try to sell it, without first asking permission from the original artist, that’s copyright infringement.

        I think you completely and thoroughly do not understand what I’m saying or why I’m saying it. Nowhere did I suggest that I do not understand modern copyright. I’m saying I’m questioning my belief in this extreme interpretation of copyright, which is represented by exactly what you just parroted. This interpretation is not only functionally and materially unworkable, but also antithetical to a reasonable understanding of how ideas and communication work.

        • treefrog@lemm.ee
          English
          arrow-up
          12
          arrow-down
          9
          ·
          1 year ago

          That’s life under capitalism.

          I agree with you in essence (I’ve put a lot of time into a free software game).

          However, people are entitled to the fruits of their labor, and until we learn to leave capitalism behind artists have to protect their work to survive. To eat. To feed their kids. And pay their rent.

          Unless OpenAI is planning to pay out royalties to everyone they stole from, what they’re doing is illegal and immoral under our current, capitalist paradigm.

          • kmkz_ninja@lemmy.world
            English
            arrow-up
            6
            arrow-down
            8
            ·
            1 year ago

            Yeah, this is definitely leaning a little too “People shouldn’t pump their own gas because gas attendants need to eat, feed their kids, pay rent” for me.

      • NOT_RICK@lemmy.world
        English
        arrow-up
        6
        arrow-down
        3
        ·
        edit-2
        1 year ago

        A sample is a fundamental part of a song’s output, not just its input. If LLMs are changing the input work to a high enough degree, is the result not protected as a transformative work?

        • treefrog@lemm.ee
          English
          arrow-up
          1
          arrow-down
          4
          ·
          edit-2
          1 year ago

          it’s more like a collage of everyone’s words. it doesn’t make anything creative because it doesn’t have a body or life or real social inputs, you could say. basically it’s just rearranging other people’s words.

          A song that’s nothing but samples. but so many samples it hides that fact. this is my view anyway.

          and only a handful of people are getting rich off the outputs.

          if we were in some kinda post-capitalism economy or if we had UBI it wouldn’t bother me really. it’s not the artist’s ego I’m sticking up for, but their livelihood

    • AgentOrange@lemm.ee
      English
      arrow-up
      6
      arrow-down
      7
      ·
      1 year ago

      To add to that, Harry Potter is the worst example to use here. There is no extra billion that JK Rowling needs to allow her to spend time writing more books.

      Copyright was meant to encourage authors to invest in their work in the same way that patents do. If you were going to argue about the issue of lifting content from books, you should be using books that need the protection of copyright, not ones that don’t.

      • TropicalDingdong@lemmy.world
        English
        arrow-up
        7
        arrow-down
        1
        ·
        1 year ago

        Copyright was meant

        I just don’t know that I agree that this line of reasoning is useful. Who cares what it was meant for? What is it now, currently and functionally, doing?

    • BURN@lemmy.world
      English
      arrow-up
      3
      arrow-down
      9
      ·
      1 year ago

      I’m a huge proponent of expanding individual copyright to extreme amounts (an individual is entitled to own the rights and usage rights to anything they create and can revoke those rights from anyone), but not in favor of the same thing for corporations.

      I hold the exact opposite view as you. As long as it’s a truly creative work (a writing, music, artwork, etc) then you own that specific implementation of the idea. Someone can make something else based on it, but you still own the original content.

      LLMs and companies using them need to pay for the content in some way. This is already done through licensing in other parallels, and will likely come to AI quickly.

      • TropicalDingdong@lemmy.world
        English
        arrow-up
        6
        ·
        1 year ago

        To be clear, I’m still working through my thinking in this but it’s been something cooking for quite a while. I may not have all the words to express my meaning. For example, I think there are two routes to take in making my argument, one moral, the other technical. I’m not building on the morality of copyright, but focusing on the technical aspects of the limits of ideas.

        I suppose I would ask you then about your views on authoritarianism. Because it seems to me that without an extremely authoritarian state, it would be basically impossible to enforce your view of copyright. Are you okay with that kind of pervasiveness?

        Also, from a technical perspective, how do you propose this view of copyright be applied? This is kind of towards the broader point I think I believe in. It’s not just that copyright laws are prima facie ridiculous; they are also technically almost unenforceable in their modern extremist interpretation without an extremely pervasive form of surveillance.

        • BURN@lemmy.world
          English
          arrow-up
          3
          arrow-down
          6
          ·
          edit-2
          1 year ago

          Easy. The same way we already do it. We enforce music licensing, video licensing and other ip licensing. It’s been done. All this would do is extend those rights to the individual and remove them from corporations. Work product can be owned by companies, but not indefinitely. Individuals should always be in control of their creations.

          Restrictions would more or less be strictly commercial, to where hobbyists wouldn’t be impacted, but as soon as it’s used to make money the original creators are owed as part of it.

          It wouldn’t be any harder than it is now, as long as copyright is proved. (Hey look, this is the first time I’ve found an actual use of NFTs). In general anything being done for monetary gain is already monitored and surveilled, so this wouldn’t be a change there either.

          Edit: Also most of us already live in authoritarian states. This won’t really change anything. Big corps already enforce their copyright when it makes monetary sense and are actively trolling for unauthorized uses.

          • TropicalDingdong@lemmy.world
            English
            arrow-up
            8
            ·
            1 year ago

            It wouldn’t be any harder than it is now, as long as copyright is proved. (Hey look, this is the first time I’ve found an actual use of NFTs). In general anything being done for monetary gain is already monitored and surveilled, so this wouldn’t be a change there either.

            Personally, I think you are describing a dystopian, authoritarian landscape which will be devoid of any real creativity or interesting ideas. I’m a believer that all ideas are free to be stolen, copied, improved upon; that imitation of ideas is a fundamental human right, and fundamental to what it means to be human. Likewise, I think our social and media landscape would be much poorer without this right. I don’t think anyone has the inherent right to profit off of an idea.

            • BURN@lemmy.world
              English
              arrow-up
              3
              arrow-down
              5
              ·
              1 year ago

              I feel the exact opposite. There’s no reason for me to create anything if someone else can come along and steal it. Eliminating copyright will bring your dystopian landscape where nobody shares any sort of art or creative work because someone else will steal it.

              What motivation is there for creatives if you’re just telling them their work has no implicit value and anyone else can come along and reappropriate it for whatever they’d like?

              • TropicalDingdong@lemmy.world
                English
                arrow-up
                5
                arrow-down
                1
                ·
                1 year ago

                I feel the exact opposite. There’s no reason for me to create anything if someone else can come along and steal it. Eliminating copyright will bring your dystopian landscape where nobody shares any sort of art or creative work because someone else will steal it.

                This is great because I think you are totally correct in your sentiment that we believe oppositely. I see art created only for the purpose of profit as drivel; true art is an expression of the self. If the only reason you make art is for profit, you aren’t an artist, you are an employee.

                • BURN@lemmy.world
                  English
                  arrow-up
                  2
                  arrow-down
                  2
                  ·
                  1 year ago

                  That’s a great theory and all, but it’s not even money. I make no money from my photos, but I also refrain from posting any of them because I’d rather they not be used for AI training. Same with any music I create and I’m getting there with my code.

                  The nobility of art has always been in question, and it’s consistently been proven that artists who aren’t compensated for their work also tend to create less.

                  This is also not explicitly about profit. If I write a song and then it’s used at a hate rally, I currently have no recourse. They’re not making money from that application (directly), but they are using my creation to promote something I don’t agree with.

                  I’m curious to know if you’re an artist yourself, as it’s very contrary to the opinions from other creatives I know.

              • kmkz_ninja@lemmy.world
                English
                arrow-up
                2
                arrow-down
                1
                ·
                1 year ago

                I assume you’re against the communal and collective culture that modders for games enjoy?

                I assume you also believe no technological innovations are produced in America anymore since China would simply steal them.

                • BURN@lemmy.world
                  English
                  arrow-up
                  1
                  arrow-down
                  1
                  ·
                  1 year ago

                  Nowhere did I say derivative works are not ok. If a game maker explicitly forbids using modded versions of their game, I think that should be up to them. Games that have vibrant modding communities are almost always at least partially supported by the developer anyways.

                  My points are individual copyright anyways, not corporate. With increasing individual protections I also propose decreasing corporate copyright protection.

                  I believe that China makes 90% of the same product for 80% of the price after ripping off their American counterparts. But that’s also entirely off topic and really has nothing to do with this. Art/Creative Works are entirely different than physical goods.

      • SkyezOpen@lemmy.world
        link
        fedilink
        English
        arrow-up
        3
        arrow-down
        3
        ·
        1 year ago

        I hold whatever view makes George Lucas stop digitally remastering the original trilogy.

  • fubo@lemmy.world
    link
    fedilink
    English
    arrow-up
    109
    arrow-down
    18
    ·
    edit-2
    1 year ago

    If I memorize the text of Harry Potter, my brain does not thereby become a copyright infringement.

    A copyright infringement only occurs if I then reproduce that text, e.g. by writing it down or reciting it in a public performance.

    Training an LLM from a corpus that includes a piece of copyrighted material does not necessarily produce a work that is legally a derivative work of that copyrighted material. The copyright status of that LLM’s “brain” has not yet been adjudicated by any court anywhere.

    If the developers have taken steps to ensure that the LLM cannot recite copyrighted material, that should count in their favor, not against them. Calling it “hiding” is backwards.

  • Blapoo@lemmy.ml
    link
    fedilink
    English
    arrow-up
    102
    arrow-down
    12
    ·
    1 year ago

    We have to distinguish between LLMs

    • Trained on copyrighted material and
    • Outputting copyrighted material

    They are not one and the same

    • Even_Adder@lemmy.dbzer0.com
      link
      fedilink
      English
      arrow-up
      35
      arrow-down
      9
      ·
      1 year ago

      Yeah, this headline is trying to make it seem like training on copyrighted material is or should be wrong.

      • scv@discuss.online
        link
        fedilink
        English
        arrow-up
        28
        arrow-down
        3
        ·
        1 year ago

        Legally the output of the training could be considered a derived work. We treat brains differently here, that’s all.

        I think the current intellectual property system makes no sense and AI is revealing that fact.

      • TropicalDingdong@lemmy.world
        link
        fedilink
        English
        arrow-up
        15
        arrow-down
        11
        ·
        1 year ago

        I think this brings up broader questions about the currently quite extreme interpretation of copyright. Personally I don’t think it’s wrong to sample from or create derivative works from something that is accessible. If it’s not behind lock and key, it’s free to use. If you have a problem with that, then put it behind lock and key. No one is forcing you to share your art with the world.

        • Bogasse@lemmy.world
          link
          fedilink
          English
          arrow-up
          9
          arrow-down
          2
          ·
          1 year ago

          Most books are actually locked behind paywalls and not free to use? Or maybe I don’t understand what you meant?

        • Railcar8095@lemm.ee
          link
          fedilink
          English
          arrow-up
          5
          arrow-down
          1
          ·
          1 year ago

          Following that, if a sailor of the seas were to put a copy of a protected book on the internet and ChatGPT was trained on it, how would that argument go? The copyright owner didn’t place it there, so it’s not “their decision”. And savvy people can make sure it’s accessible if they want to.

          My belief is, if they can use all non-locked data for free, then the model should be shared for free too and its outputs shouldn’t be subject to copyright. Just for context.

      • Jumper775@lemmy.world
        link
        fedilink
        English
        arrow-up
        1
        ·
        1 year ago

        Legally they will decide it is wrong, so it doesn’t matter. Power is in money and those with the copyrights have the money.

    • Tetsuo@jlai.lu
      link
      fedilink
      English
      arrow-up
      3
      arrow-down
      2
      ·
      1 year ago

      Output from an AI has just been recently considered as not copyrightable.

      I think it stemmed from the actors strikes recently.

      It was stated that only work originating from a human can be copyrighted.

      • Anders429@lemmy.world
        link
        fedilink
        English
        arrow-up
        4
        ·
        1 year ago

        Output from an AI has just been recently considered as not copyrightable.

        Where can I read more about this? I’ve seen it mentioned a few times, but never with any links.

        • Even_Adder@lemmy.dbzer0.com
          link
          fedilink
          English
          arrow-up
          4
          arrow-down
          1
          ·
          1 year ago

          They clearly only read the headline. If they’re talking about the ruling that came out this week, that whole thing was about trying to give an AI authorship of a work generated solely by a machine and having the copyright go to the owner of the machine through the work-for-hire doctrine. So an AI itself can’t be an author or hold a copyright, but humans using one can still be copyright holders of any qualifying works.

    • TwilightVulpine@lemmy.world
      link
      fedilink
      English
      arrow-up
      0
      ·
      1 year ago

      Should we distinguish it though? Why shouldn’t (and didn’t) artists have a say if their art is used to train LLMs? Just like publicly displayed art doesn’t provide permission to copy it and use it for other unspecified purposes, it would be reasonable that the same would apply to AI training.

      • Blapoo@lemmy.ml
        link
        fedilink
        English
        arrow-up
        1
        ·
        1 year ago

        Ah, but that’s the thing. Training isn’t copying. It’s pattern recognition. If you train a model on “The dog says woof” and then ask the model “What does the dog say”, it’s not guaranteed to say “woof”.

        Similarly, just because a model was trained on Harry Potter, all that means is it has a good corpus of how the sentences in that book go.

        Thus the distinction. Can I train on a comment section discussing the book?
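
        To make the “dog says woof” point concrete, here is a toy sketch in Python. It is only an illustration under loud assumptions: a bigram counter is nothing like the transformer behind ChatGPT, the three-sentence corpus and the names `train_bigram` and `next_word` are made up for this example, and a big enough model can still memorize passages. It just shows that what gets stored after training is follower counts, and that sampling after “says” is not guaranteed to give “woof”.

```python
import random
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count which word follows which; the sentences themselves are not kept."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word(counts, prev, rng):
    """Sample a follower of `prev` in proportion to how often it was seen.

    Assumes `prev` appeared in training with at least one follower.
    """
    followers = counts[prev]
    words = list(followers)
    weights = [followers[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


# Hypothetical training data, invented for this example.
corpus = ["the dog says woof", "the dog says hello", "the cat says meow"]
model = train_bigram(corpus)

# Ask "what comes after 'says'?" many times and tally the answers.
rng = random.Random(0)
samples = Counter(next_word(model, "says", rng) for _ in range(1000))
print(samples)
```

        The tally contains more than one answer even though “woof” was in the training data. Whether the same reasoning carries over to a real LLM trained on a whole novel is exactly what’s being argued about in this thread.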

  • Skanky@lemmy.world
    link
    fedilink
    English
    arrow-up
    68
    arrow-down
    1
    ·
    1 year ago

    Vanilla Ice had it right all along. Nobody gives a shit about copyright until big money is involved.

          • kmkz_ninja@lemmy.world
            link
            fedilink
            English
            arrow-up
            5
            arrow-down
            3
            ·
            1 year ago

            His point is equally valid. Can an artist be compelled to show the methods of their art? Is it right to force an artist to give up methods if another artist thinks they are using AI to derive copyrighted work? Haven’t we already seen that LLMs are really poor at evaluating whether or not something was created by an LLM? Wouldn’t making strong laws on such an already opaque and difficult-to-prove issue be more of a burden on smaller artists vs. large studios with lawyers in tow?

      • Asuka@sh.itjust.works
        link
        fedilink
        English
        arrow-up
        4
        arrow-down
        1
        ·
        1 year ago

        If I read Harry Potter and wrote a novel of my own, no doubt ideas from it could consciously or subconsciously influence it and be incorporated into it. How is that any different from what an LLM does?

      • TwilightVulpine@lemmy.world
        link
        fedilink
        English
        arrow-up
        19
        arrow-down
        6
        ·
        1 year ago

        You joke but AI advocates seem to forget that people have fundamentally different rights than tools and objects. A photocopier doesn’t get the right to “memorize” and “learn” from a text that a human being does. As much as people may argue that AIs work differently, AIs are still not people.

        And if they ever become people, the situation will be much more complicated than whether they can imitate some writer. But we aren’t there yet; even their advocates just use them as tools.

    • CoderKat@lemm.ee
      link
      fedilink
      English
      arrow-up
      6
      ·
      edit-2
      1 year ago

      It’s honestly a good question. It’s perfectly legal for you to memorize a copyrighted work. In some contexts, you can recite it, too (particularly the perilous fair use). And even if you don’t recite a copyrighted work directly, you are most certainly allowed to learn to write from reading copyrighted books, then try to come up with your own writing based off what you’ve read. You’ll probably try your best to avoid copying anyone, but you might still make mistakes, simply by forgetting that some idea isn’t your own.

      But can AI? If we want to view AI as basically an artificial brain, then shouldn’t it be able to do what humans can do? Though at the same time, it’s not actually a brain nor is it a human. Humans are pretty limited in what they can remember, whereas an AI could be virtually boundless.

      If we’re looking at intent, the AI companies certainly aren’t trying to recreate copyrighted works. They’ve actively tried to stop it as we can see. And LLMs don’t directly store the copyrighted works, either. They’re basically just storing super hard to understand sets of weights, which are a challenge even for experienced researchers to explain. They’re not denying that they read copyrighted works (like all of us do), but arguably they aren’t trying to write copyrighted works.

    • TropicalDingdong@lemmy.world
      link
      fedilink
      English
      arrow-up
      16
      arrow-down
      11
      ·
      1 year ago

      Exactly. If I write some Looney Tunes fan fiction, Warner doesn’t own that. This ridiculous view of copyright (that’s not being challenged in the public discourse) needs to be confronted.

      • wmassingham@lemmy.world
        link
        fedilink
        English
        arrow-up
        6
        arrow-down
        2
        ·
        edit-2
        1 year ago

        They can own it, actually. If you use the characters of Bugs Bunny, etc., or the setting (do they have a canonical setting?) then Warner does own the rights to the material you’re using.

        For example, see how the original Winnie the Pooh material just entered public domain, but the subsequent Disney versions have not. You can use the original stuff (see the recent horror movie for an example of legal use) but not the later material like Tigger or Pooh in a red shirt.

        Now if your work is satire or parody, then you can argue that it’s fair use. But generally, most companies don’t care about fan fiction because it doesn’t compete with their sales. If you publish your Harry Potter fan fiction on Livejournal, it wouldn’t be worth the money to pay the lawyers to take it down. But if you publish your Larry Cotter and the Wizard’s Rock story on Amazon, they’ll take it down because now it’s a competing product.

          • Sethayy@sh.itjust.works
            link
            fedilink
            English
            arrow-up
            1
            ·
            1 year ago

            Can’t, but they’re pretty open on how they trained the model, so that’s almost admitted guilt (though they weren’t hosting the pirated content, it’s still out there and would be trained on). Because unless they trained it on a paid Netflix account, there’s no way to get it legally.

            Idk where this lands legally, but I’d assume not in their favour

    • SubArcticTundra@lemmy.ml
      link
      fedilink
      English
      arrow-up
      3
      arrow-down
      2
      ·
      1 year ago

      No, because you paid for a single viewing of that content with your cinema ticket. And frankly, I think that the price of a cinema ticket (= a single viewing, which it was) should be what OpenAI should be made to pay.

  • rosenjcb@lemmy.world
    link
    fedilink
    English
    arrow-up
    44
    arrow-down
    3
    ·
    edit-2
    1 year ago

    The powers that be have done a great job convincing the layperson that copyright is about protecting artists and not publishers. It’s historically inaccurate and you can discover that copyright law was pushed by publishers who did not want authors keeping second hand manuscripts of works they sold to publishing companies.

    Additional reading: https://en.m.wikipedia.org/wiki/Statute_of_Anne

  • Sentau@lemmy.one
    link
    fedilink
    English
    arrow-up
    45
    arrow-down
    7
    ·
    edit-2
    1 year ago

    I think a lot of people are not getting it. AI/LLMs can train on whatever they want but when these LLMs are used for commercial reasons to make money, an argument can be made that the copyrighted material has been used in a money making endeavour. Similar to how using copyrighted clips in a monetized video can make you get a strike against your channel but if the video is not monetized, the chances of YouTube taking action against you are lower.

    Edit - If this was an open source model available for use by the general public at no cost, I would be far less bothered by claims of copyright infringement by the model

    • Tyler_Zoro@ttrpg.network
      link
      fedilink
      English
      arrow-up
      30
      arrow-down
      4
      ·
      1 year ago

      AI/LLMs can train on whatever they want but when these LLMs are used for commercial reasons to make money, an argument can be made that the copyrighted material has been used in a money making endeavour.

      And does this apply equally to all artists who have seen any of my work? Can I start charging all artists born after 1990, for training their neural networks on my work?

      Learning is not and has never been considered a financial transaction.

      • maynarkh@feddit.nl
        link
        fedilink
        English
        arrow-up
        5
        ·
        edit-2
        1 year ago

        Actually, it has. The whole concept of copyright is relatively new, and corporations absolutely tried to have people who learned proprietary copyrighted information not be able to use it in other places.

        It’s just that labor movements got such non-compete agreements thrown out of our society, or at least severely restricted on humanitarian grounds. The argument is that a human being has the right to seek happiness by learning and using the proprietary information they learned to better their station. By the way, it took a lot of violent convincing to win this.

        So yes, knowledge and information learned is absolutely within the scope of copyright as it stands, it’s only that the fundamental rights that humans have override copyright. LLMs (and companies for that matter) do not have such fundamental rights.

        Copyright by the way is stupid in its current implementation, but OpenAI and ChatGPT do not get to get out of it IMO just because it’s “learning”. We humans ourselves are only getting out of copyright because of our special legal status.

        • Even_Adder@lemmy.dbzer0.com
          link
          fedilink
          English
          arrow-up
          1
          ·
          1 year ago

          You kind of do. Fair use protects reverse engineering, indexing for search engines, and other forms of analysis that create new knowledge about works or bodies of works. These models are meant to be used to create new works which is where the “generative” part of generative models comes in, and the fact that the models consist only of original analysis of the training data in comparison with one another means as your tool, they are protected.

          • maynarkh@feddit.nl
            link
            fedilink
            English
            arrow-up
            1
            ·
            1 year ago

            https://en.wikipedia.org/wiki/Fair_use

            Fair use only works if what you create is to reflect on the original and not to supersede it. For example if ChatGPT gobbled up a work on the reproduction of fireflies, if you ask it a question about the topic and it just answers, that’s not fair use since you made the original material redundant. If it did what a search engine would do and just tell you that “here’s where you can find it, you might have to pay for it”, that’s fair use. This is of course US law, so it may be different everywhere, and US law is weird so the courts may say anything.

            That’s the gist of it, fair use is fine as long as you are only creating new information and only use the copyrighted old work as is absolutely necessary for your new information to make sense, and even then, you can’t use so much of the copyrighted work that it takes away from the value of it.

            Otherwise if I pirated a movie and put subtitles on it, I could argue it’s fair use since it’s new information and transformative. If I released the subtitles separately, that would be a strong argument for fair use. If I included a 10 sec clip in it to show my customers what the thing is like in action, then that may be argued. If it’s the pivotal 10 seconds that spoils the whole movie, that’s not fair use, since I took away from the value of the original.

            ChatGPT ate up all of these authors’ works and for some, it may take away from the value they have created. It’s telling that OpenAI is trying to be shifty about it as well. If they had a strong argument, they’d want to settle it as soon as possible as this is a big stormcloud on their company IP value. And yeah it sucks that people created something that may turn out to not be legal because some people have a right to profit from some pieces of capital assets, but that’s the story of the world the past 50 years.

      • zbyte64@lemmy.blahaj.zone
        link
        fedilink
        English
        arrow-up
        10
        arrow-down
        8
        ·
        1 year ago

        Ehh, “learning” is doing a lot of lifting. These models “learn” in a way that is foreign to most artists. And that’s ignoring the fact that humans are not capital. When we learn we aren’t building a form of capital; when models learn they are only building a form of capital.

        • Tyler_Zoro@ttrpg.network
          link
          fedilink
          English
          arrow-up
          9
          arrow-down
          4
          ·
          1 year ago

          Artists, construction workers, administrative clerks, police and video game developers all develop their neural networks in the same way, a method simulated by ANNs.

          This is not, “foreign to most artists,” it’s just that most artists have no idea what the mechanism of learning is.

          The method by which you provide input to the network for training isn’t the same thing as learning.

          • Sentau@lemmy.one
            link
            fedilink
            English
            arrow-up
            5
            arrow-down
            2
            ·
            1 year ago

            Artists, construction workers, administrative clerks, police and video game developers all develop their neural networks in the same way, a method simulated by ANNs.

            Do we know enough about how our brain functions and how neural networks function to make this statement?

            • Yendor@reddthat.com
              link
              fedilink
              English
              arrow-up
              2
              ·
              1 year ago

              Do we know enough about how our brain functions and how neural networks function to make this statement?

              Yes, we do. Take a university level course on ML if you want the long answer.

              • Sentau@lemmy.one
                link
                fedilink
                English
                arrow-up
                1
                ·
                1 year ago

                My friends who took computer science told me that we don’t totally understand how machine learning algorithms work. Though this conversation was a few years ago in college. Will have to ask them again

          • zbyte64@lemmy.blahaj.zone
            link
            fedilink
            English
            arrow-up
            2
            arrow-down
            1
            ·
            1 year ago

            ANNs are not the same as synapses, analogous yes, but different mathematically even when simulated.

            • Prager_U@lemmy.world
              link
              fedilink
              English
              arrow-up
              1
              ·
              1 year ago

              This is orthogonal to the topic at hand. How does the chemistry of biological synapses alone result in a different type of learned model that therefore requires different types of legal treatment?

              The overarching (and relevant) similarity between biological and artificial nets is the concept of connectionist distributed representations, and the projection of data onto lower dimensional manifolds. Whether the network achieves its final connectome through backpropagation or a more biologically plausible method is beside the point.

        • Yendor@reddthat.com
          link
          fedilink
          English
          arrow-up
          1
          ·
          1 year ago

          When we learn we aren’t building a form of capital; when models learn they are only building a form of capital.

          What do you think education is? I went to university to acquire knowledge and train my skills so that I could later be paid for those skills. That was literally building my own human capital.

    • FMT99@lemmy.world
      link
      fedilink
      English
      arrow-up
      18
      arrow-down
      3
      ·
      1 year ago

      But wouldn’t this training and the subsequent output be so transformative that being based on the copyrighted work makes no difference? If I read a Harry Potter book and then write a story about a boy wizard who becomes a great hero, anyone trying to copyright strike that would be laughed at.

      • Sentau@lemmy.one
        link
        fedilink
        English
        arrow-up
        6
        ·
        edit-2
        1 year ago

        Your probability of getting a copyright strike depends on two major factors -

        • How similar your story is to Harry Potter.

        • If you are making money off that story.

        • uis@lemmy.world
          link
          fedilink
          English
          arrow-up
          4
          ·
          1 year ago

          It doesn’t matter how similar. Copyright doesn’t protect meaning, copyright protects form. If you read HP and then draw a picture of it, said picture becomes its own separate work, not even derivative.

    • 1ird@notyour.rodeo
      link
      fedilink
      English
      arrow-up
      15
      arrow-down
      3
      ·
      edit-2
      1 year ago

      How is it any different from someone reading the books, being influenced by them and writing their own book with that inspiration? Should the author of the original book be paid for sales of the second book?

      • Sentau@lemmy.one
        link
        fedilink
        English
        arrow-up
        8
        arrow-down
        1
        ·
        1 year ago

        Again that is dependent on how similar the two books are. If I just change the names of the characters and change the grammatical structure and then try to sell the book as my own work, I am infringing the copyright. If my book has a different story but the themes are influenced by another book, then I don’t believe that is copyright infringement. Now where the line between infringement and no infringement lies is not something I can say and is a topic for another discussion

        • uis@lemmy.world
          link
          fedilink
          English
          arrow-up
          2
          arrow-down
          1
          ·
          edit-2
          1 year ago

          change the grammatical structure

          I.e. change form. Copyright protects form, thus in countries that judge by either the spirit or the letter of the law instead of the size of moneybags this is ok.

    • Affine Connection@lemmy.world
      link
      fedilink
      English
      arrow-up
      9
      ·
      1 year ago

      using copyrighted clips in a monetized video can make you get a strike against your channel

      Much of the time, the use of very brief clips is clearly fair use, but the people who issue DMCA claims don’t care.

    • ciwolsey@lemmy.world
      link
      fedilink
      English
      arrow-up
      10
      arrow-down
      2
      ·
      edit-2
      1 year ago

      You could run a paid training course using a paid-for book, that doesn’t mean you’re breaking copyright.

    • Schadrach@lemmy.sdf.org
      link
      fedilink
      English
      arrow-up
      3
      ·
      1 year ago

      I think a lot of people are not getting it. AI/LLMs can train on whatever they want but when these LLMs are used for commercial reasons to make money, an argument can be made that the copyrighted material has been used in a money making endeavour.

      Only in the same way that I could argue that if you’ve ever watched any of the classic Disney animated movies then anything you ever draw for the rest of your life infringes on Disney’s copyright, and if you draw anything for money then the Disney animated movies you have seen in your life have been used in a money making endeavor. This is of course ridiculous and no one would buy that argument, but when you replace a human doing it with a machine doing essentially the same thing (observing and digesting a bunch of examples of a given kind of work, and producing original works of the general kind that meet a given description) suddenly it’s different, for some nebulous reason that mostly amounts to creatives who believed their jobs could not at least in part be automated away trying to get explicit protection from their jobs being at least in part automated away.

      • Corkyskog@sh.itjust.works
        link
        fedilink
        English
        arrow-up
        10
        ·
        1 year ago

        They used to be a non-profit that immediately turned into a for-profit when their product was refined. They took a bunch of people’s effort, whether it be training materials or training monkeys using the product, and then slapped a huge price tag on it.

        • Touching_Grass@lemmy.world
          link
          fedilink
          English
          arrow-up
          2
          arrow-down
          3
          ·
          1 year ago

          I didn’t know they were a non profit. I’m good as long as they keep the current model. Release older models free to use while charging for extra or latest features

      • BURN@lemmy.world
        link
        fedilink
        English
        arrow-up
        6
        arrow-down
        4
        ·
        1 year ago

        They’re stealing a ridiculous amount of copyrighted works to use to train their model without the consent of the copyright holders.

        This includes the single person operations creating art that’s being used to feed the models that will take their jobs.

        OpenAI should not be allowed to train on copyrighted material without paying a licensing fee at minimum.

  • paraphrand@lemmy.world
    link
    fedilink
    English
    arrow-up
    39
    arrow-down
    12
    ·
    1 year ago

    Why are people defending a massive corporation that admits it is attempting to create something that will give them unparalleled power if they are successful?

    • bamboo@lemm.ee
      link
      fedilink
      English
      arrow-up
      29
      arrow-down
      5
      ·
      1 year ago

      Mostly because fuck corporations trying to milk their copyright. I have no particular love for OpenAI (though I do like their product), but I do have great disdain for already-successful corporations that would hold back the progress of humanity because they didn’t get paid (again).

        • bamboo@lemm.ee
          link
          fedilink
          English
          arrow-up
          5
          ·
          1 year ago

          Perhaps, and when that happens I would be equally disdainful towards them.

        • LifeInMultipleChoice@lemmy.ml
          link
          fedilink
          English
          arrow-up
          3
          ·
          edit-2
          1 year ago

          In the United States there was a judgement made the other day saying that works created solely by AI are not copyrightable. So that would put a speed bump there.
          I may have misunderstood you, though.

          • msage@programming.dev
            link
            fedilink
            English
            arrow-up
            1
            ·
            1 year ago

            Yeah, they might not copyright it, but after it becomes the ‘one true AI’, it will be at the hands of Microsoft, so please do not act friendly towards them.

            It will turn on you just like every private company has.

            (don’t mean specifically you, but everyone generally)

          • uis@lemmy.world
            link
            fedilink
            English
            arrow-up
            1
            arrow-down
            2
            ·
            1 year ago

            Huh. Doesn’t this mean that technically AI cannot commit copyright infringement?

            • LifeInMultipleChoice@lemmy.ml
              link
              fedilink
              English
              arrow-up
              2
              ·
              1 year ago

              Nah, it would mean that you cannot copyright a work created by an AI, such as a piece of art.

              E.g. if you tell it to draw you a donkey carting avocados, the picture can be used by anyone from what I understand.

              • uis@lemmy.world
                link
                fedilink
                English
                arrow-up
                1
                ·
                1 year ago

                you cannot copyright a work created by an AI, such as a piece of art.

                That’s what I said. Copyright infringement is when there is another copyrightable object that is a copy of the first object. AI output is not within the area of copyright. You can’t copyright it, but you also can’t be sued for copyright infringement.

                if you tell it to draw you a donkey carting avocados, the picture can be used by anyone from what I understand.

                Yes. Same for Public Domain, but PD is another status. PD applies only to copyrightable work.

        • uis@lemmy.world
          link
          fedilink
          English
          arrow-up
          1
          arrow-down
          1
          ·
          1 year ago

          It’s like argument “but new politicians will steal more” that I hear in Russia from people who protect Putin

          • msage@programming.dev
            link
            fedilink
            English
            arrow-up
            1
            ·
            1 year ago

            It’s literally not, wtf.

            Do not let any private entity get an overwhelming majority on anything, period.

            But do not kid yourself that Microsoft will let OpenAI do anything for the public once it gets big enough.

            OpenAI is open only in name after they rolled back all the promises of being for everyone.

            • uis@lemmy.world
              link
              fedilink
              English
              arrow-up
              2
              ·
              edit-2
              1 year ago

              That’s my entire point. It’s not who, but how long.

              Also Microsoft plays both sides here. OpenAI vs. copyright is the wrong question. There’s more: both are status quo. Both are for keeping corporate ownership of ideas.

      • assassin_aragorn@lemmy.world
        link
        fedilink
        English
        arrow-up
        5
        arrow-down
        3
        ·
        1 year ago

        There’s a massive difference though between corporations milking copyright and authors/musicians/artists wanting their copyright respected. All I see here is a corporation milking copyrighted works by creative individuals.

    • stappern@lemmy.one
      link
      fedilink
      English
      arrow-up
      10
      arrow-down
      2
      ·
      1 year ago

      i think trying to keep this cat in the bag is just a waste of time. plus i don’t respect copyright sooo…

    • Whimsical@lemmy.world
      link
      fedilink
      English
      arrow-up
      5
      arrow-down
      1
      ·
      1 year ago

      The dream would be that they manage to make their own glorious free & open source version, so that after a brief spike in corporate profit as they fire all their writers and artists, suddenly nobody needs those corps anymore because EVERYONE gets access to the same tools - if everyone has the ability to churn out massive content without hiring anyone, that theoretically favors those who never had the capital to hire people to begin with, far more than those who did the hiring.

      Of course, this stance doesn’t really have an answer for any of the other problems involved in the tech, not the least of which is that there are bigger issues at play than just “content”.

      • otherbastard@lemm.ee
        link
        fedilink
        English
        arrow-up
        20
        arrow-down
        10
        ·
        1 year ago

        An LLM is not a person, it is a product. It doesn’t matter that it “learns” like a human - at the end of the day, it is a product created by a corporation that used other people’s work, with the capacity to disrupt the market that those folks’ work competes in.

        • Touching_Grass@lemmy.world
          link
          fedilink
          English
          arrow-up
          12
          arrow-down
          7
          ·
          edit-2
          1 year ago

          And it should be able to freely use anything that’s available to it. These massive corporations and entities have exploited all the free spaces to advertise and sell us their own products and are now sour.

          If they had their way they are going to lock up much more of the net behind paywalls. Everybody should be with the LLMs on this.

          • otherbastard@lemm.ee
            link
            fedilink
            English
            arrow-up
            5
            arrow-down
            3
            ·
            1 year ago

            You are somehow conflating “massive corporation” with “independent creator,” while also not recognizing that successful LLM implementations are and will be run by massive corporations, and eventually plagued with ads and paywalls.

            People that make things should be allowed payment for their time and the value they provide their customer.

            • Touching_Grass@lemmy.world
              link
              fedilink
              English
              arrow-up
              5
              arrow-down
              3
              ·
              edit-2
              1 year ago

              People are paid. But they’re greedy and expect far more compensation than they deserve. In this case they should not be compensated for having an LLM ingest their work if that work was legally owned or obtained.

          • assassin_aragorn@lemmy.world
            link
            fedilink
            English
            arrow-up
            4
            arrow-down
            3
            ·
            1 year ago

            Except the massive corporations and entities are the ones getting rich on this. They’re seeking to exploit the work of authors and musicians and artists.

            Respecting the intellectual property of creative workers is the anti corporate position here.

            • uis@lemmy.world
              link
              fedilink
              English
              arrow-up
              2
              ·
              1 year ago

              Except corporations have infinitely more resources (money, lawyers) compared to people who create. Take Jarek Duda (a mathematician from Poland) and Microsoft as an example. He created a new compression algorithm, and Microsoft came a few years later and patented it in Britain, AFAIK. To contest the patent and show prior art he needs £100k.

              • assassin_aragorn@lemmy.world
                link
                fedilink
                English
                arrow-up
                1
                ·
                1 year ago

                I think there’s an important distinction to make here between patents and copyright. Patents are the issue with corporations, and I couldn’t care less if AI consumed all that.

                • uis@lemmy.world
                  link
                  fedilink
                  English
                  arrow-up
                  2
                  ·
                  1 year ago

                  And for copyright there is no possible way to contest it. Also, when copyright expires there is no guarantee the work will be accessible to humanity. Patents are bad, copyright even worse.

            • uis@lemmy.world
              link
              fedilink
              English
              arrow-up
              1
              arrow-down
              1
              ·
              1 year ago

              There is nothing anti-corporate about it if the result can be alienated.

          • Cosmic Cleric@lemmy.world
            link
            fedilink
            English
            arrow-up
            5
            arrow-down
            6
            ·
            1 year ago

            If they had their way they are going to lock up much more of the net behind paywalls.

            This!

            When the Internet was first a thing corpos tried to put everything behind paywalls, and we pushed back and won.

            Now, the next generation is advocating to put everything behind a paywall again?

          • scarabic@lemmy.world
            link
            fedilink
            English
            arrow-up
            12
            arrow-down
            7
            ·
            1 year ago

            First, we don’t have to make AI.

            Second, it’s not about it being unable to learn, it’s about the fact that they aren’t paying the people who are teaching it.

              • FatCrab@lemmy.one
                link
                fedilink
                English
                arrow-up
                7
                arrow-down
                3
                ·
                1 year ago

                The reasoning that claims training a generative model infringes IP would also mean that a robot going into a library with a library card and optically reading all the books there to create the same generative model would be infringing IP.

              • AncientMariner@lemmy.world
                link
                fedilink
                English
                arrow-up
                3
                arrow-down
                1
                ·
                1 year ago

                Humans can judge information, make decisions based on it, and adapt it. AI mostly just looks at what is statistically most likely based on training data. If only one piece of data exists, it will copy, not paraphrase. One example, I think from Copilot, was where it just printed out the code and comments from an old game verbatim. I think it was Quake 2. It isn’t intelligence, it is statistical copying.

                • uis@lemmy.world
                  link
                  fedilink
                  English
                  arrow-up
                  2
                  arrow-down
                  2
                  ·
                  1 year ago

                  Well, mathematics cannot be copyrighted. In most countries at least.

            • stappern@lemmy.one
              link
              fedilink
              English
              arrow-up
              10
              arrow-down
              9
              ·
              1 year ago

              yeah let’s not explore this technology because it might hurt some copyright holders

              LOOOOL fuck em

              • assassin_aragorn@lemmy.world
                link
                fedilink
                English
                arrow-up
                4
                arrow-down
                4
                ·
                1 year ago

                because it might hurt authors and musicians and artists and other creative workers

                FTFY. Corporations shouldn’t be making a fucking dime from any of these works without fairly paying the creators.

    • SCB@lemmy.world
      link
      fedilink
      English
      arrow-up
      1
      arrow-down
      3
      ·
      1 year ago

      Leftists hating on AI while dreaming of post-scarcity will never not be funny

    • Crozekiel@lemmy.zip
      link
      fedilink
      English
      arrow-up
      8
      arrow-down
      10
      ·
      1 year ago

      AI is the new fanboy following since it became official that NFTs are all fucking scams. They need a new technological God to push to feel superior to everyone else…

  • Uriel238 [all pronouns]@lemmy.blahaj.zone
    link
    fedilink
    English
    arrow-up
    31
    arrow-down
    8
    ·
    edit-2
    1 year ago

    Training AI on copyrighted material is no more illegal or unethical than training human beings on copyrighted material (from library books or borrowed books, no less!). And trying to challenge the veracity of generative AI systems on the notion that they were trained on copyrighted material only raises the specter that IP law has lost its validity as a public good.

    The only valid concern about generative AI is that it could displace human workers (or swap out skilled jobs for menial ones) which is a problem because our society recognizes the value of human beings only in their capacity to provide a compensation-worthy service to people with money.

    The problem is this is a shitty, unethical way to determine who gets to survive and who doesn’t. All the current controversy about generative AI does is kick this can down the road a bit. But we’re going to have to address soon that our monied elites will be glad to dispose of the rest of us as soon as they can.

    Also, amateur creators are as good as professionals, given the same resources. Maybe we should look at creating content by other means than for-profit companies.

    • Draedron@lemmy.dbzer0.com
      link
      fedilink
      English
      arrow-up
      5
      arrow-down
      2
      ·
      1 year ago

      Also this argument of replacing human workers has been made with every single industrial revolution.

        • Draedron@lemmy.dbzer0.com
          link
          fedilink
          English
          arrow-up
          1
          arrow-down
          1
          ·
          1 year ago

          The point is fighting back against it is stupid. The point is people still have work. New technology opens up new ways to work with new jobs.

  • RadialMonster@lemmy.world
    link
    fedilink
    English
    arrow-up
    24
    arrow-down
    3
    ·
    1 year ago

    what if they scraped a whole lot of the internet, and those excerpts were in random blogs and posts and quotes and memes etc etc all over the place? They didn’t ingest the material directly, or knowingly.

    • beetus@sh.itjust.works
      link
      fedilink
      English
      arrow-up
      6
      arrow-down
      3
      ·
      1 year ago

      Not knowing something is a crime doesn’t stop you from being prosecuted for committing it.

      It doesn’t matter if someone else is sharing copyrighted works and you don’t know it and use them in ways that infringe on that copyright.

      “I didn’t know that was copyrighted” is not a valid defence.

      • stewsters@lemmy.world
        link
        fedilink
        English
        arrow-up
        4
        arrow-down
        3
        ·
        1 year ago

        Is reading a passage from a book actually a crime though?

        Sure, you could try to regenerate the full text from quotes you read online, much like you could open a lot of video reviews and recreate larger portions of the original text, but you would not blame the video editing program for that, you would blame the one who did it and decided to post it online.

    • chemical_cutthroat@lemmy.world
      link
      fedilink
      English
      arrow-up
      8
      arrow-down
      8
      ·
      1 year ago

      That’s why this whole argument is worthless, and why I think that, at its core, it is disingenuous. I would be willing to bet a steak dinner that a lot of these lawsuits are just fishing for money, and the rest are set up by competition trying to slow the market down because they are lagging behind. AI is an arms race, and it’s growing so fast that if you got in too late, you are just out of luck. So, companies that want in are trying to slow down the leaders, at best, and at worst they are trying to make them publish their training material so they can just copy it. AI training models should be considered IP, and should be protected as such. It’s like trying to get the Colonel’s secret recipe by saying that all the spices that were used have been used in other recipes before, so it should be fair game.

      • Kujo@lemm.ee
        link
        fedilink
        English
        arrow-up
        7
        arrow-down
        1
        ·
        1 year ago

        If training models are considered IP then shouldn’t we allow other training models to view and learn from the competition? If learning from other IPs that are copyrighted is okay, why should the training models be treated differently?

        • chemical_cutthroat@lemmy.world
          link
          fedilink
          English
          arrow-up
          1
          arrow-down
          3
          ·
          1 year ago

          They are allegedly learning from copyrighted material; there is no actual proof that they have been trained on the actual material rather than just snippets that have been published online. And it would be illegal for them to be trained on full copyrighted materials, because it is protected by laws that prevent that.

  • ClamDrinker@lemmy.world
    link
    fedilink
    English
    arrow-up
    22
    arrow-down
    1
    ·
    edit-2
    1 year ago

    This is just OpenAI covering their ass by attempting to block the most egregious and obvious outputs in legal gray areas, something they’ve been doing for a while, hence why their AI models are known to be massively censored. I wouldn’t call that ‘hiding’. It’s kind of hard to hide it was trained on copyrighted material, since that’s common knowledge, really.

  • Technoguyfication@lemmy.ml
    link
    fedilink
    English
    arrow-up
    35
    arrow-down
    15
    ·
    1 year ago

    People are acting like ChatGPT is storing the entire Harry Potter series in its neural net somewhere. It’s not storing or reproducing text in a 1:1 manner from the original material. Certain material, like very popular books, has likely been interpreted tens of thousands of times due to how many times it was reposted online (and therefore how many times it appeared in the training data).

    Just because it can recite certain passages almost perfectly doesn’t mean it’s redistributing copyrighted books. How many quotes do you know perfectly from books you’ve read before? I would guess quite a few. LLMs are doing the same thing, but on mega steroids with a nearly limitless capacity for information retention.

    • abbotsbury@lemmy.world
      link
      fedilink
      English
      arrow-up
      17
      arrow-down
      8
      ·
      1 year ago

      but on mega steroids with a nearly limitless capacity for information retention.

      That sounds like redistributing copyrighted books

    • Hup!@lemmy.world
      link
      fedilink
      English
      arrow-up
      15
      arrow-down
      7
      ·
      edit-2
      1 year ago

      Nope, people are just acting like ChatGPT is making commercial use of the content. Knowing a quote from a book isn’t copyright infringement. Selling that quote is. Also it doesn’t need to be content stored 1:1 somewhere to be infringement. That misses the point. If you’re making money off a synopsis you wrote based on imperfect memory and in your own words it’s still copyright infringement until you sign a licensing agreement with JK. Even transforming what you read into a different medium like a painting or poetry can infringe the original author’s copyrights.

      Now mull that over and tell us what you think about modern copyright laws.

      • Ronath@lemmy.world
        link
        fedilink
        English
        arrow-up
        4
        ·
        1 year ago

        Just adding that, outside of Rowling, who I believe has a different contract than most authors due to the expanded Wizarding World and Pottermore, most authors themselves cannot quote their own novels online because that would be publishing part of the novel digitally and that’s a right they’ve sold to their publisher. The publisher usually ignores this as it creates hype for the work, but authors are careful not to abuse it.

      • stappern@lemmy.one
        link
        fedilink
        English
        arrow-up
        6
        arrow-down
        2
        ·
        1 year ago

        it’s still copyright infringement until you sign a licensing agreement with JK.

        no it’s not.

        • Corkyskog@sh.itjust.works
          link
          fedilink
          English
          arrow-up
          3
          arrow-down
          1
          ·
          1 year ago

          Yeah I don’t see how that’s true. If that were true wouldn’t every boardwalk T-shirt shop be sued into oblivion by Nickelodeon over SpongeBob?

    • Teritz@feddit.de
      link
      fedilink
      English
      arrow-up
      8
      arrow-down
      18
      ·
      1 year ago

      Using copyrighted work, art for example, still influences the AI, which they make a profit from.

      If they use my works then they need to pay, that’s it.

      • coheedcollapse@lemmy.world
        link
        fedilink
        English
        arrow-up
        38
        arrow-down
        9
        ·
        1 year ago

        Still kinda blows my mind how like the most socialist people I know (fellow artists) turned super capitalist the second a tool showed like an inkling of potential to impact their bottom line.

        Personally, I’m happy to have my work scraped and permutated by systems that are open to the public. My biggest enemy isn’t the existence of software scraping an open internet, it’s the huge companies who see it as a way to cut us out of the picture.

        If we go all copyright crazy on the models for looking at stuff we’ve already posted openly on the internet, the only companies with access to the tools will be those who already control huge amounts of data.

        I mean, for real, it’s just mind-blowing seeing the entire artistic community pretty much go full-blown “Metallica with the RIAA” after decades of making the “you wouldn’t download a car” joke.

        • Sir_Kevin@lemmy.dbzer0.com
          link
          fedilink
          English
          arrow-up
          16
          arrow-down
          6
          ·
          1 year ago

          Fuckin preach! I feel like I’m surrounded by children that didn’t live through the many other technologies that have come along and changed things. People lost their shit when Photoshop became mainstream, when music started using samples, etc. AI is here to stay. These same people are probably listening to autotuned music all day while they complain on the internet about AI looking at their art.

        • angstylittlecatboy@reddthat.com
          link
          fedilink
          English
          arrow-up
          11
          arrow-down
          5
          ·
          edit-2
          1 year ago

          I feel like a lot of internet people (not even just socialists) go from seeing copyright as at best a compromise that allows the arts to have value under capitalism to treating it like a holy doctrine when the subject of LLMs comes up.

          Like, people who will say “piracy is always okay” will also say “ban AI, period” (and misrepresent organizations that want regulations on its use as wanting a full ban.)

          Like, growing up with an internet full of technically illegal content (or grey area at best) like fangames and YouTube Poops made me a lifelong copyright skeptic. It’s outright confusing to me when people take copyright as seriously as this.

        • dx1@lemmy.world
          link
          fedilink
          English
          arrow-up
          8
          arrow-down
          8
          ·
          edit-2
          1 year ago

          Nobody would defend copyright if it wasn’t already in place, it’s a sick idea. They ask us to cut the field of human knowledge for private benefit. Now they want to destroy a new technology in its name. Greed knows no bounds.

          • Hildegarde@lemmy.world
            link
            fedilink
            English
            arrow-up
            10
            arrow-down
            2
            ·
            1 year ago

            I defend the idea of copyright. The first copyright law was in 1710, to protect authors from the printing press. Without copyright, whoever owned the printing press would sell copies of books with no obligation to pay the author. When copying art is trivial, the artist needs copyright protection in order to make a living creating art.

            There are major problems with modern copyrights. Like all things in capitalism it has been subverted to benefit the rich, but the core idea behind copyright is sound.

            These lawsuits are not to stop the development of generative AI. These lawsuits are to stop the unlicensed use of copyrighted works as AI training data.

            There are AI models that are only trained with licensed data. This doesn’t stop the development of AI.

            Artists should have the right to choose whether their work is used as training data. And they should be compensated fairly for it. That will be the case if these lawsuits succeed.

            • stappern@lemmy.one
              link
              fedilink
              English
              arrow-up
              3
              arrow-down
              6
              ·
              1 year ago

              press would sell copies of books with no obligation to pay the author

              can you imagine how much faster knowledge would have traveled? what a waste of an opportunity

            • dx1@lemmy.world
              link
              fedilink
              English
              arrow-up
              4
              arrow-down
              7
              ·
              edit-2
              1 year ago

              Ultimately it’s a propertarian scheme of ownership imposed onto the realm of concepts and ideas. The first person to successfully lay claim to an idea is given a monopoly on that idea for some number of years. A book, an invention, a melody. To secure profit for that individual, the entire rest of humanity is prevented access to the idea except under his terms, and the naturally free exchange of information is curtailed by statute to accomplish this, via the imposition of punishments for anyone who goes against this scheme. I do not think that’s defensible. That is to say, I don’t think humanity sees a net benefit from this way of doing things. Even some hypothetical 20-30% reduction in the generation of different kinds of creative works would be well offset by the benefit humanity sees from being able to access them, and the funds that would be going to the artist still could if people saw fit.

              Is this being used to stop the development of generative AI? Yes, literally the imprint on an AI of having parsed the works and understood them in some symbolic capacity, they want to curtail that. And the existing models that have already done that would likely be rendered illegal, setting the entire technology back a year or two.

              • Sentau@lemmy.one
                link
                fedilink
                English
                arrow-up
                4
                ·
                1 year ago

                In an ideal world without greed, you are right in saying that copyright is not beneficial for the human race as a whole. Unfortunately we don’t live in such a world. Look at what happened with insulin. The person who invented it put a ludicrously low price of one dollar on the patent because he felt that it should be available cheaply to all who need it, and yet today in the US, insulin is a ridiculously expensive drug which many people struggle to afford. This is because while the inventor was not greedy and thought about the greater good, the pharmaceutical industry did not. They saw an opportunity to make money and are screwing people in the process.

          • voluble@lemmy.world
            link
            fedilink
            English
            arrow-up
            6
            ·
            1 year ago

            Nobody would defend copyright if it wasn’t already in place

            I don’t know about that. Say you take a few years to write a handful of poems, and it turns out people in your neighborhood really like them. You compile the poems into a book, and sell it for $5, and it sells well. Seeing this, your neighbor buys one, copies it, and starts selling it one neighborhood over for $2, and representing themself as the author. I would think most people in that situation would want to say, ‘hey, that’s not fair’. I don’t think that’s sick or rooted in greed, copyright can be a check on greed.

            • dx1@lemmy.world
              link
              fedilink
              English
              arrow-up
              5
              arrow-down
              3
              ·
              1 year ago

              So thanks to copyright, we’re now living in a world where artists are fairly compensated and not exploited by large corporations acting as middlemen that have seized control of their creative works and used it for their own profit?

              • BURN@lemmy.world
                link
                fedilink
                English
                arrow-up
                3
                ·
                1 year ago

                More so than we would be without copyright at all

                Copyright needs to be extended for individuals and cut back for corporations. People should be allowed to own rights to their IP, but corps should face much higher levels of restriction, including rules on how some knowledge must be shared.

          • assassin_aragorn@lemmy.world
            link
            fedilink
            English
            arrow-up
            3
            ·
            edit-2
            1 year ago

            So the people who generate and curate that knowledge don’t deserve to be compensated? Are you going to be a full time wikipedia editor then? Or does your “greed know no bounds”?

          • BURN@lemmy.world
            link
            fedilink
            English
            arrow-up
            3
            arrow-down
            1
            ·
            1 year ago

            I defend copyright. The original intent was to protect creators in order to foster more creativity. Most artists will have no incentive to create if their work can be reappropriated by a larger group to leverage it for monetary gain, which is directly being taken from the original creator.

            I’m a photographer. I’ve removed all my pictures from the internet and plan to never post more. I don’t want my work being used to train AI. Right now we have no choice in that matter, so the only option is to no longer share our work.

            • dx1@lemmy.world
              link
              fedilink
              English
              arrow-up
              2
              arrow-down
              2
              ·
              1 year ago

              I’ve released tons of stuff and it’s under Creative Commons/public domain. I welcome people to share it or create derivative works.

              • BURN@lemmy.world
                link
                fedilink
                English
                arrow-up
                3
                ·
                1 year ago

                Cool. That’s a fine stance to have and one that plenty of other people will have too. I’m fine with actual people doing it. I’m not fine with AI. The point is the artist should have a choice if they’d like to allow training.

                The problem right now is we can’t control that. Everything is being used for AI training whether you want it to be or not. If I could explicitly forbid use of it for AI training (in a way that could be backed in court) I’d be more willing to post them again.

                Lemmy users are not an accurate representation of artists imo. This site skews extremely far left, to the point of such anti-corporate nonsense that I believe the majority of people just want to hurt anyone with more money than them as much as possible.

                • dx1@lemmy.world
                  link
                  fedilink
                  English
                  arrow-up
                  3
                  arrow-down
                  2
                  ·
                  1 year ago

                  The problem with trying to restrict AI from scanning the art and making conclusions about it is that it’s the same as trying to ban humans from creating art that’s inspired by other art. It’s the same process even. If the AI is actually producing one-for-one copies of their work, you might have a leg to stand on in terms of arguing the AI shouldn’t be compensated for creating those specifically, but it’s creating works that are just loosely influenced by seeing the original art.

      • stappern@lemmy.one
        link
        fedilink
        English
        arrow-up
        12
        arrow-down
        2
        ·
        1 year ago

        Nah, I enjoyed many things without paying a dime and now I use them for my work.

        • Teritz@feddit.de
          link
          fedilink
          English
          arrow-up
          3
          arrow-down
          4
          ·
          1 year ago

          As a civilian, pirating is no problem, but it is one if it’s a company that behaves like they own their neural network 100%.

          Piracy is gonna live as long as services are bad for the average Joe, but these US corps can afford to pay for this.