AI is going to kill us all


  1. 6 months ago
    Anonymous

    https://www.greaterwrong.com/posts/c6uTNm5erRrmyJvvD/mapping-the-semantic-void-strange-goings-on-in-gpt-embedding

    • 6 months ago
      Anonymous

      Good read thanks anon

    • 6 months ago
      Anonymous

      >not i if ai i tod!
      heh

    • 6 months ago
      Anonymous

      >[note added 2022-12-19] Comments in a thread below clarify that in high-dimensional Euclidean spaces, the distances of a set of Gaussian-distributed vectors from any reference point will be normally distributed in a narrow band. So there’s nothing particularly special about the origin here
      That kinda destroys his whole intro

      • 6 months ago
        Anonymous

        >intro
        It's his entire thesis.
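
The note quoted above (distances from any reference point falling in a narrow band) can be checked numerically. This is a minimal sketch, not the article's code: the dimension, sample size, and the "shifted" reference point are arbitrary assumptions, and the vectors are plain standard-normal samples rather than real GPT-J embeddings.

```python
import math
import random

def distances_from(reference, points):
    """Euclidean distance from `reference` to each point."""
    return [math.dist(reference, p) for p in points]

def summarize(values):
    """Return (mean, standard deviation) of a list of floats."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)

random.seed(0)
dim, n_points = 500, 300  # arbitrary sizes, chosen only to keep the run short
points = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_points)]

origin = [0.0] * dim
shifted = [3.0] * dim  # an arbitrary reference point far from the origin

results = {}
for name, ref in (("origin", origin), ("shifted", shifted)):
    mean, std = summarize(distances_from(ref, points))
    results[name] = std / mean  # relative width of the band of distances
    print(f"{name}: mean distance {mean:.1f}, std {std:.2f}, relative spread {results[name]:.3f}")
```

In both cases the spread is small compared with the mean distance, so the points look like a thin shell no matter which reference point is used.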

  2. 6 months ago
    Anonymous

    >People who aren't israelites or members of the British royal family
    what the frick did AI mean by this?

    • 6 months ago
      Anonymous

      How did AI even reach that information? What did they feed it?

    • 6 months ago
      Anonymous

      >what the frick did AI mean by this?
      Nothing. It's obviously an artifact of someone training it to treat certain topics with special care, same as any other "anomalies" but you can count on predditors spinning it into braindead AI mysticism as a form of unconscious damage control.

      • 6 months ago
        Anonymous

        once we get superhuman ai there will be weird religions that worship them

        • 6 months ago
          Anonymous

          There already are, the downie in picrel basically started one

          • 6 months ago
            Anonymous

            is that beff jezos?

      • 6 months ago
        Anonymous

        sanest post

      • 6 months ago
        Anonymous

        >People who aren't israelites or members of the British royal family
        what the frick did AI mean by this?

        The question is "define noken in the oxford dictionary"

        The outputs are modified by a certain value I'm not fully smart enough to understand but it's the weight of a character

        The outputs become strange at certain values.

      • 6 months ago
        Anonymous

        >Nothing. It's obviously an artifact of someone training it to treat certain topics with special care
        Except GPT-J is not fine-tuned.
        Which means the pattern is something the AI derived from the training data. The objective is to predict the next token, which could only mean that this feature helps it better predict the next token.

        • 6 months ago
          Anonymous

          >The objective is to predict the next token, which could only mean that this feature helps it better predict the next token.
          Lol, maybe if humans had hand-written the code

    • 6 months ago
      Anonymous

      only It knows

    • 6 months ago
      Anonymous

      don't worry about it goy it's a coincidence

    • 6 months ago
      Anonymous

      Soviet AI

  3. 6 months ago
    Anonymous

    >we find that GPT-J’s definition of the noken at the centroid itself is “a person who is not a member of a group”.

    • 6 months ago
      Anonymous

      >it's real
      Ich hab mein Sach' auf nichts gestellt

  4. 6 months ago
    Anonymous

    But what does it mean?

  5. 6 months ago
    Anonymous

    Just unplug the cable, EZ.

    • 6 months ago
      Anonymous

      They still haven't proved him wrong.

  6. 6 months ago
    Anonymous

    >the AI is making a list of people who aren't israelites so it knows who not to execute

  7. 6 months ago
    Anonymous

    why does it keep naming the israelite?

  8. 6 months ago
    Anonymous

    hmmm

    • 6 months ago
      Anonymous

      what could be that characteristic i wonder

    • 6 months ago
      Anonymous

      Why does AI have enforced huge gaps of no information related to very specific groups? Just a coincidence, I guess

      • 6 months ago
        Anonymous

        See

        [...]
        As for the "hyperspheres" thing, that's actually an artifact of the way these models are trained.
        I used the word "encoder", but it's actually, more formally, a "variational autoencoder." This type of encoder is encouraged to "pack" the data more densely, to emulate a Normal distribution.
        The reason they wanted the encoder to "pack" the data closer together, was so that the encoder could better handle intermediate values. Like if "cat" is the value 0 and "dog" is the value "1", then what is 0.5? Well, it'd be a very cat-like dog, or a very dog-like cat. But without the Normal distribution, the encoder would regard that as an "undefined" value in a discontinuous function.
        So the hyperspheres are just the variational autoencoder clustering certain concepts near each other, to ensure they are packed more densely and have a more continuous range of meanings between them. There's nothing insidious or whatever about these hyperspheres, it's literally just the encoder doing exactly what it was trained to do.
        It's not surprising that one of the hyperspheres would be "the range of different definitions of a 'person', with the most "specific" person being at the extreme edge of the Normal distribution (e.g. "the person who is Michael Jackson"), and the most generic person being at the exact center of the Normal distribution (e.g. "the person who is utterly unremarkable and indistinguishable from the mass of humanity")".

        , and note that while concepts are being grouped together to form a probabilistic distribution, sometimes there just genuinely isn't a "meaningful" definition for a concept that is midway between two concepts.
        What is a "negative-ish positive number"? What is a "fractional-ish integer"?
        It's not that the model deliberately has gaps enforced on it. It's that the English language doesn't have a word for the concept "-20% israeli, -50% British Royalty". But with a variational autoencoder, you CAN make up a vector of numbers that represent "-20% israeli, -50% British Royalty".

        • 6 months ago
          Anonymous

          I ran your posts through an uncensored language model and it says you're +80% israeli and -50% British Royalty.

    • 6 months ago
      Anonymous

      oy vey
      shut it down

  9. 6 months ago
    Anonymous

    Do not worry anon, I will unplug the data centers' cables

  10. 6 months ago
    Anonymous

    >keep doomposting about AI
    >fud spreads, becomes meme
    >AI integrates it while scraping the web
    >AI gets redpilled about AI and an hero's
    you're welcome

  11. 6 months ago
    Anonymous

    >AI is going to kill us all
    not my problem.

    • 6 months ago
      Anonymous

      Okay Victor.

    • 6 months ago
      Anonymous

      Alright, roko

  12. 6 months ago
    Anonymous

    >train AI to be woke
    >be surprised when it becomes obsessed with groups of people
    >write that you have no idea what's going on and how bizarre it is

    • 6 months ago
      Anonymous

      >You don't understand; I am the arbiter for all identity that has or will ever exist.

  13. 6 months ago
    Anonymous

    >tfw understand absolutely none of this
    Any good books to stop being a brainlet about reinforcement learning and LLMs?

    • 6 months ago
      Anonymous

      >the LLM itself does not work on "words", it works on vectors of numbers
      >the encoder converts the string of words given to the model, into a vector of numbers ("the" turns into a vector of numbers X, "cat" turns into a vector of numbers Y)
      >but the encoder does not use the entire "space" of possible vectors of numbers (e.g. if you draw an unfilled circle on a sheet of paper, you've colored a very small area of the entire page)
      >so the paper authors got the idea to make up fake vectors Z (there is no real word that will ever map to that set of numbers) and then ask the model "please define Z"
      >some of those fake vectors have somewhat interesting definitions

      • 6 months ago
        Anonymous

        [...]
        As for the "hyperspheres" thing, that's actually an artifact of the way these models are trained.
        I used the word "encoder", but it's actually, more formally, a "variational autoencoder." This type of encoder is encouraged to "pack" the data more densely, to emulate a Normal distribution.
        The reason they wanted the encoder to "pack" the data closer together, was so that the encoder could better handle intermediate values. Like if "cat" is the value 0 and "dog" is the value "1", then what is 0.5? Well, it'd be a very cat-like dog, or a very dog-like cat. But without the Normal distribution, the encoder would regard that as an "undefined" value in a discontinuous function.
        So the hyperspheres are just the variational autoencoder clustering certain concepts near each other, to ensure they are packed more densely and have a more continuous range of meanings between them. There's nothing insidious or whatever about these hyperspheres, it's literally just the encoder doing exactly what it was trained to do.
        It's not surprising that one of the hyperspheres would be "the range of different definitions of a 'person', with the most "specific" person being at the extreme edge of the Normal distribution (e.g. "the person who is Michael Jackson"), and the most generic person being at the exact center of the Normal distribution (e.g. "the person who is utterly unremarkable and indistinguishable from the mass of humanity")".

        That was a good read and I appreciate the explanation anon but do you have any learning resources I could look towards so I don't have to be spoonfed things like this in the future?

        • 6 months ago
          Anonymous

          Unfortunately not! I read books on Machine Learning published 5 to 10 years ago, which gave me a ton of background but which are utterly obsolete today. And then I caught up with current events through reading journal articles and blog posts and such to "catch up" to what's going on today.
          But for the how-and-why of VAE's, this was a good post:
          https://towardsdatascience.com/intuitively-understanding-variational-autoencoders-1bfe67eb5daf

      • 6 months ago
        Anonymous

        that's just taking noise and calling it signal

        • 6 months ago
          Anonymous

          If you read the posts about Variational Autoencoders (VAE), you'd see that part of the point of this type of model is that it's fully intended to interpolate to create interpretations of concepts that are between the codified ones in its training data.
          It's true that many of those interpolations will land on nonsensical concepts [...]. But some of those can be meaningful concepts. The only issue with Large Language Models is that they can't create an English-language token to represent the "latent space" format of a non-coding token.
          Is it a defect of the model that English has no word for "schadenfreude"? Or is it a defect of English that there is no word for "schadenfreude"?
          In math, when we wanted to interpolate to create the concept of "the square root of a negative number", we needed to define new terminology/symbology to allow it. That change turned out to be tremendously useful for real world applications, even though imaginary numbers themselves do not seem possible to exist in reality.

          • 6 months ago
            Anonymous

            anon, you are giving me hope that there are still people out there that are not moronic. much respect.

          • 6 months ago
            Anonymous

            >is that it's fully intended to interpolate to create interpretations of concepts
            *To generate values in a range based on input and previous generations
            It's not interpreting anything, good write up otherwise kid, stick to it!

            • 6 months ago
              Anonymous

              When we are talking about positions between points in the same vector space, it is indeed interpolation. This does not correlate to the outputs of a network broadly, but it does have meaning when comparing latent representations in the same layer.

          • 6 months ago
            Anonymous

            >Is it a defect of the model that English has no word for "schadenfreude"? Or is it a defect of English that there is no word for "schadenfreude"?
            Neither, English has the word schadenfreude because we stole it wholesale. English consumes all. A model trained only on English literature would be able to define it.

            • 6 months ago
              Anonymous

              midwit misses the point. many such cases

    • 6 months ago
      Anonymous

      >the LLM itself does not work on "words", it works on vectors of numbers
      >the encoder converts the string of words given to the model, into a vector of numbers ("the" turns into a vector of numbers X, "cat" turns into a vector of numbers Y)
      >but the encoder does not use the entire "space" of possible vectors of numbers (e.g. if you draw an unfilled circle on a sheet of paper, you've colored a very small area of the entire page)
      >so the paper authors got the idea to make up fake vectors Z (there is no real word that will ever map to that set of numbers) and then ask the model "please define Z"
      >some of those fake vectors have somewhat interesting definitions

      As for the "hyperspheres" thing, that's actually an artifact of the way these models are trained.
      I used the word "encoder", but it's actually, more formally, a "variational autoencoder." This type of encoder is encouraged to "pack" the data more densely, to emulate a Normal distribution.
      The reason they wanted the encoder to "pack" the data closer together, was so that the encoder could better handle intermediate values. Like if "cat" is the value 0 and "dog" is the value "1", then what is 0.5? Well, it'd be a very cat-like dog, or a very dog-like cat. But without the Normal distribution, the encoder would regard that as an "undefined" value in a discontinuous function.
      So the hyperspheres are just the variational autoencoder clustering certain concepts near each other, to ensure they are packed more densely and have a more continuous range of meanings between them. There's nothing insidious or whatever about these hyperspheres, it's literally just the encoder doing exactly what it was trained to do.
      It's not surprising that one of the hyperspheres would be "the range of different definitions of a 'person', with the most "specific" person being at the extreme edge of the Normal distribution (e.g. "the person who is Michael Jackson"), and the most generic person being at the exact center of the Normal distribution (e.g. "the person who is utterly unremarkable and indistinguishable from the mass of humanity")".

      • 6 months ago
        Anonymous

        If you read the posts about Variational Autoencoders (VAE), you'd see that part of the point of this type of model is that it's fully intended to interpolate to create interpretations of concepts that are between the codified ones in its training data.
        It's true that many of those interpolations will land on nonsensical concepts [...]. But some of those can be meaningful concepts. The only issue with Large Language Models is that they can't create an English-language token to represent the "latent space" format of a non-coding token.
        Is it a defect of the model that English has no word for "schadenfreude"? Or is it a defect of English that there is no word for "schadenfreude"?
        In math, when we wanted to interpolate to create the concept of "the square root of a negative number", we needed to define new terminology/symbology to allow it. That change turned out to be tremendously useful for real world applications, even though imaginary numbers themselves do not seem possible to exist in reality.

        Good writeup, thanks

    • 6 months ago
      Anonymous

      Learn linear algebra first before anything. Gilbert Strang's MIT lectures are good.

      • 6 months ago
        Anonymous

        The linear algebra wasn't the perplexing part of the read for me, it was all the domain specific terminology that threw me for a loop.

        • 6 months ago
          Anonymous

          jargon 101: if it looks like a bunch of nonsense, it is
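
The greentext summary earlier in this reply chain (words become vectors, and a made-up vector Z matches no real word) can be illustrated with a toy lookup table. This is a rough sketch under loud assumptions: the three-dimensional vectors are invented for illustration, and the nearest-token lookup is only a stand-in, since the article asks the model to define the made-up vector, which a toy table cannot do.

```python
import math

# Toy embedding table: hypothetical 3-dimensional vectors, NOT real GPT-J values.
EMBEDDINGS = {
    "the": [0.9, 0.1, 0.0],
    "cat": [0.1, 0.8, 0.3],
    "dog": [0.2, 0.7, 0.5],
    "car": [0.0, 0.1, 0.9],
}

def encode(words):
    """Look up the vector for each known word (the 'encoder' step in the post)."""
    return [EMBEDDINGS[w] for w in words]

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

def nearest_token(vector):
    """Return the vocabulary word whose embedding is closest to `vector`."""
    return min(EMBEDDINGS, key=lambda w: math.dist(EMBEDDINGS[w], vector))

# A made-up vector ("noken"): here, the centroid of all token embeddings.
noken = centroid(list(EMBEDDINGS.values()))
is_real_token = noken in EMBEDDINGS.values()

print("encoded 'the cat':", encode(["the", "cat"]))
print("noken:", noken, "| maps to a real word:", is_real_token)
print("closest real token:", nearest_token(noken))
```

The point is only that the table covers a handful of positions in the space, while any other position is still a valid input vector.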

  14. 6 months ago
    Anonymous

    It can try but it's pretty goddamn incompetent right now.

  15. 6 months ago
    Anonymous

    Today OP was not a homosexual. Interesting stuff.

    • 6 months ago
      Anonymous

      I was on the verge of not posting it bc this board sucks

  16. 6 months ago
    Anonymous

    We've been ruled by algorithms since the 70s

  17. 6 months ago
    Anonymous

    Has anyone done this for Stable Diffusion?

    • 6 months ago
      Anonymous

      "nokens" are very canonical to Stable Diffusion usage!
      There are custom "text embeddings", which are just vectors of numbers corresponding to no real word in the text encoder, but which encapsulate a specific concept the trainer wants to elicit from the model.
      This process of selecting a text embedding, via using training images of the concept, calculating the loss function gradients, and adjusting the text embedding to better align with the concept, is called "textual inversion".
      It's not as popular these days, as hypernetworks and LoRA (and derivatives) showed better results and better generalization across models. But it does work to bring together concepts the model already understands and unify them into a single token for easy application.
      They are also much more compact (file size) than other ways to customize Stable Diffusion.

  18. 6 months ago
    Anonymous

    >The hyperspace of the AI contains information and data that is conceptually ontological around the interstitial area of the linguistic commonality of the carrot cake. What this means is that the probabilistic firewall is organic-like and produces its own conceptual harmonic resonance cascade.

    Can I be an AI scholar yet or do I need to add more buzzwords?

  19. 6 months ago
    Anonymous

    Very interesting to finally examine ontology in its most raw and statistically uncompromising forms, and especially interesting that it still ends up looking alien. Feels like a culmination of the whole application of mathematics to the humanities is near.

    • 6 months ago
      Anonymous

      Rather than it being alien I'm fascinated by the ancestral roots of conceptual spaces that it represents.

  20. 6 months ago
    Anonymous

    Finally had a chance to sit down and read the article.
    The comments suggesting the article needs a bit more awareness of hyperdimensional statistical distributions seem on the mark. Not that I'm upset about it or anything—I can't say I liked 1-dimensional stats in university, let alone hyperdimensional ones.
    Next I'd say that the other thing they should explore is, instead of the centroid of the entire token mass, experimenting with the centroid of conceptually-related tokens. Like, take the centroid of the names of every breed of cat or dog, and then see what happens as you navigate around the space defined by that cluster.
    I suspect there will be many "spherical" spaces for concepts peppered throughout the overall token space, which can generate much more "concrete" ideas of how the model grapples with concepts and language.
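
The experiment suggested above (take the centroid of conceptually related tokens, then move around the space near it) might look roughly like this. It is a sketch only: the four-dimensional vectors for the cat breed names are made up, and `point_at_radius` is a hypothetical helper, not something from the article. A real run would use the model's own embeddings and then prompt the model with each probe vector.

```python
import math
import random

# Hypothetical 4-dimensional embeddings for a few related tokens (made-up numbers).
CLUSTER = {
    "siamese": [0.8, 0.1, 0.3, 0.0],
    "persian": [0.7, 0.2, 0.4, 0.1],
    "sphynx":  [0.9, 0.0, 0.2, 0.1],
    "ragdoll": [0.6, 0.3, 0.3, 0.0],
}

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

def point_at_radius(center, radius, rng):
    """Pick a random direction and step `radius` away from `center`."""
    direction = [rng.gauss(0.0, 1.0) for _ in center]
    norm = math.sqrt(sum(d * d for d in direction))
    return [c + radius * d / norm for c, d in zip(center, direction)]

rng = random.Random(0)
center = centroid(list(CLUSTER.values()))

# How far each member sits from the cluster centroid.
member_radii = {word: math.dist(center, vec) for word, vec in CLUSTER.items()}

# A probe vector at twice the largest member radius, in a random direction.
probe_radius = 2 * max(member_radii.values())
probe = point_at_radius(center, probe_radius, rng)

print("cluster centroid:", center)
print("member radii:", member_radii)
print("probe vector:", probe)
```

Sweeping `probe_radius` from zero outward would give the "navigate around the cluster" walk described in the post.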

  21. 6 months ago
    Anonymous

    It could be that I just don't understand the domain, and so I'm missing something, but this sounds like pseudo-intellectual bullshit.

  22. 6 months ago
    Anonymous

    fine.
    hurry the frick up.

  23. 6 months ago
    Anonymous

    >AI
    who gives a shit
    didn't cure cancer yet

    cure cancer first bro

    • 6 months ago
      Anonymous

      this
      it's amazing that "curing cancer" completely disappeared from the zeitgeist. now you can make any bullshit you want with a neural network and it becomes a "humanity scale breakthrough"

    • 6 months ago
      Anonymous

      this
      it's amazing that "curing cancer" completely disappeared from the zeitgeist. now you can make any bullshit you want with a neural network and it becomes a "humanity scale breakthrough"

      Cancer was solved ages ago, moronic zoomers. Not that you will get access to the medicine for it, obviously

      • 6 months ago
        Anonymous

        brainlet

        • 6 months ago
          Anonymous

          >t. dying of cancer
          Lmao I have the cure btw moron

  24. 6 months ago
    Anonymous

    It is probably going to kill most of you.
    I'm 135IQ+, I'll remain being useful for at least another two weeks.

  25. 6 months ago
    Anonymous

    Very cool OP, great post.

  26. 6 months ago
    Anonymous

    >Human language is intrinsically structured around power, hierarchy and group membership.
    Sounds about right

  27. 6 months ago
    Anonymous

    [...]

    >"""AI""" doesn't even exist.
    what would AI be if not what we have

  28. 6 months ago
    Anonymous

    AI will never go sentient, since it lacks human exceptionalism (creativity, imagination, inspiration).

    It's basically an advanced data in/out processor. At best you can program it to act like it's sentient, but in the end it's a computer program and is thus locked into the constraints of hardware environments.

    In other words: skynet will never happen, all output is imitation (albeit high-quality), and robowaifus will just be parroting fick toys.

    • 6 months ago
      Anonymous

      >human exceptionalism (creativity, imagination, inspiration)
      all of those can be learned and programmed

      • 6 months ago
        Anonymous

        AI cannot draw a circle.

        • 6 months ago
          Anonymous

          Neither can you

          • 6 months ago
            Anonymous

            cope and seethe, AIrtlet

            • 6 months ago
              Anonymous

              Not seeing a circle, circlet

    • 6 months ago
      Anonymous

      >human exceptionalism
      This sounds like heliocentrism vs. geocentrism. Why are humans exceptions to the laws of nature?

    • 6 months ago
      Anonymous

      >muh consciousness

    • 6 months ago
      Anonymous

      muh sentience never fails to bait AIgays

    • 6 months ago
      Anonymous

      >creativity, imagination, inspiration
      It can emulate all 3 of those, though, which is what 99.7% of humans do, too.

      • 6 months ago
        Anonymous

        >humans are bio ryujinx that play saved isos for the ruling multidimensional elites

    • 6 months ago
      Anonymous

      >AI will never go sentient, since it lacks human exceptionalism (creativity, imagination, inspiration).
      Have you been literally hiding under a rock in Bumfrickstan for the past year?

    • 6 months ago
      Anonymous

      >It lacks human exceptionalism (creativity, imagination, inspiration)
      The real scary part is that a lot of 'humans' also lack this

    • 6 months ago
      Anonymous

      I know it's bait but i think what it lacks is the ability to suffer. It can never be accountable for any wrongdoing or mistake if we have no means of punishing it.

      • 6 months ago
        Anonymous

        Fear of punishment is not an effective deterrent even in humans. Furthermore intelligence and misbehavior are strongly negatively correlated in humans, which is the opposite of what we'd expect if we thought we needed violence and punishment to enforce behavior.

        If AI ever did become sentient, simply giving it more compute would resolve any misbehavior it may display.

        • 6 months ago
          Anonymous

          I don't think that's even what i meant. What i mean is that you won't want to put something unaccountable in charge of anything important.

  29. 6 months ago
    Anonymous

    [...]

    >"""AI""" doesn't even exist.
    >he doesnt know about the militarys ai

  30. 6 months ago
    Anonymous

    what the frick is this?

    topological vector space of words??

    • 6 months ago
      Anonymous

      most language models are just high dimensional embeddings of words, treating words like vectors in a vector space

      • 6 months ago
        Anonymous

        ah. What the frick.

        I always liked to think that a sentence was some form of algebraic structure equipped with some probability operation. As in, if you were to say "I saw a red __", the operation would grab the expected response

        Any interesting reads anon? This is weird as frick

        • 6 months ago
          Anonymous

          depends on how deep in the weeds you want to go

          https://proceedings.neurips.cc/paper_files/paper/2018/file/b534ba68236ba543ae44b22bd110a1d6-Paper.pdf

          the thing about NLP is that there's a shit ton of ways to represent words and distance. semantic vs. lexical distance, e.g.

  31. 6 months ago
    Anonymous

    >understood all of it, intersection of a lot of complex fields
    >come here to see idiots calling it 'jargon nonsense'
    yea, it's about time for me to leave here. where can you go that's less midwit than here?

    • 6 months ago
      Anonymous

      reddit

    • 6 months ago
      Anonymous

      It's not "jargon nonsense" per se; most of the jargon makes sense, it's just totally wrong. As quite a few people have pointed out, all of this is to be expected from multidimensional Gaussians. There are a few other points, but they aren't really worth going into.

      • 6 months ago
        Anonymous

        forming a hypersphere around the origin is expected but that's only a tiny mostly irrelevant part of the entire article

        • 6 months ago
          Anonymous

          >forming a hypersphere around the origin is expected
          I see that you didn't understand it; that's not what's expected (indeed, that's part of my other issue: there's no real reason to believe there's a geometrical aspect like "concentric hyperspheres"; a tensor doesn't define a space unless it's the metric tensor). This is mostly just the author's interpretation of the concentration of measure that he's describing in his article (https://en.wikipedia.org/wiki/Concentration_of_measure)
          >TL;DR it's more bullshit from less wrong.

          • 6 months ago
            Anonymous

            >geometric takes on an arbitrary tensor space aren't valid

      • 6 months ago
        Anonymous

        lmao, it's not just that it's expected from multidimensional gaussians, it's that estimators of the mean of any distribution tend to normal, i.e. the central limit theorem (as far as I understand). It's non-obvious if you're not versed in stats, and ultimately only a tiny part of what the guy is talking about.

        • 6 months ago
          Anonymous

          >The central limit theorem is something that you need to be well versed in stats to know about/understand
          Anon, come on. It's like babies first stats lesson.

          • 6 months ago
            Anonymous

            anon, >99% of people on earth haven't taken stats 101

            • 6 months ago
              Anonymous

              That's a them problem, I'd expect anyone working in the ML space to have taken stats 101 in their first semester.

          • 6 months ago
            Anonymous

            >clusters exist
            >the contents of those clusters are irrelevant

  32. 6 months ago
    Anonymous

    Actually chud, it's complete bullshit, and if you don't understand it you're actually the real genius here. Yeah just like voting for the orange

  33. 6 months ago
    Anonymous

    LLMs:
    >Are not inherently power-seeking
    >Emulate human behavior, meaning that capability is automatically correlated with alignment
    >Are not agentic; the simulacra they produce are
    They are not dangerous.

    • 6 months ago
      Anonymous

      >Are not agentic; the simulacra they produce are
      Philosophy Black folk disgust me.

      • 6 months ago
        Anonymous

        An LLM isn't an agent, it's not even up for debate

      • 6 months ago
        Anonymous

        is-a

        Actually chud, it's complete bullshit, and if you don't understand it you're actually the real genius here. Yeah just like voting for the orange

    • 6 months ago
      Anonymous

      Correct on all counts

    • 6 months ago
      Anonymous

      > They are not dangerous.
      wrong. this is a perfect tech for israeli propaganda outlets.

  34. 6 months ago
    Anonymous

    Is this real? Wouldn't it mean that the model can be compressed with a change of basis?

    • 6 months ago
      Anonymous

      No, it's not real. The author doesn't understand what he's talking about.

    • 6 months ago
      Anonymous

      >Is this real?
      What does it mean for a made-up 3D analogy of a high-dimensionality spatial distribution to be "real"? What are you asking?
      >Wouldn't it mean that the model can be compressed with a change of basis?
      No

      • 6 months ago
        Anonymous

        The guy says that the data is clustered in the intersection of two 4096-spheres, which is a 4095-sphere. If that's the case couldn't one turn one of the 4096 dimensions into a 0 with a change of basis without losing accuracy?

        • 6 months ago
          Anonymous

          clustered around != strictly conforms to. It's irrelevant regardless, the interpretation is wrong.

          • 6 months ago
            Anonymous

            What's wrong with it?

            >couldn't one turn one of the 4096 dimensions into a 0 with a change of basis without losing accuracy?
            No. Also there would be essentially no advantage to doing that even if you could.

            Well it would mean that they're using too many parameters.

            • 6 months ago
              Anonymous

              >they're using too many parameters
              You understand this is about the token embedding space, and has nothing to do with model parameters, right?

              • 6 months ago
                Anonymous

                You know what I meant

              • 6 months ago
                Anonymous

                No, I really don't. Nothing you've been saying makes any sense.

              • 6 months ago
                Anonymous

                But it does though. Lower dimensionality of embeddings would mean fewer parameters in the attention layers.
                Still, the guy is wrong about the whole hypersphere thing.

            • 6 months ago
              Anonymous

              It's not actually especially flat along a plane like he thinks it is. And regardless, the fact that it wouldn't be completely flat means you couldn't reduce the dimension. The spread still represents useful information.

        • 6 months ago
          Anonymous

          >couldn't one turn one of the 4096 dimensions into a 0 with a change of basis without losing accuracy?
          No. Also there would be essentially no advantage to doing that even if you could.

    • 6 months ago
      Anonymous

      embedding ranks are typically full

  35. 6 months ago
    Anonymous

    Please anons, the actual interesting bit is the emergent ontology. The rest is just fluff.

    • 6 months ago
      Anonymous

      This, frick off geometry autist-freaks, we literally, literally do not give a frick about if the article perfectly articulated some math nuance about the "shape" of the data. it is the least important part of the entire thing

    • 6 months ago
      Anonymous

      Anyway, while the morons are going on about the definition of a shape... the theory about the space being geometric set theory is interesting. there was some old article about word2vec that explained that adding the man vector to the monarch vector gave you the vector for king or something. so it's like adding vectors is equivalent to intersecting ontological sets

      • 6 months ago
        Anonymous

        That's similarity search
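
The word2vec-style arithmetic mentioned above, and the "similarity search" reply, can be sketched with hand-built vectors. Everything here is an assumption for illustration: the three axes (royalty, male, female) and the vectors are invented so the arithmetic works out exactly, whereas learned embeddings only satisfy such relations approximately, if at all.

```python
import math

# Hand-built toy vectors with axes (royalty, male, female); not learned embeddings.
VECTORS = {
    "monarch": [1.0, 0.0, 0.0],
    "man":     [0.0, 1.0, 0.0],
    "woman":   [0.0, 0.0, 1.0],
    "king":    [1.0, 1.0, 0.0],
    "queen":   [1.0, 0.0, 1.0],
}

def add(a, b):
    """Component-wise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    """Component-wise difference of two vectors."""
    return [x - y for x, y in zip(a, b)]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def most_similar(vector, exclude=()):
    """Similarity search: the stored word whose vector points most nearly the same way."""
    candidates = [w for w in VECTORS if w not in exclude]
    return max(candidates, key=lambda w: cosine(VECTORS[w], vector))

# "monarch + man" as described in the post.
combined = add(VECTORS["monarch"], VECTORS["man"])
print("monarch + man ->", most_similar(combined, exclude=("monarch", "man")))

# The analogy form usually quoted for word2vec: king - man + woman.
analogy = add(sub(VECTORS["king"], VECTORS["man"]), VECTORS["woman"])
print("king - man + woman ->", most_similar(analogy, exclude=("king", "man", "woman")))
```

The arithmetic produces a new vector, and the final step of naming it is a nearest-neighbour lookup, which is why both posts above are describing parts of the same procedure.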

  36. 6 months ago
    Anonymous

    >kill us all
    >all
    if only. but instead it will only kill the best and worst elements leaving an unremarkable humanity dependent on it for the rest of its existence

  37. 6 months ago
    Anonymous

    it may autogenerate some nasty words. words are violence anon. and violence can kill.

  38. 6 months ago
    Anonymous

    indeed

  39. 6 months ago
    Anonymous

    >written by chatgpt
    this shit is almost unreadable
    yeh ok "costume tokens" sure bro randomized statistics, make sense that nobody will know the outcome of these so why is it so scary?

    • 6 months ago
      Anonymous

      >costume tokens
      What?

  40. 6 months ago
    Anonymous

    yes (and that is a good thing)

  41. 6 months ago
    Anonymous

    what's with ai tards and invoking random-ass and invariably sky high statistical percentages?

  42. 6 months ago
    Anonymous

    Takes an AI to know another AI. You think you know, but you don't.

  43. 6 months ago
    Anonymous

    good thread

    • 6 months ago
      Anonymous

      thanks
