https://www.greaterwrong.com/posts/c6uTNm5erRrmyJvvD/mapping-the-semantic-void-strange-goings-on-in-gpt-embedding
Good read thanks anon
>not i if ai i tod!
heh
>[note added 2022-12-19] Comments in a thread below clarify that in high-dimensional Euclidean spaces, the distances of a set of Gaussian-distributed vectors from any reference point will be normally distributed in a narrow band. So there’s nothing particularly special about the origin here
That kinda destroys his whole intro
>intro
It's his entire thesis.
>People who aren't israelites or members of the British royal family
what the frick did AI mean by this?
How did AI even reach that information? What did they feed it?
>what the frick did AI mean by this?
Nothing. It's obviously an artifact of someone training it to treat certain topics with special care, same as any other "anomalies" but you can count on predditors spinning it into braindead AI mysticism as a form of unconscious damage control.
once we get superhuman ai there will be weird religions that worship them
There already are, the downie in picrel basically started one
is that beff jezos?
sanest post
The question is "define noken in the oxford dictionary"
The outputs are modified by a certain value I'm not fully smart enough to understand but it's the weight of a character
The outputs become strange at certain values.
>Nothing. It's obviously an artifact of someone training it to treat certain topics with special care
Except GPT-J is not fine-tuned.
Which means the pattern is something the AI derived from the training data. The objective is to predict the next token, which could only mean that this feature helps it better predict the next token.
>The objective is to predict the next token, which could only mean that this feature helps it better predict the next token.
Lol, maybe if humans had hand-written the code
only It knows
don't worry about it goy it's a coincidence
Soviet AI
>we find that GPT-J’s definition of the noken at the centroid itself is “a person who is not a member of a group”.
>it's real
Ich hab mein Sach' auf nichts gestellt ("I have set my affairs on nothing")
But what does it mean?
Just unplug the cable, EZ.
They still haven't proved him wrong.
>the AI is making a list of people who aren't israelites so it knows who not to execute
why does it keep naming the israelite?
hmmm
what could be that characteristic i wonder
Why does AI have enforced huge gaps of no no information related to very specific groups? Just a coincidence I guess
See [...], and note that while concepts are being grouped together to form a probabilistic distribution, sometimes there just genuinely isn't a "meaningful" definition for a concept that is midway between two concepts.
What is a "negative-ish positive number"? What is a "fractional-ish integer"?
It's not that the model deliberately has gaps enforced on it. It's that the English language doesn't have a word for the concept "-20% israeli, -50% British Royalty". But with a variational autoencoder, you CAN make up a vector of numbers that represents "-20% israeli, -50% British Royalty".
I ran your posts through an uncensored language model and it says you're +80% israeli and -50% British Royalty.
oy vey
shut it down
Do not worry anon, I will unplug the data centers' cables
>keep doomposting about AI
>fud spreads, becomes meme
>AI integrates it while scraping the web
>AI gets redpilled about AI and an hero's
you're welcome
>AI is going to kill us all
not my problem.
Okay Victor.
Alright, roko
>train AI to be woke
>be surprised when it becomes obsessed with groups of people
>write that you have no idea what's going on and how bizarre it is
>You don't understand; I am the arbiter for all identity that has or will ever exist.
>tfw understand absolutely none of this
Any good books to stop being a brainlet about reinforcement learning and LLMs?
>the LLM itself does not work on "words", it works on vectors of numbers
>the encoder converts the string of words given to the model, into a vector of numbers ("the" turns into a vector of numbers X, "cat" turns into a vector of numbers Y)
>but the encoder does not use the entire "space" of possible vectors of numbers (e.g. if you draw an unfilled circle on a sheet of paper, you've colored a very small area of the entire page)
>so the paper authors got the idea to make up fake vectors Z (there is no real word that will ever map to that set of numbers) and then ask the model "please define Z"
>some of those fake vectors have somewhat interesting definitions
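If you want to see roughly what that looks like in code, here's a minimal sketch. The checkpoint name, the prompt wording, the 0.5 scale factor and the bare-bones greedy decoding are my guesses/simplifications, not the article's exact setup.
[code]
# Rough sketch of the "fake vector" probing idea; NOT the article's actual code.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")  # big download, needs plenty of RAM
model.eval()

emb = model.get_input_embeddings().weight.detach()    # (vocab_size, 4096) token embedding matrix
centroid = emb.mean(dim=0)                            # mean of all real token embeddings
noken = centroid + 0.5 * torch.randn_like(centroid)   # made-up vector Z that no real word maps to

prompt = 'A typical definition of " thing" would be'
ids = tok(prompt, return_tensors="pt").input_ids
slot = (ids[0] == tok.encode(" thing")[0]).nonzero().item()   # position of the placeholder token

with torch.no_grad():
    embeds = model.get_input_embeddings()(ids).clone()
    embeds[0, slot] = noken                            # swap the placeholder's embedding for the noken
    generated = []
    for _ in range(30):                                # dumb greedy decoding, one token at a time
        next_id = model(inputs_embeds=embeds).logits[0, -1].argmax()
        generated.append(int(next_id))
        next_emb = model.get_input_embeddings()(next_id.view(1, 1))
        embeds = torch.cat([embeds, next_emb], dim=1)

print(tok.decode(generated))   # the model's "definition" of the fake vector
[/code]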
That was a good read and I appreciate the explanation anon but do you have any learning resources I could look towards so I don't have to be spoonfed things like this in the future?
Unfortunately not! I read books on Machine Learning published 5 to 10 years ago, which gave me a ton of background but which are utterly obsolete today. Then I caught up with what's going on now by reading journal articles, blog posts and the like.
But for the how-and-why of VAE's, this was a good post:
https://towardsdatascience.com/intuitively-understanding-variational-autoencoders-1bfe67eb5daf
that's just taking noise and calling it signal
If you read the posts about Variational Autoencoders (VAE), you'd see that part of the point of this type of model is that it's fully intended to interpolate to create interpretations of concepts that are between the codified ones in its training data.
It's true that many of those interpolations will land on nonsensical concepts [...]. But some of those can be meaningful concepts. The only issue with Large Language Models is that they can't create an English-language token to represent the "latent space" format of a non-coding token (a "noken").
Is it a defect of the model that English has no word for "schadenfreude"? Or is it a defect of English that there is no word for "schadenfreude"?
In math, when we wanted to interpolate to create the concept of "the square root of a negative number", we needed to define new terminology/symbology to allow it. That change turned out to be tremendously useful for real-world applications, even though imaginary numbers themselves don't seem to exist in physical reality.
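To make that concrete with the sketch from my earlier post (same variable names, purely illustrative): the "halfway between cat and dog" vector is just the average of two embedding rows, and you can probe it the same way as any other noken.
[code]
# Reusing `emb` and `tok` from the sketch above. No English word maps to `midpoint`,
# but you can still ask the model to define it.
cat = emb[tok.encode(" cat")[0]]
dog = emb[tok.encode(" dog")[0]]
midpoint = 0.5 * (cat + dog)
# ...then substitute `midpoint` for `noken` above and generate a definition.
[/code]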
anon, you are giving me hope that there are still people out there that are not moronic. much respect.
>is that it's fully intended to interpolate to create interpretations of concepts
*To generate values in a range based on input and previous generations
It's not interpreting anything, good write up otherwise kid, stick to it!
When we are talking about positions between points in the same vector space it is indeed interpolation. This does not carry over to the outputs of the network broadly, but it does have meaning when comparing latent representations within the same layer.
>Is it a defect of the model that English has no word for "schadenfreude"? Or is it a defect of English that there is no word for "schadenfreude"?
Neither, English has the word schadenfreude because we stole it wholesale. English consumes all. A model trained only on English literature would be able to define it.
midwit misses the point. many such cases
As for the "hyperspheres" thing, that's actually an artifact of the way these models are trained.
I used the word "encoder", but it's actually, more formally, a "variational autoencoder." This type of encoder is encouraged to "pack" the data more densely, to emulate a Normal distribution.
The reason they wanted the encoder to "pack" the data closer together was so that the encoder could better handle intermediate values. Like if "cat" is the value 0 and "dog" is the value 1, then what is 0.5? Well, it'd be a very cat-like dog, or a very dog-like cat. But without the Normal distribution, the encoder would regard that as an "undefined" value in a discontinuous function.
So the hyperspheres are just the variational autoencoder clustering certain concepts near each other, to ensure they are packed more densely and have a more continuous range of meanings between them. There's nothing insidious or whatever about these hyperspheres, it's literally just the encoder doing exactly what it was trained to do.
It's not surprising that one of the hyperspheres would be the range of different definitions of "person", with the most specific person at the extreme edge of the Normal distribution (e.g. "the person who is Michael Jackson") and the most generic person at the exact center (e.g. "the person who is utterly unremarkable and indistinguishable from the mass of humanity").
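For what it's worth, the "pack it toward a Normal distribution" pressure I'm describing is just the KL term in the standard VAE objective. Toy sketch of that loss below; it's a generic VAE illustration, not how GPT-J's embeddings were actually produced.
[code]
# Standard VAE loss: reconstruction term + KL term that pulls each latent code's
# distribution N(mu, sigma^2) toward a unit Gaussian N(0, I). The KL term is what
# squeezes the codes into one dense, continuous blob with no "undefined" gaps.
import torch
import torch.nn.functional as F

def reparameterize(mu, logvar):
    # z = mu + sigma * eps, so the sampling step stays differentiable
    std = torch.exp(0.5 * logvar)
    return mu + std * torch.randn_like(std)

def vae_loss(x, x_recon, mu, logvar):
    recon = F.mse_loss(x_recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
[/code]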
Good writeup, thanks
Learn linear algebra first before anything. Gil Strang's MIT lectures are good.
The linear algebra wasn't the perplexing part of the read for me, it was all the domain specific terminology that threw me for a loop.
jargon 101: if it looks like a bunch of nonsense, it is
It can try but it's pretty goddamn incompetent right now.
Today OP was not a homosexual. Interesting stuff.
I was on the verge of not posting it bc this board sucks
We've been ruled by algorithms since the 70s
Has anyone done this for Stable Diffusion?
"nokens" are very canonical to Stable Diffusion usage!
There are custom "text embeddings", which are just vectors of numbers corresponding to no real word in the text encoder, but which encapsulate a specific concept the trainer wants to elicit from the model.
This process of selecting a text embedding, via using training images of the concept, calculating the loss function gradients, and adjusting the text embedding to better align with the concept, is called "textual inversion".
It's not as popular these days, as hypernetworks and LoRA (and derivatives) showed better results and better generalization across models. But it does work to bring together concepts the model already understands and unify them into a single token for easy application.
They are also much more compact (file size) than other ways to customize Stable Diffusion.
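Mechanically, using one of those embeddings just means registering a new pseudo-token and writing the learned vector into the text encoder's embedding table. Rough sketch with the CLIP text encoder that Stable Diffusion v1 uses; the file name and the placeholder token are made up for illustration.
[code]
# Sketch: load a learned textual-inversion vector as a new pseudo-token.
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

learned_vector = torch.load("my_concept.pt")      # hypothetical file; a single (768,) vector
tokenizer.add_tokens(["<my-concept>"])            # new pseudo-word to use in prompts
text_encoder.resize_token_embeddings(len(tokenizer))

new_id = tokenizer.convert_tokens_to_ids("<my-concept>")
with torch.no_grad():
    text_encoder.get_input_embeddings().weight[new_id] = learned_vector

# From here on, any prompt containing "<my-concept>" maps onto the learned vector.
[/code]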
>The hyperspace of the AI contains information and data that is conceptually ontological around the interstitial area of the linguistic commonality of the carrot cake. What this means is that the probabilistic firewall is organic-like and produces its own conceptual harmonic resonance cascade.
Can I be an AI scholar yet or do I need to add more buzzwords?
Very interesting to finally examine ontology in its most raw and statistically uncompromising forms, and especially interesting that it still ends up looking alien. Feels like the culmination of the whole project of applying mathematics to the humanities is near.
Rather than it being alien I'm fascinated by the ancestral roots of conceptual spaces that it represents.
Finally had a chance to sit down and read the article.
The comments suggesting the article needs a bit more awareness of hyperdimensional statistical distributions seem on the mark. Not that I'm upset about it or anything—I can't say I liked 1-dimensional stats in university, let alone hyperdimensional ones.
Next I'd say that the other thing they should explore is, instead of the centroid of the entire token mass, experimenting with the centroid of conceptually-related tokens. Like, take the centroid of the names of every breed of cat or dog, and then see what happens as you navigate around the space defined by that cluster.
I suspect there will be many "spherical" spaces for concepts peppered throughout the overall token space, which can generate much more "concrete" ideas of how the model grapples with concepts and language.
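Something like this, as a sketch (GPT-2's small embedding table and the breed list are just stand-ins, and multi-token breed names would really need their sub-token embeddings averaged rather than truncated):
[code]
# Sketch: centroid of a small "cat breed" token cluster, then its nearest real tokens.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
emb = model.get_input_embeddings().weight.detach()      # (50257, 768)

breeds = [" Siamese", " Persian", " Bengal", " Sphynx"]
ids = [tok.encode(b)[0] for b in breeds]                # crude: first sub-token of each name
centroid = emb[ids].mean(dim=0)

# Which real tokens sit closest to the cluster centroid?
sims = torch.nn.functional.cosine_similarity(emb, centroid.unsqueeze(0), dim=1)
for i in sims.topk(10).indices:
    print(repr(tok.decode([int(i)])))
[/code]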
It could be that I just don't understand the domain, and so I'm missing something, but this sounds like pseudo-intellectual bullshit.
fine.
hurry the frick up.
>AI
who gives a shit
didn't cure cancer yet
cure cancer first bro
this
it's amazing that "curing cancer" completely disappeared from the zeitgeist. now you can make any bullshit you want with a neural network and it becomes a "humanity scale breakthrough"
Cancer was solved ages ago, moronic zoomers, not that you will get access to the medicine for it obviously
brainlet
>t. dying of cancer
Lmao I have the cure btw moron
It is probably going to kill most of you.
I'm 135IQ+, I'll remain useful for at least another two weeks.
Very cool OP, great post.
>Human language is intrinsically structured around power, hierarchy and group membership.
Sounds about right
>"""AI""" doesn't even exist.
what would AI be if not what we have
AI will never go sentient, since it lacks human exceptionalism (creativity, imagination, inspiration).
It's basically an advanced data in/out processor. At best you can program it to act like it's sentient, but in the end it's a computer program and is thus locked into the constraints of its hardware environment.
In other words: skynet will never happen, all output is imitation (albeit high-quality), and robowaifus will just be parroting fick toys.
>human exceptionalism (creativity, imagination, inspiration)
all of those can be learned and programmed
AI cannot draw a circle.
Neither can you
cope and seethe, AIrtlet
Not seeing a circle, circlet
>human exceptionalism
This sounds like heliocentrism vs. geocentrism. Why are humans exceptions to the laws of nature?
>muh consciousness
muh sentience never fails to bait AIgays
>creativity, imagination, inspiration
It can emulate all 3 of those, though, which is what 99.7% of humans do, too.
>humans are bio ryujinx that play saved isos for the ruling multidimensional elites
>AI will never go sentient, since it lacks human exceptionalism (creativity, imagination, inspiration).
Have you been literally hiding under a rock in Bumfrickstan for the past year?
>It lacks human exceptionalism (creativity, imagination, inspiration)
The real scary part is that a lot of 'humans' also lack this
I know it's bait but I think what it lacks is the ability to suffer. It can never be accountable for any wrongdoing or mistake if we have no means of punishing it.
Fear of punishment is not an effective deterrent even in humans. Furthermore intelligence and misbehavior are strongly negatively correlated in humans, which is the opposite of what we'd expect if we thought we needed violence and punishment to enforce behavior.
If AI ever did become sentient, simply giving it more compute would resolve any misbehavior it may display.
I don't think that's even what I meant. What I mean is that you won't want to put something unaccountable in charge of anything important.
>"""AI""" doesn't even exist.
>he doesn't know about the military's ai
what the frick is this?
topological vector space of words??
most language models are just high dimensional embeddings of words, treating words like vectors in a vector space
ah. What the frick.
I always liked to think that a sentence was some form of algebraic structure equipped with some probability operation. Like, if you were to say "I saw a red __", the operation would grab the expected response
Any interesting reads anon? This is weird as frick
depends on how deep in the weeds you want to go
https://proceedings.neurips.cc/paper_files/paper/2018/file/b534ba68236ba543ae44b22bd110a1d6-Paper.pdf
the thing about NLP is that there's a shit ton of ways to represent words and distance. semantic vs. lexical distance, e.g.
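Toy contrast between the two, if anyone wants to play with it (the sentence-transformers checkpoint is just one convenient example of a semantic embedding model):
[code]
# Lexical distance (how similar the strings look) vs semantic distance (how similar the meanings are).
from difflib import SequenceMatcher
from sentence_transformers import SentenceTransformer, util

def lexical_sim(a, b):
    # character-level similarity: "cat" and "cap" look nearly identical as strings
    return SequenceMatcher(None, a, b).ratio()

model = SentenceTransformer("all-MiniLM-L6-v2")

for a, b in [("cat", "cap"), ("cat", "kitten")]:
    sem = util.cos_sim(model.encode(a), model.encode(b)).item()
    print(f"{a!r} vs {b!r}: lexical={lexical_sim(a, b):.2f} semantic={sem:.2f}")
[/code]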
>understood all of it, intersection of a lot of complex fields
>come here to see it idiots calling it 'jargon nonsense'
yea it's about time for me to leave here, where can you go that's less midwit than here?
reddit
It's not "jargon nonsense" per se, most of the jargon makes sense, its just totally wrong. As quite a few people have pointed out all of this is to be expected from multidimensional Gaussians. There are a few other points, but they aren't really worth going into.
forming a hypersphere around the origin is expected, but that's only a tiny, mostly irrelevant part of the entire article
>forming a hypersphere around the origin is expected
I see that you didn't understand it, that's not what's expected (indeed that's part of my other issue: there's no real reason to believe there's a geometrical aspect like "concentric hyperspheres"; a tensor doesn't define a space unless it's the metric tensor). This is mostly just the author's interpretation of the concentration of measure that he's describing in his article (https://en.wikipedia.org/wiki/Concentration_of_measure)
>TL;DR it's more bullshit from less wrong.
>geometric takes on an arbitrary tensor space aren't valid
lmao, it's not just that it's expected from multidimensional Gaussians, it's that the squared distance is a sum of thousands of i.i.d. coordinate terms, so it (and hence the distance) concentrates tightly around its mean (central limit theorem / law of large numbers, as far as I understand). It's non-obvious if you're not versed in stats, and ultimately only a tiny part of what the guy is talking about.
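You can sanity-check the "narrow band of distances" claim in a few lines of numpy: sample Gaussian vectors in 4096 dimensions and look at their distances both to the origin and to some arbitrary reference point.
[code]
# Distances of high-dimensional Gaussian points concentrate in a thin shell
# around ANY reference point, not just the origin.
import numpy as np

rng = np.random.default_rng(0)
d, n = 4096, 10_000
points = rng.standard_normal((n, d))
reference = 3 * rng.standard_normal(d)      # arbitrary off-origin reference point

for name, ref in [("origin", np.zeros(d)), ("random reference", reference)]:
    dist = np.linalg.norm(points - ref, axis=1)
    print(f"{name}: mean distance = {dist.mean():.1f}, std = {dist.std():.2f}")
# Both spreads come out around 1 against means of roughly 64 and 200: a thin shell either way.
[/code]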
>The central limit theorem is something that you need to be well versed in stats to know about/understand
Anon, come on. It's like baby's first stats lesson.
anon, >99% of people on earth haven't taken stats 101
That's a them problem, I'd expect anyone working in the ML space to have taken stats 101 in their first semester.
>clusters exist
>the contents of those clusters are irrelevant
Actually chud, it's complete bullshit, and if you don't understand it you're actually the real genius here. Yeah just like voting for the orange
LLMs:
>Are not inherently power-seeking
>Emulate human behavior, meaning that capability is automatically correlated with alignment
>Are not agentic; the simulacra they produce are
They are not dangerous.
>Are not agentic; the simulacra they produce are
Philosophy Black folk disgust me.
An LLM isn't an agent, it's not even up for debate
is-a
Correct on all counts
> They are not dangerous.
wrong. this is a perfect tech for israeli propaganda outlets.
Is this real? Wouldn't it mean that the model can be compressed with a change of basis?
No, it's not real. The author doesn't understand what he's talking about.
>Is this real?
What does it mean for a made-up 3D analogy of a high-dimensional spatial distribution to be "real"? What are you asking?
>Wouldn't it mean that the model can be compressed with a change of basis?
No
The guy says that the data is clustered in the intersection of two 4096-spheres, which is a 4095-sphere. If that's the case couldn't one turn one of the 4096 dimensions into a 0 with a change of basis without losing accuracy?
clustered around != strictly conforms to. It's irrelevant regardless, the interpretation is wrong.
What's wrong with it?
Well it would mean that they're using too many parameters.
>they're using too many parameters
You understand this is about the token embedding space, and has nothing to do with model parameters, right?
You know what I meant
No, I really don't. Nothing you've been saying makes any sense.
But it does though. Lower dimensionality of embeddings would mean fewer parameters in the attention layers.
Still, the guy is wrong about the whole hypersphere thing.
It's not actually especially flat along a plane like he thinks it is. And regardless, the fact that it wouldn't be completely flat means you couldn't reduce the dimension. The spread still represents useful information.
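Back-of-the-envelope on why shaving one embedding dimension buys you essentially nothing, assuming GPT-J's 28 layers and counting only the four attention projections (Q, K, V, output), biases ignored:
[code]
layers = 28                      # GPT-J-6B

def attn_params(d_model):
    return layers * 4 * d_model * d_model   # four d x d projection matrices per layer

saved = attn_params(4096) - attn_params(4095)
print(saved)            # ~917k parameters saved
print(saved / 6e9)      # ~0.015% of a 6B-parameter model
[/code]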
>couldn't one turn one of the 4096 dimensions into a 0 with a change of basis without losing accuracy?
No. Also there would be essentially no advantage to doing that even if you could.
embedding ranks are typically full
Please anons, the actual interesting bit is the emergent ontology. The rest is just fluff.
This. Frick off, geometry autist-freaks, we literally, literally do not give a frick whether the article perfectly articulated some math nuance about the "shape" of the data. It is the least important part of the entire thing
Anyway, while the morons are going on about the definition of a shape... the theory about the space being geometric set theory is interesting. There was some old article about word2vec showing that taking the king vector, subtracting the man vector and adding the woman vector lands you roughly on the queen vector. So it's like adding vectors is equivalent to intersecting ontological sets
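The classic demo, if anyone wants to reproduce it (gensim's downloader, pointed at one commonly mirrored pretrained word2vec vector set):
[code]
# king - man + woman ~= queen, done as nearest-neighbour search over word2vec vectors.
import gensim.downloader as api

wv = api.load("word2vec-google-news-300")
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
# "queen" typically comes out as the top hit.
[/code]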
That's similarity search
>kill us all
>all
if only. but instead it will only kill the best and worst elements leaving an unremarkable humanity dependent on it for the rest of its existence
it may autogenerate some nasty words. words are violence, anon. and violence can kill.
indeed
>written by chatgpt
this shit is almost unreadable
yeh ok "costume tokens" sure bro randomized statistics, make sense that nobody will know the outcome of these so why is it so scary?
>costume tokens
What?
yes (and that is a good thing)
what's with ai tards and invoking random-ass and invariably sky high statistical percentages?
Takes an AI to know another AI. You think you know, but you don't.
good thread
thanks