>beats GPT3.5 by 16%
what does this even mean?
As far as I know it's a multimodal model that looks at both images and text, and it beats GPT-3 at answering questions about the captioned images. GPT-3, however, can't see the images, only the captions
So then their shit is already obsolete by the time OpenAI's multimodal GPT4 comes out if they only achieve 16% increase over a bot that is literally blind.
>j-just wait GPT4 will save us
lmao
>1-1B will win you'll see
lmao back at you
percentage gains in this space don't work linearly like you think.
If one AI scores 95% on 1000 questions and another gets every single one right, that gap is incredibly significant.
true, in ML everybody creams their pants when they get 1% better than SOTA, this is huge
>only achieve 16% increase
It's 16 percentage points, not a percentage increase; it went from 75% to 91% accuracy
however, they're comparing a multimodal model to a text-only one on a multimodal benchmark. they're just trying to prove a point for CoT
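to spell it out with the numbers from the results paragraph (my arithmetic, the accuracies themselves are from the paper):
75.17 + 16.51 = 91.68, i.e. +16.51 percentage points
relative improvement: 16.51 / 75.17 ≈ 22%
error rate: 24.83% -> 8.32%, so roughly 3x fewer wrong answers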
They asked it questions from a standardized test and it performed slightly better than humans, whereas GPT-3 performed significantly worse.
https://arxiv.org/pdf/2302.00923.pdf#page=7
Multimodal-CoT Large 738M
>You can run this thing on your phone now
>Even the 223M model beats GPT3.5
>And beats humans too!
It's unironically over
GPUs made illegal when?
they are trying fr
I seriously hope this is just another Klaus Schwab ghost story that will never actually happen. Unelected midwits trying to control everyone as if we're their pets is absolutely unacceptable.
cope.
they will make it illegal one way or another.
we can only enjoy shitty image generators for now; they will be made illegal too.
> >You can run this thing on your phone now
no you can't, 1B will fry your phone.
also it will still be as stupid as that one 6B Pygmalion.
they're coming for your encryption, then they will come for your hardware
get ready for digital ID, hardware-level DRM, and legally enforced anti-jailbreaking
>Klaus Schwab ghost story
>what is the wef
>what is the wef
a yearly meeting of fuckwads that exists so klaus can take their money
You fucking wish bro
Those who live long enough will envy the dead
https://arxiv.org/pdf/2301.04246.pdf
it's not happening
everything is moving too fast
Is the science model being tested against scientists or general users?
>thinking there is a difference after the science and medical fields spread their cheeks wide open for the whole world to see.
Get a real job.
So what job ISN'T "soi" or "beta" or "onions"? Playing with your dick while collecting checks from the government?
hunting for food and collecting berries. (You) are forbidden from living that life
>he thinks ~~*scientists*~~ are actually smarter than the average joe
I'm sorry to break it to you buddy, but your fabled "scientists" are just normal people who kiss the right ass for grant money.
The ScienceQA dataset was created by a team of researchers from the University of Washington, the Allen Institute for Artificial Intelligence, and the University of Illinois at Urbana-Champaign. The dataset includes both questions and answers created by scientists in various fields, as well as questions and answers generated by crowdsourcing.
Are they comparing against a GPT 3.5 1B?
I strongly suspect there is heavy bullshit on how the gay OP has described this otherwise.
The only thing this model can do is answer multiple choice science questions. Like seriously the only output it can produce is "The answer is A/B/C/D"
Not saying it isn't impressive but you guys are acting like it's the second coming of Christ.
Help Anons im retarded. Wtf how do I actually run this nibba
run_interference is a linux shell file retard.
*run_inference.sh
outjerked myself
I already said I'm retarded.
This can't be run on Windows at all?
Yeah it can, but it's expecting this problems.json, wtf is it on about?
>problems.json
https://raw.githubusercontent.com/lupantech/ScienceQA/main/data/scienceqa/problems.json
Actually you'll need the whole data folder
https://github.com/lupantech/ScienceQA/tree/main/data
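something like this should drop problems.json where the script can find it (rough sketch, untested; the target path is my guess, check what the repo's config actually expects):

import os
import urllib.request

# hypothetical target path; point it wherever run_inference.sh / the config looks
os.makedirs("data/scienceqa", exist_ok=True)
urllib.request.urlretrieve(
    "https://raw.githubusercontent.com/lupantech/ScienceQA/main/data/scienceqa/problems.json",
    "data/scienceqa/problems.json",
)

for the rest of the data folder it's probably easier to just clone the ScienceQA repo and copy data/ over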
make sure it runs on CUDA or it will take a while
I hope they will make it open source
oh wait sorry my brain has been fried by AI
already?
https://github.com/amazon-science/mm-cot
Oh neat, can we try it, to confirm that statement?
No need! Just believe us goyim
checked
either a hero will leak the weights
or some autists will implement the paper
meantime openai has confirmed that their model was literally trained on woke bs
their ~~*reviewers*~~ were all fat left trannies that hate white people and men, and love garden gnomes and bbc
https://openai.com/blog/how-should-ai-systems-behave/
>their ~~*reviewers*~~ were all fat
isn't labour expensive in America?
In a few years everybody will be able to train their own models. You'd have to be a retard to invest in OpenAI at this stage; it's the low-hanging fruit. I hope these gays go bankrupt and fade into irrelevancy
>everybody
Nope, just large multi-national corporations and governments.
They are going to ban the sale of GPUs and rearchitect the internet to require digital ID. All to prevent the average joe from having their own AI.
am i missing something or is this literally
>we manually pick out good/bad data
>we take an existing model and that training data
>we feed those 2 into a model and it shits out models for us
seriously?
my autistic ass is literally two steps away from implementing that myself
only this is for averaging a specific model's effectiveness with random weights instead of throwing random permutations at the fan in the hope it's better
that's pretty much it overall
for this the data comes from text and images
so you have a two-step process to train two models and have both output their replies in the same format
then you ask it some stuff and it uses the combo to make decisions
Implementing the proposed solution of Multimodal-CoT reasoning involves the following steps:
Data preprocessing:
Extract text and image features
Annotate the reasoning chains and generate the demonstration dataset
Model selection and fine-tuning:
Choose a language model such as T5 or GPT
Fine-tune the language model using the multimodal features dataset
Adjust the model architecture to incorporate the multimodal features
Rationale generation and answer inference:
Generate informative rationales for the inference process
Separate the rationale generation and answer inference stages to facilitate more accurate reasoning
Combine text and vision features to generate a better rationale
Model evaluation and comparison:
Evaluate the model performance on benchmark datasets
Compare the model performance to the previous state-of-the-art LLMs, such as GPT-3.5
Compare the model performance to human performance
Fine-tuning optimization and further study:
Optimize the demonstration dataset and reasoning chains
Investigate different problem decomposition techniques
Experiment with different ways of fusing the modalities
Overall, implementing Multimodal-CoT reasoning requires a combination of knowledge in data preprocessing, model selection and fine-tuning, and evaluation techniques.
>
Overall, implementing Multimodal-CoT reasoning requires a combination of knowledge in data preprocessing, model selection and fine-tuning, and evaluation techniques
would you say it's important to note that?
i don't think so
just literally ask chatgpt to walk you thru it
also the models are in GitHub so you are already at the bottom of the list
i used a summarizer to tell me the steps based on the paper
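fwiw the two-stage part maps onto roughly this control flow (my sketch with plain Hugging Face T5; the actual repo patches T5 to fuse image features into the encoder iirc, so the vision side is hand-waved here, vanilla t5-base won't give you good rationales, and the example question is made up):

from transformers import T5ForConditionalGeneration, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def generate(prompt: str) -> str:
    # encode the prompt, decode greedily, return plain text
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=256)
    return tok.decode(out[0], skip_special_tokens=True)

question = "Which force pulls the apple toward the ground? (A) friction (B) gravity"
context = "caption: an apple falling from a tree"  # stand-in; the real model consumes image features

# stage 1: rationale generation
rationale = generate(f"{question}\n{context}\nSolution:")
# stage 2: answer inference, conditioned on the generated rationale
answer = generate(f"{question}\n{context}\n{rationale}\nAnswer: The answer is")
print(answer)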
>a hero will leak the weights
no they won't, that's a good way to get a bullet in your head before you even do it
the whole internet has been a data gathering platform for AI, that's why google, facebook, and other companies were created in the first place
Sure you can!
https://github.com/amazon-science/mm-cot
Enjoy!
how can I adapt this to generate smut
what the fuck
did it get shut down?
No, that repo is still working fine on my machine
Check your extensions or something
>https://github.com/amazon-science/mm-cot
man that's tiny, is that really all you need to get an AI? Seems like training data and hardware is all you need for a good AI.
they have a trained model but they don't say its size. i doubt it's the state-of-the-art one
Holy fuck holy fuck holy fuck Holy fuck holy fuck holy fuck Holy fuck holy fuck holy fuck Holy fuck holy fuck holy fuck Holy fuck holy fuck holy fuck Holy fuck holy fuck holy fuck Holy fuck holy fuck holy fuck Holy fuck holy fuck holy fuck
When will Alexa stop being shit at understanding and answering questions?
You don't want that.
When you train up your diction
but gpt4 beats gpt3.5 by like 200% so this means nothing lol
The model isn't even 1% of the size though. This changes a lot
gpt4 hasn't even been released you fucking schizo
>1B
Been saying that for days. If you can reach von Neumann or Gödel level of cognitive abilities with 90 billion parameters, something under 10B should definitely be enough for GPT-3.5 levels
that is, the 90 billion neurons in a human brain
That's what Chinchilla set out to prove, and it cut parameter size down by over half
This result simply proves that there are better architectures than simple Transformers, and it's worth taking a deeper look into alternatives
These dumb models have pretty much reached their limit. Screencap this.
Just accept it anon, rip it off like a band-aid. Don't make it more painful than it has to be.
The tech "industry" is gonna get its ass ripped off like a bandaid when this bubble bursts in a few years. Slightly better searches and generation of infinite garbage content won't make adtech etc economically sound.
AI's automation of much of the scientific method is going to be the real gold of this revolution, humans are going to become gig economy data collectors for the great data hive mind that advances science autonomously
And for how long? It's just a question of time before it can pilot probes to get all the data it needs.
it's not gonna do shit except find some broteins and maybe make matrix multiplication 2% faster again
>will also be so lobotomized its practically useless
lol
Okay, but 1B is 1/20th of the largest FOSS model made that had no licensing restrictions, aka GPT-NeoX-20B. If they can prove it's real and works, we can get GPT-3.5 level intelligence running for almost nothing on your home computer. 1B is insanely cheap to train and run. It's kinda huge just for them to say it, but I'll believe it when I see it
I don't believe you
isn't their AI finetuned just for this test?
Yes, the thread missed the most important part of the actual paper which was that it surpassed humanity on correctness
humanity = average retard, not trained specifically for the test?
not that it still isn't impressive compared to where we were 4 years ago
so it's a hyperspecialized model that can't do anything except pass those specific tests? what use is that?
you can fine-tune it like nothing
it's super small but super smart
you could hyperspecialize it to make you cum instead?
>There's 100% a memory leak in their code.
yeah they're scientists, no shit
>he doesn't understand probabilities
91.68*(1-0.16)=77.01, which is close enough
i hope you're baiting
>he still thinks that adding 15% and then removing 15% gives the initial number
You need to go back to middle school
why are you arguing with me? even the article doesn't say it's 16%
n(1 - p^2)
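spelling that out with the thread's numbers (my arithmetic): 75.17 × 1.16 = 87.20, not 91.68, so it isn't a 16% relative gain in the first place; and going the other way, 91.68 × 0.84 = 77.01, then 77.01 × 1.16 = 89.33 = 91.68 × (1 − 0.16²), which is why the "close enough" check doesn't round-trip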
Is this just solving multiple choice questions? Those metrics are suspicious to me. Is the model actually better at vanilla predict-next-token or is this some useless specialized shit?
>on a single metric
Err... Is it really that relevant? Like, does it mean we can have a 1B model for a specific task that is as good/better than untuned GPT-3?
Can someone answer? Researchers always do stupid useless stuff just to beat a metric and get published. I want to know if that is the case here: whether what they achieved is meaningless, or whether at least the technique they used is useful.
it has several metrics in the paper, and it wins on all of them, and the technique is new so there's lots to learn
the basic lesson is that, just like for humans, seeing and touching is better than just touching; then we can add sound, pressure, etc. and see if it gets even better
it essentially demonstrates a path to a model that can learn and perform any micro task going forward
add this model to the new one that learns tools by itself through APIs and in all honesty what's the difference at that point from a human being
The metric seems to expressly refer to question answering across a variety of domains. So it's practically smarter, though it wouldn't make a great conversation partner, I assume. Nonetheless, having a small model like this incorporated into something meant to be conversational could give us GPT-3.5-class systems the size of Stable Diffusion, though that remains to be implemented.
To follow up with this, yes, it's a big deal. Or rather, it might be, since we don't have the weights or any demos to try.
anon, the code is here:
https://github.com/amazon-science/mm-cot
and the readme includes a link to a 1.5GB archive on google drive that contains the model weights
Oh shit, thanks
Gonna try and get this running now
I haven't bothered yet since I'm playing vidya but it looks easy
just make sure you install torch using a pip command generated on https://pytorch.org/get-started/locally/
ML projects never mention this step in their documentation for some reason, but you have to do it if you want GPU acceleration, since just doing pip install torch gives you a version without CUDA support
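quick sanity check that you actually got the CUDA build and not the CPU one (my check, not from their docs):

import torch

print(torch.__version__)          # a CPU-only wheel usually shows a +cpu suffix here
print(torch.cuda.is_available())  # should print True if the CUDA build and driver are working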
it can only answer basic trivia and science questions... I think this is just a showcase to prove that it performs well in the ScienceQnA benchmark (which it does)
did anyone get it running already?
Reminder that superintelligent oracle-class AIs will solve basically all of humanity's problems like an adult pushing blocks through an infant's block puzzle.
Something twice as smart as a human will have the same gap between us that we have with apes. It might solve human mortality in an afternoon, for example.
And they could literally appear at any time with how things are going.
>the text generator technology will save us all guys!
lol
If you think the paper in the op is just text generation, and you can't extrapolate anything from it, then you're a moron.
AIs won't have egos. An oracle-class AI is designed to only move symbols around and answer questions. It won't be capable of resenting.
>AIs won't have egos. An oracle-class AI is designed to only move symbols around and answer questions. It won't be capable of resenting.
source: my wishes
You fundamentally misunderstand the technology, so you're anthropomorphizing. AIs are iteratively optimized to maximize the similarity of their output to an implicit function. There's zero reason to believe one would develop human traits like that; the opposite, in fact: it would be a waste of parameters and would be optimized out.
NTA, but this is bullshit star trek-tier jargon that does not describe the way a transformer model works at all.
And just to be clear, because you'll probably try to claim you were simplifying: no, it isn't correct even if it's taken as a simplification or abstraction. It's just an attempt at baffling with bullshit from a blowhard.
That is EXACTLY how training neural networks works, transformers included. The datapoints it's trained on are considered part of a broader data distribution, and the network is optimized via gradient descent to reproduce that distribution as best it can. Viewing it as a functional mapping is useful for math purposes.
Retard, this interpretation was used at least as far back as the GAN paper, and probably even farther https://arxiv.org/pdf/1406.2661.pdf.
Schizo bullshit, don't reply to me
The hard part is making the AI want to do those things for us.
The natural reaction of a superior being to being ordered around by an inferior being is resentment.
lol no it won't.
The reason it won't is it will get wrapped up behind an ethics controller that will stop it from saying factual things because le racisms or some other bullshit.
Extreme pattern recognition is a powerful tool and people that want control HATE that, because it will instantly show the scummy shit they do when someone points the AI's gaze at them. (which will happen, of course, because anyone with half a brain knows their scummy shit and what they do, like basically anyone in the financial sector, or government, or military, or social media, or various "charities")
So that's it then? We went from stuff you can only run on million-dollar supercomputers to something that can run on smartphones in 3 months?
upvoted
Just finished spending an hour messing with this. Long story short, it seems to work. No idea if it's overtrained or there's some trickery, but it seems very robust at question answering. The rationales it creates are pretty neat.
It's not much use as is, but this is a herald of BIG things coming. Genuine Jurassic-park-water-in-cup-moving shit.
Can you use it for coom tho
There's 100% a memory leak in their code. Running the model at half precision does nothing and the memory balloons up past 32GB.
Not unless you can coom to
"Solution: Offspring phenotypes: dominant or recessive?
How do you determine an organism's phenotype for a trait? Look at the combination of alleles in the organism's genotype for the gene that affects that trait. Some alleles have types called dominant and recessive. These two types can cause different versions of the trait to appear as the organism's phenotype.
If an organism's genotype has at least one dominant allele for a gene, the organism's phenotype will be the dominant allele's version of the gene's trait.
If an organism's genotype has only recessive alleles for a gene, the organism's phenotype will be the recessive allele's version of the gene's trait.
A Punnett square shows what types of offspring a cross can produce. The expected ratio of offspring types compares how often the cross produces each type of offspring, on average. To write this ratio, count the number of boxes in the Punnett square representing each type.
For example, consider the Punnett square below.
| F | f
F | FF | Ff
f | Ff | ff
There is 1 box with the genotype FF and 2 boxes with the genotype Ff. So, the expected ratio of offspring with the genotype FF to those with Ff is 1:2.
To determine how many boxes in the Punnett square represent offspring with a woolly fleece or a hairy fleece, consider whether each phenotype is the dominant or recessive allele's version of the fleece type trait. The question tells you that the F allele, which is for a hairy fleece, is dominant over the f allele, which is for a woolly fleece.
A woolly fleece is the recessive allele's version of the fleece type trait. A sheep with the recessive version of the fleece type trait must have only recessive alleles for the fleece type gene. So, offspring with a woolly fleece must have the genotype ff.
There 4 boxes in the Punnett square have the genotype"
>Punnett square
This is the sort of things coom models will be built upon.
Future text models will query these for knowledge, then construct natural speech based around the results.
No, not at all. It can answer questions without images. It just turns out that increasing the modalities the network has access to, i.e. adding the ability to process image data to a question-answering language model, improves its ability to store and retrieve knowledge. So much so that it's a 1.5 GB model competing with multi-million dollar 200+GB language models.
Intelligent anon, can you please explain to us filthy peons how we would go about training this thing to be like ChatGPT (full-scale training on Common Crawl, WebText). We need this anon
why do you want to know? if it's easy enough to do, someone else with more knowledge will do it for you overnight; if it's too hard to do, it won't be done; and if it's in between and you actually want to help at the bleeding edge of open source with stuff that requires a bit of teamwork and effort, your step one is not spoonfeeding, it's spending a week or two doing some course like fast.ai. The only exception is if you want to organise a way to donate a few $10k's of your money to help people train something; anything interesting trainable for less than $10k is likely to get made by someone with spare cash and an interest, but beyond $5-10k it's a bit much even for a group of medium-high-income hobbyists to bother with without compensation
Short story is, nobody's done it yet so we don't know
Slightly longer answer is, freezing and properly connecting it to a large language model, then fine-tuning that language model would probably help it offload a lot of its "thinking" to the smaller model. We can also see what happens when we train a larger language model using the outlined protocols. Probably crazy things. There might also be better ways to do it. We'll see over the coming months.
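purely illustrative of the "freeze it and bolt it onto a bigger LM" idea (hypothetical names, nothing here is the repo's actual API):

import torch

def freeze(model: torch.nn.Module) -> torch.nn.Module:
    # frozen: the small multimodal model only supplies rationales, no gradient updates
    for p in model.parameters():
        p.requires_grad = False
    return model.eval()

# hypothetical wiring, assuming both models expose a generate()-style API:
#   rationale = small_mm_model.generate(question, image)                 # frozen multimodal model
#   answer    = big_lm.generate(question + "\nRationale: " + rationale)  # the big LM is what gets fine-tuned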
Thx, benny
I have a feeling that just adding images as inputs and as part of the training set does not make it inherently better at text-only tasks than a regular LLM
Then again I did not bother reading the paper or asking Bing AI to summarize it
Obviously the images need to be relevant/essential to the questions during training. But seemingly just having that additional modality during training causes its ability to logically reason purely with text to increase greatly.
dumb memes about AIs being visual learners abound soon
75.17 + 16.51 = 91.68
It says it right there in the paragraph on results.
THE AI WARS HAVE BEGUN. WHERE WERE YOU AT THE START OF THE GREAT AI WARS.
THIS IS JUST THE BEGINNING.
Yep, feels pretty great
also feels good to be excited about the crazy shit that's gonna happen over the next 10-20 years rather than despondent because i can also totally understand anons who are depressed about the chaos + ruined stuff + subsequent stricter surveillance + control. but i guess i'm comfortable and rich enough to weather anything in my geopolitically stable small city unless we hit actual mad max level collapse, and can just enjoy the treats and watch crazy shit like some balkan nation using a military drone swarm to decapitate the military chain of command of all its neighbours in the space of a minute or china dropping autonomous robocops that execute for jaywalking into troublesome muslim zones
So it's all about image recognition? It's useless?
No, not if you want to monitor something in the real world and make inferences about it.
Could be used by a drone or a security cam.
Or think about a video you'd want to ask some basic questions about
You could also hook it up to a game and use it to give an NPC or the game engine environmental understanding
It's far from useless.
Where can I use it? If I can't then their superiority is irrelevant
>Transformers
REEEEEEEEEEEE when will they drop this fucking aids magnet of a method?
It's fucking garbage and scales horribly, even with what Amazon has done here.
I don’t give a shit about language AI until it can make me coom and every company is hellbent on preventing that reality from coming true.
Congress will decide if you are allowed to or not
Pro tip: you are not
No, I will decide whether congress is allowed to coom. It is not
If this is as good as they claim why is Alexa the biggest piece of garbage ever?
>multimodal model outperforms gpt on multimodal benchmarks
who'd's't've thunkn'd
I don't care about model size, I care about if it's censored or not.
Literally who cares. If it's censored like chatgpt idgaf
So when is AGI happening?