>: two things that look the same when compressed by an encoding algorithm must also look the same when uncompressed.
This is completely wrong. Using huffman encoding on the two strings "aaaaaaaa" and "mmmmmmmm" Would yeild the exact same encoded number 0b00000000. The distance is 0 despite the distance between the original strings being very big. You know nothing of what you're talking about.
>https://aclanthology.org/2023.findings-acl.426.pdf
So they're comparing how many new bytes are needed for the query string given an example string.
It's sort of a hanning distance of the embeddings but faster to calculate?.
2 months ago
Anonymous
You could probably do a search to find a continuation given a set of example strings and a prompt string. Then this could be used for inference.
Hashing algorithms are perturbative, a small change (a single bit flipped) in the input will result in a big change in the output. Calculating distance between hash sums is completely pointless because it has almost no relation to the distance between hashed content.
is text classification something like being able to feed it a stack overflow post and it recognizing that the post is discussing something to do with programming?
is gzip sentient?
yes
my sides have left orbit
What does this all mean for us tech nerds who aren't programmers?
>tech nerds who aren't programmers?
That sounds extremely gay. You should go outside.
It means sexbots are coming. Buy the lube now.
He won.
I've always honestly suspected a greater risk of rampancy from tar.
Just you wait. In 500 years, gzipping and feathering will be known as barbarian method of punishment of the past.
Note that this is classification only, no generation.
Why can't gzip be reversed to generate, provided we put in a sampling algorithm?
Quick rundown? How does a compression software kill an AI company?
14 lines of python outperforms 350 million parameter models.
It's only a matter of time until a better compression algorithm allows the script to outperform the billion parameter llamas
>llamas are just a noisy dictionary hack
Sometimes I wish I had a programmer brain so I could know exactly what the heck I'm looking at.
its a very simple idea: two things that look the same when compressed by an encoding algorithm must also look the same when uncompressed.
this algorithm takes two things x1 x2, encodes them using gzip , calculates a distance , then use that distance to compare x1 to anything else
>: two things that look the same when compressed by an encoding algorithm must also look the same when uncompressed.
This is completely wrong. Using huffman encoding on the two strings "aaaaaaaa" and "mmmmmmmm" Would yeild the exact same encoded number 0b00000000. The distance is 0 despite the distance between the original strings being very big. You know nothing of what you're talking about.
so how does it work then
https://aclanthology.org/2023.findings-acl.426.pdf
>https://aclanthology.org/2023.findings-acl.426.pdf
So they're comparing how many new bytes are needed for the query string given an example string.
It's sort of a hanning distance of the embeddings but faster to calculate?.
You could probably do a search to find a continuation given a set of example strings and a prompt string. Then this could be used for inference.
https://en.wikipedia.org/wiki/Normalized_compression_distance
hmm.. I wonder if lossy compression would work. Doubt a hash like sha256 would work
Hashing algorithms are perturbative, a small change (a single bit flipped) in the input will result in a big change in the output. Calculating distance between hash sums is completely pointless because it has almost no relation to the distance between hashed content.
is text classification something like being able to feed it a stack overflow post and it recognizing that the post is discussing something to do with programming?
How do I use this breaking edge news to make money though?
Install AI on your computer and tell it to invent immortality, allow donations on your website. BAM, set for life and praised for saving humanity.
Scrape twitter and generate e-trade API calls via sentiment analysis from the example python code.