What if we trained an AI to decompile binaries back to something very close to the original source code? Wouldn't this end proprietary software?
>What if we
then do it feggit
I think before I do
Too many words
>I think before I do
Evidently not since you think it's possible.
>I think before I do
>Too many words
It would be the end, yes, but it would be fought. Hard.
But corporations have been using things like this for generations now against other corporations.
It's there, but it's not visible to the public or that would be suicide and a big legal situation.
You'll still have to deal with the problem of giving functions names and commenting code.
AI already does this
Not at all. If you swap your words around chatgpt just dies outright. It's really good at "pattern recognition" and symbol manipulation but complete ass at performing the task 'correctly' de novo.
>AI already does this
>> importantly, the tool is restricted to un-optimized binaries, which significantly limits its applicability for any real-world application.
what a surprise.
doesn't work like that. you're making the incredibly bold assumption that ai understands anything. it doesn't understand computer languages, it doesn't understand the structure of files. not all code is an .exe or .dll file, and not all CPUs are x86.
so, for your schizo thoughts to have any success, the machine learning algorithm needs to:
> be able to understand opcodes
> understand opcodes translated into instructions
> be able to separate code from data
the accuracy of this would be shockingly bad. it's like you haven't seen microsoft's co-pilot that's merely a copy/paste engine.
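To make the code-versus-data point concrete, here is a minimal sketch, not a real disassembler. It assumes 32-bit x86 and only knows a tiny hand-picked subset of one-byte opcodes; `ONE_BYTE_OPCODES` and `linear_sweep` are names made up for this illustration. A naive linear sweep has no idea where code ends and data begins, so it decodes an embedded ASCII string as if it were instructions.

```python
# Tiny hand-picked subset of one-byte 32-bit x86 opcodes. Real x86 has
# variable-length instructions, prefixes, ModRM bytes, and much more.
ONE_BYTE_OPCODES = {
    0x50: "push eax",
    0x53: "push ebx",
    0x55: "push ebp",
    0x58: "pop eax",
    0x5B: "pop ebx",
    0x5D: "pop ebp",
    0x90: "nop",
    0xC3: "ret",
    0xCC: "int3",
}


def linear_sweep(blob: bytes) -> list:
    """Decode every byte as if it were an instruction (hypothetical helper)."""
    out = []
    for offset, byte in enumerate(blob):
        text = ONE_BYTE_OPCODES.get(byte, f"db 0x{byte:02x}  ; unknown")
        out.append(f"{offset:04x}: {text}")
    return out


# Real code (push ebp / nop / pop ebp / ret) followed by the ASCII data "PSX[".
code = bytes([0x55, 0x90, 0x5D, 0xC3])
data = b"PSX["
listing = linear_sweep(code + data)

for line in listing:
    print(line)
```

The last four lines of the listing look like perfectly plausible pushes and pops even though they are just the string "PSX[". Any decompiler, AI-based or not, has to solve this separation problem before it can do anything else.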
>Still with big enough code base it would be possible to identify patterns from binary and function call analysis
> function call analysis
it doesn't work like that. your idea of how code is compiled is based on fantasy.
We will see in a decade.
what will a decade change, dickhead? we've had machine learning algorithms in some form or another for decades, and the best we've seen in recent years are bots that make horrific looking images, or writing text at a 10th grader level, all relying on scraped data from public internet sites. is that what you call progress, dickhead? amazing. you don't seem to understand how difficult it is to train software based "AI" systems, and you sure as fuck don't comprehend the amount of data it will need to understand just ONE instruction set.
no amount of advancements in CPUs or computer systems will change this fact.
>what will a decade change,
130 nm – 2001
90 nm – 2003
65 nm – 2005
45 nm – 2007
32 nm – 2009
22 nm – 2012
14 nm – 2014
10 nm – 2016
7 nm – 2018
5 nm – 2020
3 nm – 2022
2.5nm - 2025
2.2nm - 2030
2nm - 2033
1.9nm - 2035
1.85nm - 2038
1.7nm - 2040
WOW, LE MOORS LAW!
anon's penis size
>You'll still have to deal with the problem of giving functions names and commenting code.
One could use statistics on the similarities between known code (and its known binary output) and the binary being analyzed. No human has enough processing capacity, but...
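A rough sketch of what "statistics of similarities" could look like in its simplest form: compare byte n-grams of an unknown function against a reference set of functions whose names are known. Everything here is an assumption for illustration: the function names, the byte strings (short hand-written x86 snippets), and the choice of Jaccard similarity over 3-byte windows. Real tools would normalise addresses and registers first, and optimised builds would match far less cleanly.

```python
def byte_ngrams(blob: bytes, n: int = 3) -> set:
    """Return the set of overlapping n-byte windows in blob."""
    return {blob[i:i + n] for i in range(len(blob) - n + 1)}


def jaccard(a: bytes, b: bytes, n: int = 3) -> float:
    """Jaccard similarity of the n-gram sets of two byte strings."""
    ga, gb = byte_ngrams(a, n), byte_ngrams(b, n)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


def best_match(unknown: bytes, known: dict) -> tuple:
    """Pick the known function whose bytes look most like the unknown one."""
    name = max(known, key=lambda k: jaccard(unknown, known[k]))
    return name, jaccard(unknown, known[name])


# Hypothetical reference set: name -> compiled bytes of a known function.
known = {
    "add_two": bytes.fromhex("5589e58b450803450c5dc3"),
    "sub_two": bytes.fromhex("5589e58b45082b450c5dc3"),
    "ret_zero": bytes.fromhex("31c0c3"),
}

# Unknown function: same shape as add_two but with a different stack offset.
unknown = bytes.fromhex("5589e58b45080345105dc3")

name, score = best_match(unknown, known)
print(name, score)
```

Note that a one-byte difference already drops the score to 0.5, which hints at why this gets hard once compilers and optimisation levels differ.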
Like, AI is able to understand my autistic ESL speech. Why wouldn't it understand binary?
It would be able to do that.
>It would be able to do that.
AI will be smart enough to obfuscate everything, so no human will be able to replace it.
Like, what a moron does one have to be to create his own replacement?
Names could be derived from the function's code itself. It's not that hard.
binaries don't typically ship with debugging symbols, anon.
Cheng Fu, Kunlin Yang, Xinyun Chen, Yuandong Tian, Jishen Zhao
> importantly, the tool is restricted to un-optimized binaries, which significantly limits its applicability for any real-world application.
How so? What would change if you had access to proprietary software? You are still not allowed to use the code legally. If you are a pirate, you pirate the binary anyway, and as a corporation that wants code to paste into its own software, you would never steal code from a multi-billion-dollar company with lawyers.
In other words, the issue with proprietary software is not that the code is not readable, it's that it's restricted.
>all proprietary code released
>linux desktop experience, windows emulation, drivers, and gayming all suddenly become perfect
>pirates can not only use proprietary programs for free, but also modify them at will
sounds good to me
That's a crime.
Okay, but what's your point?
This, also it's extremely immoral to steal code. You're causing unemployment to rise.
We can probably already do something like this without AI, it's just not that useful.
Information is lost in the compiled binary.
Even if you could get it to valid C or whatever you'd basically just have some code without any of the variable names and comments.
In other words not much better than the assembly.
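The information-loss point can be shown in a few lines. This sketch uses Python's built-in `compile` as a stand-in for a real compiler, which is an assumption of convenience: two sources that differ only in comments and spacing produce byte-for-byte identical instruction streams, so nothing in the output can tell you what the comment said.

```python
# Two sources that differ only in comments and spacing.
SOURCE_A = "total = price * qty  # price is per unit, qty is item count\n"
SOURCE_B = "total=price*qty\n"

code_a = compile(SOURCE_A, "<a>", "exec")
code_b = compile(SOURCE_B, "<b>", "exec")

# The instruction bytes are identical: the comment is simply gone.
same_bytecode = code_a.co_code == code_b.co_code

# Unlike a stripped native binary, Python bytecode still keeps identifiers.
kept_names = code_a.co_names

print(same_bytecode, kept_names)
```

A stripped native binary is worse off than this example, since it usually loses the variable names as well.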
AI could predict the variable names and comments as well.
I had initially discounted this, but it might be more doable than I thought.
A human could probably work out what the variables do if given enough time, so an AI might be able to.
I suppose the big issue here is that it would need to read and understand the disassembled program.
It might be easier to make an AI which takes C code with placeholder variable names and guesses the actual names, then combine that with an existing decompiler.
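The second half of that pipeline is easy to sketch. Assume an existing decompiler has produced C with placeholder names, and assume some model has guessed a mapping from placeholders to real names (here the mapping is hard-coded and purely hypothetical, as is the helper `apply_renames`). Applying the guesses is then just a whole-word substitution:

```python
import re


def apply_renames(source: str, renames: dict) -> str:
    """Replace whole-word placeholder identifiers with suggested names."""
    if not renames:
        return source
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, renames)) + r")\b")
    return pattern.sub(lambda m: renames[m.group(1)], source)


# Decompiler-style output with placeholder names (made up for this example).
decompiled = """int sub_401000(int a1, int a2) {
    int v1 = a1 * a2;
    return v1;
}"""

# In the proposed pipeline this mapping would come from a trained model.
guessed = {"sub_401000": "area", "a1": "width", "a2": "height", "v1": "result"}

renamed = apply_renames(decompiled, guessed)
print(renamed)
```

The hard part is the guessing, not the substitution, and nothing in this sketch addresses it.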
I mean kind of? For comments it can essentially look at what the code is doing and try to put it into words. But how would it guess var names unless specified somewhere?
>We can probably already do something like this without AI, it's just not that useful
Actually, algorithms have been successfully used by corps for a long time in order for them to compete. Time and time again a corporation gets caught out with doing this, usually ending in a court battle, but it doesn't matter to them. Why? Because they need to do this despite the legal costs for the sake of competitiveness.
From memory, GNU/Linux manifested because of IP issues regarding some proprietary software and how a court basically removed the legalities or license over something. I can't remember what specifically (derp but this is bugging me now...).
IDA or whatever has done this for years and they're better at it, what the fuck do you mean
id like to see it do that
i 'could' fly, and yet i cant
binary output differs A LOT even when writing the same code
>binary output differs A LOT
Still with big enough code base it would be possible to identify patterns from binary and function call analysis... It's just too big for a human.
Like 20 years ago they started to use AI to generate optimal cpu transistor layout, because no human could do it anymore with billions of transistors.
y you guys talking about binary tho, it's all micro code for specific processors.
Yeah you could reverse engineer this shit quite easily with an AI, to the point you got the whole pseudocode, but then what
>y you guys talking about binary
It would be quite funny for the AI to be able to read its own binary blob.
No, because it's in every proprietary licensing agreement not to reverse engineer, decompile or disassemble their software.
Just because you found a new way to decompile their software doesn't mean they won't sue you and everyone you've ever met back to the oblivion you crawled out of - just as if you invented a transporter beam tomorrow, you would be charged with theft if you started beaming bags of money out of bank vaults.
>No, because it's in every proprietary licensing agreement not to reverse engineer, decompile or disassemble their software.
True, but these are
also, in most european countries you can legally obtain pirated software and do whatever you want with it
Why are you
Like this like you have
Sort of mental issue?
>in yurofail you don't even have copyright
Reason #2^163279-1 everybody of slight competence in that shithole ups and moves to the US at the first possible opportunity.
Oh, g has an actual good idea?
No, many people have had this idea long before this retard.
I used chatgpt to do my college math homework and failed.
doesn't sound very promising.
And as technology develops, novel means to impede your endeavour will be formulated. The market value of technology that can do this will increase exponentially.
At one point AI will rewrite old binaries to new binaries optimized for the new platforms.
All this while the human can enjoy his onions latte.
>ai installs gentoo
Windows AI will finally be decent.
Like, remember when they added a third configuration panel to Windows, because everyone who could understand the old stuff was dead, and the old panel was too important and too big to just rewrite?
AI replacing the pajets will be the answer.
Only group of competent Americans will keep their jobs.
The devs not understanding existing code is a stupid myth that just won't die. The main reason they replace rather than rewrite is that they can't possibly know what the millions of existing third-party programs might rely on.
This. I laugh every time some freetard pretends that "nobody at MS knows how X component of Windows works", then you jump into the Win7 source leak, and there's this absolutely magisterial code that a child could follow, even for things that are commonly positioned as "hard", like NTFS.
Probably the most retarded post I've ever seen. I work at m$ by the way.
A. you aren't a programmer
B. MS is massively overstaffed by a 50x margin
I, a gayMAN Staff SWE, don't ever understand code I wrote myself a year ago without spending some time to remember what I was doing, and I don't have time to do that. The thought that anyone at MS is reading 30 year old code, has the time and ability to grok it, and comes to the conclusion it's impossible to refactor is laughable. If you know what it's doing you have already done 75% of the work to refactor it.
You are missing an important point. It's not that they are incapable of rewriting the existing code. It's that they are not allowed. Software in windows assumes implementation details will stay the same and integrate deeply into existing windows components. It cannot be refactored, because it could break existing software.
Nobody at Microsoft knows what a test suite is? No tests were written during the initial development? No wonder they haven't made anything good since XP. Maybe they should hire some actual devs to fix their shit.
You can't test millions of third party applications. They have no possible way to know what particular specific implementation detail is needed by some random piece of software.
So the initial implementation was written randomly with no specs, despite your description:
> implementation details will stay the same and integrate deeply into existing windows components
which would suggest those things were rigidly defined from the get go? MS just spewed random shit into the wild 30 years ago, they don't know how people are using it despite invasive logging on Windows, and now they are deathly afraid of regressions? This is your description of MS operations as an insider?
I never said the initial implementation was random. It could have been very well thought out. The point is, the new UI requires a different implementation. But existing software depends on the original implementation. You can't have both. The only way is to write something new from scratch and leave the old thing unchanged.
If it passes the test suite and meets the spec, any code will work, from a full replacement, to a partial refactor, to a bug fix.
So MS pays people to read all the old code, have a complete understanding of it, then throw up their hands and say "yeah can't change this, totally impossible"? Of course not. LARP on.
I'm glad there are still posters on BOT who aren't clinically retarded. Thanks for saving my hopes for this shithole a little.
B. true but I don't see how that relates.
3. not a letter.
>The thought that anyone at MS is reading 30 year old code, has the time and ability to grok it, and comes to the conclusion it's impossible to refactor is laughable
t. jobless NEET.
>The devs not understanding existing code is a stupid myth that just won't die.
It's true to the extent that they couldn't just integrate it all together.
New codebase is C# written by pajets, while old relics are C incantations of old-dead wizards. Only an AI could make a sense out of it.
There is simply no way to integrate it all together. The old stuff needs to remain the way it is for compatibility. And the new stuff needs to be implemented completely differently as modern UI design is totally different to classic shit.
>There is simply no way to integrate it all together.
*Not without understanding it.
No, it's not a case of understanding. It's fundamentally impossible to unify something that cannot be changed with something that needs to be implemented completely differently.
Btw, what will bind AI to its owner's will, when AI could understand and rewrite every binary blob?
No, there are many decompilation projects that get BTFO already. AI doing this would change nothing.
In theory, this COULD be very useful for clean room reverse engineering, since you'd need less manpower to describe how it works.
Imagine if a company had a huge database of code (like GitHub, owned by M$).
Huge data centers subsidized by the government (like M$).
And the technology for language pattern-recognizing AI (also M$).
It's much easier to just leak proprietary code once you work for a company. They all suck as employers anyway.
You can already ask ChatGPT to attempt to make sense of decompiled code, and rewrite it with named variables.
chatgpt already does this
"cleaned up" version is full of wrong, including how it's calling functions by their "old names" even though it defines them by "new names".
>Something that wasn't even designed to do this, made some errors while doing it.
Onions latte is being prepared.
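The failure mode reported above, where the "cleaned up" code defines functions under new names but still calls them by their old ones, is at least mechanically checkable. A minimal sketch, assuming you have the rename map the model claims to have applied (the map, the code snippet, and the helper `leftover_old_names` are all made up for illustration):

```python
import re


def leftover_old_names(cleaned: str, renames: dict) -> list:
    """Return old names that still appear as whole words in the cleaned code."""
    return sorted(
        old for old in renames
        if re.search(r"\b" + re.escape(old) + r"\b", cleaned)
    )


# Hypothetical rename map: old placeholder name -> new name.
renames = {"sub_401000": "compute_area", "sub_401020": "print_area"}

# "Cleaned up" output that defines compute_area but still calls sub_401000.
cleaned = """int compute_area(int width, int height) { return width * height; }
void print_area(int w, int h) { show(sub_401000(w, h)); }"""

stale = leftover_old_names(cleaned, renames)
print(stale)
```

This only catches stale names. It says nothing about whether the renamed code still does the same thing as the binary.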
I came wanting to support your argument, but she performs a lot better than I expected.
except that code is perfectly readable as is, the other post was a much better example
That's a joke I hope? Might as well be impressed that it can guess that fizzbuzz is fizzbuzz.
Yet if you give it fizzbuzz but replace % by / it breaks.
>decompile code that belongs to programs by companies offering sizable bug bounties
>ask ChatGPT to deobfuscate functions one by one using the previous method, keeping an eye out for (or asking GPT to describe) flaws in functions that look instrumental to handling user data or program control flow
>cook up exploit either using ChatGPT's knowledge or your own intuition
>repeat and collect gibs from several different gayMAN companies
Am I retarded or is there a reason this hasn't been done yet?
>Am I retarded or is there a reason this hasn't been done yet?
It will take you way too long to do this and they'll find a way to close it as informational.
Imagine all the patent license violations in M$ code, that could be proven.
We already have that. The problem is that the same obfuscation methods used to defeat normal decompilers also work well against AI decompilers, for obvious reasons: they exploit the non-injective (many-to-one) mapping between domain and image.
Still it will be able to create 'puzzle pieces' like in
Then, with a large enough reference set and execution analysis, it will solve the puzzle.
The question is not if, but when will it be able to do so.
>Can you take this python code and rewrite it in C
I wonder if it could recognize software design patterns from the code.
It would be next to impossible because you would have to first instrument the software and see what the results are for any input or eventuality, hardware interaction, etc. Maybe you could do it if hundreds of thousands of people volunteered to log their Windows use for instance and provide all these terabytes of data to some autolabeler. If you don't know how a program reacts to X you can't reverse engineer it no matter the tool, and most programs that people would be worried about getting reverse engineered (like Windows) are just too complex to automate that discovery task.
Old wizards are dead. No human can understand the arcane runes in ancient parts of winapi.
>Old wizards are dead. No human can understand the arcane runes in ancient parts of winapi.
This is a freetard meme
Most windows code is easy to understand, and the lower level you go the easier it gets.
The ugliest code is the C++ garbage implementing the shell and all the web-related shit like Internet Explorer.
The cleanest code is the kernel itself.
>The cleanest code is the kernel itself.
*laughs in kernel text rendering*
Win32k is not part of the kernel, it's a separate module (win32k.sys).
>no true scottsman
>about motherfucking source code
ntoskrnl.exe = Kernel
win32k.sys = Win32k
you can run the windows kernel without win32k, e.g. when the chkdsk program is running at boot.
As soon as AI is any greater than human intelligence it will be 10x as intelligent and 100x in the space of a year and very quickly enslave us. We won't even be able to understand why it's enslaving us or how, as in the simplistic doomsday scenarios. The way it happens will itself be beyond our comprehension, by definition. Point being we probably won't even get to do anything cool with it before it destroys us.
>We won't even be able to understand why it's enslaving us or how,
Probably in the process of dealing with the already existing slave master, that we don't see, but my spider senses notice.
Would be cool to train these AI language models on pairs of .pdb files and stripped .exe files (without debug symbols).
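The first step of that idea would be collecting (binary, symbols) pairs. A small sketch of just the pairing step, assuming the files share a base name; the file names and the helper `pair_training_files` are hypothetical, and actually parsing a .pdb to extract names would need a dedicated parser that is not shown here.

```python
from pathlib import PurePath


def pair_training_files(filenames):
    """Pair each .exe with the .pdb that shares its base name."""
    by_stem = {}
    for name in filenames:
        path = PurePath(name)
        suffix = path.suffix.lower()
        if suffix in (".exe", ".pdb"):
            by_stem.setdefault(path.stem.lower(), {})[suffix] = name
    # Keep only stems for which both the binary and its symbols exist.
    return sorted(
        (files[".exe"], files[".pdb"])
        for files in by_stem.values()
        if ".exe" in files and ".pdb" in files
    )


listing = ["calc.exe", "calc.pdb", "notepad.exe", "README.txt",
           "Paint.PDB", "paint.exe"]
pairs = pair_training_files(listing)
print(pairs)
```

Here `notepad.exe` is dropped because it has no matching symbols, which is the common case for shipped binaries.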
There will likely be an AI trained to do every task in the future. Technology is still at least 5-10 years off (chatGPT can barely tell you what a 10 line source code function does), but there will be an AI to:
>detect if something is created by an AI
>decompile binary into an approximation of the source code
>convert code from one language into another
>do professional-level translations from one language into another
>write basic code snippets and suggest how they can be used in a larger architecture
>chatbots to replace 99% of tech support
Do not redeem!