mmmmmm
No Solidity contracts deployed without their ABIs
No more need to verify and publish source code
No more mystery contracts with large volumes being mysterious
I might be wrong but I thought decompilation was pretty straightforward/mechanistic.
You won't get the exact code that produced the compiled output, but it will compile to give the same end result.
People have built tools that try to obfuscate what a program does (to basically do everything in a roundabout way) so I imagine that is where an AI might be useful to understand the gist of a program.
Ultimately, I think the problem boils down to the halting problem so it won't work across the board.
Basically, some step will involve comparing the given illegible program to a cleaned-up program (given by the AI) to see if they are functionally the same.
If such a way to compare programs existed, I think there is a way to show it can solve the halting problem.
The best you could hope for is a list of all valid transformations/reductions that leave a program unchanged in functionality and then have the ai work with those to "intelligently" beat a program into something a human can read.
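A toy version of the "list of valid transformations" idea, as a sketch only. It uses Python's `ast` module (Python 3.9+ for `ast.unparse`) and applies two rewrite rules bottom-up. The names `Simplify`, `simplify`, `MESSY` and `CLEAN` are made up for illustration, and Rule 2 assumes every variable holds an int. A real deobfuscator would need a far larger, carefully justified rule set.

```python
import ast
import operator

# Integer operators we are willing to fold at rewrite time.
_FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def _is_int_const(node: ast.AST) -> bool:
    # bool is a subclass of int, so compare the exact type.
    return isinstance(node, ast.Constant) and type(node.value) is int


class Simplify(ast.NodeTransformer):
    """Applies behaviour-preserving rewrites to arithmetic expressions."""

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)  # rewrite the operands first (bottom-up)
        fold = _FOLDABLE.get(type(node.op))
        # Rule 1: constant folding, e.g. 2 + 3 -> 5.
        if fold and _is_int_const(node.left) and _is_int_const(node.right):
            return ast.Constant(value=fold(node.left.value, node.right.value))
        # Rule 2: x + 0 -> x. ASSUMPTION: x is an int; not valid for arbitrary objects.
        if (isinstance(node.op, ast.Add) and _is_int_const(node.right)
                and node.right.value == 0):
            return node.left
        return node


def simplify(source: str) -> str:
    """Parse, rewrite, and print the program back as source text."""
    return ast.unparse(Simplify().visit(ast.parse(source)))


MESSY = "y = (x + 0) * (2 + 3) - (10 - 4 * 2)"
CLEAN = simplify(MESSY)  # "y = x * 5 - 2"
```

Each rule is safe on its own, so any sequence of them is safe too. The hard part an AI would have to handle is choosing which rules to apply and in what order.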
Look up indistinguishability obfuscation
https://en.wikipedia.org/wiki/Indistinguishability_obfuscation
There will still be ways to hand you a program that you have no way of understanding what it is doing.
>If such a way to compare programs existed, I think there is a way to show it can solve the halting problem.
Halting problem is proven to be unsolvable.
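For anyone wondering how a general "are these two programs the same" checker would give you a halting decider, here is a rough sketch of the reduction. Programs are modelled as Python generators that yield once per step, which is an assumption made purely for illustration. `bounded_equivalent` is NOT a real oracle: it only compares the first `limit` inputs, so the code can run. The argument is that a true, unbounded version of it cannot exist.

```python
from typing import Callable, Iterator

Program = Callable[[], Iterator[None]]


def make_probe(program: Program) -> Callable[[int], int]:
    """Build A(n): 1 if `program` finishes within n steps, else 0."""
    def probe(n: int) -> int:
        run = program()
        for _ in range(n):
            try:
                next(run)  # advance the program by one step
            except StopIteration:
                return 1   # it halted within the budget
        return 0
    return probe


def always_zero(n: int) -> int:
    """B(n): the constant-zero function."""
    return 0


def halts_given_oracle(program: Program, equivalent) -> bool:
    # `program` never halts  <=>  its probe equals always_zero on every n.
    # So a perfect `equivalent` would decide halting.
    return not equivalent(make_probe(program), always_zero)


def bounded_equivalent(f, g, limit: int = 200) -> bool:
    """Stand-in for the impossible oracle: only checks n < limit."""
    return all(f(n) == g(n) for n in range(limit))


def stops_after_three():
    for _ in range(3):
        yield


def loops_forever():
    while True:
        yield
```

With the bounded stand-in, `halts_given_oracle(stops_after_three, bounded_equivalent)` is `True` and the same call on `loops_forever` is `False`. The second answer is only a guess, because a program that halts after more than `limit` steps would fool it.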
>Halting problem is proven to be unsolvable
No shit.
>I think the problem boils down to the halting problem so it won't work across the board.
Clearly I presumed it to be unsolvable and then concluded
>so it won't work across the board
And suggested
>The best you could hope for is a list of all valid transformations/reductions that leave a program unchanged in functionality and then have the ai work with those to "intelligently" beat a program into something a human can read.
The low level machine code technically gives the structure too (in a very convoluted form).
Maybe look into obfuscation to understand what the goal of decompilation is.
https://en.wikipedia.org/wiki/Obfuscation_(software)
One example: the x86 mov instruction is Turing complete, so there is a way to compile code so that it uses nothing but mov instructions.
Unless you have designed a decompiler that understands the mov Turing machine and how to translate its operations back into the original operations, it will just return a bunch of mov instructions.
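To give a feel for why mov-only code is so hard to read, here is a small simulation of how a comparison and a branch can be replaced by plain stores and loads. A Python list stands in for memory, and the function names are hypothetical. It assumes both inputs are valid addresses (0 <= x, y < size), and it is an illustration of the idea, not output from any real compiler.

```python
def mov_equal(x: int, y: int, size: int = 256) -> int:
    """Return 1 if x == y, else 0, using only stores and loads."""
    mem = [0] * size  # scratch memory; x and y are used as addresses
    mem[x] = 0        # mov [x], 0
    mem[y] = 1        # mov [y], 1  (overwrites the 0 only when y == x)
    return mem[x]     # mov result, [x]


def mov_select(flag: int, if_zero, if_one):
    """Pick one of two values with an indexed load instead of a branch."""
    table = [if_zero, if_one]  # two adjacent memory cells
    return table[flag]         # mov result, [table + flag]


def describe(x: int, y: int) -> str:
    # An "if x == y" with no compare and no jump anywhere.
    return mov_select(mov_equal(x, y), "different", "same")
```

A decompiler that does not know this idiom sees three unrelated memory accesses where the original source had a single `==`.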
The naming is the least important issue. Variables and functions are defined by how they are used.
The crux of the issue is that you have a Turing machine and you want a way to say what it does in a more coherent way than just outputting the initial state and all of the transition rules.
> ramifications
Only people who rely on security through obscurity will suffer.
There is only open source from now on.
I don't see how AI could perform any better than a typical decompiler. A decompiler doesn't estimate things, it just translates machine code back to opcodes or some low-level language using a fixed algorithm. It can't go further than that because the machine code doesn't contain the original source code information; it's lost during the initial compilation. If you have debug symbols, or a sourcemap in the case of JavaScript, then you can somewhat return it to its original state, but it won't be the same.
>I don't see how AI could perform any better than a typical decompiler.
A typical decompiler might just come up with a bunch of ints called a, aa, aaa, but an AI decompiler could actually come up with contextual names for all the variables, which would make understanding the code much easier for people.
I was thinking that might be the idea, that it would somehow keep guessing the context of things until it settles on what it thinks the code does and then names things appropriately. But the problem is that most of the time the context is totally lost during the initial compilation. If I have a class called Dog, instantiate a few dogs, call the bark function on each one, and then compile it, there's nothing in the output to say that the class was called Dog and had a bark function. It will just be a random class name with random function names. You probably know that already, but I think that's going to be extremely difficult to overcome, if not impossible, in a lot of compiled code cases.
Where it might work is with, like you said, JavaScript and other interpreted code where the code is only partially renamed during minification, or if they have access to sourcemaps or other assets. Think of unbundling something built with webpack, although even then the original filenames are usually lost and bundled files are referenced by numbers. So yeah, deobfuscation like the first paper talks about. Decompilation from machine code alone sounds a lot harder.
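A rough way to see what renaming throws away and what it leaves behind. This is a toy renamer written with Python's `ast` module (Python 3.9+), not a real minifier and not JavaScript. `minify_names`, `SAMPLE` and the short-name scheme are all made up for illustration, and it ignores plenty of edge cases such as imports, globals, and clashes with builtins.

```python
import ast
import itertools
import keyword
import string


def _short_names():
    """Yield a, b, ..., z, aa, ab, ... skipping Python keywords."""
    for length in itertools.count(1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            candidate = "".join(letters)
            if not keyword.iskeyword(candidate):
                yield candidate


def minify_names(source: str):
    """Rename every identifier the program defines; return (code, mapping)."""
    tree = ast.parse(source)
    fresh = _short_names()
    mapping = {}

    def alias(name: str) -> None:
        is_dunder = name.startswith("__") and name.endswith("__")
        if not is_dunder and name not in mapping:  # keep __init__ and friends
            mapping[name] = next(fresh)

    # Pass 1: collect the names this program itself introduces.
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            alias(node.name)
        elif isinstance(node, ast.arg):
            alias(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            alias(node.id)
        elif isinstance(node, ast.Attribute) and isinstance(node.ctx, ast.Store):
            alias(node.attr)

    # Pass 2: rewrite every use of those names. String literals are untouched.
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            node.name = mapping.get(node.name, node.name)
        elif isinstance(node, ast.arg):
            node.arg = mapping.get(node.arg, node.arg)
        elif isinstance(node, ast.Name):
            node.id = mapping.get(node.id, node.id)
        elif isinstance(node, ast.Attribute):
            node.attr = mapping.get(node.attr, node.attr)
    return ast.unparse(tree), mapping


SAMPLE = '''
class Dog:
    def __init__(self, name):
        self.name = name

    def bark(self):
        return self.name + " says woof"

rex = Dog("Rex")
result = rex.bark()
'''

minified, mapping = minify_names(SAMPLE)
```

The renamed program behaves the same, but `Dog` and `bark` are gone for good unless you kept `mapping`, which is roughly what a sourcemap does. The only human-readable hints left are the strings "Rex" and " says woof".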
bump
bump
>I have no idea why you conflate this braindamaged take with my post
How do you think a neural network works exactly?
I know how they work much better than you do, but that's beside the point. The point is that you completely misunderstand what NN-assisted decompilation is for. It's not about recovering superficial details like variable names. It's about recovering the higher level structure of a program.
So if you know how they work then how is this a braindead take
>keep guessing the context of things until it settles on what it thinks the code does and then names things appropriately
Because this is exactly what they do: they assign parsed and stemmed language features to nodes and redistribute weights based on input values. I've written my own NLP neural network from scratch in JavaScript before. It was for a chatbot project, and it was similar to the code nlp.js uses.
Once again, I don't know what your mouth-breathing rhetoric is about. I'm just telling you the point of NN-assisted decompilation is to recover the high level structure of a program, not variable names. This is possible because using abstractions results in unnatural and characteristic machine code patterns.
Regular decompilation gives you the structure already
...and the higher level structure of a program is fairly useless without any context. Minified JavaScript is already structured, so it doesn't help there if it can't restore variable names. C++ and similar languages can already be decompiled to C and C++, just without anything named properly. The structure is already in the compiled code, it has to be. There's no issue getting the structure back from decompilation; it might be a bit disorganized, but it's all there. The problem is that nothing is named.
> the higher level structure of a program is fairly useless
>Regular decompilation gives you the structure already
Clueless imbecile 100% confirmed.
You took out part of what I said. I said structured code where nothing is named is fairly useless. That's because it's extremely hard to read and a long way from anything like a typical high level language. If everything is structured but has totally random names, then it tells you almost nothing about what the code is doing, and you have to rely solely on any logic that might be mixed in with it to work out what it's doing. I dig through minified JavaScript in Chrome DevTools fairly often for my work. Most of it is totally unreadable, but I can make my way around by the odd things that didn't get renamed and by using the debugger. The structure is all there and it doesn't help much. I usually have to hope they used some strings around the place that didn't get minified.
>I said structured code where nothing is named is fairly useless.
It's a lot more useful than barely structured code where nothing is named, which is what reverse engineering has to deal with currently, hence it's being worked on. You're a mouth breather.
bump