Could AI be used to decompile code? What would be the ramifications?

1 year ago

Anonymous

mmmmmm
No more Solidity contracts deployed without their ABIs
No more need to verify and publish source code
No more mystery contracts with large volumes being mysterious

1 year ago

Anonymous

I might be wrong but I thought decompilation was pretty straightforward/mechanistic.
You won't get the exact code that produced the compiled output, but it will compile to give the same end result.
People have built tools that try to obfuscate what a program does (to basically do everything in a roundabout way) so I imagine that is where an AI might be useful to understand the gist of a program.

Ultimately, I think the problem boils down to the halting problem so it won't work across the board.
Basically, some step will involve comparing the given illegible program to a cleaned up program (given by the ai) to see if they are functionally the same.
If such a way to compare programs existed, I think there is a way to show it can solve the halting problem.
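
A rough sketch of that reduction, in Python. Everything here is illustrative: `make_pair`, `halts`, and `run_and_compare` are made-up names, and `equivalent` stands in for the hypothetical program comparer. The idea is that "run P on x, then return 1" behaves the same as "return 1" exactly when P halts on x, so a general comparer would decide halting.

```python
from typing import Callable


def make_pair(program: Callable[[int], object], x: int):
    """Build two zero-argument programs from `program` and input `x`.

    `with_run` runs program(x) and then returns 1; `constant` just returns 1.
    They behave identically exactly when program(x) halts.
    """
    def with_run() -> int:
        program(x)  # never returns if program(x) does not halt
        return 1

    def constant() -> int:
        return 1

    return with_run, constant


def halts(program, x, equivalent) -> bool:
    """Hypothetical halting decider built on an assumed `equivalent` oracle."""
    with_run, constant = make_pair(program, x)
    return equivalent(with_run, constant)


def collatz_steps(n: int) -> int:
    """A program that is known to halt for the small inputs used here."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


def run_and_compare(f, g) -> bool:
    """NOT a real oracle: it simply runs both, so it hangs if either loops."""
    return f() == g()


print(halts(collatz_steps, 27, run_and_compare))
```

The stand-in comparer only works because the demo program terminates; a comparer that works for every pair of programs is exactly what the halting problem rules out.
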

The best you could hope for is a list of all valid transformations/reductions that leave a program unchanged in functionality and then have the ai work with those to "intelligently" beat a program into something a human can read.
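
As a small illustration of one such functionality-preserving transformation, here is constant folding written against Python's `ast` module. This is only a sketch: `ConstantFolder` and `simplify` are made-up names, it handles just integer `+`, `-`, and `*`, and it needs Python 3.9+ for `ast.unparse`. A real deobfuscator would chain many rewrites like this one.

```python
import ast
import operator

# Binary operators this sketch is willing to fold.
_FOLDABLE = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
}


class ConstantFolder(ast.NodeTransformer):
    """Rewrite `const OP const` into a single constant; behavior is unchanged."""

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)  # fold the children first (bottom-up)
        op = _FOLDABLE.get(type(node.op))
        if (
            op is not None
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)
            and isinstance(node.left.value, int)
            and isinstance(node.right.value, int)
        ):
            folded = ast.Constant(value=op(node.left.value, node.right.value))
            return ast.copy_location(folded, node)
        return node


def simplify(source: str) -> str:
    """Parse, fold constants, and turn the tree back into source text."""
    tree = ConstantFolder().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


print(simplify("y = (2 * 3 + 4) * x - (10 - 10)"))
```

The rewrite never has to decide whether two arbitrary programs are equal; it only applies a rule that is known in advance to preserve behavior, which is why this approach sidesteps the halting-problem objection.
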

> ramifications

Only people who rely on security through obscurity will suffer.

There is only open source from now on.

Look up indistinguishability obfuscation
https://en.wikipedia.org/wiki/Indistinguishability_obfuscation
There will still be ways to hand you a program whose behavior you have no way of understanding.
- 1 year ago
  
  Anonymous
  
  >If such a way to compare programs existed, I think there is a way to show it can solve the halting problem.
  Halting problem is proven to be unsolvable.
  - 1 year ago
    
    Anonymous
    
    >Halting problem is proven to be unsolvable
    No shit.
    
    >I think the problem boils down to the halting problem so it won't work across the board.
    Clearly I presumed it to be unsolvable and then concluded
    >so it won't work across the board
    And suggested
    >The best you could hope for is a list of all valid transformations/reductions that leave a program unchanged in functionality and then have the ai work with those to "intelligently" beat a program into something a human can read.
    
    [...]
    Regular decompilation gives you the structure already
    
    The low level machine code technically gives the structure too (in a very convoluted form).
    Maybe look into obfuscation to understand what the goal of decompilation is.
    https://en.wikipedia.org/wiki/Obfuscation_(software)
    One example: the x86 mov instruction is Turing complete, so code can be compiled in a way that uses nothing but mov instructions.
    
    Unless you have designed a decompiler to understand the mov turing machine and how to translate its operations back into the original operations, it will just return a bunch of mov instructions.
    
    You took out part of what I said. I said structured code where nothing is named is fairly useless. It's extremely hard to read and a long way from anything like a typical high-level language. If everything is structured but has totally random names, it tells you almost nothing about what the code is doing, and you have to rely solely on whatever logic is mixed in with it to work that out. I dig through minified JavaScript in Chrome DevTools fairly often for work, and most of it is totally unreadable, but I can make my way around using the odd things that didn't get renamed and the debugger. The structure is all there and it doesn't help much; I usually have to hope they used some strings around the place that didn't get minified.
    
    The naming is the least important issue. Variables and functions are defined by how they are used.
    The crux of the issue is that you have a Turing machine and you want a way to say what it does more coherently than just outputting the initial state and all of the transition rules.
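
To make the mov-only point from the reply above a bit more concrete, here is the usual trick for testing equality with nothing but stores and loads, emulated in Python. This is a sketch under assumptions: `mov_equal` is a made-up name, a Python list stands in for memory, and both values are assumed to be valid addresses below `memory_size`.

```python
def mov_equal(a: int, b: int, memory_size: int = 256) -> int:
    """Return 1 if a == b, else 0, using only stores and loads.

    There is no comparison and no branch: the answer falls out of whether
    the second store lands on the same cell as the first one.
    """
    mem = [0] * memory_size  # scratch memory
    mem[a] = 0               # mov [a], 0
    mem[b] = 1               # mov [b], 1   (overwrites the 0 only if a == b)
    return mem[a]            # mov result, [a]


print(mov_equal(3, 3), mov_equal(3, 4))
```

A decompiler that only pattern-matches ordinary compare-and-branch code sees three unrelated memory moves here, which is the sense in which it would "just return a bunch of mov instructions".
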

1 year ago

Anonymous

> ramifications

Only people who rely on security through obscurity will suffer.

There is only open source from now on.

1 year ago

Anonymous

I don't see how AI could perform any better than a typical decompiler. A decompiler doesn't estimate things; it just translates machine code back to opcodes or some low-level language using a fixed algorithm. It can't go further than that because the machine code doesn't contain the original source code information; that is lost during the initial compilation. If you have debug symbols, or a sourcemap in the case of JavaScript, then you can somewhat return it to its original state, but it won't be the same.

1 year ago

Anonymous

>I don't see how AI could perform any better than a typical decompiler.
A typical decompiler might just come up with a bunch of ints called int a, int aa, int aaa, but an AI decompiler could actually come up with contextual names for all the variables, which would make the code much easier for people to understand.
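
A minimal sketch of the mechanical half of that idea, in Python: applying a set of suggested names to decompiler-style output. The `Renamer` class, the `rename` helper, and the `suggested` mapping are all hypothetical; the mapping is hard-coded here, whereas in the scenario described it would be a model's guess and could easily be wrong.

```python
import ast


class Renamer(ast.NodeTransformer):
    """Apply a name mapping to functions, arguments, and variables."""

    def __init__(self, mapping: dict[str, str]) -> None:
        self.mapping = mapping

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        node.name = self.mapping.get(node.name, node.name)
        self.generic_visit(node)  # continue into arguments and body
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        node.arg = self.mapping.get(node.arg, node.arg)
        return node

    def visit_Name(self, node: ast.Name) -> ast.Name:
        node.id = self.mapping.get(node.id, node.id)
        return node


def rename(source: str, mapping: dict[str, str]) -> str:
    """Return `source` with identifiers replaced according to `mapping`."""
    tree = Renamer(mapping).visit(ast.parse(source))
    return ast.unparse(tree)  # requires Python 3.9+


# Stand-in for decompiler output with meaningless names.
decompiled = "def f(a, aa):\n    aaa = a * aa\n    return aaa\n"
# Hypothetical suggestions; nothing in the compiled output guarantees them.
suggested = {"f": "area", "a": "width", "aa": "height", "aaa": "result"}
print(rename(decompiled, suggested))
```

Renaming is the easy part; the open question in the thread is whether a model can guess names like `width` and `height` from usage alone.
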
- 1 year ago
  
  Anonymous
  
  [...]
  
  I was thinking that might be the idea: that it would somehow keep guessing the context of things until it settles on what it thinks the code does and then names things appropriately. But the problem is that most of the time the context is totally lost during the initial compilation. If I have a class called Dog, instantiate a few dogs, call the bark function on each one, and then compile it, there's nothing in the output to say that the class was called Dog and had a bark function; it will just be a random class name with random function names. You probably know that already, but I think that's going to be extremely difficult to overcome, if not impossible, in a lot of compiled code cases.
  
  Where it might work is with, like you said, JavaScript and other interpreted code where the code is only partially renamed during minification, or where they have access to sourcemaps or other assets. Take unbundling something built with webpack: even then the original filenames are usually lost and bundled files are referenced by numbers. So yeah, deobfuscation like the first paper talks about. Decompilation from machine code alone sounds a lot harder, though.

1 year ago

Anonymous

bump

1 year ago

Anonymous

bump

1 year ago

Anonymous

[...]

>I have no idea why you conflate this braindamaged take with my post
How do you think a neural network works, exactly?

1 year ago

Anonymous

I know how they work much better than you do, but that's beside the point. The point is that you completely misunderstand what NN-assisted decompilation is for. It's not about recovering superficial details like variable names. It's about recovering the higher level structure of a program.
- 1 year ago
  
  Anonymous
  
  So if you know how they work then how is this a braindead take
  >keep guessing the context of things until it settles on what it thinks the code does and then names things appropriately
  because this is exactly what they do: they assign parsed and stemmed language features to nodes and redistribute weights based on input values. I've written my own NLP neural network from scratch in JavaScript before for a chatbot project; it was similar to the code nlp.js uses.
  - 1 year ago
    
    Anonymous
    
    Once again, I don't know what your mouth-breathing rhetoric is about. I'm just telling you the point of NN-assisted decompilation is to recover the high level structure of a program, not variable names. This is possible because using abstractions results in unnatural and characteristic machine code patterns.
  - 1 year ago
    
    Anonymous
    
    ...and the higher level structure of a program is fairly useless without any context. Minified JavaScript is already structured, so it doesn't help there if it can't restore variable names. C++ and similar languages can already be decompiled to C or C++, just without anything named properly. The structure is already in the compiled code; it has to be. There's no issue getting the structure back from decompilation: it might be a bit disorganized, but it's all there. The problem is that nothing is named.
    - 1 year ago
      
      Anonymous
      
      [...]
      Regular decompilation gives you the structure already
      
      > the higher level structure of a program is fairly useless
      >Regular decompilation gives you the structure already
      Clueless imbecile 100% confirmed.
      - 1 year ago
        
        Anonymous
        
        You took out part of what I said. I said structured code where nothing is named is fairly useless. It's extremely hard to read and a long way from anything like a typical high-level language. If everything is structured but has totally random names, it tells you almost nothing about what the code is doing, and you have to rely solely on whatever logic is mixed in with it to work that out. I dig through minified JavaScript in Chrome DevTools fairly often for work, and most of it is totally unreadable, but I can make my way around using the odd things that didn't get renamed and the debugger. The structure is all there and it doesn't help much; I usually have to hope they used some strings around the place that didn't get minified.
        
        1 year ago
        
        Anonymous
        
        >I said structured code where nothing is named is fairly useless.
        It's a lot more useful than barely structured code where nothing is named, which is what reverse engineering has to deal with currently, hence it's being worked on. You're a mouth breather.

1 year ago

Anonymous

bump
