Could AI be used to decompile code? What would be the ramifications?

https://arxiv.org/abs/2102.07492
https://www.debykatz.com/saner-rnn-decomp.pdf

  1. 1 month ago
    Anonymous

    mmmmmm
    no Solidity contracts deployed without their ABIs
    no more need to verify and publish source code
    No more mystery contracts with large volumes being mysterious

    • 1 month ago
      Anonymous

      I might be wrong but I thought decompilation was pretty straightforward/mechanistic.
      You won't get the exact code that produced the compiled output, but what you get will compile to give the same end result.
      People have built tools that try to obfuscate what a program does (to basically do everything in a roundabout way) so I imagine that is where an AI might be useful to understand the gist of a program.

      Ultimately, I think the problem boils down to the halting problem so it won't work across the board.
      Basically, some step will involve comparing the given illegible program to a cleaned-up program (given by the AI) to see if they are functionally the same.
      If such a way to compare programs existed, I think there is a way to show it can solve the halting problem.
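
      A minimal sketch of that reduction, in Python. `equivalent` is a hypothetical oracle that says whether two programs behave the same; it cannot exist for arbitrary programs, which is the whole point. `make_halting_decider` and `toy_oracle` are names made up for illustration, and the toy oracle is only valid for programs already known to halt.

```python
from typing import Callable


def make_halting_decider(
    equivalent: Callable[[Callable[[], int], Callable[[], int]], bool]
) -> Callable[[Callable[[], object]], bool]:
    """Turn a (hypothetical) program-equivalence oracle into a halting decider."""

    def halts(program: Callable[[], object]) -> bool:
        def wrapped() -> int:
            program()  # runs forever if `program` never halts
            return 0

        def constant() -> int:
            return 0  # trivially halts and returns 0

        # `wrapped` behaves like `constant` exactly when `program` halts.
        return equivalent(wrapped, constant)

    return halts


def toy_oracle(f: Callable[[], int], g: Callable[[], int]) -> bool:
    # Stand-in only: it simply runs both programs, so it terminates
    # (and is correct) only for programs already known to halt.
    return f() == g()


halts = make_halting_decider(toy_oracle)
print(halts(lambda: sum(range(10))))  # True
```

      A real oracle would have to answer for `wrapped` without running it, and that is exactly what the halting problem rules out.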

      The best you could hope for is a list of all valid transformations/reductions that leave a program unchanged in functionality and then have the ai work with those to "intelligently" beat a program into something a human can read.
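
      A rough sketch of what two entries in such a list of transformations could look like, using Python's `ast` module (needs Python 3.9+ for `ast.unparse`). The rules here are constant folding and `x + 0 -> x`, and they are only safe under the assumption that every variable holds an int. `Simplify` and `simplify` are hypothetical names, not part of any real decompiler.

```python
import ast


class Simplify(ast.NodeTransformer):
    """Two rewrite rules; assumes every variable holds an int."""

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)  # rewrite sub-expressions first
        left, right = node.left, node.right
        both_ints = (
            isinstance(left, ast.Constant) and type(left.value) is int
            and isinstance(right, ast.Constant) and type(right.value) is int
        )
        # Rule 1: constant folding, e.g. 2 + 3 -> 5
        if both_ints and isinstance(node.op, ast.Add):
            return ast.Constant(left.value + right.value)
        if both_ints and isinstance(node.op, ast.Mult):
            return ast.Constant(left.value * right.value)
        # Rule 2: additive identity, x + 0 -> x
        if (isinstance(node.op, ast.Add) and isinstance(right, ast.Constant)
                and type(right.value) is int and right.value == 0):
            return left
        return node


def simplify(source: str) -> str:
    """Parse one expression, apply the rules bottom-up, print it back."""
    tree = ast.parse(source, mode="eval")
    return ast.unparse(Simplify().visit(tree))


print(simplify("(x + 0) * (2 + 3)"))  # x * 5
```

      The hard part is not applying rules like these but choosing which ones to apply and in what order, which is where a learned model could plausibly help.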

      > ramifications

      Only people who rely on security through obscurity will suffer.

      There is only open source from now on.

      Look up indistinguishability obfuscation
      https://en.wikipedia.org/wiki/Indistinguishability_obfuscation
      There will still be ways to hand you a program whose behavior you have no way of understanding.

      • 1 month ago
        Anonymous

        >If such a way to compare programs existed, I think there is a way to show it can solve the halting problem.
        Halting problem is proven to be unsolvable.

        • 1 month ago
          Anonymous

          >Halting problem is proven to be unsolvable
          No shit.

          >I think the problem boils down to the halting problem so it won't work across the board.
          Clearly I presumed it to be unsolvable and then concluded
          >so it won't work across the board
          And suggested
          >The best you could hope for is a list of all valid transformations/reductions that leave a program unchanged in functionality and then have the ai work with those to "intelligently" beat a program into something a human can read.

          >Regular decompilation gives you the structure already

          The low level machine code technically gives the structure too (in a very convoluted form).
          Maybe look into obfuscation to understand what the goal of decompilation is.
          https://en.wikipedia.org/wiki/Obfuscation_(software)
          One example: the x86 mov instruction is Turing complete, so there is a way to compile code using nothing but mov instructions.

          Unless you have designed a decompiler to understand the mov Turing machine and how to translate its operations back into the original operations, it will just return a bunch of mov instructions.
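
          To give a feel for why mov-only code is so opaque, here is a sketch of the kind of trick it relies on, simulated in Python with a list standing in for memory. As I understand it, a comparison can be done with nothing but stores and loads by using the compared values as addresses. The function names are made up, and the sketch assumes 0 <= x, y < size.

```python
def mov_equal(x: int, y: int, size: int = 256) -> int:
    """Return 1 if x == y else 0, using only stores and loads."""
    mem = [0] * size  # scratch memory indexed by the values themselves
    mem[x] = 0        # mov [x], 0
    mem[y] = 1        # mov [y], 1  (overwrites the 0 only when y == x)
    return mem[x]     # mov r, [x]


def mov_select(flag: int, if_zero: int, if_one: int) -> int:
    """Pick a value without branching: a table lookup indexed by flag."""
    table = [if_zero, if_one]  # two stores
    return table[flag]         # one indexed load


print(mov_equal(7, 7), mov_equal(7, 8))     # 1 0
print(mov_select(mov_equal(3, 3), 10, 20))  # 20
```

          A generic decompiler sees only the loads and stores; recognizing that they amount to `x == y` takes a pattern written for this specific encoding.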

          >I said structured code where nothing is named is fairly useless.

          The naming is the least important issue. Variables and functions are defined by how they are used.
          The crux of the issue is that you have a Turing machine and you want a way to say what it does more coherently than just outputting the initial state and all of the transition rules.

  2. 1 month ago
    Anonymous

    > ramifications

    Only people who rely on security through obscurity will suffer.

    There is only open source from now on.

  3. 1 month ago
    Anonymous

    I don't see how AI could perform any better than a typical decompiler. A decompiler doesn't estimate things, it just translates machine code back to opcodes or some low level language using a fixed algorithm. It can't decompile further than that because the machine code doesn't contain the original source code information; that is lost during the initial compilation. If you have debug symbols, or a sourcemap in the case of JavaScript, then you can somewhat return it to its original state, but it won't be the same.

    • 1 month ago
      Anonymous

      >I don't see how AI could perform any better than a typical decompiler.
      A typical decompiler might just come up with a bunch of ints called int a, int aa, int aaa, but an AI decompiler could actually come up with contextual names for all the variables, which would make understanding the code much easier for people.
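
      A sketch of the mechanical half of that idea: once something has guessed names, applying them is easy. The `suggested` mapping below is a hand-written stand-in for whatever a model would predict, `Renamer` and `apply_names` are hypothetical names, and the sketch ignores scoping (it renames every matching identifier). Needs Python 3.9+.

```python
import ast


class Renamer(ast.NodeTransformer):
    """Rename functions, parameters and variables using a name mapping."""

    def __init__(self, names: dict[str, str]):
        self.names = names

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:
        node.name = self.names.get(node.name, node.name)
        self.generic_visit(node)  # continue into parameters and body
        return node

    def visit_arg(self, node: ast.arg) -> ast.AST:
        node.arg = self.names.get(node.arg, node.arg)
        return node

    def visit_Name(self, node: ast.Name) -> ast.AST:
        node.id = self.names.get(node.id, node.id)
        return node


def apply_names(source: str, names: dict[str, str]) -> str:
    """Parse source, apply the mapping, and print the code back out."""
    return ast.unparse(Renamer(names).visit(ast.parse(source)))


decompiled = "def f(a, aa):\n    return a * aa"
suggested = {"f": "area", "a": "width", "aa": "height"}  # assumed model output
print(apply_names(decompiled, suggested))
```

      The open question is whether the guesses in `suggested` can be any good when the binary carries no trace of the original names.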

      • 1 month ago
        Anonymous

        [...]

        I was thinking that might be the idea, that it would somehow keep guessing the context of things until it settles on what it thinks the code does and then names things appropriately. But the problem is that most of the time the context is totally lost during the initial compilation. Say I have a class called Dog, I instantiate a few dogs and call the bark function on each one, and then I compile it. There's nothing in the output to say that the class was called Dog and had a bark function; it will just be a random class name with random function names. You probably know that already, but I think that's going to be extremely difficult to overcome, if not impossible, in a lot of compiled code cases.
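
        A small illustration of names not being part of the instructions, using CPython bytecode as a stand-in for machine code. The two functions below are made up for the example. CPython happens to keep names in a side table (`co_varnames`); a stripped native binary typically has no such table at all.

```python
def bark_volume(dog_count):
    return dog_count * 2


def a(b):
    return b * 2


# The instruction bytes refer to variables by slot index, not by name,
# so both functions compile to exactly the same code.
print(bark_volume.__code__.co_code == a.__code__.co_code)  # True

# The names survive only in this separate table.
print(bark_volume.__code__.co_varnames, a.__code__.co_varnames)
```

        Throw away the name table and the two functions are indistinguishable, so any name a tool comes up with is a guess from usage.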

        Where it might work is, like you said, with JavaScript and other interpreted code, where the code is only partially renamed during minification, or where they have access to sourcemaps or other assets. Unbundling something built with webpack is one example, but even then the original filenames are usually lost and bundled files are referenced by numbers. So yeah, deobfuscation like the first paper talks about. Decompilation from machine code alone sounds a lot harder.

  4. 1 month ago
    Anonymous

    bump

  5. 1 month ago
    Anonymous

    bump

  6. 1 month ago
    Anonymous

    [...]

    >I have no idea why you conflate this braindamaged take with my post
    How do you think a neural network works, exactly?

    • 1 month ago
      Anonymous

      I know how they work much better than you do, but that's beside the point. The point is that you completely misunderstand what NN-assisted decompilation is for. It's not about recovering superficial details like variable names. It's about recovering the higher level structure of a program.

      • 1 month ago
        Anonymous

        So if you know how they work then how is this a braindead take
        >keep guessing the context of things until it settles on what it thinks the code does and then names things appropriately
        because this is exactly what they do: they assign parsed and stemmed language features to nodes and redistribute weights based on input values. I've written my own NLP neural network from scratch in JavaScript before. It was for a chatbot project, and it was similar to the code nlp.js uses.

        • 1 month ago
          Anonymous

          Once again, I don't know what your mouth-breathing rhetoric is about. I'm just telling you the point of NN-assisted decompilation is to recover the high level structure of a program, not variable names. This is possible because using abstractions results in unnatural and characteristic machine code patterns.

          • 1 month ago
            Anonymous

            ...and the higher level structure of a program is fairly useless without any context. Minified JavaScript is already structured, so it doesn't help there if it can't restore variable names. C++ and similar languages can already be decompiled to C or C++, but without anything named properly. The structure is already in the compiled code, it has to be. There's no issue getting the structure back from decompilation; it might be a bit disorganized but it's all there. The problem is that nothing is named.

            Regular decompilation gives you the structure already

          • 1 month ago
            Anonymous

            > the higher level structure of a program is fairly useless
            >Regular decompilation gives you the structure already
            Clueless imbecile 100% confirmed.

            • 1 month ago
              Anonymous

              You took out part of what I said. I said structured code where nothing is named is fairly useless. It's extremely hard to read and a long way from anything like a typical high level language. If everything is structured but has totally random names, then it tells you almost nothing about what the code is doing, and you have to rely solely on any logic that might be mixed in with it to work that out. I dig through minified JavaScript in Chrome DevTools fairly often for my work. Most of it is totally unreadable, but I can make my way around by the odd things that didn't get renamed and by using the debugger. The structure is all there and it doesn't help much; I usually have to hope they used some strings around the place which didn't get minified.

              • 1 month ago
                Anonymous

                >I said structured code where nothing is named is fairly useless.
                It's a lot more useful than barely structured code where nothing is named, which is what reverse engineering has to deal with currently, hence it's being worked on. You're a mouth breather.

  7. 1 month ago
    Anonymous

    bump
