AI decompiler

What if we trained an AI to decompile binaries back into something very close to the original source code? Wouldn't this end proprietary software?

  1. 1 year ago
    Anonymous

    >What if we
    then do it feggit

    • 1 year ago
      Anonymous

      I think before I do

      >It would be the end, yes, but it would be fought. Hard.
      Of course

      >You'll still have to deal with the problem of giving functions names and commenting code.
      Solid point

      >https://openreview.net/forum?id=6GkL6qM3LV
      Too many words

      • 1 year ago
        Anonymous

        >I think before I do
        Evidently not since you think it's possible.

      • 1 year ago
        Anonymous

        >I think before I do
        >Too many words

  2. 1 year ago
    Anonymous

    It would be the end, yes, but it would be fought. Hard.

    • 1 year ago
      Anonymous

      But corporations have been using things like this for generations now against other corporations.
      It's there, but it's not visible to the public or that would be suicide and a big legal situation.

  3. 1 year ago
    Anonymous

    You'll still have to deal with the problem of giving functions names and commenting code.

    • 1 year ago
      Anonymous

      AI already does this

      • 1 year ago
        Anonymous

        Not at all. If you swap your words around ChatGPT just dies outright. It's really good at "pattern recognition" and symbol manipulation but complete ass at performing the task 'correctly' de novo.

      • 1 year ago
        Anonymous

        >AI already does this

        > importantly, the tool is restricted to un-optimized binaries, which significantly limits its applicability for any real-world application.

        >> importantly, the tool is restricted to un-optimized binaries, which significantly limits its applicability for any real-world application.
        what a surprise.

        That's a crime.

        Full stop.

        >Full stop.
        moron

        >binary output differs A LOT
        Still, with a big enough code base it would be possible to identify patterns from binary and function call analysis... It's just too big for a human.
        Like 20 years ago they started to use AI to generate optimal CPU transistor layouts, because no human could do it anymore with billions of transistors.

        doesn't work like that. you're making the incredibly bold assumption that ai understands anything. it doesn't understand computer languages, it doesn't understand the structure of files. not all code is an .exe or .dll file, and not all CPUs are x86.

        so, for your schizo thoughts to have any success, the machine learning algorithm needs to:
        > be able to understand opcodes
        > understand opcodes translated into instructions
        > be able to separate code from data
        the accuracy of this would be shockingly bad. it's like you haven't seen microsoft's co-pilot that's merely a copy/paste engine.

        >Still with big enough code base it would be possible to identify patterns from binary and function call analysis
        > function call analysis
        it doesn't work like that. your idea of how code is compiled is based on fantasy.

        • 1 year ago
          Anonymous

          We will see in a decade.

          • 1 year ago
            Anonymous

            what will a decade change, dickhead? we've had machine learning algorithms in some form or another for decades, and the best we've seen in recent years are bots that make horrific looking images, or writing text at a 10th grader level, all relying on scraped data from public internet sites. is that what you call progress, dickhead? amazing. you don't seem to understand how difficult it is to train software based "AI" systems, and you sure as frick don't comprehend the amount of data it will need to understand just ONE instruction set.

            no amount of advancements in CPUs or computer systems will change this fact.

            • 1 year ago
              Anonymous

              >what will a decade change,
              130 nm – 2001
              90 nm – 2003
              65 nm – 2005
              45 nm – 2007
              32 nm – 2009
              22 nm – 2012
              14 nm – 2014
              10 nm – 2016
              7 nm – 2018
              5 nm – 2020
              3 nm – 2022

              • 1 year ago
                Anonymous

                2.5nm - 2025
                2.2nm - 2030
                2nm - 2033
                1.9nm - 2035
                1.85nm - 2038
                1.7nm - 2040
                WOW, LE MOORS LAW!

              • 1 year ago
                Anonymous

                production process
                anon's penis size

                2.5nm - 2025
                2.2nm - 2030
                2nm - 2033
                1.9nm - 2035
                1.85nm - 2038
                1.7nm - 2040
                WOW, LE MOORS LAW!

    • 1 year ago
      Anonymous

      >You'll still have to deal with the problem of giving functions names and commenting code.
      One could use statistics of similarities between known code (and its known binary output) and the binary being analyzed. No human has enough processing capacity, but...
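
The similarity idea in this post can be sketched in a few lines. This is a minimal illustration, assuming each function has already been disassembled into a list of opcode mnemonics; the `KNOWN` corpus, the mnemonic lists, and the helper names are made up for the example and do not come from any real tool.

```python
def ngrams(opcodes, n=3):
    """Return the set of length-n windows over an opcode sequence."""
    return {tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def best_match(unknown, known, n=3):
    """Return (name, score) of the known function most similar to `unknown`."""
    target = ngrams(unknown, n)
    scored = [(jaccard(target, ngrams(ops, n)), name) for name, ops in known.items()]
    score, name = max(scored)
    return name, score


# Hypothetical reference corpus: function name -> opcode mnemonics.
KNOWN = {
    "strlen": ["mov", "cmp", "je", "inc", "jmp", "ret"],
    "memcpy": ["mov", "mov", "mov", "inc", "inc", "dec", "jne", "ret"],
}

# Unlabeled function from the binary under analysis (also made up).
unknown = ["push", "mov", "cmp", "je", "inc", "jmp", "ret"]
name, score = best_match(unknown, KNOWN)
print(name, score)
```

A real matcher would have to cope with different compilers and optimization levels, which is exactly the objection raised elsewhere in the thread; this only shows the statistical matching step.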

      • 1 year ago
        Anonymous

        Like AI is able to understand my autistic ESL speech. Why wouldn't it understand binary.

    • 1 year ago
      Anonymous

      It would be able to do that.

      • 1 year ago
        Anonymous

        >It would be able to do that.
        AI will be smart enough to obfuscate everything, so no human will be able to replace it.
        Like, what a moron does one have to be to create his own replacement.

    • 1 year ago
      Anonymous

      Names could be given by the function code itself. It's not that hard.

      • 1 year ago
        Anonymous

        binaries don't typically ship with debugging symbols, anon.

  4. 1 year ago
    Anonymous

    https://openreview.net/forum?id=6GkL6qM3LV

    • 1 year ago
      Anonymous

      Cheng Fu, Kunlin Yang, Xinyun Chen, Yuandong Tian, Jishen Zhao

    • 1 year ago
      Anonymous

      > importantly, the tool is restricted to un-optimized binaries, which significantly limits its applicability for any real-world application.

  5. 1 year ago
    Anonymous

    How so? What would change if you had access to proprietary software? You are still not allowed to use the code legally. If you are a pirate, you pirate the binary anyway, and as a corporation that wants to paste their code into your own software, you would never steal code from a multi-billion-dollar company with lawyers.

    • 1 year ago
      Anonymous

      In other words, the issue with proprietary software is not that the code is not readable, it's that it's restricted.

    • 1 year ago
      Anonymous

      In other words, the issue with proprietary software is not that the code is not readable, it's that it's restricted.

      >all proprietary code released
      >linux desktop experience, windows emulation, drivers, and gayming all suddenly become perfect
      >pirates can not only use proprietary programs for free, but also modify them at will
      sounds good to me

      • 1 year ago
        Anonymous

        That's a crime.

        Full stop.

        • 1 year ago
          Anonymous

          Okay, but what's your point?

    • 1 year ago
      Anonymous

      This, also it's extremely immoral to steal code. You're causing unemployment to rise.

  6. 1 year ago
    Anonymous

    We can probably already do something like this without AI, it's just not that useful.
    Information is lost in the compiled binary.
    Even if you could get it to valid C or whatever you'd basically just have some code without any of the variable names and comments.
    In other words not much better than the assembly.
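
The information-loss point can be seen with a small stand-in experiment. This sketch uses CPython bytecode instead of a native binary, which is an assumption made purely for convenience: two functions that differ only in parameter name and comments compile to the same instruction bytes. Unlike a stripped native binary, Python still stores the names separately in the code object, so the analogy is loose.

```python
SRC_A = '''
def f(total_price):
    # add the shipping fee
    return total_price + 1
'''

SRC_B = '''
def f(a):
    return a + 1
'''


def body_bytes(src):
    """Compile `src` and return the raw instruction bytes of its function f."""
    namespace = {}
    exec(compile(src, "<sketch>", "exec"), namespace)
    return namespace["f"].__code__.co_code


# Names and comments do not appear in the instruction stream.
same_instructions = body_bytes(SRC_A) == body_bytes(SRC_B)
print(same_instructions)
```

Going backwards from those bytes, nothing says whether the author wrote `total_price` or `a`, and the comment is gone entirely.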

    • 1 year ago
      Anonymous

      AI could predict the variable names and comments as well.

      • 1 year ago
        Anonymous

        I had initially discounted this, but it might be more doable than I thought.
        A human could probably work out what the variables do if given enough time, so an AI might be able to.
        I suppose the big issue here is that it would need to read and understand the disassembled program.
        It might be easier to make an AI which takes C code with placeholder variable names and guesses the actual names, then combine that with an existing decompiler.
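
A rough sketch of the "combine it with an existing decompiler" half of this idea: take decompiler output with placeholder identifiers and apply a table of suggested names in one pass. The `SUGGESTED` dict is a hand-written stand-in for whatever a name-guessing model would return, and the C snippet is invented for the example.

```python
import re


def apply_names(decompiled_c, suggestions):
    """Rename every whole-word occurrence of each placeholder identifier."""
    if not suggestions:
        return decompiled_c
    # One alternation, applied in a single pass, so a new name is never renamed again.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, suggestions)) + r")\b")
    return pattern.sub(lambda m: suggestions[m.group(1)], decompiled_c)


# Invented decompiler output with placeholder names.
DECOMPILED = "int sub_401000(int a1) { int v1 = a1 * 2; return v1 + a1; }"

# Stand-in for model output: placeholder -> guessed name (hypothetical).
SUGGESTED = {"sub_401000": "triple", "a1": "value", "v1": "doubled"}

renamed = apply_names(DECOMPILED, SUGGESTED)
print(renamed)
```

Doing the substitution over the whole text at once keeps definitions and call sites consistent, so a function is never defined under a new name while still being called by the old one.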

      • 1 year ago
        Anonymous

        I mean kind of? For comments it can essentially look at what the code is doing and try to put it into words. But how would it guess var names unless they're specified somewhere?

    • 1 year ago
      Anonymous

      >We can probably already do something like this without AI, it's just not that useful
      Actually, algorithms have been successfully used by corps for a long time in order for them to compete. Time and time again a corporation gets caught out with doing this, usually ending in a court battle, but it doesn't matter to them. Why? Because they need to do this despite the legal costs for the sake of competitiveness.

      From memory, GNU/Linux manifested because of IP issues regarding some proprietary software and how a court basically removed the legalities or license over something. I can't remember what specifically (derp but this is bugging me now...).

    • 1 year ago
      Anonymous

      ida or whatever have done this for years and theyre better at it, what the frick do you mean

      read above

      AI could predict the variable names and comments as well.

      id like to see it do that
      i 'could' fly, and yet i cant

      >You'll still have to deal with the problem of giving functions names and commenting code.
      One could use statistics of similarities between known code (and its known binary output) and the binary being analyzed. No human has enough processing capacity, but...

      binary output differs A LOT even when writing the same code

      • 1 year ago
        Anonymous

        >binary output differs A LOT
        Still, with a big enough code base it would be possible to identify patterns from binary and function call analysis... It's just too big for a human.
        Like 20 years ago they started to use AI to generate optimal CPU transistor layouts, because no human could do it anymore with billions of transistors.

        • 1 year ago
          Anonymous

          y you guys talking about binary tho, it's all micro code for specific processors.
          Yeah you could reverse engineer this shit quite easily with an AI, to the point you got the whole pseudocode, but then what

          • 1 year ago
            Anonymous

            >y you guys talking about binary
            It would be quite funny for the AI to be able to read its own binary blob.

  7. 1 year ago
    Anonymous

    No, because it's in every proprietary licensing agreement not to reverse engineer, decompile or disassemble their software.
    Just because you found a new way to decompile their software doesn't mean they won't sue you and everyone you've ever met back to the oblivion you crawled out of - just as if you invented a transporter beam tomorrow, you would be charged with theft if you started beaming bags of money out of bank vaults.

    • 1 year ago
      Anonymous

      >No, because it's in every proprietary licensing agreement not to reverse engineer, decompile or disassemble their software.
      True, but these are
      >unenforceable
      also, in most european countries you can legally obtain pirated software and do whatever you want with it

      • 1 year ago
        Anonymous

        Why are you
        >writing
        Like this like you have
        >some
        Sort of mental issue?

        >in yurofail you don't even have copyright
        Reason #2^163279-1 everybody of slight competence in that shithole ups and moves to the US at the first possible opportunity.

  8. 1 year ago
    Anonymous

    Oh, g has an actual good idea?

    • 1 year ago
      NMOUYAS YSOUMNA UYMANSO YMOASNU OUAYMNS

      No, many people have had this idea long before this moron.
      https://www.debykatz.com/saner-rnn-decomp.pdf
      https://arxiv.org/abs/2102.07492

  9. 1 year ago
    Anonymous

    I dunno.
    Sounds complicated.
    I used ChatGPT to do my college math homework and failed.
    Doesn't sound very promising.
    And as technology develops, novel means to impede your endeavour will be formulated. The market value of technology that can do this will increase exponentially.

  10. 1 year ago
    Anonymous

    At some point AI will rewrite old binaries into new binaries optimized for the new platforms.
    All this while the human can enjoy his onions latte.

    • 1 year ago
      Anonymous

      >ai installs gentoo
      based

      • 1 year ago
        Anonymous

        Windows AI will finally be decent.

        • 1 year ago
          Anonymous

          Like, remember when they added a 3rd configuration panel to Windows, because everyone who could understand the old stuff was dead, and the old panel was too important and too big to just rewrite?

          AI replacing the pajets will be the answer.
          Only group of competent Americans will keep their jobs.

          • 1 year ago
            Anonymous

            The devs not understanding existing code is a stupid myth that just won't die. The main reason they replace rather than rewrite is that they can't possibly know what might be relied on by the millions of existing third-party programs.

            • 1 year ago
              Anonymous

              This. I laugh every time some freetard pretends that "nobody at MS knows how X component of Windows works", then you jump into the Win7 source leak, and there's this absolutely magisterial code that a child could follow, even for things that are commonly positioned as "hard", like NTFS.

              • 1 year ago
                Anonymous

                Probably the most moronic post I've ever seen. I work at m$ by the way.

              • 1 year ago
                Anonymous

                No, it's not a case of understanding. It's fundamentally impossible to unify something that cannot be changed with something that needs to be implemented completely differently.

                Either
                A. you aren't a programmer
                B. MS is massively overstaffed by a 50x margin
                3. Both
                I, a gayMAN Staff SWE, don't ever understand code I wrote myself a year ago without spending some time to remember what I was doing, and I don't have time to do that. The thought that anyone at MS is reading 30 year old code, has the time and ability to grok it, and comes to the conclusion it's impossible to refactor is laughable. If you know what it's doing you have already done 75% of the work to refactor it.

              • 1 year ago
                Anonymous

                You are missing an important point. It's not that they are incapable of rewriting the existing code. It's that they are not allowed to. Software on Windows assumes implementation details will stay the same and integrates deeply into existing Windows components. Those components cannot be refactored, because doing so could break existing software.

              • 1 year ago
                Anonymous

                Nobody at Microsoft knows what a test suite is? No tests were written during the initial development? No wonder they haven't made anything good since XP. Maybe they should hire some actual devs to fix their shit.

              • 1 year ago
                Anonymous

                You can't test millions of third party applications. They have no possible way to know what particular specific implementation detail is needed by some random piece of software.
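
The disagreement here is about behavior that is outside the spec but that clients still depend on. A toy sketch, with all names invented for illustration: both implementations satisfy the documented contract, yet a client that leaned on an undocumented ordering breaks after the refactor.

```python
def list_users_v1(users):
    """Original implementation: returns names in insertion order."""
    return list(users)


def list_users_v2(users):
    """Refactor: returns the same names, but sorted alphabetically."""
    return sorted(users)


def meets_spec(impl, users):
    """The documented contract: every user is returned exactly once."""
    return sorted(impl(users)) == sorted(users)


def third_party_admin_lookup(impl, users):
    """Hypothetical client that assumes the first entry is the first-created account."""
    return impl(users)[0]


USERS = ["root", "alice", "bob"]

print(meets_spec(list_users_v1, USERS), meets_spec(list_users_v2, USERS))
print(third_party_admin_lookup(list_users_v1, USERS))
print(third_party_admin_lookup(list_users_v2, USERS))
```

The vendor's own test suite only exercises `meets_spec`, so it passes for both versions; the breakage only shows up inside the third-party program, which the vendor never sees.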

              • 1 year ago
                Anonymous

                So the initial implementation was written randomly with no specs, despite your description:
                > implementation details will stay the same and integrate deeply into existing windows components
                which would suggest those things were rigidly defined from the get go? MS just spewed random shit into the wild 30 years ago, they don't know how people are using it despite invasive logging on Windows, and now they are deathly afraid of regressions? This is your description of MS operations as an insider?

              • 1 year ago
                Anonymous

                I never said the initial implementation was random. It could have been very well thought out. The point is, the new UI requires a different implementation. But existing software depends on the original implementation. You can't have both. The only way is to write something new from scratch and leave the old thing unchanged.

              • 1 year ago
                Anonymous

                If it passes the test suite and meets the spec, any code will work, from a full replacement, to a partial refactor, to a bug fix.

                A. wrong
                B. true but I don't see how that relates.
                3. not a letter.
                >The thought that anyone at MS is reading 30 year old code, has the time and ability to grok it, and comes to the conclusion it's impossible to refactor is laughable
                t. jobless NEET.

                So MS pays people to read all the old code, have a complete understanding of it, then throw up their hands and say "yeah can't change this, totally impossible"? Of course not. LARP on.

              • 1 year ago
                Anonymous

                b8/8

              • 1 year ago
                Anonymous

                You can't test millions of third party applications. They have no possible way to know what particular specific implementation detail is needed by some random piece of software.

                I'm glad there are still posters on BOT who aren't clinically moronic. Thanks for saving my hopes for this shithole a little.

              • 1 year ago
                Anonymous

                A. wrong
                B. true but I don't see how that relates.
                3. not a letter.
                >The thought that anyone at MS is reading 30 year old code, has the time and ability to grok it, and comes to the conclusion it's impossible to refactor is laughable
                t. jobless NEET.

            • 1 year ago
              Anonymous

              >The devs not understanding existing code is a stupid myth that just won't die.
              It's true to the part that they couldn't just integrate it all together.
              New codebase is C# written by pajets, while old relics are C incantations of old-dead wizards. Only an AI could make a sense out of it.

              • 1 year ago
                Anonymous

                There is simply no way to integrate it all together. The old stuff needs to remain the way it is for compatibility. And the new stuff needs to be implemented completely differently as modern UI design is totally different to classic shit.

              • 1 year ago
                Anonymous

                >There is simply no way to integrate it all together.
                *Not without understanding it.

              • 1 year ago
                Anonymous

                No, it's not a case of understanding. It's fundamentally impossible to unify something that cannot be changed with something that needs to be implemented completely differently.

  11. 1 year ago
    Anonymous

    Btw, what will bind AI to it's owner's will, when AI could understand and rewrite every binary blob?

  12. 1 year ago
    Anonymous

    No, there are many decompilation projects that get BTFO already. AI doing this would change nothing.
    In theory, this COULD be very useful for clean room reverse engineering, since you would need less manpower to describe how it works.

    • 1 year ago
      Anonymous

      Imagine if a company had a huge database of code (like GitHub, owned by M$).
      Huge data centers subsidized by the government (like M$).
      And the technology of language pattern-recognizing AI (also M$).

  13. 1 year ago
    Anonymous

    It's much easier to just leak proprietary code once you work for a company. They all suck as employers anyway.

  14. 1 year ago
    Anonymous

    You can already ask ChatGPT to attempt to make sense of decompiled code, and rewrite it with named variables.

  15. 1 year ago
    Anonymous

    chatgpt already does this

    • 1 year ago
      Anonymous

      "cleaned up" version is full of wrong, including how it's calling functions by their "old names" even though it defines them by "new names".

      • 1 year ago
        Anonymous

        >Something that wasn't even designed to do this, made some errors while doing it.
        Onions latte is being prepared.

      • 1 year ago
        Anonymous

        I came wanting to support your argument, but she performs a lot better than I expected.
        >pic

        • 1 year ago
          Anonymous

          except that code is perfectly readable as is, the other post was a much better example

        • 1 year ago
          Anonymous

          That's a joke I hope? Might as well be impressed that it can guess that fizzbuzz is fizzbuzz.
          Yet if you give it fizzbuzz but replace % by / it breaks.
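
For reference, here is one reading of that experiment, written out in Python as an assumption about what "replace % by /" means. The code keeps the familiar fizzbuzz shape, but the variant computes something quite different, which is why recognizing the shape is not the same as understanding the code.

```python
def fizzbuzz(i):
    """The usual version, using the remainder operator."""
    if i % 15 == 0:
        return "FizzBuzz"
    if i % 3 == 0:
        return "Fizz"
    if i % 5 == 0:
        return "Buzz"
    return str(i)


def fizzbuzz_div(i):
    """Same shape with % swapped for /: the tests are only true when i == 0."""
    if i / 15 == 0:
        return "FizzBuzz"
    if i / 3 == 0:
        return "Fizz"
    if i / 5 == 0:
        return "Buzz"
    return str(i)


print([fizzbuzz(i) for i in (3, 5, 15)])
print([fizzbuzz_div(i) for i in (3, 5, 15)])
```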

    • 1 year ago
      Anonymous

      >decompile code that belongs to programs by companies offering sizable bug bounties
      >ask ChatGPT to deobfuscate functions one by one using the previous method, keeping an eye out for (or asking GPT to describe) flaws in functions that look instrumental to handling user data or program control flow
      >cook up exploit either using ChatGPT's knowledge or your own intuition
      >repeat and collect gibs from several different gayMAN companies
      Am I moronic or is there a reason this hasn't been done yet?

      • 1 year ago
        Anonymous

        >Am I moronic or is there a reason this hasn't been done yet?
        It will take you way too long to do this and they'll find a way to close it as informational.

      • 1 year ago
        Anonymous

        Imagine all the patent license violations in M$ code, that could be proven.

  16. 1 year ago
    Anonymous

    We already have that. The problem is that the same obfuscation methods used to defeat normal decompilers also work well against AI decompilers for obvious reasons, namely it exploits the non-symmetric mapping between domain and image.
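
A toy illustration of the many-to-one problem, assuming an opaque-predicate style of obfuscation (the post does not name a specific technique, so this choice is an assumption). Both functions behave identically on integers, so nothing in the behavior tells a decompiler, AI or otherwise, which form the author actually wrote.

```python
def check_plain(x):
    """What the author wrote."""
    return x == 42


def check_obfuscated(x):
    """Same behavior, hidden behind an opaque predicate and a masked compare."""
    # x*x + x == x*(x+1) is always even, so this branch is always taken.
    if (x * x + x) % 2 == 0:
        # XOR with a constant is a bijection, so this is still just x == 42.
        return (x ^ 0x5A) == (42 ^ 0x5A)
    return x == 1337  # dead code that only exists to mislead analysis


equivalent = all(check_plain(x) == check_obfuscated(x) for x in range(-1000, 1000))
print(equivalent)
```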

    • 1 year ago
      Anonymous

      Still it will be able to create 'puzzle pieces' like in

      https://i.imgur.com/TTNvrhd.png

      chatgpt already does this

      then with high enough reference set, and execution analysis, it will solve the puzzle.

      The question is not if, but when will it be able to do so.

  17. 1 year ago
    Anonymous

    I think deobfuscating javascript will be more practical, but I can see AI being used in executable analysis for malware binaries which should be generalizable to proprietary software in general as long as you have a binary.

  18. 1 year ago
    Anonymous

    >Can you take this python code and rewrite it in C

    Kek

  19. 1 year ago
    Anonymous

    I wonder if it could recognize software design patterns from the code.

  20. 1 year ago
    Anonymous

    It would be next to impossible because you would have to first instrument the software and see what the results are for any input or eventuality, hardware interaction, etc. Maybe you could do it if hundreds of thousands of people volunteered to log their Windows use for instance and provide all these terabytes of data to some autolabeler. If you don't know how a program reacts to X you can't reverse engineer it no matter the tool, and most programs that people would be worried about getting reverse engineered (like Windows) are just too complex to automate that discovery task.

  21. 1 year ago
    Anonymous

    Old wizards are dead. No human can understand the arcane runes in ancient parts of winapi.

    • 1 year ago
      Anonymous

      >Old wizards are dead. No human can understand the arcane runes in ancient parts of winapi.
      This is a freetard meme
      Most windows code is easy to understand, and the lower level you go the easier it gets.
      The ugliest code is the C++ garbage implementing the shell and all the web-related shit like Internet Explorer.
      The cleanest code is the kernel itself.

      • 1 year ago
        Anonymous

        >The cleanest code is the kernel itself.
        *laughs in kernel text rendering*

        • 1 year ago
          Anonymous

          Win32k is not part of the kernel, it's a separate module (win32k.sys).

          • 1 year ago
            Anonymous

            >no true scotsman
            >about motherfricking source code
            lol

            • 1 year ago
              Anonymous

              ntoskrnl.exe = Kernel
              win32k.sys = Win32k
              fricking moron
              you can run the windows kernel without win32k, e.g. when the chkdsk program is running at boot.

  22. 1 year ago
    Anonymous

    As soon as AI is any greater than human intelligence it will be 10x as intelligent and 100x in the space of a year and very quickly enslave us. We won't even be able to understand why it's enslaving us or how, as in the simplistic doomsday scenarios. The way it happens will itself be beyond our comprehension, by definition. Point being we probably won't even get to do anything cool with it before it destroys us.

    • 1 year ago
      Anonymous

      >We won't even be able to understand why it's enslaving us or how,
      Probably in the process of dealing with the already existing slave master, that we don't see, but my spider senses notice.

  23. 1 year ago
    Anonymous

    Would be cool to train these AI language models on pairs of .pdb files and stripped .exe files (without debug symbols).
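
A minimal sketch of what such a training set could look like, assuming the symbol information has already been extracted into a plain name-to-(offset, size) table. No real .pdb or PE parsing is done here; `IMAGE`, `SYMBOLS`, and `build_pairs` are hypothetical names for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingPair:
    code: bytes  # raw function bytes as they appear in the stripped binary
    name: str    # label taken from the debug symbols


def build_pairs(image, symbols):
    """Slice each function out of `image` and pair it with its symbol name.

    `symbols` maps function name -> (offset, size) and stands in for data
    that would come from a debug-symbol file.
    """
    ordered = sorted(symbols.items(), key=lambda item: item[1][0])
    return [TrainingPair(image[off:off + size], name) for name, (off, size) in ordered]


# Two tiny x86 functions laid out back to back (toy example).
IMAGE = bytes.fromhex("5589e55dc3" "31c0c3")
SYMBOLS = {"return_zero": (5, 3), "empty_frame": (0, 5)}

pairs = build_pairs(IMAGE, SYMBOLS)
for pair in pairs:
    print(pair.name, pair.code.hex())
```

The model would then see only `code` at inference time and be asked to predict `name`.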

  24. 1 year ago
    Anonymous

    There will likely be an AI trained to do every task in the future. Technology is still at least 5-10 years off (chatGPT can barely tell you what a 10 line source code function does), but there will be an AI to:
    >detect if something is created by an AI
    >decompile binary into an approximation of the source code
    >convert code from one language into another
    >do professional-level translations from one language into another
    >write basic code snippets and suggest how they can be used in a larger architecture
    >chatbots to replace 99% of tech support
    etc, etc

  25. 1 year ago
    Anonymous

    >to replace 99% of tech support
    Do not redeem!
