
Malus.sh: AI, clean room and a controversial vision for the future of licensing

By p.kaczmarek2

TL;DR

  • Malus.sh is a satirical AI clean room recreation platform that claims to rewrite open-source software from documentation without accessing the original code.
  • It uses a clean room process split into two stages and teams: one studies public documentation and behavior, and another implements the software from scratch.
  • The licensing debate focuses on attribution, sharing changes, the AGPL, and the GNU General Public License, which can require derivative software to stay open.
  • The site can actually rewrite a module, but its exaggerated framing is meant to highlight tensions between open source licensing and commercial interests.
Generated by the language model.
Screenshot of the Malus site with “Clean Room as a Service” and buttons “Upload Manifest” and “View Process”.
Malus.sh is a satirical platform offering an "AI clean room recreation" - a process in which AI recreates open source software based on documentation, without access to the code. The result is supposed to be new code, free of licensing restrictions and ready for commercial use.

The project focuses on licensing issues: the obligation to credit authors, the requirement to publish changes, and the risks of licenses such as the AGPL and the GNU General Public License. The GPL is sometimes called 'viral' because it requires that any software incorporating code under this license also be released under the same terms, i.e. with open source code. The proposed solution is clean room, a technique known for decades, which makes it possible to separate the 'idea' from its implementation and to create an independent version of the software.

What is clean room? Clean room is, as the name suggests, a 'clean room' approach in which the software development process is deliberately split into two stages and two teams. The first team analyses only documentation, system behaviour, or public descriptions of operation (without access to the source code), while the second, fully separated team implements the solution from scratch, based only on those descriptions. The aim is for the resulting code to be legally independent and not a copy or derivative work of the original.
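The two-stage split above can be mimicked in miniature. Everything in this sketch is hypothetical and for illustration only (it is not legal advice and not how Malus.sh actually works): the 'analysis team' records observed input/output behaviour as a specification, and the 'implementation team' writes fresh code that sees only that specification, never the original source.

```python
# Hypothetical miniature of the clean room split (illustration only).

# --- Stage 1: analysis team --------------------------------------------------
# They observe the original program's public behaviour (e.g. a CLI or API)
# and record it as a specification. They hand over only this, never source code.
behavior_spec = [
    # (input, expected output) pairs gathered from docs and observation
    ("abc", "ABC"),
    ("Hello, World", "HELLO, WORLD"),
    ("", ""),
]

# --- Stage 2: implementation team --------------------------------------------
# They receive only behavior_spec and write an implementation from scratch.
def reimplemented_upper(text: str) -> str:
    """Written from the spec alone; the original code was never consulted."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in text)

# The spec doubles as an acceptance test for the independent rewrite.
for given, expected in behavior_spec:
    assert reimplemented_upper(given) == expected
print("spec satisfied")
```

The point of the structure is the one-way flow of information: behaviour descriptions may cross the boundary between teams, source code may not.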

Although the concept rests on a real legal foundation and the site actually works (it can 'rewrite' an indicated module), it is exaggerated and satirical. It was quickly recognised by the technology community as satire that illustrates, in an exaggerated way, the tensions between the open source world and commercial interests, especially in the context of using AI to circumvent licensing restrictions.

Is Malus.sh just innocent satire, or will this kind of AI 'rewriting' of code become a real challenge that the open source community will have to face? I invite you to discuss - what is your opinion?

About Author
p.kaczmarek2
p.kaczmarek2 has written 14439 posts with a rating of 12406 and helped 650 times. Member since 2014.

Comments

sundayman 25 Mar 2026 13:18

"Tension" in this case means the impossibility of selling someone else's work that you got for free. This is indeed a big problem for companies :) As an entrepreneur, I wish I only had problems like... [Read more]

p.kaczmarek2 25 Mar 2026 13:40

Where the 'clean room' approach used to require a lot of work, there are now claims that all you need is a good LLM agent system along with tests and you can generate your equivalent of a given library... [Read more]

gulson 25 Mar 2026 14:00

In the world of AI, copyright no longer exists. The momentum is so fast that they are pulling in books, documentation, code as they fly without looking at the licences. I don't know where this will lead,... [Read more]

p.kaczmarek2 25 Mar 2026 14:17

And I am reminded of a related problem - researchers have reproduced through LLM the content of the book "Harry Potter" 96% true to the original: https://arxiv.org/abs/2601.02671 This kind of problem... [Read more]

p.kaczmarek2 04 Apr 2026 08:53

It's only been a few days since this topic was published, and a practical example has fallen into place to illustrate the problem I wrote about: The Claude Code leak from Anthropic and copyright... [Read more]

gulson 04 Apr 2026 15:11

Interesting times, but if they were stubborn, they could try to claim protection for the architecture itself - the idea. But that's very hard to do, because anyone can create a recreational wo... [Read more]

avatar 11 May 2026 14:41

As far as I know, the idea itself cannot be patented, only the 'implementation' / approach, and this is based on a case from the 19th century [Read more]

FAQ

TL;DR: For companies and open source maintainers, the flashpoint is a reported 96% Harry Potter reproduction and the claim that "all you need is a good LLM agent system" to rebuild libraries from docs. This FAQ explains Malus.sh, clean room rewriting, GPL/AGPL pressure, and why a satirical site still surfaces a real licensing risk. [#21870265]

Why it matters: If AI can rebuild a module from documentation and tests alone, open source licensing may shift from code reuse control to proof-of-independence disputes.

Approach                | Input used                        | License carry-over risk | Main trade-off
Direct GPL/AGPL reuse   | Original code                     | High                    | Fastest path, but obligations follow the code
Traditional clean room  | Docs, behavior, 2 separated teams | Lower                   | More legal structure, more work
AI-assisted clean room  | Docs, tests, LLM agent system     | Claimed lower           | Faster generation, but proof and copyright doubts remain

Key insight: Malus.sh reads as satire, but the thread treats its core premise seriously: AI may compress a decades-old clean room method into a cheaper workflow that tests the limits of open source licensing.

Quick Facts

  • Malus.sh is presented as a satirical platform for "AI clean room recreation" that rewrites open source software from documentation without using source code, aiming for commercially usable code free of original license restrictions. [#21870112]
  • The clean room model described in the thread has 2 stages and 2 separated teams: one studies documentation or public behavior, and the other implements from scratch. [#21870112]
  • One post claims that where clean room once required substantial work, a single good LLM agent system plus tests may now generate an equivalent library. [#21870240]
  • The sharpest copyright datapoint in the thread is a reported 96% reproduction of Harry Potter content by an LLM, cited as evidence that model outputs can track protected texts closely. [#21870265]

What is a clean room software development process, and how does it avoid copying the original source code?

A clean room process is a two-team software design method that avoids direct code copying by separating analysis from implementation. "Clean room" is a software-development approach that recreates functionality from documentation or observed behavior, while keeping implementers away from the original code to support legal independence. In the thread, it has 2 stages and 2 teams: one studies docs or behavior, and the other writes new code from scratch. That separation aims to prevent derivative copying. [#21870112]

How does Malus.sh describe its AI clean room recreation workflow for rewriting open source software from documentation alone?

Malus.sh describes a workflow where AI recreates open source software from documentation without access to the original code. The site presents the output as new code, free of licensing restrictions, and ready for commercial use. The post also says the service can actually "rewrite" an indicated module, even though its framing is exaggerated and satirical. That makes the workflow both a technical demo and a critique of current licensing tensions. [#21870112]

Why are the GNU GPL and AGPL sometimes called viral licenses, and what obligations do they impose on commercial software?

They are called "viral" because the thread describes them as licenses that can force downstream software to stay under the same terms. The post says GPL and AGPL create duties such as citing authors, making changes available, and, in the thread's wording, releasing software that uses such code under the same open terms. For commercial teams, the burden is not price but compliance. That is why Malus.sh frames clean room rewriting as a way around those obligations. [#21870112]

How much of Malus.sh is satire, and how likely is AI-based clean room rewriting to become a real licensing challenge for the open source community?

Malus.sh is largely satire in tone, but the thread treats the underlying method as technically and legally plausible. The opening post says the site was quickly recognized by the tech community as satire, yet also says clean room has existed for decades and the site can rewrite a module. That combination makes the risk credible: satire highlights the issue, while AI may lower the cost of doing it in practice. The likely challenge is not theory, but scale. [#21870112]

In what ways could LLM agent systems plus tests replace the traditional two-team clean room approach?

They could compress a manual 2-team process into an automated loop driven by prompts, generated code, and test feedback. One post says that what once required extensive work may now need only a good LLM agent system plus tests to generate an equivalent library. A normal code assistant suggests snippets on request; an agent system is implied to iterate toward a target autonomously. The trade-off is speed versus proof: automation may reduce labor, but it does not erase legal questions. [#21870240]
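The generate-test-revise loop described above can be sketched in a few lines. This is a hypothetical skeleton, not any real product's implementation: `generate_candidate` stands in for a model call, and `run_tests` for a real test runner; both names are invented for illustration.

```python
# Hypothetical sketch of an LLM-agent rewrite loop (illustrative only).
from typing import Callable, Optional

def rewrite_until_green(
    generate_candidate: Callable[[str], str],   # stand-in: feedback -> source code
    run_tests: Callable[[str], list],           # stand-in: returns failing-test messages
    max_rounds: int = 5,
) -> Optional[str]:
    """Iterate: generate code from the spec, run tests, feed failures back."""
    feedback = "Implement the library described in the public documentation."
    for _ in range(max_rounds):
        candidate = generate_candidate(feedback)
        failures = run_tests(candidate)
        if not failures:
            return candidate            # all behavior tests pass
        feedback = "Fix these failures:\n" + "\n".join(failures)
    return None                         # give up after max_rounds

# Toy usage with canned stand-ins instead of a real model and test runner:
attempts = iter(["def f(x): return x", "def f(x): return x * 2"])
result = rewrite_until_green(
    generate_candidate=lambda fb: next(attempts),
    run_tests=lambda src: [] if "* 2" in src else ["f(3) should be 6"],
)
print(result)  # the second candidate passes the canned test
```

The loop captures the thread's claim in miniature: the test suite replaces one of the two human teams as the arbiter of "equivalent behaviour", while the model replaces the other.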

What steps are involved in creating a legally independent replacement for a library using only public documentation and behavior tests?

The thread describes a 3-step path.
  1. Study only public documentation, system behavior, and public descriptions.
  2. Keep the implementation side separated from the original source code.
  3. Rebuild the library from scratch and check it against tests or observed behavior.
This mirrors the clean room idea described in the opening post. The legal goal is independence of implementation, not literal code similarity. Failure starts when developers or tools see the original code, because the separation claim weakens immediately. [#21870112]

How does clean room reimplementation compare with directly reusing GPL or AGPL code for a commercial product?

Clean room reimplementation aims to avoid inheriting the original code's license duties, while direct GPL or AGPL reuse accepts those duties. In the thread, direct reuse brings obligations like author attribution, sharing changes, and keeping dependent software under the same terms. Clean room instead tries to separate idea from implementation. For a commercial product, the comparison is simple: reuse is faster, but compliance follows the code; reimplementation is slower or more complex, but it seeks legal independence. [#21870112]

Why might AI-assisted code rewriting discourage some open source developers from publishing their work?

It could discourage publication because developers may fear that others will turn openly documented work into closed commercial equivalents. One post asks directly whether easier LLM-based clean room rewriting will discourage parts of the open source community from sharing solutions. Another reply frames the conflict as companies wanting to sell work they received for free. If that perception spreads, some authors may publish less code or less documentation, especially for reusable libraries. [#21870240]

What legal risks still remain if an AI recreates a library from documentation without seeing the original code?

Legal risk remains because documentation-derived output can still trigger copyright and independence disputes. The thread points to a reported 96% reproduction of Harry Potter by an LLM as a warning that model outputs can track protected works closely, even without obvious line-by-line copying. That raises a failure case: an AI rewrite may resemble protected expression too closely, or a claimant may argue the model encoded copyrighted material in its weights. Clean room reduces one risk, but it does not guarantee immunity. [#21870265]

How do documentation, public behavior, and test suites help prove that a rewritten module is independent of the original implementation?

They help by showing that developers targeted externally visible behavior rather than internal source code. The opening post says clean room uses documentation, system behavior, and public descriptions, while a later post adds tests as the practical check. Together, those artifacts document a path from public inputs to new code. That matters because independence claims depend on process evidence. If the rewrite matches behavior but not copied source, the argument for separate implementation becomes stronger. [#21870240]
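Since independence claims rest on process evidence, one practical habit is to log exactly which public inputs fed the rewrite. The sketch below is a hypothetical provenance record (all names invented); it shows the idea of hashing every allowed input so the trail can be audited later, not a legally vetted procedure.

```python
# Hypothetical provenance log for a clean room rewrite (illustrative only):
# hash each allowed input (docs, behaviour notes, tests) into an audit entry.
import datetime
import hashlib
import json

def record_inputs(inputs: dict) -> str:
    """Return a JSON audit entry with a SHA-256 digest of each public input."""
    entry = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "inputs": {name: hashlib.sha256(text.encode()).hexdigest()
                   for name, text in inputs.items()},
        # The claim this log exists to support:
        "original_source_consulted": False,
    }
    return json.dumps(entry, indent=2)

log = record_inputs({
    "api_docs.md": "The function upper(s) returns s with letters uppercased.",
    "behavior_tests.py": "assert upper('abc') == 'ABC'",
})
print(log)
```

A log like this does not prove independence by itself, but it turns "we only looked at the docs" from an assertion into a checkable record, which is the kind of evidence the paragraph above describes.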

What is an LLM agent system in the context of software rewriting, and how is it different from a normal code assistant?

An LLM agent system is an AI workflow that can pursue a rewriting goal across multiple steps, rather than just answering a single coding prompt. "LLM agent system" is an AI orchestration method that plans, generates, tests, and revises code toward a target library, with more autonomy than a normal code assistant. In the thread, the claimed stack is one good agent system plus tests. A normal assistant helps locally; an agent system is presented as handling the whole recreation loop. [#21870240]

Why are some people arguing that copyright is becoming harder to enforce in the AI era for books, documentation, and source code?

They argue this because AI training and generation move faster than traditional licensing control. One post states bluntly that in the AI world, copyright "no longer exists," then says books, documentation, and code are being pulled in without checking licenses. Another post adds the 96% Harry Potter reproduction example. Together, those claims suggest a practical enforcement gap: rights may still exist in law, but large-scale ingestion and close output similarity make them harder to police. [#21870252]

How does the reported 96% reproduction of Harry Potter by an LLM change the debate about model weights and copyrighted content?

It sharpens the debate by turning an abstract concern into a concrete number: 96%. The post says researchers reproduced Harry Potter content through an LLM with 96% fidelity to the original, then asks whether there is some limit to how much copyrighted text can be reflected in model weights. That shifts the question from simple training-data use to output recoverability. If highly similar text can reappear, critics can argue that weights may encode more copyrighted expression than developers admit. [#21870265]

What alternatives are available to companies that want to avoid open source license obligations without relying on AI-generated rewrites?

The thread presents one clear non-AI alternative: traditional clean room development. The opening post says this technique has been known for decades and separates idea from implementation through 2 stages and 2 teams. A company can also accept GPL or AGPL obligations instead of avoiding them, but that means sharing under the same terms. Within the thread's limits, the real choice is between compliance and a structured clean room process. AI is framed as an accelerator, not the only route. [#21870112]

Which kinds of software projects are most vulnerable to AI clean room reimplementation: small libraries, APIs, network services under AGPL, or full applications?

The thread points most strongly to libraries and modules as the easiest targets. Malus.sh is described as able to "rewrite" an indicated module, and a later post says an LLM agent system plus tests could generate an equivalent library. That makes small libraries and API-shaped components the most exposed in this discussion, because public documentation and behavior are easier to specify there. Full applications may also be affected, but the thread does not provide the same concrete workflow for them. [#21870112]
Generated by the language model.