Samson's Machete

Recon the Hydra Cave: a development note of Rust on GCC (part 1)

Posted by NalaGinrut 28 April 2020 8:38 PM

Rust is a strange language.

It has a modern and promising design, and performance comparable to C++. It has proven to be a viable alternative systems language that can be used to write an OS; although there's no production-level Rust OS yet, there are several implementations that are potential players. Expectations for it are very high, it's trying every trick in the bag to grow, and it's gaining popularity among young developers... all of this is very good for a novel and healthy community.

On the other hand, this language doesn't seem to know where it's going. No, I'm not talking about its dream goal; of course I know its goal is explicitly stated: safety, speed, and concurrency. What confuses me is the path it chooses toward that goal. The Rust community chose an iterative model rather than a waterfall model to develop the language. That is to say, the code and interfaces keep changing while the design is still not settled. Although this is the popular development model for business software in today's internet industry, I doubt whether it's the best way to develop a foundational programming language that needs foresight. However, it's too early to say whether this development model is bad for today's industry. We can see many companies trying to use Rust even though its design is still unstable. In the past, taking such a risk in a serious product would have been insane. But today's industry iterates rapidly; maybe our understanding is outdated, and maybe Rust's model is a brand new way to improve the industry. People may be able to adapt to fast-changing things even when it's their fundamental language, and there could be stronger automated tools to rely on. Who knows? Let's give it a try, warriors, the boss pays for it.

This article is about the Rust front-end on GCC. It's reasonable for me (or any potential contributor) to criticize Rust's development model, since it interferes with the efforts to build a GCC front-end for it.

So far, the only reliable implementation is the officially supported front-end on LLVM. I don't know if there are others, but I wouldn't choose one that isn't based on a mainstream compiler infrastructure. After all, it's a systems-level language; once you choose it, you don't expect to change it for a decade. In my experience leading software development over the past decade, it's not simple to change fundamental things, from libraries to compilers to languages. It's not impossible, but why waste time on it when you have lots of other work to deal with?

Stop, this is not an article criticizing Rust, so I'm not going to criticize it any further.

OK, that's enough. Now that we're working on the Rust front-end for GCC, it's fair to say we're interested in this language, and we hope to promote it. As a perennial GNU hacker, it's natural for me to think about Rust on GCC. I have some experience writing language front-ends; the architecture is similar, and so is the theory. However, gcc-rust is a very hard project that can't be done successfully as a one-person effort. In this article, I'll introduce the ideas and efforts around it in the FOSS community. I hope this information helps people understand the situation, and helps potential contributors take part.

Of course, Rust on LLVM has already come a long way, but what else can I say?

The Hydra Cave

Rust on GCC is a Hydra. Why? Because there are at least 3 possible ways to go. Oh well, three heads of a dragon.

These 3 ways were originally described by @redbrain, one of the enthusiasts of Rust on GCC (https://github.com/redbrain/gccrs/issues/2#issuecomment-563255365). Unfortunately, he has since dropped his efforts, but his work on Rust on GCC is still inspiring.

The first way is the most lightweight. The original Rust compiler provides an IR (Intermediate Representation) named MIR, so the idea is really simple: we reuse the front-end of Rust on LLVM, then convert MIR to GENERIC, which is the general IR in GCC. The advantage is that we don't have to deal with the occasional changes to Rust's grammar. This idea is achievable, since the original Rust compiler already uses a similar architecture to convert MIR to LLVM-IR. The only problem is that this Rust on GCC front-end requires the original rustc, which may bring in the LLVM dependency chain. Then people may wonder why we still bother to implement Rust on GCC: it doesn't reduce the dependencies of the development environment, and it's unlikely to be widely accepted by GCC folks. Not to mention the license issue.
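To make "convert MIR to GENERIC" a bit more concrete, here is a toy sketch in Rust. Both IRs are hypothetical, heavily simplified stand-ins: `Assign`, `Rvalue` and `Operand` only loosely imitate the shape of a MIR statement such as `_0 = Add(_1, _2)`, and the `Tree` variants are merely named after GENERIC tree codes like `VAR_DECL`, `PLUS_EXPR` and `MODIFY_EXPR`. Real GCC trees are built in C/C++ inside GCC, not as a Rust enum.

```rust
// Toy, hypothetical subset of MIR: each statement assigns an rvalue to a local `_N`.
#[derive(Debug, Clone, Copy)]
enum BinOp {
    Add,
    Sub,
    Mul,
}

#[derive(Debug)]
enum Operand {
    Local(usize), // `_N` in MIR notation
    Const(u64),
}

#[derive(Debug)]
enum Rvalue {
    Use(Operand),
    BinaryOp(BinOp, Operand, Operand),
}

#[derive(Debug)]
struct Assign {
    dest: usize,
    value: Rvalue,
}

// Toy tree IR whose variants are named after GENERIC tree codes.
#[derive(Debug, PartialEq)]
enum Tree {
    VarDecl(String),
    IntegerCst(u64),
    PlusExpr(Box<Tree>, Box<Tree>),
    MinusExpr(Box<Tree>, Box<Tree>),
    MultExpr(Box<Tree>, Box<Tree>),
    ModifyExpr(Box<Tree>, Box<Tree>),
}

fn lower_operand(op: &Operand) -> Tree {
    match op {
        Operand::Local(n) => Tree::VarDecl(format!("_{}", n)),
        Operand::Const(c) => Tree::IntegerCst(*c),
    }
}

fn lower_rvalue(rv: &Rvalue) -> Tree {
    match rv {
        Rvalue::Use(op) => lower_operand(op),
        Rvalue::BinaryOp(op, a, b) => {
            let (l, r) = (Box::new(lower_operand(a)), Box::new(lower_operand(b)));
            match op {
                BinOp::Add => Tree::PlusExpr(l, r),
                BinOp::Sub => Tree::MinusExpr(l, r),
                BinOp::Mul => Tree::MultExpr(l, r),
            }
        }
    }
}

/// `_dest = rvalue` becomes an assignment tree: MODIFY_EXPR(VAR_DECL, <value>).
fn lower_assign(stmt: &Assign) -> Tree {
    Tree::ModifyExpr(
        Box::new(Tree::VarDecl(format!("_{}", stmt.dest))),
        Box::new(lower_rvalue(&stmt.value)),
    )
}
```

Since MIR is already an explicit, simplified form of the program, this part is mostly a structural walk; the hard parts (types, drops, panics) are left out of the sketch.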

The second way is to use the GCC JIT interface on top of the original Rust compiler. This architecture means adding a new backend (libgccjit https://gcc.gnu.org/onlinedocs/jit/cp/index.html) to the original Rust compiler. So it's not Rust on GCC as most people expect; it's a feature request to the Rust developers to support yet another compiler backend, with the code maintained by the Rust community. This is the perfect approach, but only if the Rust developers accept it. However, the hard part would be the license issue, and it all depends on the willingness of the Rust community.
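The architectural point of this approach is that the front-end stays the same and only the code generator is swapped. A minimal sketch of such a boundary, assuming a hypothetical `Backend` trait (rustc's actual backend interface is much larger and is not reproduced here). The libgccjit side only formats placeholder text; a real backend would call libgccjit, for example through FFI.

```rust
/// Hypothetical, minimal backend interface: the front-end hands over
/// already-checked operations, and each backend decides how to emit them.
trait Backend {
    fn name(&self) -> &'static str;
    /// Emit code for `dest = lhs + rhs` on 32-bit integers.
    fn emit_add(&self, dest: &str, lhs: &str, rhs: &str) -> String;
}

struct LlvmTextBackend;
struct GccJitBackend;

impl Backend for LlvmTextBackend {
    fn name(&self) -> &'static str {
        "llvm"
    }
    fn emit_add(&self, dest: &str, lhs: &str, rhs: &str) -> String {
        // Textual LLVM IR for an i32 addition.
        format!("%{} = add i32 %{}, %{}", dest, lhs, rhs)
    }
}

impl Backend for GccJitBackend {
    fn name(&self) -> &'static str {
        "gccjit"
    }
    fn emit_add(&self, dest: &str, lhs: &str, rhs: &str) -> String {
        // Placeholder text only; no libgccjit call is made here.
        format!("{} = GCC_JIT_BINARY_OP_PLUS({}, {})", dest, lhs, rhs)
    }
}

/// The "front-end" side is identical for every backend.
fn codegen_sum(backend: &dyn Backend) -> String {
    backend.emit_add("r", "x", "y")
}
```

The license and maintenance questions above are exactly about who owns the second `impl`.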

The third way is the traditional one: a full C++ implementation of Rust on GCC. There's a good existing example, gccgo, the Go language front-end on GCC. It's the hardest way, since it means building a Rust compiler from scratch. The advantage is that the developers can do whatever they like to build an efficient Rust compiler that challenges the original one, and it doesn't require LLVM as a prerequisite. Finally, there's no license issue at all; we can make it GPLv3+. However, there must be a strong team to maintain this front-end. Gccgo has a strong team hired by Google. Not to mention that you have to keep up with the volatile Rust grammar periodically.

All these approaches are independent of each other, so it's impossible to combine all 3 forces. The MIR way is being worked on by @sapir (https://github.com/sapir/gcc-rust/tree/rust) and is still in progress; the libgccjit way requires a connection to the Rust community that is out of my reach, and I don't know who is working on it; as a GNU hacker, my preferred way is the third one. The third way was dropped by @redbrain, but he left some inspiring code.

What have we got so far?

My plan is not to work on it secretly and then suddenly throw out a release to shock people. I have too many projects on my TODO list; I do want to have gcc-rust, and I do want more friends working on it. Let me emphasize it again: it's impossible to do this as a one-person effort.

Fortunately, @SimplyTheOther has written a fully operational lexer and parser for a recent Rust syntax. It's great work. I have enough experience writing parsers for practical languages to know that it's time-consuming work. That's why I appreciate @SimplyTheOther's work so much. Of course, I agree with @SimplyTheOther that the most time-consuming part will be type checking; at least for Rust, I'm afraid that's true.
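For readers who haven't written one, here is a toy lexer in Rust that covers just the tokens of the `abc` function in the test program. It is a hypothetical illustration, not @SimplyTheOther's lexer (which is C++ inside GCC); real Rust lexing also has to deal with lifetimes, raw strings, nested comments, and much more.

```rust
/// Tokens for a tiny, hypothetical subset of Rust.
#[derive(Debug, PartialEq)]
enum Token {
    Fn,
    Return,
    Ident(String),
    Int(u64),
    Arrow,       // `->`
    Punct(char), // one of ( ) { } , ; : + -
}

/// Splits `src` into tokens, or reports the first character it cannot handle.
fn lex(src: &str) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = src.chars().collect();
    let mut out = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if c.is_ascii_alphabetic() || c == '_' {
            // Identifier or keyword: take the longest run of [A-Za-z0-9_].
            let start = i;
            while i < chars.len() && (chars[i].is_ascii_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            let word: String = chars[start..i].iter().collect();
            out.push(if word == "fn" {
                Token::Fn
            } else if word == "return" {
                Token::Return
            } else {
                Token::Ident(word)
            });
        } else if c.is_ascii_digit() {
            let start = i;
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            let text: String = chars[start..i].iter().collect();
            out.push(Token::Int(text.parse::<u64>().map_err(|e| e.to_string())?));
        } else if c == '-' && chars.get(i + 1) == Some(&'>') {
            // Check the two-character `->` before the single-character `-`.
            out.push(Token::Arrow);
            i += 2;
        } else if "(){},;:+-".contains(c) {
            out.push(Token::Punct(c));
            i += 1;
        } else {
            return Err(format!("unexpected character {:?} at position {}", c, i));
        }
    }
    Ok(out)
}
```

Note that `u32` comes out as an ordinary identifier here; deciding that it names a primitive type is left to later phases.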

After helping to fix some tiny bugs to make it work smoothly, I started to write an enhanced AST dumper. This makes sense for us: first, I need to get familiar with @SimplyTheOther's AST design; second, we need to output a human-readable AST for debugging.

Let me show you a silly example:

fn abc(x: u32, y: u32) -> u32 {
    return x + y;
}

fn main() {
    {
        1 + 1;
    }

    println!("Hello World!");
    abc(1, 1);
}

Alright, it's silly, meaningless code; let's see what the AST dump looks like.

Assuming you saved the source code to test.rs, run the AST dump:

rust1 -frust-dump-parse test.rs

The output could be:

Crate:
 inner attributes: none
 items:

u32 abc(x : u32, y : u32)
BlockExpr:
{
 outer attributes: none
 inner attributes: none
 statements:
 ExprStmtWithoutBlock:
  return ArithmeticOrLogicalExpr: x + y
 final expression: none
}

void main()
BlockExpr:
{
 outer attributes: none
 inner attributes: none
 statements:
  ExprStmtWithBlock:
   BlockExpr:
   {
    outer attributes: none
    inner attributes: none
    statements:
    ExprStmtWithoutBlock:
     ArithmeticOrLogicalExpr: 1 + 1
    final expression: none
   }
 ExprStmtWithoutBlock:
  println!("Hello World!")
 ExprStmtWithoutBlock:
  outer attributes: none
   StructExpr:
    PathInExpr:
    abc(1, 1)
    inner attributes:none
 final expression: none
}

This is good for debugging or extending the parser. After all, if we can't make sure the parser is correct, how can we move forward to type checking and IR transformation?
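One cheap way to gain confidence in a parser, in the spirit of the dump above, is a round-trip check: parse, print, re-parse, and compare the trees. Below is a toy sketch in Rust for `+` expressions only. `Expr`, `parse_expr`, `show`, `dump` and `round_trips` are hypothetical names, splitting on `+` is a shortcut that only works for this tiny grammar, and the dump string merely imitates the `ArithmeticOrLogicalExpr: x + y` line above.

```rust
/// Toy AST: paths, integer literals, and left-associative addition.
#[derive(Debug, PartialEq)]
enum Expr {
    Path(String),
    Lit(u64),
    Add(Box<Expr>, Box<Expr>),
}

fn parse_term(text: &str) -> Result<Expr, String> {
    let t = text.trim();
    let is_ident = t.starts_with(|c: char| c.is_ascii_alphabetic() || c == '_')
        && t.chars().all(|c| c.is_ascii_alphanumeric() || c == '_');
    if let Ok(n) = t.parse::<u64>() {
        Ok(Expr::Lit(n))
    } else if is_ident {
        Ok(Expr::Path(t.to_string()))
    } else {
        Err(format!("cannot parse term {:?}", t))
    }
}

/// Parses `term ('+' term)*`, building the tree left-associatively.
fn parse_expr(src: &str) -> Result<Expr, String> {
    let mut parts = src.split('+');
    // `split` always yields at least one piece, so this unwrap cannot fail.
    let mut lhs = parse_term(parts.next().unwrap())?;
    for part in parts {
        lhs = Expr::Add(Box::new(lhs), Box::new(parse_term(part)?));
    }
    Ok(lhs)
}

/// Prints an expression back as source text, e.g. `x + y`.
fn show(e: &Expr) -> String {
    match e {
        Expr::Path(p) => p.clone(),
        Expr::Lit(n) => n.to_string(),
        Expr::Add(a, b) => format!("{} + {}", show(a), show(b)),
    }
}

/// Imitates the dump line for arithmetic expressions.
fn dump(e: &Expr) -> String {
    match e {
        Expr::Add(..) => format!("ArithmeticOrLogicalExpr: {}", show(e)),
        _ => show(e),
    }
}

/// Round trip: printing and re-parsing must give back the same tree.
fn round_trips(src: &str) -> bool {
    match parse_expr(src) {
        Ok(e) => parse_expr(&show(&e)).map_or(false, |again| again == e),
        Err(_) => false,
    }
}
```

Running such checks over a corpus of real Rust files is a cheap way to catch parser regressions before type checking even exists.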

The project repo is here, patches are welcome: https://github.com/philberty/gccrs

What's next?

Before answering this question, I guess you may wonder why I used `rust1` rather than a traditional name like `gccrust` or something else.

`rust1` is the Rust compiler proper, while `gccrust` is a driver that calls the compiler, assembler, and linker as a chain. We haven't done the IR transformation yet, so nothing can be assembled and linked; all we get is the AST. Oh my poor Rust compiler!
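The split between the compiler proper and the driver can be sketched as a plan of commands. This is a hypothetical illustration: the `-o` flags, the intermediate `.s`/`.o` names, and invoking `ld` directly are simplifying assumptions (a real GCC driver passes many more options, start-up files and libraries), and nothing is executed.

```rust
/// One step of a hypothetical compile pipeline: a program plus its arguments.
#[derive(Debug, PartialEq)]
struct Step {
    program: String,
    args: Vec<String>,
}

fn step(program: &str, args: &[&str]) -> Step {
    Step {
        program: program.to_string(),
        args: args.iter().map(|a| a.to_string()).collect(),
    }
}

/// Plans the chain a driver like `gccrust` would run; it only builds the list.
fn plan(src: &str, exe: &str) -> Vec<Step> {
    let stem = src.strip_suffix(".rs").unwrap_or(src);
    let asm = format!("{}.s", stem);
    let obj = format!("{}.o", stem);
    vec![
        // Compiler proper: source -> assembly (the only part that exists so far).
        step("rust1", &[src, "-o", asm.as_str()]),
        // Assembler: assembly -> object file.
        step("as", &[asm.as_str(), "-o", obj.as_str()]),
        // Linker: object file(s) -> executable.
        step("ld", &[obj.as_str(), "-o", exe]),
    ]
}
```

Until `rust1` emits assembly, only the first step has anything to do, which is why the post calls `rust1` directly.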

It's not so hard to transform the AST to GENERIC, which is the current top-level IR of GCC. However, it's meaningless to rush into it before we've made sure the parser works correctly for the Rust grammar. Not to mention Rust's critical feature, its linear (strictly speaking, affine) type system for ownership.
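To give a feel for what ownership checking involves, here is a toy use-after-move checker in Rust over a hypothetical three-statement language. It tracks which variables still own a value and rejects reading one after it has been moved out of. It deliberately ignores `Copy` types, borrows, and control flow; rustc does the real analysis on MIR with dataflow.

```rust
use std::collections::HashSet;

/// Hypothetical, heavily simplified statements.
enum Stmt {
    Let(&'static str),                // introduce a fresh owned value
    Move(&'static str, &'static str), // `let dst = src;` moves out of `src`
    Use(&'static str),                // read a variable
}

/// Returns an error at the first use of a variable that no longer owns a value.
fn check_moves(body: &[Stmt]) -> Result<(), String> {
    // Variables that currently own a value.
    let mut live: HashSet<&str> = HashSet::new();
    for stmt in body {
        match *stmt {
            Stmt::Let(v) => {
                live.insert(v);
            }
            Stmt::Move(dst, src) => {
                // Moving out of `src` ends its ownership; `dst` becomes the owner.
                if !live.remove(src) {
                    return Err(format!("`{}` is not available (moved or never declared)", src));
                }
                live.insert(dst);
            }
            Stmt::Use(v) => {
                if !live.contains(v) {
                    return Err(format!("`{}` is not available (moved or never declared)", v));
                }
            }
        }
    }
    Ok(())
}
```

Even this toy shows why the check belongs to a dedicated analysis rather than the parser: the error depends on the order of statements, not on their syntax.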

If you understand what I've said so far, I can answer the question "what comes next?".

1. The type system, too big to talk about here.

2. The static analysis.

One of the advantages of llvm-rust was that it used LLVM's static analysis. Fortunately, David Malcolm recently finished his GCC static analysis framework (https://lwn.net/Articles/806099/). I'm not sure whether there will be any hook for a front-end to interact with it, or whether we can't touch it anymore once we hand control to the next-level IR. Anyway, it's an exciting GCC feature to research.

3. A new middle IR for Rust-specific pre-optimization.

This idea is largely inspired by gccgo, which inserts a specifically designed IR for the Go language before transforming it to GENERIC. This IR was named GOGO, and gccgo does some pre-optimization on it. From a compiler design perspective, it's reasonable to do so, since some information may be lost after lowering to the next level, and it can never be recovered for certain optimizations.
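As an illustration of a pre-optimization done while the IR still speaks Rust, here is a toy constant folder over a hypothetical `MidIr` enum, which would fold the `1+1` from the test program. It only folds a `u32` addition when `checked_add` says the result fits, leaving overflowing expressions untouched for a later stage to diagnose; that policy is an assumption of this sketch, not a description of gccgo's or rustc's actual passes.

```rust
/// Hypothetical Rust-level middle IR: still knows the operands are `u32`.
#[derive(Debug, PartialEq, Clone)]
enum MidIr {
    Int(u32),
    Var(String),
    Add(Box<MidIr>, Box<MidIr>),
}

/// Folds additions of two constants bottom-up.
fn fold(e: MidIr) -> MidIr {
    match e {
        MidIr::Add(a, b) => match (fold(*a), fold(*b)) {
            // Fold only when the sum is representable in u32.
            (MidIr::Int(x), MidIr::Int(y)) if x.checked_add(y).is_some() => MidIr::Int(x + y),
            // Otherwise keep the (already folded) operands.
            (a, b) => MidIr::Add(Box::new(a), Box::new(b)),
        },
        other => other,
    }
}
```

Knowing that the operands are `u32` with Rust's overflow rules is exactly the kind of information that gets harder to see after lowering to a generic IR.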

4. AST optimization/refactoring.

There could be caveats or limitations in the current AST. So far I've found just one: it lacks a printing record, so we can't dump the AST with pretty indentation. I've implemented the indentation with a global variable. I don't like it, but it's safe since there's no threading in AST dumping.
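A common alternative to a global indentation counter is to thread the depth through the recursive dump calls (or keep it inside a printer object). A minimal sketch in Rust over a hypothetical two-variant AST; the real dumper is C++ and its classes differ.

```rust
/// Hypothetical tiny AST: nested blocks of statements.
enum Node {
    Block(Vec<Node>),
    Stmt(String),
}

/// Dumps `node` with two spaces per nesting level. The depth is a parameter,
/// so no global indentation state is needed and concurrent dumps can't interfere.
fn dump(node: &Node, depth: usize, out: &mut String) {
    let pad = "  ".repeat(depth);
    match node {
        Node::Stmt(text) => {
            out.push_str(&pad);
            out.push_str(text);
            out.push('\n');
        }
        Node::Block(children) => {
            out.push_str(&pad);
            out.push_str("{\n");
            for child in children {
                dump(child, depth + 1, out);
            }
            out.push_str(&pad);
            out.push_str("}\n");
        }
    }
}
```

For the body of `main` (an inner block holding `1 + 1;`, then `abc(1, 1);`), this yields the nested layout of the dump above without any shared state.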

5. AST to GENERIC/GIMPLE. Nothing to say.

6. Memory management model.

I haven't thought about it deeply, but gccgo has been optimizing its memory footprint for many years and finally got great results. So I don't think it will be easy work for us.

7. Exception throw and restore.

One of Rust's core ideas is "safety", which ordinary users can easily understand from the manual and the advertising. However, the real work is behind the syntax; if we fail to do it correctly in the compiler, no one can guarantee the safety. After all, the grammar and feature descriptions are just a piece of paper, not the magic itself.
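In Rust terms, much of "exception throw and restore" is panic unwinding: while a panic propagates, the destructors (`Drop`) of live values in every frame it unwinds through must run, and `std::panic::catch_unwind` can stop the unwinding. The snippet below is ordinary Rust showing the behaviour a GCC front-end would have to reproduce. `Guard`, `inner` and `run` are made-up names, and it assumes the default `panic = "unwind"` strategy (with `panic = "abort"` the process would simply abort). The default panic hook still prints the panic message to stderr.

```rust
use std::panic;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many `Guard` values have been dropped.
static DROPPED: AtomicUsize = AtomicUsize::new(0);

struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        DROPPED.fetch_add(1, Ordering::SeqCst);
    }
}

fn inner() {
    let _g = Guard; // must be dropped while the panic unwinds through this frame
    panic!("boom");
}

/// Runs `inner` under `catch_unwind`; returns (did it panic?, guards dropped).
fn run() -> (bool, usize) {
    let before = DROPPED.load(Ordering::SeqCst);
    let result = panic::catch_unwind(|| {
        let _outer = Guard; // dropped when unwinding leaves this closure
        inner();
    });
    (result.is_err(), DROPPED.load(Ordering::SeqCst) - before)
}
```

Calling `run()` reports that a panic happened and that both guards were dropped; a compiler that skipped the cleanup on the unwinding path would silently break that guarantee.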

8. Runtime, standard library, package support, compatibility, etc ...

Hey, we're a small community with several core developers, so this is just my personal opinion, but I think these are things that will get done sooner or later.

Happy hacking!