
Native Parser

Compared to JavaScript, native Rust code has a performance advantage in algorithm execution. Rollup therefore decided to switch from the JavaScript-based Acorn parser to the Rust-based SWC parser, which can parse complex ASTs efficiently. This is a core change in Rollup v4.

Challenges

Native Interaction

Using SWC's JavaScript bindings directly, i.e. parsing complex ASTs through the swc.parse JavaScript interface, would incur significant communication overhead.

ts
import swc from '@swc/core';

const code = `
  const a = 1;
  function add(a, b) {
    return a + b;
  }
`;
swc
  .parse(code, {
    syntax: 'ecmascript',
    comments: false,
    script: true,
    target: 'es3',
    isModule: false
  })
  .then(module => {
    module.type; // file type
    module.body; // AST
  });

Reading SWC's source code shows that SWC internally uses the serde_json library to serialize the parsed program object into a JSON string, which is then passed to the JavaScript side.

rust
#[napi]
impl Task for ParseTask {
  type JsValue = String;
  type Output = String;

  fn compute(&mut self) -> napi::Result<Self::Output> {
    let options: ParseOptions = deserialize_json(&self.options)?;
    let fm = self
      .c
      .cm
      .new_source_file(self.filename.clone().into(), self.src.clone());

    let comments = if options.comments {
      Some(self.c.comments() as &dyn Comments)
    } else {
      None
    };

    let program = try_with(self.c.cm.clone(), false, ErrorFormat::Normal, |handler| {
      let mut p = self.c.parse_js(
        fm,
        handler,
        options.target,
        options.syntax,
        options.is_module,
        comments,
      )?;

      p.visit_mut_with(&mut resolver(
        Mark::new(),
        Mark::new(),
        options.syntax.typescript(),
      ));

      Ok(p)
    })
    .convert_err()?;

    let ast_json = serde_json::to_string(&program)?;

    Ok(ast_json)
  }

  fn resolve(&mut self, _env: Env, result: Self::Output) -> napi::Result<Self::JsValue> {
    Ok(result)
  }
}

The JavaScript side then deserializes the AST string returned by the native parser into a JavaScript object via JSON.parse.

ts
class Compiler {
  async parse(
    src: string,
    options?: ParseOptions,
    filename?: string
  ): Promise<Program> {
    options = options || { syntax: 'ecmascript' };
    options.syntax = options.syntax || 'ecmascript';

    if (!bindings && !!fallbackBindings) {
      throw new Error(
        'Fallback bindings does not support this interface yet.'
      );
    } else if (!bindings) {
      throw new Error('Bindings not found.');
    }

    if (bindings) {
      const res = await bindings.parse(src, toBuffer(options), filename);
      return JSON.parse(res);
    } else if (fallbackBindings) {
      return fallbackBindings.parse(src, options);
    }
    throw new Error('Bindings not found.');
  }
}

Repeatedly serializing the AST on the Rust side and deserializing it on the JavaScript side would almost completely erode the performance advantage of switching to a native (Rust) parser when handling complex ASTs.
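A toy sketch of that overhead (illustrative only, not Rollup's code): serializing and re-parsing even a moderately deep AST-shaped object pays the full JSON cost on every call.

```js
// Hypothetical illustration: build an AST-like tree, then round-trip it
// through JSON, as the naive SWC JavaScript binding path effectively does.
function makeNode(depth) {
  if (depth === 0) {
    return { type: 'Identifier', name: 'x', start: 0, end: 1 };
  }
  return {
    type: 'BinaryExpression',
    left: makeNode(depth - 1),
    right: makeNode(depth - 1)
  };
}

const ast = makeNode(15); // ~65k nodes
const serialized = JSON.stringify(ast); // Rust side: serde_json::to_string
const roundTripped = JSON.parse(serialized); // JS side: JSON.parse
console.log(serialized.length, roundTripped.type);
```

Both steps are linear in the size of the AST, so the cost grows with every module parsed.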

AST Compatibility

SWC defines its own AST structure on the Rust side, while Rollup depends on the standard ESTree AST. Since the two structures differ, compatibility processing is needed.

It is worth noting that SWC provides the swc_estree_compat compatibility layer, which can emit parsed output in both Babel AST and ESTree AST structures, but there are still performance issues:

Nearly, but it would be very slow at the moment because JSON.parse of a large AST is very slow

File Encoding

SWC uses UTF-8 encoding, while Rollup depends on standard JavaScript's UTF-16 encoding.

Differences between UTF-8 and UTF-16

UTF-8:

Variable Length Encoding:

UTF-8 uses 1 ~ 4 bytes to represent a character. ASCII characters (such as English letters and numbers) use 1 byte, while other characters (such as Chinese characters) may use 2 ~ 4 bytes.

  • 1 byte: ASCII characters (U+0000 to U+007F).
  • 2 bytes: Extended Latin characters (U+0080 to U+07FF).
  • 3 bytes: Basic Multilingual Plane (BMP) characters (U+0800 to U+FFFF).
  • 4 bytes: Supplementary Plane characters (U+10000 to U+10FFFF).

Backward Compatible with ASCII:

Since ASCII characters only occupy 1 byte in UTF-8, UTF-8 is fully compatible with ASCII encoding.

Encoding Efficiency:

  • High efficiency for English and ASCII text (1 byte per character).
  • For non-Latin characters (such as Chinese, Japanese, etc.), typically requires 3 bytes.
  • For supplementary plane characters (such as emojis), requires 4 bytes.

Use Cases:

  • More suitable for network transmission and storage, especially for text primarily in ASCII.
  • Commonly used in web pages, JSON files, and other scenarios.

UTF-16:

Fixed or Variable Length Encoding:

UTF-16 typically uses 2 bytes to represent most commonly used characters, but for certain special characters (such as emojis), it may require 4 bytes.

  • 2 bytes: Characters within the BMP range (U+0000 to U+FFFF, excluding surrogate pairs).
  • 4 bytes: Characters beyond the BMP (U+10000 to U+10FFFF), using two 16-bit units (called surrogate pairs).

Not Compatible with ASCII:

UTF-16 is not compatible with ASCII, because ASCII characters require 2 bytes in UTF-16. However, in both UTF-8 and UTF-16, each ASCII character occupies exactly one code unit.

Encoding Efficiency:

  • High efficiency for characters within the BMP range (such as most Chinese, Japanese) (2 bytes per character).
  • Low efficiency for ASCII characters (2 bytes per character).
  • Similar efficiency to UTF-8 for supplementary plane characters (requires 4 bytes).

Use Cases:

  • More suitable for memory operations, especially in scenarios primarily using BMP range characters (such as Chinese environments).
  • Commonly used in internal character representation for Windows, JavaScript, and Java.

Taking the string A你 as an example, the encoding results for the two methods are as follows:

UTF-8 Encoding:

"A": 1 byte, encoded as 0x41

"你": 3 bytes, encoded as 0xE4BDA0

UTF-16 Encoding:

"A": 2 bytes, encoded as 0x0041

"你": 2 bytes, encoded as 0x4F60

SWC (Rust) uses byte offsets. In other words, when calculating the position offset of "A你" (quotes included), the byte-count method is used:

"A你": "(1) + A(1) + 你(3) + "(1) = 6 bytes (in Rust, the string content alone gives "A你".len() = 4).

The position information recorded in the SWC AST is as follows:

SWC Abstract Syntax Tree
json
{
  "type": "Module",
  "span": {
    "start": 0,
    "end": 6,
    "ctxt": 0
  },
  "body": [
    {
      "type": "ExpressionStatement",
      "span": {
        "start": 0,
        "end": 6,
        "ctxt": 0
      },
      "expression": {
        "type": "StringLiteral",
        "span": {
          "start": 0,
          "end": 6,
          "ctxt": 0
        },
        "value": "A你",
        "hasEscape": false,
        "kind": {
          "type": "normal",
          "containsQuote": true
        }
      }
    }
  ],
  "interpreter": null
}

JavaScript uses character offsets based on the UTF-16 encoding model, in which every character consists of one or two 2-byte code units. For JavaScript, the basic unit of a string is the 2-byte code unit, so a supplementary-plane character (such as an emoji) occupies 4 bytes and counts as 2 units of length.

When calculating the position offset of "A你", the code-unit count is used:

"A你": "(1) + A(1) + 你(1) + "(1) = 4 units (in JavaScript, the string content alone gives "A你".length = 2).

The position information recorded in the ESTree AST is as follows:

ESTree Abstract Syntax Tree
json
{
  "type": "Program",
  "start": 0,
  "end": 4,
  "body": [
    {
      "type": "ExpressionStatement",
      "start": 0,
      "end": 4,
      "expression": {
        "type": "Literal",
        "start": 0,
        "end": 4,
        "value": "A你",
        "raw": "\"A你\""
      },
      "directive": "A你"
    }
  ],
  "sourceType": "module"
}

Summary

The phenomenon described above is precisely the root cause of the divergence between character offsets (ESTree) and byte offsets (SWC).

ESTree / Babel / Acorn (Character Offsets):

Follows JavaScript's String.length logic.

Counts the number of UTF-16 code units.

"你好": "(1) + 你(1) + 好(1) + "(1) = 4 units (i.e., length 4).


"👍": "(1) + 👍(2) + "(1) = 4 units (i.e., length 4).

JavaScript (and ESTree) counts lengths and offsets in 2-byte UTF-16 code units, so a 4-byte emoji contributes 2 to the length.

SWC (Byte Offsets):

Counts the number of bytes in the source file (typically UTF-8 encoded).

"你好": "(1) + 你(3) + 好(3) + "(1) = 8 bytes (i.e., length 8).


"👍": "(1) + 👍(4) + "(1) = 6 bytes (i.e., length 6).

The SourceMap chapter details how Rollup generates sourcemaps internally; Rollup relies on the position information in the ESTree AST to place mapping markers.

ts
export class NodeBase extends ExpressionEntity implements ExpressionNode {
  /**
   * Override to perform special initialisation steps after the scope is
   * initialised
   */
  initialise(): void {
    this.scope.context.magicString.addSourcemapLocation(this.start);
    this.scope.context.magicString.addSourcemapLocation(this.end);
  }
}

Therefore, because the native language (Rust) and JavaScript use different encodings, the position information in their ASTs is inconsistent. Rollup must adjust the SWC AST positions on the Rust side so they conform to JavaScript's character-offset (UTF-16 code unit) model.
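The adjustment amounts to a single forward scan over the source: track the byte index and the code-unit index together, and map one onto the other. A minimal JavaScript sketch of that idea (the real converter additionally handles annotation comments and assumes ascending queries):

```js
// Simplified sketch: convert a UTF-8 byte offset into a UTF-16 code-unit
// offset by walking the source once. This version rescans from the start
// on every call; the production converter keeps a running cursor instead.
function utf8ToUtf16Offset(source, utf8Index) {
  let utf8 = 0;
  let utf16 = 0;
  for (const ch of source) { // iterates by Unicode code point
    if (utf8 >= utf8Index) break;
    utf8 += Buffer.byteLength(ch, 'utf8'); // 1-4 bytes per code point
    utf16 += ch.length; // 1 unit for BMP, 2 for surrogate pairs
  }
  return utf16;
}

console.log(utf8ToUtf16Offset('你好', 6)); // 2
console.log(utf8ToUtf16Offset('A👍', 5)); // 3
```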

Performance

Optimize AST Compatibility

On the Rust side, Rollup first leverages SWC to parse the code into a SWC AST:

rust
use swc_compiler_base::parse_js;

pub fn parse_ast(code: String, allow_return_outside_function: bool, jsx: bool) -> Vec<u8> {
  // other code omitted
  GLOBALS.set(&Globals::default(), || {
    let result = catch_unwind(AssertUnwindSafe(|| {
      let result = try_with_handler(&code_reference, |handler| {
        parse_js(
          cm,
          file,
          handler,
          target,
          syntax,
          IsModule::Unknown,
          Some(&comments),
        )
      });
      match result {
        Err(buffer) => buffer,
        Ok(program) => {
          let annotations = comments.take_annotations();
          let converter = AstConverter::new(&code_reference, &annotations);
          converter.convert_ast_to_buffer(&program)
        }
      }
    }));
    result.unwrap_or_else(|err| {
      let msg = if let Some(msg) = err.downcast_ref::<&str>() {
        msg
      } else if let Some(msg) = err.downcast_ref::<String>() {
        msg
      } else {
        "Unknown rust panic message"
      };
      get_panic_error_buffer(msg)
    })
  })
}

The converter.convert_ast_to_buffer(&program) call then walks the SWC AST recursively, recomputing each node's SWC position information into the corresponding ESTree AST position information:

rust
/// Converts the given UTF-8 byte index to a UTF-16 byte index.
///
/// To be performant, this method assumes that the given index is not smaller
/// than the previous index. Additionally, it handles "annotations" like
/// `@__PURE__` comments in the process.
///
/// The logic for those comments is as follows:
/// - If the current index is at the start of an annotation, the annotation
///   is collected and the index is advanced to the end of the annotation.
/// - Otherwise, we check if the next character is a white-space character.
///   If not, we invalidate all collected annotations.
///   This is to ensure that we only collect annotations that directly precede
///   an expression and are not e.g. separated by a comma.
/// - If annotations are relevant for an expression, it can "take" the
///   collected annotations by calling `take_collected_annotations`. This
///   clears the internal buffer and returns the collected annotations.
/// - Invalidated annotations are attached to the Program node so that they
///   can all be removed from the source code later.
/// - If an annotation can influence a child that is separated by some
///   non-whitespace from the annotation, `keep_annotations_for_next` will
///   prevent annotations from being invalidated when the next position is
///   converted.
pub(crate) fn convert(&mut self, utf8_index: u32, keep_annotations_for_next: bool) -> u32 {
  if self.current_utf8_index > utf8_index {
    panic!(
      "Cannot convert positions backwards: {} < {}",
      utf8_index, self.current_utf8_index
    );
  }
  while self.current_utf8_index < utf8_index {
    if self.current_utf8_index == self.next_annotation_start {
      let start = self.current_utf16_index;
      let (next_comment_end, next_comment_kind) = self
        .next_annotation
        .map(|a| (a.comment.span.hi.0 - 1, a.kind.clone()))
        .unwrap();
      while self.current_utf8_index < next_comment_end {
        let character = self.character_iterator.next().unwrap();
        self.current_utf8_index += character.len_utf8() as u32;
        self.current_utf16_index += character.len_utf16() as u32;
      }
      if let Annotation(kind) = next_comment_kind {
        self.collected_annotations.push(ConvertedAnnotation {
          start,
          end: self.current_utf16_index,
          kind,
        });
      }
      self.next_annotation = self.annotation_iterator.next();
      self.next_annotation_start = get_annotation_start(self.next_annotation);
    } else {
      let character = self.character_iterator.next().unwrap();
      if !(self.keep_annotations || self.collected_annotations.is_empty()) {
        match character {
          ' ' | '\t' | '\r' | '\n' => {}
          _ => {
            self.invalidate_collected_annotations();
          }
        }
      }
      self.current_utf8_index += character.len_utf8() as u32;
      self.current_utf16_index += character.len_utf16() as u32;
    }
  }
  self.keep_annotations = keep_annotations_for_next;
  self.current_utf16_index
}

It also needs to collect the information required by the ESTree AST node structure.

rust
pub(crate) fn convert_statement(&mut self, statement: &Stmt) {
  match statement {
    Stmt::Break(break_statement) => self.store_break_statement(break_statement),
    Stmt::Block(block_statement) => self.store_block_statement(block_statement, false),
    Stmt::Continue(continue_statement) => self.store_continue_statement(continue_statement),
    Stmt::Decl(declaration) => self.convert_declaration(declaration),
    Stmt::Debugger(debugger_statement) => self.store_debugger_statement(debugger_statement),
    Stmt::DoWhile(do_while_statement) => self.store_do_while_statement(do_while_statement),
    Stmt::Empty(empty_statement) => self.store_empty_statement(empty_statement),
    Stmt::Expr(expression_statement) => self.store_expression_statement(expression_statement),
    Stmt::For(for_statement) => self.store_for_statement(for_statement),
    Stmt::ForIn(for_in_statement) => self.store_for_in_statement(for_in_statement),
    Stmt::ForOf(for_of_statement) => self.store_for_of_statement(for_of_statement),
    Stmt::If(if_statement) => self.store_if_statement(if_statement),
    Stmt::Labeled(labeled_statement) => self.store_labeled_statement(labeled_statement),
    Stmt::Return(return_statement) => self.store_return_statement(return_statement),
    Stmt::Switch(switch_statement) => self.store_switch_statement(switch_statement),
    Stmt::Throw(throw_statement) => self.store_throw_statement(throw_statement),
    Stmt::Try(try_statement) => self.store_try_statement(try_statement),
    Stmt::While(while_statement) => self.store_while_statement(while_statement),
    Stmt::With(_) => unimplemented!("Cannot convert Stmt::With"),
  }
}

The information required by ESTree AST nodes is extracted from the SWC AST node structure, and the position information is recalculated in UTF-16 code units to follow the ESTree AST specification.

rust
pub(crate) fn convert_item_list_with_state<T, S, F>(
    &mut self,
    item_list: &[T],
    state: &mut S,
    reference_position: usize,
    convert_item: F,
  ) where
  F: Fn(&mut AstConverter, &T, &mut S) -> bool,
{
  // for an empty list, we leave the referenced position at zero
  if item_list.is_empty() {
    return;
  }
  self.update_reference_position(reference_position);
  // store number of items in first position
  self
    .buffer
    .extend_from_slice(&(item_list.len() as u32).to_ne_bytes());
  let mut reference_position = self.buffer.len();
  // make room for the reference positions of the items
  self
    .buffer
    .resize(self.buffer.len() + item_list.len() * 4, 0);
  for item in item_list {
    let insert_position = (self.buffer.len() as u32) >> 2;
    if convert_item(self, item, state) {
      self.buffer[reference_position..reference_position + 4]
        .copy_from_slice(&insert_position.to_ne_bytes());
    }
    reference_position += 4;
  }
}
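On the JavaScript side, a list written this way can be read back by first loading the item count and then following each stored offset. A hypothetical decoder for this layout (names are illustrative, not Rollup's):

```js
// Assumed layout, matching the Rust code above (buffer viewed as Uint32Array):
//   buffer[position]         -> number of items
//   buffer[position + 1 + i] -> word offset of item i (0 means "absent")
function readItemList(buffer, position) {
  const count = buffer[position];
  const itemPositions = [];
  for (let i = 0; i < count; i++) {
    const itemPosition = buffer[position + 1 + i];
    if (itemPosition !== 0) itemPositions.push(itemPosition);
  }
  return itemPositions;
}

// Two items stored at word offsets 5 and 7:
console.log(readItemList(Uint32Array.of(2, 5, 7), 0)); // [ 5, 7 ]
```

Storing word offsets rather than nesting data inline is what lets the reader jump straight to any child node without parsing everything before it.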

It also collects comment nodes, in preparation for Rollup's later tree shaking. Note that the ESTree AST specification does not include comment nodes, but comment information (for example @__PURE__ annotations) is crucial for Rollup's tree shaking and enhances its precision.

Rollup collects this comment information and stores it on the ESTree AST through the _rollupAnnotations property. In other words, the final ESTree AST additionally carries a _rollupAnnotations property while otherwise conforming to the ESTree AST specification.

rust
pub(crate) fn take_collected_annotations(
  &mut self,
  kind: AnnotationKind,
) -> Vec<ConvertedAnnotation> {
  let mut relevant_annotations = Vec::new();
  for annotation in self.collected_annotations.drain(..) {
    if annotation.kind == kind {
      relevant_annotations.push(annotation);
    } else {
      self.invalid_annotations.push(annotation);
    }
  }
  relevant_annotations
}
impl<'a> AstConverter<'a> {
  pub(crate) fn store_call_expression(
    &mut self,
    span: &Span,
    is_optional: bool,
    callee: &StoredCallee,
    arguments: &[ExprOrSpread],
    is_chained: bool,
  ) {
  // annotations
  let annotations = self
    .index_converter
    .take_collected_annotations(AnnotationKind::Pure);
}
impl SequentialComments {
  pub(crate) fn add_comment(&self, comment: Comment) {
    if comment.text.starts_with('#') && comment.text.contains("sourceMappingURL=") {
      self.annotations.borrow_mut().push(AnnotationWithType {
        comment,
        kind: CommentKind::Annotation(AnnotationKind::SourceMappingUrl),
      });
      return;
    }
    let mut search_position = comment
      .text
      .chars()
      .nth(0)
      .map(|first_char| first_char.len_utf8())
      .unwrap_or(0);
    while let Some(Some(match_position)) = comment.text.get(search_position..).map(|s| s.find("__"))
    {
      search_position += match_position;
      // Using a byte reference avoids UTF8 character boundary checks
      match &comment.text.as_bytes()[search_position - 1] {
        b'@' | b'#' => {
          let annotation_slice = &comment.text[search_position..];
          if annotation_slice.starts_with("__PURE__") {
            self.annotations.borrow_mut().push(AnnotationWithType {
              comment,
              kind: CommentKind::Annotation(AnnotationKind::Pure),
            });
            return;
          }
          if annotation_slice.starts_with("__NO_SIDE_EFFECTS__") {
            self.annotations.borrow_mut().push(AnnotationWithType {
              comment,
              kind: CommentKind::Annotation(AnnotationKind::NoSideEffects),
            });
            return;
          }
        }
        _ => {}
      }
      search_position += 2;
    }
    self.annotations.borrow_mut().push(AnnotationWithType {
      comment,
      kind: CommentKind::Comment,
    });
  }

  pub(crate) fn take_annotations(self) -> Vec<AnnotationWithType> {
    self.annotations.take()
  }
}

Finally, the ESTree-compatible ArrayBuffer is passed to the Rollup side, where the JavaScript code knows how to walk this buffer and instantiate the AST node classes implemented internally by Rollup.

ts
export default class Module {
  async setSource({
    ast,
    code,
    customTransformCache,
    originalCode,
    originalSourcemap,
    resolvedIds,
    sourcemapChain,
    transformDependencies,
    transformFiles,
    ...moduleOptions
  }: TransformModuleJSON & {
    resolvedIds?: ResolvedIdMap;
    transformFiles?: EmittedFile[] | undefined;
  }): Promise<void> {
    // Measuring asynchronous code does not provide reasonable results
    timeEnd('generate ast', 3);
    const astBuffer = await parseAsync(
      code,
      false,
      this.options.jsx !== false
    );
    timeStart('generate ast', 3);
    this.ast = convertProgram(astBuffer, programParent, this.scope);
  }
}

On the JavaScript side, Rollup decodes the buffer as follows:

ts
function convertNode(
  parent: Node | { context: AstContext; type: string },
  parentScope: ChildScope,
  position: number,
  buffer: AstBuffer
): any {
  const nodeType = buffer[position];
  const NodeConstructor = nodeConstructors[nodeType];
  /* istanbul ignore if: This should never be executed but is a safeguard against faulty buffers */
  if (!NodeConstructor) {
    console.trace();
    throw new Error(`Unknown node type: ${nodeType}`);
  }
  const node = new NodeConstructor(parent, parentScope);
  node.type = nodeTypeStrings[nodeType];
  node.start = buffer[position + 1];
  node.end = buffer[position + 2];
  bufferParsers[nodeType](node, position + 3, buffer);
  node.initialise();
  return node;
}

Optimize Native Interaction

As mentioned above, using the JavaScript bindings exposed by SWC repeatedly serializes and deserializes the AST between Rust and JavaScript; for complex ASTs this almost completely erodes the performance advantage of the native (Rust) parser.

The solution is twofold:

Use an ArrayBuffer to transfer the parsed AST between Rust and JavaScript.

Skip SWC's JavaScript bindings entirely and call SWC's Rust crates directly from Rust.

rust
use swc_compiler_base::parse_js;

pub fn parse_ast(code: String, allow_return_outside_function: bool, jsx: bool) -> Vec<u8> {
  GLOBALS.set(&Globals::default(), || {
    let result = catch_unwind(AssertUnwindSafe(|| {
      let result = try_with_handler(&code_reference, |handler| {
        parse_js(
          cm,
          file,
          handler,
          target,
          syntax,
          IsModule::Unknown,
          Some(&comments),
        )
      });
      match result {
        Err(buffer) => buffer,
        Ok(program) => {
          let annotations = comments.take_annotations();
          let converter = AstConverter::new(&code_reference, &annotations);
          converter.convert_ast_to_buffer(&program)
        }
      }
    }));
  });
}

At the same time, on the Rust side Rollup converts the SWC AST into an ESTree-compatible binary format, which is then passed to JavaScript as an (array) buffer.

rust
match result {
  Err(buffer) => buffer,
  Ok(program) => {
    let annotations = comments.take_annotations();
    let converter = AstConverter::new(&code_reference, &annotations);
    converter.convert_ast_to_buffer(&program)
  }
}

Passing an ArrayBuffer is essentially lossless, so the JavaScript side only needs to know how to reconstruct AST instances from the buffer structure. In addition, the ArrayBuffer is only about one third the size of the equivalent stringified JSON.
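A rough feel for the size difference (an illustrative comparison, not a benchmark of Rollup itself): a node's span costs tens of bytes as JSON text but only 8 bytes as two 32-bit integers.

```js
// JSON text: every digit, key, and brace is a character.
const asJson = JSON.stringify({ start: 123456, end: 654321 });

// Binary: two native 32-bit integers, regardless of digit count.
const asBinary = new Uint32Array([123456, 654321]);

console.log(asJson.length); // 29
console.log(asBinary.byteLength); // 8
```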

The ArrayBuffer format is also efficient to move between threads. For example, parsing can run in a Web Worker, and on completion the buffer holding the AST can be transferred to the main thread without copying.

On the Node.js side, napi-rs is used to interact with the Rust code, while wasm-pack is used for the browser build.

Optimize Semantic Analysis

Parser Semantic Analysis Design

Calling SWC's parse_js (from swc_compiler_base) directly on the Rust side performs only lexical and syntactic analysis; it does not run semantic analysis. That is, SWC parses the following code into a SWC AST without reporting any error.

js
const a = 1;
const a = 2;

This is different from Acorn's approach. Acorn additionally performs complete static semantic analysis (Static Semantics: Early Errors) when generating the AST, detecting these errors before the program executes.

ECMAScript Static Semantics: Early Errors

Early Errors are a static semantic error detection mechanism defined in the ECMAScript specification. According to the ECMA-262 specification, these errors must be detected and reported during the parsing phase before code execution.


The fundamental reason is that Acorn is designed as a parser that conforms to the ECMAScript specification. Before a JavaScript engine executes code, ECMAScript requires it to run the Static Semantics: Early Errors steps (essentially static semantic analysis): these are errors that must be detected and reported during parsing and early syntax analysis. The checks are static, meaning they can be performed without actually running the code.

The JavaScript engines built into browsers, Node.js, and other runtimes likewise run the Static Semantics: Early Errors steps before executing code.
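This is easy to verify in Node.js: handing V8 the duplicate declaration triggers the error at parse time, before a single statement runs.

```js
let message = '';
try {
  // new Function parses (and early-error-checks) the body immediately.
  new Function('const a = 1; const a = 2;');
} catch (error) {
  message = error.message;
}
console.log(message); // "Identifier 'a' has already been declared"
```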

The significance of the specification is:

  1. Early Detection of Issues: Potential errors can be found before the code is actually executed, avoiding issues that may only surface at runtime.
  2. Performance Improvement: Since these checks are completed in the static analysis stage, they can improve code execution efficiency without waiting until runtime to discover errors.
  3. Ensure Language Consistency: Through a unified early error check mechanism, ensure that JavaScript code can be processed consistently in different environments.
  4. Help Developers Write Better Code: These rules also guide developers to follow better programming practices.

SWC, Babel, and other such parsers do not execute the Static Semantics: Early Errors steps when generating ASTs, because their design goals differ from Acorn's. Let's first look at why they separate syntax analysis from static semantic analysis.

  1. Performance and Complexity Trade-off

    Implementing Early Errors detection requires the parser to do the following:

    • Simulate and maintain the scope and scope chain of the execution context for the current statement.
    • Static rule checks.
      • Detection of other static semantic rules defined in the language specification.
      • Syntax restriction rule detection.
      • Module system static verification rule detection.

    Although each individual check is not particularly complex, in large projects where every transpilation of new code would require full Early Errors checks, the cumulative cost may bring non-negligible performance overhead.

  2. Toolchain Division of Labor

    SWC, Babel and other parsers' focus is on code transformation, mainly injected into the build system's code transformation pipeline in the form of plugins. For tools seeking to integrate deeply into various build system ecosystems, the easiest approach is to maintain the single responsibility principle.

    By separating parsing and semantic analysis:

    • Parser can focus on generating accurate AST.
    • Semantic Analyzer can focus on checking code correctness.
    • Each part is easier to maintain and optimize.
  3. Flexibility

    Complex module transpilation pipelines are usually not single-step; they involve intermediate states in which the code is often not semantically valid. If transpilation tools enforced strict semantic analysis, such intermediate code could not pass compilation, hurting extensibility. Modern toolchains balance development flexibility and code quality by distributing different checks across different stages and running semantic analysis on demand.

Babel and SWC therefore separate the responsibilities of syntax analysis and Early Errors detection. During plugin-driven code transformation, code is parsed into an AST with lexical and syntactic analysis only, without Early Errors checks (static semantic analysis). Instead, bundlers such as Rollup control when the Early Errors checks run, for example once Rollup's transform stage completes.

This design choice reflects an important principle in engineering practice: sometimes, breaking down a complex problem into multiple independent steps may be more effective than trying to solve everything in one step. This allows each tool to focus on its core task, thereby providing better functionality and performance.

Rollup Plugin System Design Inspiration

The same design approach shows up in Rollup's plugin system. When a user plugin returns an AST from the load (or transform) hook, Rollup reuses that AST in subsequent transform hooks, and performs no semantic analysis on it until the transform stage completes.

js
const a = 1;
const a = 2;

For the example above, Acorn reports an error. Internally, Acorn raises such early errors via raiseRecoverable during parsing, as in this excerpt from its class-body parsing:

js
while (this.type !== tt.braceR) {
  const element = this.parseClassElement(node.superClass !== null);
  if (element) {
    classBody.body.push(element);
    if (
      element.type === 'MethodDefinition' &&
      element.kind === 'constructor'
    ) {
      if (hadConstructor)
        this.raiseRecoverable(
          element.start,
          'Duplicate constructor in the same class'
        );
      hadConstructor = true;
    } else if (
      element.key &&
      element.key.type === 'PrivateIdentifier' &&
      isPrivateNameConflicted(privateNameMap, element)
    ) {
      this.raiseRecoverable(
        element.key.start,
        `Identifier '#${element.key.name}' has already been declared`
      );
    }
  }
}

Error Message

Line 2: Identifier 'a' has already been declared.

Therefore, Rollup needs to leverage the swc_ecma_lints crate to achieve more complete semantic analysis.

rust
use swc_ecma_lints::{rule::Rule, rules, rules::LintParams};

let result = HANDLER.set(&handler, || op(&handler));

match result {
  Ok(mut program) => {
    let unresolved_mark = Mark::new();
    let top_level_mark = Mark::new();
    let unresolved_ctxt = SyntaxContext::empty().apply_mark(unresolved_mark);
    let top_level_ctxt = SyntaxContext::empty().apply_mark(top_level_mark);

    program.visit_mut_with(&mut resolver(unresolved_mark, top_level_mark, false));

    let mut rules = rules::all(LintParams {
      program: &program,
      lint_config: &Default::default(),
      unresolved_ctxt,
      top_level_ctxt,
      es_version,
      source_map: cm.clone(),
    });

    HANDLER.set(&handler, || match &program {
      Program::Module(m) => {
        rules.lint_module(m);
      }
      Program::Script(s) => {
        rules.lint_script(s);
      }
    });

    if handler.has_errors() {
      let buffer = create_error_buffer(&wr, code);
      Err(buffer)
    } else {
      Ok(program)
    }
  }
}

Implement Semantic Analysis On JavaScript Side

However, the related PR and discussion reveal the following:

Testing showed that swc_ecma_lints detection was not efficient.

To address this, Rollup's native parser temporarily dropped the complete semantic analysis on the Rust side, until scope analysis is implemented there.

rust
let result = HANDLER.set(&handler, || op(&handler));

match result {
  Ok(mut program) => {
    let unresolved_mark = Mark::new();
    let top_level_mark = Mark::new();
    let unresolved_ctxt = SyntaxContext::empty().apply_mark(unresolved_mark);
    let top_level_ctxt = SyntaxContext::empty().apply_mark(top_level_mark);
    program.visit_mut_with(&mut resolver(unresolved_mark, top_level_mark, false));
    let mut rules = rules::all(LintParams {
      program: &program,
      lint_config: &Default::default(),
      unresolved_ctxt,
      top_level_ctxt,
      es_version,
      source_map: cm.clone(),
    });
    HANDLER.set(&handler, || match &program {
      Program::Module(m) => {
        rules.lint_module(m);
      }
      Program::Script(s) => {
        rules.lint_script(s);
      }
    });
    if handler.has_errors() {
      let buffer = create_error_buffer(&wr, code);
      Err(buffer)
    } else {
      Ok(program)
    }
  }
}
result.map_err(|_| {
  if handler.has_errors() {
    create_error_buffer(&wr, code)
  } else {
    panic!("Unexpected error in parse")
  }
})

The semantic analysis task is instead handed over to the JavaScript side.

Rollup performs more complete semantic analysis during the backtracking phase, when AST class nodes are instantiated. Testing showed that this JavaScript-side semantic analysis is much faster than native SWC's swc_ecma_lints, so it does not have a significant impact on Rollup's overall performance.
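
The backtracking-phase detection can be sketched as follows. This is an illustrative miniature, not Rollup's actual classes; the real logic lives in `Scope.addDeclaration` and related methods shown later in this article.

```ts
// Minimal sketch: a scope records bindings as AST nodes are instantiated,
// and raises an Early Error on an illegal redeclaration.
class MiniScope {
  private variables = new Map<string, { kind: string }>();

  addDeclaration(name: string, kind: 'var' | 'let' | 'const'): void {
    const existing = this.variables.get(name);
    if (existing) {
      // var may redeclare var; everything else is an Early Error
      if (kind === 'var' && existing.kind === 'var') return;
      throw new SyntaxError(`Identifier "${name}" has already been declared`);
    }
    this.variables.set(name, { kind });
  }
}

const scope = new MiniScope();
scope.addDeclaration('a', 'let');
// scope.addDeclaration('a', 'const'); // would throw: already declared
```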

Early Errors Detection Capability Comparison

To verify the actual effect of the above design, we wrote a comprehensive Early Errors test suite based on the ECMAScript specification, covering 97 test cases across 11 major categories. The test results are as follows:

Test Environment and Methodology

Test Environment:

  • Node.js: v22.x
  • Acorn: ^8.14.0
  • Rollup: ^4.53.3

Test Methodology:

  • Each test case contains code that should trigger an Early Error
  • Tests whether the parser correctly detects and reports the error
  • Some test cases verify that legal code should not produce errors

Test Coverage:

  1. Identifier and Binding Errors
  2. Function Parameter Errors
  3. Function Body Errors
  4. Class Errors
  5. Module Errors
  6. Control Flow Errors
  7. Assignment Errors
  8. Literal Errors
  9. Strict Mode Errors
  10. Regular Expression Errors
  11. for-in/of Errors
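
The methodology above can be sketched as a tiny harness. This is an illustrative sketch, not the actual test suite: the `runSuite` helper and case shape are assumptions, and the demo plugs in the engine's own parser via the `Function` constructor instead of Acorn or SWC.

```ts
// Each case is a snippet that should (or should not) trigger an Early Error;
// a parser passes a case when its throw/no-throw behavior matches.
type EarlyErrorCase = { code: string; shouldError: boolean };

function runSuite(
  parse: (code: string) => unknown,
  cases: EarlyErrorCase[]
): { passed: number; total: number } {
  let passed = 0;
  for (const c of cases) {
    let threw = false;
    try {
      parse(c.code);
    } catch {
      threw = true;
    }
    if (threw === c.shouldError) passed++;
  }
  return { passed, total: cases.length };
}

// Demo: the engine's own parser detects duplicate let bindings eagerly.
const result = runSuite(code => new Function(code), [
  { code: 'let a = 1; let a = 2;', shouldError: true }, // duplicate binding
  { code: 'let a = 1;', shouldError: false } // legal code, no error expected
]);
```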
| Parser/Mode          | Pass Rate | Pass/Total | Description                                  |
| -------------------- | --------- | ---------- | -------------------------------------------- |
| Acorn                | 100%      | 97/97      | Complete Early Errors implementation         |
| SWC Parser (default) | 38.1%     | 37/97      | Syntax analysis + strict mode detection      |
| Rollup parseAst      | 33.0%     | 32/97      | Syntax analysis (IsModule::Unknown config)   |
| Rollup Full Build    | 54.6%     | 53/97      | parseAst + JavaScript-side semantic analysis |

Key Finding

Rollup's parseAst function does not fully implement ECMAScript Early Errors detection.

This validates the above discussion: SWC does not execute Static Semantics: Early Errors steps when generating AST, and the semantic analysis task is handed over to the JavaScript side for processing.

Detailed Detection Capability Analysis

1. Errors detectable by Rollup parseAst (pure syntax level):

These errors do not require scope analysis and can be detected during the lexical/syntax analysis phase:

| Error Type            | Example                               | Spec Reference   |
| --------------------- | ------------------------------------- | ---------------- |
| Control flow position | `break;` / `continue;`                | Section 14.8.1   |
| return position       | `return 1;` (outside function)        | Section 15.1.1   |
| yield/await position  | `function f() { yield 1; }`           | Section 15.5.1   |
| Literal assignment    | `1 = 2;`                              | Section 13.15.1  |
| rest syntax           | `let [...a, b] = x;`                  | Section 13.2.3   |
| Regular expression    | `/a/gg;`                              | Section 22.2.1.1 |
| Numeric separator     | `1__0;`                               | Section 12.9.1   |
| Class constructor     | Duplicate/async/generator constructor | Section 15.7.1   |
| Label errors          | `L: L: for(;;) {}`                    | Section 14.13.1  |

Syntax analysis phase detection rate: 32/40 (80.0%)
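
Most of these rows can be reproduced with any spec-compliant engine. The snippet below is illustrative (it uses the engine's own parser via the `Function` constructor rather than SWC or Acorn) and shows that these errors are rejected eagerly, at parse time, before any code runs.

```ts
// Pure syntax-level Early Errors: each snippet is rejected during parsing.
const earlySyntaxErrors = [
  'break;',              // break outside loop/switch
  '1 = 2;',              // assignment to a literal
  'let [...a, b] = 0;',  // rest element not last
  '/a/gg;',              // duplicate regular expression flag
  '1__0;',               // invalid numeric separator
  'L: L: for(;;) {}'     // duplicate label
];

let detected = 0;
for (const src of earlySyntaxErrors) {
  try {
    new Function(src); // parses eagerly; never executed
  } catch (e) {
    if (e instanceof SyntaxError) detected++;
  }
}
console.log(`${detected}/${earlySyntaxErrors.length} detected at parse time`);
```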

2. Additional errors detected by Rollup Full Build (requiring scope analysis):

These errors are detected during AST node instantiation through the initialise() method:

| Error Type                      | Example                               | Detection Location                       | Spec Reference   |
| ------------------------------- | ------------------------------------- | ---------------------------------------- | ---------------- |
| Duplicate let/const declaration | `let a=1; let a=2;`                   | Scope.addDeclaration()                   | Section 14.3.1.1 |
| Duplicate parameters            | `function f(a, a) {}`                 | ParameterScope.addParameterDeclaration() | Section 15.1.1   |
| Duplicate exports               | `export default 1; export default 2;` | Module.assertUniqueExportName()          | Section 16.2.1.1 |
| Duplicate import bindings       | `import { a, a } from "x"`            | Module.addImport()                       | Section 16.2.1.1 |
| const reassignment              | `const x=1; x=2;`                     | AssignmentExpression.initialise()        | Section 14.3.1.1 |

Semantic analysis phase detection rate: 21/57 (36.8%)
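
As an illustration of what such a check involves, duplicate-parameter detection along the lines of `ParameterScope.addParameterDeclaration` can be as simple as a set lookup. The helper below is hypothetical, not Rollup's code:

```ts
// Sketch: reject a parameter list that binds the same name twice.
function checkParameters(names: string[]): void {
  const seen = new Set<string>();
  for (const name of names) {
    if (seen.has(name)) {
      throw new SyntaxError(`Duplicate parameter "${name}"`);
    }
    seen.add(name);
  }
}

checkParameters(['a', 'b']);    // ok
// checkParameters(['a', 'a']); // would throw: Duplicate parameter "a"
```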

3. Early Errors not implemented by Rollup:

The following Early Errors are not detected in Rollup:

| Error Type                 | Example                          | Spec Reference    | Description                        |
| -------------------------- | -------------------------------- | ----------------- | ---------------------------------- |
| eval/arguments restriction | `function f(eval) {}`            | Section 15.1.1    | Strict mode reserved word          |
| await as identifier        | `let await = 1;`                 | Section 15.8.1    | Module top-level/async restriction |
| Octal literal              | `010;`                           | Section 12.9.4.1  | Forbidden in strict mode           |
| Octal escape               | `"\07";`                         | Section 12.9.4.1  | Forbidden in strict mode           |
| Duplicate private fields   | `class A { #x; #x; }`            | Section 15.7.1    | Class private field detection      |
| Duplicate __proto__        | `{ __proto__: 1, __proto__: 2 }` | Section 13.2.5.1  | Object literal restriction         |
| delete identifier          | `delete x;`                      | Section 13.5.1.1  | Forbidden in strict mode           |
| let as variable name       | `var let = 1;`                   | Section 13.3.1.1  | Strict mode reserved word          |
| super() position           | `class A { foo() { super(); } }` | Section 15.7.1    | Only allowed in constructors       |

Architecture Diagram

bash
┌──────────────────────────────────────────────────────────────
│                       Rust Side (SWC)
├──────────────────────────────────────────────────────────────
│ Source Code → Lexical Analysis → Syntax Analysis → SWC AST
│             → ArrayBuffer
│
│ ✓ Basic syntax error detection (break/continue/return position, etc.)
│ ✗ Does not perform scope analysis
│ ✗ Does not detect duplicate declarations/exports
│
│ Syntax analysis phase detection rate: 80.0% (32/40)
│ Overall detection rate: 33.0% (32/97)
└─────────────────────┬────────────────────────────────────────
                      │ ArrayBuffer (binary format)
                      ▼
┌──────────────────────────────────────────────────────────────
│                  JavaScript Side (Rollup)
├──────────────────────────────────────────────────────────────
│ convertNode() → new NodeConstructor() → node.initialise()
│
│ ✓ Build scope chain (Scope/ChildScope)
│ ✓ Detect duplicate declarations (addDeclaration)
│ ✓ Detect duplicate parameters (addParameterDeclaration)
│ ✓ Detect duplicate exports (addExport → assertUniqueExportName)
│ ✓ Detect const reassignment (AssignmentExpression.initialise)
│ ✗ Some strict mode restrictions not implemented
│
│ Semantic analysis phase detection rate: 36.8% (21/57)
│ Full Build overall detection rate: 54.6% (53/97)
└──────────────────────────────────────────────────────────────

Practical Impact

  • parseAst(): Only syntax errors are detected, no semantic analysis. Suitable for scenarios that only need AST structure.
  • Full Build: Core semantic errors are detected (duplicate declarations/exports, etc.), but not all Early Errors. Suitable for actual bundling scenarios.
  • Acorn: 100% Early Errors detection. Suitable for scenarios requiring complete specification validation.

Design Trade-offs

Rollup's design choices reflect trade-offs in engineering practice:

  1. Implements the most critical semantic detection for bundling: Duplicate bindings, duplicate exports, etc., which would cause runtime errors
  2. Omits some strict mode related detection: These are typically handled by IDEs or linters (such as ESLint)
  3. Maintains performance advantages: Avoids complete semantic analysis on the Rust side

This layered design allows each tool to focus on its core task while ensuring the correctness of the final output.

Deep Analysis: Early Errors Detection Mechanism

1. Definition and Classification of Early Errors

According to the ECMAScript specification (ECMA-262), Early Errors are errors that must be detected and reported during the static analysis phase before code execution. These errors span multiple levels from syntax constraints to semantic constraints.

From an implementation perspective, Early Errors can be divided into two major categories:

Category 1: Errors detectable during syntax analysis (approximately 41%)

These errors only require current syntactic context information to determine, without the need to maintain symbol tables or scope chains. Typical examples include:

  • Control flow statement position constraints: e.g., break/continue must be inside loops or switch
  • Assignment target legality checks: e.g., literals cannot be left-hand values of assignments
  • Destructuring syntax constraints: e.g., rest element must be last
  • Literal syntax constraints: e.g., numeric separators, regular expression syntax
  • Class constructor syntax constraints: e.g., no duplicate constructor, cannot be async/generator

Category 2: Errors detectable only during semantic analysis (approximately 59%)

These errors require building and maintaining symbol tables, scope chains, or module binding tables to detect. Including:

  • Duplicate declaration detection: Conflicts between let/const/var
  • Duplicate parameter detection: Function parameter names cannot be duplicated
  • Duplicate export/import detection: Module export/import bindings cannot be duplicated
  • const reassignment detection: Constants cannot be reassigned
  • Strict mode identifier restrictions: Usage restrictions for eval/arguments

Based on comprehensive testing with 97 specification test cases, different parsers show significantly different coverage rates due to different design goals and implementation strategies.


2. Quantitative Comparison of Parser Detection Capabilities

Through systematic testing and verification, the Early Errors detection capabilities of each parser show clear stratification:

| Parser               | Syntax Analysis Phase | Semantic Analysis Phase | Total         | Description                    |
| -------------------- | --------------------- | ----------------------- | ------------- | ------------------------------ |
| Acorn                | 40/40 (100%)          | 57/57 (100%)            | 97/97 (100%)  | Complete implementation        |
| SWC Parser (default) | 37/40 (92.5%)         | 0/57 (0%)               | 37/97 (38.1%) | Includes strict mode detection |
| SWC Parser (Unknown) | 32/40 (80%)           | 0/57 (0%)               | 32/97 (33.0%) | Does not check strict mode     |
| Rollup parseAst      | 32/40 (80%)           | 0/57 (0%)               | 32/97 (33.0%) | = SWC Unknown mode             |
| Rollup Full Build    | 32/40 (80%)           | 21/57 (36.8%)           | 53/97 (54.6%) | + JS-side semantic analysis    |

Key Findings:

  • Acorn, as a complete implementation conforming to the ECMAScript specification, achieves full detection in both phases
  • SWC Parser completely skips semantic analysis requiring symbol tables, but can detect strict mode constraints in its default configuration
  • Rollup parseAst uses SWC as the underlying parser, but configuration differences result in slightly lower detection capability than SWC's default configuration
  • Rollup Full Build implements the most critical semantic detection for bundling scenarios on the JavaScript side

3. Difference Analysis Between SWC Parser and Rollup parseAst

Actual testing found that SWC Parser (called directly through the @swc/core JavaScript API) can detect 5 errors that Rollup parseAst cannot:

| Error Type             | Example                   | SWC (default) | Rollup parseAst | Characteristic          |
| ---------------------- | ------------------------- | ------------- | --------------- | ----------------------- |
| for-in initializer     | `for (var a = 1 in x) {}` | ✅            | ❌              | Strict mode restriction |
| eval as parameter name | `function f(eval) {}`     | ✅            | ❌              | Strict mode restriction |
| await as identifier    | `let await = 1;`          | ✅            | ❌              | Module mode restriction |
| Octal literal          | `010;`                    | ✅            | ❌              | Strict mode restriction |
| delete identifier      | `delete x;`               | ✅            | ❌              | Strict mode restriction |

These errors share a common characteristic: they are all related to ECMAScript's strict mode or module mode semantic restrictions.
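
This mode dependence is easy to reproduce with the engine's own parser (an illustrative check, not SWC): the same snippet parses in sloppy mode but is an Early Error under a `"use strict"` directive.

```ts
// Returns true when the snippet parses without an Early Error.
// The Function constructor parses its body eagerly but never executes it.
function parses(src: string): boolean {
  try {
    new Function(src);
    return true;
  } catch {
    return false;
  }
}

console.log(parses('010;'));                    // true  (sloppy: legacy octal allowed)
console.log(parses('"use strict"; 010;'));      // false (strict: octal forbidden)
console.log(parses('delete x;'));               // true  (sloppy: allowed)
console.log(parses('"use strict"; delete x;')); // false (strict: SyntaxError)
```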

Root Cause Verification:

Through version verification, it was confirmed that the version difference between @swc/core 1.15.2 and the swc_ecma_parser 27.0.2 built into Rollup 4.53.3 is minimal, and behavior is completely identical under the same configuration. Deep analysis of Rollup source code (rust/parse_ast/src/lib.rs) reveals:

rust
parse_js(
  cm, file, handler, target, syntax,
  IsModule::Unknown,  // ← key configuration
  Some(&comments),
)

Controlled Experiment Results:

| SWC Configuration              | Detects `for (var a = 1 in x) {}` | Detection Rate |
| ------------------------------ | --------------------------------- | -------------- |
| Default (isModule unspecified) | ✅ ERROR                          | 37/97          |
| isModule: true                 | ✅ ERROR                          | 37/97          |
| isModule: false                | ❌ NO ERROR                       | 32/97          |
| isModule: "unknown"            | ❌ NO ERROR                       | 32/97          |
| Rollup parseAst                | ❌ NO ERROR                       | 32/97          |

Conclusion: The root cause of the difference between Rollup parseAst and SWC Parser lies in the configuration method (IsModule::Unknown), not in parser version or implementation differences.


4. Deeper Trade-offs in Design Decisions

Rollup's choice of IsModule::Unknown configuration is not an implementation defect, but a well-considered engineering trade-off based on modern build toolchain architecture. This design decision reflects a precise balance between completeness, flexibility, and performance across multiple dimensions.

From a flexibility perspective, the Unknown mode gives the parser the ability to automatically determine code type, allowing it to adapt to both module code and script code — two fundamentally different semantic environments. This design avoids the risk of legitimate code failing to parse due to incorrect mode prediction, which is particularly important when processing third-party libraries or legacy code. Furthermore, this configuration supports users returning intermediate-state ASTs in the plugin system, where intermediate code may temporarily not conform to strict mode constraints but will be normalized in subsequent stages.

From a fault tolerance perspective, the ECMAScript specification defines different semantic rules for strict mode and script mode. Some code structures classified as Early Errors in strict mode are perfectly legal in script mode. If the parseAst phase forcefully enforced strict mode detection, such legal code would be incorrectly rejected. The Unknown mode defers the final correctness determination to the Full Build phase, maintaining high parser availability while transferring semantic integrity checking responsibility to a more appropriate execution phase.

From an architectural layering perspective, Rollup's design philosophy emphasizes separation of concerns. The parseAst phase focuses on efficiently completing lexical and syntax analysis and converting the SWC AST into a compact ArrayBuffer format, thereby avoiding the significant performance overhead of JSON serialization. The core goal of this phase is to produce a correct AST structural representation, not to perform complete semantic validation. The actual semantic analysis is systematically arranged during the AST node instantiation phase on the JavaScript side, where scope chains are built, symbol tables are maintained, and the most critical semantic detection for bundling scenarios is performed through the node.initialise() method. This cross-language-boundary division of responsibilities leverages Rust's performance advantages in syntax parsing while utilizing JavaScript's flexibility in dynamic semantic analysis.

From empirical data, the Full Build phase achieves a 54.6% Early Errors detection rate. This figure is not arbitrary but precisely covers the semantic error categories most threatening to bundling scenarios. Errors such as duplicate declarations, duplicate parameters, duplicate exports, and const reassignment would cause runtime errors or unpredictable behavior if not detected during build time. The intentionally omitted strict mode restrictions (such as eval as parameter name, octal literals, etc.) are typically caught early by IDE real-time diagnostics or static analysis tools like ESLint in modern development workflows, making it unnecessary to redundantly implement them at the bundler level. This layered defense strategy ensures zero missed critical errors while avoiding unnecessary performance overhead.


5. Terminology Clarification and Precise Expression

Based on the above systematic testing verification and source code analysis, it is necessary to standardize relevant terminology definitions to eliminate cognitive bias and establish a unified understanding framework.

In the context of the ECMAScript specification, Early Errors specifically refer to the set of errors that must be detected and reported during the static analysis phase before code execution. This concept covers the complete error spectrum from syntax constraints to semantic constraints, encompassing two inseparable phases of syntax analysis and semantic analysis. This research, based on 97 typical test scenarios constructed from the ECMA-262 specification, comprehensively covers dimensions including identifier binding, function parameters, class definitions, module systems, control flow, assignment expressions, literal syntax, and strict mode restrictions, forming a quantitative evaluation benchmark for Early Errors detection capability.

From an implementation perspective, Early Errors in the syntax analysis phase specifically refer to errors that can be determined without maintaining symbol tables or scope chains, relying only on current syntactic context information. These errors account for approximately 41% (40/97) of the test scenarios, with typical representatives including control flow statement position constraints (such as break/continue context restrictions), assignment target legality checks (such as prohibiting assignment to literals), and destructuring syntax constraints (such as rest element position requirements). Their common characteristic is that detection logic can be directly completed during AST construction through stack-based context tracking, requiring no additional data structures. In contrast, Early Errors in the semantic analysis phase specifically refer to errors that must rely on symbol tables, scope chains, or module binding tables for accurate determination. These errors account for approximately 59% (57/97) of the test scenarios, including duplicate declaration detection (requiring querying whether a binding with the same name already exists in the current scope), duplicate parameter detection (requiring maintaining parameter scope), and duplicate export detection (requiring maintaining module export tables). Their essence is verifying code compliance by building a static semantic model of the program.
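
The "stack-based context tracking" mentioned above can be illustrated with a minimal, hypothetical helper: validating `break` placement needs only a depth counter for the enclosing breakable constructs, with no symbol table at all.

```ts
// Sketch: a parser pushes/pops breakable contexts (loops, switch) as it
// descends the grammar; a break statement is legal only at depth > 0.
class BreakContext {
  private depth = 0;

  enterLoop(): void { this.depth++; }
  exitLoop(): void { this.depth--; }

  checkBreak(): void {
    if (this.depth === 0) {
      throw new SyntaxError('Illegal break statement');
    }
  }
}

const ctx = new BreakContext();
ctx.enterLoop();
ctx.checkBreak(); // inside a loop: fine
ctx.exitLoop();
// ctx.checkBreak(); // would throw: Illegal break statement
```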

For different parsers' detection capabilities, precise quantitative descriptions need to be established. SWC Parser, in its default configuration (isModule unspecified) or explicitly configured as module mode (isModule: true), can detect 37 Early Errors (38.1%), which includes semantic constraints specific to strict mode and module mode. However, when configured as script mode (isModule: false) or unknown mode (isModule: "unknown"), detection capability drops to 32 (33.0%), primarily because 5 strict-mode-related detections are disabled. This behavior conforms to the ECMAScript specification's definition of semantic differences between different code types and is not a parser implementation defect.

Rollup parseAst's detection capability is completely equivalent to SWC Parser under the IsModule::Unknown configuration, at 32 Early Errors (33.0%). Through source code analysis, it is confirmed that Rollup explicitly passes the IsModule::Unknown parameter when calling SWC's parse_js function, and this configuration choice directly determines its behavior characteristics in strict mode constraint detection. It is worth emphasizing that these 32 detected errors do not completely correspond to the theoretical classification of "syntax analysis phase Early Errors," because some errors (such as the for-in initializer) theoretically belong to the syntax analysis phase but exhibit differentiated detection behavior due to mode configuration. Therefore, a more accurate description is: Rollup parseAst detects the syntax constraints that SWC can recognize under Unknown mode, excluding constraints specific to strict mode or module mode.

Rollup Full Build's detection capability is the superposition of parseAst and JavaScript-side semantic analysis, totaling 53 Early Errors (54.6%). Of these, 32 come from the parseAst phase's syntax constraint detection, and 21 come from semantic analysis performed through the node.initialise() method during AST node instantiation on the JavaScript side. These 21 additionally detected errors precisely cover the semantic violations most threatening to bundling scenarios, including let/const/var declaration conflicts, function parameter duplication, module export/import duplication, and const constant reassignment. This selection of detection scope is not accidental but is based on deep understanding of JavaScript runtime behavior and the responsibility boundaries of build tools.

Combining the above analysis, the following precise expression paradigm can be formed: SWC Parser and Rollup parseAst both focus on Early Errors detection in the syntax analysis phase in terms of functional positioning, but due to configuration differences (IsModule::Unknown versus isModule:true), there is a quantitative gap of 5 errors in strict-mode-related error detection. Complete Early Errors detection must cover both syntax analysis and semantic analysis phases. Among current mainstream JavaScript parser ecosystems, only Acorn achieves the completeness target required by the specification. Rollup, through its architectural layering design strategy, selectively supplements partial semantic analysis capability on the JavaScript side. This implementation approach achieves the engineering optimum of performance, flexibility, and correctness under the premise of ensuring zero missed critical errors.


Semantic Analysis Detection Points

The main tasks of semantic analysis include the following:

  1. const_assign

    Example:

    ts
    export function logConstVariableReassignError() {
      return {
        code: CONST_REASSIGN,
        message: 'Cannot reassign a variable declared with `const`'
      };
    }
    ts
    // case
    const x = 1;
    x = 'string';
    
    // implementation
    export default class AssignmentExpression extends NodeBase {
      initialise(): void {
        super.initialise();
        if (this.left instanceof Identifier) {
          const variable = this.scope.variables.get(this.left.name);
          if (variable?.kind === 'const') {
            this.scope.context.error(
              logConstVariableReassignError(),
              this.left.start
            );
          }
        }
        this.left.setAssignedValue(this.right);
      }
    }
  2. duplicate_bindings

    ts
    export function logRedeclarationError(name: string): RollupLog {
      return {
        code: REDECLARATION_ERROR,
        message: `Identifier "${name}" has already been declared`
      };
    }
    ts
    // case
    import { x } from './b';
    const x = 1;
    
    // case2
    import { x } from './b';
    import { x } from './b';
    
    // implementation
    export default class Module {
      private addImport(node: ImportDeclaration): void {
        const source = node.source.value;
        this.addSource(source, node);
    
        for (const specifier of node.specifiers) {
          const localName = specifier.local.name;
          if (
            this.scope.variables.has(localName) ||
            this.importDescriptions.has(localName)
          ) {
            this.error(
              logRedeclarationError(localName),
              specifier.local.start
            );
          }
    
          const name =
            specifier instanceof ImportDefaultSpecifier
              ? 'default'
              : specifier instanceof ImportNamespaceSpecifier
                ? '*'
                : specifier.imported instanceof Identifier
                  ? specifier.imported.name
                  : specifier.imported.value;
          this.importDescriptions.set(localName, {
            module: null as never, // filled in later
            name,
            source,
            start: specifier.start
          });
        }
      }
    }
    ts
    // case
    {
      const a = 1;
      const a = 1;
    }
    
    // implementation
    export default class BlockScope extends ChildScope {
      addDeclaration(
        identifier: Identifier,
        context: AstContext,
        init: ExpressionEntity,
        destructuredInitPath: ObjectPath,
        kind: VariableKind
      ): LocalVariable {
        if (kind === 'var') {
          const name = identifier.name;
          const existingVariable =
            this.hoistedVariables?.get(name) ||
            (this.variables.get(name) as LocalVariable | undefined);
          if (existingVariable) {
            if (
              existingVariable.kind === 'var' ||
              (kind === 'var' && existingVariable.kind === 'parameter')
            ) {
              existingVariable.addDeclaration(identifier, init);
              return existingVariable;
            }
            return context.error(
              logRedeclarationError(name),
              identifier.start
            );
          }
          const declaredVariable = this.parent.addDeclaration(
            identifier,
            context,
            init,
            destructuredInitPath,
            kind
          );
          // Necessary to make sure the init is deoptimized for conditional declarations.
          // We cannot call deoptimizePath here.
          declaredVariable.markInitializersForDeoptimization();
          // We add the variable to this and all parent scopes to reliably detect conflicts
          this.addHoistedVariable(name, declaredVariable);
          return declaredVariable;
        }
        return super.addDeclaration(
          identifier,
          context,
          init,
          destructuredInitPath,
          kind
        );
      }
    }
    ts
    // case
    try {
    } catch (e) {
      const a = 1;
      const a = 2;
    }
    
    // implementation
    export default class CatchBodyScope extends ChildScope {
      addDeclaration(
        identifier: Identifier,
        context: AstContext,
        init: ExpressionEntity,
        destructuredInitPath: ObjectPath,
        kind: VariableKind
      ): LocalVariable {
        if (kind === 'var') {
          const name = identifier.name;
          const existingVariable =
            this.hoistedVariables?.get(name) ||
            (this.variables.get(name) as LocalVariable | undefined);
          if (existingVariable) {
            const existingKind = existingVariable.kind;
            if (
              existingKind === 'parameter' &&
              // If this is a destructured parameter, it is forbidden to redeclare
              existingVariable.declarations[0].parent.type ===
                NodeType.CatchClause
            ) {
              // If this is a var with the same name as the catch scope parameter,
              // the assignment actually goes to the parameter and the var is
              // hoisted without assignment. Locally, it is shadowed by the
              // parameter
              const declaredVariable = this.parent.parent.addDeclaration(
                identifier,
                context,
                UNDEFINED_EXPRESSION,
                destructuredInitPath,
                kind
              );
              // To avoid the need to rewrite the declaration, we link the variable
              // names. If we ever implement a logic that splits initialization and
              // assignment for hoisted vars, the "renderLikeHoisted" logic can be
              // removed again.
              // We do not need to check whether there already is a linked
              // variable because then declaredVariable would be that linked
              // variable.
              existingVariable.renderLikeHoisted(declaredVariable);
              this.addHoistedVariable(name, declaredVariable);
              return declaredVariable;
            }
            if (existingKind === 'var') {
              existingVariable.addDeclaration(identifier, init);
              return existingVariable;
            }
            return context.error(
              logRedeclarationError(name),
              identifier.start
            );
          }
        }
        // ... non-var declarations and the rest of the method are omitted
        // in this excerpt
      }
    }
    ts
    // case
    function fn() {
      const a = 1;
      const a = 2;
    }
    
    // implementation
    export default class FunctionBodyScope extends ChildScope {
      // There is stuff that is only allowed in function scopes, i.e. functions can
      // be redeclared, functions and var can redeclare each other
      addDeclaration(
        identifier: Identifier,
        context: AstContext,
        init: ExpressionEntity,
        destructuredInitPath: ObjectPath,
        kind: VariableKind
      ): LocalVariable {
        const name = identifier.name;
        const existingVariable =
          this.hoistedVariables?.get(name) ||
          (this.variables.get(name) as LocalVariable);
        if (existingVariable) {
          const existingKind = existingVariable.kind;
          if (
            (kind === 'var' || kind === 'function') &&
            (existingKind === 'var' ||
              existingKind === 'function' ||
              existingKind === 'parameter')
          ) {
            existingVariable.addDeclaration(identifier, init);
            return existingVariable;
          }
          context.error(logRedeclarationError(name), identifier.start);
        }
        const newVariable = new LocalVariable(
          identifier.name,
          identifier,
          init,
          destructuredInitPath,
          context,
          kind
        );
        this.variables.set(name, newVariable);
        return newVariable;
      }
    }
    ts
    // case1
    import { a } from './b';
    const a = 1;
    
    // case2
    import { a } from './b';
    import { a } from './b';
    
    // implementation
    export default class ModuleScope extends ChildScope {
      addDeclaration(
        identifier: Identifier,
        context: AstContext,
        init: ExpressionEntity,
        destructuredInitPath: ObjectPath,
        kind: VariableKind
      ): LocalVariable {
        if (this.context.module.importDescriptions.has(identifier.name)) {
          context.error(
            logRedeclarationError(identifier.name),
            identifier.start
          );
        }
        return super.addDeclaration(
          identifier,
          context,
          init,
          destructuredInitPath,
          kind
        );
      }
    }
    ts
    // case
    const a = 1;
    const a = 2;
    
    export default class Scope {
      /*
    Redeclaration rules:
    - var can redeclare var
    - in function scopes, function and var can redeclare function and var
    - var is hoisted across scopes, function remains in the scope it is declared
    - var and function can redeclare function parameters, but parameters cannot redeclare parameters
    - function cannot redeclare catch scope parameters
    - var can redeclare catch scope parameters in a way
    	- if the parameter is an identifier and not a pattern
    	- then the variable is still declared in the hoisted outer scope, but the initializer is assigned to the parameter
    - const, let, class, and function except in the cases above cannot redeclare anything
     */
      addDeclaration(
        identifier: Identifier,
        context: AstContext,
        init: ExpressionEntity,
        destructuredInitPath: ObjectPath,
        kind: VariableKind
      ): LocalVariable {
        const name = identifier.name;
        const existingVariable =
          this.hoistedVariables?.get(name) ||
          (this.variables.get(name) as LocalVariable);
        if (existingVariable) {
          if (kind === 'var' && existingVariable.kind === 'var') {
            existingVariable.addDeclaration(identifier, init);
            return existingVariable;
          }
          context.error(logRedeclarationError(name), identifier.start);
        }
        const newVariable = new LocalVariable(
          identifier.name,
          identifier,
          init,
          destructuredInitPath,
          context,
          kind
        );
        this.variables.set(name, newVariable);
        return newVariable;
      }
    }
  3. duplicate_exports

    ts
    export function logDuplicateExportError(name: string): RollupLog {
      return {
        code: DUPLICATE_EXPORT,
        message: `Duplicate export "${name}"`
      };
    }
    
    export default class Module {
      private assertUniqueExportName(name: string, nodeStart: number) {
        if (this.exports.has(name) || this.reexportDescriptions.has(name)) {
          this.error(logDuplicateExportError(name), nodeStart);
        }
      }
    }
    ts
    // case
    export default 1;
    export default 2;
    
    // implementation
    export default class Module {
      private addExport(
        node:
          | ExportAllDeclaration
          | ExportNamedDeclaration
          | ExportDefaultDeclaration
      ): void {
        if (node instanceof ExportDefaultDeclaration) {
          // export default foo;
    
          this.assertUniqueExportName('default', node.start);
          this.exports.set('default', {
            identifier: node.variable.getAssignedVariableName(),
            localName: 'default'
          });
        }
      }
    }
    ts
    // case
    export * as a from './b';
    export * as a from './b';
    
    // implementation
    export default class Module {
      private addExport(
        node: ExportAllDeclaration | ExportNamedDeclaration
      ): void {
        if (node instanceof ExportAllDeclaration) {
          const source = node.source.value;
          this.addSource(source, node);
          if (node.exported) {
            // export * as name from './other'
    
            const name =
              node.exported instanceof Literal
                ? node.exported.value
                : node.exported.name;
            this.assertUniqueExportName(name, node.exported.start);
            this.reexportDescriptions.set(name, {
              localName: '*',
              module: null as never, // filled in later,
              source,
              start: node.start
            });
          } else {
            // export * from './other'
    
            this.exportAllSources.add(source);
          }
        }
      }
    }
    ts
    // case
    export { a } from './b';
    export { a } from './b';
    
    // implementation
    export default class Module {
      private addExport(
        node: ExportAllDeclaration | ExportNamedDeclaration
      ): void {
        if (node.source instanceof Literal) {
          // export { name } from './other'
    
          const source = node.source.value;
          this.addSource(source, node);
          for (const { exported, local, start } of node.specifiers) {
            const name =
              exported instanceof Literal ? exported.value : exported.name;
            this.assertUniqueExportName(name, start);
            this.reexportDescriptions.set(name, {
              localName: local instanceof Literal ? local.value : local.name,
              module: null as never, // filled in later,
              source,
              start
            });
          }
        }
      }
    }
    ts
    // case1
    export const a = 1;
    export const a = 2;
    
    // case2
    export function a() {}
    export function a() {}
    
    // case3
    export { a, a };
    
    // implementation
    export default class Module {
      private addExport(node: ExportNamedDeclaration): void {
        if (node.declaration) {
          const declaration = node.declaration;
          if (declaration instanceof VariableDeclaration) {
            // export var { foo, bar } = ...
            // export var foo = 1, bar = 2;
    
            for (const declarator of declaration.declarations) {
              for (const localName of extractAssignedNames(declarator.id)) {
                this.assertUniqueExportName(localName, declarator.id.start);
                this.exports.set(localName, { identifier: null, localName });
              }
            }
          } else {
            // export function foo () {}
    
            const localName = (declaration.id as Identifier).name;
            this.assertUniqueExportName(localName, declaration.id!.start);
            this.exports.set(localName, { identifier: null, localName });
          }
        }
      }
    }
  4. no_dupe_args

    ts
    export function logDuplicateArgumentNameError(name: string): RollupLog {
      return {
        code: DUPLICATE_ARGUMENT_NAME,
        message: `Duplicate argument name "${name}"`
      };
    }
    ts
    // case
    function fn(a, a) {}
    
    // implementation
    export default class ParameterScope extends ChildScope {
      /**
       * Adds a parameter to this scope. Parameters must be added in the correct
       * order, i.e. from left to right.
       */
      addParameterDeclaration(
        identifier: Identifier,
        argumentPath: ObjectPath
      ): ParameterVariable {
        const { name, start } = identifier;
        const existingParameter = this.variables.get(name);
        if (existingParameter) {
          return this.context.error(
            logDuplicateArgumentNameError(name),
            start
          );
        }
        const variable = new ParameterVariable(
          name,
          identifier,
          argumentPath,
          this.context
        );
        this.variables.set(name, variable);
        // We also add it to the body scope to detect name conflicts with local
        // variables. We still need the intermediate scope, though, as parameter
        // defaults are NOT taken from the body scope but from the parameters or
        // outside scope.
        this.bodyScope.addHoistedVariable(name, variable);
        return variable;
      }
    }
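The check above mirrors JavaScript's own strict-mode behavior. A small sketch (using `new Function` so each body's mode is controlled explicitly) shows why the bundle-time check matters:

```javascript
// Duplicate parameter names are legal in sloppy mode (the last binding
// wins) but a SyntaxError in strict mode. ES modules are always strict,
// which is why the bundler rejects them up front.
const sloppy = new Function('a', 'a', 'return a;');
console.log(sloppy(1, 2)); // 2: the second `a` shadows the first

let strictThrew = false;
try {
  new Function('a', 'a', '"use strict"; return a;');
} catch (error) {
  strictThrew = error instanceof SyntaxError;
}
console.log(strictThrew); // true
```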

From the implementations above, it is clear that semantic analysis depends heavily on the lexical scope in which each AST node resides. These checks are also only the most basic ones: internally, Rollup performs further semantic analyses, such as side-effect analysis, module circular-dependency analysis, and strict syntax restrictions (for example, namespace objects cannot be called and imported bindings cannot be reassigned), all of which go beyond what Acorn can do.

Because the internal implementation of swc_ecma_lints may have performance issues, this is only a temporary solution. In the future, Rollup will add scope analysis to the execution context on the Rust side and implement complete semantic analysis there; at that point, the entire semantic-analysis task can be handed over to Rust.

Optimize AST Parsing

Rollup exposes this.parser on the plugin context so that user plugins can use native SWC capabilities to parse code into an AST. A plugin can return the parsed AST from its load and transform hooks, and Rollup will reuse it instead of parsing again.

If a plugin does not return an AST from its load or transform hooks, Rollup falls back to parsing on its own: after the transform stage completes, the transformed code is parsed into an ESTree AST using the native Rust parser.
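As a sketch of this flow, a plugin can hand Rollup a pre-parsed AST from its transform hook. The exact shape of the parsing API on the plugin context (written here as `this.parser.parse`) is an assumption of this sketch, not a confirmed signature:

```javascript
// Hypothetical plugin: parses once via the context's native parser and
// returns the AST so Rollup reuses it instead of falling back to its
// own parse after the transform stage.
function prebuiltAstPlugin() {
  return {
    name: 'prebuilt-ast',
    transform(code) {
      const ast = this.parser.parse(code); // assumed API shape
      return { code, ast, map: null };     // returning `ast` enables reuse
    }
  };
}
```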

Precautions for Using this.parser

Currently, Rollup has removed AST semantic analysis from the Rust side. In other words, an AST produced through the this.parser API in the plugin context has not undergone semantic analysis.

If a user plugin needs to guarantee that the AST it generates satisfies these semantic rules, it must run the semantic analysis itself with other tools.

If the plugin does not need to guarantee this up front, Rollup will perform semantic analysis automatically when it recursively instantiates its AST node classes.

Even with native parsing capabilities, generating a complex AST still takes time. In watch mode, Rollup therefore caches the ESTree AST (see the Rollup Incremental Build section for details) to skip native SWC parsing entirely, and only walks the cached ESTree structure recursively to instantiate its internal AST node objects.

Performance Comparison

Rollup Optimization vs Direct SWC JavaScript API

Before diving into the performance comparison, it is essential to clarify a key concept: Rollup's native parsing optimization and directly using @swc/core's JavaScript API are completely different implementation approaches.

Key Distinction

Directly using @swc/core JavaScript API:

ts
import swc from '@swc/core';
const ast = await swc.parse(code); // JSON serialization/deserialization

Rollup optimization:

ts
// Rust side: SWC parsing → SWC AST → convert to ESTree AST → write to ArrayBuffer
// JavaScript side: build AST instances directly from ArrayBuffer
const astBuffer = await parseAsync(code);
const ast = convertProgram(astBuffer); // no JSON parsing

The former requires a complete JSON serialization (Rust) → JSON deserialization (JavaScript) round trip, while the latter transfers the data as binary through an ArrayBuffer with almost no conversion overhead.
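As a toy illustration of the difference (not Rollup's actual wire format), an AST node can be encoded as fixed-width integers in a buffer and decoded with plain offset reads, with no JSON text to scan:

```javascript
// Encode one node as three little-endian u32s: a numeric node-type id
// plus the source span. Decoding is direct offset reads from the buffer.
function encodeNode(type, start, end) {
  const buffer = new ArrayBuffer(12);
  const view = new DataView(buffer);
  view.setUint32(0, type, true);
  view.setUint32(4, start, true);
  view.setUint32(8, end, true);
  return buffer;
}

function decodeNode(buffer) {
  const view = new DataView(buffer);
  return {
    type: view.getUint32(0, true),
    start: view.getUint32(4, true),
    end: view.getUint32(8, true)
  };
}

console.log(decodeNode(encodeNode(7, 0, 42))); // { type: 7, start: 0, end: 42 }
```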

Pure Parser Performance Benchmark

To verify the serialization overhead problem mentioned in the Native Interaction Challenge, we conducted a pure parser performance benchmark comparing the JavaScript API performance of each parser directly.

Test Environment

bash
Node.js:  v22.14.0
Platform: darwin arm64 (Apple M1)
Memory:   16GB

Parser versions:
- @swc/core:     ^1.15.2  (Rust implementation)
- rollup:        ^4.53.2  (Rust implementation)
- acorn:         ^8.15.0  (pure JavaScript)
- @babel/parser: ^7.28.5  (pure JavaScript)

Test Results

| File | Size | SWC | Rollup | Acorn | Babel | Fastest | SWC Slowdown |
| --- | --- | --- | --- | --- | --- | --- | --- |
| colors.js | 1.1 KB | 11,631 | 55,046 | 55,694 | 51,696 | Acorn | 4.79x |
| underscore | 42.5 KB | 218 | 947 | 894 | 818 | Rollup | 4.34x |
| backbone | 58.7 KB | 201 | 835 | 805 | 681 | Rollup | 4.15x |
| mootools | 156.7 KB | 43 | 194 | 183 | 159 | Rollup | 4.54x |
| jquery | 262 KB | 29 | 141 | 139 | 99 | Rollup | 4.86x |
| yui | 330.4 KB | 42 | 173 | 202 | 163 | Acorn | 4.78x |
| jquery.mobile | 442.2 KB | 20 | 84 | 93 | 50 | Acorn | 4.52x |
| angular | 701.9 KB | 25 | 96 | 117 | 67 | Acorn | 4.70x |
| three.js | 1.2 MB | 6 | 24 | 23 | 14 | Rollup | 3.87x |
| larger.js | 2.3 MB | 3 | 15 | 12 | 9 | Rollup | 4.37x |
| typescript.js | 8.2 MB | 1 | 4 | 4 | 2 | Rollup/Acorn | 4.00x |

Unit: ops/sec (higher is better).


Interpreting Test Results

Testing found that directly using @swc/core's JavaScript API was the slowest in every benchmark, on average 4.43 times slower than the pure-JavaScript implementations.

This is exactly the problem described in the Native Interaction Challenge:

bash
Complete call chain:
JavaScript
  ↓ [FFI call overhead]
Rust parser (fast!)
  ↓ [JSON serialization: serde_json::to_string]
JSON string
  ↓ [transfer]
JavaScript
  ↓ [JSON deserialization: JSON.parse]
JavaScript AST object

Total overhead = FFI + serialization + deserialization >> Rust algorithm advantage

AST Serialization Size Comparison

| File | Source Size | SWC AST | Rollup AST | Acorn AST | Babel AST | SWC/Acorn | Babel/Acorn |
| --- | --- | --- | --- | --- | --- | --- | --- |
| colors.js | 1.1 KB | 8,885 | 6,826 | 6,826 | 21,468 | 1.30x | 3.15x |
| underscore | 42.5 KB | 611,325 | 409,026 | 409,026 | 1,158,554 | 1.49x | 2.83x |
| backbone | 58.7 KB | 676,338 | 492,315 | 492,315 | 1,407,989 | 1.37x | 2.86x |
| mootools | 156.7 KB | 3,025,656 | 2,207,324 | 2,207,324 | 5,490,015 | 1.37x | 2.49x |
| jquery | 262 KB | 3,706,172 | 2,684,140 | 2,684,140 | 7,296,218 | 1.38x | 2.72x |
| yui | 330.4 KB | 2,687,894 | 2,100,733 | 2,100,733 | 5,743,729 | 1.28x | 2.73x |
| jquery.mobile | 442.2 KB | 5,853,254 | 4,627,787 | 4,627,787 | 12,238,361 | 1.26x | 2.64x |
| angular | 701.9 KB | 4,371,859 | 3,000,617 | 3,000,617 | 9,127,292 | 1.46x | 3.04x |
| three.js | 1.2 MB | 18,546,954 | 13,789,219 | 13,751,310 | 34,315,473 | 1.35x | 2.50x |
| larger.js | 2.3 MB | 35,738,746 | 27,911,674 | 27,835,504 | 69,421,956 | 1.28x | 2.49x |
| typescript.js | 8.2 MB | 91,256,349 | 67,612,837 | 67,567,418 | 178,461,426 | 1.35x | 2.64x |
| Average | - | - | - | - | - | 1.35x | 2.74x |


Unit: Serialized character count

Key Findings:

  • SWC AST is on average 35% larger than Acorn (1.35x), meaning more data needs to be serialized.
  • Babel AST is on average 174% larger than Acorn (2.74x), but Babel is a pure JS implementation without cross-language serialization.
  • When parsing the 8MB TypeScript.js, SWC needs to serialize 91MB of AST JSON.
  • Even small files (1KB) require serializing nearly 9KB of AST data.
  • Serialization overhead grows linearly with file size, which is why directly using SWC's JavaScript API is so slow.

Rollup Optimization Performance

Precisely because directly using SWC's JavaScript API has severe serialization overhead problems, Rollup adopted the ArrayBuffer optimization:

  1. Avoids JSON serialization: Writes directly to binary ArrayBuffer on the Rust side.
  2. Avoids JSON deserialization: JavaScript side reads data directly from ArrayBuffer.
  3. Smaller size: ArrayBuffer size is only about 1/3 of JSON.

Testing also found that once the parsed character count reaches 319,869,952, Acorn runs out of memory while building the AST: V8 hits its heap limit, as the log below shows.

bash
<--- Last few GCs --->

[69821:0x120078000]    15364 ms: Mark-sweep 4062.9 (4143.2) -> 4059.0 (4143.2) MB, 703.2 / 0.0 ms  (average mu = 0.293, current mu = 0.102) allocation failure; scavenge might not succeed
[69821:0x120078000]    16770 ms: Mark-sweep 4075.3 (4143.2) -> 4071.5 (4169.0) MB, 1383.5 / 0.0 ms  (average mu = 0.143, current mu = 0.016) allocation failure; scavenge might not succeed


<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory

In other words, 319,869,952 characters converts to:

  • UTF-8 file size: approximately 320 MB (assuming primarily ASCII)
  • JavaScript memory usage: approximately 640 MB (UTF-16 encoding, double)
  • Post JSON.parse objects: potentially exceeding 1 GB

This explains why Acorn encounters memory overflow issues at this scale.
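The conversion above can be checked with simple arithmetic, assuming an ASCII-dominated source:

```javascript
// 1 ASCII char = 1 byte in UTF-8 on disk, but 2 bytes per code unit
// once held as a string in V8 (UTF-16).
const chars = 319_869_952;
console.log(`${(chars / 1e6).toFixed(0)} MB on disk (UTF-8)`);          // 320 MB
console.log(`${((chars * 2) / 1e6).toFixed(0)} MB in memory (UTF-16)`); // 640 MB
```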

Performance Analysis Summary

  1. Problems with directly using SWC JavaScript API:

    | Problem | Cause | Impact |
    | --- | --- | --- |
    | FFI call overhead | JavaScript ↔ Rust boundary | Costs accumulate with frequent calls |
    | JSON serialization | serde_json::to_string | Serializing a large AST is time-consuming |
    | JSON deserialization | JSON.parse | Parsing large JSON strings is slow |
    | Memory usage | Generating intermediate JSON strings | Extra memory allocation |

    Test results: 3.87 - 4.86 times slower than pure JavaScript parsers (average 4.43 times)

  2. Advantages of Rollup's ArrayBuffer optimization:

    | Optimization | Implementation | Benefit |
    | --- | --- | --- |
    | Avoids JSON serialization | Direct ArrayBuffer write | Reduces Rust-side overhead |
    | Avoids JSON deserialization | Direct ArrayBuffer read | Reduces JavaScript-side overhead |
    | Size optimization | Binary format | Approximately 1/3 the size |
    | Zero-copy transfer | SharedArrayBuffer | Efficient inter-thread transfer |

    Test results:

    • Significant advantage for small files: In 1.1KB files, Rollup achieves 55,046 ops/sec, only 1.16% slower than Acorn (55,694 ops/sec)
    • Excellent performance for medium-large files: In 42.5KB - 2.3MB files, Rollup is 5.9% - 25% faster than Acorn
    • Even for very large files: In the 8.2MB TypeScript.js, both Rollup and Acorn achieve 4 ops/sec
  3. AST Serialization Overhead Analysis:

    Based on actual test data, AST serialization size directly impacts performance:

    | File | Source Size | SWC AST | Acorn AST | Babel AST | SWC Inflation | Babel Inflation |
    | --- | --- | --- | --- | --- | --- | --- |
    | colors.js | 1.1 KB | 8,885 (≈8.9 KB) | 6,826 (≈6.8 KB) | 21,468 (≈21.5 KB) | 1.30x | 3.15x |
    | jquery | 262 KB | 3,706,172 (≈3.71 MB) | 2,684,140 (≈2.68 MB) | 7,296,218 (≈7.30 MB) | 1.38x | 2.72x |
    | typescript.js | 8.2 MB | 91,256,349 (≈91.3 MB) | 67,567,418 (≈67.6 MB) | 178,461,426 (≈178.5 MB) | 1.35x | 2.64x |

    Data Conversion Note: Relationship Between Character Count and Byte Count

    The numbers in the table come from JSON.stringify(ast).length, representing character count (UTF-16 code unit count).

    Conversion uses SI units (decimal):

    bash
    1 KB = 1,000 bytes
    1 MB = 1,000,000 bytes
    1 GB = 1,000,000,000 bytes

    Note: Binary units can also be used (1 KiB = 1,024 bytes, 1 MiB = 1,048,576 bytes), but to maintain consistency with the benchmark data source, this article uniformly uses SI units.

    Why does character count ≈ UTF-8 file size?

    Because JSON AST content primarily consists of ASCII characters (letters, numbers, punctuation, keywords like "type", "start", etc.), and ASCII characters occupy 1 byte in UTF-8 encoding:

    javascript
    const fs = require('node:fs');
    
    // `ast` is the parsed AST object produced by the benchmark above
    const astJson = JSON.stringify(ast);
    console.log(astJson.length);                     // 91,256,349 characters
    console.log(Buffer.byteLength(astJson, 'utf8')); // ≈91,256,349 bytes
    
    // Conversion using SI units
    fs.writeFileSync('ast.json', astJson, 'utf8');
    console.log(fs.statSync('ast.json').size);        // 91,256,349 bytes
    console.log((91_256_349 / 1_000_000).toFixed(1)); // "91.3" → 91.3 MB

    But JavaScript memory usage is double!

    JavaScript internally uses UTF-16 encoding to store strings. In UTF-16, each code unit occupies 2 bytes:

    javascript
    console.log(Buffer.byteLength(astJson, 'utf16le')); // ≈182.5 MB

    Why is memory double?

    • 1 UTF-16 code unit = 2 bytes (fixed)
    • JSON AST characters are almost all ASCII/BMP characters ("type", "start", {, }, etc.)
    • Each BMP character = 1 code unit = 2 bytes
    • Therefore: 91,256,349 characters = 91,256,349 code units × 2 = 182,512,698 bytes ≈ 182.5 MB

    Special Cases in UTF-16

    Supplementary plane characters (such as emoji 😀) require 2 code units (called surrogate pairs):

    javascript
    const emoji = "😀";
    console.log(emoji.length);                        // 2 (2 UTF-16 code units)
    console.log(Buffer.byteLength(emoji, 'utf16le')); // 4 bytes
    console.log([...emoji].length);                   // 1 (actual character count)

    But JSON AST does not contain emoji, so it can be simplified to: 1 character = 2 bytes.

    Complete Encoding Conversion Flow:

    | Phase | Data Size | Encoding | Description |
    | --- | --- | --- | --- |
    | Rust serialization | ≈91.3 MB | UTF-8 | serde_json::to_string |
    | Cross FFI transfer | ≈91.3 MB | UTF-8 | Passed to JavaScript |
    | JavaScript memory | ≈182.5 MB | UTF-16 | Node.js auto-converts to UTF-16 |
    | Post JSON.parse object | Hundreds of MB | - | Parsed object structure takes more memory |

    This also explains why JSON.parse on large JSON is so slow: not only must it parse 91M characters, but it also needs to construct object structures in memory occupying hundreds of MB.


    Key Findings:

    • SWC AST is on average 35% larger than Acorn's (1.35x), so more data must be serialized
    • Babel AST is on average 174% larger than Acorn's (2.74x), but Babel is a pure-JS implementation with no cross-language serialization
    • Parsing the 8.2MB typescript.js requires SWC to serialize 91.3MB of UTF-8 JSON, which occupies 182.5MB as a UTF-16 string in JavaScript
    • Even a small 1.1KB file requires serializing nearly 9KB of AST data
    • Serialization overhead grows linearly with file size, which is the fundamental reason the SWC JavaScript API is slow when called directly
    • Double memory pressure: UTF-8 serialization plus UTF-16 deserialization brings peak memory to roughly twice the serialized JSON size
  4. Stability Analysis:

    Based on Relative Margin of Error (RME %) test data:

    | Parser | Average Error | Best Scenario | Characteristics |
    | --- | --- | --- | --- |
    | Acorn | ±3.65% | angular (±1.25%) | Most stable, suitable for production |
    | Rollup | ±3.69% | mootools (±0.88%) | Stability close to Acorn |
    | Babel | ±4.09% | colors.js (±1.01%) | Stable for small files, more volatile for large files |
    | SWC | ±4.51% | typescript.js (±0.98%) | Excellent stability for very large files (no GC pauses) |

    Unexpected finding: Although SWC is the slowest, it shows excellent stability for very large files. This is because Rust has no GC pauses, while pure JS implementations are affected by V8 GC and JIT optimization.

  5. Detailed Performance Multiplier Comparison:

    SWC's performance gap relative to the fastest parser:

    | File | File Size | Fastest Parser | SWC Slower By |
    | --- | --- | --- | --- |
    | colors.js | 1.1 KB | Acorn | 4.79x |
    | jquery | 262 KB | Rollup | 4.86x |
    | mootools | 156.7 KB | Rollup | 4.54x |
    | three.js | 1.2 MB | Rollup | 3.87x |
    | typescript.js | 8.2 MB | Rollup/Acorn | 4.00x |

    Average slowdown: 4.43x (range: 3.87x - 4.86x)

  6. Trend Analysis:

    • Rollup optimization: Parsing time growth is small, suitable for large-scale module parsing
    • Acorn: Parsing time growth is larger, but still competent for very large module scenarios
    • SWC: Consistently 4.43 times slower than pure JS implementations, proving FFI + JSON serialization overhead exceeds algorithm advantages
    • Extreme scenarios: At 300MB+ code volumes, Acorn encounters memory overflow, while Rollup's optimization can handle it normally

Core Conclusion

Rollup's performance improvement does not come from simply switching to SWC, but from the carefully designed ArrayBuffer optimization.

If @swc/core's JavaScript API is used directly, performance actually drops significantly (average 4.43 times slower).

Key Data Support:

  • SWC needs to serialize 35% more AST than Acorn (8MB source → 91MB UTF-8 JSON → 182MB UTF-16 memory)
  • Double memory pressure: UTF-8 serialization (91MB) plus UTF-16 deserialization (182MB) brings actual memory usage to roughly twice the serialized JSON size
  • The triple overhead of FFI boundary + JSON serialization/deserialization + encoding conversion completely negates Rust's algorithm advantage
  • Rollup avoids JSON serialization and encoding conversion through ArrayBuffer, reducing size to 1/3, achieving genuine performance improvement

This case illustrates the point well: native code is not automatically faster. Cross-language boundary costs, data serialization, and character-encoding conversion must all be accounted for. Rollup realizes Rust's true performance advantage only by eliminating the serialization bottleneck and encoding-conversion overhead.

Released under the CC BY-SA 4.0 License. (fc15202)