Native Parser
Reference Materials
Compared to JavaScript, native Rust has clear performance advantages in algorithm execution. Rollup therefore decided to switch from the JavaScript-based Acorn parser to the Rust-based SWC parser, which can parse complex ASTs efficiently. This is a core change in Rollup v4.
Challenges
Native Interaction
Directly using SWC's JavaScript bindings and parsing complex ASTs through the SWC.parse JavaScript interface would incur significant communication overhead.
import swc from '@swc/core';
const code = `
const a = 1;
function add(a, b) {
return a + b;
}
`;
swc
.parse(code, {
syntax: 'ecmascript',
comments: false,
script: true,
target: 'es3',
isModule: false
})
.then(module => {
module.type; // file type
module.body; // AST
});
Reading SWC's source code reveals that SWC internally uses the serde_json library to serialize the parsed program object into a JSON string, which is then passed to the JavaScript side.
#[napi]
impl Task for ParseTask {
type JsValue = String;
type Output = String;
fn compute(&mut self) -> napi::Result<Self::Output> {
let options: ParseOptions = deserialize_json(&self.options)?;
let fm = self
.c
.cm
.new_source_file(self.filename.clone().into(), self.src.clone());
let comments = if options.comments {
Some(self.c.comments() as &dyn Comments)
} else {
None
};
let program = try_with(self.c.cm.clone(), false, ErrorFormat::Normal, |handler| {
let mut p = self.c.parse_js(
fm,
handler,
options.target,
options.syntax,
options.is_module,
comments,
)?;
p.visit_mut_with(&mut resolver(
Mark::new(),
Mark::new(),
options.syntax.typescript(),
));
Ok(p)
})
.convert_err()?;
let ast_json = serde_json::to_string(&program)?;
Ok(ast_json)
}
fn resolve(&mut self, _env: Env, result: Self::Output) -> napi::Result<Self::JsValue> {
Ok(result)
}
}
The JavaScript side then deserializes the AST string returned by the native parser into a JavaScript object via JSON.parse.
class Compiler {
async parse(
src: string,
options?: ParseOptions,
filename?: string
): Promise<Program> {
options = options || { syntax: 'ecmascript' };
options.syntax = options.syntax || 'ecmascript';
if (!bindings && !!fallbackBindings) {
throw new Error(
'Fallback bindings does not support this interface yet.'
);
} else if (!bindings) {
throw new Error('Bindings not found.');
}
if (bindings) {
const res = await bindings.parse(src, toBuffer(options), filename);
return JSON.parse(res);
} else if (fallbackBindings) {
return fallbackBindings.parse(src, options);
}
throw new Error('Bindings not found.');
}
}
Between Rust and JavaScript, repeatedly serializing (on the Rust side) and deserializing (on the JavaScript side) the AST would almost completely erode the performance advantage that the native Rust parser gains when parsing complex ASTs.
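To get a feel for this cost, here is an illustrative sketch of the round trip. The AST-like object and its size are made up for illustration; the point is that both steps scale with AST size:

```javascript
// Illustrative only: a large AST-like object round-tripped through JSON,
// mimicking what happens between the Rust side (serialize) and the
// JavaScript side (deserialize).
const fakeAst = { type: 'Program', body: [] };
for (let i = 0; i < 100000; i += 1) {
  fakeAst.body.push({ type: 'ExpressionStatement', start: i, end: i + 1 });
}
const json = JSON.stringify(fakeAst); // conceptually what serde_json produces
const parsed = JSON.parse(json);      // what the JavaScript side must then do
// Both steps walk the entire tree, which is the overhead described above.
```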
AST Compatibility
SWC has designed its own unique AST structure for the Rust side, while Rollup depends on the standard ESTree AST. The two differ in AST structure, so compatibility processing is needed.
It is worth noting that SWC provides the swc_estree_compat compatibility layer, which offers parsed output in both Babel AST and ESTree AST structures, but there are still performance issues.
"Nearly, but it would be very slow at the moment, because JSON.parse of a large AST is very slow."
File Encoding
SWC uses UTF-8 encoding, while Rollup depends on standard JavaScript's UTF-16 encoding.
Differences between UTF-8 and UTF-16
UTF-8:
Variable Length Encoding:
UTF-8 uses 1 ~ 4 bytes to represent a character. ASCII characters (such as English letters and numbers) use 1 byte, while other characters (such as Chinese characters) may use 2 ~ 4 bytes.
- 1 byte: ASCII characters (U+0000 to U+007F).
- 2 bytes: extended Latin characters (U+0080 to U+07FF).
- 3 bytes: Basic Multilingual Plane (BMP) characters (U+0800 to U+FFFF).
- 4 bytes: supplementary plane characters (U+10000 to U+10FFFF).
Backward Compatible with ASCII:
Since ASCII characters only occupy 1 byte in UTF-8, UTF-8 is fully compatible with ASCII encoding.
Encoding Efficiency:
- High efficiency for English and ASCII text (1 byte per character).
- Non-Latin characters (such as Chinese and Japanese) typically require 3 bytes.
- Supplementary plane characters (such as emoji) require 4 bytes.
Use Cases:
- More suitable for network transmission and storage, especially for text that is primarily ASCII.
- Commonly used in web pages, JSON files, and similar scenarios.
UTF-16:
Fixed or Variable Length Encoding:
UTF-16 typically uses 2 bytes to represent most commonly used characters, but for certain special characters (such as emojis), it may require 4 bytes.
- 2 bytes: characters within the BMP range (U+0000 to U+FFFF, excluding surrogate pairs).
- 4 bytes: characters beyond the BMP (U+10000 to U+10FFFF), encoded as two 16-bit units called surrogate pairs.
Not Compatible with ASCII:
UTF-16 is not compatible with ASCII, because ASCII characters occupy 2 bytes in UTF-16 rather than 1. Both UTF-8 and UTF-16, however, treat each ASCII character as a single unit.
Encoding Efficiency:
- High efficiency for characters within the BMP range (such as most Chinese and Japanese text): 2 bytes per character.
- Low efficiency for ASCII characters: 2 bytes per character.
- Similar efficiency to UTF-8 for supplementary plane characters: 4 bytes.
Use Cases:
- More suitable for in-memory operations, especially in scenarios dominated by BMP characters (such as Chinese-language environments).
- Commonly used as the internal character representation in Windows, JavaScript, and Java.
Taking the string A你 as an example, the encoding results for the two methods are as follows:
UTF-8 Encoding:
"A": 1 byte, encoded as 0x41
"你": 3 bytes, encoded as 0xE4BDA0
UTF-16 Encoding:
"A": 2 bytes, encoded as 0x0041
"你": 2 bytes, encoded as 0x4F60
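These encodings can be verified directly in Node.js (a small sketch using the standard TextEncoder API):

```javascript
// Verifying the byte and code-unit counts for "A你".
const utf8 = new TextEncoder().encode('A你');
utf8.length;          // 4 bytes: 1 for "A" + 3 for "你"
utf8[0];              // 0x41 ("A")
'A你'.length;         // 2 UTF-16 code units
'A你'.charCodeAt(1);  // 0x4F60 ("你")
```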
SWC (Rust) uses byte offsets. In other words, when calculating the position offset of the source text "A你" (quotes included), it counts bytes:
"A你": "(1) + A(1) + 你(3) + "(1) = 6 bytes (i.e., Rust calculates "A你".len() == 4 for the unquoted string).
The position information recorded in the SWC AST is as follows:
{
"type": "Module",
"span": {
"start": 0,
"end": 6,
"ctxt": 0
},
"body": [
{
"type": "ExpressionStatement",
"span": {
"start": 0,
"end": 6,
"ctxt": 0
},
"expression": {
"type": "StringLiteral",
"span": {
"start": 0,
"end": 6,
"ctxt": 0
},
"value": "A你",
"hasEscape": false,
"kind": {
"type": "normal",
"containsQuote": true
}
}
}
],
"interpreter": null
}
JavaScript uses character offsets based on the UTF-16 encoding model, in which every character is represented by one or two 2-byte code units. For JavaScript, the basic unit of a string is the 2-byte code unit, so a special character such as an emoji occupies 4 bytes and therefore counts as 2 characters.
When calculating the position offset of the source text "A你" (quotes included), the character-offset calculation is:
"A你": "(1) + A(1) + 你(1) + "(1) = 4 characters (i.e., JavaScript calculates "A你".length === 2 for the unquoted string).
The position information recorded in the ESTree AST is as follows:
{
"type": "Program",
"start": 0,
"end": 4,
"body": [
{
"type": "ExpressionStatement",
"start": 0,
"end": 4,
"expression": {
"type": "Literal",
"start": 0,
"end": 4,
"value": "A你",
"raw": "\"A你\""
},
"directive": "A你"
}
],
"sourceType": "module"
}
Summary
The phenomenon described above is precisely the root cause of the divergence between character offsets (ESTree) and byte offsets (SWC).
ESTree / Babel / Acorn (Character Offsets):
Follows JavaScript's String.length logic.
Counts the number of UTF-16 encoding units (Code Units).
"你好": "(1) + 你(1) + 好(1) + "(1) = 4 units (i.e., length 4).
"👍": "(1) + 👍(2) + "(1) = 4 units (i.e., length 4).
JavaScript (and ESTree) counts lengths and offsets in 2-byte UTF-16 code units, so a 4-byte emoji contributes 2 to the length.
SWC (Byte Offsets):
Counts the number of bytes in the source file (typically UTF-8 encoded).
"你好": "(1) + 你(3) + 好(3) + "(1) = 8 bytes (i.e., length 8).
"👍": "(1) + 👍(4) + "(1) = 6 bytes (i.e., length 6).
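The two counting schemes can be contrasted directly in Node.js (quotes omitted for brevity):

```javascript
// Code-unit counts (what JavaScript/ESTree use):
const codeUnits = {
  nihao: '你好'.length, // 2
  thumb: '👍'.length,   // 2 (one code point, two UTF-16 code units)
};
// Byte counts (what SWC uses, assuming UTF-8 source):
const utf8Bytes = {
  nihao: Buffer.byteLength('你好', 'utf8'), // 6
  thumb: Buffer.byteLength('👍', 'utf8'),   // 4
};
```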
The SourceMap chapter details how Rollup internally generates SourceMap, where Rollup relies on the position information provided by ESTree AST for mapping markers.
export class NodeBase extends ExpressionEntity implements ExpressionNode {
/**
* Override to perform special initialisation steps after the scope is
* initialised
*/
initialise(): void {
this.scope.context.magicString.addSourcemapLocation(this.start);
this.scope.context.magicString.addSourcemapLocation(this.end);
}
}
Therefore, because the native side (Rust) and JavaScript use different encodings, the AST position information they produce is inconsistent. Rollup must readjust the SWC AST position information on the Rust native side to conform to JavaScript's character-offset calculation.
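The required byte-offset to character-offset adjustment can be sketched in JavaScript. This is a simplified illustration, not Rollup's actual code (which performs the conversion incrementally in Rust):

```javascript
// Hedged sketch: convert a UTF-8 byte offset (as produced by SWC) into
// the UTF-16 code-unit offset that ESTree positions expect.
function utf8ToUtf16Offset(source, utf8Index) {
  let bytes = 0;
  let units = 0;
  for (const ch of source) { // iterates by code point
    if (bytes >= utf8Index) break;
    const cp = ch.codePointAt(0);
    // UTF-8 width per the table above: 1-4 bytes per code point
    bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    units += ch.length; // 1 or 2 UTF-16 code units
  }
  return units;
}
const offset = utf8ToUtf16Offset('A你x', 4); // byte offset 4 (just after "你")
```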
Performance
Optimize AST Compatibility
On the Rust side, after leveraging SWC's ability to parse code into SWC AST:
use swc_compiler_base::parse_js;
pub fn parse_ast(code: String, allow_return_outside_function: bool, jsx: bool) -> Vec<u8> {
// other code omitted
GLOBALS.set(&Globals::default(), || {
let result = catch_unwind(AssertUnwindSafe(|| {
let result = try_with_handler(&code_reference, |handler| {
parse_js(
cm,
file,
handler,
target,
syntax,
IsModule::Unknown,
Some(&comments),
)
});
match result {
Err(buffer) => buffer,
Ok(program) => {
let annotations = comments.take_annotations();
let converter = AstConverter::new(&code_reference, &annotations);
converter.convert_ast_to_buffer(&program)
}
}
}));
result.unwrap_or_else(|err| {
let msg = if let Some(msg) = err.downcast_ref::<&str>() {
msg
} else if let Some(msg) = err.downcast_ref::<String>() {
msg
} else {
"Unknown rust panic message"
};
get_panic_error_buffer(msg)
})
})
}
The converter.convert_ast_to_buffer(&program) method recursively walks the SWC AST, recalculating for each SWC AST node the position information of the corresponding ESTree AST node:
/// Converts the given UTF-8 byte index to a UTF-16 byte index.
///
/// To be performant, this method assumes that the given index is not smaller
/// than the previous index. Additionally, it handles "annotations" like
/// `@__PURE__` comments in the process.
///
/// The logic for those comments is as follows:
/// - If the current index is at the start of an annotation, the annotation
/// is collected and the index is advanced to the end of the annotation.
/// - Otherwise, we check if the next character is a white-space character.
/// If not, we invalidate all collected annotations.
/// This is to ensure that we only collect annotations that directly precede
/// an expression and are not e.g. separated by a comma.
/// - If annotations are relevant for an expression, it can "take" the
/// collected annotations by calling `take_collected_annotations`. This
/// clears the internal buffer and returns the collected annotations.
/// - Invalidated annotations are attached to the Program node so that they
/// can all be removed from the source code later.
/// - If an annotation can influence a child that is separated by some
/// non-whitespace from the annotation, `keep_annotations_for_next` will
/// prevent annotations from being invalidated when the next position is
/// converted.
pub(crate) fn convert(&mut self, utf8_index: u32, keep_annotations_for_next: bool) -> u32 {
if self.current_utf8_index > utf8_index {
panic!(
"Cannot convert positions backwards: {} < {}",
utf8_index, self.current_utf8_index
);
}
while self.current_utf8_index < utf8_index {
if self.current_utf8_index == self.next_annotation_start {
let start = self.current_utf16_index;
let (next_comment_end, next_comment_kind) = self
.next_annotation
.map(|a| (a.comment.span.hi.0 - 1, a.kind.clone()))
.unwrap();
while self.current_utf8_index < next_comment_end {
let character = self.character_iterator.next().unwrap();
self.current_utf8_index += character.len_utf8() as u32;
self.current_utf16_index += character.len_utf16() as u32;
}
if let Annotation(kind) = next_comment_kind {
self.collected_annotations.push(ConvertedAnnotation {
start,
end: self.current_utf16_index,
kind,
});
}
self.next_annotation = self.annotation_iterator.next();
self.next_annotation_start = get_annotation_start(self.next_annotation);
} else {
let character = self.character_iterator.next().unwrap();
if !(self.keep_annotations || self.collected_annotations.is_empty()) {
match character {
' ' | '\t' | '\r' | '\n' => {}
_ => {
self.invalidate_collected_annotations();
}
}
}
self.current_utf8_index += character.len_utf8() as u32;
self.current_utf16_index += character.len_utf16() as u32;
}
}
self.keep_annotations = keep_annotations_for_next;
self.current_utf16_index
}
It also needs to collect the information required by the ESTree AST node structure.
pub(crate) fn convert_statement(&mut self, statement: &Stmt) {
match statement {
Stmt::Break(break_statement) => self.store_break_statement(break_statement),
Stmt::Block(block_statement) => self.store_block_statement(block_statement, false),
Stmt::Continue(continue_statement) => self.store_continue_statement(continue_statement),
Stmt::Decl(declaration) => self.convert_declaration(declaration),
Stmt::Debugger(debugger_statement) => self.store_debugger_statement(debugger_statement),
Stmt::DoWhile(do_while_statement) => self.store_do_while_statement(do_while_statement),
Stmt::Empty(empty_statement) => self.store_empty_statement(empty_statement),
Stmt::Expr(expression_statement) => self.store_expression_statement(expression_statement),
Stmt::For(for_statement) => self.store_for_statement(for_statement),
Stmt::ForIn(for_in_statement) => self.store_for_in_statement(for_in_statement),
Stmt::ForOf(for_of_statement) => self.store_for_of_statement(for_of_statement),
Stmt::If(if_statement) => self.store_if_statement(if_statement),
Stmt::Labeled(labeled_statement) => self.store_labeled_statement(labeled_statement),
Stmt::Return(return_statement) => self.store_return_statement(return_statement),
Stmt::Switch(switch_statement) => self.store_switch_statement(switch_statement),
Stmt::Throw(throw_statement) => self.store_throw_statement(throw_statement),
Stmt::Try(try_statement) => self.store_try_statement(try_statement),
Stmt::While(while_statement) => self.store_while_statement(while_statement),
Stmt::With(_) => unimplemented!("Cannot convert Stmt::With"),
}
}
The information required for ESTree AST nodes is extracted from the SWC AST node structure, and the position information mandated by the ESTree AST specification is recalculated using UTF-16 offsets.
pub(crate) fn convert_item_list_with_state<T, S, F>(
&mut self,
item_list: &[T],
state: &mut S,
reference_position: usize,
convert_item: F,
) where
F: Fn(&mut AstConverter, &T, &mut S) -> bool,
{
// for an empty list, we leave the referenced position at zero
if item_list.is_empty() {
return;
}
self.update_reference_position(reference_position);
// store number of items in first position
self
.buffer
.extend_from_slice(&(item_list.len() as u32).to_ne_bytes());
let mut reference_position = self.buffer.len();
// make room for the reference positions of the items
self
.buffer
.resize(self.buffer.len() + item_list.len() * 4, 0);
for item in item_list {
let insert_position = (self.buffer.len() as u32) >> 2;
if convert_item(self, item, state) {
self.buffer[reference_position..reference_position + 4]
.copy_from_slice(&insert_position.to_ne_bytes());
}
reference_position += 4;
}
}
It also collects comment nodes in preparation for Rollup's later tree shaking. Note that the ESTree AST specification does not include comment nodes, but comment information is crucial for Rollup's tree shaking and strengthens what tree shaking can remove.
Rollup collects this comment information and stores it on the ESTree AST through the _rollupAnnotations property. In other words, the final returned ESTree AST additionally contains a _rollupAnnotations property while otherwise conforming to the ESTree AST specification.
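A sketch of what such an annotated node might look like on the JavaScript side. The node positions and the exact entry shape are assumptions for illustration; only the _rollupAnnotations property name comes from the text above:

```javascript
// Illustrative only: a CallExpression node carrying a collected
// /*@__PURE__*/ comment via the _rollupAnnotations property.
const annotatedNode = {
  type: 'CallExpression',
  start: 13,
  end: 18,
  _rollupAnnotations: [{ start: 0, end: 12, type: 'pure' }],
};
// Tree shaking can then check for a "pure" annotation on the call:
const isPure = annotatedNode._rollupAnnotations.some(a => a.type === 'pure');
```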
pub(crate) fn take_collected_annotations(
&mut self,
kind: AnnotationKind,
) -> Vec<ConvertedAnnotation> {
let mut relevant_annotations = Vec::new();
for annotation in self.collected_annotations.drain(..) {
if annotation.kind == kind {
relevant_annotations.push(annotation);
} else {
self.invalid_annotations.push(annotation);
}
}
relevant_annotations
}
impl<'a> AstConverter<'a> {
pub(crate) fn store_call_expression(
&mut self,
span: &Span,
is_optional: bool,
callee: &StoredCallee,
arguments: &[ExprOrSpread],
is_chained: bool,
) {
// annotations
let annotations = self
.index_converter
.take_collected_annotations(AnnotationKind::Pure);
}
}
impl SequentialComments {
pub(crate) fn add_comment(&self, comment: Comment) {
if comment.text.starts_with('#') && comment.text.contains("sourceMappingURL=") {
self.annotations.borrow_mut().push(AnnotationWithType {
comment,
kind: CommentKind::Annotation(AnnotationKind::SourceMappingUrl),
});
return;
}
let mut search_position = comment
.text
.chars()
.nth(0)
.map(|first_char| first_char.len_utf8())
.unwrap_or(0);
while let Some(Some(match_position)) = comment.text.get(search_position..).map(|s| s.find("__"))
{
search_position += match_position;
// Using a byte reference avoids UTF8 character boundary checks
match &comment.text.as_bytes()[search_position - 1] {
b'@' | b'#' => {
let annotation_slice = &comment.text[search_position..];
if annotation_slice.starts_with("__PURE__") {
self.annotations.borrow_mut().push(AnnotationWithType {
comment,
kind: CommentKind::Annotation(AnnotationKind::Pure),
});
return;
}
if annotation_slice.starts_with("__NO_SIDE_EFFECTS__") {
self.annotations.borrow_mut().push(AnnotationWithType {
comment,
kind: CommentKind::Annotation(AnnotationKind::NoSideEffects),
});
return;
}
}
_ => {}
}
search_position += 2;
}
self.annotations.borrow_mut().push(AnnotationWithType {
comment,
kind: CommentKind::Comment,
});
}
pub(crate) fn take_annotations(self) -> Vec<AnnotationWithType> {
self.annotations.take()
}
}
Finally, the ESTree-AST-compatible ArrayBuffer is passed to the Rollup side, and the JavaScript side needs to know how to decode this buffer in order to instantiate the AST node classes that Rollup implements internally.
export default class Module {
async setSource({
ast,
code,
customTransformCache,
originalCode,
originalSourcemap,
resolvedIds,
sourcemapChain,
transformDependencies,
transformFiles,
...moduleOptions
}: TransformModuleJSON & {
resolvedIds?: ResolvedIdMap;
transformFiles?: EmittedFile[] | undefined;
}): Promise<void> {
// Measuring asynchronous code does not provide reasonable results
timeEnd('generate ast', 3);
const astBuffer = await parseAsync(
code,
false,
this.options.jsx !== false
);
timeStart('generate ast', 3);
this.ast = convertProgram(astBuffer, programParent, this.scope);
}
}
Rollup's decoding logic at the buffer level:
function convertNode(
parent: Node | { context: AstContext; type: string },
parentScope: ChildScope,
position: number,
buffer: AstBuffer
): any {
const nodeType = buffer[position];
const NodeConstructor = nodeConstructors[nodeType];
/* istanbul ignore if: This should never be executed but is a safeguard against faulty buffers */
if (!NodeConstructor) {
console.trace();
throw new Error(`Unknown node type: ${nodeType}`);
}
const node = new NodeConstructor(parent, parentScope);
node.type = nodeTypeStrings[nodeType];
node.start = buffer[position + 1];
node.end = buffer[position + 2];
bufferParsers[nodeType](node, position + 3, buffer);
node.initialise();
return node;
}
Optimize Native Interaction
As mentioned above, directly using the JavaScript bindings exposed by SWC repeatedly serializes and deserializes the AST between Rust and JavaScript; when processing complex ASTs, this almost completely erodes the performance advantage of switching to the native Rust parser.
The solution is as follows:
- Use ArrayBuffer to transfer the parsed AST between Rust and JavaScript.
- Do not use SWC's JavaScript bindings; instead, call SWC's Rust API directly on the Rust side.
use swc_compiler_base::parse_js;
pub fn parse_ast(code: String, allow_return_outside_function: bool, jsx: bool) -> Vec<u8> {
GLOBALS.set(&Globals::default(), || {
let result = catch_unwind(AssertUnwindSafe(|| {
let result = try_with_handler(&code_reference, |handler| {
parse_js(
cm,
file,
handler,
target,
syntax,
IsModule::Unknown,
Some(&comments),
)
});
match result {
Err(buffer) => buffer,
Ok(program) => {
let annotations = comments.take_annotations();
let converter = AstConverter::new(&code_reference, &annotations);
converter.convert_ast_to_buffer(&program)
}
}
}));
});
}
At the same time, Rollup converts the SWC AST into an ESTree-AST-compatible binary format on the Rust side and then passes it to JavaScript as an (array) buffer.
match result {
Err(buffer) => buffer,
Ok(program) => {
let annotations = comments.take_annotations();
let converter = AstConverter::new(&code_reference, &annotations);
converter.convert_ast_to_buffer(&program)
}
Passing an ArrayBuffer is essentially lossless, so the only remaining work is to teach the JavaScript side how to reconstruct AST instances from the ArrayBuffer structure. In addition, the ArrayBuffer is only about one-third the size of the stringified JSON.
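The flat-buffer idea can be illustrated with a minimal sketch. This is not Rollup's actual layout; it only mirrors the convertNode pattern of reading a node type and two position slots from consecutive u32 values:

```javascript
// Illustrative only: encode one node as three u32 slots (type, start, end)
// in native byte order, then decode it the way a buffer-driven parser would.
const buffer = new Uint32Array(3);
buffer[0] = 7; // hypothetical node-type id
buffer[1] = 0; // start (UTF-16 offset)
buffer[2] = 4; // end (UTF-16 offset)

function decode(buf, position) {
  return { type: buf[position], start: buf[position + 1], end: buf[position + 2] };
}
const node = decode(buffer, 0);
```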
The ArrayBuffer format is also efficient to transfer between threads: for example, parsing can be done in a Web Worker and the resulting ArrayBuffer AST passed losslessly to the main thread.
On the Node.js side, napi-rs is used to interact with Rust code, and wasm-pack is used for building on the browser side.
Optimize Semantic Analysis
Parser Semantic Analysis Design
Directly calling SWC's swc_compiler_base::parse_js on the Rust side does not execute semantic analysis; it performs only lexical and syntax analysis. That is, SWC parses the following code into an SWC AST without reporting any error:
const a = 1;
const a = 2;
This differs from Acorn's approach: Acorn additionally performs complete static semantic analysis (Static Semantics: Early Errors) while generating the AST, detecting such errors before the program runs.
ECMAScript Static Semantics: Early Errors
Early Errors are a static semantic error detection mechanism defined in the ECMAScript specification. According to the ECMA-262 specification, these errors must be detected and reported during the parsing phase before code execution.
Authoritative Specification References:
- ECMA-262 Section 13.2.1 - Block Statement Early Errors
- ECMA-262 Section 14.3.1.1 - let/const Declaration Early Errors
- ECMA-262 Section 15.1.1 - Function Definition Early Errors
- ECMA-262 Section 15.7.1 - Class Definition Early Errors
- ECMA-262 Section 16.2.1.1 - Module Semantics Early Errors
The fundamental reason is that Acorn is designed as a parser that conforms to the ECMAScript specification. ECMAScript requires that the Static Semantics: Early Errors steps (essentially static semantic analysis) run before a JavaScript engine executes code; these errors must be detected and reported during parsing and early syntax analysis. Because the checks are static, the errors can be found without actually running the code.
Browsers, Node.js, and other environments with built-in JavaScript engines likewise execute the Static Semantics: Early Errors steps before running code.
The significance of the specification is:
- Early Detection of Issues: Potential errors can be found before the code is actually executed, avoiding issues that may only surface at runtime.
- Performance Improvement: Since these checks are completed in the static analysis stage, they can improve code execution efficiency without waiting until runtime to discover errors.
- Ensure Language Consistency: a unified early-error checking mechanism ensures that JavaScript code is processed consistently across environments.
- Help Developers Write Better Code: these rules also guide developers toward better programming practices.
SWC, Babel and other parsers do not execute Static Semantics: Early Errors steps when generating AST, meaning their design goals differ from Acorn. Let's first introduce why they separate syntax analysis and static semantic analysis.
Performance and Complexity Trade-off
Implementing Early Errors detection requires the parser to do the following:
- Simulate and maintain the scope and scope chain of the execution context for the current statement.
- Perform static rule checks.
- Detect the other static semantic rules defined in the language specification.
- Detect syntax restriction rules.
- Detect the module system's static verification rules.
Although no individual check is particularly complex, in large projects, if Early Errors checks run every time new code is transpiled, the cumulative cost of complete Early Errors checking can add non-negligible performance overhead.
Toolchain Division of Labor
The focus of SWC, Babel, and similar parsers is code transformation; they are mainly injected into a build system's code-transformation pipeline in the form of plugins. For tools seeking deep integration into diverse build-system ecosystems, the simplest approach is to uphold the single-responsibility principle. By separating parsing and semantic analysis:
- The parser can focus on generating an accurate AST.
- The semantic analyzer can focus on checking code correctness.
- Each part is easier to maintain and optimize.
Flexibility
In complex module-transpilation pipelines, transformation is usually not a single step; it passes through intermediate states, and the intermediate code is often non-compliant with semantic rules. If transpilation tools performed strict semantic analysis, such code could not pass compilation, hurting extensibility. Modern toolchains balance development flexibility and code quality by distributing different checks across different stages and running semantic analysis on demand.
Babel, SWC choose to separate the responsibilities of syntax analysis and Early Errors detection. In the plugin code transpilation stage, code is parsed into AST for lexical analysis and syntax analysis only, without executing Early Errors checks (static semantic analysis). Instead, at the appropriate time (such as when Rollup's transform stage is completed), Bundlers (such as Rollup) control and execute Early Errors checks.
This design choice reflects an important principle in engineering practice: sometimes, breaking down a complex problem into multiple independent steps may be more effective than trying to solve everything in one step. This allows each tool to focus on its core task, thereby providing better functionality and performance.
Rollup Plugin System Design Inspiration
The above design approach is also reflected in Rollup's plugin system. When a user plugin returns AST in the load (or transform) hook, Rollup will reuse the AST returned by the user plugin in subsequent transform hooks. Before Rollup completes the transform stage, Rollup will not perform any semantic analysis on the reused AST.
const a = 1;
const a = 2;For the above example, Acorn will provide the following error message.
while (this.type !== tt.braceR) {
const element = this.parseClassElement(node.superClass !== null);
if (element) {
classBody.body.push(element);
if (
element.type === 'MethodDefinition' &&
element.kind === 'constructor'
) {
if (hadConstructor)
this.raiseRecoverable(
element.start,
'Duplicate constructor in the same class'
);
hadConstructor = true;
} else if (
element.key &&
element.key.type === 'PrivateIdentifier' &&
isPrivateNameConflicted(privateNameMap, element)
) {
this.raiseRecoverable(
element.key.start,
`Identifier '#${element.key.name}' has already been declared`
);
}
}
}
Error Prompt
Line 2: Identifier 'a' has already been declared.
Therefore, Rollup needs to leverage swc_ecma_lints capabilities to achieve more complete semantic analysis.
use swc_ecma_lints::{rule::Rule, rules, rules::LintParams};
let result = HANDLER.set(&handler, || op(&handler));
match result {
Ok(mut program) => {
let unresolved_mark = Mark::new();
let top_level_mark = Mark::new();
let unresolved_ctxt = SyntaxContext::empty().apply_mark(unresolved_mark);
let top_level_ctxt = SyntaxContext::empty().apply_mark(top_level_mark);
program.visit_mut_with(&mut resolver(unresolved_mark, top_level_mark, false));
let mut rules = rules::all(LintParams {
program: &program,
lint_config: &Default::default(),
unresolved_ctxt,
top_level_ctxt,
es_version,
source_map: cm.clone(),
});
HANDLER.set(&handler, || match &program {
Program::Module(m) => {
rules.lint_module(m);
}
Program::Script(s) => {
rules.lint_script(s);
}
});
if handler.has_errors() {
let buffer = create_error_buffer(&wr, code);
Err(buffer)
} else {
Ok(program)
}
}
}
Implement Semantic Analysis On JavaScript Side
However, the related PR and discussion reveal the following:
- Testing showed that swc_ecma_lints detection was not efficient.
- To mitigate this, Rollup's native parser temporarily removed complete semantic analysis on the Rust side until scope analysis is implemented there.
let result = HANDLER.set(&handler, || op(&handler));
match result {
Ok(mut program) => {
let unresolved_mark = Mark::new();
let top_level_mark = Mark::new();
let unresolved_ctxt = SyntaxContext::empty().apply_mark(unresolved_mark);
let top_level_ctxt = SyntaxContext::empty().apply_mark(top_level_mark);
program.visit_mut_with(&mut resolver(unresolved_mark, top_level_mark, false));
let mut rules = rules::all(LintParams {
program: &program,
lint_config: &Default::default(),
unresolved_ctxt,
top_level_ctxt,
es_version,
source_map: cm.clone(),
});
HANDLER.set(&handler, || match &program {
Program::Module(m) => {
rules.lint_module(m);
}
Program::Script(s) => {
rules.lint_script(s);
}
});
if handler.has_errors() {
let buffer = create_error_buffer(&wr, code);
Err(buffer)
} else {
Ok(program)
}
}
}
result.map_err(|_| {
if handler.has_errors() {
create_error_buffer(&wr, code)
} else {
panic!("Unexpected error in parse")
}
})
The semantic analysis task is instead handed over to the JavaScript side.
Rollup performs more complete semantic analysis while instantiating its AST node classes during buffer decoding. Testing showed that this JavaScript-side semantic analysis is much faster than native SWC's swc_ecma_lints and has no significant impact on Rollup's overall performance.
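The idea of detecting duplicate declarations while nodes are instantiated can be sketched as follows. The class and method names here are hypothetical, loosely modeled on the role of Rollup's Scope.addDeclaration mentioned later, not its actual API:

```javascript
// Minimal sketch: a scope that rejects duplicate let/const-style bindings,
// mirroring the kind of check performed during AST node instantiation.
class Scope {
  constructor() {
    this.declarations = new Map();
  }
  addDeclaration(name) {
    if (this.declarations.has(name)) {
      throw new SyntaxError(`Identifier '${name}' has already been declared`);
    }
    this.declarations.set(name, true);
  }
}

const scope = new Scope();
scope.addDeclaration('a'); // const a = 1;
let error;
try {
  scope.addDeclaration('a'); // const a = 2; -> rejected
} catch (e) {
  error = e.message;
}
```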
Early Errors Detection Capability Comparison
To verify the actual effect of the above design, we wrote a comprehensive Early Errors test suite based on the ECMAScript specification, covering 97 test cases across 11 major categories. The test results are as follows:
Test Environment and Methodology
Test Environment:
- Node.js: v22.x
- Acorn: ^8.14.0
- Rollup: ^4.53.3
Test Methodology:
- Each test case contains code that should trigger an Early Error
- Tests whether the parser correctly detects and reports the error
- Some test cases verify that legal code should not produce errors
Test Coverage:
- Identifier and Binding Errors
- Function Parameter Errors
- Function Body Errors
- Class Errors
- Module Errors
- Control Flow Errors
- Assignment Errors
- Literal Errors
- Strict Mode Errors
- Regular Expression Errors
- for-in/of Errors
| Parser/Mode | Pass Rate | Pass/Total | Description |
|---|---|---|---|
| Acorn | 100% | 97/97 | Complete Early Errors implementation |
| SWC Parser (default) | 38.1% | 37/97 | Syntax analysis + strict mode detection |
| Rollup parseAst | 33.0% | 32/97 | Syntax analysis (IsModule::Unknown config) |
| Rollup Full Build | 54.6% | 53/97 | parseAst + JavaScript-side semantic analysis |
Key Finding
Rollup's parseAst function does not fully implement ECMAScript Early Errors detection.
This validates the above discussion: SWC does not execute Static Semantics: Early Errors steps when generating AST, and the semantic analysis task is handed over to the JavaScript side for processing.
Detailed Detection Capability Analysis
1. Errors detectable by Rollup parseAst (pure syntax level):
These errors do not require scope analysis and can be detected during the lexical/syntax analysis phase:
| Error Type | Example | Spec Reference |
|---|---|---|
| Control flow position | break; / continue; | Section 14.8.1 |
| return position | return 1; (outside function) | Section 15.1.1 |
| yield/await position | function f() { yield 1; } | Section 15.5.1 |
| Literal assignment | 1 = 2; | Section 13.15.1 |
| rest syntax | let [...a, b] = x; | Section 13.2.3 |
| Regular expression | /a/gg; | Section 22.2.1.1 |
| Numeric separator | 1__0; | Section 12.9.1 |
| Class constructor | Duplicate/async/generator constructor | Section 15.7.1 |
| Label errors | L: L: for(;;) {} | Section 14.13.1 |
Syntax analysis phase detection rate: 32/40 (80.0%)
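These pure-syntax errors can be reproduced with any spec-conforming parser. As a rough stdlib-only check, the Function constructor parses its body eagerly and surfaces such errors as a SyntaxError (it wraps the code in a function body, so it is only a stand-in, and position-dependent cases like top-level return do not apply):

```javascript
// Rough check: parse-time SyntaxError detection via the Function constructor.
// (Only a stand-in for a real parser; the code is wrapped in a function body.)
function isSyntaxError(code) {
  try {
    new Function(code);
    return false;
  } catch (error) {
    return error instanceof SyntaxError;
  }
}

console.log(isSyntaxError('break;'));             // break outside loop/switch
console.log(isSyntaxError('let [...a, b] = x;')); // rest element must be last
console.log(isSyntaxError('let a = 1;'));         // legal code
```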
2. Additional errors detected by Rollup Full Build (requiring scope analysis):
These errors are detected during AST node instantiation through the initialise() method:
| Error Type | Example | Detection Location | Spec Reference |
|---|---|---|---|
| Duplicate let/const declaration | let a=1; let a=2; | Scope.addDeclaration() | Section 14.3.1.1 |
| Duplicate parameters | function f(a, a) {} | ParameterScope.addParameterDeclaration() | Section 15.1.1 |
| Duplicate exports | export default 1; export default 2; | Module.assertUniqueExportName() | Section 16.2.1.1 |
| Duplicate import bindings | import { a, a } from "x" | Module.addImport() | Section 16.2.1.1 |
| const reassignment | const x=1; x=2; | AssignmentExpression.initialise() | Section 14.3.1.1 |
Semantic analysis phase detection rate: 21/57 (36.8%)
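The scope-based mechanism behind these checks can be illustrated with a minimal sketch (a toy model of the addDeclaration idea, not Rollup's actual classes):

```javascript
// Toy model of scope-based duplicate-binding detection.
class Scope {
  constructor(parent = null) {
    this.parent = parent;
    this.variables = new Map(); // name → declaration kind
  }
  addDeclaration(name, kind) {
    const existing = this.variables.get(name);
    if (existing) {
      // Only var may redeclare var; let/const/class conflicts are Early Errors.
      if (kind === 'var' && existing === 'var') return;
      throw new SyntaxError(`Identifier "${name}" has already been declared`);
    }
    this.variables.set(name, kind);
  }
}

const scope = new Scope();
scope.addDeclaration('a', 'let');
// scope.addDeclaration('a', 'let'); // would throw: already declared
```

The real implementation additionally walks parent scopes and tracks hoisted variables, but the core check is this map lookup at declaration time.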
3. Early Errors not implemented by Rollup:
The following Early Errors are not detected in Rollup:
| Error Type | Example | Spec Reference | Description |
|---|---|---|---|
| eval/arguments restriction | function f(eval) {} | Section 15.1.1 | Strict mode reserved word |
| await as identifier | let await = 1; | Section 15.8.1 | Module top-level/async restriction |
| Octal literal | 010; | Section 12.9.4.1 | Forbidden in strict mode |
| Octal escape | "\07"; | Section 12.9.4.1 | Forbidden in strict mode |
| Duplicate private fields | class A { #x; #x; } | Section 15.7.1 | Class private field detection |
| Duplicate proto | { __proto__: 1, __proto__: 2 } | Section 13.2.5.1 | Object literal restriction |
| delete identifier | delete x; | Section 13.5.1.1 | Forbidden in strict mode |
| let as variable name | var let = 1; | Section 13.3.1.1 | Strict mode reserved word |
| super() position | class A { foo() { super(); } } | Section 15.7.1 | Only in constructor |
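Most of these gaps are strict-mode constraints. A stdlib-only way to confirm they are genuine Early Errors is to parse the snippet with a 'use strict' prologue via the Function constructor (again a stand-in for a spec-complete parser, not Rollup's API):

```javascript
// Checks whether code is rejected at parse time under strict mode.
function strictEarlyError(code) {
  try {
    new Function("'use strict';\n" + code);
    return false;
  } catch (error) {
    return error instanceof SyntaxError;
  }
}

console.log(strictEarlyError('var x = 010;')); // octal literal
console.log(strictEarlyError('delete x;'));    // delete of a plain identifier
console.log(strictEarlyError('var let = 1;')); // let as a variable name
```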
Architecture Diagram
┌─────────────────────────────────────────────────────────────┐
│ Rust Side (SWC) │
├─────────────────────────────────────────────────────────────┤
│ Source Code → Lexical Analysis → Syntax Analysis → SWC AST → ArrayBuffer │
│ │
│ ✅ Basic syntax error detection (break/continue/return position, etc.) │
│ ❌ Does not perform scope analysis │
│ ❌ Does not detect duplicate declarations/exports │
│ │
│ Syntax analysis phase detection rate: 80.0% (32/40) │
│ Overall detection rate: 33.0% (32/97) │
└─────────────────────┬───────────────────────────────────────┘
│ ArrayBuffer (binary format)
▼
┌─────────────────────────────────────────────────────────────┐
│ JavaScript Side (Rollup) │
├─────────────────────────────────────────────────────────────┤
│ convertNode() → new NodeConstructor() → node.initialise() │
│ │
│ ✅ Build scope chain (Scope/ChildScope) │
│ ✅ Detect duplicate declarations (addDeclaration) │
│ ✅ Detect duplicate parameters (addParameterDeclaration) │
│ ✅ Detect duplicate exports (addExport → assertUniqueExportName) │
│ ✅ Detect const reassignment (AssignmentExpression.initialise) │
│ ❌ Some strict mode restrictions not implemented │
│ │
│ Semantic analysis phase detection rate: 36.8% (21/57) │
│ Full Build overall detection rate: 54.6% (53/97) │
└─────────────────────────────────────────────────────────────┘
Practical Impact
- parseAst(): Only syntax errors are detected, no semantic analysis. Suitable for scenarios that only need AST structure.
- Full Build: Core semantic errors are detected (duplicate declarations/exports, etc.), but not all Early Errors. Suitable for actual bundling scenarios.
- Acorn: 100% Early Errors detection. Suitable for scenarios requiring complete specification validation.
Design Trade-offs
Rollup's design choices reflect trade-offs in engineering practice:
- Implements the most critical semantic detection for bundling: Duplicate bindings, duplicate exports, etc., which would cause runtime errors
- Omits some strict mode related detection: These are typically handled by IDEs or linters (such as ESLint)
- Maintains performance advantages: Avoids complete semantic analysis on the Rust side
This layered design allows each tool to focus on its core task while ensuring the correctness of the final output.
Deep Analysis: Early Errors Detection Mechanism
1. Definition and Classification of Early Errors
According to the ECMAScript specification (ECMA-262), Early Errors are errors that must be detected and reported during the static analysis phase before code execution. These errors span multiple levels from syntax constraints to semantic constraints.
From an implementation perspective, Early Errors can be divided into two major categories:
Category 1: Errors detectable during syntax analysis (approximately 41%)
These errors only require current syntactic context information to determine, without the need to maintain symbol tables or scope chains. Typical examples include:
- Control flow statement position constraints: e.g., break/continue must be inside loops or switch
- Assignment target legality checks: e.g., literals cannot be left-hand values of assignments
- Destructuring syntax constraints: e.g., rest element must be last
- Literal syntax constraints: e.g., numeric separators, regular expression syntax
- Class constructor syntax constraints: e.g., no duplicate constructor, cannot be async/generator
Category 2: Errors detectable only during semantic analysis (approximately 59%)
These errors can only be detected by building and maintaining symbol tables, scope chains, or module binding tables. They include:
- Duplicate declaration detection: Conflicts between let/const/var bindings
- Duplicate parameter detection: Function parameter names cannot be duplicated
- Duplicate export/import detection: Module export/import bindings cannot be duplicated
- const reassignment detection: Constants cannot be reassigned
- Strict mode identifier restrictions: Usage restrictions for eval/arguments
Based on comprehensive testing with 97 specification test cases, different parsers show significantly different coverage rates due to different design goals and implementation strategies.
2. Quantitative Comparison of Parser Detection Capabilities
Through systematic testing and verification, the Early Errors detection capabilities of each parser show clear stratification:
| Parser | Syntax Analysis Phase | Semantic Analysis Phase | Total | Description |
|---|---|---|---|---|
| Acorn | 40/40 (100%) | 57/57 (100%) | 97/97 (100%) | Complete implementation |
| SWC Parser (default) | 37/40 (92.5%) | 0/57 (0%) | 37/97 (38.1%) | Includes strict mode detection |
| SWC Parser (Unknown) | 32/40 (80%) | 0/57 (0%) | 32/97 (33.0%) | Does not check strict mode |
| Rollup parseAst | 32/40 (80%) | 0/57 (0%) | 32/97 (33.0%) | = SWC Unknown mode |
| Rollup Full Build | 32/40 (80%) | 21/57 (36.8%) | 53/97 (54.6%) | + JS-side semantic analysis |
Key Findings:
- Acorn, as a complete implementation conforming to the ECMAScript specification, achieves full detection in both phases
- SWC Parser completely skips semantic analysis requiring symbol tables, but can detect strict mode constraints in its default configuration
- Rollup parseAst uses SWC as the underlying parser, but configuration differences result in slightly lower detection capability than SWC's default configuration
- Rollup Full Build implements the most critical semantic detection for bundling scenarios on the JavaScript side
3. Difference Analysis Between SWC Parser and Rollup parseAst
Actual testing found that SWC Parser (called directly through the @swc/core JavaScript API) can detect 5 errors that Rollup parseAst cannot:
| Error Type | Example | SWC (default) | Rollup parseAst | Characteristic |
|---|---|---|---|---|
| for-in initializer | for (var a = 1 in x) {} | ✅ | ❌ | Strict mode restriction |
| eval as parameter name | function f(eval) {} | ✅ | ❌ | Strict mode restriction |
| await as identifier | let await = 1; | ✅ | ❌ | Module mode restriction |
| Octal literal | 010; | ✅ | ❌ | Strict mode restriction |
| delete identifier | delete x; | ✅ | ❌ | Strict mode restriction |
These errors share a common characteristic: they are all related to ECMAScript's strict mode or module mode semantic restrictions.
Root Cause Verification:
Through version verification, it was confirmed that the version difference between @swc/core 1.15.2 and the swc_ecma_parser 27.0.2 built into Rollup 4.53.3 is minimal, and behavior is completely identical under the same configuration. Deep analysis of Rollup source code (rust/parse_ast/src/lib.rs) reveals:
parse_js(
cm, file, handler, target, syntax,
IsModule::Unknown, // ← key configuration
Some(&comments),
)
Controlled Experiment Results:
| SWC Configuration | Detects for (var a = 1 in x) {} | Detection Rate |
|---|---|---|
| Default (isModule unspecified) | ✅ ERROR | 37/97 |
| isModule: true | ✅ ERROR | 37/97 |
| isModule: false | ❌ NO ERROR | 32/97 |
| isModule: "unknown" | ❌ NO ERROR | 32/97 |
| Rollup parseAst | ❌ NO ERROR | 32/97 |
Conclusion: The root cause of the difference between Rollup parseAst and SWC Parser lies in the configuration method (IsModule::Unknown), not in parser version or implementation differences.
4. Deeper Trade-offs in Design Decisions
Rollup's choice of IsModule::Unknown configuration is not an implementation defect, but a well-considered engineering trade-off based on modern build toolchain architecture. This design decision reflects a precise balance between completeness, flexibility, and performance across multiple dimensions.
From a flexibility perspective, the Unknown mode gives the parser the ability to automatically determine code type, allowing it to adapt to both module code and script code — two fundamentally different semantic environments. This design avoids the risk of legitimate code failing to parse due to incorrect mode prediction, which is particularly important when processing third-party libraries or legacy code. Furthermore, this configuration supports users returning intermediate-state ASTs in the plugin system, where intermediate code may temporarily not conform to strict mode constraints but will be normalized in subsequent stages.
From a fault tolerance perspective, the ECMAScript specification defines different semantic rules for strict mode and script mode. Some code structures classified as Early Errors in strict mode are perfectly legal in script mode. If the parseAst phase forcefully enforced strict mode detection, such legal code would be incorrectly rejected. The Unknown mode defers the final correctness determination to the Full Build phase, maintaining high parser availability while transferring semantic integrity checking responsibility to a more appropriate execution phase.
From an architectural layering perspective, Rollup's design philosophy emphasizes separation of concerns. The parseAst phase focuses on efficiently completing lexical and syntax analysis and converting the SWC AST into a compact ArrayBuffer format, thereby avoiding the significant performance overhead of JSON serialization. The core goal of this phase is to produce a correct AST structural representation, not to perform complete semantic validation. The actual semantic analysis is systematically arranged during the AST node instantiation phase on the JavaScript side, where scope chains are built, symbol tables are maintained, and the most critical semantic detection for bundling scenarios is performed through the node.initialise() method. This cross-language-boundary division of responsibilities leverages Rust's performance advantages in syntax parsing while utilizing JavaScript's flexibility in dynamic semantic analysis.
From empirical data, the Full Build phase achieves a 54.6% Early Errors detection rate. This figure is not arbitrary but precisely covers the semantic error categories most threatening to bundling scenarios. Errors such as duplicate declarations, duplicate parameters, duplicate exports, and const reassignment would cause runtime errors or unpredictable behavior if not detected during build time. The intentionally omitted strict mode restrictions (such as eval as parameter name, octal literals, etc.) are typically caught early by IDE real-time diagnostics or static analysis tools like ESLint in modern development workflows, making it unnecessary to redundantly implement them at the bundler level. This layered defense strategy ensures zero missed critical errors while avoiding unnecessary performance overhead.
5. Terminology Clarification and Precise Expression
Based on the above systematic testing verification and source code analysis, it is necessary to standardize relevant terminology definitions to eliminate cognitive bias and establish a unified understanding framework.
In the context of the ECMAScript specification, Early Errors specifically refer to the set of errors that must be detected and reported during the static analysis phase before code execution. This concept covers the complete error spectrum from syntax constraints to semantic constraints, encompassing two inseparable phases of syntax analysis and semantic analysis. This research, based on 97 typical test scenarios constructed from the ECMA-262 specification, comprehensively covers dimensions including identifier binding, function parameters, class definitions, module systems, control flow, assignment expressions, literal syntax, and strict mode restrictions, forming a quantitative evaluation benchmark for Early Errors detection capability.
From an implementation perspective, Early Errors in the syntax analysis phase specifically refer to errors that can be determined without maintaining symbol tables or scope chains, relying only on current syntactic context information. These errors account for approximately 41% (40/97) of the test scenarios, with typical representatives including control flow statement position constraints (such as break/continue context restrictions), assignment target legality checks (such as prohibiting assignment to literals), and destructuring syntax constraints (such as rest element position requirements). Their common characteristic is that detection logic can be directly completed during AST construction through stack-based context tracking, requiring no additional data structures. In contrast, Early Errors in the semantic analysis phase specifically refer to errors that must rely on symbol tables, scope chains, or module binding tables for accurate determination. These errors account for approximately 59% (57/97) of the test scenarios, including duplicate declaration detection (requiring querying whether a binding with the same name already exists in the current scope), duplicate parameter detection (requiring maintaining parameter scope), and duplicate export detection (requiring maintaining module export tables). Their essence is verifying code compliance by building a static semantic model of the program.
For different parsers' detection capabilities, precise quantitative descriptions need to be established. SWC Parser, in its default configuration (isModule unspecified) or explicitly configured as module mode (isModule: true), can detect 37 Early Errors (38.1%), which includes semantic constraints specific to strict mode and module mode. However, when configured as script mode (isModule: false) or unknown mode (isModule: "unknown"), detection capability drops to 32 (33.0%), primarily because 5 strict-mode-related detections are disabled. This behavior conforms to the ECMAScript specification's definition of semantic differences between different code types and is not a parser implementation defect.
Rollup parseAst's detection capability is completely equivalent to SWC Parser under the IsModule::Unknown configuration, at 32 Early Errors (33.0%). Through source code analysis, it is confirmed that Rollup explicitly passes the IsModule::Unknown parameter when calling SWC's parse_js function, and this configuration choice directly determines its behavior characteristics in strict mode constraint detection. It is worth emphasizing that these 32 detected errors do not completely correspond to the theoretical classification of "syntax analysis phase Early Errors," because some errors (such as the for-in initializer) theoretically belong to the syntax analysis phase but exhibit differentiated detection behavior due to mode configuration. Therefore, a more accurate description should be: Rollup parseAst detects syntax constraints that SWC can recognize under Unknown mode, excluding constraints specific to strict mode or module mode.
Rollup Full Build's detection capability is the superposition of parseAst and JavaScript-side semantic analysis, totaling 53 Early Errors (54.6%). Of these, 32 come from the parseAst phase's syntax constraint detection, and 21 come from semantic analysis performed through the node.initialise() method during AST node instantiation on the JavaScript side. These 21 additionally detected errors precisely cover the semantic violations most threatening to bundling scenarios, including let/const/var declaration conflicts, function parameter duplication, module export/import duplication, and const constant reassignment. This selection of detection scope is not accidental but is based on deep understanding of JavaScript runtime behavior and the responsibility boundaries of build tools.
Combining the above analysis, the following precise expression paradigm can be formed: SWC Parser and Rollup parseAst both focus on Early Errors detection in the syntax analysis phase in terms of functional positioning, but due to configuration differences (IsModule::Unknown versus isModule:true), there is a quantitative gap of 5 errors in strict-mode-related error detection. Complete Early Errors detection must cover both syntax analysis and semantic analysis phases. Among current mainstream JavaScript parser ecosystems, only Acorn achieves the completeness target required by the specification. Rollup, through its architectural layering design strategy, selectively supplements partial semantic analysis capability on the JavaScript side. This implementation approach achieves the engineering optimum of performance, flexibility, and correctness under the premise of ensuring zero missed critical errors.
Semantic Analysis Detection Points
The main tasks of semantic analysis include the following:
const_assign

Example:

```ts
export function logConstVariableReassignError() {
  return {
    code: CONST_REASSIGN,
    message: 'Cannot reassign a variable declared with `const`'
  };
}
```

```ts
// case
const x = 1;
x = 'string';

// implementation
export default class AssignmentExpression extends NodeBase {
  initialise(): void {
    super.initialise();
    if (this.left instanceof Identifier) {
      const variable = this.scope.variables.get(this.left.name);
      if (variable?.kind === 'const') {
        this.scope.context.error(logConstVariableReassignError(), this.left.start);
      }
    }
    this.left.setAssignedValue(this.right);
  }
}
```

duplicate_bindings

```ts
export function logRedeclarationError(name: string): RollupLog {
  return {
    code: REDECLARATION_ERROR,
    message: `Identifier "${name}" has already been declared`
  };
}
```

```ts
// case
import { x } from './b';
const x = 1;

// case2
import { x } from './b';
import { x } from './b';

// implementation
export default class Module {
  private addImport(node: ImportDeclaration): void {
    const source = node.source.value;
    this.addSource(source, node);
    for (const specifier of node.specifiers) {
      const localName = specifier.local.name;
      if (
        this.scope.variables.has(localName) ||
        this.importDescriptions.has(localName)
      ) {
        this.error(logRedeclarationError(localName), specifier.local.start);
      }
      const name =
        specifier instanceof ImportDefaultSpecifier
          ? 'default'
          : specifier instanceof ImportNamespaceSpecifier
            ? '*'
            : specifier.imported instanceof Identifier
              ? specifier.imported.name
              : specifier.imported.value;
      this.importDescriptions.set(localName, {
        module: null as never, // filled in later
        name,
        source,
        start: specifier.start
      });
    }
  }
}
```

```ts
// case
{
  const a = 1;
  const a = 1;
}

// implementation
export default class BlockScope extends ChildScope {
  addDeclaration(
    identifier: Identifier,
    context: AstContext,
    init: ExpressionEntity,
    destructuredInitPath: ObjectPath,
    kind: VariableKind
  ): LocalVariable {
    if (kind === 'var') {
      const name = identifier.name;
      const existingVariable =
        this.hoistedVariables?.get(name) ||
        (this.variables.get(name) as LocalVariable | undefined);
      if (existingVariable) {
        if (
          existingVariable.kind === 'var' ||
          (kind === 'var' && existingVariable.kind === 'parameter')
        ) {
          existingVariable.addDeclaration(identifier, init);
          return existingVariable;
        }
        return context.error(logRedeclarationError(name), identifier.start);
      }
      const declaredVariable = this.parent.addDeclaration(
        identifier,
        context,
        init,
        destructuredInitPath,
        kind
      );
      // Necessary to make sure the init is deoptimized for conditional declarations.
      // We cannot call deoptimizePath here.
      declaredVariable.markInitializersForDeoptimization();
      // We add the variable to this and all parent scopes to reliably detect conflicts
      this.addHoistedVariable(name, declaredVariable);
      return declaredVariable;
    }
    return super.addDeclaration(identifier, context, init, destructuredInitPath, kind);
  }
}
```

```ts
// case
try {
} catch (e) {
  const a = 1;
  const a = 2;
}

// implementation
export default class CatchBodyScope extends ChildScope {
  addDeclaration(
    identifier: Identifier,
    context: AstContext,
    init: ExpressionEntity,
    destructuredInitPath: ObjectPath,
    kind: VariableKind
  ): LocalVariable {
    if (kind === 'var') {
      const name = identifier.name;
      const existingVariable =
        this.hoistedVariables?.get(name) ||
        (this.variables.get(name) as LocalVariable | undefined);
      if (existingVariable) {
        const existingKind = existingVariable.kind;
        if (
          existingKind === 'parameter' &&
          // If this is a destructured parameter, it is forbidden to redeclare
          existingVariable.declarations[0].parent.type === NodeType.CatchClause
        ) {
          // If this is a var with the same name as the catch scope parameter,
          // the assignment actually goes to the parameter and the var is
          // hoisted without assignment. Locally, it is shadowed by the
          // parameter
          const declaredVariable = this.parent.parent.addDeclaration(
            identifier,
            context,
            UNDEFINED_EXPRESSION,
            destructuredInitPath,
            kind
          );
          // To avoid the need to rewrite the declaration, we link the variable
          // names. If we ever implement a logic that splits initialization and
          // assignment for hoisted vars, the "renderLikeHoisted" logic can be
          // removed again.
          // We do not need to check whether there already is a linked
          // variable because then declaredVariable would be that linked
          // variable.
          existingVariable.renderLikeHoisted(declaredVariable);
          this.addHoistedVariable(name, declaredVariable);
          return declaredVariable;
        }
        if (existingKind === 'var') {
          existingVariable.addDeclaration(identifier, init);
          return existingVariable;
        }
        return context.error(logRedeclarationError(name), identifier.start);
      }
    }
  }
}
```

```ts
// case
function fn() {
  const a = 1;
  const a = 2;
}

// implementation
export default class FunctionBodyScope extends ChildScope {
  // There is stuff that is only allowed in function scopes, i.e. functions can
  // be redeclared, functions and var can redeclare each other
  addDeclaration(
    identifier: Identifier,
    context: AstContext,
    init: ExpressionEntity,
    destructuredInitPath: ObjectPath,
    kind: VariableKind
  ): LocalVariable {
    const name = identifier.name;
    const existingVariable =
      this.hoistedVariables?.get(name) || (this.variables.get(name) as LocalVariable);
    if (existingVariable) {
      const existingKind = existingVariable.kind;
      if (
        (kind === 'var' || kind === 'function') &&
        (existingKind === 'var' ||
          existingKind === 'function' ||
          existingKind === 'parameter')
      ) {
        existingVariable.addDeclaration(identifier, init);
        return existingVariable;
      }
      context.error(logRedeclarationError(name), identifier.start);
    }
    const newVariable = new LocalVariable(
      identifier.name,
      identifier,
      init,
      destructuredInitPath,
      context,
      kind
    );
    this.variables.set(name, newVariable);
    return newVariable;
  }
}
```

```ts
// case1
import { a } from './b';
const a = 1;

// case2
import { a } from './b';
import { a } from './b';

// implementation
export default class ModuleScope extends ChildScope {
  addDeclaration(
    identifier: Identifier,
    context: AstContext,
    init: ExpressionEntity,
    destructuredInitPath: ObjectPath,
    kind: VariableKind
  ): LocalVariable {
    if (this.context.module.importDescriptions.has(identifier.name)) {
      context.error(logRedeclarationError(identifier.name), identifier.start);
    }
    return super.addDeclaration(identifier, context, init, destructuredInitPath, kind);
  }
}
```

```ts
// case
const a = 1;
const a = 2;

// implementation
export default class Scope {
  /*
  Redeclaration rules:
  - var can redeclare var
  - in function scopes, function and var can redeclare function and var
  - var is hoisted across scopes, function remains in the scope it is declared
  - var and function can redeclare function parameters, but parameters cannot
    redeclare parameters
  - function cannot redeclare catch scope parameters
  - var can redeclare catch scope parameters in a way
    - if the parameter is an identifier and not a pattern
    - then the variable is still declared in the hoisted outer scope, but the
      initializer is assigned to the parameter
  - const, let, class, and function except in the cases above cannot redeclare
    anything
  */
  addDeclaration(
    identifier: Identifier,
    context: AstContext,
    init: ExpressionEntity,
    destructuredInitPath: ObjectPath,
    kind: VariableKind
  ): LocalVariable {
    const name = identifier.name;
    const existingVariable =
      this.hoistedVariables?.get(name) || (this.variables.get(name) as LocalVariable);
    if (existingVariable) {
      if (kind === 'var' && existingVariable.kind === 'var') {
        existingVariable.addDeclaration(identifier, init);
        return existingVariable;
      }
      context.error(logRedeclarationError(name), identifier.start);
    }
    const newVariable = new LocalVariable(
      identifier.name,
      identifier,
      init,
      destructuredInitPath,
      context,
      kind
    );
    this.variables.set(name, newVariable);
    return newVariable;
  }
}
```

duplicate_exports

```ts
export function logDuplicateExportError(name: string): RollupLog {
  return {
    code: DUPLICATE_EXPORT,
    message: `Duplicate export "${name}"`
  };
}

export default class Module {
  private assertUniqueExportName(name: string, nodeStart: number) {
    if (this.exports.has(name) || this.reexportDescriptions.has(name)) {
      this.error(logDuplicateExportError(name), nodeStart);
    }
  }
}
```

```ts
// case
export default 1;
export default 2;

// implementation
export default class Module {
  private addExport(
    node: ExportAllDeclaration | ExportNamedDeclaration | ExportDefaultDeclaration
  ): void {
    if (node instanceof ExportDefaultDeclaration) {
      // export default foo;
      this.assertUniqueExportName('default', node.start);
      this.exports.set('default', {
        identifier: node.variable.getAssignedVariableName(),
        localName: 'default'
      });
    }
  }
}
```

```ts
// case
export * as a from './b';
export * as a from './b';

// implementation
export default class Module {
  private addExport(node: ExportAllDeclaration | ExportNamedDeclaration): void {
    if (node instanceof ExportAllDeclaration) {
      const source = node.source.value;
      this.addSource(source, node);
      if (node.exported) {
        // export * as name from './other'
        const name =
          node.exported instanceof Literal ? node.exported.value : node.exported.name;
        this.assertUniqueExportName(name, node.exported.start);
        this.reexportDescriptions.set(name, {
          localName: '*',
          module: null as never, // filled in later
          source,
          start: node.start
        });
      } else {
        // export * from './other'
        this.exportAllSources.add(source);
      }
    }
  }
}
```

```ts
// case
export { a } from './b';
export { a } from './b';

// implementation
export default class Module {
  private addExport(node: ExportAllDeclaration | ExportNamedDeclaration): void {
    if (node.source instanceof Literal) {
      // export { name } from './other'
      const source = node.source.value;
      this.addSource(source, node);
      for (const { exported, local, start } of node.specifiers) {
        const name = exported instanceof Literal ? exported.value : exported.name;
        this.assertUniqueExportName(name, start);
        this.reexportDescriptions.set(name, {
          localName: local instanceof Literal ? local.value : local.name,
          module: null as never, // filled in later
          source,
          start
        });
      }
    }
  }
}
```

```ts
// case1
export const a = 1;
export const a = 2;

// case2
export function a() {}
export function a() {}

// case3
export { a, a };

// implementation
export default class Module {
  private addExport(node: ExportNamedDeclaration): void {
    if (node.declaration) {
      const declaration = node.declaration;
      if (declaration instanceof VariableDeclaration) {
        // export var { foo, bar } = ...
        // export var foo = 1, bar = 2;
        for (const declarator of declaration.declarations) {
          for (const localName of extractAssignedNames(declarator.id)) {
            this.assertUniqueExportName(localName, declarator.id.start);
            this.exports.set(localName, { identifier: null, localName });
          }
        }
      } else {
        // export function foo () {}
        const localName = (declaration.id as Identifier).name;
        this.assertUniqueExportName(localName, declaration.id!.start);
        this.exports.set(localName, { identifier: null, localName });
      }
    }
  }
}
```

no_dupe_args

```ts
export function logDuplicateArgumentNameError(name: string): RollupLog {
  return {
    code: DUPLICATE_ARGUMENT_NAME,
    message: `Duplicate argument name "${name}"`
  };
}
```

```ts
// case
function fn(a, a) {}

// implementation
export default class ParameterScope extends ChildScope {
  /**
   * Adds a parameter to this scope. Parameters must be added in the correct
   * order, i.e. from left to right.
   */
  addParameterDeclaration(
    identifier: Identifier,
    argumentPath: ObjectPath
  ): ParameterVariable {
    const { name, start } = identifier;
    const existingParameter = this.variables.get(name);
    if (existingParameter) {
      return this.context.error(logDuplicateArgumentNameError(name), start);
    }
    const variable = new ParameterVariable(name, identifier, argumentPath, this.context);
    this.variables.set(name, variable);
    // We also add it to the body scope to detect name conflicts with local
    // variables. We still need the intermediate scope, though, as parameter
    // defaults are NOT taken from the body scope but from the parameters or
    // outside scope.
    this.bodyScope.addHoistedVariable(name, variable);
    return variable;
  }
}
```
From the implementations above, it is clear that semantic analysis depends heavily on the lexical scope in which each AST node resides. The checks shown here are only the most basic ones: Rollup also performs further semantic analyses internally, such as side effect analysis, module circular dependency analysis, and stricter syntax restrictions (e.g., namespace objects cannot be called, imported bindings cannot be reassigned), which go beyond what Acorn can do.
Because the internal implementation of swc_ecma_lints may have performance issues, this split is a temporary solution. In the future, Rollup will add scope analysis to the execution context on the Rust side and implement complete semantic analysis there; at that point, the entire semantic analysis task will be handed over to the Rust side.
Optimize AST Parsing
Rollup provides this.parser for plugin context to allow user plugins to use native SWC capabilities to parse code into AST. User plugins can return parsed AST in load and transform hooks, and Rollup will reuse the parsed AST from the user plugin.
If no plugin supplies an AST (i.e., no plugin returns one from its load or transform hooks), parsing falls back to the default path: after the transform stage completes, the transpiled code is parsed into an ESTree AST using the native Rust parser.
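As a sketch, a plugin that parses in its transform hook and hands the AST back to Rollup might look like the following. The plugin name and the exact call shape of this.parser are assumptions for illustration; consult the plugin API documentation for the precise signature.

```javascript
// Hypothetical plugin: parses once in transform and returns the AST so that
// Rollup can reuse it instead of re-parsing the transformed code natively.
function prebuiltAstPlugin() {
  return {
    name: 'prebuilt-ast',
    transform(code, id) {
      if (!id.endsWith('.js')) return null; // let other files fall through
      // `this.parser` is the context API described above (call shape assumed
      // here); the AST it returns has NOT undergone semantic analysis.
      const ast = this.parser(code);
      return { code, ast, map: null };
    }
  };
}
```

Returning `ast` alongside `code` is what allows Rollup to skip the fallback parse for this module.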
Precautions for Using this.parser
Currently, Rollup has removed AST semantic analysis from the Rust side. In other words, an AST produced through the this.parser API in the plugin context has not undergone semantic analysis.

If a plugin needs to guarantee that the AST it generates satisfies the semantic rules, it must run semantic analysis with other tools.

If that guarantee is not required, Rollup will perform semantic analysis automatically when it recursively instantiates its internal AST node classes.
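As an illustration of such a plugin-side check, the no_dupe_args rule discussed earlier can be approximated by walking the ESTree AST directly. This is a deliberately simplified sketch (only plain Identifier parameters are handled; real tooling must also cover destructuring patterns and defaults), not a real lint implementation:

```javascript
// Reject duplicate parameter names in an ESTree function node,
// e.g. the AST for `function fn(a, a) {}`.
function checkNoDuplicateParams(fnNode) {
  const seen = new Set();
  for (const param of fnNode.params) {
    if (param.type !== 'Identifier') continue; // simplified: skip patterns
    if (seen.has(param.name)) {
      throw new Error(`Duplicate argument name "${param.name}"`);
    }
    seen.add(param.name);
  }
}
```

A plugin could run checks like this over the AST it returns, filling the gap left by the skipped Rust-side analysis.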
Even with native parsing capabilities, generating a complex AST still takes time. In watch mode, Rollup therefore caches the ESTree AST (see the Rollup Incremental Build section for details) to skip the native SWC parsing step entirely, and only walks the cached ESTree AST recursively to instantiate its internal AST node classes.
Performance Comparison
Rollup Optimization vs Direct SWC JavaScript API
Before diving into the performance comparison, it is essential to clarify a key concept: Rollup's native parsing optimization and directly using @swc/core's JavaScript API are completely different implementation approaches.
Key Distinction
Directly using @swc/core JavaScript API:
```javascript
import swc from '@swc/core';

const ast = await swc.parse(code); // JSON serialization/deserialization
```

Rollup optimization:
```javascript
// Rust side: SWC parsing → SWC AST → convert to ESTree AST → write to ArrayBuffer
// JavaScript side: build AST instances directly from the ArrayBuffer
const astBuffer = await parseAsync(code);
const ast = convertProgram(astBuffer); // no JSON parsing
```

The former requires a full JSON serialization (Rust) → JSON deserialization (JavaScript) round trip, while the latter transfers the data as binary through an ArrayBuffer with almost no conversion overhead.
Pure Parser Performance Benchmark
To verify the serialization overhead problem mentioned in the Native Interaction Challenge, we conducted a pure parser performance benchmark comparing the JavaScript API performance of each parser directly.
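The measurement approach can be sketched as a simple throughput loop. This is an illustrative harness, not necessarily the exact benchmark code behind the tables below:

```javascript
// Measure how many times per second a synchronous parser entry point
// completes on the same source string.
function opsPerSec(parseFn, source, durationMs = 500) {
  const start = performance.now();
  let ops = 0;
  while (performance.now() - start < durationMs) {
    parseFn(source);
    ops++;
  }
  return Math.round(ops / ((performance.now() - start) / 1000));
}

// Example usage (assumes acorn is installed):
// import { parse } from 'acorn';
// opsPerSec(src => parse(src, { ecmaVersion: 'latest' }), sourceText);
```

Each parser is driven through its JavaScript API, so for @swc/core the measured time includes the FFI and JSON round trip, not just the Rust parse.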
Test Environment
Node.js: v22.14.0
Platform: darwin arm64 (Apple M1)
Memory: 16GB
Parser versions:
- @swc/core: ^1.15.2 (Rust implementation)
- rollup: ^4.53.2 (Rust implementation)
- acorn: ^8.15.0 (pure JavaScript)
- @babel/parser: ^7.28.5 (pure JavaScript)

Test Results
| File | Size | SWC (ops/s) | Rollup (ops/s) | Acorn (ops/s) | Babel (ops/s) | Fastest | SWC Slowdown |
|---|---|---|---|---|---|---|---|
| colors.js | 1.1 KB | 11,631 | 55,046 | 55,694 | 51,696 | Acorn | 4.79x |
| underscore | 42.5 KB | 218 | 947 | 894 | 818 | Rollup | 4.34x |
| backbone | 58.7 KB | 201 | 835 | 805 | 681 | Rollup | 4.15x |
| mootools | 156.7 KB | 43 | 194 | 183 | 159 | Rollup | 4.54x |
| jquery | 262 KB | 29 | 141 | 139 | 99 | Rollup | 4.86x |
| yui | 330.4 KB | 42 | 173 | 202 | 163 | Acorn | 4.78x |
| jquery.mobile | 442.2 KB | 20 | 84 | 93 | 50 | Acorn | 4.52x |
| angular | 701.9 KB | 25 | 96 | 117 | 67 | Acorn | 4.70x |
| three.js | 1.2 MB | 6 | 24 | 23 | 14 | Rollup | 3.87x |
| larger.js | 2.3 MB | 3 | 15 | 12 | 9 | Rollup | 4.37x |
| typescript.js | 8.2 MB | 1 | 4 | 4 | 2 | Rollup/Acorn | 4.00x |
All throughput numbers are operations per second (higher is better); SWC Slowdown is the fastest parser's throughput divided by SWC's.
Interpreting Test Results
Testing found that directly using @swc/core's JavaScript API was the slowest in every test, averaging about 4.4 times slower than the pure JavaScript implementations.
This is exactly the problem described in the Native Interaction Challenge:
Complete call chain:
JavaScript
↓ [FFI call overhead]
Rust parser (fast!)
↓ [JSON serialization: serde_json::to_string]
JSON string
↓ [transfer]
JavaScript
↓ [JSON deserialization: JSON.parse]
JavaScript AST object
Total overhead = FFI + serialization + deserialization >> Rust algorithm advantage

AST Serialization Size Comparison
| File | Source Size | SWC AST | Rollup AST | Acorn AST | Babel AST | SWC/Acorn | Babel/Acorn |
|---|---|---|---|---|---|---|---|
| colors.js | 1.1 KB | 8,885 | 6,826 | 6,826 | 21,468 | 1.30x | 3.15x |
| underscore | 42.5 KB | 611,325 | 409,026 | 409,026 | 1,158,554 | 1.49x | 2.83x |
| backbone | 58.7 KB | 676,338 | 492,315 | 492,315 | 1,407,989 | 1.37x | 2.86x |
| mootools | 156.7 KB | 3,025,656 | 2,207,324 | 2,207,324 | 5,490,015 | 1.37x | 2.49x |
| jquery | 262 KB | 3,706,172 | 2,684,140 | 2,684,140 | 7,296,218 | 1.38x | 2.72x |
| yui | 330.4 KB | 2,687,894 | 2,100,733 | 2,100,733 | 5,743,729 | 1.28x | 2.73x |
| jquery.mobile | 442.2 KB | 5,853,254 | 4,627,787 | 4,627,787 | 12,238,361 | 1.26x | 2.64x |
| angular | 701.9 KB | 4,371,859 | 3,000,617 | 3,000,617 | 9,127,292 | 1.46x | 3.04x |
| three.js | 1.2 MB | 18,546,954 | 13,789,219 | 13,751,310 | 34,315,473 | 1.35x | 2.50x |
| larger.js | 2.3 MB | 35,738,746 | 27,911,674 | 27,835,504 | 69,421,956 | 1.28x | 2.49x |
| typescript.js | 8.2 MB | 91,256,349 | 67,612,837 | 67,567,418 | 178,461,426 | 1.35x | 2.64x |
| Average | - | - | - | - | - | 1.35x | 2.74x |
Unit: Serialized character count
Key Findings:
- SWC AST is on average 35% larger than Acorn's (1.35x), meaning more data needs to be serialized.
- Babel AST is on average 174% larger than Acorn's (2.74x), but Babel is a pure JS implementation with no cross-language serialization.
- When parsing the 8.2 MB typescript.js, SWC needs to serialize 91 MB of AST JSON.
- Even small files (1.1 KB) require serializing nearly 9 KB of AST data.
- Serialization overhead grows linearly with file size, which is why directly using SWC's JavaScript API is so slow.
Rollup Optimization Performance
Precisely because directly using SWC's JavaScript API has severe serialization overhead problems, Rollup adopted the ArrayBuffer optimization:
- Avoids JSON serialization: the Rust side writes directly into a binary ArrayBuffer.
- Avoids JSON deserialization: the JavaScript side reads data directly from the ArrayBuffer.
- Smaller size: the ArrayBuffer is only about 1/3 the size of the JSON.
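The write/read idea can be illustrated with DataView. The three-field node layout below is a toy format invented for illustration; Rollup's actual binary layout is considerably more involved:

```javascript
// Toy binary AST node layout: [u32 type id][u32 start][u32 end],
// little-endian. No JSON text is ever produced or parsed.
const NODE_TYPES = ['Program', 'Identifier', 'Literal']; // illustrative subset

// What the Rust writer would do, shown in JS for symmetry.
function encodeNode(node) {
  const buffer = new ArrayBuffer(12);
  const view = new DataView(buffer);
  view.setUint32(0, NODE_TYPES.indexOf(node.type), true);
  view.setUint32(4, node.start, true);
  view.setUint32(8, node.end, true);
  return buffer;
}

// JavaScript side: rebuild the node object directly from the buffer.
function decodeNode(buffer, offset = 0) {
  const view = new DataView(buffer);
  return {
    type: NODE_TYPES[view.getUint32(offset, true)],
    start: view.getUint32(offset + 4, true),
    end: view.getUint32(offset + 8, true)
  };
}
```

Fixed-width binary fields are both smaller than their JSON text equivalents and readable without any string parsing, which is the core of the size and speed advantage.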
Testing also found that once the parsed character count reaches 319,869,952, Acorn fails with a JavaScript heap out-of-memory error.
<--- Last few GCs --->
[69821:0x120078000] 15364 ms: Mark-sweep 4062.9 (4143.2) -> 4059.0 (4143.2) MB, 703.2 / 0.0 ms (average mu = 0.293, current mu = 0.102) allocation failure; scavenge might not succeed
[69821:0x120078000] 16770 ms: Mark-sweep 4075.3 (4143.2) -> 4071.5 (4169.0) MB, 1383.5 / 0.0 ms (average mu = 0.143, current mu = 0.016) allocation failure; scavenge might not succeed
<--- JS stacktrace --->
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory

In other words, 319,869,952 characters converts to:
- UTF-8 file size: approximately 320 MB (assuming primarily ASCII)
- JavaScript memory usage: approximately 640 MB (UTF-16 encoding, double)
- Post JSON.parse objects: potentially exceeding 1 GB
This explains why Acorn encounters memory overflow issues at this scale.
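These estimates follow directly from the character count (SI units; the 1-character-per-byte assumption holds only for mostly-ASCII source):

```javascript
const chars = 319_869_952;
// UTF-8 file on disk: ~1 byte per ASCII character.
console.log((chars / 1e6).toFixed(0) + ' MB');     // ≈ 320 MB
// In-memory JS string: UTF-16, 2 bytes per BMP character.
console.log((chars * 2 / 1e6).toFixed(0) + ' MB'); // ≈ 640 MB
```

The post-JSON.parse object graph adds further per-object and per-property overhead on top of the raw string, which is why the final figure can exceed 1 GB.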
Performance Analysis Summary
Problems with directly using SWC JavaScript API:
| Problem | Cause | Impact |
|---|---|---|
| FFI call overhead | JavaScript ↔ Rust boundary | Costs accumulate with frequent calls |
| JSON serialization | serde_json::to_string | Large AST serialization is time-consuming |
| JSON deserialization | JSON.parse | Parsing large JSON strings is slow |
| Memory usage | Generating intermediate JSON strings | Extra memory allocation |

Test results: 3.87x - 4.86x slower than pure JavaScript parsers (average 4.43x)
Advantages of Rollup's ArrayBuffer optimization:
| Optimization | Implementation | Benefit |
|---|---|---|
| Avoids JSON serialization | Direct ArrayBuffer write | Reduces Rust-side overhead |
| Avoids JSON deserialization | Direct ArrayBuffer read | Reduces JavaScript-side overhead |
| Size optimization | Binary format | Approximately 1/3 the size |
| Zero-copy transfer | SharedArrayBuffer | Efficient inter-thread transfer |

Test results:
- Significant advantage for small files: In 1.1KB files, Rollup achieves 55,046 ops/sec, only 1.16% slower than Acorn (55,694 ops/sec)
- Excellent performance for medium-large files: In 42.5KB - 2.3MB files, Rollup is 5.9% - 25% faster than Acorn
- Even for very large files: In the 8.2MB TypeScript.js, both Rollup and Acorn achieve 4 ops/sec
AST Serialization Overhead Analysis:
Based on actual test data, AST serialization size directly impacts performance:
| File | Source Size | SWC AST | Acorn AST | Babel AST | SWC Inflation | Babel Inflation |
|---|---|---|---|---|---|---|
| colors.js | 1.1 KB | 8,885 (≈8.9 KB) | 6,826 (≈6.8 KB) | 21,468 (≈21.5 KB) | 1.30x | 3.15x |
| jquery | 262 KB | 3,706,172 (≈3.71 MB) | 2,684,140 (≈2.68 MB) | 7,296,218 (≈7.30 MB) | 1.38x | 2.72x |
| typescript.js | 8.2 MB | 91,256,349 (≈91.3 MB) | 67,567,418 (≈67.6 MB) | 178,461,426 (≈178.5 MB) | 1.35x | 2.64x |

Data Conversion Note: Relationship Between Character Count and Byte Count
The numbers in the table come from JSON.stringify(ast).length, representing character count (UTF-16 code unit count). Conversion uses SI units (decimal):

```
1 KB = 1,000 bytes
1 MB = 1,000,000 bytes
1 GB = 1,000,000,000 bytes
```

Note: binary units could also be used (1 KiB = 1,024 bytes, 1 MiB = 1,048,576 bytes), but to stay consistent with the benchmark data source, this article uses SI units throughout.
Why does character count ≈ UTF-8 file size?
Because JSON AST content consists primarily of ASCII characters (letters, digits, punctuation, and keys like "type" and "start"), and ASCII characters occupy 1 byte in UTF-8 encoding:

```javascript
const astJson = JSON.stringify(ast);
console.log(astJson.length); // 91,256,349 characters
console.log(Buffer.byteLength(astJson, 'utf8')); // ≈91,256,349 bytes

// Conversion using SI units
fs.writeFileSync('ast.json', astJson, 'utf8');
console.log(fs.statSync('ast.json').size); // 91,256,349 bytes
console.log((91_256_349 / 1_000_000).toFixed(1)); // 91.3 MB
```

But JavaScript memory usage is double!
JavaScript internally uses UTF-16 encoding to store strings. In UTF-16, each code unit occupies 2 bytes:
```javascript
console.log(Buffer.byteLength(astJson, 'utf16le')); // ≈182.5 MB
```

Why is memory double?
- 1 UTF-16 code unit = 2 bytes (fixed)
- JSON AST characters are almost all ASCII/BMP characters ("type", "start", {, }, etc.)
- Each BMP character = 1 code unit = 2 bytes
- Therefore: 91,256,349 characters = 91,256,349 code units × 2 = 182,512,698 bytes ≈ 182.5 MB
Special Cases in UTF-16
Supplementary plane characters (such as the emoji 😀) require 2 code units (a surrogate pair):

```javascript
const emoji = "😀";
console.log(emoji.length); // 2 (two UTF-16 code units)
console.log(Buffer.byteLength(emoji, 'utf16le')); // 4 bytes
console.log([...emoji].length); // 1 (actual character count)
```

But JSON AST output does not contain emoji, so we can simplify to: 1 character = 2 bytes.
Complete Encoding Conversion Flow:
| Phase | Data Size | Encoding | Description |
|---|---|---|---|
| Rust serialization | ≈91.3 MB | UTF-8 | serde_json::to_string |
| Cross-FFI transfer | ≈91.3 MB | UTF-8 | Passed to JavaScript |
| JavaScript memory | ≈182.5 MB | UTF-16 | Node.js auto-converts to UTF-16 |
| Post-JSON.parse object | Hundreds of MB | - | Parsed object structure takes even more memory |

This also explains why JSON.parse on large JSON is so slow: not only must it scan 91 million characters, it must also construct an object structure occupying hundreds of MB in memory.
Key Findings:
- SWC AST is on average 35% larger than Acorn (1.35x), requiring more data to serialize
- Babel AST is on average 174% larger than Acorn (2.74x), but Babel is a pure JS implementation without cross-language serialization
- When parsing the 8MB TypeScript.js, SWC needs to serialize 91.3MB of UTF-8 JSON (JavaScript memory usage 182.5MB)
- Even small files (1KB) require serializing nearly 9KB of AST data
- Serialization overhead grows linearly with file size, which is the fundamental reason why directly using SWC JavaScript API is slow
- Double memory pressure: UTF-8 serialization + UTF-16 deserialization, actual memory usage approaches 2 times the file size
Stability Analysis:
Based on Relative Margin of Error (RME %) test data:
| Parser | Average Error | Best Scenario | Characteristics |
|---|---|---|---|
| Acorn | ±3.65% | angular (±1.25%) | Most stable, suitable for production |
| Rollup | ±3.69% | mootools (±0.88%) | Stability close to Acorn |
| Babel | ±4.09% | colors.js (±1.01%) | Stable for small files, more volatile for large files |
| SWC | ±4.51% | typescript.js (±0.98%) | Excellent stability for very large files (no GC pauses) |

Unexpected finding: although SWC is the slowest, it shows excellent stability on very large files. Rust has no GC pauses, while the pure JS implementations are affected by V8 garbage collection and JIT optimization.
Detailed Performance Multiplier Comparison:
SWC's performance gap relative to the fastest parser:
| File | File Size | Fastest Parser | SWC Slower By |
|---|---|---|---|
| colors.js | 1.1 KB | Acorn | 4.79x |
| jquery | 262 KB | Rollup | 4.86x |
| mootools | 156.7 KB | Rollup | 4.54x |
| three.js | 1.2 MB | Rollup | 3.87x |
| typescript.js | 8.2 MB | Rollup/Acorn | 4.00x |

Average slowdown: 4.43x (range: 3.87x - 4.86x)
Trend Analysis:
- Rollup optimization: Parsing time growth is small, suitable for large-scale module parsing
- Acorn: Parsing time grows more quickly with input size, though it remains workable short of extreme scales
- SWC: Consistently 4.43 times slower than pure JS implementations, proving FFI + JSON serialization overhead exceeds algorithm advantages
- Extreme scenarios: At 300MB+ code volumes, Acorn encounters memory overflow, while Rollup's optimization can handle it normally
Core Conclusion
Rollup's performance improvement does not come from simply switching to SWC, but from the carefully designed ArrayBuffer optimization.
If @swc/core's JavaScript API is used directly, performance actually drops significantly (average 4.43 times slower).
Key Data Support:
- SWC needs to serialize 35% more AST than Acorn (8MB source → 91MB UTF-8 JSON → 182MB UTF-16 memory)
- Double memory pressure: UTF-8 serialization (91MB) + UTF-16 deserialization (182MB), actual memory usage approaches 2 times the file size
- The triple overhead of FFI boundary + JSON serialization/deserialization + encoding conversion completely negates Rust's algorithm advantage
- Rollup avoids JSON serialization and encoding conversion through ArrayBuffer, reducing size to 1/3, achieving genuine performance improvement
This case illustrates the point well: native code does not automatically mean faster. Cross-language boundary costs, data serialization, and character-encoding conversion must all be taken into account. Rollup realizes Rust's true performance advantage only by eliminating the serialization bottleneck and encoding-conversion overhead.