Native Parser
Compared to JavaScript, Rust as a native language has inherently strong performance. Rollup decided to switch from the JavaScript-side acorn parser to the Rust-side SWC parser, which can parse complex ASTs efficiently; this is a core change in Rollup v4.
Challenges
Native Interaction
Directly using SWC's JavaScript bindings and parsing complex ASTs through the `swc.parse` JavaScript interface incurs significant communication overhead:
import swc from '@swc/core';
const code = `
const a = 1;
function add(a, b) {
return a + b;
}
`;
swc
.parse(code, {
syntax: 'ecmascript',
comments: false,
script: true,
target: 'es3',
isModule: false
})
.then(module => {
module.type; // file type
module.body; // AST
});
Reading SWC's source code, we find that SWC internally uses the `serde_json` library to serialize the parsed `program` object into a JSON string, which is then passed to the JavaScript side:
#[napi]
impl Task for ParseTask {
type JsValue = String;
type Output = String;
fn compute(&mut self) -> napi::Result<Self::Output> {
let options: ParseOptions = deserialize_json(&self.options)?;
let fm = self
.c
.cm
.new_source_file(self.filename.clone().into(), self.src.clone());
let comments = if options.comments {
Some(self.c.comments() as &dyn Comments)
} else {
None
};
let program = try_with(self.c.cm.clone(), false, ErrorFormat::Normal, |handler| {
let mut p = self.c.parse_js(
fm,
handler,
options.target,
options.syntax,
options.is_module,
comments,
)?;
p.visit_mut_with(&mut resolver(
Mark::new(),
Mark::new(),
options.syntax.typescript(),
));
Ok(p)
})
.convert_err()?;
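// Serialize the entire parsed AST into a JSON string before it crosses the N-API boundary.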
let ast_json = serde_json::to_string(&program)?;
Ok(ast_json)
}
fn resolve(&mut self, _env: Env, result: Self::Output) -> napi::Result<Self::JsValue> {
Ok(result)
}
}
The JavaScript side of the interface then deserializes the AST string returned by the native parser into a JavaScript object via `JSON.parse`:
class Compiler {
async parse(
src: string,
options?: ParseOptions,
filename?: string
): Promise<Program> {
options = options || { syntax: 'ecmascript' };
options.syntax = options.syntax || 'ecmascript';
if (!bindings && !!fallbackBindings) {
throw new Error(
'Fallback bindings does not support this interface yet.'
);
} else if (!bindings) {
throw new Error('Bindings not found.');
}
if (bindings) {
const res = await bindings.parse(src, toBuffer(options), filename);
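// Deserialize the JSON string produced on the Rust side back into a JavaScript object.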
return JSON.parse(res);
} else if (fallbackBindings) {
return fallbackBindings.parse(src, options);
}
throw new Error('Bindings not found.');
}
}
Repeatedly serializing (on the Rust side) and deserializing (on the JavaScript side) the AST between Rust and JavaScript would almost completely erode the performance advantage of the native (Rust) parser when parsing complex ASTs.
Ast Compatibility
Even with its estree compat module, SWC still produces a Babel-style AST, not an ESTree AST; Rollup, however, depends on the standard ESTree AST.
File Encoding
SWC uses UTF-8 encoding, while Rollup depends on standard JavaScript's UTF-16 encoding.

UTF-8 and UTF-16 are two different character encodings for representing text. Their main differences lie in the number of bytes used per character and in the encoding scheme.
Differences between UTF-8 and UTF-16

UTF-8:

Variable-length encoding: UTF-8 uses 1 to 4 bytes per character. ASCII characters (such as English letters and digits) use 1 byte, while other characters (such as Chinese characters) may use 2 to 4 bytes.
- 1 byte: ASCII characters (U+0000 to U+007F).
- 2 bytes: extended Latin characters (U+0080 to U+07FF).
- 3 bytes: Basic Multilingual Plane (BMP) characters (U+0800 to U+FFFF).
- 4 bytes: supplementary plane characters (U+10000 to U+10FFFF).
Backward compatible with ASCII: since ASCII characters occupy only 1 byte in UTF-8, UTF-8 is fully compatible with ASCII encoding.
Encoding efficiency:
- High for English and ASCII text (1 byte per character).
- Non-Latin characters (such as Chinese and Japanese) typically require 3 bytes.
- Supplementary plane characters (such as emoji) require 4 bytes.
Use cases:
- Well suited to network transmission and storage, especially for text that is primarily ASCII.
- Commonly used in web pages, JSON files, and similar scenarios.
UTF-16:

Fixed- or variable-length encoding: UTF-16 typically uses 2 bytes to represent the most commonly used characters, but certain special characters (such as emoji) require 4 bytes.
- 2 bytes: characters within the BMP (U+0000 to U+FFFF, excluding surrogate pairs).
- 4 bytes: characters beyond the BMP (U+10000 to U+10FFFF), encoded as two 16-bit units (a surrogate pair).
Not compatible with ASCII: UTF-16 is not compatible with ASCII, because ASCII characters require 2 bytes in UTF-16. Both UTF-8 and UTF-16, however, treat each ASCII character as a single unit.
Encoding efficiency:
- High for characters within the BMP (such as most Chinese and Japanese text): 2 bytes per character.
- Low for ASCII characters: 2 bytes per character.
- Similar to UTF-8 for supplementary plane characters: 4 bytes.
Use cases:
- Well suited to in-memory operations, especially in scenarios dominated by BMP characters (such as Chinese-language environments).
- Commonly used for the internal string representation of Windows, JavaScript, and Java.
Example:

For the string A你, the encodings are as follows.

UTF-8 encoding:
- "A": 1 byte, encoded as 0x41
- "你": 3 bytes, encoded as 0xE4 0xBD 0xA0

UTF-16 encoding:
- "A": 2 bytes, encoded as 0x0041
- "你": 2 bytes, encoded as 0x4F60

Character positions in UTF-8 are counted in bytes, while in UTF-16 they are counted in 2-byte code units.
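The difference is easy to observe in Node.js, where string length counts UTF-16 code units while Buffer counts UTF-8 bytes:

const s = 'A你';
console.log(Buffer.byteLength(s, 'utf8')); // 4 → 0x41 plus 0xE4 0xBD 0xA0
console.log(s.length); // 2 → two UTF-16 code units, 0x0041 and 0x4F60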
Summary:

Feature | UTF-8 | UTF-16 |
---|---|---|
Encoding length | 1-4 bytes | 2 or 4 bytes |
ASCII compatibility | Compatible | Incompatible |
ASCII text efficiency | High (1 byte/char) | Low (2 bytes/char) |
Non-Latin text efficiency | Lower (3 bytes/char) | Higher (2 bytes/char) |
Byte order issues | None | Needs a BOM |
Use cases | Network protocols, file storage | In-memory operations, large-text processing |
When processing text, the choice between UTF-8 and UTF-16 affects file size and character position calculations, and therefore how character positions in the AST are determined. Consider the following example:

const a = "你好";
Parsed under the Babel-style and ESTree specifications, the resulting ASTs differ in their character positions.

The SWC (Babel-style) AST, whose spans are UTF-8 byte offsets:

{
"type": "Module",
"span": {
"start": 0,
"end": 19,
"ctxt": 0
},
"body": [
{
"type": "VariableDeclaration",
"span": {
"start": 0,
"end": 19,
"ctxt": 0
},
"kind": "const",
"declare": false,
"declarations": [
{
"type": "VariableDeclarator",
"span": {
"start": 6,
"end": 18,
"ctxt": 0
},
"id": {
"type": "Identifier",
"span": {
"start": 6,
"end": 7,
"ctxt": 0
},
"value": "a",
"optional": false,
"typeAnnotation": null
},
"init": {
"type": "StringLiteral",
"span": {
"start": 10,
"end": 18,
"ctxt": 0
},
"value": "你好",
"hasEscape": false,
"kind": {
"type": "normal",
"containsQuote": true
}
},
"definite": false
}
]
}
],
"interpreter": null
}

The same code as an ESTree AST, with UTF-16 code-unit positions:

{
"type": "Program",
"start": 0,
"end": 15,
"body": [
{
"type": "VariableDeclaration",
"start": 0,
"end": 15,
"declarations": [
{
"type": "VariableDeclarator",
"start": 6,
"end": 14,
"id": {
"type": "Identifier",
"start": 6,
"end": 7,
"name": "a"
},
"init": {
"type": "Literal",
"start": 10,
"end": 14,
"value": "你好",
"raw": "\"你好\""
}
}
],
"kind": "const"
}
],
"sourceType": "module"
}
As can be seen, the two specifications handle multi-byte characters differently because of their encodings, so the parsed node positions differ: the Babel-style AST measures the UTF-8-encoded 你好 literal as the span [10, 18), while the ESTree AST measures the UTF-16-encoded literal as the range [10, 14).
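The arithmetic can be verified in Node.js; the literal (including its quotes) starts at offset 10 in both representations:

const literal = '"你好"'; // the raw literal, quotes included
console.log(Buffer.byteLength(literal, 'utf8')); // 8 → UTF-8 span [10, 10 + 8) = [10, 18)
console.log(literal.length); // 4 → UTF-16 range [10, 10 + 4) = [10, 14)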
The source map chapter details how Rollup generates sourcemaps internally; Rollup relies on the position information provided by the ESTree AST to add its mapping markers.
export class NodeBase extends ExpressionEntity implements ExpressionNode {
/**
* Override to perform special initialisation steps after the scope is
* initialised
*/
initialise(): void {
this.scope.context.magicString.addSourcemapLocation(this.start);
this.scope.context.magicString.addSourcemapLocation(this.end);
}
}
Therefore, mismatched encodings would cause serious offsets in the sourcemaps Rollup generates.
Performance
Optimize Ast Compatibility
On the Rust side, Rollup leverages SWC's ability to parse code into a Babel-style AST:
use swc_compiler_base::parse_js;
pub fn parse_ast(code: String, allow_return_outside_function: bool, jsx: bool) -> Vec<u8> {
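  // Simplified excerpt: the setup of `code_reference`, `cm`, `file`, `target`,
  // `syntax`, and `comments` is omitted here.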
GLOBALS.set(&Globals::default(), || {
let result = catch_unwind(AssertUnwindSafe(|| {
let result = try_with_handler(&code_reference, |handler| {
parse_js(
cm,
file,
handler,
target,
syntax,
IsModule::Unknown,
Some(&comments),
)
});
match result {
Err(buffer) => buffer,
Ok(program) => {
let annotations = comments.take_annotations();
let converter = AstConverter::new(&code_reference, &annotations);
converter.convert_ast_to_buffer(&program)
}
}
}));
});
}
The `converter.convert_ast_to_buffer(&program)` call recursively walks the Babel-style AST produced by SWC and recomputes each node's position information under the ESTree specification:
/// Converts the given UTF-8 byte index to a UTF-16 byte index.
///
/// To be performant, this method assumes that the given index is not smaller
/// than the previous index. Additionally, it handles "annotations" like
/// `@__PURE__` comments in the process.
///
/// The logic for those comments is as follows:
/// - If the current index is at the start of an annotation, the annotation
/// is collected and the index is advanced to the end of the annotation.
/// - Otherwise, we check if the next character is a white-space character.
/// If not, we invalidate all collected annotations.
/// This is to ensure that we only collect annotations that directly precede
/// an expression and are not e.g. separated by a comma.
/// - If annotations are relevant for an expression, it can "take" the
/// collected annotations by calling `take_collected_annotations`. This
/// clears the internal buffer and returns the collected annotations.
/// - Invalidated annotations are attached to the Program node so that they
/// can all be removed from the source code later.
/// - If an annotation can influence a child that is separated by some
/// non-whitespace from the annotation, `keep_annotations_for_next` will
/// prevent annotations from being invalidated when the next position is
/// converted.
pub(crate) fn convert(&mut self, utf8_index: u32, keep_annotations_for_next: bool) -> u32 {
if self.current_utf8_index > utf8_index {
panic!(
"Cannot convert positions backwards: {} < {}",
utf8_index, self.current_utf8_index
);
}
while self.current_utf8_index < utf8_index {
if self.current_utf8_index == self.next_annotation_start {
let start = self.current_utf16_index;
let (next_comment_end, next_comment_kind) = self
.next_annotation
.map(|a| (a.comment.span.hi.0 - 1, a.kind.clone()))
.unwrap();
while self.current_utf8_index < next_comment_end {
let character = self.character_iterator.next().unwrap();
self.current_utf8_index += character.len_utf8() as u32;
self.current_utf16_index += character.len_utf16() as u32;
}
if let Annotation(kind) = next_comment_kind {
self.collected_annotations.push(ConvertedAnnotation {
start,
end: self.current_utf16_index,
kind,
});
}
self.next_annotation = self.annotation_iterator.next();
self.next_annotation_start = get_annotation_start(self.next_annotation);
} else {
let character = self.character_iterator.next().unwrap();
if !(self.keep_annotations || self.collected_annotations.is_empty()) {
match character {
' ' | '\t' | '\r' | '\n' => {}
_ => {
self.invalidate_collected_annotations();
}
}
}
self.current_utf8_index += character.len_utf8() as u32;
self.current_utf16_index += character.len_utf16() as u32;
}
}
self.keep_annotations = keep_annotations_for_next;
self.current_utf16_index
}
At the same time, Rollup converts the Babel-style AST parsed by SWC into an ESTree-compatible binary format on the Rust side and passes it to JavaScript as an (array) buffer.
pub(crate) fn convert_statement(&mut self, statement: &Stmt) {
match statement {
Stmt::Break(break_statement) => self.store_break_statement(break_statement),
Stmt::Block(block_statement) => self.store_block_statement(block_statement, false),
Stmt::Continue(continue_statement) => self.store_continue_statement(continue_statement),
Stmt::Decl(declaration) => self.convert_declaration(declaration),
Stmt::Debugger(debugger_statement) => self.store_debugger_statement(debugger_statement),
Stmt::DoWhile(do_while_statement) => self.store_do_while_statement(do_while_statement),
Stmt::Empty(empty_statement) => self.store_empty_statement(empty_statement),
Stmt::Expr(expression_statement) => self.store_expression_statement(expression_statement),
Stmt::For(for_statement) => self.store_for_statement(for_statement),
Stmt::ForIn(for_in_statement) => self.store_for_in_statement(for_in_statement),
Stmt::ForOf(for_of_statement) => self.store_for_of_statement(for_of_statement),
Stmt::If(if_statement) => self.store_if_statement(if_statement),
Stmt::Labeled(labeled_statement) => self.store_labeled_statement(labeled_statement),
Stmt::Return(return_statement) => self.store_return_statement(return_statement),
Stmt::Switch(switch_statement) => self.store_switch_statement(switch_statement),
Stmt::Throw(throw_statement) => self.store_throw_statement(throw_statement),
Stmt::Try(try_statement) => self.store_try_statement(try_statement),
Stmt::While(while_statement) => self.store_while_statement(while_statement),
Stmt::With(_) => unimplemented!("Cannot convert Stmt::With"),
}
}
The converter extracts the information each ESTree node requires from the structure of the corresponding Babel-style node and recalculates the position information under the ESTree specification using UTF-16 offsets:
pub(crate) fn convert_item_list_with_state<T, S, F>(
&mut self,
item_list: &[T],
state: &mut S,
reference_position: usize,
convert_item: F,
) where
F: Fn(&mut AstConverter, &T, &mut S) -> bool,
{
// for an empty list, we leave the referenced position at zero
if item_list.is_empty() {
return;
}
self.update_reference_position(reference_position);
// store number of items in first position
self
.buffer
.extend_from_slice(&(item_list.len() as u32).to_ne_bytes());
let mut reference_position = self.buffer.len();
// make room for the reference positions of the items
self
.buffer
.resize(self.buffer.len() + item_list.len() * 4, 0);
for item in item_list {
let insert_position = (self.buffer.len() as u32) >> 2;
if convert_item(self, item, state) {
self.buffer[reference_position..reference_position + 4]
.copy_from_slice(&insert_position.to_ne_bytes());
}
reference_position += 4;
}
}
It also collects comment nodes, in preparation for Rollup's tree shaking later on. Note that comment nodes exist in the Babel-style AST specification but not in the ESTree specification; yet comment information is crucial to Rollup's tree shaking and strengthens it.
Rollup collects this comment information in the ESTree AST and stores it on a `_rollupAnnotations` property. In other words, the final returned AST is structurally compatible with ESTree and additionally carries `_rollupAnnotations`.
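As a rough sketch of what that looks like (the shape follows the ConvertedAnnotation struct used in the converter code; the field values here are illustrative, not taken from real output), a call preceded by a /*@__PURE__*/ comment would carry something like:

const annotatedNode = {
  type: 'CallExpression',
  start: 24,
  end: 35,
  // collected from the preceding /*@__PURE__*/ comment
  _rollupAnnotations: [{ start: 10, end: 22, kind: 'pure' }]
  // ...callee, arguments, and the other ESTree fields
};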
pub(crate) fn take_collected_annotations(
&mut self,
kind: AnnotationKind,
) -> Vec<ConvertedAnnotation> {
let mut relevant_annotations = Vec::new();
for annotation in self.collected_annotations.drain(..) {
if annotation.kind == kind {
relevant_annotations.push(annotation);
} else {
self.invalid_annotations.push(annotation);
}
}
relevant_annotations
}
impl<'a> AstConverter<'a> {
pub(crate) fn store_call_expression(
&mut self,
span: &Span,
is_optional: bool,
callee: &StoredCallee,
arguments: &[ExprOrSpread],
is_chained: bool,
) {
// annotations
let annotations = self
.index_converter
.take_collected_annotations(AnnotationKind::Pure);
}
impl SequentialComments {
pub(crate) fn add_comment(&self, comment: Comment) {
if comment.text.starts_with('#') && comment.text.contains("sourceMappingURL=") {
self.annotations.borrow_mut().push(AnnotationWithType {
comment,
kind: CommentKind::Annotation(AnnotationKind::SourceMappingUrl),
});
return;
}
let mut search_position = comment
.text
.chars()
.nth(0)
.map(|first_char| first_char.len_utf8())
.unwrap_or(0);
while let Some(Some(match_position)) = comment.text.get(search_position..).map(|s| s.find("__"))
{
search_position += match_position;
// Using a byte reference avoids UTF8 character boundary checks
match &comment.text.as_bytes()[search_position - 1] {
b'@' | b'#' => {
let annotation_slice = &comment.text[search_position..];
if annotation_slice.starts_with("__PURE__") {
self.annotations.borrow_mut().push(AnnotationWithType {
comment,
kind: CommentKind::Annotation(AnnotationKind::Pure),
});
return;
}
if annotation_slice.starts_with("__NO_SIDE_EFFECTS__") {
self.annotations.borrow_mut().push(AnnotationWithType {
comment,
kind: CommentKind::Annotation(AnnotationKind::NoSideEffects),
});
return;
}
}
_ => {}
}
search_position += 2;
}
self.annotations.borrow_mut().push(AnnotationWithType {
comment,
kind: CommentKind::Comment,
});
}
pub(crate) fn take_annotations(self) -> Vec<AnnotationWithType> {
self.annotations.take()
}
}
Finally, the ESTree-compatible arraybuffer is passed to the Rollup side, which must know how to walk it in order to instantiate the AST node classes Rollup implements internally.
export default class Module {
async setSource({
ast,
code,
customTransformCache,
originalCode,
originalSourcemap,
resolvedIds,
sourcemapChain,
transformDependencies,
transformFiles,
...moduleOptions
}: TransformModuleJSON & {
resolvedIds?: ResolvedIdMap;
transformFiles?: EmittedFile[] | undefined;
}): Promise<void> {
// Measuring asynchronous code does not provide reasonable results
timeEnd('generate ast', 3);
const astBuffer = await parseAsync(
code,
false,
this.options.jsx !== false
);
timeStart('generate ast', 3);
this.ast = convertProgram(astBuffer, programParent, this.scope);
}
}
How Rollup walks the buffer:
function convertNode(
parent: Node | { context: AstContext; type: string },
parentScope: ChildScope,
position: number,
buffer: AstBuffer
): any {
const nodeType = buffer[position];
const NodeConstructor = nodeConstructors[nodeType];
/* istanbul ignore if: This should never be executed but is a safeguard against faulty buffers */
if (!NodeConstructor) {
console.trace();
throw new Error(`Unknown node type: ${nodeType}`);
}
const node = new NodeConstructor(parent, parentScope);
node.type = nodeTypeStrings[nodeType];
node.start = buffer[position + 1];
node.end = buffer[position + 2];
bufferParsers[nodeType](node, position + 3, buffer);
node.initialise();
return node;
}
Optimize Native Interaction
As mentioned above, directly using the JavaScript bindings exposed by SWC means repeatedly serializing and deserializing the AST between Rust and JavaScript; for complex ASTs this almost erases the performance advantage of the native (Rust) parser. The solution is as follows:
- Use an arraybuffer to transfer the parsed AST between Rust and JavaScript.
- Do not use SWC's JavaScript bindings at all; call SWC's Rust API directly on the Rust side.
use swc_compiler_base::parse_js;
pub fn parse_ast(code: String, allow_return_outside_function: bool, jsx: bool) -> Vec<u8> {
GLOBALS.set(&Globals::default(), || {
let result = catch_unwind(AssertUnwindSafe(|| {
let result = try_with_handler(&code_reference, |handler| {
parse_js(
cm,
file,
handler,
target,
syntax,
IsModule::Unknown,
Some(&comments),
)
});
match result {
Err(buffer) => buffer,
Ok(program) => {
let annotations = comments.take_annotations();
let converter = AstConverter::new(&code_reference, &annotations);
converter.convert_ast_to_buffer(&program)
}
}
}));
});
}
At the same time, Rollup converts the SWC-parsed Babel-style AST into the ESTree-compatible binary format in Rust and then passes it to JavaScript as an (array) buffer.
match result {
Err(buffer) => buffer,
Ok(program) => {
let annotations = comments.take_annotations();
let converter = AstConverter::new(&code_reference, &annotations);
converter.convert_ast_to_buffer(&program)
}
}
Passing an arraybuffer is an essentially lossless operation, so we only need to teach the JavaScript side how to read it. In addition, the arraybuffer is only about one third the size of the serialized JSON. Finally, it lets us hand the AST, in arraybuffer form, to other threads without loss; for example, parsing can run in a WebWorker and the resulting arraybuffer can then be passed to the main thread.
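A minimal sketch of that hand-off with Node's worker_threads, where parseToBuffer stands in for a hypothetical native binding that returns the AST as an ArrayBuffer:

import { Worker, isMainThread, parentPort } from 'node:worker_threads';

declare function parseToBuffer(code: string): ArrayBuffer; // hypothetical native binding

if (isMainThread) {
  const worker = new Worker(__filename);
  worker.on('message', (astBuffer: ArrayBuffer) => {
    // The buffer arrives without any JSON serialization or copying.
    console.log(new Uint32Array(astBuffer, 0, 3)); // e.g. [nodeType, start, end]
  });
} else {
  const astBuffer = parseToBuffer('const a = 1;');
  // Listing the buffer in the transfer list moves ownership instead of copying it.
  parentPort!.postMessage(astBuffer, [astBuffer]);
}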
On the Node.js side, napi-rs is used to interact with the Rust code; for the browser, wasm-pack is used for the build.
Optimize Semantic Analysis
Parser Semantic Analysis Design
Calling SWC's `swc_compiler_base::parse_js` directly on the Rust side performs no semantic analysis; it handles only lexical and syntactic analysis. That is, SWC parses the following code without error:
const a = 1;
const a = 2;
This differs from acorn, which performs some early errors checks, spanning both syntax analysis and semantic analysis, while generating the AST.
The reason is that acorn is designed as a parser that conforms to the ECMAScript specification. Before a JavaScript engine executes code, the specification requires the Static Semantics: Early Errors steps (essentially static semantic analysis): checks and reports that must be completed during parsing and early syntax analysis. These errors are detected statically; the code does not have to run for them to be found.
Browsers, Node.js, and other hosts with built-in JavaScript engines likewise run the Static Semantics: Early Errors steps before executing code.
The significance of this part of the specification:
- Early detection of issues: potential problems are found before the code actually runs, avoiding failures that would otherwise surface at runtime.
- Performance: because these checks complete during static analysis, they can improve execution efficiency.
- Language consistency: a unified early-error mechanism ensures JavaScript code is processed consistently across environments.
- Better code: the rules also guide developers toward better programming practices.
SWC, Babel, and similar parsers do not run the Static Semantics: Early Errors steps when generating the AST; they are simply designed differently from acorn. Let us first look at why they separate syntax analysis from static semantic analysis.
Performance and Complexity Trade-off
Implementing early errors detection requires the parser to do the following:
- Simulate and maintain the execution context of the statement currently being parsed.
- Run static rule checks.
- Detect the other static semantic rules defined in the language specification.
- Detect syntax restriction rules.
- Detect the module system's static verification rules.

Although each check is not individually complex, in a large project, running the complete early errors check on every newly translated piece of code accumulates a performance overhead that cannot be ignored.

Toolchain Division of Labor
The focus of SWC, Babel, and similar parsers is code transformation; they are mostly injected into a build system's transformation pipeline as plugins. For a tool that wants to integrate tightly with the ecosystems of many build systems, the simplest approach is to stick to the single-responsibility principle. Separating parsing from semantic analysis means:
- the parser can focus on producing an accurate AST;
- the semantic analyzer can focus on checking code correctness;
- each part is easier to maintain and optimize.
Flexibility
Translating the modules of a complex application is rarely a one-shot process; code passes through intermediate states, and that intermediate code largely does not comply with semantic rules. If the translation tool enforced strict semantic analysis, such code could never get through compilation, limiting extensibility. Modern toolchains distribute different checks across different stages and run them on demand, balancing development flexibility against code quality.
Babel and SWC therefore separate syntax analysis from early errors detection: during the plugin transformation stage, code is parsed into an AST with lexical and syntactic analysis only, without the early errors check (static semantic analysis); at a suitable time (such as Rollup's transform stage), the bundler (such as Rollup) takes control and runs the early errors check.
This design choice reflects an important principle in engineering practice: sometimes, breaking down a complex problem into multiple independent steps may be more effective than trying to solve everything in one step. This allows each tool to focus on its core task, thereby providing better functionality and performance.
Rollup Plugin System Design Inspiration
This design approach is also reflected in the Rollup plugin system: when a user plugin returns an AST from its load (or transform) hook, Rollup reuses that AST in the subsequent transform hooks. Until the transform stage completes, Rollup performs no semantic analysis on the reused AST.
const a = 1;
const a = 2;
For the example above, acorn reports an error. Checks of this kind are implemented directly inside acorn's parsing code, as in this excerpt from its class-body parsing:
while (this.type !== tt.braceR) {
const element = this.parseClassElement(node.superClass !== null);
if (element) {
classBody.body.push(element);
if (
element.type === 'MethodDefinition' &&
element.kind === 'constructor'
) {
if (hadConstructor)
this.raiseRecoverable(
element.start,
'Duplicate constructor in the same class'
);
hadConstructor = true;
} else if (
element.key &&
element.key.type === 'PrivateIdentifier' &&
isPrivateNameConflicted(privateNameMap, element)
) {
this.raiseRecoverable(
element.key.start,
`Identifier '#${element.key.name}' has already been declared`
);
}
}
}
Error message:

Line 2: Identifier 'a' has already been declared.
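This can be reproduced with acorn's public API; a minimal sketch:

import { parse } from 'acorn';

try {
  parse("const a = 1;\nconst a = 2;", { ecmaVersion: 2020 });
} catch (error) {
  console.log((error as Error).message);
  // e.g. "Identifier 'a' has already been declared (2:6)"
}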
Therefore, Rollup would need to leverage the capabilities of `swc_ecma_lints` to achieve more complete semantic analysis:
use swc_ecma_lints::{rule::Rule, rules, rules::LintParams};
let result = HANDLER.set(&handler, || op(&handler));
match result {
Ok(mut program) => {
let unresolved_mark = Mark::new();
let top_level_mark = Mark::new();
let unresolved_ctxt = SyntaxContext::empty().apply_mark(unresolved_mark);
let top_level_ctxt = SyntaxContext::empty().apply_mark(top_level_mark);
program.visit_mut_with(&mut resolver(unresolved_mark, top_level_mark, false));
let mut rules = rules::all(LintParams {
program: &program,
lint_config: &Default::default(),
unresolved_ctxt,
top_level_ctxt,
es_version,
source_map: cm.clone(),
});
HANDLER.set(&handler, || match &program {
Program::Module(m) => {
rules.lint_module(m);
}
Program::Script(s) => {
rules.lint_script(s);
}
});
if handler.has_errors() {
let buffer = create_error_buffer(&wr, code);
Err(buffer)
} else {
Ok(program)
}
}
}
Implement Semantic Analysis On JavaScript Side
However, the following PR and discussion reveal that, after testing, `swc_ecma_lints` detection turned out not to be efficient. To work around this, the Rollup native parser temporarily removed the complete semantic analysis from the Rust side, until scope analysis is implemented there. The lint invocation below was removed:
let result = HANDLER.set(&handler, || op(&handler));
match result {
Ok(mut program) => {
let unresolved_mark = Mark::new();
let top_level_mark = Mark::new();
let unresolved_ctxt = SyntaxContext::empty().apply_mark(unresolved_mark);
let top_level_ctxt = SyntaxContext::empty().apply_mark(top_level_mark);
program.visit_mut_with(&mut resolver(unresolved_mark, top_level_mark, false));
let mut rules = rules::all(LintParams {
program: &program,
lint_config: &Default::default(),
unresolved_ctxt,
top_level_ctxt,
es_version,
source_map: cm.clone(),
});
HANDLER.set(&handler, || match &program {
Program::Module(m) => {
rules.lint_module(m);
}
Program::Script(s) => {
rules.lint_script(s);
}
});
if handler.has_errors() {
let buffer = create_error_buffer(&wr, code);
Err(buffer)
} else {
Ok(program)
}
}
}
What remains is a plain error mapping:

result.map_err(|_| {
if handler.has_errors() {
create_error_buffer(&wr, code)
} else {
panic!("Unexpected error in parse")
}
})
The semantic analysis task is handed to the JavaScript side instead. Rollup performs the more complete semantic analysis while walking the buffer and instantiating its internal AST node classes. Testing showed that semantic analysis on the JavaScript side has no significant impact on Rollup's performance.
Semantic Analysis Detection Points
The main checks performed during semantic analysis include the following:
const_assign
Example:
export function logConstVariableReassignError() {
  return {
    code: CONST_REASSIGN,
    message: 'Cannot reassign a variable declared with `const`'
  };
}
// case
const x = 1;
x = 'string';

// implementation
export default class AssignmentExpression extends NodeBase {
  initialise(): void {
    super.initialise();
    if (this.left instanceof Identifier) {
      const variable = this.scope.variables.get(this.left.name);
      if (variable?.kind === 'const') {
        this.scope.context.error(
          logConstVariableReassignError(),
          this.left.start
        );
      }
    }
    this.left.setAssignedValue(this.right);
  }
}
duplicate_bindings
export function logRedeclarationError(name: string): RollupLog {
  return {
    code: REDECLARATION_ERROR,
    message: `Identifier "${name}" has already been declared`
  };
}
// case
import { x } from './b';
const x = 1;

// case2
import { x } from './b';
import { x } from './b';

// implementation
export default class Module {
  private addImport(node: ImportDeclaration): void {
    const source = node.source.value;
    this.addSource(source, node);
    for (const specifier of node.specifiers) {
      const localName = specifier.local.name;
      if (
        this.scope.variables.has(localName) ||
        this.importDescriptions.has(localName)
      ) {
        this.error(
          logRedeclarationError(localName),
          specifier.local.start
        );
      }
      const name =
        specifier instanceof ImportDefaultSpecifier
          ? 'default'
          : specifier instanceof ImportNamespaceSpecifier
            ? '*'
            : specifier.imported instanceof Identifier
              ? specifier.imported.name
              : specifier.imported.value;
      this.importDescriptions.set(localName, {
        module: null as never, // filled in later
        name,
        source,
        start: specifier.start
      });
    }
  }
}
// case
{
  const a = 1;
  const a = 1;
}

// implementation
export default class BlockScope extends ChildScope {
  addDeclaration(
    identifier: Identifier,
    context: AstContext,
    init: ExpressionEntity,
    destructuredInitPath: ObjectPath,
    kind: VariableKind
  ): LocalVariable {
    if (kind === 'var') {
      const name = identifier.name;
      const existingVariable =
        this.hoistedVariables?.get(name) ||
        (this.variables.get(name) as LocalVariable | undefined);
      if (existingVariable) {
        if (
          existingVariable.kind === 'var' ||
          (kind === 'var' && existingVariable.kind === 'parameter')
        ) {
          existingVariable.addDeclaration(identifier, init);
          return existingVariable;
        }
        return context.error(
          logRedeclarationError(name),
          identifier.start
        );
      }
      const declaredVariable = this.parent.addDeclaration(
        identifier,
        context,
        init,
        destructuredInitPath,
        kind
      );
      // Necessary to make sure the init is deoptimized for conditional declarations.
      // We cannot call deoptimizePath here.
      declaredVariable.markInitializersForDeoptimization();
      // We add the variable to this and all parent scopes to reliably detect conflicts
      this.addHoistedVariable(name, declaredVariable);
      return declaredVariable;
    }
    return super.addDeclaration(
      identifier,
      context,
      init,
      destructuredInitPath,
      kind
    );
  }
}
// case
try {
} catch (e) {
  const a = 1;
  const a = 2;
}

// implementation
export default class CatchBodyScope extends ChildScope {
  addDeclaration(
    identifier: Identifier,
    context: AstContext,
    init: ExpressionEntity,
    destructuredInitPath: ObjectPath,
    kind: VariableKind
  ): LocalVariable {
    if (kind === 'var') {
      const name = identifier.name;
      const existingVariable =
        this.hoistedVariables?.get(name) ||
        (this.variables.get(name) as LocalVariable | undefined);
      if (existingVariable) {
        const existingKind = existingVariable.kind;
        if (
          existingKind === 'parameter' &&
          // If this is a destructured parameter, it is forbidden to redeclare
          existingVariable.declarations[0].parent.type === NodeType.CatchClause
        ) {
          // If this is a var with the same name as the catch scope parameter,
          // the assignment actually goes to the parameter and the var is
          // hoisted without assignment. Locally, it is shadowed by the
          // parameter
          const declaredVariable = this.parent.parent.addDeclaration(
            identifier,
            context,
            UNDEFINED_EXPRESSION,
            destructuredInitPath,
            kind
          );
          // To avoid the need to rewrite the declaration, we link the variable
          // names. If we ever implement a logic that splits initialization and
          // assignment for hoisted vars, the "renderLikeHoisted" logic can be
          // removed again.
          // We do not need to check whether there already is a linked
          // variable because then declaredVariable would be that linked
          // variable.
          existingVariable.renderLikeHoisted(declaredVariable);
          this.addHoistedVariable(name, declaredVariable);
          return declaredVariable;
        }
        if (existingKind === 'var') {
          existingVariable.addDeclaration(identifier, init);
          return existingVariable;
        }
        return context.error(
          logRedeclarationError(name),
          identifier.start
        );
      }
    }
  }
}
// case
function fn() {
  const a = 1;
  const a = 2;
}

// implementation
export default class FunctionBodyScope extends ChildScope {
  // There is stuff that is only allowed in function scopes, i.e. functions can
  // be redeclared, functions and var can redeclare each other
  addDeclaration(
    identifier: Identifier,
    context: AstContext,
    init: ExpressionEntity,
    destructuredInitPath: ObjectPath,
    kind: VariableKind
  ): LocalVariable {
    const name = identifier.name;
    const existingVariable =
      this.hoistedVariables?.get(name) ||
      (this.variables.get(name) as LocalVariable);
    if (existingVariable) {
      const existingKind = existingVariable.kind;
      if (
        (kind === 'var' || kind === 'function') &&
        (existingKind === 'var' ||
          existingKind === 'function' ||
          existingKind === 'parameter')
      ) {
        existingVariable.addDeclaration(identifier, init);
        return existingVariable;
      }
      context.error(logRedeclarationError(name), identifier.start);
    }
    const newVariable = new LocalVariable(
      identifier.name,
      identifier,
      init,
      destructuredInitPath,
      context,
      kind
    );
    this.variables.set(name, newVariable);
    return newVariable;
  }
}
// case1
import { a } from './b';
const a = 1;

// case2
import { a } from './b';
import { a } from './b';

// implementation
export default class ModuleScope extends ChildScope {
  addDeclaration(
    identifier: Identifier,
    context: AstContext,
    init: ExpressionEntity,
    destructuredInitPath: ObjectPath,
    kind: VariableKind
  ): LocalVariable {
    if (this.context.module.importDescriptions.has(identifier.name)) {
      context.error(
        logRedeclarationError(identifier.name),
        identifier.start
      );
    }
    return super.addDeclaration(
      identifier,
      context,
      init,
      destructuredInitPath,
      kind
    );
  }
}
// case
const a = 1;
const a = 2;

export default class Scope {
  /* Redeclaration rules:
     - var can redeclare var
     - in function scopes, function and var can redeclare function and var
     - var is hoisted across scopes, function remains in the scope it is declared
     - var and function can redeclare function parameters, but parameters cannot
       redeclare parameters
     - function cannot redeclare catch scope parameters
     - var can redeclare catch scope parameters in a way
       - if the parameter is an identifier and not a pattern
       - then the variable is still declared in the hoisted outer scope, but the
         initializer is assigned to the parameter
     - const, let, class, and function except in the cases above cannot redeclare
       anything
  */
  addDeclaration(
    identifier: Identifier,
    context: AstContext,
    init: ExpressionEntity,
    destructuredInitPath: ObjectPath,
    kind: VariableKind
  ): LocalVariable {
    const name = identifier.name;
    const existingVariable =
      this.hoistedVariables?.get(name) ||
      (this.variables.get(name) as LocalVariable);
    if (existingVariable) {
      if (kind === 'var' && existingVariable.kind === 'var') {
        existingVariable.addDeclaration(identifier, init);
        return existingVariable;
      }
      context.error(logRedeclarationError(name), identifier.start);
    }
    const newVariable = new LocalVariable(
      identifier.name,
      identifier,
      init,
      destructuredInitPath,
      context,
      kind
    );
    this.variables.set(name, newVariable);
    return newVariable;
  }
}
duplicate_exports
export function logDuplicateExportError(name: string): RollupLog {
  return {
    code: DUPLICATE_EXPORT,
    message: `Duplicate export "${name}"`
  };
}

export default class Module {
  private assertUniqueExportName(name: string, nodeStart: number) {
    if (this.exports.has(name) || this.reexportDescriptions.has(name)) {
      this.error(logDuplicateExportError(name), nodeStart);
    }
  }
}
// case
export default 1;
export default 2;

// implementation
export default class Module {
  private addExport(
    node:
      | ExportAllDeclaration
      | ExportNamedDeclaration
      | ExportDefaultDeclaration
  ): void {
    if (node instanceof ExportDefaultDeclaration) {
      // export default foo;
      this.assertUniqueExportName('default', node.start);
      this.exports.set('default', {
        identifier: node.variable.getAssignedVariableName(),
        localName: 'default'
      });
    }
  }
}
// case
export * as a from './b';
export * as a from './b';

// implementation
export default class Module {
  private addExport(
    node: ExportAllDeclaration | ExportNamedDeclaration
  ): void {
    if (node instanceof ExportAllDeclaration) {
      const source = node.source.value;
      this.addSource(source, node);
      if (node.exported) {
        // export * as name from './other'
        const name =
          node.exported instanceof Literal
            ? node.exported.value
            : node.exported.name;
        this.assertUniqueExportName(name, node.exported.start);
        this.reexportDescriptions.set(name, {
          localName: '*',
          module: null as never, // filled in later
          source,
          start: node.start
        });
      } else {
        // export * from './other'
        this.exportAllSources.add(source);
      }
    }
  }
}
// case
export { a } from './b';
export { a } from './b';

// implementation
export default class Module {
  private addExport(
    node: ExportAllDeclaration | ExportNamedDeclaration
  ): void {
    if (node.source instanceof Literal) {
      // export { name } from './other'
      const source = node.source.value;
      this.addSource(source, node);
      for (const { exported, local, start } of node.specifiers) {
        const name =
          exported instanceof Literal ? exported.value : exported.name;
        this.assertUniqueExportName(name, start);
        this.reexportDescriptions.set(name, {
          localName: local instanceof Literal ? local.value : local.name,
          module: null as never, // filled in later
          source,
          start
        });
      }
    }
  }
}
// case1
export const a = 1;
export const a = 2;

// case2
export function a() {}
export function a() {}

// case3
export { a, a };

// implementation
export default class Module {
  private addExport(node: ExportNamedDeclaration): void {
    if (node.declaration) {
      const declaration = node.declaration;
      if (declaration instanceof VariableDeclaration) {
        // export var { foo, bar } = ...
        // export var foo = 1, bar = 2;
        for (const declarator of declaration.declarations) {
          for (const localName of extractAssignedNames(declarator.id)) {
            this.assertUniqueExportName(localName, declarator.id.start);
            this.exports.set(localName, { identifier: null, localName });
          }
        }
      } else {
        // export function foo () {}
        const localName = (declaration.id as Identifier).name;
        this.assertUniqueExportName(localName, declaration.id!.start);
        this.exports.set(localName, { identifier: null, localName });
      }
    }
  }
}
no_dupe_args
export function logDuplicateArgumentNameError(name: string): RollupLog {
  return {
    code: DUPLICATE_ARGUMENT_NAME,
    message: `Duplicate argument name "${name}"`
  };
}
// case
function fn(a, a) {}

// implementation
export default class ParameterScope extends ChildScope {
  /**
   * Adds a parameter to this scope. Parameters must be added in the correct
   * order, i.e. from left to right.
   */
  addParameterDeclaration(
    identifier: Identifier,
    argumentPath: ObjectPath
  ): ParameterVariable {
    const { name, start } = identifier;
    const existingParameter = this.variables.get(name);
    if (existingParameter) {
      return this.context.error(
        logDuplicateArgumentNameError(name),
        start
      );
    }
    const variable = new ParameterVariable(
      name,
      identifier,
      argumentPath,
      this.context
    );
    this.variables.set(name, variable);
    // We also add it to the body scope to detect name conflicts with local
    // variables. We still need the intermediate scope, though, as parameter
    // defaults are NOT taken from the body scope but from the parameters or
    // outside scope.
    this.bodyScope.addHoistedVariable(name, variable);
    return variable;
  }
}
As these implementations show, semantic analysis depends heavily on the execution context and scope information of the current AST node. The checks above are also only the most basic ones: Rollup additionally performs semantic analysis such as side-effect analysis, module circular-dependency analysis, and strict syntax restrictions (a namespace object cannot be called, imported bindings cannot be reassigned, and so on), none of which acorn can do.
Since the internal implementation of `swc_ecma_lints` may have performance issues, this is a temporary arrangement: Rollup plans to add scope analysis on the Rust side later and then hand the complete semantic analysis task over to Rust.
Optimize Ast Parsing
Rollup provides `this.parse` on the plugin context, allowing user plugins to use the native SWC capabilities to parse code into an AST. Plugins can return the parsed AST from their load and transform hooks, and Rollup will reuse it.
If no plugin returns an AST (that is, no load or transform hook provides one), Rollup falls back to parsing on its own: when the transform stage completes, the transformed code is parsed into the ESTree-compatible AST using the native Rust capabilities.
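For example, a plugin can return its parsed AST so that Rollup skips re-parsing; a minimal sketch (the plugin name is illustrative):

export default function preparsePlugin() {
  return {
    name: 'pre-parse',
    transform(code) {
      // this.parse uses the native SWC-based parser and returns an
      // ESTree-compatible program; note that no semantic analysis has run.
      const ast = this.parse(code);
      return { code, ast, map: null };
    }
  };
}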
Precautions for using this.parse
Currently, Rollup has removed the Rust-side AST semantic analysis. In other words, parsing code into an AST through the `this.parse` API in the plugin context performs no semantic analysis.
If a plugin needs the generated AST to pass semantic analysis, it must run that analysis itself with other tools.
If that guarantee is not needed, Rollup will perform semantic analysis automatically when it later recursively instantiates its AST node classes from the returned AST.
Even with native parsing, generating a complex AST is still time-consuming. In watch mode, Rollup caches the ESTree AST (see the Rollup Incremental Build section for details) to skip the native SWC parsing step, and recursively instantiates its internal AST node classes from the cached ESTree structure.
Performance Comparison
We tested the parsing capabilities of Rollup versions 4.28.1 and 3.29.5, where:

- 4.28.1 uses native SWC to parse the AST, and the Rust side passes the ESTree-compatible AST to the JavaScript side in arraybuffer form;
- 3.29.5 uses acorn to parse the AST.

Each group was run 5 times and averaged.
Code Length (Characters) | SWC Parsing Time (ms) | Acorn Parsing Time (ms) |
---|---|---|
312.4K | 13.47 | 73.92 |
624.7K | 21.78 | 83.80 |
1.2M | 36.03 | 124.82 |
2.5M | 68.88 | 182.45 |
5.0M | 136.52 | 272.53 |
10.0M | 266.87 | 608.72 |
20.0M | 578.00 | 1178.82 |
159.9M | 4155.64 | 7276.24 |
319.9M | 10081.40 | - |
Testing showed that once the input reached 319,869,952 characters, acorn failed while parsing the AST:
<--- Last few GCs --->
[69821:0x120078000] 15364 ms: Mark-sweep 4062.9 (4143.2) -> 4059.0 (4143.2) MB, 703.2 / 0.0 ms (average mu = 0.293, current mu = 0.102) allocation failure; scavenge might not succeed
[69821:0x120078000] 16770 ms: Mark-sweep 4075.3 (4143.2) -> 4071.5 (4169.0) MB, 1383.5 / 0.0 ms (average mu = 0.143, current mu = 0.016) allocation failure; scavenge might not succeed
<--- JS stacktrace --->
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
The test results show that switching to the native parser yields a significant performance advantage over acorn.
Overall performance:
- With the native parser (built-in SWC), average parsing time is short and grows gently as code length increases.
- With the non-native parser (built-in acorn), parsing time grows steeply on large inputs, at a high performance cost.
Data comparison:
- Small input (312,373 characters): the gap is pronounced, about 5.5x (13.47 ms vs 73.92 ms).
- Medium input (9,995,936 characters): the gap is about 2.28x (266.87 ms vs 608.72 ms).
- Large input (159,934,976 characters): the gap is about 1.75x (4155.64 ms vs 7276.24 ms).
For a sense of module size:

module | Code Length (Characters) |
---|---|
rollup.js | 312,373 |
Trend analysis:
- Parsing time with the native parser (built-in SWC) grows slowly, making it suitable for parsing larger modules.
- Parsing time with the non-native parser (built-in acorn) grows sharply, and its efficiency is clearly insufficient for large modules.