Native Parser
Reference Materials
Compared to javascript, rust inherently offers far stronger performance. rollup therefore decided to switch from the javascript-side acorn parser to the rust-side swc parser, which can parse complex asts efficiently. This is one of the core changes in rollup v4.
Challenges
Native Interaction
Directly using swc's javascript bindings and parsing complex asts through the swc.parse javascript interface would incur significant communication overhead.
import swc from '@swc/core';
const code = `
const a = 1;
function add(a, b) {
return a + b;
}
`;
swc
.parse(code, {
syntax: 'ecmascript',
comments: false,
script: true,
target: 'es3',
isModule: false
})
.then(module => {
module.type; // file type
module.body; // AST
});
Reading swc's source code reveals that swc internally uses the serde_json library to serialize the parsed program object into a JSON string, which is then passed to the javascript side.
#[napi]
impl Task for ParseTask {
type JsValue = String;
type Output = String;
fn compute(&mut self) -> napi::Result<Self::Output> {
let options: ParseOptions = deserialize_json(&self.options)?;
let fm = self
.c
.cm
.new_source_file(self.filename.clone().into(), self.src.clone());
let comments = if options.comments {
Some(self.c.comments() as &dyn Comments)
} else {
None
};
let program = try_with(self.c.cm.clone(), false, ErrorFormat::Normal, |handler| {
let mut p = self.c.parse_js(
fm,
handler,
options.target,
options.syntax,
options.is_module,
comments,
)?;
p.visit_mut_with(&mut resolver(
Mark::new(),
Mark::new(),
options.syntax.typescript(),
));
Ok(p)
})
.convert_err()?;
let ast_json = serde_json::to_string(&program)?;
Ok(ast_json)
}
fn resolve(&mut self, _env: Env, result: Self::Output) -> napi::Result<Self::JsValue> {
Ok(result)
}
}
The javascript side then deserializes the ast string returned by the native parser into a javascript object via JSON.parse.
class Compiler {
async parse(
src: string,
options?: ParseOptions,
filename?: string
): Promise<Program> {
options = options || { syntax: 'ecmascript' };
options.syntax = options.syntax || 'ecmascript';
if (!bindings && !!fallbackBindings) {
throw new Error(
'Fallback bindings does not support this interface yet.'
);
} else if (!bindings) {
throw new Error('Bindings not found.');
}
if (bindings) {
const res = await bindings.parse(src, toBuffer(options), filename);
return JSON.parse(res);
} else if (fallbackBindings) {
return fallbackBindings.parse(src, options);
}
throw new Error('Bindings not found.');
}
}
Repeatedly serializing the ast on the rust side and deserializing it on the javascript side would, for complex asts, almost completely erode the performance advantage of switching to the native (rust) parser.
Ast Compatibility
Even with its estree compat module, swc still produces a babel-style ast rather than an estree ast, whereas rollup depends on standard estree ast.
File Encoding
swc uses utf-8 encoding, while rollup depends on standard javascript's utf-16 encoding.
utf-8 and utf-16 are two different character encoding methods used to represent characters in text. Their main differences lie in the number of bytes used per character and the encoding method.
Differences between utf-8 and utf-16
utf-8:
Variable Length Encoding:
utf-8 uses 1 ~ 4 bytes to represent a character. ascii characters (such as English letters and numbers) use 1 byte, while other characters (such as Chinese characters) may use 2 ~ 4 bytes.
- 1 byte: ascii characters (U+0000 to U+007F).
- 2 bytes: Extended Latin characters (U+0080 to U+07FF).
- 3 bytes: Basic Multilingual Plane (BMP) characters (U+0800 to U+FFFF).
- 4 bytes: Supplementary Plane characters (U+10000 to U+10FFFF).
Backward Compatible with ascii:
Since ascii characters only occupy 1 byte in utf-8, utf-8 is fully compatible with ascii encoding.
Encoding Efficiency:
- High efficiency for English and ASCII text (1 byte per character).
- For non-Latin characters (such as Chinese, Japanese, etc.), typically requires 3 bytes.
- For supplementary plane characters (such as emojis), requires 4 bytes.
Use Cases:
- More suitable for network transmission and storage, especially for text that is primarily ascii.
- Commonly used in web pages, json files, and similar scenarios.
utf-16:
Fixed or Variable Length Encoding:
utf-16 typically uses 2 bytes to represent most commonly used characters, but for certain special characters (such as emojis), it may require 4 bytes.
- 2 bytes: Characters within the BMP range (U+0000 to U+FFFF, excluding surrogate pairs).
- 4 bytes: Characters beyond the BMP (U+10000 to U+10FFFF), encoded as two 16-bit units (surrogate pairs).
Not Compatible with ascii:
utf-16 is not byte-compatible with ascii, because ascii characters occupy 2 bytes in utf-16. That said, in both utf-8 and utf-16 every ascii character occupies exactly one code unit.
Encoding Efficiency:
- High efficiency for characters within the BMP range (such as most Chinese, Japanese) (2 bytes per character).
- Low efficiency for ASCII characters (2 bytes per character).
- Similar efficiency to UTF-8 for supplementary plane characters (requires 4 bytes).
Use Cases:
- More suitable for in-memory operations, especially in scenarios dominated by BMP-range characters (such as Chinese environments).
- Commonly used for internal character representation in windows, javascript, and java.
Example:
For the string A你, the encoding results are as follows.
UTF-8 Encoding:
"A": 1 byte, encoded as 0x41
"你": 3 bytes, encoded as 0xE4BDA0
UTF-16 Encoding:
"A": 2 bytes, encoded as 0x0041
"你": 2 bytes, encoded as 0x4F60
Character positions in utf-8 are counted in bytes, while positions in utf-16 are counted in 16-bit (2-byte) code units.
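The difference is easy to observe in javascript, where string length is measured in utf-16 code units while TextEncoder produces utf-8 bytes (a quick illustrative sketch):

```javascript
// 'A' is 1 byte in UTF-8; '你' is 3 bytes in UTF-8.
const utf8Bytes = new TextEncoder().encode('A你');
console.log(utf8Bytes.length); // 4 (1 + 3)

// JavaScript string length counts UTF-16 code units: both are BMP characters.
console.log('A你'.length); // 2

// A supplementary-plane character such as an emoji needs a surrogate pair.
console.log('😀'.length); // 2 UTF-16 code units
console.log(new TextEncoder().encode('😀').length); // 4 UTF-8 bytes
```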
Summary:
| Feature | utf-8 | utf-16 |
|---|---|---|
| Encoding Length | 1-4 bytes | 2 or 4 bytes |
| ascii Compatibility | Compatible | Incompatible |
| ASCII Text Efficiency | High (1 byte/char) | Low (2 bytes/char) |
| Non-Latin Text Efficiency | Lower (3 bytes/char) | Higher (2 bytes/char) |
| Byte Order Issues | None | May need a BOM |
| Use Cases | Network protocols, file storage | Memory operations, large text processing |
When processing text, the choice between utf-8 and utf-16 affects file size and character position calculations. This impacts the determination of character positions in the ast. Consider the following example:
const a = "你好";
The asts produced under the babel ast and estree ast specifications report different character positions for this code. First, the swc (babel-style) ast:
{
"type": "Module",
"span": {
"start": 0,
"end": 19,
"ctxt": 0
},
"body": [
{
"type": "VariableDeclaration",
"span": {
"start": 0,
"end": 19,
"ctxt": 0
},
"kind": "const",
"declare": false,
"declarations": [
{
"type": "VariableDeclarator",
"span": {
"start": 6,
"end": 18,
"ctxt": 0
},
"id": {
"type": "Identifier",
"span": {
"start": 6,
"end": 7,
"ctxt": 0
},
"value": "a",
"optional": false,
"typeAnnotation": null
},
"init": {
"type": "StringLiteral",
"span": {
"start": 10,
"end": 18,
"ctxt": 0
},
"value": "你好",
"hasEscape": false,
"kind": {
"type": "normal",
"containsQuote": true
}
},
"definite": false
}
]
}
],
"interpreter": null
}
The corresponding estree ast:
{
"type": "Program",
"start": 0,
"end": 15,
"body": [
{
"type": "VariableDeclaration",
"start": 0,
"end": 15,
"declarations": [
{
"type": "VariableDeclarator",
"start": 6,
"end": 14,
"id": {
"type": "Identifier",
"start": 6,
"end": 7,
"name": "a"
},
"init": {
"type": "Literal",
"start": 10,
"end": 14,
"value": "你好",
"raw": "\"你好\""
}
}
],
"kind": "const"
}
],
"sourceType": "module"
}
The two specifications handle multi-byte characters differently because of their different encodings, so the parsed ast node positions differ. The babel-style ast records the utf-8-encoded 你好 literal at the position range [10, 18), while the estree ast records the utf-16-encoded literal at [10, 14).
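The core of the position fix can be illustrated with a simplified javascript sketch (an illustrative miniature, not rollup's actual code): walk the source once, tracking both the utf-8 byte offset and the utf-16 code-unit offset per code point.

```javascript
// Hypothetical helper: translate a UTF-8 byte offset (as reported by swc)
// into a UTF-16 code-unit offset (as required by estree).
function utf8ToUtf16Index(source, utf8Index) {
  let utf8 = 0;
  let utf16 = 0;
  for (const char of source) { // iterates by code point
    if (utf8 >= utf8Index) break;
    utf8 += new TextEncoder().encode(char).length; // UTF-8 bytes of this code point
    utf16 += char.length; // UTF-16 code units (1 or 2)
  }
  return utf16;
}

const code = 'const a = "你好";';
// swc's UTF-8 range [10, 18) for the string literal maps to estree's [10, 14).
console.log(utf8ToUtf16Index(code, 10)); // 10
console.log(utf8ToUtf16Index(code, 18)); // 14
```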
The source map chapter details how rollup generates sourcemaps internally; for its mapping markers, rollup relies on the position information provided by the estree ast.
export class NodeBase extends ExpressionEntity implements ExpressionNode {
/**
* Override to perform special initialisation steps after the scope is
* initialised
*/
initialise(): void {
this.scope.context.magicString.addSourcemapLocation(this.start);
this.scope.context.magicString.addSourcemapLocation(this.end);
}
}
Therefore, mismatched encodings would cause serious offsets in the sourcemaps generated by rollup.
Performance
Optimize Ast Compatibility
On the rust side, rollup first uses swc's ability to parse code into a babel-style ast:
use swc_compiler_base::parse_js;
pub fn parse_ast(code: String, allow_return_outside_function: bool, jsx: bool) -> Vec<u8> {
GLOBALS.set(&Globals::default(), || {
let result = catch_unwind(AssertUnwindSafe(|| {
let result = try_with_handler(&code_reference, |handler| {
parse_js(
cm,
file,
handler,
target,
syntax,
IsModule::Unknown,
Some(&comments),
)
});
match result {
Err(buffer) => buffer,
Ok(program) => {
let annotations = comments.take_annotations();
let converter = AstConverter::new(&code_reference, &annotations);
converter.convert_ast_to_buffer(&program)
}
}
}));
});
}
The converter.convert_ast_to_buffer(&program) method then recursively walks the babel ast produced by swc and recalculates each node's position information according to the estree ast specification:
/// Converts the given UTF-8 byte index to a UTF-16 byte index.
///
/// To be performant, this method assumes that the given index is not smaller
/// than the previous index. Additionally, it handles "annotations" like
/// `@__PURE__` comments in the process.
///
/// The logic for those comments is as follows:
/// - If the current index is at the start of an annotation, the annotation
/// is collected and the index is advanced to the end of the annotation.
/// - Otherwise, we check if the next character is a white-space character.
/// If not, we invalidate all collected annotations.
/// This is to ensure that we only collect annotations that directly precede
/// an expression and are not e.g. separated by a comma.
/// - If annotations are relevant for an expression, it can "take" the
/// collected annotations by calling `take_collected_annotations`. This
/// clears the internal buffer and returns the collected annotations.
/// - Invalidated annotations are attached to the Program node so that they
/// can all be removed from the source code later.
/// - If an annotation can influence a child that is separated by some
/// non-whitespace from the annotation, `keep_annotations_for_next` will
/// prevent annotations from being invalidated when the next position is
/// converted.
pub(crate) fn convert(&mut self, utf8_index: u32, keep_annotations_for_next: bool) -> u32 {
if self.current_utf8_index > utf8_index {
panic!(
"Cannot convert positions backwards: {} < {}",
utf8_index, self.current_utf8_index
);
}
while self.current_utf8_index < utf8_index {
if self.current_utf8_index == self.next_annotation_start {
let start = self.current_utf16_index;
let (next_comment_end, next_comment_kind) = self
.next_annotation
.map(|a| (a.comment.span.hi.0 - 1, a.kind.clone()))
.unwrap();
while self.current_utf8_index < next_comment_end {
let character = self.character_iterator.next().unwrap();
self.current_utf8_index += character.len_utf8() as u32;
self.current_utf16_index += character.len_utf16() as u32;
}
if let Annotation(kind) = next_comment_kind {
self.collected_annotations.push(ConvertedAnnotation {
start,
end: self.current_utf16_index,
kind,
});
}
self.next_annotation = self.annotation_iterator.next();
self.next_annotation_start = get_annotation_start(self.next_annotation);
} else {
let character = self.character_iterator.next().unwrap();
if !(self.keep_annotations || self.collected_annotations.is_empty()) {
match character {
' ' | '\t' | '\r' | '\n' => {}
_ => {
self.invalidate_collected_annotations();
}
}
}
self.current_utf8_index += character.len_utf8() as u32;
self.current_utf16_index += character.len_utf16() as u32;
}
}
self.keep_annotations = keep_annotations_for_next;
self.current_utf16_index
}
At the same time, rollup converts the babel ast parsed by swc into a binary format compatible with the estree ast on the rust side, and then passes it to javascript as an (array) buffer.
pub(crate) fn convert_statement(&mut self, statement: &Stmt) {
match statement {
Stmt::Break(break_statement) => self.store_break_statement(break_statement),
Stmt::Block(block_statement) => self.store_block_statement(block_statement, false),
Stmt::Continue(continue_statement) => self.store_continue_statement(continue_statement),
Stmt::Decl(declaration) => self.convert_declaration(declaration),
Stmt::Debugger(debugger_statement) => self.store_debugger_statement(debugger_statement),
Stmt::DoWhile(do_while_statement) => self.store_do_while_statement(do_while_statement),
Stmt::Empty(empty_statement) => self.store_empty_statement(empty_statement),
Stmt::Expr(expression_statement) => self.store_expression_statement(expression_statement),
Stmt::For(for_statement) => self.store_for_statement(for_statement),
Stmt::ForIn(for_in_statement) => self.store_for_in_statement(for_in_statement),
Stmt::ForOf(for_of_statement) => self.store_for_of_statement(for_of_statement),
Stmt::If(if_statement) => self.store_if_statement(if_statement),
Stmt::Labeled(labeled_statement) => self.store_labeled_statement(labeled_statement),
Stmt::Return(return_statement) => self.store_return_statement(return_statement),
Stmt::Switch(switch_statement) => self.store_switch_statement(switch_statement),
Stmt::Throw(throw_statement) => self.store_throw_statement(throw_statement),
Stmt::Try(try_statement) => self.store_try_statement(try_statement),
Stmt::While(while_statement) => self.store_while_statement(while_statement),
Stmt::With(_) => unimplemented!("Cannot convert Stmt::With"),
}
}
The converter extracts the information required for each estree ast node from the structure of the corresponding babel ast node and recalculates its position information in utf-16 code units, as the estree ast specification requires.
pub(crate) fn convert_item_list_with_state<T, S, F>(
&mut self,
item_list: &[T],
state: &mut S,
reference_position: usize,
convert_item: F,
) where
F: Fn(&mut AstConverter, &T, &mut S) -> bool,
{
// for an empty list, we leave the referenced position at zero
if item_list.is_empty() {
return;
}
self.update_reference_position(reference_position);
// store number of items in first position
self
.buffer
.extend_from_slice(&(item_list.len() as u32).to_ne_bytes());
let mut reference_position = self.buffer.len();
// make room for the reference positions of the items
self
.buffer
.resize(self.buffer.len() + item_list.len() * 4, 0);
for item in item_list {
let insert_position = (self.buffer.len() as u32) >> 2;
if convert_item(self, item, state) {
self.buffer[reference_position..reference_position + 4]
.copy_from_slice(&insert_position.to_ne_bytes());
}
reference_position += 4;
}
}
The converter also collects comment nodes in preparation for rollup's tree shaking. Note that comment nodes exist in the babel ast specification but not in the estree ast specification; however, the information they carry is crucial for rollup, because annotations in comments can enhance tree shaking.
rollup collects this comment information and stores it on the estree ast through the _rollupAnnotations property. In other words, the final returned ast is structurally compatible with the estree ast and additionally carries the _rollupAnnotations property.
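The detection rule itself is simple enough to sketch in javascript (an illustrative simplification with hypothetical names, not rollup's actual code): a comment counts as an annotation when __PURE__ or __NO_SIDE_EFFECTS__ directly follows an @ or # marker.

```javascript
// Hypothetical sketch: classify a comment's text, treating "__PURE__" and
// "__NO_SIDE_EFFECTS__" that directly follow "@" or "#" as annotations,
// and everything else as a plain comment.
function classifyComment(text) {
  for (const [marker, kind] of [
    ['__PURE__', 'pure'],
    ['__NO_SIDE_EFFECTS__', 'noSideEffects']
  ]) {
    if (text.includes(`@${marker}`) || text.includes(`#${marker}`)) {
      return kind;
    }
  }
  return 'comment';
}

console.log(classifyComment('@__PURE__')); // pure
console.log(classifyComment('#__NO_SIDE_EFFECTS__')); // noSideEffects
console.log(classifyComment(' just a note ')); // comment
```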
pub(crate) fn take_collected_annotations(
&mut self,
kind: AnnotationKind,
) -> Vec<ConvertedAnnotation> {
let mut relevant_annotations = Vec::new();
for annotation in self.collected_annotations.drain(..) {
if annotation.kind == kind {
relevant_annotations.push(annotation);
} else {
self.invalid_annotations.push(annotation);
}
}
relevant_annotations
}
impl<'a> AstConverter<'a> {
pub(crate) fn store_call_expression(
&mut self,
span: &Span,
is_optional: bool,
callee: &StoredCallee,
arguments: &[ExprOrSpread],
is_chained: bool,
) {
// annotations
let annotations = self
.index_converter
.take_collected_annotations(AnnotationKind::Pure);
}
impl SequentialComments {
pub(crate) fn add_comment(&self, comment: Comment) {
if comment.text.starts_with('#') && comment.text.contains("sourceMappingURL=") {
self.annotations.borrow_mut().push(AnnotationWithType {
comment,
kind: CommentKind::Annotation(AnnotationKind::SourceMappingUrl),
});
return;
}
let mut search_position = comment
.text
.chars()
.nth(0)
.map(|first_char| first_char.len_utf8())
.unwrap_or(0);
while let Some(Some(match_position)) = comment.text.get(search_position..).map(|s| s.find("__"))
{
search_position += match_position;
// Using a byte reference avoids UTF8 character boundary checks
match &comment.text.as_bytes()[search_position - 1] {
b'@' | b'#' => {
let annotation_slice = &comment.text[search_position..];
if annotation_slice.starts_with("__PURE__") {
self.annotations.borrow_mut().push(AnnotationWithType {
comment,
kind: CommentKind::Annotation(AnnotationKind::Pure),
});
return;
}
if annotation_slice.starts_with("__NO_SIDE_EFFECTS__") {
self.annotations.borrow_mut().push(AnnotationWithType {
comment,
kind: CommentKind::Annotation(AnnotationKind::NoSideEffects),
});
return;
}
}
_ => {}
}
search_position += 2;
}
self.annotations.borrow_mut().push(AnnotationWithType {
comment,
kind: CommentKind::Comment,
});
}
pub(crate) fn take_annotations(self) -> Vec<AnnotationWithType> {
self.annotations.take()
}
}
Finally, the arraybuffer compatible with the estree ast structure is passed to the rollup side, which must know how to parse that buffer in order to instantiate rollup's internally implemented ast class nodes.
export default class Module {
async setSource({
ast,
code,
customTransformCache,
originalCode,
originalSourcemap,
resolvedIds,
sourcemapChain,
transformDependencies,
transformFiles,
...moduleOptions
}: TransformModuleJSON & {
resolvedIds?: ResolvedIdMap;
transformFiles?: EmittedFile[] | undefined;
}): Promise<void> {
// Measuring asynchronous code does not provide reasonable results
timeEnd('generate ast', 3);
const astBuffer = await parseAsync(
code,
false,
this.options.jsx !== false
);
timeStart('generate ast', 3);
this.ast = convertProgram(astBuffer, programParent, this.scope);
}
}
rollup's buffer-level parsing logic looks like this:
function convertNode(
parent: Node | { context: AstContext; type: string },
parentScope: ChildScope,
position: number,
buffer: AstBuffer
): any {
const nodeType = buffer[position];
const NodeConstructor = nodeConstructors[nodeType];
/* istanbul ignore if: This should never be executed but is a safeguard against faulty buffers */
if (!NodeConstructor) {
console.trace();
throw new Error(`Unknown node type: ${nodeType}`);
}
const node = new NodeConstructor(parent, parentScope);
node.type = nodeTypeStrings[nodeType];
node.start = buffer[position + 1];
node.end = buffer[position + 2];
bufferParsers[nodeType](node, position + 3, buffer);
node.initialise();
return node;
}
Optimize Native Interaction
As mentioned above, directly using the javascript bindings exposed by swc repeatedly serializes and deserializes the ast between rust and javascript; for complex asts, this almost entirely erodes the performance advantage of switching to the native (rust) parser. The solution is as follows:
- Use an arraybuffer to transfer the parsed ast between rust and javascript.
- Do not use swc's javascript bindings; instead, call swc's rust crates directly from rust.
use swc_compiler_base::parse_js;
pub fn parse_ast(code: String, allow_return_outside_function: bool, jsx: bool) -> Vec<u8> {
GLOBALS.set(&Globals::default(), || {
let result = catch_unwind(AssertUnwindSafe(|| {
let result = try_with_handler(&code_reference, |handler| {
parse_js(
cm,
file,
handler,
target,
syntax,
IsModule::Unknown,
Some(&comments),
)
});
match result {
Err(buffer) => buffer,
Ok(program) => {
let annotations = comments.take_annotations();
let converter = AstConverter::new(&code_reference, &annotations);
converter.convert_ast_to_buffer(&program)
}
}
}));
});
}
As before, rollup converts the babel ast parsed by swc into the estree-compatible binary format on the rust side, and then passes it to javascript as an (array) buffer.
match result {
Err(buffer) => buffer,
Ok(program) => {
let annotations = comments.take_annotations();
let converter = AstConverter::new(&code_reference, &annotations);
converter.convert_ast_to_buffer(&program)
}
Passing an arraybuffer is an essentially lossless operation, so we only need to teach the javascript side how to read the arraybuffer. In addition, the arraybuffer is only about one third the size of the serialized json. Finally, this also lets us cheaply hand the ast, in arraybuffer form, to different threads; for example, parsing can be completed in a WebWorker and the resulting arraybuffer can then be transferred to the main thread without loss.
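The buffer-level idea can be illustrated with a toy javascript sketch (the layout here is hypothetical and far simpler than rollup's real binary format): encode each node as fixed slots in a Uint32Array, so "parsing" on the receiving side is just indexed reads, and the underlying arraybuffer can be moved between threads without copying.

```javascript
// Hypothetical flat layout: [typeId, start, end] per node.
const nodeTypeStrings = ['Program', 'VariableDeclaration'];

function encodeNode(node) {
  const buffer = new Uint32Array(3);
  buffer[0] = nodeTypeStrings.indexOf(node.type);
  buffer[1] = node.start;
  buffer[2] = node.end;
  return buffer;
}

function decodeNode(buffer, position) {
  return {
    type: nodeTypeStrings[buffer[position]],
    start: buffer[position + 1],
    end: buffer[position + 2]
  };
}

const encoded = encodeNode({ type: 'VariableDeclaration', start: 0, end: 15 });
// encoded.buffer is an ArrayBuffer that could be listed in a worker's
// postMessage transfer list to move it across threads without a copy.
console.log(decodeNode(encoded, 0)); // { type: 'VariableDeclaration', start: 0, end: 15 }
```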
On the nodejs side, napi-rs is used to interact with the rust code; for the browser, wasm-pack is used for building.
Optimize Semantic Analysis
Parser Semantic Analysis Design
Calling swc's swc_compiler_base::parse_js directly on the rust side does not execute semantic analysis; it only performs lexical analysis and syntax analysis. That is, swc parses the following code without error:
const a = 1;
const a = 2;
This differs from acorn, which performs some early-error checks (a partial semantic analysis) during syntax analysis when generating its ast.
The reason is that acorn is designed as a parser conforming to the ECMAScript specification. Before a javascript engine executes code, the specification requires the Static Semantics: Early Errors steps (essentially static semantic analysis), checks that must be completed during parsing and early syntax analysis. These errors are detected statically, meaning the code does not need to run for them to be found.
The javascript engines built into browsers, nodejs, and other runtimes also execute the Static Semantics: Early Errors steps before executing code.
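This is easy to verify: constructing a function from source triggers the engine's parse-time checks immediately, even though the body never runs (a quick sketch; the exact error message is engine-specific):

```javascript
let error = null;
try {
  // The body is only parsed, never executed, yet the duplicate lexical
  // binding of `a` is rejected by the engine's early-error checks.
  new Function('const a = 1; const a = 2;');
} catch (caught) {
  error = caught;
}
console.log(error instanceof SyntaxError); // true
```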
The significance of the specification is:
- Early Detection of Issues: It can find potential issues before the code is actually executed, avoiding issues that may appear at runtime.
- Performance Improvement: Since these checks are completed in the static analysis stage, they can improve code execution efficiency.
- Ensure Language Consistency: A unified early-error check mechanism ensures that javascript code is processed consistently across different environments.
- Help Developers Write Better Code: These rules also guide developers toward better programming practices.
swc, babel and other parsers do not execute the Static Semantics: Early Errors steps when generating an ast; that is, they are designed differently from acorn. Let us first look at why they separate syntax analysis from static semantic analysis.
Performance and Complexity Trade-off
Implementing early errors detection requires the parser to do the following:
- Simulate and maintain the execution context of the current execution statement.
- Static rule check.
- Detection of other static semantic rules defined in the language specification.
- Syntax restriction rule detection.
- Module system static verification rule detection.
Although each individual check is not complex, in large projects, if a complete early-errors check runs every time new code is translated, the accumulated cost can add up to a performance overhead that cannot be ignored.
Toolchain Division of Labor
swc, babel and other parsers focus on code conversion, and they are mainly injected into the code-conversion pipeline of a build system in the form of plugins. If such a tool wants to integrate tightly with the ecosystems of various build systems, the easiest way is to maintain the single-responsibility principle. By separating parsing and semantic analysis:
- The parser can focus on generating an accurate ast.
- The semantic analyzer can focus on checking code correctness.
- Each part is easier to maintain and optimize.
Flexibility
In complex applications, module translation is usually not a one-shot process; intermediate states exist, and the intermediate code is often not semantically valid. If the translation tool performed strict semantic analysis, such code could not pass compilation, hurting extensibility. Modern development toolchains distribute different checks across different stages and run them on demand, balancing development flexibility and code quality.
babel and swc choose to separate the responsibilities of syntax analysis and early-errors detection: in the plugin-translation stage, code is parsed into an ast using only lexical and syntax analysis, without the early-errors check (static semantic analysis); then, at a suitable time (such as rollup's transform stage), the bundler (such as rollup) controls and executes the early-errors check.
This design choice reflects an important principle in engineering practice: sometimes, breaking down a complex problem into multiple independent steps may be more effective than trying to solve everything in one step. This allows each tool to focus on its core task, thereby providing better functionality and performance.
rollup plugin system design inspiration
This design approach is also reflected in the rollup plugin system: when a user plugin returns an ast from the load (or transform) hook, rollup reuses that ast in the subsequent transform hooks. Until rollup completes the transform stage, it performs no semantic analysis on the reused ast.
const a = 1;
const a = 2;
For the above example, acorn raises an early error. The following excerpt from acorn's parser shows how such recoverable early errors are reported during parsing:
while (this.type !== tt.braceR) {
const element = this.parseClassElement(node.superClass !== null);
if (element) {
classBody.body.push(element);
if (
element.type === 'MethodDefinition' &&
element.kind === 'constructor'
) {
if (hadConstructor)
this.raiseRecoverable(
element.start,
'Duplicate constructor in the same class'
);
hadConstructor = true;
} else if (
element.key &&
element.key.type === 'PrivateIdentifier' &&
isPrivateNameConflicted(privateNameMap, element)
) {
this.raiseRecoverable(
element.key.start,
`Identifier '#${element.key.name}' has already been declared`
);
}
}
}
Error prompt:
Line 2: Identifier 'a' has already been declared.
Therefore, rollup needs to leverage swc_ecma_lints capabilities to achieve more complete semantic analysis.
use swc_ecma_lints::{rule::Rule, rules, rules::LintParams};
let result = HANDLER.set(&handler, || op(&handler));
match result {
Ok(mut program) => {
let unresolved_mark = Mark::new();
let top_level_mark = Mark::new();
let unresolved_ctxt = SyntaxContext::empty().apply_mark(unresolved_mark);
let top_level_ctxt = SyntaxContext::empty().apply_mark(top_level_mark);
program.visit_mut_with(&mut resolver(unresolved_mark, top_level_mark, false));
let mut rules = rules::all(LintParams {
program: &program,
lint_config: &Default::default(),
unresolved_ctxt,
top_level_ctxt,
es_version,
source_map: cm.clone(),
});
HANDLER.set(&handler, || match &program {
Program::Module(m) => {
rules.lint_module(m);
}
Program::Script(s) => {
rules.lint_script(s);
}
});
if handler.has_errors() {
let buffer = create_error_buffer(&wr, code);
Err(buffer)
} else {
Ok(program)
}
}
}
Implement Semantic Analysis On JavaScript Side
However, as the following PR and discussion revealed:
After testing, it was found that the efficiency of swc_ecma_lints detection was not high.
To mitigate this, the rollup native parser temporarily removed the complete semantic analysis from the rust side, until scope analysis is implemented in rust.
let result = HANDLER.set(&handler, || op(&handler));
match result {
Ok(mut program) => {
let unresolved_mark = Mark::new();
let top_level_mark = Mark::new();
let unresolved_ctxt = SyntaxContext::empty().apply_mark(unresolved_mark);
let top_level_ctxt = SyntaxContext::empty().apply_mark(top_level_mark);
program.visit_mut_with(&mut resolver(unresolved_mark, top_level_mark, false));
let mut rules = rules::all(LintParams {
program: &program,
lint_config: &Default::default(),
unresolved_ctxt,
top_level_ctxt,
es_version,
source_map: cm.clone(),
});
HANDLER.set(&handler, || match &program {
Program::Module(m) => {
rules.lint_module(m);
}
Program::Script(s) => {
rules.lint_script(s);
}
});
if handler.has_errors() {
let buffer = create_error_buffer(&wr, code);
Err(buffer)
} else {
Ok(program)
}
}
}
result.map_err(|_| {
if handler.has_errors() {
create_error_buffer(&wr, code)
} else {
panic!("Unexpected error in parse")
}
})
The semantic analysis task is instead handed over to the javascript side.
rollup performs the more complete semantic analysis while instantiating its ast class nodes from the buffer. Testing showed that running semantic analysis on the javascript side has no significant impact on rollup's performance.
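As a simplified sketch of what such a javascript-side check looks like (illustrative only, not rollup's actual code): track declaration kinds per scope and error on forbidden redeclarations.

```javascript
// Minimal sketch of a scope-based redeclaration check: `var` may redeclare
// `var`, while other kinds conflict with any existing binding of the name.
class Scope {
  constructor() {
    this.variables = new Map(); // name -> declaration kind
  }
  addDeclaration(name, kind) {
    const existingKind = this.variables.get(name);
    if (existingKind !== undefined) {
      if (kind === 'var' && existingKind === 'var') return; // allowed
      throw new Error(`Identifier "${name}" has already been declared`);
    }
    this.variables.set(name, kind);
  }
}

const scope = new Scope();
scope.addDeclaration('a', 'var');
scope.addDeclaration('a', 'var'); // fine: var redeclares var
let message = '';
try {
  scope.addDeclaration('a', 'const'); // forbidden
} catch (error) {
  message = error.message;
}
console.log(message); // Identifier "a" has already been declared
```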
Semantic Analysis Detection Point
The main tasks of semantic analysis include the following:
const_assign
Example:
export function logConstVariableReassignError() {
  return {
    code: CONST_REASSIGN,
    message: 'Cannot reassign a variable declared with `const`'
  };
}

// case
const x = 1;
x = 'string';

// implementation
export default class AssignmentExpression extends NodeBase {
  initialise(): void {
    super.initialise();
    if (this.left instanceof Identifier) {
      const variable = this.scope.variables.get(this.left.name);
      if (variable?.kind === 'const') {
        this.scope.context.error(
          logConstVariableReassignError(),
          this.left.start
        );
      }
    }
    this.left.setAssignedValue(this.right);
  }
}
duplicate_bindings
export function logRedeclarationError(name: string): RollupLog {
  return {
    code: REDECLARATION_ERROR,
    message: `Identifier "${name}" has already been declared`
  };
}

// case
import { x } from './b';
const x = 1;

// case2
import { x } from './b';
import { x } from './b';

// implementation
export default class Module {
  private addImport(node: ImportDeclaration): void {
    const source = node.source.value;
    this.addSource(source, node);
    for (const specifier of node.specifiers) {
      const localName = specifier.local.name;
      if (
        this.scope.variables.has(localName) ||
        this.importDescriptions.has(localName)
      ) {
        this.error(
          logRedeclarationError(localName),
          specifier.local.start
        );
      }
      const name =
        specifier instanceof ImportDefaultSpecifier
          ? 'default'
          : specifier instanceof ImportNamespaceSpecifier
            ? '*'
            : specifier.imported instanceof Identifier
              ? specifier.imported.name
              : specifier.imported.value;
      this.importDescriptions.set(localName, {
        module: null as never, // filled in later
        name,
        source,
        start: specifier.start
      });
    }
  }
}

// case
{
  const a = 1;
  const a = 1;
}

// implementation
export default class BlockScope extends ChildScope {
  addDeclaration(
    identifier: Identifier,
    context: AstContext,
    init: ExpressionEntity,
    destructuredInitPath: ObjectPath,
    kind: VariableKind
  ): LocalVariable {
    if (kind === 'var') {
      const name = identifier.name;
      const existingVariable =
        this.hoistedVariables?.get(name) ||
        (this.variables.get(name) as LocalVariable | undefined);
      if (existingVariable) {
        if (
          existingVariable.kind === 'var' ||
          (kind === 'var' && existingVariable.kind === 'parameter')
        ) {
          existingVariable.addDeclaration(identifier, init);
          return existingVariable;
        }
        return context.error(
          logRedeclarationError(name),
          identifier.start
        );
      }
      const declaredVariable = this.parent.addDeclaration(
        identifier,
        context,
        init,
        destructuredInitPath,
        kind
      );
      // Necessary to make sure the init is deoptimized for conditional declarations.
      // We cannot call deoptimizePath here.
      declaredVariable.markInitializersForDeoptimization();
      // We add the variable to this and all parent scopes to reliably detect conflicts
      this.addHoistedVariable(name, declaredVariable);
      return declaredVariable;
    }
    return super.addDeclaration(
      identifier,
      context,
      init,
      destructuredInitPath,
      kind
    );
  }
}

// case
try {
} catch (e) {
  const a = 1;
  const a = 2;
}

// implementation
export default class CatchBodyScope extends ChildScope {
  addDeclaration(
    identifier: Identifier,
    context: AstContext,
    init: ExpressionEntity,
    destructuredInitPath: ObjectPath,
    kind: VariableKind
  ): LocalVariable {
    if (kind === 'var') {
      const name = identifier.name;
      const existingVariable =
        this.hoistedVariables?.get(name) ||
        (this.variables.get(name) as LocalVariable | undefined);
      if (existingVariable) {
        const existingKind = existingVariable.kind;
        if (
          existingKind === 'parameter' &&
          // If this is a destructured parameter, it is forbidden to redeclare
          existingVariable.declarations[0].parent.type === NodeType.CatchClause
        ) {
          // If this is a var with the same name as the catch scope parameter,
          // the assignment actually goes to the parameter and the var is
          // hoisted without assignment. Locally, it is shadowed by the
          // parameter
          const declaredVariable = this.parent.parent.addDeclaration(
            identifier,
            context,
            UNDEFINED_EXPRESSION,
            destructuredInitPath,
            kind
          );
          // To avoid the need to rewrite the declaration, we link the variable
          // names. If we ever implement a logic that splits initialization and
          // assignment for hoisted vars, the "renderLikeHoisted" logic can be
          // removed again.
          // We do not need to check whether there already is a linked
          // variable because then declaredVariable would be that linked
          // variable.
          existingVariable.renderLikeHoisted(declaredVariable);
          this.addHoistedVariable(name, declaredVariable);
          return declaredVariable;
        }
        if (existingKind === 'var') {
          existingVariable.addDeclaration(identifier, init);
          return existingVariable;
        }
        return context.error(
          logRedeclarationError(name),
          identifier.start
        );
      }
    }
  }
}

// case
function fn() {
  const a = 1;
  const a = 2;
}

// implementation
export default class FunctionBodyScope extends ChildScope {
  // There is stuff that is only allowed in function scopes, i.e. functions can
  // be redeclared, functions and var can redeclare each other
  addDeclaration(
    identifier: Identifier,
    context: AstContext,
    init: ExpressionEntity,
    destructuredInitPath: ObjectPath,
    kind: VariableKind
  ): LocalVariable {
    const name = identifier.name;
    const existingVariable =
      this.hoistedVariables?.get(name) ||
      (this.variables.get(name) as LocalVariable);
    if (existingVariable) {
      const existingKind = existingVariable.kind;
      if (
        (kind === 'var' || kind === 'function') &&
        (existingKind === 'var' ||
          existingKind === 'function' ||
          existingKind === 'parameter')
      ) {
        existingVariable.addDeclaration(identifier, init);
        return existingVariable;
      }
      context.error(logRedeclarationError(name), identifier.start);
    }
    const newVariable = new LocalVariable(
      identifier.name,
      identifier,
      init,
      destructuredInitPath,
      context,
      kind
    );
    this.variables.set(name, newVariable);
    return newVariable;
  }
}

// case1
import { a } from './b';
const a = 1;

// case2
import { a } from './b';
import { a } from './b';

// implementation
export default class ModuleScope extends ChildScope {
  addDeclaration(
    identifier: Identifier,
    context: AstContext,
    init: ExpressionEntity,
    destructuredInitPath: ObjectPath,
    kind: VariableKind
  ): LocalVariable {
    if (this.context.module.importDescriptions.has(identifier.name)) {
      context.error(
        logRedeclarationError(identifier.name),
        identifier.start
      );
    }
    return super.addDeclaration(
      identifier,
      context,
      init,
      destructuredInitPath,
      kind
    );
  }
}

// case
const a = 1;
const a = 2;

export default class Scope {
  /* Redeclaration rules:
     - var can redeclare var
     - in function scopes, function and var can redeclare function and var
     - var is hoisted across scopes, function remains in the scope it is declared
     - var and function can redeclare function parameters, but parameters cannot
       redeclare parameters
     - function cannot redeclare catch scope parameters
     - var can redeclare catch scope parameters in a way
       - if the parameter is an identifier and not a pattern
       - then the variable is still declared in the hoisted outer scope, but the
         initializer is assigned to the parameter
     - const, let, class, and function except in the cases above cannot redeclare
       anything
  */
  addDeclaration(
    identifier: Identifier,
    context: AstContext,
    init: ExpressionEntity,
    destructuredInitPath: ObjectPath,
    kind: VariableKind
  ): LocalVariable {
    const name = identifier.name;
    const existingVariable =
      this.hoistedVariables?.get(name) ||
      (this.variables.get(name) as LocalVariable);
    if (existingVariable) {
      if (kind === 'var' && existingVariable.kind === 'var') {
        existingVariable.addDeclaration(identifier, init);
        return existingVariable;
      }
      context.error(logRedeclarationError(name), identifier.start);
    }
    const newVariable = new LocalVariable(
      identifier.name,
      identifier,
      init,
      destructuredInitPath,
      context,
      kind
    );
    this.variables.set(name, newVariable);
    return newVariable;
  }
}
duplicate_exports
export function logDuplicateExportError(name: string): RollupLog {
  return { code: DUPLICATE_EXPORT, message: `Duplicate export "${name}"` };
}
export default class Module {
  private assertUniqueExportName(name: string, nodeStart: number) {
    if (this.exports.has(name) || this.reexportDescriptions.has(name)) {
      this.error(logDuplicateExportError(name), nodeStart);
    }
  }
}

// case
export default 1;
export default 2;

// implementation
export default class Module {
  private addExport(
    node:
      | ExportAllDeclaration
      | ExportNamedDeclaration
      | ExportDefaultDeclaration
  ): void {
    if (node instanceof 
ExportDefaultDeclaration) { // export default foo; this.assertUniqueExportName('default', node.start); this.exports.set('default', { identifier: node.variable.getAssignedVariableName(), localName: 'default' }); } } }ts// case export * as a from './b'; export * as a from './b'; // implementation export default class Module { private addExport( node: ExportAllDeclaration | ExportNamedDeclaration ): void { if (node instanceof ExportAllDeclaration) { const source = node.source.value; this.addSource(source, node); if (node.exported) { // export * as name from './other' const name = node.exported instanceof Literal ? node.exported.value : node.exported.name; this.assertUniqueExportName(name, node.exported.start); this.reexportDescriptions.set(name, { localName: '*', module: null as never, // filled in later, source, start: node.start }); } else { // export * from './other' this.exportAllSources.add(source); } } } }ts// case export { a } from './b'; export { a } from './b'; // implementation export default class Module { private addExport( node: ExportAllDeclaration | ExportNamedDeclaration ): void { if (node.source instanceof Literal) { // export { name } from './other' const source = node.source.value; this.addSource(source, node); for (const { exported, local, start } of node.specifiers) { const name = exported instanceof Literal ? exported.value : exported.name; this.assertUniqueExportName(name, start); this.reexportDescriptions.set(name, { localName: local instanceof Literal ? local.value : local.name, module: null as never, // filled in later, source, start }); } } } }ts// case1 export const a = 1; export const a = 2; // case2 export function a() {} export function a() {} // case3 export { a, a }; // implementation export default class Module { private addExport(node: ExportNamedDeclaration): void { if (node.declaration) { const declaration = node.declaration; if (declaration instanceof VariableDeclaration) { // export var { foo, bar } = ... 
// export var foo = 1, bar = 2; for (const declarator of declaration.declarations) { for (const localName of extractAssignedNames(declarator.id)) { this.assertUniqueExportName(localName, declarator.id.start); this.exports.set(localName, { identifier: null, localName }); } } } else { // export function foo () {} const localName = (declaration.id as Identifier).name; this.assertUniqueExportName(localName, declaration.id!.start); this.exports.set(localName, { identifier: null, localName }); } } } }no_dupe_argstsexport function logDuplicateArgumentNameError(name: string): RollupLog { return { code: DUPLICATE_ARGUMENT_NAME, message: `Duplicate argument name "${name}"` }; }ts// case function fn(a, a) {} // implementation export default class ParameterScope extends ChildScope { /** * Adds a parameter to this scope. Parameters must be added in the correct * order, i.e. from left to right. */ addParameterDeclaration( identifier: Identifier, argumentPath: ObjectPath ): ParameterVariable { const { name, start } = identifier; const existingParameter = this.variables.get(name); if (existingParameter) { return this.context.error( logDuplicateArgumentNameError(name), start ); } const variable = new ParameterVariable( name, identifier, argumentPath, this.context ); this.variables.set(name, variable); // We also add it to the body scope to detect name conflicts with local // variables. We still need the intermediate scope, though, as parameter // defaults are NOT taken from the body scope but from the parameters or // outside scope. this.bodyScope.addHoistedVariable(name, variable); return variable; } }
As the implementations above show, semantic analysis depends heavily on the execution context and scope information of the current AST node. These checks are only the most basic ones: rollup also performs further semantic analysis, such as side-effect analysis, circular module dependency analysis, and strict syntax restrictions (for example, a namespace object cannot be called, and imported bindings cannot be reassigned), none of which acorn can provide.
Since the internal implementation of swc_ecma_lints may have performance issues, this is only a temporary solution. rollup plans to add scope analysis on the Rust side later and then hand the complete semantic analysis task over to Rust.
Optimize AST Parsing
rollup exposes this.parse on the plugin context so that user plugins can use the native SWC capabilities to parse code into an AST. Plugins can return the parsed AST from their load and transform hooks, and rollup will reuse that AST instead of parsing the code again.
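As a rough sketch (the plugin name and shape here are illustrative, not from rollup's documentation), a plugin that parses the code itself and hands the AST back might look like this:

```typescript
// Hypothetical plugin sketch: parse in the transform hook and return the AST
// so the bundler can reuse it instead of parsing the module a second time.
function reuseAstPlugin() {
  return {
    name: 'reuse-ast',
    // At build time `this` is bound to the plugin context, which provides
    // the native SWC-backed `parse` method.
    transform(this: { parse(code: string): unknown }, code: string, id: string) {
      const ast = this.parse(code);
      return { code, ast, map: null };
    }
  };
}

const plugin = reuseAstPlugin();
console.log(plugin.name); // "reuse-ast"
```

The key point is the `ast` property on the hook's return value: when present, the host skips its own parsing step for that module.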
If a plugin does not return an AST (i.e., it returns no ast from its load and transform hooks), rollup falls back to parsing: when the transform stage completes, the transformed code is parsed into an ESTree-compatible AST using the native Rust capabilities.
Precautions for Using this.parse
Currently, rollup has removed AST semantic analysis from the Rust side. In other words, an AST produced via the this.parse API in the plugin context has not undergone semantic analysis.
If a plugin needs to produce an AST that has passed semantic analysis, it must run that analysis itself with other tools.
If that guarantee is not needed, rollup will perform semantic analysis automatically when it walks the returned ESTree AST and recursively instantiates its internal AST node classes.
Even with native parsing capabilities, generating a complex AST is still time-consuming. In watch mode, rollup caches the ESTree AST (see the Rollup Incremental Build section for details) to skip the native SWC parsing step entirely, and instead recursively instantiates its internal AST node classes from the cached ESTree structure.
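The caching idea can be sketched as follows (an illustrative sketch, not rollup's actual implementation): key the cached ESTree AST by a hash of the module source, and only invoke the parser on a cache miss.

```typescript
import { createHash } from 'crypto';

// Illustrative watch-mode AST reuse: cache parsed ASTs keyed by a content
// hash so that unchanged modules skip the expensive parse entirely.
class AstCache<Ast> {
  private cache = new Map<string, Ast>();

  getOrParse(code: string, parse: (code: string) => Ast): Ast {
    const key = createHash('sha256').update(code).digest('hex');
    const cached = this.cache.get(key);
    if (cached !== undefined) return cached; // cache hit: skip parsing
    const ast = parse(code);
    this.cache.set(key, ast);
    return ast;
  }
}

// Usage: the parse callback only runs once for identical source text.
let parseCalls = 0;
const cache = new AstCache<{ type: string }>();
const fakeParse = (_code: string) => {
  parseCalls++;
  return { type: 'Program' };
};
cache.getOrParse('const a = 1;', fakeParse);
cache.getOrParse('const a = 1;', fakeParse);
console.log(parseCalls); // 1
```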
Performance Comparison
We tested the parsing performance of rollup 4.28.1 and 3.29.5, where:
- 4.28.1 parses the AST with native SWC, and the Rust side passes an ESTree-compatible AST to the JavaScript side in ArrayBuffer format.
- 3.29.5 parses the AST with acorn.
Each group was run 5 times and the results averaged.
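The measurement method can be sketched like this (a hypothetical harness, not the exact script used): each parser runs five times on the same source and the wall-clock times are averaged.

```typescript
import { performance } from 'perf_hooks';

// Hypothetical benchmark harness: average wall-clock parse time over N runs.
function averageParseTime(
  parse: (code: string) => unknown,
  code: string,
  runs = 5
): number {
  let total = 0;
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    parse(code);
    total += performance.now() - start;
  }
  return total / runs;
}

// Usage with a trivial stand-in parser; a real run would call swc or acorn.
const ms = averageParseTime(code => code.length, 'const a = 1;');
console.log(ms >= 0); // true
```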
| Code Length (Characters) | SWC Parsing Time (ms) | Acorn Parsing Time (ms) |
|---|---|---|
| 312.4K | 13.47 | 73.92 |
| 624.7K | 21.78 | 83.80 |
| 1.2M | 36.03 | 124.82 |
| 2.5M | 68.88 | 182.45 |
| 5.0M | 136.52 | 272.53 |
| 10.0M | 266.87 | 608.72 |
| 20.0M | 578.00 | 1178.82 |
| 159.9M | 4155.64 | 7276.24 |
| 319.9M | 10081.40 | - |
Testing showed that once the input reached 319,869,952 characters, acorn crashed while parsing the AST:
```
<--- Last few GCs --->

[69821:0x120078000] 15364 ms: Mark-sweep 4062.9 (4143.2) -> 4059.0 (4143.2) MB, 703.2 / 0.0 ms (average mu = 0.293, current mu = 0.102) allocation failure; scavenge might not succeed
[69821:0x120078000] 16770 ms: Mark-sweep 4075.3 (4143.2) -> 4071.5 (4169.0) MB, 1383.5 / 0.0 ms (average mu = 0.143, current mu = 0.016) allocation failure; scavenge might not succeed

<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
```
From the test results, it can be seen that switching to the native parser has a significant performance advantage over acorn.
Overall performance:
- The native parser (built-in swc) keeps average parsing time low, and the time grows gently as code length increases.
- The non-native parser (built-in acorn) shows sharply growing parsing time on large inputs, with high performance overhead.
Data comparison:
- Small code amount (312,373 characters): the gap is most pronounced, about 5.5x (13.47 ms vs 73.92 ms).
- Medium code amount (9,995,936 characters): the gap is about 2.28x (266.87 ms vs 608.72 ms).
- Large code amount (159,934,976 characters): the gap is about 1.75x (4155.64 ms vs 7276.24 ms).
Module Character Counts for Reference

| Module | Code Length (Characters) |
|---|---|
| rollup.js | 312,373 |
Trend analysis:
- Parsing time with the native parser (built-in swc) grows slowly, making it well suited to parsing larger modules.
- Parsing time with the non-native parser (built-in acorn) grows steeply, and its efficiency is clearly insufficient for large modules.