JavaScriptCore引擎深度解析4-语法分析篇

前言

JavaScriptCore引擎深度解析3—词法分析篇一文中,我们提到:Lexer是作为Parser的一个成员而存在的,主要目的就是为Parser提供一个个的JSToken,依赖图如下:

可以看到,词法分析篇花费了大量篇幅,也仅仅是讲了讲图的上半部分,即使如此,有些东西都是浅尝辄止,还有很多东西都未能讲到,因为源码实在是过于庞大,我的能力又实在过于有限…
路漫漫其修远兮……
不管怎样,词法分析篇也算是为我们的继续分析打了一些基础,我们继续来分析,尽可能在本篇把上图的下半部分分析完成。
注:有些源码过长,为了代码清晰便于分析,会删除一些逻辑,这并不意味这删除的代码逻辑不重要,如果想看全部源码,建议去看下源码工程对应的文件

语法分析概述

我们知道编译原理的语法分析有自顶向下和自底向上两种:

  • 自顶向下分析法:从分析树的顶部(根节点)向底部(叶节点)方向构造分析树,而递归下降分析是自顶向下语法分析的通用形式,它可能需要回溯,导致效率较低,所以会引入预测分析,通过在输入中向前看固定个数(k)符号来选择正确的产生式,其中最典型的就是LL(1)文法(消除二义性、消除左递归、消除回溯)。在LL(1)文法的基础上,甚至会继续计算文法的FIRST集、FOLLOW集和SELECT集,进一步引入预测分析表,根据预测分析表进行产生式的选择。

  • 自底向上分析法:从分析树的底部(叶节点)向顶部(根节点)方向构造分析树,而自底向上语法分析的通用框架是”移入-归约分析”,而LR文法是最大的、可以构造出相应
    移入-归约语法分析器的文法类,最具代表性的有LR(0)、SLR、LR(1)、LASR几类。

而我们本文要讨论的JavaScript的语法分析采用的递归下降法,这是一种适合手写语法编译器的方法,且非常简单。递归下降法对语言所用的文法有一些限制,但递归下降是现阶段主流的语法分析方法,因为它可以由开发人员高度控制,在提供错误信息方面也很有优势。

语法树

我们知道,JavaScriptCore词法分析的输入是一个个的JSToken,而输出是一棵语法分析树。
例如,我们有如下的JavaScript代码:

function foo(x) {
    var sum = 0;
    if (x > 10) {
        while(x > 0){
            sum += x;
            x--;
        }
    }
    return sum;
}

通过语法分析将得到如下的简化版本语法树:

语法树节点

在了解语法树细节之前,我们先来看下语法树的节点,如下图所示

高清版本传送门

上图容纳了语法分析树中的绝大部分节点,可以看出,最重要的节点当属ExpressionNodeStatementNode节点,图中的大部分节点都继承自这两个节点。

  • StatementNode,基本语句节点,它的子节点主要是用来描述语言结构,例如
    (1)顺序结构:ClassDeclNodeDeclarationStateMentNode
    (2)条件结构:IFElseNodeSwitchNode
    (3)循环结构:WhileNodeForNodeDoWhileNode
    (4)还有一个主要的节点是ScopeNode,用来描述区域
  • ExpressionNode,表达式节点,它的子节点主要用来描述各种表达式
    (1)ConstantNode,常量节点,它又衍生出字符串节点、Bool节点、整形、浮点型节点
    (2)BinaryOpNode,两目运算符,衍生出加、减、乘、除、模、左移、右移等运算符节点
    (3)此外,还有正则表达式节点、this节点、super节点等

节点内存管理

ParserArenaFreeable & ParserArenaDeletable

从图中可以看出,大部分Node节点追溯到根节点,都会到ParserArenaFreeable类,它重写了C++的new操作符,主要是实现自己的内存管理方式,不再是默认的的new、delete对象机制。如果熟悉C++,就会发现这和C++的placement new/delete的机制如出一辙:它主要是反复使用一块较大的动态分配成功的内存来构造不同类型的对象或者它们的数组。例如可以先申请一个足够大的字符数组,然后当需要时在它上面构造不同类型的对象或数组。它不用担心内存分配失败,因为它根本不分配内存,它只是调用对象的构造函数。

class ParserArenaFreeable {
    public:
        void* operator new(size_t, ParserArena&);
};

inline void* ParserArenaFreeable::operator new(size_t size, ParserArena& parserArena)
{
    return parserArena.allocateFreeable(size);
}
  • ParserArenaFreeable objects are freed when the arena is deleted.
    Destructors are not called. Clients must not call delete on such objects
    注意:ParserArenaFreeable的析构函数是不会被调用的,所以Node节点如果继承自ParserArenaFreeable,就不要去实现析构函数了,即使写了,也不会被调用!

例如,用如下方式创建一个字符串节点StringNode,其中ParserArena& m_parserArena

ExpressionNode* createString(const JSTokenLocation& location, const Identifier* string)
{       
    return new (m_parserArena) StringNode(location, *string);
}

实际上还有一个ParserArenaDeletable,它和ParserArenaFreeable很类似,除了 可以调用析构函数,所以要想节点被调用析构函数,做一些释放逻辑的话,可以选择继承ParserArenaDeletable

 class ParserArenaDeletable {
    public:
        virtual ~ParserArenaDeletable() { }
        void* operator new(size_t, ParserArena&);
};
  • ParserArenaDeletable objects are deleted when the arena is deleted.
    Destructors can be called. Clients must not call delete directly on such objects.

如果想进一步探索内存分配机制,可以看下类ParserArena的实现:
1、两个比较重要的alloc方法

void* allocateFreeable(size_t size)
{
    // 字节对齐
    size_t alignedSize = alignSize(size);

    // 如果当前的内存块不够,进行扩容,并将扩容地址记录到m_freeablePools中
    if ((m_freeablePoolEnd - m_freeableMemory) < alignedSize))
        allocateFreeablePool();

    // 分配地址
    void* block = m_freeableMemory;

    // 修改可用地址数值
    m_freeableMemory += alignedSize;

    return block;
}

void* allocateDeletable(size_t size)
{
    // 首先调用allocateFreeable进行内存分配
    ParserArenaDeletable* deletable = static_cast<ParserArenaDeletable*>(allocateFreeable(size));

    // 将分配的地址添加到m_deletableObjects容器中,方便后续调用析构函数
    m_deletableObjects.append(deletable);

    // 返回地址
    return deletable;
}

2、ParserArena的析构函数
首先对于m_deletableObjects的每个对象,要调用其析构函数,然后将之前记录到m_freeablePools中的每个地址都进行释放

ParserArena::~ParserArena()
{
    deallocateObjects();
}

inline void ParserArena::deallocateObjects()
{
    // 调用析构函数
    size_t size = m_deletableObjects.size();
    for (size_t i = 0; i < size; ++i)
        m_deletableObjects[i]->~ParserArenaDeletable();

    // 
    if (m_freeablePoolEnd)
        fastFree(freeablePool());

    // 释放所有申请的地址块
    size = m_freeablePools.size();
    for (size_t i = 0; i < size; ++i)
        fastFree(m_freeablePools[i]);
}

ASTBuilder

ASTBuilder用来构建语法树AST,它主要包含以下成员:

  • 一个虚拟机m_vm;
  • 内存管理类ParserArena;
  • 源代码类SourceCode;
  • Scope;
  • 4个栈容器;
class ASTBuilder
{
    ...
    VM* m_vm;
    ParserArena& m_parserArena;
    SourceCode* m_sourceCode;
    Scope m_scope;
    Vector<BinaryOperand, 10, UnsafeVectorOverflow> m_binaryOperandStack;
    Vector<AssignmentInfo, 10, UnsafeVectorOverflow> m_assignmentInfoStack;
    Vector<std::pair<int, int>, 10, UnsafeVectorOverflow> m_binaryOperatorStack;
    Vector<std::pair<int, JSTextPosition>, 10, UnsafeVectorOverflow> m_unaryTokenStack;
    int m_evalCount;
}

Vector

Vector不同于STL的vector,它是Webkit基础类库里面的一个工具类,位于:
webkit-master/Source/WTF/wtf/Vector.h,这里的模板参数声明如下:

template<typename T, size_t inlineCapacity = 0, typename OverflowHandler = CrashOnOverflow, size_t minCapacity = 16>
  • 第1个参数代表存放的数据类型;
  • 第2个参数代表容量;
  • 第3个参数代表溢出异常处理,默认处理方案CrashOnOverflow,这里给的是UnsafeVectorOverflow,这两种异常处理方案最后都会走向CRASH()
  • 第4个参数代表最小的容量,这里我们没有传,将使用默认值16;

操作数 BinaryOperand

typedef int BinaryOperand;

二元操作结构体

struct BinaryOpInfo 
{
    BinaryOpInfo() {}
    BinaryOpInfo(const JSTextPosition& otherStart, const JSTextPosition& otherDivot, const JSTextPosition& otherEnd, bool rhsHasAssignment)
        : start(otherStart)
        , divot(otherDivot)
        , end(otherEnd)
        , hasAssignment(rhsHasAssignment)
    {
    }
    BinaryOpInfo(const BinaryOpInfo& lhs, const BinaryOpInfo& rhs)
        : start(lhs.start)
        , divot(rhs.start)
        , end(rhs.end)
        , hasAssignment(lhs.hasAssignment || rhs.hasAssignment)
    {
    }
    JSTextPosition start;
    JSTextPosition divot;
    JSTextPosition end;
    bool hasAssignment;
};

赋值操作结构体

struct AssignmentInfo 
{
    AssignmentInfo() {}
    AssignmentInfo(ExpressionNode* node, const JSTextPosition& start, const JSTextPosition& divot, int initAssignments, Operator op)
        : m_node(node)
        , m_start(start)
        , m_divot(divot)
        , m_initAssignments(initAssignments)
        , m_op(op)
    {
    }
    ExpressionNode* m_node;
    JSTextPosition m_start;
    JSTextPosition m_divot;
    int m_initAssignments;
    Operator m_op;
};

其中Operator是一个枚举:

enum Operator
{
    OpEqual,
    OpPlusEq,
    OpMinusEq,
    OpMultEq,
    OpDivEq,
    OpPlusPlus,
    OpMinusMinus,
    OpAndEq,
    OpXOrEq,
    OpOrEq,
    OpModEq,
    OpLShift,
    OpRShift,
    OpURShift
};

parse堆栈

JavsScriptCore的语法分析起始于parse方法,parse的核心在parseInner方法,parseInner方法的两个参数类型分别是IdentifierSourceParseModeIdentifier前文提到过,代表标识符,而SourceParseMode是一个枚举:

enum class SourceParseMode : uint8_t 
{
    NormalFunctionMode,
    GeneratorBodyMode,
    GeneratorWrapperFunctionMode,
    GetterMode,
    SetterMode,
    MethodMode,
    ArrowFunctionMode,
    ProgramMode,
    ModuleAnalyzeMode,
    ModuleEvaluateMode
};

parse方法的内部调用流程大概如下:

parse
└──parseInner
    └──parseArrowFunctionSingleExpressionBodySourceElements(箭头函数)
    └──parseModuleSourceElements(模块导入导出)
    └──parseGeneratorFunctionSourceElements
    └──parseSourceElements
        └──parseStatementListItem(语句列表)
            └──parseVariableDeclaration(变量声明var、let)
            └──parseClassDeclaration(类声明)
            └──parseFunctionDeclaration(方法声明)
            └──parseExpressionOrLabelStatement(表达式或者标签)
            └──parseStatement(基本语句)
                └──parseBlockStatement
                └──parseVariableDeclaration
                └──parseFunctionDeclaration
                └──parseIfStatement
                └──parseDoWhileStatement
                └──parseWhileStatement
                └──parseForStatement
                └──parseContinueStatement
                └──parseBreakStatement
                └──parseReturnStatement
                └──parseWithStatement
                └──parseSwitchStatement
                └──parseThrowStatement
                └──parseTryStatement
                └──parseExpressionOrLabelStatement
                └──parseExpressionStatement
                    └──parseExpression

其中有些语句是相互嵌套的,例如,parseIfStatement里面可以继续调用parseStatement。由于这些代码规模较为庞大,不可能逐个分析到,本文将挑选下面的路径做一个大致的分析:

parse
└──parseInner
    └──parseSourceElements
        └──parseStatementListItem(语句列表)
            └──parseStatement(基本语句)
                └──parseVariableDeclaration
                └──parseWhileStatement
                └──parseExpressionStatement
                    └──parseExpression

parseInner

// 进行语法分析,生成抽象语法树(为了代码清晰,省略了部分代码)
template <typename LexerType>
String Parser<LexerType>::parseInner(const Identifier& calleeName, SourceParseMode parseMode)
{
    String parseError = String();

    // 定义ASTBuilder
    ASTBuilder context(const_cast<VM*>(m_vm), m_parserArena, const_cast<SourceCode*>(m_source));

    // 设置scope
    ScopeRef scope = currentScope();
    scope->setIsLexicalScope();
    SetForScope<FunctionParsePhase> functionParsePhasePoisoner(m_parserState.functionParsePhase, FunctionParsePhase::Body);

    ...

    if (!calleeName.isNull())
        scope->declareCallee(&calleeName);
    if (m_lexer->isReparsingFunction())
        m_statementDepth--;

    // 开始解析生成语法树的一个节点:
    SourceElements* sourceElements = nullptr;

    // 核心逻辑
    if (!hasError())
    {
        if (isArrowFunctionBodyExpression)
            // 箭头函数
            sourceElements = parseArrowFunctionSingleExpressionBodySourceElements(context);
        else if (isModuleParseMode(parseMode))
            // Module导入导出
            sourceElements = parseModuleSourceElements(context, parseMode);
        else 
        {
            if (parseMode == SourceParseMode::GeneratorWrapperFunctionMode)
                sourceElements = parseGeneratorFunctionSourceElements(context, CheckForStrictMode);
            else
                // 基本语句解析
                sourceElements = parseSourceElements(context, CheckForStrictMode);
        }
    }

    ...

    didFinishParsing(sourceElements, scope->takeFunctionDeclarations(), varDeclarations, WTFMove(sloppyModeHoistedFunctions), features, context.numConstants());

    return parseError;
}

parseSourceElements

template <typename LexerType>
template <class TreeBuilder> TreeSourceElements Parser<LexerType>::parseSourceElements(TreeBuilder& context, SourceElementsMode mode)
{
    const unsigned lengthOfUseStrictLiteral = 12; // "use strict".length
    TreeSourceElements sourceElements = context.createSourceElements();
    bool seenNonDirective = false;
    const Identifier* directive = 0;
    unsigned directiveLiteralLength = 0;
    auto savePoint = createSavePoint();
    bool hasSetStrict = false;

    // 主要逻辑在parseStatementListItem中
    while (TreeStatement statement = parseStatementListItem(context, directive, &directiveLiteralLength)) {
        ...

        // 添加语法节点到ASTBuilder
        context.appendStatement(sourceElements, statement);
    }

    propagateError();
    return sourceElements;
}

parseStatementListItem

template <typename LexerType>
template <class TreeBuilder> TreeStatement Parser<LexerType>::parseStatementListItem(TreeBuilder& context, const Identifier*& directive, unsigned* directiveLiteralLength)
{
    DepthManager statementDepth(&m_statementDepth);
    m_statementDepth++;
    TreeStatement result = 0;
    bool shouldSetEndOffset = true;
    switch (m_token.m_type)
    {
        case CONSTTOKEN:
            // 变量声明解析
            result = parseVariableDeclaration(context, DeclarationType::ConstDeclaration);
            break;
        case LET: {
            bool shouldParseVariableDeclaration = true;
            if (shouldParseVariableDeclaration)
                result = parseVariableDeclaration(context, DeclarationType::LetDeclaration);
            else {
                bool allowFunctionDeclarationAsStatement = true;
                result = parseExpressionOrLabelStatement(context, allowFunctionDeclarationAsStatement);
            }

            break;
        }
        case CLASSTOKEN:
            // 类声明解析
            result = parseClassDeclaration(context);
            break;
        case FUNCTION:
            // 方法声明解析
            result = parseFunctionDeclaration(context);
            break;
        case IDENT:
        case YIELD: {
            // 表达式或者标签解析
            bool allowFunctionDeclarationAsStatement = true;
            result = parseExpressionOrLabelStatement(context, allowFunctionDeclarationAsStatement);
            break;
        }
        default:
            // 普通语句解析
            m_statementDepth--; // parseStatement() increments the depth.
            result = parseStatement(context, directive, directiveLiteralLength);
            shouldSetEndOffset = false;
            break;
        }
    }

    if (result && shouldSetEndOffset)
        context.setEndOffset(result, m_lastTokenEndPosition.offset);

    return result;
}

parseStatement

可以看到基本语句的解析都是通过switch分支来进行的

template <typename LexerType>
template <class TreeBuilder> TreeStatement Parser<LexerType>::parseStatement(TreeBuilder& context, const Identifier*& directive, unsigned* directiveLiteralLength)
{
    DepthManager statementDepth(&m_statementDepth);
    m_statementDepth++;
    directive = 0;
    int nonTrivialExpressionCount = 0;
    failIfStackOverflow();
    TreeStatement result = 0;
    bool shouldSetEndOffset = true;
    bool parentAllowsFunctionDeclarationAsStatement = m_immediateParentAllowsFunctionDeclarationInStatement;
    m_immediateParentAllowsFunctionDeclarationInStatement = false;

    switch (m_token.m_type) 
    {
        case OPENBRACE:
            result = parseBlockStatement(context);
            shouldSetEndOffset = false;
            break;
        case VAR:
            result = parseVariableDeclaration(context, DeclarationType::VarDeclaration);
            break;
        case FUNCTION: {
            DepthManager statementDepth(&m_statementDepth);
            m_statementDepth = 1;
            result = parseFunctionDeclaration(context);
            break;
        }
        case SEMICOLON: {
            JSTokenLocation location(tokenLocation());
            next();
            result = context.createEmptyStatement(location);
            break;
        }
        case IF:
            result = parseIfStatement(context);
            break;
        case DO:
            result = parseDoWhileStatement(context);
            break;
        case WHILE:
            result = parseWhileStatement(context);
            break;
        case FOR:
            result = parseForStatement(context);
            break;
        case CONTINUE:
            result = parseContinueStatement(context);
            break;
        case BREAK:
            result = parseBreakStatement(context);
            break;
        case RETURN:
            result = parseReturnStatement(context);
            break;
        case WITH:
            result = parseWithStatement(context);
            break;
        case SWITCH:
            result = parseSwitchStatement(context);
            break;
        case THROW:
            result = parseThrowStatement(context);
            break;
        case TRY:
            result = parseTryStatement(context);
            break;
        case DEBUGGER:
            result = parseDebuggerStatement(context);
            break;
        case EOFTOK:
        case CASE:
        case CLOSEBRACE:
        case DEFAULT:
            // These tokens imply the end of a set of source elements
            return 0;
        case IDENT:
        case YIELD: {
            bool allowFunctionDeclarationAsStatement = false;
            result = parseExpressionOrLabelStatement(context, allowFunctionDeclarationAsStatement);
            break;
        }
        case STRING:
            directive = m_token.m_data.ident;
            if (directiveLiteralLength)
                *directiveLiteralLength = m_token.m_location.endOffset - m_token.m_location.startOffset;
            nonTrivialExpressionCount = m_parserState.nonTrivialExpressionCount;
            FALLTHROUGH;
        default:
            TreeStatement exprStatement = parseExpressionStatement(context);
            if (directive && nonTrivialExpressionCount != m_parserState.nonTrivialExpressionCount)
                directive = 0;
            result = exprStatement;
            break;
    }

    if (result && shouldSetEndOffset)
        context.setEndOffset(result, m_lastTokenEndPosition.offset);
    return result;
}

parseExpression

template <typename LexerType>
template <class TreeBuilder> TreeExpression Parser<LexerType>::parseExpression(TreeBuilder& context)
{
    failIfStackOverflow();
    JSTokenLocation location(tokenLocation());

    // 解析第一个表达式
    TreeExpression node = parseAssignmentExpression(context);
    context.setEndOffset(node, m_lastTokenEndPosition.offset);

    // 如果是逗号,解析结束,直接返回
    if (!match(COMMA)) return node;

    // 匹配到逗号,继续往右解析
    next();
    m_parserState.nonTrivialExpressionCount++;
    m_parserState.nonLHSCount++;

    // 
    TreeExpression right = parseAssignmentExpression(context);
    context.setEndOffset(right, m_lastTokenEndPosition.offset);

    // 赋值表达式可能是一个逗号分隔的表达式,例如:b,2+3,c+5,d>>1;
    typename TreeBuilder::Comma head = context.createCommaExpr(location, node);
    typename TreeBuilder::Comma tail = context.appendToCommaExpr(location, head, head, right);

    // 从左往右依次匹配逗号,如果成功,就进行解析
    while (match(COMMA)) {
        next(TreeBuilder::DontBuildStrings);
        right = parseAssignmentExpression(context);
        context.setEndOffset(right, m_lastTokenEndPosition.offset);
        tail = context.appendToCommaExpr(location, head, tail, right);
    }
    context.setEndOffset(head, m_lastTokenEndPosition.offset);
    return head;
}

parseAssignmentExpression

接下来重点分析下赋值表达式的解析:parseAssignmentExpression

template <typename LexerType>
template <typename TreeBuilder> TreeExpression Parser<LexerType>::parseAssignmentExpression(TreeBuilder& context, ExpressionErrorClassifier& classifier)
{
    failIfStackOverflow();
    JSTextPosition start = tokenStartPosition();
    JSTokenLocation location(tokenLocation());
    int initialAssignmentCount = m_parserState.assignmentCount;
    int initialNonLHSCount = m_parserState.nonLHSCount;
    bool maybeAssignmentPattern = match(OPENBRACE) || match(OPENBRACKET);
    bool wasOpenParen = match(OPENPAREN);
    bool isValidArrowFunctionStart = match(OPENPAREN) || match(IDENT);
    SavePoint savePoint = createSavePoint();
    size_t usedVariablesSize = 0;
    if (wasOpenParen) {
        usedVariablesSize = currentScope()->currentUsedVariablesSize();
        currentScope()->pushUsedVariableSet();
    }

    if (match(YIELD) && !isYIELDMaskedAsIDENT(currentScope()->isGenerator()))
        return parseYieldExpression(context);

    // 解析左值
    TreeExpression lhs = parseConditionalExpression(context);

    // 箭头函数的分析,有的情况下会把一个箭头函数赋值给一个变量:var a = (a)=>a+2;
    if (isValidArrowFunctionStart && !match(EOFTOK)) 
    {
        bool isArrowFunctionToken = match(ARROWFUNCTION);
        if (!lhs || isArrowFunctionToken) {
            SavePointWithError errorRestorationSavePoint = createSavePointForError();
            restoreSavePoint(savePoint);
            if (isArrowFunctionParameters()) {
                if (wasOpenParen)
                    currentScope()->revertToPreviousUsedVariables(usedVariablesSize);
                return parseArrowFunctionExpression(context);
            }
            restoreSavePointWithError(errorRestorationSavePoint);
            if (isArrowFunctionToken)
                failDueToUnexpectedToken();
        }
    }



    ...

    if (initialNonLHSCount != m_parserState.nonLHSCount) {
        return lhs;
    }

    int assignmentStack = 0;
    Operator op;
    bool hadAssignment = false;
    while (true) 
    {
        // 解析操作符
        switch (m_token.m_type) 
        {
            case EQUAL: op = OpEqual; break;
            case PLUSEQUAL: op = OpPlusEq; break;
            case MINUSEQUAL: op = OpMinusEq; break;
            case MULTEQUAL: op = OpMultEq; break;
            case DIVEQUAL: op = OpDivEq; break;
            case LSHIFTEQUAL: op = OpLShift; break;
            case RSHIFTEQUAL: op = OpRShift; break;
            case URSHIFTEQUAL: op = OpURShift; break;
            case ANDEQUAL: op = OpAndEq; break;
            case XOREQUAL: op = OpXOrEq; break;
            case OREQUAL: op = OpOrEq; break;
            case MODEQUAL: op = OpModEq; break;
            default:
                goto end;
        }
        m_parserState.nonTrivialExpressionCount++;
        hadAssignment = true;

        context.assignmentStackAppend(assignmentStack, lhs, start, tokenStartPosition(), m_parserState.assignmentCount, op);
        start = tokenStartPosition();
        m_parserState.assignmentCount++;
        next(TreeBuilder::DontBuildStrings);

        // 解析
        lhs = parseAssignmentExpression(context);
        if (initialNonLHSCount != m_parserState.nonLHSCount) {
            break;
        }

    }
end:
    if (hadAssignment)
        m_parserState.nonLHSCount++;

    if (!TreeBuilder::CreatesAST)
        return lhs;

    while (assignmentStack)
        lhs = context.createAssignment(location, assignmentStack, lhs, initialAssignmentCount, m_parserState.assignmentCount, lastTokenEndPosition());

    return lhs;
}

StatementNode

接下来将挑选若干StatementNode的子类进行细节分析,这部分内容某种意义上也是为之后的字节码生成做一定的铺垫

DeclarationStatement

class DeclarationStatement : public StatementNode {
    public:
        DeclarationStatement(const JSTokenLocation&, ExpressionNode*);
    private:
        void emitBytecode(BytecodeGenerator&, RegisterID* = 0) override;

        ExpressionNode* m_expr;
    };

一个JavaScript变量的声明的代码篇幅,已经超出了我的想象:

parseVariableDeclaration

template <typename LexerType>
template <class TreeBuilder> TreeStatement Parser<LexerType>::parseVariableDeclaration(TreeBuilder& context, DeclarationType declarationType, ExportType exportType)
{
    ASSERT(match(VAR) || match(LET) || match(CONSTTOKEN));
    JSTokenLocation location(tokenLocation());
    int start = tokenLine();
    int end = 0;
    int scratch;
    TreeDestructuringPattern scratch1 = 0;
    TreeExpression scratch2 = 0;
    JSTextPosition scratch3;
    bool scratchBool;
    // 核心重点在parseVariableDeclarationList
    TreeExpression variableDecls = parseVariableDeclarationList(context, scratch, scratch1, scratch2, scratch3, scratch3, scratch3, VarDeclarationContext, declarationType, exportType, scratchBool);
    propagateError();
    failIfFalse(autoSemiColon(), "Expected ';' after variable declaration");

    return context.createDeclarationStatement(location, variableDecls, start, end);
}

parseVariableDeclarationList

template <typename LexerType>
template <class TreeBuilder> TreeExpression Parser<LexerType>::parseVariableDeclarationList(TreeBuilder& context, int& declarations, TreeDestructuringPattern& lastPattern, TreeExpression& lastInitializer, JSTextPosition& identStart, JSTextPosition& initStart, JSTextPosition& initEnd, VarDeclarationListContext declarationListContext, DeclarationType declarationType, ExportType exportType, bool& forLoopConstDoesNotHaveInitializer)
{
    TreeExpression head = 0;
    TreeExpression tail = 0;
    const Identifier* lastIdent;
    JSToken lastIdentToken; 
    AssignmentContext assignmentContext = assignmentContextFromDeclarationType(declarationType);
    do 
    {
        lastIdent = 0;
        lastPattern = TreeDestructuringPattern(0);
        JSTokenLocation location(tokenLocation());
        next();
        TreeExpression node = 0;
        declarations++;
        bool hasInitializer = false;
        if (matchSpecIdentifier()) 
        {
            // 获取变量标识符Identifier
            JSTextPosition varStart = tokenStartPosition();
            JSTokenLocation varStartLocation(tokenLocation());
            identStart = varStart;
            // 从JSToken中获取Identifier信息
            const Identifier* name = m_token.m_data.ident;
            lastIdent = name;
            lastIdentToken = m_token;
            // 获取下一个JSToken
            next();

            // 是否是等于号,如果是,表明在变量声明之后就要进行初始化
            hasInitializer = match(EQUAL);

            // 在当前Scope中声明一个变量,同时检查该变量声明是否合法
            DeclarationResultMask declarationResult = declareVariable(name, declarationType);

            if (hasInitializer) {
                // 如果有等于号
                JSTextPosition varDivot = tokenStartPosition() + 1;
                initStart = tokenStartPosition();
                next(TreeBuilder::DontBuildStrings); // consume '='
                propagateError();

                // 解析等号右面的表达式
                TreeExpression initializer = parseAssignmentExpression(context);
                initEnd = lastTokenEndPosition();
                lastInitializer = initializer;

                // 创建一个赋值
                node = context.createAssignResolve(location, *name, initializer, varStart, varDivot, lastTokenEndPosition(), assignmentContext);
            } 
            else 
            {
                ...
                if (declarationType == DeclarationType::VarDeclaration)
                    // 空变量声明
                    node = context.createEmptyVarExpression(varStartLocation, *name);
                else
                    // 空左值声明
                    node = context.createEmptyLetExpression(varStartLocation, *name);
            }
        } 
        ...

        if (node)
        {
            if (!head) head = node;
            else if (!tail) {
                head = context.createCommaExpr(location, head);
                tail = context.appendToCommaExpr(location, head, head, node);
            } else
                tail = context.appendToCommaExpr(location, head, tail, node);
        }
    } while (match(COMMA));
    ...
    return head;
}

示例

一段js代码,一个变量的声明和赋值

var age = 6 * 7;

在经过Esprima语法分析后,会得到下面的结果:

```json
{
    "type": "Program",
    "body": [
        {
            "type": "VariableDeclaration",
            "declarations": [
                {
                    "type": "VariableDeclarator",
                    "id": {
                        "type": "Identifier",
                        "name": "age"
                    },
                    "init": {
                        "type": "BinaryExpression",
                        "operator": "*",
                        "left": {
                            "type": "Literal",
                            "value": 6,
                            "raw": "6"
                        },
                        "right": {
                            "type": "Literal",
                            "value": 7,
                            "raw": "7"
                        }
                    }
                }
            ],
            "kind": "var"
        }
    ],
    "sourceType": "script"
}

WhileNode

class WhileNode : public StatementNode {
    public:
        WhileNode(const JSTokenLocation&, ExpressionNode*, StatementNode*);

    private:
        void emitBytecode(BytecodeGenerator&, RegisterID* = 0) override;

        ExpressionNode* m_expr;
        StatementNode* m_statement;
    };

parseWhileStatement

template <typename LexerType>
template <class TreeBuilder> TreeStatement Parser<LexerType>::parseWhileStatement(TreeBuilder& context)
{
    ASSERT(match(WHILE));
    JSTokenLocation location(tokenLocation());
    int startLine = tokenLine();
    next();

    // 确认while左括号
    handleProductionOrFail(OPENPAREN, "(", "start", "while loop condition");
    semanticFailIfTrue(match(CLOSEPAREN), "Must provide an expression as a while loop condition");
    // 解析括号里的条件表达式
    TreeExpression expr = parseExpression(context);
    failIfFalse(expr, "Unable to parse while loop condition");
    int endLine = tokenLine();
    // 确认while右括号
    handleProductionOrFail(CLOSEPAREN, ")", "end", "while loop condition");

    const Identifier* unused = 0;
    startLoop();
    // 解析while循环体
    TreeStatement statement = parseStatement(context, unused);
    endLoop();

    failIfFalse(statement, "Expected a statement as the body of a while loop");
    return context.createWhileStatement(location, expr, statement, startLine, endLine);
}

示例

while(a > 0){
    a--;
}

在经过Esprima语法分析后,会得到下面的结果:

{
    "type": "Program",
    "body": [
        {
            "type": "WhileStatement",
            "test": {
                "type": "BinaryExpression",
                "operator": ">",
                "left": {
                    "type": "Identifier",
                    "name": "a"
                },
                "right": {
                    "type": "Literal",
                    "value": 0,
                    "raw": "0"
                }
            },
            "body": {
                "type": "BlockStatement",
                "body": [
                    {
                        "type": "ExpressionStatement",
                        "expression": {
                            "type": "UpdateExpression",
                            "operator": "--",
                            "argument": {
                                "type": "Identifier",
                                "name": "a"
                            },
                            "prefix": false
                        }
                    }
                ]
            }
        }
    ],
    "sourceType": "script"
}

IfElseNode

class IfElseNode : public StatementNode {
    public:
        IfElseNode(const JSTokenLocation&, ExpressionNode* condition, StatementNode* ifBlock, StatementNode* elseBlock);

    private:
        void emitBytecode(BytecodeGenerator&, RegisterID* = 0) override;
        bool tryFoldBreakAndContinue(BytecodeGenerator&, StatementNode* ifBlock,
            Label*& trueTarget, FallThroughMode&);

        ExpressionNode* m_condition;
        StatementNode* m_ifBlock;
        StatementNode* m_elseBlock;
    };

一直觉得if语句的解析应该比较简单,但是看了parseIfStatement后,觉得并不是想象的那么容易:

如果是这样的if语句,相对来说比较容易理解,且语法树比较擅长这种形式;

if(con1){}
else{}

但是如果出现了下面的这种情况,就比较费劲了,因为语法树不太好表达这样的形式,所以需要将其转换成标准的“树形式”,转换的过程看下面的代码和图

if(con1) xxx;   
else if(con2) xxx;
...
else if(conn) xxx;
else xxx;

parseIfStatement

template <typename LexerType>
template <class TreeBuilder> TreeStatement Parser<LexerType>::parseIfStatement(TreeBuilder& context)
{
    // 确定if的JSToken
    JSTokenLocation ifLocation(tokenLocation());
    int start = tokenLine();

    // 确认if后面的左括号
    next();
    handleProductionOrFail2(OPENPAREN, "(", "start", "'if' condition");

    // 解析括号中的条件表达式
    TreeExpression condition = parseExpression(context);

    // 确认和前面左括号对应的右括号
    int end = tokenLine();
    handleProductionOrFail2(CLOSEPAREN, ")", "end", "'if' condition");

    const Identifier* unused = 0;
    m_immediateParentAllowsFunctionDeclarationInStatement = true;
    TreeStatement trueBlock = parseStatement(context, unused);
    failIfFalse(trueBlock, "Expected a statement as the body of an if block");

    // 如果没有else模块,createIfStatement,然后直接返回
    if (!match(ELSE))
        return context.createIfStatement(ifLocation, condition, trueBlock, 0, start, end);

    // 解析else模块
    Vector<TreeExpression> exprStack;
    Vector<std::pair<int, int>> posStack;
    Vector<JSTokenLocation> tokenLocationStack;
    Vector<TreeStatement> statementStack;

    // 寻找最后一个else,不是else if
    bool trailingElse = false;
    do
    {
        // 记录当前的else位置
        JSTokenLocation tempLocation = tokenLocation();
        next();

        // 如果没有匹配到if,说明是最后一个else,解析完,循环结束
        if (!match(IF)) 
        {
            const Identifier* unused = 0;
            m_immediateParentAllowsFunctionDeclarationInStatement = true;
            // 解析else里面的语句块
            TreeStatement block = parseStatement(context, unused);

            // 放到statementStack栈中,以便后续使用
            statementStack.append(block);
            // 标明最后一个else
            trailingElse = true;
            break;
        }

        // 如果匹配到了if
        int innerStart = tokenLine();
        next();

        // 确认else if后面的左括号
        handleProductionOrFail2(OPENPAREN, "(", "start", "'if' condition");

        // 解析else if括号中的条件表达式
        TreeExpression innerCondition = parseExpression(context);

        // 确认else if后面的右括号
        int innerEnd = tokenLine();
        handleProductionOrFail2(CLOSEPAREN, ")", "end", "'if' condition");

        const Identifier* unused = 0;
        m_immediateParentAllowsFunctionDeclarationInStatement = true;

        // 解析else if里面的语句块
        TreeStatement innerTrueBlock = parseStatement(context, unused);

        // 将else的位置添加到tokenLocationStack中
        tokenLocationStack.append(tempLocation);
        // 将else if的条件表达式添加到exprStack中
        exprStack.append(innerCondition);
        // 添加
        posStack.append(std::make_pair(innerStart, innerEnd));
        // 将语句块放到statementStack栈中,以便后续使用
        statementStack.append(innerTrueBlock);
    } while (match(ELSE));

    // 如果没有else模块,即以else if模块结束
    if (!trailingElse)
    {
        TreeExpression condition = exprStack.last();
        exprStack.removeLast();
        TreeStatement trueBlock = statementStack.last();
        statementStack.removeLast();
        std::pair<int, int> pos = posStack.last();
        posStack.removeLast();
        JSTokenLocation elseLocation = tokenLocationStack.last();
        tokenLocationStack.removeLast();
        TreeStatement ifStatement = context.createIfStatement(elseLocation, condition, trueBlock, 0, pos.first, pos.second);
        context.setEndOffset(ifStatement, context.endOffset(trueBlock));
        statementStack.append(ifStatement);
    }

    // 整个循环的目的:将else后面的信息变为一个falseBlock
    while (!exprStack.isEmpty()) 
    {
        // 提取条件表达式
        TreeExpression condition = exprStack.last();
        exprStack.removeLast();

        // 提取falseBlock
        TreeStatement falseBlock = statementStack.last();
        statementStack.removeLast();

        // 提取trueBlock
        TreeStatement trueBlock = statementStack.last();
        statementStack.removeLast();

        // 提取位置信息
        std::pair<int, int> pos = posStack.last();
        posStack.removeLast();
        JSTokenLocation elseLocation = tokenLocationStack.last();
        tokenLocationStack.removeLast();

        // 创建TreeStatement
        TreeStatement ifStatement = context.createIfStatement(elseLocation, condition, trueBlock, falseBlock, pos.first, pos.second);
        context.setEndOffset(ifStatement, context.endOffset(falseBlock));
        statementStack.append(ifStatement);
    }

    // 创建总的ifStatement
    return context.createIfStatement(ifLocation, condition, trueBlock, statementStack.last(), start, end);
}

上面讲了这么多,可能会感觉很迷糊,可能要问:为什么一个ifStatement要搞的这么复杂!里面为什么要用那么多栈?为什么还带着那么多循环?

  • 先来解决第一个问题: while (!exprStack.isEmpty())这个最后的while循环的目的是什么呢?

    如图所示,它的目的就是将if ... else if ... else if... else的结构变成一个标准的if ... else结构,方便语法树的处理

  • 再来看第二个问题:if (!trailingElse),如果没有最后的else,这里要单独处理的目的是什么

    实际上有了第一个问题的铺垫,也不难理解了,它将不规范的if ... else if ... else if结构(不包含最后的else)变成标准的if ... else if ... else if... else结构了,方便第一个问题中while循环的处理。

到了这里,不妨再回看下代码,应该会觉得比较清楚。

示例

这里自己输入if语句看吧。

SwitchNode

class SwitchNode : public StatementNode, public VariableEnvironmentNode
{
    public:
        SwitchNode(const JSTokenLocation&, ExpressionNode*, CaseBlockNode*, VariableEnvironment&, FunctionStack&&);

    private:
        void emitBytecode(BytecodeGenerator&, RegisterID* = 0) override;

        ExpressionNode* m_expr;
        CaseBlockNode* m_block;
    };

parseSwitchStatement

template <typename LexerType>
template <class TreeBuilder> TreeStatement Parser<LexerType>::parseSwitchStatement(TreeBuilder& context)
{
    JSTokenLocation location(tokenLocation());
    int startLine = tokenLine();
    next();

    // 确定switch后面的左括号
    handleProductionOrFail(OPENPAREN, "(", "start", "subject of a 'switch'");
    // 解析switch括号中的表达式
    TreeExpression expr = parseExpression(context);
    int endLine = tokenLine();
    // 确定switch后面的右括号
    handleProductionOrFail(CLOSEPAREN, ")", "end", "subject of a 'switch'");

    // 确定switch的左花括号
    handleProductionOrFail(OPENBRACE, "{", "start", "body of a 'switch'");
    AutoPopScopeRef lexicalScope(this, pushScope());
    lexicalScope->setIsLexicalScope();
    lexicalScope->preventVarDeclarations();

    // m_switchDepth++
    startSwitch();

    // 循环遍历,解析case语句
    TreeClauseList firstClauses = parseSwitchClauses(context);

    // 解析default语句
    TreeClause defaultClause = parseSwitchDefaultClause(context);

    // 再次循环遍历,解析case语句
    TreeClauseList secondClauses = parseSwitchClauses(context);

    // m_switchDepth--
    endSwitch();

    // 确定switch的右=花括号
    handleProductionOrFail(CLOSEBRACE, "}", "end", "body of a 'switch'");

    // 利用解析好的caseCaluse和defaultClause创建一个SwitchStatement
    TreeStatement result = context.createSwitchStatement(location, expr, firstClauses, defaultClause, secondClauses, startLine, endLine, lexicalScope->finalizeLexicalEnvironment(), lexicalScope->takeFunctionDeclarations());

    popScope(lexicalScope, TreeBuilder::NeedsFreeVariableInfo);
    return result;
}
template <typename LexerType>
template <class TreeBuilder> TreeClauseList Parser<LexerType>::parseSwitchClauses(TreeBuilder& context)
{
    if (!match(CASE)) return 0;
    unsigned startOffset = tokenStart();
    next();

    // 解析case对应的常量表达式
    TreeExpression condition = parseExpression(context);

    // 确认表达式后面的冒号:
    consumeOrFail(COLON, "Expected a ':' after switch clause expression");

    // 解析case对应的body语句
    TreeSourceElements statements = parseSourceElements(context, DontCheckForStrictMode);

    // 将case对应的常量表达式和语句创建一个Clause
    TreeClause clause = context.createClause(condition, statements);
    context.setStartOffset(clause, startOffset);

    // 建立链表
    TreeClauseList clauseList = context.createClauseList(clause);
    TreeClauseList tail = clauseList;

    // 继续循环遍历,直到匹配不到case
    while (match(CASE)) {
        startOffset = tokenStart();
        next();
        TreeExpression condition = parseExpression(context);
        consumeOrFail(COLON, "Expected a ':' after switch clause expression");
        TreeSourceElements statements = parseSourceElements(context, DontCheckForStrictMode);
        clause = context.createClause(condition, statements);
        context.setStartOffset(clause, startOffset);
        tail = context.createClauseList(tail, clause);
    }
    return clauseList;
}

大部分逻辑看代码中的注释应该就可以理解,这里比较费解的是:case语句解析了两次

TreeClauseList firstClauses = parseSwitchClauses(context);
TreeClauseList secondClauses = parseSwitchClauses(context);

原因是在switch语法中,defalut关键字的位置是不确定的,可能在开头,可能在结尾,也可能在中间,甚至还可能不存在。它相当于把switch所有的case劈成了两半,为了在遍历case的循环中不被defalut打断,所以这里采取了前后两次遍历,逻辑清晰,易于理解。

总结

语法分析涉及到的东西较大,篇幅有些多,后续可能会继续更新,添加一些细节分析。

-------------本文结束 感谢您的阅读-------------

本文标题:JavaScriptCore引擎深度解析4-语法分析篇

文章作者:lingyun

发布时间:2018年08月02日 - 00:08

最后更新:2018年11月05日 - 00:11

原始链接:https://tsuijunxi.github.io/2018/08/02/JavaScriptCore引擎深度解析-4-语法分析篇/

许可协议: 署名-非商业性使用-禁止演绎 4.0 国际 转载请保留原文链接及作者。