Airing

Philosophy student / primary school teacher / programmer. Personal website: ursb.me

Engine Analysis: String to Number Conversion in JS

In JavaScript, there are 9 ways to convert a string to a number:

  1. parseInt()
  2. parseFloat()
  3. Number()
  4. Double tilde (~~) Operator
  5. Unary Operator (+)
  6. Math.floor()
  7. Multiply with number
  8. The Signed Right Shift Operator (>>)
  9. The Unsigned Right Shift Operator (>>>)

The differences in the results of these methods are shown in the table below:

[Figure: comparison table of string-to-number conversion methods]

The source code for the comparison table has been published at https://airing.ursb.me/web/int.html, feel free to take it if needed.
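As a rough stand-in for the comparison table, the sketch below runs all nine conversions over a few sample strings. The sample inputs and the helper name convertAll are chosen here for illustration and are not taken from the original table.

```javascript
// Each entry maps a label from the list above to a converter function.
const converters = {
  "parseInt()": (s) => parseInt(s),
  "parseFloat()": (s) => parseFloat(s),
  "Number()": (s) => Number(s),
  "~~": (s) => ~~s,
  "+": (s) => +s,
  "Math.floor()": (s) => Math.floor(s),
  "* 1": (s) => s * 1,
  ">> 0": (s) => s >> 0,
  ">>> 0": (s) => s >>> 0,
};

// Illustrative inputs (an assumption, not the article's own test set).
const samples = ["42", "3.7", "-3.7", "12px", "", "0x10", "1e3"];

// Apply every converter to one input and collect the results in a row.
function convertAll(input) {
  const row = {};
  for (const [label, fn] of Object.entries(converters)) {
    row[label] = fn(input);
  }
  return row;
}

const table = Object.fromEntries(
  samples.map((s) => [JSON.stringify(s), convertAll(s)])
);
console.table(table);
```

Some rows worth noticing: "12px" gives 12 with parseInt() but NaN with Number(), the empty string gives NaN with parseInt() but 0 with Number(), and "-3.7" gives -4 with Math.floor() but -3 with ~~.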

In addition to the differences in results, these methods also vary in performance. In Node.js (V8), micro-benchmarks of these methods give the following results:

parseInt() x 19,140,190 ops/sec ±0.45% (92 runs sampled)
parseFloat() x 28,203,053 ops/sec ±0.25% (95 runs sampled)
Number() x 1,041,209,524 ops/sec ±0.20% (90 runs sampled)
Double tilde (~~) Operator x 1,035,220,963 ops/sec ±1.65% (97 runs sampled)
Math.floor() x 28,224,678 ops/sec ±0.23% (96 runs sampled)
Unary Operator (+) x 1,045,129,381 ops/sec ±0.17% (95 runs sampled)
Multiply with number x 1,044,176,084 ops/sec ±0.15% (93 runs sampled)
The Signed Right Shift Operator (>>) x 1,046,016,782 ops/sec ±0.11% (96 runs sampled)
The Unsigned Right Shift Operator (>>>) x 1,045,384,959 ops/sec ±0.08% (96 runs sampled)

parseInt(), parseFloat(), and Math.floor() are by far the slowest: they reach only about 2-3% of the throughput of the other operations, and parseInt() is the slowest of all at roughly 1.8%.
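The harness that produced these numbers is not shown in the article. The following is only a minimal timing sketch, assuming a global performance.now() (available in Node.js and browsers); the helper name opsPerSec is hypothetical, and a dedicated benchmarking library would give more reliable statistics.

```javascript
// Hypothetical helper: counts how many calls of fn fit into roughly durationMs.
function opsPerSec(fn, durationMs = 100) {
  // Rotate through several inputs so the call is less likely to be folded away.
  const inputs = ["1", "22", "333", "4444"];
  let sink = 0; // accumulate results so the work is not dead code
  let ops = 0;
  let elapsed = 0;
  const start = performance.now();
  do {
    for (let i = 0; i < 1000; i++) {
      sink += fn(inputs[i & 3]);
    }
    ops += 1000;
    elapsed = performance.now() - start;
  } while (elapsed < durationMs);
  return { opsPerSec: Math.round((ops * 1000) / elapsed), sink };
}

console.log("parseInt()", opsPerSec((s) => parseInt(s)).opsPerSec);
console.log("Unary +   ", opsPerSec((s) => +s).opsPerSec);
```

Absolute numbers from a loop like this depend heavily on the machine and on JIT optimizations, so only the relative ordering is meaningful.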

Why do these methods have such differences? How are these operations interpreted and executed at the engine level? Next, we will explore the specific implementations of these methods from the perspective of mainstream JS engines like V8, JavaScriptCore, and QuickJS.

First, let's take a look at parseInt().

1. parseInt()

ECMAScript (ECMA-262) parseInt

1.1 parseInt() in V8

In V8, the built-in standard objects of the JS language are defined in [→ src/init/bootstrapper.cc], where we can find the definition of parseInt:

Handle<JSFunction> number_fun = InstallFunction(
    isolate_, global, "Number", JS_PRIMITIVE_WRAPPER_TYPE,
    JSPrimitiveWrapper::kHeaderSize, 0,
    isolate_->initial_object_prototype(), Builtin::kNumberConstructor);

// Install Number.parseInt and Global.parseInt.
Handle<JSFunction> parse_int_fun = SimpleInstallFunction(
    isolate_, number_fun, "parseInt", Builtin::kNumberParseInt, 2, true);

JSObject::AddProperty(isolate_, global_object, "parseInt", parse_int_fun,
                      DONT_ENUM);
native_context()->set_global_parse_int_fun(*parse_int_fun);

Number.parseInt is installed with SimpleInstallFunction, which binds the function to a Builtin, and the same function object is then added to the global object as parseInt. Calling parseInt from JS therefore invokes Builtin::kNumberParseInt on the engine side.

Builtins (built-in functions) are chunks of code that the V8 VM can execute at runtime. V8 has used five ways of implementing them:

  • Platform-dependent assembly language: very efficient but requires manual adaptation to all platforms and is difficult to maintain.
  • C++: similar in style to runtime functions, can access V8's powerful runtime features, but usually not suitable for performance-sensitive areas.
  • JavaScript: slow runtime calls, unpredictable performance impacts due to type pollution, and complex JS semantic issues. V8 no longer uses JavaScript built-in functions.
  • CodeStubAssembler: provides efficient low-level functionality, very close to assembly language, while maintaining platform independence and readability.
  • Torque: an improved version of CodeStubAssembler, its syntax combines some features of TypeScript, making it very simple and readable. It emphasizes reducing the difficulty of use without sacrificing performance, making Builtin development easier. Many built-in functions are now implemented in Torque.

Returning to Builtin::kNumberParseInt: the naming convention is visible in [→ src/builtins/builtins.h], where the enum value is formed by prefixing the builtin's name with k:

// Convenience macro to avoid generating named accessors for all builtins.
#define BUILTIN_CODE(isolate, name) \
  (isolate)->builtins()->code_handle(i::Builtin::k##name)

Thus, the original name of this function is NumberParseInt, implemented in [→ src/builtins/number.tq], which is a Torque-based Builtin implementation.

// ES6 #sec-number.parseint
transitioning javascript builtin NumberParseInt(
    js-implicit context: NativeContext)(value: JSAny, radix: JSAny): Number {
  return ParseInt(value, radix);
}


transitioning builtin ParseInt(implicit context: Context)(
    input: JSAny, radix: JSAny): Number {
  try {
    // Check if radix should be 10 (i.e. undefined, 0 or 10).
    if (radix != Undefined && !TaggedEqual(radix, SmiConstant(10)) &&
        !TaggedEqual(radix, SmiConstant(0))) {
      goto CallRuntime;
    }

    typeswitch (input) {
      case (s: Smi): {
        return s;
      }
      case (h: HeapNumber): {
        // Check if the input value is in Signed32 range.
        const asFloat64: float64 = Convert<float64>(h);
        const asInt32: int32 = Signed(TruncateFloat64ToWord32(asFloat64));
        // The sense of comparison is important for the NaN case.
        if (asFloat64 == ChangeInt32ToFloat64(asInt32)) goto Int32(asInt32);

        // Check if the absolute value of input is in the [1,1<<31[ range. Call
        // the runtime for the range [0,1[ because the result could be -0.
        const kMaxAbsValue: float64 = 2147483648.0;
        const absInput: float64 = math::Float64Abs(asFloat64);
        if (absInput < kMaxAbsValue && absInput >= 1.0) goto Int32(asInt32);
        goto CallRuntime;
      }
      case (s: String): {
        goto String(s);
      }
      case (HeapObject): {
        goto CallRuntime;
      }
    }
  } label Int32(i: int32) {
    return ChangeInt32ToTagged(i);
  } label String(s: String) {
    // Check if the string is a cached array index.
    const hash: NameHash = s.raw_hash_field;
    if (IsIntegerIndex(hash) &&
        hash.array_index_length < kMaxCachedArrayIndexLength) {
      const arrayIndex: uint32 = hash.array_index_value;
      return SmiFromUint32(arrayIndex);
    }
    // Fall back to the runtime.
    goto CallRuntime;
  } label CallRuntime {
    tail runtime::StringParseInt(input, radix);
  }
}

Before analyzing this code, let's introduce some data structures in V8: (All data structure definitions in V8 can be found in [→ src/objects/objects.h])

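  • Smi: inherits from Object; an immediate small integer stored directly in the tagged value (31 bits on 32-bit builds or with pointer compression)
  • HeapObject: inherits from Object; the superclass of everything allocated on the heap
  • PrimitiveHeapObject: inherits from HeapObject
  • HeapNumber: inherits from PrimitiveHeapObject; a heap-allocated box holding a 64-bit floating-point value, used for numbers that cannot be represented as a Smi (non-integers and integers outside the Smi range).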

parseInt takes two parameters, parseInt(string, radix), and the Torque builtin has the same signature. It proceeds as follows:

  • First, check whether radix is undefined, 0, or 10. If it is anything else, the conversion is not decimal and control jumps to the runtime function StringParseInt.
  • For a decimal conversion, dispatch on the type of the first argument.
    • A Smi is returned as is. A HeapNumber takes the Int32 fast path when it is an integer in the signed 32-bit range, or when its absolute value lies in [1, 2^31); in both cases the value truncated to int32 is re-tagged with ChangeInt32ToTagged (a CodeStubAssembler helper) and returned. Every other HeapNumber (NaN, values with absolute value below 1, where the result may be -0, and values of 2^31 or more) goes to runtime::StringParseInt.
    • A String is checked for a cached array index in its hash field; if one is present, that integer is returned directly. Otherwise the call goes to runtime::StringParseInt.
    • Any other heap object goes to runtime::StringParseInt.
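The dispatch in the Torque builtin can be modeled in plain JavaScript. The sketch below is a hypothetical model, not V8 code: the Smi/HeapNumber distinction is not observable from JS, so every number goes through the HeapNumber logic, and the regular expression is only an assumed stand-in for "the string's hash field caches an array index". The built-in parseInt plays the role of the runtime fallback.

```javascript
const K_MAX_ABS_VALUE = 2147483648; // 2^31, the same constant as in the Torque code

// Hypothetical model: reports which path the Torque builtin would take.
function parseIntFastPath(input, radix) {
  // Only an undefined, 0 or 10 radix stays on the fast path.
  if (radix !== undefined && radix !== 10 && radix !== 0) {
    return { path: "runtime", value: parseInt(input, radix) };
  }
  if (typeof input === "number") {
    const asInt32 = input | 0; // truncate to a signed 32-bit integer
    // Integer already in int32 range (false for NaN, true for -0).
    if (input === asInt32) return { path: "int32", value: asInt32 };
    const abs = Math.abs(input);
    if (abs < K_MAX_ABS_VALUE && abs >= 1) {
      return { path: "int32", value: asInt32 };
    }
    return { path: "runtime", value: parseInt(input, radix) };
  }
  // Assumption: short canonical decimal strings stand in for cached array indices.
  if (typeof input === "string" && /^(0|[1-9][0-9]{0,6})$/.test(input)) {
    return { path: "cached-index", value: Number(input) };
  }
  return { path: "runtime", value: parseInt(input, radix) };
}

console.log(parseIntFastPath(1.5));      // int32 path
console.log(parseIntFastPath("123"));    // cached-index path
console.log(parseIntFastPath(0.5));      // runtime path
console.log(parseIntFastPath("ff", 16)); // runtime path (non-decimal radix)
```

In every case the modeled fast path returns the same value as the built-in parseInt; only the route differs.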

Now the focus is on runtime::StringParseInt. [→ src/runtime/runtime-numbers.cc]

// ES6 18.2.5 parseInt(string, radix) slow path
RUNTIME_FUNCTION(Runtime_StringParseInt) {
  HandleScope handle_scope(isolate);
  DCHECK_EQ(2, args.length());
  Handle<Object> string = args.at(0);
  Handle<Object> radix = args.at(1);

  // Convert {string} to a String first, and flatten it.
  Handle<String> subject;
  ASSIGN_RETURN_FAILURE_ON_EXCEPTION(isolate, subject,
                                     Object::ToString(isolate, string));
  subject = String::Flatten(isolate, subject);

  // Convert {radix} to Int32.
  if (!radix->IsNumber()) {
    ASSIGN_RETURN_FAILURE_ON_EXCEPTION(isolate, radix,
                                       Object::ToNumber(isolate, radix));
  }
  int radix32 = DoubleToInt32(radix->Number());
  if (radix32 != 0 && (radix32 < 2 || radix32 > 36)) {
    return ReadOnlyRoots(isolate).nan_value();
  }

  double result = StringToInt(isolate, subject, radix32);
  return *isolate->factory()->NewNumber(result);
}

The logic is straightforward, so I won't walk through it line by line. Note that, as the standard requires, a radix that is non-zero and outside the range 2 to 36 yields NaN; a radix of 0 is allowed and means that no radix was given.
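The radix rules can be checked directly from JavaScript. This short example only uses the built-in parseInt; the object keys are labels chosen here for readability.

```javascript
const radixResults = {
  radix1: parseInt("10", 1),              // below 2: NaN
  radix37: parseInt("10", 37),            // above 36: NaN
  radix0: parseInt("10", 0),              // 0 means "not given": decimal
  radixUndefined: parseInt("10"),         // same as radix 0
  hexPrefix: parseInt("0x1f", 0),         // 0x prefix switches to base 16
  binary: parseInt("11", 2),              // 3
  base36: parseInt("z", 36),              // 35
  radixWraps: parseInt("11", 4294967298), // radix goes through ToInt32 first, giving 2
};
console.log(radixResults);
```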

1.2 parseInt() in JavaScriptCore

Next, let's take a look at parseInt() in JavaScriptCore.

In JavaScriptCore, global functions such as parseInt are implemented in [→ runtime/JSGlobalObjectFunctions.cpp]:

JSC_DEFINE_HOST_FUNCTION(globalFuncParseInt, (JSGlobalObject* globalObject, CallFrame* callFrame))
{
    JSValue value = callFrame->argument(0);
    JSValue radixValue = callFrame->argument(1);

    // Optimized handling for numbers:
    // If the argument is 0 or a number in range 10^-6 <= n < INT_MAX+1, then parseInt
    // results in a truncation to integer. In the case of -0, this is converted to 0.
    //
    // This is also a truncation for values in the range INT_MAX+1 <= n < 10^21,
    // however these values cannot be trivially truncated to int since 10^21 exceeds
    // even the int64_t range. Negative numbers are a little trickier, the case for
    // values in the range -10^21 < n <= -1 are similar to those for integer, but
    // values in the range -1 < n <= -10^-6 need to truncate to -0, not 0.
    static const double tenToTheMinus6 = 0.000001;
    static const double intMaxPlusOne = 2147483648.0;
    if (value.isNumber()) {
        double n = value.asNumber();
        if (((n < intMaxPlusOne && n >= tenToTheMinus6) || !n) && radixValue.isUndefinedOrNull())
            return JSValue::encode(jsNumber(static_cast<int32_t>(n)));
    }

    // If ToString throws, we shouldn't call ToInt32.
    return toStringView(globalObject, value, [&] (StringView view) {
        return JSValue::encode(jsNumber(parseInt(view, radixValue.toInt32(globalObject))));
    });
}
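The 10^-6 and 10^21 bounds in the comment above come from how numbers are converted to strings: outside that range, ToString switches to exponent notation, so parseInt no longer sees a plain digit sequence. A small check using only built-ins (the array name thresholdCases is chosen here for illustration):

```javascript
// [number, its string form, parseInt of the number]
const thresholdCases = [
  [0.000001, String(0.000001), parseInt(0.000001)],   // "0.000001" -> 0
  [0.0000001, String(0.0000001), parseInt(0.0000001)], // "1e-7"     -> 1
  [1e20, String(1e20), parseInt(1e20)],               // plain digits
  [1e21, String(1e21), parseInt(1e21)],               // "1e+21"    -> 1
];
for (const [n, asString, parsed] of thresholdCases) {
  console.log(n, JSON.stringify(asString), parsed);
}

// Negative values between -1 and -10^-6 must produce -0, not 0.
console.log(Object.is(parseInt(-0.5), -0)); // true
console.log(Object.is(parseInt(-0), 0));    // true: -0 stringifies as "0"
```

This is why the fast path only truncates directly when the number is 0 or lies in [10^-6, INT_MAX + 1).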

The comments in the WebKit code are detailed and easy to follow, so I won't add to them. The function ends by calling parseInt, which JavaScriptCore implements in [→ runtime/ParseInt.h]. The core code is as follows:

ALWAYS_INLINE static bool isStrWhiteSpace(UChar c)
{
    // https://tc39.github.io/ecma262/#sec-tonumber-applied-to-the-string-type
    return Lexer<UChar>::isWhiteSpace(c) || Lexer<UChar>::isLineTerminator(c);
}

// ES5.1 15.1.2.2
template <typename CharType>
ALWAYS_INLINE
static double parseInt(StringView s, const CharType* data, int radix)
{
    // 1. Let inputString be ToString(string).
    // 2. Let S be a newly created substring of inputString consisting of the first character that is not a
    //    StrWhiteSpaceChar and all characters following that character. (In other words, remove leading white
    //    space.) If inputString does not contain any such characters, let S be the empty string.
    int length = s.length();
    int p = 0;
    while (p < length && isStrWhiteSpace(data[p]))
        ++p;

    // 3. Let sign be 1.
    // 4. If S is not empty and the first character of S is a minus sign -, let sign be -1.
    // 5. If S is not empty and the first character of S is a plus sign + or a minus sign -, then remove the first character from S.
    double sign = 1;
    if (p < length) {
        if (data[p] == '+')
            ++p;
        else if (data[p] == '-') {
            sign = -1;
            ++p;
        }
    }

    // 6. Let R = ToInt32(radix).
    // 7. Let stripPrefix be true.
    // 8. If R != 0,then
    //   b. If R != 16, let stripPrefix be false.
    // 9. Else, R == 0
    //   a. LetR = 10.
    // 10. If stripPrefix is true, then
    //   a. If the length of S is at least 2 and the first two characters of S are either ―0x or ―0X,
    //      then remove the first two characters from S and let R = 16.
    // 11. If S contains any character that is not a radix-R digit, then let Z be the substring of S
    //     consisting of all characters before the first such character; otherwise, let Z be S.
    if ((radix == 0 || radix == 16) && length - p >= 2 && data[p] == '0' && (data[p + 1] == 'x' || data[p + 1] == 'X')) {
        radix = 16;
        p += 2;
    } else if (radix == 0)
        radix = 10;

    // 8.a If R < 2 or R > 36, then return NaN.
    if (radix < 2 || radix > 36)
        return PNaN;

    // 13. Let mathInt be the mathematical integer value that is represented by Z in radix-R notation, using the letters
    //     A-Z and a-z for digits with values 10 through 35. (However, if R is 10 and Z contains more than 20 significant
    //     digits, every significant digit after the 20th may be replaced by a 0 digit, at the option of the implementation;
    //     and if R is not 2, 4, 8, 10, 16, or 32, then mathInt may be an implementation-dependent approximation to the
    //     mathematical integer value that is represented by Z in radix-R notation.)
    // 14. Let number be the Number value for mathInt.
    int firstDigitPosition = p;
    bool sawDigit = false;
    double number = 0;
    while (p < length) {
        int digit = parseDigit(data[p], radix);
        if (digit == -1)
            break;
        sawDigit = true;
        number *= radix;
        number += digit;
        ++p;
    }

    // 12. If Z is empty, return NaN.
    if (!sawDigit)
        return PNaN;

    // Alternate code path for certain large numbers.
    if (number >= mantissaOverflowLowerBound) {
        if (radix == 10) {
            size_t parsedLength;
            number = parseDouble(s.substring(firstDigitPosition, p - firstDigitPosition), parsedLength);
        } else if (radix == 2 || radix == 4 || radix == 8 || radix == 16 || radix == 32)
            number = parseIntOverflow(s.substring(firstDigitPosition, p - firstDigitPosition), radix);
    }

    // 15. Return sign x number.
    return sign * number;
}

ALWAYS_INLINE static double parseInt(StringView s, int radix)
{
    if (s.is8Bit())
        return parseInt(s, s.characters8(), radix);
    return parseInt(s, s.characters16(), radix);
}

template<typename CallbackWhenNoException>
static ALWAYS_INLINE typename std::invoke_result<CallbackWhenNoException, StringView>::type toStringView(JSGlobalObject* globalObject, JSValue value, CallbackWhenNoException callback)
{
    VM& vm = getVM(globalObject);
    auto scope = DECLARE_THROW_SCOPE(vm);
    JSString* string = value.toStringOrNull(globalObject);
    EXCEPTION_ASSERT(!!scope.exception() == !string);
    if (UNLIKELY(!string))
        return { };
    auto viewWithString = string->viewWithUnderlyingString(globalObject);
    RETURN_IF_EXCEPTION(scope, { });
    RELEASE_AND_RETURN(scope, callback(viewWithString.view));
}

// Mapping from integers 0..35 to digit identifying this value, for radix 2..36.
const char radixDigits[] = "0123456789abcdefghijklmnopqrstuvwxyz";

I have pasted the code as is because JavaScriptCore implements the ECMAScript (ECMA-262) parseInt algorithm step by step, which makes it very readable and well commented. I strongly recommend reading it yourself; I will not elaborate further here.
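To make the step-by-step structure concrete, here is a simplified JavaScript port of the core loop. It is a sketch under stated assumptions: the names parseIntSketch and digitValue are hypothetical, /\s/ is used as the whitespace test, and the alternate code path for very large numbers (mantissaOverflowLowerBound) is left out, so results for huge inputs may differ from the real implementation.

```javascript
// Value of a character code as a digit in the given radix, or -1.
function digitValue(code, radix) {
  let d = -1;
  if (code >= 48 && code <= 57) d = code - 48;            // '0'..'9'
  else if (code >= 97 && code <= 122) d = code - 97 + 10; // 'a'..'z'
  else if (code >= 65 && code <= 90) d = code - 65 + 10;  // 'A'..'Z'
  return d >= 0 && d < radix ? d : -1;
}

function parseIntSketch(input, radix = 0) {
  const s = String(input);
  let p = 0;
  // Steps 1-2: skip leading white space.
  while (p < s.length && /\s/.test(s[p])) p++;
  // Steps 3-5: optional sign.
  let sign = 1;
  if (p < s.length) {
    if (s[p] === "+") p++;
    else if (s[p] === "-") { sign = -1; p++; }
  }
  // Steps 6-10: radix handling and optional 0x prefix.
  radix = radix | 0; // ToInt32
  if ((radix === 0 || radix === 16) && s.length - p >= 2 &&
      s[p] === "0" && (s[p + 1] === "x" || s[p + 1] === "X")) {
    radix = 16;
    p += 2;
  } else if (radix === 0) {
    radix = 10;
  }
  if (radix < 2 || radix > 36) return NaN;
  // Steps 11-14: accumulate digits until the first invalid character.
  let sawDigit = false;
  let number = 0;
  while (p < s.length) {
    const d = digitValue(s.charCodeAt(p), radix);
    if (d === -1) break;
    sawDigit = true;
    number = number * radix + d;
    p++;
  }
  if (!sawDigit) return NaN;
  // Step 15: apply the sign (yields -0 for "-0").
  return sign * number;
}

console.log(parseIntSketch("  -0x1F", 16), parseIntSketch("12px"), parseIntSketch("101", 2));
```

For small inputs the sketch can be compared directly against the built-in parseInt, which is a convenient way to see which spec step each branch corresponds to.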

1.3 parseInt() in QuickJS

The core code of QuickJS is in [→ quickjs.c], and first, here is the registration code for parseInt:

/* global object */
static const JSCFunctionListEntry js_global_funcs[] = {
    JS_CFUNC_DEF("parseInt", 2, js_parseInt ),
	//...
}

The implementation logic of js_parseInt is as follows:

static JSValue js_parseInt(JSContext *ctx, JSValueConst this_val,
                           int argc, JSValueConst *argv)
{
    const char *str, *p;
    int radix, flags;
    JSValue ret;

    str = JS_ToCString(ctx, argv[0]);
    if (!str)
        return JS_EXCEPTION;
    if (JS_ToInt32(ctx, &radix, argv[1])) {
        JS_FreeCString(ctx, str);
        return JS_EXCEPTION;
    }
    if (radix != 0 && (radix < 2 || radix > 36)) {
        ret = JS_NAN;
    } else {
        p = str;
        p += skip_spaces(p);
        flags = ATOD_INT_ONLY | ATOD_ACCEPT_PREFIX_AFTER_SIGN;
        ret = js_atof(ctx, p, NULL, radix, flags);
    }
    JS_FreeCString(ctx, str);
    return ret;
}

Bellard's code has very few comments but is also very concise.

Thus, we have introduced the implementations of parseInt in three engines, all based on the standard, but due to different coding styles, reading them feels like reading works from three different literary masters.

From the standard and these implementations, we can see that parseInt does a lot of work around the actual string-to-number conversion: argument validation, defaulting the radix, whitespace and prefix handling, overflow checks, and in V8 a call into the runtime whenever the fast path does not apply. This goes a long way toward explaining its much lower throughput in the benchmark above.

Next, let's briefly look at parseFloat.

2. parseFloat()

ECMAScript (ECMA-262) parseFloat

According to the standard, parseFloat differs from parseInt in two obvious ways:

  1. It takes a single parameter and has no radix argument, so it always parses decimal.
  2. Its result may have a fractional part.
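Both differences show up immediately when the two functions are given the same inputs. The example below uses only built-ins; the variable name floatVsInt is chosen here for illustration.

```javascript
const floatVsInt = ["3.14abc", "0x10", "1e3", ".5"].map((s) => ({
  input: s,
  parseFloat: parseFloat(s),
  parseInt: parseInt(s),
}));
console.table(floatVsInt);
```

parseFloat keeps the fraction in "3.14abc" and understands the exponent in "1e3", but it stops at the x in "0x10" because it has no notion of a hex prefix; parseInt does the opposite in each case.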

2.1 parseFloat() in V8

The relevant logic for parseFloat in V8 is right next to parseInt, so I will directly paste the key implementation:

[→ src/builtins/number.tq]

// ES6 #sec-number.parsefloat
transitioning javascript builtin NumberParseFloat(
    js-implicit context: NativeContext)(value: JSAny): Number {
  try {
    typeswitch (value) {
      case (s: Smi): {
        return s;
      }
      case (h: HeapNumber): {
        // The input is already a Number. Take care of -0.
        // The sense of comparison is important for the NaN case.
        return (Convert<float64>(h) == 0) ? SmiConstant(0) : h;
      }
      case (s: String): {
        goto String(s);
      }
      case (HeapObject): {
        goto String(string::ToString(context, value));
      }
    }
  } label String(s: String) {
    // Check if the string is a cached array index.
    const hash: NameHash = s.raw_hash_field;
    if (IsIntegerIndex(hash) &&
        hash.array_index_length < kMaxCachedArrayIndexLength) {
      const arrayIndex: uint32 = hash.array_index_value;
      return SmiFromUint32(arrayIndex);
    }
    // Fall back to the runtime to convert string to a number.
    return runtime::StringParseFloat(s);
  }
}

[→ src/runtime/runtime-numbers.cc]

// ES6 18.2.4 parseFloat(string)
RUNTIME_FUNCTION(Runtime_StringParseFloat) {
  HandleScope shs(isolate);
  DCHECK_EQ(1, args.length());
  Handle<String> subject = args.at<String>(0);

  double value = StringToDouble(isolate, subject, ALLOW_TRAILING_JUNK,
                                std::numeric_limits<double>::quiet_NaN());

  return *isolate->factory()->NewNumber(value);
}

Since the flow in the standard is simpler, parseFloat is simpler and more readable than parseInt.
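Two details of the code above are observable from JavaScript: the HeapNumber branch maps a numeric -0 to 0, and ALLOW_TRAILING_JUNK means that parsing stops quietly at the first character that cannot continue the number. A short check using only built-ins (the object name parseFloatChecks is chosen here):

```javascript
const parseFloatChecks = {
  numericMinusZero: Object.is(parseFloat(-0), 0),   // number input: -0 becomes 0
  stringMinusZero: Object.is(parseFloat("-0"), -0), // string input keeps the sign
  trailingJunk: parseFloat("1.5e2xyz"),             // 150, the junk is ignored
  leadingJunk: parseFloat("px"),                    // NaN, nothing to parse
  alreadyNaN: parseFloat(NaN),                      // NaN is returned unchanged
};
console.log(parseFloatChecks);
```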

2.2 parseFloat() in JavaScriptCore

In JavaScriptCore, the logic for parseFloat is even more straightforward:

static double parseFloat(StringView s)
{
    unsigned size = s.length();

    if (size == 1) {
        UChar c = s[0];
        if (isASCIIDigit(c))
            return c - '0';
        return PNaN;
    }

    if (s.is8Bit()) {
        const LChar* data = s.characters8();
        const LChar* end = data + size;

        // Skip leading white space.
        for (; data < end; ++data) {
            if (!isStrWhiteSpace(*data))
                break;
        }

        // Empty string.
        if (data == end)
            return PNaN;

        return jsStrDecimalLiteral(data, end);
    }

    const UChar* data = s.characters16();
    const UChar* end = data + size;

    // Skip leading white space.
    for (; data < end; ++data) {
        if (!isStrWhiteSpace(*data))
            break;
    }

    // Empty string.
    if (data == end)
        return PNaN;

    return jsStrDecimalLiteral(data, end);
}

2.3 parseFloat() in QuickJS

In contrast to JavaScriptCore, the QuickJS implementation is only about a dozen lines long:

[→ quickjs.c]

static JSValue js_parseFloat(JSContext *ctx, JSValueConst this_val,
                             int argc, JSValueConst *argv)
{
    const char *str, *p;
    JSValue ret;

    str = JS_ToCString(ctx, argv[0]);
    if (!str)
        return JS_EXCEPTION;
    p = str;
    p += skip_spaces(p);
    ret = js_atof(ctx, p, NULL, 10, 0);
    JS_FreeCString(ctx, str);
    return ret;
}

Compared with JavaScriptCore, QuickJS is shorter because it does not branch on 8-bit versus 16-bit string storage and has no single-character fast path; it simply converts the argument to a C string with JS_ToCString and hands it to js_atof. Reading ECMAScript (ECMA-262) parseFloat confirms that this is fine: the standard does not require an engine to make those distinctions.

3. Number()

ECMAScript (ECMA-262) Number ( value )

3.1 Number() in V8

As a global object, the definition of Number is still in [→ src/init/bootstrapper.cc]. We have already introduced the registration of Number.parseInt, so let's review:

Handle<JSFunction> number_fun = InstallFunction(
        isolate_, global, "Number", JS_PRIMITIVE_WRAPPER_TYPE,
        JSPrimitiveWrapper::kHeaderSize, 0,
        isolate_->initial_object_prototype(), Builtin::kNumberConstructor);
number_fun->shared().DontAdaptArguments();
number_fun->shared().set_length(1);
InstallWithIntrinsicDefaultProto(isolate_, number_fun,
                                     Context::NUMBER_FUNCTION_INDEX);

// Create the %NumberPrototype%
Handle<JSPrimitiveWrapper> prototype = Handle<JSPrimitiveWrapper>::cast(
        factory->NewJSObject(number_fun, AllocationType::kOld));
prototype->set_value(Smi::zero());
JSFunction::SetPrototype(number_fun, prototype);

// Install the "constructor" property on the {prototype}.
JSObject::AddProperty(isolate_, prototype, factory->constructor_string(),
                          number_fun, DONT_ENUM);

This code not only registers the Number function but also creates its prototype object and installs the constructor property on that prototype. The constructor, Builtin::kNumberConstructor, is a Torque builtin found in [→ src/builtins/constructor.tq]:

// ES #sec-number-constructor
transitioning javascript builtin
NumberConstructor(
    js-implicit context: NativeContext, receiver: JSAny, newTarget: JSAny,
    target: JSFunction)(...arguments): JSAny {
  // 1. If no arguments were passed to this function invocation, let n be +0.
  let n: Number = 0;
  if (arguments.length > 0) {
    // 2. Else,
    //    a. Let prim be ? ToNumeric(value).
    //    b. If Type(prim) is BigInt, let n be the Number value for prim.
    //    c. Otherwise, let n be prim.
    const value = arguments[0];
    n = ToNumber(value, BigIntHandling::kConvertToNumber);
  }

  // 3. If NewTarget is undefined, return n.
  if (newTarget == Undefined) return n;

  // 4. Let O be ? OrdinaryCreateFromConstructor(NewTarget,
  //    "%NumberPrototype%", « [[NumberData]] »).
  // 5. Set O.[[NumberData]] to n.
  // 6. Return O.

  // We ignore the normal target parameter and load the value from the
  // current frame here in order to reduce register pressure on the fast path.
  const target: JSFunction = LoadTargetFromFrame();
  const result = UnsafeCast<JSPrimitiveWrapper>(
      FastNewObject(context, target, UnsafeCast<JSReceiver>(newTarget)));
  result.value = n;
  return result;
}

The comments for steps 1-6 map one to one onto the ECMAScript (ECMA-262) Number ( value ) standard, so I will not go through the implementation here. Note that the standard explicitly states that Number accepts BigInt, and each engine handles this case specially, which is consistent with the results in the comparison table earlier.
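The branches of the constructor are easy to observe from JavaScript: calling Number as a function returns a primitive (step 3), using new returns a wrapper object (steps 4-6), and a BigInt argument is converted to a Number, whereas the unary + operator rejects it. The variable names below are chosen for illustration.

```javascript
const called = Number("42");          // NewTarget is undefined: primitive result
const constructed = new Number("42"); // wrapper object holding [[NumberData]]
const noArgs = Number();              // step 1: n is +0
const fromBigInt = Number(10n);       // BigInt is converted to a Number

let unaryPlusError = null;
try {
  +10n; // unary plus does not accept BigInt operands
} catch (e) {
  unaryPlusError = e;
}

console.log(typeof called, typeof constructed, noArgs, fromBigInt);
console.log(unaryPlusError instanceof TypeError); // true
```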

3.2 Number() in JavaScriptCore

The JavaScriptCore code follows the same logic as V8, with fewer comments:

[→ runtime/NumberConstructor.cpp]

// ECMA 15.7.1
JSC_DEFINE_HOST_FUNCTION(constructNumberConstructor, (JSGlobalObject* globalObject, CallFrame* callFrame))
{
    VM& vm = globalObject->vm();
    auto scope = DECLARE_THROW_SCOPE(vm);
    double n = 0;
    if (callFrame->argumentCount()) {
        JSValue numeric = callFrame->uncheckedArgument(0).toNumeric(globalObject);
        RETURN_IF_EXCEPTION(scope, { });
        if (numeric.isNumber())
            n = numeric.asNumber();
        else {
            ASSERT(numeric.isBigInt());
            numeric = JSBigInt::toNumber(numeric);
            ASSERT(numeric.isNumber());
            n = numeric.asNumber();
        }
    }

    JSObject* newTarget = asObject(callFrame->newTarget());
    Structure* structure = JSC_GET_DERIVED_STRUCTURE(vm, numberObjectStructure, newTarget, callFrame->jsCallee());
    RETURN_IF_EXCEPTION(scope, { });

    NumberObject* object = NumberObject::create(vm, structure);
    object->setInternalValue(vm, jsNumber(n));
    return JSValue::encode(object);
}

3.3 Number() in QuickJS

The registration code for the Number object and its prototype chain in QuickJS is as follows:

[→ quickjs.c]

void JS_AddIntrinsicBaseObjects(JSContext *ctx)
{
	//...

	/* Number */
    ctx->class_proto[JS_CLASS_NUMBER] = JS_NewObjectProtoClass(ctx, ctx->class_proto[JS_CLASS_OBJECT], JS_CLASS_NUMBER);
    
    JS_SetObjectData(ctx, ctx->class_proto[JS_CLASS_NUMBER], JS_NewInt32(ctx, 0));
    JS_SetPropertyFunctionList(ctx, ctx->class_proto[JS_CLASS_NUMBER], js_number_proto_funcs, countof(js_number_proto_funcs));
    
    number_obj = JS_NewGlobalCConstructor(ctx, "Number", js_number_constructor, 1, ctx->class_proto[JS_CLASS_NUMBER]);
    
    JS_SetPropertyFunctionList(ctx, number_obj, js_number_funcs, countof(js_number_funcs));
}

The constructor function js_number_constructor is bound by the JS_NewGlobalCConstructor call above:

static JSValue js_number_constructor(JSContext *ctx, JSValueConst new_target,
                                     int argc, JSValueConst *argv)
{
    JSValue val, obj;
    if (argc == 0) {
        val = JS_NewInt32(ctx, 0);
    } else {
        val = JS_ToNumeric(ctx, argv[0]);
        if (JS_IsException(val))
            return val;
        switch(JS_VALUE_GET_TAG(val)) {
#ifdef CONFIG_BIGNUM
        case JS_TAG_BIG_INT:
        case JS_TAG_BIG_FLOAT:
            {
                JSBigFloat *p = JS_VALUE_GET_PTR(val);
                double d;
                bf_get_float64(&p->num, &d, BF_RNDN);
                JS_FreeValue(ctx, val);
                val = __JS_NewFloat64(ctx, d);
            }
            break;
        case JS_TAG_BIG_DECIMAL:
            val = JS_ToStringFree(ctx, val);
            if (JS_IsException(val))
                return val;
            val = JS_ToNumberFree(ctx, val);
            if (JS_IsException(val))
                return val;
            break;
#endif
        default:
            break;
        }
    }
    if (!JS_IsUndefined(new_target)) {
        obj = js_create_from_ctor(ctx, new_target, JS_CLASS_NUMBER);
        if (!JS_IsException(obj))
            JS_SetObjectData(ctx, obj, val);
        return obj;
    } else {
        return val;
    }
}

Note that QuickJS aims to be small, so its BigInt, BigFloat, and BigDecimal handling is compiled in only when CONFIG_BIGNUM is defined; the rest of the logic follows the standard.

4. Double tilde (~~) Operator

ECMAScript (ECMA-262) Bitwise NOT Operator

The ~~ trick relies on step 2 of the standard's algorithm for ~, which converts the operand to a numeric value before inverting its bits, so a string operand becomes a number. Here we focus on where in the engine this conversion happens.
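Because ~ works on 32-bit integers, applying it twice gives back the operand truncated to a signed 32-bit integer rather than the original number. A few cases using only the operator itself (the helper name doubleTilde is chosen here):

```javascript
function doubleTilde(value) {
  return ~~value; // ToNumeric, then ToInt32, inverted twice
}

const tildeResults = {
  single: ~"5",                        // -(5 + 1) = -6
  positive: doubleTilde("3.7"),        // 3, truncates toward zero
  negative: doubleTilde("-3.7"),       // -3, unlike Math.floor
  notANumber: doubleTilde("abc"),      // NaN becomes 0
  wraps31: doubleTilde("2147483648"),  // 2^31 wraps to -2147483648
  wraps32: doubleTilde("4294967296"),  // 2^32 wraps to 0
};
console.log(tildeResults);
```

The last two rows are the main caveat of the trick: values outside the int32 range wrap around silently.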

4.1 BitwiseNot in V8

First, let's look at how V8 handles unary operators:

[→ src/parsing/token.h]

static bool IsUnaryOp(Value op) { return base::IsInRange(op, ADD, VOID); }

Tokens in the range from ADD to VOID are unary operators, as listed below (see [→ src/parsing/token.h]). ADD and SUB are defined at the end of the binary operator list, immediately before the unary-only tokens, so they also pass the IsUnaryOp check:

E(T, ADD, "+", 12)
E(T, SUB, "-", 12)
T(NOT, "!", 0)
T(BIT_NOT, "~", 0)
K(DELETE, "delete", 0)
K(TYPEOF, "typeof", 0)
K(VOID, "void", 0)

Next comes syntax analysis. While building the AST, the parser handles a unary operator by calling ParseUnaryOrPrefixExpression, which in turn builds the unary expression with BuildUnaryExpression:

[→ src/parsing/parser-base.h]

template <typename Impl>
typename ParserBase<Impl>::ExpressionT
ParserBase<Impl>::ParseUnaryExpression() {
  // UnaryExpression ::
  //   PostfixExpression
  //   'delete' UnaryExpression
  //   'void' UnaryExpression
  //   'typeof' UnaryExpression
  //   '++' UnaryExpression
  //   '--' UnaryExpression
  //   '+' UnaryExpression
  //   '-' UnaryExpression
  //   '~' UnaryExpression
  //   '!' UnaryExpression
  //   [+Await] AwaitExpression[?Yield]

  Token::Value op = peek();
  // Unary operator processing
  if (Token::IsUnaryOrCountOp(op)) return ParseUnaryOrPrefixExpression();
  if (is_await_allowed() && op == Token::AWAIT) {
	// await processing
    return ParseAwaitExpression();
  }
  return ParsePostfixExpression();
}

template <typename Impl>
typename ParserBase<Impl>::ExpressionT
ParserBase<Impl>::ParseUnaryOrPrefixExpression() {
  //...

  // Allow the parser's implementation to rewrite the expression.
  return impl()->BuildUnaryExpression(expression, op, pos);
}

[→ src/parsing/parser.cc]

Expression* Parser::BuildUnaryExpression(Expression* expression,
                                         Token::Value op, int pos) {
  DCHECK_NOT_NULL(expression);
  const Literal* literal = expression->AsLiteral();
  if (literal != nullptr) {
	// !
    if (op == Token::NOT) {
      // Convert the literal to a boolean condition and negate it.
      return factory()->NewBooleanLiteral(literal->ToBooleanIsFalse(), pos);
    } else if (literal->IsNumberLiteral()) {
      // Compute some expressions involving only number literals.
      double value = literal->AsNumber();
      switch (op) {
	    // +
        case Token::ADD:
          return expression;
        // -
        case Token::SUB:
          return factory()->NewNumberLiteral(-value, pos);
        // ~
        case Token::BIT_NOT:
          return factory()->NewNumberLiteral(~DoubleToInt32(value), pos);
        default:
          break;
      }
    }
  }
  return factory()->NewUnaryOperation(op, expression, pos);
}

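If the operand is a number literal, the parser folds the expression at parse time: + returns the literal unchanged, - negates it, and ~ converts it to int32 with DoubleToInt32 and inverts the bits. For any other operand, including a string literal as in ~~"123", the parser creates a UnaryOperation node and the conversion happens at run time.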
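The folding logic of BuildUnaryExpression can be mimicked on a toy AST. The node shapes ({ type: "Literal" } and { type: "UnaryOperation" }) and the function name are hypothetical and only illustrate the control flow; they are not V8's data structures.

```javascript
// Hypothetical mini-AST version of the folding logic shown above.
function buildUnaryExpression(op, expression) {
  if (expression.type === "Literal") {
    if (op === "!") {
      // Any literal can be converted to a boolean and negated.
      return { type: "Literal", value: !expression.value };
    }
    if (typeof expression.value === "number") {
      // Fold expressions that involve only number literals.
      switch (op) {
        case "+": return expression;
        case "-": return { type: "Literal", value: -expression.value };
        case "~": return { type: "Literal", value: ~expression.value };
        default: break;
      }
    }
  }
  // Everything else is left for run time.
  return { type: "UnaryOperation", op, operand: expression };
}

const lit = (value) => ({ type: "Literal", value });
const foldedNumber = buildUnaryExpression("~", buildUnaryExpression("~", lit(5.9)));
const unfoldedString = buildUnaryExpression("~", buildUnaryExpression("~", lit("123")));
console.log(foldedNumber);        // a single Literal node
console.log(unfoldedString.type); // "UnaryOperation"
```

In this model ~~5.9 collapses to a literal while ~~"123" stays as two nested operations, matching the behaviour described above for number and string literals.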

4.2 BitwiseNot in JavaScriptCore

JavaScriptCore works similarly: during syntax analysis, when the parser processes a TILDE (~) token, it creates the corresponding expression node and, for a number operand, folds the conversion immediately:

[→ parser/Parser.cpp]

template <typename LexerType>
template <class TreeBuilder> TreeExpression Parser<LexerType>::parseUnaryExpression(TreeBuilder& context)
{
    //... omitted unrelated code
    while (tokenStackDepth) {
        switch (tokenType) {
        //... omitted unrelated code
        // ~
        case TILDE:
            expr = context.makeBitwiseNotNode(location, expr);
            break;
        // +
        case PLUS:
            expr = context.createUnaryPlus(location, expr);
            break;
        //... omitted unrelated code
        }
    }
}

[→ parser/ASTBuilder.h]

ExpressionNode* ASTBuilder::makeBitwiseNotNode(const JSTokenLocation& location, ExpressionNode* expr)
{
    if (expr->isNumber())
        return createIntegerLikeNumber(location, ~toInt32(static_cast<NumberNode*>(expr)->value()));
    return new (m_parserArena) BitwiseNotNode(location, expr);
}

[→ parser/NodeConstructors.h]

inline BitwiseNotNode::BitwiseNotNode(const JSTokenLocation& location, ExpressionNode* expr)
        : UnaryOpNode(location, ResultType::forBitOp(), expr, op_bitnot)
{
}
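The toInt32 conversion that both engines apply before the bitwise NOT follows the ToInt32 operation of ECMA-262. A minimal JavaScript sketch of those steps, written for illustration (the function below is not engine code):

```javascript
// ToInt32 written out step by step, following the ECMA-262 description.
function toInt32(value) {
  const number = Number(value);
  if (!Number.isFinite(number)) return 0;                  // NaN and ±Infinity map to 0
  const integer = Math.trunc(number);                      // drop the fractional part
  const TWO_32 = 2 ** 32;
  const modulo = ((integer % TWO_32) + TWO_32) % TWO_32;   // wrap into [0, 2^32)
  return modulo >= 2 ** 31 ? modulo - TWO_32 : modulo;     // shift into [-2^31, 2^31)
}

console.log(toInt32('3.7'));         // 3
console.log(toInt32('2147483648'));  // -2147483648
console.log(toInt32('abc'));         // 0
```

Applying `~` twice cancels the bit flip, so `~~x` is observably just `toInt32(x)` for non-BigInt operands.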

4.3 BitwiseNot in QuickJS

In QuickJS, during the parsing phase, when encountering the ~ token, it will call emit_op(s, OP_not):

[→ quickjs.c]

/* allowed parse_flags: PF_ARROW_FUNC, PF_POW_ALLOWED, PF_POW_FORBIDDEN */
static __exception int js_parse_unary(JSParseState *s, int parse_flags)
{
    int op;

    switch(s->token.val) {
    case '+':
    case '-':
    case '!':
    case '~':
    case TOK_VOID:
        op = s->token.val;
        if (next_token(s))
            return -1;
        if (js_parse_unary(s, PF_POW_FORBIDDEN))
            return -1;
        switch(op) {
        case '-':
            emit_op(s, OP_neg);
            break;
        case '+':
            emit_op(s, OP_plus);
            break;
        case '!':
            emit_op(s, OP_lnot);
            break;
        case '~':
            emit_op(s, OP_not);
            break;
        case TOK_VOID:
            emit_op(s, OP_drop);
            emit_op(s, OP_undefined);
            break;
        default:
            abort();
        }
        parse_flags = 0;
        break;
    //...
    }
    //...
}

emit_op appends the opcode (here OP_not) to the bytecode buffer of the current function, fd->byte_code:

static void emit_op(JSParseState *s, uint8_t val)
{
    JSFunctionDef *fd = s->cur_func;
    DynBuf *bc = &fd->byte_code;

    /* Use the line number of the last token used, not the next token,
       nor the current offset in the source file.
     */
    if (unlikely(fd->last_opcode_line_num != s->last_line_num)) {
        dbuf_putc(bc, OP_line_num);
        dbuf_put_u32(bc, s->last_line_num);
        fd->last_opcode_line_num = s->last_line_num;
    }
    fd->last_opcode_pos = bc->size;
    dbuf_putc(bc, val);
}

int dbuf_putc(DynBuf *s, uint8_t c)
{
    return dbuf_put(s, &c, 1);
}

int dbuf_put(DynBuf *s, const uint8_t *data, size_t len)
{
    if (unlikely((s->size + len) > s->allocated_size)) {
        if (dbuf_realloc(s, s->size + len))
            return -1;
    }
    memcpy(s->buf + s->size, data, len);
    s->size += len;
    return 0;
}
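The same append-to-a-growable-buffer idea can be sketched in JavaScript. The opcode numbers, the `DynBuf` class, and the doubling growth policy below are assumptions made for this illustration; QuickJS defines its own opcode table and reallocation strategy.

```javascript
// Opcode numbers are made up for illustration; QuickJS defines its own.
const OP_not = 0x01;

class DynBuf {
  constructor(initialCapacity = 4) {
    this.buf = new Uint8Array(initialCapacity);
    this.size = 0;
  }

  // Append a run of bytes, growing the backing array when it is too small.
  put(bytes) {
    if (this.size + bytes.length > this.buf.length) {
      const capacity = Math.max(this.buf.length * 2, this.size + bytes.length);
      const grown = new Uint8Array(capacity);
      grown.set(this.buf.subarray(0, this.size));
      this.buf = grown;
    }
    this.buf.set(bytes, this.size);
    this.size += bytes.length;
  }

  // Append a single byte.
  putc(byte) {
    this.put([byte]);
  }
}

function emitOp(functionDef, opcode) {
  functionDef.lastOpcodePos = functionDef.byteCode.size; // remember where this opcode starts
  functionDef.byteCode.putc(opcode);
}

const fd = { byteCode: new DynBuf(), lastOpcodePos: -1 };
emitOp(fd, OP_not);
emitOp(fd, OP_not); // `~~x` emits the opcode twice
console.log(fd.byteCode.size); // 2
```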

Bytecode execution starts in JS_EvalFunctionInternal, which calls JS_CallFree; the core interpreter loop lives in the JS_CallInternal function:

/* argv[] is modified if (flags & JS_CALL_FLAG_COPY_ARGV) = 0. */
static JSValue JS_CallInternal(JSContext *caller_ctx, JSValueConst func_obj,
                               JSValueConst this_obj, JSValueConst new_target,
                               int argc, JSValue *argv, int flags)
{
    JSRuntime *rt = caller_ctx->rt;
    JSContext *ctx;
    JSObject *p;
    JSFunctionBytecode *b;
    JSStackFrame sf_s, *sf = &sf_s;
    const uint8_t *pc;
    // ... omitted unrelated code

    for(;;) {
        int call_argc;
        JSValue *call_argv;
        SWITCH(pc) {
        // ...
        CASE(OP_not):
            {
                JSValue op1;
                op1 = sp[-1];
                // If it is an integer
                if (JS_VALUE_GET_TAG(op1) == JS_TAG_INT) {
                    sp[-1] = JS_NewInt32(ctx, ~JS_VALUE_GET_INT(op1));
                // If it is not an integer
                } else {
                    if (js_not_slow(ctx, sp))
                        goto exception;
                }
            }
            BREAK;
        // ...
        }
    }
    // ... omitted unrelated code
}

It can be seen that when executing OP_not, the interpreter applies bitwise NOT directly if the operand is an integer; otherwise, it calls js_not_slow:

static no_inline int js_not_slow(JSContext *ctx, JSValue *sp)
{
    int32_t v1;

    if (unlikely(JS_ToInt32Free(ctx, &v1, sp[-1]))) {
        sp[-1] = JS_UNDEFINED;
        return -1;
    }
    sp[-1] = JS_NewInt32(ctx, ~v1);
    return 0;
}
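The split between a fast path and a slow path can be mimicked in JavaScript. This is a rough sketch, assuming a plain array as the value stack; `isInt32` and `opNot` are hypothetical helpers, and the int32 check only approximates the JS_TAG_INT test.

```javascript
// Hypothetical stand-in for the interpreter's value stack and OP_not handler.
function isInt32(value) {
  // Approximates the JS_TAG_INT check: a number that fits in a signed 32-bit integer.
  return typeof value === 'number' && (value | 0) === value;
}

function opNot(stack) {
  const top = stack.length - 1;
  const operand = stack[top];
  if (isInt32(operand)) {
    stack[top] = ~operand; // fast path: no conversion needed
    return;
  }
  // Slow path: Number() may throw (for example on a Symbol), which plays the
  // role of the exception branch; `| 0` then applies ToInt32.
  stack[top] = ~(Number(operand) | 0);
}

const stack = ['3.7'];
opNot(stack); // slow path: '3.7' -> 3 -> ~3 === -4
opNot(stack); // fast path: ~(-4) === 3
console.log(stack[0]); // 3
```

For `~~"3.7"` only the first OP_not pays for the string conversion; the second one sees an integer and takes the fast path.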

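js_not_slow first tries to convert the operand to a 32-bit integer with JS_ToInt32Free. If the conversion raises an exception, it sets the stack slot to undefined and returns -1; otherwise, it stores the bitwise NOT of the integer. The conversion logic of JS_ToInt32Free is as follows: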

/* return (<0, 0) in case of exception */
static int JS_ToInt32Free(JSContext *ctx, int32_t *pres, JSValue val)
{
    uint32_t tag;
    int32_t ret;

 redo:
    tag = JS_VALUE_GET_NORM_TAG(val);
    switch(tag) {
    case JS_TAG_INT:
    case JS_TAG_BOOL:
    case JS_TAG_NULL:
    case JS_TAG_UNDEFINED:
        ret = JS_VALUE_GET_INT(val);
        break;
        // ...
    default:
        val = JS_ToNumberFree(ctx, val);
        if (JS_IsException(val)) {
            *pres = 0;
            return -1;
        }
        goto redo;
    }
    *pres = ret;
    return 0;
}

For strings, it will go to JS_ToNumberFree, which then calls JS_ToNumberHintFree, involving the core logic for string processing as follows:

static JSValue JS_ToNumberHintFree(JSContext *ctx, JSValue val,
                                   JSToNumberHintEnum flag)
{
    uint32_t tag;
    JSValue ret;

 redo:
    tag = JS_VALUE_GET_NORM_TAG(val);
    switch(tag) {
    // ... omitted unrelated logic
    case JS_TAG_STRING:
        {
            const char *str;
            const char *p;
            size_t len;
            
            str = JS_ToCStringLen(ctx, &len, val);
            JS_FreeValue(ctx, val);
            if (!str)
                return JS_EXCEPTION;
            p = str;
            p += skip_spaces(p);
            if ((p - str) == len) {
                ret = JS_NewInt32(ctx, 0);
            } else {
                int flags = ATOD_ACCEPT_BIN_OCT;
                ret = js_atof(ctx, p, &p, 0, flags);
                if (!JS_IsException(ret)) {
                    p += skip_spaces(p);
                    if ((p - str) != len) {
                        JS_FreeValue(ctx, ret);
                        ret = JS_NAN;
                    }
                }
            }
            JS_FreeCString(ctx, str);
        }
        break;
    // ... omitted unrelated logic
    }
    // ... omitted unrelated logic
}

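If the string is empty or contains only whitespace, the result is JS_NewInt32(ctx, 0). Otherwise, js_atof parses the number, and if any non-whitespace characters remain after it, the result is replaced with NaN.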
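The three outcomes of the string branch (zero for blank input, a parsed number, or NaN for trailing garbage) can be sketched in JavaScript. The regular expression below is a simplified grammar written for this sketch, not a port of js_atof, and the final `Number()` call merely stands in for the digit-to-double conversion.

```javascript
// Simplified numeric grammar for this sketch (hex, binary, octal, decimal, Infinity).
const NUMERIC_PREFIX =
  /^(?:0[xX][0-9a-fA-F]+|0[bB][01]+|0[oO][0-7]+|[+-]?(?:Infinity|(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?))/;

function stringToNumber(str) {
  const body = str.replace(/^\s+/, '');                          // skip leading spaces
  if (body.length === 0) return 0;                               // empty or whitespace-only
  const match = NUMERIC_PREFIX.exec(body);                       // parse a numeric prefix
  if (match === null) return NaN;
  const rest = body.slice(match[0].length).replace(/^\s+/, '');  // skip trailing spaces
  if (rest.length !== 0) return NaN;                             // leftover characters
  return Number(match[0]);
}

console.log(stringToNumber('  42  ')); // 42
console.log(stringToNumber(''));       // 0
console.log(stringToNumber('12px'));   // NaN
```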

5. Unary Operator (+)

ECMAScript (ECMA-262) Unary Plus Operator

[Image: the unary plus operator as specified in ECMA-262]

The unary plus operator is one of my favorite ways to convert strings to numbers; the standard is straightforward and clear: the operator simply converts its operand to a number.
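A quick illustration of what that conversion yields for a handful of string inputs (the sample strings are arbitrary):

```javascript
const samples = ['42', '  12  ', '3.14', '1e3', '0x1f', '', 'abc', '12px'];
const converted = samples.map((text) => +text);
console.log(converted); // [ 42, 12, 3.14, 1000, 31, 0, NaN, NaN ]
```

Surrounding whitespace is ignored and an empty string becomes 0, while any string that is not a complete numeric literal becomes NaN.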

5.1 UnaryPlus in V8

The syntax analysis phase is the same as that of the Double tilde (~~) Operator, so I won't elaborate further.

5.2 UnaryPlus in JavaScriptCore

The syntax analysis phase is the same as that of the Double tilde (~~) Operator, so I won't elaborate further.

5.3 UnaryPlus in QuickJS

The syntax analysis phase is the same as that of the Double tilde (~~) Operator, so I won't elaborate further. Finally, it still goes to JS_CallInternal.

[→ quickjs.c]

/* argv[] is modified if (flags & JS_CALL_FLAG_COPY_ARGV) = 0. */
static JSValue JS_CallInternal(JSContext *caller_ctx, JSValueConst func_obj,
                               JSValueConst this_obj, JSValueConst new_target,
                               int argc, JSValue *argv, int flags)
{
    JSRuntime *rt = caller_ctx->rt;
    JSContext *ctx;
    JSObject *p;
    JSFunctionBytecode *b;
    JSStackFrame sf_s, *sf = &sf_s;
    const uint8_t *pc;
    // ... omitted unrelated code

    for(;;) {
        int call_argc;
        JSValue *call_argv;
        SWITCH(pc) {
        // ...
        CASE(OP_plus):
            {
                JSValue op1;
                uint32_t tag;
                op1 = sp[-1];
                tag = JS_VALUE_GET_TAG(op1);
                if (tag == JS_TAG_INT || JS_TAG_IS_FLOAT64(tag)) {
                } else {
                    if (js_unary_arith_slow(ctx, sp, opcode))
                        goto exception;
                }
                BREAK;
            }
        // ... omitted unrelated code
        }
    }
    // ... omitted unrelated code
}

It can be seen that when the operand is already an Int or a Float64, the value is left on the stack unchanged, consistent with the specification in the standard. In all other cases, it calls js_unary_arith_slow, and if that call raises an exception, execution jumps to the exception logic:

static no_inline __exception int js_unary_arith_slow(JSContext *ctx, JSValue *sp, OPCodeEnum op)
{
    JSValue op1;
    double d;

    op1 = sp[-1];
    if (unlikely(JS_ToFloat64Free(ctx, &d, op1))) {
        sp[-1] = JS_UNDEFINED;
        return -1;
    }
    switch(op) {
    case OP_inc:
        d++;
        break;
    case OP_dec:
        d--;
        break;
    case OP_plus:
        break;
    case OP_neg:
        d = -d;
        break;
    default:
        abort();
    }
    sp[-1] = JS_NewFloat64(ctx, d);
    return 0;
}

Here, JS_ToFloat64Free follows the same pattern as JS_ToInt32Free in section 4.3: values that are not already numeric go through JS_ToNumberFree, so I won't elaborate further. Once the value has been converted, js_unary_arith_slow leaves it unchanged for a unary plus and stores it back as a Float64; for the other operators, it first applies the corresponding operation, such as incrementing by 1.
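The convert-then-dispatch shape of js_unary_arith_slow can be sketched in JavaScript as follows. The function name `unaryArithSlow`, the string opcode names, and the array used as a stack are all assumptions made for this illustration; `Number()` stands in for JS_ToFloat64Free and is only equivalent for non-BigInt operands.

```javascript
// Hypothetical JavaScript rendering of the slow path for unary arithmetic.
function unaryArithSlow(stack, op) {
  const top = stack.length - 1;
  let d = Number(stack[top]); // conversion step; throws on a Symbol, like the exception branch
  switch (op) {
    case 'inc': d++; break;
    case 'dec': d--; break;
    case 'plus': break;       // unary plus: the converted value is the result
    case 'neg': d = -d; break;
    default: throw new Error(`unexpected opcode: ${op}`);
  }
  stack[top] = d;
}

const operands = ['41'];
unaryArithSlow(operands, 'plus');
console.log(operands[0]); // 41
unaryArithSlow(operands, 'inc');
console.log(operands[0]); // 42
```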


Thus, we have explained the specific implementations of the following 5 methods in the interpreter:

  1. parseInt()
  2. parseFloat()
  3. Number()
  4. Double tilde (~~) Operator
  5. Unary Operator (+)

In addition to the above 5 methods for converting strings to numbers, there are 4 more methods that I will not elaborate on due to space constraints:

  • Math.floor()
  • Multiply with number
  • The Signed Right Shift Operator (>>)
  • The Unsigned Right Shift Operator (>>>)

Each method of converting strings to numbers has its pros and cons, and users can choose according to their needs. Here are some personal summaries of my experiences:

If the return value only requires an integer:

  • If you pursue code simplicity and execution efficiency, and have some confidence in the input value (no need for defensive programming), prefer using Unary Operator (+).
  • If you have no confidence in the input value and need to do defensive programming, use parseInt().
  • If you need to support BigInt, prioritize using Number(); if using Double tilde (~~) Operator, be aware that the result is limited to a signed 32-bit integer (31 bits of magnitude), so larger values wrap around.
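To make the trade-offs for integer results concrete, here is a small comparison on a few arbitrary inputs:

```javascript
const inputs = ['42', '42px', '', '3.7', '2147483648'];

const table = inputs.map((text) => ({
  text,
  unaryPlus: +text,
  parseInt: parseInt(text, 10),
  doubleTilde: ~~text,
}));
console.table(table);
```

parseInt() tolerates the trailing "px" and returns 42 where the unary plus returns NaN and ~~ returns 0; for the empty string, parseInt() returns NaN while the other two return 0; and ~~ turns "2147483648" into -2147483648 because of the 32-bit limit.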

If the return value requires a floating-point number:

  • If you pursue code simplicity and execution efficiency, and have some confidence in the input value (no need for defensive programming), prefer using Unary Operator (+).
  • If you have no confidence in the input value and need to do defensive programming, use parseFloat().
  • If you need to support BigInt, use parseFloat().
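The same kind of comparison for floating-point results, including a BigInt operand (the inputs are arbitrary):

```javascript
const floatInputs = ['3.14', '3.14abc', '.5', '0x10'];
const floatResults = floatInputs.map((text) => ({
  text,
  unaryPlus: +text,
  parseFloat: parseFloat(text),
}));
console.table(floatResults);

// BigInt operands: unary plus throws, while the function forms convert.
let unaryPlusError = null;
try {
  +10n;
} catch (error) {
  unaryPlusError = error;
}
console.log(unaryPlusError instanceof TypeError); // true
console.log(Number(10n), parseFloat(10n));        // 10 10
```

parseFloat() accepts "3.14abc" as 3.14 where the unary plus gives NaN, but it stops at the "x" in "0x10" and returns 0, whereas the unary plus reads it as hexadecimal 16.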