Robin Malfait

May 7, 2024

Natural sorting of strings containing numbers

Recently I was working on an improvement for Tailwind CSS to make the order of classes in the generated CSS output more logical.

There are 2 things we have to solve when generating the CSS file:

  1. The order of classes matter when specificity is the same โ€” when writing classes like p-4 px-2 pb-1 we want the pb-1 class to override the px-2 class, and the px-2 class to override the p-4 class.

    .p-4 {
      padding: var(--spacing-4, 1rem /* 16px */);
    }
    .px-2 {
      padding-left: var(--spacing-2, 0.5rem /* 8px */);
      padding-right: var(--spacing-2, 0.5rem /* 8px */);
    }
    .pb-1 {
      padding-bottom: var(--spacing-1, 0.25rem /* 4px */);
    }

    This allows you to set a padding and make overrides if you want.

    Here is another, but visual, example with borders instead: border-12, border-x-0 and border-b-0.

    border-12
    border-12 border-x-0
    border-12 border-x-0 border-b-0*
    *You'll likely write this as border-t-12 instead.
  2. The generated CSS file should be deterministic given the same input โ€” this helps with caching because the output doesn't change, but also helps with debugging because the output is guaranteed to be the same. If you use a new class, then the difference compared to the old file should be a single insertion in the output.

In the Tailwind CSS codebase we have a handful of sorting rules that we apply when generating the CSS output. For the sake of this blog post, we will only focus on the last sorting rule which is the "alphabetical" sorting rule.

This rule applies when all other rules are equal and to ensure the output is deterministic as discussed earlier.

The problem

The big problem with the alphabetical sorting is that we sort the classes as a string, and a lot of the classes in Tailwind CSS contain numbers. When using the native array.sort() method, the strings will be sorted character by character which results in an unexpected output.

Let's imagine you use these classes throughout your project:

let classes = ['p-1', 'p-2', 'p-10', 'p-20']

When you use array.sort() on this array, then the order will be:

let classes = ['p-1', 'p-2', 'p-10', 'p-20']

let sorted = classes.sort()

console.log(sorted) // ['p-1', 'p-10', 'p-2', 'p-20']

In context of Tailwind CSS...

โ€ฆ this is what we get:
.p-1 {
  padding: 0.25rem /* 4px */;
}

.p-10 { 
  padding: 2.5rem /* 40px */; 
} 

.p-2 { 
  padding: 0.5rem /* 8px */; 
} 

.p-20 {
  padding: 5rem /* 80px */;
}
โ€ฆ but this is what we want:
.p-1 {
  padding: 0.25rem /* 4px */;
}

.p-2 { 
  padding: 0.5rem /* 8px */; 
} 

.p-10 { 
  padding: 2.5rem /* 40px */; 
} 

.p-20 {
  padding: 5rem /* 80px */;
}

You'll also notice this when looking at colors:

What we get:

100
200
300
400
50๐Ÿค”โ†™
500
600
700
800
900
950

What we want:

50
100
200
300
400
500
600
700
800
900
950

While it's deterministic, this order leaks to the developer if they use the Tailwind CSS IntelliSense extension for Visual Studio Code. This is because the suggestions are based on this order. We definitely want to improve the developer experience here.

The output definitely looks funny, but it makes sense if you think about it because the strings are compared character by character from left to right.

Character position:0123
p-1ย p-1
p-10p-10
p-2ย p-2
p-20p-20

Looking at the table, especially at column #2, you can see that we group the 1's and 2's together because that's all the information we have at that point and it's enough to differentiate the strings.

The solution, part 1

Luckily for us, JavaScript has a built-in method called localeCompare that we can use to sort the array in a more natural way:

let classes = ['p-1', 'p-2', 'p-10', 'p-20']

let sorted = classes.sort((a, z) => {
  return a.localeCompare(z, 'en-US', { numeric: true })
})

console.log(sorted) // ['p-1', 'p-2', 'p-10', 'p-20']

This is awesome because it solves our problem with a single line of code. The numeric: true option tells localeCompare to treat the numbers inside the string as actual numbers.

That's it, problem solved! Right?

Well, not quite. This solution does way more work than we actually need. There are two ways of knowing this.

  1. You can look at the documentation and/or implementation to learn more.
  2. You run the code on a real world example and notice that it's way slower compared to the simple array.sort() solution. Due to this observation, I had to go back to step 1 and look at the documentation.

Running tailwind -o output.css on the source code for https://tailwindcss.com, the array.sort() runs in roughly ~3ms, this new solution runs in ~30ms. That's 10 times slower!

Let's take a quick detour and dig into what localeCompare actually does. Under the hood it's using Intl.Collator, which enables language-sensitive string comparison.

This is great if you're working with different languages, and if you need to be able to sort characters with diacritics (e.g.: "รค" vs "a") correctly, but we don't need that for our use cases.

Surely, there is a better way that better fits our needs.

The solution, part 2

We know that localeCompare is awesome, but it's doing more work than we need it to do, so let's implement a custom sorting function that only does what we need.

But what do we need?

  1. Tailwind CSS classes use simple English characters without diacritics, numbers and a handful of special characters such as -_[]().
  2. We don't need to worry about different languages. Even if in theory we could generate the incorrect order compared to the localeCompare, at least the output would still be stable and deterministic.

Let's start by writing what's essentially the default implementation of the array.sort() function.

Note: for our use case we care about performance, so the code could look a bit more complex, but let's dig in together.

const ZERO = 48 // Computed via: `'0'.charCodeAt(0)`
const NINE = 57 // Computed via: `'9'.charCodeAt(0)`

function compare(a: string, z: string) {
  let aLen = a.length
  let zLen = z.length

  // 1. When comparing 2 strings, we only have to worry about the shortest
  //    string.
  // 2. When two strings are equal up to the length of the shortest string, the
  //    shortest string should come first. At that point we don't care about the
  //    rest of the longer string.
  let minLen = aLen < zLen ? aLen : zLen

  for (let i = 0; i < minLen; i++) {
    // We will compare the characters using character codes. This allows us to
    // compare the characters as numbers, and since they are numbers, we can
    // compare them against ranges of characters. E.g.: `a` to `z`, `0` to `9`,
    // etc.
    let aCode = a.charCodeAt(i)
    let zCode = z.charCodeAt(i)

    // When we encounter the same character, we can continue to the next
    // character.
    if (aCode === zCode) continue

    // Otherwise, compare the character codes directly.
    return aCode - zCode
  }

  // If we got this far, the strings are equal up to the length of the shortest
  // string. The shortest string should come first.
  return a.length - z.length
}

The compare function will eventually return a number:

  • When the number is negative, a should come before z.
  • When the number is positive, a should come after z.
  • When the number is zero, a and z are equal, and the order stays the same.

This implementation is a little bit silly and you'll see why in a bit, but we can test it to see how it behaves:

let classes = ['p-1', 'p-2', 'p-10', 'p-20']

let sorted = classes.sort((a, z) => compare(a, z))
// The `sort()` function accepts a custom `compareFn` function with the same
// signature as the `compare` function we just implemented.

console.log(sorted) // ['p-1', 'p-10', 'p-2', 'p-20']

As expected, the output is incorrect because we didn't handle numbers yet. The good news is that it behaves the same as array.sort() does.

Next, let's add support for numbers.

const ZERO = 48 // Computed via: `'0'.charCodeAt(0)`
const NINE = 57 // Computed via: `'9'.charCodeAt(0)`

function compare(a: string, z: string) {
  let aLen = a.length
  let zLen = z.length

  // 1. When comparing 2 strings, we only have to worry about the shortest
  //    string.
  // 2. When two strings are equal up to the length of the shortest string, the
  //    shortest string should come first. At that point we don't care about the
  //    rest of the longer string.
  let minLen = aLen < zLen ? aLen : zLen

  for (let i = 0; i < minLen; i++) {
    // We will compare the characters using character codes. This allows us to
    // compare the characters as numbers, and since they are numbers, we can
    // compare them against ranges of characters. E.g.: `a` to `z`, `0` to `9`,
    // etc.
    let aCode = a.charCodeAt(i)
    let zCode = z.charCodeAt(i)

    // When we encounter the same character, we can continue to the next
    // character.
    if (aCode === zCode) continue

    // If both are numbers, compare them as numbers instead of strings.
    if (aCode >= ZERO && aCode <= NINE && zCode >= ZERO && zCode <= NINE) { 
      // Start position of the number in the string.
      let aStart = i
      let zStart = i
โ€‹ 
      // End position of the number in the string, starts at the next character.
      let aEnd = i + 1
      let zEnd = i + 1
โ€‹ 
      // Consume the number as long as we are in the range of numbers
      aCode = a.charCodeAt(aEnd) 
      while (aCode >= ZERO && aCode <= NINE) aCode = a.charCodeAt(++aEnd) 
โ€‹ 
      // Consume the number as long as we are in the range of numbers
      zCode = z.charCodeAt(zEnd) 
      while (zCode >= ZERO && zCode <= NINE) zCode = z.charCodeAt(++zEnd) 
โ€‹ 
      // We now know the start and end position of the numbers in the string.
      // Let's extract the numbers.
      let aNumber = a.slice(aStart, aEnd) 
      let zNumber = z.slice(zStart, zEnd) 
โ€‹ 
      // Compare the numbers as actual numbers.
      return Number(aNumber) - Number(zNumber) 
    } 

    // Otherwise, compare the character codes directly.
    return aCode - zCode
  }

  // If we got this far, the strings are equal up to the length of the shortest
  // string. The shortest string should come first.
  return a.length - z.length
}

You might be thinking about edge cases, but some of them are already covered by the fact that we are using character codes.

For example, when comparing a negative number and a positive number:

compare('-1', '10')

This would compare the - character with the 1 character, converting these to character codes results in:

let aCode = '-'.charCodeAt(0) // 45
let zCode = '1'.charCodeAt(0) // 49

console.log({ aCode, zCode, diff: aCode - zCode })
// { aCode: 45, zCode: 49, diff: -4 }

So - already comes before 1 because the character code for - is lower than the character code for 1.

Another edge case is when comparing floating point numbers:

compare('1.1', '1.2')

In this case, we are actually comparing 3 different parts:

compare('1.1', '1.2')
//       ^      ^       Step #1, they are the same, let's continue
compare('1.1', '1.2')
//        ^      ^      Step #2, they are the same, let's continue
compare('1.1', '1.2')
//         ^      ^     Step #3, they are different, let's compare them as numbers

Or if the numbers are different:

compare('50.1', '1.2')
//       ^^      ^       Step #1, they are different, let's compare them as numbers

In this case, we don't even have to worry about the . character (or the rest of the code for that matter) because we already know that 50 is greater than 1.

Alright, but how about comparing strings that contain multiple sections of numbers?

Well... let's test it:

compare('foo-1-bar-2', 'foo-2-bar-1')
//       ^^^^           ^^^^          Step #1, they are the same, let's continue
compare('foo-1-bar-2', 'foo-2-bar-1')
//           ^              ^         Step #2, they are different, let's compare them as numbers

Again, at this point we can already bail out because we know that 1 is less than 2. No need to worry about the rest of the string.

Usage

Now that we implemented the compare function, we can use it to sort the classes in a more natural way:

let classes = ['p-1', 'p-2', 'p-10', 'p-20']

let sorted = classes.sort((a, z) => compare(a, z))

console.log(sorted) // ['p-1', 'p-2', 'p-10', 'p-20']

In context of Tailwind CSS, the output would now be:

.p-1 {
  padding: 0.25rem /* 4px */;
}

.p-2 { 
  padding: 0.5rem /* 8px */; 
} 

.p-10 { 
  padding: 2.5rem /* 40px */; 
} 

.p-20 {
  padding: 5rem /* 80px */;
}

โ€ฆ and the colors example is also correct now:

50
100
200
300
400
500
600
700
800
900
950

Summary

  1. When you want to sort strings containing numbers, then you should use localeCompare or implement a custom comparing function that only does what you need.
  2. Always make sure to measure the performance of your code.
  3. Prefer implementations that only do what you need, and nothing more.

๐Ÿ‘จโ€๐Ÿ’ป

Copyright ยฉ 2024 Robin Malfait. All rights reserved.