Ir al contenido principal

Ralsina.Me — El sitio web de Roberto Alsina

Publicaciones sobre crystal

A Tale of Optimization

I reim­ple­ment­ed Pyg­ments in Crys­tal. It did­n't quite go as I ex­pect­ed. I have al­ready writ­ten about how I did it but that left a large part of the sto­ry un­told. You see, I am us­ing Crys­tal, which com­piles to na­tive code. And yet my reim­ple­men­ta­tion was slow­er than Python. That's not sup­posed to hap­pen.

I de­cid­ed to fig­ure out why, and fix it. This is the sto­ry of how I made my soft­ware that looked "ok" 30x faster. Mind you, this is go­ing to make me sound much smarter and ac­cu­rate than I am. I ran in­to 200 dead ends be­fore find­ing each im­prove­men­t.

The First Problem (v0.1.0)

Command Mean [ms] Min [ms] Max [ms] Relative
bin/tartrazine ../crycco/src/crycco.cr swapoff > x.html --standalone 533.8 ± 4.4 523.7 542.9 18.80 ± 0.92
chroma ../crycco/src/crycco.cr -l crystal -f html -s swapoff > x.html 28.4 ± 1.4 25.6 32.8 1.00
pygmentize ../crycco/src/crycco.cr -l crystal -O full,style=autumn -f html -o x.html 103.5 ± 2.8 95.6 109.1 3.65 ± 0.20

That bench­mark (like all the rest) is done us­ing hy­per­fine and run­ning each com­mand 50 times af­ter a 10-run warmup. Not that it needs so much care, just look at those num­ber­s. Not on­ly is tar­trazine al­most 20 times slow­er than chro­ma, it's al­so 3.5 times slow­er than Pyg­ments. And Pyg­ments is writ­ten in Python!

Even with­out com­par­ing, half a sec­ond to high­light a 100-­line file is ridicu­lous.

What's going on here? To find out, let's get data. I used callgrind to profile the code, and then kcachegrind to visualize it.

$ valgrind --tool=callgrind bin/tartrazine ../crycco/src/crycco.cr swapoff

Some functions called half a billion times

As you can see, some func­tions are called half a bil­lion times and ac­count for 40% of the ex­e­cu­tion time. What are they?

A string in Crystal is always unicode. The String::char_bytesize_at function is used to convert an offset into the string from characters to bytes. That's because unicode characters can be different "widths" in bytes. So in the string "123" the "3" is the 3rd byte, but in "áéí" the "í" starts in the 5th byte.

And why is it do­ing that? Be­cause this code does a bazil­lion regex op­er­a­tions, and the un­der­ly­ing li­brary (PCRE2) deals in bytes, not char­ac­ter­s, so ev­ery­time we need to do things like "match this regex in­to this string start­ing in the 9th po­si­tion" we need to con­vert that off­set to bytes, and then when it finds a match at byte X we need to con­vert that off­set to char­ac­ters and to ex­tract the da­ta we do it two more times, and so on.

I decided to ... not do that. One nice thing of Crystal is that even though it's compiled, the whole standard library is there in /usr/lib/crystal so I could just go and read how the regular expression code was implemented and see what to do.

Ended up writing a version of Regex and Regex.match that worked on bytes, and made my code deal with bytes instead of characters. I only needed to convert into strings when I had already generated a token, rather than in the thousands of failed attempts to generate it.

The Second Problem (commit 0626c86)

Command Mean [ms] Min [ms] Max [ms] Relative
bin/tartrazine ../crycco/src/crycco.cr -f html -t swapoff -o x.html --standalone 187.8 ± 4.4 183.2 204.1 7.48 ± 0.30
chroma ../crycco/src/crycco.cr -l crystal -f html -s swapoff > x.html 25.1 ± 0.8 23.6 28.5 1.00
pygmentize ../crycco/src/crycco.cr -l crystal -O full,style=autumn -f html -o x.html 89.9 ± 4.7 83.6 102.1 3.58 ± 0.22

While bet­ter this still suck­s. I made it 2.5 times faster, but it's still 7.5 times slow­er than chro­ma, and 3.6x slow­er than Python???

regex library takes all the time

Back to val­grind. This time, the pro­file was ... dif­fer­en­t. The regex li­brary was tak­ing al­most all the ex­e­cu­tion time, which makes sense, since it's do­ing all the work. But why is it so slow?

It's calling valid_utf8 ALL THE TIME?

Almost all the time is spent in valid_utf8. That's a function that checks if a string is valid UTF-8. And it's called all the time. Why? Because the regex library is written in C, and it doesn't know that the strings it's working with are already valid UTF-8. So, it checks. And checks. And checks.

Solution? Let's not do that either. The PCRE2 library has a handy flag called NO_UTF_CHECK just for that. So, if you pass that flag when you are doing a regex match, it will not call valid_utf8 at all!

The Third Problem (commit 7db8fdc)

Command Mean [ms] Min [ms] Max [ms] Relative
bin/tartrazine ../crycco/src/crycco.cr -f html -t swapoff -o x.html --standalone 30.0 ± 2.2 25.5 36.6 1.15 ± 0.10
chroma ../crycco/src/crycco.cr -l crystal -f html -s swapoff > x.html 26.0 ± 1.0 24.1 29.2 1.00
pygmentize ../crycco/src/crycco.cr -l crystal -O full,style=autumn -f html -o x.html 96.3 ± 7.7 86.1 125.3 3.70 ± 0.33

Yay! My com­piled pro­gram is fi­nal­ly faster than an in­ter­pret­ed one! And even in the same ball­park as the oth­er com­piled one!

I won­der what hap­pens with a larg­er file!

Command Mean [ms] Min [ms] Max [ms] Relative
bin/tartrazine /usr/include/sqlite3.h -f html -t swapoff -o x.html --standalone -l c 896.6 ± 71.6 709.6 1015.8 7.07 ± 0.58
chroma /usr/include/sqlite3.h -l c -f html -s swapoff > x.html 126.8 ± 2.1 122.8 132.9 1.00
pygmentize /usr/include/sqlite3.h -l c -O full,style=autumn -f html -o x.html 229.1 ± 4.5 219.5 238.9 1.81 ± 0.05

Clear­ly some­thing very bad hap­pens when my code pro­cess­es larg­er files. I won­der what it is?

That's rather low level

At first glance this is not very informative. So, most of the execution time is spent in libgc and libc functions. That's not very helpful. But, if you look a bit harder, you'll see the execution time is spent allocating memory. Hundreds of milliseconds spent allocating memory

But memory is fast!

Yes, memory is fast. But allocating it is not. And I was allocating a lot of memory. Or rather, I was allocating memory over and over. See that memcpy there?

This took me about a day to fig­ure out, but this line of code is where ev­ery­thing be­came slow.

matched, new_pos, new_tokens = rule.match(text_bytes, pos, self)
if matched
    # Move position forward, save the tokens,
    # tokenize from the new position
    pos = new_pos
    tokens += new_tokens

That looks innocent, right? It's not. The tokens array is created at the beginning of tokenization, and every time I find new tokens I just append them to it. BUT ... how does that work? Arrays are not infinite! Where does it put the new tokens?

Well, when an ar­ray grows, it al­lo­cates a new, larg­er ar­ray, copies all the el­e­ments there and now you have a larg­er ar­ray with some room to spare. When you keep do­ing that, you are copy­ing the whole ar­ray over and over. It's the cost you pay for the con­ve­nience of hav­ing an ar­ray that can grow.

Where was I call­ing this? In the for­mat­ter. The for­mat­ter needs to see the to­kens to turn them in­to HTM­L. Here's the rel­e­vant code:

lines = lexer.group_tokens_in_lines(lexer.tokenize(text))

All it does is group them in­to lines (BTW, that again does the whole "grow an ar­ray" thing) and then it just it­er­ates the lines, then the to­kens in each line, slaps some HTML around them and writes them to a string (which again, grows).

The so­lu­tion is ... not to do that.

In Crys­tal we have it­er­a­tors, so I changed the to­k­eniz­er so that rather than re­turn­ing an ar­ray of to­kens it's an it­er­a­tor which gen­er­ates them one af­ter the oth­er. So the for­mat­ter looks more like this:

tokenizer.each do |token|
    outp << "<span class=\"#{get_css_class(token[:type])}\">#{HTML.escape(token[:value])}</span>"
    if token[:value].ends_with? "\n"
    i += 1
    outp << line_label(i) if line_numbers?
    end
end

Rather than iterate an array, it iterates ... an iterator. So no tokens array in the middle. Instead of grouping them into lines, spit out a line label after every newline character (yes, this is a bit more complicated under the hood)

There were sev­er­al oth­er op­ti­miza­tions but this is the one that made the dif­fer­ence.

The Fourth Problem (commit ae03e46)

Command Mean [ms] Min [ms] Max [ms] Relative
bin/tartrazine /usr/include/sqlite3.h -f html -t swapoff -o x.html --standalone -l c 73.5 ± 5.9 67.7 104.8 1.00
chroma /usr/include/sqlite3.h -l c -f html -s swapoff > x.html 123.1 ± 2.7 117.5 130.3 1.68 ± 0.14
pygmentize /usr/include/sqlite3.h -l c -O full,style=autumn -f html -o x.html 222.0 ± 5.9 207.8 239.4 3.02 ± 0.26

Fi­nal­ly, tar­trazine is the fastest one. On a large file! By a good margin! But is that all there is? Is there noth­ing else to im­prove? Well, no. I can do the same trick again!

You see, the formatter is returning a String by appending to it, and then we are writing the string to a file. That's the same problem as before!. So, I changed the formatter to take an IO object and write to it directly.

Command Mean [ms] Min [ms] Max [ms] Relative
bin/tartrazine /usr/include/sqlite3.h -f html -t swapoff -o x.html --standalone -l c 70.3 ± 1.8 65.6 73.8 1.00
chroma /usr/include/sqlite3.h -l c -f html -s swapoff > x.html 122.2 ± 3.0 116.6 131.1 1.74 ± 0.06
pygmentize /usr/include/sqlite3.h -l c -O full,style=autumn -f html -o x.html 219.4 ± 4.4 212.2 235.5 3.12 ± 0.10

As you can see, that still gives a small im­prove­men­t, of just 3 mil­lisec­ond­s. But that's 5% of the to­tal time. And it's a small change.

And this is where di­min­ish­ing re­turns hit. I could prob­a­bly make it faster, but even a 10% im­prove­ment would be just 7 mil­lisec­onds on a huge file. If I were GitHub then maybe this would be worth my time, but I am not and it's not.

And how does the fi­nal ver­sion com­pare with the first one?

Command Mean [ms] Min [ms] Max [ms] Relative
bin/tartrazine-last ../crycco/src/crycco.cr -f html -t swapoff -o x.html --standalone -l c 16.1 ± 1.1 13.1 21.8 1.00
bin/tartrazine-first ../crycco/src/crycco.cr swapoff 519.1 ± 5.4 508.1 533.7 32.29 ± 2.23

A speedup fac­tor of 32.3x, for code that had noth­ing ob­vi­ous­ly wrong in it. I'd say that's pret­ty good.

Tartrazine: reimplementing pygments or my adventure in extreme test-driven-development

This is a "what I did this week­end" post, but it's slight­ly more in­ter­est­ing than oth­er­s, I think. So, I reim­ple­ment­ed a large chunk of Pyg­ments in a lib called Tar­trazine.

Why?

Be­cause I want­ed to high­light source code in Mark­term, and I want­ed to do it in Crys­tal, not us­ing an ex­ter­nal de­pen­den­cy.

I was us­ing Chro­ma but it's run­ning over a pipe and makes the code look ug­ly, and you need to in­stall it, and so on.

So ... I knew Chro­ma was a Go port of Pyg­ments. So I thought ... how hard can it be? They al­ready did it!

Be­cause I be­lieve we need more li­braries I just start­ed writ­ing the damned thing.

What?

Pyg­ments/Chro­ma con­sists of three part­s.

  • Lex­er­s, which turn a text and turn it in­to a pile of to­ken­s.
  • Styles, which when asked about a to­ken type, re­turn a col­or/bold/un­der­line/etc. "style".
  • For­mat­ter­s, which it­er­ate a list of to­ken­s, ap­ply styles and cre­ate a stream of text (for ex­am­ple HTML with pret­ty col­ors).

The hard part seemed to be the lex­er­s, so I start­ed there.

How?

I lied a lit­tle. I start­ed try­ing to read the Pyg­ments code. It was quick­ly clear that there are sev­er­al kinds of lex­er­s, but most of them (like, 90%) are "regex lex­er­s". They are lex­ers that use a state ma­chine and a bunch of reg­u­lar ex­pres­sions to to­k­enize the in­put.

I know and have im­ple­ment­ed state ma­chines.. State ma­chines are easy. So, I de­cid­ed to just im­ple­ment the regex lex­er­s. They have the huge ad­van­tage that they have lit­tle to no code. THey are just a bunch of reg­u­lar ex­pres­sions and a bunch of rules that say "if you see this, do that".

They are im­ple­ment­ed as da­ta. Here's what the ada lex­er looks like:

    tokens = {
        'root': [
            (r'[^\S\n]+', Text),
            (r'--.*?\n', Comment.Single),
            (r'[^\S\n]+', Text),
            (r'function|procedure|entry', Keyword.Declaration, 'subprogram'),
            (r'(subtype|type)(\s+)(\w+)',
             bygroups(Keyword.Declaration, Text, Keyword.Type), 'type_def'),
            (r'task|protected', Keyword.Declaration),
            (r'(subtype)(\s+)', bygroups(Keyword.Declaration, Text)),
            (r'(end)(\s+)', bygroups(Keyword.Reserved, Text), 'end'),
            (r'(pragma)(\s+)(\w+)', bygroups(Keyword.Reserved, Text,
                                             Comment.Preproc)),
            (r'(true|false|null)\b', Keyword.Constant),
            # builtin types
            (words(BUILTIN_LIST, suffix=r'\b'), Keyword.Type),
            (r'(and(\s+then)?|in|mod|not|or(\s+else)|rem)\b', Operator.Word),
            (r'generic|private', Keyword.Declaration),
            (r'package', Keyword.Declaration, 'package'),
            (r'array\b', Keyword.Reserved, 'array_def'),
            (r'(with|use)(\s+)', bygroups(Keyword.Namespace, Text), 'import'),
            (r'(\w+)(\s*)(:)(\s*)(constant)',
             bygroups(Name.Constant, Text, Punctuation, Text,
                      Keyword.Reserved)),
            (r'<<\w+>>', Name.Label),

While ut­ter­ly un­scrutable, that's just da­ta. Then I looked at how Pyg­ments pro­cess­es that data, and it was bad news. While it's ok it's very id­iomat­ic Python. Like, meta­class­es and things jump­ing around the code­base. I had a feel­ing it could­n't be that hard.

Af­ter al­l, the ex­cel­lent write your own lex­er doc­u­ment ex­plains it in about two pages of tex­t!

So, I looked at Chro­ma's im­ple­men­ta­tion. Let's say I am now dis­trust­ful of those who claim go code is sim­ple, and wor­ried I may be ex­treme­ly dum­b.

Sure, if I spent some time I could un­der­stand it, but I am not a go per­son, and I don't have plans to be one soon, so I had to make de­ci­sion­s.

And then I saw a mag­i­cal fold­er...

A fold­er full of XML which is ob­vi­ous­ly the lex­er def­i­ni­tion­s.

Chro­ma took the Pyg­ments lex­er def­i­ni­tions which were da­ta in Python files, and turned them in­to da­ta in ac­tu­al da­ta files.

And ac­tu­al­ly read­ing those XML files along the Pyg­ments doc did the trick. I now know how to write a lex­er.

But really, How?

Let's look at how, while look­ing at the def­i­ni­tion of a very sim­ple lex­er, the "bash_ses­sion" one. A lex­er, like I said, is a state ma­chine. Each lex­er has some meta­data, such as its name, alias­es, etc, and some in­struc­tions about how to process in­put.

In this case, it says in­put should end with a new­line.

<lexer>
  <config>
    <name>Bash Session</name>
    <alias>bash-session</alias>
    <alias>console</alias>
    <alias>shell-session</alias>
    <filename>*.sh-session</filename>
    <mime_type>text/x-sh</mime_type>
    <ensure_nl>true</ensure_nl>
  </config>

Since a lex­er is a state ma­chine, it has states. The first state is al­ways called root. Each state has rules. Be­cause this lex­er is very sim­ple, it has on­ly one state with two rules.

  <rules>
    <state name="root">
      <rule pattern="^((?:\[[^]]+@[^]]+\]\s?)?[#$%&gt;])(\s*)(.*\n?)">
        <bygroups>
          <token type="GenericPrompt"/>
          <token type="Text"/>
          <using lexer="bash"/>
        </bygroups>
      </rule>
      <rule pattern="^.+\n?">
        <token type="GenericOutput"/>
      </rule>
    </state>
  </rules>

Each rule has a pat­tern (a reg­u­lar ex­pres­sion) which de­cides if the rule ap­plies or not.

The first rule says "if the line starts with a promp­t, cap­ture the promp­t, cap­ture the spa­ces af­ter it, and then cap­ture the rest of the line".

Then, in­side the rule, we have "ac­tion­s". This rule has one ac­tion, which is "by­group­s". This ac­tion says "the first group we ca­pured is a Gener­icPromp­t, the sec­ond group is Tex­t, and the third group we should ask the bash lex­er to to­k­enize".

And that makes sense, since a bash ses­sion looks like this:

$ echo hello
hello

There you have "$" (the promp­t), " " (tex­t), and "e­cho hel­lo" (bash code).

The sec­ond rule is sim­pler. It says "cap­ture a whole line".

So, when pro­cess­ing that ex­am­ple ses­sion, it works like this:

The state is "root" (it al­ways starts there), and we look at the be­gin­ning of the file.

The first line match­es the first rule, so we cap­ture the promp­t, the spaces, and the tex­t. We gen­er­ate the first two to­ken­s: Gener­icPrompt and Tex­t. Then we ask the bash lex­er to to­k­enize the rest of the line. It will re­turn a list of to­ken­s, we keep those to­kens too.

Be­cause we matched, we move the "cur­sor" to the end of the match, which is at the be­gin­ning of the sec­ond line now.

And we start match­ing again.

The state is root. The first rule does­n't match at the po­si­tion we're in. The sec­ond rule does. So we cap­ture the whole line and gen­er­ate a Gener­i­cOut­put to­ken. Move the cur­sor to the end of the match.

Oop­s, no more file. There, we to­k­enized.

Just that?

Well, no. Ac­tions can al­so "push a state" which means change the cur­rent state to some­thing else. States are kept in a stack, so if you were in state "root" and pushed the state "foo" now the stack is "root, foo" and the cur­rent state is "foo".

Of course you can al­so "pop" a state, which means "go back to the pre­vi­ous state".

There are some oth­er bit­s, such as "in­clude" which means "pre­tend the rules of the oth­er lex­er are here" so we don't have to write them many times in the XM­L, or that you can pop more than one state, what­ev­er, the ba­sic is just:

  1. You are in a state
  2. Check rules un­til one match­es
  3. Use that rule's ac­tions (y­ou may end up in an­oth­er state)
  4. Col­lect any to­kens gen­er­at­ed
  5. Move the cur­sor to the end of the match
  6. Go back to 1.

And that's it. That's how you write a lex­er.

And then?

But sup­pose I wrote the lex­er, how do I know if I am do­ing it right? I mean, I can't just run the lex­er and see if it work­s, right?

Well, we could if we on­ly had a whole pile of things to to­k­enize, and a tool that cre­ates the to­kens in a read­able for­mat!

Hey, we have those things. There is the pyg­ments test suite and Chro­ma can out­put to­kens in json!

So, let's do some ex­treme test-­driven-de­vel­op­men­t! Af­ter al­l, I have the tests writ­ten, now I just have to pass them, right?

I wrote enough lex­er to spit out to­ken­s, wrote a test rig that com­pared them to chro­ma's out­put, and start­ed writ­ing a lex­er.

That up there is a two-­day-­long thread of me try­ing to pass test­s. When I fin­ished, over 96% of the test suite was pass­ing and most of the fail­ures were ar­guable (I think chro­ma is wrong in some cas­es).

So, I had writ­ten the RegexLex­er. Looks like this

That code sup­ports 241 lan­guages, and it's about 300 lines of sim­ple code.

In fac­t, I think some­one (not me) should do what I did but write this thing in C, so it can be used from any lan­guage and both chro­ma and tar­trazine are ren­dered ob­so­lete.

New tiny project: Markterm

A while back I saw glow and I won­dered ... how hard could it be to write that?

I mean, it looks im­pres­sive! It ren­ders mark­down to a ter­mi­nal!

Well, it's not all that hard if you have a nice mark­down pars­er you can hack, so I spent a cou­ple of hours and wrote it.

It's called mark­term and it ... well, it's done?

You can use it as a li­brary from Crys­tal, or you can use the mark­term bi­na­ry.

This is how it looks

This is how it looks in a light terminal

It can syn­tax high­light code (as long as you have chro­ma in­stalled), it does the right thing when piped to an­oth­er pro­gram, and it tries to look good but not gaudy.

What will I use it for?

Show­ing CLI pro­gram's help. I am us­ing do­copt late­ly and the for­mat of the do­copt in­put is al­most mark­down. With a bit of flair it can be both, and us­ing mark­term I can show it in a ter­mi­nal in a much nicer way than usu­al.

Ex­tra: why am I start­ing so many projects late­ly?

  • Be­cause it makes me hap­py
  • Be­cause in most cas­es it's not code that will need main­te­nance. I ex­pect mark­term to just work for the next 10 years with­out me touch­ing it, if need­ed.
  • Be­cause I can

Small Project: Crycco, a literate programming tool

I have fun in weird ways.

For ex­am­ple, I may read about a tool and then de­cide to trans­late a clone of the tool from one pro­gram­ming lan­guage to an­oth­er. And then af­ter it works but the code is aw­ful, re­write it, then re­lease it

In a week­end. It helps that the thing is small (un­der 200LOC) but it's still an in­di­ca­tion that I am hav­ing fun do­ing it.

So, what is it? I wrote Cryc­co, a Crys­tal tool sim­i­lar to Doc­co, which is a lit­er­ate pro­gram­ming tool.

A what?

A lit­er­ate pro­gram­ming tool. Mean­ing a tool that lets you:

  • Gen­er­ate a pro­gram's code out of a doc­u­ment
  • Gen­er­ate a doc­u­ment out of a pro­gram's code
  • Gen­er­ate a doc­u­ment and a pro­gram's code out of a "lit­er­ate doc­u­men­t"

That may be do­ing the op­po­site of ex­plain­ing what it ac­tu­al­ly does, so here's an ex­am­ple.

I wrote Cryc­co which if you look at it looks like a nor­mal, al­though very heav­i­ly com­ment­ed crys­tal pro­gram.

But if you pass the source code for Cryc­co through cryc­co it pro­duces the cryc­co web­site ... go take a look!

As you can see it's a de­tailed ex­pla­na­tion of what the pro­gram does and how the code work­s. It's the same thing on­ly pre­sent­ed dif­fer­ent­ly.

It's much eas­i­er to fol­low the ex­pla­na­tions when they run along­side the code rather than in­ter­rupt­ing it, is­n't it?

Let's see another example. Crycco uses a YAML file for some aspects of its configuration. Usually I would just add some comments and tell you to go and read the file.

But that would present you with ... well, YAML and com­ments. Github does an ad­mirable job con­vey­ing that data, but com­pare it with Cryc­co's lan­guages.yml

What do you think is eas­i­er to read and more un­der­stand­able?

So, there it is, and it's pret­ty us­able, even if I still want to ex­tend it in some ways (like... let's gen­er­ate Mark­down! And use that to gen­er­ate a PDF! Like it's 1989 and we are us­ing nuwe­b!)

Yes, lit­er­ate pro­gram­ming has been dead for a cou­ple of decades, but it's far from a worth­less idea. It can be used, it can be used pur­pose­ful­ly and be valu­able, and I have ideas that can be achieved us­ing Cryc­co or a sim­i­lar tool.

Hope it's use­ful for some­one else too.

Benchmarking Markdown in Crystal

I am work­ing a bit (s­low­ly) on Nicol­i­no a stat­ic site gen­er­a­tor writ­ten in Crys­tal. One of the things it does is, it ren­ders mark­down files.

Since the mark­down us­age is lim­it­ed to, like, 3 lines, I thought "Why not try all the mark­down li­braries and see which one is faster?"

So, I did.

The bench­mark is sim­ple:

  • An emp­ty Nicol­i­no site.
  • 4000 sim­ple mark­down files (a few lorem ip­sum para­graph­s)
  • Nicol­i­no com­piled in re­lease mode
  • Ren­der the whole site 10 times, av­er­age the last 7 runs
  • De­fault con­figs for ev­ery­thing

All this on this ma­chine:

        a8888b.           Host        -  ralsina@mindy
       d888888b.          Machine     -  Micro Computer (HK) Tech Limited Default string UM560 XT
       8P"YP"Y88          Kernel      -  6.9.9-arch1-1
       8|o||o|88          Distro      -  EndeavourOS
       8'    .88          DE          -  Qtile
       8`._.' Y8.         Packages    -  1340 (pacman), 1 (cargo), 18446744073709551615 (Homebrew)
      d/      `8b.        Terminal    -  zellij
     dP        Y8b.       Shell       -  fish
    d8:       ::88b.      Uptime      -  10d 18h 7m
   d8"         'Y88b      CPU         -  AMD Ryzen 5 5600H with Radeon Graphics (12)
  :8P           :888      Resolution  -  4096x2160, 1920x1080
   8a.         _a88P      CPU Load    -  15%
 ._/"Yaa     .| 88P|      Memory      -  9.2 GB/24.5 GB
 \    YP"    `|     `.
 /     \.___.d|    .'
 `--..__)     `._.'

Of course Nicol­i­no does some oth­er things be­sides ren­der­ing mark­down, so to iso­late that part I ran a cou­ple of dum­my im­ple­men­ta­tion­s:

  • NOOP It does noth­ing. When asked to com­pile a string to mark­down, it re­turns the same string.
  • TOEMP­TY It does less than noth­ing. When asked to com­pile a string to mark­down, it re­turns an emp­ty string.

So, the dif­fer­ence be­tween emp­ty and noop is prob­a­bly what it takes to ren­der some tem­plates and store the out­put in disk.

Then I ran the bench­mark for the 5 mark­down li­braries I could find:

Here is a chart show­ing the times in sec­onds for each li­brary with­out the time that is used in oth­er things (NOOP's time, which was 2.62 sec­ond­s).

Markdown libraries benchmark 0011223344550.3898.3446153846154505.25393356643360.66201.3723076923077491.63942307692310.33304.40000000000003507.68509615384620.38407.4276923076924505.25393356643365.28510.4553846153847267.0Markdown libraries benchmarkmarkdcr-discountcrystal-cmarkcr-mark-gfmluce

So, luce is much, much, much slow­er, and crys­tal-c­mark is the fastest. And cr-dis­count (MY OWN BIND­ING) is much slow­er than the oth­er­s, which is a bit dis­ap­point­ing.

On the oth­er hand, there are two sides to op­ti­miz­ing. One is choos­ing the fastest li­brary, the oth­er is not car­ing if the dif­fer­ence is small in ab­so­lute, even if it's large in rel­a­tive.

What does that mean?

This is over 4000 doc­u­ments.

So, while cr-dis­count is slow­er, it's still ren­der­ing 4000 doc­u­ments in 0.66 sec­ond­s. That's 0.000165 sec­onds per doc­u­men­t.

That's 1.65 tenths of a thou­sand of a sec­ond. That's over 6000 doc­u­ments per sec­ond.

If the nor­mal use­case was to ren­der thou­sands of doc­u­ments, then that would make a dif­fer­ence. But it's not. It's usu­al­ly 3 doc­u­ments.

So, as long as cr-dis­count has any fea­ture I need and it's not in the oth­er li­braries, it's fine to use it, and the same goes for the oth­ers (ex­cept luce, I guess, but stil­l: 757 doc­u­ments per sec­ond is not bad).

Update: Compiling discount with -O3 brings it down to 0.48, which is bettern than 0.66 but makes no difference for the conclusions.


Contents © 2000-2024 Roberto Alsina