KV: a remote KVM application

2025-07-07 14:02

I had been reading about remote KVMs for a while. There are several, like PiKVM, JetKVM, etc. I decided I wanted one to access my server at home because it has a nasty tendency to lose network connectivity, and I wanted to be able to fix it without having to go to the office and plug stuff into it.

Unlike running RDP or VNC on the server, a remote KVM looks like plain hardware to the server. It works no matter how crashed or disconnected the server is, as long as it has power. It is like having a monitor, keyboard and mouse plugged into the server, but remotely.

I had all the hardware I needed:

A Radxa Zero, which has OTG support so I could use it as a USB device.
A USB capture dongle (they are cheap and easy to find) to capture the HDMI output of the server.

But I just could not make it work. PiKVM, the most popular one, is pretty difficult to get working on anything other than the exact hardware configs they support, and those are exactly the ones I don't have.

I could not find any implementation that was easy to set up and supported the hardware I had, so I decided to write my own.

You can get the code on GitHub, of course. It compiles to a single binary. It only requires you to have ffmpeg installed, and it should just work as long as your hardware supports OTG, you have a USB capture dongle, and you plug in all the cables correctly.

One USB cable and one HDMI cable go from the server to the Radxa Zero (the HDMI via capture dongle) and you run the one binary built from this code on the Radxa Zero. It will start a web server on port 3000, and you can access it from any browser.

The web interface is very simple, but it works. You can see the video feed, and you can send keyboard and mouse events to the server. You can even provide a disk image that the server will think is a USB drive. I suppose you can even install the OS on the server that way, but I have not tried it.

If you want to access it more remotely, just set up a VPN on the KVM itself.

New project: ToCry, a ToDo/Kanban app

2025-06-21 15:56

I have been using Tasks.md for a while now and ... I like it, but I got the itch to try fixing some things and I didn't really want to do JS backend code on weekends, so why not try to build a similar app?

Also I have been wanting to do some "vibe coding" with AI and I seem to have unlimited Gemini 2.5 in VS Code for some reason, so why not try to do that too?

Well: it worked, you can see the result at ToCry, get the app and use it. It's pretty nice!

Now, I am not saying it's perfect, but it works and I learned a lot. Did AI help? Well, yeah. Most of the frontend code was AI generated, most of the backend code is mine with AI autocompleting stuff.

And this got written pretty fast, I only started it two days ago!

In fact, in those two days it was written twice... because the first time it was absolute garbage.

It was so bad I not only rewrote it, I removed the repo history and pushed the new code over it. I didn't want to keep that code near, I did not want it to influence the rewrite.

What Went Wrong The First Time

The first time I went UI-first. I just prompted and prompted and prompted on an HTML file, trying to get Gemini to generate a nice UI that worked with all the data kept on the client side.

This seemed to work well: in a few hours I had a nice UI that was at least functional. However, at one point Gemini hit a wall, hard.

Adding a new feature or fixing a bug was almost impossible. Gemini would get into loop after loop, doing and reverting the same changes over and over.

When I went to manually check the code, it was a bowl of spaghetti, and refactoring it was beyond Gemini's capabilities. Connecting it to a backend was doable (and got done) but then state was kept in both the client and the backend, Gemini refused to move it, and consistency was a nightmare.

It became increasingly clear that the code was of no value. So then I nuked it, salted the earth, and started over.

What Went Right The Second Time

The second time I went backend-first. I wrote a simple backend in Crystal, with a simple API that returned JSON data for the entities I wanted to manage: tasks, lanes, boards, etc.

After the data structures were clear, prompts like "create a PUT /note/:id endpoint that updates the note" worked great. The trick is, of course, that those requests are pretty much context-free, so Gemini didn't have to figure out how to connect one thing to another, it was just writing almost-boilerplate code.

After several of these were created, I intermingled refactoring prompts.

"Do you see any code that can be moved to a common function?"
"Please assume this and that are always true and stop checking"
"Change the names of this and that to something descriptive and short"

I still am not a huge fan of the code Gemini writes. It has a tendency to do multiple intermediate steps that are not needed and to use intermediate variables in some sort of over-descriptive notation all the time.

Ah, and it adds stupid comments. Many stupid comments. This is an example of its code:

    self.notes.each_with_index do |note, index| # Changed to each_with_index to get the index
      note.save                   # This saves the note to data/.notes/note_id.md

      padded_index = index.to_s.rjust(4, '0')
      sanitized_title = note.slug # Note instance method for slug

      symlink_filename = "#{padded_index}_#{sanitized_title}.md"
      source_note_path = File.join("..", ".notes", "#{note.id}.md") # Relative path for symlink
      symlink_target_path = File.join(lane_dir, symlink_filename)
      ...

Usually, after a function is "finished", I would go and do a pass "for taste" to make the code more readable, remove the unnecessary comments, and make it more concise, but the code is functional and this is CRUD, not a fashion show.

The same happened with the frontend code. I prompted slow, bit by bit:

Add a button to create a new lane, so when the user clicks it they are asked for the lane name. Then it calls POST /lane with the right data
Get the lane data from /lanes and display it as a horizontal list of containers
Add a button to the lane to delete it, which prompts the user for confirmation and then calls DELETE /lane/:id

And so on.

Sometimes, I tried larger prompts where I asked for a whole feature, but those were only occasionally successful. Finding the right granularity is key, and I found that the best granularity was "one function" or "one component".

Again, mixing in refactoring as we went along helped a lot to keep the code clean and organized, as well as separated into reasonable functions, keeping spaghetti at bay.

What I Learned

Gemini codes like a junior. It does not understand the big picture, it does not understand the code it writes. It can write code that works (I assume because the Internet is full of working code) but it cargo cults AF.

On the other hand, the code tends to work? And I can fix it pretty quick when it doesn't? It's not bad at simple refactoring, and what it needs more than anything is a manager.

Yes, that seems to be the bad news. Coding with AI felt like being a manager again. A manager with a very, very, very eager junior dev who doesn't sleep and feels soooo clever by describing uninitialized variables as "a classic example of a variable that has not been initialized yet".

Would I Do It Again?

Yeah. I have a ton of things I want to write, and this way I can write them faster. I can also put effort into the parts that matter, like data structures and algorithms, and let Gemini do the silly CSS and CRUD.

Yeah, I don't care if the CSS is redundant, as long as it looks ok I am happy telling Gemini to "make the rows tighter" and who cares how.

CRUD? It's a solved problem. I will do a pass to clean up.

Data structures? I will do 90% of the work, because that is what makes the base of the app, and I want it to be solid.

Ethical Thoughts

I feel dirty, and like I am cheating. I am probably stealing code from other people. Use Gemini to write a Dockerfile and it will happily autocomplete with fragments that include repo names which exist on the Internet, so it's not even a guess: I know it's copy-pasting other people's code.

OTOH, I always copy pasted code from other people's code.

Writing a simple parser using a finite state machine

2025-06-06 21:21

I wrote a literate programming tool called Crycco and at its core is a very simple parser which extracts comment blocks from source code and organizes them into a document.

If you want to see an example, Crycco's website is made out of its own code by processing it with itself.

The format is very simple: you write a comment block, then some code, then another comment block, and so on. The comment blocks are taken as markdown, and the code blocks are taken as source code.

The document is then split into sections (1 chunk of doc, 1 chunk of code) and processed to make it look nice, with syntax highlighting and so on.

The parser was just a couple dozen lines of code, but when I wanted to add more features I ran into a wall, so I rewrote it using a finite state machine (FSM) approach.

Since, again, Crycco is literate code, you can see the parser by just looking at the source code of the parser itself, which is in the Document class.

Of course, that may not be enough, so let's go into details.

The state machine has a few states:

    enum State
      CommentBlock
      EnclosingCommentBlock
      CodeBlock
    end

A comment block is something like this:

    # This is a comment
    # More comment

An enclosing comment block is a "multiline" comment, like this:

    /* This is a comment
    More comment
    */

Code blocks are lines that are not comments :-)

So, suppose you are in the CommentBlock state. If you see a line that starts with #, you stay in the same state.

If you see a line that does not start with #, you switch to the CodeBlock state.

When you are in the CommentBlock state, the line you are in is a comment. If you are in the CodeBlock state, the line is code.
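Those two rules fit in a few lines. This is only a sketch to make the idea concrete: the two-member State enum and the next_state helper are made up for this post, they are not Crycco's code, and only # comments are considered.

```crystal
# Illustration only: a two-state version of the machine. `next_state` is a
# hypothetical helper, not something that exists in Crycco.
enum State
  CommentBlock
  CodeBlock
end

# Given the current state and a line, return the state we end up in.
def next_state(state : State, line : String) : State
  is_comment = line.lstrip.starts_with?("#")
  case state
  in .comment_block?
    # A comment line keeps us here; anything else switches to code.
    is_comment ? state : State::CodeBlock
  in .code_block?
    # A comment line starts a comment block; anything else keeps us in code.
    is_comment ? State::CommentBlock : state
  end
end

state = State::CodeBlock
["# docs", "# more docs", "x = 1", "# new docs"].each do |line|
  state = next_state(state, line)
  puts "#{state}: #{line}"
end
```

After each call, the state tells you what the line you just read is: docs or code.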

Here are the possible transitions in this machine:

    state_machine State, initial: State::CodeBlock do
      event :comment, from: [State::CodeBlock], to: State::CommentBlock
      event :enclosing_comment_start, from: [State::CodeBlock], to: State::EnclosingCommentBlock
      event :enclosing_comment_end, from: [State::EnclosingCommentBlock], to: State::CodeBlock
      event :code, from: [State::CommentBlock], to: State::CodeBlock
    end

Then, to parse the document, we go line by line, and call the appropriate event depending on the line we are reading. That event may change the state or not.

For example:

        if is_comment.match(line) && !NOT_COMMENT.match(line)
          self.comment {
            # These blocks only execute when transitions are successful.
            #
            # So, this block is executed when we are transitioning
            # to a comment block, which means we are starting
            # a new section
            @sections << Section.new(@language)
          }
          # Because the docs section is supposed to be markdown, we need
          # to remove the comment marker from the line.
          processed_line = processed_line.sub(@language.match, "") unless @literate

And that's it! We send the proper events to the machine, the machine changes state, we handle the line according to what state we are in, and we end up with a nicely parsed document.
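If you want to see the whole loop in one place, here is a minimal, self-contained sketch of the same idea. It is not Crycco's actual code: TinyParser, Section and the hand-rolled comment and code events are names I made up for this example, it only understands # comments, and enclosing comments are left out.

```crystal
# Sketch under stated assumptions: hypothetical names, "#" comments only,
# no enclosing comments, no state machine library.
enum State
  CommentBlock
  CodeBlock
end

# One chunk of docs followed by one chunk of code.
class Section
  getter docs = [] of String
  getter code = [] of String
end

class TinyParser
  getter sections = [] of Section
  @state : State = State::CodeBlock

  # The `comment` event: only valid from CodeBlock. The block runs
  # only when the transition actually happens.
  def comment(&)
    return unless @state.code_block?
    @state = State::CommentBlock
    yield
  end

  # The `code` event: only valid from CommentBlock.
  def code
    @state = State::CodeBlock if @state.comment_block?
  end

  def parse(source : String) : Array(Section)
    source.each_line do |line|
      if line.lstrip.starts_with?("#")
        # Entering a comment block means a new section starts.
        comment { @sections << Section.new }
        # Docs are markdown, so drop the comment marker.
        @sections.last.docs << line.lstrip.lchop("#").lstrip
      else
        code
        # Code before any comment still needs a section to live in.
        @sections << Section.new if @sections.empty?
        @sections.last.code << line
      end
    end
    @sections
  end
end

parser = TinyParser.new
parser.parse("# Say hi\nputs \"hi\"\n").each do |section|
  puts "docs: #{section.docs.inspect} code: #{section.code.inspect}"
end
```

The important bit is that sending an event that is not valid in the current state does nothing, so the loop can fire events blindly and a new section only appears when we really move from code to comment.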

Parsers are somewhat scary, but they don't have to be. A finite state machine is a very simple way to write a parser that is easy to understand and maintain, and it is often enough.

Have fun parsing!

Literate version of grafito code (WIP)

2025-06-03 13:03

In the past couple of weeks I have started (and pretty much finished) a tool called Grafito and the end result is under two thousand lines of code, including HTML and CSS.

For smaller codebases, I think it makes sense to make them literate. Just a couple of hours writing around the code can make it perfectly clear for anyone starting with the codebase and (to be honest) also for the me of the future who will remember nothing about it.

So I am publishing the commented codebase of grafito processed through Crycco, a literate programming tool I wrote. Yes, the website for Crycco is Crycco's source code. That's traditional :-)

The code is not yet fully commented and I have found a couple bugs in Crycco already:

Links in the sidebar are wrong in some cases
There is no way to publish a literate HTML file!

In any case, I expect nobody cares, but I think it's nice and it's not a ton of effort so that makes it worth doing, so I did it.

New project: Grafito, a friendly frontend for your logs

2025-05-28 14:01

TL;DR: If you go to https://github.com/ralsina/grafito you can get a single binary which will give you a web frontend for your logs.

Slightly longer version:

It supports filtering with different criteria
It uses very, very few resources
It takes no or very little configuration

Enjoy!

PS: This project is an implementation of this post

Ralsina.Me — Roberto Alsina's website

Posts about crystal