Skip to main content

Ralsina.Me — Roberto Alsina's website

Using duckdb to make CSV files talk

Some­times you want to ask da­ta ques­tion­s. And of­ten that da­ta is in a CSV. Sure, you can write a quick Python script and use that to ex­tract the in­for­ma­tion you wan­t. Or you can im­port it in­to a data­base and use SQL.

But TIL the eas­i­est thing is to just ask the duck.

The duck is Duck­DB here.

Why? Be­cause you can use SQL queries di­rect­ly on CSV files.

For examples, let's use a random CSV called luarocks-packages.csv I have lying around:

It starts like this:

cassowary,,,,,,marsam alerque

And how do I query it? Well, sup­pose I want to find all pack­ages where alerque is one of the main­tain­er­s:

> duckdb
v1.0.0 1f98600c2c
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D select name from 'luarocks-packages.csv' where maintainers like '%alerque%';
│   name    │
│  varchar  │
│ cassowary │
│ cldr      │
│ fluent    │
│ loadkit   │
│ penlight  │

And boom! There you go. So, if you know even some very ba­sic SQL (and you should!) you can lever­age duck­db to ex­tract in­for­ma­tion from CSV files quick­ly, re­li­ably and in a re­peat­able man­ner.

Which is awe­some!

Contents © 2000-2024 Roberto Alsina