Skip to main content

Ralsina.Me — Roberto Alsina's website

Using duckdb to make CSV files talk

Some­times you want to ask da­ta ques­tion­s. And of­ten that da­ta is in a CSV. Sure, you can write a quick Python script and use that to ex­tract the in­for­ma­tion you wan­t. Or you can im­port it in­to a data­base and use SQL.

But TIL the eas­i­est thing is to just ask the duck.

The duck is Duck­DB here.

Why? Be­cause you can use SQL queries di­rect­ly on CSV files.

For examples, let's use a random CSV called luarocks-packages.csv I have lying around:

It starts like this:

name,src,ref,server,version,luaversion,maintainers
alt-getopt,,,,,,arobyn
bit32,,,,5.3.0-1,5.1,lblasc
argparse,https://github.com/luarocks/argparse.git,,,,,
basexx,https://github.com/teto/basexx.git,,,,,
binaryheap,https://github.com/Tieske/binaryheap.lua,,,,,vcunat
busted,,,,,,
cassowary,,,,,,marsam alerque
cldr,,,,,,alerque
compat53,,,,0.7-1,,vcunat
cosmo,,,,,,marsam

And how do I query it? Well, sup­pose I want to find all pack­ages where alerque is one of the main­tain­er­s:

> duckdb
v1.0.0 1f98600c2c
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D select name from 'luarocks-packages.csv' where maintainers like '%alerque%';
┌───────────┐
│   name    │
│  varchar  │
├───────────┤
│ cassowary │
│ cldr      │
│ fluent    │
│ loadkit   │
│ penlight  │
└───────────┘

And boom! There you go. So, if you know even some very ba­sic SQL (and you should!) you can lever­age duck­db to ex­tract in­for­ma­tion from CSV files quick­ly, re­li­ably and in a re­peat­able man­ner.

Which is awe­some!


Contents © 2000-2024 Roberto Alsina