Skip to main content

Ralsina.Me — Roberto Alsina's website

Unknown jewels of Unix: NoSQL

I have known about Car­lo Strozzi's NoSQL for about 6 years. I don't think many oth­ers do (it's google score is 39500, com­pared with 81200000 for MySQL) , and I think it's a shame, be­cause it's a much more in­ter­est­ing idea.

What is it? It's a re­la­tion­al data­base.

Sure, I can al­most hear you think [1] , "like Mysql!" or "like Post­gres!" and you are wrong be­cause it's very dif­fer­en­t.

NoSQL is a re­la­tion­al data­base im­ple­ment­ed in AWK and op­er­at­ed via the unix shel­l. Yes, NoSQL is the most unixy RDBMS in the world. So all of you so-­called unix lover­s, shell freak­s, you are gonna love this.

In­stalling it is per­haps a bit hard­er than it should (who knew there are no MAWK RPMs any­more!), but it seems to be packed for De­bian at least ;-)

You can learn to use NoSQL with a nice [2] tu­to­ri­al that was pub­lished on Lin­ux Jour­nal.

The ta­bles are plain text files, and lat­er you can do things like this (from the doc­s):

% column Item Cost Value < inventory |
row 'Cost > 50' | mean -l Value
Item   Cost  Value
----   ----  -----
3      80    400
6      147   13083
7      175   875
----   ----  -----
              4786

Is­n't that cool? What it does is take the in­ven­to­ry table, se­lect over col­umns Item, Cost, Val­ue the rows with Cost > 50, then cal­cu­late the mean of the Val­ue col­umn :-)

It even sup­ports re­port­s, join­s, a bazil­lion things.

Hon­est­ly, I am not sure I can find a prac­ti­cal use for it [3], but it is a great piece of soft­ware, writ­ten by a ded­i­cat­ed guy, it is ex­treme­ly orig­i­nal, and it does work. I'd even say it's pret­ty pow­er­ful, and you must ac­cep­t, it's unixy as all hel­l.

So, Car­lo Strozzi, here's a toast for you and NoSQL. You are cool.


[1] So good is my hear­ing

[2] Sad­ly it's quite old

[3] I did on­ce, but that was when I was a re­al pro­gram­mer ;-)

25.298

And no, I am not try­ing to take Aaron's spot as most fre­quent poster ;-)

Bound by Smoke I

This is what I un­der­stood of Smoke so far. I may be way of­f, since it is C++ sor­cery of a high­er lev­el than I'm used to, but I re­al­ly think I am get­ting the hang of it (and a bunch of thanks to Richard Dale and Ash­ley Win­ters who are the ones that made me un­der­stand so far. Any mis­takes a re my fault, any good thing is theirs ;-).

This piece is on­ly half of the sto­ry, though. Maybe one third.

Concept

Since Smoke's goal is to help you write bind­ings for lan­guages oth­er than C++, it pro­vides a way to ac­cess Qt's API. The orig­i­nal thing about Smoke is that it does so by pro­vid­ing you with a small­er, more dy­nam­ic API that maps on­to Qt's.

You could write a Qt C++ pro­gram us­ing Smoke as the API in­stead of us­ing Qt's. In fac­t, you can see it here writ­ten by Ash­ley Win­ter­s.

I had to re­name it hel­lo.cpp to make it work be­cause that looks like C but may not re­al­ly be C ;-)

As you can see, the Smoke ver­sion is quite a bit more com­plex than the Qt one. But that's ok, re­mem­ber that the goal is a bind­ing, which means that what you need to make your life sim­pler is less API... which is what makes the pro­gram more ver­bose.

Let's ex­am­ine the Smoke hel­lo.cpp in de­tail.

One key point is call­Method:

// call obj->method(args)
void callMethod(Smoke *smoke, void *obj, Smoke::Index method, Smoke::Stack args) {
        Smoke::Method *m = smoke->methods + method;
        Smoke::ClassFn fn = smoke->classes[m->classId].classFn;
        fn(m->method, obj, args);
}

If you have pro­grammed in Python or Ru­by or a num­ber of oth­er dy­nam­ic lan­guages you may guess what this does al­ready.

It takes as ar­gu­ments a num­ber of things which still have to be ex­plained but the main gist is there is an ob­jec­t, there is a method, there are args, and it ends call­ing ob­j->method­(args).

The first ar­gu­ment is a Smoke point­er , which is the big ob­ject in the Smoke li­brary, cre­at­ed in our pro­gram by the init_smoke func­tion.

A Smoke ob­ject is a strange beast. It con­tains a de­scrip­tion (in this case, be­cause we got qt_smoke) for the whole Qt API.

You can find class­es in it in­dexed by their names, and meth­ods for each class in­dexed by their names and types of ar­gu­ments.

That is why you can have a gener­ic call­Method that will work for any kind of ob­ject and for any method in the class, be­cause all of them are some­where in the Smoke ob­jec­t.

The sec­ond ar­gu­ment is void *obj which is the ob­ject it­self we are ma­nip­u­lat­ing. So, if you are try­ing to call QLa­bel::set­Tex­t, it will have to be a Qla­bel* cast­ed as void*.

In the hel­lo.cpp ex­am­ple, we even cre­ate these ob­jects us­ing Smoke's API (see lat­er).

The third ar­gu­ment is a Smoke::In­dex which is what Smoke us­es to find the re­quest­ed method in its method ta­ble. This In­dex we get us­ing the get­Method func­tion, which is the sec­ond key piece of code:

// given class-name and mangled function-name, return an unambiguous method ID
Smoke::Index getMethod(Smoke *smoke, const char* c, const char* m) {
        Smoke::Index method = smoke->findMethod(c, m);
        Smoke::Index i = smoke->methodMaps[method].method;
        if(i <= 0) {
                // ambiguous method have i < 0; it's possible to resolve them, see the other bindings
                fprintf(stderr, "%s method %s::%s\n",
                i ? "Ambiguous" : "Unknown", c, m);
                exit(-1);
        }
        return i;
}

Here is an ex­am­ple of a get­Method cal­l, where we are get­ting QAp­pli­ca­tion::set­Main­Wid­get

Smoke::Index method = getMethod(smoke, "QApplication", "setMainWidget#");

As you can see, we search for the method us­ing strings of the class name and method name. Ex­capt for that pesky # at the end of set­Main­Wid­get#.

That is a ba­sic ar­gu­men­t-­type­-­man­gling scheme, since there can be more than one QAp­pli­ca­tion::set­Main­Wid­get on the Qt side of the fence, we are say­ing we want the one that has an ob­ject as the first and on­ly ar­gu­men­t. Here is the key to the man­gling tak­en from smoke.h:

* The munging works this way:
* $ is a plain scalar
* # is an object
* ? is a non-scalar (reference to array or hash, undef)
*
* e.g. QApplication(int &, char **) becomes QApplication$?

I am not yet com­plete­ly clear on how this is enough to do all the work (for ex­am­ple, what hap­pens if you have two meth­ods that take dif­fer­ent ob­jects as on­ly ar­gu­men­t?) but it's what I saw :-)

The last ar­gu­men­t, args is of Smoke::S­tack type, and it's the tricky one, at least for me.

Here's how it's used in the pre­vi­ous ex­am­ple, QAp­pli­ca­tion::set­Main­Wid­get

// qapp->setMainWidget(l)
Smoke::Index method = getMethod(smoke, "QApplication", "setMainWidget#");
Smoke::StackItem args[2];
smokeCast(smoke, method, args, 1, l, "QLabel");
smokeCastThis(smoke, method, args, qapp, "QApplication");
callMethod(smoke, args[0].s_class, method, args);

A Smoke::S­tack is a way to pass the ar­gu­ments to be used with the method get­Method gave us.

We first cre­ate an ar­ray of 2 Stack­Item­s:

Smoke::StackItem args[2];

Then we as­sign a val­ue to the sec­ond of them:

smokeCast(smoke, method, args, 1, l, "QLabel");

Here l is a point­er to a QLa­bel. ( Ok, it is re­al­ly de­clared as a void* be­cause, re­mem­ber, we are not us­ing the Qt API, so we have no clue what a QLa­bel is ;-) and what we are do­ing is stor­ing in args[1] a cast­ed ver­sion of l.

The ex­act de­tails of why you have to pass smoke and method are not that im­por­tan­t, and they seem pret­ty in­volved, so I won't try to go there, at least not yet. This has to be done for each ar­gu­ment for the method.

Then we have a sim­i­lar, yet dif­fer­ent line:

smokeCastThis(smoke, method, args, qapp, "QApplication");

This puts the qapp void * in args[0], cast­ed to QAp­pli­ca­tion. There are tricky C++ rea­sons why this is done slight­ly dif­fer­ent here than on smoke­Cast, which I am not 100% sure I get right, so I will keep qui­et ;-)

This spe­cial case is on­ly for the ob­ject to which the method be­longs (the this ob­jec­t).

Here is the code for smoke­Cast and smoke­Cast­This

// cast argument pointer to the correct type for the specified method argument
// args[i].s_class = (void*)(typeof(args[i]))(className*)obj
void smokeCast(Smoke *smoke, Smoke::Index method, Smoke::Stack args, Smoke::Index i, void *obj, const char *className) {
        // cast obj from className to the desired type of args[i]
        Smoke::Index arg = smoke->argumentList[
                smoke->methods[method].args + i - 1
        ];
        // cast(obj, from_type, to_type)
        args[i].s_class = smoke->cast(obj, smoke->idClass(className), smoke->types[arg].classId);
}

// cast obj to the required type of this, which, dur to multiple-inheritance, could change the pointer-address
// from the one returned by new. Puts the pointer in args[0].s_class, even though smoke doesn't do it that way
void smokeCastThis(Smoke *smoke, Smoke::Index method, Smoke::Stack args, void *obj, const char *className) {
        args[0].s_class = smoke->cast(obj, smoke->idClass(className), smoke->methods[method].classId);
}

But where did we get l or qap­p? You can use these same mech­a­nisms to cre­ate an ob­jec­t:

void *qapp;
{
    // new QApplication(argc, argv)
    Smoke::Index method = getMethod(smoke, "QApplication", "QApplication$?");
    Smoke::StackItem args[3];
    args[1].s_voidp = (void*)&argc;
    args[2].s_voidp = (void*)argv;
    callMethod(smoke, 0, method, args);

    qapp = args[0].s_class;
}

You get QAp­pli­ca­tion::QAp­pli­ca­tion(s­calar,un­de­f) which should hope­ful­ly map to QAp­pli­ca­tion::QAp­pli­ca­tion(argc,argv). You cre­ate a Smoke::S­tack of 3 item­s. The first is un­set be­cause this is a con­struc­tor, so it has no this yet, and the oth­er two are argc and argv.

Then you call it through call­Method, and you get the re­sult­ing ob­ject via args[0].s_­class.

Lat­er you re­peat this sort of op­er­a­tion for ev­ery method call you wan­t, and you got your­self an ap­pli­ca­tion.

The binding side of things

So, how do you use this to bind your lan­guage to Qt?

Well, you will have to cre­ate an ob­ject in your lan­guage called, for ex­am­ple, QAp­pli­ca­tion, and map the "miss­ing method" mech­a­nism (y­our script­ing lan­guage prob­a­bly has one) which is used when a method is un­de­fined so that some­how it finds the right Smoke method (man­gling the name cor­rect­ly should be the hard part), then cre­ate the Smoke::S­tack with the passed ar­gu­ments, cal­l, get the re­sult, wrap it in some lan­guage con­struct you can use of the side of your lan­guage, and pass it back.

It looks in­volved, and I am sure it's trick­y, but at least it on­ly has to be done once un­like on tra­di­tion­al bind­ings, where you had to do it for ev­ery class and for ev­ery method.

The tra­di­tion­al so­lu­tion was to au­to­mat­i­cal­ly gen­er­ate the code for such wrap­ping (like SWIG does). I think Smoke is less er­ror prone.

If I keep on un­der­stand­ing things, there may be a sec­ond part of this ar­ti­cle, ex­plain­ing the Smoke­Bind­ing class, and per­haps a short ex­am­ple of how to bind Qt in­to a lan­guage (Io is a strong can­di­date).

The reg­u­lar­i­ty of Io's syn­tax is prob­a­bly go­ing to make the bind­ing sim­pler than most.

Bacula is good, backups are hard

Just fin­ished im­ple­ment­ing a net­work back­up so­lu­tion for a cus­tomer us­ing Bac­u­la and reached these com­clu­sion­s:

  • Bac­u­la is hard. But that is most­­ly be­­cause the prob­lem is hard. You need to back­­up 30 dif­fer­­ent com­put­ers over a net, in a se­cure man­n­er, with dif­fer­­ent da­­ta sets for each... well, it is go­ing to in­­­volve defin­ing 30 datasets and so on.

  • Bac­u­la is very good. It sup­­ports VSS, which al­le­vi­ates the clas­sic win­­dows "that file is open you can not read it" prob­lem.

  • Back­­ing up Win­­dows suck­­s.

    1. It is pret­­­ty hard to know what to back­­­up ex­ac­t­­­ly to catch all con­­­figs and us­er da­­­ta. Not so much on XP, but on old­er ver­­­sion­s... yikes.

    2. You pret­­­ty much can't ev­er be sure you are al­lowed to back up ev­ery­thing. I have a file­serv­er with files the lo­­­cal ad­min­is­­­tra­­­tor can not read. So, it seems a us­er can cre­ate un­back­­­u­­­pable files!

  • I still need a de­­cent sys­tem back­­up for Lin­ux. I have used Mon­­do for a long time, but it's a pain in the but­t, and it's get­t­ing more painful as time pass­es. I want some­thing that I call and I get a nice DVD with the whole sys­tem in it. If you have a sug­­ges­­tion, please drop a com­­ment ;-)

  • A sys­tem that au­­to­­mates the back­­up pol­i­­cy as much as Bac­u­la does is great. Ex­­cept that it makes it quite hard to guess the ex­act amount of stor­age (I am do­ing disk-based back­­up­s) you will need.

  • For sim­­pler stuff, you should use flexback­­up. Their slo­­gan is aman­­da is "too much", and tar­ring things up by hand is­n't near­­ly enough and it pret­­ty much is what it does.

  • For per­­son­al stuff, rdif­f-back­­up is awe­­some. I re­al­­ly should write a graph­i­­cal tool for it one of these years :-)

Ok, so, I am a lazy guy

I just re­al­ized I have not learned a whole new re­al lan­guage in al­most 5 years.

I have learned a few di­alect­s, some mi­nor stuff to work around or through an ob­sta­cle in some pro­jec­t, but I have not writ­ten a whole pro­gram in any­thing but python since 2001.

So, since I don't want my brain to turn in­to mues­li, I will now pro­ceed to learn Io .

Nice OO lan­guage, pro­to­type­-based, which is some­thing I haven't run in­to out­side of JS, I think, a sort of LISPy fla­vor, with a dash of smalltalk for pi­quan­cy.

Maybe I will get re­al­ly bored with it, maybe not :-)

And yes, the pre­vi­ous post was about hav­ing the idea of hack­ing a Qt bind­ing for it, but it's prob­a­bly not doable right now, since ex­tend­ing the lan­guage is pret­ty much un­doc­u­ment­ed, and ex­tend­ing it in C++ is un­heard of.

To top that, it us­es con­tin­u­a­tion­s, which prob­a­bly cause all kinds of hell on Qt's mem­o­ry man­age­ment :-(