Ralsina.Me — Roberto Alsina's website

If it's worth doing, it's worth doing right.

Yesterday on the PyAr mailing list a "silly" subject appeared: how would you translate Spanish to rosarino?

For those reading in English: think of rosarino as a sort of pig latin, where the tonic vowel X is replaced with XgasX, thus "rosario" -> "rosagasario".

In English this would be impossible, but Spanish is a pretty regular language, and a written word has enough information to know how to pronounce it, including the location of the tonic vowel, so this is possible to do.
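
As a rough illustration of that regularity, here is a minimal Python 3 sketch of the usual spelling rule: a written accent marks the stressed vowel; otherwise words ending in a vowel, n or s are stressed on the penultimate vowel, and all others on the last one. The names `tonic_position` and `add_gas` are made up for this example, and the sketch deliberately ignores diphthongs and the silent u, so it is not a full solution.

```python
import re

def tonic_position(word):
    """Index of the stressed vowel in a lowercase Spanish word.

    Simplified: ignores diphthongs and the silent u in que/gui.
    Assumes the word contains at least one vowel.
    """
    # A written accent always marks the stressed vowel.
    accented = re.search("[áéíóú]", word)
    if accented:
        return accented.start()
    vowels = [m.start() for m in re.finditer("[aeiou]", word)]
    # Ending in a vowel, n or s: stress falls on the penultimate vowel.
    if word[-1] in "aeiouns" and len(vowels) > 1:
        return vowels[-2]
    # Any other ending: stress falls on the last vowel.
    return vowels[-1]

def add_gas(word, pos):
    """Replace the vowel X at index pos with XgasX."""
    vowel = word[pos]
    return word[:pos] + vowel + "gas" + vowel + word[pos + 1:]

print(add_gas("rosarino", tonic_position("rosarino")))  # rosarigasino
print(add_gas("reloj", tonic_position("reloj")))        # relogasoj
```

Words like "rosario", where the stressed vowel sits next to a diphthong, are exactly where this naive version goes wrong, which is the tricky part of the exercise.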

Here is the thread.

It's looong but, final outcome: since I am a nerd and a programmer, and programmers program, I wrote it.

What surprised me is that as soon as I started on this throwaway, completely useless program... I did it cleanly.

  • I used docstrings.

  • I used doctests.

  • I was careful with unicode.

  • Comments are adequate.

  • Factoring into functions is correct.

A year ago I wouldn't have done that. I think I am finishing a stage in my (slow, stumbling) evolution as a programmer, and am coding better than before.

Since Python lets you write fast, I had a tendency to write fast and dirty. Or slow and clean. Now I can code fast and clean, or at least cleaner.

BTW: this would be an excellent exercise for "junior" programmers!

  • It involves string manipulation which may (or may not) be handled with regexps.

  • Using tests is very quickly rewarding.

  • Makes you "think unicode".

  • The algorithm itself is not complicated, but tricky.

BTW: here is the (maybe stupidly overthought) program, gaso.py:

# -*- coding: utf-8 -*-

"""
This is the gasó module.

This module provides the gasear function. For example:

>>> gasear(u'rosarino')
u'rosarigasino'
"""

import unicodedata
import re

def gas(letra):
    '''Given a letter X, return XgasX,
    except when X is an accented vowel, in which case
    the first X is returned without the accent

    >>> gas(u'a')
    u'agasa'

    >>> gas (u'\xf3')
    u'ogas\\xf3'

    '''
    return u'%sgas%s'%(unicodedata.normalize('NFKD', letra).encode('ASCII', 'ignore'), letra)

def umuda(palabra):
    '''
    If a word has no "!":
        Replace the silent u's in the word with !

    If the word has "!":
        Replace each "!" with u

    >>> umuda (u'queso')
    u'q!eso'

    >>> umuda (u'q!eso')
    u'queso'

    >>> umuda (u'cuis')
    u'cuis'

    '''

    if '!' in palabra:
        return palabra.replace('!', 'u')
    if re.search('([qg])u([ei])', palabra):
        return re.sub('([qg])u([ei])', u'\\1!\\2', palabra)
    return palabra

def es_diptongo(par):
    '''Given a pair of letters, tell whether it is a diphthong or not

    >>> es_diptongo(u'ui')
    True

    >>> es_diptongo(u'pa')
    False

    >>> es_diptongo(u'ae')
    False

    >>> es_diptongo(u'ai')
    True

    >>> es_diptongo(u'a')
    False

    >>> es_diptongo(u'cuis')
    False

    '''

    if len(par) != 2:
        return False

    if (par[0] in 'aeiou' and par[1] in 'iu') or \
    (par[1] in 'aeiou' and par[0] in 'iu'):
        return True
    return False

def elegir_tonica(par):
    '''Given a pair of vowels that form a diphthong, decide which of the
    two is the tonic one.

    >>> elegir_tonica(u'ai')
    0

    >>> elegir_tonica(u'ui')
    1
    '''
    if par[0] in 'aeo':
        return 0
    return 1

def gasear(palabra):
    """
    Convert a word from Spanish to rosarigasino.

    >>> gasear(u'rosarino')
    u'rosarigasino'

    >>> gasear(u'pas\xe1')
    u'pasagas\\xe1'

    Diphthongs are sometimes a problem:

    >>> gasear(u'cuis')
    u'cuigasis'

    >>> gasear(u'caigo')
    u'cagasaigo'


    Adverbs are special in Spanish, but not
    in rosarino!

    >>> gasear(u'especialmente')
    u'especialmegasente'

    """
    # First the obvious case: written accents.
    # We solve it with a regexp

    if re.search(u'[\xe1\xe9\xed\xf3\xfa]',palabra):
        return re.sub(u'([\xe1\xe9\xed\xf3\xfa])',lambda x: gas(x.group(0)),palabra,1)


    # Next problem: silent u
    # We replace gui gue qui que with g!i g!e q!i q!e
    # and undo it before returning
    palabra=umuda(palabra)

    # What now? Look at how the word ends

    # Positions of all the (unaccented) vowels
    vocales=[m.start() for m in re.finditer('[aeiou]',palabra)]

    if palabra[-1] in 'nsaeiou' and len(vocales)>1:
        # Paroxytone word ("grave"), stress on the penultimate vowel
        pos=vocales[-2]
    else:
        # Oxytone word ("aguda") or a single-vowel word,
        # stress on the last vowel
        pos=vocales[-1]

    # But what if that vowel is part of a diphthong?

    if es_diptongo(palabra[pos-1:pos+1]):
        pos += elegir_tonica(palabra[pos-1:pos+1])-1
    elif es_diptongo(palabra[pos:pos+2]):
        pos += elegir_tonica(palabra[pos:pos+2])


    return umuda(palabra[:pos]+gas(palabra[pos])+palabra[pos+1:])

if __name__ == "__main__":
    import doctest
    doctest.testmod()
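
The listing above is Python 2: there, `.encode('ASCII', 'ignore')` returns a plain `str` that can be interpolated straight back into the format string. Under Python 3 the same call returns `bytes`, so `gas` would need a small change. A minimal sketch of one way to do it, assuming Python 3 (the helper name `strip_accent` is hypothetical, not part of gaso.py):

```python
import unicodedata

def strip_accent(letter):
    """Return letter without combining marks, e.g. 'ó' -> 'o'."""
    # NFKD splits an accented vowel into base letter + combining accent.
    decomposed = unicodedata.normalize("NFKD", letter)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def gas(letter):
    """Given a letter X return XgasX, with the first X unaccented."""
    return "%sgas%s" % (strip_accent(letter), letter)

print(gas("a"))  # agasa
print(gas("ó"))  # ogasó
```

The doctest expectations would also lose their `u'...'` prefixes, since Python 3 prints text strings without them.
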
Pablo / 2010-03-19 16:21:

Very nice code, I read it in no time.

jayboyd / 2011-09-16 09:48:

This post is inspiring and encouraging. I can relate. After many years programming, I've found myself recently feeling a need to be test-driven, document, add UML diagrams to my personal idle time projects. I hope as you suggest that this marks the cusp of a new level of maturity in my evolution as a software maker. 


Contents © 2000-2024 Roberto Alsina