<?xml version="1.0"?>
<!-- name="generator" content="blosxom/2.0" -->
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

<rss version="0.91">
  <channel>
    <title>sjh - mountain biking running linux vegan geek spice</title>
    <link>https://www.svana.org/sjh/diary</link>
    <description>mtb / vegan / running / linux / canberra / cycling / etc</description>
    <language>en</language>

  <item>
    <title>[comp/prog] An interesting languages comparison</title>
    <pubDate>Mon, 01 Jun 2009 15:45:00 </pubDate>
    <link>https://www.svana.org/sjh/diary/2009/06/01#2009-06-01_01</link>
    <description>&lt;!-- 2009-06-01 15:45:07 --&gt;

I got the link to this from
&lt;a href=&quot;http://bakeyournoodle.com/~tony/diary/&quot;&gt;Tony&lt;/a&gt;, and it is interesting
to see the results of these tests.
&lt;a href=&quot;http://gmarceau.qc.ca/blog/2009/05/speed-size-and-dependability-of.html&quot;&gt;The
speed, size and dependability of programming languages&lt;/a&gt; uses code from the
Computer Language Benchmarks Game to compare 72 different languages.

&lt;p&gt;

Back in 1999 and 2000 I put online a pretty trivial example of a single problem
being solved in multiple languages: scanning html for
&lt;a href=&quot;https://svana.org/sjh/entity/&quot;&gt;entities&lt;/a&gt;. I did it largely because I was
mildly interested in how different languages, and the different implementations
of them, might solve the same problem, and how long each would take. I say mildly
interested because it is such a trivial example and because I did not put much
effort in. (I was amazed a few weeks ago to get an email from someone
rerunning these tests to see if recent Java implementations had caught up to C yet.)

&lt;p&gt;

The author of the speed, size and dependability post put in a lot more
effort and was able to draw some interesting conclusions about
languages and how they work and develop over time. For the geeks out there, I
recommend having a look.</description>
  </item>
  <item>
    <title>[comp/prog] Move a little thing to python</title>
    <pubDate>Thu, 08 May 2008 13:44:00 </pubDate>
    <link>https://www.svana.org/sjh/diary/2008/05/08#2008-05-08_01</link>
    <description>&lt;!-- 2008-05-08 13:44:01 --&gt;

At ANU there is an online (web page) searchable phone database for all ANU
phone numbers. A few years ago (July 2002, according to the version control
dates) I spent an hour or two writing a command line program in perl that
queries this and prints the results. I find it much easier to use a command
line application than to open a tab in a web browser, find the appropriate
page and enter a query when all I want is a simple bit of information back. I
suspect most of the staff in this department (Computer Science) are similar.

&lt;p&gt;

Sometime last year I realised that though the URL I was using on the ANU
Internal Web still worked, it seemed not to interface with the latest phone
database for the uni, so it sometimes did not match people I knew worked on
campus, and other times it contained out of date numbers. However, there
were more important uses for my time, so I did not bother looking too closely
into updating it while the old results were still good enough most of the time.

&lt;p&gt;

Finally, this week Bob noticed there were no matches coming back; it seems the
old interface no longer connected to the database correctly. So I opened the
program and had a look at updating it. The old program used LWP to fetch the
page with a GET request. The newer interface now on the ANU Web works properly
with a POST request. Also, the result page is more complex to parse than the
old one (more complex regular expressions, or maybe a small state machine, are
needed). Still, it did not look too hard to spend an hour or so fixing up the
old perl code to fetch the new page and parse it properly for the desired
results.

&lt;p&gt;

However, I hit a snag: for some reason LWP did not fetch the entire result
from the web server, which was returning the data in chunks. A tcpdump session
showed it simply closed the request rather than fetching all the data. At this
point I could have debugged the perl code and fixed it; after all, there is no
good reason LWP should not work. However, I thought to myself, I have been keen
to write a bit of python for a while. Bob bought the Mark Lutz Programming Python
book for my office and I have read through about half of it. So why not rewrite
the program in python, and see how a perl hacker can transfer to using python,
at least for a small program?

&lt;p&gt;

I am happy to say that the page fetching in python made even perl look
complex; the code that did the job (and worked, doing a POST request fine) was

&lt;p&gt;

&lt;pre&gt;
   name = ' '.join(sys.argv[1:])
   params = urllib.urlencode({'stype': 'Staff Directory', 'button': 'Search', 'querytext': name})
   f = urllib.urlopen(searchuri, params)
   r = f.read()
&lt;/pre&gt;
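&lt;p&gt;

(In python 3 the urllib module was later split up, so the same fetch would
look roughly like the sketch below. The search URL and form field names here
are just the assumptions carried over from the snippet above, not a real
endpoint, and the request itself is left commented out.)

```python
# Python 3 sketch of the same POST; urllib was split into
# urllib.parse (for urlencode) and urllib.request (for urlopen).
# The URL and field names are assumptions copied from the old code.
import sys
import urllib.parse
import urllib.request

searchuri = "https://example.anu.edu.au/phonesearch"  # hypothetical

name = " ".join(sys.argv[1:])
params = urllib.parse.urlencode({
    "stype": "Staff Directory",
    "button": "Search",
    "querytext": name,
}).encode()  # POST bodies must be bytes in python 3

# Passing a data argument makes urlopen issue a POST request:
# r = urllib.request.urlopen(searchuri, params).read()
```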

&lt;p&gt;

Cool, I thought, this is hellishly easy, what a fantastic language; I will forever
give up my perl ways if everything is this easy and obvious. Obviously this
was not going to last, I guess partly because my brain meshes with perl well
after so many years, and I am used to perl associative arrays, classes,
modules, and regular expressions. Anyway, I now had my result from the search
and all I had to do was parse it and extract a form that could be printed
nicely on a terminal.

&lt;p&gt;

First I tried the python regular expression matching, and needed to
create some hideous regexp to match the data returned. I also discovered that
when a search matches more than about two people the data is returned in a
different format. Fortunately, in this second case the format is really easy to
match with a regexp. Even though the regexp language is
similar (if not identical) to perl's, I was still getting my head around the
documentation for all of what I was doing and could not at first construct a
regexp that made sense for the first sort of data. So I decided to use an HTML
parser and extract the data I wanted without the crap in the tags.

&lt;p&gt;

My first attempt was to use the
&lt;a href=&quot;http://docs.python.org/lib/module-HTMLParser.html&quot;&gt;HTMLParser&lt;/a&gt;
module; however, I soon found that this threw an exception whenever I fed it
the page from the uni with the matches in it. I tried except: pass in the
hope it would keep going, however it stopped there and did not process the
rest of the page. So I had to change to using
&lt;a href=&quot;http://docs.python.org/lib/module-htmllib.html&quot;&gt;htmllib.HTMLParser&lt;/a&gt;,
which was almost as easy to use and managed to process the entire
page.
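&lt;p&gt;

(In python 3 both of these modules were later folded into html.parser, which
tolerates sloppy markup by default. The sketch below only shows the general
idea of pulling the text out from between the tags; the sample row is made
up, not the real uni page, and the angle brackets are spelled with chr()
purely so this snippet stays valid inside an RSS description.)

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the text content of a page, ignoring the markup."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)

# A made-up table row of the sort the uni page returned; chr(60)
# and chr(62) are just '(' angle-bracket ')' characters.
lt, gt = chr(60), chr(62)
page = lt + "tr" + gt + lt + "td" + gt + "Jane Doe" + lt + "td" + gt + "x12345"

parser = TextExtractor()
parser.feed(page)
parser.close()
print(parser.chunks)  # -> ['Jane Doe', 'x12345']
```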

&lt;p&gt;

Next I wanted to store the data until all the matches were found; in perl this
would be trivial using a multi-level hash or an array of hashes. Of course,
the most obvious way to do this in python, now that I think about it, is a list
of dicts. However, I had my brain stuck on using a multi-level hash. I found
this more difficult in python, as you need to initialise dict entries and
cannot simply assign arbitrarily into them as you need. I needed to use the
following construct.

&lt;p&gt;

&lt;pre&gt;
if (D.has_key (key1) == 0):
   (D[key1]) = {}

if ((D[key1]).has_key (key2) == 0):
   D[key1][key2] = ''

s = D[key1][key2]
D[key1][key2] = s + data
&lt;/pre&gt;
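&lt;p&gt;

(For what it is worth, the standard library already has shorter spellings of
that construct: dict.setdefault, or collections.defaultdict, which
autovivifies on first access much like a perl hash. A quick sketch.)

```python
from collections import defaultdict

# setdefault collapses the existence checks into a single call.
D = {}
D.setdefault('key1', {}).setdefault('key2', '')
D['key1']['key2'] += 'data'

# defaultdict autovivifies missing entries on first access, much
# like assigning straight into $H{key1}{key2} in perl.
H = defaultdict(lambda: defaultdict(str))
H['key1']['key2'] += 'data'

print(D['key1']['key2'], H['key1']['key2'])  # -> data data
```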

&lt;p&gt;

That construct is obviously a bit more verbose than the perl vernacular of
$H{key1}{key2} = $s;. I think that dicts not yet working this easily is a
problem, however &lt;a href=&quot;http://davyd.livejournal.com/&quot;&gt;someone&lt;/a&gt; has
assured me that future python releases will have dicts that can work as
easily as a perl hacker would expect. Anyway, rather than moving on to the
(now that I think about it) obvious list of dicts, I was still stuck on the
idea of using a pair of keys to access a value, so a tuple seemed the obvious
way to keep the data in a dict. However, this meant that when I extracted the
values from the dict I could not simply use len on the collection, as it
does not accurately reflect the number of records.

&lt;p&gt;

This of course was the perfect chance to go and learn how to use map and
lambda in python; after all, I use map in perl often and it really is lovely to
have functional capabilities in a language you program in. Using a number as
one of the record keys, I was then able to have constructs such as (after
refactoring to a list of dicts I did not need the high = expression and
modified the second expression slightly)

&lt;p&gt;

&lt;pre&gt;
high = max (map (lambda k: k[0], D.keys()))
&lt;/pre&gt;

and

&lt;pre&gt;
name, phone, address = map (lambda k: D[(i,k)],['Name', 'Phone', 'Address'])
&lt;/pre&gt;

&lt;p&gt;

The first finds the number of records from the numeric key and the second
extracts the information I was interested in printing. The second especially is
the sort of thing often done in perl with a (0..N) or range(N) construct
when you gather the results of multiple function calls into a list, such as
the perl expression

&lt;pre&gt;
my @emails = map { $res-&gt;getvalue ($_,0); } (0..$res-&gt;ntuples-1);
&lt;/pre&gt;
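&lt;p&gt;

(The python spelling of that perl map is a list comprehension over range; the
result object below is a made-up stand-in for $res, not a real database
handle.)

```python
# Hypothetical stand-in for perl's $res result handle, giving it
# the same ntuples/getvalue shape the perl snippet above assumes.
class FakeResult:
    def __init__(self, rows):
        self.rows = rows
    def ntuples(self):
        return len(self.rows)
    def getvalue(self, row, col):
        return self.rows[row][col]

res = FakeResult([("a@example.org",), ("b@example.org",)])

# Equivalent of: my @emails = map { $res->getvalue ($_,0); } (0..$res->ntuples-1);
emails = [res.getvalue(i, 0) for i in range(res.ntuples())]
print(emails)  # -> ['a@example.org', 'b@example.org']
```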

&lt;p&gt;

The final problem I had was when printing the data, in perl and c I can do

&lt;pre&gt;
printf (&quot;%-20s %-12s %46s&quot;, name, phone, address)
&lt;/pre&gt;

However, in python the string formatting in print did not justify or cut off
arguments as I expected. Also, string.rjust and string.ljust do not limit the
size of strings larger than the field width. So I needed to do the
following.

&lt;p&gt;

&lt;pre&gt;
   print &quot;%s %s %s&quot; % (name[0:30].ljust(30), \
                       phone.rjust(12), \
                       address[0:45].rjust(45))
&lt;/pre&gt;
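&lt;p&gt;

(It turns out %-style formatting can do both jobs at once: a precision on a
%s conversion truncates the string, just as it does in C's printf, so the
three slice/ljust/rjust calls above can collapse into one format string. The
sample values below are made up.)

```python
# A precision on %s truncates; a width justifies. So %-30.30s
# left-justifies in 30 columns AND cuts the value off at 30 chars,
# and %12.12s / %45.45s right-justify and truncate likewise.
name = "Somebody With A Very Long Name Indeed"
phone = "x12345"
address = "CSIT Building"

line = "%-30.30s %12.12s %45.45s" % (name, phone, address)
print(line)  # name is cut to 'Somebody With A Very Long Name'
```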

&lt;p&gt;

That final concern is not really a problem, and it is arguably clearer as to
what is going on than the printf formatting a c programmer is used to. Anyway,
if anyone who works at ANU wants to use this from a command line, or anyone
wants to see it, I have it
&lt;a href=&quot;https://svana.org/sjh/various/anu_phonesearch.py&quot;&gt;online for
download/viewing&lt;/a&gt;. There may be a few places I can clean this up further,
and the version online is stripped of comments. I can understand why people
like the way python works; the code really is almost like pseudo code in many
ways, and most of the time it works the way you expect it to. It is a little
hard to wrap my perl-oriented brain around, however I expect that will not
take long to get past. Also, to anyone complaining about whitespace formatting
in python: IMO you are deranged, needing whitespace for program layout really
is not an issue.
</description>
  </item>
  </channel>
</rss>