nic's tumblr blog

I maintain a blog for techy stuff and ramble on twitter a bit.
Oct 19 '11

Sep 16 '11

Jul 17 '11

Jul 17 '11

Jul 17 '11

Jul 17 '11
elnode hits #1 on hackernews

Jul 17 '11

Jul 16 '11
Facebook introduce a buddy list a bit like monsterchat

Trouble is, Facebook, it’s not integrated with anything else and you still can’t do video.

Jul 12 '11

what’s your force multiplier?

I’ve been looking back over the last couple of years at WooMe and thinking about what we’ve done really well.

A friend came up with this great line:

sql is our force multiplier

What I like about this quote is that it conveys exactly the dynamic we get with SQL.

Let me explain. About 2 years ago we finally gave up on using log analytics software. We’d been using one of those traditional log parsers and it was just taking ages to get any results out of it.

We soon found that Google Analytics wasn’t going to cut it either. So we started to put Nginx access logs into a PostgreSQL database, just so we could search them a little. It was easy: you take the files straight out of Nginx, ship them to the db box and upload them with some psql magic. We also sharded the access log tables by date and added them to our site-wide shard management system. All simple, easy stuff.

I kept fretting about maybe one day building a reporting system on top of it. I kept worrying about how much that was going to cost me one day. I kept worrying about not starting it.

All the time I was worrying, we were using SQL to answer questions about our users. What’s the average time people wait for pages? For the homepage? Which browsers are people hitting logged-out pages with?
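To make that concrete, here is a minimal sketch of those kinds of questions asked of an access log table. It is not our real schema: the access_log table, its columns and the sample rows are all made up for illustration, and SQLite stands in for PostgreSQL so the example runs anywhere.

```python
import sqlite3

# SQLite stands in for PostgreSQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """create table access_log (
           path         text,
           request_time real,     -- seconds taken to serve the request
           user_agent   text,
           logged_in    integer   -- 1 if the request came from a logged-in user
       )"""
)
conn.executemany(
    "insert into access_log values (?, ?, ?, ?)",
    [
        ("/", 0.20, "Firefox", 0),
        ("/", 0.40, "Chrome", 0),
        ("/profile", 0.90, "Chrome", 1),
        ("/about", 0.30, "Firefox", 0),
    ],
)


def avg_wait(path=None):
    """Average request time over all pages, or for a single path."""
    if path is None:
        row = conn.execute("select avg(request_time) from access_log").fetchone()
    else:
        row = conn.execute(
            "select avg(request_time) from access_log where path = ?", (path,)
        ).fetchone()
    return row[0]


def logged_out_browsers():
    """Browsers hitting logged-out pages, most frequent first."""
    return conn.execute(
        """select user_agent, count(*) as hits
             from access_log
            where logged_in = 0
            group by user_agent
            order by hits desc, user_agent"""
    ).fetchall()
```

Calling avg_wait() answers the first question, avg_wait("/") the second and logged_out_browsers() the third. The point is that each new question is one more short query, not a new piece of reporting software.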

It was only this year that I realized: we had our reporting system. It was SQL. Whether the query was ad hoc or regular, we were doing it fine. We didn’t need anything more than we already had. Ok, maybe a static HTML page produced from some queries.

What was even better is that everyone in the team understood SQL to a greater or lesser degree. We all understood it enough that we could converse about business problems in SQL.

I stopped looking at GA almost entirely. I stopped having to wait for GA to tell me things. We stopped worrying about building a lot of extra logging for things and started trusting the access log database a bit more (ours is quite sophisticated now).

In other words our combined knowledge and use of SQL was causing us to avoid costs of money and time in all sorts of directions. It was multiplying the force we had as programmers and we could then use that spare time to play volleyball. Or whatever.

The last couple of weeks I have been looking around asking:

Have we done the same thing in other areas?

and I think I can say that we have, tho much less effectively.

I would say that unix is a force multiplier for me and a significant number of the tech team. But the knowledge of unix is shallower within the whole team. I think it’s probably more difficult to pick up than SQL because in part it’s a way of thinking.

Were there any other force multipliers? I looked at Python. I would say that everyone in my team can do Python. But I honestly can’t say it’s a force multiplier. It is what it is, just a programming language. A great trade off between elegance and practicality perhaps. But not really a force multiplier. If we were all proficient with BASIC instead would we be any less potent? I doubt it.

So I looked some more. I found more that nearly qualify. Here are a few:

  • HTTP - we understand HTTP quite well and we make it work a little for our site… but not nearly well enough

  • Mercurial - people at WooMe really get this and we do use it a lot… again, there is more we could do to turn it into a force multiplier instead of just a force; it’s also a stretch to call it a multiplier when it is basically just a unix program.

  • Trac - we’ve customized trac quite a bit and I would definitely say it’s multiplied some of our force a little.

There’s not much left, so I can conclude that we have only one team-wide multiplier. I definitely think it’s the most powerful, not because of what it is but because the whole team gets it.

I started to think about the same question personally. I immediately came up with one other candidate: LISP. This isn’t going to turn into a piece about LISP as a force multiplier. I’ll do that another time. But just to note that LISP taught me ALL the tricks of modern programming - closures, event driven, lightweight data modelling - long before they were popular in other tools.

So my conclusion is that the more force multipliers in a team the better, and if I could work in a team where everyone spoke good SQL, used unix shells instead of GUIs to do their own stuff and wrote code in LISP, then I would be in my happy place.

So what are your force multipliers?

Jul 11 '11

Jul 8 '11

Jul 7 '11

Jul 6 '11

Jul 5 '11

why not use https for every site?

@symroe asks why we don’t use https for every site? all the time?

I can answer for the sites I make:

  • https requires everyone to have the certs or a lot of link abstraction
    • any reference in your dev version needs to be abstracted or you need to use https there as well
    • using https in dev requires certs to be deployed everywhere
    • lots of tools still can’t use certs, Django’s dev server for example
  • https is hard to debug
    • you obviously can’t see inside the packets so you can’t see what’s going over the wire
  • https is not the best possible solution for crypto anyway, so there’s lethargy about converting the whole world to it
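The "link abstraction" in the first bullet could be as small as one helper that every generated link passes through. This is only a sketch of the idea, not code from any of my sites: the USE_HTTPS setting and the site_url name are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical setting: set to False in dev, where no certs are deployed.
USE_HTTPS = True


def site_url(url, use_https=USE_HTTPS):
    """Return url with its scheme forced to https or http."""
    parts = urlsplit(url)
    scheme = "https" if use_https else "http"
    return urlunsplit(
        (scheme, parts.netloc, parts.path, parts.query, parts.fragment)
    )
```

Note that this only swaps the scheme. A URL with an explicit port such as :443 would need more care, and that is exactly the kind of detail that makes the abstraction a chore.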

Having said that, the model of web development that is emerging now, of chrome pages pulling in data with ajax, seems like it could provide the perfect split between http and https.

Jul 1 '11

using a postgresql schema dynamically

So WooMe’s DBA has been working on this nice bit of code that we’ll publish on the making blog soon, and he had this in his PL/Python function:

r = plpy.execute("""select attname 
    from pg_attribute a 
    join pg_class c on a.attrelid = c.oid 
    where c.relname = '%s' 
    and attname not in (
        'created', 
        'parsing_result', 
        'person_id', 
        'rooturl', 
        'ctid', 
        'xmin', 
        'cmin', 
        'xmax', 
        'cmax', 
        'tableoid')""" % (args[2],))

That seemed pretty ugly to me. What he’s trying to do is make his schema extensible by selecting every column in his table that isn’t in a hard-coded list of admin columns.

This bit of code really bothered me. What would be a better way of doing that?

Well, as it happens, his schema looks like this:

  created        | timestamp with time zone 
  parsing_result | text                     
  person_id      | integer                  
  rooturl        | text                     
  source         | text                     
  medium         | text                     
  campaign       | text                     
  adgroup        | text                     
  kw             | text                     
  query          | text                     
  placement      | text                     
  network        | text                     
  keyword        | text                     
  gclid          | text                     
  content        | text                     
  vipdate        | timestamp with time zone 
  referrer       | text                     

All the fields he wants are text fields. Though not all the text fields are ones he wants.

I thought it would be good if you could mark the type of the fields you wanted… and you can:

  create domain materialized_text as text;

our schema would then look like this:

  created        | timestamp with time zone
  parsing_result | text
  person_id      | integer
  rooturl        | text
  source         | materialized_text
  medium         | materialized_text
  campaign       | materialized_text
  adgroup        | materialized_text
  kw             | materialized_text
  query          | materialized_text
  placement      | materialized_text
  network        | materialized_text
  keyword        | materialized_text
  gclid          | materialized_text
  content        | materialized_text
  vipdate        | timestamp with time zone
  referrer       | text

and our SQL?

  -- '%s' is filled in with the table name, as in the original function
  select attname
    from pg_attribute a
    join pg_class c on a.attrelid = c.oid
   where c.relname = '%s'
     and a.atttypid = (select oid
                         from pg_type
                        where typname = 'materialized_text');

much better I think.
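For what it’s worth, here is a rough sketch of how a query like that, filtered by table name, might be called from PL/Python with a prepared plan instead of % string formatting. The tagged_columns name is made up, and so is passing plpy in as an argument: inside a real PL/Python function plpy is simply available, and I only pass it in here so the helper can be tried outside the database. This is not the DBA’s actual code.

```python
# Table name and domain name are passed as parameters ($1, $2),
# so nothing is spliced into the SQL text.
QUERY = """
    select a.attname
      from pg_attribute a
      join pg_class c on a.attrelid = c.oid
     where c.relname = $1
       and a.atttypid = (select oid from pg_type where typname = $2)
"""


def tagged_columns(plpy, table, domain="materialized_text"):
    """Names of the columns of `table` that are declared with `domain`."""
    plan = plpy.prepare(QUERY, ["text", "text"])
    rows = plpy.execute(plan, [table, domain])
    return [row["attname"] for row in rows]
```

A nice side effect of filtering on the type is that the system columns (ctid, xmin and friends) drop out on their own, because none of them is a materialized_text.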