Tuesday, July 14, 2009

Where do we get our data for comparison?

In case anyone missed it, we have been developing a flight-comparison site for the last three months, so let's do some comparing. Hold on, something is missing. By definition, a comparison needs something to compare, and that is where our suppliers come in. Suppliers in this case can be airlines or travel agencies: in short, anyone who can deliver flight data and has a site where you can book the tickets.

In a simplified and uncached form a search goes something like this:
  1. The user starts a search on the OpenJet site
  2. The OpenJet application tells the TripTrap meta-engine the details of the search
  3. TripTrap translates the request for each supplier and sends it to them in their format
  4. The suppliers return data to TripTrap, which translates it to a unified format
  5. The OpenJet site fetches the unified data and does its magic
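
Steps 2 to 4 boil down to an adapter per supplier that translates our request into the supplier's format and translates the reply back. The sketch below is a minimal illustration under our own assumptions: `SupplierAdapter`, `meta_search`, the field names and the fake supplier are all hypothetical, not TripTrap's actual code, and suppliers are called one after another for simplicity.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class SearchRequest:
    """Unified search request sent from the site to the meta-engine (step 2)."""
    origin: str
    destination: str
    date: str  # ISO date, e.g. "2009-07-14"


@dataclass(frozen=True)
class Offer:
    """Unified result format that the site reads back (steps 4 and 5)."""
    supplier: str
    price_eur: float
    departure: str  # "HH:MM"


@dataclass(frozen=True)
class SupplierAdapter:
    """Hypothetical per-supplier adapter; the names are illustrative only."""
    name: str
    to_native: Callable[[SearchRequest], dict]  # step 3: unified request -> supplier format
    call: Callable[[dict], list]                # transport: web service, page fetch, ...
    from_native: Callable[[list], List[Offer]]  # step 4: supplier reply -> unified format


def meta_search(request: SearchRequest, adapters: List[SupplierAdapter]) -> List[Offer]:
    """Translate the request for each supplier, call it, and merge the unified offers."""
    offers: List[Offer] = []
    for adapter in adapters:
        native_request = adapter.to_native(request)
        native_reply = adapter.call(native_request)
        offers.extend(adapter.from_native(native_reply))
    return sorted(offers, key=lambda offer: offer.price_eur)


# A fake supplier whose "native" format uses its own field names.
fake_air = SupplierAdapter(
    name="FakeAir",
    to_native=lambda r: {"from": r.origin, "to": r.destination, "day": r.date},
    call=lambda query: [{"cost": 120.0, "dep": "09:15"}, {"cost": 95.0, "dep": "07:48"}],
    from_native=lambda rows: [Offer("FakeAir", row["cost"], row["dep"]) for row in rows],
)

offers = meta_search(SearchRequest("ARN", "LHR", "2009-07-14"), [fake_air])
print([(o.price_eur, o.departure) for o in offers])  # [(95.0, '07:48'), (120.0, '09:15')]
```

Keeping all supplier-specific knowledge inside the adapter means that a new supplier connection, well documented or not, only touches one place.
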
So for this to work we need contracts with suppliers, as well as certain information from them. In my earlier work I dealt a lot with suppliers for our old meta-engine, and along the way I spent some time thinking about what we will need from them and what they will need from us. Two important areas come to mind immediately.

First comes the issue of technical documentation. In some cases it is quite good, with clear examples and instructions on how to use a nice web service. In other cases the result may need to be retrieved directly from a web page, in some obscure JavaScript-encoded format, with no documentation whatsoever. If we want to develop new supplier connections efficiently, this is something to think about. Would it be feasible to say “no” to suppliers without good documentation? Can we afford to turn down a good supplier just because they lack documentation?

Of course, you could always have a contact person to talk to if you run into problems while developing the supplier connection. This brings me to the second important area: having a technical contact address that is not tied to a specific person. It should also be an address that is unlikely to change if the structure of the company changes. Too many times in the past I have tried to get hold of a technical contact for a supplier whose connection had stopped working, only to find that no one replied to that address, or that the mail ended up in marketing and got passed around for two weeks before I got a first reply.


There are of course other pieces of information that are important for us and for the suppliers we use, but these two areas have affected my work the most in the past, and I feel they are key to an efficient development process and bug handling.

Friday, July 10, 2009

Cluster-wide search

Well, after some refactoring, TripTrap is finally committed to the git repository.

Besides a lot of refactoring, rearrangement, broken-test fixes and optimization, the main recent improvement is that each TripTrap instance in a cluster environment is, in some sense, aware of what the other nodes are doing. We had a fundamental problem here: each cluster node could perform the same search even if it had already been launched on another node. That doesn't happen very often, but such redundancy overloads the system, and it can cost us money. Nobody likes paying for exactly nothing.

Synchronization through the DB could be implemented, but that's not a scalable solution: the DB would become a bottleneck before long. The solution, again, is to use memcached and place a mark in it like "I'm searching this and that for this supplier, account and request", then delete the mark once the search is over. For polling requests this avoids redundancy entirely. For synchronous requests, which wait for all possible responses, there is a more complex strategy along the lines of "if such searches are running on other nodes, first launch the local ones, then check again whether some of the remote searches have completed", and so on. Memcached is fast and highly scalable, so I don't expect any bottlenecks there.
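
The basic mark can be sketched with memcached's `add` command, which stores a key only if it does not already exist, atomically on the server. This is a minimal sketch under our own assumptions: `FakeMemcache` is an in-process stand-in for a real client so the example runs on its own, `mark_key` and `search_once` are hypothetical names, and only the polling case is shown, not the more complex synchronous strategy.

```python
import hashlib
import time
from typing import Callable, Dict, List, Optional, Tuple


class FakeMemcache:
    """In-process stand-in for a memcached client, for illustration only.

    memcached's `add` stores a key only if it does not already exist, and the
    server does this atomically, so it can serve as a cluster-wide mark.
    """

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}

    def _live(self, key: str) -> Optional[Tuple[str, float]]:
        entry = self._data.get(key)
        if entry is not None and entry[1] <= time.time():
            del self._data[key]  # expired, like a memcached TTL
            return None
        return entry

    def add(self, key: str, value: str, ttl: int) -> bool:
        if self._live(key) is not None:
            return False  # another node already holds the mark
        self._data[key] = (value, time.time() + ttl)
        return True

    def get(self, key: str) -> Optional[str]:
        entry = self._live(key)
        return None if entry is None else entry[0]

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


def mark_key(supplier: str, account: str, request: str) -> str:
    """memcached keys are limited to 250 bytes without whitespace, so hash the parts."""
    raw = f"{supplier}|{account}|{request}".encode("utf-8")
    return "search:" + hashlib.sha1(raw).hexdigest()


def search_once(cache: FakeMemcache, node: str, supplier: str, account: str,
                request: str, do_search: Callable[[], List[str]],
                ttl: int = 120) -> Optional[List[str]]:
    """Run do_search unless another node has already marked the same search.

    Returns the results, or None when the search is already running elsewhere
    (a polling client would then pick up that node's results later).
    """
    key = mark_key(supplier, account, request)
    if not cache.add(key, node, ttl):
        return None
    try:
        return do_search()
    finally:
        cache.delete(key)  # the search is over: remove the mark


cache = FakeMemcache()
key = mark_key("FakeAir", "acct-1", "ARN-LHR-2009-07-14")
cache.add(key, "node-a", 120)            # node A starts the search
print(search_once(cache, "node-b", "FakeAir", "acct-1", "ARN-LHR-2009-07-14",
                  lambda: ["offer"]))    # None: node B skips the duplicate
cache.delete(key)                        # node A finishes
print(search_once(cache, "node-b", "FakeAir", "acct-1", "ARN-LHR-2009-07-14",
                  lambda: ["offer"]))    # ['offer']
```

The TTL is a safety net so that a crashed node's mark eventually disappears; it should be comfortably longer than the slowest supplier search, otherwise a second node could start the same search before the first one finishes.
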

Now there are two different implementations of search, configured in the Spring context: a cluster-aware one, which uses cluster-wide synchronization, and a single-instance one, which is simpler and faster.

So it goes.

Tuesday, July 7, 2009

Soft numbers

There is an interesting tendency nowadays, especially among economists, to present things in exact figures. As if they were true.

In particular, and surprisingly enough, this is almost always the case when it comes to predictions: estimates of the future. How can an estimate of the future ever be exact?

You often hear economists state things like “We expect growth of 11% over the next 6 months”.

How can he state something like that? What he has in fact done is take some highly uncertain, often estimated, figures, apply them to one or several economic and/or mathematical models of his choice(!), and finally select(!) the most suitable output from them.

Now, how can he claim the outcome of that to be true?! At best it might be a fair estimate, but as for the exact numbers, the only thing we can be sure of is that they are definitely not true. In fact, we can be practically certain that we will not hit that exact number.

The same thing applies to how people make their filter selections on most flight-search sites. The figures they enter are preferred figures, not exact ones, in the same way that the economist is presenting an estimate, not an exact prediction. Still, most systems interpret the user input as if it were the user's exact wish.

If a user, for example, sets a filter to “leave after 08:00”, that is in most cases because he would prefer to leave after that time, rarely because it is an absolute need. If I were to offer him a flight at 07:48 that is 200€ cheaper, he's very likely to take it.

In OpenJet we are trying to build this into the design. Our current approach is not to hide the results that most sites would have filtered out, but rather to highlight the ones that are currently within the user's preferences.

This lets the user see at all times what has been “filtered out” of his listings, helping him decide whether it's worth leaving that one hour earlier or not.
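
The idea of treating a filter as a preference rather than a hard cut-off can be sketched as follows. This is a minimal illustration under our own assumptions, not OpenJet's actual code: `Flight`, `Listed` and `soft_filter` are hypothetical names, times are minutes after midnight, and "saving" is measured against the cheapest flight that does match the preference. The sample data reproduces the 07:48 example above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Flight:
    departure: int  # minutes after midnight, e.g. 8 * 60 for 08:00
    price_eur: int


@dataclass(frozen=True)
class Listed:
    flight: Flight
    highlighted: bool          # inside the user's stated preference
    minutes_early: int         # how far before "leave after" it departs (0 if highlighted)
    saving_eur: Optional[int]  # vs. cheapest highlighted flight; None if none is highlighted


def soft_filter(flights: List[Flight], leave_after: int) -> List[Listed]:
    """Treat "leave after" as a preference: keep every flight, highlight the matching ones."""
    preferred_prices = [f.price_eur for f in flights if f.departure >= leave_after]
    best_preferred = min(preferred_prices) if preferred_prices else None
    listed = []
    for f in sorted(flights, key=lambda fl: fl.price_eur):
        listed.append(Listed(
            flight=f,
            highlighted=f.departure >= leave_after,
            minutes_early=max(0, leave_after - f.departure),
            saving_eur=None if best_preferred is None else best_preferred - f.price_eur,
        ))
    return listed


flights = [Flight(8 * 60 + 30, 450), Flight(7 * 60 + 48, 250)]
for item in soft_filter(flights, leave_after=8 * 60):
    print(item.highlighted, item.minutes_early, item.saving_eur)
# False 12 200
# True 0 0
```

Nothing is removed; the list simply carries enough information ("12 minutes early, 200€ cheaper") for the user to judge the trade-off himself.
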

Thursday, July 2, 2009

Some notes on the presentation

Now that the prototype presentation/demo has been done, we've had some time to think about how to proceed with the project. We also took quite a bit of useful information away from the presentation, in the form of comments from the people attending.

Overall I think that our demo was a success, and I really feel that we have come pretty far in the short time we have been working on this project. Obviously, not everything works well enough for a production release yet, but the main skeleton is there and working. We are all satisfied that we held this prototype presentation quite early, as it minimizes the risk of doing the "wrong thing". It is very easy to go blind staring at your own code every day, and with this demo we got fresh eyes looking at the application.

One feature that seemed to really catch the attendees' interest is how we look up locations. By using our own database we can very quickly supply locations connected with airports, using autocomplete in the input fields. But the really cool thing is that if the search string isn't in our database, we ask Google via the Maps API. Usually, Google can find what you're looking for (down to a very detailed level, I must say). It then replies with the coordinates of the location, and from those we can fetch the nearby airports.

It was pretty interesting to demo that feature, as it can find really, really small villages and get their airports. This means that you no longer have to search for an airport, or for a city you know has airports; you can simply search for where you want to go, and we'll find you the airports.
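
The lookup order described above (own database first, then a geocoder, then airports near the returned coordinates) can be sketched like this. It is a minimal illustration under our own assumptions: the geocoder is an injected stub standing in for the Google Maps call, the 150 km radius, all function names and the approximate coordinates are made up for the example, and distance is the standard haversine great-circle formula.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two points on a spherical Earth."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def autocomplete(prefix: str, places: Dict[str, List[str]], limit: int = 5) -> List[str]:
    """Suggest known place names from our own database as the user types."""
    p = prefix.strip().lower()
    return sorted(name for name in places if name.startswith(p))[:limit]


def find_airports(query: str,
                  places: Dict[str, List[str]],
                  airports: Dict[str, Coord],
                  geocode: Callable[[str], Optional[Coord]],
                  radius_km: float = 150.0) -> List[str]:
    """Local database first; otherwise geocode the query and return nearby airports."""
    key = query.strip().lower()
    if key in places:              # fast path: place already linked to airports
        return places[key]
    coord = geocode(query)         # fallback: external geocoder (stubbed here)
    if coord is None:
        return []
    by_distance = sorted((haversine_km(coord, pos), code) for code, pos in airports.items())
    return [code for dist, code in by_distance if dist <= radius_km]


# Illustrative data with approximate coordinates; the geocoder is a stub.
places = {"stockholm": ["ARN", "BMA"]}
airports = {"ARN": (59.652, 17.919), "BMA": (59.354, 17.942), "GOT": (57.663, 12.280)}
stub_geocoder = {"sigtuna": (59.617, 17.724)}

print(autocomplete("sto", places))                                     # ['stockholm']
print(find_airports("Sigtuna", places, airports,
                    lambda q: stub_geocoder.get(q.strip().lower())))  # ['ARN', 'BMA']
```

Injecting the geocoder as a function keeps the external dependency out of the core logic, which also makes the fallback path easy to test without network access.
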

Another thing that the people attending our presentation appreciated was the fact that we search while you filter the results. This means that we cut a lot of the waiting time you normally get while doing a search. In many cases, the search results will load in an instant.

But people also had some more sceptical comments, and that is where this gets really interesting. As an example, we have tried to redesign the results page to avoid the very common problem of "losing your results". So what does that mean? Well, search results pages commonly use filters to remove the results that aren't of interest to you. The problem with this is that your results usually disappear from the screen when you change your filters, so it is hard to relate what you see on the screen to the alternatives, since the alternatives aren't visible at all. Sometimes it is even worse: the results are filtered by the search itself, returning only a very specific result set. This means you have to perform a new search just to change a price range, or maybe a set of dates.

We tried an alternative approach to this problem, where we display quite a lot of results based on very broad settings relative to the user's search. We then apply the filters on the client side to weed out irrelevant results. This means that we never have to perform a new search as long as the user doesn't want to change travel locations. Visually, we chose to hide irrelevant results by folding them. So result items that, for example, fall outside the user's price range will fold up, leaving only a header with the price visible. The user won't have to see the irrelevant result item itself, just a header indicating that "for this price, there is a flight that might interest you". This allows the user to play with the filters and get a picture of how they might need to change their flight plans to get onto a cheaper flight.
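
The folding behaviour can be sketched as a pure rendering function over results that were fetched once by a broad search. In OpenJet this happens client-side in the browser; the Python below only illustrates the logic, and `Result`, `render`, the text layout and the sample flights are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Result:
    price_eur: int
    summary: str  # e.g. "ARN 07:48 - LHR 09:55"


def render(results: List[Result], min_price: int, max_price: int) -> List[str]:
    """Apply the price filter client-side without dropping anything.

    Items inside the range are shown in full; the rest fold to a price header.
    """
    lines = []
    for r in sorted(results, key=lambda res: res.price_eur):
        if min_price <= r.price_eur <= max_price:
            lines.append(f"{r.price_eur} EUR  {r.summary}")
        else:
            lines.append(f"[+] {r.price_eur} EUR")  # folded: header only
    return lines


# One broad search, fetched once; moving the price slider only re-renders.
results = [Result(450, "ARN 08:30 - LHR 10:35"), Result(250, "ARN 07:48 - LHR 09:55")]
print(render(results, 300, 500))  # ['[+] 250 EUR', '450 EUR  ARN 08:30 - LHR 10:35']
print(render(results, 200, 500))  # ['250 EUR  ARN 07:48 - LHR 09:55', '450 EUR  ARN 08:30 - LHR 10:35']
```

Because filtering never discards data, changing the range is just another call to `render`, with no new search against the suppliers.
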

We felt that we had found a neat solution to a common problem. But we got comments indicating that users might have trouble understanding what they see. This way of displaying the results is fairly unconventional, and might not be as easy to grasp as we had first believed. Basically, the comments indicated that it was just too much to take in for a user who is used to a standard top-to-bottom results list.

In cases like this, we are very lucky that we held this presentation early, for several reasons. We could try out these alternative solutions without extreme risk, as we knew that people would see and comment on them before it was too late to change. This gives us a very strong feeling of freedom. By working in small steps and keeping the process very transparent, we can try these things. If it works, it works; if it doesn't, no huge harm done.

There is no real risk of pushing a lemon into production; people will simply let us know before it gets too far. That is, if we get it wrong. This freedom also means that we might come up with something new that turns out really good, just because we have room to take these "risks".

The OpenJet project is in fact a very big project, but we treat it somewhat like a smaller one, focusing on the small things one at a time. I am sure that we will succeed with this project, as it is not only about us developers; it is also about all the other great people we work closely with. These are the people who give us invaluable input during our presentations.