The OpenJet Project: June 2009

Saturday, June 27, 2009

Martin mentioned unique ID generation in the previous post, which reminded me of a problem we solved in a similar fashion. At the beginning of developing TripTrap the question "how do we find out if such a request was already processed?" arose. Well, that's quite easy: store it in the DB and next time check whether it's already there. What becomes a problem is "how to do that fast?", especially when we have almost twenty request parameters (and thus as many conjoined conditions in the SQL query), different types of requests are possible, some of the parameters can be omitted, etc.

The solution was the following: compute a 64-bit integer hash for the request and store the request in the DB and in memcached together with that hash. The stored request is just a binary-serialized, gzipped blob. So in the DB we have both a BLOB field and a BIGINT field. The latter indexes very well and is virtually unique, which also makes it a very good key for memcached storage.

For each incoming request we compute the hash, fetch the blob from the cache or the DB, deserialize the blob and compare it with the incoming request in Java code. In the virtually impossible case of the same hash for different requests, the only problem is that we need to fetch, deserialize and compare more than one request blob from the DB, or a list of request blobs from memcached.
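The lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not TripTrap's actual code: the post does not say which hash function is used, so SHA-256 truncated to 64 bits is an arbitrary choice here; an in-memory map stands in for memcached and the BIGINT-indexed table; a request is modelled as a plain map of parameters; and the names `RequestIndex`, `hash64` and `seenBefore` are hypothetical. It also uses modern Java syntax (Java 8 or later).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: find previously seen requests through a 64-bit hash key. */
final class RequestIndex {
    // Stand-in for memcached / the BIGINT-indexed table: hash -> requests stored under it.
    private final Map<Long, List<Map<String, String>>> store = new HashMap<>();

    /** Hashes the parameters in sorted key order, so parameter order does not matter. */
    static long hash64(Map<String, String> params) {
        StringBuilder canonical = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(params).entrySet()) {
            canonical.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(canonical.toString().getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest).getLong(); // first 8 bytes become the 64-bit key
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex); // SHA-256 is available on every Java platform
        }
    }

    /** Returns true if an equal request is already stored; otherwise stores it and returns false. */
    boolean seenBefore(Map<String, String> request) {
        List<Map<String, String>> bucket =
                store.computeIfAbsent(hash64(request), k -> new ArrayList<>());
        // A bucket normally holds one entry; more than one means a hash collision,
        // which is resolved by comparing the full requests.
        for (Map<String, String> candidate : bucket) {
            if (candidate.equals(request)) {
                return true;
            }
        }
        bucket.add(new TreeMap<>(request));
        return false;
    }
}
```

The single-key lookup replaces a query with some twenty conditions; the full comparison only runs on the handful of candidates that share the hash.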

Voilà: with plenty of RAM for memcached, we get the whole thing working as fast as possible.

Thursday, June 25, 2009

Pseudo-Uniqueness

The need to give things unique names is something that is present in many applications, from variable names to database fields. Openjet is no different and we already have two cases where we need the system to automatically generate unique identifiers.

The first thing we need a unique ID for is a visitor. In every part of the system we want to be able to refer to a certain visitor, to be able to retrieve and store information specifically for that user and track what the user does. Secondly, we are going to store points of interest, which have coordinates, a name, and a locale (language and culture setting). These three fields define the point of interest, and if they are the same as one already existing the points are considered equal.

Of course, we could get a unique ID for each item by just inserting them into the corresponding database table and letting MySQL auto-increment the ID. In the case of visitors though, we do not want someone to be able to guess other people's IDs, so it would have to be more complex than a number that increases by one for each visitor. When it comes to the points of interest we want a globally unique identifier, since it might be compared with IDs from other types of locations. Because of this we cannot rely on MySQL to generate the IDs.

So what did we do? Well, I wrote a utility class that creates hashes from input strings of course! Using the MD5 algorithm for hashing, we will get a 32-digit hexadecimal number which will be virtually unique. Now I can hear some of you think "Virtually unique? How can you be sure it will not collide with an already existing ID?". Well the short answer is that we cannot!
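For the curious, such a utility can be very small. The following is a minimal sketch rather than the actual OpenJet class: the name `HashId`, the method `md5Hex` and the choice of UTF-8 for turning the input string into bytes are assumptions made for illustration.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of a utility that turns an input string into a 32-digit hex ID. */
final class HashId {
    static String md5Hex(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            // Treat the 16 digest bytes as an unsigned number and left-pad with zeros,
            // so the result always has exactly 32 hexadecimal digits.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is available on every Java platform", e);
        }
    }
}
```

For a point of interest, the input could be the three defining fields joined into one string (coordinates, name and locale); the exact format of that string is not specified here.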

If you are interested, the long answer now follows.

If I generate a 32-digit hexadecimal number, it consists of 128 bits. 128 bits can be set in 2^128 different combinations, which means that the chance of getting the same hash for another item is 1 in 340282366920938463463374607431768211456 (39 digits). Let's say we have an unrealistically busy site, generating a million new visitors or points of interest a day. Then after a thousand years we will have generated less than 4*10^11 different hashes. This means that we have a chance of 4*10^11 in 2^128 of calculating the same hash at this point, which is a probability on the order of 10^-27.

Maybe our site is really successful and continues until no life can exist on Earth (8 Gyears from now). Well, I am not going to bore you with the calculations for this one, but at that point we would have generated about 3*10^18 different hashes. This means we would have the overwhelming probability of below 10^-20 of hitting one of the hashes we already used.
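Anyone who does want the calculations can recheck the orders of magnitude with a few lines of Java. This is a sketch under simple assumptions: a constant rate of one million new hashes per day, 365-day years, and hashes spread evenly over all 2^128 values. The class and method names are made up for the example.

```java
/** Hypothetical sketch: chance that one new 128-bit hash hits an already used one. */
final class CollisionOdds {
    static final double HASH_SPACE = Math.pow(2, 128); // number of distinct 128-bit values

    /** Number of hashes generated after the given number of years at a fixed daily rate. */
    static double hashesGenerated(double perDay, double years) {
        return perDay * 365.0 * years;
    }

    /** Chance that a single new hash equals one of the existing ones. */
    static double chanceOfHit(double existingHashes) {
        return existingHashes / HASH_SPACE;
    }

    public static void main(String[] args) {
        double thousandYears = hashesGenerated(1e6, 1e3);
        double eightGigayears = hashesGenerated(1e6, 8e9);
        System.out.printf("1000 years: %.2e hashes, chance %.1e%n",
                thousandYears, chanceOfHit(thousandYears));
        System.out.printf("8 Gyears:   %.2e hashes, chance %.1e%n",
                eightGigayears, chanceOfHit(eightGigayears));
    }
}
```

Note that this is the chance for one new hash against everything generated so far, which is the question asked here, not the chance that any two hashes in the whole set ever coincide.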

This is why I say virtually unique. The chance of getting the same ID is so small that it will probably never happen, and if it does the probability is so small that it probably did not. So I now leave it up to you to decide if this is unique enough for you!

Tuesday, June 23, 2009

Research and Development

There are philosophers who claim that the only possible knowledge is direct knowledge, things we have experienced on our own, with our own senses.

I don't share this view of knowledge. I believe we are highly capable of gathering, abstracting and understanding other people's experiences, ideas and failures, and of building upon them.

We are now approaching, at high speed, the big day of this project: the day when we are going to present a first, in some sense working, prototype of the site we are working on.

At a very early stage of this project we decided that background research and deep consideration of every major choice would be, if not the most important, at least one of the most important aspects of our working methodology.

Today I'm glad we made that choice, especially since none of us had built this kind of software before. Sure, we all had good experience in science and software development from before, but not of this particular kind.

So we did invest our first two months in intense research, architecture work, brainstorming and planning: setting up tools and looking through pretty much anything that seemed relevant, such as tools, competitors, libraries, books, frameworks, architectures, standards, etc.

Sure, there were times when at least one of us felt we might be investing too much time in research and preparations, feeling that maybe we should have got started with the coding earlier. “Get something done.”

I think he has changed his mind today. The solid research and knowledge base we built up before we got started has made an incredible development speed possible during our last month of coding: an amazingly bump-free development that would for sure not have been possible without all the time we spent reading and looking through options before we got started.

We are today, just three days ahead of the demonstration, quite well prepared and giving the final touch, with software far more functional than at least I was expecting us to be able to build in just three months, from nothing and no knowledge.

Of course it's still a very, very limited prototype, and lots of work still remains, but I feel we have done a good job, with Kenny now focusing and working hard these last days to get all the functionality and ideas we have into some presentable visuals.

I believe this project is just on the right path. I believe that if we are just allowed to continue this project, this is the site that will define the next generation of this industry.

On Friday it will be decided whether or not the management shares our beliefs...

Sunday, June 21, 2009

Demoing - is polish or functionality more important?

As has been mentioned on this blog before, the date for our demo of the OpenJet project draws near. I think all of us involved can feel a bit of the heat already. This first demo is supposed to be more of a prototype demo, so that means we won't have all the polish, and not even all features will be finalized by this date. Personally I think we have made quite a bit of progress, and just last week the pieces started to fit together and work with each other in a presentable way.

Now, the problem that I feel easily arises in these situations is how to decide what makes it into the demo. I think this is a common problem for everyone when it comes to developing software. It is easy to over-promise or just be overly confident and expect to have more functionality than can be fit into a single release. The problem with that is that cramming a lot of functionality into one release can easily mean that the features just don't work well enough. I for one am an advocate of "less is more" in the cases where it is possible.

Of course, a prototype demo isn't exactly the same as an actual software release - in our case it is implied that not all functionality and polish will be there. It is more of a way to show that progress has been made, and that the project hopefully makes use of technologies and ideas that will make it successful.

So this kind of transforms from a "what features should be included" to a "where do we draw the line" kind of problem. What is more important: more features in the release or a more visually appealing presentation? As a developer my first thought is that obviously functionality is more important, as that is what drives the website in the long run. But when I start to think about it, I am not so sure anymore. When we present this project, as with many other projects, it won't be presented only to strictly technical people but also to non-technical people who better represent the end-users.

These non-technical people, or end-users, what will they remember from the presentation and how will they relate to it? I have a tendency to think that they remember what they see, and that they will relate everything said to that. I think it is easier to describe how functionality fits into the design, rather than describe how the design affects the functionality. I also think that the greatest technical solution or feature just won't seem that great if it isn't presented in a nice way, so the danger of demoing "raw" interfaces and only functionality is that people just won't see how good it is.

How do I think this problem should be solved then? Well, as with many other things, the golden truth probably lies somewhere in between functions and design. I think that if we can present usable functionality with a design that at least makes it look like a flight search website, we can have both technical and non-technical people relate to it. By usable functionality I mean features that can be used, but that might not withstand the abuse that end-users will put them through in a production environment. It is ok for an input field to not handle wrong input correctly in a prototype phase, as long as it handles correct input in a way that makes the feature work.

I think that the design should at least be done to the level where elements are laid out in a way that makes sense, the basic color scheme is there and certain image assets are there as well.

We had this discussion internally in the team, and the above is how we have decided to do it. In the end, the application will be demoed by us, so we can show functionality that might not have worked properly if used by someone not involved in the project. That way, we can make sure our point gets through, and if people like it, we can make it "fool-proof" for a future production release.

Friday, June 19, 2009

As I already mentioned, the main focuses in developing TripTrap were speed, scalability and easy clustering. Now for an introduction to some technical details.

There were some "not very common" decisions made while developing TripTrap.

To improve performance we store binary serialised Java objects in the DB and in memcached as well. That's fast, easy and universal for all kinds of objects (in fact, TripTrap can be used not only for flight searches). The drawback of this approach is that the DB is not self-sufficient: one cannot search anything in the DB without TripTrap, so the DB is just dumb persistent storage in our case. Nevertheless, up to the present moment we have not experienced any problems with that. One more reason: we try to move the work from the DB to memcached as much as possible, and memcached is just a map, so we try to avoid involved data structures.

We use the InnoDB engine for MySQL. That may sound a little strange, but we have a lot of read-write activity, and tests showed that MyISAM doesn't differ significantly.
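As a rough illustration of the serialised-object storage described above, here is a minimal sketch using only the standard library. It assumes plain Java serialization with gzip compression (the gzip step is taken from the 27 June post about request hashing); the class name `BlobCodec` and its methods are hypothetical and not TripTrap's real code. The resulting byte array is what would go into a BLOB column or a memcached value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch: turn any Serializable object into a gzipped blob and back. */
final class BlobCodec {
    static byte[] toBlob(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the object stream also finishes the gzip stream underneath it.
        try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    static Object fromBlob(byte[] blob) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                     new ObjectInputStream(new GZIPInputStream(new ByteArrayInputStream(blob)))) {
            return in.readObject();
        }
    }
}
```

The same two methods work for any object type, which is the "universal" part; the price is the one already mentioned, that the stored bytes mean nothing to SQL.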

The trickiest thing in a cluster environment is synchronisation. In TripTrap it's done through memcached entries, so we have a relatively small overhead.

We can say TripTrap is good enough at what it was made for. Anyway, there are still issues I'm not satisfied with:

Load testing: it's done only partially and needs more test cases.
More acceptance tests: a couple of dozen exist, but there should be more.
Threads: more parallel jobs should be used (for example, 16-core servers are more or less common, so TripTrap must use all available resources).
Source code tree structure: it's time for review and moving things between packages and simplifying the stuff. We aimed at keeping the project structure as clear as possible.
Documentation. Yep, TripTrap suffers from it too.
Criticism. There are no ideal systems or solutions, no silver bullets and so on. We welcome usage experience, testing and everything else that can help us find TripTrap's problems and make it better. Or ideal :)
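To make the threading item in the list above a bit more concrete, here is one possible shape for it. This is a sketch only, assuming independent jobs (for example, one query per data source) and a pool with one thread per available core; `ParallelJobs` and `runAll` are hypothetical names, and nothing here is taken from the TripTrap code base.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: run independent jobs on as many threads as there are cores. */
final class ParallelJobs {
    /** Runs all jobs in parallel and returns their results in submission order. */
    static <T> List<T> runAll(List<Callable<T>> jobs)
            throws InterruptedException, ExecutionException {
        int cores = Runtime.getRuntime().availableProcessors(); // e.g. 16 on a 16-core server
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<T> results = new ArrayList<>();
            // invokeAll blocks until every job has finished (or failed).
            for (Future<T> future : pool.invokeAll(jobs)) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

In a long-running server the pool would more likely be created once and shared rather than built per call; it is created inside the method here only to keep the example self-contained.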

TripTrap is currently getting multi-point search (a search with not only an origin and a destination, but also some intermediate points). That's a fairly challenging feature, as it involves extracting a large amount of data to process and filter. It will be ready soon.

So it goes.

Thursday, June 18, 2009

Openjet Magic

My 30th birthday draws near, and so does another very important date. The date I am talking about is the 26th of June. This will be the date when we will show off the Openjet prototype to the other parts of the company.

The product will not be releasable yet, but it is still an important date because it will be the first time all the pieces work together, so that everyone can get a picture of what we have actually been up to. We have held presentations before but only on certain topics and I think that most people outside the project do not know exactly how it all will fit (sometimes I have felt that way myself).

After that, what will happen? Time will tell but if everything works out as planned we are in for some exciting development. We will do magic in the coming months!

Saturday, June 13, 2009

Wrong assumptions

There were several small streams that led to this OpenJet project. I'm not going to go into details of all of them here, but focus on one of the main ones, which led to the primary focus of this project.

I have been in this industry for less than 1½ years, but already from my very first day I felt there was something wrong with the flight search sites I could find out there. They all felt dumb and inflexible.

So I kept coming up with ideas on minor things, details, that could be improved, but even then I felt there was something more fundamentally wrong. As even with small changes, the big picture could not be changed.

Much later I realized this comes from an industry ideology, or history, where the main focus has been on the air carriers, the suppliers and the competitors. Never on the customers.

The customer seems to have been viewed as a dart flying toward the dartboard, with its ballistics pre-determined, eventually hitting the target, and the competition being about taking up as much room on the target as possible.

I don't share this picture of the customers. The reason Google is the most used search engine worldwide is not that it has been massively marketed; it's that it provides, in a very minimalistic and accessible way, highly relevant results – quality.

So I eventually went to a big Travel Technology Show in London, where I, among a lot of other things, attended a presentation by Peter Ballard from Foolproof, presenting the essentials of their Online Shoppers Survey Travel report. Very inspiring!

Suddenly I was sitting there with the scientific proof of what my intuition had been telling me for over a year. The very core of why the customers, the site users, were feeling so frustrated. Me among them.

One other thing hit me during that show: the mainstream of the whole industry. I brought this up several times with my working partner. To me it looked like a bunch of players fighting over the room in a corner of a wide open field, more interested in beating each other to it than in exploring the rest of the field. I was stunned.

Time moved on and there were quite a lot of changes happening in the company where I was working, eventually ending up with Stephan giving me the opportunity and trust to launch and lead this OpenJet project.

I could write quite a lot about Stephan, but there are a few things that make me respect him. One is his very humanitarian ideology, be it toward employees or customers. Another is his interest in, and perspective on, innovations being more important than successes.

I was now starting to get a good picture of what was wrong. Having their focus on the suppliers, most agents had made some fundamental assumptions, conscious or unconscious, that are simply flawed.

First, a high level of knowledge is assumed on the customer's part. The customer is expected to have a good and proper idea of what product they want, where they wanna go and during what dates, and to now be looking for the cheapest ticket providing them that trip.

Another assumption is about the customer buying that ticket as soon as they have found the cheapest one suiting their needs.

None of those are correct. The normal customer has a very weak, or fluffy, idea about where or when they wanna go. Some know where they wanna go, but are very flexible about the dates. Others are just looking for “somewhere sunny for no more than 500€”.

Few, very few, are also buying their ticket on the first visit. With no site providing them with a solid feeling of providing the information they need, they are looking around on multiple sites, exploring a multitude of different options, trying to bring their fluffy idea down into something more concrete. Not rarely being highly frustrated about the spread of the information.

Once the initially fluffy idea has been transformed into a couple of alternatives they normally disconnect. Thinking it over for a while. Discussing it with co-travelers, when there are any.

Then they keep returning to the web for more information until they feel sure enough on their choice. Be it the second visit, or much later.

Those are the new perspectives we are trying to bring into the OpenJet project. Interviewing the customers rather than the suppliers, as the answers of the latter are quite well known already.

I'm sure we will find lots of new user perspectives and inputs during this project, and I'm sure that we later on will find that some of our own initial assumptions were flawed as well, but by that time we will be open to changing them. Because this site is meant to provide the user with what he/she expects, not have him/her expect what the sites of today provide.

That's also where we want You in the process. And the reason why we try to find some time between the coding and reading to publish the process online, so that You can take part and give us Your feedback.

Because this time the site will be made for You.

Friday, June 12, 2009

The search for the perfect IDE

Now that I've started working for Travellab with the OpenJet project, I've had to adapt to one thing that I haven't been very used to before. That thing is a dedicated IDE for doing development.

As a frontend developer I am pretty used to doing all my coding in a pretty lightweight text editor, and as a Mac user the obvious choice is Textmate. I have been trying out some IDEs before, but I never really liked the way they overflow with features that I rarely use. I guess we can say that I just like to have the stuff I really need, and avoid the extra weight of features that I rarely or never need.

However, with the Java backend and the JSP frontend in the OpenJet project, it all seemed so much easier to just go with a good IDE to aid in the building of the project when I do development. So far, there have been three choices for me, and those have been Netbeans, Eclipse and IntelliJ. Maybe XCode could work for some people, but it is more suited for Cocoa work so I never really considered that as a nice Java web application IDE.

I was recommended to use Netbeans by my colleague Martin, and decided to give it a try. Now, I tend to have some ideas about things that just don't matter to other people. One of those things is how the software is visually designed. I don't know if it might be a common Mac user thing, but sometimes it seems that is the case. Maybe "we" just like software that looks as good as our operating system. :)

Anyway, at first launch Netbeans 6.5 shocked me. I don't think I have ever seen any software look so bad on OSX. I'm not blaming the Netbeans team here though, I think the look is controlled by the OS, but to be honest I'm not really sure how it works. What I am very sure about though is that it looked so awful I just couldn't stand it. The unfortunate thing though was that feature-wise, Netbeans seemed like a good choice for me.

What happened was that I moved on to test another IDE, and from another colleague, also named Martin, I got the recommendation to try out IntelliJ. So I downloaded the trial to test it out and got it up and running. I'm not sure what it was, but there was something with how IntelliJ was structured that I didn't really like. It just felt very heavy to navigate, and one thing that annoyed me was that the caret didn't jump to the end of the line when I clicked in the edit view. Instead it jumped to the position where I had clicked, be it in the middle of nowhere and nowhere close to my actual code. I'm guessing it can be turned off somewhere, but I didn't manage to find that setting anywhere.

As I have a tendency to make up my mind about what I like and dislike pretty quickly (sometimes too quickly, I confess), I decided I didn't like IntelliJ. So instead I started looking into working with Textmate and writing an Ant script to build and deploy the app when needed.

I am totally new to the whole Java eco-system, and I hadn't used Ant before. But after a while, I got it to compile the project and deploy it to my local Tomcat server. I was pretty happy with it as I was back to my lightweight favourite text editor. The problem with my setup was that I had to build and redeploy every time I had changed a JSP, JS or CSS file. And as we can guess by now, I grew tired of that as well pretty quickly.

At this stage, I was actually pretty fed up with this whole editor mess, and started to Google for more alternatives. As it turns out, I managed to find some forum post about how a development version of Netbeans was supposed to be more visually in line with the overall Mac OSX look. And as it turns out, this release had now reached Release Candidate status.

On top of that good news, in the back of my head I kind of still had the feeling of how I liked the features of Netbeans. So I went back to their site, and downloaded the 6.7 rc2 release (which I guess I had totally missed the first time).

After installing it, it was like being born again. I'm not saying it looks like a native Cocoa app (for example XCode as a comparison in this case), but it does look MUCH MUCH better than version 6.5.

So here I am, running Netbeans 6.7 rc2, and I am actually pretty happy with it. It takes care of all the heavy lifting with building and deploying for me, and it looks pretty good doing it!

Now, all of this is obviously very subjective, and I might be pretty extreme and overly sensitive regarding the software I use. But nonetheless, I think it is interesting what I had to go through, just to start with one piece of software and end up with the same one in a slightly different version.

Features will always be the biggest selling point for some people, and it might even be for me the day I get more used to all things Java. But one thing I know, design matters and even an awesome product might fail if it is not visually appealing.

Wednesday, June 10, 2009

Trip and Trap

TripTrap is the meta-search engine behind OpenJet, providing an Open Travel-conforming interface to the data gathered from numerous sites and sources. It also has a JSON-based interface suited for AJAX client applications.

The main things kept in mind while making TripTrap are speed and a good ability to scale and cluster.

TripTrap is developed using Java 1.6 and the Spring framework, with data stored in MySQL and a memcached server used for faster data extraction.

The source code will be in a git repository soon.

/Mikhail

Here we go...

This project has been going for a while now and even though we have continuously been working, it sometimes felt like progress was too slow. Now that has changed. After two really productive days things are starting to come together. All the little pieces that we have worked on before are suddenly connected into something that works as one unit. This only goes to show that pre-studies and initial preparations pay off!

Now that we see the light at the end of the tunnel we can move quickly to be able to present the prototype before the end of the month. Go go Openjet!

Sunday, June 7, 2009

OpenJet enters the world of bloggers...

Welcome to The OpenJet Project blog!

With the project site, wiki and Twitter up and running, it was time to establish the OpenJet blog!

On this blog we will keep posting small-sized news on the progress of the OpenJet Project, discussing different issues we are facing, etc. Things too minor to suit the headline news, but still way too large to be suitable tweets.

Enjoy!

/Lars
