If you have ever used a computer, you have probably noticed something puzzling: computers are incredibly fast, and yet sometimes they feel painfully slow.
Why is it that video games get more impressive every year, while a simple web page can stall for seconds, displaying a frustrating spinning wheel?
This question is at the heart of performance engineering.
Everything a computer does takes at least two resources: time, and memory.
Since the very invention of the computer, engineers have worked to do things more efficiently, that is, using less time and less memory.
When they succeed, we get exciting games; when they fail, we get... well, frustrated.
One way to make programs faster is to look at collections.
When programmers want to group some data together, they use a collection.
One example of a collection is the list of contacts in a phone.
Each collection supports a few operations that programs use to modify or query it.
We can add elements to a collection, delete elements from it, and search it for particular values.
In the last fifty years, people have come up with different types of collections, which have different performance properties.
For some, inserting is very cheap, but deleting is expensive.
For others, deleting is cheap, but searching is expensive.
So, if we want programs to be fast, we need to consider how a collection is used, and choose carefully.
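To make this trade-off concrete, here is a small illustrative Java sketch (not code from the thesis): two of Java's standard collections, `ArrayList` and `HashSet`, support the same basic operations but pay very different costs for searching.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CollectionTradeoffs {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        Set<String> set = new HashSet<>();

        // Adding is cheap in both collections.
        list.add("alice");
        set.add("alice");

        // Searching differs: a list is scanned element by element
        // (linear time), while a hash set jumps straight to the right
        // bucket (constant time on average).
        boolean inList = list.contains("alice"); // O(n)
        boolean inSet = set.contains("alice");   // O(1) on average

        System.out.println(inList + " " + inSet); // prints "true true"
    }
}
```

With a handful of elements the difference is invisible, but on a list of millions of contacts, searching an `ArrayList` means potentially scanning all of them, while a `HashSet` answers almost immediately.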
Software developers can think very hard about which collection to use, but because code is complicated, choosing wisely is difficult.
When they choose poorly, the program still works, but it is slower than intended.
In this thesis, we try to speed up programs by suggesting to the developer which collections they should use, based on how they use them.
We try to build a tool akin to a ``code spellchecker'', which will suggest fixes to the developer's code, so that the program runs faster.
We focus on the Java programming language, because it is very popular.
To build such a tool, we need to answer a couple of questions.
The first one seems trivial, but it is not: How do we even measure how long a program takes?
We can take a stopwatch and measure the time it takes to run the program.
The problem is that computers are very complicated, so if we measure twice, we'll get different numbers.
Indeed, there are so many things influencing our results!
Worse, if we measure on another machine, we might get completely different numbers!
A significant part of this thesis is about what influences the execution time of a program, and how to make meaningful comparisons between running times.
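The naive ``stopwatch'' approach can be sketched in a few lines of Java (an illustrative sketch, not the methodology developed in the thesis): time the same work twice with `System.nanoTime()`, and the two numbers will almost never match, because caches, the just-in-time compiler, and the operating system all interfere.

```java
public class StopwatchDemo {
    // Time one run of a small, arbitrary task, in nanoseconds.
    static long timeOnce() {
        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += i; // some arbitrary work to measure
        }
        long elapsed = System.nanoTime() - start;
        // Use the result so the JIT compiler cannot remove the loop.
        if (sum == 42) System.out.println("unreachable");
        return elapsed;
    }

    public static void main(String[] args) {
        // The same code, measured twice, rarely yields the same number.
        System.out.println("run 1: " + timeOnce() + " ns");
        System.out.println("run 2: " + timeOnce() + " ns");
    }
}
```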
The second question is: Can we predict the performance of a collection?
Indeed, if we want a tool that suggests option A over option B, it needs to have a rough idea of how long A and B would take to run.
The tricky part is that we want to know this without running the programs, because a single run can take seconds, minutes, hours, or even days.
Therefore, we would prefer the computer to guess.
To be able to guess, the computer needs to learn.
This is our third question: How can we learn effectively about collections' strengths and weaknesses?
In this thesis, we ask the computer to generate artificial (short) programs, run them, and learn from them.
Here's a metaphor.
To learn about the strengths and weaknesses of collections, we build a ``race-track'' for collections.
This race-track is a list of operations (insert this, delete that, search for something) in sequence.
Then, we organize a race.
We try different collections on the same race-track, to see how well they do.
Small programs like these are called micro-benchmarks.
Usually, people write them by hand; instead, we have the computer generate them, and it generates a lot of them.
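The race metaphor can be sketched in Java (a hypothetical illustration; the encoding of operations and the generator here are invented for this sketch, not the thesis's actual format): generate one fixed sequence of insert/delete/search operations, then time each contestant collection over that same sequence.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.Random;

public class CollectionRace {
    // A "race-track": a fixed, randomly generated sequence of operations.
    // Encoding (illustrative): 0 = insert, 1 = delete, 2 = search.
    static int[][] generateTrack(int length, long seed) {
        Random rnd = new Random(seed);
        int[][] track = new int[length][2];
        for (int i = 0; i < length; i++) {
            track[i][0] = rnd.nextInt(3);   // which operation
            track[i][1] = rnd.nextInt(100); // which value it acts on
        }
        return track;
    }

    // Run one contestant over the track; return its time in nanoseconds.
    static long race(Collection<Integer> contestant, int[][] track) {
        long start = System.nanoTime();
        for (int[] op : track) {
            switch (op[0]) {
                case 0 -> contestant.add(op[1]);
                case 1 -> contestant.remove(Integer.valueOf(op[1]));
                case 2 -> contestant.contains(op[1]);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int[][] track = generateTrack(10_000, 42); // same track for everyone
        System.out.println("ArrayList:  " + race(new ArrayList<>(), track) + " ns");
        System.out.println("LinkedList: " + race(new LinkedList<>(), track) + " ns");
        System.out.println("HashSet:    " + race(new HashSet<>(), track) + " ns");
    }
}
```

Because the track is generated from a seed, the computer can produce thousands of different tracks cheaply and learn from all of the races at once.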
So, how well does this work?
In our case, we run into a problem: some collections beat the competition in many, many cases.
That sounds great: why not use those collections all the time, then?
It turns out that when we plug those collections into some real programs (that people have written), we do not observe significant differences.
So in practice, the tool we've built is not effective at optimizing those programs.
We finish this thesis by looking at other approaches that people have tried in the past.
We investigate how some real programs use collections, and try to reproduce the results that others have found earlier.
We see that even if their methods worked when they published their studies, they do not seem to work very well anymore.