September 14, 2019

Using scripting languages for number crunching


In the past, the task of number crunching was handled very well with the C/C++ language. The optimizing compiler generates efficient machine code and forces the programmer to use the existing ressources optimal. A C++ library which calculates prime numbers is the most efficient way in software engineering. That means, the C/C++ is the perfect choice for such applications.
Surprisingly the situation has changed nowadays. Scripting language were used not only for prototyping reasons, but especially for number crunching. On the first look, it doesn't make much sense to calculate prime numbers or to plan the trajectory of a robot arm with a python script, because both tasks are numerical intensive tasks which needs a lot of cpu ressources. On the other hand, a slow scripting language gives the programmer the opportunity to think twice about the implemented algorithm. He knows, that the Python interpreter is not very efficient so there is a need to search for a fast algorithm first, before programming a single line of code. The idea is to ignore a potential speedup by the C compiler by the factor 5, and search for an algorithm which improves the program by the factor 500.
The paradox success of Python wasn't limited to GUI prototyping and interactive applications but has become obvious for number crunching applications. The question is not how to convert an existing algorithm into efficient machine code, but the problem has to do with algorithm identification and discuss existing one in academic papers. The python language has become a quasi standard for such applications and has replaced C/C++ in number crunching tasks.
Prime numbers
On the first look, a prime number generator has to do with a certain algorithm which has to be implemented. The following codesnippet is moderately fast.

def isPrime(n) : 
    if (n <= 1) : 
        return False
    if (n <= 3) : 
        return True
    if (n % 2 == 0 or n % 3 == 0) : 
        return False
    i = 5
    while(i * i <= n) : 
        if (n % i == 0 or n % (i + 2) == 0) : 
            return False
        i = i + 6
    return True

In contrast to a naive prime number testing, it will check only for 6. value in the for loop which reduced the amount of computational effort. It's surprisingly to know, that far more efficient prime algorithms are known. To get access to them is not a question of mathematics, but using a search engine for asking the Stackoverflow website. After a bit of search we will identify a Stackoverflow post which gives a more efficient implementation:
def primes(n):
  """ Returns  a list of primes < n """
  sieve = [True] * n
  for i in range(3,int(n**0.5)+1,2):
      if sieve[i]:
          sieve[i*i::2*i]=[False]*((n-i*i-1)//(2*i)+1)
  return [2] + [i for i in range(3,n,2) if sieve[i]]

Now we can compare both versions in term of speed. The first one calculates the prime numbers from 0 to 3000000 in 15 seconds on a standard computer. The second version is doing the same task in only 0.3 seconds. That's a 50x times speed up.

The reason for the enourmous speed up was, the Stackoverflow forum. On the same place different users have argued pro and against certain algorithm and other have found the information with the search engine. That means, the speed up was not the result of using efficient machine language nor it has to do with a certain mathematical approach, but it is located in the gutenberg galaxy. Python is a language which supports the communication about different algorithms. Python programs are human readable. This is interesting for number crunching applications.