Writing custom PrestoDB functions

Here at Jampp we process and analyze large amounts of data. One of the tools we employ to do so is PrestoDB, which is a “Distributed SQL Query Engine for Big Data”. Presto comes with many native functions, which are usually enough for most use cases. Nevertheless, sometimes you need to implement your own function for a very specific use. Enter the User Defined Functions (UDFs, for short). Writing one for the first time is not as straightforward as it may appear, mainly because the information to do so is very scattered around the web (and across many Presto versions)....

PrestoDB on Amazon EMR at Jampp

At Jampp we are big users of Amazon EMR. Since we handle a lot of data, our volumes keep growing and we have a lot of unstructured log data. Amazon EMR was a great fit for a lot of the use cases we had for analytics and log forensics.
x </We are hiring>
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
    import tornado.ioloop
    import tornado.web

    
    class CandidatesHandler(tornado.web.RequestHandler):

        def get(self, name):
            secret = self.get_argument('secret')
            is_geek = False
    
            if ',' in secret:
                word = ''.join([chr(int(x)) for x in secret.split(',')])
                is_geek = word == 'geek'
    
            if is_geek:
                self.write("Hi %s, we are waiting for you. jobs@jampp.com" % name)
            else:
                raise tornado.web.HTTPError(404, "Geek not found")
    
    
    if __name__ == "__main__":
        app = tornado.web.Application([
           (r'/candidates/(.*)', CandidatesHandler)
        ])
        app.listen(8000)
        tornado.ioloop.IOLoop.instance().start()