  • HTML5 (and Flash) MP4 videos

    crodas 4:11 am on October 24, 2010

    Long time no write! I’ve been very busy this past year hacking on some cool projects (most of them are hosted on my GitHub account) and working remotely for some cool clients (Opendrive and Meneame, among others). Best of all, I have a brand new job. Now I’ve finally found a little time and, of course, something interesting to write about on this blog.

    Last week I was busy building an online video player for a customer. It was a nice challenge, and I thought it would be a pretty easy task. In this post I describe what I did to get it working on the backend (the frontend is not a big deal: you only need a jQuery HTML5 player with a Flash fallback).

    The first thing I had to build was a super simple FIFO queue to process video conversions, because I didn’t want to eat up all the CPU on the servers; right now it converts 10 videos at a time. For this task I created a tiny web server using NodeJS, which is pretty awesome for spawning child processes, and best of all it runs as a single process (no forks, no locks, no threads). Because the queue is exposed through the web server, I can create tasks and check their status through a simple REST interface, which is pretty neat.
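    The NodeJS queue itself isn’t shown in this post, so as a rough illustration of the same idea (a FIFO queue that never runs more than a fixed number of conversions at once, plus a status lookup that a REST layer could expose), here is a minimal Python sketch. The `ConversionQueue` class, its method names and the status strings are hypothetical, and jobs are plain callables; a real worker would launch ffmpeg through `subprocess`.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConversionQueue:
    """Hypothetical FIFO queue running at most `max_running` jobs at once."""

    def __init__(self, max_running=10):
        # Pending work is handed to idle worker threads in submission order.
        self._pool = ThreadPoolExecutor(max_workers=max_running)
        self._jobs = {}
        self._lock = threading.Lock()

    def submit(self, job_id, fn, *args):
        """Queue a job; in practice fn would run ffmpeg via subprocess.run()."""
        with self._lock:
            self._jobs[job_id] = self._pool.submit(fn, *args)

    def status(self, job_id):
        """Return 'unknown', 'queued', 'running', 'done' or 'failed'."""
        with self._lock:
            future = self._jobs.get(job_id)
        if future is None:
            return "unknown"
        if future.running():
            return "running"
        if not future.done():
            return "queued"
        return "failed" if future.exception() is not None else "done"

    def shutdown(self):
        """Wait for all queued and running jobs to finish."""
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    names = ["a.avi", "b.avi", "c.avi"]
    jobs = ConversionQueue(max_running=2)
    for name in names:
        jobs.submit(name, time.sleep, 0.1)  # stand-in for a real conversion
    print({name: jobs.status(name) for name in names})
    jobs.shutdown()
    print({name: jobs.status(name) for name in names})
```

    Because `ThreadPoolExecutor` hands pending work to idle workers in the order it was submitted, the pool size alone gives the "N at a time, first in, first out" behaviour. Unlike the single-process NodeJS version this uses threads, but each thread would mostly just wait on its ffmpeg child process. An HTTP handler (not shown) would map a POST to `submit()` and a GET to `status()`.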

    With the skeleton built, I had to Google for the right ffmpeg parameters to produce an MP4 version of each video that would work both with HTML5 (Chrome, and Safari I think) and with Flash (JWPlayer, for Firefox, IE, etc.).

    After hours of Googling and testing I ended up using these parameters:

    ffmpeg -i <input> -vcodec libx264 -vpre hq -vpre ipod640 -b 250k -bt 50k -acodec libfaac -ab 56k -ac 2 -s 480x320 -g <gop_size> <output>

    That command was perfect for our use case, fast and with good quality.

    Nonetheless, I ran into some issues. For instance, the Flash player couldn’t start playing a video until it was completely downloaded, which was pretty annoying for large videos and/or slow connections. To my surprise, it worked perfectly fine with HTML5 (on the latest Chrome for openSUSE). Watching Apache’s log, I found out that Chrome was making a couple of partial requests: the first to fetch the video’s metadata, and the second to fetch the video itself. In other words, ffmpeg had written the metadata at the end of the file, so a player that reads the file from the beginning, like the Flash one, had to download everything before it could start playing.
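    Those partial requests are ordinary HTTP range requests: the browser sends a `Range: bytes=START-END` header and the server answers `206 Partial Content` with a matching `Content-Range` header. Apache handles this for static files out of the box; the Python sketch below (hypothetical helper names, single ranges only) shows roughly what a server has to work out from that header.

```python
def parse_range(header, file_size):
    """Turn a single 'bytes=START-END' Range header into an inclusive (start, end).

    Returns None if the header is absent, malformed, a multi-range request, or
    unsatisfiable; a server would then send the full file (200) or a 416 error.
    """
    if not header or not header.startswith("bytes=") or file_size <= 0:
        return None
    spec = header[len("bytes="):].strip()
    if "," in spec or "-" not in spec:
        return None  # multi-range requests are not handled in this sketch
    start_text, end_text = spec.split("-", 1)
    try:
        if start_text == "":
            # Suffix form: "bytes=-500" asks for the last 500 bytes.
            length = int(end_text)
            if length <= 0:
                return None
            return max(file_size - length, 0), file_size - 1
        start = int(start_text)
        # Open-ended form: "bytes=500-" runs to the end of the file.
        end = int(end_text) if end_text else file_size - 1
    except ValueError:
        return None
    if start < 0 or start >= file_size or end < start:
        return None
    return start, min(end, file_size - 1)


def content_range(start, end, file_size):
    """Value of the Content-Range header for a 206 Partial Content response."""
    return f"bytes {start}-{end}/{file_size}"


print(parse_range("bytes=0-1023", 5_000_000))  # (0, 1023)
print(parse_range("bytes=-1024", 5_000_000))   # (4998976, 4999999)
```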

    I spent tons of hours researching how to move the metadata to the beginning of the file, ideally using ffmpeg alone, but I couldn’t find a way (if you know how to do it, please ping me).

    Thankfully, I found a tiny program that does it pretty quickly: MP4Box, which is part of GPAC.

    MP4Box -inter 0.5 <output>
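    An MP4 file is a sequence of boxes (also called atoms), each starting with a 4-byte big-endian size and a 4-byte type. The metadata lives in the `moov` box and the media data in `mdat`, and progressive playback works when `moov` comes first, which is what the MP4Box step above arranges. This Python sketch (hypothetical helper names) walks the top-level boxes so you can check a converted file before publishing it.

```python
import struct


def top_level_boxes(data):
    """Yield (type, offset, size) for each top-level box in MP4 bytes."""
    offset, total = 0, len(data)
    while offset + 8 <= total:
        size, raw_type = struct.unpack(">I4s", data[offset:offset + 8])
        header_len = 8
        if size == 1:
            # A 64-bit "largesize" follows the type field.
            if offset + 16 > total:
                break
            (size,) = struct.unpack(">Q", data[offset + 8:offset + 16])
            header_len = 16
        elif size == 0:
            # Size 0 means the box runs to the end of the file.
            size = total - offset
        if size < header_len:
            break  # corrupt size; stop instead of looping forever
        yield raw_type.decode("latin-1"), offset, size
        offset += size


def moov_before_mdat(data):
    """True if the 'moov' (metadata) box comes before the first 'mdat' box."""
    order = [box_type for box_type, _, _ in top_level_boxes(data)]
    if "moov" not in order or "mdat" not in order:
        return False
    return order.index("moov") < order.index("mdat")
```

    Usage would be something like `moov_before_mdat(open("video.mp4", "rb").read())`. Reading the whole file into memory keeps the sketch short; for large videos you would read only each 8- or 16-byte header and seek past the box body.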

    After sorting out these unexpected issues, everything is working like a charm :-)

    P.S.: Fixed some typos (it is 5am here)

     
  • ActiveMongo

    crodas 1:47 am on February 27, 2010
    Tags: ActiveMongo

    It’s been a while since my last post; I was quite busy with work and some personal projects. I created my first PECL package (and therefore now have my @php.net email address), which I built to accomplish a personal goal that is already working in my sandbox (I’ll cover this later on this blog).

    This time I will talk about MongoDB and the simple yet efficient ActiveRecord class I wrote in less than a week to make using MongoDB from PHP even easier.

    If you already have some experience with MongoDB, you might be wondering: why did you create it? MongoDB is already very simple. That’s right, but I wanted to make it even easier and friendlier, so I focused on these things:

    • Keep it simple, stupid.
    • Easy iteration.
    • Optimized updates.
    • MongoDB is already good at queries; don’t wrap it.

    Of course, you can do all this with plain MongoDB, but it’s a bit tricky, especially optimized updates (updating only what has changed).

    Typical ActiveMongo code looks like this:

    <?php
    
    ActiveMongo::connect("testing_db", "localhost");
    
    class User extends ActiveMongo
    {
        /* Define our User's document properties, */
        /* just to make our code readable, not really needed */
        public $username;
        public $password;
        public $country;
        public $address;
    }
    
    $user = new User;
    $user->username = "crodas";
    $user->password = "password";
    $user->country  = "Paraguay";
    $user->address[0] = array('address1' => 'foobar', 'city' => 'Asuncion', 'zip' => 'xxxx');
    /* Insert */
    $user->save();
    $user->password = "another_password";
    /* Update, only the password would be updated. */
    $user->save();
    ?>

    So far there is nothing new, except that for updates, instead of sending the whole object back, ActiveMongo performs a diff between the document as it was loaded and the object’s current properties, and generates a special update document using $set and $unset that is sent to MongoDB. Again, this operation is very simple, but the changes can be hard to track by hand; look at the next example:

    <?php
    /* .... */
    $user->address[0]['address1'] = 'Bar';
    unset($user->address[0]['zip']);
    $user->address[1] = array('address1' => 'another address', 'city' => 'Asuncion');
    $user->save();
    ?>

    In this case, instead of sending the whole User object, ActiveMongo sends only the following document to MongoDB:

    {
    '$set' : {
         'address.0.address1' : 'Bar',
         'address.1' : {'address1' : 'another address', 'city' : 'Asuncion'}
        },
    '$unset' : {'address.0.zip' : 1}
    }

    That is quite hard to generate by hand for every collection, and sending the entire document back is a waste of resources (network, I/O and so forth).
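    ActiveMongo’s own PHP implementation isn’t reproduced here; the following is a minimal Python sketch of the same idea: a recursive diff between the document as it was loaded and its current state, emitting MongoDB dot-notation paths such as `address.0.zip`. The function name is hypothetical, and it assumes documents are plain dicts (with string keys) and lists.

```python
def diff_update(old, new):
    """Return a {'$set': ..., '$unset': ...} update that turns `old` into `new`."""
    set_ops, unset_ops = {}, {}

    def as_mapping(value):
        # Lists are addressed by index in MongoDB dot notation ('address.0').
        if isinstance(value, list):
            return {str(i): item for i, item in enumerate(value)}
        return value

    def walk(before, after, path):
        before, after = as_mapping(before), as_mapping(after)
        for key, value in after.items():
            full_path = path + key
            if key not in before:
                set_ops[full_path] = value  # new field or new array element
            elif isinstance(value, (dict, list)) and type(value) is type(before[key]):
                walk(before[key], value, full_path + ".")  # recurse into containers
            elif value != before[key]:
                set_ops[full_path] = value  # changed value (or changed type)
        for key in before:
            if key not in after:
                unset_ops[path + key] = 1  # field removed

    walk(old, new, "")
    update = {}
    if set_ops:
        update["$set"] = set_ops
    if unset_ops:
        update["$unset"] = unset_ops
    return update
```

    Fed the before and after versions of the user from the example above, it produces the same `$set`/`$unset` document. One caveat of this approach: MongoDB’s `$unset` on an array element sets that element to null rather than removing it, so a fuller implementation might `$set` the whole array when it shrinks.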

    Another important feature (at least for me :-) is data validation. I implemented it in a simple way (at least I think so). Suppose that in the User collection we want to store the password hashed with SHA1, and that the username can’t be changed. This can be done as follows:

    <?php
    class User extends ActiveMongo
    {
        public $username;
        public $password;
        public $country;
        public $address;
    
        function username_filter($value, $past_value)
        {
            if ($past_value != null && $value != $past_value) {
               throw new FilterException("Can't change username");
            }
            if (!preg_match("/^[a-z][a-z0-9\-]+$/", $value)) {
               throw new FilterException("Invalid username");
            }
        }
    
        function password_filter(&$value)
        {
            if (strlen($value) < 5) {
                return false; /* same as throwing an exception */
            }
            $value = sha1($value);
        }
    }
    ?>

    Nice, isn’t it? Of course, filters can’t check whether a field exists; they can only validate it if it exists. If you need to ensure that every document has certain properties, you can use the pre_save hook, which receives the operation (‘create’ or ‘update’) as its first parameter and the document that will be saved as its second.

    class User extends ActiveMongo
    {
        /* ... previous code ... */
        function pre_save($op, $document)
        {
            $to_check = array('username', 'password');
    
            switch($op) {
            case 'create':
                /* every required field must be present on insert */
                foreach ($to_check as $field) {
                   if (!isset($document[$field])) {
                       throw new FilterException("Missing field {$field}");
                   }
                }
                break;
            case 'update':
                /* required fields must not appear in the $unset part of the update */
                foreach ($to_check as $field) {
                    if (isset($document['$unset'][$field])) {
                        throw new FilterException("The field {$field} can't be unset");
                    }
                }
                break;
            }
        }
    }

    If people request it (and if it proves useful), this checking could be automated somehow (e.g., by implementing a checkFields() method that returns an array of required fields). Meanwhile, I find this approach pretty useful and friendly. This hook could also be used to check whether the current user has permission to perform a create or an update (useful for a CRM). Currently ActiveMongo supports only three hooks: pre_save, on_save (after save()) and on_iterate (when it moves to the next record).
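    To make the order of these steps concrete, here is a Python sketch of how per-field filters, pre_save and on_save could fit around a save. It is not ActiveMongo’s code: the `Document` base class is hypothetical, the filters return a replacement value instead of modifying it by reference as the PHP `password_filter` does, and, as an assumption, filters run only for fields whose value changed (so an already-hashed password is not hashed again).

```python
import hashlib


class FilterException(Exception):
    """Raised when a filter or hook rejects a document."""


class Document:
    """Hypothetical base class: filters -> pre_save -> write -> on_save."""

    def __init__(self, **fields):
        self.fields = dict(fields)
        self._stored = None  # None until the first save, which is a 'create'

    def save(self):
        op = "create" if self._stored is None else "update"
        past = self._stored or {}
        values = dict(self.fields)
        for name, value in list(values.items()):
            if name in past and past[name] == value:
                continue  # assumption: unchanged fields are not re-filtered
            field_filter = getattr(self, f"{name}_filter", None)
            if field_filter is None:
                continue
            result = field_filter(value, past.get(name))
            if result is False:
                raise FilterException(f"Invalid value for {name}")
            if result is not None:
                values[name] = result  # filter returned a replacement value
        self.pre_save(op, values)
        self._stored = dict(values)  # a real class would write to MongoDB here
        self.fields = dict(values)
        self.on_save()
        return op

    def pre_save(self, op, document):
        pass

    def on_save(self):
        pass


class User(Document):
    def username_filter(self, value, past_value):
        if past_value is not None and value != past_value:
            raise FilterException("Can't change username")

    def password_filter(self, value, past_value):
        if len(value) < 5:
            return False
        return hashlib.sha1(value.encode()).hexdigest()

    def pre_save(self, op, document):
        if op == "create":
            for field in ("username", "password"):
                if field not in document:
                    raise FilterException(f"Missing field {field}")
```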

    The most important part is how you query your database. Luckily, MongoDB has a very flexible query method (think of it as compiled SQL with no joins :-) ), and because MongoDB is already kickass at queries, there is no need to abstract it.

    class User extends ActiveMongo
    {
        /* ... previous code ... */
        /* SELECT * FROM user WHERE karma > $karma ORDER BY karma DESC */
        function my_query($karma)
        {
             /* get MongoDB collection object */
             $col = $this->_getCollection();
             /* Let's build our request */
             $query = $col->find( array('karma' => array('$gt' => $karma)) );
             $query->sort(array('karma' => -1));
             /* now give the query to ActiveMongo */
             $this->setCursor($query);
    
             return $this; /* to use directly with foreach */
        }
    }
    
    /* How to use it. */
    $users = new User;
    foreach ($users->my_query(15) as $user) {
        $user->user_type = 'super_user';
        $user->save();
    }
    
    /* ActiveMongo on its own provides a very simple query API */
    /* no limit, no sorting */
    $user = new User;
    $user->username = 'crodas';
    $user->find(); /* read the parameters from object property */

    ActiveMongo is still under development, but as far as I have tested it seems pretty stable; most of the ongoing work is adding new functionality. For instance, I’m looking for an easy way to add references to other documents, or sets of documents, while keeping efficiency in mind by talking to the database as little as possible.

    Every release will be hosted on PHPClasses, and the git repository will mirror my in-development version.

    Comments, patches and fork/merge requests are more than welcome :-)

     
    • Steve B 12:38 pm on March 1, 2010

      This looks great! I’m interested in learning to use mongoDB in my applications, and I’m familiar with Active Record through CodeIgniter. I would love to be able to use this class in my CI applications. Do you have any plans to post this into Google Code? I would like to continue following your development of this class.

    • Dayle Rees 8:08 am on March 2, 2010

      Hey, great work. I discovered mongoDB earlier today, had a quick play with it, and then through some browsing discovered this page. I think that accessing a collection using an object like approach rather than array syntax could be really handy.

      Thanks for your work!

    • Alan Lake 9:51 am on March 4, 2010

      The git repository README says, “Don’t use it now.” I’m going to put my project on hold for a while because I want to use it. Will you be able to notify us when it is hosted on PHP Classes? I believe that they are set up to notify those who have downloaded a class of any updates to it. Thanks.

      • crodas 1:34 pm on March 25, 2010

        I’m finishing the first release; I introduced huge changes that attempt to make it even easier to use. I’m currently finishing the test suites (very important for the project’s consistency). I believe it will be on PHPClasses in a few weeks.

    • Gilberto Ramos 9:07 am on March 23, 2010

      In your pre_save function, you first defined your array as $to_check, but then, just below, you used $check.

      It’s perfectly understandable since you posted it at 1:47 am ;)

    • DrZippie 6:20 am on April 8, 2010

      Great work!!

      I made a small change to enable MongoDB Auth

      http://github.com/DrZippie/ActiveMongo/commits/master/lib/ActiveMongo.php

      Greetings from Spain ;-)

  • Thinking in Documents (and dropping ACID)

    crodas 5:51 pm on November 28, 2009
    Tags: Brazil, CouchDB, phpconf2009

     
  • Distributing PHP processing with Gearman

    crodas 11:32 pm on November 23, 2009
    Tags: gearman, phpclasses

    This weekend I wrote an article about Gearman, published on the PHPClasses site.

    My post at PHPClasses

    Special thanks to my friend Manuel Lemos

     
  • Latinoware 2009

    crodas 12:37 pm on October 23, 2009
    Tags: latinoware, speech

     
  • Weird but cool PageRank usage

    crodas 1:42 pm on October 6, 2009
    Tags: algorithms, pagerank, weirdness

    Yesterday, while I was talking with a good friend, he told me he needed a good algorithm to detect keywords (relevant words) in a document. The first algorithm that came to mind was a simple word-frequency counter that discards common words using a list of stop-words built in a previous learning step. This algorithm is pretty obvious, and I’m sure it is widely used out there.
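    A minimal Python sketch of that frequency-based approach. The "previous learning" step is assumed here to mean treating words that appear in most documents of a training corpus as stop-words; the function names and the 0.8 threshold are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case words made of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def learn_stop_words(documents, min_share=0.8):
    """Treat words found in at least `min_share` of the documents as stop-words."""
    doc_words = [set(tokenize(doc)) for doc in documents]
    doc_freq = Counter(word for words in doc_words for word in words)
    return {word for word, count in doc_freq.items() if count / len(doc_words) >= min_share}


def frequent_keywords(text, stop_words, top=5):
    """Most frequent non-stop-words, ties broken alphabetically."""
    counts = Counter(word for word in tokenize(text) if word not in stop_words)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top]]
```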

    Then, Googling for some papers (I have a bunch on my laptop, but I don’t recall where I stored them), I found one that opened my mind: “TextRank: Bringing Order into Texts”. It suggests building a graph of words and then applying the PageRank algorithm to the graph to find the relevant words. I haven’t read it in depth yet, but I got the idea from a brief reading, and it makes sense; I’m wondering why I never thought of it.

    I’m planning to code it this week, just as a proof of concept. Basically, I will reuse some old code I wrote (but never finished) a while ago; I remember I built it in a very modular way using classes, so adapting it to these needs will be pretty straightforward. In the graph of words (and probably of sets of 1, 2 and 3 words), each word will reference the next word (if you have no idea what I’m talking about, just take a look here).
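    A small Python sketch of the word-graph idea, in the spirit of TextRank but simplified: each word is a node linked to the words right before and after it, plain PageRank with the usual 0.85 damping factor ranks the nodes, and stop-words are dropped after ranking. The names and the tiny stop-word list are hypothetical, and the TextRank paper itself also filters words by part of speech and uses a configurable co-occurrence window, both of which this sketch leaves out.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}


def word_graph(text):
    """Undirected graph: each word is linked to the words right before and after it."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    links = defaultdict(set)
    for prev, nxt in zip(words, words[1:]):
        if prev != nxt:
            links[prev].add(nxt)
            links[nxt].add(prev)
    return links


def pagerank(links, damping=0.85, iterations=50):
    """Plain PageRank by power iteration; every node here has at least one link."""
    nodes = list(links)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Links are symmetric, so a node's neighbours are exactly the nodes linking to it.
        rank = {
            node: (1 - damping) / n
            + damping * sum(rank[src] / len(links[src]) for src in links[node])
            for node in nodes
        }
    return rank


def keywords(text, top=5):
    """Rank words with PageRank, then drop stop-words."""
    scores = pagerank(word_graph(text))
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked if word not in STOP_WORDS][:top]
```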

    I will post the results here.

     
    • crodas 7:35 pm on October 7, 2009

      Well, I decided to test this out at lunchtime. I Googled for a simple PageRank implementation in Python (because I haven’t finished my own implementation in PHP yet) and found one I had used some time ago (http://www.eioba.com/a69792/the_google_pagerank_algorithm_in_126_lines_of_python).

      My code itself was very simple: it just splits the text into words and treats every word as a web page that links to the next word and to the previous word.

      Then I ran my simple program against http://en.wikipedia.org/wiki/RAID and it returned these words, in this order:

      raid, disk, data, parity, disks

      Of course, a bunch of common words (the, an, in) were also in the list, but I removed them by hand; doing it automatically is pretty straightforward.

      I will keep playing with it tonight to find two-word and n-word keyphrases.

    • David Hofmann 9:53 am on October 8, 2009

      Hmm, Cesar, you are doing very interesting stuff :D. I think there are very few people here who care about this kind of stuff. Keep on working, Cesar! I can see this working for blog posts, to quickly show the relevant information at the top of the page, as people normally fail to write a good introduction of what they are going to talk about :D

      • David Hofmann 9:59 am on October 8, 2009

        Like me :)

      • crodas 10:13 am on October 8, 2009

        Well, I have a similar idea: I will probably create a web service that suggests keywords to help bloggers improve their SEO.

        Suggesting synonyms with better SEO value could also be a valid application of this cool algorithm.

    • Gilberto Ramos 11:31 am on October 16, 2009

      I wish I had enough time to research and code something too! :(
      I just want to finish university! Then I’ll have a normal geek life! ;)

