We have moved our forum to GitHub Discussions. For questions about Phalcon v3/v4/v5 you can visit here and for Phalcon v6 here.

ElasticSearch and QueryBuilder

Hello,

I'm evaluating possibilities of elastic search related to Phalcon (version 2 in my case). I already checked the implementation of phosphorum to get an idea.

What my implementation does at the moment:

  1. Query to elastic search
  2. Iterate over result to get ids of model entries
  3. Use QueryBuilder to look after this ids in MySQL database
  4. Pagination with this QueryBuilder as parameter

Here some code snippets:

<?php
// query builder
$builder = $this->modelsManager->createBuilder()
            ->columns('Bf\Model\Blog.id,title,content,date,name,AuthorAddress.street')
            ->from('Bf\Model\Blog')
            ->join('Bf\Model\Author')
            ->join('Bf\Model\Address', 'Bf\Model\Author.id = AuthorAddress.author', 'AuthorAddress')
            ->orderBy('Bf\Model\Blog.id');
...

// after elastic search I have a list of ids and add them as clause to builder
$builder->inWhere('Bf\Model\Blog.id', $hitIds); // hitIds = array with ids found by elastic search
...

// and finally the paginator
$paginator = new QueryBuilder(
            array(
                "limit"=> 500,
                "page" => $page,
                'builder' => $builder
            )
        );
$page = $paginator->getPaginate();

// page is assigned to volt template to build up list
$this->view->page = $page;

Now the issue description ;-)

With increasing number of ids returned by elastic search and inserted to QueryBuilder->inWhere() request gets extremly slow:

  • 1 id found: 0.02 sec.
  • 1000 ids found: 0.24 sec.
  • 10000 ids found: ~20 sec!!!

The time is used somewhere in QueryBuilder where the statement is parsed.

Any ideas how to speed up? Maybe another implementation for getting phalcon models?

@phalcon can you explain what parser does and why it gets slower with increasing number of array values given to the inWhere() method?



98.9k

I think it's creating a bigger PHQL query with 10000 placeholders and 10000 values to bound

This would explain why it takes 20 seconds and 1 GB memory limit to execute the query ;-)