Hello,
I am trying to index a large number of documents using the PHP client for Elasticsearch. I have written a PHP script that uses RecursiveIteratorIterator to walk through the complex directory tree I have, collect the paths into an array, and then index everything in Elasticsearch.
Here is the code:
<?php
require 'vendor/autoload.php';
$client = new Elasticsearch\Client();
$root = realpath(getenv('HOME') . '/elkdata/for_elk_test_2014_11_24/Agencies'); // '~' is not expanded by PHP, so build the path from $HOME
$iter = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($root, RecursiveDirectoryIterator::SKIP_DOTS),
    RecursiveIteratorIterator::SELF_FIRST,
    RecursiveIteratorIterator::CATCH_GET_CHILD
);

// Collect every directory path under the root
$paths = array($root);
foreach ($iter as $path => $dir) {
    if ($dir->isDir()) {
        $paths[] = $path;
    }
}
// Create the index and mappings
$mapping['index'] = 'rvuehistoricaldocuments2009-2013';
$mapping['body'] = array(
    'mappings' => array(
        'documents' => array(
            '_source' => array(
                'enabled' => true
            ),
            'properties' => array(
                'doc_name' => array(
                    'type' => 'string',
                    'analyzer' => 'standard'
                ),
                'description' => array(
                    'type' => 'string'
                )
            )
        )
    )
);
$client->indices()->create($mapping);
// Now index the documents
$params = array('body' => array());
for ($i = 0; $i < count($paths); $i++) {
    // Each document needs two entries in the bulk body:
    // the action metadata, then the document source
    $params['body'][] = array(
        'index' => array(
            '_index' => 'rvuehistoricaldocuments2009-2013',
            '_type'  => 'documents'
        )
    );
    $params['body'][] = array(
        'foo' => 'bar' // Document body goes here
    );

    // Every 1000 documents stop and send the bulk request
    if (($i + 1) % 1000 == 0) {
        $responses = $client->bulk($params);
        // erase the old bulk request
        $params = array('body' => array());
        // unset the bulk response when you are done to save memory
        unset($responses);
    }
}

// Send whatever is left over after the loop
if (!empty($params['body'])) {
    $responses = $client->bulk($params);
    unset($responses);
}
?>
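For the document body I have only put a placeholder above. What I have in mind is something like the sketch below, assuming the entries are plain-text files; basename() and file_get_contents() are just the simplest way I could think of to fill the doc_name and description fields from my mapping, and $file is a hypothetical path taken from the iterator:

// Rough sketch only: build one document from a file path ($file is hypothetical)
$doc = array(
    'doc_name'    => basename($file),          // file name
    'description' => file_get_contents($file)  // raw file contents
);
// Push the action metadata and the document source as the usual bulk pair
$params['body'][] = array(
    'index' => array(
        '_index' => 'rvuehistoricaldocuments2009-2013',
        '_type'  => 'documents'
    )
);
$params['body'][] = $doc;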
I just want to know if this looks right. Thanks!