Redis troubleshooting: read error on connection

Overview

We are seeing a “read error on connection” exception when sending the “select” command (which selects the DB) to Redis. Ilya recently updated the exception message, which now reads “The redis database could not be selected.”

The exception appears spontaneously and persists for quite a while (e.g. a few hours), occurring a couple of times per minute (this is clearly visible in the reports on v185 and v186 for the LIVE environment).

Ideas on the cause of the issue + TODOs

Ensure we use Phpredis flow and not “Standalone” flow

Simply confirm that the Live environment uses the Phpredis flow (note: from local.xml we already know phpredis should be used, so this is about confirming that the code indeed uses it and doesn’t fall back to standalone).

We should use Phpredis because multiple comments describe it as the more mature and reliable solution (e.g. this one: https://github.com/colinmollenhour/Cm_Cache_Backend_Redis/issues/37#issuecomment-58007678, plus a few more).

Nevertheless, I have found Guthy Renker (quite a solid app) having

<force_standalone>1</force_standalone>

in their local.xml.

So we might switch to “standalone” flow just to give it a try at some point if we’re out of any other ideas.

Check the Redis error which gets returned

Hook into the exception we currently get (recently changed to a more comprehensive text by Ilya) and add a call to the phpredis getLastError() method (see https://github.com/phpredis/phpredis#getlasterror for implementation details). Log the message carefully.

Retry the failed redis call or Reinitiate the whole application in case redis error has been caught

Once a Redis exception has been caught (meaning the end customer would get a report instead of the valid page), try the following:

  1. Re-send the failed call, i.e. simply re-execute the failed call to Redis. This allows execution to continue with no report shown. Together with what is described in the “Check the Redis error which gets returned” section, it enables both collecting info on the error and seamless app execution.
  2. If the previous variant doesn’t work, use Mage::reset() and then Mage::run(…proper run type and code…) to reinitiate the application and make another attempt to deliver the valid page.
  3. If neither the 1st nor the 2nd variant works, terminate the app with a “Location: <current uri>” header so that the end customer’s browser repeats the request itself.

The first option is the preferred one; the last one is the worst, both because of the attached monitoring systems and for the sake of customer experience. Note that we should repeat the reinitiation 3-4 times at most and then give up and show the report (if none of the rounds resulted in a valid page).
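The escalation above can be sketched as follows. This is a minimal Python illustration of the control flow only, not Magento code: recover, redis_call and reinit_and_render are hypothetical names (the last one stands in for Mage::reset() followed by Mage::run()), and the default of 3 reinitiation rounds is an assumption within the suggested 3-4.

```python
class RedisReadError(Exception):
    """Stand-in for the "read error on connection" exception."""


def recover(redis_call, reinit_and_render, max_reinits=3, log=print):
    """Run after a Redis exception was caught; returns (outcome, value)."""
    # Variant 1: re-send the failed call once.
    try:
        return ("retried", redis_call())
    except RedisReadError as exc:
        log(f"retry failed: {exc}")

    # Variant 2: reinitiate the application a bounded number of times.
    for round_no in range(1, max_reinits + 1):
        try:
            return ("reinitialised", reinit_and_render())
        except RedisReadError as exc:
            log(f"reinit round {round_no} failed: {exc}")

    # Variant 3: nothing worked; the caller decides whether to send the
    # "Location: <current uri>" header or to show the report.
    return ("redirect", None)
```

The outcome string makes it easy to log which variant actually rescued the request, which would show how often each level of the escalation is needed.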

Too much data was saved / appended into redis key or value

The idea comes from the following comment in phpredis issue #70: https://github.com/phpredis/phpredis/issues/70#issuecomment-6025598.

It might happen that we send too much data within one request to Redis.

The idea has already been partially confirmed by the overly long keys that were detected (fixed by Ilya in Vaimo_Dyson_Model_Product::getCategoryByLevel() with https://bitbucket.org/vaimo/vaimo_dyson/pull-requests/5/dyson-1729-add-filter-by-store-specific/diff (see changes to this method in app/code/local/Vaimo/Dyson/Model/Product.php)).

The proposal is to log the length of the Redis key and the length of the value if they exceed some threshold.

Note, it is very likely that our previous idea about logging the last error would also provide this kind of info (if Redis got stuck because of too much data transferred).

@TODO: log all saves bigger than 1 MB
@TODO: test save / load operations with big chunks of data being transferred
@TODO: log all loads from Redis (in the format KEY – data size) in order to see whether there is an issue with Redis value length (because the value gets appended to constantly and never gets flushed)
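A possible shape for the size logging in these TODOs is sketched below in Python. It only illustrates the check and is not the Magento backend code: check_size, payload_size and the logger name are hypothetical, and the 1 MiB threshold is an assumed reading of the limit mentioned above.

```python
import logging

logger = logging.getLogger("redis.size")  # hypothetical logger name

SIZE_THRESHOLD_BYTES = 1024 * 1024  # assumption: 1 MiB


def payload_size(value):
    """Return the size in bytes of a str or bytes payload."""
    if isinstance(value, str):
        value = value.encode("utf-8")
    return len(value)


def check_size(operation, key, value, threshold=SIZE_THRESHOLD_BYTES):
    """Log key and value sizes when either exceeds the threshold."""
    key_size = payload_size(key)
    value_size = payload_size(value)
    oversized = key_size > threshold or value_size > threshold
    if oversized:
        # Only the first 100 characters of the key are logged, so that an
        # oversized key does not flood the log file.
        logger.warning("%s %r - key %d bytes, value %d bytes",
                       operation, key[:100], key_size, value_size)
    return oversized
```

Calling check_size("save", key, data) before each write and check_size("load", key, data) after each read would cover both the save and the load TODO with the same helper.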

Using old phpredis which might result in this error

We are using an old phpredis version, which might be the cause of this error (“read error on connection”).

Latest phpredis version is 2.2.7

Ours is 2.2.4 from 2013-09-02

(see https://pecl.php.net/package/redis)

This is what makes me think it might be a version-related issue:

https://github.com/phpredis/phpredis/pull/643 (Aug 5, 2015 – open). So let’s ensure our phpredis lib already has this commit inside (https://github.com/phpredis/phpredis/pull/643/commits).

Issues for Credis lib

https://github.com/colinmollenhour/Cm_Cache_Backend_Redis/issues/37

(looks like ours – see the very beginning. Also check this https://github.com/colinmollenhour/Cm_Cache_Backend_Redis/issues/37#issuecomment-19012366)

Issues for Phpredis lib

https://github.com/phpredis/phpredis/pull/643 (Aug 5, 2015 – open)

https://github.com/phpredis/phpredis/issues/492 (Aug 3, 2014 – open)

https://github.com/phpredis/phpredis/issues/70 (open, very long, with comments that are just a few days old).

BTW, here Colin himself writes (check https://github.com/phpredis/phpredis/issues/70#issuecomment-4721338):

– Standalone PHP driver used with no errors (while phpredis has errors)

– he’s not using persistent connections (neither for phpredis nor standalone mode)

do this for debugging: https://github.com/phpredis/phpredis/issues/70#issuecomment-38945798

https://github.com/phpredis/phpredis/issues/668 (October 5 – open)

Useful info gathered (incl. well-known stuff)

Magento local.xml redis config explained:

https://github.com/colinmollenhour/Cm_Cache_Backend_Redis

SUNION with 180k sets

Investigate why SUNION calls take a lot of time sometimes.

In observed case (from SLOWLOG), the SUNION command took just over 20 sec when it was called with 180k tags.

Is this normal?

Why would Magento ever try to run SUNION with so many tags at one time?

Looking at the list of tags, why is it 180k tags long? Is it all the tags that exist? Even if so, why do we ever have that many tags in total?

Resources and Best practices on redis setup

Resources

Reference on redis configuration in Magento’s local.xml file

http://inchoo.net/magento/using-redis-cache-backend-and-session-storage-in-magento/

and

http://devdocs.magento.com/guides/m1x/ce18-ee113/using_redis.html

Few important links from this Magento doc (link posted above) follow below:

1. Magento Expert Consulting Group (ECG) article about Redis:

https://info2.magento.com/rs/318-XBX-392/images/MagentoECG-UsingRedisasaCacheBackendinMagento.pdf

2. Redis session config:

https://github.com/colinmollenhour/Cm_RedisSession/blob/master/README.md

3. Redis back end config:

https://github.com/colinmollenhour/Cm_Cache_Backend_Redis/blob/master/README.md

CCB module page

CCB module page in confluence (https://confluence.vaimo.com/display/CUSEXP/Vaimo_CacheCleanBuffer)

Best practices

Configure different <database> and <port> for each use of Redis

Configure a different <database> and <port> for each use of Redis (session, FPC and cache) in local.xml; this gives better performance because Redis is single-threaded.

Nevertheless at Vaimo we have so far standardised on only 2 Redis instances primarily, so we can configure the session store differently than the cache backend. In benchmarks we have seen so far, splitting FPC and Cache didn’t make much of a difference, but it might well be worth revisiting with our newer and bigger sites

Compression off for FPC only

Always turn compression off for FPC only (it has its own compression).

Turn persistent off

Always turn persistent off (because of bug with php-fpm)

Use a cronjob to keep Redis size healthy (garbage collection with rediscli.php)

Our Redis servers at Vaimo are configured to automatically expire old values and replace old keys once a predefined memory limit is hit (the limit is set higher than what is needed). Because of this server configuration, we have always felt that the cronjob is not necessary. Some recent projects have shown, however, that tag management can push up the latency to Redis, so having fewer keys in those tags, at the expense of a slightly lower cache hit rate, could potentially benefit those projects. It would also be interesting to see the cost of tag operations on empty tags (tags in Redis for which all the keys have already expired). So this is all definitely worth investigating.

Using Redis with Magento

Redis is a modern key/value cache that supports a number of complex datatypes (lists, sets, hashes, …).

It runs via sockets so can be used for both networked and local operation.

  • For single webnode setups the connection should be configured via socket.
  • For multinode setups the connection should be configured via local private IP address (10.xx.xx.xx)

Redis is installed by Hosting on every dedicated server that is set up. Usually 2 separate instances of Redis are installed:

  • Redis for Backend cache (+ Full Page Cache if Enterprise Magento)
  • Redis for Sessions

Configuring redis-server for backend (and FullPageCache)

Install Redis Cache module to site.

    • We have a Magento module installable via ServerPortal: Icommerce_Redis

Configure Magento.

  • The redis-server for backend and FPC runs on port 6379, unless specified otherwise
  • <id_prefix> is used to avoid mixing up cache keys between different instances
  • The <database> code can be the same for backend and FPC, but should differ between environments (staging vs live). Usually our Redis instances can hold up to 16 databases.
  • <compress_data> & <compress_tags> should NOT be enabled for FPC, as the FPC functionality already includes compression
  • Edit app/etc/local.xml to configure:
  • local.xml cache config (UNIX socket)
    <!-- This is a child node of config/global -->
    <cache>
        <id_prefix>x8888</id_prefix> <!-- Use instance code here, to avoid cache key mixups -->
        <backend>Cm_Cache_Backend_Redis</backend>
        <backend_options>
            <server>unix:///var/run/redis/redis-server.sock</server>
            <!-- <server>10.xx.xx.xx</server> A TCP/IP connection should be used if Redis is located on another server, e.g. in a multinode setup -->
            <port>6379</port>
            <persistent></persistent>
            <database>0</database>
            <password></password>
            <force_standalone>0</force_standalone>
            <connect_retries>1</connect_retries>
            <read_timeout>10</read_timeout>
            <automatic_cleaning_factor>0</automatic_cleaning_factor>
            <compress_data>1</compress_data>
            <compress_tags>1</compress_tags>
            <compress_threshold>20480</compress_threshold>
            <compression_lib>gzip</compression_lib>
            <use_lua>0</use_lua>
        </backend_options>
    </cache>

  • local.xml cache config for FPC (UNIX socket)

    <!-- This is a child node of config/global for Magento Enterprise FPC -->
    <full_page_cache>
        <id_prefix>x8888</id_prefix>
        <backend>Cm_Cache_Backend_Redis</backend>
        <backend_options>
            <server>unix:///var/run/redis/redis-server.sock</server>
            <!-- <server>10.xx.xx.xx</server> A TCP/IP connection should be used if Redis is located on another server, e.g. in a multinode setup -->
            <port>6379</port>
            <persistent></persistent>
            <database>0</database>
            <password></password>
            <force_standalone>0</force_standalone>
            <connect_retries>1</connect_retries>
            <read_timeout>10</read_timeout>
            <automatic_cleaning_factor>0</automatic_cleaning_factor>
            <compress_data>0</compress_data>
            <compress_tags>0</compress_tags>
            <lifetimelimit>57600</lifetimelimit>
            <use_lua>0</use_lua>
        </backend_options>
    </full_page_cache>
  • Regarding the automatic_cleaning_factor option above, it is important to set it to 0 (disabled), since otherwise Redis (or rather PhpRedis) will clean itself on every Nth request.

Configuring redis-server for sessions

  • By default the Redis sessions instance runs on port 6380: we run this extra redis-server-sessions on port 6380 instead of Redis’s default 6379, which we use for normal cache storage.

Configure Magento

  • Edit app/etc/local.xml to configure. For a single-node site use a socket-based connection; for a multinode environment use a TCP-based connection.
  • local.xml sessions config:
    <!-- This is a child node of config/global -->
    <session_save><![CDATA[redis]]></session_save>
    <session_save_path><![CDATA[unix:///var/run/redis/redis-server-sessions.sock?database=0]]></session_save_path>
    <!-- <session_save_path><![CDATA[tcp://10.xx.xx.xx:6380?database=1]]></session_save_path> A TCP/IP connection should be used if Redis is located on another server, e.g. in a multinode setup -->
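The socket-versus-TCP rule used in these configs can be captured in a small helper, sketched here in Python purely for illustration. The function names and the 10.0.0.1 default host are hypothetical placeholders (the real private IP is elided as 10.xx.xx.xx above); the socket paths and port 6380 are the ones from the configs in this document.

```python
def cache_server(multinode, host="10.0.0.1",
                 socket_path="/var/run/redis/redis-server.sock"):
    """Value for <server>: private IP for multinode, UNIX socket otherwise."""
    return host if multinode else f"unix://{socket_path}"


def session_save_path(multinode, database=0, host="10.0.0.1", port=6380,
                      socket_path="/var/run/redis/redis-server-sessions.sock"):
    """Value for <session_save_path>, including the database parameter."""
    if multinode:
        return f"tcp://{host}:{port}?database={database}"
    return f"unix://{socket_path}?database={database}"


# Single-node site: both connections go through the local sockets.
print(cache_server(multinode=False))
print(session_save_path(multinode=False))
```

Generating the values from one multinode flag keeps the cache and session settings consistent when a site moves from a single-node to a multinode setup.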
     

Checking the Redis server status

  • To check the status of the Redis services, run:
    # For backend, FPC Redis:
    $ service redis-server status
    # For Redis sessions:
    $ service redis-server-sessions status
  • Test whether you can connect to the local Redis with redis-cli:
    $ redis-cli
    # To check Redis usage, type “info” at the redis-cli prompt
    127.0.0.1:6379> info
  • redis.conf: the following setting can be used to handle the case when the cache memory is used up:
    maxmemory-policy allkeys-lru

Notes

Data Compression

Magento pulls significant amounts of configuration and layout data from the cache backend. Summing the size of all cache read requests while building a page on the TSW indicated that around 1.7 MB was pulled from the cache. With Redis, this data has to travel over the network unless a UNIX socket is used, and it actually dominates in size over the final generated HTML (200-300 KB). We conserve server bandwidth by activating the setting that compresses data in the PHP Redis client.

<compress_data>1</compress_data>

NOTE! We have observed intermittent errors of this type when configuring Redis without data compression on some higher traffic sites:

a:4:{i:0;s:24:"read error on connection";i:1;s:1341:"#0 [internal function]: Credis_Client->__call('exec', Array)

The error has completely gone away after activating compression in Redis. So compression is strongly recommended for several reasons.

Don’t use compression with FPC, as it already includes its own compression.
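To illustrate how a compression threshold such as <compress_threshold>20480</compress_threshold> trades CPU for bandwidth, here is a minimal Python sketch. It is not the Cm_Cache_Backend_Redis implementation: the one-byte flag format, the function names and the "at or above the threshold" rule are assumptions made for the example.

```python
import gzip

COMPRESS_THRESHOLD = 20480  # bytes, same value as in the cache config


def encode(data, threshold=COMPRESS_THRESHOLD):
    """Gzip payloads at or above the threshold; flag byte marks the format."""
    if len(data) >= threshold:
        return b"\x01" + gzip.compress(data)
    return b"\x00" + data  # small values are stored as-is


def decode(stored):
    """Reverse encode(): strip the flag byte and decompress if needed."""
    flag, body = stored[:1], stored[1:]
    return gzip.decompress(body) if flag == b"\x01" else body


# Repetitive XML-like data, similar in spirit to config/layout cache entries.
payload = b"<layout><block type='core/template'/></layout>" * 2000
stored = encode(payload)
print(len(payload), "bytes raw,", len(stored), "bytes stored")
assert decode(stored) == payload
```

Small values skip compression, so only the large config and layout entries, which account for most of the transferred bytes, pay the CPU cost.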

Links: