
Post a Comment On: cbloom rants

"03-04-15 - LZ Match Finding Redux"

8 Comments

Blogger won3d said...

For multi-way cache tables, do you ever do SIMD parallel probing?

March 16, 2015 at 7:34 AM

Blogger Fabian Giesen said...

There's no big win from SIMD in this case since the values in the cache table are pointers, not the actual characters you're matching against.

As Charles mentions, you can store the first few characters at each offset in the cache table (along with the offset itself), which avoids potentially taking a cache miss on the window only to find out you had a hash collision. In that case, you might be able to profitably SIMD it.

March 16, 2015 at 1:24 PM
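A minimal sketch of the kind of SIMD probe being discussed here (the row layout, field sizes, and names are my assumptions for illustration, not code from the post): each way stores a 16-bit tag of high hash bits next to its offset, and one SSE2 compare checks all eight ways at once.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Hypothetical 8-way cache-table row: a 16-bit tag (high hash bits)
   per way, plus the window offsets. Layout is an assumption. */
typedef struct {
    uint16_t tag[8];
    uint32_t offset[8];
} Row;

/* One SSE2 compare checks all 8 ways at once; returns a mask with
   bit (2*i) set iff way i's tag matches. */
static unsigned probe_row(const Row *row, uint16_t tag)
{
    __m128i tags = _mm_loadu_si128((const __m128i *)row->tag);
    __m128i key  = _mm_set1_epi16((short)tag);
    __m128i eq   = _mm_cmpeq_epi16(tags, key);
    /* movemask yields 2 identical bits per 16-bit lane; keep one per lane */
    return (unsigned)_mm_movemask_epi8(eq) & 0x5555u;
}
```

Only ways whose bit is set in the mask need their offset chased, which is where Fabian's point applies: the win is in filtering, since the actual match bytes still live behind the pointer.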

Blogger won3d said...

You can also store a few high hash bits (assuming low bits are used as bucket index), which will filter out a lot of non-matches very quickly. U32 prefix seems like a waste of space, but I guess it depends on how long your matches tend to be.

March 16, 2015 at 2:06 PM

Blogger cbloom said...

The nice thing about the U32 prefix is you can completely resolve a len-3 match without chasing a pointer at all.

It gives you sort of proportional speed - any time you eat a cache miss, you know you have at least a len-4 match, which means hey at least you're getting a 4 byte step from that time penalty.

But yeah, it just has to be tested. You're right that only a few high hash bits (eg. 8) would resolve most collisions.

March 16, 2015 at 2:20 PM
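One way to sketch the U32-prefix trick cbloom describes (entry layout and names are my assumptions): XOR the current 4 bytes against the stored prefix, and the count of matching low bytes falls out without ever dereferencing the offset.

```c
#include <stdint.h>

/* Hypothetical cache entry storing the first 4 bytes at the offset,
   so len-3 matches resolve without touching the window. */
typedef struct { uint32_t prefix; uint32_t offset; } Entry;

/* Match length resolvable from the stored prefix alone: 0..4.
   Assumes the prefixes were loaded little-endian. */
static int prefix_match_len(uint32_t cur, uint32_t stored)
{
    uint32_t x = cur ^ stored;
    if (x == 0) return 4;          /* at least len-4; extend via the window */
#if defined(__GNUC__)
    return __builtin_ctz(x) >> 3;  /* # of matching low bytes */
#else
    int n = 0;
    while ((x & 0xffu) == 0) { x >>= 8; n++; }
    return n;
#endif
}
```

A result of 4 is exactly the "proportional speed" case: you pay the cache miss to extend the match, but you already know the pointer chase buys you at least a 4-byte step.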

Blogger Fabian Giesen said...

I did test the U32 prefix and it was faster than chasing the pointer on my tests (8-way/16-way), but really not by much (something like 1%). That was testing with somewhat weird data though, so no particular conclusion.

Doing direct len-3 matches from your hash table means you can't be sloppy with cleaning out stale offsets (>=1 wrap-around ago) from the hash table, or you can generate invalid encodings based on aliased match offsets.

March 16, 2015 at 2:34 PM
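The staleness hazard Fabian raises can be sketched as a range check (names and the absolute-position scheme are my assumptions): before trusting a table entry for a direct len-3 match, confirm the stored position is within one window of the current position, so an entry from a previous wrap-around can't alias a fresh offset.

```c
#include <stdint.h>

/* Guard against stale table entries: a position recorded >= one
   wrap-around ago must not produce a match offset, or the encoder
   emits an invalid reference. Sketch only. */
static int offset_is_fresh(uint64_t cur_pos, uint64_t stored_pos,
                           uint64_t window_size)
{
    return stored_pos <= cur_pos && (cur_pos - stored_pos) <= window_size;
}
```

When matches are verified by comparing window bytes, a stale entry merely wastes a probe; with direct len-3 resolution from the table there is no byte check, so this guard (or eager cleanup of old entries) becomes mandatory.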

Blogger won3d said...

Right, but you potentially eat more cache misses while probing large buckets. You could squeeze high_hash_bits + offset into a U32, and maybe double the number of probes per cache miss compared to interning. You'd also get more associative ways and maybe can find better matches.

Associativity effects aside, you could probably predict the # of cache misses per match find based on the LZ dumps. Also, IIRC, short matches tend to be recent/frequent, so they might not even incur a cache miss even though you have to chase the offset.

BTW, I tend to call the "hash high bits" the "syndrome" on loan from CRC terminology.

March 16, 2015 at 2:35 PM
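won3d's packing idea might look something like this (field widths are illustrative assumptions, not from the post): 8 syndrome bits from the top of the hash share a U32 with a 24-bit offset, so each entry is half the size of the prefix+offset scheme and a row holds twice as many ways per cache line.

```c
#include <stdint.h>

/* 8 syndrome bits (high hash bits) + 24 offset bits in one U32.
   Field sizes are an assumption for illustration. */
#define SYNDROME_BITS 8
#define OFFSET_BITS   (32 - SYNDROME_BITS)
#define OFFSET_MASK   ((1u << OFFSET_BITS) - 1)

static uint32_t pack_entry(uint32_t hash, uint32_t offset)
{
    return ((hash >> (32 - SYNDROME_BITS)) << OFFSET_BITS)
         | (offset & OFFSET_MASK);
}

/* Cheap filter: reject most collisions before chasing the offset. */
static int syndrome_matches(uint32_t packed, uint32_t hash)
{
    return (packed >> OFFSET_BITS) == (hash >> (32 - SYNDROME_BITS));
}

static uint32_t unpack_offset(uint32_t packed)
{
    return packed & OFFSET_MASK;
}
```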

Blogger Cyan said...

Quick question, to make sure I understand the terms used in this post:
hash-ways: related to the number of pointers or indexes in a hash row?
2nd hash: prefix hash-bits stored into the cell?

March 26, 2015 at 6:42 AM

Blogger cbloom said...

hash-ways : yes, # of entries in a hash row. So you look up by the hash and then iterate over the # of hash ways.

2nd-hash : this is using a different hash function, but writing in the same hash table. So it just stomps on the primary hash.

It's most effective if the 2nd hash is on a different length substring. eg. if the primary hash is on length 3, the 2nd hash is on length 4 or 5.

March 26, 2015 at 7:52 AM
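The 2nd-hash scheme cbloom describes can be sketched like this (the hash functions, constants, and lengths here are toy choices of mine, not his code): hashes of a length-3 and a length-5 substring both write positions into the same table, with the second simply stomping on the first.

```c
#include <stdint.h>

#define TABLE_BITS 12
#define TABLE_SIZE (1u << TABLE_BITS)

/* Toy multiplicative hash over a length-3 substring. */
static uint32_t hash3(const uint8_t *p)
{
    uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
    return (v * 2654435761u) >> (32 - TABLE_BITS);
}

/* Toy multiplicative hash over a length-5 substring. */
static uint32_t hash5(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 5; i++) v = (v << 8) | p[i];
    return (uint32_t)((v * 889523592379ULL) >> (64 - TABLE_BITS));
}

/* Both hashes write into the same table; the 2nd hash just stomps
   on the primary one, as described above. */
static void insert_pos(uint32_t *table, const uint8_t *window, uint32_t pos)
{
    table[hash3(window + pos)] = pos;
    table[hash5(window + pos)] = pos;
}
```

Using different substring lengths is what makes the stomping tolerable: a position that gets evicted under one hash often survives under the other, so both short and longer candidate matches stay findable.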
