Jeffhostetler/memihash perf #964

Merged

jeffhostetler

A series of performance enhancements in the memihash and name-cache area.

On Windows, calls to memihash() and the maintenance of the istate.name_hash and istate.dir_hash HashMaps take significant time on very large repositories. This series reduces the overall time taken by various operations by reducing the number of calls to memihash(), by moving some of them into multi-threaded code, and by making related HashMap adjustments.

Remove duplicate memihash() call in hash_dir_entry().
The existing code called memihash() to do the find_dir_entry()
and, if the entry was not found, called memihash() again to do the hashmap_add().

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
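
To illustrate the idea of this commit, here is a minimal sketch, not the code from this series: the hash is computed once and the same value is reused for both the lookup and the insertion. All names (`sketch_table`, `sketch_find_or_add`, and so on) are hypothetical, the table is a fixed-size chained table rather than Git's HashMap, and case folding is left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_BUCKETS 64

/* Hypothetical stand-in for a directory entry in a chained hash table. */
struct sketch_entry {
	struct sketch_entry *next;
	unsigned int hash;
	size_t len;
	char name[64];
};

struct sketch_table {
	struct sketch_entry *buckets[SKETCH_BUCKETS];
	unsigned long hash_calls; /* counts how often the hash is computed */
};

/* FNV-1 style hash; case folding omitted in this sketch. */
static unsigned int sketch_hash(struct sketch_table *t, const char *s, size_t len)
{
	unsigned int h = 0x811c9dc5u;

	t->hash_calls++;
	while (len--)
		h = (h * 0x01000193u) ^ (unsigned char)*s++;
	return h;
}

/* Lookup that takes an already computed hash instead of hashing again. */
static struct sketch_entry *sketch_find(struct sketch_table *t, const char *name,
					size_t len, unsigned int hash)
{
	struct sketch_entry *e = t->buckets[hash % SKETCH_BUCKETS];

	for (; e; e = e->next)
		if (e->hash == hash && e->len == len && !memcmp(e->name, name, len))
			return e;
	return NULL;
}

/* Hash once, then reuse the value for both the find and the add. */
struct sketch_entry *sketch_find_or_add(struct sketch_table *t, const char *name,
					size_t len)
{
	unsigned int hash;
	struct sketch_entry *e;

	assert(t != NULL && name != NULL);
	hash = sketch_hash(t, name, len);
	e = sketch_find(t, name, len, hash);
	if (e)
		return e;
	if (len >= sizeof(e->name))
		return NULL; /* name too long for this sketch */
	e = calloc(1, sizeof(*e));
	if (!e)
		return NULL;
	memcpy(e->name, name, len);
	e->len = len;
	e->hash = hash;
	e->next = t->buckets[hash % SKETCH_BUCKETS];
	t->buckets[hash % SKETCH_BUCKETS] = e;
	return e;
}
```

Each call to `sketch_find_or_add()` hashes the name exactly once, whether or not the entry already exists.
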
Add variant of memihash() to allow the hash computation to
be continued.  There are times when we compute the hash on
a full path and then the hash on just the path to the parent
directory.  This can be expensive on large repositories.

With this, we can hash the parent directory first. And then
continue the computation to include the "/filename".

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
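
A minimal sketch of how such a continuation can work, assuming a case-folding FNV-1 style hash like the loop quoted later on this page. The `sketch_` names and constants are illustrative and are not the functions added by this series (the diff excerpt shows the real variant as `memihash2()` taking a `hash_seed`).

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_FNV32_BASE  0x811c9dc5u
#define SKETCH_FNV32_PRIME 0x01000193u

/* Continue a case-folding hash from `seed` over `len` more bytes. */
unsigned int sketch_memihash_cont(unsigned int seed, const void *buf, size_t len)
{
	const unsigned char *p = (const unsigned char *)buf;
	unsigned int hash = seed;

	assert(buf != NULL || len == 0);
	while (len--) {
		unsigned int c = *p++;
		if (c >= 'a' && c <= 'z')
			c -= 'a' - 'A'; /* fold ASCII lower case to upper case */
		hash = (hash * SKETCH_FNV32_PRIME) ^ c;
	}
	return hash;
}

/* Hashing from scratch is a continuation that starts at the base value. */
unsigned int sketch_memihash(const void *buf, size_t len)
{
	return sketch_memihash_cont(SKETCH_FNV32_BASE, buf, len);
}
```

Because the hash state is a single integer that is updated byte by byte, hashing "src" and then continuing with "/Main.c" yields the same value as hashing "src/Main.c" in one pass, so the directory hash does not have to be recomputed.
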
Precompute the istate.name_hash and istate.dir_hash values
for each cache-entry during the preload-index phase.

Move the expensive memihash() calculations from lazy_init_name_hash()
to the multi-threaded preload-index phase.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
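
A rough sketch of the general approach, assuming POSIX threads: the entry array is split into contiguous chunks and each thread fills in a precomputed hash for its own chunk, so no locking is needed. The struct, field, and function names are hypothetical and the thread count is fixed for simplicity; this is not the preload-index code itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_THREADS 4

/* Hypothetical cache entry with a slot for the precomputed hash. */
struct sketch_entry {
	const char *name;
	unsigned int precomputed_hash;
};

struct sketch_job {
	struct sketch_entry *entries;
	size_t count;
};

/* Case-folding FNV-1 style hash, standing in for memihash(). */
static unsigned int sketch_ihash(const char *s, size_t len)
{
	unsigned int h = 0x811c9dc5u;

	while (len--) {
		unsigned int c = (unsigned char)*s++;
		if (c >= 'a' && c <= 'z')
			c -= 'a' - 'A';
		h = (h * 0x01000193u) ^ c;
	}
	return h;
}

/* Each worker hashes only its own slice of the array. */
static void *sketch_worker(void *arg)
{
	struct sketch_job *job = arg;
	size_t i;

	for (i = 0; i < job->count; i++) {
		const char *name = job->entries[i].name;
		job->entries[i].precomputed_hash = sketch_ihash(name, strlen(name));
	}
	return NULL;
}

/* Returns 0 on success, -1 if a thread could not be started. */
int sketch_precompute_hashes(struct sketch_entry *entries, size_t n)
{
	pthread_t threads[SKETCH_THREADS];
	struct sketch_job jobs[SKETCH_THREADS];
	size_t chunk = (n + SKETCH_THREADS - 1) / SKETCH_THREADS;
	int started = 0, rc = 0, i;

	assert(entries != NULL || n == 0);
	for (i = 0; i < SKETCH_THREADS; i++) {
		size_t begin = (size_t)i * chunk;
		if (begin >= n)
			break;
		jobs[i].entries = entries + begin;
		jobs[i].count = (n - begin < chunk) ? n - begin : chunk;
		if (pthread_create(&threads[i], NULL, sketch_worker, &jobs[i])) {
			rc = -1;
			break;
		}
		started++;
	}
	for (i = 0; i < started; i++)
		pthread_join(threads[i], NULL);
	return rc;
}
```

A later single-threaded pass can then insert each entry using `precomputed_hash` instead of hashing the name again.
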
Specify an initial size for the istate.dir_hash HashMap matching
the size of the istate.name_hash.

Previously hashmap_init() was given 0, causing a 64 bucket
hashmap to be created.  When working with very large
repositories, this would cause numerous rehash() calls to
realloc and rebalance the hashmap. This is especially true
when the worktree is deep, with many directories containing
a few files.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
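
To make the cost concrete, the following sketch counts how often a table would be rebuilt while it fills up. Only the 64-bucket starting size comes from the commit message; the 4x growth step and the 80% load threshold are assumptions chosen for illustration, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_INITIAL_SIZE  64u /* from the commit message */
#define SKETCH_GROWTH_FACTOR 4u  /* assumption for illustration */
#define SKETCH_LOAD_PERCENT  80u /* assumption for illustration */

/* Count table rebuilds needed to hold n_entries, starting at initial_buckets. */
unsigned int sketch_count_rehashes(size_t n_entries, size_t initial_buckets)
{
	size_t buckets = initial_buckets;
	unsigned int rehashes = 0;

	assert(initial_buckets > 0);
	while (n_entries > buckets * SKETCH_LOAD_PERCENT / 100) {
		buckets *= SKETCH_GROWTH_FACTOR;
		rehashes++;
	}
	return rehashes;
}

/* Pick a bucket count up front that can hold `expected` entries without growing. */
size_t sketch_presized_buckets(size_t expected)
{
	size_t buckets = SKETCH_INITIAL_SIZE;

	while (expected > buckets * SKETCH_LOAD_PERCENT / 100)
		buckets *= SKETCH_GROWTH_FACTOR;
	return buckets;
}
```

Under these assumed parameters, filling a table with one million entries from a 64-bucket start triggers 8 rebuilds, each of which reallocates and redistributes every entry inserted so far. A table sized up front for the expected count needs none.
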
Teach hash_dir_entry() to remember the previously found dir_entry
during lazy_init_name_hash() iteration.  This is a performance
optimization.  Since items in the index array are sorted by full
pathname, adjacent items are likely to be in the same directory.
This can save memihash() computations and HashMap lookups.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
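
The effect of remembering the previous directory can be sketched as a one-entry cache over a sorted path list. This is a simplified illustration with hypothetical names: it compares directory prefixes byte for byte (no case folding), and it does not model the walk up through parent directories.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the directory part of `path` (without the trailing '/'), 0 if none. */
static size_t sketch_dir_len(const char *path)
{
	const char *slash = strrchr(path, '/');

	return slash ? (size_t)(slash - path) : 0;
}

/*
 * Count how many directory lookups are still needed when the directory of
 * the previous path is remembered and reused on a match.
 */
size_t sketch_count_dir_lookups(const char **paths, size_t n)
{
	const char *prev = NULL;
	size_t prev_len = 0, lookups = 0, i;

	assert(paths != NULL || n == 0);
	for (i = 0; i < n; i++) {
		size_t len = sketch_dir_len(paths[i]);

		if (len == 0)
			continue; /* top-level file: no directory entry needed */
		if (prev && len == prev_len && !memcmp(paths[i], prev, len))
			continue; /* same directory as the previous path: reuse it */
		lookups++; /* a hash computation and table lookup would happen here */
		prev = paths[i];
		prev_len = len;
	}
	return lookups;
}
```

For the sorted list "Makefile", "src/a.c", "src/b.c", "src/lib/x.c", "src/lib/y.c", "src/z.c", five paths have a directory but only three lookups remain, because two paths share the directory of their predecessor.
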
@@ -23,17 +23,23 @@ static int dir_entry_cmp(const struct dir_entry *e1,
name ? name : e2->name, e1->namelen);
}

-static struct dir_entry *find_dir_entry(struct index_state *istate,
-		const char *name, unsigned int namelen)
+static struct dir_entry *find_dir_entry__hash(struct index_state *istate,

* Incorporate another chunk of data into a memihash
* computation.
*/
unsigned int memihash2(unsigned int hash_seed, const void *buf, size_t len)

 	int nr;
 
 	if (istate->name_hash_initialized)
 		return;
 	hashmap_init(&istate->name_hash, (hashmap_cmp_fn) cache_entry_cmp,
 			istate->cache_nr);
-	hashmap_init(&istate->dir_hash, (hashmap_cmp_fn) dir_entry_cmp, 0);
+	hashmap_init(&istate->dir_hash, (hashmap_cmp_fn) dir_entry_cmp,
+			istate->cache_nr);

 static struct dir_entry *hash_dir_entry(struct index_state *istate,
-		struct cache_entry *ce, int namelen)
+		struct cache_entry *ce, int namelen, struct dir_entry **p_previous_dir)

@dscho dscho merged commit 7b8583a into git-for-windows:master Nov 18, 2016
@dscho
Member

dscho commented Nov 18, 2016

Thank you so much!

@dscho dscho added this to the v2.11.0 milestone Nov 18, 2016
dscho added a commit to git-for-windows/build-extra that referenced this pull request Nov 18, 2016
Performance of the cache of case-insensitive file names [has been
improved](git-for-windows/git#964).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
dscho added a commit to dscho/git that referenced this pull request Nov 24, 2016
…er/memihash_perf

Jeffhostetler/memihash perf
dscho pushed a commit that referenced this pull request Nov 29, 2016
dscho added a commit to dscho/git that referenced this pull request Nov 29, 2016
…er/memihash_perf

Jeffhostetler/memihash perf
dscho pushed a commit that referenced this pull request Nov 30, 2016
@jeffhostetler jeffhostetler deleted the jeffhostetler/memihash_perf branch December 1, 2016 19:56
dscho added a commit that referenced this pull request Dec 6, 2016
dscho added a commit that referenced this pull request Dec 12, 2016
unsigned char *ucbuf = (unsigned char *) buf;
while (len--) {
unsigned int c = *ucbuf++;
if (c >= 'a' && c <= 'z')

dscho added a commit that referenced this pull request Jan 11, 2017
dscho added a commit that referenced this pull request Jan 18, 2017
dscho added a commit that referenced this pull request Jan 18, 2017
dscho added a commit that referenced this pull request Mar 26, 2017
git-for-windows-ci pushed a commit that referenced this pull request Mar 27, 2017
dscho added a commit that referenced this pull request Mar 27, 2017
…ash_perf

I should really implement special-handling for a new drop! prefix to
commit messages...

This commit reverts that Pull Request, in preparation for merging in a new
iteration of that work (which looks substantially different from the
previous iteration...).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
git-for-windows-ci pushed a commit that referenced this pull request Mar 27, 2017
git-for-windows-ci pushed a commit that referenced this pull request Mar 27, 2017
git-for-windows-ci pushed a commit that referenced this pull request Mar 28, 2017
git-for-windows-ci pushed a commit that referenced this pull request Mar 28, 2017
dscho added a commit that referenced this pull request Mar 30, 2017
dscho added a commit that referenced this pull request Mar 30, 2017
This commit helps to teach `git rerere` to resolve merge 
conflicts when cherry-picking:

	e920a5b (fixup! Merge pull request #964 from jeffhostetler/jeffhostetler/memihash_perf, 2017-03-27)
dscho added a commit that referenced this pull request Mar 30, 2017
dscho added a commit that referenced this pull request Mar 30, 2017
dscho added a commit that referenced this pull request Mar 30, 2017
git-for-windows-ci pushed a commit that referenced this pull request Mar 30, 2017
git-for-windows-ci pushed a commit that referenced this pull request Mar 30, 2017
git-for-windows-ci pushed a commit that referenced this pull request Mar 30, 2017
git-for-windows-ci pushed a commit that referenced this pull request Mar 30, 2017
git-for-windows-ci pushed a commit that referenced this pull request Mar 30, 2017
git-for-windows-ci pushed a commit that referenced this pull request Mar 30, 2017
dscho added a commit that referenced this pull request Apr 2, 2017
dscho added a commit that referenced this pull request Apr 2, 2017
git-for-windows-ci pushed a commit that referenced this pull request Apr 12, 2017
git-for-windows-ci pushed a commit that referenced this pull request Apr 12, 2017
3 participants