status and commit uses different algorithms to find changes

Discussion:

Alexander Belchenko

2008-10-08 13:02:57 UTC

Hi John,

Once not so while ago we talked in IRC and I said that it's to sad that commit don't use
working tree's iter_changes() method to collect changes for commit. As I remember you
said that it's not true and in reality commit uses it.

I can show use case where commit behaves differently than status in the same tree:

C:\Temp\stvsci>bzr init
Standalone tree (format: pack-0.92)
Location:
branch root: .

C:\Temp\stvsci>bzr mkdir foo
added foo

C:\Temp\stvsci>bzr st
added:
foo/

C:\Temp\stvsci>rmdir foo

C:\Temp\stvsci>bzr st

C:\Temp\stvsci>mkdir FOO

C:\Temp\stvsci>bzr st
unknown:
FOO/

C:\Temp\stvsci>bzr ci -m1
Committing to: C:/Temp/stvsci/
added foo
Committed revision 1.

C:\Temp\stvsci>bzr st
removed:
foo/
unknown:
FOO/

So, commit actually collect changes not via iter_changes, or uses iter_changes data incorrectly.
I think there is already bug report filed about case-insensitive filesystems and maybe even not one,
but IMHO problem not in case-insensitive filesystem. Bug in commit's collector code.

John Arbash Meinel

2008-10-08 13:19:29 UTC

Permalink

Post by Alexander Belchenko
Hi John,
Once not so while ago we talked in IRC and I said that it's to sad that commit don't use
working tree's iter_changes() method to collect changes for commit. As I remember you
said that it's not true and in reality commit uses it.

I wouldn't have said that, as I know commit does *not* use
iter_changes(). It can't, because of issues when there are more than one
parent. We have to compare against both sides, *and* pay attention to
last-modified revisions, while iter_changes() does not.

A while back Ian wrote a patch which turned on a special case when there
was a single parent, and used iter_changes in that circumstance. IIRC
Robert didn't like having the special case in there.
...

Post by Alexander Belchenko
C:\Temp\stvsci>bzr ci -m1
Committing to: C:/Temp/stvsci/
added foo
Committed revision 1.
C:\Temp\stvsci>bzr st
foo/
FOO/
So, commit actually collect changes not via iter_changes, or uses
iter_changes data incorrectly.
I think there is already bug report filed about case-insensitive
filesystems and maybe even not one,
but IMHO problem not in case-insensitive filesystem. Bug in commit's collector code.

commit *isn't* using 'iter_changes'. It is going out and using
"os.lstat()" on all the records it thinks it needs to inspect, which on
a case insensitive system will return a value. 'iter_changes' uses
list_dir() and the does its own filename comparison, which will see the
files/dirs as different objects.

What we *should* do is write something like iter_changes which handles
the extra stuff that commit needs. However, it ends up as a rather large
overhaul of our commit logic. And I think Robert is focusing on
reworking Inventory so that the data recorded can allow us to improve
things higher up the stack, rather than pushing down from above.

I'm not entirely sure where things were left with what Ian and Robert
were discussing with the commit code. I can agree, though, that having a
special case for 1-parent, and then stuff like your commit would work in
a plain commit, but fail for a merge commit, wouldn't really be optimal
either.

John
=:->

Alexander Belchenko

2008-10-08 13:38:40 UTC

Permalink

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I wouldn't have said that, as I know commit does *not* use
iter_changes().

So I misunderstood you then.

Why this question arised: I've discussed with you problem in my branch
where I have commit with no changes in some files. So `bzr log FILE`
show wrong log.
I think it was because I've used during that commit my rejected patch
for end-of-lines support.

And I don't know does this problem affect Ian's filtering work too or not.
In my case it seems like commit schedule file to commit even before eol filter
starting to work.

And it's too sad to see what's going on with Ian's work for filtering content.
I.e. nothing. It's just hanging indefinitely again.
Six months ago Ian started his work. It's a shame to see this end.

It can't, because of issues when there are more than one
parent. We have to compare against both sides, *and* pay attention to
last-modified revisions, while iter_changes() does not.

OK. I was wrong but anyway my point remains.

Thank you for explanation.

Alexander Belchenko

2008-10-08 17:45:42 UTC

Permalink

Post by John Arbash Meinel

Stop. I should make frank confession that I don't understand this.
Perhaps I'm inevitably stupid.

You said that commit should compare against both sides, otherwise what?

You want to say that after merge both `bzr status` and `bzr diff` show to user
*wrong* info? IIUC they both are used iter_changes under the hood. Is it true?
So they are produced wrong output? Is it true?
So all this years I've used bzr diff to see merge result, and all this years bzr lies me?
Am I understand correctly?

I think something wrong here in either case. Because I don't understand how it possible
to see *right* output from status and diff after merge, while they're using iter_changes
and in the same time claim that iter_changes wrong for collecting changes for commit
pending merges.

I'm totally lost here.

John Arbash Meinel

2008-10-08 18:21:18 UTC

Permalink

Post by Alexander Belchenko
I think something wrong here in either case. Because I don't understand how it possible
to see *right* output from status and diff after merge, while they're using iter_changes
and in the same time claim that iter_changes wrong for collecting changes for commit
pending merges.
I'm totally lost here.

iter_changes gives the correct results when comparing 2 trees.

commit after a merge has to record information about 3 trees. base
parent, merged parent, and current tree.

As I said before we *could* use iter_changes as it is right now in the
case of a single parent (no merges).

"bzr status" after a merge only shows the changes versus the base
parent, which is generally what you want anyway. But for *commit* you
need to be aware of what was in the base parent, what was brought in
from the merge parent, what is new between both sides.

John
=:->

Alexander Belchenko

2008-10-08 21:28:25 UTC

Permalink

Post by John Arbash Meinel

iter_changes gives the correct results when comparing 2 trees.
commit after a merge has to record information about 3 trees. base
parent, merged parent, and current tree.
As I said before we *could* use iter_changes as it is right now in the
case of a single parent (no merges).
"bzr status" after a merge only shows the changes versus the base
parent, which is generally what you want anyway. But for *commit* you
need to be aware of what was in the base parent, what was brought in
from the merge parent, what is new between both sides.

I'm still don't understand why you need to compare against each parent.
bzr force user to use silver mainline in most of the cases. Why you
need track changes against merge parent?
Just to write down merge point in file graph? Why this cannot be obtained
as output from graph merge algorithm? Why you need to compare trees?
In the case of octopus merge you need to compare against 3 and more parents.

So why bzr don't run iter_changes N times to compare N trees and collect results?
It's slow?
IIUC, result of N comparing will be summed up to get resulting set of changes.

Anyway, to collect changes in the tree against parent iter_changes should be used,
because they're designed to get correct diff between working and revision tree.
Because working tree should be filtered out correctly before comparison.

And anyway I don't understand why commit uses os.lstat. It's wrong.
It's break not only with case-insensitive filesystem but with fake symlinks too.

As said: anything I did is sucks by commit.