Help diagnosing a problem with cygwin/samba on ROS

All development related issues welcome

Moderator: Moderator Team

bugdude
Posts: 28
Joined: Wed Nov 27, 2019 11:41 pm

Help diagnosing a problem with cygwin/samba on ROS

Post by bugdude »

Hello everyone,

First off Kudos, I have been working on 4.13-dev for some time and recently jumped to 4.15-dev and the greater stability really shows everywhere.

I have been working on porting Samba4 (including the servers) to ROS as a pet project for some time, Because Samba has a long dependency list I decided to use cygwin. In order to be compatible with Win2003 API levels an older cygwin was the only choice so I used the 2.5.2 version which was the last version compatible with W2K3 that uses LGPL. I finally got everything to build and I have a test candidate that works with alternate ports on Win 10, but unfortunately it died on first run on ROS 4.13-dev so I shifted to 4.15-dev hoping for better and found it did the same thing there. I instrumented Samba, and then cygwin, and recently ROS a bit, but after putting things together I ran into a new problem because I cannot get ROS file debug logging to work on ROS 4.15-dev..

I edited Freeeldr.ini and configured the LogFile boot item and tested it using both ArcName and Device paths but could not get the logging to work. I added some DPRINTS, commented out the NDEBUG define, and set the environment variables to enable all channels then edited freeldr.ini. It seems to slow down execution some, but no logfile seems to be getting created anywhere. I also tried it with no path specified but nothing works leaving me a bit stalled leading to my first question:
Any idea how I can get debug logging to work under 4.15-dev since the methods in the wiki page don't seem to work ?

The reason I am trying to enable logging is to track down a problem that occurs in ROS that does not happen when the same binary is executed under Windows 10. The application is basically opening a file and doing a random read of a small (4-byte) header data element using the NT Native API NtOpenFile, NtCreateFile, NtReadFile routines. The Failure actually happens in a call to NtReadFile which ends up returning STATUS_INVALID_PARAMETER. After reviewing the code in io/iomgr/iofunc.c and related files I decided to instrument the exit points to see where it exits because there are quite a few conditions that set that status. Most of them seem to be parameter checks and failures that apply only for non-cached access, but the file is not being opened specifying direct access or no caching, so those should not apply in this case. In any event that leads to my second question:
Are there any parameter validations in NtReadFile() under ROS that are more strict than in WIN 10 that could cause this ?

Based on tracing the call coming from cygwin looks like this:
NtOpenFile(<handle>, 0xC0100000, \??\C:\ReactOS\samba\var\lock\names.tdb, io, 0x7, 0x4020 ); -- this call succeeds
NtReadFile(<handle>, NULL NULL, NULL, &io, buf, count, &off, NULL); /* where count=4 and off=44 */ -- this call fails

I also tried rebuilding after replacing the first NULL in teh NtReadFile() call with a zero because some docs indicate it should be a UINT type and got the same results.

Help !!

Bugdude
oldman
Posts: 1179
Joined: Sun Dec 20, 2009 1:23 pm

Re: Help diagnosing a problem with cygwin/samba on ROS

Post by oldman »

Log files are saved in C:\reactos.
Please go to Mattermost chat and ask there, linking to your post here. You should get a faster response there.
Please keep the Windows classic 9x/2000 look and feel.
The layman's guides - debugging - bug reporting - compiling - ISO remaster.
They may help you with a problem, so do have a look at them.
learn_more
Developer
Posts: 246
Joined: Fri Dec 19, 2014 10:00 pm

Re: Help diagnosing a problem with cygwin/samba on ROS

Post by learn_more »

If you run this in a vm debugging should be a lot easier ;)
Check that your structs / pointers are properly aligned?

And if you can modify the kernel (but for some weird reason not view the logging) you can always change error codes around so you can see which path causes it :)
bugdude
Posts: 28
Joined: Wed Nov 27, 2019 11:41 pm

Re: Help diagnosing a problem with cygwin/samba on ROS

Post by bugdude »

Changing the codes is an interesting suggestion, I hadn't considered that one. Just goes to prove there are lots of ways to skin the proverbial cat.

I was being a fly on the wall on the chat last night for a while and noticed some mentions of logging problems in some builds.. maybe the nightly I grabbed as my testbed happens to have a logging issue. It's strange, I never had problems with other nightly builds, but they are all marked caveat emptor so I may have tripped myself up this time by grabbing one with a file logging issue. It logs to the screen fine (but is completely unusable due to the verbosity !), but apparently will not log to a file.

I have been testing in a VM, but under a shared linux host so I don't have easy access to the serial or pipe debug options.. I'll shift over to a local VirtualBox so I can see the debug for this stage of testing. I'll fall back to changing the error codes if I can't get the traces working on VBox.

Thanks for both suggestions, I was beating my head on the same wall rather than finding a new wall to try it on !

bugdude
bugdude
Posts: 28
Joined: Wed Nov 27, 2019 11:41 pm

Re: Help diagnosing a problem with cygwin/samba on ROS

Post by bugdude »

Hello again,

I believe I found a bug in the FastFat driver, It's a bug that requires a specific multi step sequence executed in order to trigger the failure. It must not be a very commonly used sequence to have survived this long in the driver, but the details are below. I found it because Cygwin uses this sequence fairly often and will trigger the failure predictably every time some applications are run.

I ended up instrumenting more parts of ReactOS in order to track down the problem I was having running some Samba servers. After instrumenting The application and Cygwin I ended up having to build a more heavily and specifically instrumented version of ReactOS to figure out what triggered the failure. It turned out to be something I had not expected. Based on what I read in MS Documentation it looks like it is a incorrect behavior in ReactOS in how it handles the windows internal API call NtOpenFile(). The failure I hit results from how Cygwin emulates the posix functions pread() and pwrite(). Since the Posix definition of those functions requires that the filehandles offset (or file pointer) not be moved so that other processes are unaffected by the use of pread/pwrite, Cygwin actually opens a second handle for pread() and pwrite() without closing the first handle.

In order to do that it calls NtOpenFile() again in what is sometimes referred to as 'open by handle' mode. The first call uses ObjectInitializeAttributes() to set the path using an NT path and OBJ_CASE_INSENSITIVE with no RootDirectory handle, but for the second call The Attributes are initialized with the RootDirectory Handle attribute set as the handle returned by the prior call to NtOpenFile(), and the path is left empty and no other elements initialized. Under Windows when this is done the handle is used to search for the object and it returns the same file object from the first call along with a new handle. Under ROS 0.4.15-dev-1610-g0cce134-x86-gcc-lin-dbg the search for the object is seen running and returns the status success but it actually returns the incorrect object. In the case of the problem I am working on the instrumentation I added to ROS showed that object returned by the search is the FAT Volume the file we attempted to open resides on (I think).

Here is an extract of the trace produced by the instrumentation I added in my build of ROS for testing (line numbers do not match normal build):

============First open call uses native path and is successful (view without word wrap, long lines)
(ntoskrnl/io/iomgr/file.c:3913) ntoskrnl/io/iomgr/file.c - NtCreateFile starting DA:0x3222274048 FA:0x128 SA:0x7 CDisp:0x3 CO:0x16416 EaL:0x0 Name: \??\C:\ReactOS\samba\var\lock\names.tdb
(ntoskrnl/io/iomgr/file.c:3162) ntoskrnl/io/iomgr/file.c - IoCreateFile starting DA:0x3222274048 FA:0x128 SA:0x7 CO:0x16416 EaL:0x0 Name: \??\C:\ReactOS\samba\var\lock\names.tdb
(ntoskrnl/io/iomgr/file.c:2651) ntoskrnl/io/iomgr/file.c - IopCreateFile Starting - FileName: \??\C:\ReactOS\samba\var\lock\names.tdb
(ntoskrnl/io/iomgr/file.c:2669) ntoskrnl/io/iomgr/file.c - IopCreateFile AccessMode != KernelMode.
(ntoskrnl/io/iomgr/file.c:2977) ntoskrnl/io/iomgr/file.c - IopCreateFile calling ObOpenObjectByName() without FILE_NO_INTERMEDIATE_BUFFERING CreateOption - FileName: \??\C:\ReactOS\samba\var\lock\names.tdb
(ntoskrnl/ob/obhandle.c:2557) ntoskrnl/ob/obhandle.c - ObOpenObjectByName() starting, DesiredAccess: 0x3222274048 ObjectAttributes->RootDirectory: 00000000 FileName:\??\C:\ReactOS\samba\var\lock\names.tdb
(ntoskrnl/ob/obhandle.c:2595) ntoskrnl/ob/obhandle.c - ObOpenObjectByName() received no PassedAccessState, preparing builtin.
(ntoskrnl/ob/obhandle.c:2631) ntoskrnl/ob/obhandle.c - ObOpenObjectByName() calling ObpLookupObjectName().
(ntoskrnl/ob/obname.c:490) ntoskrnl/ob/obname.c - ObLookupObjectName ObpLookupObjectName - Finding Object: \??\C:\ReactOS\samba\var\lock\names.tdb. Expecting: 00000000
(ntoskrnl/io/iomgr/file.c:901) ntoskrnl/io/iomgr/file.c - IopParseDevice FileObject->Flags DOESN't include FO_NO_INTERMEDIATE_BUFFERING before OpenPacket call ParseObject: B234FB80 RemainingName: \ReactOS\samba\var\lock\names.tdb
(ntoskrnl/ob/obname.c:1244) ntoskrnl/ob/obname.c - ObpLookupObjectName completed, returning status: 0x0 ObpLookupObjectName - Found Object: B18D9620. Expected: 00000000
(ntoskrnl/ob/obhandle.c:2690) ntoskrnl/ob/obhandle.c - ObOpenObjectByName() creating handle, calling ObpCreateHandle().
(ntoskrnl/ob/obhandle.c:1525) ntoskrnl/ob/obhandle.c - ObpCreateHandle: ObpCreateHandle - Creating handle for: B18D9620. Reason: 0. HC PC 0 2
(ntoskrnl/ob/obhandle.c:2737) ntoskrnl/ob/obhandle.c - ObOpenObjectByName() completed, returning status - 0x0 Object: B18D9620 Handle: F51AEC2C
(ntoskrnl/io/iomgr/file.c:2989) ntoskrnl/io/iomgr/file.c - IopCreateFile received handle from ObOpenObjectByName(), handle: 00000150 - FileName: \??\C:\ReactOS\samba\var\lock\names.tdb
(ntoskrnl/io/iomgr/file.c:3069) ntoskrnl/io/iomgr/file.c - IopCreateFile saving new Handle - FileObject: B18D9620 FileHandle 00000150 FileName: \??\C:\ReactOS\samba\var\lock\names.tdb
(ntoskrnl/io/iomgr/file.c:3087) ntoskrnl/io/iomgr/file.c - IopCreateFile succesfully generated new FileObject/Handle - Dereferencing OpenPacket->FileObject: B18D9620
(ntoskrnl/io/iomgr/file.c:3097) ntoskrnl/io/iomgr/file.c - IopCreateFile Completed successfully - returning status: 0x0 handle: 00000150 FileName: \??\C:\ReactOS\samba\var\lock\names.tdb
(ntoskrnl/io/iomgr/file.c:3930) ntoskrnl/io/iomgr/file.c - NtCreateFile completed returning Status - 0x0 FileHandle: 0022C474 Name: \??\C:\ReactOS\samba\var\lock\names.tdb

... intervening operations removed, most unrelated to this file, the handle is used here to read the first block and validate the file type.

... at this point the application calls pread() and Cygwin attempts to open a new handle to the same file using 'open by handle' mode
==== This call succeeds on both Windows 10 and ReactOS, but ReactOS is returning incorrect object (visible in failed NtReadFile call following)
(ntoskrnl/io/iomgr/file.c:4117) ntoskrnl/io/iomgr/file.c - NtOpenFile starting, Calling IoCreateFile() DesiredAccess: 0x3222274048 ObjectAttributes->RootDirectory: 00000150 ShareAccess: 0x7 OpenOptions: 0x16416 FileName:(empty)
(ntoskrnl/io/iomgr/file.c:3162) ntoskrnl/io/iomgr/file.c - IoCreateFile starting DA:0x3222274048 FA:0x0 SA:0x7 CO:0x16416 EaL:0x0 Name:
(ntoskrnl/io/iomgr/file.c:2651) ntoskrnl/io/iomgr/file.c - IopCreateFile Starting - FileName:
(ntoskrnl/io/iomgr/file.c:2669) ntoskrnl/io/iomgr/file.c - IopCreateFile AccessMode != KernelMode.
(ntoskrnl/io/iomgr/file.c:2977) ntoskrnl/io/iomgr/file.c - IopCreateFile calling ObOpenObjectByName() without FILE_NO_INTERMEDIATE_BUFFERING CreateOption - FileName:
(ntoskrnl/ob/obhandle.c:2557) ntoskrnl/ob/obhandle.c - ObOpenObjectByName() starting, DesiredAccess: 0x3222274048 ObjectAttributes->RootDirectory: 00000150 FileName:
(ntoskrnl/ob/obhandle.c:2595) ntoskrnl/ob/obhandle.c - ObOpenObjectByName() received no PassedAccessState, preparing builtin.
(ntoskrnl/ob/obhandle.c:2631) ntoskrnl/ob/obhandle.c - ObOpenObjectByName() calling ObpLookupObjectName().
(ntoskrnl/ob/obname.c:490) ntoskrnl/ob/obname.c - ObLookupObjectName ObpLookupObjectName - Finding Object: (null). Expecting: 00000000
(ntoskrnl/io/iomgr/file.c:901) ntoskrnl/io/iomgr/file.c - IopParseDevice FileObject->Flags DOESN't include FO_NO_INTERMEDIATE_BUFFERING before OpenPacket call ParseObject: B22E6018 RemainingName: (null)
(ntoskrnl/ob/obname.c:605) ntoskrnl/ob/obname.c - ObpLookupObjectName completed, returning status: 0x0 ObpLookupObjectName - Found Object: B18D2E50. Expected: 00000000
(ntoskrnl/ob/obhandle.c:2690) ntoskrnl/ob/obhandle.c - ObOpenObjectByName() creating handle, calling ObpCreateHandle().
(ntoskrnl/ob/obhandle.c:2557) ntoskrnl/ob/obhandle.c - ObOpenObjectByName() starting, DesiredAccess: 0x983103 ObjectAttributes->RootDirectory: 00000144 FileName:SAM
(ntoskrnl/ob/obhandle.c:2595) ntoskrnl/ob/obhandle.c - ObOpenObjectByName() received no PassedAccessState, preparing builtin.
(ntoskrnl/ob/obhandle.c:2631) ntoskrnl/ob/obhandle.c - ObOpenObjectByName() calling ObpLookupObjectName().
(ntoskrnl/ob/obname.c:490) ntoskrnl/ob/obname.c - ObLookupObjectName ObpLookupObjectName - Finding Object: SAM. Expecting: 00000000

... The object search loops through a good number of objects here looking for the handle, removed those entries to make this easier to follow

(ntoskrnl/ob/obhandle.c:1525) ntoskrnl/ob/obhandle.c - ObpCreateHandle: ObpCreateHandle - Creating handle for: B234F7F8. Reason: 1. HC PC b 2b
(ntoskrnl/ob/obhandle.c:2868) ntoskrnl/ob/obhandle.c - ObOpenObjectByPointer() completed, returning status - 0x0 Object: B234F7F8 Handle: F5224CDC
(ntoskrnl/ob/obhandle.c:1838) ntorkrnl/ob/obhandle.c - ObpCloseHandle ObpCloseHandle - Closed handle: 000007F8 S: 0
(ntoskrnl/ob/obhandle.c:1525) ntoskrnl/ob/obhandle.c - ObpCreateHandle: ObpCreateHandle - Creating handle for: B18D2E50. Reason: 0. HC PC 0 2
(ntoskrnl/ob/obhandle.c:2737) ntoskrnl/ob/obhandle.c - ObOpenObjectByName() completed, returning status - 0x0 Object: B18D2E50 Handle: F51AEC40
(ntoskrnl/io/iomgr/file.c:2989) ntoskrnl/io/iomgr/file.c - IopCreateFile received handle from ObOpenObjectByName(), handle: 00000128 - FileName:
(ntoskrnl/io/iomgr/file.c:3069) ntoskrnl/io/iomgr/file.c - IopCreateFile saving new Handle - FileObject: B18D2E50 FileHandle 00000128 FileName:
(ntoskrnl/io/iomgr/file.c:3087) ntoskrnl/io/iomgr/file.c - IopCreateFile succesfully generated new FileObject/Handle - Dereferencing OpenPacket->FileObject: B18D2E50
(ntoskrnl/io/iomgr/file.c:3097) ntoskrnl/io/iomgr/file.c - IopCreateFile Completed successfully - returning status: 0x0 handle: 00000128 FileName:
(ntoskrnl/io/iomgr/file.c:4140) ntoskrnl/io/iomgr/file.c - NtOpenFile completed returning Status - 0x0 Name:

... above open returned file handle 00000128 which should be pointing at the same file that was opened in handle 00000150 (see ObjectAttributes->RootDirectory at start of call) but is not

=========== This is our faulting read. Apparently the object lookup gave us the wrong object because this object is the volume
=========== Volume is non-cached, so our random read fails because it is not aligned as required for non-cached reads
ntoskrnl/io/iomgr/iofunc.c - NtReadFile starting FileHandle: 00000128 Length: 4 Offset: 2c
(ntoskrnl/io/iomgr/iofunc.c:140) IopPerformSynchronousRequest starting IRP: B1995008. DO: B22E6018. FO: B18D2E50
(drivers/filesystems/fastfat/rw.c:731) driver/filesystem/fastfat/rw.c - VfatRead(IrpContext B19F19C8)
(drivers/filesystems/fastfat/rw.c:755) driver/filesystem/fastfat/rw.c - VfatRead IsVolume set True from Fcb->Flags.
(drivers/filesystems/fastfat/rw.c:763) driver/filesystem/fastfat/rw.c - VfatRead IsVolume set False from Irp->Flags.
(drivers/filesystems/fastfat/rw.c:780) driver/filesystem/fastfat/rw.c - VfatRead found FCB_IS_PAGE_FILE False.
(drivers/filesystems/fastfat/rw.c:783) driver/filesystem/fastfat/rw.c - VfatRead Fcb->PathNameU: <\$$Volume$$>
(drivers/filesystems/fastfat/rw.c:798) drivers/filesystems/fastfat/rw.c - VFatRead '\$$Volume$$', Offset: 44, Length 4
(drivers/filesystems/fastfat/rw.c:813) driver/filesystem/fastfat/rw.c - VfatRead processing Cached IrpContext: B19F19C8 IRP: B1995008 BytesPerSector: 512 Length: 4 Offset: 44 <\$$Volume$$>
(drivers/filesystems/fastfat/rw.c:839) drivers/filesystems/fastfat/rw.c - VFatRead 44 4
(drivers/filesystems/fastfat/rw.c:841) drivers/filesystems/fastfat/rw.c - VFatRead non cached read is not sector aligned, set status STATUS_INVALID_PARAMETER, jump to exit.
(drivers/filesystems/fastfat/rw.c:933) drivers/filesystems/fastfat/rw.c - VFatRead() completed, returning STATUS_INVALID_PARAMETER.
(ntoskrnl/io/iomgr/iofunc.c:197) IopPerformSynchronousRequest completed IRP: B1995008. DO: B22E6018. FO: B18D2E50 status - 0xc000000d
(ntoskrnl/io/iomgr/iofunc.c:3254) ntoskrnl/io/iomgr/iofunc.c - NtReadFile completed, IRP: B1995008 returning Status - 0xc000000d (STATUS_INVALID_PARAMETER).
======== The calling application validates this result so it cleans up, aborts and exits.

Can anyone in the ROS team look at this and determine if this is a 'bug' in the current version of ROS. The traces seem to indicate it is a bug in the current FastFat driver code. Apparently this method is not very commonly used if the issue has survived unnoticed this long.

BugDude
bugdude
Posts: 28
Joined: Wed Nov 27, 2019 11:41 pm

Re: Help diagnosing a problem with cygwin/samba on ROS

Post by bugdude »

I am trying to set up a VM with Windows server 2003 to confirm that this issue is not present in the windows API version that ROS is currently targeting. It does not occur on Windows 10, but ROS is not yet supposed to be compatible with that API version.

EDIT: Just tested it and this problem doesn't come up on Server 2003, the application handle points at the correct file there when NtOpenFile() is called this way and the application does not crash.

BugDude
Last edited by bugdude on Mon Feb 01, 2021 7:07 pm, edited 1 time in total.
vicmarcal
Test Team
Posts: 2733
Joined: Mon Jul 07, 2008 12:35 pm

Re: Help diagnosing a problem with cygwin/samba on ROS

Post by vicmarcal »

Just an idea...have you tried BTRFS or a different file system? Does still show same issue? (If your Samba is compatible, not sure about that, not a FS dev here) Just to check if the bug is in the driver or deeper in the FS stack.
bugdude
Posts: 28
Joined: Wed Nov 27, 2019 11:41 pm

Re: Help diagnosing a problem with cygwin/samba on ROS

Post by bugdude »

I also built a test rig application using Delphi and was able to reproduce the failure in ROS but outside of Samba if I followed the same sequence of calls.

I'll test for it under BtrFs, but since the fault seems to be in the VFat driver itself I wouldn't expect to find the same bug (though it's not impossible they both have the same bug). Now that I have the test rig it's much easier to test this since fewer things need to be copied, so I'll let you know what I find. If it is not preset on BtrFs it at least narrows scope by confirming the issue lies only in the VFat driver..

Bugdude
bugdude
Posts: 28
Joined: Wed Nov 27, 2019 11:41 pm

Re: Help diagnosing a problem with cygwin/samba on ROS

Post by bugdude »

As I expected, under BtrFs the error does not occur. That confirms it is an issue in the VFat filesystem driver, and that the code where it occurs was not carried over into the BtrFs driver.

I had to set up a BtrFs test under 4.13 rel 16 because I cannot get 4.15 dev to install on BtrFs or even upgrade a 4.13 install on BtrFs. USetup doesn't think there's an existing install and sees the partition as a Linux native partition.. Very odd, but since the 4.13 release installed on BtrFs I was able to test there just fine.

The test rig I built in Delphi just carries out the minimal steps and confirms that the reads fail for the second open. I made it not only to have a lighter test to move around, but also to confirm that I had the complete sequence required and could reproduce the failure. I'm going to tinker with the VFat driver code and see if I can find the issue. Since I'm not much of a C programmer (more of hack than anything) I'm not sure I can find or fix it, but here goes nuthin..

BugDude
Zero3K
Posts: 45
Joined: Tue Jul 05, 2016 12:56 am

Re: Help diagnosing a problem with cygwin/samba on ROS

Post by Zero3K »

Please test it the CC/MM PR available at https://github.com/reactos/reactos/pull/3361 since it enables the new FastFat driver that might not have the issue (and should be fixed if it does).
learn_more
Developer
Posts: 246
Joined: Fri Dec 19, 2014 10:00 pm

Re: Help diagnosing a problem with cygwin/samba on ROS

Post by learn_more »

bugdude wrote: Mon Feb 01, 2021 7:38 am The failure I hit results from how Cygwin emulates the posix functions pread() and pwrite(). Since the Posix definition of those functions requires that the filehandles offset (or file pointer) not be moved so that other processes are unaffected by the use of pread/pwrite,
what?
do you expect that if process a opens a handle to a file, and so does b,
that when process a reads something, process b is affected by the read pointer from a moving?
extravert34
Posts: 7
Joined: Sun Apr 07, 2013 11:50 pm

Re: Help diagnosing a problem with cygwin/samba on ROS

Post by extravert34 »

The pull request has been merged, try the latest nightly build again
bugdude
Posts: 28
Joined: Wed Nov 27, 2019 11:41 pm

Re: Help diagnosing a problem with cygwin/samba on ROS

Post by bugdude »

I will look at the newer build, but I'm still interested in finding the root cause for this as it may point to a bug somewhere.

At the moment it doesn't get far enough to see if there's a problem with read pointer handling. Currently what happens is that the first call to NtFileOpen (done with a full path and a blank RootDir handle) works fully, but the second call (done with RootDir handle set to the handle returned by the first open and a empty path) results in a different FileObject being returned that points at the Volume object hosting the file rather than the file, and then the NtReadFile call which is done using an ByteOffset of 44 and Length of 4 fails because the Volume object is treated as non-cached in NtreadFile and therefore requires sector alignment (which an offset 44 does not have). Under Windows the second call to NtFileOpen does read the File rather than the Volume.

The interesting part right now, is that this issue doesn't happen under BtrFs, even though most of parsing and object selection code used by NtFileOpen is in the kernel Object Manager or IO manager which both filesystems share.. I have been using a combination of tracing and additional instrumentation to find out precisely why this fails under FatFs and then I plan to test under BtrFs to see why it differs. Using the new FatFs could fix the problem, but I am curious about why it happens at all so I am still tearing into the Kernel and FatFs to see if I can pinpoint the cause.

Calls to NtOpenFile with RootDir->Handle set are very common and the instrumentation shows a lot of them during OS Startup. This looks like an edge case that is not very common though because few or none of those calls seem to pass in a blank/empty Path with a handle. Digging through it is slow because instrumenting or tracing the object manager IO Manager and FatFs results in very slow execution, and tracing this requires a good bit more instrumentation than those elements normally have..

As far as the read pointer goes, this version of Cygwin seems to assume the read pointer is associated with the 'handle' object under Windows so having two 'handles' means having functionally independent file pointers. It is possible that this underlying assumption is incorrect, but I really haven't got a failure that is caused by that assumption 'yet' because I don't get that far.. Even so, if the assumption is true under Windows and not under ROS that could be another source of failure further down the line, and might also be an operational difference between Windows and ROS. Unfortunately more testing would be needed to confirm that after getting past the current problem and I have limited time to work on it.. I may build a test rig in Delphi to test that assumption about the file Pointer on Windows and ROS as well, it's another interesting question.. It's a bit of a chicken and egg type question, who cane first the handle or the FileObject ? It could be that they used this odd method of opening the file because under Windows it causes creation of a separate FileObject yielding two handles, or maybe they just used it because they didn't have ready access to the filename but they did have the handle, it's anyones guess at this point.


bugdude
bugdude
Posts: 28
Joined: Wed Nov 27, 2019 11:41 pm

Re: Help diagnosing a problem with cygwin/samba on ROS

Post by bugdude »

BTW, thanks to each of you for your response, I'm very glad to have advice from others. This has been a learning exercise for me because it's been years since I did any real development work.
bugdude
Posts: 28
Joined: Wed Nov 27, 2019 11:41 pm

Re: Help diagnosing a problem with cygwin/samba on ROS

Post by bugdude »

For anyone interested I found that adding the PID to the tracing output and then filtering it by PID allowed me to get a clearer picture of what is happening. The expanded and lightly annotated trace extract is below for the curious.

I still haven't gotten to the bottom of this, but the trace now shows a much easier to interpret pattern. The why is not yet clear, but the output of the Delphi test app and the trace show that under ROS we definitely do not get back the correct FileObject from the Object Manager when this variation of NtOpenFile is used with the older VFat driver, while on Windows 10 we do get the correct file.. The first sign of this is seen in the test application, the application follows both calls to NtFileOpen by calling NtQueryInformationFile to get the file size before reading anything. When run on Windows 10 the file sizes from both open calls are identical (696 bytes), while under ReactOs the first open returns that file Length expected and the second returns 42677423 bytes. The expanded instrumentation gives more detail showing a path and flags for the FileObject, indicating that the parent Volume Object is returned rater than the desired file. VFatRead shows IsVolume is true in the Object properties, and the FCB->PathNameU it sees is $$Volume$$. The object returned does not seem to be the file we expected. This looks like a problem in ObOpenObjectByName making me wonder why it doesn't fail on BtrFs. Presumably this is due to the callout for the Parse functions which is where the two interact in the process, but I haven't found the exact friction point yet.

===== second NtOpenFile call made using the file handle returned by first open as RootDir->Handle, with blank path =========================
===== this call returns success status and the program happily continues using this handle, but it is NOT pointing at our file so we get odd results
===== the handle it returns is in fact not even a file, it is the Volume that hosts the file(I think), it is definitely a volume according to the flags
===== present in subsequent reads, and like all filesystem volumes accessed directly it is non-cached so access must be sector aligned and sized
===== when the same application is run unchanged on BtrFs volume this does not happen, we get back a file handle that allows unaligned reads
===== (more testing on BtrFs is needed to see if it is getting separate handle to correct file, as Cygwin expects, same handle, something else)

(ntoskrnl/io/iomgr/file.c:4182) (PID 5bc) NtOpenFile starting, Calling IoCreateFile() DesiredAccess: 0x3222274048 ObjectAttributes->RootDirectory: 0000041C ShareAccess: 0x7 OpenOptions: 0x16416 FileName:
(ntoskrnl/io/iomgr/file.c:3185) (PID 5bc) IoCreateFile starting DA:0x3222274048 FA:0x0 SA:0x7 CO:0x16416 EaL:0x0 Name:
(ntoskrnl/io/iomgr/file.c:2672) (PID 5bc) IopCreateFile Starting - FileName:
(ntoskrnl/io/iomgr/file.c:2690) (PID 5bc) IopCreateFile AccessMode != KernelMode.
(ntoskrnl/io/iomgr/file.c:2998) (PID 5bc) IopCreateFile calling ObOpenObjectByName() without FILE_NO_INTERMEDIATE_BUFFERING CreateOption - FileName:
(ntoskrnl/ob/obhandle.c:2557) (PID 5bc) ObOpenObjectByName starting, DesiredAccess: 0x3222274048 ObjectAttributes->RootDirectory: 0000041C FileName: <=== Object lookup via handle only
(ntoskrnl/ob/obhandle.c:2595) (PID 5bc) ObOpenObjectByName received no PassedAccessState, preparing builtin.
(ntoskrnl/ob/obhandle.c:2631) PID 5bc) ObOpenObjectByName calling ObpLookupObjectName().
(ntoskrnl/ob/obname.c:490) (PID 5bc) ObLookupObjectName ObpLookupObjectName - Finding Object: (null). Expecting: 00000000
(ntoskrnl/ob/obname.c:564) (PID 5bc) ObpLookupObjectName starting parse while loop.
(ntoskrnl/ob/obname.c:572) (PID 5bc) ObpLookupObjectName calling Parse with RootDirectory: B19CFA38 RemainingName: (null) <=== Parse gets incorrect Object, first NtOpenFile created one
(ntoskrnl/io/iomgr/file.c:1413) (PID 5bc) IopParseFile Associated driver: B234E4E8 (\FileSystem\fastfat) <=== gets correct associated driver
(ntoskrnl/io/iomgr/file.c:901) (PID 5bc) IopParseDevice FileObject->Flags DOESN't include FO_NO_INTERMEDIATE_BUFFERING before OpenPacket call ParseObject: B22E4018 RemainingName: (null)
(ntoskrnl/ob/obname.c:609) (PID 5bc) ObpLookupObjectName completed via ParseRoutine, returning status: 0x0 ObpLookupObjectName - Found Object: B19CED78. Expected: 00000000
== appearances are that this is where the problem comes from, the object manager did not return our file object
(ntoskrnl/ob/obhandle.c:2690) (PID 5bc) ObOpenObjectByName creating handle, calling ObpCreateHandle(). <=== from this point forward the trace mirrors the original NtOpenFile
(ntoskrnl/ob/obhandle.c:1525) (PID 5bc) ObpCreateHandle: ObpCreateHandle - Creating handle for: B19CED78. Reason: 0. HC PC 0 2 <=== new handle created
(ntoskrnl/ob/obhandle.c:2737) (PID 5bc) ObOpenObjectByName completed, returning status - 0x0 Object: B19CED78 Handle: F51D3C40 <=== Object manager returns object and handle
(ntoskrnl/io/iomgr/file.c:3010) (PID 5bc) IopCreateFile received handle from ObOpenObjectByName(), handle: 00000418 - FileName:
(ntoskrnl/io/iomgr/file.c:3090) (PID 5bc) IopCreateFile saving new Handle - FileObject: B19CED78 FileHandle 00000418 FileName: <=== got back different Object and handle (418 not 41C)
(ntoskrnl/io/iomgr/file.c:3108) (PID 5bc) IopCreateFile succesfully generated new FileObject/Handle - Dereferencing OpenPacket->FileObject: B19CED78 <=== prior open got B19CFA38
(ntoskrnl/io/iomgr/file.c:3118) (PID 5bc) IopCreateFile Completed successfully - returning status: 0x0 handle: 00000418 FileName:
(ntoskrnl/io/iomgr/file.c:3203) (PID 5bc) IoCreateFile completed status - 0X0 DA:0x3222274048 FA:0x0 SA:0x7 CO:0x16416 EaL:0x0 Name:
(ntoskrnl/io/iomgr/file.c:4203) (PID 5bc) NtOpenFile completed returning Status - 0x0 FileHandle: 0012ED04 Name: <=== NtOpenFile returns success because Object Manager did same
===== The stage is set for our failure, so far ROS thinks everything is correct and kosher but it does not look like its correct

===== The test application continues performing the same sequence because it has received success status from all of the API calls it made
===== we may see the first sign of problems is filesize from test application which is different on Win 10 and ReactOS
===== got back 426770432 on ROS, but the correct filesize (696) is returned from both handles on Win 10
(ntoskrnl/io/iomgr/iofunc.c:2492) (PID 5bc) NtQueryInformationFile starting FileHandle: 00000418
(ntoskrnl/io/iomgr/iofunc.c:2581) (PID 5bc) NtQueryInformationFile Associated driver: B234E4E8 (\FileSystem\fastfat)
(ntoskrnl/io/iomgr/iofunc.c:2688) (PID 5bc) NtQueryInformationFile completed in FastIO FileHandle: 00000418 status: 0x0

===== This NtReadFile call is the first actual API call by test application that fails on ROS, fails because the length of four is not sector aligned =====
(ntoskrnl/io/iomgr/iofunc.c:2945) (PID 5bc) NtReadFile starting FileHandle: 00000418 Length: 4 Offset: NULL
(ntoskrnl/io/iomgr/iofunc.c:2970) (PID 5bc) NtReadFile Associated driver: B234E4E8 (\FileSystem\fastfat)
(ntoskrnl/io/iomgr/iofunc.c:3214) (PID 5bc) NtReadFile preparing stack adn IRP: B19CD9A8 FileHandle: 00000418 Length: 4
(ntoskrnl/io/iomgr/iofunc.c:140) (PID 5bc) IopPerformSynchronousRequest starting IRP: B19CD9A8. DO: B22E4018. FO: B19CED78
(ntoskrnl/io/iomgr/iofunc.c:141) (PID 5bc) IopPerformSynchronousRequest Associated driver: B234E4E8 (\FileSystem\fastfat)
(drivers/filesystems/fastfat/rw.c:740) (PID 5bc) VfatRead IrpContext B1A516D8 IRP: B19CD9A8 DO: B22E4018 FO: B19CED78 PagingIo: 0 NoCache: 0
(drivers/filesystems/fastfat/rw.c:758) (PID 5bc) VfatRead IrpContext: B1A516D8 IsVolume set True from Fcb->Flags.
(drivers/filesystems/fastfat/rw.c:790) (PID 5bc) VFatRead IrpContext: B1A516D8 FCB->PathNameU: <\$$Volume$$>, Offset: 0, Length 4
(drivers/filesystems/fastfat/rw.c:803) (PID 5bc) VfatRead processing Cached IrpContext: B1A516D8 IRP: B19CD9A8 BytesPerSector: 512 Length: 4 Offset: 0 <\$$Volume$$>
(drivers/filesystems/fastfat/rw.c:826) (PID 5bc) VFatRead IrpContext: B1A516D8 Offset: 0 Length: 4
(drivers/filesystems/fastfat/rw.c:828) (PID 5bc) VFatRead IrpContext: B1A516D8 non cached read is not sector aligned, set status STATUS_INVALID_PARAMETER, jump to exit.
(drivers/filesystems/fastfat/rw.c:913) (PID 5bc) VFatRead IrpContext: B1A516D8 IRP: B19CD9A8 completed at end of routine, returning STATUS_INVALID_PARAMETER.
(ntoskrnl/io/iomgr/iofunc.c:198) (PID 5bc) IopPerformSynchronousRequest completed IRP: B19CD9A8. DO: B22E4018. FO: B19CED78 status - 0xc000000d
(ntoskrnl/io/iomgr/iofunc.c:3331) (PID 5bc) NtReadFile completed, IRP: B19CD9A8 returning Status - 0xc000000d (STATUS_INVALID_PARAMETER).
===== we ignore the failure and continue, the SAMBA library doesn't check the status of this operation and continues as well ====

The results of the next read also fail and the SAMBA library does validate that one, so the application crashes and exits at that point. the next read is identical to the above call to NtReadFile except that it also adds a ByteOffset of 44. If this status had been validated the application would have crashed here instead..

bugdude
Post Reply

Who is online

Users browsing this forum: No registered users and 5 guests