§ ¶Scheduling bottlenecks in 3D filter acceleration
As I noted last time, there are reasons why VirtualDub's 3D filter acceleration has problems if a display mode switch is triggered. There are, however, also some performance bottlenecks in the implementation in 1.9.5 that I'm working on resolving. Here's an example:
This is a screenshot from VirtualDub's real-time profiler, showing CPU usage during a video analysis pass using a mix of CPU and GPU filters (warp sharp on GPU + rotate2 on CPU). The main things to notice are the long V-Filter section on the Processor and the idle times on the Filter 3D Accel thread. This is the time during which the video filter system runs. The basic problem here is that the video filter system is single-threaded and all calls into the accelerator are done as blocking calls, synchronizing the threads. The result is that the processing thread is blocked while the readback is occurring (the long operation with Poll and Readback blocks) and then the acceleration thread goes idle while the processing thread is busy doing other tasks. This limits concurrency between the CPU and the GPU.
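To make the serialization concrete, here is a minimal sketch of the 1.9.5-style blocking pattern described above. The types and calls are purely hypothetical stand-ins, not VirtualDub's actual interfaces; they only illustrate why the CPU and GPU work never overlaps:

```cpp
// Hypothetical stand-ins for the 1.9.5 behavior: every call into the
// accelerator blocks the processing thread, so CPU and GPU work serialize.
#include <cstdio>

struct AccelDevice {
    void RunFilter(int frame) { std::printf("GPU: warp sharp, frame %d\n", frame); }
    void Readback(int frame)  { std::printf("GPU: readback, frame %d\n", frame); }
};

void RunCpuFilters(int frame) { std::printf("CPU: rotate2, frame %d\n", frame); }

int main() {
    AccelDevice accel;
    for (int frame = 0; frame < 3; ++frame) {
        accel.RunFilter(frame);
        accel.Readback(frame);  // processing thread stalls here waiting on the GPU...
        RunCpuFilters(frame);   // ...and the accelerator sits idle during the CPU filter
    }
}
```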
In my current dev branch, the situation has changed a bit.
The first thing to notice... is the lack of color. That's because I'm currently redoing the profiling architecture to be lighter weight and to capture a performance log instead of just per-second snapshots. I didn't think it would make a lot of difference, but now it definitely seems harder to read without the color coding.
That aside, you can see that the accelerator thread (top thread) is much busier in this version. There are two reasons for this. The first is that in this build the filter system has the ability to "hand off" a particular invocation of a filter instance. For various reasons the filter system cannot currently be run multithreaded, but the filter instances can -- so what the filter system does is set up a frame, hand it off for asynchronous execution, and then later close the frame and collect the output once the filter is done. The second reason for the improvement is that the render pipeline can now queue more than one frame request in the filter system. It still isn't possible for a single filter instance to process multiple frames in parallel, since the filters have mutable state which generally prohibits this, but this does permit different filters to queue up behind each other, so that the accelerator can work on the warpsharp instance for one frame and then download another frame without intervention from the main processing thread. The result is a modest increase in frame rate for this chain, going from about 17 fps to 20 fps, for roughly a 20% improvement.
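A minimal sketch of the hand-off idea, assuming a single worker thread standing in for the accelerator; the class and method names here are hypothetical and are not the actual filter system API:

```cpp
// Hypothetical accelerator worker: the single-threaded filter system hands
// GPU work to this thread and continues, instead of blocking on each call.
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

class AccelWorker {
public:
    AccelWorker() : mThread([this] { Run(); }) {}

    ~AccelWorker() {
        { std::lock_guard<std::mutex> lk(mMutex); mExit = true; }
        mCond.notify_one();
        mThread.join();
    }

    // Called by the filter system: set up a frame, then hand the accelerated
    // portion off and return immediately.
    void HandOff(std::function<void()> work) {
        { std::lock_guard<std::mutex> lk(mMutex); mQueue.push(std::move(work)); }
        mCond.notify_one();
    }

private:
    void Run() {
        for (;;) {
            std::function<void()> work;
            {
                std::unique_lock<std::mutex> lk(mMutex);
                mCond.wait(lk, [this] { return mExit || !mQueue.empty(); });
                if (mQueue.empty())
                    return;                 // exit requested and queue drained
                work = std::move(mQueue.front());
                mQueue.pop();
            }
            work();                         // e.g. run a filter pass or download a frame
        }
    }

    std::mutex mMutex;
    std::condition_variable mCond;
    std::queue<std::function<void()>> mQueue;
    bool mExit = false;
    std::thread mThread;                    // declared last so it starts after the other members
};
```

A hand-off then amounts to something like worker.HandOff([=] { RunWarpSharpOnGpu(frame); }); (the callee being hypothetical as well): the processing thread returns immediately and can set up the next frame request while the accelerator thread drains its queue.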
(Read more....)§ ¶Why 3D accelerated rendering stops if the machine is locked
If you have video filters in the chain which are using 3D acceleration, VirtualDub aborts rendering as soon as you lock the computer.
There are three reasons for this.
The first is the old bane of Direct3D 9 programming, the "lost device" state. When you lock the workstation, Windows switches desktops to the secure desktop that holds the login dialog. This deactivates the desktop that holds all of your applications, and a side effect is that it also deactivates all Direct3D 9 contexts. All D3D9-based applications are then blocked from doing any rendering as API calls return the error code D3DERR_DEVICELOST. That forces VirtualDub to pause rendering.
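For reference, this is roughly the standard D3D9 recovery check that the lost-device model forces on applications; a rough sketch, assuming the device and present parameters already exist:

```cpp
#include <d3d9.h>

// Returns true if rendering can continue this frame.
bool CheckDevice(IDirect3DDevice9 *pDevice, D3DPRESENT_PARAMETERS& parms) {
    HRESULT hr = pDevice->TestCooperativeLevel();

    if (hr == D3DERR_DEVICELOST)
        return false;   // e.g. workstation locked -- nothing to do but wait and retry

    if (hr == D3DERR_DEVICENOTRESET) {
        // The device can be reclaimed, but all D3DPOOL_DEFAULT resources must
        // be released first and recreated after a successful Reset().
        if (FAILED(pDevice->Reset(&parms)))
            return false;
        return true;
    }

    return SUCCEEDED(hr);
}
```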
The second reason is a bit more annoying. When the "device lost" state is inflicted, it also causes all surfaces on the 3D device to be immediately dropped. This is a problem for VirtualDub because any video frames that are in progress or cached from accelerated filters exist only on the 3D device and are lost. They are kept only on the 3D device because it's really expensive to read them back into system memory. This means that as soon as the lost device state occurs, a number of frames in the video pipeline have been lost. Recovering from this situation means backing up the video pipeline and retrying the frames that were lost, which VirtualDub can't currently do. Therefore, the only course of action possible is to abort the render.
The third reason is the most subtle. Even if VirtualDub's renderer were modified to support backing up and re-rendering the lost frames, it would need to know which frames were lost. In the past, I've seen issues with Direct3D drivers and the runtime where the device lost state wasn't reported until after a readback into system memory (GetRenderTargetData) had already succeeded. The result is that a bad frame is read back and reported as successfully copied. In a retry-capable scenario, this would mean silent corruption of the video, which would be a very bad thing.
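The readback in question is a GetRenderTargetData() copy from a default-pool render target into a system-memory surface, roughly like the sketch below. The post-copy re-check shown is only a possible partial mitigation, not something the current code is described as doing:

```cpp
#include <d3d9.h>

// Sketch: copy a GPU frame (D3DPOOL_DEFAULT render target) into a matching
// D3DPOOL_SYSTEMMEM surface. Both surfaces are assumed to already exist with
// the same size and format.
HRESULT ReadbackFrame(IDirect3DDevice9 *pDevice,
                      IDirect3DSurface9 *pRenderTarget,
                      IDirect3DSurface9 *pSysMemSurface) {
    HRESULT hr = pDevice->GetRenderTargetData(pRenderTarget, pSysMemSurface);
    if (FAILED(hr))
        return hr;

    // Possible partial mitigation: re-check the device after the copy. With
    // the driver behavior described above, the copy itself can report success
    // even though the frame is already garbage, so this is not a guarantee.
    if (pDevice->TestCooperativeLevel() != D3D_OK)
        return D3DERR_DEVICELOST;

    return S_OK;
}
```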
Some or all of these issues could theoretically be bypassed by using a Direct3D 9Ex device object, which is available under Windows Vista or later with a WDDM display driver. Direct3D 9Ex bypasses the lost device emulation that is normally done for D3D9 applications and theoretically would avoid problems with lost devices. I haven't implemented this or tested if it works reliably, however.
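A sketch of what opting into 9Ex would look like, assuming Vista or later with a WDDM driver; this is not something VirtualDub currently does, and a shipping build would load Direct3DCreate9Ex dynamically since the entry point doesn't exist on XP:

```cpp
#include <d3d9.h>

// Create a windowed Direct3D 9Ex device. With 9Ex, default-pool resources
// survive desktop switches and the classic D3DERR_DEVICELOST state largely
// goes away (a device can still be removed or hung, which is reported
// separately).
bool CreateD3D9ExDevice(HWND hwnd, IDirect3D9Ex **ppD3D, IDirect3DDevice9Ex **ppDevice) {
    if (FAILED(Direct3DCreate9Ex(D3D_SDK_VERSION, ppD3D)))
        return false;

    D3DPRESENT_PARAMETERS parms = {};
    parms.Windowed = TRUE;
    parms.SwapEffect = D3DSWAPEFFECT_DISCARD;
    parms.BackBufferFormat = D3DFMT_UNKNOWN;
    parms.hDeviceWindow = hwnd;

    return SUCCEEDED((*ppD3D)->CreateDeviceEx(
        D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, hwnd,
        D3DCREATE_HARDWARE_VERTEXPROCESSING | D3DCREATE_FPU_PRESERVE,
        &parms, NULL, ppDevice));
}
```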
To sum it up, if you have enabled 3D filter acceleration in VirtualDub and have a render going... you should leave the machine alone until it's done.
(Read more....)§ ¶VirtualDub 1.9.5 released
VirtualDub 1.9.5 is now out. This release is also tagged "stable" and contains only bug fixes.
Previously, a commenter asked what changes were new in the 1.9.x series. Here's a list:
- Additional YCbCr format support. v210 and NV12 are now supported by the blitter engine and can be selected as input and output formats. The internally supported 4:4:4 planar "YV24" mode is now exposed as well.
- Display improvements: The Direct3D 9 display minidriver has been updated to render several more formats in hardware, including HDYC, v210, and Pal8. In addition, on devices supporting pixel shader 2.0 or above, it can dither higher precision data down to 8-bit, most noticeable with the 10-bit/channel v210 format.
- Multi-frame fetching in video filters: The video filter system has been rewritten to support a request/process model. The result is that video filters can now explicitly request multiple source frames for each output frame they produce. This eliminates the need for tricky internal buffering inside a video filter to emulate such support and makes it much easier to implement video filters that require a frame window, such as field shifters and deinterlacers (a hypothetical sketch of the model follows this list).
- New video filters: You can now switch field dominance with field delay, change frame rate with linear blending with interpolate, and merge fields into interlaced frames with interlace. The warp sharp filter is now internal, with added YCbCr support. IVTC is now a video filter instead of a special-cased pipeline stage, so it can be repositioned within the filter chain and its output can now be previewed on the timeline.
- Video filter updates: Levels, brightness/contrast, field swap, and convert format have improved YCbCr support. The deinterlace filter now supports more powerful ELA and Yadif modes.
- Video filter 3D acceleration: The video filter system now supports 3D hardware acceleration through Direct3D 9 for filters that support it, which can give speed improvements with a fast video card and a long chain of 3D accelerated filters. This is still in its infancy, but expect improvements in performance here in the future.
- Video filter API improvements: Additions to the API to aid video filter authors include support for aligned scanlines for SSE2 support, and the ability to tell the smart rendering system when a filter can be bypassed.
- Configurable keyboard shortcuts: Keyboard shortcuts for all menu commands in edit mode are now customizable, letting users set up more efficient workflows.
- Additional internal decoders: The internal MJPEG decoder can now be used in 64-bit builds, and the internal Huffyuv decoder is also useful when no 64-bit decoder for Huffyuv is available.
- Performance improvements: The audio render buffer size can now be tuned. Also, the rendering status dialog now shows you more pertinent stats on the buffer levels of the audio and video pipelines and the activity levels of the I/O and processing threads, giving better visibility into bottlenecks.
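To give a feel for the request/process model mentioned in the multi-frame fetching item above, here is a hypothetical sketch. These types and names are not the actual VirtualDub plugin API; they only illustrate a filter declaring a three-frame window per output frame:

```cpp
#include <vector>

struct Frame { /* pixel data, timestamps, ... */ };

// Hypothetical request/process style filter: it names the source frames it
// wants, and Process() runs only after the host has fetched all of them, so
// the filter needs no internal frame buffering of its own.
class TemporalBlendFilter {
public:
    std::vector<long> RequestSourceFrames(long outputFrame) const {
        return { outputFrame - 1, outputFrame, outputFrame + 1 };
    }

    Frame Process(const std::vector<Frame>& sources) {
        Frame out;
        // ...blend sources[0], sources[1], sources[2] into out...
        return out;
    }
};
```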
Change list for 1.9.5:
(Read more....)§ ¶Windows Installer... well, you know
I've never kept it a very good secret that I don't like Windows Installer. It's slow, it eats an unreasonable amount of disk space for caches and even more disk space during installation (>3x install footprint), it throws errors that are cryptic even to programmers, and it has lame limitations like needing to map the entire installer into contiguous address space on XP. My experience installing VS2005 SP1 was so bad that I now keep around a slipstreamed install just so I never have to run the horrid patcher again. I believe I've found a new reason to hate it:
If TARGETDIR itself isn’t redirected, it defaults to ROOTDRIVE which is the fixed drive with the most free space available, and “fixed drive” doesn’t necessarily mean it’s an internal drive. External drives these days are growing more common. Any descendants of TARGETDIR are also located relative to ROOTDRIVE then.
So if you have an external drive with a lot of free space connected when you installed Visual Studio, a number of components may have gotten installed there. Even if you do not see any files mysteriously appear after installing Visual Studio, components may still have gotten registered to the other drive. This can actually happen for any Windows Installer products.
If I understand this correctly, this means that if I'm on a system where the system drive is C:, the Program Files directory is on C:, and I tell the installer that I want Visual Studio installed on C:, the default behavior for the install system unless overridden by the install script is to install onto the external backup drive H: that I just happened to have plugged in. This might explain why I keep finding weird installation folders like Office10 in places such as my data drive and video capture drive that I explicitly do not want programs installed into!
Could someone at Microsoft pleeeease write a better install system....
(Read more....)§ ¶Flat buttons vs. 3D look
The current trend in user interfaces is to eliminate 3D borders and make everything look flat, especially buttons.
Why??
When I first started using UIs that had the 3D look, I loved it because it gave visual cues as to what could be interacted with and what was a container. Raised objects were things you could drag or twiddle. Impressed areas grouped things or could accept things. Flat objects were static indicators. Sure, there were people who overdid it and made their UI look like an obstacle course, but for the well designed UIs I thought it made a huge difference.
The trend now is to eliminate all of the borders and leave just the text or icon. The main argument is that it's visually cleaner, but to me, it makes it really hard to tell what you can and can't interact with. If you're lucky, the person who designed it at least used controls that show borders when you mouse over them. Otherwise, you have to click all over the window like an idiot, trying to figure out which of the icons do something and whether they do something different than the nearby widgets and text. It makes me feel like I'm playing an old LucasArts adventure, mousing everywhere trying to find some object in the dark cave that I can light to get to the two-headed squirrel.
(Read more....)