GENERAL TECHNICAL ASSESSMENT OF THE PROPOSED BTeV PIXEL-BASED TRIGGER SYSTEM

* * * * * * * *

June 12, 2000
R. Tschirhart, P. Wilson
Fermilab

We have been charged by the Directorate to make a general assessment of the technical feasibility of the pixel-based detached vertex trigger scheme presented in the BTeV proposal. This assessment is intended as support material for Fermilab PAC deliberations on the BTeV proposal. Materials for this assessment included the BTeV proposal text, as well as a total of about seven hours of meetings with BTeV proponents in which approximately twenty technical discussion points were addressed in detail. In addition, we have had some discussion with Beams Division staff regarding the projected beam conditions at the C0 interaction region. We appreciate the patience of BTeV proponents and the forthcoming nature of our discussions. Additional information relevant to this assessment was collected during the cost review of the BTeV WBS.

The BTeV pixel-based detached vertex trigger concept is an ambitious and powerful scheme that proposes to simultaneously exploit the pattern recognition potential of pixel detectors and the ever increasing muscle of commercially available processors, memory, firmware and software. It is clear that success requires high performance on all these fronts, where "success" means the following:

1) Reject 99.97% of the non-B interactions in real time at a mean luminosity of 2*10**32/cm**2-sec.

2) Maintain a trigger acceptance of 50% or more over a broad range of B-decays at a mean luminosity of 2*10**32/cm**2-sec. (Trigger acceptance here is normalized to the offline B-reconstruction efficiency.)

3) Provide an online trigger system that is highly "self-aware", that is, one in which the integrity and content of the data are continuously monitored.
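As a rough illustration of what the first requirement implies, the rejected fraction can be converted into an overall "one in N survives" factor and compared with the ~x100 that the proposal assigns to Level-1 (the figure quoted in the trigger description that follows). This is a minimal sketch that assumes the rejection factors of successive trigger levels simply multiply; the function and variable names are ours, not BTeV's.

```python
def rejection_factor(rejected_fraction):
    """Convert 'reject this fraction' into a 'one in N survives' factor."""
    return 1.0 / (1.0 - rejected_fraction)

# Requirement 1: reject 99.97% of non-B interactions.
total_factor = rejection_factor(0.9997)

# "~x100" is the share quoted for Level-1; treating the levels as
# multiplicative is an assumption made for this illustration only.
level1_factor = 100.0
remaining_factor = total_factor / level1_factor

print(f"total rejection factor    ~ {total_factor:.0f}")
print(f"left for Level-2/Level-3  ~ {remaining_factor:.0f}")
```

Under that assumption, the higher trigger levels together must supply a further factor of roughly 33.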
BTeV proposes to achieve the above performance with a 3-level trigger system where the Level-1 trigger operates at the 108-bunch crossing rate (132ns spacing) and provides the bulk (~x100) of the rejection. The technical risk of the Level-1 trigger will be assessed below. The Level-2 and Level-3 triggers are implemented in commercial processor farms where essentially the offline reconstruction code is running as a filter on the data. Level-2 is distinguished from Level-3 by how much of the event data is used to process the event. Level-2 is proposed to process the event based on a refined analysis of the pixel data and some additional tracking data that is dynamically acquired event by event. Level-3 uses data from all detector elements to process the event.

At this point it is not clear how much of the event data is used by Level-2 for a trigger decision. If more than about 1/3 of the event data must be available to Level-2 on average, then it might make more sense to remove Level-2 from the architecture and enlarge the Level-3 system, where the full event is processed. Such a move has the significant advantage of reducing the complexity of the overall system. The BTeV proponents are weighing these Level-2/Level-3 architectural issues, and we urge them to assign a substantial weight to reducing the complexity of the system at the cost of increasing the size of the Level-3 farm.

Although the Level-2/Level-3 filters present a significant challenge, the recent experience with large scale farms suggests that a Level-2/Level-3 system of the size that BTeV proposes can probably be achieved with about the resources described in the proposal and WBS documents. BTeV has assigned a relatively large cost contingency (100%) for the processors that make up the Level-3 system, which results from the current uncertainty about how much processing is required at Level-3 to reduce the final stored data volume to a manageable level.
The understanding of the Level-3 processing needs will no doubt improve as the trigger/filter algorithms are better understood and the date of processor purchase gets closer.

The success of the Level-2/Level-3 system, and ultimately that of the experiment, hinges on the performance of the pixel-based Level-1 trigger. It is clear that in principle a pixel detector of the type that BTeV is proposing can dramatically ease the pattern recognition burden online and offline. The experience with pixel prototypes in the BTeV test-beam has been quite positive, and the studies of the pixel resolution function presented in the proposal are particularly impressive. Whether these promising results can eventually evolve into the high Level-1 trigger performance required was the primary focus of our discussions with BTeV proponents. A summary of the key issues, which are not restricted to Level-1 concerns, follows. As a mechanism to order our assessment of technical risk, we will summarize the issues below with a risk index of 1-10, where 10 is normalized to what we regard as the greatest technical risk (grading on a curve).

1) Physics inputs and accelerator performance inputs.

The simulation of collision activity in the pixel detector and surrounding detector system is based on a PYTHIA generator followed by full GEANT3 (FLUKA) treatment. In response to our concerns as to how well the particle fluence is understood in the forward direction, the BTeV proponents produced published CDF and UA5 data that are reasonably well described by PYTHIA in the relevant forward regions.

* * Technical Risk: 3.

PYTHIA simulations predict that about 1.6% of the forward cross section will contain charm. At first blush this might appear to be a mechanism to swamp a tracking trigger that requires x100 rejection at the outset. In fact, given that the central B cross section exceeds expectations by about a factor of x2, this "charm pollution" could be considerably higher.
The efficiency for charm is quite a bit lower, however, and if necessary an effective mass cut can be imposed to ease the burden on the higher levels of the trigger. There appears to be adequate bandwidth headroom at Level-1 to deal with even large variations in the charm fraction.

* * Technical Risk: 1.

Regarding the required accelerator performance, we considered the impact of bunch-to-bunch intensity variations, the effect of the attendant bunch RF pulse, and the stability and size of the interaction volume. Of these concerns, the effect of the RF pulse on the analog performance of the pixel device and the beam stability were the most significant.

Regarding the effect of the bunch RF pulse, the BTeV proponents have done some nice work with test structures, described in the proposal, that demonstrates that the digital circuitry of the pixel readout will likely be able to tolerate the RF pulse. Similar work remains to be done on the very sensitive analog components of the readout. BTeV proponents presented plausibility arguments as to why this will not be a problem, and referred to the successful noise performance of the HERA-B silicon system. These arguments and experiences notwithstanding, we regard this point as a risk concern.

* * Technical Risk: 2.

Regarding the stability and size of the interaction volume, it became clear during our discussions that the current online vertexing algorithm requires that the transverse (x,y) size of the beam volume remain well within a 250 micron radius. This requirement stems from the first stage of the primary vertex reconstruction, which only considers tracks that pass within 250 microns of the beam transverse centroid, assumed to be at (0,0). BTeV chose this parameter based on assurances from the Beams Division that achieving a spot size of 50 microns (RMS) with +/-100 micron control of the centroid would be straightforward. This algorithm may in fact evolve in the future to be less sensitive to the beam size and centroid.
This requirement on the beam centroid also implies a requirement on the angle of the beam relative to the pixel detector at the level of 250 microns/1 m, or 250 micro-radians. The BTeV beam position requirements are less stringent than those for the CDF Silicon Vertex Trigger. In the CDF trigger, the primary vertex is not reconstructed and the trigger cuts on the impact parameter relative to an assumed beam position. Based on CDF measurements from Run I, the beam position is stable within a store to about 50 microns in the transverse directions and 150 micro-radians in angle. Over a period of months the variations observed were much larger: several millimeters in the transverse direction and 500 micro-radians in angle. To achieve the required stability of the beam in Run II, CDF will be installing an active feedback system to the Beams IR control. Although the requirements for BTeV are less stringent, the prime trigger for BTeV is the impact parameter at Level-1, so the need to achieve this stability is a high priority.

* * Technical Risk: 2, ASSUMING the Beams Division will be able to control the beam centroid and volume to the required level.

2) Level-1 trigger architecture and algorithms.

The Level-1 trigger algorithm is subdivided into a 5-step sequence: finding clusters of pixel hits, matching pixel clusters in sets of 3 stations to track segment patterns, linking segments into tracks in a track fitter, finding primary vertices, and finally identifying high impact parameter tracks for triggering. The first two stages of the algorithm are to be executed in a parallel fashion with custom logic using Field Programmable Gate Arrays (FPGAs). The latter stages are to be executed with custom electronics using a large number of Digital Signal Processors (DSPs). Since the raw data entering the first stages of Level-1 are not time ordered, a fast switch is required to sort the data between the segment finding and track fitting stages.
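The sorting role of the switch described above can be pictured in software as regrouping out-of-order records by bunch-crossing number, so that everything belonging to one crossing reaches the same downstream processor. The sketch below is only a conceptual illustration: the record layout (a crossing number paired with a payload) and the function name are assumptions of ours, and the real switch is hardware, not Python.

```python
from collections import defaultdict

def group_by_crossing(records):
    """Collect (crossing_number, payload) records into per-crossing lists.

    Records may arrive in any order; the result is keyed by crossing
    number in ascending order, with payloads kept in arrival order.
    """
    buckets = defaultdict(list)
    for crossing, payload in records:
        buckets[crossing].append(payload)
    return dict(sorted(buckets.items()))

# Hypothetical segment records arriving out of time order.
arrivals = [(3, "seg-a"), (1, "seg-b"), (3, "seg-c"), (2, "seg-d")]
grouped = group_by_crossing(arrivals)
for crossing, segments in grouped.items():
    print(crossing, segments)
```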
Clustering: The clustering algorithm is simple and should be easily implemented in an FPGA as planned. Contiguous readout pixels are clustered together and a centroid is assigned. In discussions with the simulation group and the hardware engineers, it became apparent that the algorithm simulated to date is not exactly the one that can be practically implemented in hardware. The simulation does not take account of any physical boundaries between columns in readout chips or between readout chips. The effect of this will be to find extra clusters in the real hardware. The number of extra clusters will be different in the x and y planes because of the effect of the dipole field, but is not expected to be more than about 10% in either case. The simulation should be updated to include the real boundaries to ensure there are no unforeseen problems. The clustering discussion led to a discussion of whether to cluster across column boundaries within the FPIX2 chip. This would result in a significant increase in the complexity of the system, since the data into a given cluster would then no longer be time ordered.

* * Technical Risk (Clustering Algorithm): 2, assuming no attempt is made to cluster across column boundaries.

Segment Finding: The bulk of the recent development work on the trigger algorithms has been on the segment finding and track fitting. The segment finder provides a good reduction in data rate for input to the track linking/fitting stage while retaining high efficiency for real tracks. The number of found triplet segments is about a factor of x5 lower than the number of input pixel clusters. The algorithm is explicitly designed to be insensitive to the beam position. Large FPGAs that are available today are well suited to the segment finding algorithm. The large I/O can be handled with high pin density packages. Simple calculations doing linear projections can be carried out quickly.
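To give a feel for the kind of linear projection involved, the toy example below extrapolates a straight line from cluster positions in two stations to a third station and keeps the combinations that land on a cluster there. This is not the BTeV segment finder: the one-dimensional geometry, the station positions, the tolerance, and all names are assumptions made for illustration. Note that nothing in the check refers to the beam position.

```python
def project(z1, x1, z2, x2, z3):
    """Extrapolate the straight line through (z1, x1) and (z2, x2) to z3."""
    slope = (x2 - x1) / (z2 - z1)
    return x2 + slope * (z3 - z2)

def find_triplets(hits1, hits2, hits3, z_positions, tolerance):
    """Return (x1, x2, x3) combinations consistent with a straight line.

    hits1..hits3 hold cluster positions in three consecutive stations
    located at z_positions; a triplet is accepted when the projected
    position at the third station is within `tolerance` of a cluster.
    """
    z1, z2, z3 = z_positions
    triplets = []
    for x1 in hits1:
        for x2 in hits2:
            predicted = project(z1, x1, z2, x2, z3)
            for x3 in hits3:
                if abs(x3 - predicted) <= tolerance:
                    triplets.append((x1, x2, x3))
    return triplets

# Hypothetical cluster positions (arbitrary units) in three stations.
found = find_triplets(
    hits1=[0.0, 1.0],
    hits2=[0.1, 1.5],
    hits3=[0.2, 5.0],
    z_positions=(0.0, 1.0, 2.0),
    tolerance=0.01,
)
print(found)
```

In this toy input only one of the four station-1/station-2 pairings projects onto a cluster in the third station, which is the sense in which segment finding reduces the data volume passed downstream.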
There are also sufficient resources for buffering data on chip to guard against rate fluctuations.

Timing of the segment finding operation is driven by the number of pixel clusters that must be processed per quadrant. An estimate of this timing has been made assuming an average of 2 interactions per crossing and cluster occupancies from the GEANT simulation. This estimate gives an average of one event processed every 70ns, which is within the absolute requirement of 132ns per event. Although the margin appears generous, the consequences of exceeding 132ns are very dear. If the average time goes to 140ns/event, due to higher than expected occupancies for example, the front-end will very quickly back up the Level-1 input buffers. The trigger is presented as having sufficient buffering to handle the crossing rate as input, with no throttle on the input stream to guard against saturation. Such a throttle is important insurance that would allow the experiment to operate (at reduced efficiency) while working on fixes for a rate problem.

* * Technical Risk (Segment Finding algorithm): 5

Track Fitting and Vertexing: The Track Fitting and Vertexing are to be performed in two farms of DSPs. Algorithms sufficient for these tasks have been developed and extensively simulated. The primary hurdle to demonstrating the feasibility of this scheme is to determine the processing requirements of the algorithms. The algorithms were coded in C and compiled for a currently available Texas Instruments DSP. The number of DSP clock cycles required by the compiled C code (which scales the number of DSPs required) was considered to be much too large. As a solution, the assembly language output of the C compiler was optimized by hand. The optimized code was estimated to require about a factor of x50 fewer instructions/event than the non-optimized code. The size of the proposed DSP processor plant includes this presumed factor of x50 speed improvement.
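Both the segment finder and the DSP farms must, on average, dispose of one event per 132ns crossing. The throughput arithmetic behind the timing concern and the x50 factor above can be sketched as follows. The 70ns, 140ns, and 132ns figures are those quoted in the text; the per-event processing time used for the farm sizing is a purely hypothetical placeholder, and the function names are ours.

```python
import math

CROSSING_NS = 132.0  # bunch-crossing interval quoted in the text

def backlog_growth_per_second(mean_ns_per_event, crossing_ns=CROSSING_NS):
    """Events per second accumulating in the input buffer.

    Returns 0.0 when the stage keeps up with the crossing rate on average.
    """
    arrival_rate = 1e9 / crossing_ns
    service_rate = 1e9 / mean_ns_per_event
    return max(0.0, arrival_rate - service_rate)

def processors_needed(ns_per_event_on_one_processor,
                      crossing_ns=CROSSING_NS, overhead=0.0):
    """Smallest farm that averages one event per crossing.

    `overhead` is a fractional allowance (e.g. 0.05 for 5%) added to the
    per-event processing time.
    """
    loaded_ns = ns_per_event_on_one_processor * (1.0 + overhead)
    return math.ceil(loaded_ns / crossing_ns)

# Segment finder: the estimated 70ns average keeps up; 140ns does not.
growth_at_70 = backlog_growth_per_second(70.0)
growth_at_140 = backlog_growth_per_second(140.0)
print(growth_at_70, round(growth_at_140))

# DSP farm: HYPOTHETICAL per-event time, chosen only to show the scaling.
hypothetical_ns = 500_000.0
optimized = processors_needed(hypothetical_ns)
unoptimized = processors_needed(50 * hypothetical_ns)
print(optimized, unoptimized)
```

At 140ns/event the backlog grows by several hundred thousand events per second, so any finite buffer fills within a fraction of a second. The farm size scales directly with the per-event time, which is why the unverified x50 optimization factor carries so much weight in the estimate.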
There are several concerns about this code optimization. During discussions it became apparent that the optimized assembly language code has not been run on a DSP or DSP simulator to demonstrate that it works. Further, the person who carried out this specialized optimization is no longer with the BTeV project. We note that this custom assembly language code would have to be maintained throughout the life of the experiment. In fact, it is likely that significant revision of the code would be required in the face of real data from the experiment. A final concern is the small margins built into the estimates of the DSP requirements. A relatively small margin of about 5% is included for "operating system" overhead. This seems like a very optimistic overhead assumption. There is no margin factor (above some generous assumptions on individual sections of code) included in determining the number of processors needed to carry out the calculations in 132ns. The need for an input throttle described above applies to the DSP processing as well.

* * Technical Risk (Track Fitting and Vertex Finding Algorithms): 9

3) Simulation studies of the Level-1 trigger.

The design of algorithms for the Level-1 trigger has been driven by extensive physics simulations using the BTeV GEANT package. These physics simulations are a proof-of-principle for the detached vertex trigger algorithms. The simulations are well developed and where possible have been benchmarked against real data (e.g. pixel hit rates from the test beam). The BTeV proposal provides limited information on underlying performance measures from the simulation. The BTeV proponents provided plots demonstrating track momentum resolution, track impact parameter resolution, and primary vertex resolution (in x, y and z coordinates). Further, they provided tables indicating the event composition of the Level-1 and Level-2 trigger outputs. These measures are consistent with the 100 to 1 rejection of the Level-1 trigger.
After the Level-1 trigger the event composition is approximately 4% b-quark, 10% c-quark, and 40% containing strange particles, with the remaining 45% consisting of tracks with "false impact parameter" resulting from various types of detector or reconstruction effects. An analysis of how daughter tracks from strange particles are rejected (essentially 100% of crossings contain strange daughters) was also provided. Two requirements provide much of the rejection for strange daughters at Level-1: Pt>0.5 GeV/c rejects ~5/6 of tracks, and ~6/7 are rejected by the upper limit on impact parameter (2mm). The majority of the events passing due to "false impact parameter" are the result of conversions, re-interactions, multiple scattering and delta rays or, more globally, interactions with detector material. This emphasizes the need to keep a close handle on the material budget of the pixel detector.

Concerns with possible confusion in primary vertex finding at high luminosity were addressed. The vertex position resolutions were consistent with the vertex separation distributions shown in the proposal. The study of the number of vertices found as a function of the average number of interactions per crossing (ranging from 1 to 6) shows an average of 0.6 vertices found/interaction. The efficiency is this low because these interactions include elastic and diffractive events, which are much less likely to generate a reconstructible primary. No increase (or decrease) is seen in the slope, indicating no significant problem with extra vertices found.

* * Technical Risk (Physics Simulations): 6

Another type of simulation, which is no less critical to the success of the trigger, has seen less attention. These are queuing simulations and timing studies to ensure that the processing and data pipelines are capable of handling the onslaught of data. Studies have been performed on a previous version of the clustering and segment finding algorithm.
The results were described as showing that the Level-1 timing was "...workable but not comfortable". Given the extensive changes in the Level-1 and Level-2 algorithms since this queuing study was done, it is critical that timing studies be repeated and expanded.

* * Technical Risk (Queuing simulations): 7

4) System Robustness and Integrity.

BTeV proponents have studied how the rejection of min-bias events and acceptance of B-decay events is affected by the addition of random noise hits (up to x40 the expected level) that are distributed flat and peaked at the beam. The rejection and acceptance show a remarkable robustness against these noise hits, which no doubt traces back to the pattern recognition power of a pixel system. In addition, BTeV proponents have studied how the rejection and acceptance vary as a function of mean luminosity, and have found that the Acceptance*Rejection product degrades by about (40+/-10)% (~33% rejection loss, ~7% acceptance loss) at twice the nominal mean of two interactions/crossing. Although this loss is somewhat uncomfortable, these studies indicate that the trigger algorithm is not sitting near a cliff of instability.

* * Technical Risk (Algorithm stability to noise/luminosity variations): 3.

The robustness and integrity of the hardware and software design of the trigger system have only begun to be addressed at a conceptual level by BTeV proponents. It is clear that BTeV proponents have focused their resources on developing pixel detector prototypes and generating a plausibility argument that a workable trigger algorithm is possible. Given the very complex nature of this system, where thousands of events are simultaneously and asynchronously cooking, issues of data integrity, robustness, and monitoring are critically important and have the capacity to cripple a design if not dealt with at the outset.
It is simply a fact of life that processors and processes die and get corrupted, sometimes in subtle ways. BTeV has allocated some resources for control and monitoring, but our assessment is that the current allocation of resources will be insufficient to supply the necessary level of "self-awareness" in the trigger system. In particular, the level of physicist involvement in trigger design and online monitoring is far too low, and needs to be increased to realize a physics-ready system.

* * Technical Risk (Error Detection/Correction, Monitoring): 10.

Summary of Assessment and Manpower Considerations.
----------------------------------------------------

It is clear that BTeV proponents have done some excellent work towards demonstrating that a pixel-based vertex trigger is plausible at Tevatron luminosities. The pattern recognition power of a pixel-based detector is quite impressive, which together with BTeV's reasonable estimates of particle fluence and rates suggests that fast and robust trigger algorithms should be achievable. However, the previously listed caveats must be addressed before this is convincingly demonstrated. For the project to succeed it is equally clear that there is an enormous amount of work remaining to realize an engineered system. One could argue that expecting fully engineered solutions is not realistic at the proposal stage, but it is important not to lose sight of the large amount of development work summarized above that will be necessary for eventual success.

The most critical element of eventual success is not addressed above, and that is the issue of physicist and engineering manpower. This effort MUST GROW on both fronts, and likely beyond the bounds of required manpower outlined in the WBS documentation. In particular, the current level of physicist involvement in design and close interaction with the engineering effort through all phases of the project must grow substantially.
This is not to make light of the very impressive physicist and engineering trigger efforts to date, which most recently have been dominated by Fermilab staff. A system of this complexity and innovation, however, would clearly benefit from other strong physicist and engineering group(s) from outside Fermilab playing central roles as well. Such group(s) would serve as a critical internal counterbalance during the design, construction, commissioning, and operational phases. Without an increased pool of design skills and experience to draw from and thermalize with, the project will remain at risk. The exciting challenge of designing and building a real-life pixel-based trigger system certainly has the potential to attract additional strong groups.

Please don't hesitate to contact us if you have any questions or comments regarding this assessment.

Best Regards,
R. Tschirhart, P. Wilson.