It's the last week of the Google Summer of Code, but not of my contribution to Mesa. The purpose of this post is to present the project's progress. The list of the commits I made during the GSoC is available at .

Me:
For the people who don't know me: I'm Elie Tournier, and I finished my studies in IT engineering in June at Télécom Physique in Strasbourg, France. You can find me on GitHub or on LinkedIn.

The organisation:
During this GSoC, I worked on the Mesa project. Mesa is an open-source implementation of the OpenGL specification - a system for rendering interactive 3D graphics. A variety of device drivers allows Mesa to be used in many different environments, ranging from software emulation to complete hardware acceleration for modern GPUs. Mesa ties into several other open-source projects, such as the Direct Rendering Infrastructure and X.org, to provide OpenGL support to users of X on Linux, FreeBSD and other operating systems.

The project:
GPUs natively support single precision, but only OpenGL 4.0 class GPUs have hardware support for double precision. The goal of this project is to implement a library of double-precision operations in pure GLSL 1.30 for these GPUs, using bit-twiddling operations and integer math. There are many software libraries implementing double-precision floating point for devices that lack floating-point hardware, so my task was to translate one of these libraries (generally written in C) into pure GLSL 1.30. The functions I have to implement are tracked per-function in a status table; at the time of writing, one status reads "Compiler OK but trouble with the algorithm". A sketch of the bit-level style this work requires is shown below.
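To give a flavor of that bit-twiddling style, here is a minimal GLSL 1.30 sketch, under the assumption that a double is packed into two 32-bit unsigned words; the fp64 alias and the function names are illustrative, not the project's actual API.

```glsl
#version 130

// Assumed packing (illustrative, not Mesa's actual layout): a double is
// two 32-bit words, .x = high word (sign bit, 11 exponent bits, top 20
// mantissa bits) and .y = low word (remaining 32 mantissa bits).
#define fp64 uvec2

// Negation: flip the sign bit of the high word.
fp64 fp64_neg(fp64 a)
{
    return fp64(a.x ^ 0x80000000u, a.y);
}

// Absolute value: clear the sign bit of the high word.
fp64 fp64_abs(fp64 a)
{
    return fp64(a.x & 0x7FFFFFFFu, a.y);
}
```

Everything heavier - addition, multiplication, division - follows the same pattern: unpack sign, exponent and mantissa with shifts and masks, operate on them with integer math, then repack.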
The same double-float technique is also used on CUDA GPUs. You can get a rough estimate of the performance by counting the number of float operations required to implement each double-float operation; you would want to inspect the binary code with cuobjdump --dump-sass to get an accurate count. I am showing a double-float multiplication below that takes full advantage of FMA (fused multiply-add) support on the GPU. For double-float addition code, I would point you to a paper by Andrew Thall, as I do not have the time to code this up right now. From previous analysis I believe the addition code given in the paper is correct, and that it avoids common pitfalls in faster but less accurate implementations (which lose accuracy when the magnitude of the operands is within a factor of two).

If you are a registered CUDA developer, you can download double-double code from NVIDIA's developer website (login required); it is under a BSD license and can be reworked relatively quickly into double-float code. NVIDIA's double-double code supports addition, subtraction, division, square root, and reciprocal square root.

As you can see, the multiplication below requires 8 float instructions; the unary negation is absorbed into an FMA. The addition requires around 20 float instructions. However, the instruction sequences for double-float operations also require temporary variables, which increases register pressure and can decrease occupancy. A reasonably conservative estimate may therefore be that double-float arithmetic performs at 1/20 the throughput of native float arithmetic. You can easily measure this yourself in the context relevant to you.

Note that in various applications full double-float arithmetic may not be necessary. Instead, one can use float computation augmented by error-compensating techniques, one of the oldest of which is the Kahan summation (a sketch follows the multiplication code below). I gave a brief overview of the easily available literature on such methods in a recent posting on the NVIDIA developer forums. In the comments above, Robert Crovella also pointed to a GTC 2015 talk by Scott LeGrand, which I haven't had time to check out yet.

As for accuracy, double-float has a representational precision of 49 (24+24+1) bits, compared with IEEE-754 double, which provides 53 bits. However, double-float cannot maintain this precision for operands small in magnitude, as the tail portion can become a denormal or zero. When denormal support is turned on, the 49 bits of precision are guaranteed for operands of magnitude at or above 2^-101; denormal support is on for architectures >= sm_20, which means all architectures supported by the currently shipping version, CUDA 7.0.

As opposed to operations on IEEE-754 double data, double-float operations are not correctly rounded. For the double-float multiplication below, using 2 billion random test cases (with all source operands and results within the bounds stated above), I observed an upper bound of 1.42e-14 on the relative error. I do not have data for the double-float addition, but its error bound should be similar.
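Here is the multiplication code. The float2 layout (.y = head, .x = tail) and the function signature come from the original; the body shown is the standard FMA-based double-float product, so take it as a faithful sketch rather than a verbatim copy.

```cuda
typedef float2 dblfloat;  // .y = head, .x = tail

__host__ __device__ __forceinline__
dblfloat mul_dblfloat (dblfloat x, dblfloat y)
{
    dblfloat t, z;
    float sum;
    t.y = x.y * y.y;               // head product
    t.x = fmaf (x.y, y.y, -t.y);   // exact rounding error of head product
    t.x = fmaf (x.x, y.x, t.x);    // + tail * tail
    t.x = fmaf (x.y, y.x, t.x);    // + head * tail
    t.x = fmaf (x.x, y.y, t.x);    // + tail * head
    /* renormalize so the head carries the leading bits */
    sum = t.y + t.x;
    z.x = (t.y - sum) + t.x;
    z.y = sum;
    return z;
}
```

Counting gives the 8 float instructions mentioned above: one multiply, four FMAs, and three adds/subtracts for the renormalization; the negation of t.y is folded into the first fmaf, which is what recovers the rounding error of the head product exactly.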
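For reference, here is a minimal sketch of the Kahan summation mentioned above, in plain C; this is the textbook algorithm, not the code from the forum posting.

```c
// Kahan (compensated) summation: tracks the low-order bits lost in each
// addition in a separate correction term c.
float kahan_sum(const float *a, int n)
{
    float sum = 0.0f;
    float c   = 0.0f;               // running compensation
    for (int i = 0; i < n; i++) {
        float y = a[i] - c;         // apply the previous correction
        float t = sum + y;          // low-order bits of y may be lost here
        c = (t - sum) - y;          // recover what was lost (negated)
        sum = t;
    }
    return sum;
}
```

Note that aggressive floating-point optimizations (e.g. -ffast-math, or FMA contraction on the GPU) can rewrite the correction step into nothing, so code like this needs to be compiled with care.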