------------------------------------------------------------------------------
NVIDIA CUDA Profiler Tools Interface (CUPTI)
Release Notes
CUDA Toolkit 4.2
------------------------------------------------------------------------------

FILES IN THE RELEASE
--------------------
* <cupti_dir>/include  : Contains the CUPTI header files

* <cupti_dir>/lib*     : Contains the CUPTI library

* <cupti_dir>/sample   : Contains samples showing use of the CUPTI APIs

* <cupti_dir>/doc      : Contains the CUPTI release notes and User's Guide


SUPPORTED DISTRIBUTIONS
-----------------------
CUPTI is supported on all platforms for which the CUDA Toolkit is supported.


SYSTEM REQUIREMENTS
-------------------
. CUDA-enabled GPU
  See http://www.nvidia.com/object/cuda_learn_products.html

. NVIDIA Display Driver

. NVIDIA CUDA Toolkit


INSTALLATION AND SETUP
----------------------
1) Install the NVIDIA display driver

2) Install the NVIDIA CUDA Toolkit

This installs CUPTI into <cuda_dir>/extras/CUPTI, where <cuda_dir> is
the directory specified during the Toolkit installation.


COMPILING AND RUNNING CUPTI SAMPLES
-----------------------------------
On Windows, compiling and running the CUPTI samples using the included
Makefiles requires the Cygwin environment.

To compile:
 > cd <cupti_dir>/sample/<sample>
 > make

To run the sample:
 > make run


INCOMPATIBLE CHANGES FROM CUPTI 4.0
-----------------------------------
A number of backward-incompatible API changes have been made since
the 4.0 release (there are no backward-incompatible API changes from
4.1 to 4.2). These changes require minor source modifications to
existing code compiled against CUPTI 4.0. In addition, some behavior
that was previously incorrect or undefined is now prevented by
improved error checking. Your code may need to be modified to handle
these new error cases.

- Multiple CUPTI subscribers are not allowed. In 4.0, cuptiSubscribe()
  could be used to enable multiple subscriber callback functions to be
  active at the same time. When multiple callback functions were
  subscribed, invocation of those callbacks did not respect the domain
  registration for those callback functions. Starting in 4.1,
  cuptiSubscribe() returns CUPTI_ERROR_MAX_LIMIT_REACHED if there is
  already an active subscriber.

- The CUpti_EventID values for Tesla devices were changed in 4.1 to
  make all CUpti_EventID values unique across all devices. Going
  forward CUpti_EventID values will be added for new devices and
  events, but existing values will not be changed. If your application
  has stored CUpti_EventID values collected using CUPTI version 4.0
  (for example, as part of the data collected for a profiling
  session), those CUpti_EventIDs must be translated to the new ID
  values before being used in 4.2 APIs.

- Starting in 4.1, CUPTI_EVENT_DOMAIN_MAX_EVENTS has been removed from
  the CUpti_EventDomainAttribute enumeration. The number of events in
  an event domain can be retrieved with
  cuptiEventDomainGetNumEvents().

- Starting in 4.1, cuptiDeviceGetAttribute(),
  cuptiEventGroupGetAttribute() and cuptiEventGroupSetAttribute() take
  a size parameter, and the 'value' parameter has type 'void *'.

- Starting in 4.1, cuptiEventDomainGetAttribute() no longer takes a
  CUdevice parameter. This function is now used to get event domain
  attributes that are device independent. A new function,
  cuptiDeviceGetEventDomainAttribute(), has been added to get event
  domain attributes that are device dependent.

- Starting in 4.1, cuptiEventDomainGetNumEvents(),
  cuptiEventDomainEnumEvents() and cuptiEventGetAttribute() no longer
  take a CUdevice parameter.

- Starting in 4.1, the contextUid field of the CUpti_CallbackData
  structure has been changed from type uint64_t to type uint32_t.
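
The CUpti_EventID translation described in the list above can be
sketched as a small lookup table. This is a minimal illustration and
not part of CUPTI: the names event_id_t, event_id_map_entry,
k_event_id_map and translate_event_id() are hypothetical, the ID pairs
are placeholders (these notes do not list the actual 4.0 and 4.1
values), and uint32_t is assumed as a stand-in for CUpti_EventID so
that the sketch compiles without the CUPTI headers.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: stand-in for CUpti_EventID, taken here to be a 32-bit
 * unsigned integer so the sketch builds without the CUPTI headers. */
typedef uint32_t event_id_t;

/* One row of the translation table: an ID as stored by a tool built
 * against CUPTI 4.0, and the ID of the same event in 4.1 and later. */
typedef struct {
    event_id_t old_id;
    event_id_t new_id;
} event_id_map_entry;

/* PLACEHOLDER values only. The real pairs are not given in these
 * release notes and must be filled in by the application. */
static const event_id_map_entry k_event_id_map[] = {
    { 1u, 1001u },
    { 2u, 1002u },
    { 3u, 1003u },
};

/* Translate a stored 4.0 event ID.
 * Returns 0 and writes *new_id on success.
 * Returns -1 and leaves *new_id untouched if old_id is not in the table. */
static int translate_event_id(event_id_t old_id, event_id_t *new_id)
{
    size_t n = sizeof(k_event_id_map) / sizeof(k_event_id_map[0]);
    size_t i;

    for (i = 0; i < n; ++i) {
        if (k_event_id_map[i].old_id == old_id) {
            *new_id = k_event_id_map[i].new_id;
            return 0;
        }
    }
    return -1;
}
```

A tool that reloads a 4.0 profiling session would pass each stored ID
through translate_event_id() and treat a -1 return as an event it
cannot map, rather than handing the old value to a 4.2 API.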


KNOWN ISSUES
------------

- CUPTI activity record collection must be initialized before any CUDA
  function is invoked. If not, activity collection may be incomplete
  or entirely disabled. Make sure that some CUPTI activity API (such
  as cuptiActivityEnable()) is called before the first CUDA driver or
  runtime function.

- Profiling an application with CUPTI can introduce host thread
  blocking that does not occur when the application is run
  normally. When CUPTI is active, a host thread that executes a
  synchronization function (cudaDeviceSynchronize(),
  cudaStreamSynchronize(), etc.) will block all other host threads
  from executing any CUDA driver or runtime function, until the thread
  returns from the synchronization function. This host thread blocking
  will be fixed in a future CUPTI release.
