Fixing the TensorRT samples' BufferManager crash with engines that have dynamic dimensions
The common code shipped with the TensorRT samples is a well-written set of wrapper classes for the usual boilerplate of invoking a TensorRT engine; using them saves code and makes programs a bit more elegant. There is a problem, though: with an engine that has a dynamic batch_size or height/width dimension, the program crashes at invocation time:
```
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)
```
Tracing this abort shows that when a dimension is dynamic (e.g. batch_size = -1), the crash happens in this code in /usr/src/tensorrt/samples/common/buffers.h:
```cpp
class BufferManager
{
public:
    static const size_t kINVALID_SIZE_VALUE = ~size_t(0);

    //!
    //! \brief Create a BufferManager for handling buffer interactions with engine.
    //!
    BufferManager(std::shared_ptr<nvinfer1::ICudaEngine> engine, const int batchSize = 0,
        const nvinfer1::IExecutionContext* context = nullptr)
        : mEngine(engine)
        , mBatchSize(batchSize)
    {
        // Full Dims implies no batch size.
        assert(engine->hasImplicitBatchDimension() || mBatchSize == 0);
        // Create host and device buffers
        for (int i = 0; i < mEngine->getNbBindings(); i++)
        {
            auto dims = context ? context->getBindingDimensions(i) : mEngine->getBindingDimensions(i);
            size_t vol = context || !mBatchSize ? 1 : static_cast<size_t>(mBatchSize);
            nvinfer1::DataType type = mEngine->getBindingDataType(i);
            int vecDim = mEngine->getBindingVectorizedDim(i);
            if (-1 != vecDim) // i.e., 0 != lgScalarsPerVector
            {
                int scalarsPerVec = mEngine->getBindingComponentsPerElement(i);
                dims.d[vecDim] = divUp(dims.d[vecDim], scalarsPerVec);
                vol *= scalarsPerVec;
            }
            vol *= samplesCommon::volume(dims);
            std::unique_ptr<ManagedBuffer> manBuf{new ManagedBuffer()};
            manBuf->deviceBuffer = DeviceBuffer(vol, type);
            manBuf->hostBuffer = HostBuffer(vol, type);
            mDeviceBindings.emplace_back(manBuf->deviceBuffer.data());
            mManagedBuffers.emplace_back(std::move(manBuf));
        }
    }
```
The failing line is `manBuf->deviceBuffer = DeviceBuffer(vol, type);`. DeviceBuffer inherits from GenericBuffer, and it is actually GenericBuffer's constructor that throws std::bad_alloc, because allocFn() fails:
```cpp
GenericBuffer(size_t size, nvinfer1::DataType type)
    : mSize(size)
    , mCapacity(size)
    , mType(type)
{
    if (!allocFn(&mBuffer, this->nbBytes()))
    {
        throw std::bad_alloc();
    }
}
```
The GPU device and the host each have their own allocFn() and freeFn():
```cpp
class DeviceAllocator
{
public:
    bool operator()(void** ptr, size_t size) const
    {
        return cudaMalloc(ptr, size) == cudaSuccess;
    }
};

class DeviceFree
{
public:
    void operator()(void* ptr) const
    {
        cudaFree(ptr);
    }
};

class HostAllocator
{
public:
    bool operator()(void** ptr, size_t size) const
    {
        *ptr = malloc(size);
        return *ptr != nullptr;
    }
};

class HostFree
{
public:
    void operator()(void* ptr) const
    {
        free(ptr);
    }
};
```
Clearly, `cudaMalloc(ptr, size)` failed to allocate GPU memory. Why? Inspection showed that size was 18446744073709100032! That is the immediate cause, but why is size so absurdly large? Digging further, the root cause is that the engine's first dimension dims.d[0] is -1, so `samplesCommon::volume(dims)` in BufferManager's constructor evaluates to -451584. Since vol is unsigned, the line `vol *= samplesCommon::volume(dims);` converts that negative product to an enormous value (it wraps around modulo 2^64).
To stop a dynamic dimension of -1 from causing this error, a targeted modification works: force the dynamic dimension to a concrete value before computing the volume. A dynamic height/width can be handled the same way:

```cpp
if (dims.d[0] == -1)
    dims.d[0] = vol;  // vol is still 1 at this point, i.e. assume a concrete batch size of 1
vol *= samplesCommon::volume(dims);
```
With this change the dimension volume is computed correctly, and allocating the buffers that hold the data succeeds.
Of course, if inference needs to change the input's height/width dimensions dynamically, you may need to modify BufferManager's constructor, or create multiple BufferManager instances, one per concrete height/width, and use the matching instance for each input shape.
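Note also that the constructor's context parameter already offers a route for this without modifying buffers.h: fix the input shape on the execution context first, then let BufferManager read concrete dimensions from it. A hedged sketch, assuming a TensorRT version that provides `IExecutionContext::setBindingDimensions` and the unmodified samples BufferManager; binding index 0 and the 1x3xHxW shape are illustrative:

```cpp
#include "NvInfer.h"
#include "buffers.h"

void inferAtResolution(std::shared_ptr<nvinfer1::ICudaEngine> engine,
                       nvinfer1::IExecutionContext* context,
                       int height, int width)
{
    // Pin the dynamic input binding to a concrete shape before sizing buffers.
    context->setBindingDimensions(0, nvinfer1::Dims4{1, 3, height, width});

    // With a context supplied, BufferManager calls
    // context->getBindingDimensions(i), which is now fully concrete, so the
    // volume computation stays positive and cudaMalloc gets a sane size.
    samplesCommon::BufferManager buffers(engine, 0, context);

    // ... fill the host input buffer, then:
    // buffers.copyInputToDevice();
    // context->executeV2(buffers.getDeviceBindings().data());
    // buffers.copyOutputToHost();
}
```

One BufferManager per resolution (as suggested above) and this context-driven sizing can be combined: cache a BufferManager per concrete shape and reuse it across calls with the same input size.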