I do not think the problem is in the ARMv8 library. The only difference between them is the compiler; both are built from the same SDK source code.
Maybe you can run a small test with our demo first.
I will continue to investigate this small issue.
I am continuing my work with the Jetson Nano. For information, CUDA programming with Python works fine. To do it, I have to use the PyCUDA library. You write an ordinary Python program, but PyCUDA lets you embed C code in it, and that code is compiled with the nvcc compiler.
It works well.
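For anyone curious what that looks like, here is a minimal sketch, not my actual filter. The `invert` kernel and the `grid_size` and `invert_on_gpu` helper names are made up for illustration, and I assume a uint8 frame stored as a NumPy array. The CUDA C part is just a Python string that PyCUDA's `SourceModule` passes to nvcc when the program runs.

```python
# CUDA C kernel kept as a Python string; PyCUDA hands it to nvcc at run time.
# "invert" is a hypothetical example kernel (pixel negative), not a real filter.
KERNEL_SRC = r"""
__global__ void invert(unsigned char *img, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        img[i] = 255 - img[i];
}
"""


def grid_size(n, block_size=256):
    """Number of thread blocks needed to cover n elements (ceiling division)."""
    return (n + block_size - 1) // block_size


def invert_on_gpu(frame):
    """Return an inverted copy of a uint8 NumPy array, computed on the GPU."""
    # Imported here so the file can still be loaded on a machine without CUDA.
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    # Compile the C source with nvcc and look up the kernel by name.
    kernel = SourceModule(KERNEL_SRC).get_function("invert")

    # Work on a contiguous uint8 copy so the caller's array is left untouched.
    out = np.ascontiguousarray(frame, dtype=np.uint8).copy()
    block_size = 256

    # drv.InOut copies the array to the GPU before the launch and back after it.
    kernel(drv.InOut(out), np.int32(out.size),
           block=(block_size, 1, 1),
           grid=(grid_size(out.size, block_size), 1))
    return out
```

With a 1280x720 frame at one byte per pixel, `grid_size(1280 * 720)` gives 3600 blocks of 256 threads, so each pixel gets its own GPU thread.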
I ran the algorithm on the Jetson Nano: it can filter 1280x720 at 25 fps in real time. It is just a test, but I think it is interesting.
You can see it here:
Just for information about the Nvidia Jetson Nano:
Yesterday I compiled the CUDA examples (from the CUDA Toolkit) to see how the Maxwell GPU performs.
I tried some of the 3D and image filtering demos. The result is just amazing. It is far, far more powerful than classic OpenCV routines.
Of course, CUDA is not easy to work with, but the performance is really impressive.
With this kind of SBC, you can run very complex programs with very good performance.
You should try the Jetson Nano CUDA examples.
If one day you plan to build a really powerful ASIAIR, an SBC like the Jetson Nano can bring you the power you need.