Multicore architectures increase the challenge of writing dense linear algebra kernels that deliver performance beyond the reach of ScaLAPACK. In this context, algorithms that scale seamlessly to thousands of cores can be developed using DPLASMA (Distributed PLASMA). DPLASMA takes advantage of a novel, generic, distributed Directed Acyclic Graph Engine (DAGuE). The engine is designed for fine-grained tasks, and thus enables tile algorithms originating in PLASMA to scale on large distributed-memory systems. The underlying DAGuE framework has many appealing features for distributed-memory platforms with heterogeneous multicore nodes: a DAG representation that is independent of the problem size, automatic extraction of communication from the dependencies, overlapping of communication and computation, task prioritization, and architecture-aware scheduling and management of tasks.
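To make the idea of a problem-size-independent DAG representation concrete, the following is a minimal Python sketch, not DAGuE's actual format or API. It assumes a tile Cholesky factorization as the example algorithm (the text above names no specific algorithm), and the names `successors` and `reachable_tasks` are hypothetical. Each task's successors are computed from its indices and the number of tile rows `nt`, so the description has constant size while the DAG it describes grows with the matrix.

```python
from collections import deque


def successors(task, nt):
    """Return the tasks that consume the tile written by `task`.

    Illustrative rules for a lower-triangular tile Cholesky factorization
    on an nt x nt grid of tiles. The rules are symbolic: they depend only
    on the task indices and nt, never on an explicitly stored graph.
    """
    kind, *idx = task
    if kind == "POTRF":            # factorizes diagonal tile A[k][k]
        (k,) = idx
        return [("TRSM", k, m) for m in range(k + 1, nt)]
    if kind == "TRSM":             # solves panel tile A[m][k]
        k, m = idx
        out = [("SYRK", k, m)]
        out += [("GEMM", k, m, n) for n in range(k + 1, m)]
        out += [("GEMM", k, m2, m) for m2 in range(m + 1, nt)]
        return out
    if kind == "SYRK":             # updates diagonal tile A[m][m]
        k, m = idx
        return [("POTRF", m)] if m == k + 1 else [("SYRK", k + 1, m)]
    if kind == "GEMM":             # updates off-diagonal tile A[m][n]
        k, m, n = idx
        return [("TRSM", n, m)] if n == k + 1 else [("GEMM", k + 1, m, n)]
    raise ValueError(f"unknown task kind: {kind}")


def reachable_tasks(nt):
    """Unfold the DAG on demand, starting from the single entry task."""
    start = ("POTRF", 0)
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in successors(queue.popleft(), nt):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


if __name__ == "__main__":
    for nt in (2, 4, 8):
        print(nt, len(reachable_tasks(nt)))
```

The same four rules describe the graph for any `nt`; only the traversal cost grows. In a distributed setting, a runtime could in principle evaluate such rules locally to discover which tasks, and therefore which nodes, need a given tile, which is one way communication can be derived from dependencies rather than written by hand.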