I am exploring Azure Data Lake and I am new to this field. I explored many things and read many articles. Basically I have to develop Power BI dashboard from data of different sources.
In classic SQL Server stack I can write an ETL (Extract, Transform, Load) process to bring the data from my system databases into the Data Warehouse database. Then use that Data Warehouse with Power BI by using SSAS etc.
But I want to use Azure Data Lake and I explored Azure Data Lake Store and Azure Data Lake Analytic(U-SQL). I draw following architecture diagram.
1 & 2) Currently ADLS only has limited support for allowing PowerBI to query directly over it. If your data is too large (greater than about 10GB I believe), then PowerBI cannot work directly over data in your ADLS account. In this case, I would recommend either moving your processed data in ADLS to a SQL Database or SQL Data Warehouse, as this allows for PowerBI to operate over larger amounts of data. You can use Azure Data Factory to move your data, or Polybase if moving data into SQL DW.
3) A data lake is still distinct from a data warehouse, and they have separate strengths and weaknesses. The data lake is best for storing your raw or slightly processed data, which may have a variety of formats and schemas. After you process and filter this data using Azure Data Lake Analytics, you can move that data into SQL DW for interactive analytics and data management (but at the cost of inflexibility of schema).
4) Depends on your use case. If you plan on continuing to process the data in ADLS, I recommend you output into an ADLS table for greater performance. However, if you need to pass this data into another service, then CSV is a good choice. You can find more outputters on our GitHub such as JSON and XML.