
Tableau extract vs live

  • I need a bit more clarity around Tableau extracts vs live connections. I have 40 people who will use Tableau and a bunch of custom SQL scripts. If we go down the extract path, will the custom SQL queries run only once, with all instances of Tableau using a single result set, or will each instance of Tableau run the custom SQL separately and only cache those results locally?

     
      September 9, 2021 1:02 PM IST
  • Some aspects of your configuration aren't completely clear from your question. Tableau extracts are a useful tool: they are essentially a persistent cache of query results that lasts until the next refresh. In many respects they act like a materialized view.
    You will usually want to keep your extract in a central location, often on Tableau Server, so that it is shared by many users. That's typical. With some work, you can give each individual Tableau Desktop user a copy of the extract (say, by distributing packaged workbooks). That makes sense in some environments, such as with remote disconnected users, but it is not the norm. That use case is similar to sending out data marts to analysts each month with information drawn from a central warehouse.
    So the answer to your question is that Tableau supports both approaches, replicated or shared extracts, and you can choose whichever best serves your use case. The trick is then just to learn how extracts work and employ them as desired.
    The easiest way to have a shared extract is to publish it to Tableau Server, either embedded in a workbook or separately as a data source (which workbooks then reference). The easiest way to replicate extracts is to first make an extract and then export your workbook as a packaged workbook.
    A Tableau data source is the metadata that references an original source, e.g. a CSV file or a database. A Tableau data source can optionally include an extract that shadows the original source. You can refresh or append to the extract to see new data. If the extract is published to Tableau Server, you can have the refreshes happen on a schedule.
    Storing the extract centrally on Tableau Server is beneficial, especially for data that changes relatively infrequently. You can capture the query results, offload work from the database, reduce network traffic and speed your visualizations.
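    To make the "run once, share the result" idea concrete, here is a minimal sketch in Python. It is only an analogy and does not use any Tableau API: sqlite3 stands in for the source database, and the orders table, the custom SQL, and the SharedExtract class are all hypothetical names invented for illustration. It also ignores any query caching Tableau may do for live connections, so the live count is a worst case.

```python
import sqlite3

# Hypothetical stand-in for the source database (not a Tableau API).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (region TEXT, amount REAL)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 10.0), ("east", 30.0), ("west", 5.0)],
)

# Hypothetical custom SQL of the kind the question describes.
CUSTOM_SQL = (
    "SELECT region, SUM(amount) AS total "
    "FROM orders GROUP BY region ORDER BY region"
)
source_query_count = 0


def run_custom_sql():
    """Run the custom SQL against the source and count the round trip."""
    global source_query_count
    source_query_count += 1
    return source.execute(CUSTOM_SQL).fetchall()


def live_view():
    """Live-style access: every view request goes back to the source."""
    return run_custom_sql()


class SharedExtract:
    """Extract-style access: one stored result set, replaced on refresh."""

    def __init__(self):
        self.rows = None

    def refresh(self):
        # The only place the source database is queried.
        self.rows = run_custom_sql()

    def view(self):
        # Served from the stored copy; the source is not touched.
        return self.rows


USERS = 40

# 40 users on a live-style connection (worst case, no caching).
live_results = [live_view() for _ in range(USERS)]
live_queries = source_query_count

# 40 users reading one shared extract that was refreshed once.
source_query_count = 0
extract = SharedExtract()
extract.refresh()
extract_results = [extract.view() for _ in range(USERS)]
extract_queries = source_query_count

print("source queries, live:", live_queries)
print("source queries, shared extract:", extract_queries)
```

    In this sketch the shared copy hits the source once per refresh no matter how many users read it, which is the behaviour you get from an extract published centrally. Distributing packaged workbooks would instead correspond to each user holding their own SharedExtract.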
    You can further improve performance by filtering (and even aggregating) extracts so they hold only the data needed to display your viz. For large data sources like web server logs, it is very useful to do the aggregation once, at extract creation time. Extracts can also simply capture the results of long-running SQL queries instead of repeating them at visualization time.
    If you do make aggregated extracts, just be careful that any further aggregation you do in the visualization makes sense. Sums of sums and mins of mins are well defined. Averages of averages and the like are not always meaningful.
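    A small worked example of that last point, as a sketch in plain Python with made-up numbers (the region names and amounts are hypothetical, and this is ordinary arithmetic rather than anything Tableau-specific):

```python
from statistics import mean

# Hypothetical row-level data: order amounts per region.
rows = {
    "east": [10.0, 20.0, 30.0],
    "west": [100.0],
}

# What an aggregated extract might keep: one summary row per region.
extract = {
    region: {
        "sum": sum(values),
        "min": min(values),
        "count": len(values),
        "avg": mean(values),
    }
    for region, values in rows.items()
}

all_values = [x for values in rows.values() for x in values]

# Re-aggregating sums and mins gives the same answer as the raw rows.
sum_of_sums = sum(r["sum"] for r in extract.values())
min_of_mins = min(r["min"] for r in extract.values())

# Averaging the per-region averages ignores how many rows each region has.
avg_of_avgs = mean(r["avg"] for r in extract.values())
true_avg = mean(all_values)

# Keeping sum and count in the extract lets you recover the true average.
weighted_avg = sum_of_sums / sum(r["count"] for r in extract.values())

print("sum of sums:", sum_of_sums, "raw sum:", sum(all_values))
print("min of mins:", min_of_mins, "raw min:", min(all_values))
print("avg of avgs:", avg_of_avgs, "true avg:", true_avg)
print("weighted avg from sum/count:", weighted_avg)
```

    Here the average of averages comes out at 60.0 while the true average over all rows is 40.0, because the single "west" row counts as much as the three "east" rows. If you need averages downstream, one option is to store the sum and the row count in the aggregated extract and divide in the viz.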
      September 9, 2021 9:44 PM IST
  • Use an extract when the data needs to be processed very quickly. In that case a copy of the source data is stored in Tableau's in-memory data engine, so queries run much faster than over a live connection. The only drawback is that the extract does not update automatically when the source data changes; it stays as it was until it is refreshed. Use a live connection when handling real-time data. There, each query goes to the source data, so performance won't be as good as with an extract. In short: if you are working with data that rarely changes, use an extract; otherwise use a live connection.

     
      September 13, 2021 1:31 PM IST
  • From your question, it sounds like you are worried about performance, which is why you are wondering whether your users should use a Tableau extract or a live connection.

    In my opinion, in both cases (live vs extract) it all depends on your infrastructure and the size of the table. It makes no sense to make an extract of a huge table that would take hours to download (for example, 1 billion rows and 400 columns).

    If all your users connect directly to a database (not through a Tableau Server), you may run into different issues. If the tables they connect to are relatively small and your database handles multiple users well, that may be OK. But if your database has to run many resource-intensive queries in parallel, on big tables, and it is not optimized for many simultaneous users and sits in a distant location with high latency, finding a solution will be a nightmare. In the worst-case scenario, you may have to change your data structure and upgrade your infrastructure to allow 40 users to access the data simultaneously.

      November 20, 2021 12:24 PM IST