
How can I increase big data performance?

  • I am new to this and still learning. I have a total of 10 TB of JSON files in AWS S3 and 4 m3.xlarge instances in AWS EC2 (1 master, 3 workers). I am currently using Spark with Python on Apache Zeppelin.

    I am reading the files with the following command:

    hcData=sqlContext.read.option("inferSchema","true").json(path)

    In zeppelin interpreter settings:

    master = yarn-client
    spark.driver.memory = 10g
    spark.executor.memory = 10g
    spark.cores.max = 4

    It takes approximately 1 minute to read 1 GB. What else can I do to read big data more efficiently?

    • Should I change my code?
    • Should I add more instances?
    • Should I use a different notebook platform?

    Thank you.

      October 13, 2021 2:21 PM IST
    0
  • For a performance issue, the first step is to find out where the bottleneck is, or at least where it could be.

    One minute to read 1 GB is pretty slow; at that rate, 10 TB would take roughly a week. I would try the following steps:

    • Specify the schema explicitly instead of using inferSchema, so Spark does not need an extra pass over the data to work out the schema
    • Try Spark 2.0 instead of 1.6
    • Check the connection between S3 and EC2 in case something is misconfigured
    • Use a different file format, such as Parquet, instead of JSON
    • Increase the executor memory and decrease the driver memory
    • Use Scala instead of Python, although this is the least likely cause in this case
    This post was edited by Advika Banerjee at October 16, 2021 1:20 PM IST
      October 16, 2021 1:19 PM IST
    0
  • Using data analytics can help. By combining a high-performer job task analysis with a quantitative gap analysis, you can collect solid evidence of performance needs, enabling you to prioritize — and defend — your performance improvement efforts and create targeted, learner-centric solutions.

    Imagine that 85% of performers rate a task as very important to the job role. If managers rate the task similarly, there is agreement about its importance. On the other hand, if managers rate it lower, then there is a disconnect. Similarly, if high performers think something is important, but lower performers do not, you now have an important insight into the different mindsets of the two groups.

      January 6, 2022 1:08 PM IST
    0
  • The capability to make good, intelligent decisions as quickly as possible is vital for any organization that wants to enhance business performance and outperform competitors. In this era of fierce competition, companies cannot simply rely on intuition and experience for decision-making. The effective use of big data allows organizations to derive insights from information in order to make better, more intelligent, real-time, fact-based decisions, so that they remain proactive instead of reactive in their strategy. Big data replaces intuition-based decision making with data-driven decision making, which is important in managing and improving organizational performance. How quickly a company can work and stay agile will determine its competitive advantage over its peers.

    Netflix, the world's leading internet entertainment service, has been collecting a huge amount of data about the viewing habits of its millions of users in more than 50 countries to make business decisions about which programs it should create and buy to attract large audiences. Relying on big data and analytics, the firm decided to create products that appeal to local tastes in each of the countries it operates in, and it worked. On average, the success rate for Netflix's original shows is 80%, compared to 30%-40% for traditional TV shows.
      December 11, 2021 3:18 PM IST
    0
  • 1. Insight into the Customer-Facing Operations.

    Many CRM projects are launched to provide transparency, increase the effectiveness, and drive down the operational costs of sales, service and marketing. Advanced analytics will provide proof of the ROI and effectiveness of the sales, service and marketing operations. For example, understanding the sales and marketing costs involved in acquiring new clients or the service costs involved in retaining customers.

    2. Predictive Sales Forecasting.

    Accurate forecasting is both a science and an art. There are a lot of patterns, relationships and personal subjectivity that have to be taken into account to get an accurate forecast. Predictive algorithms convert the personal subjectivity of sales reps, account for any seasonality or other factors that have an impact, and produce a completely objective, fact-based forecast.

    3. Decision Support.

    A comprehensive analysis on all deals can reveal tactics or unique combinations of activities that are proven to work. Pushing for a demo before trial or vice versa, sending customer success stories early on and involving an executive during negotiations can make a win/loss difference. This analysis can be automated and the advice for sales reps can be delivered in real time and contextualized in the CRM to provide effective decision support.

    4. Better Customer Understanding.

    Your clients and prospects are sharing a ton of valuable information about your products and services, pricing and licensing, competitors, etc. However, the information is often spread around across systems and different formats. Email and support communication, RFPs, survey responses, interviews and meeting notes and so on. There’s a lot of gold there but it’s really hard to extract at scale. Having the ability to gather and analyze this vital information can tell you all you need to know about what your company should do next to keep growing. Most businesses have lots of data about their operations, what their clients think and say about products, services, pricing, and even their competitors. That wasn’t the case some 10-15 years ago when the struggle was to implement systems to capture this data. The next revolution will not be about growing systems of record but about making sense of all the data to support decisions, measure operational effectiveness and navigate the business in the right direction. The businesses that do not adapt to this changing environment will struggle and eventually fail.

    Continue reading at https://www.saleshacker.com/4-ways-big-data-can-improve-analytics-performance-management/ | Sales Hacker (https://www.saleshacker.com/)
      January 10, 2022 12:24 PM IST
    0