
Working with very big data faster in Matlab?

  • I have to deal with very large data sets (point clouds, generally more than 30 000 000 points) in Matlab. I read the ASCII data with the textscan function. After reading, I need to detect invalid data (points with coordinates 0,0,0) and then apply some mathematical operations to each point, that is, each line of the data. Currently I read the data with textscan, assign it to a matrix, and then use a for loop to detect invalid points and apply the operations to each point. A sample of my code is shown below. According to Matlab's profiler, textscan takes 37% and the line

    transformed_list((i:i),(1:4)) = coordinate_list((i:i),(1:4))*t_matrix; 

    takes 35% of all computation time.

    I tried it with another point cloud (around 5 500 000 points) and the profiler reported the same proportions. Is there a way to avoid the for loop, or another way to speed up this computation?

    fileID = fopen('C:\Users\Mustafa\Desktop\ptx_all_data\dede5.ptx');             
    original_data = textscan(fileID,'%f %f %f %f %f %f %f', 'delimiter',' ');
    fclose(fileID);
    column = original_data{1}(1);
    row = original_data{1}(2);
    t_matrix = [original_data{1}(7) original_data{2}(7) original_data{3}(7) original_data{4}(7)
        original_data{1}(8) original_data{2}(8) original_data{3}(8) original_data{4}(8)
        original_data{1}(9) original_data{2}(9) original_data{3}(9) original_data{4}(9)
        original_data{1}(10) original_data{2}(10) original_data{3}(10) original_data{4}(10)];
    coordinate_list(:,1) = original_data{1}(11:length(original_data{1}));
    coordinate_list(:,2) = original_data{2}(11:length(original_data{2}));
    coordinate_list(:,3) = original_data{3}(11:length(original_data{3}));
    coordinate_list(:,4) = 0;
    coordinate_list(:,5) = original_data{4}(11:length(original_data{4}));
    
    transformed_list = zeros(length(coordinate_list),5);
    for i = 1:length(coordinate_list)
        if coordinate_list(i,1) == 0 && coordinate_list(i,2) == 0 && coordinate_list(i,3) == 0
            transformed_list(i,:) = NaN;
        else
            %transformed_list(i,:) = coordinate_list(i,:)*t_matrix;
            transformed_list((i:i),(1:4)) = coordinate_list((i:i),(1:4))*t_matrix;
            transformed_list(i,5) = coordinate_list(i,5);
        end
        %i
    end

    Thanks in advance

     
      September 2, 2021 1:59 PM IST
  • To find the points where (x,y,z) are all zero, you do not need a loop. You can test every row in a single step. Compare each coordinate with zero rather than testing the row sum, because a sum can be zero for a non-zero point such as (1,-1,0):
     
    id = all(coordinate_list(:,1:3) == 0, 2) ; % logical mask, true where x, y and z are all zero
    idx = find(id) ; % row indices of the all-zero points
    Everything inside your loop can be done without a for loop.
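
    A minimal sketch, on a made-up three-row matrix, of how a mask built with all(...) over the coordinate columns differs from one built from a row sum. The column layout (x, y, z, unused, intensity) is assumed from the question's code, and the values are invented for illustration.

    ```matlab
    % Toy data: columns are x, y, z, unused, intensity (layout assumed from the question).
    coordinate_list = [ 1  2  3  0  0.5
                        0  0  0  0  0.5
                        1 -1  0  0  0.2 ];

    % A row is invalid only if x, y and z are all exactly zero.
    invalid = all(coordinate_list(:,1:3) == 0, 2);

    % For comparison: a zero coordinate sum also flags the third row (1 - 1 + 0 = 0).
    sum_is_zero = sum(coordinate_list(:,1:3), 2) == 0;

    disp(invalid.');
    disp(sum_is_zero.');
    ```

    Only the second row is flagged by the all(...) mask, while the sum test also flags the third row, which is a valid point.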
      September 4, 2021 9:05 PM IST
  • Use

    textscan( ..., 'CollectOutput',true )


    Neither of your two samples matches the format specification in
     

    textscan(fileID,'%f %f %f %f %f %f %f', 'delimiter',' ');
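
    A minimal sketch of what 'CollectOutput' changes, assuming a four-column file (x, y, z, intensity). The temporary file and its contents are invented stand-ins, not the poster's data. With the option set to true, textscan returns consecutive columns of the same class as one matrix, so the per-column copying into coordinate_list is not needed.

    ```matlab
    % Write a small stand-in file (hypothetical name and contents).
    sample_file = [tempname '.txt'];
    fid = fopen(sample_file, 'w');
    fprintf(fid, '%f %f %f %f\n', [1 2 3 0.5; 0 0 0 0.5; 4 5 6 0.2].');
    fclose(fid);

    % With 'CollectOutput' true, the four numeric columns come back
    % as a single matrix inside one cell instead of four separate cells.
    fid = fopen(sample_file, 'r');
    data = textscan(fid, '%f %f %f %f', 'CollectOutput', true);
    fclose(fid);
    delete(sample_file);

    points = data{1};   % 3-by-4 double matrix
    disp(size(points));
    ```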
      September 7, 2021 1:38 PM IST
  • For loops with conditional statements like those take a long time to run. What Matlab lacks in loop speed it makes up for with vectorization and indexing.

    Let's try some logical indexing like this to solve the first step:

    % .* binds tighter than ==, so build the row mask with all(...) instead
    invalid = all(coordinate_list(:,1:3) == 0, 2);
    coordinate_list(invalid,:) = nan;   % mark every column of the invalid rows


    And then vectorize the second statement:

    transformed_list(:,(1:4)) = coordinate_list(:,(1:4))*t_matrix;
    


    As EBH mentioned, this might be a bit heavy on your RAM. If it is more than your computer can handle, ask yourself whether the coordinates really have to be doubles; maybe single precision will do. If that is still too much, try slicing the matrix into blocks of rows and performing the operation in parts.
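
    A minimal sketch of both ideas, single precision and processing in row blocks, on random stand-in data. The matrix sizes, the random t_matrix and the chunk size are assumptions chosen for illustration, not values from the question.

    ```matlab
    % Stand-in data: N points with four columns, stored as single precision.
    N = 10000;
    coordinate_list = rand(N, 4, 'single');
    t_matrix = rand(4, 4, 'single');     % hypothetical 4-by-4 transformation

    transformed_list = zeros(N, 4, 'single');
    chunk = 3000;                        % hypothetical chunk size; tune to available RAM
    for first = 1:chunk:N
        last = min(first + chunk - 1, N);
        rows = first:last;
        % Each block is still one vectorized matrix product.
        transformed_list(rows,:) = coordinate_list(rows,:) * t_matrix;
    end

    % Reference result computed in one go, for comparison.
    reference = coordinate_list * t_matrix;
    disp(max(abs(transformed_list(:) - reference(:))));
    ```

    The loop now runs once per block rather than once per point, so its overhead stays small while peak memory is bounded by the block size.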

    Here is a small example to give you an idea, using a point cloud of 2 million elements that I had at hand:

    In R2015a

    transformed_list = zeros(length(coordinate_list),5);
    
    tic
    for i = 1:length(coordinate_list)
        if coordinate_list(i,1) == 0 && coordinate_list(i,2) == 0 && coordinate_list(i,3) == 0
            transformed_list(i,:) = NaN;
        else
            %transformed_list(i,:) = coordinate_list(i,:)*t_matrix;
            transformed_list((i:i),(1:3)) = coordinate_list((i:i),(1:3))*t_matrix;
            transformed_list(i,5) = 1;
        end
        %i
    end
    toc


    Returns Elapsed time is 10.928142 seconds.

    transformed_list=coordinate_list;
    tic 
    
    % .* binds tighter than ==, so build the row mask with all(...) instead
    invalid = all(coordinate_list(:,1:3) == 0, 2);
    coordinate_list(invalid,:) = nan;
    
    transformed_list(:,(1:3)) = coordinate_list(:,(1:3))*t_matrix;
    
    
    toc


    Returns Elapsed time is 0.101696 seconds.


      September 9, 2021 12:48 PM IST
  • Rather than read the whole file, you'd be better off using a loop with fscanf(fileID, '%f', 7) and processing input as you read it.
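
    One way to process input as it is read, sketched here with textscan reading a fixed number of rows per call instead of fscanf reading one line at a time (a swapped-in variant of the suggestion above, not the same call). The temporary file, its four-column layout and the chunk size are assumptions for illustration.

    ```matlab
    % Write a small stand-in file (hypothetical name and contents): x y z intensity.
    sample_file = [tempname '.txt'];
    fid = fopen(sample_file, 'w');
    fprintf(fid, '%f %f %f %f\n', ...
        [1 2 3 0.5; 0 0 0 0.5; 4 5 6 0.2; 0 0 0 0.1; 7 8 9 0.9].');
    fclose(fid);

    rows_per_chunk = 2;   % hypothetical; a real file would use a much larger value
    n_valid = 0;
    fid = fopen(sample_file, 'r');
    while ~feof(fid)
        % Read at most rows_per_chunk rows; the file position is kept between calls.
        block = textscan(fid, '%f %f %f %f', rows_per_chunk, 'CollectOutput', true);
        points = block{1};
        if isempty(points), break; end
        valid = ~all(points(:,1:3) == 0, 2);
        n_valid = n_valid + nnz(valid);   % placeholder for the real per-chunk work
    end
    fclose(fid);
    delete(sample_file);
    disp(n_valid);
    ```

    Only one block of rows is held in memory at a time, and each block can still be handled with vectorized operations.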
      October 20, 2021 1:04 PM IST