
What parts of analytical code to write unit tests for?

  • For the last while I've been writing analytical Python code that is run on demand, via queue-based batch processing, when users interact with a front-end tool.

    Typically the users set some values in the front-end tool that get passed as parameters to the analytical code, and they either supply a dataset or choose a subset of data from an overall data source that their company provides.

    Typically each analytical model sits in a larger repo amongst other analytical models, so each model usually sits in its own module, and that module exports one function which is the entry point into that model. The models range from simple models that take on the order of minutes to very complex statistical or machine learning based models that might use combinations of numpy/Pandas/Numba or Dask dataframes and take on the order of hours.

    Now to my question: I've been going back and forth on where I should concentrate my testing efforts for this type of code. Earlier in my career I naively thought that every function should have a unit test, so my code would have a comprehensive set of tests. I quickly realised that this was counter-productive, as even a small performance refactor could mean ripping apart and possibly even throwing away a lot of the unit tests. So it felt like I should only be writing tests for the main public function of each model. However, that usually leads to the opposite problem: for some of the more complex models, edge cases that sit quite deep in the control flow become hard to test.

    My question then is: how should I be aiming to properly test these analytical models? Some people would probably say "Only test public-facing functions; if you can't test edge cases through the public-facing functions then they should technically not be reachable, so they don't need to be there". But I've found that in reality this doesn't quite work.

    To provide a simple example, say the particular model is to calculate a frequency matrix for dropoff/pickup points from a taxi dataset.

    import pandas as pd
    
    
    def _cat(col1, col2):
        cat_col = col1.astype(str).str.cat(col2.astype(str), ', ')
        return cat_col
    
    
    def _make_points_df(taxi_df):
        pickup_points = _cat(taxi_df["pickup_longitude"], taxi_df["pickup_latitude"])
        dropoff_points = _cat(taxi_df["dropoff_longitude"], taxi_df["dropoff_latitude"])
        points_df = pd.DataFrame({"pickup": pickup_points, "dropoff": dropoff_points})
        return points_df
    
    
    def _points_df_to_freq_mat(points_df):
        mat_df = points_df.groupby(['pickup', 'dropoff']).size().unstack(fill_value=0)
        return mat_df
    
    
    def _validate_taxi_df(taxi_df):
        if type(taxi_df) is not pd.DataFrame:
            raise TypeError(f"taxi_df param must be a pandas dataframe, got: {type(taxi_df)}")
        expected_cols = {
            "pickup_longitude",
            "pickup_latitude",
            "dropoff_longitude",
            "dropoff_latitude",
        }
        if set(taxi_df) != expected_cols:
            raise RuntimeError(
                f"Expected the following columns for taxi_df param: {expected_cols}."
                f"Got: {set(taxi_df)}"
            )
    
    
    def calculate_frequency_matrix(taxi_df, long_round=1, lat_round=1):
        """Calculate a dropoff/pickup frequency matrix which tells you the number of times
        passengers have been picked up and dropped from a given discrete point. The
        resolution of these points is controlled by using the long_round and lat_round params
    
        Parameters
        ----------
        taxi_df : pandas.DataFrame
            Dataframe specifying dropoff and pickup long/lat coordinates
        long_round : int
            Number of decimal places to round the dropoff and pickup longitude values to
        lat_round : int
            Number of decimal places to round the dropoff and pickup latitude values to
    
        Returns
        -------
        pandas.DataFrame
            Dataframe in matrix format of frequency of dropoff/pickup points
    
        Raises
        ------
        TypeError : If taxi_df is not a pandas DataFrame
        RuntimeError : If taxi_df does not contain correct columns
        """
        _validate_taxi_df(taxi_df)
        taxi_df = taxi_df.copy()
        taxi_df["pickup_longitude"] = taxi_df["pickup_longitude"].round(long_round)
        taxi_df["dropoff_longitude"] = taxi_df["dropoff_longitude"].round(long_round)
        taxi_df["pickup_latitude"] = taxi_df["pickup_latitude"].round(lat_round)
        taxi_df["dropoff_latitude"] = taxi_df["dropoff_latitude"].round(lat_round)
    
        points_df = _make_points_df(taxi_df)
        mat_df = _points_df_to_freq_mat(points_df)
        return mat_df

     

    Taking in a dataframe like

           pickup_longitude  pickup_latitude  dropoff_longitude  dropoff_latitude
    0         -73.988129        40.732029         -73.990173         40.756680
    1         -73.964203        40.679993         -73.959808         40.655403
    2         -73.997437        40.737583         -73.986160         40.729523
    3         -73.956070        40.771900         -73.986427         40.730469
    4         -73.970215        40.761475         -73.961510         40.755890
    5         -73.991302        40.749798         -73.980515         40.786549
    6         -73.978310        40.741550         -73.952072         40.717003
    7         -74.012711        40.701527         -73.986481         40.719509


    Say in terms of a folder structure this code would sit at analytics/models/taxi_freq/taxi_freq.py and the analytics/models/taxi_freq/__init__.py file would look like

    from .taxi_freq import calculate_frequency_matrix
    


    And obviously the private functions in the above code could be split across multiple utility files in analytics/models/taxi_freq/.

    Would the consensus be to only test the calculate_frequency_matrix function, or should the "private" helper functions and other utility files/functions within the taxi_freq module also be tested?

      August 12, 2021 2:14 PM IST
    0
  • As with software development in general, with testing you always have to search for solutions that represent the (ideally optimal) tradeoff between competing goals. One of the primary goals of testing in general, and of unit testing in particular, is to find bugs (see Myers, Badgett, Sandler: The Art of Software Testing, or Beizer: Software Testing Techniques, but also many others).

    In your project you may have a more relaxed position on this, but there are many software projects where it would have serious consequences if implementation-level bugs escaped to later development phases or even to the field. Some say your goal should rather be to increase confidence in your code - and this is also true, but confidence can only be a consequence of doing testing right. If you don't test to find bugs, then I will simply not have confidence in your code after you have finished testing.

    When finding bugs is a primary goal of unit testing, attempts to keep unit-test suites completely independent of implementation details are likely to result in inefficient test suites - that is, test suites that are not suited to find all the bugs that could be found. Different implementations have different potential bugs. If you don't use unit testing to find these bugs, then any other test level (integration, subsystem, system) is definitely less suited to find them systematically.

    For example, think about the different ways to implement a Fibonacci function: as an iterative or recursive function, as a closed-form expression (Moivre/Binet), or as a lookup table. The interface is always the same, but the possible bugs differ significantly, and so do the unit-testing strategies. There will be a useful set of implementation-independent test cases, but these alone will not be sufficient to find all the bugs that are likely for the specific implementation.
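
    To make that concrete, here is a minimal sketch of the iterative and closed-form variants side by side (the function names, the chosen test values, and the pytest-style layout are illustrative assumptions, not taken from the question's code base):

    def fib_iterative(n):
        # Straightforward loop; the likely bugs are off-by-one errors near n = 0 and n = 1.
        a, b = 0, 1
        for _ in range(n):
            a, b = b, a + b
        return a


    def fib_closed_form(n):
        # Moivre/Binet closed form; the likely bugs are floating-point rounding
        # errors, which only appear once n gets large enough.
        sqrt5 = 5 ** 0.5
        phi = (1 + sqrt5) / 2
        return round(phi ** n / sqrt5)


    # Implementation-agnostic cases exercise the shared interface and survive
    # swapping one implementation for the other.
    def test_small_values():
        for fib in (fib_iterative, fib_closed_form):
            assert [fib(n) for n in range(7)] == [0, 1, 1, 2, 3, 5, 8]


    # An implementation-aware case targets the bug only the closed form can have
    # (rounding at larger n); it would be pointless for the iterative version.
    def test_closed_form_still_exact_at_larger_n():
        assert fib_closed_form(40) == 102334155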

    The goal of having an effective test suite is therefore in competition with another goal, namely having a maintenance-friendly test suite. This goal, however, comes in different forms with different consequences: you could demand that the unit-test suite shall not be affected when implementation details change. This is quite tough and IMO puts the secondary goal of maintenance-friendly test code above the primary goal of finding bugs.

    Meszaros has a more balanced formulation, namely "The effort for changes to the code base shall be commensurate with the effort to maintain the test suite." (see Meszaros: Principles of Test Automation: Ensure Commensurate Effort). That is, little changes to the SUT shall only require little changes to the test suite, for larger changes to the SUT it is acceptable that the test suite requires equally large changes. (However, for me personally the formulation "the effort for test code maintenance shall be low" is sufficient.)

    Conclusion:

    For me, as I see finding bugs as the primary goal and test-suite maintainability as a secondary goal, this leads to the following consequence: I accept that I also have to test implementation details to find more bugs. Despite this, I nevertheless try to keep the maintenance effort low, mostly by applying the following mechanisms, which aim at making it simpler to adjust the test suite when the SUT changes:

    • First, if the goal of a specific test case can be reached by an implementation agnostic test case and an implementation dependent test case, prefer the implementation agnostic test case. In other words, don't make individual test cases unnecessarily implementation dependent.
    • Second, hide implementation details behind helper functions. There can be helper functions for specific setups, teardowns, assertions, etc. This is an extremely powerful mechanism for limiting the effect of implementation details within the test suite; the sketch below illustrates it.
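
    Applied to the question's taxi example, a rough sketch of that second mechanism could look like this (the helper names, the import path, and the expected matrix labels are assumptions derived from the code in the question, not something this answer prescribes):

    import pandas as pd

    # Import path assumed from the folder structure shown in the question.
    from analytics.models.taxi_freq import calculate_frequency_matrix


    def make_taxi_df(trips):
        # Build a valid input frame from (pickup_long, pickup_lat, dropoff_long,
        # dropoff_lat) tuples. If the required column set ever changes, only this
        # helper changes, not every test that constructs an input frame.
        return pd.DataFrame(
            trips,
            columns=[
                "pickup_longitude",
                "pickup_latitude",
                "dropoff_longitude",
                "dropoff_latitude",
            ],
        )


    def assert_frequency(freq_mat, pickup, dropoff, expected):
        # How rounded points are encoded as "lon, lat" row/column labels is an
        # implementation detail; if the labelling scheme changes, only this helper changes.
        assert freq_mat.loc[f"{pickup[0]}, {pickup[1]}", f"{dropoff[0]}, {dropoff[1]}"] == expected


    def test_repeated_route_is_counted():
        taxi_df = make_taxi_df([
            (-73.99, 40.73, -73.99, 40.76),
            (-73.99, 40.73, -73.99, 40.76),
        ])
        freq_mat = calculate_frequency_matrix(taxi_df)
        assert_frequency(freq_mat, (-74.0, 40.7), (-74.0, 40.8), 2)
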
      October 15, 2021 1:49 PM IST
    0
  • Other answers concentrate on how to set up test classes in a particular unit test framework. Concentrating on the technical details of using a framework, however, detracts from the most important concepts that should be guiding your actions, so I will talk about those.
    • Testing (of all kinds, of all kinds of things) compares the actual behaviour of something (the System Under Test, SUT) with its expected behaviour.
    • Automated testing does that comparison using a computer program. Because the comparison is being done by an inflexible and unintelligent computer program, the expected behaviour must be known precisely and unambiguously.
    • What a program or part of a program (a class or method) is expected to do is its specification. Testing software therefore requires that you have a specification for the SUT. This might be an explicit description, or an implicit specification in your head of what is expected.
    • Automated unit testing therefore requires a precise and unambiguous specification of the class or method you are testing.
    • But you needed that specification when you set out to write that code. So part of what testing is about actually begins before you write even one line of the SUT. The testing technique of Test Driven Development (TDD) takes that idea to an extreme, and has you create the unit testing code before you write the code to be tested.
    • Unit testing frameworks test your SUT using assertions. An assertion is a logical expression (an expression with a boolean result type; a predicate) that must be true if the SUT is behaving correctly. The specification must therefore be expressed (or re-expressed) as assertions.
    • A useful technique for expressing a specification as assertions is Design by Contract. These specifications are in terms of postconditions. A postcondition is an assertion about the publicly visible state of the SUT after return from a method or a constructor. Some methods have postconditions that are invariants, which are predicates that are true both before and after execution of the method. A class can also be said to have invariants, which are postconditions of every constructor and method of the class, and hence should always be true. Postconditions (and invariants) are expressed only in terms of publicly visible state: public and protected fields, the values returned by public and protected methods (such as getters), and the publicly visible state of objects passed (by reference) to methods. A short sketch follows this list.
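
    As a rough illustration in the question's Python setting (the import path and the specific postconditions are assumptions read off the taxi example, not something prescribed here), postconditions of calculate_frequency_matrix can be asserted directly against its publicly visible results:

    import pandas as pd

    # Import path assumed from the folder structure shown in the question.
    from analytics.models.taxi_freq import calculate_frequency_matrix


    def test_frequency_matrix_postconditions():
        taxi_df = pd.DataFrame({
            "pickup_longitude": [-73.99, -73.96],
            "pickup_latitude": [40.73, 40.68],
            "dropoff_longitude": [-73.99, -73.96],
            "dropoff_latitude": [40.76, 40.66],
        })

        freq_mat = calculate_frequency_matrix(taxi_df)

        # Postcondition: every entry of the matrix is a non-negative count.
        assert (freq_mat.to_numpy() >= 0).all()
        # Postcondition: the counts add up to the number of input trips.
        assert freq_mat.to_numpy().sum() == len(taxi_df)
        # Invariant: the caller's frame (publicly visible state of an object passed
        # by reference) is not modified.
        assert list(taxi_df.columns) == [
            "pickup_longitude",
            "pickup_latitude",
            "dropoff_longitude",
            "dropoff_latitude",
        ]
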
      October 18, 2021 2:01 PM IST
    0
  • Check that your code is working as expected by creating and running unit tests. It's called unit testing because you break down the functionality of your program into discrete testable behaviors that you can test as individual units. Visual Studio Test Explorer provides a flexible and efficient way to run your unit tests and view their results in Visual Studio. Visual Studio installs the Microsoft unit testing frameworks for managed and native code. Use a unit testing framework to create unit tests, run them, and report the results of these tests. Rerun unit tests when you make changes to test that your code is still working correctly. Visual Studio Enterprise can do this automatically with Live Unit Testing, which detects tests affected by your code changes and runs them in the background as you type.

    Unit testing has the greatest effect on the quality of your code when it's an integral part of your software development workflow. As soon as you write a function or other block of application code, create unit tests that verify the behavior of the code in response to standard, boundary, and incorrect cases of input data, and that check any explicit or implicit assumptions made by the code. With test driven development, you create the unit tests before you write the code, so you use the unit tests as both design documentation and functional specifications.
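
    Translated to the question's Python code (a pytest-style sketch; the import path and the chosen cases are assumptions about the taxi model rather than something stated above), that could mean one test per class of input:

    import pandas as pd
    import pytest

    # Import path assumed from the folder structure shown in the question.
    from analytics.models.taxi_freq import calculate_frequency_matrix


    def _valid_taxi_df():
        return pd.DataFrame({
            "pickup_longitude": [-73.99],
            "pickup_latitude": [40.73],
            "dropoff_longitude": [-73.99],
            "dropoff_latitude": [40.76],
        })


    def test_standard_input_produces_counts():
        freq_mat = calculate_frequency_matrix(_valid_taxi_df())
        assert freq_mat.to_numpy().sum() == 1


    def test_boundary_input_points_collapse_after_rounding():
        # Boundary case: raw coordinates differ, but at the default resolution
        # both trips round to the same pickup and dropoff point.
        taxi_df = pd.DataFrame({
            "pickup_longitude": [-73.94, -73.92],
            "pickup_latitude": [40.71, 40.74],
            "dropoff_longitude": [-73.94, -73.92],
            "dropoff_latitude": [40.76, 40.79],
        })
        freq_mat = calculate_frequency_matrix(taxi_df, long_round=1, lat_round=1)
        assert freq_mat.shape == (1, 1)
        assert freq_mat.to_numpy().sum() == 2


    def test_incorrect_input_is_rejected():
        with pytest.raises(TypeError):
            calculate_frequency_matrix([[-73.99, 40.73, -73.99, 40.76]])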

    Test Explorer can also run third-party and open source unit test frameworks that have implemented Test Explorer add-on interfaces. You can add many of these frameworks through the Visual Studio Extension Manager and the Visual Studio gallery. For more information, see Install third-party unit test frameworks.

      January 7, 2022 12:47 PM IST
    0