Taking snapshots of pandas data frames

pytest-regtest implements snapshot testing for pandas data frames. snapshot.check offers options for controlling numerical tolerances and produces tailored diagnostics for failing tests.

Write a snapshot test for a data frame

# test_dataframe.py

import pandas as pd
import numpy as np

def test_dataframe(snapshot):
    df = pd.DataFrame(np.eye(3), columns=["a", "b", "c"])
    snapshot.check(df, atol=1e-2, rtol=1e-2)
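
The atol and rtol arguments set an absolute and a relative tolerance. The exact rule pytest-regtest applies is not spelled out here, so the following is only a sketch under the assumption that it follows the convention of numpy.allclose, where a value counts as equal if |current - recorded| <= atol + rtol * |recorded|. The names recorded, small and large are made up for this illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for the recorded snapshot (hypothetical name).
recorded = pd.DataFrame(np.eye(3), columns=["a", "b", "c"])
atol, rtol = 1e-2, 1e-2

# A shift of 0.005 is below atol alone, so every entry is within tolerance.
small = recorded + 0.005
# A shift of 0.05 exceeds atol + rtol * |recorded| (at most 0.02 here).
large = recorded + 0.05

# numpy.allclose is used as an assumed model of the comparison, not as
# the documented behavior of snapshot.check.
within = np.allclose(small.to_numpy(), recorded.to_numpy(), atol=atol, rtol=rtol)
outside = np.allclose(large.to_numpy(), recorded.to_numpy(), atol=atol, rtol=rtol)

print(within, outside)
```

Under this assumption the first comparison succeeds and the second one does not, which is the kind of distinction the tolerances are meant to control.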

Run the test

The first execution of this test fails because no snapshot has been recorded yet:

$ pytest -v test_dataframe.py
============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.3, pluggy-1.6.0 -- /home/docs/checkouts/readthedocs.org/user_builds/pytest-regtest/checkouts/latest/.venv/bin/python
cachedir: .pytest_cache
rootdir: /tmp/tmpp4oa895v
plugins: regtest-2.5.0, cov-7.1.0
collecting ... collected 1 item

test_dataframe.py::test_dataframe FAILED                                 [100%]

=================================== FAILURES ===================================
________________________________ test_dataframe ________________________________

snapshot error(s) for test_dataframe.py::test_dataframe:

snapshot not recorded yet:
    > test_dataframe.py +6
    > snapshot.check(df, atol=1e-2, rtol=1e-2)
         a    b    c
    0  1.0  0.0  0.0
    1  0.0  1.0  0.0
    2  0.0  0.0  1.0
---------------------------- pytest-regtest report -----------------------------
total number of failed regression tests: 0
total number of failed snapshot tests  : 1
=========================== short test summary info ============================
FAILED test_dataframe.py::test_dataframe
============================== 1 failed in 0.13s ===============================

Reset the snapshot

Let us record the snapshot:

$ pytest -v --regtest-reset test_dataframe.py
============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.3, pluggy-1.6.0 -- /home/docs/checkouts/readthedocs.org/user_builds/pytest-regtest/checkouts/latest/.venv/bin/python
cachedir: .pytest_cache
rootdir: /tmp/tmpp4oa895v
plugins: regtest-2.5.0, cov-7.1.0
collecting ... collected 1 item

test_dataframe.py::test_dataframe RESET                                  [100%]

---------------------------- pytest-regtest report -----------------------------
total number of failed regression tests: 0
total number of failed snapshot tests  : 0
the following output files have been reset:
  _regtest_outputs/test_dataframe.test_dataframe__0
============================== 1 passed in 0.01s ===============================

Now the test passes:

$ pytest -v test_dataframe.py
============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.3, pluggy-1.6.0 -- /home/docs/checkouts/readthedocs.org/user_builds/pytest-regtest/checkouts/latest/.venv/bin/python
cachedir: .pytest_cache
rootdir: /tmp/tmpp4oa895v
plugins: regtest-2.5.0, cov-7.1.0
collecting ... collected 1 item

test_dataframe.py::test_dataframe PASSED                                 [100%]

---------------------------- pytest-regtest report -----------------------------
total number of failed regression tests: 0
total number of failed snapshot tests  : 0
============================== 1 passed in 0.01s ===============================

Break the test

We break the test by using the column name d instead of c:

# test_dataframe.py

import pandas as pd
import numpy as np

def test_dataframe(snapshot):
    df = pd.DataFrame(np.eye(3), columns=["a", "b", "d"])
    snapshot.check(df, atol=1e-2, rtol=1e-2)

The test now fails with a snapshot mismatch:

$ pytest -v test_dataframe.py
============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.3, pluggy-1.6.0 -- /home/docs/checkouts/readthedocs.org/user_builds/pytest-regtest/checkouts/latest/.venv/bin/python
cachedir: .pytest_cache
rootdir: /tmp/tmpp4oa895v
plugins: regtest-2.5.0, cov-7.1.0
collecting ... collected 1 item

test_dataframe.py::test_dataframe FAILED                                 [100%]

=================================== FAILURES ===================================
________________________________ test_dataframe ________________________________

snapshot error(s) for test_dataframe.py::test_dataframe:

snapshot mismatch:
    > test_dataframe.py +6:
    > snapshot.check(df, atol=1e-2, rtol=1e-2)
    --- current
    +++ expected
    @@ -3,4 +3,4 @@
     ---  ------  --------------  -----  
      0   a       3 non-null      float64
      1   b       3 non-null      float64
    - 2   d       3 non-null      float64
    + 2   c       3 non-null      float64

    --- current
    +++ expected
    @@ -1,4 +1,4 @@
    -     a    b    d
    +     a    b    c
     0  1.0  0.0  0.0
     1  0.0  1.0  0.0
     2  0.0  0.0  1.0
---------------------------- pytest-regtest report -----------------------------
total number of failed regression tests: 0
total number of failed snapshot tests  : 1
=========================== short test summary info ============================
FAILED test_dataframe.py::test_dataframe
============================== 1 failed in 0.14s ===============================
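
The first hunk of the report above has the layout of DataFrame.info() output. As a rough illustration of how such a diff can arise, and not necessarily how pytest-regtest builds it, the following sketch captures df.info() for both frames and compares the text with difflib from the standard library. The helper name info_lines is made up for this example.

```python
import difflib
import io

import numpy as np
import pandas as pd


def info_lines(df):
    """Return the text written by df.info() as a list of lines."""
    buffer = io.StringIO()
    df.info(buf=buffer)
    return buffer.getvalue().splitlines()


expected = pd.DataFrame(np.eye(3), columns=["a", "b", "c"])
current = pd.DataFrame(np.eye(3), columns=["a", "b", "d"])

# Unified diff of the two info() summaries, labeled like the report above.
diff = list(
    difflib.unified_diff(
        info_lines(current),
        info_lines(expected),
        fromfile="current",
        tofile="expected",
        lineterm="",
    )
)
print("\n".join(diff))
```

The renamed column appears as one removed line for d and one added line for c, while the unchanged columns serve as context.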

Changing the display options

By default, pytest-regtest uses the display options of pandas. Adjusting these options can make diagnostic messages more readable.

In the following example we create a data frame with 61 rows:

# test_datafrae_with_printoptions.py

import pandas as pd
import numpy as np

def test_dataframe(snapshot):
    rows = 61
    matrix = np.arange(rows * 3).reshape(-1, 3) * 1.001
    df = pd.DataFrame(matrix, columns=["a", "b", "c"])

    snapshot.check(df)

The default settings of pandas truncate the display of data frames with more than 60 rows:

$ pytest -v test_datafrae_with_printoptions.py
============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.3, pluggy-1.6.0 -- /home/docs/checkouts/readthedocs.org/user_builds/pytest-regtest/checkouts/latest/.venv/bin/python
cachedir: .pytest_cache
rootdir: /tmp/tmpp4oa895v
plugins: regtest-2.5.0, cov-7.1.0
collecting ... collected 1 item

test_datafrae_with_printoptions.py::test_dataframe FAILED                [100%]

=================================== FAILURES ===================================
________________________________ test_dataframe ________________________________

snapshot error(s) for test_datafrae_with_printoptions.py::test_dataframe:

snapshot not recorded yet:
    > test_datafrae_with_printoptions.py +9
    > snapshot.check(df)
              a        b        c
    0     0.000    1.001    2.002
    1     3.003    4.004    5.005
    2     6.006    7.007    8.008
    3     9.009   10.010   11.011
    4    12.012   13.013   14.014
    ..      ...      ...      ...
    56  168.168  169.169  170.170
    57  171.171  172.172  173.173
    58  174.174  175.175  176.176
    59  177.177  178.178  179.179
    60  180.180  181.181  182.182

    [61 rows x 3 columns]
---------------------------- pytest-regtest report -----------------------------
total number of failed regression tests: 0
total number of failed snapshot tests  : 1
=========================== short test summary info ============================
FAILED test_datafrae_with_printoptions.py::test_dataframe
============================== 1 failed in 0.13s ===============================

To fix this, we change the display.max_rows setting:

# test_datafrae_with_printoptions.py

import pandas as pd
import numpy as np

def test_dataframe(snapshot):
    rows = 61
    matrix = np.arange(rows * 3).reshape(-1, 3) * 1.001
    df = pd.DataFrame(matrix, columns=["a", "b", "c"])

    # a value of None means "unlimited":
    with pd.option_context("display.max_rows", None):
        snapshot.check(df)

Now the diagnostic output shows all rows:

$ pytest -v test_datafrae_with_printoptions.py
============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.3, pluggy-1.6.0 -- /home/docs/checkouts/readthedocs.org/user_builds/pytest-regtest/checkouts/latest/.venv/bin/python
cachedir: .pytest_cache
rootdir: /tmp/tmpp4oa895v
plugins: regtest-2.5.0, cov-7.1.0
collecting ... collected 1 item

test_datafrae_with_printoptions.py::test_dataframe FAILED                [100%]

=================================== FAILURES ===================================
________________________________ test_dataframe ________________________________

snapshot error(s) for test_datafrae_with_printoptions.py::test_dataframe:

snapshot not recorded yet:
    > test_datafrae_with_printoptions.py +11
    > snapshot.check(df)
              a        b        c
    0     0.000    1.001    2.002
    1     3.003    4.004    5.005
    2     6.006    7.007    8.008
    3     9.009   10.010   11.011
    4    12.012   13.013   14.014
    5    15.015   16.016   17.017
    6    18.018   19.019   20.020
    7    21.021   22.022   23.023
    8    24.024   25.025   26.026
    9    27.027   28.028   29.029
    10   30.030   31.031   32.032
    11   33.033   34.034   35.035
    12   36.036   37.037   38.038
    13   39.039   40.040   41.041
    14   42.042   43.043   44.044
    15   45.045   46.046   47.047
    16   48.048   49.049   50.050
    17   51.051   52.052   53.053
    18   54.054   55.055   56.056
    19   57.057   58.058   59.059
    20   60.060   61.061   62.062
    21   63.063   64.064   65.065
    22   66.066   67.067   68.068
    23   69.069   70.070   71.071
    24   72.072   73.073   74.074
    25   75.075   76.076   77.077
    26   78.078   79.079   80.080
    27   81.081   82.082   83.083
    28   84.084   85.085   86.086
    29   87.087   88.088   89.089
    30   90.090   91.091   92.092
    31   93.093   94.094   95.095
    32   96.096   97.097   98.098
    33   99.099  100.100  101.101
    34  102.102  103.103  104.104
    35  105.105  106.106  107.107
    36  108.108  109.109  110.110
    37  111.111  112.112  113.113
    38  114.114  115.115  116.116
    39  117.117  118.118  119.119
    40  120.120  121.121  122.122
    41  123.123  124.124  125.125
    42  126.126  127.127  128.128
    43  129.129  130.130  131.131
    44  132.132  133.133  134.134
    45  135.135  136.136  137.137
    46  138.138  139.139  140.140
    47  141.141  142.142  143.143
    48  144.144  145.145  146.146
    49  147.147  148.148  149.149
    50  150.150  151.151  152.152
    51  153.153  154.154  155.155
    52  156.156  157.157  158.158
    53  159.159  160.160  161.161
    54  162.162  163.163  164.164
    55  165.165  166.166  167.167
    56  168.168  169.169  170.170
    57  171.171  172.172  173.173
    58  174.174  175.175  176.176
    59  177.177  178.178  179.179
    60  180.180  181.181  182.182
---------------------------- pytest-regtest report -----------------------------
total number of failed regression tests: 0
total number of failed snapshot tests  : 1
=========================== short test summary info ============================
FAILED test_datafrae_with_printoptions.py::test_dataframe
============================== 1 failed in 0.14s ===============================
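
The effect of display.max_rows can also be checked outside of pytest. The following sketch uses only pandas and numpy, builds the same 61-row frame, and compares its text representation with and without the option. The variable names truncated and full are made up for this illustration.

```python
import numpy as np
import pandas as pd

rows = 61
matrix = np.arange(rows * 3).reshape(-1, 3) * 1.001
df = pd.DataFrame(matrix, columns=["a", "b", "c"])

# With the default settings the representation is shortened and ends
# with a summary line such as "[61 rows x 3 columns]".
truncated = repr(df)

# Inside the context manager the row limit is lifted; the previous
# setting is restored when the block exits.
with pd.option_context("display.max_rows", None):
    full = repr(df)

print(len(truncated.splitlines()), len(full.splitlines()))
```

Because option_context restores the previous value on exit, the change stays local to the snapshot.check call and does not leak into other tests.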