CDU data science team blog: Patient feedback text mining project update

YiWen Hon

The second phase of the pxtextmining / experiencesdashboard project has just begun, after a year-long hiatus. This NHS England funded project aims to enable NHS Trusts to make better use of their qualitative NHS Friends and Family Test data, through the creation of an open source product featuring text classification and data visualisation. This blog post will outline what we’ve been up to so far and our next steps.

First steps

Picking up someone else’s code isn’t always easy, and it’s taken time for Oluwasegun (working on the R/Shiny-based experiencesdashboard) and myself (working on the Python-based pxtextmining element) to feel confident in making changes to the packages. Much of my early work on pxtextmining has involved package management, such as switching to poetry for dependency management and mkdocs for documentation. These changes should improve the usability and accessibility of the package - for example, the number of dependencies has been reduced from 51 to 18, and we are now using pyproject.toml instead of setup.py/requirements.txt, in compliance with PEP 518.

The second phase of the project also involves a brand new dataset, which is currently being labelled by qualitative researchers from NHS England’s Insight and Feedback Team. This dataset utilises a new framework of labels which is derived from the data; there are more categories/targets than in phase 1, so we already know this will be an interesting challenge. The dataset should also be richer as we have new partner organisations contributing - whilst in phase 1 the data was derived from community and acute trusts, we also have an ambulance trust this time round. Work on this is still underway and the process of manually labelling the data is quite labour-intensive, but we are hoping to have some initial data to work with in the next few weeks.

What happens next?

Some features we’re looking to implement in pxtextmining, when we have the new data, are the implementation of active learning / human in the loop approaches to the machine learning pipeline, and the addition of multilabel classification functionality.

Oluwasegun, will be working on enabling users of experiencesdashboard to edit the data that they have uploaded, and improving the general usability and functionality of the frontend.

We will also improve the way the two packages work alongside each other by implementing an API in pxtextmining which can be called by the Shiny frontend, keeping the two packages separate but interconnected. At the moment, experiencesdashboard installs an R port of the pxtextmining package, so there is some duplication.

Communication around this project will also be improved, as we will be setting up a new documentation site that discusses the project as a whole for a non-technical audience. This project site will link out to the existing documentation pages for each project, which are more technically oriented.

We’ve got lots to do over the next few months! Any comments, questions or suggestions are very welcome.

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Citation

For attribution, please cite this work as

Hon (2023, Jan. 9). CDU data science team blog: Patient feedback text mining project update. Retrieved from https://cdu-data-science-team.github.io/team-blog/posts/2023-01-09-patient-feedback-project-update/

BibTeX citation

@misc{hon2023patient,
  author = {Hon, YiWen},
  title = {CDU data science team blog: Patient feedback text mining project update},
  url = {https://cdu-data-science-team.github.io/team-blog/posts/2023-01-09-patient-feedback-project-update/},
  year = {2023}
}